Mp3Compression
Added in v0.12.0, updated in v0.42.0
Compress the audio using an MP3 encoder to lower the audio quality. This may help machine learning models deal with compressed, low-quality audio.
This transform depends on fast-mp3-augment, lameenc or pydub/ffmpeg.
Starting with v0.42.0, the default backend is "fast-mp3-augment", which performs the encode-decode round-trip entirely in memory and in parallel threads. This makes the transform significantly faster than the older "pydub" and "lameenc" backends and avoids writing temporary files to disk. Here's the result from a small benchmark that ran 3 short audio snippets (~7-9s) through each backend:
Note: When using "fast-mp3-augment" or "lameenc", these are the only supported sample rates: 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000
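The sample rate restriction in the note above can be checked before calling the transform. The following is a minimal sketch, not part of audiomentations: the names `SUPPORTED_SAMPLE_RATES`, `is_supported_sample_rate` and `nearest_supported_sample_rate` are hypothetical, and the sketch assumes that backends not named in the note are unrestricted.

```python
# Sample rates listed in the note above for "fast-mp3-augment" and "lameenc".
SUPPORTED_SAMPLE_RATES = (8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000)


def is_supported_sample_rate(sample_rate: int, backend: str = "fast-mp3-augment") -> bool:
    """Hypothetical helper: True if the note above allows this sample rate."""
    if backend in ("fast-mp3-augment", "lameenc"):
        return sample_rate in SUPPORTED_SAMPLE_RATES
    # Assumption of this sketch: the note only restricts the two backends
    # above, so no check is made for any other backend.
    return True


def nearest_supported_sample_rate(sample_rate: int) -> int:
    """Hypothetical helper: closest listed rate, e.g. as a resampling target."""
    return min(SUPPORTED_SAMPLE_RATES, key=lambda rate: abs(rate - sample_rate))


print(is_supported_sample_rate(48000))       # True
print(is_supported_sample_rate(96000))       # False
print(nearest_supported_sample_rate(96000))  # 48000
```

If the check fails, one option is to resample the audio to the nearest listed rate before applying the transform.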
Input-output example
Here we input a high-quality speech recording and apply Mp3Compression with a bitrate of 32 kbps:
Input sound | Transformed sound |
---|---|
Usage example
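from audiomentations import Mp3Compression

# Bitrate range: 16 to 96 kbps; p=1.0 applies the transform every time
transform = Mp3Compression(
    min_bitrate=16,
    max_bitrate=96,
    backend="fast-mp3-augment",
    preserve_delay=False,
    p=1.0,
)

# my_waveform_ndarray: NumPy array holding audio sampled at 48 kHz
augmented_sound = transform(my_waveform_ndarray, sample_rate=48000)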
Mp3Compression API
min_bitrate: int • unit: kbps • range: [8, max_bitrate]
Default: 8. Minimum bitrate in kbps.

max_bitrate: int • unit: kbps • range: [min_bitrate, 320]
Default: 64. Maximum bitrate in kbps.

backend: str • choices: "fast-mp3-augment", "pydub", "lameenc"
Default: "fast-mp3-augment".
- "fast-mp3-augment": In-memory computation with parallel threads for encoding and decoding. Uses LAME encoder and minimp3 decoder under the hood. This is the recommended option.
- "pydub": Uses pydub + ffmpeg under the hood. Does not delay the output compared to the input. It is comparatively slow (writes temporary files to disk). Does not support preserve_delay=True.
- "lameenc": Slow (writes a temporary file to disk). Introduces encoder + decoder delay, so the output is not in sync with the input. Does not support preserve_delay=False. Note that bitrates below 32 kbps are only supported for low sample rates (up to 24000 Hz). As of v0.42.0, this backend is deprecated.

preserve_delay: bool
Default: False.
If False, the output length and timing will match the input.
If True, include LAME encoder delay + filter delay (a few tens of milliseconds) and padding in the output. This makes the output
1) longer than the input
2) delayed (out of sync) relative to the input
Normally, it makes sense to set preserve_delay to False, but if you want outputs that include the short, almost silent part at the beginning, setting it to True keeps that part.

quality: int • range: [0, 9]
Default: 7. LAME-specific parameter (between 0 and 9) that controls a trade-off between audio quality and speed:
0: higher quality audio at the cost of slower processing
9: faster processing at the cost of lower quality audio
Note: If using backend="pydub", this parameter is silently ignored.

p: float • range: [0.0, 1.0]
Default: 0.5. The probability of applying this transform.
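
The ranges and backend restrictions documented above can be collected into a single pre-flight check. This is a minimal sketch under stated assumptions: `check_mp3_compression_args` is a hypothetical helper, not part of audiomentations, and the library's own validation may behave differently (for example by raising exceptions or emitting warnings). The default values mirror the documented defaults.

```python
def check_mp3_compression_args(
    min_bitrate: int = 8,
    max_bitrate: int = 64,
    backend: str = "fast-mp3-augment",
    preserve_delay: bool = False,
    quality: int = 7,
    p: float = 0.5,
) -> list:
    """Hypothetical helper: return a list of problems (empty if none found)."""
    problems = []
    # Documented ranges: min_bitrate in [8, max_bitrate], max_bitrate up to 320.
    if not 8 <= min_bitrate <= max_bitrate:
        problems.append("min_bitrate must be in [8, max_bitrate]")
    if max_bitrate > 320:
        problems.append("max_bitrate must be at most 320")
    if backend not in ("fast-mp3-augment", "pydub", "lameenc"):
        problems.append("unknown backend")
    # Documented backend restrictions on preserve_delay.
    if backend == "pydub" and preserve_delay:
        problems.append('"pydub" does not support preserve_delay=True')
    if backend == "lameenc" and not preserve_delay:
        problems.append('"lameenc" does not support preserve_delay=False')
    if not 0 <= quality <= 9:
        problems.append("quality must be in [0, 9]")
    if not 0.0 <= p <= 1.0:
        problems.append("p must be in [0.0, 1.0]")
    return problems


print(check_mp3_compression_args())                                      # []
print(check_mp3_compression_args(backend="pydub", preserve_delay=True))
```

Returning a list rather than raising on the first problem is a design choice of this sketch; it lets a caller report every conflicting setting at once.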