
Mp3Compression

Added in v0.12.0, updated in v0.42.0

Compress the audio using an MP3 encoder to lower the audio quality. This may help machine learning models deal with compressed, low-quality audio.

This transform depends on fast-mp3-augment, lameenc or pydub/ffmpeg.

Starting with v0.42.0, the default backend is "fast-mp3-augment", which performs the encode-decode round-trip entirely in memory and in parallel threads. This makes the transform significantly faster than the older "pydub" and "lameenc" backends and avoids writing temporary files to disk. Here's the result from a small benchmark that ran 3 short audio snippets (~7-9s) through each backend:

[Figure: Mp3Compression backend performance benchmark results]

Note: The "fast-mp3-augment" and "lameenc" backends support only the following sample rates (in Hz): 8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000
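Because the note above restricts which sample rates two of the backends accept, it can help to check a rate before constructing the transform. The following is a minimal sketch, not part of audiomentations: the names SUPPORTED_SAMPLE_RATES, is_supported_sample_rate and nearest_supported_sample_rate are hypothetical helpers, and the list of rates is copied from the note.

```python
# Sample rates listed in the note above for the "fast-mp3-augment" and
# "lameenc" backends. This tuple is an illustration, not a library constant.
SUPPORTED_SAMPLE_RATES = (8000, 11025, 12000, 16000, 22050, 24000, 32000, 44100, 48000)


def is_supported_sample_rate(sample_rate: int) -> bool:
    """Return True if the rate appears in the supported list."""
    return sample_rate in SUPPORTED_SAMPLE_RATES


def nearest_supported_sample_rate(sample_rate: int) -> int:
    """Return the supported rate closest to the given one.

    If two supported rates are equally close, the lower one is returned.
    """
    return min(SUPPORTED_SAMPLE_RATES, key=lambda sr: (abs(sr - sample_rate), sr))
```

Audio at an unsupported rate would have to be resampled to a supported one before being passed to the transform; resampling itself is outside the scope of this sketch.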

Input-output example

Here we input a high-quality speech recording and apply Mp3Compression with a bitrate of 32 kbps:

[Figure: Input-output waveforms and spectrograms, with audio of the input sound and the transformed sound]

Usage example

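import numpy as np
from audiomentations import Mp3Compression

transform = Mp3Compression(
    min_bitrate=16,
    max_bitrate=96,
    backend="fast-mp3-augment",
    preserve_delay=False,
    p=1.0,
)

# Placeholder input for illustration: one second of low-level noise (mono, float32)
my_waveform_ndarray = np.random.uniform(-0.2, 0.2, size=48000).astype(np.float32)

augmented_sound = transform(my_waveform_ndarray, sample_rate=48000)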

Mp3Compression API

min_bitrate: int • unit: kbps • range: [8, max_bitrate]
Default: 8. Minimum bitrate in kbps.
max_bitrate: int • unit: kbps • range: [min_bitrate, 320]
Default: 64. Maximum bitrate in kbps.
backend: str • choices: "fast-mp3-augment", "pydub", "lameenc"

Default: "fast-mp3-augment".

  • "fast-mp3-augment": In-memory computation with parallel threads for encoding and decoding. Uses LAME encoder and minimp3 decoder under the hood. This is the recommended option.
  • "pydub": Uses pydub + ffmpeg under the hood. Does not delay the output compared to the input. It is comparatively slow (writes temporary files to disk). Does not support preserve_delay=True.
  • "lameenc": Slow (writes a temporary file to disk). Introduces encoder + decoder delay, so the output is not in sync with the input. Does not support preserve_delay=False. Note that bitrates below 32 kbps are only supported for low sample rates (up to 24000 Hz). As of v0.42.0, this backend is deprecated.
preserve_delay: bool

Default: False.

If False, the output length and timing will match the input.
If True, the output includes the LAME encoder delay + filter delay (a few tens of milliseconds) and padding. This makes the output
1) longer than the input
2) delayed (out of sync) relative to the input

Normally, it makes sense to set preserve_delay to False. Set it to True only if you want outputs that include the short, almost silent part at the beginning.

quality: int • range: [0, 9]

Default: 7. LAME-specific parameter (between 0 and 9) that controls a trade-off between audio quality and speed:
0: higher quality audio at the cost of slower processing
9: faster processing at the cost of lower quality audio

Note: With backend="pydub", this parameter is silently ignored.

p: float • range: [0.0, 1.0]
Default: 0.5. The probability of applying this transform.
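
The parameter ranges and backend restrictions listed above interact, so a small pre-flight check can catch an unsupported combination early. The sketch below only encodes the constraints stated in this section; check_mp3_compression_args is a hypothetical helper, not a function provided by audiomentations, and how the library itself reacts to an invalid combination is not covered here.

```python
from typing import List, Optional

SUPPORTED_BACKENDS = ("fast-mp3-augment", "pydub", "lameenc")


def check_mp3_compression_args(
    min_bitrate: int = 8,
    max_bitrate: int = 64,
    backend: str = "fast-mp3-augment",
    preserve_delay: bool = False,
    quality: int = 7,
    p: float = 0.5,
    sample_rate: Optional[int] = None,
) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []

    # Documented ranges: min_bitrate in [8, max_bitrate], max_bitrate in [min_bitrate, 320]
    if not (8 <= min_bitrate <= max_bitrate <= 320):
        problems.append("bitrates must satisfy 8 <= min_bitrate <= max_bitrate <= 320")

    if backend not in SUPPORTED_BACKENDS:
        problems.append("unknown backend: %r" % (backend,))

    # Documented backend restrictions on preserve_delay
    if backend == "pydub" and preserve_delay:
        problems.append('"pydub" does not support preserve_delay=True')
    if backend == "lameenc" and not preserve_delay:
        problems.append('"lameenc" does not support preserve_delay=False')

    # "lameenc": bitrates below 32 kbps only for sample rates up to 24000 Hz
    if (
        backend == "lameenc"
        and sample_rate is not None
        and sample_rate > 24000
        and min_bitrate < 32
    ):
        problems.append('"lameenc" supports bitrates below 32 kbps only up to 24000 Hz')

    if not (0 <= quality <= 9):
        problems.append("quality must be in [0, 9]")
    if not (0.0 <= p <= 1.0):
        problems.append("p must be in [0.0, 1.0]")

    return problems
```

Calling the helper with the documented defaults returns an empty list, while a combination such as backend="pydub" with preserve_delay=True returns one message describing the conflict.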

Source code

audiomentations/augmentations/mp3_compression.py