Skip to content

Handle gapless MP3 Xing/Info durations#3198

Open
Tolriq wants to merge 1 commit intoandroidx:mainfrom
Tolriq:mp3_gapless
Open

Handle gapless MP3 Xing/Info durations#3198
Tolriq wants to merge 1 commit intoandroidx:mainfrom
Tolriq:mp3_gapless

Conversation

@Tolriq
Copy link
Copy Markdown
Contributor

@Tolriq Tolriq commented May 5, 2026

LAME Xing/Info headers can include encoder delay and padding. Mp3Extractor already propagates those values into Format so decoded audio is rendered gaplessly, but the SeekMap duration was still computed from the untrimmed MPEG frame count. For CBR Info files this can expose a source duration that is longer than the samples that will actually be played, which can break gapless transitions.

Split the Xing/Info duration into two concepts: raw duration from the frame count, used for bitrate and byte-position calculations, and gapless duration with encoder delay and padding removed, exposed from SeekMap.

Keep CBR average-bitrate derivation on the raw duration to avoid changing byte-position seeking. When a CBR seeker is given an explicit gapless duration, map seeks at the advertised end of the stream to the raw data end so the SeekMap endpoint contract is preserved.

Add focused tests for raw vs gapless duration calculation, CBR seek endpoint handling, and Info-frame duration/bitrate behavior, and update affected extractor dumps.

Fixes #3183.

@Tolriq Tolriq mentioned this pull request May 5, 2026
1 task
@icbaker icbaker self-assigned this May 6, 2026
@icbaker icbaker self-requested a review May 6, 2026 15:56
@icbaker
Copy link
Copy Markdown
Collaborator

icbaker commented May 6, 2026

GitHub isn't letting me add review comments for some reason, so here's a quick bit of initial feedback


The part of this PR related to the LAME_TO_DECODED_PCM_TRIM_OFFSET_SAMPLES constant & its usages seem unrelated to the duration vs raw duration split. Please can you send this as a separate PR?


In the bear-vbr-xing-header-no-toc.mp3.cbr-seeking-always.0.dump dump file, I'd expect getPosition(DURATION) to return timeUs=DURATION (concretely: getPosition(2783979) = [[timeUs=2783979, position=38396]]). But it seems timeUs is still returning the 'raw' duration (same for many other dump files).

Xing/Info frame counts describe the raw MP3 frame timeline, but SeekMap#getDurationUs should expose the gapless playback duration when encoder delay and padding are present.

Keep both values in MP3 seekers: use the raw duration for bitrate and byte-position calculations, and use the gapless duration for the public duration and end-of-stream seek points. This makes getPosition(durationUs) return a seek point whose timeUs is the requested public duration.

Issue: androidx#3183

Test: ./gradlew :lib-extractor:testDebugUnitTest --tests androidx.media3.extractor.mp3.ConstantBitrateSeekerTest --tests androidx.media3.extractor.mp3.Mp3ExtractorTest --tests androidx.media3.extractor.mp3.XingFrameTest
@Tolriq
Copy link
Copy Markdown
Contributor Author

Tolriq commented May 6, 2026

Yeah sorry I did not fully verify that part, I completely underestimated the complete mess that is mp3 seeking and gapless before going into this.

I've moved the lame part in another PR, but used AI to rebase it on main to not depend on this one. It's probably best to not really look at the second one deeply until this is one is done or not.

* Returns the raw duration before subtracting encoder delay/padding, or {@link C#TIME_UNSET} if
* not known.
*/
default long getRawDurationUs() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'm wondering if we can avoid introducing this method (and therefore avoid having to distinguish between these concepts of 'duration' and 'raw duration').

It looks like it's used in two ways:

  1. To calculate a bitrate in Mp3Extractor as part of the 'enhanced CBR' logic
  2. In the implementation of ConstantBitrateSeeker.getSeekPoints to ensure the final byte position is reported correctly.

For (1), I see that Seeker already has a getAverageBitrate() method, which should fit for the use-case (based on its documentation of Returns the average bitrate (usually derived from the duration and length of the file)) - except it seems that XingSeeker and VbriSeeker implement this incorrectly by directly returning the bitrate of the MPEG frame that contains the seek metadata.

For (2) I think you can use getDataEndPosition() instead?

If this sounds right, would you be up for:

  1. Sending a PR to fix the bug in the getAverageBitrate() impls in XingSeeker and VbriSeeker
  2. Rebasing this PR on top once that's merged, and then removing getRawDurationUs and using getAverageBitrate() and getDataEndPosition() instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can use Codex to split things and move things yes if that's not a problem for you. Way less time consuming.

Also Codex: Caveat: getDataEndPosition() is documented as immediately after audio data, so using it directly may be one byte/frame past the desired seek point. The implementation should likely use the last valid seek position derived from data end, for example dataEndPosition - normalizedFrameSize, while returning timeUs = durationUs. we start to be a little outside of my confort zone so I don't really have value on the decisions, just did tests and debug to try to reach solutions. I now mostly rely on your expertise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants