Skip to content

stream.hls: refactor and fix attribute list parser #5125

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

bastimeyer
Copy link
Member

This fixes and cleans up the M3U8Parser's attribute list regex+parser and some individual attribute parser methods, and it adds missing unit tests.

The old attribute list regex was incorrect and was also missing some stuff. This is now fixed and it's a bit more strict when parsing the attribute list, but it's not doing that 100% according to the spec either, because the previous one also wasn't (and shouldn't, to be able to support broken/invalid playlists).
https://www.rfc-editor.org/rfc/rfc8216#section-4.2

One key difference here is that the entire attribute list gets discarded if there's a parsing issue with one of the attributes in the attributes list. Previously, this was not the case and the already parsed attributes of a specific tag were returned and it kept parsing via re.finditer(), which could lead to invalid results. That's also the reason, why the comma separator has been added to the regex.

It also adds a couple of warning log messages, which can be useful, even though the input string is not included in the log message. I didn't add warnings for everything, though.

But before I'm opening this for merging, I will check and see if this breaks the parsing of actual production playlists.

@bastimeyer bastimeyer marked this pull request as draft January 28, 2023 11:14
@bastimeyer bastimeyer marked this pull request as ready for review February 4, 2023 12:07
- Rewrite and fix attribute list regex
- Be more strict when parsing multiple attributes and expect the
  mandatory comma separator, but be lenient about spaces surrounding
  attributes (off-spec), similar to the old implementation
- Add missing support for signed decimal floating point numbers
- Fix invalid range of hexadecimal sequences and allow uppercase prefix
- Fix character range of enumerated strings being too narrow
- Fix character range of quoted strings
- Discard entire attribute list on parsing failure and emit warning
- Add unit tests
- Turn methods into classmethods
- Emit warning messages on parsing failure
- Add unit tests
@bastimeyer bastimeyer force-pushed the stream/hls/refactor-m3u8parser branch from adfc61a to 9e9c0a6 Compare February 6, 2023 22:10
@gravyboat
Copy link
Member

@bastimeyer Is there any further testing you want to do or is this ready to go? Code looks good.

@bastimeyer bastimeyer merged commit 04da9f5 into streamlink:master Feb 7, 2023
@bastimeyer bastimeyer deleted the stream/hls/refactor-m3u8parser branch February 7, 2023 08:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants