Skip to content

stream.hls: parse M3U8 from Response obj directly #4552

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

bastimeyer
Copy link
Member

@bastimeyer bastimeyer commented May 25, 2022


Passing the HTTP Response object to the parser and iterating its content prevents having to keep a copy of the entire response content in memory. Support for str is kept for backwards compatibility and because some tests rely on it and would need to get rewritten.

The content is now also always read as UTF-8, as defined by RFC 8216. This was previously guessed by chardet/charset_normalizer if no HTTP response headers were set (see #4329) and it required custom overrides if there were issues while figuring out the unknown encodings. All HLS tests have been using implicit utf-8 encoding the entire time. We could add tests for invalid encodings in the future, but I don't think it's important.


I have a couple more changes planned for the HLSStream+ and M3U8+ implementations. But that's not ready yet and unrelated to this PR. Just want to quickly talk about that.

One of the changes is using typing.Generic + typing.TypeVar for making it easier to subclass HLS streams, the parser and other stuff, without having to suppress invalid type informations or method signatures (see the Twitch plugin for example). Using dataclasses instead of named tuples is also one of the things which will make subclassing/extending HLS logic easier.

Another change is passing the parsed master playlist object to the HLSStreams created by parse_variant_playlist, so additional data can be read by the media playlists and their parsers. It currently only adds the master playlist URL to the {Muxed,}HLSStream for the to_manifest_url method (I also want to change this interface eventually, but that may be a breaking change). I noticed the master playlist issue while rewriting Twitch's low latency stuff two days ago, because there's metadata in the master playlist that should be read by the media playlist. It's also relevant for EXT-X-SESSION-DATA and EXT-X-SESSION-KEY, which are currently not implemented.

@bastimeyer bastimeyer force-pushed the stream/hls/parse-http-responses branch from 27a4344 to 4f4bd5c Compare May 25, 2022 17:21
@bastimeyer bastimeyer changed the title stream.hls: parse HLS from Response obj directly stream.hls: parse M3U8 from Response obj directly May 25, 2022
@gravyboat
Copy link
Member

One of the changes is using typing.Generic + typing.TypeVar for making it easier to subclass HLS streams...

This sounds like a valid change and I don't see any downside to it.

Another change is passing the parsed master playlist object to the HLSStreams created by...

I'm not too concerned with breaking changes. I'm not really sure what other feedback to provide here, what would be the downside of implementing this change?


Is there more you want to do in this PR? If not I'll merge it, the changes look good.

- Accept both `str` and `requests.Response` in `M3U8Parser.parse`
- Always set encoding of variant and media HLS playlists to utf-8
  https://datatracker.ietf.org/doc/html/rfc8216#section-9
- Remove old custom utf-8 overrides
@bastimeyer bastimeyer force-pushed the stream/hls/parse-http-responses branch from 4f4bd5c to 4870866 Compare May 25, 2022 22:16
@bastimeyer
Copy link
Member Author

No, there's nothing to be added here apart from new tests which test invalid encodings, which is not that useful.

requests.Response.iter_lines reference here, btw:
https://github.com/psf/requests/blob/v2.27.1/requests/models.py#L794

Compared to requests.Response.text:
https://github.com/psf/requests/blob/v2.27.1/requests/models.py#L846

@gravyboat gravyboat merged commit d8f0e1e into streamlink:master May 25, 2022
@bastimeyer bastimeyer deleted the stream/hls/parse-http-responses branch May 25, 2022 22:38
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 27, 2022
…#4552)

- Accept both `str` and `requests.Response` in `M3U8Parser.parse`
- Always set encoding of variant and media HLS playlists to utf-8
  https://datatracker.ietf.org/doc/html/rfc8216#section-9
- Remove old custom utf-8 overrides
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants