plugins.youtube: rewrite and remove API calls #3797
This can't be fixed entirely unless we parse the JS code and get the JSON data from the AST. I could change the regex and extract the entire JS from the This is the entire script tag content:
<script nonce="...">var ytInitialPlayerResponse = {};var meta = document.createElement('meta'); meta.name = 'referrer'; meta.content = 'origin-when-cross-origin'; document.getElementsByTagName('head')[0].appendChild(meta);</script>
@back-to
diff --git a/src/streamlink/plugins/youtube.py b/src/streamlink/plugins/youtube.py
index 3b5b2531..eaf3f26a 100644
--- a/src/streamlink/plugins/youtube.py
+++ b/src/streamlink/plugins/youtube.py
@@ -38,7 +38,13 @@ class YouTube(Plugin):
https?://youtu\.be/(?P<video_id_short>[0-9A-z_-]{11})
""", re.VERBOSE)
- _re_ytInitialPlayerResponse = re.compile(r"""var ytInitialPlayerResponse\s*=\s*({.*?});""", re.DOTALL)
+ _re_ytInitialPlayerResponse = re.compile(r"""
+ var\s+ytInitialPlayerResponse\s*=\s*({.*?});\s*
+ var\s+meta\s*=\s*document\.createElement\('meta'\);\s*
+ meta\.name\s*=\s*'referrer';\s*
+ meta\.content\s*=\s*'origin-when-cross-origin';\s*
+ document\.getElementsByTagName\('head'\)\[0]\.appendChild\(meta\);
+ """, re.VERBOSE | re.DOTALL)
_re_mime_type = re.compile(r"""^(?P<type>\w+)/(?P<container>\w+); codecs="(?P<codecs>.+)"$""")
_url_canonical = "https://www.youtube.com/watch?v={video_id}"
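To illustrate what the anchored pattern in the diff above buys, here is a minimal sketch that runs both regexes against a synthetic script body. The sample string, the `extract_player_response` helper and the constant names are made up for this illustration; the assumption is that the lazy pattern fails because it stops at the first `};`, which can also occur inside a JSON string value.

```python
import json
import re

# Lazy pattern from the removed line of the diff.
RE_LAZY = re.compile(r"""var ytInitialPlayerResponse\s*=\s*({.*?});""", re.DOTALL)

# Anchored pattern from the added lines of the diff: the JSON object must be
# followed by the known "meta referrer" statements of the same script tag.
RE_ANCHORED = re.compile(r"""
    var\s+ytInitialPlayerResponse\s*=\s*({.*?});\s*
    var\s+meta\s*=\s*document\.createElement\('meta'\);\s*
    meta\.name\s*=\s*'referrer';\s*
    meta\.content\s*=\s*'origin-when-cross-origin';\s*
    document\.getElementsByTagName\('head'\)\[0]\.appendChild\(meta\);
""", re.VERBOSE | re.DOTALL)

# Synthetic script content (not real YouTube data) with "};" inside a string.
SAMPLE = (
    'var ytInitialPlayerResponse = {"videoDetails": {"title": "a}; b"}};'
    "var meta = document.createElement('meta'); meta.name = 'referrer'; "
    "meta.content = 'origin-when-cross-origin'; "
    "document.getElementsByTagName('head')[0].appendChild(meta);"
)


def extract_player_response(pattern, html):
    """Hypothetical helper: return the parsed JSON object, or None on failure."""
    match = pattern.search(html)
    if match is None:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        # The regex captured a truncated or otherwise invalid JSON fragment.
        return None


lazy_result = extract_player_response(RE_LAZY, SAMPLE)
anchored_result = extract_player_response(RE_ANCHORED, SAMPLE)
```

The anchored pattern still depends on the exact statements that follow the JSON object, so it would break as soon as that part of the page changes; it is a workaround, not a replacement for a real JS parser.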
- move stuff from global stuff to plugin class
- remove unneeded oembed metadata stuff
By translating URLs directly, this will save at least one redirected HTTP request later on and thus reduce init time.
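As a rough sketch of the URL translation this commit message describes, short and embedded video URLs could be rewritten to the canonical form before any request is made. The two patterns and the `translate_url` helper below are simplified, hypothetical stand-ins and not the plugin's actual code; only the canonical URL template is taken from the diff context.

```python
import re

# Canonical URL template, as shown in the diff context (_url_canonical).
URL_CANONICAL = "https://www.youtube.com/watch?v={video_id}"

# Hypothetical, simplified patterns; the plugin's real URL regex covers more cases.
RE_SHORT = re.compile(r"https?://youtu\.be/(?P<video_id>[\w-]{11})")
RE_EMBED = re.compile(r"https?://(?:www\.)?youtube\.com/embed/(?P<video_id>[\w-]{11})")


def translate_url(url):
    """Return the canonical watch URL for known URL shapes, else the input URL."""
    for pattern in (RE_SHORT, RE_EMBED):
        match = pattern.match(url)
        if match:
            return URL_CANONICAL.format(video_id=match.group("video_id"))
    # Unknown shapes (eg. channel URLs) are left untouched in this sketch.
    return url


short_result = translate_url("https://youtu.be/aqz-KE-bpKQ")
embed_result = translate_url("https://www.youtube.com/embed/aqz-KE-bpKQ")
```

Because the video ID is already part of these URLs, the rewrite needs no network access, which is where the saved redirect comes from.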
- Replace `_find_video_id` and `_get_stream_info` with `_get_data`
  Read data from embedded `ytInitialPlayerResponse` JSON
  Redirect to canonical URL if required (eg. embedded live streams)
- Split validation schema into three and optimize:
  1. playabilitystatus
  2. videodetails
  3. streamingdata
- Refactor `_get_streams` and properly use new validation schemas
  Treat all streams without a URL property as protected
- Refactor and reformat `_create_adaptive_streams`
This fixes parameters in the consent-submit-redirection when a URL with an ampersand was set.
cfd7329 to eabf228
I've added I've also squashed the first two commits into one, which is better if you want to keep the individual commits when merging.
I'm here because somebody linked to this in a pytube issue, and I wanted to look at your solution to the get_video_info problem. If you're interested in doing parsing like this, I actually wrote a parser for exactly this that I haven't had problems with, over here. Out of curiosity, did you test your changes against age-restricted videos? That's the stumbling block I'm running into right now using the innertube API endpoints, and I'm wondering if you happen to know whether the stream data for age-restricted videos is available on the page or not.
Custom parsing solutions are always bad, same with cheap regexes like this. The only true solution when trying to extract data from JS is using a parser which supports the latest ECMA versions and their syntaxes, and then reading and traversing the generated AST. We've had these discussions several times now (eg. #2534), but the proposed python dependencies were either outdated or unmaintained and thus not ideal. I hope we can spark up a discussion again now that the youtube plugin is affected by this problem. But let's not discuss this here.
No, I have not checked all types of videos with these changes. My OP has a list of types I've checked. YT-Authentication was not supported before, and testing age-restricted videos therefore doesn't make much sense.
I wasn't sure if your OP was extensive or not for what you had tested. Pytube doesn't support yt auth either, but by passing certain parameters to the
It should work with most of the videos; other small changes can be done later on.
You make the "live" suffix mandatory for "/c/" and "/user/" URLs. Is it a bug or a feature? This works:
But this doesn't work:
Intentional change, as there's no data embedded for finding the live streams.
Channel URLs without
You shouldn't append this suffix, because there are cases where different live streams are available with and without the suffix.
When a channel has multiple live streams (eg. https://youtube.com/hospitalrecords - perfect to test, as there are always two live streams online), then Youtube will redirect to the first one when accessing the channel URL with the Implementing a URL translation in Streamlink when the Your mpv/yt-dl example doesn't make sense, as it'll access all channel uploads and will return a playlist of multiple videos. This logic does not apply to Streamlink.
Not always. In the case of SkyNews it is last, not first.
The fact is that before the last update, streamlink could play different streams with and without the suffix, so you could choose what you needed. Yes, we can use the video ID for the exact choice, but it is not permanent and sometimes requires a playlist update.
That is better than nothing. And this means that they can find the streams in some way. And in most cases, the first of them is a live stream.
Streamlink can't use the old Youtube API anymore (the whole reason for this change), so what you're expecting / asking for is not possible anymore. If you don't provide a canonical livestream/video URL, and a proper URL translation is not possible like in the case of channel URLs, then Streamlink can't do anything about it. Streamlink can't output playlists or multiple videos, and whatever other tools will or can do is irrelevant.
The URL without the suffix is the main URL for TV channels, where their main live stream is present. In the past, I have encountered cases where the URL with the suffix leads to something different. Thus the URL without the suffix is the main one, while the URL with the suffix is auxiliary. So if streamlink now can't play the URL without the suffix, that means it has lost an essential feature for some ridiculous ideological reasons. Along with this feature, you can lose a certain number of users this way, who may prefer youtube-dl in such specific cases at least. That is because the final decision is made based on what URL (i.e. TV channel) you can play and not on ideological preferences.
- Revert `isLiveContent` from streamlink#3797
  use `isLive`
- Added more log details for `videoDetails`
  basically same as streamlink#3026
closes streamlink#3872
Resolves #3795
This removes the private API calls from youtube, as those are currently "a bit" wonky and return 404s more often than not.
As said, I have split up this rewrite into multiple commits for reviewing purposes (and also added a couple more changes since my last push mentioned in #3795), but the commits could also be squashed if you want to.
Changes are documented in each commit message.
Invalid video
Scheduled live stream
Protected
Adaptive video
Adaptive video - embedded
Adaptive video - short URL
HLS Live
HLS Live - channel URL
HLS Live - embedded
HLS Live - embedded channel URL (requires additional request of canonical URL)