plugins.youtube: rewrite and remove API calls #3797

Merged
merged 4 commits into from
Jun 18, 2021

Conversation

bastimeyer
Member

Resolves #3795

This removes the private API calls from youtube, as those are currently "a bit" wonky and return 404s more often than not.

As said, I have split up this rewrite into multiple commits for reviewing purposes (and also added a couple more changes since my last push mentioned in #3795), but the commits could also be squashed if you want to.

Changes are documented in each commit message.


Invalid video

$ streamlink -l debug 'https://www.youtube.com/watch?v=foooooooooo'
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=foooooooooo
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][error] Could not get video info: Video nicht verfügbar
error: No playable streams found on this URL: https://www.youtube.com/watch?v=foooooooooo

Scheduled live stream

$ streamlink -l debug 'https://www.youtube.com/channel/UCM39V4aT21lAebPlJToSN2Q/live'
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/channel/UCM39V4aT21lAebPlJToSN2Q/live
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][error] Could not get video info: Diese Live-Veranstaltung beginnt in 3 Tage.
error: No playable streams found on this URL: https://www.youtube.com/channel/UCM39V4aT21lAebPlJToSN2Q/live

Protected

$ streamlink -l debug 'https://www.youtube.com/watch?v=UWcdeWZ_ZuM'
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=UWcdeWZ_ZuM
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: UWcdeWZ_ZuM
[plugins.youtube][debug] This video may be protected.
error: This plugin does not support protected videos, try youtube-dl instead

Adaptive video

$ streamlink -l debug --title '{author} - {title}' 'https://www.youtube.com/watch?v=aqz-KE-bpKQ' best
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=aqz-KE-bpKQ
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: aqz-KE-bpKQ
[plugins.youtube][debug] MuxedStream: v 299 a 258 = 1080p60
[plugins.youtube][debug] MuxedStream: v 308 a 258 = 1440p60
[plugins.youtube][debug] MuxedStream: v 315 a 258 = 2160p60
[plugins.youtube][debug] MuxedStream: v 302 a 258 = 720p60
[plugins.youtube][debug] MuxedStream: v 135 a 258 = 480p
[plugins.youtube][debug] MuxedStream: v 133 a 258 = 240p
[plugins.youtube][debug] MuxedStream: v 160 a 258 = 144p
[cli][info] Available streams: audio_mp4a, audio_opus, 144p (worst), 240p, 360p, 480p, 720p, 720p60, 1080p60, 1440p60, 2160p60 (best)
[cli][info] Opening stream: 2160p60 (muxed-stream)
[cli][info] Starting player: mpv
[cli.output][debug] Opening subprocess: mpv "--force-media-title=Blender - Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film" -

Adaptive video - embedded

$ streamlink -l debug --title '{author} - {title}' 'https://www.youtube.com/embed/aqz-KE-bpKQ' best
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/embed/aqz-KE-bpKQ
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: aqz-KE-bpKQ
[plugins.youtube][debug] MuxedStream: v 299 a 258 = 1080p60
[plugins.youtube][debug] MuxedStream: v 308 a 258 = 1440p60
[plugins.youtube][debug] MuxedStream: v 315 a 258 = 2160p60
[plugins.youtube][debug] MuxedStream: v 302 a 258 = 720p60
[plugins.youtube][debug] MuxedStream: v 135 a 258 = 480p
[plugins.youtube][debug] MuxedStream: v 133 a 258 = 240p
[plugins.youtube][debug] MuxedStream: v 160 a 258 = 144p
[cli][info] Available streams: audio_mp4a, audio_opus, 144p (worst), 240p, 360p, 480p, 720p, 720p60, 1080p60, 1440p60, 2160p60 (best)
[cli][info] Opening stream: 2160p60 (muxed-stream)
[cli][info] Starting player: mpv
[cli.output][debug] Opening subprocess: mpv "--force-media-title=Blender - Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film" -

Adaptive video - short URL

$ streamlink -l debug --title '{author} - {title}' 'https://youtu.be/aqz-KE-bpKQ' best
[cli][info] Found matching plugin youtube for URL https://youtu.be/aqz-KE-bpKQ
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: aqz-KE-bpKQ
[plugins.youtube][debug] MuxedStream: v 299 a 258 = 1080p60
[plugins.youtube][debug] MuxedStream: v 308 a 258 = 1440p60
[plugins.youtube][debug] MuxedStream: v 315 a 258 = 2160p60
[plugins.youtube][debug] MuxedStream: v 302 a 258 = 720p60
[plugins.youtube][debug] MuxedStream: v 135 a 258 = 480p
[plugins.youtube][debug] MuxedStream: v 133 a 258 = 240p
[plugins.youtube][debug] MuxedStream: v 160 a 258 = 144p
[cli][info] Available streams: audio_mp4a, audio_opus, 144p (worst), 240p, 360p, 480p, 720p, 720p60, 1080p60, 1440p60, 2160p60 (best)
[cli][info] Opening stream: 2160p60 (muxed-stream)
[cli][info] Starting player: mpv
[cli.output][debug] Opening subprocess: mpv "--force-media-title=Blender - Big Buck Bunny 60fps 4K - Official Blender Foundation Short Film" -

HLS Live

$ streamlink -l debug --title '{author} - {title}' 'https://www.youtube.com/watch?v=Rf4jJzziJko' best
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/watch?v=Rf4jJzziJko
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: Rf4jJzziJko
[plugins.youtube][debug] This video is live.
[plugins.youtube][debug] This video may be protected.
[utils.l10n][debug] Language code: en_US
[cli][info] Available streams: 144p (worst), 240p, 360p, 480p, 720p, 1080p (best)
[cli][info] Opening stream: 1080p (hls)
[cli][info] Starting player: mpv
[cli.output][debug] Opening subprocess: mpv "--force-media-title=Hospital Records - Drum & Bass Non-Stop Liquid - To Relax/Chill To" -

HLS Live - channel URL

$ streamlink -l debug --title '{author} - {title}' 'https://www.youtube.com/channel/UCw49uOTAJjGUdoAeUcp7tOg/live' best
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/channel/UCw49uOTAJjGUdoAeUcp7tOg/live
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: Rf4jJzziJko
[plugins.youtube][debug] This video is live.
[plugins.youtube][debug] This video may be protected.
[utils.l10n][debug] Language code: en_US
[cli][info] Available streams: 144p (worst), 240p, 360p, 480p, 720p, 1080p (best)
[cli][info] Opening stream: 1080p (hls)
[cli][info] Starting player: mpv
[cli.output][debug] Opening subprocess: mpv "--force-media-title=Hospital Records - Drum & Bass Non-Stop Liquid - To Relax/Chill To" -

HLS Live - embedded

$ streamlink -l debug --title '{author} - {title}' 'https://www.youtube.com/embed/Rf4jJzziJko' best
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/embed/Rf4jJzziJko
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: Rf4jJzziJko
[plugins.youtube][debug] This video is live.
[plugins.youtube][debug] This video may be protected.
[utils.l10n][debug] Language code: en_US
[cli][info] Available streams: 144p (worst), 240p, 360p, 480p, 720p, 1080p (best)
[cli][info] Opening stream: 1080p (hls)
[cli][info] Starting player: mpv
[cli.output][debug] Opening subprocess: mpv "--force-media-title=Hospital Records - Drum & Bass Non-Stop Liquid - To Relax/Chill To" -

HLS Live - embedded channel URL (requires additional request of canonical URL)

$ streamlink -l debug --title '{author} - {title}' 'https://www.youtube.com/embed/live_stream?channel=UCw49uOTAJjGUdoAeUcp7tOg' best
[cli][info] Found matching plugin youtube for URL https://www.youtube.com/embed/live_stream?channel=UCw49uOTAJjGUdoAeUcp7tOg
[plugins.youtube][debug] c_data_keys: gl, m, pc, continue, ca, x, v, t, hl, src, uxe
[plugins.youtube][debug] Using video ID: Rf4jJzziJko
[plugins.youtube][debug] This video is live.
[plugins.youtube][debug] This video may be protected.
[utils.l10n][debug] Language code: en_US
[cli][info] Available streams: 144p (worst), 240p, 360p, 480p, 720p, 1080p (best)
[cli][info] Opening stream: 1080p (hls)
[cli][info] Starting player: mpv
[cli.output][debug] Opening subprocess: mpv "--force-media-title=Hospital Records - Drum & Bass Non-Stop Liquid - To Relax/Chill To" -

@bastimeyer bastimeyer added the "plugin issue" label (A Plugin does not work correctly) Jun 17, 2021
@bastimeyer bastimeyer changed the title from "plugins.youtube: move stuff from global scope" to "plugins.youtube: rewrite and remove API calls" Jun 17, 2021
@bastimeyer
Member Author

bastimeyer commented Jun 17, 2021

@back-to
3d2848a#r52311523

won't work for videos with }; in the title
https://www.youtube.com/watch?v=CqSfYBGfYEc
error: Unable to parse JSON: Unterminated string starting at:

This can't be fixed entirely unless we parse the JS code and get the JSON data from the AST.

I could change the regex and extract the entire JS from the <script> tag and then give the regex a stronger anchor at the end, but that doesn't solve the problem entirely, as titles with };\s*var would also break the regex.


This is the entire script tag content:

<script nonce="...">var ytInitialPlayerResponse = {};var meta = document.createElement('meta'); meta.name = 'referrer'; meta.content = 'origin-when-cross-origin'; document.getElementsByTagName('head')[0].appendChild(meta);</script>
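The failure described above can be reproduced in a few lines. This is a minimal sketch, not the plugin's code: `make_script` is a hypothetical helper that builds a heavily reduced stand-in for the inline script, and only the lazy pattern itself is taken from the PR.

```python
import json
import re

# Lazy pattern as used before the anchor was strengthened:
# the match ends at the first "};" that appears after the opening brace.
RE_LAZY = re.compile(r"""var ytInitialPlayerResponse\s*=\s*({.*?});""", re.DOTALL)


def make_script(title):
    # Hypothetical, reduced stand-in for YouTube's inline script content
    data = json.dumps({"videoDetails": {"title": title}})
    return f"var ytInitialPlayerResponse = {data};var meta = document.createElement('meta');"


def extract_lazy(script):
    # Cut the object literal out of the script and parse it as JSON
    match = RE_LAZY.search(script)
    return json.loads(match.group(1))


# A harmless title parses fine
print(extract_lazy(make_script("Big Buck Bunny"))["videoDetails"]["title"])

# A title containing "};" truncates the match in the middle of a JSON string
try:
    extract_lazy(make_script("weird }; title"))
except json.JSONDecodeError as err:
    print("Unable to parse JSON:", err.msg)
```

The second call fails because the captured group stops inside the title string, so `json.loads` sees an unterminated string.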

@bastimeyer
Member Author

@back-to
Making the regex super strict has the potential to break the plugin as soon as YT adjusts a tiny thing:

diff --git a/src/streamlink/plugins/youtube.py b/src/streamlink/plugins/youtube.py
index 3b5b2531..eaf3f26a 100644
--- a/src/streamlink/plugins/youtube.py
+++ b/src/streamlink/plugins/youtube.py
@@ -38,7 +38,13 @@ class YouTube(Plugin):
         https?://youtu\.be/(?P<video_id_short>[0-9A-z_-]{11})
     """, re.VERBOSE)
 
-    _re_ytInitialPlayerResponse = re.compile(r"""var ytInitialPlayerResponse\s*=\s*({.*?});""", re.DOTALL)
+    _re_ytInitialPlayerResponse = re.compile(r"""
+        var\s+ytInitialPlayerResponse\s*=\s*({.*?});\s*
+        var\s+meta\s*=\s*document\.createElement\('meta'\);\s*
+        meta\.name\s*=\s*'referrer';\s*
+        meta\.content\s*=\s*'origin-when-cross-origin';\s*
+        document.getElementsByTagName\('head'\)\[0]\.appendChild\(meta\);
+    """, re.VERBOSE | re.DOTALL)
     _re_mime_type = re.compile(r"""^(?P<type>\w+)/(?P<container>\w+); codecs="(?P<codecs>.+)"$""")
 
     _url_canonical = "https://www.youtube.com/watch?v={video_id}"

- move stuff from global scope to plugin class
- remove unneeded oembed metadata stuff
By translating URLs directly, this will save at least one
redirected HTTP request later on and thus reduce init time.
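A rough sketch of what translating URLs directly could look like. The canonical URL template is the one visible in the diff above; the two patterns and the `to_canonical` function are hypothetical simplifications for illustration, not the plugin's actual URL matcher.

```python
import re

# Canonical URL template, as shown in the diff context of this PR
URL_CANONICAL = "https://www.youtube.com/watch?v={video_id}"

# Hypothetical, simplified patterns (assumption: 11-character video IDs)
RE_EMBED = re.compile(r"^https?://(?:www\.)?youtube\.com/embed/(?P<video_id>[0-9A-Za-z_-]{11})$")
RE_SHORT = re.compile(r"^https?://youtu\.be/(?P<video_id>[0-9A-Za-z_-]{11})$")


def to_canonical(url):
    # Rewrite embed and short URLs locally instead of following an HTTP redirect
    for pattern in (RE_EMBED, RE_SHORT):
        match = pattern.match(url)
        if match:
            return URL_CANONICAL.format(video_id=match.group("video_id"))
    # Anything else is returned unchanged
    return url


print(to_canonical("https://youtu.be/aqz-KE-bpKQ"))
print(to_canonical("https://www.youtube.com/embed/aqz-KE-bpKQ"))
```

Embedded channel URLs such as `/embed/live_stream?channel=...` carry no video ID, so they are left untouched by this sketch and still need the additional request noted in the test list.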
- Replace `_find_video_id` and `_get_stream_info` with `_get_data`
  Read data from embedded `ytInitialPlayerResponse` JSON
  Redirect to canonical URL if required (eg. embedded live streams)
- Split validation schema into three and optimize:
  1. playabilitystatus
  2. videodetails
  3. streamingdata
- Refactor `_get_streams` and properly use new validation schemas
  Treat all streams without a URL property as protected
- Refactor and reformat `_create_adaptive_streams`
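The three-way split and the "no URL means protected" rule from this commit message can be illustrated with plain dictionaries. This is a sketch under assumptions: the key names (`playabilityStatus`, `videoDetails`, `streamingData`, `formats`, `adaptiveFormats`, `url`) are assumed from the shape of the embedded player response, and both functions are hypothetical; the plugin itself uses validation schemas instead.

```python
def split_player_response(data):
    # Handle the three sections independently, so that one missing section
    # does not invalidate the other two
    return (
        data.get("playabilityStatus", {}),
        data.get("videoDetails", {}),
        data.get("streamingData", {}),
    )


def partition_formats(streaming_data):
    # Streams without a "url" property are treated as protected
    formats = streaming_data.get("formats", []) + streaming_data.get("adaptiveFormats", [])
    playable = [fmt for fmt in formats if "url" in fmt]
    protected = [fmt for fmt in formats if "url" not in fmt]
    return playable, protected


# Made-up sample data for illustration only
sample = {
    "playabilityStatus": {"status": "OK"},
    "videoDetails": {"videoId": "aqz-KE-bpKQ"},
    "streamingData": {
        "formats": [{"itag": 18, "url": "https://example.invalid/stream"}],
        "adaptiveFormats": [{"itag": 299, "signatureCipher": "..."}],
    },
}

status, details, streaming = split_player_response(sample)
playable, protected = partition_formats(streaming)
print(len(playable), "playable,", len(protected), "protected")
```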
This fixes parameters in the consent-submit-redirection when a URL with
an ampersand was set.
@bastimeyer bastimeyer force-pushed the plugins/youtube/rewrite branch from cfd7329 to eabf228 Compare June 17, 2021 22:08
@bastimeyer
Member Author

I've added \s*var\s+meta\s*= to the regex, as a compromise, since it's way more unlikely that someone has set a title that matches the regex's anchor };\s*var\s+meta\s*= than just }; and there's more room for YT to make changes without breaking the plugin.
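A sketch of the compromise, assuming the pattern has the shape implied by this comment (the exact merged regex is in the linked diff). `make_script` is a hypothetical helper that builds a reduced stand-in for the inline script.

```python
import json
import re

# Assumed shape: the lazy match must now be followed by "var meta ="
# instead of ending at the first "};"
RE_ANCHORED = re.compile(
    r"""var\s+ytInitialPlayerResponse\s*=\s*({.*?});\s*var\s+meta\s*=""",
    re.DOTALL,
)


def make_script(title):
    # Hypothetical, reduced stand-in for YouTube's inline script content
    data = json.dumps({"videoDetails": {"title": title}})
    return f"var ytInitialPlayerResponse = {data};var meta = document.createElement('meta');"


def extract_anchored(script):
    match = RE_ANCHORED.search(script)
    return json.loads(match.group(1))


# A title with "};" no longer cuts the match short
print(extract_anchored(make_script("weird }; title"))["videoDetails"]["title"])

# A title that reproduces the whole anchor still breaks the extraction
try:
    extract_anchored(make_script("evil };var meta = x"))
except json.JSONDecodeError as err:
    print("Unable to parse JSON:", err.msg)
```

The remaining failure case requires a title that contains the full anchor, which is far less likely than a bare `};`.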

Diff:
https://github.com/streamlink/streamlink/compare/cfd7329455781f92e8b45cc8e81ad810072905b4..eabf22875d72b223ccd47c08ede25c14a9a13144

I've also squashed the first two commits into one, which is better if you want to keep the individual commits when merging.

@tfdahlin

@back-to
3d2848a#r52311523

won't work for videos with }; in the title
https://www.youtube.com/watch?v=CqSfYBGfYEc
error: Unable to parse JSON: Unterminated string starting at:

This can't be fixed entirely unless we parse the JS code and get the JSON data from the AST.

I could change the regex and extract the entire JS from the <script> tag and then give the regex a stronger anchor at the end, but that doesn't solve the problem entirely, as titles with };\s*var would also break the regex.

This is the entire script tag content:

<script nonce="...">var ytInitialPlayerResponse = {};var meta = document.createElement('meta'); meta.name = 'referrer'; meta.content = 'origin-when-cross-origin'; document.getElementsByTagName('head')[0].appendChild(meta);</script>

I'm here because somebody linked to this in a pytube issue, wanted to look at your solution to the get_video_info problem. If you're interested in doing parsing like this, I actually wrote a parser that I haven't had problems with for doing exactly this over here.

Out of curiosity, did you test your changes against age-restricted videos? That's the stumbling block that I'm running into right now using the innertube API endpoints, and I'm wondering if you happen to know whether the stream data for age-restricted videos is available on the page or not.

@bastimeyer
Member Author

Custom parsing solutions are always bad, same with cheap regexes like this, and the only true solution when trying to extract data from JS is using a parser which supports the latest ECMA versions and their syntaxes and then reading and traversing the generated AST. We've had these discussions several times now (eg. #2534), but the proposed python dependencies were either outdated or unmaintained and thus not ideal. I hope we can spark up a discussion again now that the youtube plugin is affected by this problem. But let's not discuss this here.

No, I have not checked all types of videos with these changes. My OP lists the types I've checked. YT authentication was not supported before, so testing age-restricted videos doesn't make much sense.

@tfdahlin

I wasn't sure whether your OP was an exhaustive list of what you had tested. Pytube doesn't support YT auth either, but by passing certain parameters to the get_video_info endpoint, we were able to extract stream data for age-restricted videos without auth, which is why I was asking.

@back-to back-to merged commit a054715 into streamlink:master Jun 18, 2021
@back-to
Collaborator

back-to commented Jun 18, 2021

It should work with most of the videos, other small changes can be done later on.

@bastimeyer bastimeyer deleted the plugins/youtube/rewrite branch June 18, 2021 12:09
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request Jun 19, 2021
…#3797)

By translating URLs directly, this will save at least one
redirected HTTP request later on and thus reduce init time.
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request Jun 19, 2021
- Replace `_find_video_id` and `_get_stream_info` with `_get_data`
  Read data from embedded `ytInitialPlayerResponse` JSON
  Redirect to canonical URL if required (eg. embedded live streams)
- Split validation schema into three and optimize:
  1. playabilitystatus
  2. videodetails
  3. streamingdata
- Refactor `_get_streams` and properly use new validation schemas
  Treat all streams without a URL property as protected
- Refactor and reformat `_create_adaptive_streams`
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request Jun 19, 2021
This fixes parameters in the consent-submit-redirection when a URL with
an ampersand was set.
@vstavrinov
Contributor

You made the "live" suffix mandatory for "/c/" and "/user/" URLs. Is this a bug or a feature?

This works:

mpv http://youtube.com/c/SkyNews

But this doesn't work:

streamlink http://youtube.com/c/SkyNews

@bastimeyer
Member Author

You made the "live" suffix mandatory for "/c/" and "/user/" URLs. Is this a bug or a feature?

Intentional change, as there's no data embedded for finding the live streams.

Channel URLs without /live suffix could be translated and /live appended in this case and Streamlink could then fail regularly if there's no live stream available, but I didn't think about that.
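For illustration only, the translation mentioned here could look roughly like this. It was not part of this PR; the pattern and the `with_live_suffix` function are hypothetical.

```python
import re

# Hypothetical pattern: a channel URL with nothing after the channel name or ID
RE_CHANNEL = re.compile(
    r"^(?P<base>https?://(?:www\.)?youtube\.com/(?:c|user|channel)/[^/?#]+)/?$"
)


def with_live_suffix(url):
    # Append "/live" to bare channel URLs and leave every other URL unchanged
    match = RE_CHANNEL.match(url)
    if match:
        return match.group("base") + "/live"
    return url


print(with_live_suffix("http://youtube.com/c/SkyNews"))
print(with_live_suffix("https://www.youtube.com/channel/UCw49uOTAJjGUdoAeUcp7tOg/live"))
```

Whether such a rewrite is desirable is a separate question, since the URL with and without the suffix may not lead to the same stream.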

@vstavrinov
Contributor

You shouldn't append this suffix, because there are cases where different live streams are available with and without the suffix.

@bastimeyer
Member Author

This works:
mpv http://youtube.com/c/SkyNews
cases exist where different live streams are available with and without the suffix

When a channel has multiple live streams (eg. https://youtube.com/hospitalrecords - perfect to test, as there are always two live streams online), then Youtube will redirect to the first one when accessing the channel URL with the /live suffix.

Implementing a URL translation in Streamlink when the /live suffix is missing is a tradeoff, because providing a channel URL without /live suffix is an invalid input URL, as it's not a canonical live stream URL.

Your mpv/yt-dl example doesn't make sense, as it'll access all channel uploads and will return a playlist of multiple videos. This logic does not apply to Streamlink.

@vstavrinov
Contributor

When a channel has multiple live streams (eg. https://youtube.com/hospitalrecords - perfect to test, as there are always two live streams online), then Youtube will redirect to the first one when accessing the channel URL with the /live suffix.

Not always. In the case of SkyNews it is the last one, not the first.

Implementing a URL translation in Streamlink when the /live suffix is missing is a tradeoff, because providing a channel URL without /live suffix is an invalid input URL, as it's not a canonical live stream URL.

The fact is that before the last update, Streamlink could play different streams with and without the suffix, so you could choose what you needed. Yes, we can use the video ID for an exact choice, but it is not permanent and sometimes requires a playlist update.

Your mpv/yt-dl example doesn't make sense, as it'll access all channel uploads and will return a playlist of multiple videos. This logic does not apply to Streamlink.

That is better than nothing. And this means that they can find the streams in some way. And in most cases, the first of them is a live stream.

@bastimeyer
Member Author

Streamlink can't use the old Youtube API anymore (the whole reason for this change), so what you're expecting / asking for is not possible anymore. If you don't provide a canonical livestream/video URL, and a proper URL translation is not possible like in the case of channel URLs, then Streamlink can't do anything about it. Streamlink can't output playlists or multiple videos, and whatever other tools will or can do is irrelevant.

@vstavrinov
Contributor

The URL without the suffix is the main URL for TV channels, where their main live stream is present. In the past, I have encountered cases where the URL with the suffix leads to something different. Thus the URL without the suffix is the main one, while the URL with the suffix is auxiliary. So if Streamlink can no longer play the URL without the suffix, it has lost an essential feature for some ridiculous ideological reasons. Along with this feature, you may lose a certain number of users, who may prefer youtube-dl, at least in such specific cases. That is because the final decision is based on which URL (i.e. TV channel) you can play, not on ideological preferences.

back-to added a commit to back-to/streamlink that referenced this pull request Jul 17, 2021
- Revert `isLiveContent` from streamlink#3797 use `isLive`
- Added more log details for `videoDetails`

basically same as streamlink#3026

closes streamlink#3872
back-to added a commit that referenced this pull request Jul 20, 2021
- Revert `isLiveContent` from #3797 use `isLive`
- Added more log details for `videoDetails`

basically same as #3026

closes #3872
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request Jul 22, 2021
…#3872)

- Revert `isLiveContent` from streamlink#3797 use `isLive`
- Added more log details for `videoDetails`

basically same as streamlink#3026

closes streamlink#3872
Development

Successfully merging this pull request may close these issues.

plugins.youtube.py: 404 Intermittent and Repeatable (and related, but not duplicate of 3724 of other 404 issues)
4 participants