
plugins.twitch: add --twitch-force-client-integrity, remove CI token decoding+parsing logic #6113


Merged (2 commits), Aug 4, 2024


@bastimeyer (Member) commented Aug 4, 2024

Resolves #6109

  1. Adds the `--twitch-force-client-integrity` plugin argument (I didn't want to add another plugin arg when the CI token logic was implemented, but since people are actively patching the plugin, let's just add the arg to make their lives a bit easier)
  2. Removes the CI token decoding+parsing, which is now broken due to the new token format with encrypted data
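
A minimal sketch of what the new flag toggles. It assumes (this thread does not confirm it) that the plugin normally requests the access token without a CI token first and only acquires one after an integrity failure, so `--twitch-force-client-integrity` would skip that first attempt. The bare `argparse` parser, `get_access_token`, `request`, `acquire_ci_token`, and the `"integrity"` error marker are all hypothetical stand-ins, not Streamlink's plugin API.

```python
import argparse
from typing import Callable, Dict, Optional

# Hypothetical: a bare argparse parser standing in for Streamlink's plugin-argument handling
parser = argparse.ArgumentParser()
parser.add_argument(
    "--twitch-force-client-integrity",
    action="store_true",
    help="Acquire a client-integrity token before the first access token request",
)


def get_access_token(
    force_client_integrity: bool,
    request: Callable[[Optional[str]], Dict[str, str]],
    acquire_ci_token: Callable[[], str],
) -> Dict[str, str]:
    """Hypothetical flow: request(None) means "no client-integrity token"."""
    if force_client_integrity:
        # Forced: pay the web browser launch cost up front
        return request(acquire_ci_token())
    response = request(None)
    if response.get("error") == "integrity":  # hypothetical failure marker
        # Fallback: only launch the web browser once the plain request was rejected
        response = request(acquire_ci_token())
    return response
```

Forcing trades an always-on browser launch for one fewer round trip, which is mostly useful for people who already know the plain request will be rejected.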

I also checked removing the `headless=False` override, and requesting access tokens worked just fine. This however might be because they simply don't care about the is_bad_bot flag right now. So what should we do now? Remove the value override and let the user decide again via `--webbrowser-headless` (defaults to `True`)? If Twitch makes any changes, though, people will have to explicitly set `--webbrowser-headless=false`, which is exactly what the override was meant to avoid.

@Hakkin, any opinions on that?

A different client-integrity token format is returned now
which includes encrypted data, so we can't decode and parse the data
anymore in order to check whether `is_bad_bot is False`.

If `is_bad_bot is True`, then requesting the streaming access token
might fail, which is effectively the same result as the removed check
rejecting the token.
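
For context, the removed check worked roughly like the sketch below. It assumes the old token was a PASETO-style `v4.public.` string, i.e. base64url-encoded plaintext JSON followed by a 64-byte Ed25519 signature. `decode_old_ci_token` and `is_not_bad_bot` are hypothetical names, not the plugin's removed code. With the new format the payload is encrypted, so the `json.loads` step has nothing readable to parse.

```python
import base64
import json

# Assumptions about the *old* token format, not taken from the Streamlink source
TOKEN_PREFIX = "v4.public."  # PASETO v4 public (signed, unencrypted) token header
SIGNATURE_LEN = 64           # Ed25519 signature appended after the JSON payload


def decode_old_ci_token(token: str) -> dict:
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("unexpected client-integrity token format")
    # Ignore an optional trailing ".footer" segment
    body = token[len(TOKEN_PREFIX):].split(".")[0]
    body += "=" * (-len(body) % 4)  # PASETO strips base64 padding; restore it
    raw = base64.urlsafe_b64decode(body)
    # Drop the trailing signature and parse the plaintext JSON claims
    return json.loads(raw[:-SIGNATURE_LEN])


def is_not_bad_bot(token: str) -> bool:
    value = decode_old_ci_token(token).get("is_bad_bot")
    # Accept a JSON boolean or a "false" string, since the field's exact type is assumed
    return str(value).lower() == "false"
```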
@bastimeyer added the "plugin issue" label (A Plugin does not work correctly) Aug 4, 2024
@Hakkin (Contributor) commented Aug 4, 2024

> This however might be because they simply don't care about the is_bad_bot flag right now.

I don't think this is the case. There are still some GQL requests that fail an integrity check in default headless mode (with the `HeadlessChrome` user agent), for example requesting anything past page 1 of a stream directory listing. In my testing, these requests pass the integrity check if the `HeadlessChrome` user agent is overridden with the one from the base browser.

That being said, I've just tested one of these endpoints with a token generated by headless Chromium using this patch, and it fails the integrity test.
From my testing, I think it's caused by the user agent Streamlink is sending. By default it uses the built-in Chrome user agent, which is
`Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36`,
but the actual user agent of my Chrome version is
`User-Agent=Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36`

Notice `Windows NT 10.0; Win64; x64` vs `X11; Linux x86_64`. With the user agent forced to the Linux version, the token passed the integrity test, so this seems to be the only discrepancy they're detecting for now. It seems the user agent logic needs to be a little more nuanced to work correctly.
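
To make the discrepancy concrete, the sketch below extracts the first parenthesized platform segment from both user agents quoted above and compares them. The thread doesn't say how Twitch actually detects the mismatch; `ua_platform` and `platforms_match` are hypothetical helpers that only illustrate which part of the string differs.

```python
import re

STREAMLINK_UA = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"


def ua_platform(user_agent: str) -> str:
    """Return the first parenthesized (platform) segment of a Chrome-style UA string."""
    match = re.search(r"\(([^)]*)\)", user_agent)
    return match.group(1) if match else ""


def platforms_match(sent_ua: str, browser_ua: str) -> bool:
    # Everything else (engine, Chrome version) is identical in the two UAs above
    return ua_platform(sent_ua) == ua_platform(browser_ua)
```

Here `ua_platform(STREAMLINK_UA)` yields `"Windows NT 10.0; Win64; x64"` while the browser reports `"X11; Linux x86_64"`, so `platforms_match` is `False`.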

If you look at the logic in Selenium:
https://github.com/seleniumbase/SeleniumBase/blob/feca61a3c283858c185596ef3d0941e519c2be31/seleniumbase/core/browser_launcher.py#L4183-L4203
They actually start a browser session, read the user agent, and then base the new overridden user agent on it, so the override always matches the base user agent instead of being a hard-coded one. I'm not sure if there's a more elegant way to go about it than that.
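
The approach described above boils down to deriving the override from the UA the launched browser reports for itself. A minimal sketch, assuming the base UA has already been read (for example from the `userAgent` field of the CDP `Browser.getVersion` result) and that headless mode is marked only by the `HeadlessChrome/` product token; `derive_override_ua` is a hypothetical name.

```python
HEADLESS_TOKEN = "HeadlessChrome/"
REGULAR_TOKEN = "Chrome/"


def derive_override_ua(browser_ua: str) -> str:
    """Keep the real platform and version; drop only the headless marker."""
    # Replace just the first occurrence, i.e. the product token itself
    return browser_ua.replace(HEADLESS_TOKEN, REGULAR_TOKEN, 1)
```

Because the platform segment comes from the running browser, a Linux headless Chromium yields an `X11; Linux x86_64` override instead of the hard-coded Windows one.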

@Hakkin (Contributor) commented Aug 4, 2024

Sorry, to be more clear: this patch works correctly and generates good tokens when used in non-headless mode. Forcing headless is what causes the above issue, so it was in response to your question about that.

@bastimeyer (Member, Author) commented

> it was in response to your question

Yes, all good, I'm fully aware...

> It seems the user agent logic needs to be a little more nuanced to work correctly.
> If you look at the logic in Selenium

Reading the UA and then modifying it requires a secondary web browser launch and CDP initialization, which I tried to avoid by re-using the UA string from Streamlink's useragents module. This is obviously limited to a single platform, Windows desktop (it was chosen because it had, and still has, the highest market share, so Streamlink's HTTP requests don't stand out in server logs). The UA read from a secondary launch could of course be cached, but that would still require more sophisticated initialization logic and a cache expiration, so not much is gained from this solution.
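
The caching alternative would look roughly like this hypothetical sketch: the expensive secondary launch is wrapped in `read_ua`, and its result is reused until a TTL expires. The class name, the TTL, and the injectable clock are illustrative assumptions. Note that an in-memory cache like this only helps within one process; reusing it across separate Streamlink invocations would need persistent storage on top, which is part of the extra complexity described above.

```python
import time
from typing import Callable, Optional


class CachedUserAgent:
    """Hypothetical TTL cache around an expensive 'launch browser and read its UA' call."""

    def __init__(
        self,
        read_ua: Callable[[], str],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._read_ua = read_ua
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._value is None or now >= self._expires_at:
            # Cache miss or expired entry: pay for the secondary browser launch again
            self._value = self._read_ua()
            self._expires_at = now + self._ttl
        return self._value
```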

There are two other solutions for overriding the UA header in the CDP session, though:

  1. Intercepting and overriding each request in the client logic manually. That would require making the UA override from the launch argument optional.
  2. Not setting the `--user-agent` launch argument, and instead calling the `Emulation.setUserAgentOverride` CDP method after reading and modifying the UA string via CDP in the `CDPClientSession` initialization, before running any client logic.
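
Both CDP-based options can be sketched with raw CDP JSON messages. This is not Streamlink's `CDPClientSession` API: the helper names are hypothetical, and the transport (sending over the DevTools WebSocket, reading the base UA from `Browser.getVersion`) is left out. Option 1 additionally assumes `Fetch.enable` was already sent for the session so that `Fetch.requestPaused` events arrive.

```python
import itertools
import json
from typing import Any, Dict, Optional

_message_ids = itertools.count(1)


def cdp_command(method: str, params: Optional[Dict[str, Any]] = None, session_id: Optional[str] = None) -> str:
    """Serialize one CDP command in the raw JSON wire format."""
    message: Dict[str, Any] = {"id": next(_message_ids), "method": method}
    if params is not None:
        message["params"] = params
    if session_id is not None:
        message["sessionId"] = session_id  # target session when using flattened sessions
    return json.dumps(message)


# Option 2: one override per session, sent before any client logic runs
def set_user_agent_override(user_agent: str, session_id: str) -> str:
    return cdp_command("Emulation.setUserAgentOverride", {"userAgent": user_agent}, session_id)


# Option 1: rewrite the header on every request paused by the Fetch domain
def continue_with_user_agent(paused: Dict[str, Any], user_agent: str, session_id: str) -> str:
    """`paused` is the params object of a Fetch.requestPaused event."""
    headers = {
        name: value
        for name, value in paused["request"]["headers"].items()
        if name.lower() != "user-agent"
    }
    headers["User-Agent"] = user_agent
    return cdp_command(
        "Fetch.continueRequest",
        {
            "requestId": paused["requestId"],
            "headers": [{"name": name, "value": value} for name, value in headers.items()],
        },
        session_id,
    )
```

Option 2 is a single command per session, while option 1 adds a round trip for every intercepted request, which is why it would only make sense if the launch-argument override becomes optional.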

But let's not discuss this here; it's off topic. This PR is about the added plugin argument and the CI token decoder removal, which are working fine. Please open another issue for potential headless changes to the Twitch plugin. Thanks.

@bastimeyer merged commit 6114f8d into streamlink:master Aug 4, 2024
23 checks passed
@bastimeyer deleted the plugins/twitch/6109 branch August 4, 2024 15:56
Development: successfully merging this pull request may close plugins.twitch: Client integrity token acquisition broken
2 participants