webbrowser.cdp.client: update UA in headless mode #6114

bastimeyer
Member

@bastimeyer bastimeyer commented Aug 4, 2024

  • Add headless keyword and attribute to CDPClient
  • If in headless mode, read user-agent from CDP, remove headless hint, and then update UA via Network.setUserAgentOverride()
  • Remove --user-agent=... Chromium launch argument if in headless mode as it also overrides the UA platform
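The second step above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `strip_headless_hint` is a hypothetical helper name, and it assumes the headless hint appears as the literal `Headless` prefix in `HeadlessChrome/...`, as headless Chromium reports it.

```python
def strip_headless_hint(user_agent: str) -> str:
    """Return the UA string with the "Headless" marker removed.

    Hypothetical helper for illustration only; the platform part
    of the UA string is left untouched.
    """
    return user_agent.replace("Headless", "")


headless_ua = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) HeadlessChrome/127.0.0.0 Safari/537.36"
)
print(strip_headless_hint(headless_ua))
```

Because only the marker is removed, the rest of the string (including the platform) stays whatever Chromium reported.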

Replaces #6111

See #6113
/ping @Hakkin

No changes to any plugins / CDPClient implementation yet...

Overriding the UA string via Network.setUserAgentOverride() worked in my AWS-WAF test. I don't know how this differs from Emulation.setUserAgentOverride() though. The method from the CDP emulation domain might do more than override network request headers.

I also avoided any other params like userAgentMetadata because I don't think they're necessary: we're only interested in removing the "Headless" part from the UA string, not in changing any platform values. Chromium will therefore keep its defaults.
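For reference, a CDP command that sets only the UA string and omits the optional params could look like the sketch below. `build_ua_override_command` is a hypothetical name used for illustration; it only builds the JSON message and does not send it, and it is not how CDPClient is implemented.

```python
import json


def build_ua_override_command(message_id: int, user_agent: str) -> str:
    """Serialize a Network.setUserAgentOverride command as a CDP JSON message.

    Only the required "userAgent" param is set. Optional params such as
    "platform" or "userAgentMetadata" are deliberately left out.
    """
    return json.dumps({
        "id": message_id,
        "method": "Network.setUserAgentOverride",
        "params": {"userAgent": user_agent},
    })


print(build_ua_override_command(1, "Mozilla/5.0 ... Chrome/127.0.0.0 Safari/537.36"))
```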


So in order to test this in the Twitch plugin, the forced headless=False parameter needs to be removed, and then specific GQL API endpoints need to be checked with the CI token included. As said in #6113, the streaming access tokens seem to work regardless of the (now hidden) is_bad_bot status, while some GQL API endpoints allegedly fail. I haven't tested this yet and will check tomorrow. That's it for me today.

@Hakkin
Contributor

Hakkin commented Aug 5, 2024

Just tested, and this seems to have fixed it: the headless tokens now pass the integrity check, at least on Linux, where they previously failed. I also re-tested Brave, Edge and Opera on Linux, and all of them pass in headless mode as well.

@bastimeyer
Member Author

bastimeyer commented Aug 5, 2024

The CI tokens (including device IDs) generated by the Twitch plugin, regardless of whether headless is true or false, result in a failed integrity check when used in GQL requests. Like I've said, requesting the streaming access token works regardless of whether the CI token is valid or not.

$ curl 'https://gql.twitch.tv/gql' \
        -H 'accept: */*' \
        -H 'accept-language: en-US' \
        -H 'cache-control: no-cache' \
        -H 'client-id: kimne78kx3ncx6brgo4mv6wki5h1ko' \
        -H 'client-integrity: CI-TOKEN-HERE' \
        -H 'client-session-id: SESSION-ID' \
        -H 'client-version: 2b8a9a7a-d732-4442-9fc4-124d9748666d' \
        -H 'content-type: text/plain;charset=UTF-8' \
        -H 'dnt: 1' \
        -H 'origin: https://www.twitch.tv' \
        -H 'pragma: no-cache' \
        -H 'priority: u=1, i' \
        -H 'referer: https://www.twitch.tv/' \
        -H 'sec-ch-ua: "Chromium";v="127", "Not)A;Brand";v="99"' \
        -H 'sec-ch-ua-mobile: ?0' \
        -H 'sec-ch-ua-platform: "Linux"' \
        -H 'sec-fetch-dest: empty' \
        -H 'sec-fetch-mode: cors' \
        -H 'sec-fetch-site: same-site' \
        -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36' \
        -H 'x-device-id: DEVICE-ID' \
        --data-raw '[{"operationName":"BrowsePage_AllDirectories","variables":{"limit":30,"options":{"recommendationsContext":{"platform":"web"},"requestID":"JIRA-VXP-2397","sort":"VIEWER_COUNT","tags":[]},"cursor":"eyJzIjoyNiwiZCI6ZmFsc2UsInQiOnRydWV9"},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"2f67f71ba89f3c0ed26a141ec00da1defecb2303595f5cda4298169549783d9e"}}}]' -s | jq
[
  {
    "errors": [
      {
        "message": "failed integrity check",
        "path": [
          "directoriesWithTags"
        ]
      }
    ],
    "data": {
      "directoriesWithTags": null
    },
    "extensions": {
      "challenge": {
        "type": "integrity"
      },
      "durationMilliseconds": 3,
      "operationName": "BrowsePage_AllDirectories",
      "requestID": "..."
    }
  }
]
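When repeating this check for several tokens, the response can be classified programmatically. The sketch below is only based on the response shape shown above; `failed_integrity` is a hypothetical helper, and other failure shapes may exist that it does not cover.

```python
from typing import Any


def failed_integrity(response: list[dict[str, Any]]) -> bool:
    """Return True if any item of a GQL batch response reports an integrity failure.

    Checks both signals seen in the response above: the
    "failed integrity check" error message and the "integrity" challenge type.
    """
    for item in response:
        challenge = (item.get("extensions") or {}).get("challenge") or {}
        if challenge.get("type") == "integrity":
            return True
        for error in item.get("errors") or []:
            if error.get("message") == "failed integrity check":
                return True
    return False


sample = [{
    "errors": [{"message": "failed integrity check", "path": ["directoriesWithTags"]}],
    "data": {"directoriesWithTags": None},
    "extensions": {"challenge": {"type": "integrity"}},
}]
print(failed_integrity(sample))
```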

So if you say that you were able to get a valid token (headless or not), then show me.

$ streamlink --twitch-force-client-integrity --twitch-purge-client-integrity twitch.tv/CHANNEL
$ jq -r \
  '.["twitch:client-integrity"].value | "-H \"x-device-id: " + .[0] + "\" -H \"client-integrity: " + .[1] + "\""' \
  "${XDG_CACHE_HOME:-~/.cache}/streamlink/plugin-cache.json"

@Hakkin
Contributor

Hakkin commented Aug 5, 2024

Very odd that it fails for you. I just tried it again using your curl command, and my tokens work.
(The overridden plugin just forces headless mode.)

[hakkin@hakkin-pc streamlink]$ streamlink --twitch-force-client-integrity --twitch-purge-client-integrity twitch.tv/CHANNEL
[session][info] Plugin twitch is being overridden by /home/hakkin/.local/share/streamlink/plugins/twitch.py (sha256:c237f9b76b80a9e072c2886f5f539c5fc6a314add1c12aa6016ae2f141d98884)
[cli][info] Found matching plugin twitch for URL twitch.tv/CHANNEL
[plugins.twitch][info] Removing cached client-integrity token...
[plugins.twitch][info] Acquiring new client-integrity token...
[webbrowser.webbrowser][info] Launching web browser: /usr/bin/chromium
error: No playable streams found on this URL: twitch.tv/CHANNEL
[hakkin@hakkin-pc streamlink]$ jq -r   '.["twitch:client-integrity"].value | "-H \"x-device-id: " + .[0] + "\" -H \"client-integrity: " + .[1] + "\""'   ~/.cache/streamlink/plugin-cache.json
-H "x-device-id: viVu5QXUCqUzi50xG2Be4K9HZjLYSLqo" -H "client-integrity: v4.local.ivd8Vq6gg5_Um-43xgGiu9CDCxGYkFttC1l_2MOdezxymX_I_AxHOyiCzZ-plPb6wnDhWaiexHzJ1W2mgFITDTktX1cBzfdJX5blVVt8JS7dgDAwzzw7Sui6bk6qLebS-PBA6rKN4JsIkKHkDqwUFs2DqQbkvmyWbGchTaTjGMiQRJFke3wTzvC_Wv95IYw8K2xn-JRBpNPHRTq9pGHee4Kh2KY0pkfe3rKVjl96BTVMUeYT41F8rwGJ60lxrWrpGPEpvmMLurQkIREgrNdJ7TM49UUygAeNXBelLhF8wgeoJN25cljEgj6tW3Qq0gMwGc6nqubgtNj-8KZ-NhJdnqaLmmovjpbOmb05kfodRgizgU23aqo0ILv2HqMHL7N1esLzf23-hFk1Barn9jT82MoYlTrWJDtLxNpSJf2_4Zd2jP8vWeIEr1rf0oos3PBxpA"
[hakkin@hakkin-pc streamlink]$ curl 'https://gql.twitch.tv/gql' \
        -H 'accept: */*' \
        -H 'accept-language: en-US' \
        -H 'cache-control: no-cache' \
        -H 'client-id: kimne78kx3ncx6brgo4mv6wki5h1ko' \
        -H 'client-version: 2b8a9a7a-d732-4442-9fc4-124d9748666d' \
        -H 'content-type: text/plain;charset=UTF-8' \
        -H 'dnt: 1' \
        -H 'origin: https://www.twitch.tv' \
        -H 'pragma: no-cache' \
        -H 'priority: u=1, i' \
        -H 'referer: https://www.twitch.tv/' \
        -H 'sec-ch-ua: "Chromium";v="127", "Not)A;Brand";v="99"' \
        -H 'sec-ch-ua-mobile: ?0' \
        -H 'sec-ch-ua-platform: "Linux"' \
        -H 'sec-fetch-dest: empty' \
        -H 'sec-fetch-mode: cors' \
        -H 'sec-fetch-site: same-site' \
        -H 'user-agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36' \
        -H "x-device-id: viVu5QXUCqUzi50xG2Be4K9HZjLYSLqo" -H "client-integrity: v4.local.ivd8Vq6gg5_Um-43xgGiu9CDCxGYkFttC1l_2MOdezxymX_I_AxHOyiCzZ-plPb6wnDhWaiexHzJ1W2mgFITDTktX1cBzfdJX5blVVt8JS7dgDAwzzw7Sui6bk6qLebS-PBA6rKN4JsIkKHkDqwUFs2DqQbkvmyWbGchTaTjGMiQRJFke3wTzvC_Wv95IYw8K2xn-JRBpNPHRTq9pGHee4Kh2KY0pkfe3rKVjl96BTVMUeYT41F8rwGJ60lxrWrpGPEpvmMLurQkIREgrNdJ7TM49UUygAeNXBelLhF8wgeoJN25cljEgj6tW3Qq0gMwGc6nqubgtNj-8KZ-NhJdnqaLmmovjpbOmb05kfodRgizgU23aqo0ILv2HqMHL7N1esLzf23-hFk1Barn9jT82MoYlTrWJDtLxNpSJf2_4Zd2jP8vWeIEr1rf0oos3PBxpA" \
        --data-raw '[{"operationName":"BrowsePage_AllDirectories","variables":{"limit":30,"options":{"recommendationsContext":{"platform":"web"},"requestID":"JIRA-VXP-2397","sort":"VIEWER_COUNT","tags":[]},"cursor":"eyJzIjoyNiwiZCI6ZmFsc2UsInQiOnRydWV9"},"extensions":{"persistedQuery":{"version":1,"sha256Hash":"2f67f71ba89f3c0ed26a141ec00da1defecb2303595f5cda4298169549783d9e"}}}]' -s | jq
[
  {
    "data": {
      "directoriesWithTags": {
        "edges": [
          {
            "cursor": "eyJzIjoyNywiZCI6ZmFsc2UsInQiOnRydWV9",
            "trackingID": null,
            "node": {
              "id": "143106037",
              "slug": "ea-sports-fc-24",
              "displayName": "EA Sports FC 24",
              "name": "EA Sports FC 24",
              "avatarURL": "https://static-cdn.jtvnw.net/ttv-boxart/143106037_IGDB-285x380.jpg",
              "viewersCount": 9970,
              "tags": [
                {
                  "id": "22e434b6-ca88-46e8-91ef-c18ee1cb8a67",
                  "isLanguageTag": false,
                  "localizedName": "Simulation",
                  "tagName": "Simulation",
                  "__typename": "Tag"
                },
                {
                  "id": "0d4233af-7ac6-49da-937d-e0f42b7db187",
                  "isLanguageTag": false,
                  "localizedName": "Sports Game",
                  "tagName": "Sports Game",
                  "__typename": "Tag"
                }
              ],
              "originalReleaseDate": "2023-09-29T00:00:00Z",
              "__typename": "Game"
            },
            "__typename": "GameEdge"
          },

...

@Hakkin
Contributor

Hakkin commented Aug 5, 2024

I forgot to mention that I had to edit your jq command a little, because yours was including an extra " in the client-integrity header. I don't know if you were actually using that command to generate the curl headers, but if so, that might be why it's failing for you.
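One way to avoid this kind of quoting mistake is to let a script do the shell quoting. The sketch below assumes the cache layout implied by the jq command in this thread (a `twitch:client-integrity` entry whose `value` holds the device ID first and the CI token second); `curl_header_args` is a hypothetical helper, not part of Streamlink.

```python
import json
import shlex


def curl_header_args(cache: dict) -> str:
    """Build shell-quoted curl -H arguments from the plugin cache data.

    Assumes cache["twitch:client-integrity"]["value"] is a sequence of
    [device_id, ci_token], as implied by the jq command in this thread.
    """
    value = cache["twitch:client-integrity"]["value"]
    device_id, token = value[0], value[1]
    headers = [f"x-device-id: {device_id}", f"client-integrity: {token}"]
    return " ".join(f"-H {shlex.quote(header)}" for header in headers)


# Placeholder data for illustration; the real data would be loaded with
# json.load() from the plugin-cache.json path used in this thread.
example = json.loads('{"twitch:client-integrity": {"value": ["DEVICE-ID", "CI-TOKEN-HERE"]}}')
print(curl_header_args(example))
```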

@bastimeyer
Member Author

I copied it from the file manually... Just posted the command afterwards. Going to fix my comment now... Still, the CI tokens don't work, headless or not.

@Hakkin
Contributor

Hakkin commented Aug 5, 2024

If even non-headless mode is giving you invalid tokens, then something outside of this pull request is failing... If you browse to Twitch in a normal browser window with a fresh Chromium profile, do you get valid tokens?

@bastimeyer
Member Author

> do you get valid tokens

Yes of course...

So I checked again after a few hours, and now I'm getting valid tokens in both modes. Weird...

I think this was caused by me copying the curl command from Chromium's dev tools while I was logged in on Twitch and forgetting to remove the authorization header or the session ID header, so they flagged the session ID or my IP address when I replaced the CI token and device ID headers. Not sure, could be wrong, but that's the best explanation I have.

I'm still not comfortable with forcing headless mode on Twitch though. It's important that the plugin always works, even if Twitch makes CI tokens mandatory again and headless mode gets detected by them (again). I think setting the default value of --webbrowser-headless to True was a mistake. We should change that, add a hint to the webbrowser launch log message that this behavior can always be changed, and then remove the headless=... override from the Twitch plugin. Then users will be made aware that the browser window can be suppressed, nothing will be forced in any plugin, and it will work even if JS code detects headless Chromium. The only annoyance is that users who are unaware and won't read the log will be confused by this.

Either way, this is all unrelated to this PR. I'm going to merge this now, as it's a better implementation than launching Chromium with the --user-agent=... argument.

@bastimeyer bastimeyer merged commit 88f1715 into streamlink:master Aug 5, 2024
23 checks passed
@bastimeyer bastimeyer deleted the webbrowser/cdp/client/update-ua-in-headless branch August 5, 2024 14:53