
session: implement lazy plugins loading #5822

Merged
bastimeyer merged 1 commit into streamlink:master from the session/plugins/lazy branch on Feb 14, 2024

Conversation

bastimeyer
Member

@bastimeyer bastimeyer commented Feb 11, 2024

  • Add plugins_lazy keyword to Streamlink session class
  • Load pre-built plugins JSON data including matchers and arguments data
  • Fall back to loading all built-in plugins if loading JSON data fails
  • Iterate matcher/argument data of loaded and still unloaded plugins
  • Load plugin modules of unloaded plugins when matching URLs are found
  • Refactor utils.modules and add get_finder() function
  • Change --plugins output from "Loaded plugins" to "Available plugins"
  • Add tests
  • Move matcher priority tests from test_session to test_plugins

Resolves #4741

This currently includes the commits of #5793, as explained there...


Loading plugins lazily saves about 3 MiB of peak memory and makes resolving plugins slightly faster (see the measurements below). It's not as much faster as I was hoping for, but it's an improvement nonetheless.

Instead of loading all module files of Streamlink's built-in plugins into memory at once, the pre-built plugins JSON data (#5793) is loaded and Matchers/Arguments objects are created from it. When resolving URLs, these objects are used in place of the actual objects from the plugin modules.
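
As a rough illustration of the idea (a minimal sketch, not Streamlink's actual implementation): the JSON layout, the `LazyPlugins` class and the `fake_loader` callback below are all hypothetical names chosen for this example. Matchers are built from JSON data up front, and a plugin module is only "imported" once one of its matchers matches the input URL.

```python
import json
import re

# Hypothetical, simplified stand-in for the pre-built plugins JSON data.
# The real _plugins.json layout may differ.
PLUGINS_JSON = r"""
{
  "twitch": {"matchers": [{"pattern": "https?://(?:www\\.)?twitch\\.tv/\\w+"}]},
  "vimeo": {"matchers": [{"pattern": "https?://(?:www\\.)?vimeo\\.com/\\d+"}]}
}
"""


class LazyPlugins:
    def __init__(self, data, loader):
        # Only the compiled matchers are kept up front; no plugin module is imported.
        self._matchers = {
            name: [re.compile(m["pattern"]) for m in entry["matchers"]]
            for name, entry in json.loads(data).items()
        }
        self._loader = loader  # callable which imports a plugin module by name
        self._loaded = {}

    def resolve(self, url):
        for name, patterns in self._matchers.items():
            if any(p.match(url) for p in patterns):
                # Load the plugin module on first use only.
                if name not in self._loaded:
                    self._loaded[name] = self._loader(name)
                return name, self._loaded[name]
        raise LookupError(f"No plugin can handle URL: {url}")


loaded_names = []


def fake_loader(name):
    # Stand-in for importing the real plugin module.
    loaded_names.append(name)
    return object()


plugins = LazyPlugins(PLUGINS_JSON, fake_loader)
name, _ = plugins.resolve("https://www.twitch.tv/gorgc")
plugins.resolve("https://www.twitch.tv/gorgc")  # second lookup reuses the loaded plugin
print(name, loaded_names)
```

In this sketch, the vimeo entry is never loaded because no vimeo URL was resolved.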

The plugins data loading implementation always compares the checksum of the _plugins.json file (stored in the package's RECORD metadata) before evaluating it, so users can't easily modify it. I think this is better than applying a validation schema to the JSON data and unnecessarily slowing down the session initialization.
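
To make the checksum comparison more concrete, here is a minimal sketch under stated assumptions, not the code from this PR. It relies on the RECORD format from the Python packaging specs, where each row is `path,algorithm=digest,size` and the digest is URL-safe base64 without padding. The helper names and the file path used in the example are hypothetical.

```python
import base64
import csv
import hashlib
import io


def record_hash(content: bytes) -> str:
    # RECORD stores hashes as "<algorithm>=<urlsafe base64 digest without padding>".
    digest = hashlib.sha256(content).digest()
    return "sha256=" + base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def is_unmodified(record_text: str, path: str, content: bytes) -> bool:
    # RECORD is a CSV file with the columns: path, hash, size.
    for row in csv.reader(io.StringIO(record_text)):
        if len(row) >= 2 and row[0] == path:
            return row[1] == record_hash(content)
    # Unknown files are treated as untrusted.
    return False


# Hypothetical path and content, for illustration only.
path = "streamlink/plugins/_plugins.json"
original = b'{"twitch": {}}'
record = f"{path},{record_hash(original)},{len(original)}\n"

ok = is_unmodified(record, path, original)
tampered = is_unmodified(record, path, original + b" ")
print(ok, tampered)
```

If the check fails, falling back to loading all built-in plugins (as described in the list at the top) keeps the session usable.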

What I don't like though is that the match_url() debug log message when loading the resolved plugin is written to the output before the environment/package/config debug messages. This is because of the CLI implementation which wants to load all plugin config files first, which is only possible after resolving the plugin from the input URL.

Sideloading plugins still works the same way as before, where sideloaded plugins override built-in ones, even when those built-in ones aren't loaded yet.
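
The override behavior can be sketched roughly like this. The `PluginRegistry` class and its methods are hypothetical and only illustrate the precedence rule, not the session's real data structures.

```python
class PluginRegistry:
    def __init__(self, lazy_names):
        self._lazy = set(lazy_names)  # built-ins known from JSON data, not imported yet
        self._loaded = {}             # name -> plugin object

    def sideload(self, name, plugin):
        # A sideloaded plugin replaces the built-in one of the same name,
        # regardless of whether that built-in was already loaded or is still pending.
        self._lazy.discard(name)
        self._loaded[name] = plugin

    def get(self, name, load_builtin):
        if name in self._loaded:
            return self._loaded[name]
        if name in self._lazy:
            # Load the built-in plugin on first access only.
            self._lazy.discard(name)
            self._loaded[name] = load_builtin(name)
            return self._loaded[name]
        raise KeyError(name)


registry = PluginRegistry(["twitch", "vimeo"])
registry.sideload("twitch", "custom twitch plugin")

twitch = registry.get("twitch", lambda n: f"built-in {n} plugin")
vimeo = registry.get("vimeo", lambda n: f"built-in {n} plugin")
print(twitch, vimeo, sep="\n")
```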


Alternatively, the lazy plugins loading could've been implemented using Python's pickle module, which would probably lead to better loading times, but this would come at the cost of not being able to read the plugins data, unlike JSON.


Some elapsed-time and max-memory comparisons between 6.5.1 and this branch using GNU Time (%E is the elapsed wall-clock time, %M is the maximum resident set size in KiB):

$ python -m venv ~/venv/{old,new}
$ ~/venv/old/bin/python -m pip -q install streamlink==6.5.1
$ ~/venv/new/bin/python -m pip -q install git+https://github.com/streamlink/streamlink@refs/pull/5822/head

6.5.1

$ /usr/bin/time -f '%E %M' ~/venv/old/bin/python -m streamlink --plugins
Loaded plugins: abematv, adultswim, afreeca, albavision, aloula, app17, ard_live, ard_mediathek, artetv, atpchallenger, atresplayer, bbciplayer, bfmtv, bigo, bilibili, blazetv, bloomberg, booyah, brightcove, btv, cbsnews, cdnbg, ceskatelevize, cinergroup, clubbingtv, cmmedia, cnews, crunchyroll, dailymotion, dash, delfi, deutschewelle, dlive, dogan, dogus, drdk, earthcam, euronews, facebook, filmon, foxtr, galatasaraytv, goltelevision, goodgame, googledrive, gulli, hiplayer, hls, http, htv, huajiao, huya, idf1, indihometv, invintus, kugou, linelive, livestream, lnk, lrt, ltv_lsm_lv, mdstrm, mediaklikk, mediavitrina, mildom, mitele, mixcloud, mjunoon, mrtmk, n13tv, nasaplus, nhkworld, nicolive, nimotv, nos, nownews, nrk, okru, olympicchannel, oneplusone, onetv, openrectv, pandalive, piaulizaportal, picarto, piczel, pixiv, pluto, pluzz, qq, radiko, radionet, raiplay, reuters, rtbf, rtpa, rtpplay, rtve, rtvs, ruv, sbscokr, showroom, sportal, sportschau, ssh101, stadium, steam, streamable, streann, stv, svtplay, swisstxt, telefe, telemadrid, tf1, trovo, turkuvaz, tv360, tv3cat, tv4play, tv5monde, tv8, tv999, tvibo, tviplayer, tvp, tvrby, tvrplus, tvtoya, twitcasting, twitch, ustreamtv, ustvnow, vidio, vimeo, vinhlongtv, vk, vkplay, vtvgo, webtv, welt, wwenetwork, youtube, yupptv, zattoo, zdf_mediathek, zeenews, zengatv, zhanqi
0:00.18 41472

$ /usr/bin/time -f '%E %M' ~/venv/old/bin/python -m streamlink twitch.tv/gorgc
[cli][info] Found matching plugin twitch for URL twitch.tv/gorgc
Available streams: audio_only, 160p (worst), 360p, 480p, 720p60, 1080p60 (best)
0:01.32 53436

PR

$ /usr/bin/time -f '%E %M' ~/venv/new/bin/python -m streamlink --plugins
Available plugins: abematv, adultswim, afreeca, albavision, aloula, app17, ard_live, ard_mediathek, artetv, atpchallenger, atresplayer, bbciplayer, bfmtv, bigo, bilibili, blazetv, bloomberg, booyah, brightcove, btv, cbsnews, cdnbg, ceskatelevize, cinergroup, clubbingtv, cmmedia, cnews, crunchyroll, dailymotion, dash, delfi, deutschewelle, dlive, dogan, dogus, drdk, earthcam, euronews, facebook, filmon, foxtr, galatasaraytv, goltelevision, goodgame, googledrive, gulli, hiplayer, hls, http, htv, huajiao, huya, idf1, indihometv, invintus, kugou, linelive, livestream, lnk, lrt, ltv_lsm_lv, mdstrm, mediaklikk, mediavitrina, mildom, mitele, mixcloud, mjunoon, mrtmk, n13tv, nasaplus, nhkworld, nicolive, nimotv, nos, nownews, nrk, okru, olympicchannel, oneplusone, onetv, openrectv, pandalive, piaulizaportal, picarto, piczel, pixiv, pluto, pluzz, radiko, radionet, raiplay, reuters, rtpa, rtpplay, rtve, rtvs, ruv, sbscokr, showroom, sportal, sportschau, ssh101, stadium, steam, streamable, streann, stv, svtplay, swisstxt, telefe, telemadrid, tf1, trovo, turkuvaz, tv360, tv3cat, tv4play, tv5monde, tv8, tv999, tvibo, tviplayer, tvp, tvrby, tvrplus, tvtoya, twitcasting, twitch, ustreamtv, ustvnow, vidio, vimeo, vinhlongtv, vk, vkplay, vtvgo, webtv, welt, wwenetwork, youtube, yupptv, zattoo, zdf_mediathek, zeenews, zengatv, zhanqi
0:00.16 38144

$ /usr/bin/time -f '%E %M' ~/venv/new/bin/python -m streamlink twitch.tv/gorgc
[cli][info] Found matching plugin twitch for URL twitch.tv/gorgc
Available streams: audio_only, 160p (worst), 360p, 480p, 720p60, 1080p60 (best)
0:01.30 50348
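
For reference, the memory difference in MiB can be derived from the %M values above (GNU Time reports them in KiB). This is just the arithmetic on the numbers already shown, with variable names chosen for this example:

```python
KIB_PER_MIB = 1024

# (6.5.1, PR) max resident set sizes in KiB, copied from the output above
measurements = {
    "--plugins": (41472, 38144),
    "twitch.tv/gorgc": (53436, 50348),
}

savings = {
    command: (old - new) / KIB_PER_MIB
    for command, (old, new) in measurements.items()
}

for command, mib in savings.items():
    print(f"{command}: {mib:.2f} MiB saved")
```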

@bastimeyer bastimeyer force-pushed the session/plugins/lazy branch 2 times, most recently from 8c5da03 to 5e2150a Compare February 11, 2024 16:01
@gravyboat
Member

Reviewed this last night but had to put some thought into your points:

> The plugins data loading implementation always compares the checksum of the _plugins.json file (stored in the package's RECORD metadata) before evaluating it, so users can't easily modify it. I think this is better than applying a validation schema to the JSON data and unnecessarily slowing down the session initialization.

I'm fine with this.

> What I don't like though is that the match_url() debug log message when loading the resolved plugin is written to the output before the environment/package/config debug messages. This is because of the CLI implementation which wants to load all plugin config files first, which is only possible after resolving the plugin from the input URL.

Is this really that big a deal? I know it is a bit annoying, but are there cases where we'd somehow miss debug data that is written later, or where debugging becomes more difficult because of this?

> Alternatively, the lazy plugins loading could've been implemented using Python's pickle module, which would probably lead to better loading times, but this would come at the cost of not being able to read the plugins data, unlike JSON.

Let's avoid pickle. Load times are good enough and being able to read the plugin data is useful.

I won't mark this as approved in case there is anything else you want to check, but the changes seem fine to me.

@bastimeyer bastimeyer force-pushed the session/plugins/lazy branch 2 times, most recently from 840c433 to cc0a276 Compare February 14, 2024 12:21
@bastimeyer bastimeyer merged commit 6a551f6 into streamlink:master Feb 14, 2024
@bastimeyer bastimeyer deleted the session/plugins/lazy branch February 14, 2024 12:30
Successfully merging this pull request may close these issues.

session: don't load all built-in plugins at once
2 participants