cache: rewrite and improve cache file I/O #6568

Merged
bastimeyer merged 2 commits into streamlink:master from the cache/io-and-atomicity branch
Jun 26, 2025

Conversation

bastimeyer
Member

  • Schedule cache file writes instead of writing immediately when pruning data on get() or when setting/pruning data on set():
    Use a write debounce timer of 3s and also add an atexit callback
  • Keep a dirty state and prevent unnecessary writes
  • Don't ignore cache file loading errors and instead log a warning msg
  • Always set cache file encoding to UTF-8 when reading or writing
  • Refactor cache file writing logic, use higher level stdlib APIs, and log write errors instead of raising some of them
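
The scheduling behavior listed above can be illustrated with a minimal sketch. It is not Streamlink's actual implementation: the class name `DebouncedJSONCache`, its methods, and the choice to restart the timer on every change are all assumptions made for illustration. It combines a dirty flag, a 3s `threading.Timer`, an `atexit` flush, explicit UTF-8 encoding, and logging of load and write errors.

```python
import atexit
import json
import logging
import threading
from pathlib import Path

log = logging.getLogger(__name__)


class DebouncedJSONCache:
    """Hypothetical sketch of a debounced JSON cache (not Streamlink's class)."""

    def __init__(self, path, delay=3.0):
        self._path = Path(path)
        self._delay = delay
        self._data = {}
        self._dirty = False
        self._timer = None
        self._lock = threading.RLock()
        self._load()
        # Flush pending changes when the interpreter exits normally.
        atexit.register(self.flush)

    def _load(self):
        try:
            with self._path.open("r", encoding="utf-8") as fh:
                self._data = json.load(fh)
        except FileNotFoundError:
            pass  # no cache file yet, start empty
        except (OSError, ValueError) as err:
            # Invalid JSON and decoding errors are ValueError subclasses.
            log.warning("Failed to load cache file: %s", err)

    def get(self, key, default=None):
        with self._lock:
            return self._data.get(key, default)

    def set(self, key, value):
        with self._lock:
            if key in self._data and self._data[key] == value:
                return  # nothing changed, so no write is needed
            self._data[key] = value
            self._dirty = True
            self._schedule()

    def _schedule(self):
        # Assumption: restart the timer so a burst of changes causes one write.
        if self._timer is not None:
            self._timer.cancel()
        self._timer = threading.Timer(self._delay, self.flush)
        self._timer.daemon = True
        self._timer.start()

    def flush(self):
        """Write the cache if it is dirty. Returns True if a write happened."""
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
                self._timer = None
            if not self._dirty:
                return False
            try:
                self._path.parent.mkdir(parents=True, exist_ok=True)
                with self._path.open("w", encoding="utf-8") as fh:
                    json.dump(self._data, fh)
            except OSError as err:
                # Log instead of raising, so a failed write does not crash the caller.
                log.error("Failed to write cache file: %s", err)
                return False
            self._dirty = False
            return True
```

In this sketch, calling `set()` several times in quick succession results in a single write once the timer fires, and a change made shortly before exit is still written by the `atexit` callback.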

Ref #6501

This doesn't fix the data structure issues pointed out in #6501, but it should already be an improvement.

One thing I still have to check before merging (when I get the time) is how the file encoding behaves on Windows. Previously, no file encoding was set when reading or writing, so on Windows it defaulted to the system's locale encoding (iso-8859-15 or whatever else the system is using). JSON data is always encoded as UTF-8, but I don't know how that affects the overall file encoding, so this might break loading existing cache files.
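
The encoding concern can be checked with a small sketch. It assumes the old cache files were written with `json.dump()` using its default `ensure_ascii=True`, which this PR does not state. Under that assumption, non-ASCII characters are escaped as `\uXXXX`, so the bytes on disk are pure ASCII and decode identically under a legacy encoding and UTF-8. The helper name `roundtrips_as_utf8` is hypothetical.

```python
import json


def roundtrips_as_utf8(data, legacy_encoding):
    """Serialize with json defaults, encode with a legacy encoding,
    then decode as UTF-8 and compare with the original data."""
    text = json.dumps(data)  # ensure_ascii=True is the default
    raw = text.encode(legacy_encoding)  # what an old writer would have stored
    return text.isascii() and json.loads(raw.decode("utf-8")) == data


sample = {"token": "äöü€", "expires": 1750000000}
print(roundtrips_as_utf8(sample, "cp1252"))
print(roundtrips_as_utf8(sample, "iso-8859-15"))
```

If the old writer had used `ensure_ascii=False`, the file could contain non-ASCII bytes, and reading it as UTF-8 could then fail or yield different text.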


$ streamlink -l trace --twitch-force-client-integrity --twitch-purge-client-integrity twitch.tv/arteezy
[02:13:07.830701][MainThread][cli][debug] OS:         Linux-6.15.2-1-git-x86_64-with-glibc2.41
[02:13:07.830847][MainThread][cli][debug] Python:     3.13.3
[02:13:07.830919][MainThread][cli][debug] OpenSSL:    OpenSSL 3.5.0 8 Apr 2025
[02:13:07.830960][MainThread][cli][debug] Streamlink: 7.4.0+14.g0fc8a060
[02:13:07.830997][MainThread][cli][debug] Dependencies:
[02:13:07.832085][MainThread][cli][debug]  certifi: 2025.6.15
[02:13:07.832695][MainThread][cli][debug]  isodate: 0.7.2
[02:13:07.833013][MainThread][cli][debug]  lxml: 5.4.0
[02:13:07.833437][MainThread][cli][debug]  pycountry: 24.6.1
[02:13:07.833707][MainThread][cli][debug]  pycryptodome: 3.23.0
[02:13:07.834062][MainThread][cli][debug]  PySocks: 1.7.1
[02:13:07.834381][MainThread][cli][debug]  requests: 2.32.4
[02:13:07.834714][MainThread][cli][debug]  trio: 0.30.0
[02:13:07.835005][MainThread][cli][debug]  trio-websocket: 0.12.2
[02:13:07.835311][MainThread][cli][debug]  urllib3: 2.5.0
[02:13:07.835626][MainThread][cli][debug]  websocket-client: 1.8.0
[02:13:07.835743][MainThread][cli][debug] Arguments:
[02:13:07.835796][MainThread][cli][debug]  url=twitch.tv/arteezy
[02:13:07.835845][MainThread][cli][debug]  --loglevel=trace
[02:13:07.835900][MainThread][cli][debug]  --player=mpv
[02:13:07.835940][MainThread][cli][debug]  --player-args=--cache=yes --demuxer-max-back-bytes=2G
[02:13:07.835988][MainThread][cli][debug]  --stream-segment-threads=2
[02:13:07.836026][MainThread][cli][debug]  --hls-live-edge=2
[02:13:07.836074][MainThread][cli][debug]  --webbrowser-headless=True
[02:13:07.836129][MainThread][cli][debug]  --twitch-disable-ads=True
[02:13:07.836169][MainThread][cli][debug]  --twitch-low-latency=True
[02:13:07.836207][MainThread][cli][debug]  --twitch-api-header=[('Authorization', 'OAuth ...')]
[02:13:07.836245][MainThread][cli][debug]  --twitch-force-client-integrity=True
[02:13:07.836280][MainThread][cli][debug]  --twitch-purge-client-integrity=True
[02:13:07.836350][MainThread][cli][info] Found matching plugin twitch for URL twitch.tv/arteezy
[02:13:07.836479][MainThread][cache][trace] Loading cache file: /home/basti/.cache/streamlink/plugin-cache.json
[02:13:07.836642][MainThread][plugins.twitch][debug] Getting live HLS streams for arteezy
[02:13:07.836698][MainThread][plugins.twitch][info] Removing cached client-integrity token...
[02:13:07.836747][MainThread][cache][trace] Scheduling write to cache file: 3.0s
[02:13:07.836915][MainThread][cache][trace] Scheduling write to cache file: 3.0s
[02:13:07.837093][MainThread][plugins.twitch][info] Acquiring new client-integrity token...
[02:13:07.933077][MainThread][webbrowser.webbrowser][info] Launching web browser: /usr/bin/chromium (headless=True)
[02:13:10.485982][MainThread][webbrowser.webbrowser][debug] Waiting for web browser process to terminate
[02:13:10.837203][CacheSaveThread][cache][trace] Writing to cache file: /home/basti/.cache/streamlink/plugin-cache.json
[02:13:10.994751][MainThread][cache][trace] Scheduling write to cache file: 3.0s
[02:13:11.350206][MainThread][plugins.twitch][debug] {'adblock': False, 'geoblock_reason': '', 'hide_ads': False, 'server_ads': True, 'show_ads': True}
[02:13:11.373451][MainThread][utils.l10n][debug] Language code: en_US
Available streams: audio_only, 160p (worst), 360p, 480p, 720p60, 1080p60 (best)
[02:13:12.205726][MainThread][cache][trace] Writing to cache file: /home/basti/.cache/streamlink/plugin-cache.json
$ streamlink -l trace --twitch-force-client-integrity --twitch-purge-client-integrity twitch.tv/arteezy best
[02:14:06.008879][MainThread][cli][debug] OS:         Linux-6.15.2-1-git-x86_64-with-glibc2.41
[02:14:06.009000][MainThread][cli][debug] Python:     3.13.3
[02:14:06.009057][MainThread][cli][debug] OpenSSL:    OpenSSL 3.5.0 8 Apr 2025
[02:14:06.009101][MainThread][cli][debug] Streamlink: 7.4.0+14.g0fc8a060
[02:14:06.009142][MainThread][cli][debug] Dependencies:
[02:14:06.010239][MainThread][cli][debug]  certifi: 2025.6.15
[02:14:06.010830][MainThread][cli][debug]  isodate: 0.7.2
[02:14:06.011130][MainThread][cli][debug]  lxml: 5.4.0
[02:14:06.011538][MainThread][cli][debug]  pycountry: 24.6.1
[02:14:06.011800][MainThread][cli][debug]  pycryptodome: 3.23.0
[02:14:06.012140][MainThread][cli][debug]  PySocks: 1.7.1
[02:14:06.012451][MainThread][cli][debug]  requests: 2.32.4
[02:14:06.012768][MainThread][cli][debug]  trio: 0.30.0
[02:14:06.013045][MainThread][cli][debug]  trio-websocket: 0.12.2
[02:14:06.013339][MainThread][cli][debug]  urllib3: 2.5.0
[02:14:06.013648][MainThread][cli][debug]  websocket-client: 1.8.0
[02:14:06.013767][MainThread][cli][debug] Arguments:
[02:14:06.013816][MainThread][cli][debug]  url=twitch.tv/arteezy
[02:14:06.013855][MainThread][cli][debug]  stream=['best']
[02:14:06.013901][MainThread][cli][debug]  --loglevel=trace
[02:14:06.013954][MainThread][cli][debug]  --player=mpv
[02:14:06.013993][MainThread][cli][debug]  --player-args=--cache=yes --demuxer-max-back-bytes=2G
[02:14:06.014040][MainThread][cli][debug]  --stream-segment-threads=2
[02:14:06.014077][MainThread][cli][debug]  --hls-live-edge=2
[02:14:06.014127][MainThread][cli][debug]  --webbrowser-headless=True
[02:14:06.014172][MainThread][cli][debug]  --twitch-disable-ads=True
[02:14:06.014210][MainThread][cli][debug]  --twitch-low-latency=True
[02:14:06.014247][MainThread][cli][debug]  --twitch-api-header=[('Authorization', 'OAuth ...')]
[02:14:06.014284][MainThread][cli][debug]  --twitch-force-client-integrity=True
[02:14:06.014319][MainThread][cli][debug]  --twitch-purge-client-integrity=True
[02:14:06.014390][MainThread][cli][info] Found matching plugin twitch for URL twitch.tv/arteezy
[02:14:06.014519][MainThread][cache][trace] Loading cache file: /home/basti/.cache/streamlink/plugin-cache.json
[02:14:06.014680][MainThread][plugins.twitch][debug] Getting live HLS streams for arteezy
[02:14:06.014736][MainThread][plugins.twitch][info] Removing cached client-integrity token...
[02:14:06.014788][MainThread][cache][trace] Scheduling write to cache file: 3.0s
[02:14:06.014947][MainThread][cache][trace] Scheduling write to cache file: 3.0s
[02:14:06.015129][MainThread][plugins.twitch][info] Acquiring new client-integrity token...
[02:14:06.111088][MainThread][webbrowser.webbrowser][info] Launching web browser: /usr/bin/chromium (headless=True)
[02:14:08.662786][MainThread][webbrowser.webbrowser][debug] Waiting for web browser process to terminate
[02:14:09.015220][CacheSaveThread][cache][trace] Writing to cache file: /home/basti/.cache/streamlink/plugin-cache.json
[02:14:09.171841][MainThread][cache][trace] Scheduling write to cache file: 3.0s
[02:14:09.600178][MainThread][plugins.twitch][debug] {'adblock': False, 'geoblock_reason': '', 'hide_ads': False, 'server_ads': True, 'show_ads': True}
[02:14:09.624081][MainThread][utils.l10n][debug] Language code: en_US
[02:14:10.545995][MainThread][cli][info] Available streams: audio_only, 160p (worst), 360p, 480p, 720p60, 1080p60 (best)
[02:14:10.546096][MainThread][cli][info] Opening stream: 1080p60 (hls)
[02:14:10.546143][MainThread][cli][info] Starting player: mpv
[02:14:10.546208][MainThread][plugins.twitch][info] Will skip ad segments
[02:14:10.546256][MainThread][plugins.twitch][info] Low latency streaming (HLS live edge: 2)
[02:14:10.546664][TwitchHLSStreamWorker-0][stream.hls][debug] Reloading playlist
[02:14:10.546767][MainThread][cli][debug] Pre-buffering 8192 bytes
[02:14:10.585478][TwitchHLSStreamWorker-0][plugins.twitch][info] Waiting for pre-roll ads to finish, be patient
[02:14:10.585548][TwitchHLSStreamWorker-0][plugins.twitch][info] Detected advertisement break of 15 seconds
[02:14:10.585615][TwitchHLSStreamWorker-0][stream.hls][debug] First Sequence: 0; Last Sequence: 4
[02:14:10.585660][TwitchHLSStreamWorker-0][stream.hls][debug] Start offset: 0; Duration: None; Start Sequence: 3; End Sequence: None
[02:14:10.585699][TwitchHLSStreamWorker-0][stream.hls][debug] Adding segment 3 to queue
[02:14:10.586007][TwitchHLSStreamWorker-0][stream.hls][debug] Adding segment 4 to queue
[02:14:10.726419][TwitchHLSStreamWriter-0][stream.hls][debug] Discarding segment 3
[02:14:10.877886][TwitchHLSStreamWriter-0][stream.hls][info] Filtering out segments and pausing stream output
[02:14:10.878015][TwitchHLSStreamWriter-0][stream.hls][debug] Discarding segment 4
[02:14:12.172124][CacheSaveThread][cache][trace] Writing to cache file: /home/basti/.cache/streamlink/plugin-cache.json
[02:14:12.546767][TwitchHLSStreamWorker-0][stream.hls][debug] Reloading playlist
[02:14:12.578794][TwitchHLSStreamWorker-0][stream.hls][debug] Adding segment 5 to queue
[02:14:12.604738][TwitchHLSStreamWriter-0][stream.hls][debug] Discarding segment 5
[02:14:14.546726][TwitchHLSStreamWorker-0][stream.hls][debug] Reloading playlist
[02:14:14.682740][TwitchHLSStreamWorker-0][stream.hls][debug] Adding segment 6 to queue
[02:14:14.708444][TwitchHLSStreamWriter-0][stream.hls][debug] Discarding segment 6
^CInterrupted! Exiting...
[02:14:15.194276][MainThread][cli][info] Closing currently open stream...
[02:14:15.194380][MainThread][stream.segmented][debug] Closing worker thread
[02:14:15.194438][MainThread][stream.segmented][debug] Closing writer thread

@bastimeyer

This comment was marked as outdated.

@bastimeyer bastimeyer marked this pull request as draft June 20, 2025 00:19
@bastimeyer bastimeyer force-pushed the cache/io-and-atomicity branch 2 times, most recently from 5443780 to 1114b7e Compare June 20, 2025 12:23
- Schedule cache file writes instead of writing immediately
  when pruning data on get() or when setting/pruning data on set():
  Use a write debounce timer of 3s and also add an `atexit` callback
- Keep a dirty state and prevent unnecessary writes
- Don't ignore cache file loading errors and instead log a warning msg
- Always set cache file encoding to UTF-8 when reading or writing
- Refactor cache file writing logic, use higher level stdlib APIs,
  and log write errors instead of raising some of them
@bastimeyer bastimeyer force-pushed the cache/io-and-atomicity branch from 1114b7e to 16c190f Compare June 26, 2025 22:06
@bastimeyer bastimeyer marked this pull request as ready for review June 26, 2025 22:07
@bastimeyer bastimeyer merged commit dca5a37 into streamlink:master Jun 26, 2025
23 checks passed
@bastimeyer bastimeyer deleted the cache/io-and-atomicity branch June 26, 2025 22:10