Add FilteredHLSStream #3187
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3187 +/- ##
==========================================
+ Coverage 52.60% 52.72% +0.11%
==========================================
Files 243 244 +1
Lines 15606 15642 +36
==========================================
+ Hits 8210 8247 +37
+ Misses 7396 7395 -1
Force-pushed from d80b1a8 to 1dfdd8f
I've added info log messages when filtering starts and ends and changed the initial event lock state, as it's unnecessary to have it locked initially. Even though tests still need to be written/finished, I'd really appreciate a review, because the filtering is working fine in production.
This looks fine to me @bastimeyer. Should probably get a second review in case I missed something, but I don't see anything I'm concerned about.
Well, I was hoping to get some feedback rather than a plain approval. 😞 Did you run the code changes with the suggested examples?

I'm struggling a bit with writing tests for this where I can advance the writer's write calls and the reader's read calls step by step from the main thread, so I can test the IOError handling and thread events/locks with full coverage. I want it to be a real integration test with a full HLS stream, but unfortunately my Python expertise isn't that great regarding threading, and I'm running into some timing issues here.

I really want to get this done as soon as possible, so that we can publish a new release with an embedded-ads workaround and a working filtering that doesn't time out, in case Twitch decides to block the workaround again. I don't want to deal with questions about the embedded ads on Twitch anymore; this is extremely annoying. If I haven't made any progress on this by tomorrow, I'll open a PR with a changelog for a new release. If you want, we could also do it today and ignore the filtering timeout issue.
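A minimal sketch of the kind of step-by-step synchronization described above, assuming a wrapper around the writer; the names here are illustrative and this is not the actual test code:

```python
import threading

class EventedWriter:
    """Wraps a writer so the main thread can advance it one write() call at a time."""

    def __init__(self, writer):
        self.writer = writer
        self.write_wait = threading.Event()  # main thread sets this to allow one write
        self.write_done = threading.Event()  # writer thread sets this once a write finished

    def write(self, segment):
        # called on the writer thread: block until the main thread says "go"
        self.write_wait.wait()
        self.write_wait.clear()
        try:
            self.writer.write(segment)
        finally:
            self.write_done.set()

    def await_write(self, timeout=5):
        # called on the main thread: release exactly one write and wait for it to finish
        self.write_wait.set()
        assert self.write_done.wait(timeout), "write did not complete in time"
        self.write_done.clear()
```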
Force-pushed from 1dfdd8f to 7656ba1
@bastimeyer Sorry for the lack of clarity, yes I did run it and I used your example as well as running it with …

I'm not quite sure how to handle writing the tests for this either, as I've always struggled with Python threading. Timing issues are hard to solve and I haven't been writing enough Python recently to provide the knowledge we need. @back-to Any insight or suggestions on how to handle this?

I agree with your annoyance. I don't want to deal with this again either and I'm sick of it. My issue with doing a release before handling the filtering timeout issue is that you and I both know people are going to start complaining about it. If we can't come up with a good way to test this, I'd rather get this in without tests and deal with that not working for some people than listen to complaints about the filtering timeout.
Force-pushed from 7656ba1 to ae88884
OK, I've managed to write the tests. Not pretty and lots of verbose setup code, and I'm also pretty sure that this could probably be done way easier, but whatever. This is still unfinished.
Will try to fix the remaining issues tomorrow.
I just looked the tests over; they're a bit messy, as you noted, but decent enough in structure that I think we can consider this acceptable. If someone comes up with a better idea they can propose a commit onto your branch, but if not this will do. I'll check in on this issue around 23:00 your time tomorrow when I take my lunch break, and if no one else has weighed in by then let's merge this.
Force-pushed from ae88884 to 394fd41
Switching from threading.Condition to threading.Event made it work in py2 now, and the one race condition with the workaround doesn't exist anymore. The only issue I've noticed while running the tests is the requests_mocker, which doesn't seem to properly intercept all network requests to …

Overall, I think I'm satisfied with it now. I should have tested everything and the coverage is 100%. Let's see if @back-to wants to review this, otherwise we can merge this later today / tomorrow.

Btw, if you're wondering why it's messy: hooking into the right methods of the HLS classes is a bit difficult, and the other HLS tests are nowhere near as detailed / "thorough". I have annotated more lines in the latest push, and since every test assertion has a message, I don't think it's too bad.
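A minimal sketch of why threading.Event is less race-prone here than threading.Condition (illustrative only, not the actual code): a notify() that fires before the other thread reaches wait() is lost, while a set() flag is remembered.

```python
import threading

# Condition: a notify() issued while nobody is waiting is simply dropped, so a
# wait() that only starts afterwards blocks until its timeout - the classic race.
cond = threading.Condition()
with cond:
    cond.notify()              # no waiter yet, the wakeup is lost
with cond:
    woke_up = cond.wait(timeout=0.1)
assert not woke_up

# Event: set() is remembered, so a wait() that starts afterwards returns immediately.
event = threading.Event()
event.set()                    # the "signal" happens before anyone waits
assert event.wait(timeout=0.1)
```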
@bastimeyer My comment wasn't meant as a criticism; I was merely trying to note that your concern over them not being pretty wasn't a big deal. I just looked over the tests again and they look good to me. The additional comments are also helpful.
Adds subclasses of the HLSStreamReader and HLSStreamWriter for filtering out sequences and ignoring the read timeout while filtering.
and filter out advertisement sequences
Force-pushed from 394fd41 to 61a95a2
Added one final refinement, so that the default …

And the requests_mock issue I was talking about when generating a coverage report is caused by a different test (tested it on master). That's probably the Twitch plugin tests, which I will try to rewrite and simplify later at some point, not sure when. That's the error, btw:
@bastimeyer I just looked over the tests and am happy with them. Do you still want another review? If not let's merge this.
If we're going to publish a new release today or tomorrow, then you can go ahead and merge if you want. Don't squash the commits please.
@bastimeyer Done. I can check the changelog PR after work.
@gravyboat There's a problem with a race condition in one of the tests which I didn't spot... and now two commits on the master branch have failed already:

That's a bit unfortunate, because I let my tests run at least 500 times in both py3 and py2, since I wasn't too sure, as I've said. I guess my CPU is too fast for the issue to occur, so I didn't notice it. Let me see if I can fix this quickly.
@bastimeyer It's not ideal, but you could put some sleeps in just to ensure the race conditions don't happen. It's dirty, but if you can't get it working it's an option.
thread.await_write()
thread.await_write()
self.assertFalse(reader.filter_event.is_set(), "Lets the reader wait if filtering")
Commenting here, so that it's clear where the issue is.
Adding sleeps is a last resort solution, and I don't really want this there.
I don't understand the issue if I'm being honest. The writer and main thread should both be synced properly after the await_write() call by the main thread, and the writer thread should have cleared the reader event, meaning that filter_event.is_set() should be false at this point.
The only thing that could go wrong here is the patched should_filter_sequence method, which could have returned a False value and altered the intended code path. And it's confusing to me that this has only happened in py2 so far.
The test failure has started to happen after my last change with the patch decorator (if we've not been unlucky with successful runs prior to that). Is it possible that it gets replaced lazily and too late in some cases?
Why has the test_filtered_closed test not failed yet? It's basically the same setup at the beginning of both tests.
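A hedged illustration of the patch timing in question; the target path and class names here are hypothetical and not the actual test code:

```python
import unittest
from unittest.mock import patch

# Hypothetical patch target; the real tests patch should_filter_sequence on their own path.
TARGET = "somemodule.FilteredHLSStreamWriter.should_filter_sequence"

class DecoratorTimingExample(unittest.TestCase):
    # Decorator form: the attribute is replaced when the test method is entered and
    # restored when it returns, so threads spawned inside the test body see the mock.
    @patch(TARGET, return_value=True)
    def test_filtered(self, mock_should_filter):
        ...

class SetUpTimingExample(unittest.TestCase):
    # Explicit form: starting the patcher in setUp makes it active before any
    # reader/writer threads created there, and addCleanup restores it reliably.
    def setUp(self):
        patcher = patch(TARGET, return_value=True)
        self.mock_should_filter = patcher.start()
        self.addCleanup(patcher.stop)
```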
I don't see how we could have been that unlucky with previous successful runs. Lazy replacement is possible. I just ran the tests in my local (Windows) environment and the new twitch tests passed perfectly fine. It seems pretty suspicious as well that this issue is limited to the CI.
I've run this specific test on my test-account's fork 100 times and it didn't fail:
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266135453
Running all tests 100 times however did fail eventually (with failfast enabled to cancel the other runners):
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266138718
I will revert the last change now and then run it again.
Looks like there's something else wrong with the await_write() method or the general thread syncing method that's being used, because it now failed at a different place:
https://github.com/bastimeyer-test/streamlink-test/runs/1147228622?check_suite_focus=true#step:5:1246
that's annoying
Did some more debugging by adding detailed log messages and found the source of the issue (but not the cause yet).
In the test_filtered_no_timeout test, for example, the two segments of the first playlist are supposed to be filtered out and the two segments of the second playlist are supposed to be written. This means that when trying to write the first two segments, it should ignore them and lock the reader thread. Sometimes this isn't done though, because the writer already starts with the third sequence, the first segment of the second playlist. Since this segment should not get filtered out, the lock in the reader thread is of course not set and the test fails.

So the question is, why does the writer sometimes start with the third segment? The worker, which is putting / queuing up segments into the writer, is for some reason starting with the second playlist. According to the log output, this isn't the fault of the worker, so it must be requests_mock's fault:
https://github.com/bastimeyer-test/streamlink-test/runs/1147915591?check_suite_focus=true#step:5:639

I've already tried moving the requests_mock setup from the reader thread, where it currently is, to the main thread and into the unittest.TestCase class's setUp and tearDown methods, but it didn't help. The playlist request responses are set up correctly:
https://requests-mock.readthedocs.io/en/latest/response.html#response-lists
Is it possible that it's interference with the other tests, most likely the Twitch plugin's tests, where the requests_mock doesn't get reset properly? That would explain the network error messages I was talking about earlier. Remember that running only the hls_filtered tests 100 times in a row by the CI runners worked fine (link above).
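For reference, the response-list registration from the linked requests-mock docs looks roughly like this; the URL and playlist bodies are made-up stand-ins, not the actual test fixtures:

```python
import requests
import requests_mock

with requests_mock.Mocker() as mock:
    # Each request to the playlist URL consumes the next response from the list:
    # first the playlist whose segments get filtered out, then the one whose
    # segments get written.
    mock.get("http://mocked/path/playlist.m3u8", [
        {"text": "#EXTM3U\n#EXTINF:1,\nfiltered1.ts\n#EXTINF:1,\nfiltered2.ts\n"},
        {"text": "#EXTM3U\n#EXTINF:1,\nwritten1.ts\n#EXTINF:1,\nwritten2.ts\n#EXT-X-ENDLIST\n"},
    ])
    first = requests.get("http://mocked/path/playlist.m3u8").text   # first playlist
    second = requests.get("http://mocked/path/playlist.m3u8").text  # second playlist
```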
I have fixed some issues with stalled threads on teardown in the hls_filtered tests by rewriting them and joining all spawned threads in the main thread's tearDown method and waiting for their termination, but the issue still occurs, though now much more rarely, which is interesting.
When printing the names of all running threads after a clean teardown, it looks like some ThreadPoolExecutor segment-fetch threads of one of the previously run stream workers are still running, even though all other tests call close() on their spawned streams every time. Those leftover fetch threads are bleeding into the current test run and are making the requests on the mocked URLs, which are the same between tests:
https://github.com/bastimeyer-test/streamlink-test/runs/1149311834?check_suite_focus=true#step:5:1418
If I change the mocked URLs in just the hls_filtered tests, then there are no failures 🎉🎉🎉, which means these bleeding threads are coming from other test runs. Those are probably test_twitch.py or test_hls.py, because those are the only ones which open HLSStreams and read them, the latter being more likely since it's run right before test_hls_filtered.py.
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266894833 (no failed tests after 30 mins of execution - canceled the workflow after that, because I didn't want to wait any longer)
I will open a PR with fixes later after cleaning up all my debugging. This will be a fix and a workaround at the same time, because I won't touch the other "faulty" tests with their bleeding threads now. This can be done later.
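A minimal sketch of the kind of tearDown join described above; the class and attribute names and the timeout are illustrative, not the actual test code:

```python
import unittest

class TestFilteredHLSStream(unittest.TestCase):
    def setUp(self):
        self.threads = []  # every thread spawned by a test gets registered here

    def tearDown(self):
        # Join all spawned threads so none of them bleed into the next test run
        # and hit its mocked URLs after this test has already finished.
        for thread in self.threads:
            thread.join(timeout=5)
            self.assertFalse(thread.is_alive(), "thread did not terminate on teardown")
```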
Fix/workaround (commit ID will be different than my PR):
https://github.com/bastimeyer-test/streamlink-test/commit/8be37ef6fb145ccb83ad11b51fa0e1453b5e0cdd
All tests running 50 times in a row on all CI runners (still running after 20 minutes as I'm writing this):
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266975807
Closes #3178
Closes #3165
This adds the general-purpose FilteredHLSStreamWriter and FilteredHLSStreamReader classes, which filter out HLS segments and prevent the IOError("Read timeout") from being raised by the RingBuffer while segments are being filtered out and the buffer isn't being filled / written to.

I'm not sure if this is the best or even a good implementation. The problem is that I don't want to subclass the RingBuffer for this and don't want to modify the streamlink_cli logic either. This means that the IOError needs to be caught and prevented from being raised by adding another blocking event in the reader thread while filtering is done in the writer thread. I am not sure if I have missed some edge cases with my implementation.
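A rough, simplified sketch of the described mechanism; class internals, method names and the buffer interface here are approximations, not the actual implementation:

```python
from threading import Event

class FilteredHLSStreamWriter:
    def __init__(self, reader):
        self.reader = reader

    def should_filter_sequence(self, sequence):
        # subclasses (e.g. a Twitch-specific writer) decide which segments to drop
        return False

    def write(self, sequence):
        if self.should_filter_sequence(sequence):
            # segment gets filtered out: block the reader instead of letting the
            # RingBuffer's read timeout turn into IOError("Read timeout")
            self.reader.filter_event.clear()
            return
        # a regular segment arrived again: unblock the reader and write the segment
        self.reader.filter_event.set()
        # ... the actual segment download / write to the buffer would happen here ...

class FilteredHLSStreamReader:
    def __init__(self, buffer):
        self.buffer = buffer
        self.filter_event = Event()
        self.filter_event.set()  # unlocked initially; only cleared while filtering

    def read(self, size):
        while True:
            try:
                return self.buffer.read(size)  # may raise IOError on read timeout
            except IOError:
                if not self.filter_event.is_set():
                    # currently filtering: swallow the timeout and wait until the
                    # writer signals that regular segments are flowing again
                    self.filter_event.wait()
                else:
                    raise
```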
This can be "tested" in production via:

streamlink --loglevel=trace --hls-timeout=5 --twitch-disable-ads twitch.tv/CHANNEL best --stdout > /dev/null

The --hls-timeout value of course needs to be shorter than the stream's ads, but still high enough to not starve the read iteration in between playlist refreshes while waiting for the next segments (--twitch-low-latency or --hls-playlist-reload-time helps here).

Also, while testing on Twitch, the current ads workaround needs to be reverted first: