
Add FilteredHLSStream #3187


Merged
2 commits merged into streamlink:master from plugins/twitch/filter-timeout on Sep 21, 2020

Conversation

bastimeyer
Member

@bastimeyer bastimeyer commented Sep 19, 2020

Closes #3178
Closes #3165

This adds the general-purpose FilteredHLSStreamWriter and FilteredHLSStreamReader classes, which filter out HLS segments and prevent the IOError("Read timeout") from being raised by the RingBuffer while segments are being filtered out and the buffer is not being filled or written to.

I'm not sure if this is the best or even a good implementation. The problem is that I don't want to subclass the RingBuffer for this, and I don't want to modify the streamlink_cli logic either. This means the IOError needs to be caught and suppressed by adding another blocking event in the reader thread while the filtering is done in the writer thread. I'm not sure whether I've missed any edge cases in my implementation.
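
To make the approach more concrete, here is a minimal, self-contained sketch of the idea. The class and attribute names mirror the PR, but the base classes, the list standing in for the RingBuffer, and the exact event handling are simplified assumptions, not the actual streamlink implementation:

from threading import Event


class FilteredHLSStreamWriter:
    # simplified stand-in; the real class subclasses streamlink's HLSStreamWriter
    def __init__(self, reader):
        self.reader = reader

    def should_filter_sequence(self, sequence):
        # default: filter nothing; subclasses (eg. a Twitch writer) override this
        return False

    def write(self, sequence):
        if self.should_filter_sequence(sequence):
            # segment is being discarded: keep the reader blocked so that the
            # RingBuffer's "Read timeout" can't bubble up while nothing is written
            self.reader.filter_event.clear()
        else:
            self.reader.filter_event.set()
            self.reader.buffer.append(sequence)  # stands in for the RingBuffer write


class FilteredHLSStreamReader:
    # simplified stand-in; the real class subclasses streamlink's HLSStreamReader
    def __init__(self):
        self.buffer = []
        self.filter_event = Event()
        self.filter_event.set()  # unblocked initially

    def read(self):
        while True:
            try:
                if not self.buffer:
                    raise IOError("Read timeout")  # what the RingBuffer would raise
                return self.buffer.pop(0)
            except IOError:
                if not self.filter_event.is_set():
                    # filtering is in progress: wait for the writer to finish
                    # instead of propagating the timeout to streamlink_cli
                    self.filter_event.wait()
                    continue
                raise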

This can be "tested" in production via

streamlink --loglevel=trace --hls-timeout=5 --twitch-disable-ads twitch.tv/CHANNEL best --stdout > /dev/null

The --hls-timeout value of course needs to be shorter than the stream's ads, but still high enough not to starve the read iteration between playlist refreshes while waiting for the next segments (--twitch-low-latency or --hls-playlist-reload-time helps here).

Also, while testing on Twitch, the current ads workaround needs to be reverted first:

diff --git a/src/streamlink/plugins/twitch.py b/src/streamlink/plugins/twitch.py
index 6bdfd452..a327d405 100644
--- a/src/streamlink/plugins/twitch.py
+++ b/src/streamlink/plugins/twitch.py
@@ -317,7 +317,7 @@ class TwitchAPI(object):
     # Private API calls
 
     def access_token(self, endpoint, asset, **params):
-        return self.call("/api/{0}/{1}/access_token".format(endpoint, asset), private=True, **dict(platform="_", **params))
+        return self.call("/api/{0}/{1}/access_token".format(endpoint, asset), private=True, **params)
 
     def hosted_channel(self, **params):
         return self.call_subdomain("tmi", "/hosts", format="", **params)

@bastimeyer bastimeyer added the "stream: HLS" and "WIP (Work in process)" labels on Sep 19, 2020
@codecov

codecov bot commented Sep 19, 2020

Codecov Report

Merging #3187 into master will increase coverage by 0.11%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #3187      +/-   ##
==========================================
+ Coverage   52.60%   52.72%   +0.11%     
==========================================
  Files         243      244       +1     
  Lines       15606    15642      +36     
==========================================
+ Hits         8210     8247      +37     
+ Misses       7396     7395       -1     

@bastimeyer bastimeyer force-pushed the plugins/twitch/filter-timeout branch from d80b1a8 to 1dfdd8f on September 20, 2020 16:51
@bastimeyer
Member Author

I've added info log messages when filtering starts and ends and changed the initial event lock state, as it's unnecessary to have it locked initially.

Even though tests still need to be written/finished, I'd really appreciate a review, because the filtering is working fine in production.

Member

@gravyboat gravyboat left a comment

This looks fine to me @bastimeyer. Should probably get a second review in case I missed something, but I don't see anything I'm concerned about.

@bastimeyer
Member Author

Well, I was hoping to get some feedback rather than a plain approval. 😞 Did you run the code changes with the suggested examples? I'm struggling a bit with writing tests for this where I can advance the writer's write calls and the reader's read calls step by step from the main thread, so I can test the IOError handling and thread events/locks with full coverage. I want it to be a real integration test with a full HLS stream, but unfortunately my Python expertise isn't that great when it comes to threading, and I'm running into some timing issues here.
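
For what it's worth, one possible way to step the writer from the main thread (an illustrative sketch only, not the test code that ended up in this PR) is to wrap write() so that every call signals an Event the test can wait on:

from threading import Event


class EventedWriteMixin(object):
    # mix this into the writer class under test so the main thread can
    # observe each completed write() call before making assertions
    def __init__(self, *args, **kwargs):
        super(EventedWriteMixin, self).__init__(*args, **kwargs)
        self.write_calls = []
        self.write_event = Event()

    def write(self, sequence, *args, **kwargs):
        result = super(EventedWriteMixin, self).write(sequence, *args, **kwargs)
        self.write_calls.append(sequence)
        self.write_event.set()
        return result


def await_write(writer, timeout=5):
    # called from the main/test thread: block until the writer thread has
    # finished its next write() call, then reset the event for the next step
    assert writer.write_event.wait(timeout), "writer did not write in time"
    writer.write_event.clear()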

I really want to get this done as soon as possible, so that we can publish a new release with an embedded-ads workaround and a working filtering that doesn't time out, in case Twitch decides to block the workaround again. I don't want to deal with questions about the embedded ads on Twitch anymore; this is extremely annoying. If I haven't made any progress on this by tomorrow, I'll open a PR with a changelog for a new release. If you want, we could also do it today and ignore the filtering timeout issue.

@bastimeyer bastimeyer force-pushed the plugins/twitch/filter-timeout branch from 1dfdd8f to 7656ba1 on September 20, 2020 20:32
@gravyboat
Member

@bastimeyer Sorry for the lack of clarity. Yes, I did run it, using your example as well as running it with --twitch-low-latency, and the results are actually better for me with this implementation than they currently are on the live build, as I get stuttering with low latency because of how the segments are pulled once the ad is finished.

I'm not quite sure how to handle writing the tests for this either, as I've always struggled with Python threading. Timing issues are hard to solve, and I haven't been writing enough Python recently to provide the knowledge we need. @back-to Any insight or suggestions on how to handle this?

I agree with your annoyance. I don't want to deal with this again either, and I'm sick of it. My issue with doing a release before handling the filtering timeout is that you and I both know people are going to start complaining about it. If we can't come up with a good way to test this, I'd rather get it in without tests and deal with it not working for some people than listen to complaints about the filtering timeout.

@bastimeyer bastimeyer force-pushed the plugins/twitch/filter-timeout branch from 7656ba1 to ae88884 on September 21, 2020 00:41
@bastimeyer
Member Author

bastimeyer commented Sep 21, 2020

OK, I've managed to write the tests. Not pretty and lots of verbose setup code, and I'm also pretty sure that this could probably be done way easier, but whatever.

This is still unfinished. Tests are currently not working in py2 (didn't pay any attention to that, so I'm not sure what fails/stalls and how it can be fixed) and the requests mocker seems to have some issues as well, but only if the entire test suite is run with coverage data for some reason (doesn't seem to be the case on the CI side 😕).

Other than that, there's one race condition with the test setup in one of the tests, which I could work around. I ran the tests 300 times in a row and had no other race conditions.

Will try to fix the remaining issues tomorrow.

@gravyboat
Member

I just looked the tests over; they're a bit messy as you noted, but decent enough in structure that I think we can consider this acceptable. If someone comes up with a better idea, they can propose a commit onto your branch, but if not, this will do. I'll check in on this issue around 23:00 your time tomorrow when I take my lunch break, and if no one else has weighed in by then, let's merge this.

@bastimeyer bastimeyer force-pushed the plugins/twitch/filter-timeout branch from ae88884 to 394fd41 on September 21, 2020 02:34
@bastimeyer
Member Author

Switching from threading.Condition to threading.Event made it work in py2 now, and the one race condition with the workaround doesn't exist anymore.

The only issue I've noticed while running the tests is the requests_mocker, which doesn't seem to properly intercept all network requests to http://mocked/path/..., but for some reason this only happens when running all tests with a coverage report. On the CI runners this doesn't happen, which is weird. It's possible, though, that it's caused by the Twitch plugin tests and not the ones here.

Overall, I think I'm satisfied with it now. I should have tested everything and the coverage is 100%. Let's see if @back-to wants to review this, otherwise we can merge this later today / tomorrow.

Btw, if you're wondering why it's messy, hooking into the right methods of the HLS classes is a bit difficult and the other HLS tests are nowhere near as detailed / "thorough". I have annotated more lines in the latest push, and since every test assertion has a message, I don't think it's too bad.

@bastimeyer bastimeyer removed the "WIP (Work in process)" label on Sep 21, 2020
@bastimeyer bastimeyer marked this pull request as ready for review September 21, 2020 02:58
@gravyboat
Member

@bastimeyer My comment wasn't meant as a criticism; I was merely trying to note that your concern over them not being pretty wasn't a big deal. I just looked over the tests again and they look good to me. The additional comments are also helpful.

Adds subclasses of the HLSStreamReader and HLSStreamWriter for filtering
out sequences and ignoring the read timeout while filtering.
and filter out advertisement sequences
@bastimeyer bastimeyer force-pushed the plugins/twitch/filter-timeout branch from 394fd41 to 61a95a2 on September 21, 2020 13:24
@bastimeyer
Member Author

Added one final refinement, so that the default should_filter_sequence method actually gets tested.
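
For context, that default method is the hook a plugin-specific writer would override. A hypothetical override, building on the sketch further up and using a placeholder ad-detection check (not Twitch's actual logic), could look like this:

class AdFilteringHLSStreamWriter(FilteredHLSStreamWriter):
    def should_filter_sequence(self, sequence):
        # placeholder check: treat segments whose EXTINF title marks them as
        # an advertisement as filterable; real detection lives in the plugin
        title = getattr(sequence.segment, "title", None) or ""
        return title.lower().startswith("ad")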

And the requests_mock issue I was talking about when generating a coverage report is caused by a different test (I tested it on master). It's probably the Twitch plugin tests, which I will try to rewrite and simplify at some point later, not sure when. That's the error, btw:

[stream.hls][warning] Failed to reload playlist: Unable to open URL: http://mocked/path/playlist.m3u8 (HTTPConnectionPool(host='mocked', port=80): Max retries exceeded with url: /path/playlist.m3u8 (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f4955825df0>: Failed to establish a new connection: [Errno -2] Name or service not known')))

@gravyboat
Member

@bastimeyer I just looked over the tests and am happy with them. Do you still want another review? If not let's merge this.

@bastimeyer
Member Author

bastimeyer commented Sep 21, 2020

If we're going to publish a new release today or tomorrow, then you can go ahead and merge if you want.
I will write a quick changelog and open a PR then.

Don't squash the commits please.

@gravyboat gravyboat merged commit 1456f2a into streamlink:master Sep 21, 2020
@gravyboat
Member

@bastimeyer Done. I can check the changelog PR after work.

@bastimeyer
Member Author

@gravyboat There's a problem with a race condition in one of the tests which I didn't spot...

And now two commits on the master branch have failed already:
https://github.com/streamlink/streamlink/runs/1146925732#step:5:239

That's a bit unfortunate, because, as I've said, I wasn't too sure, so I ran my tests at least 500 times in both py3 and py2. I guess my CPU is too fast for the issue to occur, so I didn't notice it. Let me see if I can fix this quickly.

@gravyboat
Member

@bastimeyer It's not ideal, but you could put some sleeps in just to ensure the race conditions don't happen. It's dirty, but if you can't get it working, it's an option.


thread.await_write()
thread.await_write()
self.assertFalse(reader.filter_event.is_set(), "Lets the reader wait if filtering")
Member Author

Commenting here, so that it's clear where the issue is.

Adding sleeps is a last resort solution, and I don't really want this there.

I don't understand the issue if I'm being honest. The writer and main thread should both be synced properly after the await_write() call by the main thread, and the writer thread should have cleared the reader event, meaning that filter_event.is_set() should be false at this point.

The only thing that could go wrong here is the patched should_filter_sequence method, which could have returned False and altered the intended code path. And it's confusing to me that this has only happened in py2 so far.
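
To illustrate the moving part being discussed (names here are made up, this isn't the PR's test code): if the mock on should_filter_sequence isn't in place by the time the writer thread calls it, the unpatched default False is returned and the unfiltered code path runs, which would explain the failed assertion.

from unittest.mock import patch


class FakeWriter(object):
    def should_filter_sequence(self, sequence):
        return False  # the default, unpatched behaviour


# patch.object swaps the method out for the duration of the with-block only;
# any call made before (or after) the block sees the real default instead
with patch.object(FakeWriter, "should_filter_sequence",
                  side_effect=[True, True, False, False]) as mock_filter:
    writer = FakeWriter()
    results = [writer.should_filter_sequence(num) for num in range(4)]

assert results == [True, True, False, False]
assert mock_filter.call_count == 4
assert writer.should_filter_sequence(4) is False  # unpatched again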

Member Author

The test failure started to happen after my last change with the patch decorator (unless we just got lucky with the successful runs prior to that). Is it possible that the method gets replaced lazily and too late in some cases?

Why has the test_filtered_closed test not failed yet? It's basically the same setup at the beginning of both tests.

Member

I don't see how we could have been that lucky with the previous successful runs. Lazy replacement is possible. I just ran the tests in my local (Windows) environment and the new Twitch tests passed perfectly fine. It also seems pretty suspicious that this issue is limited to the CI.

Member Author

I've run this specific test on my test-account's fork 100 times and it didn't fail:
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266135453

Running all tests 100 times, however, did fail eventually (with failfast enabled to cancel the other runners):
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266138718

I will revert the last change now and then run it again.

Member Author

Looks like there's something else wrong with the await_write() method or the general thread syncing method that's being used, because it now failed at a different place:
https://github.com/bastimeyer-test/streamlink-test/runs/1147228622?check_suite_focus=true#step:5:1246
That's annoying.

Member Author

@bastimeyer bastimeyer Sep 22, 2020

Did some more debugging by adding detailed log messages and found the source of the issue (but not the cause yet).

In the test_filtered_no_timeout test, for example, the two segments of the first playlist are supposed to be filtered out and the two segments of the second playlist are supposed to be written. This means that when trying to write the first two segments, it should ignore them and lock the reader thread. Sometimes this doesn't happen though, because the writer already starts with the third sequence, the first segment of the second playlist. Since this segment should not get filtered out, the lock in the reader thread is of course not set and the test fails.

So the question is: why does the writer sometimes start with the third segment? The worker, which queues segments up for the writer, is for some reason starting with the second playlist.

According to the log output, this isn't the fault of the worker, so it must be requests_mock's fault:
https://github.com/bastimeyer-test/streamlink-test/runs/1147915591?check_suite_focus=true#step:5:639

I've already tried moving the requests_mock setup from the reader thread, where it currently is, to the main thread and into the unittest.TestCase class's setUp and tearDown methods, but it didn't help. The playlist request responses are set up correctly:
https://requests-mock.readthedocs.io/en/latest/response.html#response-lists

Is it possible that it's interference from the other tests, most likely the Twitch plugin's tests, where the requests_mock doesn't get reset properly? That would explain the network error messages I was talking about earlier. Remember that running only the hls_filtered tests 100 times in a row on the CI runners worked fine (link above).
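
For reference, this is roughly what a response-list registration looks like with requests_mock (the playlist contents below are made up for illustration, not copied from the tests):

import requests
import requests_mock

with requests_mock.Mocker() as mock:
    # consecutive GETs to the same URL return the registered responses in
    # order; once the list is exhausted, the last response keeps being returned
    mock.get("http://mocked/path/playlist.m3u8", [
        {"text": "#EXTM3U\n#EXTINF:1,\nsegment0.ts\n#EXTINF:1,\nsegment1.ts\n"},
        {"text": "#EXTM3U\n#EXTINF:1,\nsegment2.ts\n#EXTINF:1,\nsegment3.ts\n#EXT-X-ENDLIST\n"},
    ])
    first = requests.get("http://mocked/path/playlist.m3u8").text
    second = requests.get("http://mocked/path/playlist.m3u8").text

assert "segment0.ts" in first and "segment2.ts" in second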

Member Author

I have fixed some issues with stalled threads on teardown in the hls_filtered tests by rewriting them and joining all spawned threads in the main thread's tearDown method, waiting for their termination. The issue still occurs, though much more rarely now, which is interesting.

When printing the names of all running threads after a clean teardown, it looks like some ThreadPoolExecutor segment-fetch threads of one of the previously run stream workers are still running, even though all other tests call close() on their spawned streams every time. Those left-over fetch threads are bleeding into the current test run and are making requests to the mocked URLs, which are the same between tests:
https://github.com/bastimeyer-test/streamlink-test/runs/1149311834?check_suite_focus=true#step:5:1418

If I change the mocked URLs in just the hls_filtered tests, then there are no failures 🎉🎉🎉, which means these bleeding threads are coming from other test runs. Those are probably test_twitch.py or test_hls.py, because those are the only ones which open HLSStreams and read them, the latter being more likely since it's run right before test_hls_filtered.py.
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266894833 (no failed tests after 30 mins of execution - canceled the workflow after that, because I didn't want to wait any longer)

I will open a PR with fixes later after cleaning up all my debugging. This will be a fix and a workaround at the same time, because I won't touch the other "faulty" tests with their bleeding threads now. This can be done later.
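
A hedged sketch of what the teardown part of such a fix might look like (names are illustrative, not the exact test code): close the stream first so blocked reads get unblocked, then join every spawned thread so no left-over segment-fetch thread keeps hitting the next test's mocked URLs.

import unittest


class TestFilteredHLSStream(unittest.TestCase):
    def setUp(self):
        self.reader = None   # the stream reader opened by the test
        self.threads = []    # every thread the test spawned (worker, writer, ...)

    def tearDown(self):
        # closing the reader unblocks pending buffer reads/writes ...
        if self.reader is not None:
            self.reader.close()
        # ... then wait for all spawned threads to actually terminate before
        # the next test starts registering the same mocked URLs
        for thread in self.threads:
            thread.join(timeout=5)
            self.assertFalse(thread.is_alive(), "thread did not terminate on teardown")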

Member Author

Fix/workaround (the commit ID will be different from my PR):
https://github.com/bastimeyer-test/streamlink-test/commit/8be37ef6fb145ccb83ad11b51fa0e1453b5e0cdd
All tests running 50 times in a row on all CI runners (still running after 20 minutes as I'm writing this):
https://github.com/bastimeyer-test/streamlink-test/actions/runs/266975807

@bastimeyer bastimeyer deleted the plugins/twitch/filter-timeout branch October 9, 2020 06:52
back-to added a commit to back-to/streamlink that referenced this pull request Dec 13, 2020
It has some downsides, such as `--hls-timeout`;
a plugin-specific FilteredHLSStream should be used instead.

streamlink#3187

---

Afreeca Plugin as an Example

streamlink#3408

---

Revert streamlink#1432

Successfully merging this pull request may close these issues:
Twitch plugin read timeout
Streamlink Crashing after Commercial