-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
utils.url: make update_scheme always update target #4053
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
utils.url: make update_scheme always update target #4053
Conversation
Let's take a look at the Any calls that have Next are the calls with This means that this is the list of plugins that need to be checked and updated if there are any issues with http to https redirection. As you can see, all of them except tv999 use
How should this issue be solved?There are two solutions (which can also be combined):
(1) could be done in this PR as an additional commit, but I would prefer solving (2) in a separate PR for visibility and legibility reasons. |
Previously, update_scheme only added schemes on scheme-less URLs and kept target URLs which included a scheme intact. This is confusing for plugin implementers and it has already been used incorrectly a lot of times when trying to force the HTTPS protocol on unencrypted HTTP URLs. Since the method is called update_scheme, it should do exactly that, in all cases. However, always updating the target scheme will break those calls of update_scheme where the scheme was previously only added if needed, eg. on scheme-less URLs, partial URLs or URL paths. An example of this are scheme-less input URLs passed to Streamlink's `Session.resolve_url()`, which has to implicitly set the HTTP scheme. Always overriding the scheme would turn explicit HTTPS schemes to HTTP, which can break certain sites/plugins. The various stream-protocol plugins have the same issue and have to implicitly set a scheme as well. This change therefore introduces the `force=True` keyword, so that URL schemes can be set implicitly again in the methods and plugins mentioned above by setting force to False. Another effect of forcing the scheme override by default is the pattern currently used in many plugins, which often update (partial) URLs by calling `update_scheme(self.url, target)`. Since `self.url` depends on the user's input and since the input URL's scheme is implicitly set to HTTP by the Session's URL resolver, target URLs can be downgraded from HTTPS to HTTP, which is not desired. This is not a problem in cases where the server will redirect, but it can be a problem where the server returns an error instead. These plugins will therefore need to be fixed so that they either set the URL schemes implicitly or so that they force the according scheme. Another solution (which will be done eventually) is the change of the implicit input URL scheme from HTTP to HTTPS in `Session.resolve_url()` and the various stream-protocol plugins, which will turn all scheme-less input URLs to HTTPS by default, unless the user explicitly sets an HTTP URL. - always update target URL scheme with current URL scheme - add force=True keyword and do not override scheme if force is False - set force=False on all implicitly updated URL schemes - `Session.resolve_url` - `plugins.{akamaihd,hds,hls,http}` - parametrize tests, re-order/re-format assertions and add new ones
Update URL schemes to https instead of using the input URL as reference and set force to False on those plugins which support multiple sites, so that the https scheme only gets added if it is missing.
38d7a5a
to
7cfd434
Compare
Thanks for the write up. Correct me if I'm wrong but it seems like option 2 will be the most beneficial and easy to understand going forward? |
The changes of this PR should be good now and all plugins got fixed (see the commit message of the second commit). Switching from http to https by default on scheme-less URLs as described by (2) is now independent of this, as it only affects the input URL and not the |
…k#4053) Previously, update_scheme only added schemes on scheme-less URLs and kept target URLs which included a scheme intact. This is confusing for plugin implementers and it has already been used incorrectly a lot of times when trying to force the HTTPS protocol on unencrypted HTTP URLs. Since the method is called update_scheme, it should do exactly that, in all cases. However, always updating the target scheme will break those calls of update_scheme where the scheme was previously only added if needed, eg. on scheme-less URLs, partial URLs or URL paths. An example of this are scheme-less input URLs passed to Streamlink's `Session.resolve_url()`, which has to implicitly set the HTTP scheme. Always overriding the scheme would turn explicit HTTPS schemes to HTTP, which can break certain sites/plugins. The various stream-protocol plugins have the same issue and have to implicitly set a scheme as well. This change therefore introduces the `force=True` keyword, so that URL schemes can be set implicitly again in the methods and plugins mentioned above by setting force to False. Another effect of forcing the scheme override by default is the pattern currently used in many plugins, which often update (partial) URLs by calling `update_scheme(self.url, target)`. Since `self.url` depends on the user's input and since the input URL's scheme is implicitly set to HTTP by the Session's URL resolver, target URLs can be downgraded from HTTPS to HTTP, which is not desired. This is not a problem in cases where the server will redirect, but it can be a problem where the server returns an error instead. These plugins will therefore need to be fixed so that they either set the URL schemes implicitly or so that they force the according scheme. Another solution (which will be done eventually) is the change of the implicit input URL scheme from HTTP to HTTPS in `Session.resolve_url()` and the various stream-protocol plugins, which will turn all scheme-less input URLs to HTTPS by default, unless the user explicitly sets an HTTP URL. - always update target URL scheme with current URL scheme - add force=True keyword and do not override scheme if force is False - set force=False on all implicitly updated URL schemes - `Session.resolve_url` - `plugins.{akamaihd,hds,hls,http}` - parametrize tests, re-order/re-format assertions and add new ones
Update URL schemes to https instead of using the input URL as reference and set force to False on those plugins which support multiple sites, so that the https scheme only gets added if it is missing.
I just noticed that this breaks socks4/socks5 proxy URLs for the https-proxy option, because it now always overrides the scheme (no force=False param). This needs to be fixed. |
Previously, update_scheme only added schemes on scheme-less URLs and
kept target URLs which included a scheme intact.
This is confusing for plugin implementers and it has already been used
incorrectly a lot of times when trying to force the HTTPS protocol on
unencrypted HTTP URLs. Since the method is called update_scheme, it
should do exactly that, in all cases.
However, always updating the target scheme will break those calls of
update_scheme where the scheme was previously only added if needed, eg.
on scheme-less URLs, partial URLs or URL paths.
An example of this are scheme-less input URLs passed to Streamlink's
Session.resolve_url()
, which has to implicitly set the HTTP scheme.Always overriding the scheme would turn explicit HTTPS schemes to HTTP,
which can break certain sites/plugins. The various stream-protocol
plugins have the same issue and have to implicitly set a scheme as well.
This change therefore introduces the
force=True
keyword, so that URLschemes can be set implicitly again in the methods and plugins mentioned
above by setting force to False.
Another effect of forcing the scheme override by default is the pattern
currently used in many plugins, which often update (partial) URLs by
calling
update_scheme(self.url, target)
. Sinceself.url
depends onthe user's input and since the input URL's scheme is implicitly set to
HTTP by the Session's URL resolver, target URLs can be downgraded from
HTTPS to HTTP, which is not desired. This is not a problem in cases
where the server will redirect, but it can be a problem where the server
returns an error instead.
These plugins will therefore need to be fixed so that they either set
the URL schemes implicitly or so that they force the according scheme.
Another solution (which will be done eventually) is the change of the
implicit input URL scheme from HTTP to HTTPS in
Session.resolve_url()
and the various stream-protocol plugins, which will turn all scheme-less
input URLs to HTTPS by default, unless the user explicitly sets an HTTP
URL.
Session.resolve_url
plugins.{akamaihd,hds,hls,http}
ref #4047
Here's a list of all
update_scheme
calls. As you can see, every call with "http://" as current URL have the force=False parameter set. The other ones are a mixture of "https://" (also via constants) andself.url
. Theself.url
ones are problematic for the reasons mentioned above, but as I've also said, there's an upgrade path for that by switching from http by default to https (#4047).Please don't merge unless this we're 100% sure that everything's good here.