Description
Motivation
In order for Streamlink's Session to be able to find a matching plugin for the given input URL, plugins must implement the can_handle_url classmethod. For historical reasons and due to the work of many different people, every plugin handles the method's return value and defines its URL regex(es) in its own way. This is inconsistent, makes the code harder to read, adds complexity and maintenance burden, and makes it harder for new contributors to learn how to write plugins.
The current API also doesn't allow for static code analysis of the URL regex(es). That would be a requirement if we eventually want to replace the inefficient plugin loading logic in the Session with a pre-build step that generates a JSON object containing an array of built-in plugins with their regexes and matching priorities. The Session could then use that data to find and load only the single matching plugin instead of loading everything at once.
Proposal
A better solution would therefore be to define each plugin's URL regex(es) and their matching priorities strictly and declaratively.
Plugin.can_handle_url(url) and Plugin.priority(url) could then be removed from built-in plugins and deprecated for third-party plugins.
The Plugin.matchers attribute
Let's define a list of URL regexes and priorities.
from typing import ClassVar, List, NamedTuple, Optional, Pattern

class Matcher(NamedTuple):
    pattern: Pattern
    priority: int

class Plugin:
    # None means that the plugin has not declared any matchers
    matchers: ClassVar[Optional[List[Matcher]]] = None
A plugin could implement it like this (more elegant solution down below):
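import re

from streamlink.plugin import LOW_PRIORITY, NORMAL_PRIORITY, Matcher, Plugin

class MyPlugin(Plugin):
    matchers = [
        # Matcher is a NamedTuple without defaults, so the priority is passed explicitly
        Matcher(re.compile("primary"), NORMAL_PRIORITY),
        # FLAGS is a placeholder for any re flags, e.g. re.IGNORECASE
        Matcher(re.compile("secondary", FLAGS), priority=LOW_PRIORITY),
    ]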
And the Session could resolve an input URL like this:
from streamlink.plugin.plugin import Matcher, NO_PRIORITY, Plugin

class Session:
    def resolve_url(self, url: str, follow_redirects: bool = False) -> Plugin:
        url = update_scheme("http://", url)

        # Pick the plugin whose matching regex has the highest priority
        matcher: Matcher
        candidate = None
        priority = NO_PRIORITY
        for plugin in self.plugins.values():
            for matcher in plugin.matchers or []:
                if matcher.priority > priority and matcher.pattern.match(url) is not None:
                    candidate = plugin
                    priority = matcher.priority

        if candidate:
            return candidate(url)

        # `follow_redirects` logic...

        raise NoPluginError
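The resolution loop above can be tried out in isolation. The following is only a minimal sketch: the priority values, the two example plugin classes and the standalone resolve() function are assumptions made for illustration and are not part of the proposal itself.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Pattern, Type

# Assumed priority values, only used for this illustration
NO_PRIORITY = 0
LOW_PRIORITY = 10
NORMAL_PRIORITY = 20


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int


class Plugin:
    matchers: Optional[List[Matcher]] = None

    def __init__(self, url: str) -> None:
        self.url = url


# Hypothetical plugin for one specific site
class SpecificPlugin(Plugin):
    matchers = [Matcher(re.compile(r"https?://foo\.bar/"), NORMAL_PRIORITY)]


# Hypothetical fallback plugin that matches any HTTP(S) URL with low priority
class GenericPlugin(Plugin):
    matchers = [Matcher(re.compile(r"https?://"), LOW_PRIORITY)]


def resolve(plugins: Dict[str, Type[Plugin]], url: str) -> Optional[Plugin]:
    """Return an instance of the plugin with the highest-priority matching regex."""
    candidate = None
    priority = NO_PRIORITY
    for plugin in plugins.values():
        for matcher in plugin.matchers or []:
            if matcher.priority > priority and matcher.pattern.match(url) is not None:
                candidate = plugin
                priority = matcher.priority
    return candidate(url) if candidate else None


plugins = {"generic": GenericPlugin, "specific": SpecificPlugin}
print(type(resolve(plugins, "https://foo.bar/live")).__name__)
print(type(resolve(plugins, "https://other.example/")).__name__)
```

Both plugins match the first URL, but the one with the higher matcher priority wins regardless of the order in which the plugins were loaded. The second URL is only handled by the low-priority fallback.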
The pluginmatcher class decorator
Since defining the matchers attribute on each plugin doesn't fully ensure consistency and also requires importing Matcher, a decorator could be used instead. It makes defining URLs at the top of each plugin definition mandatory and doesn't require special imports.
from typing import Callable, Pattern, Type

def pluginmatcher(pattern: Pattern, priority: int = NORMAL_PRIORITY) -> Callable[[Type[Plugin]], Type[Plugin]]:
    matcher = Matcher(pattern, priority)

    def decorator(cls: Type[Plugin]) -> Type[Plugin]:
        if not issubclass(cls, Plugin):
            raise TypeError(f"{cls!r} is not a Plugin")
        if cls.matchers is None:
            cls.matchers = []
        # Decorators are applied bottom-up, so prepending keeps the top-down order
        cls.matchers.insert(0, matcher)

        return cls

    return decorator

import re

from streamlink.plugin import LOW_PRIORITY, Plugin, pluginmatcher

@pluginmatcher(re.compile(
    r"https?://foo\.bar/"
))
@pluginmatcher(priority=LOW_PRIORITY, pattern=re.compile(r"""
    https?://baz\.qux/
""", re.VERBOSE))
class MyPlugin(Plugin):
    pass
An alternative decorator implementation could be compiling the regex in the decorator itself, so that re.compile doesn't have to be called in each plugin. I'd prefer it this way, as it's cleaner and it'd also simplify static code analysis in the future.
def pluginmatcher(pattern: str, flags: int = 0, priority: int = NORMAL_PRIORITY) -> Callable[[Type[Plugin]], Type[Plugin]]:
    matcher = Matcher(re.compile(pattern, flags), priority)
    # ...

@pluginmatcher(
    # language=PythonRegExp
    r"https?://foo\.bar/"
)
@pluginmatcher(
    # language=PythonVerboseRegExp
    r"""
        https?://baz\.qux/
    """,
    re.VERBOSE,
    priority=LOW_PRIORITY
)
class MyPlugin(Plugin):
    pass
One drawback though is that some IDEs / editors won't be able to parse the pattern string as a regex anymore without annotations, configs or plugins. PyCharm for example needs the re.compile(pattern) call for detecting the regex language injection in the pattern parameter. Otherwise a # language=PythonRegExp or # language=PythonVerboseRegExp annotation is needed, and writing custom language injection rules in the IDE is neither simple nor portable.
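To make the behavior of the string-based decorator variant easier to verify, here is a self-contained sketch of it. The priority values and the stand-in Matcher and Plugin definitions are assumptions for the sake of a runnable example, not the final API.

```python
import re
from typing import Callable, List, NamedTuple, Optional, Pattern, Type

# Assumed priority values, only used for this illustration
LOW_PRIORITY = 10
NORMAL_PRIORITY = 20


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int


class Plugin:
    matchers: Optional[List[Matcher]] = None


def pluginmatcher(
    pattern: str,
    flags: int = 0,
    priority: int = NORMAL_PRIORITY,
) -> Callable[[Type[Plugin]], Type[Plugin]]:
    # The regex is compiled here, so plugins only pass a plain string
    matcher = Matcher(re.compile(pattern, flags), priority)

    def decorator(cls: Type[Plugin]) -> Type[Plugin]:
        if not issubclass(cls, Plugin):
            raise TypeError(f"{cls!r} is not a Plugin")
        if cls.matchers is None:
            cls.matchers = []
        # Decorators run bottom-up, so prepending restores the top-down order
        cls.matchers.insert(0, matcher)
        return cls

    return decorator


@pluginmatcher(r"https?://foo\.bar/")
@pluginmatcher(
    r"""
        https?://baz\.qux/
    """,
    re.VERBOSE,
    priority=LOW_PRIORITY,
)
class MyPlugin(Plugin):
    pass


# The matchers end up in the order in which the decorators are written
print([matcher.priority for matcher in MyPlugin.matchers])
```

Because each decorated class gets its own list, the base Plugin.matchers stays None. Note that in this sketch a subclass of an already decorated plugin would share and mutate its parent's list, which a real implementation would probably want to handle.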
The Plugin.matches, Plugin.match and Plugin.matcher attributes
As is common with many plugins, URL regexes have capture groups where data gets read from. Since there are no custom re.Pattern class attributes/variables anymore (_re_url, etc.), each plugin would have to call self.matchers[n].pattern.match(self.url) to match the input URL and extract data from it, which is awkward.
The Plugin should automatically define re.Match results in its constructor, for every item in the matchers list and for the one that first matched.
The matcher results should also be recomputed whenever the url gets updated.
from typing import ClassVar, List, Match, NamedTuple, Optional, Pattern, Sequence

class Matcher(NamedTuple):
    pattern: Pattern
    priority: int

class Plugin:
    matchers: ClassVar[Optional[List[Matcher]]] = None

    # One result per matcher, in definition order (None where the regex did not match)
    matches: Sequence[Optional[Match]]
    # The first matching pattern and its result (both None if nothing matched)
    matcher: Optional[Pattern]
    match: Optional[Match]

    _url: str

    @property
    def url(self) -> str:
        return self._url

    @url.setter
    def url(self, value: str):
        self._url = value
        matches = [(pattern, pattern.match(value)) for pattern, priority in self.matchers or []]
        self.matches = tuple(m for p, m in matches)
        self.matcher, self.match = next(((p, m) for p, m in matches if m is not None), (None, None))

    def __init__(self, url: str) -> None:
        self.url = url
        # ...
See this HLSPlugin example, which gets its self.match from the first matching regex. The .groupdict() call works for both regexes, as they define the same capture group names.
Multiple regexes with different capture groups can be accessed via self.matches[n] and the matching regex itself via self.matcher (in case that's needed).
@pluginmatcher(re.compile(
    r"hls(?:variant)?://(?P<url>\S+)(?:\s(?P<params>.+))?"
))
@pluginmatcher(priority=LOW_PRIORITY, pattern=re.compile(
    r"(?P<url>\S+\.m3u8(?:\?\S*)?)(?:\s(?P<params>.+))?"
))
class HLSPlugin(Plugin):
    def _get_streams(self):
        data = self.match.groupdict()
        url = update_scheme("http://", data.get("url"))
        params = parse_params(data.get("params"))
        # ...
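The interplay of matches, matcher and match, including the recomputation when the URL changes, can be checked with the following standalone sketch. It reuses the two HLS regexes from above, but assigns matchers directly instead of using the decorator, and the numeric priorities and example URLs are assumptions.

```python
import re
from typing import ClassVar, List, Match, NamedTuple, Optional, Pattern, Tuple


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int


class Plugin:
    matchers: ClassVar[Optional[List[Matcher]]] = None

    matches: Tuple[Optional[Match], ...]
    matcher: Optional[Pattern]
    match: Optional[Match]

    def __init__(self, url: str) -> None:
        self.url = url

    @property
    def url(self) -> str:
        return self._url

    @url.setter
    def url(self, value: str) -> None:
        self._url = value
        # Run every pattern against the new URL, in definition order
        results = [(m.pattern, m.pattern.match(value)) for m in self.matchers or []]
        self.matches = tuple(result for _, result in results)
        # Remember the first pattern that matched, together with its result
        self.matcher, self.match = next(
            ((p, r) for p, r in results if r is not None),
            (None, None),
        )


class HLSPlugin(Plugin):
    # Assumed priorities: 20 for the explicit scheme, 10 for the generic .m3u8 match
    matchers = [
        Matcher(re.compile(r"hls(?:variant)?://(?P<url>\S+)(?:\s(?P<params>.+))?"), 20),
        Matcher(re.compile(r"(?P<url>\S+\.m3u8(?:\?\S*)?)(?:\s(?P<params>.+))?"), 10),
    ]


plugin = HLSPlugin("hls://example.com/playlist q=1")
first = plugin.match.groupdict()

# Updating the URL recomputes all match results
plugin.url = "https://example.com/live.m3u8"
second = plugin.match.groupdict()

print(first)
print(second)
```

After the URL update, plugin.matches[0] is None and plugin.matcher refers to the second pattern, while the same groupdict() keys remain available.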
Ideas
Caching
Instead of a Matcher NamedTuple, a custom Matcher class could be implemented which caches the pattern's match result, so that the regexes don't have to be matched against the input URL twice, first in Session.resolve_url(url) and in the Plugin constructor afterwards. However, as long as all plugins need to be in memory at the same time and are kept for the entire runtime, caching results of irrelevant plugins doesn't make much sense.
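One possible shape of such a caching class is sketched below. The CachingMatcher name, its match() method and the calls counter are hypothetical and only illustrate the idea of remembering the result for the most recently matched URL.

```python
import re
from typing import Match, Optional, Pattern


class CachingMatcher:
    """Hypothetical Matcher replacement which remembers its last match result."""

    def __init__(self, pattern: Pattern, priority: int) -> None:
        self.pattern = pattern
        self.priority = priority
        self._cached_url: Optional[str] = None
        self._cached_match: Optional[Match] = None
        self.calls = 0  # counts real regex evaluations, for demonstration only

    def match(self, url: str) -> Optional[Match]:
        # Only run the regex again if the URL differs from the cached one
        if url != self._cached_url:
            self.calls += 1
            self._cached_url = url
            self._cached_match = self.pattern.match(url)
        return self._cached_match


matcher = CachingMatcher(re.compile(r"https?://foo\.bar/(?P<id>\d+)"), 20)

# First call, e.g. while resolving the URL in the session
resolved = matcher.match("https://foo.bar/123")
# Second call, e.g. in the plugin constructor, served from the cache
constructed = matcher.match("https://foo.bar/123")

print(matcher.calls, resolved is constructed)
```

The sketch keeps a single cached entry per matcher, so the memory overhead stays small, but it still stores results for plugins that end up not being selected.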
Plugin metadata
Instead of having to manually maintain plugin_matrix.rst in the docs, metadata could be added to the pluginmatcher decorator, which would describe the URLs in a natural, human-readable way.
Feedback
I'd appreciate some feedback about my proposed changes and whether they make sense or not. Have I missed something obvious? Is there a better or simpler way? Is it not worth it?
I know this would be a big change and would require a lot of work to update every plugin, but I think it will be worth it for the reasons mentioned above. After having updated nearly half of the plugins yesterday, I haven't found a single problematic case yet. There are still some plugins left to update though, for example VK, which have complex can_handle_url logic that needs to be translated (complex plugin matching logic doesn't make sense).
As mentioned earlier, a deprecation path could be implemented for third-party plugins, so that this won't be a breaking change, but I haven't thought about any of that yet.
This is just an early proposal / suggestion, so let's not rush things and discuss this first. An actual implementation can also wait until it's the right time.