Improved plugins URL matching API #3814

# Motivation

For Streamlink's `Session` to find a matching plugin for the given input URL, plugins must implement the `can_handle_url` classmethod. For historical reasons, and because many different people have worked on the plugins, every plugin handles the method's return value and defines its URL regex(es) in its own way. This is inconsistent, makes the code harder to read, adds complexity and maintenance burden, and makes it harder for new users to learn how to write plugins.

The current API also doesn't allow for static code analysis of the URL regex(es). That would be a requirement if we eventually want to replace the inefficient plugin loading logic in the `Session` with a pre-build step that generates a JSON object containing an array of built-in plugins with their regexes and matching priorities. The `Session` could then use that data to find and load only a single plugin instead of loading everything at once.

# Proposal

A better solution would therefore be to define each plugin's URL regex(es) and their matching priorities strictly and declaratively.

`Plugin.can_handle_url(url)` and `Plugin.priority(url)` could then be removed from built-in plugins and deprecated for third-party plugins.


## The `Plugin.matchers` attribute

Let's define a list of URL regexes and priorities.

```py
from typing import ClassVar, List, NamedTuple, Optional, Pattern

class Matcher(NamedTuple):
    pattern: Pattern
    priority: int

class Plugin:
    # None means "no matchers defined", hence the Optional
    matchers: ClassVar[Optional[List[Matcher]]] = None
```

A plugin could implement it like this (more elegant solution down below):

```py
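import re

from streamlink.plugin import LOW_PRIORITY, NORMAL_PRIORITY, Matcher, Plugin

FLAGS = re.IGNORECASE  # placeholder for any `re` flags

class MyPlugin(Plugin):
    matchers = [
        # `Matcher` has no default priority, so it is passed explicitly
        Matcher(re.compile("primary"), priority=NORMAL_PRIORITY),
        Matcher(re.compile("secondary", FLAGS), priority=LOW_PRIORITY),
    ]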
```

And the `Session` could resolve an input URL like this:

```py
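from streamlink.exceptions import NoPluginError
from streamlink.plugin.plugin import Matcher, NO_PRIORITY, Plugin
from streamlink.utils import update_scheme  # import path may differ between Streamlink versions

class Session:
    def resolve_url(self, url: str, follow_redirects: bool = False) -> Plugin: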
        url = update_scheme("http://", url)

        matcher: Matcher
        candidate = None
        priority = NO_PRIORITY
        for plugin in self.plugins.values():
            for matcher in plugin.matchers or []:
                if matcher.priority > priority and matcher.pattern.match(url) is not None:
                    candidate = plugin
                    priority = matcher.priority

        if candidate:
            return candidate(url)

        # `follow_redirects` logic...

        raise NoPluginError
```
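To make the priority rule concrete, here is a minimal self-contained sketch of the same loop outside of the `Session`. The two plugin classes, the `resolve` helper and the numeric priority values are assumptions made up for this example; only the loop itself mirrors the proposal above.

```python
import re
from typing import ClassVar, List, NamedTuple, Optional, Pattern, Type

# Assumed priority constants; the concrete numbers are illustrative only.
NO_PRIORITY, LOW_PRIORITY, NORMAL_PRIORITY = 0, 10, 20


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int


class Plugin:
    matchers: ClassVar[Optional[List[Matcher]]] = None

    def __init__(self, url: str) -> None:
        self.url = url


class Generic(Plugin):
    # Hypothetical catch-all plugin with a low priority.
    matchers = [Matcher(re.compile(r"https?://\S+\.m3u8"), LOW_PRIORITY)]


class Specific(Plugin):
    # Hypothetical site plugin with the normal priority.
    matchers = [Matcher(re.compile(r"https?://foo\.bar/"), NORMAL_PRIORITY)]


def resolve(plugins: List[Type[Plugin]], url: str) -> Optional[Plugin]:
    """Return an instance of the plugin whose matching pattern has the highest priority."""
    candidate, priority = None, NO_PRIORITY
    for plugin in plugins:
        for matcher in plugin.matchers or []:
            if matcher.priority > priority and matcher.pattern.match(url) is not None:
                candidate, priority = plugin, matcher.priority
    return candidate(url) if candidate else None


plugins = [Generic, Specific]
both = resolve(plugins, "https://foo.bar/live.m3u8")              # both plugins match
only_generic = resolve(plugins, "https://example.com/live.m3u8")  # only Generic matches
nothing = resolve(plugins, "https://example.com/")                # no plugin matches
```

Because the comparison is a strict `>`, a matcher with `NO_PRIORITY` can never win, and between two matchers of equal priority the plugin that is iterated first is kept.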

### The `pluginmatcher` class decorator

Since defining the `matchers` attribute on each plugin doesn't 100% ensure consistency and also requires the import of the `Matcher`, a decorator could be used instead, which makes defining URLs at the top/head of each plugin definition mandatory and doesn't require special imports.

```py
from typing import Callable, Pattern, Type

from streamlink.plugin.plugin import Matcher, NORMAL_PRIORITY, Plugin

def pluginmatcher(pattern: Pattern, priority: int = NORMAL_PRIORITY) -> Callable[[Type[Plugin]], Type[Plugin]]:
    matcher = Matcher(pattern, priority)

    def decorator(cls: Type[Plugin]) -> Type[Plugin]:
        if not issubclass(cls, Plugin):
            raise TypeError(f"{cls!r} is not a Plugin")
        if cls.matchers is None:
            cls.matchers = []
        # Decorators are applied bottom-up, so prepending keeps the matchers in source order
        cls.matchers.insert(0, matcher)

        return cls

    return decorator
```

```py
import re
from streamlink.plugin import LOW_PRIORITY, Plugin, pluginmatcher

@pluginmatcher(re.compile(
    r"https?://foo\.bar/"
))
@pluginmatcher(priority=LOW_PRIORITY, pattern=re.compile(r"""
    https?://baz\.qux/
""", re.VERBOSE))
class MyPlugin(Plugin):
    pass
```
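The order of stacked decorators is easy to get wrong, so here is a runnable sketch that checks it. `Matcher`, `Plugin` and the priority values are redefined locally as stand-ins (assumptions for this example), and the decorator body is the one proposed above.

```python
import re
from typing import Callable, ClassVar, List, NamedTuple, Optional, Pattern, Type

LOW_PRIORITY, NORMAL_PRIORITY = 10, 20  # assumed values, for illustration only


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int


class Plugin:
    matchers: ClassVar[Optional[List[Matcher]]] = None


def pluginmatcher(pattern: Pattern, priority: int = NORMAL_PRIORITY) -> Callable[[Type[Plugin]], Type[Plugin]]:
    matcher = Matcher(pattern, priority)

    def decorator(cls: Type[Plugin]) -> Type[Plugin]:
        if not issubclass(cls, Plugin):
            raise TypeError(f"{cls!r} is not a Plugin")
        if cls.matchers is None:
            cls.matchers = []
        # The bottom-most decorator runs first, so prepend to keep source order.
        cls.matchers.insert(0, matcher)
        return cls

    return decorator


@pluginmatcher(re.compile(r"https?://foo\.bar/"))
@pluginmatcher(re.compile(r"https?://baz\.qux/"), priority=LOW_PRIORITY)
class MyPlugin(Plugin):
    pass


# Patterns and priorities in the order they end up in `MyPlugin.matchers`
order = [m.pattern.pattern for m in MyPlugin.matchers]
priorities = [m.priority for m in MyPlugin.matchers]
```

The list ends up in top-to-bottom source order, which matters later when plugins index into their matchers by position. Note that a decorated subclass of an already decorated plugin would inherit a non-`None` list and append to its parent's matchers, so a real implementation may want to copy the list first.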

An alternative decorator implementation could compile the regex in the decorator itself, so that `re.compile` doesn't have to be called in each plugin. I'd prefer it this way, as it's cleaner and would also simplify static code analysis in the future.

```py
def pluginmatcher(pattern: str, flags: int = 0, priority: int = NORMAL_PRIORITY) -> Callable[[Type[Plugin]], Type[Plugin]]:
    matcher = Matcher(re.compile(pattern, flags), priority)

    # ...
```

```py
@pluginmatcher(
    # language=PythonRegExp
    r"https?://foo\.bar/"
)
@pluginmatcher(
    # language=PythonVerboseRegExp
    r"""
    https?://baz\.qux/
    """,
    re.VERBOSE,
    priority=LOW_PRIORITY
)
class MyPlugin(Plugin):
    pass
```

One drawback, though, is that some IDEs / editors won't be able to parse the pattern string as a regex anymore without annotations, configs or plugins. PyCharm, for example, needs the `re.compile(pattern)` call for detecting the regex language injection in the pattern parameter. Otherwise a `# language=PythonRegExp` or `# language=PythonVerboseRegExp` annotation is needed, and writing custom language injection rules in the IDE is neither simple nor portable.
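As a rough idea of what the static analysis from the motivation could look like with string patterns, here is a sketch based on the standard `ast` module. It never imports the plugin, it only reads its source. The `PRIORITIES` table, the `extract_matchers` helper and the shape of the JSON output are assumptions for this example, and regex flags are ignored to keep it short.

```python
import ast
import json
from typing import Dict, List, Union

# Assumed priority values, used only to resolve names such as LOW_PRIORITY.
PRIORITIES = {"NO_PRIORITY": 0, "LOW_PRIORITY": 10, "NORMAL_PRIORITY": 20, "HIGH_PRIORITY": 30}

# Source of a hypothetical plugin module; it is parsed, never executed.
PLUGIN_SOURCE = '''
@pluginmatcher(r"https?://foo\\.bar/")
@pluginmatcher(r"https?://baz\\.qux/", priority=LOW_PRIORITY)
class MyPlugin(Plugin):
    pass
'''


def extract_matchers(source: str) -> List[Dict[str, Union[str, int]]]:
    """Collect pattern strings and priorities from `@pluginmatcher(...)` class decorators."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        for deco in node.decorator_list:
            is_matcher = (
                isinstance(deco, ast.Call)
                and isinstance(deco.func, ast.Name)
                and deco.func.id == "pluginmatcher"
            )
            # Only literal string patterns can be read statically.
            if not is_matcher or not deco.args or not isinstance(deco.args[0], ast.Constant):
                continue
            priority = PRIORITIES["NORMAL_PRIORITY"]
            for keyword in deco.keywords:
                if keyword.arg == "priority" and isinstance(keyword.value, ast.Name):
                    priority = PRIORITIES[keyword.value.id]
            found.append({"plugin": node.name, "pattern": deco.args[0].value, "priority": priority})
    return found


index = extract_matchers(PLUGIN_SOURCE)
index_json = json.dumps(index, indent=2)
```

This only works because the pattern is a plain string literal in the decorator call, which is the main argument for compiling the regex inside the decorator.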


## The `Plugin.matches`, `Plugin.match` and `Plugin.matcher` attributes

As is common with many plugins, URL regexes have capture groups from which data is read. Since there are no custom `re.Pattern` class attributes/variables anymore (`_re_url`, etc.), each plugin would have to call `self.matchers[n].pattern.match(self.url)` to match the input URL and extract data, which is awkward.

The `Plugin` should automatically define `re.Match` results in its constructor, for every item in the `matchers` list and for the one that first matched.

The matcher results should also be recomputed whenever the URL gets updated.

```py
from typing import ClassVar, List, Match, NamedTuple, Optional, Pattern, Sequence

class Matcher(NamedTuple):
    pattern: Pattern
    priority: int

class Plugin:
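    matchers: ClassVar[Optional[List[Matcher]]] = None
    # One entry per matcher, in the same order; None where the pattern didn't match
    matches: Sequence[Optional[Match]]
    # The first pattern that matched and its result; both None if nothing matched
    matcher: Optional[Pattern]
    match: Optional[Match]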

    _url: str

    @property
    def url(self) -> str:
        return self._url

    @url.setter
    def url(self, value: str):
        self._url = value

        matches = [(pattern, pattern.match(value)) for pattern, priority in self.matchers or []]
        self.matches = tuple(m for p, m in matches)
        self.matcher, self.match = next(((p, m) for p, m in matches if m is not None), (None, None))

    def __init__(self, url: str) -> None:
        self.url = url

        # ...
```

See this `HLSPlugin` example, which gets its `self.match` from the first matching regex. The `.groupdict()` call works for both regexes, as they define the same capture group names.

Multiple regexes with different capture groups can be accessed via `self.matches[n]` and the matching regex itself via `self.matcher` (in case that's needed).

```py
@pluginmatcher(re.compile(
    r"hls(?:variant)?://(?P<url>\S+)(?:\s(?P<params>.+))?"
))
@pluginmatcher(priority=LOW_PRIORITY, pattern=re.compile(
    r"(?P<url>\S+\.m3u8(?:\?\S*)?)(?:\s(?P<params>.+))?"
))
class HLSPlugin(Plugin):
    def _get_streams(self):
        data = self.match.groupdict()
        url = update_scheme("http://", data.get("url"))
        params = parse_params(data.get("params"))

        # ...
```
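For the case of different capture groups per regex, a self-contained sketch might help. `VideoPlugin`, its two patterns and the hard-coded priority `20` are assumptions for this example; the `url` property is the one proposed above.

```python
import re
from typing import List, NamedTuple, Optional, Pattern


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int


class Plugin:
    matchers: Optional[List[Matcher]] = None

    def __init__(self, url: str) -> None:
        self.url = url

    @property
    def url(self) -> str:
        return self._url

    @url.setter
    def url(self, value: str) -> None:
        self._url = value
        # Recompute all match results whenever the URL changes.
        matches = [(pattern, pattern.match(value)) for pattern, priority in self.matchers or []]
        self.matches = tuple(m for p, m in matches)
        self.matcher, self.match = next(((p, m) for p, m in matches if m is not None), (None, None))


class VideoPlugin(Plugin):
    # Hypothetical plugin whose two patterns use different capture groups.
    matchers = [
        Matcher(re.compile(r"https?://foo\.bar/channel/(?P<channel>\w+)"), 20),
        Matcher(re.compile(r"https?://foo\.bar/video/(?P<video_id>\d+)"), 20),
    ]


plugin = VideoPlugin("https://foo.bar/video/123")
first_is_none = plugin.matches[0] is None        # the channel pattern didn't match
video_id = plugin.matches[1].group("video_id")   # access by matcher position

plugin.url = "https://foo.bar/channel/abc"       # triggers a recompute
channel = plugin.match.group("channel")
```

Accessing `self.matches[n]` depends on the position of each matcher, so this relies on the matchers being stored in source order.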


## Ideas

### Caching

Instead of a `Matcher` `NamedTuple`, a custom `Matcher` class could be implemented which caches the pattern's match result, so that the regexes don't have to be matched against the input URL twice, first in `Session.resolve_url(url)` and in the `Plugin` constructor afterwards. However, as long as all plugins need to be in memory at the same time and are kept for the entire runtime, caching results of irrelevant plugins doesn't make much sense.
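A caching matcher along those lines could look roughly like this. The `CachingMatcher` name, its `match()` method and the `calls` counter are made up for this sketch, and the cache is an unbounded dict, which is probably fine for a single input URL but not meant as a final design.

```python
import re
from typing import Dict, Match, Optional, Pattern


class CachingMatcher:
    """Hypothetical replacement for the `Matcher` NamedTuple that remembers results per URL."""

    def __init__(self, pattern: Pattern, priority: int) -> None:
        self.pattern = pattern
        self.priority = priority
        self.calls = 0  # number of real regex evaluations, for illustration only
        self._cache: Dict[str, Optional[Match]] = {}

    def match(self, url: str) -> Optional[Match]:
        # Misses (None) are cached too, so a non-matching URL is only tried once.
        if url not in self._cache:
            self.calls += 1
            self._cache[url] = self.pattern.match(url)
        return self._cache[url]


matcher = CachingMatcher(re.compile(r"https?://foo\.bar/(?P<id>\d+)"), 20)
first = matcher.match("https://foo.bar/42")   # evaluated, e.g. in Session.resolve_url
second = matcher.match("https://foo.bar/42")  # served from the cache, e.g. in Plugin.__init__
```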

### Plugin metadata

Instead of having to manually maintain `plugin_matrix.rst` in the docs, metadata could be added to the `pluginmatcher` decorator, which would describe the URLs in a natural human-readable way.
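Just to illustrate the idea, the metadata could be an extra field on the matcher that a docs build step reads. The `description` parameter, the `describe` helper and the output format are all assumptions for this sketch, and the `Plugin` base class is left out to keep it short.

```python
import re
from typing import Callable, List, NamedTuple, Optional, Pattern


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int
    description: Optional[str] = None  # hypothetical metadata field


def pluginmatcher(
    pattern: str,
    flags: int = 0,
    priority: int = 20,  # assumed value of NORMAL_PRIORITY
    description: Optional[str] = None,
) -> Callable[[type], type]:
    matcher = Matcher(re.compile(pattern, flags), priority, description)

    def decorator(cls: type) -> type:
        # Build a new list so that the matchers stay in source order.
        cls.matchers = [matcher] + list(getattr(cls, "matchers", None) or [])
        return cls

    return decorator


@pluginmatcher(r"https?://foo\.bar/", description="foo.bar")
@pluginmatcher(r"https?://baz\.qux/", description="baz.qux")
class MyPlugin:
    pass


def describe(plugins: List[type]) -> List[str]:
    """Render one human-readable line per plugin from the matcher metadata."""
    return [
        f"{cls.__name__}: {', '.join(m.description for m in cls.matchers if m.description)}"
        for cls in plugins
    ]


rows = describe([MyPlugin])
```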


# Feedback

I'd appreciate some feedback on my proposed changes and whether they make sense or not. Have I missed something obvious? Is there a better or simpler way? Is it not worth it?

I know this would be a big change and that updating every plugin would take a lot of work, but I think it will be worth it for the reasons mentioned above. After updating nearly half of the plugins yesterday, I haven't found a single case where I had problems. There are still some plugins I have yet to update, though, such as VK, which have complex `can_handle_url` logic that needs to be translated (complex plugin matching logic doesn't make sense).

As mentioned earlier, a deprecation path could be implemented for third-party plugins, so that this won't be a breaking change, ~~but I haven't thought about any of that yet~~.
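One possible shape for such a deprecation path is a fallback that still calls `can_handle_url` on plugins without matchers and emits a warning. This is only a sketch under assumptions: the `handles` helper and `LegacyPlugin` are hypothetical, the warning text is made up, and the old `priority(url)` method is not handled here.

```python
import re
import warnings
from typing import List, NamedTuple, Optional, Pattern


class Matcher(NamedTuple):
    pattern: Pattern
    priority: int


class Plugin:
    matchers: Optional[List[Matcher]] = None


class LegacyPlugin(Plugin):
    # Hypothetical third-party plugin that still uses the old classmethod.
    @classmethod
    def can_handle_url(cls, url: str) -> bool:
        return re.match(r"https?://legacy\.example/", url) is not None


def handles(plugin: type, url: str) -> bool:
    """Prefer declarative matchers and fall back to the deprecated classmethod."""
    if plugin.matchers:
        return any(m.pattern.match(url) is not None for m in plugin.matchers)
    legacy = getattr(plugin, "can_handle_url", None)
    if legacy is None:
        return False
    warnings.warn(
        f"{plugin.__name__}.can_handle_url is deprecated, define matchers instead",
        DeprecationWarning,
        stacklevel=2,
    )
    return bool(legacy(url))


# Record the warning instead of printing it, so the behavior can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy_ok = handles(LegacyPlugin, "https://legacy.example/stream")
```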

This is just an early proposal / suggestion, so let's not rush things and discuss this first. An actual implementation can also wait until it's the right time.
