Skip to content

plugin.api.validate: refactor, turn into package #4514

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
May 8, 2022

Conversation

bastimeyer
Copy link
Member

@bastimeyer bastimeyer commented May 5, 2022

A lot of changes with lots of details. Please see the individual commit messages for the explanation and motivation of these changes. The added tests for the newly added ValidationError class also make the motivation a bit more clear.

Backwards compatiblity is / should be kept.


  • Refactor plugin.api.validate module
    • Turn module into package with sub-modules and define a proper public interface (keep backwards compatibility)
      • schemas: classes which register attributes used by their respective validate implementations
      • validators: functions which can internally call validate and which return something that can be validated
      • validate: singledispatch functions which implement the validation logic for schemas and various other types
    • Rename validation schema classes for better internal references
    • Refactor Schema, AllSchema, AnySchema, GetItemSchema and Callable
    • Fix XmlElementSchema
    • Refactor singledispatch validation functions
    • Add some type annotations and use f-strings
    • Update comments and fix lots of grammar errors
  • Improve validation error handling/printing
    • Implement ValidationError (inherit from ValueError for backwards compatibility)
      • Allow for collecting multiple errors (via AnySchema) and keep an error stack (error contexts)
      • Properly print errors and error stack when converting to string
      • Add schema/validator names to raised ValidationError instances, for better legibility on error output
    • Truncate strings in error messages (eg. long JSON/HTML/XML data, etc.)
  • Add and update tests
    • Rewrite / clean up old tests and resolve the last missing code coverage
  • Add a few more convenience schemas/validators for patterns that come up all the time (will check later)
  • Add basic documentation (will check later)

@bastimeyer bastimeyer added WIP Work in process API: validate labels May 5, 2022
@bastimeyer bastimeyer force-pushed the validate-api-refactor branch from 38de126 to ae8757e Compare May 5, 2022 12:03
@bastimeyer
Copy link
Member Author

bastimeyer commented May 5, 2022

Example of the ValidationError output with this diff applied to the YouTube plugin, which breaks its validation schema:

diff --git a/src/streamlink/plugins/youtube.py b/src/streamlink/plugins/youtube.py
index 13a5872e..bc086bf2 100644
--- a/src/streamlink/plugins/youtube.py
+++ b/src/streamlink/plugins/youtube.py
@@ -157,7 +157,7 @@ class YouTube(Plugin):
                 "microformat": validate.all(
                     validate.any(
                         validate.all(
-                            {"playerMicroformatRenderer": dict},
+                            {"playerMicroformatRenderer": float},
                             validate.get("playerMicroformatRenderer")
                         ),
                         validate.all(
error: ValidationError(dict):
  Unable to validate value of key 'microformat'
  Context(AnySchema):
    ValidationError(dict):
      Unable to validate value of key 'playerMicroformatRenderer'
      Context(type):
        Type of <{'thumbnail': {'thumbnails': [{'url': 'https://i.ytimg....> should be float, but is dict
    ValidationError(dict):
      Key 'microformatDataRenderer' not found in <{'playerMicroformatRenderer': {'thumbnail': {'thumbnail...>

Compare that to the error message of the same diff applied to master, which is just pure nonsense.

@bastimeyer bastimeyer removed the WIP Work in process label May 5, 2022
@bastimeyer bastimeyer force-pushed the validate-api-refactor branch 3 times, most recently from 898414c to 40c9fb1 Compare May 6, 2022 12:30
@bastimeyer
Copy link
Member Author

I've renamed the sub-modules in my latest push, so that there's no naming conflict between the validate sub-module and the exported validate singledispatch function.

This should be ready for review now. The other incomplete goals I've listed in the OP are not important.

Please see the invidual commits and their commit messages for the motivation of these changes. As you can see, I applied some additional changes in the first commit while splitting up the module, but I could change that and add more simple commits, if that helps reviewing.


Also, a still unresolved question is whether ValidationError should inherit from other exceptions and not just ValueError. For example, when validating a regular expression, its result will either be None or an instance of re.Match, and calling validate.get afterwards on None raises a TypeError, which is why a wrapping validate.any(None, ...) is always needed. A different solution for this could be adding a new validation schema for regexes, where this is handled accordingly.

@bastimeyer bastimeyer added the WIP Work in process label May 6, 2022
@bastimeyer
Copy link
Member Author

I'm going to include a rewrite of the tests in this PR (not yet finished). There are a couple of things I noticed while rewriting the tests which should be improved.

There's a broken type check in the TransformSchema validation, some validate calls should be wrapped for better error messages, and I think some stuff should be re-arranged.

bastimeyer added 9 commits May 7, 2022 12:20
Turn into SchemaContainer subclasses
Turn into validation schema with a singledispatch function.
This is required for the upcoming module split.
Remove callable check from the base validate singledispatch function
and register the abc.Callable type instead.

Also fix type validation in the transform schema.
Add missing tail attribute and clone child nodes
Inherit from the all-schema directly to avoid having an unnecessary
singledispatch type definition.
Turn module into package with multiple logical sub-modules:
- Define a public interface in the package's `__init__` module
- Split validation schemas, validators and validate logic
  - schemas: classes which register attributes used by their
    respective `validate` implementations
  - validators: functions which can internally call `validate`
    and which return something that can be validated
  - validate: singledispatch functions which implement the validation
    logic for schemas and various other types
- Rename validation schemas for better internal references
- Rename singledispatch methods

Other clean-up work:
- Update comments and fix grammar
- Add type annotations
- Use f-strings
- Use `str` instead of the `text` alias
- Simplify some code blocks
- Rearrange classes and functions
- Rephrase certain error messages
- Add a few more tests for better code coverage
- Implement `ValidationError`
  - Inherit from `ValueError` to preserve backwards compatiblity
  - Allow collecting multiple errors (AnySchema)
  - Keep an error stack of parent `ValidationError`s or other exceptions
  - Format error stack when converting error to string
- Raise `ValidationError` instead of `ValueError`
  - Add error contexts where it makes sense
  - Add schema names to error instances
- Add and update tests
- Change signature of `ValidationError` to be able to pass template
  strings and template variables which can get truncated individually.
- Make error messages consistent by using string representations for
  template variables where it makes sense
Completely rewrite tests using pytest, with full coverage
@bastimeyer bastimeyer force-pushed the validate-api-refactor branch from 40c9fb1 to 3153e8d Compare May 7, 2022 14:24
@bastimeyer bastimeyer removed the WIP Work in process label May 7, 2022
@bastimeyer
Copy link
Member Author

Apologies for the back-and-forth with this, but as said, there were a couple of things which I felt needed to be adjusted.

I've rebased the PR branch and split up everything into more logical commits for easier review. The latest push now also includes a complete rewrite of the tests based on pytest with a proper separation of the tested schemas and validations, and it now covers 100%. Every commit of this branch should however be valid on its own.

I am pretty much satisfied with the changes now and unless there's a bug, I won't be changing anything here anymore. There's still some stuff that can be improved in the future though.

@back-to back-to merged commit d09112a into streamlink:master May 8, 2022
@bastimeyer bastimeyer deleted the validate-api-refactor branch May 8, 2022 09:01
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 8, 2022
Add missing tail attribute and clone child nodes
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 8, 2022
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 8, 2022
Turn into validation schema with a singledispatch function.
This is required for the upcoming module split.
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 8, 2022
Remove callable check from the base validate singledispatch function
and register the abc.Callable type instead.

Also fix type validation in the transform schema.
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 8, 2022
Inherit from the all-schema directly to avoid having an unnecessary
singledispatch type definition.
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 9, 2022
Remove callable check from the base validate singledispatch function
and register the abc.Callable type instead.

Also fix type validation in the transform schema.
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 9, 2022
Inherit from the all-schema directly to avoid having an unnecessary
singledispatch type definition.
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
Turn module into package with multiple logical sub-modules:
- Define a public interface in the package's `__init__` module
- Split validation schemas, validators and validate logic
  - schemas: classes which register attributes used by their
    respective `validate` implementations
  - validators: functions which can internally call `validate`
    and which return something that can be validated
  - validate: singledispatch functions which implement the validation
    logic for schemas and various other types
- Rename validation schemas for better internal references
- Rename singledispatch methods

Other clean-up work:
- Update comments and fix grammar
- Add type annotations
- Use f-strings
- Use `str` instead of the `text` alias
- Simplify some code blocks
- Rearrange classes and functions
- Rephrase certain error messages
- Add a few more tests for better code coverage
- Implement `ValidationError`
  - Inherit from `ValueError` to preserve backwards compatiblity
  - Allow collecting multiple errors (AnySchema)
  - Keep an error stack of parent `ValidationError`s or other exceptions
  - Format error stack when converting error to string
- Raise `ValidationError` instead of `ValueError`
  - Add error contexts where it makes sense
  - Add schema names to error instances
- Add and update tests
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
- Change signature of `ValidationError` to be able to pass template
  strings and template variables which can get truncated individually.
- Make error messages consistent by using string representations for
  template variables where it makes sense
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
Completely rewrite tests using pytest, with full coverage
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
Turn module into package with multiple logical sub-modules:
- Define a public interface in the package's `__init__` module
- Split validation schemas, validators and validate logic
  - schemas: classes which register attributes used by their
    respective `validate` implementations
  - validators: functions which can internally call `validate`
    and which return something that can be validated
  - validate: singledispatch functions which implement the validation
    logic for schemas and various other types
- Rename validation schemas for better internal references
- Rename singledispatch methods

Other clean-up work:
- Update comments and fix grammar
- Add type annotations
- Use f-strings
- Use `str` instead of the `text` alias
- Simplify some code blocks
- Rearrange classes and functions
- Rephrase certain error messages
- Add a few more tests for better code coverage
- Implement `ValidationError`
  - Inherit from `ValueError` to preserve backwards compatiblity
  - Allow collecting multiple errors (AnySchema)
  - Keep an error stack of parent `ValidationError`s or other exceptions
  - Format error stack when converting error to string
- Raise `ValidationError` instead of `ValueError`
  - Add error contexts where it makes sense
  - Add schema names to error instances
- Add and update tests
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
- Change signature of `ValidationError` to be able to pass template
  strings and template variables which can get truncated individually.
- Make error messages consistent by using string representations for
  template variables where it makes sense
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
Completely rewrite tests using pytest, with full coverage
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
Turn module into package with multiple logical sub-modules:
- Define a public interface in the package's `__init__` module
- Split validation schemas, validators and validate logic
  - schemas: classes which register attributes used by their
    respective `validate` implementations
  - validators: functions which can internally call `validate`
    and which return something that can be validated
  - validate: singledispatch functions which implement the validation
    logic for schemas and various other types
- Rename validation schemas for better internal references
- Rename singledispatch methods

Other clean-up work:
- Update comments and fix grammar
- Add type annotations
- Use f-strings
- Use `str` instead of the `text` alias
- Simplify some code blocks
- Rearrange classes and functions
- Rephrase certain error messages
- Add a few more tests for better code coverage
- Implement `ValidationError`
  - Inherit from `ValueError` to preserve backwards compatiblity
  - Allow collecting multiple errors (AnySchema)
  - Keep an error stack of parent `ValidationError`s or other exceptions
  - Format error stack when converting error to string
- Raise `ValidationError` instead of `ValueError`
  - Add error contexts where it makes sense
  - Add schema names to error instances
- Add and update tests
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
- Change signature of `ValidationError` to be able to pass template
  strings and template variables which can get truncated individually.
- Make error messages consistent by using string representations for
  template variables where it makes sense
Billy2011 added a commit to Billy2011/streamlink-27 that referenced this pull request May 13, 2022
Completely rewrite tests using pytest, with full coverage
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants