Looking for suggestions on improving parallelism #3164
Replies: 4 comments 3 replies
-
🥰 Thank you @bradwilson for starting/hosting this conversation. Awesome!
Next, I've silently never really liked the 2x defaults. I "get it" but just never really liked it. I've always wanted to have every test run in parallel, not just every test class. I'm a strong supporter of test isolation, so I really try to avoid shared context as much as possible. That said, I have been doing Assembly/Project level shared context: create test containers once at the start of the test run, then have all my tests run in parallel against that shared context. Sometimes I've felt like I wanted Test Collections where the 'collection' does a single setup (e.g. connect to a db or create a specific test container) but then all the tests after that are parallel.
1a: I feel like this design is reasonable. I would generally be sitting in the "Everything runs in parallel" option. But as mentioned above, I use Test Containers, which I would run once at the start of the run. This would then be a SHARED CONTEXT, which means I now lose parallelism, so I'm not done/out :(
1b: As mentioned above, yes I do. I currently don't have a "clean" workaround. Here's my opinionated scenario: Isolated DB Tests with a single db per test.
I'm doing the above in xUnit v3 but it's the default class-level parallel. Because of the first step which is the shared context, then option 4/Disable parallelism only when there is shared context would really hurt me, if this setting was auto detected/set.
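To make the scenario concrete, here is a minimal, language-neutral sketch in Python (not xUnit code). It assumes a hypothetical `SharedContainer` that stands in for a test container started once per run, and a hypothetical `run_isolated` helper. Every test then runs in parallel and gets its own uniquely named database from that one shared context.

```python
import threading
import uuid
from concurrent.futures import ThreadPoolExecutor


class SharedContainer:
    """Hypothetical stand-in for a container that is started once per test run."""

    def __init__(self):
        self._lock = threading.Lock()
        self.databases = []

    def create_database(self):
        # Every caller gets a uniquely named database, so tests never share data.
        name = f"testdb_{uuid.uuid4().hex}"
        with self._lock:
            self.databases.append(name)
        return name


def run_isolated(tests, max_workers=8):
    """Run every test in parallel; each test receives its own database name."""
    container = SharedContainer()  # shared context: created once, up front

    def run_one(test):
        db_name = container.create_database()  # per-test isolation
        return test(db_name)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_one, tests))
    return container, results
```

The point of the sketch is that the shared context is only a factory for isolated resources, so its existence alone is not a reason to serialize the tests that use it.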
-
Looking at how this gets implemented, I am wondering about our current messages and what should be done. Let's assume we implement this parallelism sorting system, and let's assume someone has chosen "everything runs in parallel". Each parallel "group" is a single test case. Let's assume the developer has three tests in one test class, and four tests in another test class, and both those test classes are in the same test collection. We have starting/finished messages for each layer of execution:
The assembly level message is easy to ensure we get one pair. 😄 However, all the way down this stack, in the existing model we guarantee only one pair per element is sent, because of the way we've defined parallelism. For example, all tests in a collection are run sequentially against each other, so by virtue of that grouping, we can guarantee that there is one singular place from which to send the singular pair of test collection starting & finished messages for a given test collection. In a model where everything can be parallelized, given my example above, I have 7 test cases in the test collection, but they're all being run in parallel against each other. I think this leaves at least three options for dealing with the message pairs:
I'm concerned that the most "correct" way to do this is by far the most computationally and memory-expensive version (don't forget, everything I'm talking about here applies to test classes and test methods as well). Not only do we need to compute those groupings and keep them for the duration of the run, we also have to lock around the collections so we can accurately keep track of first start vs. last finish, which feels like a parallelism bottleneck of sorts (and the faster the tests execute, the larger the percentage of execution time spent in this bookkeeping).
Part of me wonders whether these messages are providing value to any runner out there. We certainly don't use them in any of our runners; the only thing people tend to care about is test starting/test result/test finished. So part of me wonders if I should just remove these messages entirely (at least the test collection/test class/test method versions).
Edit: I removed the 7x option due to my reply below.
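The first-start/last-finish bookkeeping described above can be sketched as follows. This is a language-neutral illustration in Python, not xUnit code. `CollectionMessageTracker`, `run_collection`, and the message tuples are hypothetical names. It assumes the number of tests per collection is computed before the run starts.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class CollectionMessageTracker:
    """Emits exactly one starting/finished pair per collection (hypothetical)."""

    def __init__(self, tests_per_collection, sink):
        self._remaining = dict(tests_per_collection)  # kept for the whole run
        self._started = set()
        self._lock = threading.Lock()  # every test contends on this lock
        self._sink = sink

    def test_starting(self, collection):
        with self._lock:
            if collection not in self._started:  # first test to start
                self._started.add(collection)
                self._sink(("collection-starting", collection))

    def test_finished(self, collection):
        with self._lock:
            self._remaining[collection] -= 1
            if self._remaining[collection] == 0:  # last test to finish
                self._sink(("collection-finished", collection))


def run_collection(collection, test_count, max_workers=4):
    """Run test_count empty tests in parallel and return the emitted messages."""
    messages = []
    tracker = CollectionMessageTracker({collection: test_count}, messages.append)

    def run_test(_):
        tracker.test_starting(collection)
        # ... the test body would execute here ...
        tracker.test_finished(collection)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run_test, range(test_count)))
    return messages
```

With the seven tests from the example (three in one class, four in another, all in one collection), `run_collection("C", 7)` yields one starting and one finished message. The cost is that each test takes the lock twice, which is the bottleneck mentioned above.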
-
I'd just add that we have the parallelization requirement of "parallelize everything in a given assembly, but don't run tests of different assemblies in parallel". So, supporting a pluggable model of determining which tests to parallelize, with implementation for a couple of common scenarios, looks like a good approach.
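As a rough illustration of that requirement (a Python sketch, not xUnit code), the hypothetical `run_assemblies` below walks assemblies one at a time and runs the tests inside each assembly in parallel. Representing an assembly as a list of zero-argument callables is an assumption made for brevity.

```python
from concurrent.futures import ThreadPoolExecutor


def run_assemblies(assemblies, max_workers=4):
    """assemblies maps an assembly name to a list of zero-argument test callables."""
    results = {}
    for name, tests in assemblies.items():  # assemblies: strictly one at a time
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # tests within one assembly: all submitted to the pool together
            results[name] = list(pool.map(lambda test: test(), tests))
        # leaving the with-block waits for every test before the next assembly
    return results
```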
-
👋🏻 Hi @bradwilson - just touching base. Is there anything we can help with here, to try and get some momentum with this? I'm firmly in the camp "Run everything in parallel, always" with ways to opt-out -per method-. Like a trait or attribute or something. Are you just trying to find some time to make some decisions here?
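A per-method opt-out could look roughly like this. It is a Python sketch, not xUnit code. The `non_parallel` decorator plays the role of the trait or attribute, and both it and `run_tests` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def non_parallel(test_func):
    """Hypothetical marker: this test must not overlap with any other test."""
    test_func.non_parallel = True
    return test_func


def run_tests(tests, max_workers=4):
    """Run unmarked tests in parallel, then marked tests one at a time."""
    serial = [t for t in tests if getattr(t, "non_parallel", False)]
    parallel = [t for t in tests if not getattr(t, "non_parallel", False)]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(lambda test: test(), parallel))
    # The pool has drained, so the opted-out tests run with nothing beside them.
    results.extend(test() for test in serial)
    return results
```

Everything is parallel by default here, and only the explicitly marked methods pay the cost of serialization.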
-
We have two open issues right now related to how tests run in parallel:
As I work through what the options for parallelism should be, I'm thinking maybe trying to set up a strict list of choices is probably the wrong strategy, so here's what I'm considering.
For starters, we should state where we are today: we have two modes of parallelization, with some modifiers.
Additionally, the two issues linked above are requests for:
Now that we're at (at least) four different ways of defining how to run things in parallel, it feels like the old way of specifying parallelism isn't sufficient. So I think I want to change how we specify intra-assembly parallelism in configuration (remove `parallelizeTestCollections`) and command line switches (rework `-parallel`), and instead replace it with a user-customizable "parallelism sorter".
I'm not 100% sure whether this design is where I'll land, but it's a starting point for discussion.
The component would implement some interface, and be registered at the assembly level. It would be given a list of all the test cases, and it would sort them into groups of two kinds: (a) zero or one "non-parallel" group, which means any test case in that group is not run in parallel against any other test (this is to accommodate the "non-parallel" test collections today, as well as being extended to perhaps the test class and/or test method level); (b) zero or more "parallel" groups, which contain tests that cannot be run in parallel against each other within the same group, but can be run in parallel against any test in any other "parallel" group.
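The grouping contract described above can be modeled like this. It is a language-neutral Python sketch, not the proposed xUnit interface. `Case`, `SortResult`, `sort_by`, and the two sorters are hypothetical names, and the `non_parallel` flag stands in for whatever opt-out mechanism ends up existing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Case:
    name: str
    test_class: str
    collection: str
    non_parallel: bool = False  # hypothetical opt-out flag


@dataclass
class SortResult:
    non_parallel: list     # (a) zero or one group: never overlaps with any test
    parallel_groups: list  # (b) sequential inside a group, parallel across groups


def sort_by(cases, key):
    """Generic sorter: cases sharing a key land in the same parallel group."""
    non_parallel = []
    groups = defaultdict(list)
    for case in cases:
        if case.non_parallel:
            non_parallel.append(case)
        else:
            groups[key(case)].append(case)
    return SortResult(non_parallel, list(groups.values()))


def by_collection(cases):
    # Tests in one collection run sequentially against each other.
    return sort_by(cases, lambda case: case.collection)


def everything_parallel(cases):
    # Every test case is its own group, so every test may overlap with any other.
    return sort_by(cases, lambda case: case.name)
```

In this model a new rule set is just a different key function, which is the flexibility the design is after.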
For the purposes of illustration below, let's assume that there are ways to opt out of parallelism on a per-test-collection basis (that exists today, via `CollectionDefinition.DisableParallelization = true`) as well as perhaps on a per-test-class and per-test-method basis. We'll just call those "non-parallel tests" for simplicity.
The default behaviors would work like this:
The new behaviors could be accommodated with:
That's the sum total of my thoughts. Removing this as a configuration item with built-in behavior, and instead replacing it with a user-customizable, compile-time (assembly-level) choice, should allow us to add new rule sets later without mucking around with configuration options. It should also allow users to do unusual things that seem right for them, like my (currently overcomplicated) sample of parallelizing based on namespace, which would become fairly trivial in the new design.
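To show why a namespace rule would become trivial, here is a small Python sketch (not the existing xUnit sample, and not xUnit API). It assumes test names are fully qualified as `Namespace.Class.Method`. Both `namespace_of` and `group_by_namespace` are hypothetical helpers.

```python
from collections import defaultdict


def namespace_of(qualified_test_name):
    """'Shop.Orders.CartTests.AddsItem' -> 'Shop.Orders' (empty if no namespace)."""
    parts = qualified_test_name.rsplit(".", 2)
    return parts[0] if len(parts) == 3 else ""


def group_by_namespace(qualified_test_names):
    """Tests in one namespace share a group; different namespaces may overlap."""
    groups = defaultdict(list)
    for name in qualified_test_names:
        groups[namespace_of(name)].append(name)
    return dict(groups)
```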
So here are my open questions:
1. What do you think of this design?
   a. Is the design reasonable, or too complex? Do you have an alternate design that's cleaner/simpler while still allowing for all the requirements?
   b. Do you have parallelization requirements that are different than the four options shown here? Can you accomplish those goals with the generalized design here?
2. What do you think of the idea of removing the intra-assembly parallelism options (configuration files, command line switches, etc.)? What about keeping the inter-assembly parallelism options (for the multi-assembly runners like our first party Console and MSBuild runners)? Should we also remove those configuration file options and leave this decision solely to the runner based on command line options?