Overview
Updating the expected row counts for dbt can be frustrating. Some issues that have come up:
Needing to run the full ETL locally
- You have to re-run the entire ETL to be 100% confident that you've got the correct expected row counts.
- Not everybody's computer is up to the task, and even when it is, it takes too long.
- Ideally we would get reliable row-count expectations just from rematerializing a subset of the assets locally.
- Alternatively, we could generate new row-count expectations by running a full ETL using a `workflow_dispatch` trigger.
- After the build completes, the row count update script could be run on all tables, and the resulting CSV(s) would get uploaded along with all the other outputs to `gs://builds.catalyst.coop/build-id/`, where they could be downloaded. (Note that right now only files in `PUDL_OUTPUT` get uploaded, so if a change is detected, we'd need to copy the full row counts CSV over there to be saved; see the sketch after this list.)
- Under normal circumstances, there should be no change to the row count expectations at all.
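A minimal sketch of that copy step, assuming the regenerated expectations CSV lives in the dbt project and that the `PUDL_OUTPUT` environment variable points at the directory that gets uploaded to the build bucket (the script name and CSV path here are hypothetical, not the actual repo layout):

```python
"""Copy the regenerated row-count expectations into PUDL_OUTPUT.

A minimal sketch: the CSV location is an assumption, not the actual
layout of the PUDL repo.
"""
import os
import shutil
from pathlib import Path

# Hypothetical location of the regenerated expectations CSV.
ROW_COUNTS_CSV = Path("dbt/seeds/etl_full_row_counts.csv")


def publish_row_counts() -> None:
    """Copy the row counts CSV into PUDL_OUTPUT so the nightly build uploads it."""
    pudl_output = Path(os.environ["PUDL_OUTPUT"])
    destination = pudl_output / ROW_COUNTS_CSV.name
    shutil.copy2(ROW_COUNTS_CSV, destination)
    print(f"Copied {ROW_COUNTS_CSV} -> {destination}")


if __name__ == "__main__":
    publish_row_counts()
```

Something like this could run as a final step of the nightly build job, so the refreshed CSV lands in the bucket alongside the other outputs.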
Using non-standard partition columns
- We have default columns that are used to partition tables, so that the row count expectations are more granular.
- These aren't always the right partitioning columns, so we have the option of specifying other columns in the tests.
- However, when adding a new table and its row count expectations for the first time, the script doesn't let you specify which non-standard columns you'd like to use. So you have to add the table, edit the test spec in `schema.yml`, and then go back and regenerate the expectations. Having a direct way to specify the partition columns when generating the expectations would avoid that round trip; a sketch of what such an option might look like follows.
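A sketch of such an option, assuming the update script is a small CLI; the flag name, default column, and function below are illustrative, not the actual PUDL interface:

```python
"""Hypothetical CLI addition letting you set partition columns up front.

The flag and function names are illustrative only; the real row count
update script may be structured differently.
"""
import argparse


def update_row_counts(table: str, partition_column: str) -> None:
    """Placeholder for regenerating the expectations for one table.

    In the real script this would query the materialized table, group by
    ``partition_column``, and write the per-partition counts to the CSV.
    """
    print(f"Would regenerate row counts for {table}, partitioned by {partition_column}")


def main() -> None:
    parser = argparse.ArgumentParser(description="Update expected row counts.")
    parser.add_argument("--table", required=True, help="Table to (re)generate expectations for.")
    parser.add_argument(
        "--partition-column",
        default="report_year",  # assumed default partition column
        help="Column to partition row counts by, instead of the default.",
    )
    args = parser.parse_args()
    update_row_counts(args.table, args.partition_column)


if __name__ == "__main__":
    main()
```

With something like this, the `schema.yml` test spec and the generated expectations could be kept in sync in a single pass.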
Removing obsolete partition values
(maybe this is/was already fixed by @jdangerx's PR making row count checks more strict?)
- When updating row counts, if there's an old partition value that's no longer relevant (e.g. because the partitioning column has been changed), the script will add new records to the row count CSV but doesn't remove the obsolete ones. A sketch of one way to prune them follows.
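One way to avoid the stale rows, sketched below under the assumption that the expectations CSV has `table_name`, `partition`, and `row_count` columns (an assumption about the file layout): drop every existing row for the table being regenerated and replace them wholesale instead of appending.

```python
"""Replace, rather than append to, a table's rows in the row count CSV.

A minimal sketch; the CSV schema (table_name, partition, row_count) is an
assumption about how the expectations file is laid out.
"""
import pandas as pd


def replace_table_row_counts(
    csv_path: str, table_name: str, new_counts: pd.DataFrame
) -> None:
    """Drop all existing rows for ``table_name`` and write the fresh ones.

    Dropping first means stale partition values (e.g. from an old
    partitioning column) disappear instead of lingering in the CSV.
    """
    existing = pd.read_csv(csv_path)
    kept = existing[existing["table_name"] != table_name]
    updated = pd.concat([kept, new_counts], ignore_index=True)
    updated.sort_values(["table_name", "partition"]).to_csv(csv_path, index=False)
```

Replacing a table's rows wholesale makes the CSV a pure function of the current partitioning configuration, so obsolete partition values can't linger.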