
EIA 860M data provenance inaccurately reflected in current DAG #4327

Open
@e-belfer

Description


Describe the bug


Right now, we extract raw_eia860m data, and the only asset downstream of it is _core_eia860m__changelog_generators. However, we also integrate the latest month of EIA 860M data into the raw_eia860 assets inside extract_eia860():

raw_eia860__all_dfs = pudl.extract.eia860m.append_eia860m(
    eia860_raw_dfs=raw_eia860__all_dfs, eia860m_raw_dfs=eia860m_raw_dfs
)

What does this mean? Our Dagster asset graph (DAG) doesn't show the EIA 860M data as being upstream of the EIA 860 assets, or of the many assets downstream of them. It's also not clear where the EIA 860M data gets merged into the 860 data, and the way we handle the input data here is inconsistent with the rest of our system.
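To make the provenance gap concrete, here is a minimal, stdlib-only sketch. It models the asset graph as a plain dict mapping each asset to the inputs it declares (this is not Dagster's API), and downstream_eia860_asset is a hypothetical stand-in for the many assets built from the 860 tables. Because extract_eia860() pulls in the 860M data internally instead of declaring it as an input, a lineage query over declared dependencies never reaches raw_eia860m__all_dfs.

```python
from collections import deque

# Simplified, hypothetical model of the asset graph: each asset maps to the
# assets it *declares* as inputs. Not Dagster's API, just a dict.
CURRENT_DEPS: dict[str, set[str]] = {
    "raw_eia860m__all_dfs": set(),
    "_core_eia860m__changelog_generators": {"raw_eia860m__all_dfs"},
    # extract_eia860() reads 860M internally, so no dependency is declared:
    "raw_eia860__all_dfs": set(),
    # Hypothetical stand-in for everything built on the 860 tables:
    "downstream_eia860_asset": {"raw_eia860__all_dfs"},
}

# Proposed wiring: raw_eia860m__all_dfs declared as an input to the 860 extraction.
PROPOSED_DEPS = {**CURRENT_DEPS, "raw_eia860__all_dfs": {"raw_eia860m__all_dfs"}}


def upstream(asset: str, deps: dict[str, set[str]]) -> set[str]:
    """Return every asset transitively upstream of `asset` according to `deps`."""
    seen: set[str] = set()
    queue = deque(deps.get(asset, set()))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(deps.get(parent, set()))
    return seen


print(upstream("downstream_eia860_asset", CURRENT_DEPS))
print(upstream("downstream_eia860_asset", PROPOSED_DEPS))
```

With the current wiring, only raw_eia860__all_dfs shows up as upstream; once the 860M asset is declared as an input, it appears in the lineage of every downstream asset as well.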

Bug Severity

How badly is this bug affecting you?

  • Low: The bug isn't causing me problems, but something's still wrong here.

Expected behavior


  • We should remove the eia_settings.eia860.eia860m setting from the ETL, since we never set it to False and it's not clear what its use case is.
  • We should add raw_eia860m__all_dfs() as an input asset to the extract_eia860() multi_asset, and have that step filter the last year of data from each dataframe rather than re-extracting the 860M data from scratch.
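A rough sketch of the proposed filtering step, assuming "the last year of data" means rows from the most recent report year in each 860M table. To stay dependency-free it models each raw dataframe as a list of row dicts with a report_date field instead of a pandas DataFrame. The helpers filter_latest_year and append_860m_from_asset, and the generator_existing page name, are hypothetical and not existing PUDL functions. In the real multi_asset, the 860M dict would arrive as the raw_eia860m__all_dfs input rather than being re-extracted.

```python
from datetime import date

# Assumption: each raw "dataframe" is a list of row dicts keyed by page name,
# and every row has a `report_date`. Real PUDL code uses pandas DataFrames.
Rows = list[dict]


def filter_latest_year(rows: Rows) -> Rows:
    """Keep only rows from the most recent report year present in `rows`."""
    if not rows:
        return []
    latest = max(row["report_date"].year for row in rows)
    return [row for row in rows if row["report_date"].year == latest]


def append_860m_from_asset(
    eia860_raw_dfs: dict[str, Rows], eia860m_raw_dfs: dict[str, Rows]
) -> dict[str, Rows]:
    """Hypothetical helper: append filtered 860M rows to matching 860 pages.

    The 860M data is passed in (i.e. it comes from an upstream asset) instead
    of being re-extracted here. Inputs are not mutated.
    """
    combined = {page: list(rows) for page, rows in eia860_raw_dfs.items()}
    for page, rows in eia860m_raw_dfs.items():
        # Assumption: only pages that already exist in the 860 data get 860M rows.
        if page in combined:
            combined[page].extend(filter_latest_year(rows))
    return combined


# Usage with made-up rows:
eia860 = {"generator_existing": [{"plant_id_eia": 1, "report_date": date(2023, 1, 1)}]}
eia860m = {
    "generator_existing": [
        {"plant_id_eia": 1, "report_date": date(2023, 12, 1)},
        {"plant_id_eia": 1, "report_date": date(2024, 6, 1)},
    ]
}
result = append_860m_from_asset(eia860, eia860m)
```

Taking the 860M data as an argument is what lets Dagster record it as upstream of the 860 assets; the merge logic itself need not change much.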

Metadata

Assignees

No one assigned

    Labels

    bug (Things that are just plain broken.), eia860 (Anything having to do with EIA Form 860)

    Type

    No type

    Projects

    Status

    New

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests