Description
Overview
Currently, if you read one of the PUDL Parquet files that contains a date or datetime field into pandas, the dtype you get depends on how you read it in. By default our date columns get converted into objects, which then need to be converted manually using e.g. pd.to_datetime(), which is a hassle and will not be intuitive to all users.
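As a minimal sketch of the manual conversion users currently have to do (the literal column values here are illustrative, not taken from a real PUDL table):

```python
import pandas as pd

# Simulate what users currently see after a default read:
# a date column that comes back with object dtype.
df = pd.DataFrame({"report_date": ["2020-01-01", "2021-01-01"]}, dtype="object")

# The manual fix users must apply themselves:
df["report_date"] = pd.to_datetime(df["report_date"])
print(df["report_date"].dtype)  # datetime64[ns]
```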
Given that we are just distributing data, not a software package, it would be nice if the easiest ways of reading the data in the wild also reflected the dtypes that we use internally when producing it, since those are the types that we expect to work with and test against.
PyArrow provides a number of rich time dtypes, including a date type with day resolution, and pandas appears to have adopted that type as its solution to this problem.
Questions
- What are the appropriate types?
- How should we tell people to read this data to get usable time types?
- Should we change the PUDL dtypes so they work smoothly with the outside world?
- Should we switch over to using PyArrow dtypes by default throughout PUDL (maybe when pandas 3.0 lands?)
Current Behavior
report_date
- pudl.helpers.get_parquet_table() --> datetime64[s] (explicit imposition of the PUDL dtype)
- pd.read_parquet(dtype_backend="pyarrow") --> date32[day][pyarrow]
- pd.read_parquet().convert_dtypes() --> object
datetime_utc
- pudl.helpers.get_parquet_table() --> datetime64[s] (explicit imposition of the PUDL dtype)
- pd.read_parquet(dtype_backend="pyarrow") --> timestamp[ms][pyarrow]
- pd.read_parquet().convert_dtypes() --> datetime64[ms]