Skip to content

Tests: Adjust software tests for CrateDB #284

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

amotl
Copy link
Contributor

@amotl amotl commented Jul 5, 2025

About

Retroactively add software tests for CrateDB by adjusting the existing test suite.

It is very unlikely the patch will get merged in its current form, so it will need to be maintained separately to provide quality assurance for the CrateDB source and destination adapters.

Thoughts

The patch clearly demonstrates where CrateDB is a different animal compared to other DBMS. We don't know yet how to compress or re-shape those anomaly-balancing boilerplate adjustments, in order to reduce the noise of the patch, but we hope to be able to resolve one or another obstacle for the better as we go.

Backlog

  • After making a start per main_test.py, expand into sources_test.py and destinations_test.py as well.
  • Get rid of the worst code smells, by fixing or improving the dlt adapter, or by providing relevant monkey patches.
  • Release CrateDB Testcontainer for Python to get rid of development dependency to cratedb-toolkit.

References

@amotl amotl force-pushed the cratedb-tests branch 3 times, most recently from 24d5131 to 8466d1c Compare July 6, 2025 12:51
Copy link
Contributor Author

@amotl amotl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just adding a few comments at relevant spots to highlight the worst code smells.

Comment on lines +819 to +832
# CrateDB needs an explicit flush to make data available for reads immediately.
if dest_engine.dialect.name == "crate":
dest_engine.execute(f"REFRESH TABLE {schema_rand_prefix}.output")
Copy link
Contributor Author

@amotl amotl Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: Is it really needed? Doesn't the dlt adapter already flush the relevant table after concluding the load operation, or is it missing in ingestr?

Comment on lines -710 to +846
assert res[0] == (1, "val1", as_datetime("2022-01-01"))
assert res[1] == (2, "val2", as_datetime("2022-02-01"))

# Compensate for CrateDB types and insert order.
if dest_connection_url.startswith("cratedb://"):
assert (1, "val1", 1640995200000) in res
assert (2, "val2", 1643673600000) in res
else:
assert res[0] == (1, "val1", as_datetime("2022-01-01"))
assert res[1] == (2, "val2", as_datetime("2022-02-01"))
Copy link
Contributor Author

@amotl amotl Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well. C'est la vie. Let me add a few thoughts about it.

Maybe improve this spot by patching as_datetime() at runtime in the context of CrateDB instead, in order to reduce the cluttering on the actual data validation?

On the other hand, aren't there corresponding converter hooks in crate-python already, which could make it work fluently?

Is it even a concern of improving the SQLAlchemy dialect with relevant features?

Comment on lines +849 to +870
# CrateDB: Compensate for "Type `date` does not support storage".
updated_at_type = "DATE"
if dest_connection_url.startswith("cratedb://"):
updated_at_type = "TIMESTAMP"

source_engine = sqlalchemy.create_engine(source_connection_url)
with source_engine.begin() as conn:
conn.execute(f"DROP SCHEMA IF EXISTS {schema_rand_prefix}")
conn.execute(f"CREATE SCHEMA {schema_rand_prefix}")
conn.execute(
f"CREATE TABLE {schema_rand_prefix}.input (id INTEGER, val VARCHAR(20), updated_at DATE)"
f"CREATE TABLE {schema_rand_prefix}.input (id INTEGER, val VARCHAR(20), updated_at {updated_at_type})"
Copy link
Contributor Author

@amotl amotl Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Problem

Even if the data source uses DATE types 1, some type mapping should transparently translate this into TIMESTAMP, right? If this can be made happen, this code can also go away.

References

Footnotes

  1. Or TIME, as both DATE and TIME can't be stored with CrateDB.

@amotl amotl force-pushed the cratedb-tests branch 3 times, most recently from e1c80a6 to 4cd8559 Compare July 6, 2025 14:33
@amotl amotl changed the title CrateDB: Add software tests CrateDB: Adjust software tests Jul 6, 2025
@amotl amotl changed the title CrateDB: Adjust software tests Tests: Adjust software tests for CrateDB Jul 6, 2025
@amotl amotl force-pushed the cratedb-tests branch 6 times, most recently from 45f2c7c to d20b837 Compare July 6, 2025 18:04
Makefile Outdated
Comment on lines 25 to 28
set -a; source test.env; set +a; TESTCONTAINERS_RYUK_DISABLED=true pytest -n auto -x -rP -vv --tb=short --durations=10 --cov=ingestr --no-cov-on-fail
set -a; source test.env; set +a; TESTCONTAINERS_RYUK_DISABLED=true pytest -n auto -x -rP -vv --tb=short --durations=10 --cov=ingestr --no-cov-on-fail -k "not cratedb"
set -a; source test.env; set +a; TESTCONTAINERS_RYUK_DISABLED=true pytest -rP -vv --tb=short --durations=10 --cov=ingestr --no-cov-on-fail -k "cratedb"
Copy link
Contributor Author

@amotl amotl Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discovered that the CrateDB Testcontainers implementation is not ready to be used together with xdist yet, so we are invoking integration tests against CrateDB separately now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant