Skip to content

Naming things: Rename _kafka_msg_id to _kafka__msg_id #289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

amotl
Copy link
Contributor

@amotl amotl commented Jul 6, 2025

About

Unfortunately, CrateDB does not permit column names using leading underscores, like _kafka_msg_id. However, if two underscores are included, like _kafka__msg_id, it works.

The patch improves storage alignment with the other already existing column _kafka__data, which is already using two underscores in its name.

Thoughts

We certainly know the obstacle is on CrateDB, while it is not a problem for any other database destination covered by ingestr.

However, we didn't see many dependencies on this column name, so we are humbly asking if you could do us a favor to rename that column, so we will not need to find a different workaround in form of a runtime patch. 🙏

References

Unfortunately, CrateDB does not permit column names using leading
underscores. However, if two underscores are included, it works.

The patch improves storage alignment with the other column which gets
materialized as `_kafka__data` also using two underscores.
@amotl amotl marked this pull request as ready for review July 6, 2025 16:13
@amotl
Copy link
Contributor Author

amotl commented Jul 7, 2025

The patch improves storage alignment with the other already existing column _kafka__data, which is already using two underscores in its name.

We think the new field _kafka__msg_id is much better aligned with the others now.

SHOW CREATE TABLE kafka_demo;
CREATE TABLE IF NOT EXISTS "doc"."kafka_demo" (
   "_kafka__partition" BIGINT,
   "_kafka__topic" TEXT,
   "_kafka__offset" BIGINT,
   "_kafka__ts__type" BIGINT,
   "_kafka__ts__value" TIMESTAMP WITH TIME ZONE,
   "_kafka__data" TEXT,
   "_kafka__msg_id" TEXT,
   "__dlt_load_id" TEXT NOT NULL,
   "__dlt_id" TEXT NOT NULL
);

Copy link
Contributor

@karakanb karakanb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems like a breaking change for people that are using ingestr with kafka. Could this be implemented as an adapter that modifies the data that is only registered for CrateDB?

@amotl
Copy link
Contributor Author

amotl commented Jul 10, 2025

Hi Burak.

This seems like a breaking change for people that are using ingestr with Kafka.

That's true. Let's do it more conservatively to not introduce breaking changes without previously emitting warnings. So, the primary question is: Do you agree with the change in general, in order to align the fields on matters of "naming things"? If so, let's publish this change progressively, like any good software is doing it? I will be happy to bring in corresponding changes to this patch. If you don't like it at all, let's just close the PR.

Could this be implemented as an adapter that modifies the data that is only registered for CrateDB?

Yes. We often use monkeypatching for such matters, only changing code paths in certain situations, optimally by using a context manager so the change doesn't pollute the runtime context permanently, but only within the scope of a specific invocation.

In this case, because CrateDB currently uses a custom dlt destination adapter anyway per dlt-cratedb, we already added a corresponding adjustment per crate/dlt-cratedb@7c14632. In this spirit, we absolutely do not have an immediate need for this change at all, so the case here is effectively just meant as a long-term improvement, which leads to my previous question:

If you agree with the change in general, in order to align the fields on matters of "naming things", let's publish this change progressively, by first bringing in a relevant UserWarning that will educate about an upcoming breaking change, then implementing the change including an option to toggle it back to legacy mode?

With kind regards,
Andreas.

NB: I've also toggled the PR back into draft mode to give it more time to grow the patch into a better shape.

@amotl
Copy link
Contributor Author

amotl commented Jul 10, 2025

[...] then implementing the change including an option to toggle the payload decoder back to legacy mode, so downstream users will always have the option to run ingestr like before and receive the same results.

We would use a dedicated URL parameter on the Kafka source URL to tune the output, when applicable. Silly idea?

@amotl
Copy link
Contributor Author

amotl commented Jul 13, 2025

We submitted a patch that can satisfy new requirements (together with other features) while retaining backwards compatibility.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants