Clarify that key uniqueness depends only on binary representation, recommend normalization #966

@SnoopJ

I've just learned about #891 and I'm excited to see that the TOML specification is improving Unicode support.

Do I understand correctly that this changeset makes no recommendations to implementers about the equivalence of keys? I see a note on normalization in the related issue, but if I understand the PR correctly, keys that are "equivalent" under one of the normalization forms of UAX #15 will be distinct under the specification unless their binary representations are identical.

That note suggests that a warning/suggestion for implementers might be added to the spec, but it looks like that never happened. This issue is a request that such a note be added to at least make implementers (and users of parsers that don't bother with normalization) aware of the potential confusion of keys, as in the example of ñaña (NFC form, 6 bytes in UTF-8) vs. ñaña (NFKD form, 8 bytes in UTF-8).
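
To make the confusion concrete, here is a minimal Python sketch using only the standard library's `unicodedata` module. It is not taken from any TOML parser: the helper name `find_normalization_collisions` is hypothetical, and the choice of NFC as the comparison form is an assumption for illustration, not something the spec prescribes.

```python
import unicodedata

# Precomposed form: U+00F1 (n with tilde) is a single code point.
nfc_key = "\u00f1a\u00f1a"
# Decomposed form: each U+00F1 becomes "n" followed by U+0303 (combining tilde).
nfkd_key = unicodedata.normalize("NFKD", nfc_key)

nfc_len = len(nfc_key.encode("utf-8"))    # 6 bytes
nfkd_len = len(nfkd_key.encode("utf-8"))  # 8 bytes

# A table keyed on the raw strings treats these as two different keys,
# because comparison is by code point sequence, not by rendered appearance.
table = {nfc_key: 1, nfkd_key: 2}


def find_normalization_collisions(keys, form="NFC"):
    """Group keys that differ as written but coincide after normalization.

    Hypothetical helper: returns a dict mapping each normalized key to the
    list of original keys, for groups with more than one member.
    """
    groups = {}
    for key in keys:
        groups.setdefault(unicodedata.normalize(form, key), []).append(key)
    return {norm: originals for norm, originals in groups.items() if len(originals) > 1}


collisions = find_normalization_collisions(table)
```

Here `table` ends up with two entries even though both keys render identically, and `collisions` contains one group holding both spellings. A parser or linter could use a check along these lines to warn rather than silently accept both keys.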
