Projecting indels across gap causes length change and non-reversibility #758

Open
@davmlaw

Description

I can understand a variant growing larger if the destination reference has an insertion, but shouldn't it shrink back when it is projected the other way?

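# Assumed setup (not shown in the issue): one plausible way to build the
# parse/c_to_g/g_to_c helpers with the hgvs package, on GRCh38
import hgvs.assemblymapper
import hgvs.dataproviders.uta
import hgvs.parser

hdp = hgvs.dataproviders.uta.connect()
am = hgvs.assemblymapper.AssemblyMapper(hdp, assembly_name="GRCh38")
parse = hgvs.parser.Parser().parse_hgvs_variant
c_to_g, g_to_c = am.c_to_g, am.g_to_c

original_hgvs = "NM_015120.4(ALMS1):c.36_38dupGGA"

def print_hgvs(sv):
    # end - start: one less than the number of bases spanned
    length = sv.posedit.pos.end - sv.posedit.pos.start
    print(f"hgvs='{sv}' - {length=}")

var_c = parse(original_hgvs)
print_hgvs(var_c)
var_g = c_to_g(var_c)
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)
print_hgvs(var_c2)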

Output:

hgvs='NM_015120.4(ALMS1):c.36_38dup' - length=2
hgvs='NC_000002.12:g.73385937_73385942dup' - length=5
hgvs='NM_015120.4:c.72_77dup' - length=5
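To see why a round trip need not be exact, here is a toy model of interval projection. It is a sketch under stated assumptions, not hgvs's mapping code: the alignment is a hypothetical list of blocks (`"="` aligned, `"T"` transcript-only, `"G"` genome-only), each endpoint is mapped independently, and an endpoint that lands inside a gap is widened outward to the nearest aligned base. hgvs has its own rules for positions inside alignment gaps, which may differ.

```python
from typing import Iterator, List, Optional, Tuple

# Hypothetical toy alignment: (op, length) blocks in transcript order.
#   "=" aligned bases present in both transcript and genome
#   "T" bases present only in the transcript (gap in the genome)
#   "G" bases present only in the genome (gap in the transcript)
Alignment = List[Tuple[str, int]]
Column = Tuple[Optional[int], Optional[int]]  # (tx_pos, g_pos); None = gap


def columns(aln: Alignment, g_start: int, tx_start: int = 1) -> Iterator[Column]:
    """Yield one (tx_pos, g_pos) pair per alignment column (1-based)."""
    tx, g = tx_start, g_start
    for op, n in aln:
        for _ in range(n):
            if op == "=":
                yield tx, g
                tx, g = tx + 1, g + 1
            elif op == "T":
                yield tx, None
                tx += 1
            elif op == "G":
                yield None, g
                g += 1
            else:
                raise ValueError(f"unknown op {op!r}")


def map_interval(aln: Alignment, start: int, end: int, g_start: int,
                 tx_to_g: bool = True) -> Tuple[int, int]:
    """Map an inclusive interval by mapping each endpoint independently.

    An endpoint with no counterpart (inside a gap) is widened outward:
    the start walks 5' and the end walks 3' to the nearest aligned column.
    """
    cols = list(columns(aln, g_start))
    src, dst = (0, 1) if tx_to_g else (1, 0)

    def map_pos(pos: int, step: int) -> int:
        i = next((k for k, c in enumerate(cols) if c[src] == pos), None)
        if i is None:
            raise ValueError(f"position {pos} is outside the alignment")
        while cols[i][dst] is None:
            i += step
            if not 0 <= i < len(cols):
                raise ValueError(f"position {pos} has no aligned flank")
        return cols[i][dst]

    return map_pos(start, -1), map_pos(end, +1)


def round_trip(aln: Alignment, start: int, end: int, g_start: int):
    """Project a transcript interval to the genome and back again."""
    g = map_interval(aln, start, end, g_start, tx_to_g=True)
    back = map_interval(aln, g[0], g[1], g_start, tx_to_g=False)
    return g, back


if __name__ == "__main__":
    # Genome-only insertion inside the interval: grows on g., comes back intact
    print(round_trip([("=", 10), ("G", 3), ("=", 10)], 9, 12, g_start=101))
    # -> ((109, 115), (9, 12))
    # Start lands in transcript-only bases: the round trip is not the identity
    print(round_trip([("=", 10), ("T", 3), ("=", 10)], 12, 15, g_start=101))
    # -> ((110, 112), (10, 15))
    # Same alignment, interval moved clear of the gap: exact round trip
    print(round_trip([("=", 10), ("T", 3), ("=", 10)], 15, 17, g_start=101))
    # -> ((112, 114), (15, 17))
```

In this model an insertion on the destination side makes the interval longer but it maps back unchanged, matching the expectation in the question; it is an endpoint falling inside a gap that makes the projection lossy.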

Normalization?

I noticed that if you normalize the variant first, the problem goes away.

I think this is because normalization shifts the variant 3', away from the gap. But shouldn't that make no difference? If variants do need to be normalized before projection, then perhaps the mapper should do this automatically, or raise a warning or error when given an unnormalized variant.

# normalize is assumed to be hgvs.normalizer.Normalizer(hdp).normalize (not shown in the issue)
var_c_orig = parse(original_hgvs)
var_c = normalize(var_c_orig)
print(f"Normalized: {var_c_orig} => {var_c}")
print_hgvs(var_c)
var_g = c_to_g(var_c)
print_hgvs(var_g)
var_c2 = g_to_c(var_g, var_c.ac)
print_hgvs(var_c2)

Output:

Normalized: NM_015120.4(ALMS1):c.36_38dup => NM_015120.4:c.75_77dup
hgvs='NM_015120.4:c.75_77dup' - length=2
hgvs='NC_000002.12:g.73385940_73385942dup' - length=2
hgvs='NM_015120.4:c.75_77dup' - length=2
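Normalization moves the dup from c.36_38 to c.75_77, which is only possible if the transcript sequence across that stretch repeats with period 3 (presumably a GGA run), and it apparently moves the variant clear of the alignment gap. Below is a minimal sketch of that 3' shift together with a guard implementing the three options suggested above (normalize automatically, warn, or raise). `shift_3prime` and `require_normalized` are hypothetical names, not hgvs API, and the sequence is a made-up toy repeat rather than the real ALMS1 region; in the issue's second example it is hgvs's own `Normalizer` that performs the shift.

```python
import warnings
from typing import Tuple


def shift_3prime(seq: str, start: int, end: int) -> Tuple[int, int]:
    """Roll a deleted/duplicated interval (1-based, inclusive) as far 3' as possible.

    Shifting right by one is allowed while the interval's first base equals the
    base just past its end: duplicating (or deleting) either placement gives the
    same sequence, and HGVS asks for the 3'-most one.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError("interval outside sequence")
    while end < len(seq) and seq[start - 1] == seq[end]:
        start, end = start + 1, end + 1
    return start, end


def require_normalized(seq: str, start: int, end: int,
                       policy: str = "warn") -> Tuple[int, int]:
    """Hypothetical pre-projection check; policy is "normalize", "warn" or "raise"."""
    shifted = shift_3prime(seq, start, end)
    if shifted == (start, end):
        return start, end  # already 3'-shifted, safe to project as-is
    msg = f"{start}_{end} is not 3'-shifted (expected {shifted[0]}_{shifted[1]})"
    if policy == "normalize":
        return shifted
    if policy == "warn":
        warnings.warn(msg)
        return start, end
    if policy == "raise":
        raise ValueError(msg)
    raise ValueError(f"unknown policy {policy!r}")


if __name__ == "__main__":
    seq = "CGGAGGAGGAT"  # toy GGA repeat, not the real ALMS1 sequence
    print(shift_3prime(seq, 2, 4))                     # -> (8, 10)
    print(require_normalized(seq, 2, 4, "normalize"))  # -> (8, 10)
```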

Note: while searching existing issues, I found a discussion of alignment gaps (on this same transcript!) in #514.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests