Skip to content

Read duplication ? #22

Open
Open
@karlkashofer

Description

@karlkashofer

I use your code in a exome pipeline which includes manta.
Manta is super picky about the input bam files and it chokes on one of our current bams with a "Unexpected alignment name collision".

It seems to be caused by a read which is duplicated in the bam file:

`devarea@sx253:~/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/43dccd69-b489-4c7b-b92b-bd9d00490356/call-AgilentDedup/execution$ samtools view D2758-8_Trimmed_Mapped_Deduped.bam chr22 | grep "A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT"

A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 99 chr22 22734872 60 100M34S = 22734872 100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,+,95S39M,60,0;
A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 147 chr22 22734872 60 100M34S = 22734872 -100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,-,95S39M,60,0;
A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 147 chr22 22734872 60 100M34S = 22734872 -100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,-,95S39M,60,0;
`

As you can see, the second read is present twice in the bam produced by umicollapse.

If i do the same grep on the input bam before umicollapse i dont see the duplicate read:

`devarea@sx253:~/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/43dccd69-b489-4c7b-b92b-bd9d00490356/call-AgilentDedup/execution$ samtools view ../inputs/816700548/D2758-8.bam chr22 | grep "A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT"

A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 99 chr22 22734872 60 100M34S = 22734872 100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,+,95S39M,60,0;
A01664:83:HHTWYDMXY:2:2434:20672:19977_TAAGTCCTGT 147 chr22 22734872 60 100M34S = 22734872 -100 GTGACGTTGGTAGTTATAACCGTGTCTCCTGGTACCAGCAGCCCCCAGGCACAGCCCCCAAACTCATGATTTATGAGGTCAGTAATCGGCCCTCAGGGGTTTGTGCTGTACAAATACAAGCTCCTGCCACGGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFF NM:i:0 MD:Z:100 MC:Z:100M34S AS:i:100 XS:i:72 RG:Z:D2758-8 SA:Z:chrX,43744471,-,95S39M,60,0;
devarea@sx253:~/karl/PathoCromwell/cromwell-executions/Agilent_Exome_Single/43dccd69-b489-4c7b-b92b-bd9d00490356/call-AgilentDedup/execution$ `

Any idea what might have caused that ?
regards,
KK

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions