fastq processing

Hi, thanks for a quick tool.
I've been using it to collapse UMIs in my BAM files.
Now I want to use it on FASTQ files.
I am confused by the statement: 

"fastq: the input is a FASTQ file. This deduplicates the entire FASTQ file based on each entire read sequence. Note that the "UMI" would be the entire read sequence."
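My literal reading of that statement, as a minimal Python sketch. This is only an illustration, not the tool's actual implementation: `dedup_by_full_sequence` is a hypothetical helper, the reads are made up, it assumes the 12 nt UMI is still at the start of each read sequence, and it collapses exact matches only (any mismatch tolerance the tool may apply is ignored).

```python
def dedup_by_full_sequence(reads):
    """Keep the first read seen for each distinct full sequence (exact match only)."""
    seen = set()
    kept = []
    for name, seq in reads:
        if seq not in seen:
            seen.add(seq)
            kept.append((name, seq))
    return kept


# Made-up example reads: a 12 nt UMI followed by the insert sequence.
reads = [
    ("r1", "ACGTACGTACGT" + "TGAGGTAGTAGGTTGTATAGTT"),
    ("r2", "ACGTACGTACGT" + "TGAGGTAGTAGGTTGTATAGTT"),  # identical to r1
    ("r3", "ACGTACGTACGT" + "TAGCTTATCAGACTGATGTTGA"),  # same UMI, different insert
]

kept = dedup_by_full_sequence(reads)
print([name for name, _ in kept])
```

Under this reading, r2 is dropped as a duplicate of r1, while r3 is kept because its full sequence differs even though its UMI is the same.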

Does this mean that only the UMIs are compared when collapsing, and the rest of the read is not looked at?
For perspective: I have a read depth of 100M and a UMI length of 12 nt. I want to be sure that reads that are not identical, but share the same UMI by random chance, are not collapsed together. Can you elaborate?
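To show why this worries me, here is a rough back-of-the-envelope calculation in Python. It assumes UMIs are drawn uniformly at random, which real libraries only approximate, so treat the numbers as a sketch rather than a measurement.

```python
# Assumption: UMIs are uniformly random 12-mers over A, C, G, T.
UMI_LENGTH = 12
N_READS = 100_000_000

n_possible_umis = 4 ** UMI_LENGTH           # number of distinct 12 nt UMIs
reads_per_umi = N_READS / n_possible_umis   # average reads sharing one UMI

print(f"possible UMIs: {n_possible_umis:,}")
print(f"average reads per UMI: {reads_per_umi:.2f}")
```

With about 16.8 million possible UMIs and 100M reads, each UMI is shared by roughly 6 reads on average, so unrelated reads carrying the same UMI are expected rather than rare.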

I am also curious about the --tag option.
I am looking for miRNAs, and my plan is to NOT use the --tag option: after collapsing the UMIs, I would proceed directly to fastx_collapser to get an abundance table.
For this, I reckon I don't need to know how many UMIs were part of each group. Is that right?

I hope you can find the time to give me some input. 

I plan on using this tool in my future UMI analyses at both the FASTQ and BAM level.

Cheers, 
Maibritt
 

fastq processing #17
