Feature request - Is there potential to support bgzip? #92

Open
@pettyalex

This is a clever way to move decompression and compression into a separate process, and a very helpful little tool. The same pattern would work well for a couple of additional compression tools that my group uses. I'm going to take a look at adding support for these formats, which should be pretty straightforward.

BGZF, or "blocked gzip", is a format that's used pretty widely in bioinformatics. It's basically a lot of small gzipped blocks concatenated together, with some extra info in the headers and an index in a separate file saying where to seek. It's decompressible by normal gzip, so we actually see BGZF files named .gz more often than .bgz. It'd be really great to be able to write BGZF files with xopen as well. The blocked gzip reference implementation is distributed with htslib as a binary called bgzip, and it is available both from conda and from most Linux distros' native packages (tabix on Ubuntu, for example).
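To make the block layout concrete, here is a minimal sketch using only the Python standard library. It is not how bgzip or xopen is implemented; `bgzf_block` and `is_bgzf` are hypothetical helper names, and the header check assumes the "BC" subfield is the first extra subfield, which is how bgzip writes it as far as I know.

```python
import gzip
import struct
import zlib


def bgzf_block(data: bytes) -> bytes:
    """Build one BGZF block: a gzip member carrying a 'BC' extra subfield."""
    if len(data) > 0xFF00:
        raise ValueError("a BGZF block holds less than 64 KiB of input")
    # Raw deflate stream (negative wbits means no zlib/gzip wrapper).
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)
    cdata = comp.compress(data) + comp.flush()
    # BSIZE is the total block size minus 1: 18 header bytes + payload + 8 trailer bytes.
    bsize = len(cdata) + 25
    header = struct.pack(
        "<BBBBIBBHBBHH",
        31, 139,   # gzip magic
        8,         # CM: deflate
        4,         # FLG: FEXTRA set
        0,         # MTIME
        0, 255,    # XFL, OS (unknown)
        6,         # XLEN: length of the extra field
        66, 67,    # subfield id 'B', 'C'
        2,         # subfield length
        bsize,
    )
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + cdata + trailer


def is_bgzf(raw: bytes) -> bool:
    """Heuristic check: gzip header with FEXTRA and a leading 'BC' subfield."""
    return len(raw) >= 18 and raw[:4] == b"\x1f\x8b\x08\x04" and raw[12:14] == b"BC"


# Two data blocks plus an empty block, which BGZF uses as an end-of-file marker.
blob = bgzf_block(b"hello ") + bgzf_block(b"world\n") + bgzf_block(b"")

# Plain gzip reads the concatenated members as one stream.
print(gzip.decompress(blob))
print(is_bgzf(blob), is_bgzf(gzip.compress(b"x")))
```

Because every block is an independent gzip member, a reader with an index can seek to a block boundary and start decompressing there, which is the point of the format.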

Also, it'd be great to see this support zstd, which is an excellent general-purpose compression tool that I expect to grow rapidly in usage over the next few years.

Edit: To be clear, both of these formats are already usable from Python: there's a bgzip implementation here, and zstd has excellent Python bindings available. But getting the compression into another process, like xopen does, makes for much better performance.
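As a rough illustration of the separate-process pattern, here is a minimal sketch of what I have in mind. This is not xopen's actual code; `open_compressed_writer` is a hypothetical name, and the command is passed in by the caller so the same helper could wrap any compressor that reads stdin and writes stdout.

```python
import contextlib
import subprocess


@contextlib.contextmanager
def open_compressed_writer(path, command):
    """Yield a binary pipe whose contents are compressed by `command`.

    `command` is an argument list for a program that reads raw bytes on
    stdin and writes compressed bytes to stdout. The compression runs in
    a child process, so it does not compete with the Python interpreter.
    """
    with open(path, "wb") as out:
        proc = subprocess.Popen(command, stdin=subprocess.PIPE, stdout=out)
        try:
            yield proc.stdin
        finally:
            # Closing stdin signals end of input; then wait for the child to finish.
            proc.stdin.close()
            returncode = proc.wait()
    if returncode != 0:
        raise OSError(f"{command[0]} exited with status {returncode}")
```

Assuming the tools are installed, something like `open_compressed_writer("calls.vcf.gz", ["bgzip", "-c"])` or `open_compressed_writer("calls.vcf.zst", ["zstd", "-c"])` should work, since both write to standard output with `-c`.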
