DNAddy
(mander.xyz)
That sounds like a marker file. It's a bit different than a sequence file.
Molecular markers are linked to specific sequences in the DNA, generally close to or within the gene of interest. All the extra columns describe each marker's characteristics and results. Any place in the genome where a single nucleotide differs between individuals (a polymorphism) can serve as another marker. There are millions of these, and they add up to massive files.
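To make the "one nucleotide difference" idea concrete, here's a rough sketch that compares two already-aligned sequences and lists the positions where they differ. The function name `find_snps` and both toy sequences are made up for illustration; real marker calling works on aligned reads with quality filtering, not plain string comparison.

```python
def find_snps(reference, sample):
    """Return (position, ref_base, alt_base) for each single-base difference.

    Assumes both sequences are already aligned and the same length.
    Positions are 1-based.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length (already aligned)")
    return [
        (pos, ref, alt)
        for pos, (ref, alt) in enumerate(zip(reference, sample), start=1)
        if ref != alt
    ]


# Toy sequences, invented for this example.
reference = "ATGGCCTTAGCA"
sample = "ATGGCTTTAGCG"

for pos, ref, alt in find_snps(reference, sample):
    print(f"pos {pos}: {ref}>{alt}")
```

Each differing position would become one row in a marker file, and every extra column (scores, annotations, results) hangs off that row, which is why these files grow so quickly.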
A sequence file is basically just a long, boring string of nucleotides and is not that large. The files used to generate the sequence are another story. Let's just say they had to wait almost 20 years for computers to get fast enough to process those files in a reasonable time. Those make the marker files look like child's play.
I'm not familiar with the name of the file I'm currently working with, tbh. It's used to create the annotation files for regenie analyses. It has every variant for every gene within the biobank. There's far more than just missense: there are stop/start gain/loss, splice donor/acceptor, frameshift, and PTV variants. It also contains PrimateAI scores, SpliceAI scores, CAVA data, ClinVar data, and more.
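For anyone curious what "creating the annotation files" might look like, here's a minimal sketch. It assumes the regenie annotation file is a plain-text file with one variant per line in three columns (variant ID, gene or set name, annotation label), which is my reading of the regenie docs, so check them before relying on this. The variant IDs, gene names, and the `write_anno` helper are all hypothetical.

```python
import io

# Hypothetical records; IDs, genes, and labels are invented for illustration.
variants = [
    {"id": "1:1000:A:G", "gene": "GENE1", "consequence": "missense"},
    {"id": "1:1050:C:T", "gene": "GENE1", "consequence": "stop_gain"},
    {"id": "2:2000:G:A", "gene": "GENE2", "consequence": "splice_donor"},
]


def write_anno(records, handle):
    """Write one 'variant gene annotation' line per record (assumed layout)."""
    for rec in records:
        handle.write(f'{rec["id"]} {rec["gene"]} {rec["consequence"]}\n')


# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
write_anno(variants, buffer)
print(buffer.getvalue(), end="")
```

The big source file carries far more columns per variant (scores, clinical data, and so on); this sketch only shows the step of boiling each variant down to a single label per gene.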