wiki:curtain_bin_vs_txt, revision of 2018/07/27 16:36 (current) by david; previous revision of 2018/07/27 16:07, also by david.
==== Some other binary types ====
  
=== Executable ===
<code>
$ file /bin/ls
</code>
You see that the command //ls// is actually a computer program, stored as a binary executable.
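The //file// command mostly works by looking at the first bytes of a file. The helper below is a minimal bash sketch of that idea. It assumes a Linux system where programs such as //ls// use the ELF format, whose files start with the four bytes 0x7F, E, L, F, and it checks only that one signature. The function name //is_elf// is hypothetical, and the real //file// command knows far more signatures than this.

<code bash>
#!/usr/bin/env bash
# is_elf FILE: hypothetical helper (not part of any standard tool).
# Returns success (0) if FILE starts with the ELF magic number
# 0x7F 'E' 'L' 'F' used by Linux executables; returns 1 otherwise.
is_elf() {
    local magic
    # Read only the first 4 bytes; fail if the file cannot be read.
    magic=$(head -c 4 -- "$1" 2>/dev/null) || return 1
    [ "$magic" = $'\x7fELF' ]
}

# Usage: compare a program with a small text file.
tmp=$(mktemp)
printf 'ACGT\n' > "$tmp"
for f in /bin/ls "$tmp"; do
    if is_elf "$f"; then
        echo "$f: ELF binary executable"
    else
        echo "$f: not an ELF binary"
    fi
done
rm -f "$tmp"
</code>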
  
=== Binary Alignment Map ===
<code>
$ file AR122.bam
</code>
The genomics data format BAM ([[https://www.biorxiv.org/content/early/2015/05/29/020024|binary format based on samtools Sequence Alignment Map]]) is recognized as gzip compressed data, but //file// does not know the full data type.
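BAM files are compressed with BGZF, a block-based variant of gzip that ordinary gzip tools can read. This is why //file// reports only gzip compressed data. After decompression, a BAM stream starts with the letters BAM followed by the byte 0x01. The bash sketch below looks one layer deeper by decompressing the first bytes and checking for that magic. It is only an illustration: //is_bam// is a hypothetical name, and the demo files are tiny gzip stand-ins rather than real alignment files.

<code bash>
#!/usr/bin/env bash
# is_bam FILE: hypothetical helper (not part of samtools or gzip).
# BGZF is gzip-compatible, so gzip can decompress a BAM file; the
# decompressed data of a BAM file begins with the 4-byte magic "BAM\1".
is_bam() {
    local magic
    # Decompress to stdout and keep only the first 4 bytes.
    # If FILE is not gzip data, gzip fails, $magic stays empty and the test fails.
    magic=$(gzip -dc -- "$1" 2>/dev/null | head -c 4)
    [ "$magic" = $'BAM\x01' ]
}

# Usage with stand-in files (plain gzip, not real BAM or FASTA data).
dir=$(mktemp -d)
printf 'BAM\001' | gzip > "$dir/fake.bam"
printf '>seq1\nACGT\n' | gzip > "$dir/reads.fa.gz"
for f in "$dir/fake.bam" "$dir/reads.fa.gz"; do
    if is_bam "$f"; then
        echo "$f: gzip data containing BAM"
    else
        echo "$f: not BAM"
    fi
done
rm -rf "$dir"
</code>

Piping into //head -c 4// means gzip is stopped shortly after the first bytes are read, instead of decompressing the whole file.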
  
==== Text-based genomics file types ====
  
  
As with binary data, text data can follow a more specific format. The extensions //.txt, .sh, .bash, .c, .gff, .fasta// and //.py// are all common file extensions that a genomics user sees on a Linux computer. All of them mark plain text files, and from the name alone, only the extension hints at the format inside.
  
The extension is only a naming convention: Linux does not check that a file named //.fasta// really contains FASTA, and a mislabelled file leads to confusion further down the line. It is up to the genomicist, and the programs he or she uses, to produce files in the stated format and to validate them.
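As an example of the validation step, here is a minimal bash and awk sketch of a FASTA sanity check. It checks only two rules: the first line must be a header starting with >, and every other non-header line may contain only letters, * or -. It is therefore not a complete validator. The name //check_fasta// is hypothetical.

<code bash>
#!/usr/bin/env bash
# check_fasta FILE: hypothetical, minimal FASTA sanity check.
# Rule 1: the first line must be a header starting with ">".
# Rule 2: every non-header line may contain only letters, "*" or "-".
# Returns 0 if both rules hold; otherwise prints the first problem and returns 1.
check_fasta() {
    awk -v f="$1" '
        NR == 1 && !/^>/ { print f ": line 1 is not a > header"; bad = 1; exit }
        /^>/ { next }
        !/^[A-Za-z*-]*$/ { print f ": unexpected character on line " NR; bad = 1; exit }
        END {
            if (NR == 0) { print f ": empty file"; exit 1 }
            exit bad
        }
    ' "$1"
}

# Usage with two small example files.
dir=$(mktemp -d)
printf '>seq1 example\nACGTN\nacgt\n' > "$dir/good.fasta"
printf '>seq1\nACG1T\n' > "$dir/bad.fasta"
check_fasta "$dir/good.fasta" && echo "good.fasta passed"
check_fasta "$dir/bad.fasta" || echo "bad.fasta failed"
rm -rf "$dir"
</code>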
  
Examples of text formats in genomics:
  
^ Filetype ^ Description ^ Extension ^ Format definition ^
| FASTA | DNA/RNA/protein sequence with a header line | .fasta, .fa | [[http://genetics.bwh.harvard.edu/pph/FASTA.html|link]] |
| FASTQ | Sequence with a header line and per-base quality scores | .fastq, .fq | [[http://maq.sourceforge.net/fastq.shtml|link]] |
| GFF | General Feature Format | .gff | [[https://genome.ucsc.edu/FAQ/FAQformat.html#format3|link]] |
| BED | Browser Extensible Data | .bed | [[https://genome.ucsc.edu/FAQ/FAQformat.html#format1|link]] |
| SAM | Sequence Alignment/Map | .sam | [[http://www.htslib.org/doc/sam.html|link]] |
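The FASTQ row is easiest to picture from its record layout. This sketch assumes the common four-line layout with no line wrapping: an @ header, the sequence, a + separator, and then the quality string. Under that layout, each quality string must be exactly as long as its sequence. The hypothetical //check_fastq// function below checks only that structure; it does not decode the quality values themselves.

<code bash>
#!/usr/bin/env bash
# check_fastq FILE: hypothetical, minimal FASTQ structure check.
# Assumes 4 lines per record and no line wrapping:
#   @header / sequence / + (optionally followed by text) / quality string
# Returns 0 if every record fits this layout and each quality string
# has the same length as its sequence; otherwise prints the problem and returns 1.
check_fastq() {
    awk -v f="$1" '
        NR % 4 == 1 && !/^@/   { print f ": line " NR ": expected @ header"; bad = 1; exit }
        NR % 4 == 2            { seqlen = length($0) }
        NR % 4 == 3 && !/^[+]/ { print f ": line " NR ": expected + separator"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen {
            print f ": line " NR ": quality length differs from sequence length"; bad = 1; exit
        }
        END {
            if (!bad && NR % 4 != 0) { print f ": last record is incomplete"; exit 1 }
            exit bad
        }
    ' "$1"
}

# Usage with one valid record and one with a too-short quality string.
dir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$dir/ok.fastq"
printf '@read1\nACGT\n+\nIII\n' > "$dir/short.fastq"
for f in "$dir/ok.fastq" "$dir/short.fastq"; do
    check_fastq "$f" && echo "$f: structure OK"
done
rm -rf "$dir"
</code>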
  
  
  
wiki/curtain_bin_vs_txt.1532729220.txt.gz · Last modified: 2018/07/27 16:07 by david