 +One way to keep track of file types is with the file extension.
 +It lets a user or a program check whether a file is in the expected format. Ultimately, however, the data is the same regardless of the file extension.
-One way to keep track of things is with the file extension. .txt, .sh, .bash, .c, .gff, fasta, .py are all common file extensions that a genomics user sees on a linux computer. They are also //all text files//. The thing that distinguishes them, from the name, is their file extension. Incorrectly labelling a filetype by its extension leads to confusion for the user down the line.
 +<code>
 +$ mv bedtools-2.25.0.tar.gz theExtDOESNT.matter 
 +$ file theExtDOESNT.matter  
 +theExtDOESNT.matter:​ gzip compressed data, from Unix, last modified: Wed Sep  2 22:42:14 2015 
 +$ gunzip theExtDOESNT.matter  
 +gzip: theExtDOESNT.matter:​ unknown suffix -- ignored 
 +$ mv theExtDOESNT.matter bedtools-2.25.0.tar.gz 
 +$ gunzip -v bedtools-2.25.0.tar.gz  
 +bedtools-2.25.0.tar.gz:​ 63.2% -- replaced ​with bedtools-2.25.0.tar 
 +So the file extension //does// matter for programs that make use of it, as gunzip does here. But the data is unchanged.
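The point above can be checked with a minimal sketch. It assumes a scratch file (the hypothetical `sample.txt`, not the bedtools archive) and relies on the fact that gzip cannot inspect a file name when it reads from standard input, so the suffix check never happens.

```shell
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a small gzip file, then give it a misleading extension.
printf 'hello genomics\n' > sample.txt
gzip sample.txt                          # produces sample.txt.gz
mv sample.txt.gz theExtDOESNT.matter

# Reading from stdin bypasses the suffix check: gzip never sees the name.
recovered=$(gzip -dc < theExtDOESNT.matter)
echo "$recovered"
```

The bytes come back intact even though the name no longer ends in .gz, which suggests the suffix is only a convention that gunzip enforces on file names.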
 +==== Some other binary types ==== 
 +=== Executable ===
 +$ file /bin/ls 
 +/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=6129e7403942b90574b8c28439d128ff5515efeb, stripped 
 +This shows that the //ls// command is itself a program, stored as a binary executable.
 +=== Binary Alignment Map ===
 +$ file AR122.bam 
 +AR122.bam: gzip compressed data, extra field 
 +BAM, a genomics data format ([[https://content/early/2015/05/29/020024|binary format based on samtools Sequence Alignment Map]]), is recognized as gzip compressed data, but the //file// command doesn't know the full data type.
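Where //file// stops at "gzip compressed data", one can look one layer deeper. The sketch below assumes that a BAM stream, once decompressed, begins with the four bytes `BAM\1`. The function name `looks_like_bam` and the stand-in file are hypothetical, and the check is a heuristic rather than a validator.

```shell
# Return 0 if the file decompresses to data starting with "BAM\1", 1 otherwise.
looks_like_bam() {
    # od -c prints the control byte \001 as the octal text "001".
    magic=$(gzip -dc < "$1" 2>/dev/null | head -c 4 | od -An -c | tr -d ' \n')
    [ "$magic" = 'BAM001' ]
}

# Stand-in file for demonstration only: it carries the magic bytes
# but is NOT a valid BAM file.
demo_dir=$(mktemp -d)
printf 'BAM\001placeholder' | gzip -c > "$demo_dir/fake.bam"

if looks_like_bam "$demo_dir/fake.bam"; then
    echo "looks like BAM"
else
    echo "gzip data, but not BAM"
fi
```

On a real alignment file the call would be `looks_like_bam AR122.bam`; a tool such as samtools remains the proper way to validate the contents.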
 +==== Text-based Genomics filetypes ==== 
 +As with binary data, text data can have a more specific format. The extensions //.txt, .sh, .bash, .c, .gff, .fasta, .py// are all common on files that a genomics user sees on a Linux computer.
 +It is up to the genomicist and the programs he or she uses to produce and validate files in the given formats.
 Examples of text formats in genomics.
-Non-text binary files can be data or a program. Data usually has a file extension, such as .bam or .gz, whereas binary programs have no extension (in Linux) but have the executable permission bit set.
 +^ Filetype ^ Description ^ Extension ^ Format definition ^
 +| Fasta | DNA/​RNA/​Protein sequence with header | .fasta, .fa | [[http://​​pph/​FASTA.html|link]] | 
 +| Fastq | DNA/​RNA/​Protein sequence with header + quality information | .fastq, .fq | [[http://​​fastq.shtml|link]] | 
 +| GFF   | General Feature Format | .gff | [[https://FAQ/FAQformat.html#format3|link]] |
 +| BED   | Browser Extensible Data | .bed | [[https://​​FAQ/​FAQformat.html#​format1|link]] | 
 +| SAM   | Sequence Alignment Map | .sam | [[http://​​doc/​sam.html|link]] |
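Because these formats are all plain text, a program has to look at the content rather than the name to tell them apart. The following is a rough sketch, not a validator: it assumes Fasta records start with `>` and that Fastq records start with `@` and have `+` as the first character of the third line. The function name `guess_seq_format` is hypothetical. SAM header lines also begin with `@`, which is why the third-line check is included.

```shell
# Guess whether a text file is Fasta or Fastq from its first record.
guess_seq_format() {
    first_char=$(head -c 1 "$1")
    third_line_char=$(sed -n '3p' "$1" | cut -c1)
    case "$first_char" in
        '>') echo fasta ;;
        '@')
            # Fastq separates sequence from qualities with a '+' line.
            if [ "$third_line_char" = '+' ]; then
                echo fastq
            else
                echo unknown
            fi
            ;;
        *) echo unknown ;;
    esac
}

# Demonstration on a tiny made-up record with a misleading extension.
demo=$(mktemp -d)
printf '>seq1 example header\nACGTACGT\n' > "$demo/reads.txt"
guess_seq_format "$demo/reads.txt"
```

The file is reported as Fasta despite its .txt extension, in line with the idea that the extension is a label and the content defines the format.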
