wiki:curtain_bin_vs_txt [2018/07/27 15:52]
david [Metadata and the file extension]
wiki:curtain_bin_vs_txt [2018/07/27 16:36] (current)
david [Some other binary types]
</code>
So the file extension //does// matter to programs that make use of it, but the data itself is unchanged.
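A small bash sketch of that point, under stated assumptions: it copies a FASTA file to a new name with a .txt extension, uses //cmp// to confirm the bytes are identical, and then compares two hypothetical ways a program might identify the file. One looks only at the extension; the other peeks at the first byte. The functions by_extension and by_content are made up for illustration and are not standard tools.

```bash
#!/usr/bin/env bash
# Hypothetical check 1: trust the file name's extension.
by_extension() {
  case "$1" in
    *.fasta|*.fa) echo "FASTA" ;;
    *.fastq|*.fq) echo "FASTQ" ;;
    *)            echo "unknown" ;;
  esac
}

# Hypothetical check 2: look at the data itself.
# A FASTA record starts with '>', a FASTQ record with '@'.
by_content() {
  case "$(head -c 1 "$1")" in
    '>') echo "FASTA" ;;
    '@') echo "FASTQ" ;;
    *)   echo "unknown" ;;
  esac
}

tmp=$(mktemp -d)
printf '>seq1\nACGT\n' > "$tmp/reads.fasta"
cp "$tmp/reads.fasta" "$tmp/reads.txt"   # same bytes, different name

cmp -s "$tmp/reads.fasta" "$tmp/reads.txt" && echo "same data"
for f in "$tmp/reads.fasta" "$tmp/reads.txt"; do
  echo "$(basename "$f"): extension says $(by_extension "$f"), content says $(by_content "$f")"
done
```

Only the extension-based check is fooled by the renamed copy; the bytes, and therefore the content-based answer, are the same for both files.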
==== Some other binary types ====
=== Executable ===
<code>
$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=6129e7403942b90574b8c28439d128ff5515efeb, stripped
</code>
This shows that the command //ls// is itself a program, stored as a binary executable. Non-text binary files can hold either data or a program: data files usually have an extension such as .bam or .gz, whereas binary programs on Linux usually have no extension but do have the executable permission bit set.
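The same distinction can be checked from the shell without trusting a file name. A minimal sketch, assuming a Linux system: ELF executables begin with the four bytes 0x7f, 'E', 'L', 'F', and the test ''[ -x file ]'' reports whether the executable permission bit applies to the current user. The helper name is_elf is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the ELF magic number
# (byte 0x7f followed by the ASCII letters E, L, F).
is_elf() {
  local magic
  # Dump the first 4 bytes as hex and strip spaces/newlines,
  # giving e.g. "7f454c46" for an ELF file.
  magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  [ "$magic" = "7f454c46" ]
}

for f in /bin/ls /etc/passwd; do
  if is_elf "$f"; then kind="ELF binary"; else kind="not ELF"; fi
  # [ -x ] tests the executable permission for the current user.
  if [ -x "$f" ]; then perm="executable"; else perm="not executable"; fi
  echo "$f: $kind, $perm"
done
```

On a typical Linux system this should report /bin/ls as an executable ELF binary and /etc/passwd (a text file) as neither.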
=== Binary Alignment Map ===
<code>
$ file AR122.bam
AR122.bam: gzip compressed data, extra field
</code>
BAM is a genomics data file format (a [[https://content/early/2015/05/29/020024|binary format based on samtools Sequence Alignment Map]]). //file// recognizes it as gzip-compressed data, but doesn't know the full data type.
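Why "extra field"? BAM files are compressed with BGZF, which is a series of gzip blocks that each carry a gzip extra field (subfield "BC") recording the block's size, so ordinary gzip tools can still decompress them. The bash sketch below reads the gzip header by hand, following the layout in RFC 1952, and checks the FEXTRA flag bit. The function describe_gzip is hypothetical; to test it on the file above you would run ''describe_gzip AR122.bam''. The demo builds a plain gzip file and the 28-byte empty BGZF block that BAM files end with.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a file from its gzip header (RFC 1952).
describe_gzip() {
  local hdr flg
  # First 4 bytes as hex: ID1 ID2 CM FLG (gzip magic is 1f 8b).
  hdr=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  if [ "${#hdr}" -lt 8 ] || [ "${hdr:0:4}" != "1f8b" ]; then
    echo "not gzip"
    return
  fi
  # FLG is the 4th byte; bit value 0x04 is FEXTRA ("extra field").
  flg=$((16#${hdr:6:2}))
  if (( flg & 4 )); then
    echo "gzip, extra field"
  else
    echo "gzip"
  fi
}

tmp=$(mktemp -d)
# gzip writing from stdin stores no file name and sets no optional fields.
printf 'hello\n' | gzip > "$tmp/plain.gz"
# Empty BGZF block used as the end-of-file marker in BAM files.
printf '\x1f\x8b\x08\x04\x00\x00\x00\x00\x00\xff\x06\x00\x42\x43\x02\x00\x1b\x00\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00' > "$tmp/eof.bgzf"

describe_gzip "$tmp/plain.gz"
describe_gzip "$tmp/eof.bgzf"
```

Like //file//, this only sees "gzip with an extra field"; recognizing BAM specifically would mean decompressing and checking for the BAM magic inside.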
==== Text-based Genomics filetypes ====
As with binary data, text data can have a more specific format. The extensions //.txt, .sh, .bash, .c, .gff, .fasta, .py// are all common on the Linux computers a genomics user works with, and every one of them marks a text file.
It is up to the genomicist, and the programs they use, to produce and validate files in the given formats.
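As an illustration of what "validating" a text format can mean, here is a deliberately minimal FASTA check in bash and awk. It assumes nucleotide sequences only (protein FASTA uses a different alphabet) and checks just two rules: the first line is a ''>'' header, and every other line uses IUPAC nucleotide codes or ''-'' for gaps. The name check_fasta is hypothetical; real validators check far more.

```bash
#!/usr/bin/env bash
# Hypothetical minimal FASTA check (nucleotide sequences only).
# Prints one message per problem and exits non-zero if any were found.
check_fasta() {
  awk '
    NR == 1 && !/^>/ { print "line 1: missing > header"; bad = 1 }
    /^>/ { next }                      # header lines are not checked further
    !/^[ACGTUNRYKMSWBDHVacgtunrykmswbdhv-]*$/ {
      print "line " NR ": unexpected characters"; bad = 1
    }
    END { exit bad }                   # unset bad counts as 0 (success)
  ' "$1"
}

tmp=$(mktemp -d)
printf '>seq1\nACGTN\nacgt\n' > "$tmp/ok.fa"
printf 'ACGT\n>seq2\nAC?T\n' > "$tmp/bad.fa"

check_fasta "$tmp/ok.fa" && echo "ok.fa passed"
check_fasta "$tmp/bad.fa" || echo "bad.fa failed"
```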
Examples of text formats in genomics:
^ Filetype ^ Description ^ Extension ^ Format definition ^
| FASTA | DNA/RNA/protein sequence with header | .fasta, .fa | [[http://pph/FASTA.html|link]] |
| FASTQ | DNA/RNA/protein sequence with header and per-base quality information | .fastq, .fq | [[http://fastq.shtml|link]] |
| GFF   | General Feature Format | .gff | [[https://FAQ/FAQformat.html#format3|link]] |
| BED   | Browser Extensible Data | .bed | [[https://FAQ/FAQformat.html#format1|link]] |
| SAM   | Sequence Alignment/Map | .sam | [[http://doc/sam.html|link]] |
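The FASTQ entry in the table pairs each sequence with quality information, and that pairing is easy to check mechanically. A minimal bash/awk sketch, assuming the common unwrapped four-line record layout (''@header'', sequence, ''+'' separator, quality string); FASTQ that wraps sequences across lines is not handled. The name check_fastq is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical minimal FASTQ check for unwrapped 4-line records:
#   line 1: @header   line 2: sequence   line 3: + separator   line 4: quality
# The quality string must be exactly as long as the sequence.
check_fastq() {
  awk '
    NR % 4 == 1 && !/^@/   { print "line " NR ": header should start with @"; bad = 1 }
    NR % 4 == 2            { seqlen = length($0) }
    NR % 4 == 3 && !/^[+]/ { print "line " NR ": separator should start with +"; bad = 1 }
    NR % 4 == 0 && length($0) != seqlen {
      print "line " NR ": quality length differs from sequence length"; bad = 1
    }
    END {
      if (NR % 4 != 0) { print "truncated record"; bad = 1 }
      exit bad
    }
  ' "$1"
}

tmp=$(mktemp -d)
printf '@r1\nACGT\n+\nIIII\n@r2\nGG\n+\nII\n' > "$tmp/ok.fq"
printf '@r1\nACGT\n+\nIII\n' > "$tmp/bad.fq"

check_fastq "$tmp/ok.fq" && echo "ok.fq passed"
check_fastq "$tmp/bad.fq" || echo "bad.fq failed"
```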
wiki/curtain_bin_vs_txt.1532728368.txt.gz · Last modified: 2018/07/27 15:52 by david