This shows you the differences between two versions of the page.

wiki:curtain_bin_vs_txt [2018/07/27 14:06]
david [What the text viewer displays (ASCII character set)]
wiki:curtain_bin_vs_txt [2018/07/27 16:36] (current)
david [Some other binary types]
Line 70:
 </code>
 +One way to keep track of things is with the file extension.
 +It lets the user or a program check whether the file is in the expected format. Ultimately, however, the data stays the same regardless of the file extension.
-One way to keep track of things is with the file extension. .txt, .sh, .bash, .c, .gff, fasta, .py are all common file extensions that a genomics user sees on a linux computer. They are also //all text files//. The thing that distinguishes them, from the name, is their file extension. Incorrectly labelling a filetype by its extension leads to confusion for the user down the line.
 +<code>
 +$ mv bedtools-2.25.0.tar.gz theExtDOESNT.matter 
 +$ file theExtDOESNT.matter  
 +theExtDOESNT.matter:​ gzip compressed data, from Unix, last modified: Wed Sep  2 22:42:14 2015 
 +$ gunzip theExtDOESNT.matter  
 +gzip: theExtDOESNT.matter:​ unknown suffix -- ignored 
 +$ mv theExtDOESNT.matter bedtools-2.25.0.tar.gz 
 +$ gunzip -v bedtools-2.25.0.tar.gz  
 +bedtools-2.25.0.tar.gz:​ 63.2% -- replaced ​with bedtools-2.25.0.tar 
 +</code>
 +So the file extension //does// matter for programs that make use of it, but the data itself is unchanged.
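The idea that the content, not the name, identifies the data can be checked by hand. The following is a minimal sketch, assuming a bash shell with head, od, tr, gzip and mktemp available. gzip streams begin with the two bytes 1f 8b, so a small helper can test for them whatever the extension says. The helper name is_gzip and the file names are made up for this illustration and do not come from the original page.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file starts with the gzip magic bytes 1f 8b.
is_gzip() {
    local magic
    # Read the first two bytes and render them as a hex string such as "1f8b".
    magic=$(head -c 2 -- "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "1f8b" ]
}

# Demo with throwaway files (names are illustrative only).
tmpdir=$(mktemp -d)
printf 'hello\n' > "$tmpdir/sample.txt"                    # plain text
gzip -c "$tmpdir/sample.txt" > "$tmpdir/misleading.txt"    # gzip data with a .txt name

for f in "$tmpdir/sample.txt" "$tmpdir/misleading.txt"; do
    if is_gzip "$f"; then
        echo "$(basename "$f"): gzip data"
    else
        echo "$(basename "$f"): not gzip"
    fi
done
rm -r "$tmpdir"
```

This is roughly the kind of check the //file// command performs, although //file// consults a much larger database of signatures.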
 +==== Some other binary types ==== 
 +=== Executable ===
 +<code>
 +$ file /bin/ls
 +/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=6129e7403942b90574b8c28439d128ff5515efeb, stripped
 +</code>
 +The command //ls// is itself a computer program, stored as a binary executable.
 +=== Binary Alignment Map ===
 +<code>
 +$ file AR122.bam
 +AR122.bam: gzip compressed data, extra field
 +</code>
 +BAM, a genomics data file format ([[https://​​content/​early/​2015/​05/​29/​020024|binary format based on samtools Sequence Alignment Map]]), is recognized as gzip compressed data, but the //file// command does not identify the more specific data type.
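A check that goes one step further than //file// can be sketched as follows. This assumes, based on the SAM/BAM specification rather than on this page, that the decompressed content of a BAM file begins with the four bytes "BAM" followed by 0x01. The helper name looks_like_bam is hypothetical, the demo files are stand-ins rather than real alignments, and the sketch does not verify the block structure of the compression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file is gzip data whose decompressed
# content begins with the BAM magic bytes "BAM\1" (hex 42 41 4d 01).
looks_like_bam() {
    local magic
    # Decompress to stdout, keep the first four bytes, render them as hex.
    magic=$(gzip -dc -- "$1" 2>/dev/null | head -c 4 | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "42414d01" ]
}

tmpdir=$(mktemp -d)
# Stand-in files, NOT real alignments: only the leading bytes matter here.
printf 'BAM\001placeholder' | gzip -c > "$tmpdir/standin.bam"
printf 'plain text' | gzip -c > "$tmpdir/notes.txt.gz"

for f in "$tmpdir/standin.bam" "$tmpdir/notes.txt.gz"; do
    if looks_like_bam "$f"; then
        echo "$(basename "$f"): BAM magic found"
    else
        echo "$(basename "$f"): no BAM magic"
    fi
done
rm -r "$tmpdir"
```

For real work, a dedicated tool such as samtools is the safer way to confirm that a file is valid BAM.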
 +==== Text-based Genomics filetypes ==== 
 +As with binary data, text data can have a more specific format. The extensions //.txt, .sh, .bash, .c, .gff, .fasta, .py// are all common file extensions that a genomics user sees on a Linux computer.
 +It is up to the genomicist and the programs he or she uses to produce and validate files in the given formats.
 Examples of text formats in genomics.
-Non-text binary files can be data or a program. Data usually has a file extension, such as .bam or .gz, whereas binary programs have no extension (in linux), but have the executable permission bit set.
 +^ Filetype ^ Description ^ Extension ^ Format definition ^
 +| Fasta | DNA/​RNA/​Protein sequence with header | .fasta, .fa | [[http://​​pph/​FASTA.html|link]] | 
 +| Fastq | DNA/​RNA/​Protein sequence with header + quality information | .fastq, .fq | [[http://​​fastq.shtml|link]] | 
 +| GFF   | General Feature Format | .gff | [[https://​​FAQ/​FAQformat.html#​format3|link]] |
 +| BED   | Browser Extensible Data | .bed | [[https://​​FAQ/​FAQformat.html#​format1|link]] | 
 +| SAM   | Sequence Alignment Map | .sam | [[http://​​doc/​sam.html|link]] |
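Because the extension alone guarantees nothing, a text format can be validated from its content. The following is a minimal sketch, assuming bash and awk, and assuming the common unwrapped FASTQ layout of four lines per record: a header starting with "@", the sequence, a separator starting with "+", and a quality string as long as the sequence. The helper name is_fastq_like and the file names are made up for this illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the file looks like unwrapped 4-line FASTQ.
is_fastq_like() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { bad = 1 }   # header line
        NR % 4 == 2 { seqlen = length($0) }                    # sequence line
        NR % 4 == 3 && substr($0, 1, 1) != "+" { bad = 1 }   # separator line
        NR % 4 == 0 && length($0) != seqlen { bad = 1 }        # quality line
        END { exit (bad || NR == 0 || NR % 4 != 0) }           # 0 means valid
    ' "$1"
}

tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' > "$tmpdir/reads.fasta"   # FASTQ content, .fasta name
printf '>seq1\nACGT\n' > "$tmpdir/seq.fastq"               # FASTA content, .fastq name

for f in "$tmpdir/reads.fasta" "$tmpdir/seq.fastq"; do
    if is_fastq_like "$f"; then
        echo "$(basename "$f"): content looks like FASTQ"
    else
        echo "$(basename "$f"): content does not look like FASTQ"
    fi
done
rm -r "$tmpdir"
```

The check is deliberately loose: it does not inspect the sequence alphabet or the quality encoding, which a proper validator would.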
wiki/curtain_bin_vs_txt.1532722010.txt.gz · Last modified: 2018/07/27 14:06 by david