# Computational biology at CSU

## DSCI 512: RNA-seq

### Questions?


# What is text format versus binary?

Isn't EVERYTHING binary?

The answer is in character encodings. Everything on a computer is binary, but programs that read text know how to map the numbers stored in a text file to specific characters.

# How is text binary?

Well, programs that read text really read binary, but know to display it as text.
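Here is a minimal sketch of that idea in Python: the same file is opened once in binary mode and once in text mode. The throwaway file and the sample text `Hi` are assumptions made up for illustration.

```python
import os
import tempfile

# Write two raw bytes to a throwaway file (path and contents are illustrative).
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(bytes([72, 105]))
    path = handle.name

# Binary mode hands back the raw bytes, which are just small integers.
with open(path, "rb") as handle:
    raw = handle.read()

# Text mode reads the same bytes and maps them to characters via an encoding.
with open(path, "r", encoding="ascii") as handle:
    text = handle.read()

os.remove(path)

print(list(raw))  # [72, 105]
print(text)       # Hi
```

Nothing about the file changed between the two reads; only the interpretation did.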

Let's say that I wanted to know which binary numbers represent the text `Hey Mom!`.

That representation is called a character encoding, and we are using ASCII.

From the ASCII table, I know that 72, 101, and 121 encode the letters `H`, `e`, and `y`, respectively.

We can use Python to verify that:

```
$ python3 -c 'print(chr(72), chr(101), chr(121))'
H e y
```
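Going the other way is just as easy. As a small sketch, the built-in `ord` is the inverse of `chr`, and encoding a string as ASCII yields the same numbers as raw bytes:

```python
# ord() maps a character back to its code number (the inverse of chr()).
codes = [ord(character) for character in "Hey"]
print(codes)  # [72, 101, 121]

# Encoding the string as ASCII produces bytes holding those same numbers.
encoded = "Hey".encode("ascii")
print(list(encoded))  # [72, 101, 121]
```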

But how is this binary? Numbers like 72 are decimal, that is, they are written in base 10. The base 2 representation (the binary string) can also be obtained with Python, this time using string formatting.

```
$ python3 -c 'print("{0:b}".format(72))'
1001000
```
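To see why those seven digits mean 72, here is a short sketch that pads the number to a full 8-bit byte, converts it back, and adds up the place values by hand. The variable names are only for illustration.

```python
# Pad to 8 digits, since one byte holds 8 bits.
bits = format(72, "08b")
print(bits)  # 01001000

# int() with base 2 turns the binary string back into a number.
print(int(bits, 2))  # 72

# The same conversion by hand: each 1 contributes a power of two (64 + 8 here).
total = sum(int(bit) * 2 ** power for power, bit in enumerate(reversed(bits)))
print(total)  # 72
```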

The string of 1s and 0s isn't very useful to humans, but humans have other utilities for working with binary. One of them is “hexdump”. Let's use hexdump to inspect the bytes behind the character string `Hey Mom!`:

```
$ echo -ne 'Hey Mom!' | hexdump -C
00000000  48 65 79 20 4d 6f 6d 21                           |Hey Mom!|
00000008
```
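Explaining the output:

- `00000000` is the starting position (the offset of the first byte on that line, in hexadecimal).
- `48 65 79 20 4d 6f 6d 21` are the hexadecimal numbers corresponding to `H e y (sp) M o m !`.
- `|Hey Mom!|` is the same set of bytes displayed as ASCII characters.
- `00000008` is the end position (the total number of bytes read, in hexadecimal).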

It's easier to read binary numbers as hexadecimal (base 16), because each hexadecimal digit stands for exactly four bits, so that's how hexdump displays them.
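As a rough sketch of how hexadecimal lines up with binary, the loop below prints each byte of `Hey Mom!` as two hex digits next to its eight bits, split into two groups of four. The `rows` list is just a helper for this example.

```python
rows = []
for byte in b"Hey Mom!":
    hex_digits = format(byte, "02x")  # two hex digits per byte
    bit_string = format(byte, "08b")  # eight bits per byte
    rows.append((hex_digits, bit_string))
    # Each hex digit corresponds to one group of four bits.
    print(hex_digits, bit_string[:4], bit_string[4:])
```

The first line printed is `48 0100 1000`: the hex digit `4` is `0100` and the hex digit `8` is `1000`.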

Here's a way to show the decimals with the hexdump command:

```
$ echo -ne 'Hey Mom!' | hexdump -v -e '4/1 "%d " " =|"' -e '4/1 "%_p|" "\n"'
72 101 121 32 =|H|e|y| |
77 111 109 33 =|M|o|m|!|
```
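For comparison, here is a sketch of roughly the same four-bytes-per-line layout in plain Python. It is not a full replacement for hexdump: unlike `%_p`, it does not substitute a dot for non-printable bytes, which is fine for this all-printable string.

```python
data = b"Hey Mom!"
lines = []

# Walk through the data four bytes at a time.
for start in range(0, len(data), 4):
    chunk = data[start:start + 4]
    decimals = " ".join(str(byte) for byte in chunk)     # decimal value of each byte
    characters = "|".join(chr(byte) for byte in chunk)   # the character each byte encodes
    lines.append(f"{decimals} =|{characters}|")

for line in lines:
    print(line)
```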

Now we know how to write “Mom!” with the numbers 77, 111, 109, and 33. Let's use the Python built-in function “chr” again, this time to write the full string.

```
$ python3 -c 'print(chr(72), chr(101), chr(121), chr(32))'
H e y  
$ python3 -c 'print(chr(77), chr(111), chr(109), chr(33))'
M o m !
$ python3 -c 'print(chr(72), chr(101), chr(121), chr(32), chr(77), chr(111), chr(109), chr(33))'
H e y   M o m !
```

TA-DA!
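The extra spaces in that output come from `print`, which separates its arguments with a space. As a small sketch, building the string from the list of numbers directly gives the text without them:

```python
codes = [72, 101, 121, 32, 77, 111, 109, 33]

# Pack the numbers into bytes, then decode those bytes as ASCII text.
message = bytes(codes).decode("ascii")
print(message)  # Hey Mom!

# The same result, one chr() call per number, joined with nothing in between.
joined = "".join(chr(code) for code in codes)
print(joined)  # Hey Mom!
```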