
What is text format versus binary?

Isn't EVERYTHING binary?

The answer lies in character encodings. Everything on a computer is binary, but programs that read text know how to map the numbers stored in a text file to characters. That mapping is called a character encoding; ASCII and UTF-8 are common ones.
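Here is a quick illustration in Python 3. Stored numbers only become the right text when they are read with the right encoding. UTF-8 and Latin-1 are two real encodings that disagree about bytes above 127.

```python
# Three numbers as they might be stored in a file
raw = bytes([72, 101, 121])
print(raw.decode("ascii"))     # Hey

# UTF-8 stores "é" as two bytes
data = "é".encode("utf-8")
print(list(data))              # [195, 169]
print(data.decode("utf-8"))    # é

# Reading the same two bytes as Latin-1 maps each byte to its own character
print(data.decode("latin-1"))  # Ã©
```

The bytes never change. Only the encoding used to interpret them changes.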

Even text is binary

Everything in a modern computer is stored in binary, and each program that reads binary data must know how to interpret it. That's true for programs that display text, like Microsoft Word, nano, and even the terminal itself.

Take the following excerpt from Hamlet, written the way a program receives it: one 8-bit binary number per character.

What the text viewer sees

01000001 01101100 01100001 01110011 00100001 00100000 01110000 01101111 01101111 01110010
00100000 01011001 01101111 01110010 01101001 01100011 01101011 00101110 00100000 01001001
00100000 01101011 01101110 01100101 01110111 00100000 01101000 01101001 01101101 00101100
00100000 01001000 01101111 01110010 01100001 01110100 01101001 01101111 00111011 00100000
01100001 00100000 01100110 01100101 01101100 01101100 01101111 01110111 00100000 01101111
01100110 00100000 01101001 01101110 01100110 01101001 01101110 01101001 01110100 01100101
00100000 01101010 01100101 01110011 01110100 00101100 00100000 01101111 01100110 00100000
01101101 01101111 01110011 01110100 00100000 01100101 01111000 01100011 01100101 01101100
01101100 01100101 01101110 01110100 00100000 01100110 01100001 01101110 01100011 01111001
00111011 00100000 01101000 01100101 00100000 01101000 01100001 01110100 01101000 00100000
01100010 01101111 01110010 01101110 01100101 00100000 01101101 01100101 00100000 01101111
01101110 00100000 01101000 01101001 01110011 00100000 01100010 01100001 01100011 01101011
00100000 01100001 00100000 01110100 01101000 01101111 01110101 01110011 01100001 01101110
01100100 00100000 01110100 01101001 01101101 01100101 01110011 00111011 00100000 01100001
01101110 01100100 00100000 01101110 01101111 01110111 00101100 00100000 01101000 01101111
01110111 00100000 01100001 01100010 01101000 01101111 01110010 01110010 01100101 01100100
00100000 01101001 01101110 00100000 01101101 01111001 00100000 01101001 01101101 01100001
01100111 01101001 01101110 01100001 01110100 01101001 01101111 01101110 00100000 01101001
01110100 00100000
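To check what the bits above spell, here is a minimal Python sketch. It assumes each space-separated group is one 8-bit ASCII character, and `bits_to_text` is a hypothetical helper name rather than a standard function.

```python
def bits_to_text(bits: str) -> str:
    # Read each 8-digit group as a base-2 number, then map it to its character
    return "".join(chr(int(group, 2)) for group in bits.split())

# The first row of the excerpt; paste in more rows to decode the rest
first_line = ("01000001 01101100 01100001 01110011 00100001 "
              "00100000 01110000 01101111 01101111 01110010")
print(bits_to_text(first_line))  # Alas! poor
```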

How do those 1's and 0's relate to letters? Each character is stored as a number: the “H” in “Horatio”, for example, is 72. We normally write numbers like 72 in decimal, that is, base 10. The base 2 representation (the binary string) can be produced with Python's string formatting:

$ python3 -c 'print("{0:b}".format(72))'
1001000
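The excerpt writes every character as a full 8 digits, but the output above has only seven. In Python, the `"08b"` format spec pads with leading zeros to match. `int()` with base 2 converts the string back into a number.

```python
n = ord("H")                 # the character H is stored as 72
bits = format(n, "08b")      # pad to 8 binary digits, like the excerpt
print(n, bits)               # 72 01001000
print(int(bits, 2))          # 72  (parse the base-2 string back into a number)
```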

A long string of 1's and 0's isn't very readable for humans, so there are utilities that present binary data in friendlier ways. One of them is hexdump. Let's use hexdump to inspect the character string “Hey Mom!”:

$ echo -ne 'Hey Mom!' | hexdump -C
00000000  48 65 79 20 4d 6f 6d 21                           |Hey Mom!|
00000008
output                     'splain
00000000                   offset (position) of the first byte on the line
48 65 79 20 4d 6f 6d 21    hexadecimal values of H, e, y, (space), M, o, m, !
|Hey Mom!|                 the same bytes shown as ASCII characters
00000008                   offset after the last byte, i.e. the total length (8 bytes)
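To see how each column is produced, here is a rough Python imitation of `hexdump -C`. It is a simplified sketch with some assumptions:

* `mini_hexdump` is a made-up name.
* It treats bytes 32 to 126 as printable ASCII and shows anything else as a dot.
* It skips the real tool's habit of collapsing repeated lines into a `*`.

```python
def mini_hexdump(data: bytes) -> str:
    """Roughly imitate `hexdump -C`: offset, hex bytes, then the printable text."""
    lines = []
    for offset in range(0, len(data), 16):
        chunk = data[offset:offset + 16]
        # Two groups of up to 8 bytes, each byte as two hex digits
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Printable ASCII characters as-is, everything else as '.'
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {left:<23}  {right:<23}  |{text}|")
    # Final line: offset just past the last byte, i.e. the total length
    lines.append(f"{len(data):08x}")
    return "\n".join(lines)

print(mini_hexdump(b"Hey Mom!"))
```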

Binary numbers are easier to read as hexadecimal (base 16). Each hex digit stands for exactly four bits, so every byte is written as exactly two hex digits. That's why hexdump displays bytes in hex by default.
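The link between binary and hex is easy to see side by side. Each group of four bits becomes one hex digit. This short Python sketch prints each letter of “Hey” with its decimal, binary, and hex values. `byte_views` is a hypothetical helper name.

```python
def byte_views(ch: str) -> tuple:
    n = ord(ch)                           # the character's number
    return ch, n, f"{n:08b}", f"{n:02x}"  # decimal, 8-bit binary, 2-digit hex

for ch in "Hey":
    print(*byte_views(ch))
# H 72 01001000 48
# e 101 01100101 65
# y 121 01111001 79
```

In the first row, `0100` is 4 and `1000` is 8, which gives hex `48`.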

Here's a way to make hexdump print the decimal values instead, four bytes per line, next to the characters they encode:

$ echo -ne 'Hey Mom!' | hexdump -v -e '4/1 "%d " " =|"' -e '4/1 "%_p|" "\n"'
72 101 121 32 =|H|e|y| |
77 111 109 33 =|M|o|m|!|

Now we know how to write “Mom!” with the numbers 77, 111, 109, and 33. Let's use Python's built-in function chr(), which turns a number into the character it encodes, to write out the full string.

$ python3 -c 'print(chr(72), chr(101), chr(121), chr(32))'
H e y
$ python3 -c 'print(chr(77), chr(111), chr(109), chr(33))'
M o m !
$ python3 -c 'print(chr(72), chr(101), chr(121), chr(32), chr(77), chr(111), chr(109), chr(33))'
H e y   M o m !
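print() puts a space between its arguments, which is why the letters come out spaced apart. Joining the characters with no separator gives the exact string back. ord() is the inverse of chr() and turns each character back into its number.

```python
codes = [72, 101, 121, 32, 77, 111, 109, 33]
message = "".join(chr(n) for n in codes)   # no separator between characters
print(message)                             # Hey Mom!

# ord() undoes chr(): each character back to its number
print([ord(c) for c in message])           # [72, 101, 121, 32, 77, 111, 109, 33]
```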

TA-DA!

wiki/curtain_bin_vs_txt.1532678081.txt.gz · Last modified: 2018/07/27 01:54 by david