Before we can start to manipulate genomics datasets, we need to know how to obtain the data. Even if you are generating your own data, you will almost certainly need reference genome sequences and genome annotation information to guide your analyses. A common challenge for biologists is obtaining these reference data and reshaping them so they can be used in an analysis. Linux command-line tools handle both tasks well, from downloading files directly onto a server to reformatting large text files.
Some Common Features of a Computer/Server
Accessing servers via the command line (ssh and sftp)
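ssh: secure shell, an encrypted login to a remote computer.
$ ssh username@server_address
sftp: secure file transfer over the same kind of encrypted connection. At the sftp> prompt, get downloads a file and put uploads one.
$ sftp username@server_address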
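ssh can also run a single command on the server and print its output locally, without opening an interactive session. The sketch below wraps that in a small shell function. The name `remote_run` is hypothetical and not a standard tool. Port 2222 is only an example of a non-default port.

```shell
# Interactive login, as above; type "exit" to end the session:
#   ssh username@server_address
# If the server listens on a non-default port (2222 is only an example):
#   ssh -p 2222 username@server_address

# Hypothetical helper: run one command on the server and print its
# output locally, without starting an interactive shell.
remote_run() {
  # $1 = username@server_address; the remaining arguments form the remote command
  target=$1
  shift
  ssh "$target" "$@"
}

# Usage (placeholder address): check free disk space on the server
#   remote_run username@server_address df -h
```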
A common method for downloading data from a public server (ftp)
ftp: file transfer protocol.
$ ftp ftp.someaddress.org
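Public servers usually accept an anonymous login, using the user name `anonymous` with an e-mail address conventionally given as the password. Once you are connected, `cd` moves between directories, `ls` lists files, `get` downloads a file and `bye` closes the connection. These steps can also be scripted. The sketch below assumes a command-line ftp client that accepts `-n` (no automatic login), as the common BSD-derived clients do. The function name `ftp_get` and the placeholder password `anonymous@` are illustrative assumptions.

```shell
# Hypothetical helper: download one file from an anonymous FTP server
# without typing commands at the ftp> prompt.
# -n stops ftp from logging in automatically, so the "user" line below
# supplies the anonymous login instead.
# "binary" transfers the file byte for byte, which matters for
# compressed files such as .gz archives.
ftp_get() {
  # $1 = server, $2 = remote directory, $3 = file name
  ftp -n "$1" <<EOF
user anonymous anonymous@
binary
cd $2
get $3
bye
EOF
}

# Usage (placeholder directory and file name):
#   ftp_get ftp.someaddress.org /path/on/server file_name
```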
Transferring data between computers/servers (scp)
scp: secure copy protocol.
$ scp /path/to/local/file username@hostname:/path/to/remote/file
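scp works in either direction. Put the `username@hostname:` prefix on whichever path is on the remote machine. `-r` copies directories recursively. A non-default port is given with a capital `-P`, whereas ssh itself uses lowercase `-p`. The helper below is a hypothetical convenience wrapper for downloading, not a standard command.

```shell
# Download: remote file to the current local directory (".")
#   scp username@hostname:/path/to/remote/file .
# Upload a whole directory; -r copies it recursively
#   scp -r /path/to/local/dir username@hostname:/path/to/remote/
# Non-default port (2222 is only an example); note the capital P
#   scp -P 2222 /path/to/local/file username@hostname:/path/to/remote/file

# Hypothetical helper: pull a remote file or directory into a local directory.
pull_from_server() {
  # $1 = username@hostname, $2 = remote path, $3 = local destination (default: ".")
  scp -r "$1:$2" "${3:-.}"
}
```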
Run multiple sessions in one window or keep processes running on a remote server without a job scheduler (GNU screen)
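The following sketch shows a typical screen workflow on a remote server. `-S` names a session, `-ls` lists sessions, `-r` reattaches to one, and `-d -m` starts a session that is already detached. The session name `align`, the job `./my_long_job.sh` and the helper `run_detached` are only examples.

```shell
# Start a named session (the name "align" is only an example):
#   screen -S align
# Start a long-running job inside it, then detach with Ctrl-a followed by d.
# The job keeps running after you log out of the server.
# List your sessions:      screen -ls
# Reattach to a session:   screen -r align
# Close a session:         type "exit" inside it

# Hypothetical helper: start a command in a new, already-detached, named
# session. The session ends when the command finishes.
run_detached() {
  # $1 = session name; the remaining arguments form the command to run
  name=$1
  shift
  screen -dmS "$name" "$@"
}

# Usage (hypothetical job script):
#   run_detached align ./my_long_job.sh
```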
*See the UNIX cheat sheet and exercises for additional information.