Code+Repository

=Helpful Code for Various Computing and Programming Platforms= toc

//**Rename columns in a table**//
- Use "Edit Dataset" button in SQLShare - After executing the query, press the "Update Dataset" button below the query window. - Use the "as" function.

Example: code format="sql" SELECT [table1].column1 AS newcolumnname1, [table1].column2 as newcolumnname2 FROM [table1] code Code explanation:

SELECT: Standard SQL to identify the source information you want to work with. [table1]: This is the name of your source table. .column1: Specifies which column in [table1] to act upon. Note: This is the actual name of the column (e.g. if the column was named "SeqID", you'd have .SeqID; not .column1) as: Instructs SQL to rename specified column from [table1] newcolumnname1: Assigns your desired name to the column. ,: Allows you to separate multiple commands. Can be used extensively to allow for multiple commands in a single line. In this example, it is used to rename two columns in [table1]. FROM: Standard SQL to identify location of source data.

Practical application of the "as" function: Example: code format="sql" SELECT GeneCGS+CDS_CGs as Total_CGs FROM Stats_mRNA_CGs

code Code explanation:

This **//adds, mathematically, the numerical content of two columns//** (GeneCGS and CDS_CGs) that exist in the Stats_mRNA_CGs table and names the resulting column "Total_CGs".

//Count the number of terms in a column//
code format="sql" SELECT column_name, count(column_name) FROM your_table_name GROUP BY column_name code Code explanation:

SELECT: Standard SQL to identify the source information you want to work with. column_name: Name of the column with the terms in it that you would like to count. count(column_name): For our purposes, this will count the number of same terms in the specified column. However, the function is actually counting the number of rows in the table that contain the same terms in the specified column. FROM: Standard SQL to identify location source of data. GROUP BY: Groups the data by the terms in the specified column.

//Conventional Joining of tables (Keeping everything in first table)//
code format="sql" SELECT * FROM [sr320@washington.edu].[table_QPX_Experiment_UniqueReads_simple.txt] Left JOIN [sr320@washington.edu].[QPX_CLC_experiment_Kal_tab.txt] ON [sr320@washington.edu].[table_QPX_Experiment_UniqueReads_simple.txt]."Feature ID"=[sr320@washington.edu].[QPX_CLC_experiment_Kal_tab.txt]."Feature ID"

code Notes:

Single file upload code format="python" python singleupload.py sr320@washington.edu "API key" /Users/sr320/Desktop/J_contignetworks.txt Jaketable code Notes: "API key" is a unique alpha-numeric code. more at http://escience.washington.edu/get-help-now/sqlshare-python-api
 * SQL Share Python API**

Restarting CLC Server (bigfish)
code cd /Volumes/bigfishRAID/common/CLCGenomicsServer ./CLCGenomicsServer status ./CLCGenomicsServer start ./CLCGenomicsServer status code

Count the number of lines in a file
code $wc -l < /path/to/file code

Explanation: wc = Terminal command for "word count" -l = Flag to specify line count < = Redirect your file as the input data source for the "wc" command

4/23/2014 user:kubu4

Count the number of instances/occurrences of items in a column in a file
code $cut -f1 | sort | uniq -c code

Explanation:

cut - Command to apply options to the designated field (i.e. column). -f1 - -f1 = specifies which field (i.e. column) to work on. Change the number to specify your desired column. Or, you can even specify a range of columns to work on (e.g. -f2-6) "|" - Pipe which sends the results of the previous command ("cut" in this example) to another command. sort - Sorts the info from the "cut" command in ascending order. "|" - Piper which sends the results of the previous command ("sort" in this example) to another command. uniq -c - Counts the number of occurrences of each unique "word" in the specified column(s). (Note: Technically "uniq -c" is counting the occurrence of each unique //line// in the specified column, not the actual "words." It is this reason (that uniq occurrences are tallied by line) that you use the cut command to have the uniq command focus on a single column from the source file.

6/23/2014 user:kubu4

Moving multiple files from multiple subdirectories up one level into a single folder.
Example code is below which will move any files with the extension ".sra" in any subdirectories to a new, single directory. An explanation of how it works after the code. code find /volume1/web/trilobite/Crassostrea_gigas_HTSdata/SRP014 -iname '*.sra' -exec mv '{}' /volume1/web/trilobite/Crassostrea_gigas_HTSdata/SRP014 \; code

Directory listing of all files in a folder on a website
Add a file called ".htaccess" to any directory where a public directory listing is desired. To the .htaccess file, add " Options +Indexes".

This can be accomplished by navigating to the desired directory and entering the following into Terminal:

echo "Options +Indexes" > .htaccess

Code explanation: echo - Normally used to print some text to the screen. But, used in this context, the text that is supposed to be printed to the screen gets written to our file. "Options +Indexes" - The text contained in the quotations will be written to the desired file. "Options +Indexes" is an instruction for Apache (a common server software) to display an index of all files in the directory. > - This is the write-to-file instruction. .htaccess - The name of the file that you want your echo command to write to. If the file name does not already exist, this file will be created.

Download all files from a website to a location on Eagle
This example is how to download all high-throughput sequencing files from UW HTGU directly to Eagle (NOTE: This will not work natively in Windows, as Windows doesn't have a SSH client. Download and install Cygwin or OpenSSH to gain SSH functionality in Windows). 1. SSH into Eagle - Open Terminal code ssh root@Eagle.IP.address code - The first time you use this, you will be asked if you trust this IP address. Enter: yes - Enter admin account password when prompted

2. Change to desired working directory under "volume1" directory. Starting directory is "root/". Will have to go up a level and then change to "volume1" directory.

3. Download all files in the specific directory on the UW HTGU server. - Note: Your file location will differ from the example code listed below.

Example (download all data from a folder from HTGU, requiring a username and password): code wget -r --http-user=HTGUusername --http-password=HTGUpassword --no-check-certificate --reject="index.html*" --secure-protocol=auto -np -e robots=off --wait=2 https://data.htseq.org/UW_White/FCC1FB1/ code Code explanation:

wget: Program on Eagle that allows remote file retrieval -r: Instructs wget to download recursively, meaning all folders/files underneath the final directory in the URL will be downloaded. --http-user=: Allows you to enter the user name to log on to the HTGU server --http-password=: Allows you to enter the password to log on to the HTGU server --no-check-certificate: Tells wget not to check the server's certificate, although it still seems to check it and let you know that the certificates don't match. --reject="index.html*": Tells wget to skip any index.html files. We don't need these. --secure-protocol=auto: Since the HTGU site is https (the 's' stands for "secure"), this tells wget to automatically determine which type of security is in use. -np: Stands for "no parent." This tells wget to only go down through directories and not go up. This prevents wget from potentially downloading every single file above the specified directory. -e robots=off: Tells wget to ignore the "robots.txt" file on the HTGU server. The "robots.txt" file prevents webcrawlers from using too many server resources, which wget normally respects. --wait=2: Tells wget to wait 2 seconds between each file download request. This is simply to ease the load on the HTGU servers, since we're initiating such a large download task. https://data.htseq.org/HTGUusername/yourdatafolder/: Tells wget where the data is that you want to download.

Note: Despite specifying recursive downloading, directory structure is NOT maintained. All files are downloaded, but are not contained in any folders.

- 3/20/2013 user:kubu4

Copy Files from/to Locations Using SSH (Terminal)
(NOTE: This will not work natively in Windows, as Windows doesn't have a SSH client. Download and install Cygwin or OpenSSH to gain SSH functionality in Windows).

Copy something from this system to some other system: code scp /path/to/local/file username@hostname:/path/to/remote/file

code

Copy something from some system to some other system: code scp username1@hostname1:/path/to/file username2@hostname2:/path/to/other/file

code

Copy something from another system to this system: code scp username@hostname:/path/to/remote/file /path/to/local/file code

- 12/31/2013 user:kubu4

Set Up SSH Keys for More Secure SSH to Hummingbird(Terminal)
For any SSH connection to Humminbird (or any server), SSH keys should be used.

Instructions are for Macintosh OS X 1. Generate your SSH keys. In Terminal, paste the following: code ssh-keygen -t rsa code Press "Enter" at both prompts for a passphrase. Note: You can certainly enter a passphrase if desired, but the passphrase is only needed in instances where someone has obtained control of your laptop/desktop that you use to connect to Hummingbird. By not entering a passphrase, you obtain the luxury of being able to use SSH without the need to type a password.

2. Obtain a script to simplify copying your SSH key to the server. In Terminal, paste the following: code sudo curl https://raw.github.com/beautifulcode/ssh-copy-id-for-OSX/master/ssh-copy-id.sh -o /usr/local/bin/ssh-copy-id -k code

3. Change permissions on the "ssh-copy-id" script to make it executable. In Terminal, paste the following: code sudo chmod +x /usr/local/bin/ssh-copy-id code

4. Copy your SSH key to the server using the "ssh-copy-id" script. In Terminal, paste the following: code ssh-copy-id username@Hummingbird's.IP.address code You'll be prompted to enter your password for your Hummingbird user account.

5. Test it out. In Terminal, connect to Humminbird using SSH.

-3/17/2014 user:kubu4

View md5 checksum hash (in OSX Terminal)
A md5 checksum hash is a unique "fingerprint" for each and every file. This hash (which is a string of characters) is used to compare file integrity after transferring a file to a new location (or checking file integrity before and after compression). If the file has successfully transferred (i.e. has not been corrupted), the md5 hash will be the same as it was before the file was transferred.

Before copying/moving a file (particularly large files): code $md5 /original/path/to/file code

After copying/moving your file: code $md5 /new/path/to/file code

Compare the hashes provided from both before and after moving the files. If the hashes do NOT match, then the file was modified during transfer and the file should be transferred again.

6/18/2014 user:kubu4

Create and compare md5 checksum hash files (in OSX Terminal)
The following step is performed twice: once before transferring files and again once the files are finished transferring. It is likely helpful to transfer your pre-transfer md5 file when you transfer your files. After transferring, make a new md5 file (with a different name than your pre-transfer md5 file!) from the files that were transferred.

Create a file containing md5 hashes of multiple files code $md5 file1 file2 file3 >> YourDesiredFileName.md5 code

Code explanation:

md5 - Program to generate hashes file1 file2 file3 - The files you want hashes generated from. Can be any number of files. ">>" - Directs the output of md5 to new file and appends each file's hash to the new file. YourDesiredFileName.md5 - The file name you want to give. Keep the ".md5" suffix for easy file identification.

Now, compare pre- and post-transfer md5 hash files code $diff PreTransferMD5hashes.md5 PostTransferMD5hases.md5 code

Code explanation: diff - Compares files and lists any lines in the files that have differences PreTransferMD5hashes.md5 - The name of the md5 file containing the hashes of the files pre-transfer. PostTransferMD5hases.md5 - The name of the md5 file containing the hashes of the files post-transfer.

If there are any differences in the hashes in the two md5 files, those files were corrupted during transfer and should be retransferred.

6/20/2014 user:kubu4

Create gzipped tarball file (in Terminal)
This will combine any number of files/directories into a single file (tar) and then compress (gzip) that file to make it smaller. code $tar -cvzf YourDesiredFileName.tar.gz file1 file2 file3 code

Code explanation: tar - Name of program that combines all files/directories listed into a single file (.tar) -cvzf - c = create a new archive, v = Verbose to list files being added to archive, z = gzip, f = file archive YourDesiredFileName.tar.gz - User's desired file name for the tarball gzip file that will be created. Keep the ".tar.gz" part of the file name! file1 file2 file3 - The list of files and/or directories that you want to include in the tarball. This is not limited to only 3 files/directories.

6/18/2014 user:kubu4

Decompress and extract tarball gzipped file (in Terminal)
This will perform two functions simulataneously - Unzip (decompress) the tarball file - Extract the contents of the tarball file. code $tar -zxvf filename.tar.gz code

Code explanation: tar - Name of program to use. -zxvf - z = Work on gzip compression automatically when reading compressed files, x = Extract archive, v = Verbose to list files extracted, f = Read the archive file specified filename.tar.gz - name of the tarball gzipped file that you want to extract

6/20/2014 user:kubu4

View files contained in a gzipped tarball file (in Terminal)
code $tar -tvf filename.tar.gz code

Code explanation: tar - Name of program to use. -tvf - t = Table of file contents, v = Verbose to list files in archive, f = Read the archive file specified filename.tar.gz - Name of the gzipped tarball file whose content you'd like to view.

6/20/2014 user:kubu4

Append (add) files to an existing tarball file (in Terminal)
code $tar -rf path/to/tarball/file.tar path/to/file/to/add code

NOTE: If the tarball is gzipped, you will have to unzip the tarball, append your file, then re-zip the tarball. 9/4/2014 user:kubu4

Decompress (unzip) a gzipped file (in Terminal)
code $gunzip file.gz code 9/4/2014 user:kubu4

Compress (gzip) a file (in Terminal)
code $gzip filename code 9/4/2014 user:kubu4

**Count Number of Sequences in a FASTA File**
code $awk '/>/ { count++ } END { print count }' path/to/InputFastFile.fasta code

Code explanation: code />/ code - Tells awk to look for whatever regular expression is between the two slashes. In this instance, a greater-than symbol (which identifies a new sequence entry in a FASTA file).

code { count++ } code - Tells awk to assign the variable called "count" to the results of counting ("++") all of the greater-than symbols it encounters

code END{ print count } code - Tells awk to print the variable called "count", which is the total number of ">" in our input FASTA file

NOTE: If the sequence IDs of your FASTA file also contain ">", the one-liner listed above will report an inaccurate result. However, as FASTA formatting goes, a ">" should never appear in the file other than as a sequence ID indicator.

3/31/2014 user:kubu4

**Filter FASTA File by Minimum Sequence Length**
Just change the number "200" in the code below to your desired minimum sequence length. code $ awk '!/^>/ { next } { getline seq } length(seq) >= 200 { print $0 "\n" seq }' /path/to/InputFastaFile.fasta > /path/to/OutputFastaFile.fasta code Code explanation: code !/^>/ {next } code – If a line (i.e. record) begins with a “>”, go to the next line (record). The "!" tells awk to skip the regular expression that immediatley follows. The "^" tells awk that the regular expression it's looking for should only match if it's at the beginning of a line. Finally, the regular expression we're looking for in this example is the ">", which denotes the sequence descriptor portion of FASTA files. code { getline seq } code – “getline” reads the next record and assigns the entire record to a variable called “seq” code length(seq) >=200 code – If the length of the “seq” record is greater than, or equal to, 200 then… code {print $0 "\n" seq>} code – Print all records ($0) of the variable “seq” in the file that matched our conditions, each on a new line (“\n”)

3/31/2014 user:kubu4

__**sed One-Liners**__
This is the same as Find and Replace in programs like Microsoft Word, but will run on files that are too large to be opened with such programs.
 * Basic Substitutions in a File**

code $sed 's/text_you_want_replaced/replacement_text/g' path/to/source_file > path/to/destination_file code

Code explanation:

- "s" invokes the substitute command in sed - "g" tells sed to apply the substitute command globally on each line. Without the "g" argument, sed will only apply your substitute command to the first instance it encounters on each line that contains your "text_you_want_replaced".

NOTE: This command is case sensitive and will only match EXACLTY what you enter in as the "text_you_want_replaced". If you need more flexibility (e.g. having sed find variations like upper- and lowercase text), it exists, but is a bit too in-depth to go into here.

5/20/2014 user:kubu4

Audible Notification of Job Completion (Terminal)
After typing in a Terminal command (and before hitting 'Enter') add the following:
 * say "Text that you want to have the computer read aloud to notify you that the job is finished"

Code explanation: say - The speech-to-text command for Terminal "text between quotes" - This is the text that you want to be read when your job is finished.
 * - Semicolon separates the "say" command from the previous command.

- 11/8/2013 user:kubu4

Show Hidden Files in Mac OSX 10.8, 10.7 and 10.6 (Terminal)
In Terminal enter code defaults write com.apple.Finder AppleShowAllFiles True code

After pressing enter, the Finder needs to be restarted for the changes to take place. In Terminal type: code killall Finder code

To revert back, simple repeat from the first step, but change the word "True" in the first entry to "False". Remember to follow up with the killall Finder command to restart the Finder! user:kubu4 3/5/2014

Start IPython in a web browser (via Terminal or Command Line):
code ipython notebook code Or, type this URL into a browser: http://127.0.0.1:8888 user:kubu4 10/25/2013

Start a specific notebook in a web browser (via Terminal or Command Line):
code ipython notebook /path/to_your/notebook.ipynb code NOTE: By saving your notebook files to Dropbox, you will have access to all your notebooks on any computer with Dropbox installed. NOTE: Default notebook save location (on Mac OSX) is: /Users///Username///Notebook_name.ipynb. user:kubu4 10/25/2013

Run IPython on a remote computer (e.g. Hummingbird)
The following instructions are for Linux or OS X

Tunneling into Hummingbird 1. Open Terminal. 2. Enter code ssh -N -L localhost:8000:localhost:7000 srlab@Hummingbird.IP.address code Note: If it's your first SSH session into Hummingbird, you may be prompted with a message. Answer "yes" to the message. After that, you may have to start the SSH session again. Note: If you get a message that port 8000 is already in use, feel free to change to a different port (i.e. any number greater than 8000, but less than 65000) and try again. 3. Enter the srlab password on Hummingbird Note: You can change "srlab" to your personal login on Hummingbird, if you so desire. 4. Terminal should look like it's active at this point (i.e. it does NOT return the command prompt)

The above steps have created a tunnel into Hummingbird.

Start IPython on Hummingbird 1. Open a new Terminal window. Do NOT close the previous window. We need it open to maintain that tunnel. 2. Enter code ssh srlab@Hummingbird.IP.address code 3. Enter the srlab password for Hummingbird Note: You should login to the same account as specified in the tunnel instructions above (e.g. srlab and srlab). Note: You should notice that your command prompt now has "Hummingbird" in it, indicating you are now inside Hummingbird. Note: You may wish to change directories to the directory on Hummingbird that you want IPython to launch from. 4. Enter code ipython notebook --no-browser --port=7000 code

The above steps have remotely logged you into Hummingbird and starts IPython without the browser.

Start IPython in your browser. 1. Open a browser (e.g. Chrome, Firefox, etc) on your computer. 2. In the URL bar, enter code localhost:8000 code Note: The port listed here (8000) needs to match the first localhost port entered in Step 2 of the tunneling procedure described above.

The web interface for IPython should open, but since you specified port 7000 (that's what the ':7000' means in the above step), your browser tunnels into Hummingbird and launches it from Hummingbird. You are now using IPython on Hummingbird!

Mount your Eagle web folder for easy access to your files when using IPython remotely on Hummingbird.
1. Open a new Terminal window and SSH into Hummingbird (see instructions above).

2. Make the directory where you want your Eagle web in the /Volumes directory. Example: code mkdir /Volumes/Eagle code

3. Mount your Eagle web folder. Example: code mount_afp -i "afp://username@eagle.fish.washington.edu/web/web_folder_name" /Volumes/Eagle code Explanation: mount_afp = Specifies drive mounting protocol -i = Prompts for user password after connection to server "afp://username@server.address/share/folder" = Specifies username, address and which folder to connect to. /Volumes/Eagle = This should match the directory you created in Step #2.

4. Enter password when prompted.

user:kubu4 3/14/2014

Change the default save location for IPython notebooks (via Terminal)
There are two choices for this:

Option #1 - Change to your desired directory each time you start IPython
Start Terminal. Change to the desired directory. Launch IPython.

NOTE: To speed this up, you could just create a shortcut (a.k.a. a symbolic/sym link to your desired directory) so that instead of typing in a very long directory path each time, you could just type: cd yoursymlink

Option #2 - Modify your IPython configuration file to indicate the desired location at IPython startup automatically
Locate your IPython installation location. - In Terminal, type: code ipython locate code The output of that will tell you where IPython is installed. Navigate to that directory via the Finder (NOTE: You will have to turn on the ability to view Hidden Files. See the section on this page above for instructions on how to do that.). Browse to the "Profile Default" folder. If you do NOT have a file called "ipython_notebook_config.py" then type the following in Terminal:

code ipython profile create code In your "ipython_notebook_config.py" file, find and edit (open the file with a text editor) these lines (hint: they're at the very end of the "ipython_notebook_config.py" file): code code AND code code Change them to: code c.NotebookManager.notebook_dir=u'/path/to/your/notebooks' code AND code c.NotebookManager.notebook_dir=u'/path/to/your/notebooks' code where you enter in the directory path where you want your notebooks saved by default in the '/path/to/your/notebooks'. Also note that the # symbol needs to be removed before the lines that begin with "c."
 * 1) The directory to use for notebooks.
 * 2) c.NotebookManager.notebook_dir=u'/path/to/your/notebooks'
 * 1) The directory to use for notebooks.
 * 2) c.FileNotebookManager.notebook_dir = u'/Users/Sam'
 * 1) The directory to use for notebooks.
 * 1) The directory to use for notebooks.

Save the file.

Now, when you launch IPython, it should start you in the directory you specified in the "ipython_notebook_config.py" file! user:kubu4 3/5/2104

Execute non-native (i.e. downloaded from the internet) command line programs (via Terminal):
Option #1 example code !/Applications/sratoolkit.2.3.3-4-mac64/bin/fastq-dump.2.3.3-3 /Users/Sam/Desktop/SRR039705.sra code Explanation: This option requires two elements. The first required element is the "!". This is equivalent to "./" to run a program in Terminal. The second required element is to specify the full path of the program you want to use ( !/Applications/sratoolkit.2.3.3-4-mac64/bin/fastq-dump.2.3.3-3). This is different than using Terminal, where you can navigate to the file directory and run the program while in the directory.

Option #2 example code !fastq-dump.2.3.3-3 /Users/Sam/Desktop/SRR039705.sra code Explanation: This option requires that the command line program be located in (on Mac OSX): /usr/local/bin See the section immediately below this for instructions on how to copy a program to /usr/local/bin NOTE: In order to enable this, some permissions need to be changed on /usr/local/bin. Using command line, enter: sudo chmod 4755 /usr/local/bin This will now allow you to add additional users and grant them read/write permissions via the Finder's "Get Info" menu. In Finder, use the "Go" > "Go to folder..." and enter /usr/local/bin. Right-click on the "bin" folder, "Get Info" and then you can add your user account and enable read/write permissions (you must be an administrator to make these changes).

Copy program for a command line program to /usr/local/bin (via Terminal):
code !cp -i /path/to_your_file/your_file /usr/local/bin code Explanation: ! - Equivalent to "./" in Terminal. Runs the target command cp - The copy command in Terminal -i - If destination contains duplicate file name, it will prompt you before overwriting the existing file /path/to_your_file/your_file - The path to the file you want to copy /usr/local/bin - The path to the destination folder of your file user:kubu4 10/25/2013

Run code in a cell:
Type your code in a cell and press Ctrl+Enter - This will execute the code in just that cell and the cursor will remain in that cell. user:kubu4 10/8/2013

Run code in a cell and insert a new cell below:
Type your code in a cell and press Shift+Enter - This will execute the code in just that cell, insert a cell below and put your cursor in the new cell. user:kubu4 10/8/2013

Install a python module/package:
After downloading your desired module/package, navigate to the module's folder in Terminal or Command Line and type: code python setup.py install --user code - This will initiate python and run the setup file (setup.py) for the module. Don't know why "--user" is necessary, but it seems to be. user:kubu4 10/8/2013 Hummingbird.IP.address