Day 3 Materials
Downloads
- Download Slides
- Download Exercise files
References
- Interpreting Error Messages
- Basic File Input/Output
- Note that this reference uses Python 2, so print appears as a statement without parentheses rather than as the print() function used in Python 3.
Day 3 solutions can be found here
Part I: Functions
- Write a function to compute the GC content of a DNA sequence. The function should accept a single argument, the DNA sequence, and return the GC percentage. Test your function with the nucleotide sequence "AGCTATAGCATAGC".
- Write a function that calculates the percentage of a given nucleotide in a DNA sequence. The function should accept two arguments, the nucleotide of interest and the DNA sequence, and return the nucleotide percentage. Test your function with the nucleotide sequence "AGCTATAGCATAGC".
- Write a function that calculates the percentage of each nucleotide in a given DNA sequence. The function should accept a single argument, the DNA sequence, and return a dictionary containing key:value pairs of nucleotide:percentage. You can assume that the provided sequence contains only A, C, G, T. Test your function with the nucleotide sequence "AGCTATAGCATAGC".
- Write a function to guess whether a provided sequence is DNA or protein. For this task, assume that any sequence composed of % A, C, G, T is a DNA sequence. Test your function with the following two sequences:
    "AGCTATGCATACGAGCATAGC"
    "AGIILLCPKLKKQWTATWCAGCATADSARCVLMKGC"
- Modify the previous function to ignore all ambiguous characters in its calculations. Use this list of ambiguous characters for this task: ambig = ["B", "J", "N", "O", "X", "Z"]. Test your function with the following sequence: "APAPPPKKLRATNNYPOPPBXXXXXNTYGCTATLMQASDFTDTCATAGC"
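One possible sketch for the first three Part I exercises (not the official solutions) tallies characters with str.count. The function names gc_content, nucleotide_percentage and nucleotide_composition are hypothetical, and returning percentages on a 0-100 scale after upper-casing the input are assumptions rather than requirements of the exercises.

```python
def gc_content(seq):
    """Return the percentage (0-100) of G and C characters in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)


def nucleotide_percentage(nucleotide, seq):
    """Return the percentage (0-100) of `nucleotide` in `seq`."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100 * seq.count(nucleotide.upper()) / len(seq)


def nucleotide_composition(seq):
    """Return a dict mapping each of A, C, G, T to its percentage in `seq`.

    Assumes the sequence contains only A, C, G and T, as the exercise allows.
    """
    return {nt: nucleotide_percentage(nt, seq) for nt in "ACGT"}


if __name__ == "__main__":
    seq = "AGCTATAGCATAGC"
    print(gc_content(seq))
    print(nucleotide_percentage("A", seq))
    print(nucleotide_composition(seq))
```

The test sequence "AGCTATAGCATAGC" has 14 bases (5 A, 3 C, 3 G, 3 T), so its GC content is 6/14, roughly 42.9%.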
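The DNA-or-protein guess can be sketched the same way. Because the cutoff percentage is the exercise's choice, this version takes it as a threshold argument; the function name guess_sequence_type and its ignore parameter are hypothetical. Passing the ambig list as ignore drops those characters before the percentage is computed, which covers the final Part I exercise.

```python
def guess_sequence_type(seq, threshold, ignore=()):
    """Return "DNA" if at least `threshold` percent of the counted characters
    are A, C, G or T, otherwise return "protein".

    Characters listed in `ignore` (e.g. ambiguity codes) are removed before
    the percentage is computed.
    """
    seq = seq.upper()
    counted = [ch for ch in seq if ch not in ignore]
    if not counted:
        raise ValueError("no characters left to classify")
    acgt = sum(1 for ch in counted if ch in "ACGT")
    percent = 100 * acgt / len(counted)
    return "DNA" if percent >= threshold else "protein"


if __name__ == "__main__":
    ambig = ["B", "J", "N", "O", "X", "Z"]
    cutoff = 90  # hypothetical cutoff; substitute the one the exercise asks for
    print(guess_sequence_type("AGCTATGCATACGAGCATAGC", cutoff))
    print(guess_sequence_type("AGIILLCPKLKKQWTATWCAGCATADSARCVLMKGC", cutoff))
    print(guess_sequence_type(
        "APAPPPKKLRATNNYPOPPBXXXXXNTYGCTATLMQASDFTDTCATAGC", cutoff, ignore=ambig))
```

The first test sequence consists only of A, C, G and T, so it is classified as DNA for any cutoff up to 100. The second has 17 of 36 characters in ACGT (about 47%), and the third, after removing its 10 ambiguous characters, has 19 of 39 (about 49%), so both come out as protein for any cutoff above 50%.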
Part II: File Input/Output
Files used in these exercises can be downloaded from the course website. Be sure to write your scripts in the same directory as these files!
- Open the file file1.txt in read-mode and print its contents to the screen. Use the .read() method, which returns the entire contents of the file as a single string. Perform this task twice: once using open and close, and once using a with statement.
- Open the file file1.txt in read-mode, and save all lines in this file to a list using the .readlines() method. Write a new file called upper_file1.txt which contains the same contents as file1.txt but in upper-case. Try to do this task using a single for-loop.
- Open the newly created file upper_file1.txt in read-mode. Loop over the file lines without using .read() or .readlines(), and print out each line as you loop.
- Modify the previous for-loop to only print out lines in upper_file1.txt which contain at least five letter E's (i.e. ≥ 5).
- You should notice 20 files named file1.txt, file2.txt, ..., file20.txt. Write a for-loop to open each of these files (Hint: use the range() function to loop over file names). For each file, print each line that contains more than 25 characters.
- Write another for-loop over the same 20 files. For each file, create a second file named fileX_odd.txt (where X = 1-20) which contains only the odd-numbered lines from the original file. For this, use a counter in the for-loop that goes over file lines (this will count the line numbers), but be careful: remember that Python indexing starts from 0, but the first line is technically line #1!
- Convert our zoo-keeper dictionaries into a comma-separated file with the header animal,vore,food, where each row contains the corresponding information, e.g. lion,carnivore,meat. Perform this task with a single for-loop.
    category = {"lion": "carnivore", "gazelle": "herbivore", "anteater": "insectivore", "alligator": "homovore", "hedgehog": "insectivore", "cow": "herbivore", "tiger": "carnivore", "orangutan": "frugivore"}
    feed = {"carnivore": "meat", "herbivore": "grass", "frugivore": "mangos", "homovore": "visitors", "insectivore": "termites"}
- Create a second zoo-keeper file by converting the CSV into a tab-separated file. Perform this task by reading in the CSV and replacing commas with tabs, where a tab is written as the string "\t". For example, the following snippet replaces all commas with tabs in a string called mystring:
    mystring2 = mystring.replace(",", "\t")
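A sketch of the line-by-line Part II tasks follows. The helper names (print_file_two_ways, write_upper_copy, lines_with_min_count, write_odd_lines, process_numbered_files) are hypothetical, and the demo writes throwaway files to a temporary directory rather than touching the course files. Since len(line) also counts the trailing newline, the sketch strips it before comparing against 25 characters; treating the newline that way is an assumption.

```python
import os
import tempfile


def print_file_two_ways(path):
    """Print a file's contents with .read(), first via open/close, then via with."""
    infile = open(path)
    print(infile.read())
    infile.close()

    # A with statement closes the file automatically, even if an error occurs.
    with open(path) as infile:
        print(infile.read())


def write_upper_copy(src, dst):
    """Write an upper-case copy of src to dst using a single for-loop."""
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile.readlines():
            outfile.write(line.upper())


def lines_with_min_count(path, letter="E", minimum=5):
    """Return the lines of path that contain `letter` at least `minimum` times."""
    with open(path) as infile:
        # Iterating over the file object yields one line at a time,
        # without calling .read() or .readlines().
        return [line for line in infile if line.count(letter) >= minimum]


def write_odd_lines(src, dst):
    """Copy only the odd-numbered lines (1st, 3rd, 5th, ...) of src to dst."""
    with open(src) as infile, open(dst, "w") as outfile:
        line_number = 0
        for line in infile:
            line_number += 1  # start counting at 1: the first line is line #1
            if line_number % 2 == 1:
                outfile.write(line)


def process_numbered_files(directory, n_files=20):
    """For file1.txt ... fileN.txt in directory, print lines longer than
    25 characters and write the odd-numbered lines to fileX_odd.txt."""
    for i in range(1, n_files + 1):
        src = os.path.join(directory, f"file{i}.txt")
        with open(src) as infile:
            for line in infile:
                if len(line.rstrip("\n")) > 25:  # don't count the newline
                    print(line, end="")
        write_odd_lines(src, os.path.join(directory, f"file{i}_odd.txt"))


if __name__ == "__main__":
    # Throwaway demo files in a temporary directory, not the course files.
    with tempfile.TemporaryDirectory() as tmp:
        for i in (1, 2):
            with open(os.path.join(tmp, f"file{i}.txt"), "w") as f:
                f.write("short\nthis line is longer than twenty-five\nEEEEE here\n")
        print_file_two_ways(os.path.join(tmp, "file1.txt"))
        write_upper_copy(os.path.join(tmp, "file1.txt"),
                         os.path.join(tmp, "upper_file1.txt"))
        print(lines_with_min_count(os.path.join(tmp, "upper_file1.txt")))
        process_numbered_files(tmp, n_files=2)
```

With the course files in the current directory, process_numbered_files(".") would loop over all 20 of them.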
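For the zoo-keeper files, a single loop over category.items() can look up each animal's food in feed. This sketch relies on dictionaries keeping insertion order (guaranteed since Python 3.7), so rows come out in the order the animals are listed; the file names zoo.csv and zoo.tsv and the helpers write_zoo_csv and csv_to_tsv are hypothetical.

```python
import os
import tempfile

category = {"lion": "carnivore", "gazelle": "herbivore", "anteater": "insectivore",
            "alligator": "homovore", "hedgehog": "insectivore", "cow": "herbivore",
            "tiger": "carnivore", "orangutan": "frugivore"}
feed = {"carnivore": "meat", "herbivore": "grass", "frugivore": "mangos",
        "homovore": "visitors", "insectivore": "termites"}


def write_zoo_csv(path, category, feed):
    """Write a header plus one animal,vore,food row per animal using a single for-loop."""
    with open(path, "w") as outfile:
        outfile.write("animal,vore,food\n")
        for animal, vore in category.items():
            outfile.write(f"{animal},{vore},{feed[vore]}\n")


def csv_to_tsv(src, dst):
    """Copy src to dst, replacing every comma with a tab character."""
    with open(src) as infile, open(dst, "w") as outfile:
        for line in infile:
            outfile.write(line.replace(",", "\t"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "zoo.csv")  # hypothetical file names
        tsv_path = os.path.join(tmp, "zoo.tsv")
        write_zoo_csv(csv_path, category, feed)
        csv_to_tsv(csv_path, tsv_path)
        with open(tsv_path) as f:
            print(f.read())
```

A plain str.replace works here because no field contains a comma; for CSV data with quoted fields that contain commas, the standard csv module is the safer choice.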