The phrase “control flow” refers to the fact that constructs like for-loops change the flow of program execution away from the simple top-to-bottom order. There are several other types of control flow we will cover, two of which are “conditional” in nature.
If-statements allow us to conditionally execute a block of code, depending on a variable referencing a Boolean True or False, or more commonly a condition that returns a Boolean True or False. The syntax is fairly simple, described here with an example.
All the lines from the starting if to the last line in an else: block are part of the same logical construct. Such a construct must have exactly one if conditional block, may have one or more elif blocks (they are optional), and may have exactly one catchall else block at the end (also optional). Each conditional is evaluated in order: the first one that evaluates to True will run, and the rest will be skipped. If an else block is present, it will run as a “last resort” if none of the earlier if or elif blocks did.
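A minimal sketch of the construct just described. The sequence, the variable names, and the length thresholds are illustrative assumptions rather than part of the original example.

```python
# Hypothetical sequence and thresholds, chosen only for illustration.
seq = "ACGTAGAC"
num_bases = len(seq)

if num_bases < 5:
    category = "short"     # runs only if the first test is True
elif num_bases < 10:
    category = "medium"    # tested only because the if-test was False
else:
    category = "long"      # catchall when no earlier test was True

print("Sequence is " + category)
```

With this 8-base sequence the first test fails and the elif test succeeds, so the script prints Sequence is medium and the else block is skipped.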
Just like with for-loops, if-statements can be nested inside of other blocks, and other blocks can occur inside if-statement blocks. Also just like for-loops, Python uses indentation (standard practice is four spaces per indentation level) to indicate block structure, so you will get an error if you needlessly indent (without a corresponding control flow line like else) or forget to indent when an indentation is expected.
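As a sketch of an if-statement nested inside a for-loop, the following counts sequences shorter than eight bases against the rest. The list contents, variable names, and cutoff are assumptions, picked so that the counts agree with the output quoted next.

```python
# Hypothetical input list; the names and the cutoff of 8 are assumptions.
seqs = ["ACGTAGAC", "CAGTAGAGC", "GACGA", "CGATAGG"]
num_short = 0
num_long = 0

for seq in seqs:
    # The if-statement is nested one indentation level inside the for-loop.
    if len(seq) < 8:
        num_short = num_short + 1
    else:
        num_long = num_long + 1

print("Number short:", num_short, "number long:", num_long)
```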
The above code would print Number short: 2 number long: 2.
While-loops are less often used (depending on the nature of the programming being done), but they can be invaluable in certain situations and are a basic part of most programming languages. A while-loop executes a block of code so long as a condition remains True. Note that if the condition never becomes False, the block will execute over and over in an “infinite loop.” If the condition is False to begin with, however, the block is skipped entirely.
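A small sketch of a while-loop driven by a counter. The variable name and the limit of 4 are assumptions, chosen to be consistent with the output described next.

```python
counter = 0
while counter < 4:
    print("Counter is now: " + str(counter))
    # Without this update the condition would stay True forever.
    counter = counter + 1

print("Done. Counter ends with: " + str(counter))
```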
The above will print Counter is now: 0, followed by Counter is now: 1, Counter is now: 2, Counter is now: 3, and finally Done. Counter ends with: 4. As with using a for-loop over a range of integers, we can also use a while-loop to access specific indices within a string or list.
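A sketch of index-based access with a while-loop. The sequence ACAGT is an assumed example, picked to agree with the output described next.

```python
seq = "ACAGT"       # hypothetical sequence
base_index = 0
bases_seen = []

while base_index < len(seq):
    base = seq[base_index]          # look up one position by integer index
    print("base is: " + base)
    bases_seen.append(base)
    base_index = base_index + 1     # move on to the next position

print("Done")
```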
The above code will print base is: A, then base is: C, and so on, ending with base is: T before finally printing Done. While-loops can thus be used as a type of fine-grained for-loop, to iterate over elements of a string (or list), in turn using simple integer indexes and [] syntax. While the above example adds 1 to base_index on each iteration, it could just as easily add some other number. Adding 3 would cause it to print every third base, for example.
Boolean Operators and Connectives
We’ve already seen one type of Boolean comparison, <, which returns whether the value of its left-hand side is less than the value of its right-hand side. There are a number of others:
|<=|less than or equal to?|
|>=|greater than or equal to?|
|!=|not equal to?|
These comparisons work for floats, integers, and even strings and lists. Sorting on strings and lists is done in lexicographic order: an ordering wherein item A is less than item B if the first element of A is less than the first element of B; in the case of a tie, the second element is considered, and so on. If in this process we run out of elements for comparison, the shorter one is considered smaller. When applied to strings, lexicographic order corresponds to the familiar alphabetical order.
Let’s print the sorted version of a Python list of strings, which does its sorting using the comparisons above. Note that numeric digits are considered to be “less than” alphabetic characters, and uppercase letters come before lowercase letters.
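A minimal sketch of such a sort, using the built-in sorted() function. The list contents are hypothetical and were chosen to include a digit, mixed case, and one string that is a prefix of another.

```python
# Hypothetical list of strings.
seqs = ["TAG", "act", "AC", "1AC", "ACT", "T"]
sorted_seqs = sorted(seqs)
print(sorted_seqs)   # ['1AC', 'AC', 'ACT', 'T', 'TAG', 'act']
```

The digit-initial string comes first, AC precedes ACT because it runs out of characters first, and the lowercase string comes last.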
Boolean connectives (and, or, and not) let us combine conditionals that return True or False into more complex statements that also return Boolean types.
These can be grouped with parentheses, and usually should be to avoid confusion, especially when more than one test follows a not.
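A short sketch of Python's and, or, and not connectives, including a case where parentheses change the result. The sequence and the two tests are hypothetical.

```python
seq = "ACTAGTAGA"    # hypothetical sequence
gc_count = seq.count("G") + seq.count("C")

long_enough = len(seq) >= 6           # True: 9 bases
gc_rich = gc_count > len(seq) / 2     # False: 3 of 9 bases are G or C

both = long_enough and gc_rich        # True only if both sides are True
either = long_enough or gc_rich       # True if at least one side is True

# not binds more tightly than or, so these two differ:
ungrouped = not gc_rich or long_enough     # (not gc_rich) or long_enough
grouped = not (gc_rich or long_enough)     # negates the whole test

print(both, either, ungrouped, grouped)    # False True True False
```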
Finally, note that generally each side of an and or or should result in only True or False. The expression a == 3 or a == 7 has the correct form, whereas a == 3 or 7 does not. (In fact, 7 in the latter context will be taken to mean True, and so a == 3 or 7 will always be considered True.)
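The difference between the two forms can be checked directly. In this sketch the value 5 is an arbitrary choice; note that the malformed expression actually evaluates to 7 when the comparison fails, and 7 counts as true in a condition.

```python
a = 5   # arbitrary value that is neither 3 nor 7

correct = (a == 3 or a == 7)    # two comparisons joined by or
mistaken = (a == 3 or 7)        # a comparison joined with the bare number 7

print(correct)          # False
print(mistaken)         # 7
print(bool(mistaken))   # True: any nonzero number is treated as true
```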
Notice the similarity between = and ==, and yet they have dramatically different meanings: the former is the variable assignment operator, while the latter is an equality test. Accidentally using one where the other is meant is an easy way to produce erroneous code. A line such as count == 1 won’t initialize count to 1; rather, it will return whether it already is 1 (or result in an error if count doesn’t exist as a variable at that point). The reverse mistake is harder to make, as Python does not allow variable assignment with = in if-statement and while-loop conditions.
Suppose the intent is to determine whether the length of seq is a multiple of 3 (as determined by the result of len(seq)%3 using the modulus operator), with that result stored in remainder. Writing the test as if remainder = 0: is a mistake; the if-statement should actually be if remainder == 0:. In many languages, this would be a difficult-to-find bug (remainder would be assigned 0, and the condition would then be decided by that assigned value rather than by a comparison). In Python, the result is an error: SyntaxError: invalid syntax.
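A sketch contrasting the correct test with the mistaken one. Because a script containing the mistaken line would not run at all, the bad line is held in a string and passed to the built-in compile() function so the error can be observed. The sequence and variable names are assumptions.

```python
seq = "ACTAGTAGA"              # hypothetical 9-base sequence
remainder = len(seq) % 3

# Correct form: == tests for equality.
is_multiple = False
if remainder == 0:
    is_multiple = True
    print("Number of bases is a multiple of 3")

# Mistaken form: = is assignment, which Python rejects in a condition.
bad_source = "if remainder = 0:\n    pass\n"
got_syntax_error = False
try:
    compile(bad_source, "<example>", "exec")
except SyntaxError:
    got_syntax_error = True
    print("SyntaxError raised for: if remainder = 0:")
```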
Still, a certain class of dangerous comparison is common to nearly every language, Python included: the comparison of two float types for equality or inequality.
Although integers can be represented exactly in binary arithmetic (e.g., 751 in binary is represented exactly as 1011101111), floating-point numbers can only be represented approximately. This shouldn’t be an entirely unfamiliar concept; for example, we might decide to keep only four decimal places when doing calculations on pencil and paper, working with 1/3 as 0.3333. The trouble is that these rounding errors can compound in difficult-to-predict ways. If we decide to compute (1/3)*(1/3)/(1/3) as 0.3333*0.3333/0.3333, working left to right we’d start with 0.3333*0.3333 = 0.11108889, cut off after four decimal places as 0.1110. This is then divided by 0.3333 and cut off again to produce an answer of 0.3330. So, even though we know that (1/3)*(1/3)/(1/3) == 1/3, our calculation process would call them unequal because it ultimately tests 0.3330 against 0.3333!
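The pencil-and-paper calculation can be replayed with Python's decimal module. This sketch assumes extra digits are simply dropped after four decimal places at each step, which is what reproduces the 0.1110 and 0.3330 figures; the helper name keep4 is made up for the illustration.

```python
from decimal import Decimal, ROUND_DOWN

def keep4(value):
    """Hypothetical helper: drop everything past four decimal places."""
    return value.quantize(Decimal("0.0001"), rounding=ROUND_DOWN)

third = keep4(Decimal(1) / Decimal(3))    # 0.3333
product = keep4(third * third)            # 0.11108889 becomes 0.1110
result = keep4(product / third)           # 0.33303... becomes 0.3330

print(third, product, result)
print(result == third)                    # False
```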
Modern computers have many more digits of precision (about 15 decimal digits at a minimum, in most cases), but the problem remains the same. Worse, numbers that don’t need rounding in our Base-10 arithmetic system do require rounding in the computer’s Base-2 system. Consider 0.2, which in binary is 0.001100110011, and so on. Indeed, 0.2 * 0.2 / 0.2 == 0.2 results in False.
While comparing floats with <, >, <=, and >= is usually safe (within extremely small margins of error), comparison of floats with == and != usually indicates a misunderstanding of how floating-point numbers work. In practice, we’d determine if two floating-point values are sufficiently similar, within some defined margin of error.
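One way to express "sufficiently similar" is sketched below, first with an explicit margin and then with math.isclose() from the standard library. The margin of 1e-9 is an arbitrary assumption; an appropriate tolerance depends on the calculation at hand.

```python
import math

result = 0.2 * 0.2 / 0.2

exactly_equal = (result == 0.2)                   # False: rounding error
within_margin = abs(result - 0.2) < 1e-9          # True: explicit margin
close_enough = math.isclose(result, 0.2)          # True: default relative tolerance

print(exactly_equal, within_margin, close_enough)
```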
Counting Stop Codons
As an example of using conditional control flow, we’ll consider the file seq.txt, which contains a single DNA string on the first line. We wish to count the number of potential stop codons "TAG", "TAA", or "TGA" that occur in the sequence (on the forward strand only, for this example).
Our strategy will be as follows: First, we’ll need to open the file and read the sequence from the first line. We’ll need to keep a counter of the number of stop codons that we see; this counter will start at zero and we’ll add one to it for each "TAG", "TAA", or "TGA" subsequence we see. To find these three possibilities, we can use a for-loop and string slicing to inspect every 3bp subsequence of the sequence; the 3bp sequence at index 0 would be seq[0:3], the one at position 1 would be seq[1:4], and so on.
We must be careful not to attempt to read a subsequence that doesn’t occur in the sequence. If seq = "AGAGAT", there are only four possible 3bp sequences, and attempting to select the one starting at index 4, seq[4:7], would not produce a full codon (Python slicing silently returns the shorter string "AT" rather than raising an error). To make matters worse, string indexing starts at 0, and there are also the peculiarities of the inclusive/exclusive nature of [] slicing and the range() function.
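This is worth checking directly: in Python a slice that runs past the end of a string is cut short rather than raising an exception, whereas indexing a single position past the end does raise IndexError. A brief sketch using the six-base example:

```python
seq = "AGAGAT"

last_full = seq[3:6]     # "GAT": the fourth and final complete 3bp window
too_far = seq[4:7]       # "AT": the slice is silently shortened

print(last_full, too_far, len(too_far))

index_failed = False
try:
    seq[6]               # a single index past the end is an error
except IndexError:
    index_failed = True
print(index_failed)
```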
To help out, let’s draw a picture of an example sequence, with various indices and 3bp subsequences we’d like to look at annotated.
Given a starting index index, the 3bp subsequence is defined as seq[index:index + 3]. For the sequence above, len(seq) would return 15. The first start index we are interested in is 0, while the last start index we want to include is 12, or len(seq) - 3. If we were to use the range() function to return a list of start indices we are interested in, we would use range(0, len(seq) - 3 + 1), where the + 1 accounts for the fact that range() includes the first index, but is exclusive in the last index.
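The index arithmetic can be checked with a short sketch. The 15-base sequence here is hypothetical and stands in for the annotated example.

```python
seq = "ATGCATGCATGCAAT"    # hypothetical 15-base sequence

start_indices = list(range(0, len(seq) - 3 + 1))
print(len(seq))                                # 15
print(start_indices[0], start_indices[-1])     # 0 12

# Every start index yields a full 3bp window.
windows = [seq[index:index + 3] for index in start_indices]
print(windows[0], windows[-1])                 # ATG AAT
```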
We should also remember to run .strip() on the read sequence, as we don’t want the inclusion of any trailing newline character to throw off the sequence length.
Notice in the code below (which can be found in the file stop_count_seq.py) the commented-out print line.
While coding, we used this line to print each codon to be sure that 3bp subsequences were reliably being considered, especially the first and last in
AAT). This is an important part of the debugging process because it is easy to make small “off-by-one” errors with this type of code. When satisfied with the solution, we simply commented out the print statement.
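A sketch of how such a program might look, following the strategy described above. It is not the contents of stop_count_seq.py: the function names are made up, and because seq.txt is not available here the demonstration runs on a hypothetical in-memory sequence instead of the file.

```python
def read_first_sequence(path):
    """Read the first line of a file and strip the trailing newline."""
    with open(path) as handle:
        return handle.readline().strip()

def count_stop_codons(seq):
    """Count TAG, TAA, and TGA over every overlapping 3bp window."""
    stop_counter = 0
    for index in range(0, len(seq) - 3 + 1):
        codon = seq[index:index + 3]
        # print(codon)    # debugging aid: uncomment to inspect each window
        if codon == "TAG" or codon == "TAA" or codon == "TGA":
            stop_counter = stop_counter + 1
    return stop_counter

# With the real data one would call: seq = read_first_sequence("seq.txt")
seq = "ATAGTGATAAT"       # hypothetical stand-in sequence
stop_count = count_stop_codons(seq)
print(stop_count)         # 3
```

For the stand-in sequence, the windows TAG, TGA, and TAA at start indices 1, 4, and 7 are counted.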
For windowing tasks like this, it can occasionally be easier to access the indices with a while-loop.
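A sketch of the same count written with a while-loop, again on a hypothetical sequence rather than the contents of seq.txt. The variable names are assumptions.

```python
seq = "ATAGTGATAAT"       # hypothetical stand-in sequence
stop_counter = 0
index = 0

# The last valid start index is len(seq) - 3.
while index <= len(seq) - 3:
    codon = seq[index:index + 3]
    if codon == "TAG" or codon == "TAA" or codon == "TGA":
        stop_counter = stop_counter + 1
    index = index + 1     # advance the window by one base

print(stop_counter)       # 3
```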
If we wished to access nonoverlapping codons, we could use index = index + 3 rather than index = index + 1 without any other changes to the code. Similarly, if we wished to inspect 5bp windows, we could replace instances of 3 with 5 (or use a variable for the window size).
- The molecular weight of a single-stranded DNA string (in g/mol) is (count of "A")*313.21 + (count of "T")*304.2 + (count of "C")*289.18 + (count of "G")*329.21 – 61.96 (to account for removal of one phosphate and the addition of a hydroxyl on the single strand).
Write code that prints the total molecular weight for the sequence in the file seq.txt. The result should be 21483.8. Call your program
- The file seqs.txt contains a number of sequences, one sequence per line. Write a new Python program that prints the molecular weight of each on a new line. For example:
You may wish to use substantial parts of the answer for question 1 inside of a loop of some kind. Call your program
- The file ids_seqs.txt contains the same sequences as seqs.txt; however, this file also contains sequence IDs, with one ID per line followed by a tab character (
Call your program
Because the tab characters cause the output to align differently depending on the length of the ID string, you may wish to run the output through the command line tool
-t option, which automatically formats tab-separated input.
- Create a modified version of the program in question 3 of chapter 15, “Collections and Looping, Part 1: Lists and
for,” so that it also identifies the locations of subsequences that are self-overlapping. For example,
"at positions 1, 3, 5, 7, and 14.