Now for a perl exercise.

 

It is relatively easy to write a little program in perl to manipulate sequence files. I had no programmin experience until I bought "Beginning Perl for Bioinformatics". We will use some of the exercises you can download from the book to lead you through the programming steps in learning perl.

Ususally your program will read in sequence information from a file, or that you type in, and perform some manipulation or search, then output the result to a file or to the screen.

The data is usually sored and manipulated in placeholders called variables, which begin with a "$", like "$DNA1"

We will be running these programs on mudbug. You will need to connect to it using telnet.

Open the command prompt window

 

Type: telnet mudbug.bio.uno.edu

enter the username and password you used before.

when mudbug is connected type:

cd perlex

Here's a really simple program from the book that puts 2 sequences together:

#!/usr/bin/perl -w
   # Example 4-2 Concatenating DNA
   # Store two DNA fragments into two variables called $DNA1 and $DNA2
   $DNA1 = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
   $DNA2 = 'ATAGTGCCGTGAGAGTGATGTAGTA';
   # Print the DNA onto the screen
   print "Here are the original two DNA fragments:\n\n";
   print $DNA1, "\n";
   print $DNA2, "\n\n";
   # Concatenate the DNA fragments into a third variable and print them
   # Using "string interpolation"
   $DNA3 = "$DNA1$DNA2";
   print "Here is the concatenation of the first two fragments (version 1):\n\n";
   print "$DNA3\n\n";
   # An alternative way using the "dot operator":
   # Concatenate the DNA fragments into a third variable and print them
   $DNA3 = $DNA1 . $DNA2;
   print "Here is the concatenation of the first two fragments (version 2):\n\n";
   print "$DNA3\n\n";
   # Print the same thing without using the variable $DNA3
   print "Here is the concatenation of the first two fragments (version 3):\n\n";
   print $DNA1, $DNA2, "\n";
   exit;

to run it type:

perl example4-2.pl

The program has comment lines in it that begin with # and guide you through the logic. The results are printed to the screen using the print statements. In this case the sequence information is included in the program. You would have to change the program every time you had a new sequence.

How about doing something a little more useful. Often times you will find that you want to look at the reverse comlement of a sequence (for example, if your gene happens to be transcribed right-to-left). Here's a simple program that tries to do this in a couple of ways:

#!/usr/bin/perl -w
   # Example 4-4 Calculating the reverse complement of a strand of DNA
   # The DNA
   $DNA = 'ACGGGAGGACGGGAAAATTACTACGGCATTAGC';
   # Print the DNA onto the screen
   print "Here is the starting DNA:\n\n";
   print "$DNA\n\n";
   # Calculate the reverse complement
   # Warning: this attempt will fail!
   #
   # First, copy the DNA into new variable $revcom 
   # (short for REVerse COMplement)
   # Notice that variable names can use lowercase letters like
   # "revcom" as well as uppercase like "DNA". In fact,
   # lowercase is more common.
   #
   # It doesn't matter if we first reverse the string and then
   # do the complementation; or if we first do the complementation
   # and then reverse the string. Same result each time.
   # So when we make the copy we'll do the reverse in the same statement.
   #
   $revcom = reverse $DNA;
   #
   # Next substitute all bases by their complements,
   # A->T, T->A, G->C, C->G
   #
   $revcom =~ s/A/T/g;
   $revcom =~ s/T/A/g;
   $revcom =~ s/G/C/g;
   $revcom =~ s/C/G/g;
   # Print the reverse complement DNA onto the screen
   print "Here is the reverse complement DNA:\n\n";
   print "$revcom\n";
   #
   # Oh-oh, that didn't work right!
   # Our reverse complement should have all the bases in it, since the
   # original DNA had all the bases-but ours only has A and G!
   #
   # Do you see why?
   #
   # The problem is that the first two substitute commands above change
   # all the A's to T's (so there are no A's) and then all the
   # T's to A's (so all the original A's and T's are all now A's).
   # Same thing happens to the G's and C's all turning into G's.
   #
   print "\nThat was a bad algorithm, and the reverse complement was wrong!\n";
   print "Try again ... \n\n";
   # Make a new copy of the DNA (see why we saved the original?)
   $revcom = reverse $DNA;
   # See the text for a discussion of tr///
   $revcom =~ tr/ACGTacgt/TGCAtgca/;
   # Print the reverse complement DNA onto the screen
   print "Here is the reverse complement DNA:\n\n";
   print "$revcom\n";
   print "\nThis time it worked!\n\n";
   exit;

 

to run it type:

perl example4-2.pl

To do something really useful, the programs need to be a bit more complicated. They are often broken into smaller subprograms that do parts of the job. This time we will read in a fasta file of DNA, get the reverse complement, and translate it.

to run it type:

perl revandtrans.pl

to see the program file, click this link: revandtrans.pl