Data Mining

Strategies for getting what you need from databases

Often you can obtain large amounts of the data you want, but not in a format that is useful to you. How can you sort through what you have to get what you need?

Sometimes the answer is to use something already familiar to you. For example:

Microsoft Excel

Recall the Report file that was generated when you did the sequence assembly project.

There was a lot of information embedded in the file (names of sequence reads, number of bases used from each, which Contig each was added to, etc.):


Scanned against repeated sequences:
Time to do prepass: 0:1:10
Preassembly Elapsed Time 0:0:0
Construction parameters:
Match Size 12
Maximum Added Gap Length in Contig 20
Maximum Added Gap Length in Sequence 20
Minimum Match Percentage 80
Maximum Register Shift Difference 20
Lastgroup Considered 2
Gap Penalty 0.20
Gap Length Penalty 0.70
Consensus Threshold 75
Entering 1762 sequences on 12/13/01, 11:34 AM
CREATING NEW contig 1: from 58-2-11-C07.r.1.scf(1>933)
ENTERING 58-2-2-C10.f.1.scf(4>942) in Contig 1: percent match 96
ENTERING 58-2-11-H02.r.1.scf(1>894) in Contig 1: percent match 96
CREATING NEW contig 2: from 58-2-6-F06.r.1.scf(1>795)
ENTERING 58-2-2-B07.f.1.scf(8>1014) in Contig 2: percent match 96
ENTERING 58-2-12-A01.r.1.scf(1>897) in Contig 1: percent match 96
ENTERING 58-2-10-H06.r.1.scf(1>783) in Contig 2: percent match 96
ENTERING 58-2-12-F02.r.1.scf(6>889) in Contig 1: percent match 96
ENTERING 58-2-3-D11.r.1.scf(1>693) in Contig 1: percent match 97
ENTERING 58-2-7-E01.r.1.scf(1>793) in Contig 1: percent match 97
CREATING NEW contig 3: from 58-2-12-B05.r.1.scf(1>952)
ENTERING 58-2-3-B07.f.1.scf(8>680) in Contig 1: percent match 99
CREATING NEW contig 4: from 58-2-10-C04.r.1.scf(1>769)
CREATING NEW contig 5: from 58-2-1-D09.r.1.scf(8>800)
ENTERING 58-2-11-E01.r.1.scf(1>964) in Contig 3: percent match 85
ENTERING 58-2-4-C01.r.1.scf(3>784) in Contig 1: percent match 95


In looking at the Strategy view of the entire dataset, I noticed that there were many plasmids that had been sequenced with one primer, but not the other, shown as black reads below:

How many were there and what are all their names so we can ask to have them sequenced?

First I modified a copy of the assembly report, for ease in reading into Excel. This is often not necessary, but in this case, I wanted to sort through the names of the reads to look for paired reads. This meant breaking up the name with tab characters:

For this I used a text editor's find-and-replace command to insert tabs (\t) within the file names. Now I can import the file into Excel (just open it and Excel will guide you through) and analyze the names of the files. Different parts of the filename are now different fields (columns) within the file. Let's look at what's in a name:

58-2-4-C01.r.1.scf

All the files begin with "58-2-", the name of the sequencing project. Next is "4-", the microtiter plate number for the plasmid. "C01" is the well location on the plate. "r" is a reaction using the reverse primer; all plasmids should be sequenced once each with the forward (f) and reverse (r) primers. "1" is the read number, since some reactions are repeated.
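The same splitting can be sketched on the command line. This is only an illustration, not the find-and-replace steps used above: the function name split_read_name is made up, and the pattern assumes every name has the same layout as the example (project-plate-well.primer.read.scf).

```sh
# Split read names such as 58-2-4-C01.r.1.scf into tab-separated fields:
# project, plate, well, primer (f or r), read number.
tab=$(printf '\t')   # a literal tab, so the sed command stays portable

split_read_name() {
    # Reads names on stdin, one per line. The pattern is an assumption
    # based on the single example name shown in the text.
    sed "s/^\([0-9]*-[0-9]*\)-\([0-9]*\)-\([A-Z][0-9]*\)\.\([fr]\)\.\([0-9]*\)\.scf\$/\1${tab}\2${tab}\3${tab}\4${tab}\5/"
}

fields=$(printf '%s\n' '58-2-4-C01.r.1.scf' | split_read_name)
printf '%s\n' "$fields"
```

Each tab-separated field then lands in its own column when the file is opened in Excel.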

So here is how it looks in Excel:

I have hidden the text in front of the name, and the names have been sorted. You see that there were reads from both primers for 58-2-1A01, 1 forward and 2 reverse reads for 1A02, but only single reads from 1A03 and 1A04. These are the ones we want to find. We can put a conditional formula in column O to help us:

This means: "if the value in cell F7 equals the one below it [F7=F8], then put a 1 in O7 [,1]; or, if F7 equals the one above it [,IF(F7=F6,1], then put a 1 in O7. If neither of these is true, put a 0 in O7 [,0))]." Assembled from those pieces, the formula reads =IF(F7=F8,1,IF(F7=F6,1,0)).

So 1A03 and 1A04 should both have "0" in column O, as would all unique reads. But what if there is more than one read in one direction, but none in the other? The next 2 columns help there. Column P keeps track of the number of reads with the same name:

This conditional statement says: IF the value in cell F11 is the same as F10, then add 1 to the value of P10 (the cell above the formula); otherwise, put a 1 in P11. So the first read from a plasmid will have a 1 in this column, the next a 2, the third a 3, etc.

But what about the case of 2 reverse reads, but no forward reads? Column Q takes care of that:

If the count in column P is more than 2, it compares the value in G with the cell 2 above it; if they match, it puts a 1 in column Q. If the count is more than 1, it compares the value in G with the cell directly above and, if they match, again puts a 1 here. If there is only one read, or it is the first read with that name, it puts a 0 in column Q.
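For comparison, the logic of columns O, P and Q can be sketched outside Excel with awk. This is a rough equivalent written from the descriptions above, not the spreadsheet formulas themselves. The short input list (plasmid name, then primer) is made-up sample data, and it must already be sorted by name.

```sh
# Sample input: plasmid name and primer, sorted by name (illustrative only).
reads=$(printf '%s\n' '1A03 r' '1A06 f' '1A06 r' '1A07 f' '1A07 r' '1A07 r' '1A10 r' '1A10 r')

diagnostics=$(printf '%s\n' "$reads" | awk '
    { name[NR] = $1; primer[NR] = $2 }   # store every row first
    END {
        for (i = 1; i <= NR; i++) {
            # O: 1 if the name matches the row below or the row above
            o = (name[i] == name[i+1] || name[i] == name[i-1]) ? 1 : 0
            # P: running count of reads with the same name
            p[i] = (name[i] == name[i-1]) ? p[i-1] + 1 : 1
            # Q: compare primer with the row two above (P > 2)
            #    or with the row directly above (P > 1)
            if (p[i] > 2)      q = (primer[i] == primer[i-2]) ? 1 : 0
            else if (p[i] > 1) q = (primer[i] == primer[i-1]) ? 1 : 0
            else               q = 0
            print name[i], primer[i], o, p[i], q
        }
    }')
printf '%s\n' "$diagnostics"
```

Rows whose three trailing numbers are "0 1 0" are the unique reads; a 1 in the last position flags a second read made with the same primer.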

Read name diagnostics
58-2-  1  A   3  r  2  0  1  0  unique
58-2-  1  A   4  r  2  0  1  0  unique
58-2-  1  A   6  f  1  1  1  0  normal
58-2-  1  A   6  r  2  1  2  0  normal
58-2-  1  A   7  f  1  1  1  0  extra OK
58-2-  1  A   7  r  1  1  2  0  extra OK
58-2-  1  A   7  r  2  1  3  0  extra OK
58-2-  1  A   8  f  1  1  1  0  extra OK
58-2-  1  A   8  r  1  1  2  0  extra OK
58-2-  1  A   8  r  2  1  3  0  extra OK
58-2-  1  A   9  f  1  1  1  0  normal
58-2-  1  A   9  r  2  1  2  0  normal
58-2-  1  A  10  r  1  1  1  0  2 reads, 1 pr.
58-2-  1  A  10  r  2  1  2  1  2 reads, 1 pr.
58-2-  1  A  11  r  1  1  1  0  2 reads, 1 pr.
58-2-  1  A  11  r  2  1  2  1  2 reads, 1 pr.
58-2-  1  A  12  r  1  1  1  0  2 reads, 1 pr.
58-2-  1  A  12  r  2  1  2  1  2 reads, 1 pr.

You should now be able to sort the spreadsheet to identify the different classes of reads. Sorting, however, changes the values in the diagnostic columns because the formulas recalculate, so first copy those columns and paste them elsewhere as values. You can then sort the numbers without changing them.
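If all you need is the list of plasmids to send back for sequencing, a short awk script can count forward and reverse reads per plasmid directly from the read names. A minimal sketch, assuming the names follow the pattern shown earlier; the sample names below are invented for illustration.

```sh
# Invented sample read names: plasmid.primer.read.scf
names=$(printf '%s\n' \
    '58-2-1-A01.f.1.scf' '58-2-1-A01.r.1.scf' \
    '58-2-1-A03.r.1.scf' \
    '58-2-1-A10.r.1.scf' '58-2-1-A10.r.2.scf')

missing=$(printf '%s\n' "$names" | awk -F'[.]' '
    { seen[$1]; count[$1, $2]++ }   # $1 = plasmid, $2 = primer (f or r)
    END {
        for (plasmid in seen) {
            if (!count[plasmid, "f"]) print plasmid, "needs forward read"
            if (!count[plasmid, "r"]) print plasmid, "needs reverse read"
        }
    }' | sort)
printf '%s\n' "$missing"
```

Piping the result through wc -l would give the count of plasmids on the list.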

You can also download new data from websites into Excel. This allows you to do calculations on the most up-to-date information available.

sed, awk, grep

There are several command-line programs available on any UNIX computer (like the rs6000; if you have a Tulane email address, you can log in to rs6000.tcs.tulane.edu with your mail account login). They are very fast at handling text files, and once you learn to use them they can be powerful timesavers. You can tie them together with scripts to perform multiple manipulations on multiple files. Type "man program_name" to get help.

awk

awk can search for text strings, compare values of fields, and output results however you tell it:

jnolan% awk '/HUMAN/ {print $2, $1}' cytcox.aln
---MALPLRPLTRGLASA--------AKGGHGGAG------------------ARTWRLL COXD_HUMAN
--MAVVGVSSVSRLLGRSRPQLGRPMSSGAHGEEGS-----------------ARMWKTL COXE_HUMAN
TFVLALPSVALCTFNSYL-HSGH--RERPE--------FRPYQHLRIRTKPYPWGDGNHT COXD_HUMAN
TFFVALPGVAVSMLNVYL-KSHHGEHERPE--------FIAYPHLRIRTKPFPWGDGNHT COXE_HUMAN
LFHNSHVNPLP-TGYEHP---- COXD_HUMAN
LFHNPHVNPLP-TGYEDE---- COXE_HUMAN

The program searched for lines containing 'HUMAN' and printed out the second field first, followed by the first field.

[nolan:lecture/723/seqs] jnolan% ^HUMAN^ELVIS^
awk '/ELVIS/ {print $2, $1}' cytcox.aln
[nolan:lecture/723/seqs] jnolan% awk '!~/ELVIS/ {print $2, $1}' cytcox.aln
-no result!-
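A note on negation, offered as a hedged sketch rather than a replay of the session above: in standard awk a pattern is negated by writing ! in front of it (!/ELVIS/), while the ~ operator needs a field or string on its left ($1 ~ /HUMAN/). The three lines below are shortened stand-ins for cytcox.aln.

```sh
# Shortened stand-in for a few lines of cytcox.aln (name, then sequence).
aln=$(printf '%s\n' \
    'COXD_HUMAN ---MALPLRPLTRGLASA' \
    'COXE_HUMAN --MAVVGVSSVSRLLGRS' \
    'COXE_YEAST ---MFR---QCAKRYASS')

# A pattern that matches no line prints nothing.
no_match=$(printf '%s\n' "$aln" | awk '/ELVIS/ {print $2, $1}')

# Negated pattern: act on every line that does NOT contain HUMAN.
not_human=$(printf '%s\n' "$aln" | awk '!/HUMAN/ {print $2, $1}')

# The ~ operator tests one field against a pattern.
human_names=$(printf '%s\n' "$aln" | awk '$1 ~ /HUMAN$/ {print $1}')

printf '%s\n' "$not_human" "$human_names"
```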

awk '/COXE_HUMAN/ {print $2}' cytcox.aln
--MAVVGVSSVSRLLGRSRPQLGRPMSSGAHGEEGS-----------------ARMWKTL
TFFVALPGVAVSMLNVYL-KSHHGEHERPE--------FIAYPHLRIRTKPFPWGDGNHT
LFHNPHVNPLP-TGYEDE----

This returns just the sequence for COXE_HUMAN, which is a good way to extract sequences from an alignment.

We can take this result and pass it through a second program, sed, which is a stream editor:

awk '/COXE_HUMAN/ {print $2}' cytcox.aln | sed 's/-//g'
MAVVGVSSVSRLLGRSRPQLGRPMSSGAHGEEGSARMWKTL
TFFVALPGVAVSMLNVYLKSHHGEHERPEFIAYPHLRIRTKPFPWGDGNHT
LFHNPHVNPLPTGYEDE

We repeated the awk command, but used the pipe "|" to pass the result to sed without ever seeing the first result. Then we used sed to do a global search and replace to get rid of all "-".

Now we have our old sequence back!
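The pipeline above still prints the sequence on several lines, one per alignment block. If a single continuous string is wanted, one option is to add tr -d '\n' as a third stage. This is a sketch that goes one step beyond the commands shown, and the input lines are shortened stand-ins for cytcox.aln.

```sh
# Shortened stand-in for cytcox.aln: two blocks of a two-sequence alignment.
aln=$(printf '%s\n' \
    'COXD_HUMAN ---MALPLRPLTRGLASA' \
    'COXE_HUMAN --MAVVGVSSVSRLLGRS' \
    'COXD_HUMAN TFVLALPSVALCTFNSYL-HSGH' \
    'COXE_HUMAN TFFVALPGVAVSMLNVYL-KSHH')

# awk picks the sequence column, sed strips gaps, tr joins the lines.
ungapped=$(printf '%s\n' "$aln" |
    awk '/COXE_HUMAN/ {print $2}' | sed 's/-//g' | tr -d '\n')
printf '%s\n' "$ungapped"
```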

Other useful UNIX commands:

comm file1 file2 - compares file1 with file2 (both files must be sorted). Output is in 3 columns: lines unique to file1, lines unique to file2, lines common to both.
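A small sketch of comm in use. comm expects both inputs to be sorted, so each list is passed through sort first; the options -1, -2 and -3 suppress the corresponding output column. The file names names1 and names2 and their contents are hypothetical.

```sh
workdir=$(mktemp -d)   # scratch directory for the two hypothetical lists

printf '%s\n' COXE_YEAST COXD_HUMAN COXE_HUMAN | sort > "$workdir/names1"
printf '%s\n' COXD_HUMAN COXE_HUMAN            | sort > "$workdir/names2"

# -23 hides columns 2 and 3, leaving lines found only in the first file.
only_in_first=$(comm -23 "$workdir/names1" "$workdir/names2")

# -12 hides columns 1 and 2, leaving lines common to both files.
in_both=$(comm -12 "$workdir/names1" "$workdir/names2")

printf '%s\n' "$only_in_first" "$in_both"
rm -r "$workdir"
```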

diff file1 file2 - compares file1 with file2. Output uses angle brackets ("<" for file1, ">" for file2) to mark the lines that differ between the files.

diff cytcox.aln cytcox2.aln
16,17d15
< COXE_YEAST ---MFR---QCAKRYASSLPPNALKPAFGPPDKVAAQKFKESLMATEKHAKDTSNMWVKI
< COXE_SCHPO MSMMNRNIGFLSRTLKTSVPKRAGLLSFRAYSNEAKVNWLEEVQAEEEHAKRSSEFWKKV
32,33d29
< COXE_YEAST SVWVALPAIALTAVNTYFVEKEHAEHREHLKHVPDSEWPRDYEFMNIRSKPFFWGDGDKT
< COXE_SCHPO TYYIGGPALILASANAYYIYCKHQEHAKHVEDTDPG-----YSFENLRFKKYPWGDGSKT
48,49d43
< COXE_YEAST LFWNPVVNRHIEHDD-------
< COXE_SCHPO LFWNDKVN-HLKKDDE------

cut filename - extracts columns of text by character position (-c) or by field number (-f)

cut -c17-76 cytcox.aln
multiple sequence alignment

-MAMSPAATVARRRLAAA--------SQGSH-EGG------------------ARTWKIL
--MASPASMAARRVLSAA--------SHAGH-EGGS-----------------ARTWKIL
---MALPLKSLSRGLASA--------AKGDHGGTG------------------ARTWRFL
---MALPLRPLTRGLASA--------AKGGHGGAG------------------ARTWRLL
------PLKVLSRSMASA--------SKGDHGGAG------------------ANTWRLL
---MALPLKVLSRSMASA--------AKGDHGGAG------------------ANTWRLL
MASAVLSASRVSRPLGRALPGLRRPMSSGAHGEEGS-----------------ARMWKAL
MASAVLSASRVSGLLGRALPRVGRPMSSGAHGEEGS-----------------ARIWKAL
--------------------------SSGAHGEEGS-----------------ARMWKAL
--MAVVGVSSVSRLLGRSRPQLGRPMSSGAHGEEGS-----------------ARMWKTL
--MAAAAWSRVSQLLGRSRLQVGRPMSSGAHGEEGS-----------------ARMWKAL
---MNRLAQPATRSVVKTFQRKSSGSFYGSNNVEGFKESYVTPLKQAHNA---SETWKKI
---MFR---QCAKRYASSLPPNALKPAFGPPDKVAAQKFKESLMATEKHAKDTSNMWVKI
MSMMNRNIGFLSRTLKTSVPKRAGLLSFRAYSNEAKVNWLEEVQAEEEHAKRSSEFWKKV
. :. * :
SFVLALPGVGVCMANAYM-KMQAHSHDPPE--------FVPYPHLRIRTKPWPWGDGNHS
SFVLALPGVAVCIANAYM-KMQQHSHEPPE--------FVAYSHLRIRTKKWPWGDGNHS
TFGLALPSVALCTLNSWL-HSGH--RERPA--------FIPYHHLRIRTKPFSWGDGNHT
TFVLALPSVALCTFNSYL-HSGH--RERPE--------FRPYQHLRIRTKPYPWGDGNHT
TFVLALPSVALCSLNCWM-HAGH--HERPE--------FIPYHHLRIRTKPFSWGDGNHT
TFVLALPGVALCSLNCWM-HAGH--HERPE--------FIPYHHLRIRTKPFAWGDGNHT
TYFVALPGVGVSMLNVFL-KSRHEEHERPP--------FVAYPHLRIRTKPFPWGDGNHT
TYFVALPGVGVSMLNVFL-KSRHEEHERPE--------FVAYPHLRIRTKPFPWGDGNHT
TLFVALPGVGVSMLNVFM-KSHHGEEERPE--------FVAYPHLRIRSKPFPWGDGNHT
TFFVALPGVAVSMLNVYL-KSHHGEHERPE--------FIAYPHLRIRTKPFPWGDGNHT
TYFVALPGVGVSMLNVYL-KSHHEEHERPE--------FIAYPHLRIRSKPFPWGDGNHT
FFIASIPCLALTMYAAFKDHKKHMSHERPE--------HVEYAFLNVRNKPFPWSDGNHS
SVWVALPAIALTAVNTYFVEKEHAEHREHLKHVPDSEWPRDYEFMNIRSKPFFWGDGDKT
TYYIGGPALILASANAYYIYCKHQEHAKHVEDTDPG-----YSFENLRFKKYPWGDGSKT
. * : : : . * . .:* * : *.**.::
LFHNAHTNALP-TGYEGPHH--
LFHNPHENALP-EGYEGPRH--
FFHNPRVNPLP-TGYEKP----
LFHNSHVNPLP-TGYEHP----
LFHNPHVNPLP-TGYEQP----
LFHNPHVNPLP-TGYEHP----
LFHNPHVNPLP-TGYEDE----
LFHNPHMNPLP-TGYEDE----
LFHNPHVNPLP-TGYEDE----
LFHNPHVNPLP-TGYEDE----
LFHNPHVNPLP-TGYEDV----
LFHNKAEQFVPGVGFEADREKH
LFWNPVVNRHIEHDD-------
LFWNDKVN-HLKKDDE------
:* * : .
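The example above cuts by character position. Cutting by field works the same way with -f, plus -d to name the delimiter (a tab by default). A brief sketch using two read names from the assembly report earlier in these notes:

```sh
# Two read names taken from the assembly report shown earlier.
reads=$(printf '%s\n' '58-2-4-C01.r.1.scf' '58-2-2-C10.f.1.scf')

# Fields are separated by "." here: field 1 is the plasmid, field 2 the primer.
plasmids=$(printf '%s\n' "$reads" | cut -d. -f1)
primers=$(printf '%s\n' "$reads" | cut -d. -f2)

printf '%s\n' "$plasmids" "$primers"
```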

paste file1 file2

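puts 2 files together side by side, line by line. The -d option sets the character placed between them ("Z" in the example below; a tab if -d is omitted).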

paste -d"Z" names seqs
CLUSTAL WZ multiple sequence alignment
Z
Z
COXE_CYPCZ-MAMSPAATVARRRLAAA--------SQGSH-EGG------------------ARTWKIL
COXE_ONCMZ--MASPASMAARRVLSAA--------SHAGH-EGGS-----------------ARTWKIL
COXD_BOVIZ---MALPLKSLSRGLASA--------AKGDHGGTG------------------ARTWRFL
COXD_HUMAZ---MALPLRPLTRGLASA--------AKGGHGGAG------------------ARTWRLL
COXD_RAT Z------PLKVLSRSMASA--------SKGDHGGAG------------------ANTWRLL
COXD_MOUSZ---MALPLKVLSRSMASA--------AKGDHGGAG------------------ANTWRLL
COXE_MOUSZMASAVLSASRVSRPLGRALPGLRRPMSSGAHGEEGS-----------------ARMWKAL
COXE_RAT ZMASAVLSASRVSGLLGRALPRVGRPMSSGAHGEEGS-----------------ARIWKAL
COXE_BOVIZ--------------------------SSGAHGEEGS-----------------ARMWKAL
COXE_HUMAZ--MAVVGVSSVSRLLGRSRPQLGRPMSSGAHGEEGS-----------------ARMWKTL
COXE_RABIZ--MAAAAWSRVSQLLGRSRLQVGRPMSSGAHGEEGS-----------------ARMWKAL
COXE_CAEEZ---MNRLAQPATRSVVKTFQRKSSGSFYGSNNVEGFKESYVTPLKQAHNA---SETWKKI
COXE_YEASZ---MFR---QCAKRYASSLPPNALKPAFGPPDKVAAQKFKESLMATEKHAKDTSNMWVKI
COXE_SCHPZMSMMNRNIGFLSRTLKTSVPKRAGLLSFRAYSNEAKVNWLEEVQAEEEHAKRSSEFWKKV
Z . :. * :
Z
COXE_CYPCZSFVLALPGVGVCMANAYM-KMQAHSHDPPE--------FVPYPHLRIRTKPWPWGDGNHS
COXE_ONCMZSFVLALPGVAVCIANAYM-KMQQHSHEPPE--------FVAYSHLRIRTKKWPWGDGNHS
COXD_BOVIZTFGLALPSVALCTLNSWL-HSGH--RERPA--------FIPYHHLRIRTKPFSWGDGNHT
COXD_HUMAZTFVLALPSVALCTFNSYL-HSGH--RERPE--------FRPYQHLRIRTKPYPWGDGNHT
COXD_RAT ZTFVLALPSVALCSLNCWM-HAGH--HERPE--------FIPYHHLRIRTKPFSWGDGNHT
COXD_MOUSZTFVLALPGVALCSLNCWM-HAGH--HERPE--------FIPYHHLRIRTKPFAWGDGNHT
COXE_MOUSZTYFVALPGVGVSMLNVFL-KSRHEEHERPP--------FVAYPHLRIRTKPFPWGDGNHT
COXE_RAT ZTYFVALPGVGVSMLNVFL-KSRHEEHERPE--------FVAYPHLRIRTKPFPWGDGNHT
COXE_BOVIZTLFVALPGVGVSMLNVFM-KSHHGEEERPE--------FVAYPHLRIRSKPFPWGDGNHT
COXE_HUMAZTFFVALPGVAVSMLNVYL-KSHHGEHERPE--------FIAYPHLRIRTKPFPWGDGNHT
COXE_RABIZTYFVALPGVGVSMLNVYL-KSHHEEHERPE--------FIAYPHLRIRSKPFPWGDGNHT
COXE_CAEEZFFIASIPCLALTMYAAFKDHKKHMSHERPE--------HVEYAFLNVRNKPFPWSDGNHS
COXE_YEASZSVWVALPAIALTAVNTYFVEKEHAEHREHLKHVPDSEWPRDYEFMNIRSKPFFWGDGDKT
COXE_SCHPZTYYIGGPALILASANAYYIYCKHQEHAKHVEDTDPG-----YSFENLRFKKYPWGDGSKT
Z . * : : : . * . .:* * : *.**.::
Z
COXE_CYPCZLFHNAHTNALP-TGYEGPHH--
COXE_ONCMZLFHNPHENALP-EGYEGPRH--
COXD_BOVIZFFHNPRVNPLP-TGYEKP----
COXD_HUMAZLFHNSHVNPLP-TGYEHP----
COXD_RAT ZLFHNPHVNPLP-TGYEQP----
COXD_MOUSZLFHNPHVNPLP-TGYEHP----
COXE_MOUSZLFHNPHVNPLP-TGYEDE----
COXE_RAT ZLFHNPHMNPLP-TGYEDE----
COXE_BOVIZLFHNPHVNPLP-TGYEDE----
COXE_HUMAZLFHNPHVNPLP-TGYEDE----
COXE_RABIZLFHNPHVNPLP-TGYEDV----
COXE_CAEEZLFHNKAEQFVPGVGFEADREKH
COXE_YEASZLFWNPVVNRHIEHDD-------
COXE_SCHPZLFWNDKVN-HLKKDDE------
Z:* * : .
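The text does not say how the names and seqs files were made; presumably they came from cut. The sketch below works under that assumption: it splits a two-line stand-in alignment (mini.aln, a hypothetical file) into names and sequences with cut, then rejoins them with paste.

```sh
workdir=$(mktemp -d)   # scratch directory for the hypothetical files

# Stand-in alignment: 10-character name, padding, sequence from column 17.
printf '%s\n' \
    'COXD_HUMAN      ---MALPLRPLTRGLASA' \
    'COXE_HUMAN      --MAVVGVSSVSRLLGRS' > "$workdir/mini.aln"

cut -c1-10 "$workdir/mini.aln" > "$workdir/names"   # name column
cut -c17-  "$workdir/mini.aln" > "$workdir/seqs"    # sequence column

# Rejoin line by line, with ":" between the two parts.
pasted=$(paste -d: "$workdir/names" "$workdir/seqs")
printf '%s\n' "$pasted"
rm -r "$workdir"
```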

Perl

Perl is one of the most powerful mining and manipulation tools. It is fairly easy to learn, and once you do, you can do just about anything. The site bioperl.org has hundreds of bioinformatics tools. An excellent text, "Beginning Perl for Bioinformatics", is available from O'Reilly; it has some handy exercises you can download to lead you through the programming steps in learning Perl.

Perl is very useful for analyzing sequences and parsing results.

Parsing: extracting data from a result in a useful manner. For example, BLAST results:
There is lots of information, but it is hard to compare and compile all the results from one search. Parsers search through the file and organize it into fields:

Query Seq Name     Start Subj  End Subj  Query Start  Query End  Score Bits  Score 2  Expect    Length  Overlap Length  Identities  Total  % Identities
Contig15 64577 bp  1           1032      48366        45271      1702        4408     0         1032    3095            809         1032   78%
Contig15 64577 bp  1           658       50333        48372      1149        2972     0         660     1961            569         658    86%
Contig15 64577 bp  1           610       34786        32954      1117        2890     0         610     1832            544         611    89%
Contig15 64577 bp  1           659       32920        30944      1089        2817     0         659     1976            533         659    80%
Contig15 64577 bp  2           511       19393        17861      883         2281               587     1532            427         511    83%
Contig15 64577 bp  1           602       43345        41534      969         2506     0         602     1811            462         604    76%
Contig15 64577 bp  2           521       26507        24948      957         2475     0         521     1559            485         520    93%
Contig15 64577 bp  1           523       34525        32954      957         2474     0         523     1571            470         524    89%
Contig15 64577 bp  1           524       30249        28681      932         2408     0         524     1568            450         524    85%
Contig15 64577 bp  1           575       52890        51160      925         2391     0         575     1730            450         578    77%
Contig15 64577 bp  1           505       34471        32954      922         2384     0         505     1517            453         506    89%
Contig15 64577 bp  40          427       24406        23243      720         1859               427     1163            365         388    94%
Contig15 64577 bp  151         401       25787        25014      74.7        182      2.00E-14          773             78          273    28%
Contig15 64577 bp  1           416       34204        32954      746         1926     0         416     1250            366         417    87%

The parsed table can then be used in a spreadsheet program.

Parsing takes advantage of key features of a document that can be used to divide it into its important parts, and assigns those parts to variables for outputting the data in a useful format:

BLAST Parser variables:
$hsp->hit->seq_id
$hsp->subject->length
$hsp->score
$hsp->bits
$hsp->P
$hsp->sbjctFrame
$hsp->match
$hsp->length
$hsp->percent
$hsp->positive
$hsp->querySeq
$hsp->homologySeq
$hsp->sbjctSeq
$hsp->hit->start
$hsp->hit->end
$hsp->query->start
$hsp->query->end

Where they come from in the BLAST output:
>uvsX_Aeh1 RecA-like recomb. pro; DNA-ATPase[seq_id]
          Length = 411[subject length]

 Score =  439 bits (1130), Expect = e-124[Score and P]
 Identities = 203/357 (56%), Positives = 278/357 (77%)[match/length][positive]
 Frame = -2

Query: 23561 MSDLKSRLIKASTSKLTAELTASKFFNEKDVVRTKIPMMNIALSGEITGGMQSGLLILAG 23382 
             +  L S+L   S++K+++ L  SKFFN+KD VRT++P++N+A+SGE+ GG+  GL +LAG
Sbjct: 13    LGSLMSKLAGTSSNKMSSVLADSKFFNDKDCVRTRVPLLNLAMSGELDGGLTPGLTVLAG 72

Query: 23381 PSKSFKSNFGLTMVSSYMRQYPDAVCLFYDSEFGITPAYLRSMGVDPERVIHTPVQSLEQ 23202
             PSK FKSN  L  V++Y+R+YPDAVC+F+D+EFG TP Y  S GVD  RVIH P +++E+
Sbjct: 73    PSKHFKSNLSLVFVAAYLRKYPDAVCIFFDNEFGSTPGYFESQGVDISRVIHCPFKNIEE 132

Query: 23201 LRIDMVNQLDAIERGEKVVVFIDSLGNLASKKETEDALNEKVVSDMTRAKTMKSLFRIVT 23022
             L+ D+V +L+AIERG++V+VF+DS+GN ASKKE +DA++EK VSDMTRAK +KSL R++T
Sbjct: 133   LKFDIVKKLEAIERGDRVIVFVDSIGNAASKKEIDDAIDEKSVSDMTRAKQIKSLTRMMT 192

Query: 23021 PYFSTKNIPCIAINHTYETQEMFSKTVMGGGTGPMYSADTVFIIGKRQIKDGSDLQGYQF 22842
             PY +  +IP I + HTY+TQEM+SK V+ GGTG  YS+DTV IIG++Q KDG +L GY F
Sbjct: 193   PYLTVNDIPAIMVAHTYDTQEMYSKKVVSGGTGITYSSDTVIIIGRQQEKDGKELLGYNF 252

Query: 22841 VLNVEKSRTVKEKSKFFIDVKFDGGIDPYSGLLDMALELGFVVKPKNGWYAREFLDEETG 22662
             VLN+EKSR VKE+SK  ++V F GGI+ YSG+LD+ALE+GFVVKP NGW++R FLDEETG
Sbjct: 253   VLNMEKSRFVKEQSKLPLEVTFQGGINTYSGMLDIALEVGFVVKPSNGWFSRAFLDEETG 312

Query: 22661 EMIREEKSWRAKDTNCTTFWGPLFKHQPFRDAIKRAYQLGAIDSNEIVEAEVDELIN 22491
             E++ E++ WR  DTNC  FW P+F HQPF+ A    ++L ++   + V  EVDEL +
Sbjct: 313   ELVEEDRKWRRADTNCLEFWKPMFAHQPFKTACSDMFKLKSVAVKDEVFDEVDELFS 369

>60plus39_Aeh1 DNA topoisomerase sub.; DNAdep. ATPase; memb-assoc
          Length = 613

 Score = 412 bits (1058), Expect = e-115
 Identities = 214/471 (45%), Positives = 296/471 (62%)
 Frame = -1

Query: 5325 IKNEIKILSDIEHIKKRSGMYIGSSANETHERFMFGKWESVQYVPGLVKLIDEIIDNSVD 5146
            +  E K+LSD EH    + MYIGS++ ETH+  + GK+  + YVPGLVK+ DE+IDNSVD
Sbjct: 1    MSQEFKVLSDKEHCLINTDMYIGSTSTETHDVLVDGKFVQIAYVPGLVKITDEVIDNSVD 60
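To give a feel for what a parser does with output like this, here is a minimal awk sketch that pulls one summary row per hit out of the header lines. It is not the Perl parser behind the variables listed above, just an illustration of keying on fixed phrases ("Length =", "Score =", "Identities ="). The input is the header text of the two hits shown, and the column order of the output is my own choice.

```sh
# Header lines of the two hits shown above (alignment lines left out).
blast_report=$(printf '%s\n' \
    '>uvsX_Aeh1 RecA-like recomb. pro; DNA-ATPase' \
    '          Length = 411' \
    '' \
    ' Score =  439 bits (1130), Expect = e-124' \
    ' Identities = 203/357 (56%), Positives = 278/357 (77%)' \
    ' Frame = -2' \
    '>60plus39_Aeh1 DNA topoisomerase sub.; DNAdep. ATPase; memb-assoc' \
    '          Length = 613' \
    '' \
    ' Score = 412 bits (1058), Expect = e-115' \
    ' Identities = 214/471 (45%), Positives = 296/471 (62%)' \
    ' Frame = -1')

# Output columns: hit name, subject length, bits, expect, identities, total.
summary=$(printf '%s\n' "$blast_report" | awk '
    BEGIN { OFS = "\t" }
    /^>/           { id = substr($1, 2) }       # hit name without ">"
    /Length =/     { len = $3 }                 # subject length
    /Score =/      { bits = $3; expect = $8 }   # bit score and expect value
    /Identities =/ { split($3, frac, "/")       # e.g. 203/357
                     print id, len, bits, expect, frac[1], frac[2] }')
printf '%s\n' "$summary"
```

The tab-separated rows can be opened directly in a spreadsheet program.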

Can be used to glean data from almost any useful source for further manipulation.
