An overlooked (free!) sequence alignment service

Harvey E. Ballard, Jr. hballard at STUDENTS.WISC.EDU
Fri Feb 2 10:21:48 CST 1996


I assume that you are obtaining sequences for ITS or another fast-evolving
spacer.  Before you bother spending money, I would wait to examine the
level of difficulty you experience in aligning the sequences by hand.  In
Viola, the sequences show the highest variability reported for species
within a genus, but I have still had little difficulty aligning the
sequences with confidence.  Some folks here have used SeqApp software to
manipulate the sequences.  I simply use PAUP (we have been using the
beta-test version, which has a few more useful features and runs faster
than 3.1.1).  I will bet that you can do all the alignment initially--and
maybe finally--using PAUP without the extravagant software.  Most published
ITS papers have used visual alignment to prepare a data set for analysis.
[Everyone has their favorite software; I'm both stingy with my time and
money and want to know as little about computer-work as possible to get the
desired result.  This is what I do, but I'm sure others have their own
methods that work as well.]

I should also tell you of a wonderful service now available on the World
Wide Web (we use Netscape Navigator to get to it and now have it
"bookmarked") based on the mainframe computer of Baylor College of
Medicine.  Via the Web you can submit a data matrix of sequences by
"cut-and-paste" directly from an initial PAUP file (minus a few unnecessary
command items).  The mainframe will accept up to 20,000 characters and give
up to an hour of free time to analyze your matrix with ClustalW, a
program that aligns sequences progressively, following a guide tree built
from pairwise comparisons and inserting gaps as it goes.  My aligned matrix of 55 taxa and 750
nucleotides for Viola ITS was too big, so I broke it into two halves--ITS 1
and ITS 2--and submitted them separately.  The program did an amazing job
of finding slightly different alignments for some troublesome taxa and,
with slight adjustment from me afterwards, the matrix now yields fewer,
better-resolved trees than it did before.
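
The size limit is easy to check before you submit: 55 taxa times 750
aligned positions is 41,250 characters of sequence alone, roughly twice
the 20,000-character limit.  Below is a minimal Python sketch of that check
and of cutting a matrix into column ranges.  It assumes the matrix is held
as a dictionary of taxon name to equal-length sequence string.  The function
names, the toy data, and the column ranges are all made up for illustration
(the real ITS 1 and ITS 2 boundaries have to come from your own alignment),
and counting only sequence characters against the limit is also an
assumption, since taxon names may count too.

```python
LIMIT = 20_000  # character limit quoted above for the service


def matrix_size(matrix):
    """Total number of sequence characters in a {taxon: sequence} matrix."""
    return sum(len(seq) for seq in matrix.values())


def slice_columns(matrix, start, end):
    """Keep columns start..end-1 (0-based) of every sequence."""
    return {name: seq[start:end] for name, seq in matrix.items()}


# Toy stand-in for a 55-taxon, 750-column matrix (not real data).
matrix = {f"taxon{i:02d}": "ACGT-" * 150 for i in range(55)}

# 55 * 750 = 41250 characters, which is over the limit.
print(matrix_size(matrix), matrix_size(matrix) <= LIMIT)

# Hypothetical column ranges for the two spacers.
its1 = slice_columns(matrix, 0, 300)
its2 = slice_columns(matrix, 460, 750)
for label, part in (("ITS 1", its1), ("ITS 2", its2)):
    print(label, matrix_size(part), matrix_size(part) <= LIMIT)
```

Because every sequence is cut at the same columns, the two pieces can be
rejoined taxon by taxon once each has been realigned.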

My recommendation, then, would be to visually align your sequences and then
submit the matrix as a cut-and-paste (stripping off the "options" commands
of PAUP first and any trailing assumptions blocks) to the mainframe at
Baylor College of Medicine.  The official name is "BCM Search Launcher:
Multiple Sequence Alignments".  Use ClustalW (one of the first of several
options).  Paste the stripped-down PAUP file in the window and click the
"search" button, then wait a minute or a few (go have tea if it's a big
matrix).  The screen will reveal a new arrangement of taxa and their
sequences when it's done.  Go to the middle of the screen: the software
provides two matrix formats, and the second is the easier one to paste back
into a PAUP file.  Block and copy it, then close the Web site, open PAUP, and
paste the realigned sequences into a file.  Use the "find" and "replace
all" edit functions to remove the ">" symbols at the beginning of the taxon
names.  Then block and remove the hard returns at the end of each line
of sequence so that each sequence forms one unbroken line (PAUP doesn't like
hard returns).  When you've cleaned it up, go back and reinsert any
polymorphisms and question-mark areas--ClustalW frustratingly changes these
to "?" and "-" respectively.  Double-check your matrix for any silly
realignments that you can judge are best modified, make a careful count of
the new length of the aligned sequences, change the parameters on your
realigned PAUP file, and run it to your heart's content.  It still amazes
me that the service is free, and I've heard rumors that a recently
published paper evaluating ClustalW's utility extols its virtues as an aid
to sequence alignment.
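
For anyone who would rather not do the find-and-replace clean-up by hand,
the same steps can be sketched in a few lines of Python.  This is only a
sketch under assumptions: it supposes the pasted output has each taxon name
on its own line starting with ">" and the sequence wrapped over the lines
that follow, and the function names and example data are invented for
illustration.  It strips the ">" symbols, joins the wrapped lines into one
unbroken sequence per taxon, and counts the new aligned length.  It does not
restore polymorphisms or question-mark areas; those still need reinserting
by hand.

```python
def fasta_to_rows(text):
    """Parse '>name' records with wrapped sequence lines into (name, sequence) pairs."""
    rows = []
    name, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                rows.append((name, "".join(parts)))
            fields = line[1:].split()
            name = fields[0] if fields else ""
            parts = []
        elif name is not None:
            parts.append(line.replace(" ", ""))
    if name is not None:
        rows.append((name, "".join(parts)))
    return rows


def aligned_length(rows):
    """Return the common sequence length, or raise if the rows disagree."""
    lengths = {len(seq) for _, seq in rows}
    if len(lengths) != 1:
        raise ValueError(f"sequences differ in length: {sorted(lengths)}")
    return lengths.pop()


def format_rows(rows):
    """One taxon per line: padded name, then the unbroken sequence."""
    width = max(len(name) for name, _ in rows) + 2
    return "\n".join(f"{name:<{width}}{seq}" for name, seq in rows)


# Made-up example of pasted output (not real Viola data).
pasted = """>Viola_a
ACGT--ACGT
ACG
>Viola_b
ACGTTTACGT
A-G
"""
rows = fasta_to_rows(pasted)
print(format_rows(rows))
print("aligned length =", aligned_length(rows))
```

The length check raises an error if any taxon ends up shorter or longer
than the rest, which is a quick way to catch a sequence that lost a line
during the cut-and-paste.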

So, unless your sequences are too variable--in which case the spacer you're
using may not be suitable for the kinds of questions you might be
asking--you shouldn't have to spend additional money.  Visual pre-alignment
with PAUP (or MacClade), submission to ClustalW, and final editing for
"funnies" should work perfectly well.

--
Harvey E. Ballard, Jr.
Department of Botany, University of Wisconsin-Madison
132 Birge, 430 Lincoln Drive
Madison, WI 53706-1381
Fax: (608) 262-7509; office phone: (608) 262-2792 (Rm. 161, Herbarium);
Systma lab phone: (608) 262-4422
e-mail: hballard at students.wisc.edu  OR  hbviolet at macc.wisc.edu



