"Characters", "character states", and other threads

Fri Jul 22 17:30:47 CDT 2005

Hi all,

Curtis Clark wrote:

> On 2005-07-21 22:37, Mike Dallwitz wrote:
>
>> John's view seems to be that the terms are interchangeable. If this is
>> the prevailing usage, I suppose it has to be accepted. But it would be a
>> pity - there was once a useful distinction between the terms.
>
> John himself has stated that his views in many areas are not the
> prevailing ones. I think we can safely assume that this is not a
> counter-example.

I completely agree. I actually have a paper just out in which I
explicitly discuss the conceptual differences between 'characters' and
'character states', reflecting what I consider to be the current
prevailing usage. It's "Parsimony and the problem of inapplicables in
sequence data", on pages 81-116 of Vic Albert's recent book "Parsimony,
phylogeny and genomics" (Oxford University Press,
http://www.oup.co.uk/isbn/0-19-856493-7 ).

Lest anyone suspect me of spamming, that paper also discusses several
other issues relevant to some recent threads here. The general idea in
the paper is that parsimony seeks to maximize the amount of similarity
in a data set that can, in a logically correct way, be interpreted as
homology. Obviously this is not a new idea. It was put forward most
succinctly by Farris in his 1983 paper "The logical basis of
phylogenetic analysis" (pp. 7-36 in "Advances in Cladistics II",
N. Platnick and V. Funk, eds., Columbia University Press, New York),
which in turn built on the ideas of Hennig (who, incidentally, often
referred to character states as characters, and admitted doing so).

In my paper, I re-examine this principle of maximizing similarity that
can be explained as homology (including polarity and the use of
outgroups) and then argue that it is sufficient for the cladistic
analysis of morphological and molecular data alike. Not very surprising,
if you ask me, but there's a twist: this simple principle is also
sufficient to provide a logical basis for parsimony analysis of sequence
data that are not aligned prior to analysis. The general framework for
optimizing unaligned sequences on trees goes back to the work of David
Sankoff and coworkers in the 1970s, but neither that work nor similar
later work ever addressed the question of whether the above principle
could be used to set the parameters, such as substitution costs and
indel costs, that appear in such optimization algorithms.

It can. To make a long and technical story short (details are in the
paper in Vic's book), for most of the approximate optimization
algorithms for unaligned sequences in use nowadays (including Wheeler's
Direct Optimization and Fixed States algorithms as implemented in the
program POY), the following cost regime is a good approximation when
searching for the trees that maximize the total amount of equally
weighted sequence similarity: substitution cost two, gap opening cost
three, and gap extension cost one.
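To give a rough feel for what this cost regime implies, here is a
minimal sketch that scores a single pair of sequences with it. Note that
this is plain pairwise alignment with affine gap costs (a Gotoh-style
dynamic program), not Direct Optimization or Fixed States, which
optimize sequences over a whole tree. The function names are
hypothetical, and the gap convention (a gap of length k costs
3 + 1 * (k - 1), so its first position costs 3) is an assumption; the
costs above could also be read as 3 + 1 * k.

```python
import math

# Cost regime from the message above. ASSUMED convention: a gap of
# length k costs GAP_OPEN + GAP_EXTEND * (k - 1).
SUBSTITUTION = 2
GAP_OPEN = 3
GAP_EXTEND = 1


def sub_cost(x: str, y: str) -> int:
    """Cost of aligning residue x against residue y."""
    return 0 if x == y else SUBSTITUTION


def alignment_cost(a: str, b: str) -> int:
    """Minimal cost of a pairwise alignment of a and b under affine
    gap costs (hypothetical helper, Gotoh-style dynamic programming)."""
    n, m = len(a), len(b)
    inf = math.inf
    # M[i][j]: best cost of a[:i] vs b[:j], last column pairs a[i-1] with b[j-1]
    # X[i][j]: same, but last column pairs a[i-1] with a gap
    # Y[i][j]: same, but last column pairs b[j-1] with a gap
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                M[i][j] = sub_cost(a[i - 1], b[j - 1]) + min(
                    M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1])
            if i > 0:
                # Open a new gap (from M or Y) or extend an existing one (X).
                X[i][j] = min(M[i - 1][j] + GAP_OPEN,
                              X[i - 1][j] + GAP_EXTEND,
                              Y[i - 1][j] + GAP_OPEN)
            if j > 0:
                Y[i][j] = min(M[i][j - 1] + GAP_OPEN,
                              Y[i][j - 1] + GAP_EXTEND,
                              X[i][j - 1] + GAP_OPEN)
    return min(M[n][m], X[n][m], Y[n][m])
```

For example, alignment_cost("ACGT", "AGGT") is 2 (one substitution),
while alignment_cost("ACGTTT", "ACG") is 5: a single run of three gap
positions costs 3 + 1 + 1, against 9 for three separate one-position
gaps. Under these costs a single substitution (2) is also cheaper than
the shortest possible gap (3).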

Best

Jan De Laet