`Dichotomy', DELTA, and INTKEY
Mike Dallwitz
miked at ENTO.CSIRO.AU
Wed Mar 22 10:06:43 CST 1995
22 March 1995
> From: Renaud Fortuner <fortuner at math.u-bordeaux.fr>
>
> Rigidity of Delta: I was thinking of the way the characters are defined,
> this cumbersome coding in particular, which looks as simple as an English
> zip code!
Here's an example.
#4. culms/
1. woody and persistent/
2. herbaceous/
This differs from what you would ordinarily write only in the extra symbols
`#' and `/', which are there to allow programs to identify the components of
the character. (A person can do this without the extra symbols, by
understanding the meaning.)
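
To illustrate how the `#' and `/' symbols let a program pick the components apart, here is a minimal sketch in Python. It is not DELTA's own parser: the function name parse_character is hypothetical, and it assumes that `/' never occurs inside the text of a feature or state.

```python
import re

def parse_character(text):
    """Parse one character definition into (number, feature, states).

    Assumption: every item ends with '/', and '/' never appears
    inside the wording of a feature or state.
    """
    items = [item.strip() for item in text.split("/") if item.strip()]
    # The first item is the feature, introduced by '#<number>.'
    head = re.fullmatch(r"#\s*(\d+)\.\s*(.+)", items[0], flags=re.S)
    if head is None:
        raise ValueError("expected '#<number>. <feature>/'")
    number, feature = int(head.group(1)), head.group(2)
    # The remaining items are the states, each introduced by '<number>.'
    states = {}
    for item in items[1:]:
        m = re.fullmatch(r"(\d+)\.\s*(.+)", item, flags=re.S)
        if m is None:
            raise ValueError("expected '<number>. <state>/'")
        states[int(m.group(1))] = m.group(2)
    return number, feature, states

example = "#4. culms/\n1. woody and persistent/\n2. herbaceous/"
number, feature, states = parse_character(example)
```

For the example above this gives character number 4, feature `culms', and the two states keyed by their numbers.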
> You say : By `dichotomous' I presume you mean `deterministic' or
> `non-probabilistic'.
>
> No, not really, because any identification system IS probabilistic. What you
> call deterministic would be a probabilistic system with a 100% probability.
>
> Identification is always a probabilistic process. Unless you are a pompous
> egotistic fool, you can never be 100% sure that you have correctly
> identified a species. (In the previous sentence, the word "never" should be
> taken with a probability of 95.4% as there are cases when you ARE 100% sure,
> e.g., when you identify H.sapiens sapiens - although there are times when I
> wonder). I prefer a similarity program which tells me that the unknown is
> 100% similar to species A and 99% similar to species B to a key which tells
> me that it IS species A (with species B 50 lines below).
>
> No, by dichotomous key I mean a key which works by dichotomy (in logic,
> dividing a class into 2 opposed subclasses), a key which asks me if the
> flowers are red or white, and too bad if my specimen is pink or if my
> population includes 75% red flowers and 25% white ones (Mendel was there).
OK. Let's try to agree on terminology first.
A deterministic procedure is one in which the only probabilities considered
are 0 and 1. A probabilistic procedure is one which can deal with any
probabilities. Of course, I intend these terms to apply to the algorithms used
by the program, not to the results achieved by the user.
I would like to make it clear to non-users that DELTA can deal with
non-dichotomous characters - multistate characters, numeric characters, and
text `characters'. Renaud is referring to dichotomous identification
procedures, in which, at each step, the remaining taxa are divided into those
which match the specimen, and those which don't.
INTKEY is dichotomous in this sense, but this does not affect its usefulness
for the non-probabilistic data which it handles. Let's consider the above
example of flower colour, and assume that the character is
#6. flowers/
1. white/
2. blue/
3. red/
Depending on the circumstances, the data might be coded in one of these ways.
6,1/3 (i.e. character 6, state 1 or state 3)
6,1<25%>/3<75%>
6,1-3<pink>
6,1-3<depending on soil acidity>
6,1&3<spotted>
6,1/1-3<pink>/1&3<spotted>/3
etc.
These variants are (currently) useful only for generating natural-language
descriptions - they are all treated the same in INTKEY. (If the distinctions
are important for identification, you should define a separate character
state for `pink', and probably a separate character for `spotted'.)
Again depending on circumstances, the INTKEY user attempting to identify a
specimen from this taxon might enter 6,1 or 6,3 or 6,1/3. (I am putting it
this way for brevity; it would normally be done via menus.) The first would
eliminate taxa coded 6,2 or 6,3 or 6,2/3. The last would eliminate only taxa
coded 6,2, but is the safest option, and should be used if the user is in any
doubt.
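
The elimination rule just described can be sketched as follows. This is only an illustration in Python, not INTKEY's code: the names coded_states and is_eliminated are hypothetical, comments in angle brackets are assumed not to be nested, and `/', `-' and `&' are all treated alike, so that each coding reduces to the set of state numbers it mentions.

```python
import re

def coded_states(attribute):
    """Return (character number, set of state numbers) for a coding
    such as '6,1-3<pink>'. Comments in <...> are ignored."""
    without_comments = re.sub(r"<[^>]*>", "", attribute)
    char_part, value_part = without_comments.split(",", 1)
    # Assumption: '/', '-' and '&' are equivalent, so only the
    # state numbers themselves matter.
    states = {int(s) for s in re.findall(r"\d+", value_part)}
    return int(char_part), states

def is_eliminated(taxon_attribute, specimen_attribute):
    """A taxon is eliminated if it shares no state with the specimen."""
    t_char, t_states = coded_states(taxon_attribute)
    s_char, s_states = coded_states(specimen_attribute)
    if t_char != s_char:
        raise ValueError("codings refer to different characters")
    return t_states.isdisjoint(s_states)

# Entering 6,1 eliminates taxa coded 6,2 or 6,3 or 6,2/3 ...
first = [is_eliminated(t, "6,1") for t in ("6,2", "6,3", "6,2/3")]
# ... while entering 6,1/3 eliminates only taxa coded 6,2.
last = [is_eliminated(t, "6,1/3") for t in ("6,2", "6,3", "6,2/3")]
```

Under these assumptions a taxon coded 6,1/1-3<pink>/1&3<spotted>/3 reduces to states 1 and 3, and survives any of the three entries 6,1 or 6,3 or 6,1/3.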
INTKEY accommodates errors by the user AND errors in the data by means of an
`error tolerance' parameter, which defaults to 0. Taxa are eliminated as
possible identifications only if they differ from the specimen in more
characters than the currently set tolerance. The tolerance may be altered
manually (up and down) during an identification, and, if the `autotolerance'
feature is set (as it is in the `simplified' and `automatic' modes of the
Windows INTKEY), the program automatically changes the value as appropriate.
By this means, the user can make any number of errors, yet still go on to a
correct identification, provided the correct information eventually outweighs
the incorrect. At any stage, the user has access to the number of differences
between any taxon and the specimen, and the actual character values which have
given rise to these differences. Surely this is more helpful than knowing that
a taxon is `99% similar' to the specimen.
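
The error tolerance described above can be sketched like this. Again it is an illustration in Python rather than INTKEY's implementation: the taxa A, B and C and the function names are made up, each coding is a mapping from character number to a set of state numbers, and a character not recorded for a taxon is assumed not to count as a difference. The autotolerance feature is not modelled.

```python
def count_differences(taxon, specimen):
    """Count the characters for which the taxon shares no state
    with the value entered for the specimen."""
    return sum(
        1
        for char, states in specimen.items()
        if char in taxon and taxon[char].isdisjoint(states)
    )

def remaining_taxa(taxa, specimen, tolerance=0):
    """Keep the taxa whose differences do not exceed the tolerance."""
    return [
        name
        for name, coding in taxa.items()
        if count_differences(coding, specimen) <= tolerance
    ]

# Hypothetical data: character 4 (culms) and character 6 (flowers).
taxa = {
    "A": {4: {1}, 6: {1, 3}},
    "B": {4: {2}, 6: {2}},
    "C": {4: {2}, 6: {1, 3}},
}
specimen = {4: {2}, 6: {3}}

strict = remaining_taxa(taxa, specimen, tolerance=0)
lenient = remaining_taxa(taxa, specimen, tolerance=1)
```

With a tolerance of 0 only C remains. With a tolerance of 1, A and B remain as well, each with one difference, and count_differences shows which taxa differ and by how much.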
Mike Dallwitz Internet md at ento.csiro.au
CSIRO Division of Entomology Fax +61 6 246 4000
GPO Box 1700, Canberra ACT 2601, Australia Phone +61 6 246 4075