Identification keys (xDelta-l)

Fri Jun 16 17:52:50 CDT 1995

> From: Lawrence Kirkendall <Lawrence.Kirkendall at ZOO.UIB.NO>
> To: Taxacom

> For the (nonexpert-in-this-group) user trying to identify large numbers of
> specimens an enormous amount of time would be saved if THE MOST COMMON
> SPECIES CAME OUT FIRST! How many of you have taken that into
> consideration? Am I wrong in believing that, for most groups, 5-10 species
> will make up 50-75% or more of all _individuals_ in a museum collection?

Our key-generation program, KEY, has parameters that can be set so that more
abundant taxa tend to come out early in the key. As far as I know, no one has
ever used this option in practice. The important consideration is actually not
the relative abundance of a taxon, but the relative frequency with which the
key will be used to identify the taxon. Common (or distinctive) taxa tend to
be well known, so that it may not be necessary to use a key to identify them.
For example, there would be no point in making lions and tigers come out
early in a key for identifying cats.
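
To give a rough idea of what is at stake, here is a small Python sketch (not
KEY's actual method) comparing the expected number of steps per
identification for a key that ignores frequency of use with one that brings
frequently identified taxa out early. The usage frequencies are invented for
illustration, and the weighted key assumes that a character is available for
whatever split is wanted, which real data will rarely allow. Under that
assumption the best possible weighted key is given by Huffman's construction.

```python
import heapq


def balanced_depths(n):
    """Steps needed to reach each of n taxa in a key that halves the
    remaining taxa (as nearly as possible) at every step."""
    if n == 1:
        return [0]
    half = n // 2
    return [d + 1 for d in balanced_depths(half) + balanced_depths(n - half)]


def expected_steps(freqs, depths):
    """Average steps per identification, weighting each taxon by how
    often the key is used to identify it."""
    return sum(f * d for f, d in zip(freqs, depths)) / sum(freqs)


def weighted_expected_steps(freqs):
    """Expected steps for the best binary key when any split is allowed
    (Huffman's construction): repeatedly join the two least-used groups.
    The sum of the joined weights equals the sum of frequency * depth."""
    heap = list(freqs)
    heapq.heapify(heap)
    joined = 0.0
    while len(heap) > 1:
        a = heapq.heappop(heap)
        b = heapq.heappop(heap)
        joined += a + b
        heapq.heappush(heap, a + b)
    return joined / sum(freqs)


# Hypothetical usage frequencies (an assumption, not measured data):
# 8 of 100 taxa account for 60% of identifications, the other 92 for 40%.
freqs = [60 / 8] * 8 + [40 / 92] * 92

unweighted = expected_steps(freqs, balanced_depths(len(freqs)))
weighted = weighted_expected_steps(freqs)
print(f"frequency ignored:    {unweighted:.2f} steps on average")
print(f"frequency considered: {weighted:.2f} steps on average")
```

The saving depends entirely on the frequencies supplied, and these should be
frequencies of key use rather than abundances, for the reason given above.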

> In any given level, there are taxa that are very distinctive to the naive
> user. Much time would be saved if there were some sort of overview of
> these at the beginning of the identification process, perhaps before the
> key even begins.

A closely related point was made in a Taxacom discussion last year. It
referred to INTKEY, but KEY uses the same character-selection method. Here
is an extract from my previous posting (`Response to comments on DELTA',
27 April 1994).

    > I feel that the search algorithm is the opposite of what it
    > should be. I would prefer a routine that separates the taxa
    > with unique characters FIRST.

    There are many algorithms for choosing characters to be used in
    identification, but as far as I know all of them look for characters that
    divide the remaining taxa into subgroups that are as nearly equal as
    possible. This tends to minimize the number of steps required in an
    identification. For example, for a group of 100 taxa, the use of
    characters which are optimal in this sense would lead to a key requiring 6
    or 7 steps (characters) for an identification, whereas the use of
    characters which split off one taxon at a time would lead to a key
    requiring an average of 50 steps for an identification. If you need
    further convincing, just try doing some INTKEY identifications choosing
    characters from near the bottom of the `BEST' menu, rather than from the
    top.

    Another important consideration in choosing characters is the ease of use
    or `reliability' of the characters. This is a subjective matter, which
    often depends on the context in which the key will be used. In KEY (our
    key-generation program) and INTKEY, reliabilities may be set for each
    character. The relative importance attached to the reliability and the
    separating power of a character is controlled by a parameter, RBASE,
    which may be set by the user. By suitable choice of reliability and/or
    RBASE, any character may be forced to be the `best' character (provided
    that it has any separating power at all).

    Characters which split one taxon from all the rest are often preferred by
    taxonomists, and could be given high reliability to force their use.
    However, such preferences should be examined critically. A distinction may
    seem obvious to the expert who has a mental picture of all of the taxa.
    Novices may make the distinction accurately when identifying a specimen of
    the unusual taxon, but will they do so with other specimens, particularly
    if they have never seen an example of the unusual taxon? How will the
    overall accuracy of the identification be affected, taking into account
    the much greater number of characters that will have to be used?
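
The step counts quoted in the extract (6 or 7 steps versus an average of
about 50 for 100 taxa) can be checked with a few lines of Python. This is
only a sketch of the arithmetic, not of the algorithm used in KEY or INTKEY.
It assumes that every character has two states, that each taxon is equally
likely to be the one being identified, and that suitable characters always
exist. The function names are my own.

```python
def average_steps_balanced(n):
    """Average steps when each character splits the remaining taxa
    into two groups that are as nearly equal as possible."""

    def total_steps(m):
        # Sum, over all m taxa in this part of the key, of the steps
        # still needed: every taxon uses one step here, then recurse.
        if m == 1:
            return 0
        half = m // 2
        return m + total_steps(half) + total_steps(m - half)

    return total_steps(n) / n


def average_steps_one_at_a_time(n):
    """Average steps when each character splits off a single taxon.
    The k-th taxon comes out at step k; the last two taxa are
    separated by the same final step, n - 1."""
    return (sum(range(1, n)) + (n - 1)) / n


n = 100
print(f"balanced splits: {average_steps_balanced(n):.2f}")       # 6.72
print(f"one at a time:   {average_steps_one_at_a_time(n):.2f}")  # 50.49
```

With balanced splits every taxon is reached in 6 or 7 steps; with
one-at-a-time splits the number of steps ranges from 1 to 99.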

No one should be contemplating constructing conventional keys as
identification aids, as interactive identification is so much better. (There
may be other reasons for constructing them; currently the main one is probably
the editorial policy of some publications.) The main potential advantages of
interactive identification are: entry and deletion of characters in any order
during an identification (allowing difficult or unavailable characters to be
avoided); the ability to allow for errors (whether made by the user or in the
data); the ability for the user to express uncertainty; direct handling of
numeric values (i.e. without dividing them into ranges); the practicality of
using characters that are very variable within taxa; and convenient access to
notes and illustrations to clarify character definitions.

Mike Dallwitz                                  Email md at ento.csiro.au
CSIRO Division of Entomology                   Fax +61 6 246 4000
GPO Box 1700, Canberra ACT 2601, Australia     Phone +61 6 246 4075