Interactive identification

Mon Mar 27 18:11:06 CST 1995

                                                                 27 March 1995

> From: Renaud Fortuner <fortuner at math.u-bordeaux.fr>
> To: Taxacom
>
> You must agree that a procedure which can deal with ANY probabilities is by
> nature "better" (i.e., more general) than one which is restricted to the
> extreme of the range.

It would seem that I do agree, because on 21 March, I wrote: "I don't doubt
that probabilistic methods are important or essential in some fields. I would
like to incorporate them in INTKEY ...". I have added this to my list of
desirable attributes of identification programs.

> I believe we have another problem of terminology here. Multistate, numeric,
> and text are types of characters. Dichotomy is a method of identification.
> You can use dichotomy with any type of character.

There IS a potential problem. On 24 March, I wrote: "Renaud is referring to
dichotomous identification procedures, in which, at each step, the remaining
taxa are divided into those which match the specimen, and those which don't."
I wrote this because I thought that there was a possibility that you might be
misunderstood. A contrast is often made between dichotomous keys (those using
only 2-state characters) and polychotomous keys (those allowing characters
with 3 or more states). In your sense, these are both dichotomous.

> Your example of how the red-white-blue flower character is treated in Delta
> is exactly what I had in mind when I said "rigidity" or lack of flexibility.
> ... In my system, you would define the character as : structure = flower,
> descriptor = color, states = red, white, blue, pink ... It's a WYSIWYWD
> system: what you see is what you write down! If the unknown has spotted
> flower, the user just adds "spotted" to the list of states. ... When I want
> to say "pink", I prefer to write "pink" than "6,1-3<pink>" [i.e. white
> to red (pink)], but I suppose this is a question of personal taste.

Call it personal taste if you like; I prefer to call it taxonomic judgement.
Writing down a feature (e.g. flower) followed by all the terms that have ever
been used to describe an aspect of that feature (e.g. colour) does not usually
lead to a sound and useful character definition. Taken to an extreme, this
approach could lead to a separate state for every taxon. Such a `character'
would be taxonomically almost useless, as all it would tell us is that the
taxa are all different. To get comparative data, we need to know which taxa
are similar, so the number of states needs to be kept reasonably small.

If you find a specimen that doesn't fit comfortably into existing state
definitions, it's not good enough to `just add' a suitable term to the list of
states. If the character is changed at all, it needs to be completely
rethought. For example, perhaps the `colour' character needs to become two
or more: `hue' and `saturation'; or `whether uniformly coloured', `background
(or only) colour', `secondary colour'. Or perhaps a state definition needs to
be modified: `white' changed to `pale'. Even if an extra state is added, the
boundaries of existing states may move: what might have been considered near
enough to white when the contrasting state was red, might now be closer to
pink. None of these changes can be made lightly once substantial numbers of
taxa have already been coded, because, in principle, every coding will need to
be re-examined in the light of the new character definition(s).

For the above reasons, it is essential that the recording system be CAPABLE of
using existing character states, plus a qualifying comment. This doesn't, of
course, prevent you from redefining characters if necessary.

> The important point is, are you able to enter information in the middle of
> an identification session? I mean, suppose you have been working for a
> while, narrowed down your choices to a couple of species, then you discover
> that flower color is an important character, but that your unknown is pink
> and pink does not appear in the list of existing states. Can you, without
> closing your identification tools, open a "schema tool", add the word "pink"
> to the list of state, go back to your ID tool, and continue from there? If
> you can do this with Delta, fine. If not, you need an updated system.

INTKEY is only an identification program, and cannot modify its data. However,
in a multitasking environment, it is easy to open another application to alter
the data, without exiting from INTKEY. There is no way to continue with an
incomplete identification using the new data, but this requirement, far from
being important, doesn't make any sense. You don't know what data to add
(apart from the new state description) until the identification has been
completed, as far as possible, and a decision made as to whether the specimen
is a new variant of an existing taxon, or belongs to a new taxon.

> The INTKEY "error tolerance" parameter is a clever way to make the best of a
> bad method (dichotomy), but it is far from perfect. If the threshold is set
> at 2 you will have: first error = 0% degradation; second error 0%
> degradation; third error = 100% degradation. You have diminished the risk of
> having degradation occur (at the expense of a loss in discrimination,
> obviously), but when the user goes over the threshold, degradation is total.
> This is hardly what I would call graceful. By contrast, a similarity
> coefficient degrades gracefully because of its very nature.

The error tolerance is simply a threshold controlling which taxa are displayed
as the `remaining' taxa, and which taxa are considered in the calculation of
the `best' characters to use next. For example, if you had used 3 characters
up to the current point in an identification, the default error tolerance of 0
would let you see only those taxa that did not differ from the specimen. By
setting the tolerance to 1, you would see those that differed in 0 or 1
characters. If you set the tolerance to 3, you would see all the taxa. You can
freely move the value up and down at any time. By the way, how does your
program calculate the `best' characters, if it can't make some assumption
about the taxa that remain to be separated?
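
As a rough illustration of the threshold described above (a sketch only, not
INTKEY's actual code), the following Python fragment counts the characters on
which each taxon differs from the specimen and lists the taxa at or below a
given tolerance. The taxon names, the character data, and the function names
are all hypothetical, and treating an unrecorded character as "no difference"
is an assumption made for the example.

```python
# Hypothetical data: each taxon maps a character to the set of states it can show.
TAXA = {
    "Taxon A": {"flower colour": {"red"}, "leaf shape": {"ovate"}, "habit": {"tree"}},
    "Taxon B": {"flower colour": {"red", "white"}, "leaf shape": {"linear"}, "habit": {"tree"}},
    "Taxon C": {"flower colour": {"blue"}, "leaf shape": {"linear"}, "habit": {"shrub"}},
    "Taxon D": {"flower colour": {"white"}, "leaf shape": {"ovate"}, "habit": {"shrub"}},
}

# The three characters used so far in the identification, with the specimen's states.
SPECIMEN = {"flower colour": "red", "leaf shape": "ovate", "habit": "tree"}


def count_differences(taxon_states, specimen):
    """Count the used characters on which the taxon does not match the specimen."""
    differences = 0
    for character, observed in specimen.items():
        recorded = taxon_states.get(character)
        # Assumption: a character not recorded for the taxon is not a difference.
        if recorded is not None and observed not in recorded:
            differences += 1
    return differences


def remaining_taxa(taxa, specimen, tolerance=0):
    """Return the taxa whose number of differences does not exceed the tolerance."""
    return sorted(
        name
        for name, states in taxa.items()
        if count_differences(states, specimen) <= tolerance
    )


if __name__ == "__main__":
    # The tolerance can be moved up and down freely; the counts are unchanged.
    for tolerance in range(4):
        print(tolerance, remaining_taxa(TAXA, SPECIMEN, tolerance))
```

With these made-up data, a tolerance of 0 leaves only Taxon A, a tolerance of
1 adds Taxon B, and a tolerance of 3 (the number of characters used) shows all
four taxa. Nothing is discarded when the threshold changes; only the display
is filtered.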

The `number of differences' used by INTKEY is, of course, a (dis)similarity
measure. It differs from the measures usually used in phenetic analysis in
that the contribution of each character is always 0 or 1 (not a fraction), and
in not being normalized by dividing by the number of characters used.
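
To make the comparison concrete, here is a small Python sketch (hypothetical
data and function names, not taken from INTKEY) that computes the raw number
of differences alongside the same count divided by the number of characters
used. Each character contributes exactly 0 or 1; a typical phenetic
coefficient might instead allow fractional contributions, which this sketch
does not attempt.

```python
# Hypothetical taxon description: character -> set of states the taxon can show.
TAXON = {
    "flower colour": {"red", "white"},
    "leaf shape": {"linear"},
    "habit": {"tree"},
    "fruit": {"berry"},
}

# Hypothetical specimen: one observed state per character used.
SPECIMEN = {
    "flower colour": "red",
    "leaf shape": "ovate",
    "habit": "tree",
    "fruit": "capsule",
}


def mismatch_flags(taxon, specimen):
    """Return one 0-or-1 contribution for each character used."""
    return [0 if state in taxon[character] else 1
            for character, state in specimen.items()]


def number_of_differences(taxon, specimen):
    """Unnormalized measure: the plain count of mismatching characters."""
    return sum(mismatch_flags(taxon, specimen))


def normalized_dissimilarity(taxon, specimen):
    """The same count divided by the number of characters used."""
    flags = mismatch_flags(taxon, specimen)
    return sum(flags) / len(flags)


if __name__ == "__main__":
    print(number_of_differences(TAXON, SPECIMEN))     # 2
    print(normalized_dissimilarity(TAXON, SPECIMEN))  # 0.5
```

The two values rank taxa identically for a fixed set of characters; the
unnormalized count simply has the advantage of being directly comparable with
an integer error tolerance.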

> How would you enter metadata, for example, even if new fields were added to
> the Delta format? For example, how would you say that all color characters
> are ambiguous?

SET RELIABILITY COLOR,0
(Or, equivalently, click on `Set' in the menu bar; in the menu which appears,
click on `Reliability'; in the dialog box which appears, click on `color'; in
the entry box which appears, enter `0'.)

Mike Dallwitz                                  Internet md at ento.csiro.au
CSIRO Division of Entomology                   Fax +61 6 246 4000
GPO Box 1700, Canberra ACT 2601, Australia     Phone +61 6 246 4075