[Taxacom] New Interactive Key to Wetland Monocots
Mike Dallwitz
M.J.Dallwitz at netspeed.com.au
Fri Nov 17 10:09:48 CST 2006
> There is a new Interactive Key to Wetland Monocots of the US (ca. 2400 taxa)
> available. See http://npdc.usda.gov/technical/plantid_wetland_mono.html ...
> The automated plant key runs in a new version of SLIKS.
This software has serious design flaws, as well as serious (and obvious) bugs.
Why waste resources writing, or preparing data for, software that is
inferior in many ways to software that was available 35 years ago? Why not
build on known techniques, rather than reinventing the square wheel?
Here are a few specific comments.
DATA STRUCTURES
SLIKS uses isolated character states, not characters. This approach was used
in punched-card keys, but is seldom used in computer keys (Meka is a notable
exception). Some problems with this approach are:
(1) It's more difficult for the user to understand the character states
without comparing them with contrasting states. Contrasting states may
or may not be available; even if some are available, it may be difficult
for the user to know which ones they are, or whether _all_ contrasting
states are available.
(2) If a contrasting state is not explicitly available, a complicated
interface is needed to allow the user to indicate that the available
state is _not_ exhibited by the specimen. SLIKS doesn't allow this (Meka
does).
(3) A complicated interface is needed to allow the user to express
uncertainty. For example, the user might want to say 'I'm not sure
whether the petals are yellow or greenish (but I know they're not white,
blue, orange, or brownish)'. SLIKS doesn't allow this (Meka does).
A user familiar with more conventional programs might select both
states, but this doesn't have the desired effect - in fact, it will ruin
the identification.
(4) Because there are no true characters, it's impossible to calculate
the 'best' characters, i.e., those that would tend to lead to shorter
and more reliable identifications. It's not just that this isn't
currently implemented in SLIKS - it _can't_ be implemented without
drastic changes in the interface and, more importantly, in the data.
(5) Large numbers of character states have to be examined and rejected
(because they don't match the specimen), while trying to find states
that do match the specimen. With each state examined, there is a
possibility of error. This can make the probability of a correct
identification very small. See the worked example below.
There's no provision for numeric characters (at least, I didn't see any
examples of them).
It's not clear whether there's provision for missing values (I didn't see any).
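Points (2) and (3) can be illustrated with a minimal Python sketch. It is not the code of SLIKS, Meka, or any other program: the taxa, the character name, and both function names are hypothetical, and the isolated-state matcher simply assumes that a taxon must exhibit every checked state, which is consistent with the behaviour described above.

```python
# Hypothetical toy data: for each taxon, the set of states recorded per character.
# A character absent from a taxon's dict stands for a missing value.
TAXA = {
    "Taxon A": {"petal colour": {"yellow"}},
    "Taxon B": {"petal colour": {"greenish", "white"}},
    "Taxon C": {"petal colour": {"blue"}},
    "Taxon D": {},  # petal colour not recorded
}


def match_by_character(taxa, observed):
    """Character-based matching: several states chosen for one character mean
    'one of these' (uncertainty). Missing values never eliminate a taxon."""
    kept = []
    for name, chars in taxa.items():
        ok = True
        for char, states in observed.items():
            recorded = chars.get(char)  # None means missing value
            if recorded is not None and not (recorded & states):
                ok = False
                break
        if ok:
            kept.append(name)
    return kept


def match_by_isolated_states(taxa, checked):
    """Isolated-state matching (assumed semantics): every checked state
    must be exhibited by the taxon, so checked states are combined with AND."""
    kept = []
    for name, chars in taxa.items():
        exhibited = {(c, s) for c, states in chars.items() for s in states}
        if checked <= exhibited:
            kept.append(name)
    return kept


# 'Yellow or greenish, but not white, blue, ...' expressed on a true character:
unsure = match_by_character(TAXA, {"petal colour": {"yellow", "greenish"}})
# The same two states checked as isolated states:
ruined = match_by_isolated_states(
    TAXA, {("petal colour", "yellow"), ("petal colour", "greenish")}
)
print(unsure)  # ['Taxon A', 'Taxon B', 'Taxon D']
print(ruined)  # []
```

With true characters, the uncertain answer keeps every taxon that could be yellow or greenish (plus the one with no recorded value); with isolated states, the same two ticks eliminate everything.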
LACK OF IMPORTANT FEATURES
Many features important for quick and reliable identification are lacking.
For example:
Finding the 'best' characters to use.
Expressing uncertainty.
Error tolerance.
Locating errors if they are made.
Explanatory notes and illustrations for the characters. (There is
a kind of glossary feature, using Google searches, but characters
often can't be fully understood just from definitions of the words
used in them.)
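As an illustration of what 'error tolerance' could mean in practice, here is a minimal Python sketch. It is only one possible approach, not the implementation of any particular program; the taxa, characters, and the function name matches_with_tolerance are hypothetical. Instead of eliminating a taxon at the first disagreement, it counts disagreements and keeps taxa whose count does not exceed a chosen tolerance.

```python
# Hypothetical toy data: recorded states per character for each taxon.
TAXA = {
    "Taxon A": {"habit": {"perennial"}, "ligule": {"membrane"}, "awns": {"absent"}},
    "Taxon B": {"habit": {"annual"}, "ligule": {"membrane"}, "awns": {"present"}},
}


def matches_with_tolerance(taxa, observed, tolerance=0):
    """Return {taxon: number of mismatching characters} for every taxon
    that disagrees with the observations in at most `tolerance` characters."""
    kept = {}
    for name, chars in taxa.items():
        mismatches = 0
        for char, states in observed.items():
            recorded = chars.get(char)  # None means missing value: no penalty
            if recorded is not None and not (recorded & states):
                mismatches += 1
        if mismatches <= tolerance:
            kept[name] = mismatches
    return kept


# The specimen is really Taxon A, but the ligule has been scored wrongly.
observed = {"habit": {"perennial"}, "ligule": {"hairs"}, "awns": {"absent"}}

print(matches_with_tolerance(TAXA, observed, tolerance=0))  # {}
print(matches_with_tolerance(TAXA, observed, tolerance=1))  # {'Taxon A': 1}
```

The per-taxon mismatch counts also give a starting point for locating an error: the characters on which the surviving taxon disagrees are the ones worth re-examining.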
BUGS OR DEFICIENCIES IN THE IMPLEMENTATION
Alterations to the layout of the panes aren't preserved when the key is
restarted. This is particularly annoying because the default layout is bad:
the 'Keep all characters visible' option can't be seen without scrolling,
and the right-hand pane is much narrower than the left.
The right-hand pane is used for the list of remaining taxa _and_ for
descriptions. It doesn't scroll automatically, and new information is
appended, rather than replacing the current information. As a result, it's
easy to think that things aren't working at all. For example, if you ask for
a description of a taxon, usually nothing seems to happen, because the
description is out of sight at the bottom of the pane. (If you're alert, you
can notice a change in the size of the thumb in the scroll bar.) If you ask
for another description, it's appended, and, for good measure, an identical
list of the remaining taxa is appended too.
There is a similar problem with identification. Usually, when you select
states and press the 'Matching Taxa' button, nothing seems to happen. The
only exception is the _first_ time you do this at the start of an
identification, _provided_ that you haven't already asked for a description.
As before, the reason is that the new information (the states used and the
new list of remaining taxa) is appended, instead of replacing the previous
information.
When the 'Matching Taxa' button is pressed, the refreshed list of character
states is displayed from the top, so that you usually have to scroll to get
back to the position where you were perusing the states.
EXAMPLE OF AN IDENTIFICATION
The Poaceae key has 282 character states and 705 taxa. Instead of using a
specimen, we can work from a description generated from the key, say
Agrostis idahoensis (which I chose at random). After scanning the first 50
states, 'checking' (i.e. selecting) the ones corresponding to the
'specimen', and pressing 'Matching Taxa', we proceed by examining the
remaining states, which start:
51. Leaf sheath more or less hairy or prickly on surface
52. Leaf sheath hairy at summit, throat, or collar
53. Leaf sheath or blade prominently keeled
54. Leaf sheath and blade differentiated
55. Leaf sheath inflated or distended
59. Leaf blades disarticulating from sheath, deciduous at ligule
60. Leaf blades very short, 1-2 cm long
We have to compare these states, one by one, with the specimen. (This is
presumably more difficult with a real specimen than with the description.)
If we make no mistake, state 54 is the only one which should be checked. The
other 6 states have been laboriously examined FOR NO POSSIBLE GAIN, but
always with the risk of making a mistake, any one of which would be fatal;
that is, if we check any of these states, the 'correct' taxon will be
eliminated. (And as there is no error tolerance, there is no possibility of
recovery.)
Thus we proceed up to, say, state 100, and then press 'Matching Taxa'. And
so on. We arrive at the answer after checking 26 states, and examining but
not checking 78. (The number examined but not checked depends on how
frequently you press 'Matching Taxa', and also whether you skip any of the
states that you could correctly check.)
Experiments have shown that the probability of error in using a character
(or, in this case, a state) is around 0.05-0.1, but let's be optimistic and
assume it's 0.02. Then the probability of getting the right answer is
0.98^78 = 0.21 - a decidedly 'lightweight' result. Also, a lot of valuable
time has been used.
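The arithmetic can be checked, and repeated for the less optimistic error rates, with a few lines of Python. The function name p_correct is hypothetical, and the calculation assumes, as the figure above does, that errors on different states are independent and that any single error is fatal.

```python
def p_correct(n_examined, p_error):
    """Probability of making no error at all when n_examined states
    (or characters) are each judged with independent error rate p_error."""
    return (1.0 - p_error) ** n_examined


# 78 states examined but not checked, at the three error rates mentioned.
for p_error in (0.02, 0.05, 0.10):
    print(f"error rate {p_error:.2f}: P(correct) = {p_correct(78, p_error):.4f}")
```

At the optimistic rate of 0.02 this gives about 0.21; at 0.05 the probability falls below 0.02, and at 0.1 it is well under 0.001.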
For comparison, using the (roughly) corresponding subset of Clayton's 'Grass
Species of the World', and using the characters of highest separating power,
the same identification requires 6 characters. The probability of getting
the right answer is 0.98^6 = 0.89.
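To show why true characters make a 'best character' calculation possible, here is a minimal Python sketch of one simple measure. It is an assumption-laden toy, not the algorithm used by Intkey or any other program: the data and function names are hypothetical, each taxon is assumed to have exactly one state per character, and the specimen is assumed equally likely to belong to any remaining taxon. A character is ranked by the expected number of taxa remaining after it is used; smaller is better.

```python
from collections import Counter

# Hypothetical toy data: exactly one state per character per taxon.
TAXA = {
    "Taxon A": {"habit": "annual", "ligule": "membrane"},
    "Taxon B": {"habit": "perennial", "ligule": "membrane"},
    "Taxon C": {"habit": "perennial", "ligule": "hairs"},
    "Taxon D": {"habit": "perennial", "ligule": "absent"},
}


def expected_remaining(taxa, char):
    """Expected number of taxa left after using `char`, assuming the
    specimen is equally likely to be any of the current taxa."""
    counts = Counter(chars[char] for chars in taxa.values())
    n = len(taxa)
    # A state shared by k taxa is observed with probability k/n and leaves k taxa.
    return sum(k * k for k in counts.values()) / n


def rank_characters(taxa):
    """Characters sorted from most to least separating."""
    chars = next(iter(taxa.values())).keys()
    return sorted(chars, key=lambda c: expected_remaining(taxa, c))


print(expected_remaining(TAXA, "habit"))   # 2.5
print(expected_remaining(TAXA, "ligule"))  # 1.5
print(rank_characters(TAXA))               # ['ligule', 'habit']
```

The calculation depends on knowing which states are alternatives within one character, which is exactly the information that a list of isolated states does not carry.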
--
Mike Dallwitz
Contact information: http://delta-intkey.com/contact/dallwitz.htm
DELTA home page: http://delta-intkey.com