Confidence
Mr Fortuner connection modem
fortuner at MATH.U-BORDEAUX.FR
Sat Jun 10 07:21:56 CDT 1995
I have imagined what I call an "endorsement function" for a semi-automatic
assessment of the reliability of an identification. This endorsement process
depends on four areas that influence the reliability of the data:
1. The expertise of the user. Obviously, data entered by an expert are more
trustworthy than data entered by a beginner.
2. What I call the PIF (Personal Intuitive Feeling) of the user (in French
"argot", or slang, le pif is the nose; it also means intuition). We usually
know when a piece of data is lousy; we know when we are guessing at a
character rather than actually seeing it. A high PIF value means we know the
data are good; a low PIF value means we are not so sure.
The PIF itself depends on two factors: how clearly the character was
observed, and how consistent it is from one specimen to the next. Obviously,
the PIF of an expert is more trustworthy than that of a beginner.
3. The general observation set-up. For nematodes, which are microscopic worms,
this includes the number of specimens and their quality (freshly killed, well
fixed, or distorted), and the type and quality of the optics.
4. The reliability of the character itself, as defined by three types of
metadata:
- how conspicuous (visible) the biological structure being described is;
- how ambiguous (difficult to define) the character is;
- and finally, how variable the character is in the group considered.
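A minimal sketch of how the three character metadata could be folded into a single reliability score, assuming each is already scored from 0 to 1 and that high ambiguity or high variability counts against the character. The function name `character_reliability` and the use of a simple product are hypothetical choices for illustration; the combination rule is not specified here.

```python
def character_reliability(conspicuousness: float, ambiguity: float,
                          variability: float) -> float:
    """Hypothetical reliability score in [0, 1] for one character.

    Assumes every input is already scaled to [0, 1]: a high conspicuousness
    is good, while a high ambiguity or a high variability is bad.
    """
    for name, value in (("conspicuousness", conspicuousness),
                        ("ambiguity", ambiguity),
                        ("variability", variability)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    # A product means that any single weak aspect pulls the whole score down.
    return conspicuousness * (1.0 - ambiguity) * (1.0 - variability)


# A conspicuous, unambiguous, but rather variable character.
print(round(character_reliability(0.9, 0.1, 0.5), 3))  # 0.405
```

With a product, a character that is invisible (score 0) or totally variable (score 1) gets a reliability of 0 whatever its other qualities; a weighted average would be a softer alternative.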
Each factor is scored from 0 to 1 (from metadata attached to characters or to
biological groups, from a table describing the expertise of the user, and from
general data about a particular identification session), and these scores are
used in an endorsement algorithm.
I am still working on the perfect algorithm, but the general idea is:
- if the user is an expert, trust his PIF;
- if the user is a novice, trust his data if he has done a good job (good
optics, good specimens);
all of that depending on the reliability of the character itself.
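The general idea above can be sketched as follows, assuming the four factor scores are already on a 0 to 1 scale. Blending the expert case (trust the PIF) and the novice case (trust the set-up) linearly by the expertise score, then multiplying by the reliability of the character, is an assumption made for illustration; `FactorScores` and `endorsement` are hypothetical names, not a finished algorithm.

```python
from dataclasses import dataclass


@dataclass
class FactorScores:
    """The four factor scores, each assumed to be already scaled to [0, 1]."""
    expertise: float  # from a table describing the expertise of the user
    pif: float        # the user's Personal Intuitive Feeling about the data
    setup: float      # specimens and optics for this identification session
    character: float  # reliability of the character, from its metadata


def endorsement(s: FactorScores) -> float:
    """Hypothetical endorsement score in [0, 1] for one observed character.

    An expert's PIF is trusted; a novice's data are trusted only as far as
    the observation set-up is good. Intermediate users get a linear blend of
    the two cases, and the result is scaled by the character's reliability.
    """
    for name, value in vars(s).items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    trust = s.expertise * s.pif + (1.0 - s.expertise) * s.setup
    return trust * s.character


expert = FactorScores(expertise=1.0, pif=0.8, setup=0.2, character=0.9)
novice = FactorScores(expertise=0.0, pif=0.8, setup=0.2, character=0.9)
print(round(endorsement(expert), 2))  # 0.72: the expert's PIF is trusted
print(round(endorsement(novice), 2))  # 0.18: the poor set-up dominates
```

Because the blend is linear, a user with an expertise score of 0.5 gives equal weight to his PIF and to the quality of the set-up; a step function or a steeper curve would be equally defensible choices.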
The endorsement factor can be used at several levels. It can be used to weight
each character in a similarity algorithm; it can select the most reliable
characters to be used in an elimination tool (e.g., a dichotomous key); and it
can give a global level of confidence in the result of an identification,
which is what the original question was about.
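A sketch of these three uses, assuming each character already carries an endorsement score between 0 and 1. The function names, the matching rule (exact equality of character states), the selection threshold, and the use of the mean endorsement as the global confidence are all hypothetical choices; the character names and values in the example are made up.

```python
def weighted_similarity(specimen: dict[str, str], taxon: dict[str, str],
                        weights: dict[str, float]) -> float:
    """Share of the endorsement weight carried by characters that agree.

    Only characters present in the specimen, the taxon description and the
    weights are compared; states must match exactly to count as agreement.
    """
    shared = [c for c in specimen if c in taxon and c in weights]
    total = sum(weights[c] for c in shared)
    if total == 0:
        return 0.0
    agree = sum(weights[c] for c in shared if specimen[c] == taxon[c])
    return agree / total


def most_reliable(weights: dict[str, float], threshold: float) -> list[str]:
    """Characters endorsed at or above the threshold, best first (e.g. for a key)."""
    kept = [c for c, w in weights.items() if w >= threshold]
    return sorted(kept, key=lambda c: weights[c], reverse=True)


def global_confidence(weights: dict[str, float]) -> float:
    """Mean endorsement of the characters used in the identification."""
    return sum(weights.values()) / len(weights) if weights else 0.0


# Hypothetical data for one specimen compared with one nominal taxon.
specimen = {"tail shape": "conical", "stylet length": "long", "lip region": "offset"}
taxon = {"tail shape": "conical", "stylet length": "short", "lip region": "offset"}
weights = {"tail shape": 0.72, "stylet length": 0.18, "lip region": 0.5}

print(round(weighted_similarity(specimen, taxon, weights), 2))  # 0.87
print(most_reliable(weights, 0.4))  # ['tail shape', 'lip region']
print(round(global_confidence(weights), 2))  # 0.47
```

In this example the single disagreement is on a poorly endorsed character, so it lowers the similarity only slightly; with unweighted characters the similarity would drop to two matches out of three.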
Renaud Fortuner
fortuner at math.u-bordeaux.fr