[Taxacom] Reproducibility of descriptive data

Sat Aug 8 00:42:02 CDT 2009

Bob Mesibov wrote:

> what we need is ... structured character data in perpetually interpretable 
> digital form

DELTA provides such a form. Unlike most other formats (e.g. Nexus, SDD), it 
can be read fairly easily without a computer program. See, for example,

    Britton, E.B. 1986. A revision of the Australian chafers
    (Coleoptera: Scarabaeidae: Melolonthinae). Vol. 4. Tribe Liparetrini:
    genus Colpochila. Aust. J. Zool., Suppl. ser. 118: 1-135.

in which descriptions are published (on paper) in DELTA format.

> Structuring of character data too iffy because of interpretation errors ...?

Yes - taxonomists don't pay enough attention to reproducibility of data. 
This issue should be addressed in the earliest stages of a new project - see 
below. As far as I know, no one has ever followed these suggestions.

As far as I know, the only tests of data reproducibility have been in the 
context of testing keys, where the emphasis has been on the key methodology 
rather than on the data per se. See http://delta-intkey.com/www/idtests.htm. 
(And, incidentally, the methods used for assessing the key methodologies 
were often poor.)

-----------------------------------

DELTA-L: 'Starting a new DELTA dataset', M. Dallwitz, 25 Feb 09

...

Pay particular attention to this paragraph [from the 'User’s guide to the 
DELTA Editor' (http://delta-intkey.com/www/delta-ed.htm)]:

'If you are starting a data set from scratch, begin by entering only a few 
characters and a few taxa (say about 5 of each), and recording the data for 
these taxa. Then test all the applications you intend to use — for example, 
produce natural-language descriptions, a conventional key, an interactive 
key, and a cladistic tree. Then add a few more characters and taxa, and 
repeat the testing. This iterative procedure helps you detect any problems, 
particularly poor character definitions, before you have recorded much data.'

Many users ignore this advice. For example, they may put together a large 
character list before starting to enter data for the taxa.

The computing aspects are fairly easy, but the biological aspects are 
difficult for most people - though they don't necessarily realise it. It's 
difficult to define characters in ways that will achieve your goals, e.g. 
easy identification, readable descriptions, classification. This is true no 
matter what software you are using.

Ideally, a character list should be tested by having several people 
independently record descriptions of about 10 disparate taxa. The resulting 
descriptions will inevitably differ, i.e. the results will not be 
reproducible. You need to find this out very early in the process.

The Differences option in Intkey can easily pinpoint the discrepancies. The 
reasons for them can then be discussed, and the character list refined on 
this basis. This process should then be iterated, adding a few more taxa 
each time, until the character list seems satisfactory.
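
For anyone who wants to see the idea of this comparison outside Intkey, here 
is a minimal Python sketch. It is not how Intkey's Differences option is 
implemented; it only assumes that each recorder's data has already been read 
into a nested dictionary (recorder -> taxon -> character -> state). The 
function names, recorder names, taxa, characters and states are all made up 
for illustration.

```python
from collections import defaultdict


def find_discrepancies(recordings):
    """Return the (taxon, character) cells on which recorders disagree.

    `recordings` is assumed to map recorder -> taxon -> character -> state.
    A cell left unrecorded by a recorder is ignored, not counted as a
    disagreement.
    """
    by_cell = defaultdict(dict)
    for recorder, taxa in recordings.items():
        for taxon, characters in taxa.items():
            for character, state in characters.items():
                by_cell[(taxon, character)][recorder] = state
    # Keep only cells where at least two different states were recorded.
    return {
        cell: states
        for cell, states in by_cell.items()
        if len(set(states.values())) > 1
    }


def discrepancies_per_character(discrepancies):
    """Count disagreeing taxa per character, worst characters first."""
    counts = defaultdict(int)
    for _taxon, character in discrepancies:
        counts[character] += 1
    return dict(sorted(counts.items(), key=lambda item: -item[1]))


# Hypothetical test data: two people describing the same two taxa.
recordings = {
    "recorder_1": {
        "taxon_A": {"surface": "punctate", "shape": "rounded"},
        "taxon_B": {"surface": "smooth"},
    },
    "recorder_2": {
        "taxon_A": {"surface": "rugose", "shape": "rounded"},
        "taxon_B": {"surface": "punctate"},
    },
}

discrepancies = find_discrepancies(recordings)
for (taxon, character), states in sorted(discrepancies.items()):
    print(taxon, character, states)
print(discrepancies_per_character(discrepancies))
```

Characters that come out at the top of the per-character count are the first 
candidates for a better definition before any more taxa are added.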

-- 
Mike Dallwitz
Contact information: http://delta-intkey.com/contact/dallwitz.htm
DELTA home page: http://delta-intkey.com