Taxacom: a class of errors in Worms (and similar databases)

Erikjan Rijkers er at xs4all.nl
Sun Feb 23 23:50:14 CST 2025


On 2/24/25 at 00:06, Geoff Read wrote:
> perhaps not a good example of the general situation in a database

Perhaps - but it's not exactly an accidental error:

Of the 2,614,593 accepted species names (in 268,274 genera) in the GBIF 
backbone file (admittedly from 202308), more than 1% are erroneously 
unparenthesized.

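From GBIF name records that have status='ACCEPTED' (genus | species; in 
each row the species authorship predates the genus authorship, yet is 
not parenthesized):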

[...]
Zygopleura Koken, 1892  | Zygopleura plebia Herrick, 1887
Zygosoma Labbé, 1899    | Zygosoma gibbosum Greeff, 1880
Zygota Förster, 1856    | Zygota congener Zetterstedt, 1840
Zynodes Whalley, 1970   | Zynodes strigerella Hampson, 1903
Zyzzyva T.L.Casey, 1922 | Zyzzyva squamosa C.H.Boheman, 1844
(28836 rows)
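
The query behind this listing is not shown here. As a rough sketch of the 
kind of check involved, assuming each record has been split into a genus 
authorship string and a species authorship string (the function names and 
the tuple layout below are hypothetical, not GBIF's schema), one could flag 
a name when the species authorship is unparenthesized but carries an 
earlier year than the genus:

```python
import re

# Assumption: the publication year is the first 4-digit number in the authorship.
YEAR = re.compile(r"\b(1[5-9]\d\d|20\d\d)\b")


def authorship_year(authorship):
    """Return the first plausible year found in an authorship string, or None."""
    match = YEAR.search(authorship)
    return int(match.group(1)) if match else None


def is_suspect(genus_authorship, species_authorship):
    """True if the species authorship is unparenthesized yet older than the genus."""
    if species_authorship.strip().startswith("("):
        return False  # already parenthesized, nothing to flag
    genus_year = authorship_year(genus_authorship)
    species_year = authorship_year(species_authorship)
    if genus_year is None or species_year is None:
        return False  # cannot decide without both years
    return species_year < genus_year


# (genus authorship, species name, species authorship)
rows = [
    ("Förster, 1856", "Zygota congener", "Zetterstedt, 1840"),
    ("T.L.Casey, 1922", "Zyzzyva squamosa", "C.H.Boheman, 1844"),
    # Hypothetical corrected form, added only to show a non-flagged case:
    ("Förster, 1856", "Zygota congener", "(Zetterstedt, 1840)"),
]

suspects = [row for row in rows if is_suspect(row[0], row[2])]

# Share of accepted species names affected, using the counts quoted above.
share_percent = 100 * 28836 / 2614593
```

A species described before its current genus existed cannot have been 
originally described in that genus, which is why the year comparison is 
enough to flag these names. With the counts quoted above, 28,836 of 
2,614,593 works out to about 1.1%.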

Of the 'accepted' names from checklistbank.org (from 202502), 0.5% are 
erroneously unparenthesized (~13,000 names).

> 
> In WoRMS I think a general search for instances of lack of parenthesis where there is a younger genus name requires the WoRMS database managers to do the search. Editors & users don't have the complex search capability to find the mismatches.
> 
> So, as Mark Costello suggested, an approach to the WoRMS data team to investigate for other instances would be a great idea.

Surely the biologists/curators (who I would expect might be on TAXACOM) 
should instruct their own technical people.

Erikjan

