Spelling detection and correction in Taxonomic Databases
Joseph H. Kirkbride, Jr.
jkirkbri at ASRR.ARSUSDA.GOV
Tue May 28 17:04:10 CDT 2002
From: John Wiersema [sbmljw at ars-grin.gov]
Sent: Tuesday, May 28, 2002 1:24 PM
To: edalcin
Cc: Joe Kirkbride; tmetz at cgiar.org
Subject: Re: Spelling detection and correction in Taxonomic Databases (fwd)
Dear Eduardo:
Your message was forwarded to me by a colleague who subscribes to TAXACOM.
With some input from me, Dr. Thomas Metz of IPGRI in Rome has
developed a similar tool for comparing lists of names against the GRIN
database. You can find this utility, with a full explanation of its
usage, at http://pgrdoc.ipgri.cgiar.org/taxcheck/grin/index.html, where
it now appears to be publicly available.
Regards,
_________________________________________________________________________
John H. Wiersema, Ph.D.
United States Department of Agriculture/Agricultural Research Service
Systematic Botany & Mycology Laboratory
Bldg. 011A, Beltsville Agricultural Research Center (BARC-West)
Beltsville, MD 20705-2350 U.S.A.
Tel: 1-301-504-9181 Fax: 1-301-504-5810 Email: jwiersema at ars-grin.gov
_________________________________________________________________________
> ---------- Forwarded message ----------
> Date: Fri, 24 May 2002 14:10:01 +0100
> From: edalcin <edalcin at ONMAIL.COM.BR>
> To: TAXACOM at USOBI.ORG
> Subject: Spelling detection and correction in Taxonomic Databases
>
> Dear taxacom,
>
> I'm an MPhil/PhD student at University of Southampton - UK, working with
> some computational techniques in order to detect (and maybe correct)
> "bad data" in taxonomic databases. These techniques are organized in
> three different approaches: structural, contextual and spelling errors.
>
> I'm working with several taxonomic databases, of which the most important are:
>
> * Species 2000 - 51,918 "unique names"
> * ILDIS - 15,616 "unique names"
> * Northeast of Brazil Plants Checklist - 7,691 "unique names"
> * Atlantic Rain Forest (Brazil) - 1,802 "unique names"
>
> I'm focusing on the spelling-error approach at the moment, and I'm
> wondering whether anyone is working on a similar or related approach,
> so that we can share our experiences.
>
> I would also like to know whether any taxacom members who maintain
> taxonomic databases would like to have them checked by the tools that
> I'm working on. These tools use different algorithms to generate a list
> of "suspect pairs of names" that could be spelling errors.
>
> Here are some examples that arose from the databases listed above:
>
> * Spirodela polyrhiza
> Spirodela polyrrhiza
>
> * Inga brachystachya
> Inga brachystachys
>
> * Squatina occulta
> Squatina oculata
>
> * Steindachneria argentea
> Steindachnerina argentea
>
> * Tephrosia clementii
> Tephrosia clementis
>
> * Rhipsalis cassutha
> Rhipsalis cassyta
>
> * Epidendrum cinnabarimum
> Epidendrum cinnabarinum
>
> * Fleurya aestuans
> Fleurya aestyans
>
>
> Thank you in advance for any comments and contributions to my work.
>
> -------------------
> Eduardo Dalcin
> edalcin at soton.ac.uk
> -------------------
>
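The quoted message does not say which algorithms produce the "suspect pairs of names". As a minimal sketch of one common option, assuming plain Levenshtein edit distance and an arbitrary threshold of 2 (both are illustrative choices, not Eduardo's method), the following flags pairs of names that differ by only a few characters. The function names edit_distance and suspect_pairs are hypothetical.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suspect_pairs(names, max_distance=2):
    """Return (name_a, name_b, distance) for names that are close but not equal.

    max_distance is an assumed threshold, not a value from the message.
    """
    unique = sorted(set(names))
    pairs = []
    for a, b in combinations(unique, 2):
        d = edit_distance(a, b)
        if 0 < d <= max_distance:
            pairs.append((a, b, d))
    return pairs


# Names taken from the examples in the quoted message.
names = [
    "Spirodela polyrhiza", "Spirodela polyrrhiza",
    "Inga brachystachya", "Inga brachystachys",
    "Squatina occulta", "Squatina oculata",
    "Rhipsalis cassutha", "Rhipsalis cassyta",
    "Epidendrum cinnabarimum", "Epidendrum cinnabarinum",
    "Fleurya aestuans", "Fleurya aestyans",
]

for a, b, d in suspect_pairs(names):
    print(f"{d}  {a}  <->  {b}")
```

Comparing every pair is quadratic in the number of names, so for lists the size of those mentioned in the message (tens of thousands of names) one would likely compare only names within the same genus, or use an index, rather than all pairs. A flagged pair is only a candidate: some pairs may turn out to be two valid, distinct names.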