[Taxacom] formation of zoological names with Mc, Mac, etc.

Tony.Rees at csiro.au Tony.Rees at csiro.au
Tue Sep 1 01:08:51 CDT 2009


Rich Pyle wrote:

<snip>

> The question is whether one can have a machine 
> look at a text string and *decide* which link should be used, 
> rather than having a human *tell* it (a "time consuming 
> process"); this is the crux of Francisco's point:

Yes, and also the crux of the services being developed around the Global
Names Index, which are based on existing efforts to implement "fuzzy
matching" [Tony Rees -- this is your cue...]

That's not my forte, but what I'd like to see emerge from such services is
not that the computers *decide* how to link, but rather use some sort of
standard metric of likelihood, which the Human can use to make the final
call.  Thus, though it may involve a human, I think the services could cut
down the "time consuming" part dramatically (with access to the right
databases, page images, etc.)  This is slightly different from how Doug
described it, but ultimately it would end up being the same thing.  The easy
matches (only one, high-probability match) would probably be accepted
automatically in most cases; but the not-so-certain matches would be flagged
for human scrutiny.  But even in those cases, if all the necessary resources
are one mouse-click away (e.g., page images of original literature through
BHL), then the job would be a LOT easier than it is now.

</snip>


Okay, okay...

Actually I do this all the time, as a part of the process of adding names to my IRMNG genera compilation - in essence I wish to sort "incoming" names according to whether or not they are already on the master list, and add them if new.

To do this I need to match genus (and in some cases specific) names exactly or fuzzily, and likewise cited authorities and years, while also taking taxonomic placement into account where available. What I may end up with is (e.g.) a list of exact or near matches on scientific name pairs, with the degree of author similarity on a 0-1 scale computed by the "compare_auth" portion of taxamatch (you can test this yourself at http://www.cmar.csiro.au/datacentre/taxamatch-tests.htm if interested), plus other info as available, e.g. higher taxonomy, cited publication info and nomenclatural notes. There seems to be a threshold in the computed authority similarity, somewhere around 0.2-0.4, below which *most* of the pairs appear likely to be different and above which *most* appear likely to be acceptably the same to a human eye; see the sample output at the end of this message. The threshold is not really fixed, though: it seems to vary with the characteristics of the data being compared, but that's par for this particular course I guess.

Basically, in this case the machine does not replace the human reader, but the pre-sorting that the algorithm can do makes the list a lot easier to scan for exceptions.
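As a rough illustration of that pre-sorting step, here is a minimal Python sketch. It assumes the similarity scores have already been computed elsewhere (it does not reimplement compare_auth), and it treats the 0.2-0.4 range as a "needs scrutiny" band, which is an interpretation made for the example rather than a rule built into taxamatch. The names AuthorityPair and triage are hypothetical; the three scores are taken from the sample output in this message.

```python
from typing import NamedTuple


class AuthorityPair(NamedTuple):
    authority_1: str
    authority_2: str
    similarity: float  # 0-1 score computed elsewhere (e.g. by compare_auth)


def triage(pairs, low=0.2, high=0.4):
    """Sort pairs by score and split them into three bands for a human to scan.

    The low/high cut-offs are illustrative defaults, not fixed values.
    """
    bands = {"probably different": [], "needs scrutiny": [], "probably same": []}
    for pair in sorted(pairs, key=lambda p: p.similarity):
        if pair.similarity < low:
            bands["probably different"].append(pair)
        elif pair.similarity <= high:
            bands["needs scrutiny"].append(pair)
        else:
            bands["probably same"].append(pair)
    return bands


pairs = [
    AuthorityPair("F. Stevens, 1923", "F.L. Stevens, 1924", 0.7856),
    AuthorityPair("Tode, 1790", "E.M. Fries, 1832", 0.0494),
    AuthorityPair("Batista, 1960", "Bat.", 0.2539),
]
bands = triage(pairs)
for label, members in bands.items():
    print(label, [f"{p.authority_1} / {p.authority_2}" for p in members])
```

Because the useful threshold seems to shift with the data, low and high are left as parameters to be tuned per dataset.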

(I do similar things when searching for near matches on scientific names: use the algorithm to (1) determine whether any near matches are present, and (2) present candidates for scrutiny in pre-sorted groups, which are then easy for a human to eyeball and accept or reject.)
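The two steps just described could be sketched roughly as follows. This is not the taxamatch algorithm: Python's standard difflib.get_close_matches is swapped in as a generic string-similarity stand-in, and the function name classify, the 0.8 cut-off, the tiny master list and the misspelling "Amanitta" are all assumptions made for illustration.

```python
import difflib


def classify(incoming, master, cutoff=0.8):
    """Group incoming names as exact matches, near matches, or apparently new.

    Near matches are returned as candidates for a human to accept or reject.
    """
    master_set = set(master)
    result = {}
    for name in incoming:
        if name in master_set:
            result[name] = ("exact", [name])
            continue
        # Generic similarity ratio from difflib, standing in for taxamatch.
        near = difflib.get_close_matches(name, master, n=3, cutoff=cutoff)
        result[name] = ("near", near) if near else ("new", [])
    return result


master = ["Agaricus", "Amanita", "Boletus"]
incoming = ["Agaricus", "Amanitta", "Xylaria"]
groups = classify(incoming, master)
for name, (status, candidates) in groups.items():
    print(name, status, candidates)
```

Only the "near" group needs eyeballing; exact matches and names with no candidates can be handled in bulk.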

I can expand on these further as desired, but this is probably enough except for those who may be gluttons for punishment!

Regards - Tony

--------

Here's some sample output of my "authority comparison" values for real genus name pairs with identical spelling (in this case, for fungi), i.e. either homonyms, or duplicates for which I only need one instance (where the authors are sufficiently different I deem it a new name, i.e. a homonym, and will upload it as such). In general the software should expand known author abbreviations to the full version, but in at least one case below the abbreviation (Bat.) is either not on my list or associated with a different name; in fact (on checking) it is on the list twice, first associated with Batenburg and second with Batista. That is obviously undesirable, and I'm not sure what the best way forward is in this instance.

AUTHORITY_1 | AUTHORITY_2 | AUTH_SIMILARITY

Junius ex Linnaeus, 1753 | Persoon, 1801 | 0.0361
Meschinelli, 1892 | A. Straus, 1950 | 0.0417
Tode, 1790 | E.M. Fries, 1832 | 0.0494
Meschinelli, 1898 | T.N. Hermann in B.S. Sokolov, 1979 | 0.0503
Arthur & Bisby, 1921 | B. Renault, 1894 | 0.0741
(Persoon) Roussel, 1806 | S.F. Gray, 1821 | 0.0847
Linnaeus, 1753 | E.M. Fries, 1822 | 0.1062
Renault, 1896 | D. Ellis, 1916 | 0.1217
Sprengel, 1827 | Bosc ex E.M. Fries, 1829 | 0.1351
P. Micheli ex Persoon, 1794 | E.M. Fries, 1832 | 0.1357
Tode, 1790 | Tode ex Palisot de Beauvois in F. Cuvier, 1805 | 0.2003
Nees, 1816 | C.G.D. Nees ex A.T. Brongniart in F. Cuvier, 1824 | 0.2026
Link, 1815 | Link ex Brongniart in Willdenow in F. Cuvier, 1824 | 0.22
Nees, 1816 | C.G.D. Nees ex Brongniart in F. Cuvier, 1824 | 0.2241
C.C. Chen ex W.H. Ko, H.S. Chang, H.J. Su, C.C. Chen & L.S. Leu, 1978 | C.-C. Chen, 1961 | 0.2257
Sowerby, 1803 | Persoon, 1822 | 0.2289
Nees & T. Nees, 1818 | C.G.D. Nees ex Leman in F. Cuvier, 1821 | 0.2384
Link, 1809 | Link ex Wallroth in Bluff & Fingerhuth, 1833 | 0.2443
Fell, Statzell, I.L. Hunter & Phaff, 1970 | Fell et al., 1969 | 0.2464
Tode, 1790 | Tode ex Kunze & J.K. Schmidt, 1823 | 0.2487
Link, 1816 | Link ex A.T. Brongniart in F. Cuvier, 1824 | 0.2489
Batista, 1960 | Bat. | 0.2539
Kunze, 1817 | G. Kunze & J.K. Schmidt ex E.M. Fries, 1832 | 0.2594
Ozkose, B.J. Thomas, D.R. Davies, G.W. Griff. & Theodorou, 2001 | E. Ozkose et al., 2001 | 0.2651
Persoon, 1797 | (Persoon ex E.M. Fries) S.F. Gray, 1821 | 0.269
Sherwood, 1986 | M.A. Sherwood-Pike in F. Candoussau, K. Katomoto & M.A. Sherwood-Pike, 1986 | 0.2765
Haller, 1768 | [Haller] E.M. Fries, 1821 | 0.2772
Korf, 1978 | R.P. Korf in R.P. Korf, R.N. Singh & V.P. Tewari, 1978 | 0.2818
(Persoon) Roussel, 1806 | Persoon ex S.F. Gray, 1821 | 0.2903
Nees, 1816 | C.G.D. Nees ex S.F. Gray, 1821 | 0.2911
Kunze, 1817 | (Kunze ex Persoon) Steudel, 1824 | 0.2911
Nees, 1816 | C.G.D. Nees ex S.F. Gray, 1821 | 0.2911
Kunze, 1817 | [Kunze] E.M. Fries, 1821 | 0.2924
Schulzer, 1866 | S. Schulzer von Müggenburg in S. Schulzer von Müggenburg, A. Kanitz & Knapp, 1866 | 0.2927
Tode, 1790 | Tode ex A.J.C. Corda, 1837 | 0.2939
Morais, Batista & Massa, 1966 | Falcão de Morais et al., 1966 (Approved Lists, 1980) | 0.298
Nees, 1816 | C.G.D. Nees ex E.M. Fries, 1832 | 0.2991

(etc.)
.......

Trappe, Castellano & Amaranthus, 1996 | J.M. Trappe, M.A. Castellano & M.P. Amaranthus, 1996 | 0.7844
Chesters & Greenhalgh, 1964 | C.G.C. Chesters & G.N. Greenhalgh, 1964 | 0.7846
Mougeot & E.M. Fries, 1825 | Mougeot & E.M. Fries ex E.M. Fries, 1825 | 0.7846
D. Hawksworth & R. Santesson, 1990 | D.L. Hawksworth & R. Santesson in H.M. Jahns, 1990 | 0.7847
Berkeley & Broome, 1870 | M.J. Berkeley & Broome, 1875 | 0.7854
Penzig & Saccardo, 1898 | Penzig & P.A. Saccardo, 1897 | 0.7854
Nylander, 1885 | Nylander in Hue, 1885 | 0.7856
F. Stevens, 1923 | F.L. Stevens, 1924 | 0.7856
P. Karsten, 1870 | P.A. Karsten, 1871 | 0.7856
(Nannenga-Bremekamp) Nannenga-Bremekamp, 1975 | (N.E. Nanninga-Bremekamp) N.E. Nannenga-Bremekamp, 1974 | 0.7859
K.D. Hyde & Nakagiri, 1991 | K.D. Hyde & Nakagiri In K.D. Hyde, 1991 | 0.7865

(etc.)
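
On the "Bat." problem mentioned before the sample output: one possible approach, sketched below in Python, is to detect abbreviations that map to more than one full name and leave them unexpanded but flagged, rather than letting the software pick one silently. This is only a sketch under assumptions. The list layout and the names ABBREVIATIONS, build_lookup and expand are hypothetical, and the "Pers." entry is added purely as an unambiguous example alongside the two "Bat." entries described above.

```python
from collections import defaultdict

# Hypothetical abbreviation list as (abbreviation, full name) rows.
# The two "Bat." rows mirror the duplicate described in this message.
ABBREVIATIONS = [
    ("Bat.", "Batenburg"),
    ("Bat.", "Batista"),
    ("Pers.", "Persoon"),
]


def build_lookup(entries):
    """Map each abbreviation to the set of full names it is associated with."""
    lookup = defaultdict(set)
    for abbrev, full in entries:
        lookup[abbrev].add(full)
    return lookup


def expand(token, lookup):
    """Return (text, candidates); expand only when exactly one name is known."""
    candidates = sorted(lookup.get(token, ()))
    if len(candidates) == 1:
        return candidates[0], candidates
    # Unknown or ambiguous: keep the token as written and report candidates.
    return token, candidates


lookup = build_lookup(ABBREVIATIONS)
ambiguous = {a: sorted(names) for a, names in lookup.items() if len(names) > 1}
print(ambiguous)
print(expand("Pers.", lookup))
print(expand("Bat.", lookup))
```

This only surfaces the conflict; choosing between the candidates would still need extra context (e.g. taxonomic group or year) or a human decision.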





