[Taxacom] data quality vs. data security
Tony.Rees at csiro.au
Thu Feb 11 22:44:11 CST 2010
Dear Stephen,
You wrote:
<snip>
66 taxon specific databases is still not much taxonomic coverage. You have actually confirmed my thoughts on what CoL is: namely a data aggregator of other taxon specific databases (66, in fact), and it is therefore no better or worse than those source databases.
</snip>
If that was a secret, well it's not a very well kept one :)
See e.g. the opening paragraph/s on http://www.catalogueoflife.org/info_about_col.php :
"The Species 2000 & ITIS Catalogue of Life is planned to become a comprehensive catalogue of all known species of organisms on Earth. Rapid progress has been made recently and this, the ninth edition of the Annual Checklist, contains 1,160,711 species. Please note that this is probably just more than half of the world's known species. This means that for many groups it continues to be deficient, and users will notice that many species are still missing from the Catalogue.
"The present Catalogue is compiled with sectors provided by 66 taxonomic databases from around the world. Many of these contain taxonomic data and opinions from extensive networks of specialists, so that the complete work contains contributions from more than 3,000 specialists from throughout the taxonomic profession. Species 2000 and ITIS teams peer review databases, select appropriate sectors and integrate the sectors into a single coherent catalogue with a single hierarchical classification."
If you think that combining the work of more than 3,000 specialists into a single (more or less) coherent whole covering maybe 60% of the world's extant species (plus working to continually extend this coverage) is a valueless exercise, well I guess that's your prerogative...
Regards - Tony
-----Original Message-----
From: Stephen Thorpe [mailto:s.thorpe at auckland.ac.nz]
Sent: Friday, 12 February 2010 3:36 PM
To: Rees, Tony (CMAR, Hobart); taxacom at mailman.nhm.ku.edu
Subject: RE: [Taxacom] data quality vs. data security
66 taxon specific databases is still not much taxonomic coverage. You have actually confirmed my thoughts on what CoL is: namely a data aggregator of other taxon specific databases (66, in fact), and it is therefore no better or worse than those source databases. I would say that any advantages of such a structure over Google, as a data aggregator, are unlikely to be of sufficient magnitude to justify the cost. By contrast, Wikispecies puts data together in intelligent ways, so although it will always lag well behind in absolute numbers of taxa covered, what is there is more useful. CoL only cites the source databases as sources, so you have to go there anyway to find the real sources. Wikispecies not only cites the primary sources, but also provides links to them whenever possible, and images of taxa whenever available. It is easy to get up numbers (of taxa covered) by sucking data out of multiple databases, and the numbers might impress the funders, but the content (or lack thereof) in CoL, EoL and the like is unlikely to impress anyone ...
________________________________________
From: Tony.Rees at csiro.au [Tony.Rees at csiro.au]
Sent: Friday, 12 February 2010 5:23 p.m.
To: Stephen Thorpe; taxacom at mailman.nhm.ku.edu
Subject: RE: [Taxacom] data quality vs. data security
Hi Stephen,
You write:
<snip>
I can make no sense of their "annual checklists", the annual checklist for 2009 has HUGE gaps for new taxa published in 2009, 2008, ... In fact, all they seem to have is what they can automatically suck out of the few taxon specific databases out there, and nothing much else!
</snip>
In that case, this is no doubt best explained by an appropriate CoL person, but by "gaps" I meant gaps in taxonomic coverage, i.e. not currently covered by any of their 66 (latest count) source databases, which with one significant exception aspire to "complete" coverage of particular taxonomic groups. Chronological gaps, e.g. for recently described taxa, are then the responsibility of the contributing databases, who in the main are progressing such gap filling as their resources and enthusiasm allow (and accepting that there will be a time lag before being uploaded into the next annual release of the CoL).
One mechanism being worked on now or soon, I believe, is improving the "dynamic checklist" version of CoL, so that live updates in the source databases become accessible more rapidly via a dynamic version of the CoL, in advance of the "static snapshot" annual checklist. So certainly some latency issues exist, but they are within scope to be worked on as part of the 4D4Life project, see http://www.4d4life.eu/ (more acronyms of course, fun fun fun).
That's 2 more cents gone, hopefully not wasted...
Cheers - Tony
-----Original Message-----
From: Stephen Thorpe [mailto:s.thorpe at auckland.ac.nz]
Sent: Friday, 12 February 2010 3:13 PM
To: Rees, Tony (CMAR, Hobart); taxacom at mailman.nhm.ku.edu
Subject: RE: [Taxacom] data quality vs. data security
> If your argument is that one is more likely to find a more complete species list for *any* genus in wikispecies than elsewhere
No, that is not my argument, and I agree that it is patently untrue
My argument is that Wikispecies is a very easy and cheap way of providing verifiable and complete data to the whole world right now, and so people (especially those in the scientific community) would be foolish not to make the most of it, but, alas, I don't think they are ...
Compared to EOL, for example, the advantages of Wikispecies are too obvious to go over yet again ...
I'm glad you mentioned Catalogue of Life - for it isn't at all what it seems: you speak of "gaps", but actually I can make no sense of their "annual checklists", the annual checklist for 2009 has HUGE gaps for new taxa published in 2009, 2008, ... In fact, all they seem to have is what they can automatically suck out of the few taxon specific databases out there, and nothing much else!
________________________________________
From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Tony.Rees at csiro.au [Tony.Rees at csiro.au]
Sent: Friday, 12 February 2010 4:56 p.m.
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] data quality vs. data security
Dear Stephen,
If your argument is that one is more likely to find a more complete species list for *any* genus in wikispecies than elsewhere, then this is patently untrue, since the 2009 online Catalogue of Life currently contains more than 1.1m valid species names and 0.7m synonyms, compared to wikispecies' presently quoted 210k taxa at all taxonomic ranks. Certainly CoL has gaps, and it is in these areas that wikispecies may gain some points; however, to generalise that one system is therefore "better" than the other seems a bit pointless, particularly as the likely winner is CoL.
Maybe a more fruitful area would be to see how the additional effort and content you and others are putting into wikispecies may also contribute to CoL and other related efforts (such as the upcoming GNA and GNUB, see e.g. http://code.google.com/p/gbif-ecat/wiki/GNUBIntro ), but that is unlikely to be advanced by repeated arguments that wikispecies is essentially better than the other initiatives that already feed into such compilations, some of which are considerably richer in species-level information than the equivalent wikispecies entries, and even (gasp) kept up-to-date by equally diligent workers...
Just my 2 cents' worth (I am drawing down my available cents here, though, probably will reach zero soon).
- Tony