[Taxacom] GBIF: perpetuating probably defunct unpublished names
Stephen Thorpe
stephen_thorpe at yahoo.co.nz
Mon May 24 17:22:37 CDT 2010
yeah, these are the questions we would all like to be able to answer, but (1) how realistic is it to think that we can answer them in a meaningful and useful way; and (2) is a GBIF-type infrastructure (either alone or in combination with others) appropriate for these kinds of questions? Also, given that the answers to (1) and (2) are realistically likely to be of the form 'yes, to some extent anyway', is that extent sufficient to justify the cost of GBIF-sized initiatives, or would money be better spent on, for example, more targeted field-based studies to determine if some critically endangered species occurs within a protected area or not? Mapping of distributions from museum specimens is a whole can of worms in itself. A certain (unknown, but hopefully small) proportion of museum specimens are mislabelled, but more than that, a dot on a map doesn't really tell you if the species occurs there now, or only at some time in the past. Add to that the fact that many historical specimens do not have perfectly precise locality data, and it looks to me like that sort of question can only be reliably answered by going out into the field and actually surveying for a species (which would seem to be the only way to answer my question about the survival or extinction of that carabid beetle in Kuwait: http://species.wikimedia.org/wiki/Cicindis_johnbeckeri). This is certainly the case for most species on Earth, which are tiny insects and the like. Maybe it is different (easier) with the far fewer big furry and feathery species? Lots of potentially threatened tiny species are going to slip under the radar anyway, due to never having been named or studied taxonomically.
I guess I think that data quality is of primary importance, but most of the effort these days seems to be going into data manipulation, which is certainly easier, but "garbage in, garbage out". I am deeply skeptical that much of any use can come out of automatic harvesting of names, partly because there is often (in the absence of a good modern taxonomic revision) no known meaningful connection between names and species (an old name may still have a mixed syntype series, for example, or the unique holotype may be the only identified specimen even though the name is an unpublished synonym of a common species), particularly in cases like the one that I started this thread with (unpublished and defunct Agalba species names).
I think GBIF (and others) would be wise to adopt a more collaborative and open approach. By the latter, I mean that it would be a very good idea, for example, to have a feedback comment facility in a prominent place on each page (moderated, of course, but fairly) where anyone who knows better can flag errors and/or indicate new data, make comments, and post links to better data sources (such as Wikispecies). In the case of the relevant Agalba pages, for example, I would simply want to flag them as unpublished, and likely never to be published (as probably already described under other names). For me, two of the key problems with closed-source biodiversity databases like GBIF, EoL, CoL, AFD, etc. are the great difficulty they seem to have in keeping themselves up-to-date, and the seeming impossibility of getting errors fixed or even suitably flagged.
In a nutshell, I just think that the data issues in this area are so complex, that there is just no possibility of creating anything meaningful without a massive input of human brain power to carefully assemble the data into a meaningful picture, given the enormous range in data reliability that no machine could ever assess. GBIF may "have" millions of names while Wikispecies currently has only hundreds of thousands, but Wikispecies pages can be easily given far more useful content than GBIF pages, by way of well-chosen links to primary references, etc.
I'm not sure what the answer is to all this, but it is surely worth giving some thought to, rather than just letting the locomotive steam on to destination unknown ...
One final comment:
>I don't believe the wikispecies format or wikispecies site provides services that can enable this content to be served in a manner that could inform the examples above
you are probably right, but I think it would be a very straightforward task to develop a solution to that. Surely a search engine could be designed to search Wikispecies pages for specific data? It might require a bit more structuring of the data on the Wikispecies pages, but that is not impossible. Frankly, at the moment at least, I don't see GBIF as being much better off for being able to 'inform the examples above', largely due to a lack of (validated) content...
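As a rough illustration of the kind of page search suggested here, the following is a minimal sketch in Python. It is not a description of any existing Wikispecies service. The page texts, the taxon names, the "Status" section and the helper names (parse_sections, find_pages) are all invented for the example, and it assumes that pages have already been fetched as plain wikitext and that every page uses consistent ==Section== headings, which is exactly the extra structuring that would be needed.

```python
import re

# Hypothetical, simplified page texts (invented names and sections);
# real pages would be far less uniform than this.
PAGES = {
    "Examplea prima": "==Name==\nExamplea prima Smith, 1901\n==References==\n* Smith, 1901. A revision.\n",
    "Examplea secunda": "==Name==\nExamplea secunda Jones, 1950\n==Status==\nunpublished name\n",
}

# Matches a heading line such as "==Name==" or "== Name ==".
HEADING = re.compile(r"^==\s*(.+?)\s*==\s*$")


def parse_sections(wikitext):
    """Split wikitext into a dict of {section heading: body text}."""
    sections = {}
    current = None
    for line in wikitext.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def find_pages(pages, section, needle):
    """Return sorted titles of pages whose `section` contains `needle` (case-insensitive)."""
    hits = []
    for title, text in pages.items():
        body = parse_sections(text).get(section, "")
        if needle.lower() in body.lower():
            hits.append(title)
    return sorted(hits)


if __name__ == "__main__":
    # List the pages that have been flagged as unpublished names.
    print(find_pages(PAGES, "Status", "unpublished"))
```

The search itself is trivial once the sections can be relied on; the real work would be in getting contributors to use the same headings on every page.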
Stephen
________________________________
From: David Remsen (GBIF) <dremsen at gbif.org>
To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz>
Cc: David Remsen (GBIF) <dremsen at gbif.org>; TAXACOM <taxacom at mailman.nhm.ku.edu>
Sent: Mon, 24 May, 2010 10:33:53 PM
Subject: Re: [Taxacom] GBIF: perpetuating probably defunct unpublished names
On May 24, 2010, at 1:39 AM, Stephen Thorpe wrote:
>Two key issues here:
>
>(1) who wants/needs biodiversity information, and in what form do they want/need it? Is there a "one format suits all"?? What are people actually using any of these biodiversity data resources (including Wikispecies) for???
>
Here are some uses of primary biodiversity data originating in collections. The list is by no means comprehensive, just what pops into my head from recent correspondence.
* Integration with information on world protected areas and with thematically-scoped species checklists like the Red-list to assess whether, for example, critically endangered species occur within protected areas or not.
Ex. http://www.protectedplanet.net/sites/Cabaneros_National_Park_State_Network
* Integration with elevation and habitat data to focus on mountainous regions.
http://www.mountainbiodiversity.org/
* Integration with climatic data to define a species occurrence envelope to make predictive distributions.
http://www.aquamaps.org/
* Integration with climate change models to predict possible changes to crop distribution and the impact of climate change on crop wild relatives
http://earthsky.org/agriculture/andy-jarvis-models-the-effect-of-climate-change-on-worlds-top-50-crops
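To make the first of these use cases concrete, here is a minimal sketch in Python of checking occurrence records against a protected-area boundary. Everything in it is an assumption for illustration: the square boundary, the three records, the field names and the min_year cut-off are invented, and a real analysis would use proper GIS tooling rather than a hand-rolled ray-casting test. The optional year filter is there because an old specimen record says nothing about whether the species still occurs at that spot.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside `polygon`,
    given as a list of (lon, lat) vertices.
    Points exactly on the boundary are not treated specially."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray cast from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def records_in_area(records, polygon, min_year=None):
    """Names of records that fall inside `polygon`, optionally keeping
    only those collected in or after `min_year`."""
    names = []
    for rec in records:
        if min_year is not None and rec["year"] < min_year:
            continue
        if point_in_polygon(rec["lon"], rec["lat"], polygon):
            names.append(rec["name"])
    return names


# Invented boundary and records, purely for illustration.
AREA = [(0, 0), (10, 0), (10, 10), (0, 10)]
RECORDS = [
    {"name": "sp. A", "lon": 5, "lat": 5, "year": 1895},
    {"name": "sp. B", "lon": 2, "lat": 8, "year": 2004},
    {"name": "sp. C", "lon": 15, "lat": 5, "year": 2008},
]

if __name__ == "__main__":
    print(records_in_area(RECORDS, AREA))
    print(records_in_area(RECORDS, AREA, min_year=1990))
```

Note that the test only answers "was a specimen ever recorded inside this boundary"; it takes the coordinates and the identification on trust.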
In these cases, it is infrastructure that enables these questions and lines of inquiry to be pursued, and it often indicates a clear need for more accurate and verified data. Perhaps we could and should, as Stephen suggests, disconnect that infrastructure from the 8000 or so source databases that provide these 200 million plus raw data records and identify a set of sources that provide only data derived through taxonomic revisions. It may be, via examples like this, that additional use cases to support increased taxonomic revisions can be marshalled.
I believe, however, that there is a need for infrastructure that enables and supports larger questions about biodiversity. I also think there should be a distinction made between that infrastructure and the quality of data that is mobilised through it. We can, and should, focus on methods that enable quality assessments to be made, annotations to be provided, and data quality to be improved.
The content in wikispecies may represent a higher quality than what we can derive from the raw collections data. I don't believe the wikispecies format or wikispecies site provides services that can enable this content to be served in a manner that could inform the examples above. However, if the data is consistent, comprehensive and has sufficient internal integrity, we could map it to the data formats that do conform to the requirements of those above and we could redirect that infrastructure to serve the wikispecies data. I am happy to take on the task of mapping the data to these standards and work with you or others on the structure of the wikispecies site to create a requirements document for a developer to create the transformation on a regular basis. I can also provide other use cases for the wikispecies data and the requirements for them.
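A mapping of this kind could look something like the following minimal sketch in Python. The target column names are borrowed from Darwin Core term names as an assumption about what "these standards" would mean in practice; the input field names (name, author, reference, status), the sample record and the helper names are invented, and the sketch assumes the page content has already been extracted into simple key/value records.

```python
import csv
import io

# Assumed mapping from invented input field names to target columns
# (column names borrowed from Darwin Core terms).
FIELD_MAP = {
    "name": "scientificName",
    "author": "scientificNameAuthorship",
    "reference": "namePublishedIn",
    "status": "nomenclaturalStatus",
}


def to_standard_row(page_record):
    """Rename the keys of one extracted record to the target column names.
    Fields missing from the record become empty strings."""
    return {target: page_record.get(source, "") for source, target in FIELD_MAP.items()}


def to_csv(page_records):
    """Serialise the mapped records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(FIELD_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for record in page_records:
        writer.writerow(to_standard_row(record))
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented sample record, purely for illustration.
    sample = [{"name": "Examplea prima", "author": "Smith, 1901"}]
    print(to_csv(sample))
```

Keeping the mapping in one small table means the requirements document only has to agree on that table; the transformation itself can then be rerun whenever the pages change.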
David
>(2) it may be more difficult to measure than the rate of oil spewing into the GoM, but just how much spurious/incorrect/misleading data is being spewed out of GBIF etc.? Just 5000 barrels per day ... ! :) There really are an awful lot of scolytine names on GBIF misleadingly attributed to Wood & Bright, 1992 ...
>
>Personally, when a database tells me that the source of their data is another database, and gives no direct links to, or citations of, primary sources, I ask myself "why am I wasting my time here"?
>
>Also, using specimen data from collections, as I said, is really only reliable after taxonomic revision, and when each and every specimen has been labelled by the reviser ...
>
>Stephen
>
>
>
>
>________________________________
>From: Bob Mesibov <mesibov at southcom.com.au>
>To: TAXACOM <taxacom at mailman.nhm.ku.edu>
>Sent: Mon, 24 May, 2010 11:04:11 AM
>Subject: Re: [Taxacom] GBIF: perpetuating probably defunct unpublished names
>
>Wolfgang Lorenz wrote:
>
>"Large-scale "top-down" projects can provide infrastructures to make such individual taxonomists' work easier and the results better accessible for the wider community."
>
>This is a key argument made by aggregators, and I can't accept it.
>
>First, I don't believe that the various data infrastructures available for databasing taxonomic and specimen data can make the work of individual taxonomists any easier - unless those taxonomists are employed to transform their own or some museum's data sets into something usable by large-scale top-down projects. This of course only rarely happens, because taxonomists are expected to do this additional work for free. There is no universally used software package for recording taxonomic and specimen data and there never will be, and taxonomists will continue to manage data in ways that suit the taxa concerned and the specialists concerned.
>
>Second, 'the wider community' is not better served by the large-scale top-down projects. That community votes with its mouse-clicking finger and the overwhelming majority of clickers have selected WikiXXXX and the carefully crafted bottom-up sites on particular taxa.
>
>That's for access. How about content? Well, for taxonomy and the biology it fronts, no one is going to wait for EOL to fill its pages when there are content-rich sites already loaded with literature and other links. As for specimen data, I've personally been burned often enough by GBIF to avoid it. I go to the monographs that Stephen Thorpe highlights as our best source of specimen data, and to pages (like those on Wikispecies) that point to those monographs.
>
>The 'tunneling through the mountain' analogy isn't a good one. A better one would be: there is a mountain of biodiversity data. Numerous specialist workers are chipping away at it, taking off high-quality chunks and handing them out to anyone interested in those taxa. Elsewhere on the mountain, people have set up several big marquees with signs saying 'Get all your biodiversity data here!' Unfortunately these promoters don't have much to offer yet, and what they have is more suspect than what the large number of specialist diggers are producing.
>--
>Dr Robert Mesibov
>Honorary Research Associate
>Queen Victoria Museum and Art Gallery, and
>School of Zoology, University of Tasmania
>Home contact: PO Box 101, Penguin, Tasmania, Australia 7316
>03 64371195; 61 3 64371195
>Webpage: http://www.qvmag.tas.gov.au/mesibov.html
>
>_______________________________________________
>
>Taxacom Mailing List
>Taxacom at mailman.nhm.ku.edu
>http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
>The Taxacom archive going back to 1992 may be searched with either of these methods:
>
>(1) http://taxacom.markmail.org
>
>Or (2) a Google search specified as: site:mailman.nhm.ku.edu/pipermail/taxacom your search terms here
>