[Taxacom] Global biodiversity databases

Stephen Thorpe stephen_thorpe at yahoo.co.nz
Wed Aug 8 15:57:40 CDT 2012


Hi Karen,
I guess my problem is with the sales pitch, i.e. "comprehensive first-draft tree of all named species".
This is ridiculously unrealistic (or else rather misleading)!
Firstly, there doesn't even exist a fully comprehensive *listing* of all named species yet! CoL might imply in their sales pitch that they are close, but they are not!
Secondly, even if such a listing did exist, the proportion of names for which phylogenies of any kind are available is very small, so the only way I can see OToL doing what it promises is to end up with an enormous unresolved polychotomy (at various levels), with a little bit of actual phylogenetic data buried somewhere inside! If, for example, you use CoL for weevil names, you will end up with the absurdity of presenting unresolved phylogenetic relationships between synonyms, and between the same species placed in different genera!
Please explain ...
Cheers,
Stephen

From: Karen Cranston <karen.cranston at gmail.com>
To: Stephen Thorpe <stephen_thorpe at yahoo.co.nz> 
Cc: "Tony.Rees at csiro.au" <Tony.Rees at csiro.au>; "taxacom at mailman.nhm.ku.edu" <taxacom at mailman.nhm.ku.edu> 
Sent: Thursday, 9 August 2012 1:28 AM
Subject: Re: [Taxacom] Global biodiversity databases

Speaking with my Open Tree of Life hat on... We plan to have a first
draft of the tree released in summer 2013, along with the ability for
users to annotate nodes, upload new trees (to enable continuous
updating), as well as an API for programmatic access. Where we can, we
will use inferred phylogenies to construct the OpenTree. We will need
to rely on taxonomies to fill in the gaps where we don't have
phylogenetic coverage, and also for resolution of names in input
trees. These are two key places where having centralized taxonomic
resources would be a huge benefit.
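To make the gap-filling step concrete, here is a minimal Python sketch. It is not OpenTree's actual synthesis method, only an illustration under stated assumptions: trees are nested tuples of tip names, names that no input tree places are attached at the root of the group, and the synonym table, input tree and checklist are all hypothetical ("Examplea" and "Oldgenus" are made-up genera). It shows both points raised in this thread: taxonomy-only names end up as an unresolved polytomy, and if input-tree names are not resolved first, the same species is grafted in twice under different genera.

```python
from typing import Union

# Assumed representation: a tree is either a tip name or a tuple of subtrees.
Tree = Union[str, tuple]


def tips(tree: Tree) -> set:
    """Collect every tip name in a tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(tips(child) for child in tree))


def resolve(name: str, synonyms: dict) -> str:
    """Map a name to its accepted name; unknown names are returned unchanged."""
    return synonyms.get(name, name)


def relabel(tree: Tree, synonyms: dict) -> Tree:
    """Resolve every tip of an input tree to its accepted name."""
    if isinstance(tree, str):
        return resolve(tree, synonyms)
    return tuple(relabel(child, synonyms) for child in tree)


def graft(phylogeny: Tree, checklist: list, synonyms: dict) -> tuple:
    """Attach checklist names missing from the phylogeny as siblings at the root.

    Each grafted name becomes a direct child of the root, next to the
    phylogeny itself, i.e. part of an unresolved polytomy.
    """
    placed = tips(phylogeny)
    unplaced = []
    for name in checklist:
        accepted = resolve(name, synonyms)
        if accepted not in placed and accepted not in unplaced:
            unplaced.append(accepted)
    return (phylogeny, *unplaced)


# Hypothetical data: the input tree uses an older combination of one species.
synonyms = {"Oldgenus alba": "Examplea alba"}
input_tree = (("Oldgenus alba", "Examplea nigra"), "Examplea rubra")
checklist = ["Examplea alba", "Oldgenus alba", "Examplea nigra",
             "Examplea rubra", "Examplea viridis", "Examplea lutea"]

# With name resolution: only the two unplaced species join the root polytomy.
print(graft(relabel(input_tree, synonyms), checklist, synonyms))
# Without it: "Examplea alba" is grafted again beside "Oldgenus alba".
print(graft(input_tree, checklist, {}))
```

Grafting everything at the group root is the crudest possible choice; a real pipeline would presumably attach each unplaced name at the lowest taxon it can be assigned to, which shrinks the polytomies but does not remove them.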

Cheers,
Karen

On Tue, Aug 7, 2012 at 10:37 PM, Stephen Thorpe
<stephen_thorpe at yahoo.co.nz> wrote:
> just noticed this on Twitter:
>
>>Open Tree of Life
> @opentreeoflife
> This NSF-funded project will produce the first online, comprehensive first-draft tree of all named species, accessible to both scientists and the public.
>
> *all named species*!! I note that they don't say when!
>
>
> ________________________________
> From: "Tony.Rees at csiro.au" <Tony.Rees at csiro.au>
> To: stephen_thorpe at yahoo.co.nz; taxacom at mailman.nhm.ku.edu
> Sent: Tuesday, 7 August 2012 2:07 PM
> Subject: RE: [Taxacom] Global biodiversity databases
>
> Hi Stephen,
>
> Those who know me might appreciate that I have some interest in this area, e.g. see a couple of recent presentations:
>
> http://www.slideshare.net/tony1212/rees-an-all-genera-index
> http://www.slideshare.net/tony1212/rees-towards-a-hierarchical-classification-of-all-life
>
>
> Without knowing the subtext to your question(s), here are the answers I would give if pressed...
>
>
>> Question 1: Do you expect a comprehensive and reliable GBD to exist in
>> the foreseeable future (or do you think that one or more already
>> exist)? If so, do you think it is likely to come from an existing
>> initiative, and if so, which one(s)?
>
> I think you have to split this into short term vs. medium/longer term. The short-term answer is that currently you have to mix and match across the best-curated resources for specific groups, examples being Eschmeyer's Catalog of Fishes for fishes (extant species and genera), Index Fungorum/Species Fungorum for fungi, Systema Dipterorum for Diptera, etc.; notable cross-group compilations are the Catalogue of Life (which is really a collation/fusion of 100+ "expert curated" systems), WoRMS (similar, with some 20 contributing components) for marine species, and so on. For higher plants there is The Plant List, for algae AlgaeBase, for prokaryotes the List of Prokaryotic names with Standing in Nomenclature (LPSN) plus CyanoDB, and for viruses the ICTV database. I would term these (with the exception of the composite CoL dataset and The Plant List) "primary aggregators", which ideally are the realm of experts in their respective fields (my 2 cents, anyway).
>
> Medium to longer term there is the hope/wish/desire to move to an environment where as many as possible of these resources agree to collaborate in a common infrastructure, currently termed the Global Names Architecture or simply GN for Global Names. A recent meeting in Hawaii aimed to address some of the challenges to doing this, see http://www.globalnames.org/taxonomy/term/169/0 .
>
> Meanwhile, while we wait for GN to deliver the "holy grail", there are secondary aggregators, of which my project, ITIS, Wikispecies, Wikipedia and more might be cited as examples, taking material from the primary aggregators and original sources to build something more complete than any single source. Speaking from experience, I do this without necessarily having taxonomic expertise in any particular area, but hopefully with some ability to decide which source to use, or how to weight sources, when information conflicts. In some cases these secondary aggregators may also have a slightly different remit from the primary ones, e.g. fleshing out purely nomenclatural or bare-bones species lists with images, descriptive information or distributions. Whether these will also move into the GN space as discrete entities maintained separately forever, or will coalesce into a few larger units, remains to be seen.
>
>> Question 2: Which would you prefer, (A) data verified by "experts"; or
>> (B) data verifiable by the user (via referencing)?
>
> Answer would be both (see also examples given below). If an expert has made a call, then that saves me (the user) doing the same! At the same time, the more of the supporting evidence is included, the better, so that one can assess the currency and quality/credibility of that information and, as needed, consider whether to utilise it unchanged or not (for example, new taxonomic information such as a change of placement, a synonymy or a name change may have been published since that call was made).
>
>> Question 3: What kinds of data do you want to be able to access from a
>> GBD?
>
> A previous Taxacom post from Rod Page suggested the following:
>
> <snip>
>
> Very simple questions are being asked:
>
> 1. Is this a name?
> 2. Is this the correct way to write it?
> 3. Is this name currently in use?
> 4. What other names are related to this name (e.g., synonyms, lexical variants)?
> 5. Where was this name published? Can I see that publication?
>
> </snip>
>
> I would extend this a bit further:
>
> 6. What is the current (and also past) taxonomic placement of this name (+ according to...)
> 7. What are its parent/children in the selected current taxonomic hierarchy
> 8. What other names are lexically related to this name (homonyms, near-homonyms/candidate "did you mean")
> 9. What do we know of the type specimen i.e. when/where collected, where deposited, geologic age, associated habitat etc.
> 10. What do we know of the taxon to which this name applies - ecological info, distribution in space and time, common names, descriptive characters, significant literature treatments
>
> For a "straw man" here is an example species-level name treatment from Eschmeyer's online Catalog of Fishes, for the name Bythites hollisi (now a syn. of Thermichthys hollisi):
>
> <snip>
> hollisi, Bythites Cohen [D. M.], Rosenblatt [R. H.] & Moser [H. G.] 1990:270, Figs. 1-8 [Deep-Sea Research v. 37 (no. 2); ref. 14223] Hydrothermal vent (Mussel Bed) on Galápagos Rift Zone, 0°47.894'N, 86°9.210'W, depth 2500 meters. Holotype (unique): SIO 88-97. .Valid as Bythites hollisi Cohen, Rosenblatt & Moser 1990 -- (Geistdoerfer 1999:9 [ref. 23832], Nielsen & Cohen in Nielsen et al. 1999:98 [ref. 24448], Machida & Hashimoto 2002:1 [ref. 25949], Chernova & Geistdorfer 2003:153 [ref. 26887]). .Valid as Gerhardia hollisi (Cohen, Rosenblatt & Moser 1990) -- (Nielsen & Cohen 2002:50 [ref. 26528]). .Valid as Thermichthys hollisi (Cohen, Rosenblatt & Moser 1990) -- (Nielsen & Cohen 2005:395 [ref. 28470]). Current status: Valid as Thermichthys hollisi (Cohen, Rosenblatt & Moser 1990). Bythitidae: Bythitinae. Distribution: Southeastern Pacific. Habitat: marine.
> </snip>
>
> (Also note that all the statements are referenced to a references table, which can be searched independently.)
>
> You can assess for yourself how many of my suggestions above are covered here. Some that are not may be covered by the equivalent entry in FishBase, see
>
> http://www.fishbase.org/summary/Thermichthys-hollisi.html
>
> (Actually this page is pretty bare compared with many in FishBase, but you will get the idea).
>
> For fossil taxa I think PaleoDB has pretty much the right approach, as an example see this page for the genus Tyrannosaurus:
>
> http://paleodb.org/cgi-bin/bridge.pl?a=basicTaxonInfo&taxon_no=38613 (a lot more information is available via "more details")
>
>
>> Question 4: Which existing initiative currently comes closest to what
>> you would ideally like to see?
>
> See some examples above for particular groups (there are many more out there). Across all groups, either build your own (as I do) or use Google Scholar and Nomenclator Zoologicus (for animals) as a surrogate for the literature at this time, backed up by other internet/print resources as available (a personal library is still invaluable, especially for the more substantial texts). Wikipedia is surprisingly useful for recent updates on treatments of some groups and for the more "charismatic" taxa in general (the value of crowdsourcing, I guess), but beware of inaccuracies/inconsistencies between treatments on different pages; it is also very incomplete, as minor taxa are apparently not considered sufficiently "notable". (Why Wikipedia as opposed to Wikispecies? I typically want more than the "bare bones" taxonomic placement, and Wikispecies only supplies the latter.)
>
> That's my take - maybe not quite what you are asking for, but maybe something useful there.
>
> Regards - Tony
>
> Tony Rees
> Manager, Divisional Data Centre,
> CSIRO Marine and Atmospheric Research,
> GPO Box 1538,
> Hobart, Tasmania 7001, Australia
> Ph: 0362 325318 (Int: +61 362 325318)
> Fax: 0362 325000 (Int: +61 362 325000)
> e-mail: Tony.Rees at csiro.au
> Manager, OBIS Australia regional node, http://www.obis.org.au/
> Biodiversity informatics research activities: http://www.cmar.csiro.au/datacentre/biodiversity.htm
> Personal info: http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566
> LinkedIn profile: http://www.linkedin.com/pub/tony-rees/18/770/36
>
>> -----Original Message-----
>> From: taxacom-bounces at mailman.nhm.ku.edu [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Stephen Thorpe
>> Sent: Tuesday, 7 August 2012 8:32 AM
>> To: TAXACOM
>> Subject: [Taxacom] Global biodiversity databases
>>
>> Dear Taxacomers,
>> I have created a short questionnaire (below) for which I would
>> appreciate greatly any replies. It concerns global biodiversity
>> databases (GBDs) ("databases" in the broadest possible sense).
>> Cheers, Stephen
>>
>> Question 1: Do you expect a comprehensive and reliable GBD to exist in
>> the foreseeable future (or do you think that one or more already
>> exist)? If so, do you think it is likely to come from an existing
>> initiative, and if so, which one(s)?
>> Question 2: Which would you prefer, (A) data verified by "experts"; or
>> (B) data verifiable by the user (via referencing)?
>> Question 3: What kinds of data do you want to be able to access from a
>> GBD?
>> Question 4: Which existing initiative currently comes closest to what
>> you would ideally like to see?
>> _______________________________________________
>>
>> Taxacom Mailing List
>> Taxacom at mailman.nhm.ku.edu
>> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>>
>> The Taxacom archive going back to 1992 may be searched with either of
>> these methods:
>>
>> (1) by visiting http://taxacom.markmail.org/
>>
>> (2) a Google search specified as:
>> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here



-- 
~~~~~~~~~~~~~~~~~~~~~~~
karen.cranston at gmail.com
~~~~~~~~~~~~~~~~~~~~~~~

