[Taxacom] Scope of current Biodiversity Informatics initiatives

Stephen Thorpe s.thorpe at auckland.ac.nz
Tue Feb 16 15:09:57 CST 2010


>What are the sources of new data?
A mixture of sources - sometimes other existing online databases, sometimes primary literature (including literature that is freely available online, literature that is available online to particular Wikispecies editors via personal and/or institutional subscriptions or access agreements, and non-digitised literature available from personal collections, institutional libraries, and/or contacts).

>How many people work on Wikispecies?
Not many, currently - maybe 10 or so regular editors?

>How many entries (new taxa, existing taxa pages being updated) are added per year?
That depends on how things go - I don't know the answer. I can tell you that since 12 Nov 2008, I personally have made 88,000 edits. Counts like this can be misleading, though - other databases add thousands of skeleton pages, for example.

>What do you do that Zoorecord does and what not?
We integrate the information that Zoorecord has. They don't put earlier data into modern context. They don't provide links to literature, or images of taxa. You can't go to Zoorecord and easily find a synthesis of the current taxonomic state of knowledge of a group. For example, see http://species.wikimedia.org/wiki/Stereomerini : it would take a lot of work to integrate all this information from Zoorecord, but Wikispecies presents it to you at a glance.

>What are your quality control measures, e.g. completeness of information (e.g. all children of a taxon; links, etc.)?
With my edits, I strive for full explicit citation of sources, and I have developed a "complete & correct" template to sign off certain pages to distinguish them from pages with only a random selection of child taxa currently included (though I haven't actually used the template very often as yet). For example: http://species.wikimedia.org/wiki/Apteropanorpa
In general, a page should be written in such a way that the information presented is verifiable from the listed references, as in the Apteropanorpa example. Completeness is the aim, but the most important thing is that any information presented should be verifiable.

>What is the taxonomic coverage?
Don't know - currently very patchy! But the more people who choose to contribute, the better it is going to get...

>Are taxa chosen arbitrarily or do you have a strategy to fill in lacunas in other systems such as COL?
A bit of both - I have a particular interest in, and knowledge of, some areas (e.g. Coleoptera, Acari) which happen not to be well covered elsewhere ...

Stephen


________________________________________
From: Donat Agosti [agosti at amnh.org]
Sent: Tuesday, 16 February 2010 5:57 p.m.
To: Stephen Thorpe; Tony.Rees at csiro.au; r.page at bio.gla.ac.uk; taxacom at mailman.nhm.ku.edu
Subject: RE: [Taxacom] Scope of current Biodiversity Informatics initiatives

Out of curiosity?

What are the sources of new data?
How many people work on Wikispecies?
How many entries (new taxa, existing taxa pages being updated) are added per
year?
What do you do that Zoorecord does and what not?
What are your quality control measures, e.g. completeness of information
(e.g. all children of a taxon; links, etc.)?
What is the taxonomic coverage?
Are taxa chosen arbitrarily or do you have a strategy to fill in lacunas in
other systems such as COL?

Donat


-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Stephen Thorpe
Sent: Tuesday, February 16, 2010 7:02 AM
To: Tony.Rees at csiro.au; r.page at bio.gla.ac.uk
Cc: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Scope of current Biodiversity Informatics initiatives

As a quick first reply, let me add to your list info on another global
bioinformatics initiative:

- Wikispecies aspires to provide human users with verifiable,
up-to-the-minute information on the biodiversity/taxonomic state of play of
all taxa at all levels, living or fossil. Additionally, it tries to provide
links to publications and images of taxa whenever these are available.
Distributional, bionomic, and taxonomic comments are also provided on talk
pages, and taxa are cross-referenced with associated taxa (e.g. parasites
and hosts). Nomenclatural issues are also fleshed out in full. Low level
information is integrated into a single consistent classification (with
alternatives indicated). The focus is on content, and HOW the pages are
written (in terms of explicit referencing), not WHO wrote them. Since
Wikispecies does not confine itself to harvesting names from already
existing secondary sources, it covers many groups for which there is little
integrated information currently available, for example Coleoptera and also
Acari.

________________________________________
From: Tony.Rees at csiro.au [Tony.Rees at csiro.au]
Sent: Tuesday, 16 February 2010 4:00 p.m.
To: Stephen Thorpe; r.page at bio.gla.ac.uk
Cc: taxacom at mailman.nhm.ku.edu
Subject: Scope of current Biodiversity Informatics initiatives

Stephen,

OK, I have topped up the coffers with a few new cents (for better or worse),
so here goes.

You write:

<snip>

Whenever I ask a straight question, nobody answers it, but here I go again:

what does GBIF have to offer that is different to what CoL has to offer that
is different to what EoL has to offer, etc? Is what they offer worth the
cost? In a world of rapidly disappearing biodiversity (in some parts,
anyway), isn't it more sensible to "describe it before it is gone" as a
priority? Wikispecies is there to integrate it all, not perfectly, but
cheaply ...

</snip>

Actually you ask three questions. Here is perhaps the briefest possible
answer to the first one.

- CoL (Catalogue of Life) integrates names data at the species level from
multiple "authoritative" sources into a single catalogue that can be used by
other initiatives, or interrogated by humans over the web (or also offline
via CD-ROM). By its own estimates it is around 60% complete at this time,
for extant taxa only.

- GBIF integrates species occurrence data (in the main) from multiple
providers, and where possible, uses the CoL Catalogue as a means to organise
and navigate through its data where such information is available in CoL;
again its clients include humans over the web, and machine-level users i.e.
other initiatives that can connect remotely to GBIF data services.

- EOL aspires to provide summary information about species attributes, again
integrated from many sources, including maps/occurrence data from GBIF but
also descriptive information, images, and more, and also uses the CoL
Catalogue as a means to organise and navigate through its data where
possible.

The main aim is to avoid redundant data entry/capture, i.e. "enter once, use
many times", to transfer information between the systems by means that are
as automated as possible, and for each to concentrate on its specialist area
of interest and expertise.

There are also related, but by no means identical, other major initiatives
such as GNA and ZooBank. GNA aims to handle all published names (if you
like, a mix of "authorized" and "non-authorized but out there anyway"
names), using CoL as one contributory source but by no means the only one;
ZooBank is a prototype registry of newly published animal names, possibly
also covering already published ones retrospectively, which can once again
be used by humans or machines to provide unique identifiers and associated
attributes for names, and for the publication instances by which names are
made available.

Each is at a different stage of completeness to date, so of course it is
possible to point to taxa or information missing from the relevant slots in
any initiative; however, in general it is hoped that these gaps will be
addressed (by means as efficient as are available or can be devised) as the
activities proceed.

Others will obviously be in a better position than I am to elaborate,
should you have further specific questions about particular initiatives,
the manner of data transfer between them, and any real or perceived
overlap. Of course the interested person can also discover much relevant
information via the home pages of the relevant initiatives.

Hope this helps,

Regards - Tony

Tony Rees
Manager, Divisional Data Centre,
CSIRO Marine and Atmospheric Research,
GPO Box 1538,
Hobart, Tasmania 7001, Australia
Ph: 0362 325318 (Int: +61 362 325318)
Fax: 0362 325000 (Int: +61 362 325000)
e-mail: Tony.Rees at csiro.au
Manager, OBIS Australia regional node, http://www.obis.org.au/
Biodiversity informatics research activities:
http://www.cmar.csiro.au/datacentre/biodiversity.htm
Personal info:
http://www.fishbase.org/collaborators/collaboratorsummary.cfm?id=1566


-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Stephen Thorpe
Sent: Tuesday, 16 February 2010 11:18 AM
To: Roderic Page
Cc: TAXACOM
Subject: Re: [Taxacom] In defense of DOIs

>It's not "bioinformatics people" who drove the adoption of DOIs, it was
publishers looking for a way to increase the value of their content through
linking. Links mean greater traffic (e.g., through citations), as well as an
improved experience for the reader (e.g. links that don't break)

Well, I don't know the history, but now publishers have a free alternative
for greater traffic through linking - wasn't it you who pointed out a while
ago that Zootaxa has many thousands of citations (and links) on
Wikispecies/-pedia, all at no cost to them? Direct links to Zootaxa very
rarely break.

>Viewing everything in terms of taxonomy runs the risk of missing the bigger
picture
I agree, but it is unclear what the "bigger picture" is. Industries that
manipulate primary information have their place (and Zoological Record is a
particularly useful one IMO), but one could argue that there is simply far
too much of that sort of thing going on in the world today...

Whenever I ask a straight question, nobody answers it, but here I go again:

what does GBIF have to offer that is different to what CoL has to offer that
is different to what EoL has to offer, etc? Is what they offer worth the
cost? In a world of rapidly disappearing biodiversity (in some parts,
anyway), isn't it more sensible to "describe it before it is gone" as a
priority? Wikispecies is there to integrate it all, not perfectly, but
cheaply ...
_______________________________________________

Taxacom Mailing List
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom

The Taxacom archive going back to 1992 may be searched with either of these
methods:

(1) http://taxacom.markmail.org

Or (2) a Google search specified as:
site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here




