[Taxacom] GBIF data

Shorthouse, David dps1 at ualberta.ca
Wed Nov 22 15:52:32 CST 2006


Wolfgang,

You have essentially commented on the ultimate goals for GBIF, but it's
unclear to me how your email or text file solution using pipes helps solve
the issue of digitizing invertebrate collections. Folks at GBIF are
considering text file uploads among other data exchange protocols, but this
really isn't the bottleneck. The true bottleneck is the human-power and
associated funds to enter the data. If "bugs" came with geotagged barcodes,
it would be much easier ;)~ To encourage data entry, I'd argue for
value-added features to assist the data entry process & thus streamline it
as much as possible e.g. real-time nomenclatural checks, geocoding checks,
etc. because what comes out of GBIF is really only as good as what goes in.
As for the choice of fields and XML schema, GBIF has a pretty elegant
(though somewhat sluggish) system with DiGIR and other communication
protocols...not all fields are required and you may use any database system
you want so long as it is tied to a web server that can serve PHP. Perhaps what
they also need is better documentation about this system & step-by-step
installable packages.

David P. Shorthouse
------------------------------------------------------
Department of Biological Sciences
CW-403, Biological Sciences Centre
University of Alberta
Edmonton, AB   T6G 2E9
Phone: 1-780-492-3080
mailto:dps1 at ualberta.ca
http://canadianarachnology.webhop.net
http://arachnidforum.webhop.net
------------------------------------------------------


-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Faunaplan at aol.com
Sent: Wednesday, November 22, 2006 2:24 PM
To: TAXACOM at mailman.nhm.ku.edu
Subject: [Taxacom] GBIF data

Dear all,
GBIF's data on geographic occurrences of the world's living species are still
highly fragmentary and, in part, rather unreliable, especially with regard to
insects. I'm just wondering why we don't open yet another gate for the large
community of data holders who could support GBIF's mission: one that is more
user-friendly, simpler, and much faster than current procedures.
Here are some musings, focussing on data from entomological collections:

Insects make up the lion's share of species diversity but reliable occurrence
data are still mainly written on hundreds of millions of labels pinned to
specimens in museums and collections, - practically inaccessible for most
potential users.
Instead of trying to digitize these data according to a highly complex XML
schema, why not just take data as they are from these specimen labels and put
them into a simple (flat) data file, say - a pipe delimited text file like in
the following example:

ID-LABEL|LOC-LABEL|SAMPLESIZE|COLLECTION
"Amara (Brad.)\ majuscula Chd.\ det. Hieke 1982"|"Schwanheim a.M.\ Feld, 18.6.53\ coll. H.Hesse"|1|"ZSM"
"Amara (s.str.)\ pindica Apf.\ det. Hieke 1969"|"Collection\ Strasser\\ Caucasus\\ Eriwan"|1|"ZSM"
"Harpalus\ Winkleri Schbg.\ det. Dr. E.Schaub."|"Mokra pl.\\ Golesnica pl.\ Macedonia\\ Sammlung\ Apfelbeck"|1|"ZSM"
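
A minimal sketch of how such a file could be read, in Python. The function names are hypothetical, and the reading of the backslashes is an assumption drawn from the sample only: a single "\" is taken as a line break on a label, and "\\" as the boundary between two separate labels on the same pin.

```python
import csv
import io

# Two records from the example above, one record per line.
SAMPLE = r'''ID-LABEL|LOC-LABEL|SAMPLESIZE|COLLECTION
"Amara (Brad.)\ majuscula Chd.\ det. Hieke 1982"|"Schwanheim a.M.\ Feld, 18.6.53\ coll. H.Hesse"|1|"ZSM"
"Amara (s.str.)\ pindica Apf.\ det. Hieke 1969"|"Collection\ Strasser\\ Caucasus\\ Eriwan"|1|"ZSM"
'''


def read_labels(text):
    # Parse pipe-delimited text into a list of dicts keyed by the header row.
    # Backslashes are kept verbatim because no escape character is configured.
    reader = csv.DictReader(io.StringIO(text), delimiter="|", quotechar='"')
    return [dict(row) for row in reader]


def split_label(field):
    # ASSUMPTION: a double backslash separates physical labels and a single
    # backslash separates lines within one label.
    labels = field.split("\\\\")
    return [[line.strip() for line in label.split("\\")] for label in labels]


rows = read_labels(SAMPLE)
```

Here `split_label(rows[1]["LOC-LABEL"])` would give three labels, the first with two lines ("Collection", "Strasser"), which keeps the verbatim label text recoverable while making it easier to inspect.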

In a second step, we can add a few fields in order to standardize and unlock
the treasure, e.g.:
- COUNTRY (standard names or codes)
- LATLONG (containing a coarse latitude/longitude georeference)
- VALIDNAME (containing the standardized current taxonomic name)
example as above:
COUNTRY|LATLONG|VALIDNAME|ID-LABEL|LOC-LABEL|SAMPLESIZE|COLLECTION
Germany|NE50008|Amara majuscula|"Amara (Brad.)\ majuscula Chd.\ det. Hieke 1982"|"Schwanheim a.M.\ Feld, 18.6.53\ coll. H.Hesse"|1|"ZSM"
Armenia|NE40044|Amara proxima|"Amara (s.str.)\ pindica Apf.\ det. Hieke 1969"|"Collection\ Strasser\\ Caucasus\\ Eriwan"|1|"ZSM"
Macedonia|NE41021|Harpalus xanthopus|"Harpalus\ Winkleri Schbg.\ det. Dr. E.Schaub."|"Mokra pl.\\ Golesnica pl.\ Macedonia\\ Sammlung\ Apfelbeck"|1|"ZSM"
Additional steps, e.g. more detailed georeferencing, could follow later 
whenever needed...
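
The second step could be sketched roughly as follows, in Python. This assumes the standardized values come from lookup tables maintained by a curator; the names `standardize`, `to_pipe_text`, `valid_names` and `localities` are hypothetical and only illustrate the idea. Labels that are not yet in a table get empty fields, so they can be completed later.

```python
import csv
import io

STANDARD_FIELDS = ["COUNTRY", "LATLONG", "VALIDNAME"]
LABEL_FIELDS = ["ID-LABEL", "LOC-LABEL", "SAMPLESIZE", "COLLECTION"]


def standardize(record, valid_names, localities):
    # valid_names: verbatim ID-LABEL -> current taxonomic name
    # localities:  verbatim LOC-LABEL -> (country, latlong code)
    # Unknown labels yield empty strings instead of raising an error.
    country, latlong = localities.get(record["LOC-LABEL"], ("", ""))
    enriched = dict(record)  # leave the verbatim record untouched
    enriched["COUNTRY"] = country
    enriched["LATLONG"] = latlong
    enriched["VALIDNAME"] = valid_names.get(record["ID-LABEL"], "")
    return enriched


def to_pipe_text(records):
    # Write records back out as pipe-delimited text, standardized fields first.
    # Note: csv quotes a field only when needed, so the output quoting can
    # differ from the hand-written example while carrying the same values.
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=STANDARD_FIELDS + LABEL_FIELDS,
                            delimiter="|", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()
```

Keeping the verbatim label fields next to the standardized ones means a later correction of VALIDNAME or LATLONG never overwrites what is actually written on the specimen label.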
 
Such simple files could be produced during routine curatorial or ID work, and
sharing them via email should be very easy. GBIF could use these handy
datasets to create dynamic distribution maps generated by a rather simple
PHP/MySQL application. On the new GBIF data portal prototype, I've already
seen a pixel world map that seems to be perfectly fit for the display of
standardized (one-degree latlong) occurrence data by accurate 2x2 pixel dots.
In addition to such maps, there could be a display of corresponding
standardized background information (e.g., VALIDNAME, COUNTRY, LATLONG,
COLLECTION) in simple data lists. Both maps and lists could be downloaded by
the user with just a click.
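
As a rough illustration of the mapping step, here is a Python sketch that turns a LATLONG code into a dot position. Everything about the code layout is an assumption inferred from the three sample values (for example "NE50008" read as N, E, latitude 50, longitude 008); the map size (2 pixels per degree on a plain latitude/longitude grid) and the function names are likewise hypothetical.

```python
import re

# ASSUMED layout: N|S, E|W, two-digit latitude, three-digit longitude.
_LATLONG = re.compile(r"^([NS])([EW])(\d{2})(\d{3})$")


def decode_latlong(code):
    # Return signed whole degrees (lat, lon); south and west are negative.
    match = _LATLONG.match(code)
    if match is None:
        raise ValueError(f"unrecognized LATLONG code: {code!r}")
    ns, ew, lat_text, lon_text = match.groups()
    lat, lon = int(lat_text), int(lon_text)
    if lat > 90 or lon > 180:
        raise ValueError(f"LATLONG code out of range: {code!r}")
    return (lat if ns == "N" else -lat, lon if ew == "E" else -lon)


def to_pixel(lat, lon, scale=2):
    # Top-left pixel of a scale x scale dot on a (360*scale) x (180*scale) map
    # with longitude -180 at the left edge and latitude +90 at the top edge.
    width, height = 360 * scale, 180 * scale
    x = min((lon + 180) * scale, width - scale)   # keep the dot inside the map
    y = min((90 - lat) * scale, height - scale)
    return x, y
```

With these assumptions, `to_pixel(*decode_latlong("NE50008"))` places the Schwanheim record at pixel (376, 80) on a 720 x 360 map.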
In case access to full data details is seen as problematic (see previous
discussions on "sensitive data"), the field LOC-LABEL could be kept under
restricted access while other essential information is freely accessible.
Tracking down all information to the sources would be rather easy anyway.
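
One simple way to picture the restricted field, sketched in Python: the full record is stored once, and a public view is derived from it by dropping LOC-LABEL. The function name `view` and the `authorized` flag are hypothetical; how a user would actually be authorized is left open here.

```python
# Fields withheld from the public view (only LOC-LABEL is named in the text).
RESTRICTED_FIELDS = frozenset({"LOC-LABEL"})


def view(record, authorized=False):
    # Authorized users get a full copy; everyone else gets a copy with the
    # restricted fields left out. The stored record is never modified.
    if authorized:
        return dict(record)
    return {key: value for key, value in record.items()
            if key not in RESTRICTED_FIELDS}
```

Because COLLECTION stays public, a reader of the open data still knows which institution to contact for the full label text.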
As a side effect, this could also become a central database for collection
inventories. Wouldn't it be more useful than individual inventories on each
museum's website?

"Free and open access to the world's biodiversity data through the 
collaborative medium of the Web is an important tool for the sustainable
stewardship of Earth. Unlocking such data will lead to much better policy and
resource-management choices locally, regionally and globally" (cited from an
article by Matt BALL, in GeoWorld, Aug. 2005)

Best wishes,
Wolfgang
---------------------------------------
Wolfgang Lorenz
Faunistics and Environmental Planning
Hoermannstr. 4
D-82327  Tutzing, Germany

(P.S.: I'm currently running two test versions of PHP/MySQL map tools on my
websites; please contact me for details)
_______________________________________________
Taxacom mailing list
Taxacom at mailman.nhm.ku.edu
http://mailman.nhm.ku.edu/mailman/listinfo/taxacom




