[Taxacom] Wikispecies is not a database: part 3 (after thinking about it!)

Thu Aug 13 15:01:33 CDT 2009

Dear all,

Mike is of course correct: databases and web pages are in essence quite different. However, the situation gets blurry when web sites are constructed using content management systems that actually do use a database under the bonnet (for example to manage the blocks of text, graphics, page navigation, user privileges etc.), so these can be thought of as a sort of hybrid, and I am presuming wikixxx falls into that bag. This is still not a taxonomic database in the sense I would use the term, though. For that, the relations between taxa are modelled as relations between elements in data tables. In other words, you start with a database table (columns and rows) of all the (e.g.) species of interest (nothing to do with the web as yet), hook these to a table of genera, hook the latter to families and higher taxa, and go from there. You also do the same as desired for any other re-usable elements, e.g. references, data sources, vernacular names, etc. Then if you want to derive a web site for presentation of all this, that is a separate exercise (or you purchase or otherwise implement an off-the-shelf solution that does both).
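
As a rough illustration of the layered tables described above (species hooked to genera, genera hooked to families), here is a minimal sketch using SQLite through Python's standard sqlite3 module. The table names, column names and the helper functions build_schema, classification and demo are all hypothetical and chosen for illustration only; they are not taken from any system mentioned in this thread. Treating genus names as unique is also a simplifying assumption.

```python
import sqlite3


def build_schema(conn):
    """Create hypothetical family -> genus -> species tables."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE family (
            family_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE
        );
        CREATE TABLE genus (
            genus_id  INTEGER PRIMARY KEY,
            name      TEXT NOT NULL UNIQUE,  -- simplifying assumption
            family_id INTEGER NOT NULL REFERENCES family(family_id)
        );
        CREATE TABLE species (
            species_id INTEGER PRIMARY KEY,
            epithet    TEXT NOT NULL,
            genus_id   INTEGER NOT NULL REFERENCES genus(genus_id),
            UNIQUE (genus_id, epithet)
        );
        """
    )


def classification(conn):
    """Return (family, genus, binomial) rows by joining the three tables."""
    cursor = conn.execute(
        """
        SELECT f.name, g.name, g.name || ' ' || s.epithet
        FROM species AS s
        JOIN genus  AS g ON g.genus_id  = s.genus_id
        JOIN family AS f ON f.family_id = g.family_id
        ORDER BY f.name, g.name, s.epithet
        """
    )
    return cursor.fetchall()


def demo():
    """Load a few well-known names into an in-memory database and query them."""
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    conn.execute("INSERT INTO family (family_id, name) VALUES (1, 'Hominidae')")
    conn.executemany(
        "INSERT INTO genus (genus_id, name, family_id) VALUES (?, ?, ?)",
        [(1, "Homo", 1), (2, "Pan", 1)],
    )
    conn.executemany(
        "INSERT INTO species (epithet, genus_id) VALUES (?, ?)",
        [("sapiens", 1), ("troglodytes", 2), ("paniscus", 2)],
    )
    return classification(conn)


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Nothing in this sketch touches the web: any web page would be generated from the rows returned by classification(), which is the "separate exercise" referred to above.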

As Mike says, until you handle at least the taxonomic information in a manner like the above, and make use of database functions to enforce the relationships, primary keys, constraints, and all the rest, it is easy to end up with poor data quality (such as the same genus spelled in different ways, for a start). I know this from personal experience: one of my systems still does not have a genus table, just species and families, and the genera are held as free text (to be fixed in the next iteration of the system).
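
To make the genus-spelling problem concrete, the following sketch contrasts a free-text genus column with one constrained by a foreign key. It again assumes SQLite via Python's sqlite3 module; the tables, the function names and the deliberate misspelling "Pann" are hypothetical and purely illustrative.

```python
import sqlite3


def free_text_variants():
    """Genus held as free text: a misspelling is silently accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE species_flat (genus TEXT NOT NULL, epithet TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO species_flat VALUES (?, ?)",
        [("Pan", "troglodytes"), ("Pann", "paniscus")],  # "Pann" is a typo
    )
    cursor = conn.execute(
        "SELECT DISTINCT genus FROM species_flat ORDER BY genus"
    )
    # The table now claims two genera where there should be one.
    return [row[0] for row in cursor]


def constrained_insert_rejected():
    """Genus held in its own table: the same typo is refused."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE genus (name TEXT PRIMARY KEY)")
    conn.execute(
        """
        CREATE TABLE species (
            genus   TEXT NOT NULL REFERENCES genus(name),
            epithet TEXT NOT NULL,
            PRIMARY KEY (genus, epithet)
        )
        """
    )
    conn.execute("INSERT INTO genus VALUES ('Pan')")
    conn.execute("INSERT INTO species VALUES ('Pan', 'troglodytes')")
    try:
        # No genus called 'Pann' exists, so the foreign key check fails.
        conn.execute("INSERT INTO species VALUES ('Pann', 'paniscus')")
    except sqlite3.IntegrityError:
        return True
    return False


if __name__ == "__main__":
    print(free_text_variants())
    print(constrained_insert_rejected())
```

The constraint does not make the data correct by itself (a genus can still be entered wrongly once in the genus table), but it does mean each genus is spelled in exactly one place.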

This may seem obvious to IT folks but I know (again from personal experience) that it may not be a natural way of thinking for all biologists - I know that many taxonomists (including some in my agency) are very happy NOT to have to think about this stuff, and get on with the task of describing taxa, so long as someone else looks after it. On the other hand some (Rich?) just love to do both, and have even written papers and "how to do it" guides on their experiences...

Regards - Tony

________________________________________
From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Mike Sadka [M.Sadka at nhm.ac.uk]
Sent: Thursday, 13 August 2009 8:19 PM
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Wikispecies is not a database: part 3 (after thinking about it!)

Just one final point:

>> We should prioritise data quality over everything else.

That is exactly why you need properly structured databases (in the
correct sense) - they store and protect data.  If you disagree, look at
any book on relational databases.

>> impressively structured and presented websites ("databases" in the
>> broad sense)

This is an incorrect use of the term "database" - not a broad one.
Websites, however well structured or presented, are not databases and
cannot replace them.  A website is just how the data are presented and
tells you absolutely nothing about how safely or effectively they are
stored.  This is basic IT.

OK - two final points.  But more than enough from me now I am sure!

-----Original Message-----
From: taxacom-bounces at mailman.nhm.ku.edu
[mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Stephen Thorpe
Sent: 11 August 2009 02:40
To: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Wikispecies is not a database: part 3 (after
thinking about it!)

Hi Mike

I don't know if it is just me, but I find it quite difficult in a
forum like this to get the details of my argument right first time,
but then the responses kind of prod me in the right direction, so I
guess it works out good in the end. Anyway,

[you wrote] This attitude worries me a lot because it seems not to ask
where that taxonomic information is going or whether best use is made
of it, and I feel it sells the data short.
  Where does "just ... typing in taxonomic information" get you?
Without an underlying standard, typing just makes more pages of
taxonomic information.  They may be very useful but they might as well
be on paper except that they are easier to update and distribute.
  ICT has the potential to search, sort, aggregate and integrate data
from a range of sources - thereby generating new information, and
giving those "experienced and knowledgeable" people more and novel
opportunities to make discoveries - not to replace fieldwork (or
closetwork), but to maximise the information derived from the data it
generates.   If data are entered willy-nilly into numerous different
systems without care for their fate, their usefulness is limited and
maybe shortlived.
  Obviously this isn't an argument against wikispecies, or for
numerous different OLs.  It's an argument for both using common
standards.

OK, you have made a bit of a straw man here! I am certainly not
proposing that all we need to do is just type taxonomic information
willy nilly on to the web! In fact, it is this very thing which has
caused many of the current problems, because no two web sources seem
to agree very often, and so the "willinilliness" has resulted in utter
chaos! On that I think we can agree, hopefully!

My point is this: we need to prioritise things somewhat. We should
prioritise data quality over everything else. There is no point
developing flash databases if you don't know where you are going to
get hold of good data. Some of the examples I have given in previous
emails show relatively impressively structured and presented websites
("databases" in the broad sense) giving poor quality outputs.

Wikispecies already exists as a REASONABLY adequate infrastructure
upon which to create a solid pool of taxonomic information, which can
THEN be used as a solid DATA PROVIDER for any number of other database
initiatives. I just think that more effort at this early stage should
go into making that source of information comprehensive (and
verifiable, by way of full referencing).

The attitude that worries me a lot is "build your database first, and
worry about where the data is going to come from second (if the
funding doesn't run out first!)". Another related line that I have
heard goes something like "well, I know there are going to be issues
about who to believe when data providers come into conflict, but the
funding for the first year is just to get the infrastructure up and
running, so we will just have to sort that problem out somehow later
on down the track" ...

The reality is that many of the people currently in charge of online
"databases" obsess too much about presentation, and what it COULD do
(if it had the data!). AFD recently went to all the trouble of
changing its user interface from something that was fine to something
rather less than fine, and still they haven't managed to get even that
1999 Apteropanorpa name into their system!

Cheers,

Stephen

Quoting Mike Sadka <M.Sadka at nhm.ac.uk>:

>
> Hi Stephen
>
> And bravo Evgeniy !
>
> [You wrote...]
> Everybody won't just adopt them [standards] - we are even constantly
> having to defend ourselves against factions who want rid of
> traditional biological nomenclature altogether!
>
> But technological standards are not for "everybody" - they are for
> machines. Taxonomists don't need even to know that they exist, but
> machines will not be able to serve taxonomy to their full potential
> without them.
>
> [and...]
> ... instead of just sitting down at a computer online and typing in
> taxonomic information,...
>
> This attitude worries me a lot because it seems not to ask where
> that taxonomic information is going or whether best use is made of
> it, and I feel it sells the data short.
>
> Where does "just ... typing in taxonomic information" get you?
> Without an underlying standard, typing just makes more pages of
> taxonomic information.  They may be very useful but they might as
> well be on paper except that they are easier to update and distribute.
>
> ICT has the potential to search, sort, aggregate and integrate data
> from a range of sources - thereby generating new information, and
> giving those "experienced and knowledgeable" people more and novel
> opportunities to make discoveries - not to replace fieldwork (or
> closetwork), but to maximise the information derived from the data
> it generates.   If data are entered willy-nilly into numerous
> different systems without care for their fate, their usefulness is
> limited and maybe shortlived.
>
> Obviously this isn't an argument against wikispecies, or for
> numerous different OLs.  It's an argument for both using common
> standards.
>
>
> [And...]
> ...so we don't need working taxonomists to help build our databases...
>
> I agree - there's been quite enough of that already!   You need IT
> people to build your databases.  ;-)
>
> Cheerio, Mike
>
>
>
> _______________________________________________
>
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> The Taxacom archive going back to 1992 may be searched with either
> of these methods:
>
> (1) http://taxacom.markmail.org
>
> Or (2) a Google search specified as:
> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>
