[Taxacom] Wikispecies is not a database: part 3 (after thinking about it!)

Thu Aug 13 16:43:01 CDT 2009

In what may seem like wild inconsistency, given that my "Wikispecies  
is not a database" blog post kicked off this thread, let me make the  
counter case.

1. The notion of a database espoused by Tony and Mike (i.e.,
relational databases with tables with columns and rows) is but one
view of databases, and a view some might say is old-fashioned (key-value
databases are the new hotness; there is a generation of
programmers emerging for whom relational databases seem as relevant as
FORTRAN).

2. Relational databases have some nice properties, but aren't suited
to every task (witness the growth of databases such as CouchDB). They
can also constrain what we can do, unless we develop elaborate and
ultimately unwieldy schemas. One reason some people like to use wikis
is that they are vastly more flexible than a relational database
schema. Our knowledge isn't closed in the sense of having well-defined
limits (unlike a bank transaction system, for example). Efforts to
capture what we need in a schema will fail.

3. Somewhere between relational databases and free-form text are
semi-structured databases such as wikis, which combine text with templates
(rather like some web programming languages such as PHP combine HTML
and scripts).

4. Wiki-style semi-structured text has the potential to be powerful (one
joy of working with sophisticated wiki tools such as MediaWiki is that
one can effectively treat them as programming environments), but it
also has the potential to be a morass of hard-to-parse text.
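The contrast in points 1 and 2 can be sketched in a few lines of standard-library Python. This is only an illustration under assumptions: an in-memory SQLite table stands in for a relational database, a plain dict of JSON strings stands in for a key-value store such as CouchDB, and the taxon name "Examplea" and its fields are hypothetical.

```python
import json
import sqlite3

# Relational (SQLite): the columns are fixed when the table is created.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE taxon (name TEXT PRIMARY KEY, rank TEXT)")
db.execute("INSERT INTO taxon VALUES (?, ?)", ("Examplea", "genus"))  # hypothetical name

# Recording a new kind of fact requires changing the schema first.
db.execute("ALTER TABLE taxon ADD COLUMN vernacular TEXT")
db.execute("UPDATE taxon SET vernacular = ? WHERE name = ?", ("example fly", "Examplea"))

# Key-value (a dict standing in for a document store): each value is a
# free-form JSON document, so fields can be added record by record.
store: dict[str, str] = {}
store["Examplea"] = json.dumps({"rank": "genus"})
doc = json.loads(store["Examplea"])
doc["vernacular"] = "example fly"        # no schema change needed
doc["notes"] = ["any extra structure"]   # nested values are fine too
store["Examplea"] = json.dumps(doc)

print(db.execute("SELECT * FROM taxon").fetchall())
print(json.loads(store["Examplea"]))
```

The flexibility cuts both ways: nothing in the key-value version stops one record using "vernacular" and another "vernacularName", which is exactly the kind of inconsistency a schema would catch.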

Personally, I think the sweet spot lies with systems such as
http://www.freebase.com or Semantic MediaWiki. Whatever you do,
don't argue that relational databases a la Computer Science 101 are
the only kind of database out there, because that's patently false.
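Semantic MediaWiki's middle ground can be illustrated with its inline annotation syntax, `[[Property::Value]]`, which lets ordinary wiki text double as structured data. The sketch below is a simplified, hypothetical parser (not SMW's own code) that pulls such annotations out of wikitext as (subject, property, value) triples; the property names and page text are made up, and real SMW markup has more cases than this regular expression handles.

```python
import re

# Matches [[Property::Value]] or [[Property::Value|display text]].
# Plain links such as [[Examplea]] or [[Category:Foo]] do not match.
ANNOTATION = re.compile(r"\[\[([^:\]|]+)::([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_triples(page: str, wikitext: str) -> list[tuple[str, str, str]]:
    """Return (subject, property, value) triples found in a page's wikitext."""
    return [(page, prop.strip(), value.strip())
            for prop, value in ANNOTATION.findall(wikitext)]


# Hypothetical page text mixing prose with annotations.
text = ("'''Examplea alpha''' is a species in the genus [[Has genus::Examplea]], "
        "described by [[Has author::A. Author|an example author]].")
for triple in extract_triples("Examplea alpha", text):
    print(triple)
```

The prose stays readable for people while the annotations give machines something to query, which is the appeal of this style over both free text and a rigid schema.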

Regards

Rod

On 13 Aug 2009, at 21:01, <Tony.Rees at csiro.au> <Tony.Rees at csiro.au>  
wrote:

> Dear all,
>
> Mike is of course correct - databases and web pages are in essence  
> quite different - however the situation gets blurry when web sites  
> are constructed using content management systems that actually do  
> use a database under the bonnet (for example to manage the blocks of  
> text, graphics, page navigation, user privileges etc.), so these can  
> be thought of as a sort of hybrid; and I am presuming wikixxx falls  
> into that bag. However this is still not a taxonomic database in the  
> sense I would use the term; for this the relations between taxa are  
> modelled as relations between elements in data tables - in other  
> words you start with a database table (columns and rows) of all the  
> (e.g.) species of interest (nothing to do with the web as yet), hook  
> these to a table of genera, hook the latter to families and higher  
> taxa, and go from there; you also do the same as desired for any  
> other re-usable elements e.g. references, data sources, vernacular  
> names, etc. Then if you want to derive a web site for presentation  
> of all this, that is a separate exercise (or you purchase or  
> otherwise implement an off the shelf solution that does both).
>
> As Mike says, until you handle at least the taxonomic information in  
> a manner like the above, plus make use of database functions to  
> enforce the relationships, primary keys, constraints, and all the  
> rest, it is easy to end up with poor data quality (such as the same
> genus spelled in different ways for a start) - I know this from  
> personal experience (one of my systems still does not have a genus  
> table, just species and families, and the genera are held as free  
> text - to be fixed in the next iteration of the system).
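The "same genus spelled in different ways" problem is what a foreign key constraint prevents. In this hypothetical SQLite sketch a species can only be added if its genus already exists in the genus table, so a misspelling is rejected at entry time instead of silently becoming a "new" genus, as it would in a free-text column. The names are invented, and note that SQLite only enforces foreign keys when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
db.execute("CREATE TABLE genus (name TEXT PRIMARY KEY)")
db.execute("""CREATE TABLE species (
                  genus   TEXT NOT NULL REFERENCES genus(name),
                  epithet TEXT NOT NULL,
                  PRIMARY KEY (genus, epithet))""")
db.execute("INSERT INTO genus VALUES ('Examplea')")  # hypothetical genus


def add_species(genus: str, epithet: str) -> bool:
    """Insert a species; return False if the database rejects it."""
    try:
        with db:  # commit on success, roll back on error
            db.execute("INSERT INTO species VALUES (?, ?)", (genus, epithet))
        return True
    except sqlite3.IntegrityError:
        return False


print(add_species("Examplea", "alpha"))  # True
print(add_species("Exampela", "beta"))   # False: no such genus, so the typo is caught
```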
>
> This may seem obvious to IT folks but I know (again from personal  
> experience) that it may not be a natural way of thinking for all  
> biologists - I know that many taxonomists (including some in my  
> agency) are very happy NOT to have to think about this stuff, and  
> get on with the task of describing taxa, so long as someone else  
> looks after it. On the other hand some (Rich?) just love to do both,  
> and have even written papers and "how to do it" on their  
> experiences...
>
> Regards - Tony
>
> ________________________________________
> From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu]
> On Behalf Of Mike Sadka [M.Sadka at nhm.ac.uk]
> Sent: Thursday, 13 August 2009 8:19 PM
> To: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] Wikispecies is not a database: part 3 (after
> thinking about it!)
>
> Just one final point:
>
>>> We should prioritise data quality over everything else.
>
> That is exactly why you need properly structured databases (in the
> correct sense) - they store and protect data.  If you disagree, look at
> any book on relational databases.
>
>>> impressively structured and presented websites ("databases" in the
> broad sense)
>
> This is an incorrect use of the term "database" - not a broad one.
> Websites, however well structured or presented, are not and cannot
> replace databases.  A website is just how the data are presented and
> tells you absolutely nothing about how safely or effectively they are
> stored.  This is basic IT.
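Mike's distinction between presentation and storage can be shown in a few lines. In this hypothetical Python sketch the same HTML is generated from a constrained SQLite table and from an unchecked block of free text; the output is identical, which is why a well-presented page on its own cannot tell you how safely the data behind it are stored. The `render` function and the names are invented for illustration.

```python
import html
import sqlite3


def render(names: list[str]) -> str:
    """Presentation layer (hypothetical): turn a list of names into an HTML list."""
    items = "".join(f"<li><i>{html.escape(n)}</i></li>" for n in names)
    return f"<ul>{items}</ul>"


# Storage option 1: a constrained table (one row per name, duplicates rejected).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE species (name TEXT PRIMARY KEY)")
db.executemany("INSERT INTO species VALUES (?)",
               [("Examplea alpha",), ("Examplea beta",)])
from_db = [row[0] for row in db.execute("SELECT name FROM species ORDER BY name")]

# Storage option 2: the same names kept as loose text with no checks at all.
free_text = "Examplea alpha\nExamplea beta\n"
from_text = free_text.splitlines()

# Both produce the same page.
print(render(from_db) == render(from_text))  # True
```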
>
> OK - two final points.  But more than enough from me now I am sure!
>
> -----Original Message-----
> From: taxacom-bounces at mailman.nhm.ku.edu
> [mailto:taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Stephen Thorpe
> Sent: 11 August 2009 02:40
> To: taxacom at mailman.nhm.ku.edu
> Subject: Re: [Taxacom] Wikispecies is not a database: part 3 (after
> thinking about it!)
>
> Hi Mike
>
> I don't know if it is just me, but I find it quite difficult in a
> forum like this to get the details of my argument right first time,
> but then the responses kind of prod me in the right direction, so I
> guess it works out well in the end. Anyway,
>
> [you wrote] This attitude worries me a lot because it seems not to ask
> where that taxonomic information is going or whether best use is made
> of it, and I feel it sells the data short.
>  Where does "just ... typing in taxonomic information" get you?
> Without an underlying standard, typing just makes more pages of
> taxonomic information.  They may be very useful but they might as well
> be on paper except that they are easier to update and distribute.
>  ICT has the potential to search, sort, aggregate and integrate data
> from a range of sources - thereby generating new information, and
> giving those "experienced and knowledgeable" people more and novel
> opportunities to make discoveries - not to replace fieldwork (or
> closetwork), but to maximise the information derived from the data it
> generates.   If data are entered willy-nilly into numerous different
> systems without care for their fate, their usefulness is limited and
> maybe short-lived.
>  Obviously this isn't an argument against wikispecies, or for
> numerous different OLs.  It's an argument for both using common
> standards.
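The argument for common standards can be sketched as follows: once each source's fields are mapped onto a shared vocabulary, records from different providers can be merged mechanically. Here the shared names are a few Darwin Core terms (`scientificName`, `family`, `scientificNameAuthorship`); the two sources, their native field names and the mappings are hypothetical.

```python
# Two hypothetical sources describing overlapping taxa with their own field names.
source_a = [{"Name": "Examplea alpha", "Fam": "Exampleidae"}]
source_b = [{"taxon": "Examplea alpha", "author": "Author, 1999"},
            {"taxon": "Examplea beta", "author": "Author, 2001"}]

# Hypothetical per-source mappings onto shared Darwin Core term names.
MAPPINGS = {
    "a": {"Name": "scientificName", "Fam": "family"},
    "b": {"taxon": "scientificName", "author": "scientificNameAuthorship"},
}


def to_standard(record: dict, mapping: dict) -> dict:
    """Rename a record's fields to the shared vocabulary, dropping unmapped ones."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}


def aggregate(sources: dict) -> dict:
    """Merge standardised records from several sources, keyed by scientificName."""
    merged: dict = {}
    for key, records in sources.items():
        for record in records:
            std = to_standard(record, MAPPINGS[key])
            merged.setdefault(std["scientificName"], {}).update(std)
    return merged


combined = aggregate({"a": source_a, "b": source_b})
print(combined["Examplea alpha"])
```

The record for "Examplea alpha" now carries the family from one source and the authorship from the other. Note that `update` silently lets later sources overwrite earlier ones; deciding whom to believe when they disagree is a separate problem.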
>
> OK, you have made a bit of a straw man here! I am certainly not
> proposing that all we need to do is just type taxonomic information
> willy-nilly onto the web! In fact, it is this very thing which has
> caused many of the current problems, because no two web sources seem
> to agree very often, and so the "willinilliness" has resulted in utter
> chaos! On that I think we can agree, hopefully!
>
> My point is this: we need to prioritise things somewhat. We should
> prioritise data quality over everything else. There is no point
> developing flash databases if you don't know where you are going to
> get hold of good data. Some of the examples I have given in previous
> emails show relatively impressively structured and presented websites
> ("databases" in the broad sense) giving poor quality outputs.
>
> Wikispecies already exists as a REASONABLY adequate infrastructure
> upon which to create a solid pool of taxonomic information, which can
> THEN be used as a solid DATA PROVIDER for any number of other database
> initiatives. I just think that more effort at this early stage should
> go into making that source of information comprehensive (and
> verifiable, by way of full referencing).
>
> The attitude that worries me a lot is "build your database first, and
> worry about where the data is going to come from second (if the
> funding doesn't run out first!)". Another related line that I have
> heard goes something like "well, I know there are going to be issues
> about who to believe when data providers come into conflict, but the
> funding for the first year is just to get the infrastructure up and
> running, so we will just have to sort that problem out somehow later
> on down the track" ...
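Conflicts between data providers are at least easy to detect, even if resolving them is not. This hypothetical sketch compares the records several providers supply for the same name and reports every field on which they disagree, leaving the decision about whom to believe to a person. The provider names and data are invented.

```python
from collections import defaultdict


def find_conflicts(providers: dict[str, dict[str, dict[str, str]]]) -> dict:
    """Return {(name, field): {value: [providers]}} wherever providers disagree."""
    seen: dict = defaultdict(lambda: defaultdict(list))
    for provider, records in providers.items():
        for name, fields in records.items():
            for field, value in fields.items():
                seen[(name, field)][value].append(provider)
    # Keep only fields where more than one distinct value was supplied.
    return {key: dict(values) for key, values in seen.items() if len(values) > 1}


# Hypothetical providers that agree on the family but not on the year.
providers = {
    "provider_1": {"Examplea alpha": {"family": "Exampleidae", "year": "1999"}},
    "provider_2": {"Examplea alpha": {"family": "Exampleidae", "year": "1998"}},
}
for (name, field), values in find_conflicts(providers).items():
    print(name, field, values)
# Examplea alpha year {'1999': ['provider_1'], '1998': ['provider_2']}
```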
>
> The reality is that many of the people currently in charge of online
> "databases" obsess too much about presentation, and what it COULD do
> (if it had the data!). AFD recently went to all the trouble of
> changing its user interface from something that was fine to something
> rather less than fine, and still they haven't managed to get even that
> 1999 Apteropanorpa name into their system!
>
> Cheers,
>
> Stephen
>
>
>
>
> Quoting Mike Sadka <M.Sadka at nhm.ac.uk>:
>
>>
>> Hi Stephen
>>
>> And bravo Evgeniy !
>>
>> [You wrote...]
>> Everybody won't just adopt them [standards] - we are even constantly
>> having to defend ourselves against factions who want rid of
>> traditional biological nomenclature altogether!
>>
>> But technological standards are not for "everybody" - they are for
>> machines. Taxonomists don't need even to know that they exist, but
>> machines will not be able to serve taxonomy to their full potential
>> without them.
>>
>> [and...]
>> ... instead of just sitting down at a computer online and typing in
>> taxonomic information,...
>>
>> This attitude worries me a lot because it seems not to ask where
>> that taxonomic information is going or whether best use is made of
>> it, and I feel it sells the data short.
>>
>> Where does "just ... typing in taxonomic information" get you?
>> Without an underlying standard, typing just makes more pages of
>> taxonomic information.  They may be very useful but they might as
>> well be on paper except that they are easier to update and  
>> distribute.
>>
>> ICT has the potential to search, sort, aggregate and integrate data
>> from a range of sources - thereby generating new information, and
>> giving those "experienced and knowledgeable" people more and novel
>> opportunities to make discoveries - not to replace fieldwork (or
>> closetwork), but to maximise the information derived from the data
>> it generates.   If data are entered willy-nilly into numerous
>> different systems without care for their fate, their usefulness is
>> limited and maybe short-lived.
>>
>> Obviously this isn't an argument against wikispecies, or for
>> numerous different OLs.  It's an argument for both using common
>> standards.
>>
>>
>> [And...]
>> ...so we don't need working taxonomists to help build our  
>> databases...
>>
>> I agree - there's been quite enough of that already!   You need IT
>> people to build your databases.  ;-)
>>
>> Cheerio, Mike
>>
>>
>>
>> _______________________________________________
>>
>> Taxacom Mailing List
>> Taxacom at mailman.nhm.ku.edu
>> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>>
>> The Taxacom archive going back to 1992 may be searched with either
>> of these methods:
>>
>> (1) http://taxacom.markmail.org
>>
>> Or (2) a Google search specified as:
>> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>>
>
>

---------------------------------------------------------
Roderic Page
Professor of Taxonomy
DEEB, FBLS
Graham Kerr Building
University of Glasgow
Glasgow G12 8QQ, UK

Email: r.page at bio.gla.ac.uk
Tel: +44 141 330 4778
Fax: +44 141 330 2792
AIM: rodpage1962 at aim.com
Facebook: http://www.facebook.com/profile.php?id=1112517192
Twitter: http://twitter.com/rdmpage
Blog: http://iphylo.blogspot.com
Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html