[Taxacom] Google, Wikipedia, and EOL

Roderic Page r.page at bio.gla.ac.uk
Wed Sep 2 09:42:02 CDT 2009


Dear Dmitry,

There are several ways to parse Wikipedia:

1. Parse just the templates (such as Taxobox, Cite, etc.) and tags  
(such as <ref>), which gives the core stuff such as taxon names and  
bibliographic details.

2. Parse links. In the case of taxonomic pages, redirects are typically  
taxonomic synonyms.

3. Parse the body of the text itself (as distinct from the text that  
is marked up).

I've been focussing on 1 & 2.  DBpedia.org does much the same.
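
As a rough illustration of approaches 1 and 2, here is a minimal Python  
sketch. It is not the code behind any of the projects mentioned here.  
The function names (extract_template, split_top_level, parse_template,  
redirect_target) are made up for the example, the sample wikitext is  
hand-written rather than copied from Wikipedia, and real pages have  
enough edge cases (comments, <nowiki>, unnamed parameters) that a  
proper wikitext parser would probably be a better choice.

```python
import re


def extract_template(wikitext, name):
    """Return the text inside the first {{name ...}} template, or None."""
    match = re.search(r"\{\{\s*" + re.escape(name) + r"\b", wikitext, re.IGNORECASE)
    if match is None:
        return None
    depth = 0
    i = match.start()
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                # Everything between the template name and the closing braces.
                return wikitext[match.end():i - 2]
        else:
            i += 1
    return None  # unbalanced braces


def split_top_level(body):
    """Split on '|' characters that are not inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(wikitext, name):
    """Return the named parameters of a template as a dict (empty if absent)."""
    body = extract_template(wikitext, name)
    if body is None:
        return {}
    params = {}
    for part in split_top_level(body):
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


REDIRECT_RE = re.compile(r"^\s*#REDIRECT\s*:?\s*\[\[([^\]|#]+)", re.IGNORECASE)


def redirect_target(wikitext):
    """Return the target title if the page is a redirect, else None."""
    match = REDIRECT_RE.match(wikitext)
    return match.group(1).strip() if match else None


# Hand-written sample wikitext, for illustration only.
sample = """{{Taxobox
| name = Chromis circumaurea
| familia = [[Pomacentridae|damselfishes]]
| genus = ''[[Chromis]]''
| binomial = Chromis circumaurea
}}
Body text goes here."""

taxobox = parse_template(sample, "Taxobox")
print(taxobox["binomial"])
print(redirect_target("#REDIRECT [[Chromis circumaurea]]"))
```

The brace counting is there because template values often contain  
links or nested templates with their own "|" characters, so a plain  
split on "|" would break them apart. A redirect page yields the title  
it points to, which can then be treated as a candidate synonym.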

I think if Wikipedia is to be more useful for us, then we will need to  
introduce more templates to structure the text (e.g., providing basic  
nomenclatural details).

Semantic MediaWiki is cool, and I've played a little with it (e.g.,  
http://itaxon.org/wikidev/Chromis_circumaurea). It enables some useful  
queries to be constructed, which makes it much more powerful than  
MediaWiki by itself.
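
To make the point about queries concrete: Semantic MediaWiki lets a  
page carry typed annotations of the form [[Property::value]], and  
those can then be queried. The Python sketch below only imitates that  
idea outside the wiki. The property names ("Genus", "Year described"),  
the page texts, and the helper names (extract_annotations,  
pages_where) are all assumptions made up for the example, not taken  
from the iTaxon wiki.

```python
import re
from collections import defaultdict

# Matches [[Property::value]] and [[Property::value|display text]].
ANNOTATION_RE = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")


def extract_annotations(wikitext):
    """Collect property annotations from a page as {property: [values]}."""
    props = defaultdict(list)
    for name, value in ANNOTATION_RE.findall(wikitext):
        props[name.strip()].append(value.strip())
    return dict(props)


def pages_where(pages, prop, value):
    """Return the sorted titles of pages annotated with prop == value."""
    return sorted(
        title
        for title, text in pages.items()
        if value in extract_annotations(text).get(prop, [])
    )


# Hypothetical page texts, for illustration only.
pages = {
    "Chromis circumaurea": (
        "A damselfish in the genus [[Genus::Chromis]], "
        "described in [[Year described::2008]]."
    ),
    "Example species": "Placed in [[Genus::Examplus|the example genus]].",
}

print(extract_annotations(pages["Chromis circumaurea"]))
print(pages_where(pages, "Genus", "Chromis"))
```

Once facts are atomized like this, "all pages in genus X" becomes a  
lookup rather than a text-mining exercise.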

If I were to create a "page per species" web site from scratch, I'd  
use Semantic MediaWiki. Indeed I may well resurrect the iTaxon project  
to explore this some more.

Freebase is also cool, and they are doing some neat stuff.

But, it seems to me that the big issue here is not so much technology  
(which is important), but getting people to create/edit/use the site,  
and given that we are in the long tail business, I wonder whether we  
should go where the crowd is.

Regards

Rod


On 2 Sep 2009, at 15:21, Dmitry Mozzherin wrote:

> One problem we do have with Wikipedia and EOL approaches, is that
> information is mostly not atomized, which makes it quite
> expensive/impossible to use for any kind of data mining/data
> shuffling/reasoning.
>
> What are your thoughts about this? Do projects like Freebase
> (http://www.freebase.com/view/en/bird) or semantic wiki deserve more
> attention?
>
> Dima
>
> On Tue, Sep 1, 2009 at 8:56 AM, Roderic Page <r.page at bio.gla.ac.uk>  
> wrote:
>> Dear All,
>>
>> I've written a short blog post looking at what sites Google returns
>> when you search for all mammal species by scientific name. Can't say
>> I'm surprised by the results, but the magnitude of the difference
>> between Wikipedia and the rest is quite striking.
>>
>> The post is at http://iphylo.blogspot.com/2009/09/google-wikipedia-and-eol.html
>> (or http://tinyurl.com/n7ey68 if that gets mangled).
>>
>> Regards
>>
>> Rod
>>
>> ---------------------------------------------------------
>> Roderic Page
>> Professor of Taxonomy
>> DEEB, FBLS
>> Graham Kerr Building
>> University of Glasgow
>> Glasgow G12 8QQ, UK
>>
>> Email: r.page at bio.gla.ac.uk
>> Tel: +44 141 330 4778
>> Fax: +44 141 330 2792
>> AIM: rodpage1962 at aim.com
>> Facebook: http://www.facebook.com/profile.php?id=1112517192
>> Twitter: http://twitter.com/rdmpage
>> Blog: http://iphylo.blogspot.com
>> Home page: http://taxonomy.zoology.gla.ac.uk/rod/rod.html
>>
>> _______________________________________________
>>
>> Taxacom Mailing List
>> Taxacom at mailman.nhm.ku.edu
>> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>>
>> The Taxacom archive going back to 1992 may be searched with either  
>> of these methods:
>>
>> (1) http://taxacom.markmail.org
>>
>> Or (2) a Google search specified as:
>> site:mailman.nhm.ku.edu/pipermail/taxacom  your search terms here
>>
>










