[Taxacom] iSpecies with Wikipedia

Andy Mabbett andy at pigsonthewing.org.uk
Tue Mar 25 13:46:56 CDT 2008


In message <451EF75C-2233-49D9-92D1-22C12D834020 at bio.gla.ac.uk>, Roderic
Page <r.page at bio.gla.ac.uk> writes

>As an experiment I've added snippets of Wikipedia content to the
>iSpecies results. If iSpecies finds an article in Wikipedia it displays
>the first 100 words from the article, plus a link to the article
>itself. It's a bit crude, and parsing Wikimedia output is painful, but
>I think it adds to the results. I'd welcome any feedback.

I think that's an excellent move, though there are some bugs, and some
potential features you might add.


For instance, it works for both "Pica pica" and "magpie", and for "Tyto
alba", but not "barn owl".


In another case, searching for just "pica" returns:

        Pica can refer to:

                * Pica (unit of measure), in typesetting and document
                layout

                * Pica (disorder), abnormal appetite for earth and other
                non-foods

                * Pica (genus), a genus of magpie

                * Pica Press, a publishing imprint

                [...]

Likewise, the result for "Hobby" is not appropriate. Perhaps it would be
a good idea if you tested for the presence of a taxobox, and, if none is
present, dropped the inclusion, or searched further using some form of
heuristic?
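
To illustrate the taxobox test, here is a minimal sketch in Python. It
assumes you have the raw wiki markup of the article to hand as a string;
the function names and the 100-word cut-off handling are my own
invention, not anything iSpecies currently does.

```python
import re

# Assumption: articles about organisms open a "{{Taxobox" template somewhere
# in their raw wiki markup; disambiguation pages such as "Pica" do not.
TAXOBOX_RE = re.compile(r"\{\{\s*taxobox\b", re.IGNORECASE)


def has_taxobox(wikitext: str) -> bool:
    """Return True if the markup appears to contain a taxobox template."""
    return TAXOBOX_RE.search(wikitext) is not None


def snippet_if_taxon(wikitext: str, plain_text: str, max_words: int = 100):
    """Return the first max_words words of plain_text, or None if the
    article has no taxobox (hypothetical helper, for illustration only)."""
    if not has_taxobox(wikitext):
        return None
    return " ".join(plain_text.split()[:max_words])
```

With this, a page like "Pica can refer to: ..." would yield None, at
which point you could drop the snippet or try a further search.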


The GBIF map for "pica pica" shows all but one example as being in South
America (the other is in Africa - possibly a wrongly-signed longitude?).


You don't yet mention Wikipedia on your "how it works" page.


The image searches for a genus (e.g. "Pica") or common name (e.g.
"Hobby") are often mostly or all false-positives. You might like to add
a Flickr image search, too. There are several ways in which you could do
this, including searches by plain text, tags or machine tags
(e.g. "taxonomy:genus=Pica", "taxonomy:binomial=Alcedo_atthis"). The
latter would limit false positives. I'm happy to advise further if
required, though I note we discussed machine tags in early February this
year.
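
As a rough sketch of how the machine tags above might be built from a
name, assuming (from the "Alcedo_atthis" example) that spaces become
underscores. The function name is hypothetical, and no call to the
Flickr API itself is shown.

```python
def taxonomy_machine_tag(predicate: str, name: str) -> str:
    """Build a machine tag of the form taxonomy:<predicate>=<value>.

    Assumption: words in the name are joined with underscores, as in
    the "taxonomy:binomial=Alcedo_atthis" example.
    """
    value = "_".join(name.split())
    return f"taxonomy:{predicate}={value}"


genus_tag = taxonomy_machine_tag("genus", "Pica")
binomial_tag = taxonomy_machine_tag("binomial", "Alcedo atthis")
```

The resulting strings could then be handed to whichever tag search you
choose to use.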


You could also search for images on Bioimages:

        <http://www.bioimages.org.uk/>

which uses Google as a search engine; for example:

        <http://tinyurl.com/34ue8k>


Other possible links would be to the relevant entry on The National
Biodiversity Network's Species Dictionary:

        <http://nbn.nhm.ac.uk/nhm/>

and WikiSpecies:

        <http://species.wikimedia.org/>


The long list of links above the map needs to be marked up as a list,
and presented with bullets or similar. Likewise, the list of articles
from Google.


Please use a DOCTYPE declaration (preferably HTML 4.01 Strict), and
validate your HTML - watch out especially for unescaped ampersands.
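
On the ampersands: the usual culprit is a query string written straight
into an href attribute. A minimal sketch, assuming the links are built
in code; I don't know what language iSpecies is written in, so Python
here is purely for illustration, and link_html is a made-up helper.

```python
from html import escape
from urllib.parse import urlencode


def link_html(base_url: str, params: dict, label: str) -> str:
    """Return an <a> element whose href and text are HTML-escaped."""
    href = base_url + "?" + urlencode(params)
    # escape() turns "&" into "&amp;" so the attribute value validates.
    return '<a href="{}">{}</a>'.format(escape(href, quote=True), escape(label))


example = link_html(
    "http://darwin.zoology.gla.ac.uk/~rpage/ispecies/",
    {"q": "pica pica", "submit": "Go"},
    "Pica pica",
)
```

The separator between the two parameters is then written as "&amp;" in
the page source, which is what a validator expects.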


It would also be sensible to use more human-friendly URLs (which could
also then be used for tags); so that, for example:

  http://darwin.zoology.gla.ac.uk/~rpage/ispecies/pica+pica

would do what is currently done by:

  http://darwin.zoology.gla.ac.uk/~rpage/ispecies/?q=pica+pica&submit=Go
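
A sketch of the mapping between the two forms, again in Python purely
for illustration. The function names are hypothetical, and how the
server would actually route the short form (a rewrite rule, say) is not
shown.

```python
from urllib.parse import quote_plus, unquote_plus, urlencode

BASE = "http://darwin.zoology.gla.ac.uk/~rpage/ispecies/"


def friendly_url(name: str) -> str:
    """Build the short form, e.g. .../ispecies/pica+pica."""
    return BASE + quote_plus(name.strip())


def query_from_path(segment: str) -> str:
    """Recover the search term from the last path segment."""
    return unquote_plus(segment)


def current_url(name: str) -> str:
    """Build the query-string form that is in use at present."""
    return BASE + "?" + urlencode({"q": name, "submit": "Go"})
```

So friendly_url("pica pica") gives the first URL above, and
query_from_path("pica+pica") hands "pica pica" back to the existing
search.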


I've mentioned the species microformat previously; it would be
appreciated if you could also use that in your pages. Again, I'm happy
to advise further.

-- 
Andy Mabbett



