[Taxacom] Species pages (index)
Nicola Nicolson
n.nicolson at rbgkew.org.uk
Mon Feb 23 05:56:12 CST 2009
Hi,
Maybe you could try term boosting in Lucene to address this: at search time, split the search term into genus and species and give the genus part a higher boost. See:
http://lucene.apache.org/java/2_3_2/queryparsersyntax.html#Boosting%20a%20Term
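As a rough sketch of what I mean (not tested against your index): the helper below splits a binomial on whitespace and writes the genus with the query parser's `term^boost` syntax. The function name `boosted_binomial_query` and the boost value of 4 are my own placeholders, and I am assuming the name contains no characters that are special to the query parser and that the resulting string is handed to the default search field.

```python
def boosted_binomial_query(name: str, genus_boost: float = 4.0) -> str:
    """Return a Lucene query-parser string that weights the genus above the epithet.

    Hypothetical helper: the name and the default boost are illustrative only.
    """
    parts = name.split()
    if len(parts) < 2:
        # Only one word (a bare genus or a common name): nothing to boost against.
        return name.strip()
    genus, epithet = parts[0], parts[1]
    # "term^N" is the query parser's boost syntax; ":g" drops a trailing ".0".
    return f"{genus}^{genus_boost:g} {epithet}"


if __name__ == "__main__":
    print(boosted_binomial_query("Ursus maritimus"))
```

With that, a page matching both words should still come out on top, but a page matching only "Ursus" would be expected to outrank one matching only "maritimus". The boost value would need tuning against real results.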
cheers,
Nicky
- Nicola Nicolson
- Applications Development,
- Royal Botanic Gardens, Kew,
- Richmond, Surrey, TW9 3AB, UK
- email: n.nicolson at rbgkew.org.uk
- phone: 020-8332-5712
________________________________________
From: taxacom-bounces at mailman.nhm.ku.edu [taxacom-bounces at mailman.nhm.ku.edu] On Behalf Of Roger Hyam [rogerhyam at mac.com]
Sent: 23 February 2009 10:53
To: Kenneth Kinman
Cc: taxacom at mailman.nhm.ku.edu
Subject: Re: [Taxacom] Species pages (index)
Hi Ken,
I am just using the default Lucene settings for string searching at
the moment. It has already been pointed out to me that Wikipedia pages
(particularly for mammals) have embedded navigation templates at the
bottom that skew results.
Really this is an argument for getting pages marked up with just a
tiny bit of semantically rich information so we can link them
directly, rather than trying to guess the links from the text context.
Thanks for your thoughts,
Roger
On 23 Feb 2009, at 02:56, Kenneth Kinman wrote:
> Hi Roger,
>      I only spent a little time entering a few scattered species into
> your search so far, but actually found one proposed species of Homo
> that I had never heard of (although with just a skull-cap, its
> validity is regarded as very questionable).
>      But more to the point, one potential problem I found (which
> frankly can even be problematic on Google) is scoring. When I entered
> Ursus maritimus, there are a lot of plants that score unexpectedly
> high (because their specific names are also maritimus). One plant
> even scored higher than the Wikipedia article on "polar bear" (which
> really surprised me). However, with a little tweaking, those kinds of
> scoring problems could probably be eliminated.
> --------Ken
>
> _______________________________________________
> Taxacom Mailing List
> Taxacom at mailman.nhm.ku.edu
> http://mailman.nhm.ku.edu/mailman/listinfo/taxacom
>
> The entire Taxacom Archive back to 1992 can be searched with either
> of these methods:
> http://taxacom.markmail.org
> Or use a Google search specified as:
> site:mailman.nhm.ku.edu/pipermail/taxacom your search terms here
-------------------------------------------------------------
Roger Hyam
Roger at BiodiversityCollectionsIndex.org
http://www.BiodiversityCollectionsIndex.org
-------------------------------------------------------------
Royal Botanic Garden Edinburgh
20A Inverleith Row, Edinburgh, EH3 5LR, UK
Tel: +44 131 552 7171 ext 3015
Fax: +44 131 248 2901
http://www.rbge.org.uk/
-------------------------------------------------------------