> From: "Tronche Ch. le pitre" <Christophe.Tronche@lri.fr>
.....
> A more interesting approach is the indexer trying to figure the
> language of the document, based may be on a statistical analysis.
> Probably, problems will arise with mixed languages files.
>
> What do you think of that ? Has this been done by someone ?

We are developing a cross-lingual search engine called TITAN.
We have already implemented exactly what you describe.
Our robot analyzes the language(s) of each visited WWW page and makes
an index that enables "search by language". If a WWW page contains
two or more languages, our algorithm tries to figure out the two
most dominant languages and the ratio of these two.
If you are interested, please try:
http://isserv.tas.ntt.jp/chisho/titan-e.html
Related documents are available on the Docs page linked from the
page above.
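
To give a rough idea of what "the two most dominant languages and the
ratio of these two" can mean in practice, here is a minimal Python sketch.
It is not TITAN's algorithm: the stopword lists, the function name
dominant_languages, and the word-counting approach are all assumptions
made for illustration, and a real indexer would rely on much richer
statistics.

```python
from collections import Counter

# Hypothetical, deliberately tiny stopword lists (illustration only).
STOPWORDS = {
    "en": {"the", "and", "of", "is", "to", "in"},
    "fr": {"le", "la", "et", "les", "des", "est"},
    "de": {"der", "die", "und", "das", "ist", "nicht"},
}


def dominant_languages(text):
    """Return up to two (language, share) pairs, most frequent first.

    The share is each language's fraction of the stopword hits that
    belong to the top two languages, so the shares sum to 1.0 whenever
    at least one language is found.
    """
    counts = Counter()
    for word in text.lower().split():
        for lang, words in STOPWORDS.items():
            if word in words:
                counts[lang] += 1
    top = counts.most_common(2)
    total = sum(n for _, n in top)
    # An empty `top` yields an empty list, so no division by zero occurs.
    return [(lang, n / total) for lang, n in top]


if __name__ == "__main__":
    page = "the cat is in the garden and le chien est la"
    print(dominant_languages(page))
```

For the sample page above, five words match the English list and three
match the French list, so the sketch reports English and French with
shares of 0.625 and 0.375. An index could store such pairs per page to
support "search by language".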
-----------------------------------------------------------
Gen-ichiro KIKUI
(NTT Information and Communication Systems Laboratories)
Email: kikui@nttnly.ntt.jp Phone: 0468 59 2521 Fax: 0468 59 3428