Donald
On Fri, 7 Jun 1996, Tronche Ch. le pitre wrote:
> Date: Fri, 7 Jun 1996 04:42:00 +0200
> From: Tronche Ch. le pitre <Christophe.Tronche@lri.fr>
> To: www-talk@w3.org, robots@webcrawler.com
> Subject: Tagging a document with language
>
>
> Hi everyone.
>
> I've just spent a few hours searching AltaVista for information,
> which incidentally I found. But I'm surprised by the increasing number
> of documents that I can't understand, simply because they're written
> in a foreign language (foreign to me, that is, neither French nor
> English), not to mention non-ISO-8859 files, such as Japanese ones.
>
> The documents put on the Web used to be written by researchers, for
> whom English is mandatory, but they are likely to be outnumbered by
> the texts created by all the people who are neither researchers nor
> computer professionals and who now make up most of the users of the
> Internet and the Web. This is a great thing for sure, but the curse of
> the Tower of Babel is still on us, and a not-so-great effect is the
> dilution of documents one can understand when performing a search
> using an indexer.
>
> A simple solution: tagging the file with its language. For example,
> using an HTTP-EQUIV meta and an ISO 639 code, we get something like
> <META HTTP-EQUIV="Language" CONTENT="en"> for English. Of course, this
> is useful only if 1) the indexers offer the ability to select only a
> given set of languages and 2) many people do it.
>
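A minimal sketch of how an indexer might read the tag proposed above,
assuming pages carry the META element exactly as quoted. The names
LanguageMetaParser, declared_language and filter_by_language are
hypothetical, made up for this illustration; only Python's standard
html.parser module is used.

```python
from html.parser import HTMLParser


class LanguageMetaParser(HTMLParser):
    """Collects the CONTENT of <META HTTP-EQUIV="Language" ...> tags."""

    def __init__(self):
        super().__init__()
        self.languages = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Assumption: the tag uses HTTP-EQUIV="Language" as in the proposal.
        if (attributes.get("http-equiv") or "").lower() == "language":
            content = (attributes.get("content") or "").strip().lower()
            if content:
                self.languages.append(content)


def declared_language(html_text):
    """Return the first declared language code, or None if untagged."""
    parser = LanguageMetaParser()
    parser.feed(html_text)
    return parser.languages[0] if parser.languages else None


def filter_by_language(documents, wanted):
    """Keep the URLs whose page declares one of the wanted codes.

    `documents` maps a URL to its HTML text (a stand-in for an index).
    """
    wanted = {code.lower() for code in wanted}
    return [url for url, html_text in documents.items()
            if declared_language(html_text) in wanted]
```

Untagged pages come back as None and are dropped by the filter, which
is the second condition in the quoted paragraph: the scheme only helps
if many people actually tag their files.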
> A more interesting approach is for the indexer to try to work out the
> language of the document, perhaps based on a statistical analysis.
> Problems will probably arise with mixed-language files.
>
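A rough sketch of the statistical idea, under loud assumptions: it
counts common function words, which is a much cruder statistic than a
real indexer would want, and the two tiny word lists below are
illustrative only. STOPWORDS, language_scores and guess_language are
hypothetical names, not taken from any existing indexer.

```python
import re

# Assumption: a handful of very common function words per language.
# A real system would need far larger lists or a different statistic.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "for", "with", "this"},
    "fr": {"le", "la", "les", "et", "de", "des", "est", "un", "une", "pour"},
}


def language_scores(text):
    """Return, per language, the fraction of words that are function words."""
    # Runs of letters only (digits and underscores are excluded).
    words = re.findall(r"[^\W\d_]+", text.lower())
    if not words:
        return {code: 0.0 for code in STOPWORDS}
    return {
        code: sum(word in stopwords for word in words) / len(words)
        for code, stopwords in STOPWORDS.items()
    }


def guess_language(text, min_score=0.1):
    """Return the best-scoring language code, or None if nothing scores well."""
    scores = language_scores(text)
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None
```

For a mixed-language file, language_scores would report comparable
fractions for more than one language, so an indexer could use the full
score table rather than the single guess to flag such documents.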
> What do you think of that? Has anyone done this already?
>
> +--------------------------+------------------------------------+
> | | |
> | Christophe TRONCHE | E-mail : tronche@lri.fr |
> | | |
> | +-=-+-=-+ | Phone : 33 - 1 - 69 41 66 25 |
> | | Fax : 33 - 1 - 69 41 65 86 |
> +--------------------------+------------------------------------+
> | ###### ** |
> | ## # Laboratoire de Recherche en Informatique |
> | ## # ## Batiment 490 |
> | ## # ## Universite de Paris-Sud |
> | ## #### ## 91405 ORSAY CEDEX |
> | ###### ## ## FRANCE |
> |###### ### |
> +---------------------------------------------------------------+
>
>
=====================================================================
Donald E. Eastlake 3rd +1 508-287-4877(tel) dee@cybercash.com
318 Acton Street +1 508-371-7148(fax) dee@world.std.com
Carlisle, MA 01741 USA +1 703-620-4200(main office, Reston, VA)
http://www.cybercash.com http://www.eff.org/blueribbon.html