RE: Tagging a document with language

Henk Alles (halles@medialab.nl)
Fri, 7 Jun 1996 10:01:54 +-200



A more interesting approach is for the indexer to try to figure out the
language of the document, perhaps based on a statistical analysis.
Problems will probably arise with mixed-language files.

What do you think of that? Has this been done by anyone?

I wrote such a thing a while ago. If anyone wants it, mail me.

It is written in C++ and Delphi as a 16-bit DLL. There is a small
program to test, import and run some statistics on the data. It can
handle up to 65535 different languages.
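
To give a rough idea of what a statistical approach could look like,
here is a minimal sketch in Python. It is not the DLL mentioned above
and makes no claim about how that one works. It assumes character
trigram counts compared by cosine similarity, which is only one of
several possible techniques. The names trigrams, cosine, SAMPLES,
PROFILES and guess_language, and the two tiny sample texts, are made up
for illustration; a usable guesser would need far more training text
per language.

```python
import math
from collections import Counter


def trigrams(text):
    """Count overlapping character trigrams in lowercased, space-padded text."""
    cleaned = " " + " ".join(text.lower().split()) + " "
    return Counter(cleaned[i:i + 3] for i in range(len(cleaned) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count tables (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical, deliberately tiny training samples (assumption: real
# profiles would be built from much larger texts).
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and this is "
               "a document written in the english language",
    "dutch": "de snelle bruine vos springt over de luie hond en dit is "
             "een document geschreven in de nederlandse taal",
}

# One trigram profile per known language.
PROFILES = {lang: trigrams(text) for lang, text in SAMPLES.items()}


def guess_language(text):
    """Return the language whose profile is most similar to the text."""
    observed = trigrams(text)
    scores = {lang: cosine(observed, profile) for lang, profile in PROFILES.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(guess_language("the lazy dog"))
    print(guess_language("de luie hond"))
```

For mixed-language files, one option under the same assumptions would
be to call guess_language on each paragraph separately instead of on
the whole file.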
