Re: Indexing two-byte text

Mark Schrimsher (mschrimsher@twics.com)
Thu, 7 Dec 1995 11:18:57 +0900

Next message: Paul Francis: "Re: Indexing two-byte text"
Previous message: Paul Francis: "Re: Indexing two-byte text"
Maybe in reply to: Mark Schrimsher: "Indexing two-byte text"
Next in thread: Paul Francis: "Re: Indexing two-byte text"

At 4:46 PM 12/6/95, Paul Francis wrote:
>We are doing a multi-lingual navigation project
>(called Ingrid) that involves indexing Japanese
>text. We use JUMAN to extract Japanese text
>(because it is public domain---it actually doesn't
>do such a good job), and some home-grown Perl
>stuff to filter out garbage, weight terms, and
>do stemming.

Is there publicly available code to handle stemming for Japanese, or is
there a description of the algorithm involved anywhere (in English or in
Japanese)?

And what sort of garbage remains after using JUMAN?

--Mark
