We are doing a multi-lingual navigation project
(called Ingrid) that involves indexing Japanese
text. We use JUMAN to extract Japanese text
(we chose it because it is public domain; it actually
doesn't do such a good job), and some home-grown Perl
stuff to filter out garbage, weight terms, and
do stemming.
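
As a rough illustration of the filtering and weighting steps,
here is a minimal Python sketch. It is not the Ingrid Perl code:
the function names, the "no letters or digits means garbage" rule,
and the relative-frequency weighting are all assumptions made for
the example. It also assumes the text has already been split into
tokens (for instance by a segmenter such as JUMAN), and it leaves
stemming out entirely.

```python
from collections import Counter


def is_garbage(token):
    """Hypothetical filter: reject empty tokens and tokens that
    contain no letter or digit at all (pure punctuation, symbols)."""
    return not any(ch.isalnum() for ch in token)


def weight_terms(tokens):
    """Return {term: relative frequency} over the tokens that
    survive the garbage filter. Assumed weighting, not Ingrid's."""
    kept = [t for t in tokens if not is_garbage(t)]
    counts = Counter(kept)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}
```

Called on a token list such as ["日本語", "。", "検索", "日本語"],
this drops the punctuation token and weights the remaining terms
by how often they occur.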
For searching, though, we are doing exact
string matching only for now.
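
To show what exact string matching means in practice, here is a
small Python sketch. The index layout (a mapping from each term to
the set of documents containing it) and the names build_index and
search are assumptions for illustration, not a description of how
Ingrid stores its index.

```python
def build_index(docs):
    """docs maps a document id to its list of indexed terms.
    Returns a mapping from each term to the set of document ids."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    """Exact match only: the query must equal an indexed term
    character for character. No stemming or substring matching."""
    return sorted(index.get(query, set()))
```

With this scheme a query for "日本" will not find a document that
was indexed only under "日本語", which is the main limitation of
exact matching for Japanese text.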
I suggest you ask this question on the
comp.infosystems.harvest newsgroup and also on the
winter (web internationalization) mailing list
at winter@dorado.crpht.lu. (Please see
http://dorado.crpht.lu:80/~carrasco/winter/
for the winter web page.)
I think there may be some Mule tools for international
grep-like things, but I'm not absolutely sure
about it...
PF