Re: Indexing two-byte text

Paul Francis (francis@cactus.slab.ntt.jp)
Fri, 8 Dec 95 17:25:12 JST


>
> You may get approval, but I assume that it couldn't be freely used for
> commercial purposes?

You are right. I suppose you could license it, but
I hardly think it would be worth it. A good programmer
could throw it together in a week....

>
> I don't think perfection is necessary here anyway to produce a useful
> system. But couldn't you just swap out the dictionary for a better
> dictionary? I just got a copy of juman, though, and although I just glanced

One major problem is that all the better dictionaries we know of
are commercial, so using one would break our requirement for
freely usable code. Second, I think using a dictionary is a
never-ending battle. Each specialization has its own terms and
requires its own dictionary. Further, language evolves fast,
especially in fast-moving fields. I don't want the headache of
always trying to maintain the dictionary.

> >case, a single term). We will implement phrase detection
> >next, and expect to have it by late January.
>
> Ha! A programmer's solution. It seems like just upping the dictionary is
> more straightforward. ;-)

You're a manager, eh? :-)

But, we need phrase detection in any event. So, I hope it
handles the term isolation part as well.

>
> I'm not sure what you mean by a "publisher"--I'm not sure what this does.
> Is this different from Ingrid?
>

"Publisher" is the (rather poor) term we use for the component
of Ingrid that takes a resource, automatically pulls out key
terms, generates some other info about the resource (size, type,
title, etc.) and gives it to the component of Ingrid that inserts
it into the navigation topology.
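
As a rough illustration of the description above, here is a minimal
sketch in Python. It is not Ingrid's actual code: the record layout,
the names ResourceRecord and publish, and the frequency-based term
picker are all assumptions made for the example. In particular, the
term picker only handles space-delimited ASCII words, so it does not
address term isolation in two-byte text.

```python
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ResourceRecord:
    # Hypothetical record layout; field names are illustrative only.
    title: str
    mime_type: str
    size: int
    key_terms: list = field(default_factory=list)


def publish(text, mime_type="text/plain", max_terms=5):
    """Build a record for one resource (a sketch, not Ingrid's code)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Assumption: the first non-empty line serves as the title.
    title = lines[0] if lines else ""
    # Placeholder term picker: the most frequent words of 4+ letters.
    words = re.findall(r"[a-z]{4,}", text.lower())
    key_terms = [w for w, _ in Counter(words).most_common(max_terms)]
    size = len(text.encode("utf-8"))  # size in bytes
    return ResourceRecord(title, mime_type, size, key_terms)
```

The returned record is what would then be handed to the component
that inserts it into the navigation topology; that step is left out
here because nothing above describes its interface.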

PF