Announce: ActiveX Search (IFilter) spec/sample

Lee Fisher (leefi@MICROSOFT.com)
Wed, 27 Mar 1996 15:31:44 -0800


I just realized that we recently published something that might be of
relevance to folks on this mailing list...

About 2 weeks ago at our Internet developer's conference we released a
variety of new Internet products, some of which have interfaces for
ISVs. Nearly all of that stuff is on <http://www.microsoft.com/intdev/>,
which is a collection of development interfaces for client- and
server-side Internet stuff.

The "ActiveX Search" stuff, an OLE/COM IFilter interface, is something
that might be of interest to web crawlers and search engines. Info on
this is in the ActiveX SDK (available at the above URL), in the
\InetSDK\Samples\HTMLFilt subdirectory. The spec for it is also in that
directory, as is the source.

Quoting from the readme:

----- snip ----- snip -----

The IFilter interface was designed primarily to provide a uniform
mechanism to extract character streams from formatted data. The goal was
to provide ISVs with an interface that extracts text as the initial step
in content indexing document data. IFilter can be implemented over any
document format and the ISV can choose any API or interface to read the
data format. For example, a content filter can be written that reads
data using the Win32 file APIs or uses the OLE storage interfaces.

Any software author who stores textual data should consider implementing
a content filter for the document format to allow content indexing
systems to extract text.

The sample filter in this directory will extract text and properties
from HTML pages. In addition to raw content, headings (level 1 to 6),
title and anchors are emitted as pseudo-properties. Title is also
published as a full property available via IFilter::GetValue.

----- snip ----- snip -----

So, search engines and crawlers which grok IFilter will be able to break
up OLE-based code and get at its contents. The HTMLFilt sample here
is an IFilter implementation which reads HTML text.
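
To give a rough feel for what the readme describes, here is a minimal
sketch in Python. It is not the IFilter COM interface and not the
HTMLFilt source: the class name, function name and chunk labels are all
made up for illustration. It only shows the general idea of walking an
HTML page and emitting text chunks, with title, headings (h1 to h6) and
anchors labelled separately from ordinary content. How the real sample
labels and orders its chunks may well differ.

```python
from html.parser import HTMLParser

# Hypothetical labels for this sketch only; they are not taken from the
# IFilter spec or the HTMLFilt sample.
HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HtmlTextFilter(HTMLParser):
    """Toy text extractor: collects (label, text) chunks from HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []   # (label, text) pairs, in document order
        self._stack = []   # currently open title/heading/anchor tags

    def handle_starttag(self, tag, attrs):
        if tag == "title" or tag == "a" or tag in HEADINGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        # Close the innermost matching open tag; ignore stray end tags.
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())   # collapse runs of whitespace
        if not text:
            return
        # Text inside a tracked tag gets that tag as its label;
        # everything else is plain content.
        label = self._stack[-1] if self._stack else "content"
        self.chunks.append((label, text))


def extract_chunks(html):
    """Return the list of (label, text) chunks found in an HTML string."""
    parser = HtmlTextFilter()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

An indexer would call extract_chunks(page) and decide, per label, how
to weight or store each piece of text; for instance, chunks labelled
"title" could be stored as a document property as well as indexed.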

Hope that someone finds this useful...
__
Lee Fisher, leefi@microsoft.com