
Thursday, 14 December 2006

Yahoo & IBM: free enterprise search

Yahoo and IBM are giving away an enterprise search engine, IBM OmniFind Yahoo! Edition, which searches your organisation's file system, intranet and public website. It integrates Yahoo! Search into the results, so you can search both internal documents and the Web in a single search. (Extra support, however, has to be paid for.)

According to the Yahoo Search blog, the features of this free software, which they say is easy to use, include:
  • Indexes up to 500,000 documents and over 200 file types in 30 different languages.
  • Customizable search results interface, with the option to hide certain features.
  • Add new synonyms.
  • Create your own featured links to applications and content that you may want to provide one-click access for. For example, a search on “expense reports” may include a featured link directly to your Expense Report system.
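To make the synonym and featured-link ideas concrete, here is a minimal Python sketch of what such features do in principle. It assumes simple in-memory lookup tables; `SYNONYMS`, `FEATURED_LINKS`, `expand_query`, `featured_link` and the intranet URL are all hypothetical and say nothing about how OmniFind actually stores or applies its configuration.

```python
from typing import List, Optional

# Hypothetical configuration tables, for illustration only.
SYNONYMS = {
    "expense report": ["expense claim", "travel claim"],
}
FEATURED_LINKS = {
    "expense reports": "http://intranet.example.com/expenses",
}


def expand_query(query: str) -> List[str]:
    """Return the normalised query plus a variant for each configured synonym."""
    q = query.lower().strip()
    variants = [q]
    for term, alternatives in SYNONYMS.items():
        if term in q:
            variants.extend(q.replace(term, alt) for alt in alternatives)
    return variants


def featured_link(query: str) -> Optional[str]:
    """Return a one-click link to show above the results, if one is configured."""
    return FEATURED_LINKS.get(query.lower().strip())


print(expand_query("Expense reports"))
# ['expense reports', 'expense claims', 'travel claims']
print(featured_link("Expense Reports"))
# http://intranet.example.com/expenses
```

The search engine would then run all the query variants and show the featured link, if any, above the ordinary results.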
You can download it after filling in a form (which somewhat contradictorily says "Thanks for downloading..." and "Before you download, please answer a few quick questions..."). You have to give your business email address; the rest is optional.

As it's free, there seems to be no downside in trying it out. I haven't yet had a chance to look at the further information in much detail, let alone install it myself (supposedly a 3-click installation), but it looks good, especially given that it's free!

It's meant to run on certain versions of Linux and Windows and to work with both Internet Explorer and Firefox, and it seems you can customise how results are ranked for relevance.

They say that it incorporates open-source Apache Lucene technology (a Java search engine library) and that its key integration features are:
  • Open URL-based APIs for simplicity, development language flexibility, and extensive portability (REST architecture)
  • Interfaces that support:
    • Issuing search requests and receiving search results
    • Retrieving cached and source documents
    • Pushing new and updated documents to be indexed
    • Deleting documents from index
  • Easily embeddable and customizable UI output (XML/XSLT/HTML/HTML snippets)
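As a rough illustration of what a URL-based (REST) search interface means in practice: a client simply builds an HTTP GET URL and reads back XML. The Python sketch below assumes a hypothetical endpoint (`http://search.example.com:8080/api/search`), hypothetical parameter names (`query`, `start`, `results`) and a made-up XML response shape; I haven't checked OmniFind's real API, so treat all of these as placeholders.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Hypothetical endpoint and parameter names: placeholders, not OmniFind's real API.
BASE_URL = "http://search.example.com:8080/api/search"


def build_search_url(query: str, start: int = 0, results: int = 10) -> str:
    """Encode a search request as a plain HTTP GET URL (the REST idea)."""
    params = {"query": query, "start": start, "results": results}
    return BASE_URL + "?" + urlencode(params)


def parse_results(xml_text: str) -> list:
    """Extract (title, url) pairs from a made-up XML result format."""
    root = ET.fromstring(xml_text)
    return [(r.findtext("title"), r.findtext("url")) for r in root.iter("result")]


# A made-up response of the kind such a server might return.
SAMPLE_RESPONSE = """<results>
  <result>
    <title>Expense policy</title>
    <url>http://intranet.example.com/policy.doc</url>
  </result>
</results>"""

print(build_search_url("expense reports"))
# http://search.example.com:8080/api/search?query=expense+reports&start=0&results=10
print(parse_results(SAMPLE_RESPONSE))
# [('Expense policy', 'http://intranet.example.com/policy.doc')]
```

In a real client you would pass the URL to something like `urllib.request.urlopen` and feed the response body to `parse_results`. Because the interface is just URLs and XML, any language with an HTTP library could do the same, which is presumably the "development language flexibility" they mention.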

Documentation/resources

There seems to be a decent amount of supporting documentation and info e.g.:

Thoughts - Google, security and document management systems etc

I wonder how OmniFind will compare with Google's own (not free) enterprise search product and Google Search Appliance. I notice that Google have recently been at pains to point out that they agree enterprise search systems must support document-level security (and see their security page). I haven't been able to find out what security features OmniFind supports, beyond a statement in their forum that "This release of OmniFind Yahoo! Edition does not support for Active Directory or LDAP for authenticated search. OmniFind Enterprise Edition supports this via global security w/in WebSphere Application Server".
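Document-level security essentially means a user should only ever see hits for documents they're allowed to open. One common approach is to filter results against each document's access control list (ACL) at query time. The minimal Python sketch below assumes each indexed document carries a set of permitted groups; the `Document` class, its `allowed_groups` field and `filter_by_acl` are hypothetical, and in a real deployment the user's groups would come from a directory such as Active Directory or LDAP.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Document:
    """A search hit plus a hypothetical ACL: the groups allowed to read it."""
    url: str
    title: str
    allowed_groups: FrozenSet[str]


def filter_by_acl(results: Iterable[Document], user_groups: Iterable[str]) -> List[Document]:
    """Keep only hits the user may read; a document with no groups is hidden (deny by default)."""
    groups = set(user_groups)
    return [doc for doc in results if doc.allowed_groups & groups]


hits = [
    Document("file://share/finance/budget.xls", "Budget 2007", frozenset({"finance"})),
    Document("http://intranet/handbook.html", "Staff handbook", frozenset({"all-staff"})),
]
print([d.title for d in filter_by_acl(hits, ["all-staff"])])  # ['Staff handbook']
```

Filtering after the search is the simplest design, but it means the index itself must record each document's permissions, which is exactly why the search engine needs access to the directory service.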

Google must also be thinking hard about how to ensure OneBox can be competitive against a free offering.

Another important issue is integration with the major document management systems.

In my view, for enterprise search systems to gain widespread acceptance, they have to take into account not only that organisations will want document-level security, but also that most large organisations already have their own document management systems in place, which were expensive to buy or develop and are expensive to maintain.

Searching file systems and intranet webpages just isn't good enough. Most people want to search documents. Searching accounts systems may be useful, yes, but for many of us it's really the information in textual documents that we're after. Enterprise search engines have to be able to integrate with the leading existing document management systems, as many organisations already have too much invested in those systems. If I were Google or Yahoo/IBM I'd focus quite hard on that, and I'd certainly provide an easily accessible and comprehensive list of the document management systems (DMS) with which the search product is compatible.

I've never had the chance to test or even properly research Google's enterprise products, but I've also often wondered how their relevance ranking works for internal search. Google's PageRank system, which works extremely well to put the most relevant results highest in the list, surely depends on hyperlinks from one HTML document to another. How do you "follow" links, which at best might just be short text references, from one internal (say Word) document to another? Especially if the documents are held in a document management system? That to me is another major issue for enterprise search. Will OmniFind do better?
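To illustrate why links matter so much, here is the textbook form of PageRank (power iteration with the commonly quoted damping factor of 0.85) run over a tiny, hypothetical intranet. It's a sketch of the published algorithm, not Google's actual production ranking, and the page names are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Pages with no outgoing links spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        # Every other page splits its damped rank equally among the pages it links to.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
        rank = new_rank
    return rank


# A hypothetical intranet: three linked web pages and two Word documents
# that nothing links to and that contain no hyperlinks themselves.
intranet = {
    "home.html": ["policies.html", "news.html"],
    "news.html": ["policies.html"],
    "policies.html": ["home.html"],
    "budget.doc": [],
    "minutes.doc": [],
}

for page, score in sorted(pagerank(intranet).items(), key=lambda kv: -kv[1]):
    print(f"{page:15} {score:.3f}")
```

The exact scores don't matter. The point is that `budget.doc` and `minutes.doc` tie with the lowest possible score, below every linked page, so link analysis on its own gives no way of telling which unlinked document is more relevant.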

Very interesting stuff: clearly there's going to be a big fight in the enterprise search arena. You never know, maybe Google will release their product for free too, which can only be good for us users and consumers.
