Sympathy for the Search Engines

I've recently been pondering what the near and mid-range future may hold for web-wide search utilities. This was touched off when someone here posted a link to a study that indicates the total size of the web is currently nearing one trillion pages.

Now that's an impossible number to grasp. Even Google's one billion is way too big. A goal of "all the web, all the time" is a total pipe dream and will probably always be, no matter what the public misconceptions are -- or how much hype and spin gets tossed out to nourish those misconceptions.

I was speaking to the IT manager at a large e-commerce business that maintiains nine well-known, interlinked sites. I learned that until a recent upgrade, building a full index for their database took five days. And one glitch meant starting over!

That's only maintaining a search engine for one business -- a miniscule slice of the total web. They've got (somewhat) predictable keywords. Tables with well chosen fields and index keys. It took five days. This gives me serious pause when I think about what a web search engine is trying to deal with.

Plus, a site search engine has a major advantage -- no one is actively trying to fool their indexing. I just don't see how a general web search can be an effective model for very much longer -- it must already take weeks to do a build for a major engine! Petabytes of data -- even "googol-bytes", or whatever the next level up is called.

I'm amazed that any SE can ever list any new pages in a week or two. Even more amazed when Alta Vista can list something with a day or two -- they certainly seem to have one of the best infrastructures going, no matter what happens in their algorithm struggles.

I have no brainstorm on better ways to help people find their way around the web. But I'm sure that those ways are needed, and that this represents a business opportunity for those who can build a better system.

The current ingredients in "themes" indexing -- term vectors, clustering, hubs, authorities, etc -- these were developed several years ago by the academics in Information Science. And these methods are just now coming into application, years later. I have some connections to academia, but from what I hear, there is no quantum advance on the drawing boards. And that's what is needed -- a total paradigm shift.

So I have much sympathy. Trying to offer a good search tool for the HUGE data pile that makes up today's web -- AND make profit at the same time! It boggles my mind. In the long term, photonics may replace electronics and speed everything along. But that certainly won't be soon enough for the present challenges.

My best guess for the mid range future is that general, one-size-fits-all search engines will recede from usefulness as the web continues to mushroom. They will be replaced by partial, targeted databases -- regional, topical, etc. There will be a few big companies where hundreds of separate databases are maintained and you drill down through a "directory" to the one you want.

This brings me to a wish-list item I've been nursing -- I'd love to find a search engine that ONLY deals with frequently refreshed sites. Or maybe a regular SE with an option to exclude old results.

Sympathy for the Search Engines

tedster

rcjordan

tedster

GWJ

rcjordan

Brett_Tabke

rcjordan

Join The Conversation

Moderators and Top Contributors

Hot Threads This Week