What's going on with webtop.com?

Huge database, cool algo, but zero referrals

         

Everyman

5:35 pm on May 6, 2001 (gmt 0)



Does anyone know what webtop.com is up to?

They are UK-based, and were affiliated with Dialog (perhaps still are). They seem to have money, and while they don't have as many Ph.D.s as Google, they are using a very sophisticated algorithm that seems to work fairly well. Their algo does not appear to be based at all on link popularity, but rather on content analysis.

A couple months ago they were trying to suck me up rather aggressively; I let them get a little bit and then cut them off from my dynamic pages. This past week it was as if they had read my posts on this forum, and modified their algo for my benefit, in order to suck up my names. Their crawling slowed down to two hits per minute, they used an IP that didn't reverse resolve (212.135.14.12), and they wiped their HTTP_FROM clean (which I look for). All these put them under my automatic-rejection threshold, so I let them go a couple days like this, 24/7. Then I got curious and checked out webtop.com more closely. Curiously, all their new tricks happened this week, and it was just about three weeks ago that they were all explained on this forum, in various posts, by me.
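A minimal Python sketch of the kind of automatic check described above. The function names, the 10-hits-per-minute limit, and the hostname keywords are assumptions made up for illustration; they are not the actual filter running on the site.

```python
import socket

def reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if it does not reverse-resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # covers socket.herror and socket.gaierror
        return None

def rejection_signals(hits_per_minute, from_header, hostname,
                      max_hits_per_minute=10.0,           # assumed limit
                      bot_words=("crawl", "spider", "bot")):  # assumed keywords
    """List the signals that would mark a visitor as a crawler."""
    signals = []
    if hits_per_minute > max_hits_per_minute:
        signals.append("rate")          # crawling faster than a human would
    if from_header.strip():
        signals.append("from_header")   # bots often fill in the HTTP From header
    if hostname and any(w in hostname.lower() for w in bot_words):
        signals.append("hostname")      # reverse DNS names the crawler
    return signals
```

A visitor that crawls at two hits per minute, sends no From header, and comes from an IP with no reverse DNS produces an empty list, which is why that pattern slips under this sort of threshold.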

It looks to me like they have a huge database, even disregarding their "total hit" numbers, which seem to be flaky. But here's the thing -- no one, and I mean no one, is using webtop.com. I've had zero referrals from them, even though they show a couple thousand hits from me in their database. I tried it, and sure enough, the referrals would have shown up as "webtop.com" in my logs if anyone were using it.
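For anyone who wants to run the same check, here is a rough Python sketch that counts referrals from one domain in an access log. It assumes the server writes the common "combined" log format (referrer and user agent as the last two quoted fields); the function name and regex are illustrative only.

```python
import re
from urllib.parse import urlparse

# Tail of a combined-format line: "request" status bytes "referer" "user-agent"
LOG_TAIL = re.compile(r'"[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"\s*$')

def count_referrals(lines, domain):
    """Count log lines whose referrer host is domain or a subdomain of it."""
    count = 0
    for line in lines:
        match = LOG_TAIL.search(line)
        if not match:
            continue  # not a combined-format line
        host = urlparse(match.group("referer")).hostname or ""
        if host == domain or host.endswith("." + domain):
            count += 1
    return count
```

Calling count_referrals(open("access.log"), "webtop.com") would give the number of visits that arrived from a Webtop results page, assuming the log file is named access.log.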

Why would they go to all this trouble to suck up interesting web content, if no one is using them? It's possible that everyone is happy with Google, and Webtop just doesn't know how to get anyone to use their site because they lack a marketing and promotion department. It's also possible that they want the data, but not necessarily for general web surfers. A bit spooky, I'd say. (I'm not allowed to advertise my site, but let me say that it would be of interest to intelligence agencies.)

Let your imagination run wild. Does anyone have any information on what they're up to?

NFFC

5:45 pm on May 6, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'll say this for the third time this weekend: they will be the next Google. People are raving about Webtop, and with a little more PR and a slice of luck they could be very big.

>I've had zero referrals

I've had a fair few; perhaps your site doesn't quite match their demographics at this point in time.

bobriggs

1:00 am on May 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I had forgotten that I had submitted to webtop.

Because of this thread I checked one of my sites and had it ranked 3rd for my #1 targeted keyword phrase.

Now, I check my access logs quite frequently, and was surprised to find that it [webtop] had the whole site indexed. But I never knew it happened. Does anyone know the spider?

Two spiders that have completely indexed the site are analysis.he.net and alexa, but I can't figure out what their function is. Could webtop be using one of these spiders?

bobriggs

1:05 am on May 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, just reread first post, but:

212.135.14.12

Is not in my access logs.

In addition, I have no referrals from webtop.

jeremy goodrich

3:01 am on May 7, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think I've only been to their site once before, but after a few trial searches, I think their relevance is good. Reading about their technology here: [webtop.com...]

reminds me a lot of Google. Both use probabilistic methods of categorizing web content and rating document relevance.

I find it interesting that they refer to "probabilistic" methods of indexing web content. Any AI buffs here care to explain what that's referring to?

Everyman

1:56 pm on May 7, 2001 (gmt 0)




Webtop describes their technology at:

[webtop.com...]

They list a half-dozen academics; apparently their algo goes back to the 1970s.

It seems to me that the major point of their technique is to make a search "concept-driven" rather than "keyword-driven." I disagree that it sounds like Google, because I see no evidence that they are using link popularity in their relevance ranking. If they are, they don't talk about it. Google is very strongly driven by PageRank.

Assuming that they are not using link popularity at all, their relevancy ranking is fairly impressive. They're doing something right. As more and more garbage is dumped into the web landfill, it's quite possible that Google's link popularity will become less useful for relevancy ranking, while Webtop's technique will become more useful.

Moreover, it seems to me that Webtop's technique will be able to make a more refined content judgment about whole sites and individual pages, and base their crawling priorities on this. Doing things this way seems more intelligent than the use of Google's PageRank for crawling priorities. Google starts feeding on itself (we'll crawl this site because it has a higher PageRank; it has a higher PageRank because we used the same logic to crawl it the last six times).

Webtop has this downloadable software gizmo that you can use to paste in content -- even entire documents -- and it will extract the concepts that are important before you start searching. Unfortunately, you give up privacy by using it. I haven't tried it.

Here is an excerpt from their white paper, which is at:

[webtop.com...] (MS Word document)

Introducing Linguistic Inference

Traditional search and retrieval technologies, including those used by the leading Internet search engines, still assume that users know (a) what they are looking for, (b) which words to use to best describe what they are looking for, and (c) how to initiate a search with the appropriate syntax using these words to maximise the likelihood of immediately finding what they are looking for. In reality however, users are generally not able to completely and unambiguously describe what they are looking for at the start of the searching process - principally because they have limited prior knowledge of the specific documents and data that are actually available to them. In fact, users most commonly initiate the searching process with a limited quantity of potentially ambiguous information and then use the results of their initial searches to refine and improve their own understanding and description of what they are looking for. Linguistic inference is an application of probabilistic information retrieval theory which enables users to quickly and accurately pinpoint information in vast document collections and textual databases using natural language techniques - even if the information available to initiate the searching process is incomplete and/or ambiguous. Linguistic inference can do this because it is able to infer important *concepts* - recurring linguistic patterns or themes - from textual data and then correlate these concepts using probabilistic modelling techniques with even an incomplete and ambiguous description of what the user is looking for.

More about the technology

Linguistic inference comprises five interacting processes which: continuously review the underlying information set to identify both new and changed data (data collection); identify and extract concepts from the collected textual information (concept extraction); identify and conceptualise the essence of a user's initial information need, no matter how incomplete or ambiguous this may be (interest recognition); correlate concepts defined by the user or inferred from their actions or behaviour with the concepts extracted from the documents and databases and present the results of this correlation to the user (probabilistic concept correlation); and interact with the user to help them to refine their interest following the review of a results synopsis (interest refinement).

Everyman

4:27 pm on May 7, 2001 (gmt 0)



There's a way to simulate what Webtop's desktop gizmo does without downloading their software. I just discovered this, and I'm very impressed with it.

First of all, pick out a document on your hard disk that discusses a topic that you're intimately familiar with. Load the text into your clipboard.

Then go to [webtop.com...] for a search. Instead of using the little box for one or two keywords, click on "PowerSearch" just below this box. You will get a page of options. The bigger box where it says "Enter as many words as you like" will actually take up to about 30K of text, by my reckoning. Just dump your clipboard into this box and hit "Search" at the bottom of this page.

It takes a few seconds, but the results that come back are extremely relevant, particularly the first few. I tried it on about four documents that they had already indexed from my site, and the document I loaded from the clipboard always came out on top, as you would expect. The ones immediately below were very relevant also.
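To show why a pasted document should come out on top of its own search, here is a small Python sketch of document-as-query ranking. It uses plain TF-IDF with cosine similarity, which is a stand-in technique and not Webtop's linguistic inference (their method isn't public beyond the white paper). All function names are made up for the example.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query_text, docs):
    """Return document indexes ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    query = {t: tf * idf.get(t, 0.0)
             for t, tf in Counter(tokenize(query_text)).items()}
    scores = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda s: (-s[0], s[1]))]
```

When the query text is one of the indexed documents, its vector matches its own index entry exactly, so it ranks first and the rest are ordered by how much weighted vocabulary they share with it.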

The other thing that's cool is that Webtop is hooked into Moreover, so that you also see breaking news that might be relevant when you do a search.

Google had better watch their back; this thing is hot. I'm going to let them crawl my site if they ever come back.

They seem to be fast, thorough, and much more current than Google. Webtop is a sleeping giant, I'd say. I suspect they can do a lot with this technology, such as allowing icons on pages that say, "Search the web for pages like this one." Google is not really set up to do that.

Webtop's HTML interface isn't as cool as Google's, but a few weeks of work would solve that problem. It's the stuff behind the interface that's impressive.

jeremy goodrich

1:44 am on May 8, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ahh, my point was the "probabilistic" aspect of their indexing and keyword scoring methodologies. I was actually referring to this method of dealing with ambiguities...because every algo has to deal with uncertainty.

There are two mathematical frameworks available for this: probability and fuzzy logic. If they had said "possibilistic" in their technical explanation, it would have indicated fuzzy logic; since they said "probabilistic", it indicates they use probability. (Of course, I'm probably spelling all this stuff wrong :) )

After reading some stuff by Bart Kosko, including his theorem on fuzzy logic, continuums, and a hypercube model which demonstrates that probability is in fact a subset of fuzzy logic, it seems odd that anybody would choose a lesser mathematical rule set when designing a search service. His home page is [sipi.usc.edu...]

That was what I meant by asking if there were any AI buffs here :)
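A tiny Python illustration of the hypercube picture mentioned above, as I understand it: a fuzzy set over n items is any point in the unit cube [0, 1]^n, while a probability distribution is the special case where the values also sum to 1. This only shows the geometric idea and is not a proof of Kosko's result; the function names are made up for the example.

```python
def is_fuzzy_set(fit):
    """Every membership value lies in [0, 1], i.e. a point in the unit hypercube."""
    return all(0.0 <= x <= 1.0 for x in fit)

def is_probability_distribution(fit, tol=1e-9):
    """A fuzzy set whose membership values also sum to 1."""
    return is_fuzzy_set(fit) and abs(sum(fit) - 1.0) <= tol

def overlap_with_complement(fit):
    """Fuzzy A AND NOT-A, element by element: min(a, 1 - a)."""
    return [min(x, 1.0 - x) for x in fit]
```

For example, [0.7, 0.7, 0.2] is a valid fuzzy set but not a probability distribution, and a membership value of 0.5 overlaps its own complement by 0.5, something that cannot happen with crisp (0 or 1) membership.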