Inktomi in Google

The New Secret Backdoor?

Josk

8:38 am on Jul 11, 2002 (gmt 0)

Hi,

I've just noticed that some of my Inktomi PFI campaigns are showing up in Google. They've even been spidered...! I'm not complaining that much, but I need to know how this happened for a client of mine...

So...how is this possible...? Google spidering Hotbot or somewhere?

ciml

9:58 am on Jul 11, 2002 (gmt 0)

Do any Inktomi partners have Overture-style "pretend directories", where they link to specific searches in a crawlable manner?

If the links are PR4 or higher then they should show in a link: search, do you see anything?

tigger

10:15 am on Jul 11, 2002 (gmt 0)

Is it not just that Google is picking up the pages when it crawls the site?

Grumpus

11:56 am on Jul 11, 2002 (gmt 0)

When I write articles for my web site, I frequently link to specific search results on search engines. This is not, by any means, an attempt to get Googlebot to crawl search results that contain my site, but rather, just a means of getting folks to a decent listing of "related reading" without me having to surf all night and individually link each page I find interesting. (Unless I quote or source you, you ain't getting a direct link).

I never really thought about it, but I suppose the old Googlebot could probably follow that link and hit the "next" button a few times before it gets tired.

I had for a month (it's since vanished) a "LINK TO" my site from specific "Comet Search" results. Considering that Comet (more of an annoyance than anything useful) doesn't even have a web-based way to access their search, I found this interesting. Near as I can tell, someone did a search for a specific site (whose name is a substring of my domain name). Google then crawled that site's public access logs, hit the results page and indexed it.

That's a possibility too. (erm - actually, thinking about it, it's probably a heck of a lot more likely than my first thought)...

Josk

12:15 pm on Jul 11, 2002 (gmt 0)

I've more or less solved this... I tried checking in Google for back links but this turned up nada. I then tried Alltheweb, and found the same had happened. However, back link checking turned up that Alltheweb had been indexing ICQ, an Inktomi affiliate, as shown by http://www.alltheweb.com/search?t=all&q=link.all%3Asearch.icq.com%2F&c=web&o=10&h=10
&l=any&av=1&wf%5Bn%5D=4&wf%5B0%5D%5Br%5D=%2B&wf%5B0%5D%5Bq%5D=computer+tables&wf%5B1
%5D%5Br%5D=%2B&wf%5B2%5D%5Br%5D=%2B&wf%5B2%5D%5Bw%5D=url.all%3A&wf%5B3%5D%5Br%5D=%2B&wf
%5B3%5D%5Bw%5D=url.all%3A&no=on&qtf=n&cn=1&size%5Bp%5D=%3D&size%5Bx%5D=0&ics=utf-8&cs=
utf-8

(link broken up so that formatting isn't screwed)

So...I'm thinking the same has happened with Google. How? Well, in both Google and AlltheWeb there are lots of web server logs which include referral info, and each referrer is a valid link to follow. So...Googlebot hits a log file, follows the links, hits ICQ, or someone else, and then follows links off it...
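To make the log-file route concrete, here is a rough sketch of how referrer URLs can be pulled out of a publicly readable access log. It assumes the log uses the common "combined" format; the sample lines, the search.example.com host and the function name are all made up for illustration, and this is not a claim about how Googlebot actually parses anything.

```python
import re

# Assumed layout (combined log format):
#   ... "request" status bytes "referrer" "user-agent"
LOG_PATTERN = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def extract_referrers(log_lines):
    """Return the distinct http(s) referrer URLs found in log_lines, in order."""
    seen = []
    for line in log_lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue  # line is not in the assumed format
        referer = match.group("referer")
        # A "-" means no referrer was sent; only real URLs are followable links.
        if referer.startswith(("http://", "https://")) and referer not in seen:
            seen.append(referer)
    return seen


# Hypothetical log lines, invented for this example.
sample_log = [
    '10.0.0.1 - - [11/Jul/2002:08:38:00 +0000] "GET /tables.html HTTP/1.0" '
    '200 5120 "http://search.example.com/results?q=computer+tables" "Mozilla/4.0"',
    '10.0.0.2 - - [11/Jul/2002:08:39:10 +0000] "GET /index.html HTTP/1.0" '
    '200 2048 "-" "Mozilla/4.0"',
]

for url in extract_referrers(sample_log):
    print(url)
```

Any search results URL that turns up as a referrer in a crawlable log or stats page is then just another link for a spider to follow.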

Grumpus: I've had more or less the same happen to me. A client included a URL that had been marked for Inktomi inclusion only (for tracking) in a press release. It got indexed by Google.

So...the lesson is, if you don't want something in Google, don't put it on the Internet, or make sure that disallow list is current!
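For checking that a disallow list really covers the tracking URLs, something along these lines could work. It is only a sketch using Python's standard urllib.robotparser; the /pfi/ directory, the example.com URLs and the is_crawlable helper are assumptions for illustration, not anything from an actual campaign.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all spiders out of a tracking directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /pfi/
"""


def is_crawlable(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs: one inclusion-only tracking page, one normal page.
tracking_url = "http://www.example.com/pfi/computer-tables.html"
normal_url = "http://www.example.com/index.html"

print(is_crawlable(ROBOTS_TXT, "Googlebot", tracking_url))
print(is_crawlable(ROBOTS_TXT, "Googlebot", normal_url))
```

Bear in mind that a disallow rule only stops a well-behaved spider from fetching the page. The URL can still leak into logs, press releases and other people's pages.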