Why so many "test" spiders?

         

msgraph

1:48 am on Apr 14, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please no one make a mockery of me, but I was pondering something while drilling through a few hundred raw log files.

Inktomi and Altavista. They are spidering my sites like crazy. Every single day. Hundreds of pages on each site.

I have yet to see any results, good or bad, from this (and I'm talking at least two months here). Some of them are supposed to be so-called "experimental spiders." Are they?

I do not want to sound like some conspiracy theorist living in Montana but I thought about this....

Are they doing this just for testing, research, or future planning, or to weed people out? It would be pretty smart on their part when you think about it. Spider the heck out of all the sites. If a server can't handle it, then drop 'em from the index. Either way they're making the little guys pay up the arse on bandwidth charges. If they can't afford it, then knock 'em out as well.

Honestly I am really getting sick of Altavista changing their user agents on a daily basis only to grab the same pages almost every other day. Same with Inktomi. I'm also getting sick of them eating up bandwidth just for their "testing" purposes. AND I don't want to stop them for fear of getting knocked out of their indexes.
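For anyone who wants to put numbers on this before deciding anything, here is a minimal sketch of tallying spider hits and bytes per day from raw logs. It assumes the logs are in Combined Log Format and that matching the substring "slurp" in the user-agent is enough to catch Inktomi; the function name `spider_usage` and the `needles` parameter are made up for illustration, so add whatever other agent strings show up in your own logs.

```python
import re
from collections import Counter

# Combined Log Format (an assumption about how the server is configured):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def spider_usage(lines, needles=("slurp",)):
    """Count hits and bytes per (day, user-agent) for matching agents.

    `needles` are lowercase substrings to look for in the user-agent.
    Lines that do not parse are skipped.
    """
    hits = Counter()
    bytes_sent = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue
        agent = m.group("agent")
        if not any(n in agent.lower() for n in needles):
            continue
        key = (m.group("day"), agent)
        hits[key] += 1
        size = m.group("size")
        # The size field is "-" when no body was sent.
        bytes_sent[key] += int(size) if size.isdigit() else 0
    return hits, bytes_sent


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], encoding="latin-1") as fh:
        hits, sent = spider_usage(fh)
    for (day, agent), count in sorted(hits.items()):
        print(day, count, sent[(day, agent)], agent)
```

Keying on the full user-agent string means every renamed agent shows up as its own row, which makes the daily agent changes easy to spot.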

Brett_Tabke

4:12 pm on Apr 16, 2001 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



You are absolutely right. What Inktomi is doing to the net and servers with their spiders is absurd. We should charge THEM for accessing our site - not the other way around. I can find no valid reason for the pattern of usage by Inktomi. They have spidered the root page on some sites 5 times already today. That is just crazy. They already have the data, so there is no reason to come back out and spider it again.

Plus, there is so little traffic coming from buried Ink pages anymore (pay or free) that I wonder if it is getting close to time to ban their spider outright.

msgraph

5:56 pm on Apr 24, 2001 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First off, this has nothing to do with Inktomi's pay-to-include whatsoever.

I wrote to Inktomi about the heavy activity from these spiders:

Mozilla/3.0 (Slurp/cat; slurp@inktomi.com; [inktomi.com...]

Mozilla/3.0 (Slurp.so/1.0; slurp@inktomi.com; [inktomi.com...]

While waiting for a reply I took the risk of placing a robots.txt file into one site as a test. Since Apr. 20 all Inktomi spiders have stopped grabbing sub-pages except the Slurp/cat version.

This spider, Slurp/cat, continues to hit the sub-pages without ever looking for the robots.txt file. I presume that this version is an inbound link follower? Or maybe that is just one of its many tasks.
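For reference, a rough sketch of how a robots.txt rule can be sanity-checked before uploading it, using Python's standard `urllib.robotparser`. The rules and the `/articles/` path below are made up for illustration and are not the actual file from the test site. Note that the parser matches on a short agent token, so "Slurp" is passed in and not the full user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep Slurp out of one sub-directory,
# leave everything open to everyone else.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /articles/

User-agent: *
Disallow:
"""


def allowed(robots_txt, agent, path):
    """Return True if a crawler that obeys robots.txt may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent, path in [
        ("Slurp", "/articles/page.html"),
        ("Slurp", "/"),
        ("OtherBot", "/articles/page.html"),
    ]:
        print(agent, path, allowed(ROBOTS_TXT, agent, path))
```

This only shows what a spider that reads and obeys the file should do. A spider that never requests robots.txt, as Slurp/cat appears to, is unaffected by any of it.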

Inktomi has just replied...

"The crawlers you are seeing are the main crawlers which provide indexing for the search engines used by iWon, HotBot, MSN search, AOL search, etc. They are not doing "deep spidering". They are crawling a predetermined list of URLs."

This is pretty much known. I thought at least one of these was a deep crawler, but I guess not. So I imagine that they are building up a very, very, very huge URL list from all the pages/sites they spider. Then, after sorting out duplicates, they go berserk throughout the net.

This probably explains why some people here are posting that they only get spidered on root level pages while others are reporting massive spider activity like I am.

The bottom line is that those of us who made a point of linking excessively throughout every page/site we could will just have to put up with it, or else get knocked out of their index.