Forum Library, Charter, Moderators: Brett Tabke

Paid Inclusion Engines and Topics Forum

Inktomi's slurp is grabbing 5 GB+ daily
Slurp appears to be in a loop
drbobhmi




msg:27073
 5:02 pm on Dec 13, 2003 (gmt 0)

I have maintained a 250+ page, 42+ MB church site for the last 4 years with low activity (20-30 hits per day). Since 12/7/03, slurp has been running non-stop, getting pages and gobbling up bandwidth. I have contacted my web host daily and they keep zeroing out my monthly totals so that browsers can get to my site. They also installed a robots.txt file to inhibit ALL robots, but slurp continues to run and I am over my monthly limit of 5 GB every day with 200,000 to 400,000+ page accesses!

Inktomi claims to adhere to the standards that recognize robots.txt directives. Anyone know why they would be ignoring the following robots.txt entries?

User-agent: *
Disallow: /

Anyone else having this problem?

My web host has suggested that I set up my pages with a new member name. But would that really help? If inktomi finds us again, we'll be in the same boat.

Is there anything I could have done by mistake to trigger this looping? To my knowledge, I haven't done anything different than what I've been doing for years.

BTW: Inktomi has yet to respond to my queries.
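
For reference, here is a minimal sketch of how those two directives are interpreted once a crawler actually reads them. It uses Python's standard urllib.robotparser; the user agent string "Slurp" and the example.org URL are placeholders, not details taken from the real site. It only shows that the rules themselves block everything, so if a robot keeps crawling, the rules are probably not being found rather than being misread.

```python
from urllib.robotparser import RobotFileParser

# The two directives quoted in the post above.
RULES = ["User-agent: *", "Disallow: /"]


def is_allowed(rules, user_agent, url):
    """Return True if `rules` (robots.txt lines) permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules)  # parse in-memory lines; no network access involved
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL; the real site address is not given in the thread.
    print(is_allowed(RULES, "Slurp", "http://example.org/church/index.html"))
```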

 

kanetrain




msg:27074
 7:05 pm on Dec 13, 2003 (gmt 0)

SLURP has big issues right now with dynamic pages.

Tim




msg:27075
 7:08 pm on Dec 13, 2003 (gmt 0)

drbob, please sticky mail the site info to me and I will take care of it.

kanetrain




msg:27076
 8:45 pm on Dec 15, 2003 (gmt 0)

Tim, are you aware of the other problems that INK is having indexing dynamic links as copies of the pages that they are linking to?
There were a few threads here on this issue (now there's just one), and there are many webmasters experiencing the same problems.
It appears that SLURP is crawling dynamic links as if they were exact duplicates of the pages they are crawling. It's really quite a weird phenomenon... the biggest problem is that in many cases, the duplicate content page (really just a dynamic link) is showing up in the SERPs instead of the legitimate webpage.
I don't know if I did a good job of explaining that, but it is indeed true. If you would like examples of this, I have some and can PM some of them to you.

Tim




msg:27077
 2:52 am on Dec 16, 2003 (gmt 0)

We checked your URL and you do in fact have a robots.txt at your URL. But it does no good at the subdirectory level. It needs to be at the root of the domain (which you have no control over). Your other option is to use meta tags: <META NAME="robots" CONTENT="noindex">

Our FAQ is at: [support.inktomi.com...]
or specifically: http://support.inktomi.com/searchfaq.html#stopslurp
How do I stop your robot / spider (Slurp) from crawling my site?
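
To illustrate why a robots.txt in a subdirectory has no effect, here is a rough sketch of how a crawler typically works out where to look for the file: it keeps only the scheme and host of the page URL and asks for /robots.txt at the root. This is an illustration of the general convention, not Slurp's actual code, and the host and path below are made up.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL a crawler would consult for `page_url`.

    Only the scheme and host are kept; the page's path is discarded,
    so a robots.txt stored in a subdirectory is never requested.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # Hypothetical shared-hosting layout: the site lives under a member directory.
    print(robots_url("http://www.example.com/members/church/index.html"))
    # prints http://www.example.com/robots.txt
```

On shared hosting where the site lives under a member directory, only the host's owner can place a file at that root, which is why the per-page META tag is the fallback.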

Tim




msg:27078
 2:57 am on Dec 16, 2003 (gmt 0)

To respond to Kanetrain (off topic in this thread)

I read the thread over the weekend and discussed the problem you are referring to with a few people this morning, but we really need some examples to diagnose whether and where this is happening. Please sticky some examples to me.
Thanks

steveb




msg:27079
 3:45 am on Dec 16, 2003 (gmt 0)

Example stickied.

Brett_Tabke




msg:27080
 6:48 pm on Dec 16, 2003 (gmt 0)

Thanks Tim.

kanetrain




msg:27081
 8:54 pm on Dec 16, 2003 (gmt 0)

Examples just stickied.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved