
Inktomi's slurp is grabbing 5 GB+ daily

Slurp appears to be in a loop

     
5:02 pm on Dec 13, 2003 (gmt 0)

10+ Year Member



I have maintained a 250+ page, 42+ MB church site for the last 4 years with low activity (20-30 hits per day). Since 12/7/03, Slurp has been running non-stop, fetching pages and gobbling up bandwidth. I have contacted my web host daily, and they keep zeroing out my monthly totals so that browsers can still get to my site. They also installed a robots.txt file to block ALL robots, but Slurp continues to run, and I am over my monthly limit of 5 GB every day with 200,000 to 400,000+ page accesses!

Inktomi claims to adhere to the standards that recognize robots.txt directives. Does anyone know why it would be ignoring the following robots.txt entries?

User-agent: *
Disallow: /
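As a sanity check, those two lines really do block every path for every compliant crawler. A minimal sketch using Python's standard urllib.robotparser (the user-agent names and paths here are just placeholders for illustration):

```python
# Feed the exact rules from the post into Python's stdlib robots.txt parser
# and confirm they deny every user-agent access to every path.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /",
])

# A compliant crawler such as Slurp should be refused any URL on the site:
print(rp.can_fetch("Slurp", "/index.html"))  # False
print(rp.can_fetch("*", "/"))                # False
```

So the rules themselves are correct; if a crawler keeps fetching anyway, the file is either not being served from where the crawler looks for it, or the crawler is not honoring the standard.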

Anyone else having this problem?

My web host has suggested that I set up my pages under a new member name. But would that really help? If Inktomi finds us again, we'll be in the same boat.

Is there anything I could have done by mistake to trigger this looping? To my knowledge, I haven't done anything different than what I've been doing for years.

BTW: Inktomi has yet to respond to my queries.

7:05 pm on Dec 13, 2003 (gmt 0)

10+ Year Member



SLURP has big issues right now with dynamic pages.

Tim

7:08 pm on Dec 13, 2003 (gmt 0)

10+ Year Member



drbob, please sticky mail the site info to me and I will take care of it.
8:45 pm on Dec 15, 2003 (gmt 0)

10+ Year Member



Tim, are you aware of the other problems that INK is having indexing dynamic links as copies of the pages that they are linking to?
There were a few threads here on this issue (now there's just one), and there are many webmasters experiencing the same problems.
It appears that SLURP is crawling dynamic links as if they were exact duplicates of the pages they point to. It's really quite a weird phenomenon... the biggest problem is that in many cases, the duplicate content page (really just a dynamic link) is showing up in the SERPs instead of the legitimate webpage.
I don't know if I did a good job of explaining that, but it is indeed true. If you would like examples of this, I have some and can PM some of them to you.

Tim

2:52 am on Dec 16, 2003 (gmt 0)

10+ Year Member



We checked your URL and you do in fact have a robots.txt at your URL. But it does no good at the subdirectory level. It needs to be at the root of the domain (which you have no control over). Your other option is to use meta tags: <META NAME="robots" CONTENT="noindex">
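This is the crux of the problem: a crawler only ever requests robots.txt from the root of the host, never from a subdirectory. A small sketch showing the one location a compliant crawler checks (the church-site URL below is a made-up example, not the poster's actual address):

```python
# Derive the only robots.txt URL a compliant crawler will ever request
# for a given page, per the robots exclusion convention: the host root.
from urllib.parse import urlsplit, urlunsplit

def robots_txt_location(page_url):
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# A page buried in a member subdirectory on a shared host:
print(robots_txt_location("http://www.example.com/~church/index.html"))
# http://www.example.com/robots.txt
```

So a robots.txt placed at /~church/robots.txt is simply never fetched; on a shared host where you can't write to the root, the per-page meta robots tag is the workable alternative.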

Our FAQ is at: [support.inktomi.com...]
or specifically: http://support.inktomi.com/searchfaq.html#stopslurp
How do I stop your robot / spider (Slurp) from crawling my site?

Tim

2:57 am on Dec 16, 2003 (gmt 0)

10+ Year Member



To respond to Kanetrain (off topic in this thread)

I read the thread over the weekend and this morning discussed the problem you are referring to with a few people but really need some examples to diagnose further if and where this is happening. Please sticky some examples to me.
Thanks

3:45 am on Dec 16, 2003 (gmt 0)

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Example stickied.
6:48 pm on Dec 16, 2003 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Thanks Tim.
8:54 pm on Dec 16, 2003 (gmt 0)

10+ Year Member



Examples just stickied.