|Inktomi's slurp is grabbing 5 GB+ daily|
Slurp appears to be in a loop
| 5:02 pm on Dec 13, 2003 (gmt 0)|
I have maintained a 250+ page, 42+ MB church site for the last 4 years with low activity (20-30 hits per day). Since 12/7/03, Slurp has been running non-stop, fetching pages and gobbling up bandwidth. I have contacted my web host daily, and they keep zeroing out my monthly totals so that browsers can still reach my site. They also installed a robots.txt file to block ALL robots, but Slurp continues to run, and I am over my monthly limit of 5 GB every day, with 200,000 to 400,000+ page accesses!
Inktomi claims to adhere to the standards that recognize robots.txt directives. Does anyone know why they would be ignoring the following robots.txt entries?
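[For context, a blanket-block robots.txt of the kind the host installed would look like the sketch below. This is illustrative only, not the poster's actual file; under the robots exclusion standard the file must be named robots.txt and sit at the site root to have any effect.]

```
# Block all compliant crawlers from the entire site
User-agent: *
Disallow: /

# Or target Inktomi's crawler specifically
User-agent: Slurp
Disallow: /
```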
Anyone else having this problem?
My web host has suggested that I set up my pages under a new member name. But would that really help? If Inktomi finds us again, we'll be in the same boat.
Is there anything I could have done by mistake to trigger this looping? To my knowledge, I haven't done anything different than what I've been doing for years.
BTW: Inktomi has yet to respond to my queries.
| 7:05 pm on Dec 13, 2003 (gmt 0)|
SLURP has big issues right now with dynamic pages.
| 7:08 pm on Dec 13, 2003 (gmt 0)|
drbob, please sticky mail the site info to me and I will take care of it.
| 8:45 pm on Dec 15, 2003 (gmt 0)|
Tim, are you aware of the other problems that INK is having indexing dynamic links as copies of the pages that they are linking to?
There were a few threads here on this issue (now there's just one), and there are many webmasters experiencing the same problems.
It appears that SLURP is crawling dynamic links as if they were exact duplicates of the pages they point to. It's really quite a weird phenomenon... the biggest problem is that in many cases, the duplicate content page (really just a dynamic link) is showing up in the SERPs instead of the legitimate webpage.
I don't know if I did a good job of explaining that, but it is indeed true. If you would like examples of this, I have some and can PM some of them to you.
| 2:52 am on Dec 16, 2003 (gmt 0)|
We checked your URL and you do in fact have a robots.txt at your URL. But it does no good at the subdirectory level; it needs to be at the root of the domain (which you have no control over). Your other option is to use meta tags: <META NAME="robots" CONTENT="noindex">
Our FAQ is at: [support.inktomi.com...]
How do I stop your robot / spider (Slurp) from crawling my site?
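[Editorial aside, not from the thread: Python's standard-library `urllib.robotparser` implements the same robots exclusion rules a compliant crawler follows, so it can be used as a quick sanity check that a given robots.txt would actually block Slurp. The rules and URLs below are made up for illustration.]

```python
# Sketch: check whether a robots.txt rule set blocks a given crawler.
# The rules here are illustrative, not the poster's actual file.
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: Slurp",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(rules)

# Slurp should be denied everywhere; a crawler not named in the
# file (and with no "*" entry present) is unaffected.
print(rp.can_fetch("Slurp", "http://example.com/index.html"))     # False
print(rp.can_fetch("OtherBot", "http://example.com/index.html"))  # True
```

Note that this only tells you what a *compliant* crawler should do; it cannot explain a crawler that ignores the file, and a robots.txt in a subdirectory is never consulted at all.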
| 2:57 am on Dec 16, 2003 (gmt 0)|
To respond to Kanetrain (off topic in this thread)
I read the thread over the weekend and this morning discussed the problem you are referring to with a few people but really need some examples to diagnose further if and where this is happening. Please sticky some examples to me.
| 8:54 pm on Dec 16, 2003 (gmt 0)|
Examples just stickied.