Slurp just doesn't like my site

Could my robots.txt file somehow be the problem?

     
8:39 am on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 29, 2000
posts:1133
votes: 0


I have a site of about 15 pages that's been on the Web since July and Inktomi won't deep crawl it. It only crawls robots.txt and the index page. For instance, on Dec. 2 Ink/Si came in and got robots.txt. The next day Ink/cat came in and got only the index page, then left. I have seen this behavior before ... no deep crawl.

All the other SEs have indexed the whole site.

Could there be something wrong with my robots.txt file that somehow gives Slurp a headache and tells it to get only the index page? Here it is, a direct cut and paste:

# bad bots, you are not welcome here so get lost

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /
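One way to sanity-check a file like the one above locally is a small sketch using Python's standard `urllib.robotparser`. The assumptions here: Python's parser is only a stand-in for how Slurp reads the file (Inktomi's own matching rules may differ), the helper name `can_crawl` is made up for illustration, and `/sitemap.html` is a hypothetical path, since the real URLs are not disclosed.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, pasted verbatim.
ROBOTS_TXT = """\
# bad bots, you are not welcome here so get lost

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text.

    Hypothetical helper: it parses the text in memory instead of fetching
    it from a live site.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # "/sitemap.html" is an assumed path, used only for illustration.
    for agent in ("Slurp", "ia_archiver"):
        for path in ("/", "/sitemap.html"):
            print(agent, path, can_crawl(ROBOTS_TXT, agent, path))
```

Under this parser, no record names Slurp and there is no `User-agent: *` record, so nothing in the file restricts Slurp to the index page, while the listed bots such as ia_archiver are refused everywhere.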

8:45 am on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


mayor,

It looks OK.

Just for insurance, try running your robots.txt file through the robots.txt validator [searchengineworld.com].

Ink is very slow unless you've paid them for inclusion and frequent spidering. I think it took them almost three months to pick up one of my non-paid sites... So, you can pay or wait. :(

HTH,
Jim

9:06 am on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 29, 2000
posts:1133
votes: 0


What continues to perplex me is that Slurp just takes the index page. I do have a link to the site map on the index page, but Slurp won't go get it.

Other people have reported this same behavior, but it seems no one has an explanation. Yet others are reporting that Slurp crawls all over their site.

It's not my policy to disclose URLs, but this site is an experiment with a serious dose of high-quality content to see if content is really king. Google says "yes". Fast says "yes". Ask Jeeves says "yes". Alta Vista says "yes". But Inktomi is saying "no".

9:20 am on Dec 4, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:July 26, 2002
posts:166
votes: 0


I have a robots.txt very much like yours with the addition of this right at the bottom:

User-agent: *
Disallow: /

I actually built it from one of the robots.txt files I found here at WebmasterWorld.

I've had no problem with Slurp deep crawling my site over the past one or two months, although I did not submit anywhere or pay anyone for this. However, for the first four months or so after they found me, they only crawled my index page and sometimes one other main page. I don't know why Slurp suddenly decided they liked me, but I assumed it was either the passage of time or an increase in backlinks. At any rate, Slurp has given my files a couple of good going-overs since the deep crawling started. The robots.txt was the same all this time, so I don't think it made any difference.

9:30 am on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 29, 2000
posts:1133
votes: 0


Shellycat, I apologize, but I'm a little confused by your addition.

Unless your file containing:

User-agent: *
Disallow: /

includes a more specific override for individual bots like Slurp, namely:

User-agent: slurp
Disallow:

I would think you are <gasp> excluding all the bots.
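The override described above can be tried out with a short sketch, again assuming Python's standard `urllib.robotparser` as a stand-in for a real crawler's matching rules. The helper name `is_allowed` and the agent name `SomeOtherBot` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A catch-all record on its own: every bot is shut out.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# The same catch-all plus a specific record for Slurp.
# An empty Disallow value means "nothing is disallowed" for that agent.
ALLOW_SLURP_ONLY = """\
User-agent: slurp
Disallow:

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, agent: str, path: str = "/") -> bool:
    """Hypothetical helper: parse the text in memory and ask about one agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print("catch-all only, Slurp:", is_allowed(BLOCK_ALL, "Slurp"))
    print("with override, Slurp:", is_allowed(ALLOW_SLURP_ONLY, "Slurp"))
    print("with override, SomeOtherBot:", is_allowed(ALLOW_SLURP_ONLY, "SomeOtherBot"))
```

With this parser the specific `slurp` record wins over the `*` record for Slurp, and every agent without its own record falls through to the catch-all and is excluded.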

[edited by: mayor at 9:36 am (utc) on Dec. 4, 2002]

9:34 am on Dec 4, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:July 26, 2002
posts:166
votes: 0


Doh! OK, I had it the other way around. Yes, my robots.txt has a list of "allowed" bots (including Slurp) and then that at the end. I'm not sure now if my case is applicable to yours, although I would say that if they're taking the index page then the robots.txt file probably isn't upsetting them.

Heh, I think it's time I went to bed :)

9:38 am on Dec 4, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 29, 2000
posts:1133
votes: 0


Thanks a lot for the insight, shellycat. I guess I really have to just wait a while longer. Guess I'll go read "Rip Van Winkle" for further clues on Inktomi.