Slurp just doesn't like my site

Could my robots.txt file somehow be the problem?

mayor

8:39 am on Dec 4, 2002 (gmt 0)

I have a site of about 15 pages that's been on the Web since July and Inktomi won't deep crawl it. It only crawls robots.txt and the index page. For instance, on Dec. 2 Ink/Si came in and got robots.txt. The next day Ink/cat came in and got only the index page, then left. I have seen this behavior before ... no deep crawl.

All the other search engines have indexed the whole site.

Could there be something wrong with my robots.txt file that somehow gives Slurp a headache and tells it to get only the index page? Here it is, a direct cut and paste:

# bad bots, you are not welcome here so get lost

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /

jdMorgan

8:45 am on Dec 4, 2002 (gmt 0)

mayor,

It looks OK.

Just for insurance, try running your robots.txt file through the robots.txt validator [searchengineworld.com].
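If the online validator isn't handy, a rough local check is possible with Python's standard-library robots.txt parser. This is only a sketch: the file contents are copied from the post above, "/sitemap.html" is a made-up path standing in for the site map, and Python's parser is just one interpretation of the rules. It says nothing about how Inktomi's crawler actually reads the file.

```python
from urllib.robotparser import RobotFileParser

# robots.txt as posted above
ROBOTS_TXT = """\
# bad bots, you are not welcome here so get lost

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /
"""


def allowed(robots_txt, agent, path):
    """Return True if Python's parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# "/sitemap.html" is a hypothetical path, not the real site map URL.
for agent in ("Slurp", "ia_archiver", "Zeus"):
    print(agent, allowed(ROBOTS_TXT, agent, "/sitemap.html"))
```

Under this parser, an agent with no matching record and no "User-agent: *" record is allowed everywhere, which is consistent with the file not being what holds Slurp back.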

Ink is very slow unless you've paid them for inclusion and frequent spidering. I think it took them almost three months to pick up one of my non-paid sites... So, you can pay or wait. :(

HTH,
Jim

mayor

9:06 am on Dec 4, 2002 (gmt 0)

What continues to perplex me is that Slurp just takes the index page. I do have a link to the site map on the index page, but Slurp won't go get it.

Other people have reported this same behavior, but it seems no one has an explanation. Yet others are reporting that Slurp crawls all over their site.

It's not my policy to disclose URLs, but this site is an experiment with a serious dose of high quality content to see if content is really king. Google says "yes". Fast says "yes". Ask Jeeves says "yes". Alta Vista says "yes". But Inktomi is saying "no".

shelleycat

9:20 am on Dec 4, 2002 (gmt 0)

I have a robots.txt very much like yours with the addition of this right at the bottom:

User-agent: *
Disallow: /

I actually built it from one of the robots.txt's I found here at webmasterworld.

I've had no problem with Slurp deep crawling my site over the past one or two months, although I did not submit anywhere or pay anyone for this. However, for the first four months or so after they found me, they only crawled my index page and sometimes one other main page. I don't know why Slurp suddenly decided they liked me, but I assumed it was either the passage of time or an increase in backlinks. At any rate, Slurp has given my files a couple of good going-overs since the deep crawling started. The robots.txt was the same all this time, so I don't think it made any difference.

mayor

9:30 am on Dec 4, 2002 (gmt 0)

shelleycat, I apologize, but I'm a little confused by your addition.

Unless your file containing:

User-agent: *
Disallow: /

includes a more specific override for individual bots like Slurp, namely:

User-agent: slurp
Disallow:

I would think you are <gasp> excluding all the bots.
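The difference between the two cases can be checked with Python's standard-library parser. This is a sketch under assumptions: the two robots.txt strings are minimal stand-ins rather than anyone's real file, "Googlebot" is just an example of an agent with no record of its own, and a real crawler may interpret the rules differently.

```python
from urllib.robotparser import RobotFileParser

# Catch-all record only: every bot is shut out.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Same catch-all, plus a specific record for Slurp.
# An empty Disallow value means "nothing is disallowed".
ALLOW_SLURP_ONLY = """\
User-agent: slurp
Disallow:

User-agent: *
Disallow: /
"""


def can_crawl(robots_txt, agent, path="/"):
    """Return True if Python's parser would let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print(can_crawl(BLOCK_ALL, "Slurp"))             # False
print(can_crawl(ALLOW_SLURP_ONLY, "Slurp"))      # True
print(can_crawl(ALLOW_SLURP_ONLY, "Googlebot"))  # False
```

The specific record wins over the "*" record for the bot it names, so an allow-list followed by a blanket Disallow lets the named bots in and keeps everyone else out.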

[edited by: mayor at 9:36 am (utc) on Dec. 4, 2002]

shelleycat

9:34 am on Dec 4, 2002 (gmt 0)

Doh! OK I had it the other way around. Yes, my robots.txt has a list of "allowed" bots (including slurp) and then that at the end. I'm not sure now if my case is applicable to yours, although I would say that if they're taking the index page then the robots.txt file probably isn't upsetting them.

Heh, I think it's time I went to bed :)

mayor

9:38 am on Dec 4, 2002 (gmt 0)

Thanks a lot for the insight, shelleycat. I guess I really have to just wait a while longer. Guess I'll go read "Rip Van Winkle" for further clues on Inktomi.