
Paid Inclusion Engines and Topics Forum

Slurp just doesn't like my site
Could my robots.txt file somehow be the problem?
mayor




msg:28282
 8:39 am on Dec 4, 2002 (gmt 0)

I have a site of about 15 pages that's been on the Web since July and Inktomi won't deep crawl it. It only crawls robots.txt and the index page. For instance, on Dec. 2 Ink/Si came in and got robots.txt. The next day Ink/cat came in and got only the index page, then left. I have seen this behavior before ... no deep crawl.

All the other SE's have indexed the whole site.

Could there be something wrong with my robots.txt file that somehow gives Slurp a headache and tells it to only get the index page? Here it is, a direct cut and paste:

# bad bots, you are not welcome here so get lost

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /

 

jdMorgan




msg:28283
 8:45 am on Dec 4, 2002 (gmt 0)

mayor,

It looks OK.

Just for insurance, try running your robots.txt file through the robots.txt validator [searchengineworld.com].
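As a rough local cross-check alongside the validator, the file posted above can also be fed to Python's standard-library `urllib.robotparser`. This is only a sketch: it assumes Slurp matches on the user-agent token "Slurp", the example.com URLs are hypothetical stand-ins for the undisclosed site, and Inktomi's own parser may not behave exactly like Python's.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the first post, pasted as-is.
ROBOTS_TXT = """\
# bad bots, you are not welcome here so get lost

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

User-agent: EmailCollector
Disallow: /

User-agent: WebBandit
Disallow: /

User-agent: EmailWolf
Disallow: /

User-agent: ExtractorPro
Disallow: /

User-agent: Zeus
Disallow: /

User-agent: sitecheck.internetseer.com
Disallow: /
"""


def can_fetch(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical URL; the real site was not disclosed in the thread.
    page = "http://www.example.com/sitemap.html"
    for agent in ("Slurp", "Googlebot", "ia_archiver", "Zeus"):
        print(agent, can_fetch(ROBOTS_TXT, agent, page))
```

No record in the file names Slurp and there is no `User-agent: *` record, so this parser treats Slurp as unrestricted, which fits the "looks OK" reading.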

Ink is very slow unless you've paid them for inclusion and frequent spidering. I think it took them almost three months to pick up one of my non-paid sites... So, you can pay or wait. :(

HTH,
Jim

mayor




msg:28284
 9:06 am on Dec 4, 2002 (gmt 0)

What continues to perplex me is that Slurp just takes the index page. I do have a link to the site map on the index page, but Slurp won't go get it.

Other people have reported this same behavior, but it seems no one has an explanation. Yet others are reporting that Slurp crawls all over their site.

It's not my policy to disclose URLs, but this site is an experiment with a serious dose of high quality content to see if content is really king. Google says "yes". Fast says "yes". Ask Jeeves says "yes". Alta Vista says "yes". But Inktomi is saying "no".

shelleycat




msg:28285
 9:20 am on Dec 4, 2002 (gmt 0)

I have a robots.txt very much like yours with the addition of this right at the bottom:

User-agent: *
Disallow: /

I actually built it from one of the robots.txt's I found here at webmasterworld.

I've had no problem with Slurp deep crawling my site over the past one or two months, although I did not submit anywhere or pay anyone for this. However, for the first four months or so after they found me, they only crawled my index page and sometimes one other main page. I don't know why Slurp suddenly decided they liked me, but I assumed it was either the passage of time or an increase in backlinks. Slurp has given my files a couple of good going-overs since the deep crawling started, at any rate. The robots.txt was the same all this time, so I don't think it made any difference.

mayor




msg:28286
 9:30 am on Dec 4, 2002 (gmt 0)

Shelleycat, I apologize, but I'm a little confused by your addition.

Unless your file containing:

User-agent: *
Disallow: /

includes a more specific override for individual bots like Slurp, namely:

User-agent: slurp
Disallow:

I would think you are <gasp> excluding all the bots.
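The point about the wildcard record can be checked locally. Here is a minimal sketch using Python's standard-library `urllib.robotparser`, assuming Slurp matches on the token "Slurp" and using a hypothetical example.com URL. Real crawlers may read records differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Wildcard record only: every bot is shut out.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Same wildcard record plus a specific override for Slurp.
# An empty Disallow value means "nothing is disallowed" for that bot.
BLOCK_ALL_EXCEPT_SLURP = """\
User-agent: slurp
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/index.html"  # hypothetical URL
    print("wildcard only, Slurp:", allowed(BLOCK_ALL, "Slurp", page))
    print("with override, Slurp:", allowed(BLOCK_ALL_EXCEPT_SLURP, "Slurp", page))
    print("with override, Googlebot:", allowed(BLOCK_ALL_EXCEPT_SLURP, "Googlebot", page))
```

With the wildcard record alone, Slurp is refused. With the override, Slurp is allowed while bots that are not named fall through to the wildcard record and stay excluded.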

[edited by: mayor at 9:36 am (utc) on Dec. 4, 2002]

shelleycat




msg:28287
 9:34 am on Dec 4, 2002 (gmt 0)

Doh! OK I had it the other way around. Yes, my robots.txt has a list of "allowed" bots (including slurp) and then that at the end. I'm not sure now if my case is applicable to yours, although I would say that if they're taking the index page then the robots.txt file probably isn't upsetting them.

Heh, I think it's time I went to bed :)

mayor




msg:28288
 9:38 am on Dec 4, 2002 (gmt 0)

Thanks a lot for the insight, shelleycat. I guess I really have to just wait a while longer. Guess I'll go read "Rip Van Winkle" for further clues on Inktomi.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved