Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

Spiders get / and /robots.txt but no more!
Is my robots.txt killing me?

 6:42 pm on Dec 11, 2003 (gmt 0)

I just put up the site mentioned in my profile last month. The logs show that spiders from several search engines have arrived. But they've gotten only the first page and robots.txt.

Here is the robots.txt file:
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php

I am trying to disallow spiders from indexing the /images folder and the Purchase.php page. Am I doing more than I'm intending?

I do find that when I follow a link at www.MyDomain.com, I end up dropping the "www." and go to MyDomain.com/MyPage.php.

Would that mess up a spider? I'm using relative links ... /MyDirectory/MyPage.php ... throughout.

Thanks for the help!
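On the relative-link question, a quick sanity check, offered as a sketch rather than a diagnosis: a root-relative link such as /MyDirectory/MyPage.php is normally resolved against whatever host the page was fetched from, so the link on its own should not drop the "www.". The snippet below uses Python's standard urljoin to show that resolution. The page URL is an assumption based on the domain named in the post.

```python
from urllib.parse import urljoin, urlsplit

# Assumed page URL (the post's placeholder domain, with "www.").
page_url = "http://www.MyDomain.com/"

# Root-relative link of the kind described in the post.
link = "/MyDirectory/MyPage.php"

# Resolve the link the way a browser or spider typically would:
# scheme and host come from the page, path comes from the link.
resolved = urljoin(page_url, link)
resolved_host = urlsplit(resolved).netloc
resolved_path = urlsplit(resolved).path

print(resolved)
```

If the host still changes after following such a link, something other than the link itself (for example a server-side redirect) is probably responsible.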



 7:02 pm on Dec 11, 2003 (gmt 0)

I can't see anything that should stop them. Your file validates in Brett's robots.txt validator [searchengineworld.com]. The only thing I might change is to insert a space between the # and the actual comment. I'm not sure how nitpicky some spiders are, but that's the format used at robotstxt.org.

There doesn't look to be anything on your index page to stop them going further. You might try putting up a fresh link or two on the page and see if they follow those.
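If you want to double-check what the posted rules actually block, here is a minimal sketch using Python's standard urllib.robotparser. It only shows how that one parser reads the file; real spiders may interpret it differently. The test paths (including logo.gif) and the "corrected" variant with a leading slash are illustrative assumptions, not part of the original file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt as posted in this thread.
ORIGINAL = """\
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php
"""

# Hypothetical variant: same file, but the second rule starts with "/".
CORRECTED = ORIGINAL.replace("Disallow: Purchase.php",
                             "Disallow: /Purchase.php")


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, path: str) -> bool:
    """Ask whether a generic spider ("*") may fetch the given path."""
    return parser.can_fetch("*", "http://www.MyDomain.com" + path)


original = build_parser(ORIGINAL)
corrected = build_parser(CORRECTED)

# Illustrative paths; logo.gif is a made-up file name.
for path in ("/MyDirectory/MyPage.php", "/images/logo.gif", "/Purchase.php"):
    print(path, allowed(original, path), allowed(corrected, path))
```

With this parser, ordinary pages stay fetchable, so the file is not blocking the whole site. It does suggest that "Disallow: Purchase.php" without a leading slash may not match /Purchase.php at all, since rules are compared as prefixes of a path that begins with "/".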


 7:10 pm on Dec 11, 2003 (gmt 0)

Thanks, Jim.

I do have some more content almost ready to add, so I'll provide a link from the main page & see if they get it.


 7:18 pm on Dec 11, 2003 (gmt 0)

>#Email editor@MyDomain.com with any questions.

I'd remove that to help cut back on e-mail spam.


 7:20 pm on Dec 11, 2003 (gmt 0)

Thanks, Engine. Good idea.


 8:55 pm on Dec 11, 2003 (gmt 0)

A service I was trying to use for my site-level search (not Google, as I can't get them to re-index whenever I want) explained my problem.

"I took a look at your account and I noticed that your page has links such as [MyDomain.com...] Because there is a slash missing from the end to make the URL [MyDomain.com...] the spider receives a re-direct to [MyDomain.com...] but cannot follow the re-direct."

That was the case for one spider, at least ... possibly more.

Hope this helps somebody else -- and thanks to those who gave me some ideas earlier.
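For anyone wanting to hunt down links of this kind, here is a rough sketch of a check. It is a heuristic under stated assumptions, not a definitive test: it treats a link as a probable directory link when its last path segment has no dot (no file extension) and the path does not end in "/". The function name and the sample links are made up for illustration; whether a given server actually redirects such a URL depends on its configuration.

```python
import posixpath
from urllib.parse import urlsplit


def likely_needs_trailing_slash(href: str) -> bool:
    """Heuristic: True when the path does not end in '/' and its last
    segment has no file extension, i.e. it looks like a directory link
    that a server may answer with a redirect to the slash-terminated URL."""
    path = urlsplit(href).path
    if not path or path.endswith("/"):
        return False
    last_segment = posixpath.basename(path)
    return "." not in last_segment


# Hypothetical sample links using the thread's placeholder names.
links = [
    "/MyDirectory",
    "/MyDirectory/",
    "/MyDirectory/MyPage.php",
    "http://MyDomain.com/images",
]

flagged = [href for href in links if likely_needs_trailing_slash(href)]
print(flagged)
```

Any link this flags is worth rewriting with the trailing slash (or pointing at the actual file), so that a spider that cannot follow redirects still reaches the page.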


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved