Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Spiders get / and /robots.txt but no more!
Is my robots.txt killing me?
dwilson

10+ Year Member



 
Msg#: 217 posted 6:42 pm on Dec 11, 2003 (gmt 0)

I just put up the site mentioned in my profile last month. The logs show that spiders from several search engines have arrived. But they've gotten only the first page and robots.txt.

Here is the robots.txt file:
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php

I am trying to disallow spiders from indexing the /images folder and the Purchase.php page. Am I doing more than I'm intending?
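One way to check what this file blocks is to run it through a parser locally. The following is a rough sketch using Python's standard-library `urllib.robotparser`; it shows how that one parser reads the rules, not how any particular spider behaves. The user-agent name `ExampleBot` and the URLs tested (including `logo.gif`) are hypothetical examples, not pages known to exist on the site.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the post, reproduced as-is.
ROBOTS_TXT = """\
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php
"""

# Hypothetical URLs chosen for illustration only.
URLS = [
    "http://www.MyDomain.com/",
    "http://www.MyDomain.com/images/logo.gif",
    "http://www.MyDomain.com/MyDirectory/MyPage.php",
    "http://www.MyDomain.com/Purchase.php",
]


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def check(parser, urls, agent="ExampleBot"):
    """Map each URL to True (fetch allowed) or False (disallowed)."""
    return {url: parser.can_fetch(agent, url) for url in urls}


parser = build_parser(ROBOTS_TXT)
results = check(parser, URLS)
for url, allowed in results.items():
    print("allowed" if allowed else "BLOCKED", url)
```

Under this parser, the home page and ordinary content pages are allowed and only the /images/ folder is blocked. Note that the `Purchase.php` rule has no leading slash. Rule paths conventionally begin with `/`, so the result for that URL may vary between parsers, and `Disallow: /Purchase.php` is the safer spelling.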

I do find that when I follow a link at www.MyDomain.com, the "www." gets dropped and I end up at MyDomain.com/MyPage.php.

Would that mess up a spider? I'm using relative links ... /MyDirectory/MyPage.php ... throughout.

Thanks for the help!

 

jimbeetle

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 217 posted 7:02 pm on Dec 11, 2003 (gmt 0)

I can't see anything that should stop them. Your file validates in Brett's robots.txt validator [searchengineworld.com]. The only change I might make is to insert a space between the # and the actual comment. I'm not sure how nitpicky some spiders are, but that's the format used at robotstxt.org.

There doesn't appear to be anything on your index page to stop them from going further. You might try putting up a fresh link or two on the page to see if they follow those.

dwilson

10+ Year Member



 
Msg#: 217 posted 7:10 pm on Dec 11, 2003 (gmt 0)

Thanks, Jim.

I do have some more content almost ready to add, so I'll provide a link from the main page & see if they get it.

engine

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month



 
Msg#: 217 posted 7:18 pm on Dec 11, 2003 (gmt 0)

>#Email editor@MyDomain.com with any questions.

I'd remove that to help cut back on e-mail spam.

dwilson

10+ Year Member



 
Msg#: 217 posted 7:20 pm on Dec 11, 2003 (gmt 0)

Thanks, Engine. Good idea.

dwilson

10+ Year Member



 
Msg#: 217 posted 8:55 pm on Dec 11, 2003 (gmt 0)

A service I was trying to use for my site-level search (not G, as I can't get them to re-index whenever I want) explained my problem.

"I took a look at your account and I noticed that your page has links such as [MyDomain.com...] Because there is a slash missing from the end to make the URL [MyDomain.com...] the spider receives a re-direct to [MyDomain.com...] but cannot follow the re-direct."

That was the case for one spider, at least ... possibly more.
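To illustrate the kind of fix involved, here is a minimal sketch that appends the missing trailing slash to directory-style links so no redirect is needed. It rests on an assumption: a final path segment with no dot in it is treated as a directory, which is only a heuristic. The function name `add_trailing_slash` and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(url):
    """Append '/' to links that look like directories.

    Heuristic (an assumption, not a rule): if the last path segment
    contains no '.', it is taken to be a directory, not a file.
    """
    parts = urlsplit(url)
    path = parts.path
    last_segment = path.rsplit("/", 1)[-1]
    if path and not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


# Hypothetical links for illustration.
for link in [
    "http://MyDomain.com/MyDirectory",
    "http://MyDomain.com/MyDirectory/",
    "/MyDirectory/MyPage.php",
]:
    print(link, "->", add_trailing_slash(link))
```

Links that already end in a slash or point at a file such as MyPage.php are left unchanged. Only the bare directory link gets the slash added.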

Hope this helps somebody else -- and thanks to those who gave me some ideas earlier.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved