Spiders get / and /robots.txt but no more!

Is my robots.txt killing me?

     
6:42 pm on Dec 11, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Mar 3, 2003
posts:306
votes: 0


I just put up the site mentioned in my profile last month. The logs show that spiders from several search engines have arrived. But they've gotten only the first page and robots.txt.

Here is the robots.txt file:
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php

I am trying to disallow spiders from indexing the /images folder and the Purchase.php page. Am I doing more than I'm intending?

I do find that when I follow a link at www.MyDomain.com, I end up dropping the "www." and go to MyDomain.com/MyPage.php.

Would that mess up a spider? I'm using relative links ... /MyDirectory/MyPage.php ... throughout.

Thanks for the help!
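
On the "www." question: a root-relative link such as /MyDirectory/MyPage.php carries no host name, so a browser or spider resolves it against whatever host served the page. A minimal Python sketch using the standard library's `urljoin` illustrates this; the `resolve` helper is a name made up for this example, and the URLs are the placeholders from the post.

```python
from urllib.parse import urljoin


def resolve(page_url: str, href: str) -> str:
    """Resolve a link against the URL of the page it appears on."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    link = "/MyDirectory/MyPage.php"
    # Same link, two different hosts serving the page.
    print(resolve("http://www.MyDomain.com/", link))
    print(resolve("http://MyDomain.com/", link))
```

The host in each result matches the host of the page the link was found on. If the "www." still disappears while browsing, the cause is more likely a redirect issued by the server than the links themselves.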

7:02 pm on Dec 11, 2003 (gmt 0)

Senior Member

jimbeetle

joined:Oct 26, 2002
posts:3295
votes: 6


I can't see anything that should stop them. Your file validates in Brett's robots.txt validator [searchengineworld.com]. The only change I might make is to insert a space between the # and the comment text. I'm not sure how nitpicky some spiders are, but that's the format used at robotstxt.org.

There doesn't look to be anything on your index page to stop them going further. You might try putting up a fresh link or two on the page and see if they follow those.
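
Besides an online validator, the rules can be checked locally. The following is a minimal sketch using Python's standard `urllib.robotparser`; it shows how that one parser reads the posted file, not how any particular search engine's spider does. The `is_allowed` helper and the `logo.gif` file name are invented for this example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt file as posted in the thread.
ROBOTS_TXT = """\
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php
"""


def is_allowed(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "http://www.MyDomain.com"
    # logo.gif is a hypothetical file used only to exercise the /images/ rule.
    for path in ("/", "/MyDirectory/MyPage.php", "/images/logo.gif", "/Purchase.php"):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, base + path) else "blocked"
        print(f"{path}: {verdict}")
```

Python's parser matches rules as prefixes of the URL path, which always begins with a slash, so `Disallow: Purchase.php` does not match `/Purchase.php` there. Writing the rule as `Disallow: /Purchase.php` makes the intent unambiguous. Neither rule blocks the home page or ordinary content pages, which agrees with the reading that this file is not what is stopping the spiders.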

7:10 pm on Dec 11, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Mar 3, 2003
posts:306
votes: 0


Thanks, Jim.

I do have some more content almost ready to add, so I'll provide a link from the main page & see if they get it.

7:18 pm on Dec 11, 2003 (gmt 0)

Administrator from GB 

engine

joined:May 9, 2000
posts:23807
votes: 456


>#Email editor@MyDomain.com with any questions.

I'd remove that to help cut back on e-mail spam.

7:20 pm on Dec 11, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Mar 3, 2003
posts:306
votes: 0


Thanks, Engine. Good idea.

8:55 pm on Dec 11, 2003 (gmt 0)

Full Member

10+ Year Member

joined:Mar 3, 2003
posts:306
votes: 0


A service I was trying to use for my site-level search (not Google, as I can't get them to re-index whenever I want) explained my problem.

"I took a look at your account and I noticed that your page has links such as [MyDomain.com...] Because there is a slash missing from the end to make the URL [MyDomain.com...] the spider receives a redirect to [MyDomain.com...] but cannot follow the redirect."

That was the case for one spider, at least ... possibly more.

Hope this helps somebody else -- and thanks to those who gave me some ideas earlier.
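
For anyone hitting the same problem, one way to avoid the redirect is to write directory links with their trailing slash in the first place. The sketch below is a rough Python illustration, not a complete link checker: `add_directory_slash` is a hypothetical helper, and it assumes that a final path segment with no dot in it names a directory, which will not hold for every site.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def add_directory_slash(url: str) -> str:
    """Append a trailing slash when the last path segment looks like a directory.

    Assumption: a last segment without a "." (no file extension) is a directory.
    """
    parts = urlsplit(url)
    path = parts.path
    last_segment = posixpath.basename(path)
    if path and not path.endswith("/") and "." not in last_segment:
        path += "/"
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    for link in ("/MyDirectory", "/MyDirectory/", "/MyDirectory/MyPage.php"):
        print(link, "->", add_directory_slash(link))
```

Only the first link changes; the other two already point at a directory with its slash or at a file, so the server has no reason to redirect them.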
