
Forum Moderators: goodroi

Spiders get / and /robots.txt but no more!

Is my robots.txt killing me?

   
6:42 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



I just put up the site mentioned in my profile last month. The logs show that spiders from several search engines have arrived. But they've gotten only the first page and robots.txt.

Here is the robots.txt file:
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php

I am trying to disallow spiders from indexing the /images folder and the Purchase.php page. Am I doing more than I'm intending?
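One way to sanity-check a question like this is to feed the file to a robots.txt parser and ask it about a few URLs. The sketch below uses Python's standard `urllib.robotparser`; real spiders may interpret the file differently, so treat the result as a hint rather than a verdict. The `allowed` helper and the `logo.gif` filename are made up for illustration, and the `http://` scheme is an assumption.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt exactly as posted above.
ROBOTS_TXT = """\
#Robots.txt for www.MyDomain.com
#Email editor@MyDomain.com with any questions.

User-agent: *
Disallow: /images/
Disallow: Purchase.php
"""

# Same file, but with a leading slash on the second rule (a suggested variant).
FIXED = ROBOTS_TXT.replace("Disallow: Purchase.php", "Disallow: /Purchase.php")


def allowed(robots_txt, url, agent="*"):
    """Return True if Python's parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "http://www.MyDomain.com"  # scheme is assumed
    for path in ("/", "/MyDirectory/MyPage.php",
                 "/images/logo.gif",  # hypothetical file name
                 "/Purchase.php"):
        print(path, allowed(ROBOTS_TXT, base + path), allowed(FIXED, base + path))
```

With this parser, the file as posted does not block anything beyond `/images/`. It also does not block `Purchase.php`, because the rule lacks a leading slash and so never matches a path that begins with `/`. Writing the rule as `Disallow: /Purchase.php` makes it match here.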

I do find that when I follow a link at www.MyDomain.com, I end up dropping the "www." and go to MyDomain.com/MyPage.php.

Would that mess up a spider? I'm using relative links ... /MyDirectory/MyPage.php ... throughout.
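For what it's worth, a root-relative link on its own should not change the host name. A minimal sketch with Python's standard `urllib.parse.urljoin`, which resolves links per the usual URL rules (the `http://` scheme and the `resolve_link` helper name are assumptions for illustration):

```python
from urllib.parse import urljoin


def resolve_link(page_url, href):
    """Resolve an href found on page_url to an absolute URL."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    # A root-relative link keeps the scheme and host of the page it is on.
    print(resolve_link("http://www.MyDomain.com/", "/MyDirectory/MyPage.php"))
```

Since the resolved URL keeps the "www." host, a dropped "www." after following such a link points to something else, such as a server-side redirect, rather than to the relative links themselves.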

Thanks for the help!

7:02 pm on Dec 11, 2003 (gmt 0)

jimbeetle (WebmasterWorld Senior Member)



I can't see anything that should stop them. Your file validates in Brett's robots.txt validator [searchengineworld.com]. The only thing I'd try is inserting a space between the # and the actual comment. I'm not sure how nitpicky some spiders are, but that's the format used at robotstxt.org.

There doesn't look to be anything on your index page to stop them going further. You might try putting up a fresh link or two on the page and see if they follow those.

7:10 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



Thanks, Jim.

I do have some more content almost ready to add, so I'll provide a link from the main page & see if they get it.

7:18 pm on Dec 11, 2003 (gmt 0)

engine (WebmasterWorld Administrator)



>#Email editor@MyDomain.com with any questions.

I'd remove that to help cut back on e-mail spam.

7:20 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



Thanks, Engine. Good idea.

8:55 pm on Dec 11, 2003 (gmt 0)

10+ Year Member



A service I was trying to use for my site-level search (not G, as I can't get them to re-index whenever I want) explained my problem.

"I took a look at your account and I noticed that your page has links such as [MyDomain.com...] Because there is a slash missing from the end to make the URL [MyDomain.com...] the spider receives a re-direct to [MyDomain.com...] but cannot follow the re-direct."

That was the case for one spider, at least ... possibly more.

Hope this helps somebody else -- and thanks to those who gave me some ideas earlier.
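For anyone checking their own pages, here is a rough sketch of how to spot links like the ones described in that reply: a path whose last segment has no file extension and no trailing slash is often a directory, and many servers answer a request for it with a redirect to the slash-terminated URL. This is only a heuristic. The function name and the sample links are hypothetical, and whether a redirect actually happens depends on the server.

```python
import posixpath
from urllib.parse import urlsplit


def likely_needs_trailing_slash(href):
    """Guess whether href points at a directory but lacks the final slash."""
    path = urlsplit(href).path
    if not path or path.endswith("/"):
        return False  # bare host or already slash-terminated
    last_segment = posixpath.basename(path)
    # No dot in the last segment: probably a directory, not a file.
    return "." not in last_segment


if __name__ == "__main__":
    # Hypothetical links in the style used earlier in the thread.
    for href in ("/MyDirectory", "/MyDirectory/", "/MyDirectory/MyPage.php"):
        print(href, likely_needs_trailing_slash(href))
```

Links flagged this way are candidates for adding the trailing slash, so that a spider that cannot follow redirects still reaches the page.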

 
