Forum Moderators: mack

ROBOTS.TXT - 404 Error

spiders can't find my Robots.txt file - help please

     
7:34 pm on June 27, 2005 (gmt 0)

New User

10+ Year Member

joined:June 27, 2005
posts:6
votes: 0


Hi all,
My first post here.
The Robots.txt file IS in my root directory but no/none/nada/nyet web spiders are finding it. It is a simple "allow all spiders" file with a * user-agent and no disallows.
The syntax IS correct.
I've used the robots.txt validator tool over at www.searchengineworld.com and it gives a 404 error also.
Is there some "security" or other setting on my webserver I'm missing?
Any help appreciated

PS- I tried "site search" to find a thread for this question here at webmasterworld but as I'm a noobie I can't figure out the google instructions posted.
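
For reference, an "allow all spiders" file like the one described usually contains a `User-agent: *` line followed by an empty `Disallow:` line. The sketch below is one way to confirm that such a file parses as "everything allowed", using Python's standard `urllib.robotparser`. The file content, the helper name `allows` and the example.com URL are assumptions for illustration, not the poster's actual file.

```python
from urllib.robotparser import RobotFileParser

# Assumed content of a minimal "allow everything" robots.txt.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def allows(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Note that this only checks the syntax of the file. It says nothing about whether the server actually serves it, which is the problem in this thread.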

7:37 pm on June 27, 2005 (gmt 0)

Preferred Member

10+ Year Member

joined:July 14, 2004
posts:388
votes: 3


test the location for yourself...

[yourdomain.com...]

if you get a 404 error...

then it's NOT in the correct place..

should be in the same directory as your index.htm (index.html,index.cfm,default.htm...etc)
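
The same check can be scripted instead of typed into a browser. This is a minimal sketch using only the Python standard library; the function name `robots_status` is made up for illustration and the domain you pass in is your own. A 404 here means the web server cannot find the file at the place spiders look for it.

```python
import urllib.error
import urllib.request


def robots_status(base_url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for <base_url>/robots.txt."""
    url = base_url.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx answers; the status code is on the exception.
        return err.code
```

Calling `robots_status("http://www.yourdomain.com")` (placeholder domain) returns the status code as an integer. Connection failures such as a DNS error are not handled here and will raise `urllib.error.URLError`.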

7:38 pm on June 27, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7575
votes: 0


Welcome to WebmasterWorld!

First, check the spelling and case of your file name. robots.txt should be in lower case.

Also, are you sure you've placed the file in the root of your web directory?

For more info: [robotstxt.org...]

7:39 pm on June 27, 2005 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member encyclo is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 31, 2003
posts:9073
votes: 4


Welcome to WebmasterWorld [webmasterworld.com] RgPhnx!

Also check capitalization: the file should be called "robots.txt" (all lower-case), not Robots.txt or ROBOTS.TXT.
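
Case matters because many hosts run on case-sensitive file systems, where a request for /robots.txt will not match a file saved as Robots.txt. A small sketch of the check, assuming you have the directory listing as a list of names (for example copied from an FTP client); the function name `check_robots_name` is hypothetical.

```python
def check_robots_name(filenames):
    """Check a directory listing for robots.txt naming problems.

    Returns (exact, wrong_case): `exact` is True if a file named exactly
    "robots.txt" is present; `wrong_case` lists names that differ only
    in capitalization.
    """
    exact = "robots.txt" in filenames
    wrong_case = [
        name for name in filenames
        if name.lower() == "robots.txt" and name != "robots.txt"
    ]
    return exact, wrong_case
```

If `exact` is False and `wrong_case` is not empty, renaming the file to all lower-case is the first thing to try.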

7:58 pm on June 27, 2005 (gmt 0)

New User

10+ Year Member

joined:June 27, 2005
posts:6
votes: 0


Hi all,
Wow you guys are fast..thanks for the suggestions.
Unfortunately ..problem is not solved.
> the file is in the correct root directory (I can see it right alongside the index.html file when I use my FTP client program to log onto the server)
> the filename spelling is correct (ie. it's robots.txt.....NOT ROBOTS.TXT..etc)
> typing " [mydomain.com...] " in the webbrowser .. returns the same 404 error as the validator tool.
Any other suggestions?
8:01 pm on June 27, 2005 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 1, 2003
posts:158
votes: 0


Howdy,

Is the robots.txt file inside the header of your pages? You mentioned you could see it "alongside" your index.html when using FTP. Just wonderin'

Goober

8:09 pm on June 27, 2005 (gmt 0)

New User

10+ Year Member

joined:June 27, 2005
posts:6
votes: 0


Hi goober,
don't exactly understand your question re: "inside the header"..etc.

When I FTP into the server this is what it looks like:

/
[folder logo]..
[folder logo] logs
[folder logo] mail
[folder logo] www
etcetera
default.htm
default.html
index.htm
index.html
robots.txt

Hope this makes sense. too bad you can't post real pics in the posts.

8:11 pm on June 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2003
posts:1925
votes: 0


I think you have to put all your files into the "www" folder if you want to make them public.
8:19 pm on June 27, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7575
votes: 0


I'd agree with moltar.

And if that's not it, you may want to ask them whether they're using some Apache directive in the .htaccess file to block or otherwise invalidate it for some unknown reason (I'm reaching with this one).

8:21 pm on June 27, 2005 (gmt 0)

New User

10+ Year Member

joined:June 27, 2005
posts:6
votes: 0


Regarding "reaching"
What specifically should I ask the webserver admins regarding "re-directives"?
8:30 pm on June 27, 2005 (gmt 0)

New User

10+ Year Member

joined:June 27, 2005
posts:6
votes: 0


Hi again guys,
took moltar's advice & put the robots.txt file in the "www" folder/directory..then
used "http://www.mydomain.com/robots.txt " to search for it with the webbrowser.
That wiped out the 404 error message & found the file.
:) :)
QUESTION:
I UNDERSTAND THE LOGIC OF PUTTING IT INTO THE WWW FOLDER (or the "public_html" folder).. BUT..
Why do all the tutorials I've read SPECIFICALLY say to put it in the "root" directory?
8:44 pm on June 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2003
posts:1925
votes: 0


By "root" they mean "web root".

There are many hosting companies and many setups. On some servers the ftp root is the same as www root. In others you have "www", "public_html", "httpdocs" and multiple other variations.

It's done that way so that you can "hide" certain files from public view. For example, if in your case everything was public, surely you wouldn't want someone typing in widget.com/mail and looking through your mail.

If you put any files outside the web root (above it in the directory tree), they will not be accessible via HTTP. You can put data there that you want to protect, e.g. password files, mailing list databases, etc. Basically, anything that you wouldn't want to be downloaded by someone.
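
If you are not sure which folder is the web root on your host, a rough first guess is to look for one of the folder names mentioned above in the FTP root listing. This is only a heuristic sketch: the list of names comes from this post, the function name `likely_web_roots` is made up, and your host's documentation is the real authority.

```python
# Folder names mentioned in this thread; hosts may use others.
COMMON_WEB_ROOTS = ("www", "public_html", "httpdocs")


def likely_web_roots(ftp_root_entries):
    """Return the entries of an FTP root listing that look like a web root."""
    return [name for name in ftp_root_entries if name in COMMON_WEB_ROOTS]
```

An empty result suggests either that the FTP root is itself the web root or that the host uses a different folder name.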

8:53 pm on June 27, 2005 (gmt 0)

New User

10+ Year Member

joined:June 27, 2005
posts:6
votes: 0


Hi moltar,
Thanks for the clarification.
Guess I'm too "old school" from DOS programming days when "root" meant the
.
..
...
directory.
:) :)
Again,
Many thanks for the suggestion that solved the problem.
;) ;)
9:08 pm on June 27, 2005 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 18, 2003
posts:1925
votes: 0


You are right. Root is still the same old root. What the article was referring to is "web root".

Web root is practically never the same as the root of the file system. Web root is what you get when you make an HTTP request to "/". The "/" is mapped by the web server to some place in the file system.

When you do the request to "/robots.txt" the web server looks in "/home/user/www/robots.txt" and serves the file to the user. As far as the web server is concerned the root is "/home/user/www". It can't see below that.
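
The mapping can be sketched in a few lines. This is an illustration of the idea only, not how any particular web server implements it. The function name `map_to_filesystem` is hypothetical, "/home/user/www" is the example path from this post, and the web root is assumed to be an absolute POSIX-style path.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def map_to_filesystem(web_root: str, url_path: str) -> str:
    """Map a request path such as "/robots.txt" to a file path under web_root.

    Raises ValueError if the request would resolve to something outside
    web_root, i.e. a location the web server should not serve.
    """
    root = posixpath.normpath(web_root)
    # Keep only the path part of the request and decode %xx escapes.
    path = unquote(urlsplit(url_path).path)
    # normpath collapses "." and ".." segments after joining.
    candidate = posixpath.normpath(posixpath.join(root, path.lstrip("/")))
    # commonpath compares whole path components, so "/home/user/wwwx"
    # is not mistaken for something inside "/home/user/www".
    if posixpath.commonpath([root, candidate]) != root:
        raise ValueError(f"{url_path!r} resolves outside the web root")
    return candidate
```

With the web root "/home/user/www", a request for "/robots.txt" maps to "/home/user/www/robots.txt", while a request that tries to climb out of the web root, such as "/../mail/inbox", is rejected.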

2:17 am on June 28, 2005 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lorax is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Mar 31, 2002
posts:7575
votes: 0


As far as the web server is concerned the root is "/home/user/www". It can't see below that.

ermm... I think you meant above that. ;)