Forum Library, Charter, Moderators: brotherhood of lan & mack

New To Web Development Forum

    
ROBOTS.TXT - 404 Error
spiders can't find my Robots.txt file - help please
RgPhnx




msg:965603
 7:34 pm on Jun 27, 2005 (gmt 0)

Hi all,
My first post here.
The Robots.txt file IS in my root directory, but no/none/nada/nyet web spiders are finding it. It is a simple "*" allow-all-spiders file, with no disallows etc.
The syntax IS correct.
I've used the robots.txt validator tool over at www.searchengineworld.com and it gives a 404 error too.
Is there some "security" or other setting on my web server that I'm missing?
Any help appreciated

PS: I tried "site search" to find a thread on this question here at WebmasterWorld, but as I'm a newbie I can't figure out the Google instructions posted.
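
For reference, an allow-all robots.txt of the kind described above usually comes down to two lines. The file contents below are an assumption, since the actual file was not posted. The Python sketch uses the standard-library urllib.robotparser to confirm that such a file permits everything; the allows helper and the example.com URL are made up for illustration. Note that this only checks the syntax and says nothing about whether the server can find the file.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of an allow-all robots.txt (the real file was not posted).
# An empty "Disallow:" value means nothing is disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def allows(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical crawler name and URL, used only for the demonstration.
    print(allows(ALLOW_ALL, "Googlebot", "http://www.example.com/index.html"))
```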

 

Terabytes




msg:965604
 7:37 pm on Jun 27, 2005 (gmt 0)

test the location for yourself...

[yourdomain.com...]

if you get a 404 error...

then it's NOT in the correct place..

It should be in the same directory as your index.htm (index.html, index.cfm, default.htm, etc.)
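
The same test can be run from a script instead of a browser. This is a minimal Python sketch, assuming the site is reachable over plain HTTP; robots_url and robots_status are hypothetical helper names, and www.example.com stands in for your own domain.

```python
import urllib.error
import urllib.request
from urllib.parse import urljoin


def robots_url(site: str) -> str:
    """Build the robots.txt URL for a site such as 'http://www.example.com'."""
    # A leading "/" in the second argument makes urljoin resolve
    # against the top of the site, whatever path `site` ends with.
    return urljoin(site.rstrip("/") + "/", "/robots.txt")


def robots_status(site: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for /robots.txt."""
    try:
        with urllib.request.urlopen(robots_url(site), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive as exceptions; 404 means the server
        # cannot find the file at the top level of the site.
        return err.code


if __name__ == "__main__":
    # Placeholder domain: replace with your own before running.
    print(robots_url("http://www.example.com"))
    print(robots_status("http://www.example.com"))
```

A result of 200 means spiders can fetch the file; 404 means it is not where the web server looks for it. Connection failures (bad host name, server down) are not caught here and will raise urllib.error.URLError.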

lorax




msg:965605
 7:38 pm on Jun 27, 2005 (gmt 0)

Welcome to WebmasterWorld!

First, check the spelling and case of your file name. robots.txt should be in lower case.

Also, are you sure you've placed the file in the root of your web directory?

For more info: [robotstxt.org...]

encyclo




msg:965606
 7:39 pm on Jun 27, 2005 (gmt 0)

Welcome to WebmasterWorld [webmasterworld.com] RgPhnx!

Also check capitalization: the file should be called "robots.txt" (all lower-case), not Robots.txt or ROBOTS.TXT.

RgPhnx




msg:965607
 7:58 pm on Jun 27, 2005 (gmt 0)

Hi all,
Wow, you guys are fast. Thanks for the suggestions.
Unfortunately, the problem is not solved.
> The file is in the correct root directory (I can see it right alongside the index.html file when I use my FTP client program to log onto the server).
> The filename spelling is correct (i.e. it's robots.txt, NOT ROBOTS.TXT etc.).
> Typing " [mydomain.com...] " in the web browser returns the same 404 error as the validator tool.
Any other suggestions?

Goober




msg:965608
 8:01 pm on Jun 27, 2005 (gmt 0)

Howdy,

Is the robots.txt file inside the header of your pages? You mentioned you could see it "alongside" your index.html when using FTP. Just wonderin'

Goober

RgPhnx




msg:965609
 8:09 pm on Jun 27, 2005 (gmt 0)

Hi Goober,
I don't exactly understand your question re: "inside the header", etc.

When I FTP into the server, this is what it looks like:

/
[folder logo]..
[folder logo] logs
[folder logo] mail
[folder logo] www
etcetera
default.htm
default.html
index.htm
index.html
robots.txt

Hope this makes sense. Too bad you can't post real pics in the posts.

moltar




msg:965610
 8:11 pm on Jun 27, 2005 (gmt 0)

I think you have to put all your files into
the www folder if you want to make them public.
lorax




msg:965611
 8:19 pm on Jun 27, 2005 (gmt 0)

I'd agree with moltar.

And if that's not it, you may want to ask your host whether they're using some Apache directive in the .htaccess file that blocks or invalidates it for some unknown reason (I'm reaching with this one).

RgPhnx




msg:965612
 8:21 pm on Jun 27, 2005 (gmt 0)

Regarding "reaching"
What specifically should I ask the webserver admins regarding "re-directives"?

RgPhnx




msg:965613
 8:30 pm on Jun 27, 2005 (gmt 0)

Hi again guys,
I took moltar's advice and put the robots.txt file in the "www" folder/directory, then
used "http://www.mydomain.com/robots.txt" to look for it with the web browser.
That wiped out the 404 error message and found the file.
:) :)
QUESTION:
I UNDERSTAND THE LOGIC OF PUTTING IT INTO THE WWW FOLDER (or the "public_html" folder), BUT...
Why do all the tutorials I've read SPECIFICALLY say to put it in the "root" directory?

moltar




msg:965614
 8:44 pm on Jun 27, 2005 (gmt 0)

By "root" they mean "web root".

There are many hosting companies and many setups. On some servers the FTP root is the same as the web root. On others you have "www", "public_html", "httpdocs" and multiple other variations.

It's done that way so that you can "hide" certain files from public view. For example, if everything were public in your case, surely you wouldn't want someone typing in widget.com/mail and looking through your mail.

If you put any files outside the web root (i.e. in the FTP root or in a sibling folder, not inside "www"), they will not be accessible via HTTP. That is where you can put data that you want to protect, e.g. password files, mailing list databases, etc. Basically, anything that you wouldn't want to be downloaded by someone.

RgPhnx




msg:965615
 8:53 pm on Jun 27, 2005 (gmt 0)

Hi moltar,
Thanks for the clarification.
Guess I'm too "old school" from DOS programming days when "root" meant the
.
..
...
directory.
:) :)
Again,
Many thanks for the suggestion that solved the problem.
;) ;)

moltar




msg:965616
 9:08 pm on Jun 27, 2005 (gmt 0)

You are right. Root is still the same old root. What the article was referring to is the "web root".

The web root is normally not the same as the root of the file system. The web root is what you get when you do an HTTP request to /. And the "/" is mapped by the web server to some place in the file system.

When you do the request to "/robots.txt", the web server looks in "/home/user/www/robots.txt" and serves the file to the user. As far as the web server is concerned, the root is "/home/user/www". It can't see below that.
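
The mapping described here can be sketched in a few lines of Python. This is a simplified illustration, not how any particular web server is implemented: resolve is a hypothetical helper, the "/home/user/www" path is the one from this post, and real servers also deal with query strings, percent-encoding, aliases and so on.

```python
import posixpath

# Web root taken from the example in the post; yours may be
# "public_html", "httpdocs" or something else entirely.
WEB_ROOT = "/home/user/www"


def resolve(url_path: str, web_root: str = WEB_ROOT) -> str:
    """Map the path part of a URL to a file system path under the web root."""
    # Normalising against "/" collapses ".." segments, so a request
    # cannot climb out of the web root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    return posixpath.join(web_root, clean.lstrip("/"))


if __name__ == "__main__":
    print(resolve("/robots.txt"))     # /home/user/www/robots.txt
    print(resolve("/../robots.txt"))  # /home/user/www/robots.txt
    print(resolve("/mail/inbox"))     # /home/user/www/mail/inbox
```

Every request ends up inside "/home/user/www", which is why a robots.txt sitting in the FTP root next to the "www" folder produced a 404, and why the "mail" folder in the FTP root cannot be reached over HTTP.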

lorax




msg:965617
 2:17 am on Jun 28, 2005 (gmt 0)

> As far as the web server is concerned, the root is "/home/user/www". It can't see below that.

ermm... I think you meant above that. ;)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved