Forum Moderators: goodroi
Just a straight text robots.txt file, or no robots.txt at all, is preferred. Just lookin' out for things that sometimes catch people.
I've tended to assume that unless the 404 page uses the word "disallow" or "allow" it would be OK. Maybe that's not quite right; it's easy enough to upload an empty robots.txt file anyway.
People have asked about 404 redirects. I couldn't find any indication in the Robots Exclusion Protocol: would you expect a robot to follow the redirect, or just treat it as "Not Found" and ignore it?
With regards to the 404 issue, I don't understand. I use a custom 404, which can be seen in my profile; what am I doing wrong? Also, do I need to use a robots.txt file? If so, where can I get one and where do I put it? Sorry for being a pain. Thanks guys. Irv
That's only OK if the redirect is generating a true 404 header first (before the redirect).
Are you referring to the title of the page? I have custom 404's on all my sites. I haven't seen any problems for the most part, but most don't redirect, some do. Then again I have some sites that aren't getting into Google for some reason and I want to cover all my bases.
Brett wrote an excellent article on how to write your own robots.txt file. It is located here [searchengineworld.com]
The robots.txt file is separate from your 404 file. Robots.txt is for search spiders that crawl your site; it tells them which parts of your site are allowed to be crawled and which are not.
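For anyone wondering what one looks like: a minimal "allow everything" robots.txt is just two lines, uploaded to your root directory (this is only a sketch; see Brett's article below for the full syntax):

```
# Minimal robots.txt: all robots may crawl everything
User-agent: *
Disallow:
```

An empty Disallow line means nothing is blocked. To block a directory you would put its path after Disallow:, one directive per line.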
domain.com/lasjfljsldjsljfsljdf
Will just pump your error page to them with no redirect and no actual error header.
You need to generate a 404 NOT FOUND header. If you don't know what yours is saying:
- go to StickyMail (log in if you aren't already).
- click on "headers" on the left menu.
- put in an address to a bogus file on your site (full http url)
- see what the header response is.
It should say:
HTTP/1.1 404 Not Found
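If you'd rather check from a script than a web tool, here's a rough Python sketch in the same spirit as the header checker above (the function name and approach are my own, not an official tool; HTTP only for brevity, use http.client.HTTPSConnection for https URLs):

```python
# Rough header checker: send a HEAD request for a bogus file and
# report the raw status code, without following any redirects.
import http.client
from urllib.parse import urlparse

def status_of(url: str) -> int:
    """Return the HTTP status code the server sends for this URL."""
    parts = urlparse(url)
    conn = http.client.HTTPConnection(parts.netloc)
    conn.request("HEAD", parts.path or "/")
    code = conn.getresponse().status
    conn.close()
    return code

# e.g. status_of("http://www.example.com/some-bogus-file") should
# come back 404 on a correctly configured server, not 200 or 302.
```

If the number that comes back for a bogus file is 200, 301, or 302 instead of 404, your custom error page is not sending the right header.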
Thanks Brett - I feel much better now! Time for a beer.
On the robots.txt standards site to which GG refers, the syntax for the META shows no space in "noindex,nofollow".
On www.google.com/remove.html the syntax shows a space after the comma when describing the same META commands.
Does it make a difference? I understand that upper/lower case doesn't matter, but what about a space as in "NOINDEX, NOFOLLOW"?
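For what it's worth, both forms should come out identical once the content attribute is tokenized. Here's a rough sketch of the usual comma-split-and-trim treatment (my own illustration of how lenient parsers typically handle it, not Google's actual parser):

```python
def parse_robots_meta(content: str) -> set:
    """Split a robots META content value on commas, trimming
    whitespace and ignoring case -- the usual lenient treatment."""
    return {token.strip().lower() for token in content.split(",")}

# Under this treatment, "NOINDEX,NOFOLLOW" and "NOINDEX, NOFOLLOW"
# both parse to {"noindex", "nofollow"}.
```

So under any parser that trims whitespace around tokens, the space after the comma should make no difference.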
Yes, if you have *any* robots.txt file in your root directory (blank or not), then your custom 404 page will not be invoked when Googlebot asks for robots.txt.
If robots.txt is missing completely *and* you have a custom 404 page, then GoogleGuy says this can cause problems for Googlebot.
For those of you who have found a problem using Brett's response header checker (see the first page of this thread): if you are using the Apache ErrorDocument directive and getting 301 redirects instead of the desired 404 error code, make sure that the path you specify in your ErrorDocument directive is a *relative* path, i.e.
ErrorDocument 404 /404file.html
If you use a remotely-hosted URL, or *any* URL starting with "http:" such as
ErrorDocument 404 [yourdomain.com...]
then Apache will do an external permanent redirect, and return a 301 code instead of a 404. Therefore Googlebot and other robots may never remove your dead pages from their indices.
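To put the two cases side by side (the domain here is just a placeholder):

```
# Relative path: Apache serves the page itself and keeps the 404 status
ErrorDocument 404 /404file.html

# Full URL: Apache answers with an external redirect instead,
# so the robot never sees the 404
ErrorDocument 404 http://www.example.com/404file.html
```

The difference is only in the target: a leading slash keeps everything internal, while anything starting with "http:" forces the external redirect.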
See the Apache Core Features ErrorDocument directive documentation for the official word.
Thanks for the server response checker, BT!
Jim
On this site I use the
ErrorDocument 404 /
directive to redirect every file that doesn't exist to the root, so if Google requests the robots.txt file it gets the main page.
The same page is in a PR6 DMOZ category with domain.com in the anchor text, and it has lots of quality links.
It dropped from the top 10 to the top 30 in ranking on all keywords; the PageRank remains. Even if I type domain.com alone or with other keywords, the site gets a start=20 ranking with 30 pages above me that link to mine. :(
Was this causing the drop, and how long will it last once I add a robots.txt file?
Just add
<% Response.Status = "404 Not Found" %> to your 404.asp code. The headers must be written before any output is sent to the client, so just make sure all your HTML and Response.Write's come after that line.
This way, if you fail to include a robots.txt in your root dir, your custom error page will correctly return a "404 Not Found" status code.
I did the test on our site and got a (feared) HTTP/1.1 302 on an Apache server. I checked a friend's site hosted at the same hosting company (not necessarily on the same server), and it shows the same error code.
What should I do?
What can I do?
Write my hosting company asking what exactly?
Should I add a robots.txt file myself to the root directory of my site to neutralize this?
I am a novice regarding those more technical aspects.
Help!
TIA