Forum Moderators: open
User-agent: *
Disallow:
I ran this on a simulator and it came back with a 200 OK message or something close to that.
K, have your laughs at my expense, I don't mind. It seems this time of the month everyone needs something or someone to laugh at :)
The robots.txt proposal recommends (but does not require) that a 403 response to the robots.txt request should mean "stay out of this site completely." We followed that behavior until the last couple of months. However, we saw several users (and at least one ISP) returning a 403 by mistake so often that we changed our behavior for this. It's fine to have no robots.txt file.
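The behavior GoogleGuy describes can be summed up in a few lines. This is only a sketch of my reading of his post, not Google's actual logic:

```python
def crawl_policy(status: int) -> str:
    """Map the HTTP status of a /robots.txt fetch to a crawl decision.

    A sketch of the behavior described above -- my interpretation of
    GoogleGuy's post, not Google's actual code.
    """
    if status == 200:
        return "obey the rules in robots.txt"
    if status == 404:
        return "no robots.txt; crawl everything"
    if status == 403:
        # The robots.txt proposal recommends treating this as "stay out
        # completely", but because so many 403s turn out to be server
        # misconfigurations, Google now treats it like a missing file.
        return "no robots.txt; crawl everything"
    return "ambiguous response; be conservative"

print(crawl_policy(403))
```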
Hope that helps, and welcome to WebmasterWorld!
GoogleGuy
have your laughs at my expense, I don't mind.
No one is laughing. Some of us might laugh, but certainly not at someone who has the good sense to ask a reasonable question. You may be a new user here, but again, welcome to WebmasterWorld! We all have to start somewhere - and learn as we go... Enjoy it.
Your robots.txt looks correct, and will allow robots to spider all of your pages. At the same time its presence will prevent your error log from being filled with 404-Not Found errors when robots come calling and asking for robots.txt.
You can validate your robots.txt using the robots.txt validator [searchengineworld.com] for extra assurance.
HTH,
Jim
Google (rightly) took that to mean "don't crawl the site", but they realised that most 403s for /robots.txt are mistakes, so they changed their behavior quite recently.
coosblues should be fine, with /robots.txt returning 200 OK and the following content.
User-agent: *
Disallow:
I have been trying out Funnel Web Profiler. It would not crawl my site because of "Not allowd by Robot.txt". My robots.txt is:
User-agent: *
Disallow:
Why is that? Once I removed it there was no problem. My question is: if it allows all robots to crawl, how come it stopped Funnel Web Profiler, even when I made it simulate Mozilla/4.0?
It has me worried.
GoogleGuy clearly stated that some servers return a 403 error instead of a 404, so the people at Google decided to ignore it and not take the 403 error as a "stay out" order.
But that was not what I actually wanted to emphasize; my goal was to draw your attention to the sentence that followed: "It's fine to have no robots.txt file."
I posted it because coosblues posted that GoogleGuy advised having the file, but on the other hand, in a post I made, GoogleGuy responded that it is fine not to have that file :) So now I am confused - to have the robots file or not to have it? :)
matuloo,
Googleguy's response was potentially confusing because the 403 discussion was off-topic, that's all. The bit about not needing a robots.txt was overshadowed by the longer statement about 403 vs. 404, IMHO.
---
If you have no robots.txt, your site's error log file may accumulate lots of 404-Not Found errors from robots trying to fetch robots.txt. If you don't care about having to sort through those errors while looking for real errors (such as those caused by internal and external broken links) and you want to let all robots spider all pages on your site, then it is absolutely OK to not have a robots.txt file on your site.
If you want to prevent those 404 errors in your log file, the next alternative is to place a blank robots.txt file on your server. Since the robots can't find anything in the blank file telling them to stay out, they will assume that they are welcome to full access.
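You can confirm that a permissive robots.txt like the one posted above blocks nothing by feeding it to Python's standard-library parser - a quick sketch:

```python
# Sketch: checking what "User-agent: *" / "Disallow:" actually permits,
# using Python's standard-library robots.txt parser.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# The same two lines as the robots.txt discussed in this thread.
rp.parse(["User-agent: *", "Disallow:"])

# An empty Disallow forbids nothing, so every agent may fetch every path.
print(rp.can_fetch("*", "/"))                           # True
print(rp.can_fetch("FunnelWebProfiler", "/any/page"))   # True
```

So if a tool still refuses to crawl with this file in place, the problem is in the tool's handling of robots.txt, not in the file itself.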
If you'd like to keep all spiders out of some pages of your site, or some spiders out of all pages of your site, or some spiders out of some pages of your site, then write a robots.txt file conforming to the robots exclusion standard, and validate it [searchengineworld.com] before publishing it.
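For illustration, a robots.txt covering those three cases might look like this (the spider names and paths below are made-up examples, not real crawlers or real directories):

```
# Keep ALL spiders out of some pages:
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/

# Keep one particular spider out of ALL pages
# ("BadBot" is a made-up example name):
User-agent: BadBot
Disallow: /

# Keep one particular spider out of SOME pages
# ("ExampleBot" is also made up):
User-agent: ExampleBot
Disallow: /stats/
```

Per the exclusion standard, each record is one or more User-agent lines followed by Disallow lines, with records separated by blank lines; a robot obeys the record that names it, falling back to the "*" record otherwise.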
Jim