robots.txt file

to add or not

         

coosblues

7:22 am on Nov 27, 2002 (gmt 0)

10+ Year Member



I'm always hesitant to post on these forums because I'm so new to all this, so forgive my ignorance :) . I've read the related posts about having a robots.txt file and have a basic understanding of what it does. I've even seen GoogleGuy suggest I have one, but I want to make sure I get it right. Things seem to be going just fine on my site without it, but if the bot wants it I want to make sure I'm giving it the right text. OK, I don't want to ban any bots nor stop them from visiting any files/folders. My site is completely open. Is this correct? Sure don't want to inadvertently stop Google from spidering my site. This is the text file I have made:

User-agent: *
Disallow:

I ran this on a simulator and it came back with a 200 OK message or something close to that.

K, have your laughs at my expense, I don't mind. It seems this time of the month everyone needs something or someone to laugh at :)

Sinner_G

7:31 am on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Looks right to me if you want all bots to index all of your files. And as you noticed yourself, Googleguy said to have a robots.txt, so do it (that's what I did, didn't have one previous to GG's post).

matuloo

7:36 pm on Nov 27, 2002 (gmt 0)

10+ Year Member



Actually, this is a response I received from GoogleGuy when I was asking about the robots.txt file:

The robots.txt proposal recommends (but does not require) that a 403 response to the robots.txt should mean "stay out of this site completely." We followed that behavior until the last couple months. However, we saw several users (and at least one ISP) return a 403 as a mistake so often that we changed our behavior for this. It's fine to have no robots.txt file.

Hope that helps, and welcome to WebmasterWorld!
GoogleGuy

jdMorgan

7:52 pm on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



coosblues,

have your laughs at my expense, I don't mind.

No-one is laughing. Some of us might laugh, but certainly not at someone who has the good sense to ask a reasonable question. You're not a new-new user here, but again, Welcome to WebmasterWorld! We all have to start somewhere - and learn as we go... Enjoy it.

Your robots.txt looks correct, and will allow robots to spider all of your pages. At the same time its presence will prevent your error log from being filled with 404-Not Found errors when robots come calling and asking for robots.txt.

You can validate your robots.txt using the robots.txt validator [searchengineworld.com] for extra assurance.
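For a quick local check of the same thing, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is not the validator linked above, just one parser's reading of the file. The `allows` helper, the bot name "SomeOtherBot" and the `example.com` URL are placeholders made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from this thread: every robot, nothing disallowed.
ROBOTS_TXT = """\
User-agent: *
Disallow:
"""

def allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this parser thinks user_agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

if __name__ == "__main__":
    page = "http://example.com/any/page.html"  # placeholder URL
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, allows(ROBOTS_TXT, agent, page))
```

An empty `Disallow:` value is read as "nothing is off limits", so both checks should report that the page may be fetched.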

HTH,
Jim

jdMorgan

7:56 pm on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



matuloo,

Was your question about a missing robots.txt, or about a 403-Forbidden server response? Or did GoogleGuy confuse a 404-Not Found with a 403-Forbidden response?

The quote you posted is ambiguous because it says "403", rather than "404", and could confuse readers of this thread.

Thanks,
Jim

ciml

8:09 pm on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Jim, I don't think that matuloo's confused. 403 Forbidden is issued by quite a lot of servers where 404 Not Found should be issued.

Google (rightly) took that to mean "don't crawl the site", but they realised that most 403s for /robots.txt are mistakes, so they changed their behavior quite recently.

coosblues should be fine, with /robots.txt returning 200 OK and the following content.

User-agent: *
Disallow:
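To make the 403/404 distinction concrete, here is a rough sketch of how a crawler might map the status code of a /robots.txt request to a decision. It only mirrors the behavior described in this thread and is not any search engine's actual code. The function name, the `forgive_403` flag and the decision strings are all invented for the example.

```python
# Crawl decisions, named for this sketch only.
OBEY_FILE = "read the file and obey its rules"
CRAWL_ALL = "treat as no robots.txt: crawl everything"
STAY_OUT = "stay out of the site completely"
UNDECIDED = "not covered by this thread"

def robots_policy(status: int, forgive_403: bool = True) -> str:
    """Map the HTTP status of a /robots.txt request to a crawl decision.

    forgive_403=False follows the original recommendation (403 means
    "stay out"); forgive_403=True models the newer, more lenient behavior
    that treats a 403 like a missing file.
    """
    if status == 200:
        return OBEY_FILE
    if status == 404:
        return CRAWL_ALL
    if status == 403:
        return CRAWL_ALL if forgive_403 else STAY_OUT
    return UNDECIDED

if __name__ == "__main__":
    for code in (200, 403, 404):
        print(code, "->", robots_policy(code))
```

Other status codes (redirects, server errors) are deliberately left undecided, since the thread does not say how they are handled.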

jdMorgan

8:22 pm on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ciml,

I didn't think matuloo was confused, just that GG's quoted answer was confusing. And since the issue of mis-configured servers wasn't mentioned, even more so.

Thanks,
Jim

ciml

8:30 pm on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I thought it was the 403/404 thing. It's not been discussed much, maybe because most people here have an idea about what their servers do.

jimh009

9:42 pm on Nov 27, 2002 (gmt 0)

10+ Year Member



Hi coosblues,

I uploaded the exact same robots.txt file to my site about a month ago, with some trepidation. Google, FAST, AV and other spiders all handle it just fine. As long as you're seeing "200" in the log files, all should be good to go.

ibpotter

9:51 pm on Nov 27, 2002 (gmt 0)

10+ Year Member



Hi,

I have been trying out Funnel Web Profiler. It would not follow my site because of "Not allowd by Robot.txt". My Robots.txt is

User-agent: *
Disallow:

Why is that? Once I removed it there was no problem. My Q is - if it allows all to follow, how come it stopped Funnel Web Profiler, even when I made it simulate Mozilla/4.0?

It has me worried.

jdMorgan

9:58 pm on Nov 27, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



ibpotter,

If your robots.txt validates (see link above), then it's a bug in Profiler - your directives look correct.

Jim

coosblues

2:11 am on Nov 28, 2002 (gmt 0)

10+ Year Member



Thanks to you all for your response and an extra thanks to webmasterworld for providing such a wonderful forum. Happy Holidays to you all :)

matuloo

1:50 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



What's confusing about my post?

Googleguy clearly stated that some servers return 403 error instead of 404, thus the people at google decided to ignore it and not take the 403 error as a "stay out" order.

But that was not what I actually wanted to emphasize; my goal was to point your attention to the sentence that followed: "It's fine to have no robots.txt file."

I posted it because coosblues posted that GoogleGuy advised having the file, but on the other hand, in a post I made, GoogleGuy responded that it is fine not to have that file :) So now I am confused - to have the robots file or not to have it? :)

Harley_m

11:49 pm on Nov 28, 2002 (gmt 0)

10+ Year Member



Surely that robots.txt isn't right - as all characters should be lower case - does that in practice make any difference?

jdMorgan

1:12 am on Nov 29, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Harley_m,
The robots.txt posted by coosblues is exactly right. The capitalization employed conforms to the robots.txt standard [searchengineworld.com].
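As one data point, Python's standard-library `urllib.robotparser` lowercases field names before matching them, so it reads both spellings the same way. A small sketch follows. The `/private/` path, the "AnyBot" name and the `example.com` URL are made up for the example, and other parsers may behave differently.

```python
from urllib.robotparser import RobotFileParser

def blocked(robots_txt: str, url: str) -> bool:
    """Return True if this parser would keep a generic robot away from url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("AnyBot", url)  # "AnyBot" is a made-up name

# Same rule, written with the usual capitalization and in all lower case.
MIXED_CASE = "User-agent: *\nDisallow: /private/\n"
LOWER_CASE = "user-agent: *\ndisallow: /private/\n"

URL = "http://example.com/private/page.html"  # placeholder URL

if __name__ == "__main__":
    print(blocked(MIXED_CASE, URL), blocked(LOWER_CASE, URL))
```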

matuloo,
Googleguy's response was potentially confusing because the 403 discussion was off-topic, that's all. The bit about not needing a robots.txt was overshadowed by the longer statement about 403 vs. 404, IMHO.

---

If you have no robots.txt, your site's error log file may accumulate lots of 404-Not Found errors from robots trying to fetch robots.txt. If you don't care about having to sort through those errors while looking for real errors (such as those caused by internal and external broken links) and you want to let all robots spider all pages on your site, then it is absolutely OK to not have a robots.txt file on your site.

If you want to prevent the 404 errors in your log file, the next alternative is to upload a blank robots.txt file to your server (or create one there). Since the robots can't find anything in the blank file to tell them not to spider anything on your site, they will assume that they are welcome to full access.

If you'd like to keep all spiders out of some pages of your site, or some spiders out of all pages of your site, or some spiders out of some pages of your site, then write a robots.txt file conforming to the robots exclusion standard, and validate it [searchengineworld.com] before publishing it.

Jim
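The cases described above (a blank file, and the three kinds of selective exclusion) can be sketched and checked with Python's standard-library `urllib.robotparser`. Everything named here is hypothetical: the robots "BadBot" and "NosyBot", the paths `/cgi-bin/` and `/drafts/`, the `example.com` host and the `allowed` helper. Treat it as one parser's interpretation, not a substitute for validating the real file.

```python
from urllib.robotparser import RobotFileParser

BLANK = ""  # an empty robots.txt: nothing is excluded

SELECTIVE = """\
# Some spiders out of all pages
User-agent: BadBot
Disallow: /

# Some spiders out of some pages
User-agent: NosyBot
Disallow: /drafts/

# All other spiders out of some pages
User-agent: *
Disallow: /cgi-bin/
"""

def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if this parser lets agent fetch path on a placeholder host."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)

if __name__ == "__main__":
    checks = [
        (BLANK, "Googlebot", "/index.html"),
        (SELECTIVE, "BadBot", "/index.html"),
        (SELECTIVE, "NosyBot", "/drafts/post.html"),
        (SELECTIVE, "NosyBot", "/index.html"),
        (SELECTIVE, "Googlebot", "/cgi-bin/search"),
        (SELECTIVE, "Googlebot", "/index.html"),
    ]
    for text, agent, path in checks:
        label = "blank" if text == BLANK else "selective"
        print(label, agent, path, allowed(text, agent, path))
```

One thing worth noticing: with this parser, a robot that has its own record (NosyBot here) follows only that record, so the `/cgi-bin/` rule under `User-agent: *` does not apply to it unless it is repeated.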