| 1:16 am on Jun 17, 2002 (gmt 0)|
Hi and welcome to WWM. robots.txt does not have to be there, but it is a very good idea. See [webmasterworld.com...]
| 1:47 am on Jun 17, 2002 (gmt 0)|
Thanks for your reply. Let me make sure I get this right: I want googlebot (and other spiders) to index my site.
Is it better to put a robots.txt file on my server, or to leave it out?
| 2:02 am on Jun 17, 2002 (gmt 0)|
Much better to put it there, but make sure that you have the syntax right.
| 2:03 am on Jun 17, 2002 (gmt 0)|
Johnd - there are a few threads on this subject here. In addition to the one cited by SmallTime, check out:
| 2:40 am on Jun 17, 2002 (gmt 0)|
It's worth repeating - even having a blank page named robots.txt will avoid a lot of 404s.
| 2:48 am on Jun 17, 2002 (gmt 0)|
That's the reason I finally put one up, Marcia - and, just as you suggested, it is blank. Maybe one day I'll get around to excluding someone!
| 2:52 am on Jun 17, 2002 (gmt 0)|
The mention of blank robots.txt files is giving me the heebies...
I've read more than once never to use a blank robots.txt as some spiders will interpret it as a 'disallow all'.
Have I been led up the garden path?
| 3:04 am on Jun 17, 2002 (gmt 0)|
deejay - all a robots.txt file does is tell robots where they cannot go, and there is a standard (the Robots Exclusion Standard) for the syntax to be used. A blank file does not convey any "disallow" information, so under the standard it is treated as if nothing were off limits; it just stops a 404 error from showing up in your logs.
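For anyone who wants to check this rather than take it on faith, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It is only one parser, so it shows how a standard-following robot should read the file, not how every spider actually behaves. The helper name `allowed`, the agent string and the example.com URL are placeholders picked for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# Placeholder URL, chosen only for illustration.
URL = "http://www.example.com/page.html"

# A completely blank file carries no rules at all.
blank_ok = allowed("", "Googlebot", URL)

# For contrast: a file that really does say "disallow all".
blocked = allowed("User-agent: *\nDisallow: /", "Googlebot", URL)

print(blank_ok, blocked)
```

With this parser, the blank file lets the request through and only the explicit `Disallow: /` blocks it.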
| 8:43 am on Jun 17, 2002 (gmt 0)|
there seems to be no one answer to the question of robots.txt
It wouldn't be logical for a missing robots.txt to be interpreted as "disallow all", because a large number of web sites would then never get a chance to be spidered (people who have never heard of robots.txt, personal home pages, etc. would never make it to the web). On the other hand, that doesn't prove no spider behaves that way.
| 8:48 am on Jun 17, 2002 (gmt 0)|
|there seems to be no one answer to the question of robots.txt |
There is one answer ;) Just put a blank robots.txt file - and everything will be crawled fine. And because we are in the Google forum, the most amazingly witty, funny and intelligent GoogleGuy told us to do it!
[edited by: nutsandbolts at 10:42 am (utc) on June 17, 2002]
| 8:55 am on Jun 17, 2002 (gmt 0)|
|GoogleGuy told us to do it! |
then, I'll do it :)
| 9:54 am on Jun 17, 2002 (gmt 0)|
Here's the GoogleGuy robots.txt thread [webmasterworld.com] for those interested...
| 10:33 am on Jun 17, 2002 (gmt 0)|
*scrawling on yet another post-it and slapping it onto a corner of the screen* ok.. ya got me.. I'll put one up.
Thanks for the responses :)
I am ever more impressed with this place.
| 11:32 am on Jun 17, 2002 (gmt 0)|
I am definitely impressed with this forum :)
| 6:13 pm on Jun 17, 2002 (gmt 0)|
Along the same lines as this thread... where does the robots.txt need to reside?
Does that mean one robots.txt is recommended for each domain/server, etc.?
| 6:46 pm on Jun 17, 2002 (gmt 0)|
In the root directory of the website (same place as your initial index page)
| 6:49 pm on Jun 17, 2002 (gmt 0)|
To check your robots.txt, use Brett's handy validator:
| 1:29 pm on Jun 18, 2002 (gmt 0)|
Just curious: should the (blank) robots.txt be uploaded to the root, such as www.domainname.com/robots.txt, or to where the site lives, www.domainname.com/mysite/robots.txt, or both?
may seem like a stupid question, but it's been itching. thanks!
| 1:35 pm on Jun 18, 2002 (gmt 0)|
The robots.txt should be in the root directory. From that spot in the root, you define what the robots can and cannot access. If you want them to reach a folder within a sub-section of your site, leave it accessible; if not, disallow it.
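As a rough illustration of the "root only" rule: a robot that follows the standard asks for /robots.txt at the top of the host, whatever page it is about to fetch, so a copy under /mysite/ would not be consulted. The sketch below works out that location from any page URL; the function name `robots_url` and the index.html page are made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving `page_url`."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# A page inside the sub-folder still maps to the file in the root.
print(robots_url("http://www.domainname.com/mysite/index.html"))
# prints http://www.domainname.com/robots.txt
```

Rules for the sub-folder then go into that one root file as paths, for example a `Disallow: /mysite/private/` line.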
| 3:41 pm on Jun 18, 2002 (gmt 0)|
thanks agerhart... so a blank robots.txt will just allow all access?
| 3:45 pm on Jun 18, 2002 (gmt 0)|
It shouldn't be blank, but you shouldn't disallow access if you want them to roam free.
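If you prefer a non-blank file that still lets everyone in, the original standard has no "allow all" line; an empty `Disallow:` value is how it says nothing is off limits. A small sketch with Python's standard-library `urllib.robotparser` follows, assuming the two-line record below is what you upload. The variable names, agent strings and example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

# Explicit "everyone may go everywhere" record: any agent, nothing disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Placeholder agents and URLs, chosen only for illustration.
home_ok = parser.can_fetch("Googlebot", "http://www.example.com/")
deep_ok = parser.can_fetch("AnyOtherBot", "http://www.example.com/mysite/page.html")

print(home_ok, deep_ok)
```

As far as this parser is concerned, the result is the same as for a blank file; the explicit record just makes the intent visible to whoever opens the file later.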
| 8:33 pm on Jun 18, 2002 (gmt 0)|
ok, so now it shouldn't be blank.
what do I put in there then?
"allow all" or what?
| 8:41 pm on Jun 18, 2002 (gmt 0)|
johnd - with all due respect to agerhart, I don't think it matters if it is blank :) But Brett has an article that will tell you everything you need to know about preparing a properly-functioning robots.txt at:
| 8:52 pm on Jun 18, 2002 (gmt 0)|
I don't think I made clear what I meant, and I think it comes down to the way that you like to set yours up.
I always list out all of the important spiders and robots in my robots.txt and then specify if they have full access, partial access, or no access at all.
In my opinion, this makes it easier to change it in the future.
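A sketch of what that per-spider layout might look like, checked with Python's standard-library `urllib.robotparser`. The rules are hypothetical: apart from Googlebot, the agent names (`ExampleBot`, `BadBot`) and the /private/ path are invented for illustration and are not taken from anyone's real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one record per spider, separated by blank lines.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: ExampleBot
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def access(agent: str, path: str) -> bool:
    """Report whether `agent` may fetch `path` on a placeholder host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

results = {
    "Googlebot, full access": access("Googlebot", "/private/a.html"),
    "ExampleBot, public page": access("ExampleBot", "/index.html"),
    "ExampleBot, private page": access("ExampleBot", "/private/a.html"),
    "BadBot, no access": access("BadBot", "/index.html"),
}
for label, ok in results.items():
    print(label, ok)
```

Changing a spider's access later means editing only its own record. A spider that is not listed at all falls through to "allowed" here, since this file has no `User-agent: *` record.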
| 9:41 pm on Jun 18, 2002 (gmt 0)|
OK, cool. I want all spiders to spider everything they want, and have a good time on my sites.
I'll leave them a blank robots.txt :)
| 10:03 pm on Jun 18, 2002 (gmt 0)|
Or just drop in: