
Forum Moderators: goodroi


Do I need a robots.txt file for Googlebot to read?

Will I get indexed by Google if I don't have a robots.txt?

     
1:12 am on Jun 17, 2002 (gmt 0)

10+ Year Member



Googlebot hit my page a couple of days ago and requested the robots.txt file. I don't have one.

Does it have to be there? What if it's not?

thanks

1:16 am on Jun 17, 2002 (gmt 0)

10+ Year Member



Hi, and welcome to WebmasterWorld. A robots.txt does not have to be there, but it is a very good idea. See [webmasterworld.com...]
1:47 am on Jun 17, 2002 (gmt 0)

10+ Year Member



Thanks for your reply. Let me make sure I get this right: I want Googlebot (and other spiders) to index my site.

Is it better to put a robots.txt file on my server or to leave it out?

2:02 am on Jun 17, 2002 (gmt 0)

10+ Year Member



Much better to put it there, but make sure that you have the syntax right.
2:03 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Johnd - there are a few threads on this subject here. In addition to the one cited by SmallTime, check out:

[webmasterworld.com...]
[webmasterworld.com...]

2:40 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It's worth repeating - even having a blank page named robots.txt will avoid a lot of 404s.
2:48 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's the reason I finally put one up, Marcia - and, just as you suggested, it is blank. Maybe one day I'll get around to excluding someone!
2:52 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The mention of blank robots.txt files is giving me the heebies...

I've read more than once never to use a blank robots.txt as some spiders will interpret it as a 'disallow all'.

Have I been led up the garden path?

3:04 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



deejay - all a robots.txt file does is tell robots where they cannot go, and there is a standard (the Robots Exclusion Standard) for the syntax to be used. A blank file does not convey any "disallow" information; it just stops a 404 error from showing up in your logs.
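
For anyone who wants to check this rather than take it on faith, here is a minimal sketch using Python's standard-library parser. It only shows how that one parser reads an empty file; the helper name `allowed_by` and the test paths are made up for illustration, and an individual spider could in principle behave differently.

```python
from urllib.robotparser import RobotFileParser


def allowed_by(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines; an empty file yields no rules at all.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# A blank robots.txt contains no Disallow lines, so nothing is excluded.
blank_allows_home = allowed_by("", "Googlebot", "/")
blank_allows_deep = allowed_by("", "Googlebot", "/private/page.html")
print(blank_allows_home, blank_allows_deep)
```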
8:43 am on Jun 17, 2002 (gmt 0)

10+ Year Member



there seems to be no one answer to the question of robots.txt

It wouldn't be logical for a missing robots.txt to be interpreted as "disallow all", because a large number of web sites would never get a chance to be spidered (people who have never heard of robots.txt, personal home pages, etc. would never make it into the search engines). On the other hand, that doesn't mean it's not true.

8:48 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



there seems to be no one answer to the question of robots.txt

There is one answer ;) Just put up a blank robots.txt file and everything will be crawled fine. And because we are in the Google forum: the most amazingly witty, funny and intelligent GoogleGuy told us to do it!

[edited by: nutsandbolts at 10:42 am (utc) on June 17, 2002]

8:55 am on Jun 17, 2002 (gmt 0)

10+ Year Member



GoogleGuy told us to do it!

then, I'll do it :)

9:54 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Here's the GoogleGuy robots.txt thread [webmasterworld.com] for those interested...
10:33 am on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



*scrawling on yet another post-it and slapping it onto a corner of the screen* ok.. ya got me.. I'll put one up.

Thanks for the responses :)

I am ever more impressed with this place.

11:32 am on Jun 17, 2002 (gmt 0)

10+ Year Member



I am definitely impressed with this forum :)
6:13 pm on Jun 17, 2002 (gmt 0)

10+ Year Member



Hey everyone...
Along the same lines as this thread: where does the robots.txt need to reside?

Does that mean one robots.txt is recommended for each URL, server, etc.?

6:46 pm on Jun 17, 2002 (gmt 0)

10+ Year Member



In the root directory of the website (the same place as your initial index page).
6:49 pm on Jun 17, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To check your robots.txt, use Brett's handy validator:

[searchengineworld.com...]

1:29 pm on Jun 18, 2002 (gmt 0)

10+ Year Member



Just curious: should the (blank) robots.txt be uploaded to the root, such as www.domainname.com/robots.txt, or to where the site lives, www.domainname.com/mysite/robots.txt, or both?

It may seem like a stupid question, but it's been itching. Thanks!

1:35 pm on Jun 18, 2002 (gmt 0)

WebmasterWorld Senior Member agerhart is a WebmasterWorld Top Contributor of All Time 10+ Year Member



The robots.txt should be in the root directory. From there, you define what the robots can and cannot access. If you want them to reach a folder in a sub-section of your site, allow access; if not, deny the robot access.
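
As a rough illustration of "root directory": a spider looks for the file at the top of the host, whatever page it is about to fetch. The sketch below derives that location with Python's standard library. The function name `robots_url` is made up for this example, and www.domainname.com is just the placeholder host from the question.

```python
from urllib.parse import urljoin


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving `page_url`."""
    # A reference starting with "/" replaces the whole path of the base URL,
    # so any sub-folder such as /mysite/ is dropped.
    return urljoin(page_url, "/robots.txt")


print(robots_url("http://www.domainname.com/mysite/index.html"))
print(robots_url("http://www.domainname.com/"))
```

Both calls give the same root-level address, which is why a copy placed only at /mysite/robots.txt would not be found.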
3:41 pm on Jun 18, 2002 (gmt 0)

10+ Year Member



thanks agerhart... so a blank robots.txt will just allow all access?
3:45 pm on Jun 18, 2002 (gmt 0)

WebmasterWorld Senior Member agerhart is a WebmasterWorld Top Contributor of All Time 10+ Year Member



It shouldn't be blank, but you shouldn't disallow access if you want them to roam free.
8:33 pm on Jun 18, 2002 (gmt 0)

10+ Year Member



ok, so now it shouldn't be blank.

what do I put in there then?

"allow all" or what?

8:41 pm on Jun 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



johnd - with all due respect to agerhart, I don't think it matters if it is blank :) But Brett has an article that will tell you everything you need to know about preparing a properly-functioning robots.txt at:

[searchengineworld.com...]

8:52 pm on Jun 18, 2002 (gmt 0)

WebmasterWorld Senior Member agerhart is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I think I may not have been clear about what I meant, and I think it comes down to how you like to set yours up.

I always list out all of the important spiders and robots in my robots.txt and then specify if they have full access, partial access, or no access at all.

In my opinion, this makes it easier to change it in the future.
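
A sketch of what such a per-robot file could look like, checked with Python's standard-library parser. Apart from Googlebot, the robot names (PartialBot, BadBot, UnlistedBot) and the /drafts/ folder are hypothetical, and this is only one possible layout, not necessarily the one described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record per robot, separated by blank lines.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: PartialBot
Disallow: /drafts/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# For each robot, record whether it may fetch the home page and a drafts page.
access = {
    agent: (
        parser.can_fetch(agent, "/index.html"),
        parser.can_fetch(agent, "/drafts/a.html"),
    )
    for agent in ("Googlebot", "PartialBot", "BadBot", "UnlistedBot")
}

for agent, (home, drafts) in access.items():
    print(f"{agent:12} /index.html={home}  /drafts/a.html={drafts}")
```

Under this parser, Googlebot gets full access, PartialBot is kept out of /drafts/ only, BadBot is kept out of everything, and a robot that is not listed at all is unrestricted.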

9:41 pm on Jun 18, 2002 (gmt 0)

10+ Year Member



ok, cool. I want all spiders to spider everything they want, and have a good time on my sites

I'll leave them a blank robots.txt :)

10:03 pm on Jun 18, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Or just drop in:

User-agent: *
Disallow:
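
One caution with this form is that a single character changes the meaning: an empty "Disallow:" excludes nothing, while "Disallow: /" excludes the whole site. A minimal sketch with Python's standard-library parser shows the difference. The helper name `can_crawl` and the test path are made up for illustration, and other parsers may differ in edge cases.

```python
from urllib.robotparser import RobotFileParser

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # the two lines above
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # note the trailing slash


def can_crawl(robots_txt: str, path: str = "/index.html", agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


print("Disallow:   ->", can_crawl(ALLOW_ALL))
print("Disallow: / ->", can_crawl(BLOCK_ALL))
```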

 
