
Forum Moderators: goodroi


Do I need a robots.txt file for Googlebot to read?

Will I get indexed by Google if I don't have a robots.txt?

     
1:12 am on Jun 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2002
posts:66
votes: 0


Googlebot hit my page a couple of days ago and requested the robots.txt file. I don't have one.

Does it have to be there? What happens if it's not?

thanks

1:16 am on June 17, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 20, 2001
posts:478
votes: 0


Hi and welcome to WebmasterWorld. A robots.txt does not have to be there, but it is a very good idea. See [webmasterworld.com...]
1:47 am on June 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2002
posts:66
votes: 0


Thanks for your reply. Let me make sure I get this right: I want Googlebot (and other spiders) to index my site.

Is it better to put a robots.txt file on my server, or to leave it out?

2:02 am on June 17, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 20, 2001
posts:478
votes: 0


Much better to put it there, but make sure that you have the syntax right.
2:03 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2002
posts:1422
votes: 0


Johnd - there are a few threads on this subject here. In addition to the one cited by SmallTime, check out:

[webmasterworld.com...]
[webmasterworld.com...]

2:40 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member marcia is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Sept 29, 2000
posts:12095
votes: 0


It's worth repeating - even having a blank page named robots.txt will avoid a lot of 404s.
2:48 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2002
posts:1422
votes: 0


That's the reason I finally put one up, Marcia - and, just as you suggested, it is blank. Maybe one day I'll get around to excluding someone!
2:52 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 9, 2002
posts:861
votes: 0


The mention of blank robots.txt files is giving me the heebies...

I've read more than once never to use a blank robots.txt as some spiders will interpret it as a 'disallow all'.

Have I been led up the garden path?

3:04 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2002
posts:1422
votes: 0


deejay - all a robots.txt file does is tell robots where they cannot go, and there is a standard (the Robots Exclusion Standard) for the syntax to be used. A blank file does not convey any "disallow" information; it just stops a 404 error from showing up in your logs.
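
If you'd rather test that than take my word for it, here is a minimal sketch using the robots.txt parser that ships with Python. The helper name allowed_by and the example.com URLs are just made up for illustration, and this only shows how one parser reads a blank file, not how every spider out there behaves.

```python
from urllib.robotparser import RobotFileParser


def allowed_by(robots_txt: str, agent: str, url: str) -> bool:
    """Parse the given robots.txt text and ask whether agent may fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; a blank file is an empty list.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A blank robots.txt contains no records, so nothing is disallowed.
blank_ok = allowed_by("", "Googlebot", "http://www.example.com/any/page.html")
print(blank_ok)
```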
8:43 am on June 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2002
posts:66
votes: 0


there seems to be no one answer to the question of robots.txt

It wouldn't be logical for a missing robots.txt to be interpreted as "disallow all", because a large number of web sites would then never get a chance to be spidered (people who have never heard of robots.txt, personal home pages and so on would never make it into the index). On the other hand, that doesn't prove it isn't true.

8:48 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 22, 2002
posts:959
votes: 0


there seems to be no one answer to the question of robots.txt

There is one answer ;) Just put up a blank robots.txt file and everything will be crawled fine. And because we are in the Google forum: the most amazingly witty, funny and intelligent GoogleGuy told us to do it!

[edited by: nutsandbolts at 10:42 am (utc) on June 17, 2002]

8:55 am on June 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2002
posts:66
votes: 0


GoogleGuy told us to do it!

then, I'll do it :)

9:54 am on June 17, 2002 (gmt 0)

Administrator from JP 

WebmasterWorld Administrator bill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Oct 12, 2000
posts:14789
votes: 86


Here's the GoogleGuy robots.txt thread [webmasterworld.com] for those interested...
10:33 am on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 9, 2002
posts:861
votes: 0


*scrawling on yet another post-it and slapping it onto a corner of the screen* ok.. ya got me.. I'll put one up.

Thanks for the responses :)

I am ever more impressed with this place.

11:32 am on June 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2002
posts:66
votes: 0


I am definitely impressed with this forum :)
6:13 pm on June 17, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 16, 2001
posts:48
votes: 0


Hey everyone...
along the same lines of this thread... where does the robots.txt need to reside?

Does that mean that one .txt is recommended for each URL/server, etc?

6:46 pm on June 17, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Sept 20, 2001
posts:478
votes: 0


In the root directory of the website (same place as your initial index page)
6:49 pm on June 17, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2002
posts:1422
votes: 0


To check your robots.txt, use Brett's handy validator:

[searchengineworld.com...]

1:29 pm on June 18, 2002 (gmt 0)

New User

10+ Year Member

joined:June 18, 2002
posts:4
votes: 0


Just curious: should the (blank) robots.txt be uploaded to the root, such as www.domainname.com/robots.txt, or to where the site lives, such as www.domainname.com/mysite/robots.txt, or both?

may seem like a stupid question, but it's been itching. thanks!

1:35 pm on June 18, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member agerhart is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 29, 2001
posts:2945
votes: 0


The robots.txt should be in the root directory. From there, you define what the robots can and cannot access. If you want them to reach a folder in a sub-section of your site, allow access; if not, deny the robot access.
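
To make the "root directory" part concrete, here is a rough sketch of how a well-behaved robot could work out where to look, whatever page it starts from. The function name robots_url is hypothetical, and I've added http:// to the example address because the URL parser needs a scheme.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop any query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page inside /mysite/ still maps to the robots.txt at the root of the host.
print(robots_url("http://www.domainname.com/mysite/index.html"))
```

So under this sketch a copy at /mysite/robots.txt would simply never be requested.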
3:41 pm on June 18, 2002 (gmt 0)

New User

10+ Year Member

joined:June 18, 2002
posts:4
votes: 0


thanks agerhart... so a blank robots.txt will just allow all access?
3:45 pm on June 18, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member agerhart is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 29, 2001
posts:2945
votes: 0


It shouldn't be blank, but you shouldn't disallow access if you want them to roam free.
8:33 pm on June 18, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2002
posts:66
votes: 0


ok, so now it shouldn't be blank.

what do I put in there then?

"allow all" or what?

8:41 pm on June 18, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2002
posts:1422
votes: 0


johnd - with all due respect to agerhart, I don't think it matters if it is blank :) But Brett has an article that will tell you everything you need to know about preparing a properly-functioning robots.txt at:

[searchengineworld.com...]

8:52 pm on June 18, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member agerhart is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 29, 2001
posts:2945
votes: 0


I think I may not have been clear about what I meant, and I think it comes down to how you like to set yours up.

I always list out all of the important spiders and robots in my robots.txt and then specify if they have full access, partial access, or no access at all.
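
As a rough illustration of that kind of layout (not my actual file), the sketch below feeds a three-record robots.txt to Python's built-in parser. "ExampleBot" and "BadBot" are made-up robot names and example.com is a placeholder site; only the User-agent and Disallow lines come from the standard.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical per-robot layout: "ExampleBot" and "BadBot" are made-up names.
PER_ROBOT_RULES = """\
User-agent: Googlebot
Disallow:

User-agent: ExampleBot
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(PER_ROBOT_RULES.splitlines())

site = "http://www.example.com"
# Full access: an empty Disallow excludes nothing.
full_access = parser.can_fetch("Googlebot", site + "/private/page.html")
# Partial access: only paths under /private/ are excluded.
partial_blocked = parser.can_fetch("ExampleBot", site + "/private/page.html")
partial_open = parser.can_fetch("ExampleBot", site + "/index.html")
# No access: "Disallow: /" excludes every path.
no_access = parser.can_fetch("BadBot", site + "/index.html")
print(full_access, partial_blocked, partial_open, no_access)
```

Changing a robot's access later is then a one-line edit to its own record.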

In my opinion, this makes it easier to change it in the future.

9:41 pm on June 18, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 12, 2002
posts:66
votes: 0


ok, cool. I want all spiders to spider everything they want, and have a good time on my sites

I'll leave them a blank robots.txt :)

10:03 pm on June 18, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2002
posts:1422
votes: 0


Or just drop in:

User-agent: *
Disallow:
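
For anyone who wants to check what those two lines do, here is a small sketch with Python's standard robots.txt parser. The helper can_crawl and the example.com URL are my own placeholders. It compares the empty Disallow above with "Disallow: /", which is the one-character slip that shuts every robot out.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is excluded
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a lone slash: everything is excluded

page = "http://www.example.com/index.html"
allow_result = can_crawl(ALLOW_ALL, page)
block_result = can_crawl(BLOCK_ALL, page)
print(allow_result, block_result)
```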