
Sitemaps, Meta Data, and robots.txt Forum

    
Do I need a robots.txt file for Googlebot to read?
Will I get indexed by Google if I don't have a robots.txt?
johnd

10+ Year Member



 
Msg#: 113 posted 1:12 am on Jun 17, 2002 (gmt 0)

Googlebot hit my page a couple of days ago and requested the robots.txt file. I don't have one.

Does it have to be there? What if it's not?

thanks

 

SmallTime

10+ Year Member



 
Msg#: 113 posted 1:16 am on Jun 17, 2002 (gmt 0)

Hi, and welcome to WebmasterWorld. robots.txt does not have to be there, but it is a very good idea. See [webmasterworld.com...]

johnd

10+ Year Member



 
Msg#: 113 posted 1:47 am on Jun 17, 2002 (gmt 0)

thanks for your reply. let me make sure I get this right. I want to make sure that googlebot (and other spiders) index my site.

is it better to put a robots.txt file on my server, or to leave it out?

SmallTime

10+ Year Member



 
Msg#: 113 posted 2:02 am on Jun 17, 2002 (gmt 0)

Much better to put it there, but make sure that you have the syntax right.

Mardi_Gras

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 2:03 am on Jun 17, 2002 (gmt 0)

Johnd - there are a few threads on this subject here. In addition to the one cited by SmallTime, check out:

[webmasterworld.com...]
[webmasterworld.com...]

Marcia

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 113 posted 2:40 am on Jun 17, 2002 (gmt 0)

It's worth repeating - even having a blank page named robots.txt will avoid a lot of 404s.

Mardi_Gras

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 2:48 am on Jun 17, 2002 (gmt 0)

That's the reason I finally put one up, Marcia - and, just as you suggested, it is blank. Maybe one day I'll get around to excluding someone!

deejay

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 2:52 am on Jun 17, 2002 (gmt 0)

The mention of blank robots.txt files is giving me the heebies...

I've read more than once never to use a blank robots.txt as some spiders will interpret it as a 'disallow all'.

Have I been led up the garden path?

Mardi_Gras

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 3:04 am on Jun 17, 2002 (gmt 0)

deejay - all a robots.txt file does is tell robots where they cannot go, and there is a standard (robot exclusion standard) for the language to be used. A blank page does not convey any "disallow" information; it just stops a 404 error from showing up in your logs.
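As a rough check of the point above, here is a minimal sketch using Python's standard-library `urllib.robotparser`. It only shows how that one parser treats an empty file; it says nothing about how any particular spider behaves. The `allowed` helper and the example.com URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

# A blank file contains no records, so there is nothing to disallow.
blank_allows = allowed("", "Googlebot", "http://www.example.com/any/page.html")
print(blank_allows)
```

With this parser the blank file should come back as "allowed", which matches the reading that an empty robots.txt carries no "disallow" information.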

johnd

10+ Year Member



 
Msg#: 113 posted 8:43 am on Jun 17, 2002 (gmt 0)

there seems to be no one answer to the question of robots.txt

it wouldn't be logical for a missing robots.txt to be interpreted as disallow all, because a large number of web sites would never get a chance to be spidered (sites run by people who have never heard of robots.txt, personal home pages, etc. would never make it into the engines). on the other hand, that doesn't mean it's not true.

nutsandbolts

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 8:48 am on Jun 17, 2002 (gmt 0)

Quote: "there seems to be no one answer to the question of robots.txt"

There is one answer ;) Just put up a blank robots.txt file and everything will be crawled fine. And because we are in the Google forum, the most amazingly witty, funny and intelligent GoogleGuy told us to do it!

[edited by: nutsandbolts at 10:42 am (utc) on June 17, 2002]

johnd

10+ Year Member



 
Msg#: 113 posted 8:55 am on Jun 17, 2002 (gmt 0)

Quote: "GoogleGuy told us to do it!"

then, I'll do it :)

bill

WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 113 posted 9:54 am on Jun 17, 2002 (gmt 0)

Here's the GoogleGuy robots.txt thread [webmasterworld.com] for those interested...

deejay

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 10:33 am on Jun 17, 2002 (gmt 0)

*scrawling on yet another post-it and slapping it onto a corner of the screen* ok.. ya got me.. I'll put one up.

Thanks for the responses :)

I am ever more impressed with this place.

johnd

10+ Year Member



 
Msg#: 113 posted 11:32 am on Jun 17, 2002 (gmt 0)

I am definitely impressed with this forum :)

wangdy

10+ Year Member



 
Msg#: 113 posted 6:13 pm on Jun 17, 2002 (gmt 0)

Hey everyone...
along the same lines of this thread... where does the robots.txt need to reside?

Does that mean that one .txt is recommended for each URL/server, etc?

SmallTime

10+ Year Member



 
Msg#: 113 posted 6:46 pm on Jun 17, 2002 (gmt 0)

In the root directory of the website (same place as your initial index page)

Mardi_Gras

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 6:49 pm on Jun 17, 2002 (gmt 0)

To check your robots.txt, use Brett's handy validator:

[searchengineworld.com...]

josh

10+ Year Member



 
Msg#: 113 posted 1:29 pm on Jun 18, 2002 (gmt 0)

just curious, should the (blank) robots.txt be uploaded to the root, such as www.domainname.com/robots.txt, or to where the site lives, such as www.domainname.com/mysite/robots.txt, or both?

may seem like a stupid question, but it's been itching. thanks!

agerhart

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 113 posted 1:35 pm on Jun 18, 2002 (gmt 0)

The robots.txt should be in the root directory. From there, you define what the robots can and cannot access. If you want them to reach a folder in a sub-section of your site, allow access; if not, deny the robot access.
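To make the "root directory" point concrete, here is a small Python sketch that works out where a robot would look for robots.txt given any page URL on a site. The `robots_url` function is a hypothetical helper, and www.domainname.com is just the placeholder host from the question above.

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the root-level robots.txt URL for the host serving `page_url`."""
    parts = urlsplit(page_url)
    # Keep scheme and host; replace the whole path with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("http://www.domainname.com/mysite/index.html"))
```

In this sketch a page under /mysite/ still maps to the file at the root of the host, so a copy placed only at /mysite/robots.txt would not be the one requested.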

josh

10+ Year Member



 
Msg#: 113 posted 3:41 pm on Jun 18, 2002 (gmt 0)

thanks agerhart... so a blank robots.txt will just allow all access?

agerhart

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 113 posted 3:45 pm on Jun 18, 2002 (gmt 0)

It shouldn't be blank, but you shouldn't disallow access if you want them to roam free.

johnd

10+ Year Member



 
Msg#: 113 posted 8:33 pm on Jun 18, 2002 (gmt 0)

ok, so now it shouldn't be blank.

what do I put in there then?

"allow all" or what?

Mardi_Gras

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 8:41 pm on Jun 18, 2002 (gmt 0)

johnd - with all due respect to agerhart, I don't think it matters if it is blank :) But Brett has an article that will tell you everything you need to know about preparing a properly-functioning robots.txt at:

[searchengineworld.com...]

agerhart

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



 
Msg#: 113 posted 8:52 pm on Jun 18, 2002 (gmt 0)

I think I may not have made clear what I meant, and I think it comes down to the way that you like to set yours up.

I always list out all of the important spiders and robots in my robots.txt and then specify if they have full access, partial access, or no access at all.

In my opinion, this makes it easier to change it in the future.
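A per-spider file of the kind described above might look like the sketch below, checked with Python's standard-library `urllib.robotparser`. The robot names ExampleBot and BadBot, the /private/ folder, and the example.com host are all invented for illustration; this is not agerhart's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical file: one record per robot, records separated by blank lines.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: ExampleBot
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def check(agent: str, path: str) -> bool:
    """Ask the parser whether `agent` may fetch `path` on the example host."""
    return parser.can_fetch(agent, "http://www.example.com" + path)

# Full access, partial access, and no access, in that order.
print("Googlebot  /private/notes.html:", check("Googlebot", "/private/notes.html"))
print("ExampleBot /private/notes.html:", check("ExampleBot", "/private/notes.html"))
print("ExampleBot /index.html:        ", check("ExampleBot", "/index.html"))
print("BadBot     /index.html:        ", check("BadBot", "/index.html"))
```

Listing each robot in its own record keeps later changes local: tightening or loosening one spider's access means editing only that record.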

johnd

10+ Year Member



 
Msg#: 113 posted 9:41 pm on Jun 18, 2002 (gmt 0)

ok, cool. I want all spiders to spider everything they want, and have a good time on my sites

I'll leave them a blank robots.txt :)

Mardi_Gras

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 113 posted 10:03 pm on Jun 18, 2002 (gmt 0)

Or just drop in:

User-agent: *
Disallow:
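The two-line file above is easy to confuse with its opposite, since only a single slash separates "allow everything" from "block everything". The sketch below compares the two using Python's standard-library `urllib.robotparser`; the `allowed` helper and the example.com URL are assumptions made for illustration, and real spiders may differ from this parser.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)

ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty Disallow value: nothing is blocked
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # a single slash: the whole site is blocked

url = "http://www.example.com/page.html"
open_result = allowed(ALLOW_ALL, "Googlebot", url)
closed_result = allowed(BLOCK_ALL, "Googlebot", url)
print("empty Disallow:", open_result)
print("Disallow: /   :", closed_result)
```

This is the syntax slip worth double-checking before uploading the file.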
