Home / Forums Index / Search Engines / Sitemaps, Meta Data, and robots.txt
Forum Library, Charter, Moderators: goodroi

Sitemaps, Meta Data, and robots.txt Forum

    
Is robots.txt always needed?
benwalsh (10+ Year Member)
Msg#: 457 posted 2:17 am on Oct 3, 2004 (gmt 0)

Should I have a robots.txt file of some description, even though I do not wish to exclude any files?

Is the simplest:

User-agent: *
Disallow:

I would just like to use something like "index" or "archive all pages", but I can find no valid positive values for this file; all the standards I have read deal only with exclusion.

I simply wish to return a robots.txt to tell all bots, especially Googlebot, that they are welcome to spider my entire site, in preference to returning my custom 404 page.

[edited by: Woz at 3:06 am (utc) on Oct. 3, 2004]
[edit reason] No URLs please, see TOS#13 [/edit]

 

hurlimann (10+ Year Member)
Msg#: 457 posted 3:07 am on Oct 3, 2004 (gmt 0)

You don't need a robots.txt, from what you say. I would advise against having one, just in case you make a mistake in it. That said, if you really want one, you could have a blank robots.txt file, or one with just this:

User-agent: *
Disallow:

While you say you don't want to exclude anything, it is a good idea to exclude the load of bots that will come along and cause pain. You can use this site's robots.txt: delete any bots you do want to crawl, and delete the end section, which is site specific.

You are right, there are no positive values such as "Allow". Here are two ways to get round it:

To exclude all files except one

The easy way is to put all files to be disallowed into a separate directory, say "docs", and leave the one file at the level above this directory:
User-agent: *
Disallow: /~joe/docs/

Alternatively, you can explicitly disallow each page to be excluded:
User-agent: *
Disallow: /~joe/private.html
Disallow: /~joe/foo.html
Disallow: /~joe/bar.html
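The second approach can be sanity-checked with Python's standard urllib.robotparser module; a quick sketch (the /~joe/ paths and example.com host are just the illustrative names from the post):

```python
from urllib import robotparser

# Rules from the "explicitly disallow each page" approach above
rules = [
    "User-agent: *",
    "Disallow: /~joe/private.html",
    "Disallow: /~joe/foo.html",
    "Disallow: /~joe/bar.html",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Listed pages are blocked; everything else stays crawlable
print(rp.can_fetch("*", "http://example.com/~joe/private.html"))  # False
print(rp.can_fetch("*", "http://example.com/~joe/other.html"))    # True
```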

fourstardragon (10+ Year Member)
Msg#: 457 posted 4:41 am on Oct 3, 2004 (gmt 0)

"I simply wish to return a robots.txt to tell all bots, especially Googlebot, that they are welcome to spider my entire site, in preference to returning my custom 404 page."

I added the minimal robots.txt file to my site just to avoid all the 404 messages in my error log. It serves no other useful purpose.

User-agent: *
Disallow:

benwalsh (10+ Year Member)
Msg#: 457 posted 12:24 am on Oct 4, 2004 (gmt 0)

I see on Google:

if you wanted to allow all filetypes to be served.... the robots.txt ... would be:

User-Agent: *
Allow: /

[google.com...]
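For what it's worth, Python's standard urllib.robotparser also understands the nonstandard Allow: line quoted above; a small sketch (example.com is a placeholder host):

```python
from urllib import robotparser

# The "allow everything" form Google documents
rp = robotparser.RobotFileParser()
rp.parse(["User-Agent: *", "Allow: /"])

# Every path is crawlable for every bot
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))  # True
print(rp.can_fetch("Googlebot", "http://example.com/docs/a.pdf"))  # True
```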

ncw164x (WebmasterWorld Senior Member, 10+ Year Member)
Msg#: 457 posted 5:59 am on Oct 4, 2004 (gmt 0)

if you wanted to allow all filetypes to be served

that stops all crawlers from accessing your site

benwalsh (10+ Year Member)
Msg#: 457 posted 6:39 am on Oct 4, 2004 (gmt 0)

User-agent: *
Disallow:

it is!
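If anyone wants to convince themselves, the empty Disallow: form parses as allow-everything in Python's standard urllib.robotparser as well (example.com is a placeholder host):

```python
from urllib import robotparser

# The minimal "welcome, everyone" file from this thread
rp = robotparser.RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# An empty Disallow value excludes nothing
print(rp.can_fetch("Googlebot", "http://example.com/any/page.html"))  # True
```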

ncw164x (WebmasterWorld Senior Member, 10+ Year Member)
Msg#: 457 posted 6:54 am on Oct 4, 2004 (gmt 0)

Correct, I should have opened my eyes first ;)

piskie (10+ Year Member)
Msg#: 457 posted 7:37 am on Oct 4, 2004 (gmt 0)

I'm with fourstardragon; it is worth having a robots.txt file purely to cut down on the 404s when checking logs.

Leosghost (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)
Msg#: 457 posted 12:23 pm on Oct 4, 2004 (gmt 0)

There was a time when search engines that couldn't find a robots.txt wouldn't spider. Last year ATW had a problem with this; it lasted about six weeks. I had one site up without a robots.txt (not deliberately, I just forgot to write the *^$)* thing, and had a sort of blindness every time I looked that meant I didn't notice it was missing). ATW came at least twice a day, requested it (yes, eventually I looked at my logs!), couldn't find it, and went away empty-handed, so to speak...

In case such things happen elsewhere, it is better to have one than to have not.

benwalsh (10+ Year Member)
Msg#: 457 posted 12:38 pm on Oct 4, 2004 (gmt 0)

Leosghost, you forgot to include your robots.txt. What do you use when you wish to have all files spidered?

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved