
Forum Moderators: goodroi


Do we need robots.txt?

     

irock

9:54 pm on Jun 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I realize I don't have a robots.txt on my site. Is it important to have that file in the root directory?

brotherhood of LAN

9:56 pm on Jun 5, 2002 (gmt 0)

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Hi irock,

the short answer is no.

Beachboy

9:57 pm on Jun 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not unless you need to issue a specific directive to a spider, usually having to do with excluding certain areas of the site from spidering. If not, then don't worry.

hurlimann

9:58 pm on Jun 5, 2002 (gmt 0)

10+ Year Member



No. The file is only important if you want to tell the spiders and others to do something not normal.

buckworks

10:13 pm on Jun 5, 2002 (gmt 0)

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member



On the other hand if spiders look for robots.txt and don't find it, that would trigger your custom error page if you had one. You might save bandwidth (and also take a few errors out of your logs) by having a really basic robots.txt for them to find.

MHes

11:37 pm on Jun 5, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I disagree with all of you! I think it is important...
Any feature within a site that makes that site look more complete and competent to a spider has got to be good news. Too many sites have errors and poor HTML, and having a robots.txt file, even if it has no specific function, will make your site stand out from the crowd and might just give you a bonus point, however small that bonus may be.

ferrari360

1:31 am on Jun 6, 2002 (gmt 0)

10+ Year Member



Having a robots.txt file doesn't give you any ranking boost. However, if you do have a robots.txt file, make sure that it is properly configured; otherwise you might find yourself blocking spiders that you want crawling your site.

robots.txt validator:
[searchengineworld.com...]

pageoneresults

2:42 am on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



I look at the robots.txt file as an essential part of the package. It's like the keywords meta tag: do ya or don't ya?

I was tired of seeing all those 404 errors in my logs for the robots.txt file. So I went on a quest over a year ago to learn everything I could. Now, one of the first things I do is set up the robots.txt and disallow directories that contain working files, CSS, JavaScript and any other content that I don't want indexed.
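For what it's worth, a minimal file along those lines might look like the sketch below — the directory names are only placeholders for whatever working folders you actually keep:

```
User-agent: *
Disallow: /working/
Disallow: /css/
Disallow: /js/
```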

There have been many conversations on this topic. I've seen comments to the effect that the spider called the robots.txt file, didn't find it, and left without grabbing anything else. Followup comments stated that the spider had not been back since the first call for the robots.txt file. What does this mean? I'm not really sure. Although, I'm one to play it safe. If that robots.txt contains nothing other than...

User-agent: *
Disallow:

...which tells all spiders that they are welcome to index the entire site, then so be it! I kind of looked at it this way...

They came a knockin' and no one was home (no robots.txt file), so they left. They didn't say when they would return so I missed them, that first time (bummer). I've now put the robots.txt in place. They came a knockin' again one month later, I was home and let them in. They got what they came for!

How fond am I of the robots.txt file? Do a search in Google for robots text or robots text file!
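If you want to sanity-check that reading of the allow-all file, Python's standard urllib.robotparser will parse the rules directly (no fetching involved) — a quick sketch:

```python
from urllib.robotparser import RobotFileParser

# Feed the allow-all rules straight to the parser
# instead of fetching them from a live site.
rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])

# An empty Disallow line blocks nothing, so any URL is fetchable.
print(rp.can_fetch("AnySpider", "/any/page.html"))
```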

ciml

12:06 pm on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member



If you have no /robots.txt file, and your server is configured to send 403 "Forbidden" for files that don't exist (a bad move IMO), then you will not be spidered by Google. In that case, you need to fix the server or upload a /robots.txt file to allow spidering (even just an empty one).
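To sum up that status-code logic in a sketch — the function name here is made up, but the mapping matches how Python's urllib.robotparser treats the HTTP response when it fetches /robots.txt:

```python
def robots_txt_policy(status_code):
    """Hypothetical helper: what a polite crawler does after
    requesting /robots.txt, given the HTTP status it got back."""
    if status_code in (401, 403):
        return "disallow-all"   # "Forbidden" reads as: stay out entirely
    if 400 <= status_code < 500:
        return "allow-all"      # missing file (404) = no restrictions
    return "parse-body"         # 200: obey whatever the file says

print(robots_txt_policy(403))
print(robots_txt_policy(404))
print(robots_txt_policy(200))
```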

chris_f

12:10 pm on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm in the must-have boat. Every one of my sites has a robots.txt file, even though it just tells the spiders that the entire site is open to them. For instance, see below:

User-agent: *
Disallow:

conor

3:42 pm on Jun 6, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I do the same as Chris_F: always, always, always have a robots.txt.

If you have spent time and money building a site, why not spend the extra ten seconds to add a robots.txt file, even if all it does is allow all!

willtell

9:21 pm on Jun 6, 2002 (gmt 0)

10+ Year Member



Ciml, could you elaborate on your comment about sending 403's. We are going to stop certain spam sites at the network card level.

DrOliver

7:57 am on Jun 7, 2002 (gmt 0)

10+ Year Member



You can even create a completely empty text file, call it robots.txt, and make sure it's in the root (as www.yoursite.com/robots.txt). Spiders will treat an empty file as permission to spider the whole site, just like the allow-all file quoted above. No more 404s.

NotNervous

4:44 pm on Jun 7, 2002 (gmt 0)

10+ Year Member



We've never used robots.txt files on any of our sites and have never had a problem. Our pages that don't have incoming links are never listed anyway and we have no complaints about being overlooked during regular updates. Frankly, I can't see the use of these files, except maybe in very unusual circumstances.

ciml

1:57 pm on Jun 8, 2002 (gmt 0)

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member



willtell, the 403 problem happens when the server default is set to use 403 Forbidden for files that are not found, instead of 404 Not Found.

It isn't very common, but I have been seeing it quite a lot over the last few months. I don't know whether this is done by server admins trying to be more secure or if it's the default for some kind of Apache set-up.

The solution is easy, just upload a blank /robots.txt as DrOliver suggests.

GoogleGuy

8:30 pm on Jun 9, 2002 (gmt 0)

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member



What ciml said. Best not to have a 403 returned if someone tries to fetch your robots.txt file.

jady

1:26 am on Jun 10, 2002 (gmt 0)

10+ Year Member



Is it advisable to have your robots.txt block /java and /cgi-bin? This is what I do and haven't run into any problems.

Brett_Tabke

8:30 am on Jun 10, 2002 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I realize I don't have a robots.txt on my site. Is it important to have that file in the root directory?

As others have mentioned, no. A robots.txt is for the blocking of robots that obey the standard.

Some leave it intentionally missing so that it will go 404 and show in error logs. That way, you can identify obeying spiders easily enough.

Although this is common and acceptable to standard:

User-agent: *
Disallow:

I wouldn't recommend it. There are some spiders that will incorrectly interpret that as blocking all content.

Ya jady, block anything you think is sensitive. I think cgi-bin and java would qualify for that.
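If you do block directories like that, it's worth double-checking the rules before uploading them. A sketch with Python's urllib.robotparser, using the paths discussed above:

```python
from urllib.robotparser import RobotFileParser

# Rules blocking the two directories mentioned in the thread.
rules = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /java/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Blocked path is refused; everything else stays fetchable.
print(rp.can_fetch("AnySpider", "/cgi-bin/search.cgi"))
print(rp.can_fetch("AnySpider", "/index.html"))
```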

pageoneresults

8:40 am on Jun 10, 2002 (gmt 0)

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member



User-agent: *
Disallow:

> I wouldn't recommend it. There are some spiders that will incorrectly interpret that as blocking all content.

Ouch! I guess I need to rewrite all my instructions on creating a robots text file. Brett, do you know which spiders misinterpret the above?

rrl

11:48 am on Jun 10, 2002 (gmt 0)

10+ Year Member



Funny, I tried that robots.txt validator and it just told me every line of my site is invalid. I haven't found one of these validators to work yet.

jady

12:17 am on Jun 11, 2002 (gmt 0)

10+ Year Member



Thanks (yet again) Brett.. :)

For the other guys - I go with what people are posting in the forum. If you can avoid a 404 error page, you will be better off.

 
