
Forum Moderators: goodroi


Do we need robots.txt?

     
9:54 pm on Jun 5, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 2, 2002
posts:1099
votes: 0


I realize I don't have a robots.txt on my site. Is it important to have that file in the root directory?
9:56 pm on June 5, 2002 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator brotherhood_of_lan is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 30, 2002
posts:4842
votes: 1


Hi irock,

the short answer is no.

9:57 pm on June 5, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Jan 4, 2002
posts:1068
votes: 0


Not unless you need to issue a specific directive to a spider, usually having to do with excluding certain areas of the site from spidering. If not, then don't worry.
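For instance, a minimal robots.txt that keeps compliant spiders out of one area (the /private/ directory here is just a placeholder) looks like this:

User-agent: *
Disallow: /private/

Everything not matched by a Disallow line stays open to spidering.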
9:58 pm on June 5, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Mar 29, 2002
posts:558
votes: 0


No. The file only matters if you want to tell the spiders to do something other than their default behavior.
10:13 pm on June 5, 2002 (gmt 0)

Moderator

WebmasterWorld Administrator buckworks is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Dec 9, 2001
posts:5609
votes: 19


On the other hand if spiders look for robots.txt and don't find it, that would trigger your custom error page if you had one. You might save bandwidth (and also take a few errors out of your logs) by having a really basic robots.txt for them to find.
11:37 pm on June 5, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 5, 2002
posts:1378
votes: 0


I disagree with all of you! I think it is important.....
Any feature within a site that makes that site look more complete and competent to a spider has got to be good news. Too many sites have errors and poor HTML, and having a robots.txt file, even if it has no specific function, will make your site stand out from the crowd and might just give you a bonus point, however small that bonus may be.
1:31 am on June 6, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:May 3, 2002
posts:89
votes: 0


Having a robots.txt file doesn't give you any ranking boost. However, if you do have one, make sure it is properly configured; otherwise you might find yourself blocking spiders that you want to crawl your site.

robots.txt validator:
[searchengineworld.com...]
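Worth checking carefully when configuring it: a single character separates allowing everything from blocking everything. This:

User-agent: *
Disallow: /

blocks the entire site from all compliant spiders, while the same file with the slash removed (Disallow: with no path) allows everything.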

2:42 am on June 6, 2002 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts:12166
votes: 51


I look at the robots.txt file as an essential part of the package. It's like the keywords meta tag: do ya? or don't ya?

I was tired of seeing all those 404 errors in my logs for the robots.txt file, so I went on a quest over a year ago to learn everything I could. Now one of the first things I do is set up the robots.txt and disallow directories that contain working files, CSS, JavaScript and any other content that I don't want indexed.
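As a rough sketch (the directory names are hypothetical), such a file might read:

User-agent: *
Disallow: /working/
Disallow: /css/
Disallow: /js/

Spiders that obey the standard will then skip those directories while indexing the rest of the site.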

There have been many conversations on this topic. I've seen comments to the effect that the spider called the robots.txt file, didn't find it, and left without grabbing anything else. Followup comments stated that the spider had not been back since the first call for the robots.txt file. What does this mean? I'm not really sure. Although, I'm one to play it safe. If that robots.txt contains nothing other than...

User-agent: *
Disallow:

...which tells all spiders that they are welcome to index the entire site, then so be it! I kind of looked at it this way...

They came a knockin' and no one was home (no robots.txt file), so they left. They didn't say when they would return so I missed them, that first time (bummer). I've now put the robots.txt in place. They came a knockin' again one month later, I was home and let them in. They got what they came for!

How fond am I of the robots.txt file? Do a search in Google for robots text or robots text file!

12:06 pm on June 6, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


If you have no /robots.txt file, and your server is configured to send 403 "Forbidden" for files that don't exist (a bad move IMO), then you will not be spidered by Google. In that case, you need to fix the server or upload a /robots.txt file to allow spidering (even just an empty one).
12:10 pm on June 6, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 5, 2002
posts:1510
votes: 0


I'm in the must-have boat. Every one of my sites has a robots.txt file, even if it just tells the spiders that the entire site is open to them. For instance, see below:

User-agent: *
Disallow:

3:42 pm on June 6, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 22, 2001
posts:781
votes: 0


I do the same as Chris_F: always, always, always have a robots.txt.

If you have spent time and money building a site, why not spend the extra ten seconds to add a robots.txt file, even if all it does is allow all!

9:21 pm on June 6, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 16, 2001
posts:112
votes: 0


Ciml, could you elaborate on your comment about sending 403s? We are going to stop certain spam sites at the network card level.
7:57 am on June 7, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 5, 2002
posts:142
votes: 0


You can even create a completely empty text file, call it robots.txt and make sure it's in the root (as www.yoursite.com/robots.txt). Spiders will treat an empty file as permission to spider the whole site, just as the allow-all file above does. No 404s anymore.
4:44 pm on June 7, 2002 (gmt 0)

New User

10+ Year Member

joined:Apr 5, 2002
posts:29
votes: 0


We've never used robots.txt files on any of our sites and have never had a problem. Our pages that don't have incoming links are never listed anyway and we have no complaints about being overlooked during regular updates. Frankly, I can't see the use of these files, except maybe in very unusual circumstances.
1:57 pm on June 8, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


willtell, the 403 problem happens when the server default is set to return 403 Forbidden for files that are not found, instead of 404 Not Found.

It isn't very common, but I have been seeing it quite a lot over the last few months. I don't know whether this is done by server admins trying to be more secure or whether it's the default for some kind of Apache set-up.

The solution is easy, just upload a blank /robots.txt as DrOliver suggests.

8:30 pm on June 9, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member googleguy is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Oct 8, 2001
posts:2882
votes: 0


What ciml said. Best not to have a 403 returned if someone tries to fetch your robots.txt file.
1:26 am on June 10, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 9, 2002
posts:426
votes: 0


Is it advisable to have your robots.txt block /java and /cgi-bin? This is what I do and I haven't run into any problems.
8:30 am on June 10, 2002 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38047
votes: 11


I realize I don't have a robots.txt on my site. Is it important to have that file in the root directory?

As others have mentioned, no. A robots.txt file is for blocking robots that obey the standard.

Some leave it intentionally missing so that it will go 404 and show up in the error logs. That way, you can identify obeying spiders easily enough.

Although this is common and acceptable under the standard:

User-agent: *
Disallow:

I wouldn't recommend it. There are some spiders that will incorrectly interpret that as blocking all content.

Ya jady, block anything you think is sensitive. I think cgi-bin and java would qualify for that.
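Spelled out, blocking those two directories for all compliant spiders would be:

User-agent: *
Disallow: /cgi-bin/
Disallow: /java/

The trailing slashes are a common convention; Disallow matches by URL prefix, so a bare Disallow: /cgi-bin would also block a file named /cgi-bin.html.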

8:40 am on June 10, 2002 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member pageoneresults is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 27, 2001
posts:12166
votes: 51


User-agent: *
Disallow:

> I wouldn't recommend it. There are some spiders that will incorrectly interpret that as blocking all content.

Ouch! I guess I need to rewrite all my instructions on creating a robots text file. Brett, do you know which spiders misinterpret the above?

rrl

11:48 am on June 10, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 25, 2002
posts:127
votes: 0


Funny, I tried that robots.txt validator and it just told me every line of my file is invalid. I haven't found one of these validators that works yet.
12:17 am on June 11, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Feb 9, 2002
posts:426
votes: 0


Thanks (yet again) Brett.. :)

For the other guys - I go with what people are posting in the forum. If you can avoid a 404 error, you will be better off.

 
