|robots.txt to block bots access to images |
got to save bandwidth
| 11:27 am on Feb 14, 2003 (gmt 0)|
I haven't used a robots.txt file before but I am now beginning to see the light.
My website is chewing up rather a lot of bandwidth, mainly due to the large number of images, many of which are quite large in file size.
What I need is a simple robots.txt file which prevents ALL bots from accessing my image directory. Would the following be sufficient?
And if so, great - but would it affect the way Google perceives the website, i.e. would the robots.txt file disadvantage the site in any way? For example, the site might be seen as smaller in terms of webspace - and Google likes big websites... (I'm probably being paranoid here).
| 1:58 pm on Feb 14, 2003 (gmt 0)|
The robots.txt code you posted will work - but only for robots which obey robots.txt.
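To make the "only for robots which obey robots.txt" point concrete, here is a minimal Python sketch of the check a well-behaved robot performs before fetching a URL. The rule set shown is an assumption for illustration (a blanket block on a directory named `/images/`, with `www.example.com` as a placeholder host), not necessarily the exact code from the original post. The helper name `allowed` is also made up for this example. It relies only on the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: "/images/" is an assumed directory name.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a compliant robot may fetch `url` under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The image URL is disallowed for every user-agent; ordinary pages are not.
    print(allowed("Googlebot", "http://www.example.com/images/photo.jpg"))  # False
    print(allowed("Googlebot", "http://www.example.com/index.html"))        # True
```

Nothing forces a robot to run a check like this, which is why robots.txt only helps against the robots that choose to honour it.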
If your problem is "brand-name" robots, like Googlebot, then your approach will work fine. If you are also being hit by many other 'bots with no name or unfamiliar names, then you may need to use a stronger method, such as access blocks by IP address or user-agent. How you do that depends on what server your sites are hosted on.
To save bandwidth, I blocked the majority of images on my sites too, and Google didn't seem to care; I didn't notice any "penalty" for doing so, if that's what you mean. I had a lot of image-based traffic that was not very useful to me or to the visitors - most would look and leave.
| 2:47 pm on Feb 14, 2003 (gmt 0)|
Thanks JD - you've given me a few things to consider.
I suppose my next question is: given that some bots don't recognise (or take notice of) the robots.txt file, what would I achieve by using the file? Would it actually make any significant difference to my bandwidth usage?
| 3:00 pm on Feb 14, 2003 (gmt 0)|
Only a thorough analysis of the log files from your site(s) can answer the question of whether this will help or not.
As I stated, I use robots.txt to block robots which will obey it, and stronger methods for those which won't. A search for "hot-linking", "image blocking" and similar subjects here on WebmasterWorld will turn up quite a few threads which may be useful to you.
On some sites, the impact of search engine (SE) spiders requesting images is small in relation to overall traffic. On others, it may represent a significant load. Only the webmaster can determine whether the effort to block image access by spiders is worthwhile.
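As a rough starting point for that kind of log analysis, the sketch below totals the bytes served for image requests per user-agent. It assumes the server writes the common Apache "combined" log format and that images can be recognised by file extension. The function name, the extension list and the sample lines are all made up for illustration, so adjust them to match your own logs.

```python
import re
from collections import defaultdict

# Assumed log layout: Apache "combined" format.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
# Assumed set of image extensions.
IMAGE_EXT = (".jpg", ".jpeg", ".gif", ".png")


def image_bytes_by_agent(lines):
    """Sum response bytes for image requests, keyed by user-agent string."""
    totals = defaultdict(int)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        path = match.group("path").split("?", 1)[0].lower()
        if not path.endswith(IMAGE_EXT):
            continue
        size = match.group("size")
        totals[match.group("agent")] += 0 if size == "-" else int(size)
    return dict(totals)


if __name__ == "__main__":
    # Invented sample lines; in practice, pass an open log file instead.
    sample = [
        '192.0.2.1 - - [14/Feb/2003:11:27:00 +0000] "GET /images/a.jpg HTTP/1.0" 200 50000 "-" "BotA/1.0"',
        '192.0.2.2 - - [14/Feb/2003:11:28:00 +0000] "GET /index.html HTTP/1.0" 200 4000 "-" "BotA/1.0"',
    ]
    for agent, total in sorted(image_bytes_by_agent(sample).items()):
        print(agent, total)
```

Comparing these totals with the overall bytes served shows whether spiders account for a meaningful share of the image traffic, and which user-agents are responsible.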
On my sites where the load is significant, I use mod_rewrite in .htaccess to block robots which do not obey robots.txt, as well as blocking off-site referrers (hot-linking).
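The decision those rules make can be sketched in a few lines of Python. This is not mod_rewrite syntax and not the actual rules used on those sites, only the same logic written out for illustration: refuse image requests from listed user-agents, and refuse image requests whose referrer is another site. The host names, user-agent substrings and function name are hypothetical. Allowing a blank referrer is a design choice assumed here, since some browsers and proxies do not send one.

```python
from urllib.parse import urlparse

# All values below are illustrative assumptions, not taken from the thread.
OWN_HOSTS = {"example.com", "www.example.com"}
BLOCKED_AGENT_SUBSTRINGS = ("badbot", "imagegrabber")
IMAGE_EXT = (".jpg", ".jpeg", ".gif", ".png")


def should_block(path: str, user_agent: str, referer: str) -> bool:
    """Return True if an image request should be refused."""
    if not path.lower().endswith(IMAGE_EXT):
        return False  # only image requests are restricted in this sketch
    agent = user_agent.lower()
    if any(name in agent for name in BLOCKED_AGENT_SUBSTRINGS):
        return True  # a robot known to ignore robots.txt
    if referer and referer != "-":
        host = (urlparse(referer).hostname or "").lower()
        if host not in OWN_HOSTS:
            return True  # off-site referrer, i.e. hot-linking
    return False  # blank referrer or our own pages: allow


if __name__ == "__main__":
    print(should_block("/images/a.jpg", "Mozilla/4.0", "http://other.example.net/page.html"))
    print(should_block("/images/a.jpg", "Mozilla/4.0", "http://www.example.com/gallery.html"))
```

On an Apache server, the equivalent checks would be written as rewrite conditions on the user-agent and referrer headers; the exact rules depend on the server setup.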