

Sitemaps, Meta Data, and robots.txt Forum

robots.txt to block bots access to images
got to save bandwidth

 11:27 am on Feb 14, 2003 (gmt 0)


I haven't used a robots.txt file before but I am now beginning to see the light.

My website is chewing up rather a lot of bandwidth mainly due to the large amount of images, many of which are quite large in file size.

What I need is a simple robots.txt file which prevents ALL bots from accessing my image directory. Would the following be sufficient?

User-agent: *
Disallow: /images/
Disallow: /gfx/

User-agent: Googlebot-Image
Disallow: /

If so, great. But would it affect the way Google perceives the website, i.e. would the robots.txt file disadvantage the site in any way? For example, the site might be seen as smaller in terms of webspace, and Google likes big websites... (I'm probably being paranoid here).




 1:58 pm on Feb 14, 2003 (gmt 0)


The robots.txt code you posted will work - but only for robots which obey robots.txt.
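
One way to sanity-check rules like these before uploading them is to run them through a parser. The following is a minimal sketch using Python's standard-library urllib.robotparser; the bot name "SomeBot" and the test paths are made up for illustration, and the result only describes how a compliant robot should read the file, not what any particular crawler actually does.

```python
from urllib.robotparser import RobotFileParser

# The rules from the original post, pasted in as a string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /gfx/

User-agent: Googlebot-Image
Disallow: /
"""


def build_parser(text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "SomeBot" is a hypothetical robot name; it falls under the "*" group.
print(parser.can_fetch("SomeBot", "/images/photo.jpg"))
print(parser.can_fetch("SomeBot", "/index.html"))
# Googlebot-Image has its own group, which disallows the whole site.
print(parser.can_fetch("Googlebot-Image", "/index.html"))
```

If the first and third checks come back False and the second True, the file says what was intended. It still does nothing against robots that never read it.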

If your problem is "brand-name" robots, like Googlebot, then your approach will work fine. If you are also being hit by many other 'bots with no name or unfamiliar names, then you may need to use a stronger method, such as access blocks by IP address or user-agent. How you do that depends on what server your sites are hosted on.

To save bandwidth, I blocked the majority of images on my sites too, and Google didn't seem to care; I didn't notice any "penalty" for doing so, if that's what you mean. I had a lot of image-based traffic that was not very useful to me or to the visitors - most would look and leave.



 2:47 pm on Feb 14, 2003 (gmt 0)

Thanks JD - you've given me a few things to consider.

I suppose my next question is, given that some bots don't recognise (or take notice of) the robots.txt file, what would I achieve from using the file? Would it actually make any significant difference to my bandwidth usage?


 3:00 pm on Feb 14, 2003 (gmt 0)


Only a thorough analysis of the log files from your site(s) can answer the question of whether this will help or not.
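
As a starting point for that kind of analysis, here is a rough Python sketch that totals the bytes served from the image directories per user-agent. It assumes the server writes the common "combined" access log format, and the sample lines, IP addresses and sizes are invented for illustration; a real log may need a different pattern.

```python
import re
from collections import Counter

# Assumed layout: combined log format
# host ident user [time] "METHOD path PROTO" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]*\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Directories named in the robots.txt from the original post.
IMAGE_PREFIXES = ("/images/", "/gfx/")


def image_bytes_by_agent(lines):
    """Sum response sizes for image-directory requests, keyed by user-agent."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or match["size"] == "-":
            continue  # skip lines that do not parse or carry no byte count
        if match["path"].startswith(IMAGE_PREFIXES):
            totals[match["agent"]] += int(match["size"])
    return totals


# Hypothetical sample lines, not taken from any real log.
SAMPLE = [
    '192.0.2.1 - - [14/Feb/2003:11:27:00 +0000] "GET /images/big.jpg HTTP/1.0" 200 500000 "-" "Googlebot-Image/1.0"',
    '192.0.2.2 - - [14/Feb/2003:11:28:00 +0000] "GET /index.html HTTP/1.0" 200 4000 "-" "Mozilla/4.0"',
    '192.0.2.1 - - [14/Feb/2003:11:29:00 +0000] "GET /gfx/logo.gif HTTP/1.0" 200 20000 "-" "Googlebot-Image/1.0"',
]

for agent, total in image_bytes_by_agent(SAMPLE).most_common():
    print(agent, total)
```

Comparing these totals with the overall bytes served gives a first idea of how much bandwidth robots that honour robots.txt are actually responsible for.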

As I stated, I use robots.txt to block robots which will obey it, and stronger methods for those which won't. A search for "hot-linking", "image blocking" and similar subjects here on WebmasterWorld will turn up quite a few threads which may be useful to you.

On some sites, the impact of SE spiders requesting images is small in relation to overall traffic. On others, it may represent a significant load. Only the webmaster can determine the value of efforts to block image access by spiders.

On my sites where the load is significant, I use mod_rewrite in .htaccess to block robots which do not obey robots.txt, as well as blocking off-site referrers (hot-linking).
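
The decision such rules make can be modelled in a few lines. This is a Python sketch of the logic only, not Apache configuration and not the actual rules used on those sites: the host names, the "badbot" substring and the choice to let requests with a blank referrer through are all assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical values; substitute your own hosts and unwanted agent names.
OWN_HOSTS = {"example.com", "www.example.com"}
BLOCKED_AGENT_SUBSTRINGS = ("badbot",)
IMAGE_PREFIXES = ("/images/", "/gfx/")


def should_block(path, user_agent, referrer):
    """Return True if an image request should be refused."""
    if not path.startswith(IMAGE_PREFIXES):
        return False  # only image directories are protected
    agent = user_agent.lower()
    if any(name in agent for name in BLOCKED_AGENT_SUBSTRINGS):
        return True  # robot known to ignore robots.txt
    if referrer and referrer != "-":
        # A referrer from another host means the image is hot-linked.
        # A referrer that cannot be parsed into a host is also refused.
        if urlparse(referrer).hostname not in OWN_HOSTS:
            return True
    return False  # blank referrers pass (assumption: some browsers and proxies omit them)


print(should_block("/images/a.jpg", "BadBot/2.1", "-"))
print(should_block("/images/a.jpg", "Mozilla/4.0", "http://other.example.net/page"))
print(should_block("/images/a.jpg", "Mozilla/4.0", "http://www.example.com/gallery"))
```

In mod_rewrite the same tests would be expressed as conditions on the user-agent and referrer request headers, followed by a rule that refuses the request.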



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved