Forum Moderators: goodroi

Is there a way to block thumbnails in robots.txt?

     
8:19 pm on Sep 14, 2017 (gmt 0)

Full Member

5+ Year Member

joined:June 26, 2008
posts:265
votes: 5


My thumbnails are in different directories, which makes this challenging. They all end in _sm.jpg. Is there a way to block images by their ending, such as _sm.jpg, no matter which directory they are in? If not, is it possible to block images by size, for example blocking any image smaller than 5 KB?
8:41 pm on Sept 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member tangor is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 29, 2005
posts:7995
votes: 578


robots.txt only provides suggestions to crawlers. It has no teeth to prevent bad actors from ignoring those suggestions.

What you want can be done in .htaccess
8:45 pm on Sept 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:10629
votes: 630


First, doing anything via robots.txt will only affect the agents that support robots.txt. All the other million or so agents that do not support robots.txt directives will continue to do what they want unless managed with alternative solutions.

But if you are only concerned with Google, Bing and Yandex, you could use a wildcard disallow:

Disallow: /*_sm.jpg
10:11 pm on Sept 14, 2017 (gmt 0)

Full Member

5+ Year Member

joined:June 26, 2008
posts:265
votes: 5


Thank you both for your replies. Is the .htaccess solution better, then? If so, how can I configure it?
10:32 pm on Sept 14, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:3555
votes: 196


The htaccess solution simply adds a noindex X-Robots-Tag header to all files in a folder (images cannot carry a meta tag). Given that your images are in several folders, the robots.txt disallow might serve you better.

Two things to help you decide which way you should go:
1. If you disallow images in robots.txt, Google may treat them as "Blocked Resources" and report that it can't determine whether a page using those Blocked Resources is Mobile Friendly or not.

2. If you use the X-Robots-Tag header method via .htaccess, you can keep all files in that folder out of the index. It would not prevent crawling of that folder. More importantly, rewrite rules in root htaccess file might not execute in those folders with an additional htaccess file.
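
As a rough sketch of the header method in point 2: assuming Apache with mod_headers enabled and header overrides permitted in .htaccess, a single block in the root .htaccess could target the _sm.jpg ending in every directory, so per-folder .htaccess files would not be needed. The _sm.jpg ending is taken from the question above; this has not been tested against the poster's server.

```apache
# Assumption: mod_headers is loaded; IfModule skips the block otherwise.
<IfModule mod_headers.c>
    # FilesMatch tests the file name only, so this applies in any subdirectory.
    <FilesMatch "_sm\.jpg$">
        # Ask compliant crawlers not to index these thumbnails.
        Header set X-Robots-Tag "noindex"
    </FilesMatch>
</IfModule>
```

A crawler only sees this header if it is allowed to fetch the file, so it would not be combined with a robots.txt Disallow covering the same URLs.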
10:51 pm on Sept 14, 2017 (gmt 0)

Full Member

5+ Year Member

joined:June 26, 2008
posts:265
votes: 5


Thank you, not2easy!
11:06 pm on Sept 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:14426
votes: 576


"rewrite rules in root htaccess file might not execute in those folders with an additional htaccess file."

That is: if and only if the additional htaccess file also contains RewriteRules without an "inherit" directive. If the sole content of your supplementary htaccess files is to set a noindex header, there is no risk.
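
To illustrate the inherit case, here is a minimal sketch of a subfolder .htaccess that has its own RewriteRule and still keeps the rules from the parent. RewriteOptions Inherit is the mod_rewrite directive in question; the rule and the file name in it are hypothetical placeholders.

```apache
# Hypothetical .htaccess inside one of the thumbnail folders.
RewriteEngine On
# Without this line, the rewrite rules in this file replace the parent's rules for this folder.
RewriteOptions Inherit
# Placeholder rule: answer 410 Gone for a made-up file name.
RewriteRule ^old_sm\.jpg$ - [G]
```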