Forum Moderators: goodroi


Is there a way to block thumbnails in robots.txt?

         

virtualreality

8:19 pm on Sep 14, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



My thumbnails are in different directories, which makes this challenging, but they all end in _sm.jpg. Is there a way to block images by filename ending, such as _sm.jpg, no matter which directory they are in? If not, is it possible to block images by size, for example blocking any image smaller than 5 KB?

tangor

8:41 pm on Sep 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



robots.txt only provides suggestions to crawlers. It has no teeth to prevent bad actors from ignoring those suggestions.

What you want can be done in .htaccess

keyplyr

8:45 pm on Sep 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



First, anything you do via robots.txt will only affect the agents that support robots.txt. All the other million or so agents that do not support robots.txt directives will continue to do what they want unless they are managed with alternative solutions.

But if you are only concerned with Google, Bing and Yandex, you could use a wildcard disallow:

Disallow: /*_sm.jpg

virtualreality

10:11 pm on Sep 14, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you both for your replies. Is the .htaccess solution better, then? If so, how can I configure it?

not2easy

10:32 pm on Sep 14, 2017 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



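The .htaccess solution simply adds a noindex X-Robots-Tag response header to all files in a folder (image files cannot carry a meta tag, so the directive has to travel in the HTTP header). Given that you mentioned that the images are in several folders, the robots.txt disallow might serve you better.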

Two things to help you decide which way you should go:
1. If you disallow images in robots.txt, search engines such as Google may report them as "Blocked Resources" and claim that they can't determine whether a page where those Blocked Resources are used is Mobile Friendly or not.

2. If you use the X-Robots-Tag header method via .htaccess, you could block all files in that folder from indexing. It would not prevent crawling that folder - but more importantly, rewrite rules in root htaccess file might not execute in those folders with an additional htaccess file.
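
For the "how can I configure it?" question, here is a minimal sketch of the header method. It assumes Apache with mod_headers enabled and .htaccess overrides permitted for these directives; the regex is an assumption based on the _sm.jpg naming described in this thread. Because it matches by filename and .htaccess settings also apply to subdirectories, one block in the root .htaccess file should cover thumbnails in every folder, without extra .htaccess files.

```apache
# Root .htaccess (sketch; assumes mod_headers is loaded)
<IfModule mod_headers.c>
    # Match any file whose name ends in _sm.jpg, in this directory or below
    <FilesMatch "_sm\.jpg$">
        # Ask compliant crawlers not to index these files
        Header set X-Robots-Tag "noindex"
    </FilesMatch>
</IfModule>
```

Note that a crawler only sees this header if it is allowed to fetch the file, so it probably should not be combined with a robots.txt Disallow for the same URLs. Requesting one thumbnail with curl -I and looking for the X-Robots-Tag line is a quick way to check that the header is being sent.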

virtualreality

10:51 pm on Sep 14, 2017 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you, not2easy!

lucy24

11:06 pm on Sep 14, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



rewrite rules in root htaccess file might not execute in those folders with an additional htaccess file.

That is: if and only if the additional htaccess file also contains RewriteRules without an "inherit" directive (RewriteOptions Inherit). If the sole content of your supplementary htaccess files is to set a noindex header, there is no risk.
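
A rough sketch of the two cases, assuming Apache with mod_headers and mod_rewrite available; the folder name in the comment is hypothetical:

```apache
# Hypothetical /images/thumbs/.htaccess

# Case 1: header only. No mod_rewrite directives appear in this file,
# so the rewrite rules in the root .htaccess keep applying here.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex"
</IfModule>

# Case 2: only needed if this file must carry rewrite rules of its own.
# Without the Inherit option, the rules here would replace the root rules
# for this folder instead of adding to them.
# RewriteEngine On
# RewriteOptions Inherit
```

The second case is left commented out because a header-only file does not need it.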