incrediBILL - 1:32 am on Sep 9, 2012 (gmt 0)
There appears to be some interest in blocking Pinterest's crowd-sourced scraping, so here are some simple ways to stop Pinterest's PinIT from functioning.
When you pin something, you'll notice that a HEAD request is first made to your page to verify access before the actual pin occurs.
You'll see something like this in your log file:
50.16.149.229 - - [09/Sep/2012:--:--:-- +0000] "HEAD / HTTP/1.1" 403 883 "-" "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.45 Safari/535.19"
Note that my server returned "403", which means FORBIDDEN, yet it grabbed the image anyway.
Then they download the image twice, every time:
23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" "Pinterest/0.1 +http://pinterest.com/"
23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" "Pinterest/0.1 +http://pinterest.com/"
They must have really bad software if they can't get it right the first time, who knows.
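To spot this pattern in your own logs, here's a rough Python sketch, assuming your server writes Apache's standard "combined" log format like the lines above. It counts requests whose user agent mentions Pinterest, grouped by IP, method and path, and flags any file fetched more than once. Note that the HEAD check came in with a generic Chrome user agent, so it won't be counted here. The function names are just made up for this example.

```python
import re
from collections import Counter

# Apache "combined" format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one combined-format log line, or None if it doesn't match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

def pinterest_fetch_counts(lines):
    """Count (ip, method, path) for requests whose User-Agent contains 'pinterest' (any case)."""
    counts = Counter()
    for line in lines:
        rec = parse_line(line)
        if rec and "pinterest" in rec["agent"].lower():
            counts[(rec["ip"], rec["method"], rec["path"])] += 1
    return counts

# The three log lines quoted in this post
SAMPLE = [
    '50.16.149.229 - - [09/Sep/2012:--:--:-- +0000] "HEAD / HTTP/1.1" 403 883 "-" '
    '"Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) '
    'Chrome/18.0.1025.45 Safari/535.19"',
    '23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" '
    '"Pinterest/0.1 +http://pinterest.com/"',
    '23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" '
    '"Pinterest/0.1 +http://pinterest.com/"',
]

# Report any file the Pinterest bot fetched more than once from the same IP
for (ip, method, path), n in pinterest_fetch_counts(SAMPLE).items():
    if n > 1:
        print(f"{ip} fetched {path} {n} times")  # prints "23.22.226.192 fetched /scrapeit.jpg 2 times"
```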
Disabling Pinterest can be done in multiple ways.
1. Blocking all the Amazon Web Services (AWS) IP ranges in .htaccess
There are some comprehensive lists of Amazon's IP ranges in the Spider forum:
[webmasterworld.com...]
Block AWS like this in .htaccess:
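# Apache 2.2 style access control (needs mod_access_compat on Apache 2.4)
# <Limit GET> also restricts HEAD requests
<Limit GET POST>
order allow,deny
deny from 23.20.0.0/14
deny from 50.16.0.0/15
# ...add the remaining AWS ranges from the list above here
allow from all
</Limit>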
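If you're not sure a deny range actually covers the addresses showing up in your logs, Python's standard ipaddress module can check it. This is a quick sketch using only the two ranges shown above (the full AWS list is much longer); `BLOCKED_RANGES` and `is_blocked` are hypothetical names for illustration. Both Pinterest IPs from the log excerpt fall inside these ranges.

```python
import ipaddress

# Only the two example ranges from the .htaccess snippet; extend with the full AWS list.
# 23.20.0.0/14 spans 23.20.0.0 - 23.23.255.255, 50.16.0.0/15 spans 50.16.0.0 - 50.17.255.255
BLOCKED_RANGES = [ipaddress.ip_network(cidr) for cidr in ("23.20.0.0/14", "50.16.0.0/15")]

def is_blocked(ip):
    """True if the address falls inside any of the deny ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_RANGES)

for ip in ("50.16.149.229", "23.22.226.192", "8.8.8.8"):
    print(ip, "blocked" if is_blocked(ip) else "allowed")
```

The first two (the HEAD check and the image fetches) print "blocked"; 8.8.8.8 prints "allowed".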
2. Blocking Pinterest by user agent in your .htaccess file.
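# Send 403 Forbidden to any User-Agent containing "Pinterest" (case-insensitive),
# except for robots.txt so the bot can still read it
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
RewriteRule !^robots\.txt$ - [F]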
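To see exactly what that rule matches, here's a small Python model of it. It's a simplification, assuming the .htaccess sits in the document root so mod_rewrite sees the path without its leading slash; `rewrite_status` is a hypothetical helper, not anything Apache provides. Note that the Chrome-UA HEAD check from the log doesn't trip it, only the image fetches do.

```python
import re

# Mirrors: RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]  (unanchored, case-insensitive)
PINTEREST_UA = re.compile(r"Pinterest", re.IGNORECASE)
# Mirrors the negated pattern in: RewriteRule !^robots\.txt$ - [F]
ROBOTS = re.compile(r"^robots\.txt$")

def rewrite_status(path, user_agent):
    """Return 403 if the rule would fire, else None (request passes through untouched)."""
    rel = path.lstrip("/")  # per-directory context: leading slash is stripped
    if PINTEREST_UA.search(user_agent) and not ROBOTS.search(rel):
        return 403
    return None

print(rewrite_status("/scrapeit.jpg", "Pinterest/0.1 +http://pinterest.com/"))  # 403
print(rewrite_status("/robots.txt", "Pinterest/0.1 +http://pinterest.com/"))    # None
```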
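DO BOTH: if Pinterest starts using a different cloud service, the user agent block will still cover you, assuming they keep using their easily identifiable user agent. The IP block also catches the HEAD check, which arrived with a generic Chrome user agent.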
3. You might also want to add the meta tag to your pages' <head>, just to make sure, but I'll bet that if the Pinheads really want your image they'll get it somehow regardless.
<meta name="pinterest" content="nopin" />
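To confirm the tag actually made it into your pages, here's a quick check with Python's built-in html.parser; the class and function names are made up for this example. It only tells you the tag is present, not that Pinterest will honor it.

```python
from html.parser import HTMLParser

class NoPinChecker(HTMLParser):
    """Sets .nopin to True if the page has <meta name="pinterest" content="nopin">."""

    def __init__(self):
        super().__init__()
        self.nopin = False

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> also lands here via the default handle_startendtag
        if tag != "meta":
            return
        a = {k.lower(): (v or "").lower() for k, v in attrs}
        if a.get("name") == "pinterest" and a.get("content") == "nopin":
            self.nopin = True

def has_nopin(html):
    checker = NoPinChecker()
    checker.feed(html)
    checker.close()
    return checker.nopin

print(has_nopin('<html><head><meta name="pinterest" content="nopin" /></head></html>'))  # True
```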