How To Stop Pinterest PinIt Crowd Sourced Scraping

Putting a Stop to Crowd Sourced Scraping

   
1:32 am on Sep 9, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



There appears to be some interest in blocking Pinterest's crowd-sourced scraping, so here are some simple ways to stop Pinterest's PinIt from functioning.

When you pin something you'll notice that first a HEAD check is made to your page to verify access before an actual PIN occurs.

You'll see something like this in your log file:
50.16.149.229 - - [09/Sep/2012:--:--:-- +0000] "HEAD / HTTP/1.1" 403 883 "-" "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.45 Safari/535.19"


Note that my server said "403" which means FORBIDDEN yet it grabbed the images anyway.

Then they download the image two times every time.
23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" "Pinterest/0.1 +http://pinterest.com/"
23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" "Pinterest/0.1 +http://pinterest.com/"


Must have really bad software if they can't catch it the first time, who knows.

Disabling Pinterest can be done multiple ways.

1. Blocking all the Amazon Web Services (AWS) IP ranges in .htaccess
There are some comprehensive lists of Amazon's IP ranges in the Spider forum:
[webmasterworld.com...]

Block AWS like this in .htaccess:
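<Limit GET POST>
# Evaluate Allow first, then Deny; a request matching both is denied.
order allow,deny
allow from all
deny from 23.20.0.0/14
deny from 50.16.0.0/15
# ...add the remaining AWS ranges here, one "deny from" line each
</Limit>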
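
For anyone on Apache 2.4 or later, where Order/Allow/Deny are deprecated and only available through mod_access_compat, a rough equivalent of the block above might look like the sketch below. It assumes mod_authz_core and mod_authz_host are loaded and that AllowOverride permits AuthConfig in .htaccess. Only the two ranges already shown are listed, and unlike the <Limit GET POST> version it applies to every request method.

```apache
# Apache 2.4+ sketch; not part of the original post.
<RequireAll>
    # Let everyone in by default...
    Require all granted
    # ...except requests coming from these AWS ranges.
    Require not ip 23.20.0.0/14
    Require not ip 50.16.0.0/15
    # Add the remaining AWS ranges here, one "Require not ip" line each.
</RequireAll>
```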


2. Blocking Pinterest by user agent in your .htaccess file.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
RewriteRule !^robots\.txt$ - [F]


DO BOTH. If Pinterest starts using a different cloud service, you'll still be covered as long as they keep using their easily identifiable user agent.

3. You might also want to add Pinterest's nopin meta tag just in case, but I'll bet that if the Pinheads really want your image they'll get it somehow regardless.
<meta name="pinterest" content="nopin" />
8:13 pm on Sep 18, 2012 (gmt 0)

10+ Year Member



Blocking Amazon's web services also blocks Loveit.com.

Aside from some slight slowing down of page delivery, I believe there is little to lose from blocking Amazon servers from accessing your content.

I have this list:

Deny from 8.18.144.0/23
Deny from 23.20.0.0/14
Deny from 46.51.128.0/17
Deny from 46.137.0.0/16
Deny from 50.16.0.0/14
Deny from 50.112.0.0/16
Deny from 54.240.0.0/12
Deny from 67.202.0.0/18
Deny from 72.44.32.0/19
Deny from 75.101.128.0/17
Deny from 79.125.0.0/17
Deny from 96.127.0.0/17
Deny from 103.4.8.0/21
Deny from 107.20.0.0/14
Deny from 122.248.192.0/18
Deny from 174.129.0.0/16
Deny from 175.41.128.0/17
Deny from 176.32.64.0/18
Deny from 176.34.0.0/16
Deny from 177.71.128.0/17
Deny from 184.72.0.0/15
Deny from 184.169.128.0/17
Deny from 204.236.128.0/17
Deny from 216.182.224.0/20
11:17 pm on Sep 18, 2012 (gmt 0)

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
RewriteRule !^robots\.txt$ - [F]

Not sure I understand this. Do I need to place something in my robots.txt also?

Like this?

User-agent: Pinterest
Disallow: /
2:45 am on Sep 19, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



robots.txt is voluntary. .htaccess (or config file) is deaf to all appeals.

The RewriteRule quoted here is one way of exempting robots.txt from rules targeting unwanted visitors. So they can't say "I wanted to obey robots.txt, honest I did, but they wouldn't let me see it!" Another is to put robots.txt into a <Files> envelope to cover anyone blocked by core-level Deny from... directives.
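
A minimal sketch of the <Files> approach, using the Apache 2.2 access syntax seen earlier in this thread. It assumes the Deny from lines live in the same .htaccess file and that robots.txt sits in the directory that file covers; the two ranges are just the ones from the first post.

```apache
# Site-wide block list (Apache 2.2 syntax).
Order Allow,Deny
Allow from all
Deny from 23.20.0.0/14
Deny from 50.16.0.0/15

# Re-open robots.txt to everyone, including the ranges denied above.
# <Files> sections are applied after the surrounding directives,
# so these access settings win for this one file.
<Files "robots.txt">
    Order Allow,Deny
    Allow from all
</Files>
```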

But I don't think it's really necessary in this situation, because how many .txt files have you got? In mod_rewrite, most visitors can be blocked by extension: either .html (or whatever you normally use, including / for directories) or jpe?g|gif|png, depending on whether they are after pages or pictures.
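
As a sketch of the by-extension variant for pictures, reusing the user-agent condition quoted above (the [NC] on the rule is an addition so that uppercase extensions such as .JPG also match):

```apache
RewriteEngine on
# Only act on requests whose user agent contains "Pinterest" (case-insensitive).
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
# Refuse image requests with a 403. robots.txt and other non-image
# files never match this pattern, so they need no explicit exemption.
RewriteRule \.(jpe?g|gif|png)$ - [F,NC]
```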

For those just joining us: The ordinary anti-hotlinking routines won't work with Pinterest, because their image harvesting is coded to look as if your own page is the referer.
12:12 pm on Sep 24, 2012 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Almost all of my recent photos have the website name clearly but discreetly on the photo.

One person I investigated had pinned around 40 of my photos to their board and I love it. 40 photos with my website url on them plus 40 links back.

I started to copyright the pictures when I noticed that .ru websites (plus a few .com) were scraping entire sites, pictures and content.
8:25 am on Nov 9, 2012 (gmt 0)

5+ Year Member



The "2. Blocking Pinterest by user agent in your .htaccess file." method works great for me.

I was wondering: is there an .htaccess method to make them pin a replacement image, similar to a hotlinking replacement image?

As a hotlinking replacement image I use a large, attractive and colorful banner with my domain name. I'm sure it drives some type-in traffic to my site. It would be great to get this one pinned instead of my pictures.
9:07 am on Nov 9, 2012 (gmt 0)

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



You could replace the [F] with a rewrite to your image of choice. You would have to allow them to get the page itself, and only apply the rewrite to requests for images.

This only works if the human pinner has the (real) image in their cache, so they don't realize the wrong thing is getting pinned when they look at the preview page. Otherwise they would just cancel the whole process.

Option 4. (from my own htaccess) is

RewriteRule \.(jpe?g|gif|png)$ /pictures/smallgifs/onedot.gif [L]

where onedot.gif is a 1x1 transparent gif that's used for a variety of purposes. People can pin to their heart's content, but it won't do them any good because nobody will see anything.
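
Put together with the user-agent condition from earlier in the thread, the substitution might be sketched like this. The second RewriteCond and the [NC] flag are additions for illustration, not part of the rule as posted; the path is the one quoted above and would be swapped for your own replacement image.

```apache
RewriteEngine on
# Only act on requests whose user agent contains "Pinterest".
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
# Skip the replacement itself so the rule is never applied to its own target.
RewriteCond %{REQUEST_URI} !^/pictures/smallgifs/onedot\.gif$
# Serve the substitute for every other image request; pages are untouched.
RewriteRule \.(jpe?g|gif|png)$ /pictures/smallgifs/onedot.gif [L,NC]
```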
8:40 pm on Dec 21, 2012 (gmt 0)

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Weeee, Google shoved a good percentage of my images into the explicit category recently despite their being as non-explicit as a tea kettle so this is the perfect time for me to make image changes. My url will now appear on all my images and my htaccess file just got a wee bit bigger.

Begone bots, scrapers and mashups - Go please your shareholders with someone else's content, preferably your own.
10:28 pm on Dec 21, 2012 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



Been here before haven't we....

Isn't this Google's problem?
Blocking IPs to solve this seems like we are not getting to the root of the problem.
11:13 pm on Dec 21, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Isn't this Google's problem?
Blocking IPs to solve this seems like we are not getting to the root of the problem.


Has nothing to do with Google and everything to do with Pinterest.

The object is to stop scraping and unauthorized usage by Pinheads in Pinterest (not Google), and several methods for doing so, including blocking IPs, are covered above.

Proactive prevention of copyright infringement is time well spent vs. wasting time with CopyScape, DMCA, etc, after the fact.
11:24 pm on Dec 21, 2012 (gmt 0)

WebmasterWorld Senior Member Top Contributors Of The Month



Has nothing to do with Google and everything to do with Pinterest.


My bad, I thought this was referring to Pinterest outranking certain sites in Google results.
10:06 pm on Dec 22, 2012 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



I say serve them up a little #*$! every time they request one of your images!
11:07 pm on Mar 27, 2013 (gmt 0)

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



this:
<meta name="pinterest" content="nopin" />

Doesn't validate in <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

If I change the end of the tag to drop the " /" and just use ">", will the meta still block pinning?

[I posted this in the HTML forum too, but maybe it belongs here.]
12:35 am on Mar 28, 2013 (gmt 0)

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I got the answer I needed in the other forum.

Thanks.
6:36 pm on Apr 8, 2013 (gmt 0)



Maybe I'm just a little chaotic/neutral, but I like @ZydoSEO's idea. I'm not opposed to Pinterest scraping my sites (see my Pinterest copyright infringement thread) but I do respect the right of a site owner to reduce, prevent and even fight back against unwanted use of their site. So here is a list of ideas:

- cause Pinterest crawler to hang somehow, using their system resources
- cause Pin crawler to grab the wrong image, preferably one with undesired content like skin diseases, etc
- cause a never ending redirect loop of some sort
- redirect to a large site that is known to go after copyright violators
- automail a DMCA takedown notice

The last one is my favorite, but creating something of a random mix of the above would probably be fun.
7:17 pm on Apr 8, 2013 (gmt 0)

10+ Year Member



cause Pin crawler to grab the wrong image


I have this in place; the substituted image is a copyright warning.
 
