Forum Library, Charter, Moderators: incrediBILL

Pinterest Forum

    
How To Stop Pinterest PinIt Crowd Sourced Scraping
Putting a Stop to Crowd Sourced Scraping
incrediBILL

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4492835 posted 1:32 am on Sep 9, 2012 (gmt 0)

There appears to be some interest in blocking Pinterest's crowd-sourced scraping, so here are some simple ways to stop Pinterest's PinIt from functioning.

When you pin something you'll notice that first a HEAD check is made to your page to verify access before an actual PIN occurs.

You'll see something like this in your log file:
50.16.149.229 - - [09/Sep/2012:--:--:-- +0000] "HEAD / HTTP/1.1" 403 883 "-" "Mozilla/5.0 (Windows NT 6.0; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.45 Safari/535.19"


Note that my server said "403" which means FORBIDDEN yet it grabbed the images anyway.

Then they download the image two times every time.
23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" "Pinterest/0.1 +http://pinterest.com/"
23.22.226.192 - - [09/Sep/2012:--:--:-- +0000] "GET /scrapeit.jpg HTTP/1.1" 200 8743 "-" "Pinterest/0.1 +http://pinterest.com/"


Must have really bad software if they can't catch it the first time, who knows.

Disabling Pinterest can be done in multiple ways.

1. Blocking all the Amazon Web Services (AWS) IP ranges in .htaccess
There are some comprehensive lists of Amazon's IP ranges in the Spider forum:
[webmasterworld.com...]

Block AWS like this in .htaccess:
<Limit GET POST>
order allow,deny
deny from 23.20.0.0/14
deny from 50.16.0.0/15
# etc. (add one "deny from" line for each remaining AWS range)
allow from all
</Limit>
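
The block above uses the Apache 2.2 access-control syntax. On Apache 2.4, where Order/Allow/Deny only work through mod_access_compat, a roughly equivalent sketch with the newer Require directives might look like this. It assumes mod_authz_core and mod_authz_host are loaded and that .htaccess overrides allow AuthConfig. Only the two ranges quoted above are shown.

```apache
# Apache 2.4 sketch: allow everyone except the listed AWS ranges.
<RequireAll>
    Require all granted
    Require not ip 23.20.0.0/14
    Require not ip 50.16.0.0/15
    # add one "Require not ip" line per remaining range
</RequireAll>
```

Unlike the <Limit GET POST> version, this applies to every request method.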


2. Blocking Pinterest by user agent in your .htaccess file.
RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
RewriteRule !^robots\.txt$ - [F]


DO BOTH: if Pinterest starts using a different cloud service, you'll still be covered as long as they keep using their easily identifiable user agent.

3. You might also want to add the meta tag just in case, but I'll bet if the Pinheads really want your image they'll get it somehow regardless.
<meta name="pinterest" content="nopin" />
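
If mod_rewrite isn't available, methods 1 and 2 can probably be combined in a single access block. This is a minimal sketch in the same Apache 2.2 syntax, assuming mod_setenvif is loaded. The variable name block_pinterest is made up for this example, and only two of the AWS ranges are shown.

```apache
# Flag any request whose User-Agent contains "Pinterest" (case-insensitive).
SetEnvIfNoCase User-Agent "Pinterest" block_pinterest

Order allow,deny
Allow from all
# Refuse the flagged user agent...
Deny from env=block_pinterest
# ...and the AWS ranges (add the rest of your list here).
Deny from 23.20.0.0/14
Deny from 50.16.0.0/15
```

Note that this version does not exempt robots.txt the way the RewriteRule in method 2 does.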

 

helleborine

10+ Year Member



 
Msg#: 4492835 posted 8:13 pm on Sep 18, 2012 (gmt 0)

Blocking Amazon's web services also blocks Loveit.com.

Aside from some slight slowing down of page delivery, I believe there is little to lose from blocking Amazon servers from accessing your content.

I have this list:

Deny from 8.18.144.0/23
Deny from 23.20.0.0/14
Deny from 46.51.128.0/17
Deny from 46.137.0.0/16
Deny from 50.16.0.0/14
Deny from 50.112.0.0/16
Deny from 54.240.0.0/12
Deny from 67.202.0.0/18
Deny from 72.44.32.0/19
Deny from 75.101.128.0/17
Deny from 79.125.0.0/17
Deny from 96.127.0.0/17
Deny from 103.4.8.0/21
Deny from 107.20.0.0/14
Deny from 122.248.192.0/18
Deny from 174.129.0.0/16
Deny from 175.41.128.0/17
Deny from 176.32.64.0/18
Deny from 176.34.0.0/16
Deny from 177.71.128.0/17
Deny from 184.72.0.0/15
Deny from 184.169.128.0/17
Deny from 204.236.128.0/17
Deny from 216.182.224.0/20

ken_b

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4492835 posted 11:17 pm on Sep 18, 2012 (gmt 0)

RewriteEngine on
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
RewriteRule !^robots\.txt$ - [F]

Not sure I understand this. Do I need to place something in my robots.txt also?

Like this?

User-agent: Pinterest
Disallow: /

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4492835 posted 2:45 am on Sep 19, 2012 (gmt 0)

robots.txt is voluntary. .htaccess (or config file) is deaf to all appeals.

The RewriteRule quoted here is one way of exempting robots.txt from rules targeting unwanted visitors. So they can't say "I wanted to obey robots.txt, honest I did, but they wouldn't let me see it!" Another is to put robots.txt into a <Files> envelope to cover anyone blocked by core-level Deny from... directives.
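
The <Files> envelope could look something like this. It is a sketch in Apache 2.2 syntax, assuming the Deny from... lines sit in the same .htaccess or in a parent config:

```apache
# Let everyone fetch robots.txt, even clients caught by "Deny from" lines elsewhere.
<Files "robots.txt">
    Order allow,deny
    Allow from all
</Files>
```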

But I don't think it's really necessary in this situation, because how many .txt files have you got? In mod_rewrite, most visitors can be blocked by extension: either .html (or whatever you normally use, including / for directories) or jpe?g|gif|png, depending on whether they are after pages or pictures.
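
A by-extension block for the picture case might be sketched like this, reusing the user-agent condition quoted earlier in the thread. Treat it as an illustration rather than a tested ruleset:

```apache
RewriteEngine on
# Only act on requests whose User-Agent contains "Pinterest".
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
# Refuse common image extensions; pages, robots.txt and everything else pass through.
RewriteRule \.(jpe?g|gif|png)$ - [F,NC]
```

Because only image extensions match, robots.txt needs no separate exemption here.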

For those just joining us: The ordinary anti-hotlinking routines won't work with Pinterest, because their image-harvesting is coded to look as if your own page is the referer.

nomis5

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4492835 posted 12:12 pm on Sep 24, 2012 (gmt 0)

Almost all of my recent photos have the website name clearly but discreetly on the photo.

One person I investigated had pinned around 40 of my photos to their board and I love it. 40 photos with my website url on them plus 40 links back.

I started to copyright the pictures when I noticed that .ru websites (plus a few .com) were scraping entire sites, pictures and content.

chrisv1963

5+ Year Member



 
Msg#: 4492835 posted 8:25 am on Nov 9, 2012 (gmt 0)

The "2. Blocking Pinterest by user agent in your .htaccess file" method works great for me.

I was wondering: is there an .htaccess method to make them pin a replacement image, similar to a hotlinking replacement image?

As a hotlinking replacement image, I use a large, attractive, colorful banner with my domain name. I'm sure it drives some type-in traffic to my site. It would be great to get this one pinned instead of my pictures.

lucy24

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time Top Contributors Of The Month



 
Msg#: 4492835 posted 9:07 am on Nov 9, 2012 (gmt 0)

You could replace the [F] with a rewrite to your image of choice. You would have to allow them to get the page itself, and only apply the rewrite to requests for images.

This only works if the human pinner has the (real) image in their cache, so they don't realize the wrong thing is getting pinned when they look at the preview page. Otherwise they would just cancel the whole process.

Option 4. (from my own htaccess) is

RewriteRule \.(jpe?g|gif|png)$ /pictures/smallgifs/onedot.gif [L]

where onedot.gif is a 1x1 transparent gif that's used for a variety of purposes. People can pin to their heart's content, but it won't do them any good because nobody will see anything.
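
For the banner idea, a fuller sketch might pair the rewrite with the user-agent condition so that ordinary visitors still get the real pictures. The path /images/pin-banner.png is a placeholder invented for this example; substitute your own replacement image.

```apache
RewriteEngine on
# Only act on requests whose User-Agent contains "Pinterest".
RewriteCond %{HTTP_USER_AGENT} Pinterest [NC]
# Don't rewrite the replacement image itself.
RewriteCond %{REQUEST_URI} !^/images/pin-banner\.png$
# Serve the banner in place of any other image they ask for.
RewriteRule \.(jpe?g|gif|png)$ /images/pin-banner.png [NC,L]
```

Requests for pages are left alone, so the pinner can still load the page itself.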

Sgt_Kickaxe

WebmasterWorld Senior Member sgt_kickaxe is a WebmasterWorld Top Contributor of All Time



 
Msg#: 4492835 posted 8:40 pm on Dec 21, 2012 (gmt 0)

Weeee, Google shoved a good percentage of my images into the explicit category recently despite their being as non-explicit as a tea kettle so this is the perfect time for me to make image changes. My url will now appear on all my images and my htaccess file just got a wee bit bigger.

Begone bots, scrapers and mashups - Go please your shareholders with someone else's content, preferably your own.

seoskunk



 
Msg#: 4492835 posted 10:28 pm on Dec 21, 2012 (gmt 0)

Been here before haven't we....

Isn't this Google's problem?
Blocking IPs to solve this seems like we are not getting to the root of the problem.

incrediBILL

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4492835 posted 11:13 pm on Dec 21, 2012 (gmt 0)

Isn't this Google's problem?
Blocking IPs to solve this seems like we are not getting to the root of the problem.


Has nothing to do with Google and everything to do with Pinterest.

The object is to stop scraping and unauthorized usage by Pinheads in Pinterest (not Google) and several methods, including blocking IPs, are included.

Proactive prevention of copyright infringement is time well spent vs. wasting time with CopyScape, DMCA, etc, after the fact.

seoskunk



 
Msg#: 4492835 posted 11:24 pm on Dec 21, 2012 (gmt 0)

Has nothing to do with Google and everything to do with Pinterest.


My bad, I thought this was referring to Pinterest outranking certain sites in Google results.

ZydoSEO

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 4492835 posted 10:06 pm on Dec 22, 2012 (gmt 0)

I say serve them up a little #*$! every time they request one of your images!

ken_b

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4492835 posted 11:07 pm on Mar 27, 2013 (gmt 0)

This:
<meta name="pinterest" content="nopin" />

doesn't validate under <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">

If I drop the " /" from the end of the tag and close it with just ">", will the meta still block pinning?

[I posted this in the HTML forum too, but maybe it belongs here.]

ken_b

WebmasterWorld Senior Member ken_b is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 4492835 posted 12:35 am on Mar 28, 2013 (gmt 0)

I got the answer I needed in the other forum.

Thanks.

zork



 
Msg#: 4492835 posted 6:36 pm on Apr 8, 2013 (gmt 0)

Maybe I'm just a little chaotic/neutral, but I like @ZydoSEO's idea. I'm not opposed to Pinterest scraping my sites (see my Pinterest copyright infringement thread) but I do respect the right of a site owner to reduce, prevent and even fight back against unwanted use of their site. So here is a list of ideas:

- cause Pinterest crawler to hang somehow, using their system resources
- cause Pin crawler to grab the wrong image, preferably one with undesired content like skin diseases, etc
- cause a never ending redirect loop of some sort
- redirect to a large site that is known to go after copyright violators
- automail a DMCA takedown notice

The last one is my favorite, but creating something of a random mix of the above would probably be fun.

helleborine

10+ Year Member



 
Msg#: 4492835 posted 7:17 pm on Apr 8, 2013 (gmt 0)

cause Pin crawler to grab the wrong image


I have this in place; the substituted image is a copyright warning.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved