New scraper? portalimage.org

         

brokaddr

9:07 am on Dec 15, 2011 (gmt 0)

10+ Year Member



Just saw Google buzzing around my site with this site listed as a referrer. Googling my site name + portalimage.org yields a small number of results, but the problem lies here:

- This isn't a "proxy" site from what I can tell. It's a site with ads that displays your/my content in iframes

- Blocking the site's IP 49.50.8.43 seems futile - even when firewalled, they're still iframing my content. I watched the activity on my site, and the viewer IP for said page is *my* IP!

- Blocking referrer "portalimage.org" also seems futile, as that's what's listed as the latest referrer for Google:
SetEnvIfNoCase Referer "portalimage" bad_bot

"Googlebot" is still clicking around with this referrer, even after an htaccess block of this referrer.

It had occurred to me that the referrer or user-agent might be spoofed, but the UA, IP, etc. all appear to be legit.

Any tips on how to block these guys once and for all?

keyplyr

11:32 am on Dec 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Hi brokaddr,

If you're blocking their IP then that's probably all you can do for now. They already have your image(s) so the only thing you can do is block them the next time they come scraping. Keep a look-out for this IP in your logs to see the UA used, then block that too.

Pfui

3:34 pm on Dec 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just saw Google buzzing around my site

"Googlebot" is still clicking around with this referrer, even after an htaccess block of this referrer.


brokaddr: Sorry, I'm confused. Googlebot from googlebot.com/Google IPs rarely (actually, I'd say never) includes a referer so...

Could you please (re)ID the Host/IP of the machine that hit you in the first place, and the UA it appeared to be using? Thanks.

P.S. re iFraming (one of many) [webmasterworld.com...]

dstiles

10:30 pm on Dec 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If the IP is as given in the OP then 49.50.8.0/22 is Indonesia, a notorious source of internet nastiness.

Apart from that, the IP range contains servers so should be blocked anyway.

Pfui

11:41 pm on Dec 15, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, I thought that about the IP, 49.50.8.43, but wasn't sure. Even multiple re-readings leave me a bit cloudy: "the viewer IP for said page is *my* IP" ?

Also, the OP thought "Googlebot" may be legit, but if it came from that IP, no, it wasn't. Ditto if the UA was simply "Googlebot".

Anyway, yep, because "at least 20 other hosts point to the troublesome 49.50.8.43... [including the troublesome] portalimage.org..." [robtex.com...] I agree re: blocking, at a minimum:

deny from 49.50.8.0/24
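
For anyone wanting to double-check what each suggested range actually covers, here is a minimal sketch using Python's standard ipaddress module. The address and both prefixes are the ones quoted in this thread; the helper name covered is just for illustration.

```python
import ipaddress

# Address and ranges quoted in this thread.
scraper_ip = ipaddress.ip_address("49.50.8.43")
narrow = ipaddress.ip_network("49.50.8.0/24")  # the "deny from" range
wide = ipaddress.ip_network("49.50.8.0/22")    # the wider range mentioned earlier

def covered(ip, network):
    """Return True when ip falls inside network."""
    return ip in network

# Show whether the scraper IP is matched and how many addresses each range spans.
print(covered(scraper_ip, narrow), narrow.num_addresses)
print(covered(scraper_ip, wide), wide.num_addresses)
```

Both ranges match 49.50.8.43; the /22 simply blocks four times as many neighbouring addresses.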

brokaddr

5:13 am on Dec 17, 2011 (gmt 0)

10+ Year Member



49.50.8.0/24 is now firewalled.

Here's the data from my site stats:
Host/IP: crawl-66-249-71-132.googlebot.com - 66.249.71.132
Agent: Mozilla/5.0 (compatible; Googlebot/2.1; [google.com...]
Referrer: [portalimage.org...]
Time on Site: 126+hrs
Last Click: just now. Firewall ban did not work. :(
Number of Clicks: 42000+

They do not appear to be scraping from 49.50.8.0/24. If so, the firewall deny would have kicked them.

If they're spoofing, they're doing a damn good job of impersonating Google; I can't block Google, obviously.
Though I did report the domain. Hopefully their entire domain is de-indexed.
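
One way to settle the spoofing question is a forward-confirmed reverse DNS check: look up the host name for the IP, confirm it ends in googlebot.com or google.com, then resolve that name and confirm it maps back to the same IP. Below is a rough sketch of that idea in Python. The function name and the injectable reverse/forward parameters are my own choices (they make it testable without a network), not anything official.

```python
import socket

# Host name suffixes accepted for a genuine Googlebot.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def is_real_googlebot(ip, reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed Googlebot IP."""
    try:
        host = reverse(ip)[0]          # step 1: IP -> host name
    except OSError:
        return False                   # no reverse record at all
    if not host.lower().endswith(GOOGLE_SUFFIXES):
        return False                   # step 2: host name must belong to Google
    try:
        addresses = forward(host)[2]   # step 3: host name -> list of IPs
    except OSError:
        return False
    return ip in addresses             # step 4: must map back to the same IP

# Example call (needs working DNS):
#   is_real_googlebot("66.249.71.132")
```

If the check passes, the visitor really is Googlebot and the referrer is just the page where it found the link, so the thing to hunt for is the scraper's own fetches, not Google's.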

Any further tips on how to get these guys off my site?

keyplyr

5:35 am on Dec 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What does Googlebot have to do with anything? It obviously just followed a hot-link on an image scraper site. Look at your raw logs over the last month or so and find the hits from that scraper and block it either by UA or IP.
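
A rough sketch of that log search in Python, in case it helps. It assumes the Apache "combined" log format, and the sample lines at the bottom are made up purely for illustration (documentation-range IPs, a hypothetical path and user agent); the function names are arbitrary.

```python
import re
from collections import Counter

# Assumes Apache "combined" log format; adjust the pattern for other formats.
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def suspicious_hits(lines, referer_part="portalimage", ip_prefix="49.50."):
    """Yield parsed entries whose referrer or client IP matches."""
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip lines that do not fit the assumed format
        if referer_part in m["referer"].lower() or m["ip"].startswith(ip_prefix):
            yield m.groupdict()

def summarize(lines):
    """Count matching hits per (IP, user agent) pair."""
    return Counter((h["ip"], h["agent"]) for h in suspicious_hits(lines))

# Made-up sample lines for illustration only, not taken from real logs.
sample = [
    '203.0.113.7 - - [15/Dec/2011:09:07:00 +0000] "GET /page.html HTTP/1.1" '
    '200 512 "http://portalimage.org/x" "ExampleBrowser/1.0"',
    '198.51.100.9 - - [15/Dec/2011:09:08:00 +0000] "GET /other.html HTTP/1.1" '
    '200 256 "-" "ExampleBrowser/1.0"',
]
for (ip, agent), count in summarize(sample).items():
    print(ip, agent, count)
```

To run it against a real log, pass an open file object to summarize instead of the sample list. The IP/UA pairs that turn up most often are the ones worth blocking.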

And I agree, I rarely see a valid Googlebot with a referrer, but I have seen it a couple times. Congratulations, you're lucky. Googlebot just handed you the culprit.

brokaddr

7:25 am on Dec 17, 2011 (gmt 0)

10+ Year Member



If only it were that simple.

0 hits in the past 30 days from any IP matching: 49.50%

To add insult to injury, I just realized "ID" was blocked via firewall country deny for the past month or two due to unrelated circumstances. So unless this bot scraped my site prior to the ID block, they're using a different IP/excellent spoof to index sites.

keyplyr

11:27 am on Dec 17, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If only it were that simple. 0 hits in the past 30 days from any IP matching: 49.50


Yeah, it's a lot of work sometimes. I've got the defense & maintenance for my site down to 1 hour a day, then another hour or two on my client's sites. Then I grab a board and jump in the Pacific Ocean and do a little surfing to stay sane.

Do some searches for that IP address. Check out honeypot and other similar sites and see if you can find abuse reports for these guys. You might find that UA.