

Google App Engine

Tons of junk coming from Google

2:52 am on Jul 15, 2012 (gmt 0)

incredibill (WebmasterWorld Administrator)

From the makers of Penguin and Panda, here comes more crAPpEngine.

Just ran a little report to see what's been coming from Google App Engine, and it seems to be just as prolific with crud as AWS or the Nutch invasion.

The following is what I've seen from a couple of sites.

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: anjulaproxy)"
IP: 209.85.224.96

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: blacklanternproxy)"
IP: 209.85.224.85

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: cyberdudeproxy)"
IP: 209.85.224.97

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: domain-worlds)"
IP: 209.85.224.90
IP: 209.85.224.92
IP: 209.85.224.95

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: forward4browser)"
IP: 209.85.224.94

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: mjcproxy)"
IP: 209.85.224.81
IP: 209.85.224.90
IP: 209.85.224.95
IP: 209.85.224.97

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: mogoktharproxy)"
IP: 209.85.224.96

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: ninjamacsproxy)"
IP: 209.85.224.82

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: pcdevil-proxy)"
IP: 209.85.224.96

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: puneproxy)"
IP: 209.85.224.84
IP: 209.85.224.87

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: shwebproxy)"
IP: 209.85.224.87

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: skilymob)"
IP: 209.85.224.89

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~capproxy)"
IP: 74.125.158.91

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~crackernew-1234-proxy)"
IP: 74.125.156.83

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~doingfly)"
IP: 74.125.156.89


USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gaeatbeijing48)"
IP: 74.125.158.91

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~hr-pulsesubscriber)"
IP: 74.125.64.91

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~nyinayminproxy1)"
IP: 74.125.90.84
IP: 74.125.90.90
IP: 74.125.92.81
IP: 74.125.92.87
IP: 74.125.92.88
IP: 74.125.92.90

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~secure-facebook-ranney)"
IP: 74.125.112.85
IP: 74.125.114.86
IP: 74.125.114.90

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~urgentproxy)"
IP: 74.125.112.81

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: toom16-10)"
IP: 209.85.224.84
IP: 209.85.224.85
IP: 209.85.224.87
IP: 209.85.224.89
IP: 209.85.224.94
IP: 209.85.224.96
IP: 209.85.224.98

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: v8ad86)"
IP: 209.85.224.89

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: web4proxy)"
IP: 209.85.224.91

Thanks Google!
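
A report like the one above boils down to pulling the appid out of each AppEngine-Google user agent and grouping the source IPs under it. Here is a minimal Python sketch, assuming the access log has already been parsed into (user agent, IP) pairs. The function names are hypothetical, and the "proxy" check is only a naming heuristic, not a reliable test.

```python
import ipaddress
import re
from collections import defaultdict

# Pulls the appid out of user agents like:
#   AppEngine-Google; (+http://code.google.com/appengine; appid: s~urgentproxy)
APPID_RE = re.compile(r"AppEngine-Google;.*?appid:\s*([^)\s]+)")


def appengine_appid(user_agent):
    """Return the appid from an AppEngine-Google user agent, or None."""
    match = APPID_RE.search(user_agent)
    return match.group(1) if match else None


def group_by_appid(hits):
    """Group (user_agent, ip) pairs by appid, skipping non-AppEngine hits."""
    groups = defaultdict(set)
    for user_agent, ip in hits:
        appid = appengine_appid(user_agent)
        if appid is not None:
            groups[appid].add(ip)
    # Sort each IP list numerically rather than as plain strings.
    return {appid: sorted(ips, key=ipaddress.ip_address) for appid, ips in groups.items()}


def looks_like_proxy(appid):
    """Naming heuristic only: flags appids that contain "proxy"."""
    return "proxy" in appid.lower()


hits = [
    ("AppEngine-Google; (+http://code.google.com/appengine; appid: mjcproxy)", "209.85.224.81"),
    ("AppEngine-Google; (+http://code.google.com/appengine; appid: mjcproxy)", "209.85.224.90"),
    ("AppEngine-Google; (+http://code.google.com/appengine; appid: s~doingfly)", "74.125.156.89"),
    ("Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1", "192.0.2.7"),
]
report = group_by_appid(hits)
for appid, ips in sorted(report.items()):
    flag = " (proxy?)" if looks_like_proxy(appid) else ""
    print(f"{appid}: {', '.join(ips)}{flag}")
```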
5:00 am on Jul 15, 2012 (gmt 0)

keyplyr (WebmasterWorld Senior Member)

I block all AppEngine-Google. In fact, I block anything app.

My thinking is this: although I optimize my sites for mobile rendering and enjoy a large visitor base from various mobile devices, the app world is just as prone to malice and copyright infringement as the Internet in general.

Anyone can write an app for any reason and use AppEngine-Google or any other platform. Many social site scrapers and parasites have app-based tools developed for mobile phones & tablets.

I've found my content on several social spin-off sites that publish to mobile apps. I'm convinced these social sites make deals with various interests to sell or lease content supplied by their users, with or without the users' consent.

I've also caught mobile apps scraping my images for album art, wiki profile photos, event info, etc., so I block them all now, adding a few new ones each week it seems.
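
A blanket block like this is usually a case-insensitive substring match on the User-Agent header. The sketch below shows one hypothetical way to do it as Python WSGI middleware that answers 403; only the "AppEngine-Google" entry comes from this thread, and any other entries would be your own additions.

```python
# Hypothetical substring blocklist. Only "AppEngine-Google" comes from this
# thread; new entries get appended here as they turn up.
BLOCKED_UA_SUBSTRINGS = ["appengine-google"]


def is_blocked(user_agent, blocklist=BLOCKED_UA_SUBSTRINGS):
    """Case-insensitive substring match against the blocklist."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in blocklist)


def block_user_agents(app, blocklist=BLOCKED_UA_SUBSTRINGS):
    """WSGI middleware that answers 403 Forbidden to blocked user agents."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT", ""), blocklist):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

The weakness is built in: the list only blocks what you have already seen, so it keeps growing.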
5:35 am on Jul 15, 2012 (gmt 0)

incredibill (WebmasterWorld Administrator)

Quoting keyplyr: "adding a few new ones each week it seems"


See, that's why whitelisting user agents beats blocking.

While your list gets longer and longer, mine stays the same length ;)

Not to mention that they don't get access the first time, so no infringement has happened yet, which may not be the case if you're blocking them after the fact.
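
Whitelisting flips the default: describe what is allowed and deny everything else, so an agent nobody has seen before is refused on its very first request. A minimal Python sketch follows; the bot tokens and the browser regex are hypothetical placeholders, and since user agents are trivially spoofed, anything classified as a bot still needs verification.

```python
import re

# Hypothetical whitelist; tune both parts to your own traffic.
ALLOWED_BOT_TOKENS = ("googlebot", "bingbot")
# Crude browser test: mainstream browsers send "Mozilla/x.y " followed later
# by a rendering-engine token.
BROWSER_RE = re.compile(r"^Mozilla/\d\.\d .*(Gecko|AppleWebKit|Trident)")


def classify(user_agent):
    """Default-deny classification: 'bot', 'browser', or 'deny'."""
    ua = user_agent.lower()
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return "bot"  # claims to be a bot; still needs DNS verification
    if BROWSER_RE.search(user_agent):
        return "browser"
    return "deny"  # anything not explicitly allowed, e.g. AppEngine-Google


for ua in (
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0.1",
    "AppEngine-Google; (+http://code.google.com/appengine; appid: web4proxy)",
):
    print(classify(ua), "<-", ua)
```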

Anywho, I don't know much about App Engine, but note that most of those apps have "proxy" in their names. What a clever way to bypass the many absurdly simplistic bot blocks out there that simply whitelist Google's IP ranges: make a proxy that operates from within those ranges.

The cleverest use of this technique I've ever seen was a few years ago, before Google tightened up some of its security. Someone set their user agent to Googlebot and then used the Google translator to scrape pages. It worked on many sites, just not on anyone using full-trip DNS verification for Googlebot, which the requests coming from the Google translator obviously failed.

What cracks me up is those who whine that full-trip DNS verification is too slow. Some blindly do reverse DNS (rDNS) on everything coming to their server, which is insane and of course slows everything down. However, if you validate only bots and then cache the DNS results for a day, it's only slow once per verified IP, which is quite acceptable given how many pages get crawled daily. Since you're only doing full-trip verification for bots, it has zero impact on regular visitors.

The implementation is what makes or breaks the usefulness of the method and what separates real programmers from script kiddies :)
5:48 am on Jul 15, 2012 (gmt 0)

keyplyr (WebmasterWorld Senior Member)

Quoting incredibill: "See, that's why whitelisting user agents beats blocking."

I whitelist. However, not everything is what it says it is :) I also keep a list of what I block (hence adding new ones weekly). Actually, I usually go a couple of weeks now without needing to change anything server-related.

I also block almost all "proxy" and "proxi" identifiers, except from a few whitelisted IP ranges. I also block all transcoder and translator tools, add-ons, toolbars, etc.
6:01 am on Jul 15, 2012 (gmt 0)

incredibill (WebmasterWorld Administrator)

Quoting keyplyr: "not everything is what it says it is"


If it were, there wouldn't be much of a challenge, would there?

What fun would that be?

There's just something exhilarating about busting a stealth bot, publishing the details about it, and then sitting back to watch the flurry of visits to your blog post from the people whose stealth bot you just outed.

Even if you weren't sure you had it 100% right, that flurry of visits is a pretty good confirmation you nailed it IMO.