
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Google App Engine
Tons of junk coming from Google
incrediBILL




msg:4475766
 2:52 am on Jul 15, 2012 (gmt 0)

From the makers of Penguin and Panda, here comes more crAPpEngine.

Just ran a little report to see what's been coming from the Google AppEngine and it seems to be pretty prolific with crud just like AWS or the nutch invasion.

The following is what I've seen from a couple of sites.

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: anjulaproxy)"
IP: 209.85.224.96

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: blacklanternproxy)"
IP: 209.85.224.85

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: cyberdudeproxy)"
IP: 209.85.224.97

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: domain-worlds)"
IP: 209.85.224.90
IP: 209.85.224.92
IP: 209.85.224.95

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: forward4browser)"
IP: 209.85.224.94

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: mjcproxy)"
IP: 209.85.224.81
IP: 209.85.224.90
IP: 209.85.224.95
IP: 209.85.224.97

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: mogoktharproxy)"
IP: 209.85.224.96

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: ninjamacsproxy)"
IP: 209.85.224.82

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: pcdevil-proxy)"
IP: 209.85.224.96

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: puneproxy)"
IP: 209.85.224.84
IP: 209.85.224.87

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: shwebproxy)"
IP: 209.85.224.87

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: skilymob)"
IP: 209.85.224.89

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~capproxy)"
IP: 74.125.158.91

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~crackernew-1234-proxy)"
IP: 74.125.156.83

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~doingfly)"
IP: 74.125.156.89


USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~gaeatbeijing48)"
IP: 74.125.158.91

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~hr-pulsesubscriber)"
IP: 74.125.64.91

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~nyinayminproxy1)"
IP: 74.125.90.84
IP: 74.125.90.90
IP: 74.125.92.81
IP: 74.125.92.87
IP: 74.125.92.88
IP: 74.125.92.90

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~secure-facebook-ranney)"
IP: 74.125.112.85
IP: 74.125.114.86
IP: 74.125.114.90

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: s~urgentproxy)"
IP: 74.125.112.81

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: toom16-10)"
IP: 209.85.224.84
IP: 209.85.224.85
IP: 209.85.224.87
IP: 209.85.224.89
IP: 209.85.224.94
IP: 209.85.224.96
IP: 209.85.224.98

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: v8ad86)"
IP: 209.85.224.89

USER AGENT: "AppEngine-Google; (+http://code.google.com/appengine; appid: web4proxy)"
IP: 209.85.224.91

Thanks Google!
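
For anyone who wants to run a similar report on their own logs, here is a minimal sketch in Python. It assumes you have already pulled (user agent, IP) pairs out of your access log; the function names are made up for illustration, and the "proxy" substring check is only a rough heuristic based on the app names listed above.

```python
import re
from collections import defaultdict

# Matches the appid field in user agents shaped like the ones in the report above.
APPID_RE = re.compile(r"AppEngine-Google;.*?appid:\s*([^)\s]+)")

def extract_appid(user_agent):
    """Return the App Engine appid from a user-agent string, or None if absent."""
    match = APPID_RE.search(user_agent)
    return match.group(1) if match else None

def group_ips_by_appid(hits):
    """hits: iterable of (user_agent, ip) pairs. Returns {appid: sorted list of unique IPs}."""
    grouped = defaultdict(set)
    for user_agent, ip in hits:
        appid = extract_appid(user_agent)
        if appid is not None:
            grouped[appid].add(ip)
    # IPs are sorted as plain strings, which is enough for a quick report.
    return {appid: sorted(ips) for appid, ips in grouped.items()}

def looks_like_proxy(appid):
    """Rough heuristic: flag app names that contain 'proxy'."""
    return "proxy" in appid.lower()
```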

 

keyplyr




msg:4475785
 5:00 am on Jul 15, 2012 (gmt 0)



I block all AppEngine-Google. In fact, I block anything app.

The thinking is this: although I optimize my sites for mobile rendering and enjoy a large visitor base from various mobile devices, the app world is just as malicious and as prone to copyright infringement as the internet in general.

Anyone can write an app for any reason and use AppEngine-Google or any other platform. Many social site scrapers and parasites have app-based tools developed for mobile phones & tablets.

I've found my content on several social spin-off sites that publish to mobile apps. I'm convinced these social sites make deals with various interests to sell/lease content supplied by their users with or without the user's consent.

I've also caught mobile apps scraping my images for album art, wiki profile photos, event info, etc... so I block them all now, adding a few new ones each week it seems.

incrediBILL




msg:4475792
 5:35 am on Jul 15, 2012 (gmt 0)

adding a few new ones each week it seems


See, that's why whitelisting user agents beats blocking.

While your list gets longer and longer, mine stays the same length ;)

Not to mention that they don't get access the first time, which means no infringement has happened yet. That may not be the case if you're blocking them after the fact.
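
To make the difference concrete, here is a minimal sketch of a user agent whitelist in Python. The patterns are hypothetical placeholders, not anyone's actual list; the point is only that anything not matching is refused on the first request, so nothing has to be added when a new app shows up.

```python
import re

# Hypothetical allowlist: replace with the user-agent patterns you choose to admit.
ALLOWED_UA_PATTERNS = [
    re.compile(r"^Mozilla/\d"),  # browsers (and bots) that use the Mozilla/N.N prefix
    re.compile(r"^Opera/\d"),
]

def is_allowed(user_agent):
    """Admit a request only if its user agent matches an allowlisted pattern."""
    return any(p.search(user_agent or "") for p in ALLOWED_UA_PATTERNS)
```

With this list, every "AppEngine-Google; ..." user agent is refused without being named anywhere. Anything that gets through by claiming to be a known bot still needs to be verified separately.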

Anywho, I don't know much about AppEngine, but note that most of those apps have "proxy" in the name. What better way to bypass the many absurdly simplistic bot blocks out there that simply whitelist the Google IP ranges than to make a proxy that operates from within those ranges?

The most clever use of this technique I've ever seen was a few years ago, before Google tightened up some of their security. Someone set their user agent to Googlebot and then used the Google Translator to scrape pages. It worked on many sites, just not on anyone using full trip DNS verification for Googlebot, which requests coming through the Google translator obviously failed.

What cracks me up is those who whine that full trip DNS verification is too slow. Some blindly do RDNS on everything coming to their server, which is insane and of course slows everything down. However, if you validate only bots and cache the DNS results for a day, it's only slow once per IP verified, which is quite acceptable given how many pages are crawled daily. Since you're only doing full trip verification for bots, it has zero impact on regular visitors.
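
A rough sketch of that approach in Python, using only the standard library: reverse-resolve the IP, check the hostname, forward-resolve the hostname, confirm it maps back to the same IP, and cache the verdict for a day. The ".googlebot.com" suffix, the in-memory dict cache, and the function names are assumptions for illustration; a real setup would likely use a shared cache.

```python
import socket
import time

CACHE_TTL_SECONDS = 24 * 60 * 60        # cache each verdict for a day
GOOGLEBOT_SUFFIX = ".googlebot.com"     # assumed hostname suffix for genuine Googlebot

_cache = {}  # ip -> (verified, expires_at)

def claims_googlebot(user_agent):
    """Only requests claiming to be the bot need the DNS round trip."""
    return "googlebot" in (user_agent or "").lower()

def _reverse(ip):
    return socket.gethostbyaddr(ip)[0]

def _forward(hostname):
    return socket.gethostbyname_ex(hostname)[2]

def verify_googlebot(ip, reverse=_reverse, forward=_forward, now=time.time):
    """Full trip check: IP -> hostname -> IPs, and the original IP must be among them."""
    cached = _cache.get(ip)
    if cached and cached[1] > now():
        return cached[0]                 # fast path: no DNS lookup at all
    try:
        hostname = reverse(ip).lower().rstrip(".")
        verified = hostname.endswith(GOOGLEBOT_SUFFIX) and ip in forward(hostname)
    except (socket.herror, socket.gaierror):
        verified = False                 # no PTR record or failed lookup: treat as fake
    _cache[ip] = (verified, now() + CACHE_TTL_SECONDS)
    return verified
```

Call verify_googlebot only when claims_googlebot(user_agent) is true, so regular visitors never trigger a lookup. The reverse and forward parameters exist so the resolvers can be swapped out for testing.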

The implementation is what makes or breaks the usefulness of the method and what separates real programmers from script kiddies :)

keyplyr




msg:4475796
 5:48 am on Jul 15, 2012 (gmt 0)

See, that's why whitelisting user agents beats blocking.

I whitelist. However, not everything is what it says it is :) I also keep a list of what I block (hence adding new ones weekly). Actually, I usually go a couple of weeks now without needing to change anything server related.

I also block almost all proxy & proxi identifiers, filtered through a few whitelisted IP ranges. I also block all transcoder and translator tools, add-ons, toolbars, etc.
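
One possible reading of that rule, sketched in Python: block any user agent containing "proxy" or "proxi" unless the request comes from a whitelisted range. The range below is a documentation placeholder (192.0.2.0/24), not a real whitelist, and the function name is hypothetical.

```python
import ipaddress
import re

# Matches both "proxy" and "proxi" anywhere in the user agent, case-insensitively.
PROXY_RE = re.compile(r"prox[iy]", re.IGNORECASE)

# Placeholder ranges for illustration only; substitute the ranges you trust.
WHITELISTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]

def should_block(user_agent, ip):
    """Block proxy-style user agents unless the IP falls in a whitelisted range."""
    if not PROXY_RE.search(user_agent or ""):
        return False
    addr = ipaddress.ip_address(ip)
    return not any(addr in net for net in WHITELISTED_NETWORKS)
```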

incrediBILL




msg:4475803
 6:01 am on Jul 15, 2012 (gmt 0)

not everything is what it says it is


If it were, there wouldn't be much of a challenge, would there?

What fun would that be?

There's just something exhilarating about busting a stealth bot, publishing the details about it, and then sitting back to watch the flurry of visits to your blog post from the people whose stealth bot you just outed.

Even if you weren't sure you had it 100% right, that flurry of visits is a pretty good confirmation you nailed it IMO.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved