Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Google scraping through China?

 10:29 pm on Dec 18, 2012 (gmt 0)

Just found a group of half a dozen or so hits from IP 61.49.40.nn, a Chinese broadband IP (I think), but with three open ports: 80, 22 and 8080.

The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy.

All hits bar one were to guestbook pages (and rejected), which suggests a form-spamming attack. The range containing 8.8.8.n is a Level 3 sub-range assigned to Google, with rDNS of google-public-dns-a.google.com. This suggests it may be a public DNS service, but if so, why is it behaving like a scraper? And if it is a public general-purpose IP, why is it allowed to do this? (Although given G's practices it would not be surprising.)

Come to think of it, the IP (all 8's) was suggested to me by my broadband provider recently as a way of proving whether I had an external DNS problem or not...



 1:17 am on Dec 19, 2012 (gmt 0)

What was the user agent?

I recently encountered the following and it appears to be legit. (China) - crawl-203-208-60-198.googlebot.com

inetnum: -
netname: GOOGLECN
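
A hostname that merely looks like Googlebot proves little on its own, since whoever controls an IP range controls its reverse DNS. The usual check is forward-confirmed reverse DNS: look up the hostname for the IP, confirm it ends in googlebot.com or google.com, then resolve that hostname and confirm it maps back to the same IP. A minimal sketch, assuming IPv4 only; the function name and the injectable lookup parameters are hypothetical and exist only to keep the example testable:

```python
import socket

# Domains a genuine Googlebot reverse lookup is expected to end in.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Forward-confirmed reverse DNS check (IPv4 only in this sketch).

    reverse_lookup(ip) -> hostname and forward_lookup(host) -> list of IPs
    are hypothetical hooks; by default they use the system resolver.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # no PTR record, resolver failure, etc.
        return False

    # Step 1: the PTR name must sit under a Google-owned domain.
    if not host.endswith(GOOGLE_SUFFIXES):
        return False

    # Step 2: the name must resolve back to the IP we started with.
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

The second step is what matters: a spoofed PTR record fails it, because the forward lookup of the claimed hostname is answered by the real owner of the domain.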


 5:08 am on Dec 19, 2012 (gmt 0)

What leads you to think it may be G? More likely an individual using the open network IMO.


 9:18 pm on Dec 19, 2012 (gmt 0)

Bill - sorry, forgot:

Two on the 14th...

Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv: Gecko/20100401 MRA 5.6 (build 03278) Firefox/3.6.3 sputnik WebMoney Advisor

Opera/9.80 (Windows NT 5.1; U; Edition Campaign 09; ru) Presto/2.10.229 Version/11.64

On the 18th (the one in the OP with half-dozen hits)...

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.1634 Safari/535.19 YE

All through the same (compromised?) IP.

The 203 IP shows in DNS as belonging to googlecn - which could be anyone but probably isn't. The IP resolves to the same format as googlebot. But since it's China it's blocked anyway.

Keyplr - yes, that's why I was questioning its validity. But: to use a remote and possibly compromised IP as a proxy requires the forwarded-for IP (the 8's) to actually do the forwarding - i.e. to select and pass data to the ultimate IP. How could Joe Hacker arrange to access web sites via a Google proxy and a probably compromised broadband IP? And do it consistently - the hits on the 14th and 18th all used the same IP, which has open ports, suggesting a compromised or deliberately open machine.


I (belatedly) tried the 8s in robtex (enter only three 8s and terminate with a period to get the cnet). There are a LOT of domains in the list that look remarkably like web sites, so probably one of those is compromised or deliberately scraping/posting. For a small (/24) non-google-owned google-used IP range this is a very odd setup!


 7:07 am on Dec 20, 2012 (gmt 0)

I guess I just don't see what G has to gain by this, nor why they'd spend the effort when they can just access your files normally. OTOH a user up to no good would have an advantage hiding in this manner. Even so, it still seems like a lot of work when they could just use an anonymous proxy with a spoofed UA.


 11:15 am on Dec 20, 2012 (gmt 0)

"The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy."

"forwarded-for IP (the 8's)"

Did you see [webmasterworld.com...]?
You've been fooled by a distorting proxy. Don't trust X-Forwarded-For.
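
To make that concrete: X-Forwarded-For is an ordinary request header, so any client or proxy can put whatever address it likes in it. The only address the server actually knows is the one on the TCP connection. A minimal sketch of choosing which address to log or block on, assuming a hypothetical list of proxies you run yourself (`TRUSTED_PROXIES`) and a hypothetical helper name (`client_ip`):

```python
import ipaddress

# Hypothetical: networks of reverse proxies you operate and therefore trust.
TRUSTED_PROXIES = (ipaddress.ip_network("10.0.0.0/8"),)


def _is_trusted(addr, trusted):
    return any(addr in net for net in trusted)


def client_ip(peer_addr, xff_header, trusted=TRUSTED_PROXIES):
    """Return the address to log or block on.

    peer_addr is the address of the TCP connection; xff_header is the raw
    X-Forwarded-For value (or None). The header is consulted only when the
    connection itself comes from a trusted proxy.
    """
    peer = ipaddress.ip_address(peer_addr)
    if not xff_header or not _is_trusted(peer, trusted):
        # Untrusted sender: whatever the header claims is unverifiable.
        return str(peer)

    # Walk the chain right to left, skipping hops we control; the first
    # address outside our own proxies is the one that connected to us.
    for hop in reversed([h.strip() for h in xff_header.split(",")]):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            break  # malformed entry: stop trusting the header
        if not _is_trusted(addr, trusted):
            return str(addr)
    return str(peer)
```

With this approach, a hit arriving directly from an unknown IP is attributed to that IP no matter what its X-Forwarded-For header says, which is the situation described in the opening post.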


 8:13 pm on Dec 20, 2012 (gmt 0)

Different situation. I'm more inclined to think the forwarder has a VPS registered there.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved