Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Google scraping through China?
dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4529439 posted 10:29 pm on Dec 18, 2012 (gmt 0)

Just found a group of half a dozen or so hits from IP 61.49.40.nn, a Chinese broadband IP (I think), but with three open ports: 80, 22 and 8080.

The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy.

All hits bar one were to guestbook pages (and rejected), which suggests a form-spamming attack.

8.8.8.0 - 8.8.8.255 is a Level 3 sub-range assigned to Google, with rDNS of google-public-dns-a.google.com. This suggests it may be a public DNS service, but if so, why is it behaving like a scraper? And if it is a public general-purpose IP, why is it allowed to do this? (Although given G's practices it would not be surprising.)
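For anyone wanting to check a logged address against a whois-style range like the one above, here is a minimal sketch using Python's standard ipaddress module. The helper names range_to_cidrs and in_range are made up for illustration; only the range and the all-8s address come from this thread.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Collapse an inclusive first-last address range into CIDR blocks."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last))
    return [str(net) for net in networks]

def in_range(ip, first, last):
    """Return True if ip falls inside the inclusive first-last range."""
    addr = ipaddress.ip_address(ip)
    return ipaddress.ip_address(first) <= addr <= ipaddress.ip_address(last)

# The sub-range quoted above collapses to a single /24.
print(range_to_cidrs("8.8.8.0", "8.8.8.255"))       # ['8.8.8.0/24']
print(in_range("8.8.8.8", "8.8.8.0", "8.8.8.255"))  # True
```

This only tells you which block an address sits in, not who is actually sending the traffic.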

Come to think of it, the IP (all 8's) was suggested to me by my broadband provider recently as a way of proving whether I had an external DNS problem or not...

 

incrediBILL

WebmasterWorld Administrator, Top Contributor of All Time, 5+ Year Member, Top Contributor of the Month



 
Msg#: 4529439 posted 1:17 am on Dec 19, 2012 (gmt 0)

What was the user agent?

I recently encountered the following and it appears to be legit.

203.208.60.198 (China) - crawl-203-208-60-198.googlebot.com

inetnum: 203.208.32.0 - 203.208.63.255
netname: GOOGLECN
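One common way to decide whether an address like this is "legit" is a forward-confirmed reverse DNS check: look up the PTR name, check the domain, then resolve that name forward and make sure it returns the same IP. The sketch below is an illustration only, not necessarily how the check above was done. The function name is_verified_googlebot and the suffix tuple are assumptions, and the lookups are injectable so the logic can be exercised without touching the network.

```python
import socket

# Assumption: only hosts under googlebot.com count as the crawler here.
# Widen the tuple if you also want to accept other Google-owned host names.
CRAWLER_SUFFIXES = (".googlebot.com",)

def is_verified_googlebot(ip,
                          reverse=socket.gethostbyaddr,
                          forward=socket.gethostbyname_ex,
                          suffixes=CRAWLER_SUFFIXES):
    """Forward-confirmed reverse DNS check for an IPv4 address."""
    try:
        # Step 1: the PTR lookup gives the host name the IP claims to have.
        host = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False
    # Step 2: the name must sit under an expected crawler domain.
    if not host.endswith(suffixes):
        return False
    try:
        # Step 3: the name must resolve back to the original IP, because
        # whoever controls the reverse zone can put any name in a PTR record.
        addresses = forward(host)[2]
    except OSError:
        return False
    return ip in addresses
```

Calling is_verified_googlebot("203.208.60.198") with the default arguments performs live DNS lookups, so the answer depends on what DNS returns at the time you run it.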

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4529439 posted 5:08 am on Dec 19, 2012 (gmt 0)

What leads you to think it may be G? More likely an individual using the open network IMO.

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4529439 posted 9:18 pm on Dec 19, 2012 (gmt 0)

Bill - sorry, forgot:

Two on the 14th...

Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 MRA 5.6 (build 03278) Firefox/3.6.3 sputnik 2.1.0.18 WebMoney Advisor

Opera/9.80 (Windows NT 5.1; U; Edition Campaign 09; ru) Presto/2.10.229 Version/11.64

On the 18th (the one in the OP with half-dozen hits)...

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.1634 Safari/535.19 YE

All through the same (compromised?) IP.

The 203 IP shows in DNS as belonging to googlecn - which could be anyone but probably isn't. The IP resolves to the same format as googlebot. But since it's China it's blocked anyway.

Keyplyr - yes, that's why I was questioning its validity. But to use a remote and possibly compromised IP as a proxy, the forwarded-for IP (the 8's) has to actually do the forwarding, i.e. select and pass data to the ultimate IP. How could Joe Hacker arrange to access web sites via a Google proxy and a probably compromised broadband IP? And do it consistently: the hits on the 14th and 18th all used the same IP, which has open ports, suggesting a compromised or deliberately open machine.

And...

I (belatedly) tried the 8s in robtex (enter only three 8s and terminate with a period to get the cnet). There are a LOT of domains in the list that look remarkably like web sites, so probably one of those is compromised or deliberately scraping/posting. For a small (/24) non-google-owned google-used IP range this is a very odd setup!

keyplyr

WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member, Top Contributor of the Month



 
Msg#: 4529439 posted 7:07 am on Dec 20, 2012 (gmt 0)

I guess I just don't see what G has to gain by this, nor why they'd spend the effort when they can just access your files normally. OTOH a user up to no good would have an advantage hiding in this manner; however, it still seems like a lot of work when they could just use an anonymous proxy with a spoofed UA.

thetrasher

5+ Year Member



 
Msg#: 4529439 posted 11:15 am on Dec 20, 2012 (gmt 0)

"The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy."

"forwarded-for IP (the 8's)"

Did you see "X-Forwarded-For: 8.8.8.8"? [webmasterworld.com...]

You've been fooled by a distorting proxy. Don't trust X-Forwarded-For.
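
To make the point concrete: X-Forwarded-For is just a request header, so any client or proxy can put whatever it likes in it, including 8.8.8.8. A rough sketch of the usual defence follows: key logging and blocking on the socket peer address, and only read the header when the peer is a proxy you run yourself. The function name client_ip, the TRUSTED_PROXIES list, and the sample addresses from documentation ranges are all hypothetical.

```python
import ipaddress

# Hypothetical: networks of reverse proxies you operate yourself.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]

def is_trusted(addr, trusted):
    """Return True if addr belongs to one of the trusted proxy networks."""
    return any(addr in net for net in trusted)

def client_ip(peer_ip, x_forwarded_for=None, trusted=TRUSTED_PROXIES):
    """Pick the address to log or block for one request.

    peer_ip is the address of the TCP connection itself, which cannot be
    set by a header. The header is consulted only when the peer is trusted.
    """
    peer = ipaddress.ip_address(peer_ip)
    if not x_forwarded_for or not is_trusted(peer, trusted):
        return str(peer)
    # Walk right to left: the rightmost entries were appended by our own
    # proxies, and anything further left is whatever the client claimed.
    for hop in reversed(x_forwarded_for.split(",")):
        try:
            addr = ipaddress.ip_address(hop.strip())
        except ValueError:
            break  # malformed entry: stop trusting the header
        if not is_trusted(addr, trusted):
            return str(addr)
    return str(peer)

# A direct hit from an untrusted peer claiming to be forwarded for 8.8.8.8:
print(client_ip("203.0.113.7", "8.8.8.8"))  # 203.0.113.7
```

With this approach the hits in the opening post would be attributed to the connecting address, and the 8.8.8.8 value would be treated as an unverified claim.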

dstiles

WebmasterWorld Senior Member, Top Contributor of All Time, 5+ Year Member



 
Msg#: 4529439 posted 8:13 pm on Dec 20, 2012 (gmt 0)

Different situation. I'm more inclined to think the forwarder has a VPS registered there.
