
Forum Moderators: Ocean10000 & incrediBILL


Google scraping through China?


dstiles

10:29 pm on Dec 18, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Just found a group of half a dozen or so hits from IP 61.49.40.nn, a Chinese IP (broadband, I think) with three open ports: 80, 22 and 8080.

The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy.

All hits bar one were to guestbook pages (and rejected), which suggests a form-spamming attack.

8.8.8.0 - 8.8.8.255 is a Level 3 sub-range assigned to Google, with rDNS of google-public-dns-a.google.com. That suggests a public DNS service, but if so, why is it behaving like a scraper? And if it is a public general-purpose IP, why is it allowed to do this? (Although given G's practices it would not be surprising.)

Come to think of it, my broadband provider recently suggested this IP (all 8's) to me as a way of checking whether I had an external DNS problem or not...

incrediBILL

1:17 am on Dec 19, 2012 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



What was the user agent?

I recently encountered the following and it appears to be legit.

203.208.60.198 (China) - crawl-203-208-60-198.googlebot.com

inetnum: 203.208.32.0 - 203.208.63.255
netname: GOOGLECN
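For anyone wanting to check their own logs against that allocation: the inetnum range above collapses to a single CIDR block, which makes membership tests easy. A minimal sketch using Python's standard `ipaddress` module; the helper names are made up for illustration.

```python
import ipaddress
from typing import Iterable, List

def inetnum_to_networks(first: str, last: str) -> List[ipaddress.IPv4Network]:
    """Convert a WHOIS inetnum range (first - last) into the CIDR blocks it covers."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return list(ipaddress.summarize_address_range(start, end))

def in_ranges(ip: str, networks: Iterable[ipaddress.IPv4Network]) -> bool:
    """True if ip falls inside any of the given networks (IPv6 input simply returns False)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

# inetnum from the GOOGLECN whois record above
googlecn = inetnum_to_networks("203.208.32.0", "203.208.63.255")
print(googlecn)                               # [IPv4Network('203.208.32.0/19')]
print(in_ranges("203.208.60.198", googlecn))  # True
```

The same `in_ranges` test works for any other range you want to flag or block.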

keyplyr

5:08 am on Dec 19, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



What leads you to think it may be G? More likely an individual using the open network IMO.

dstiles

9:18 pm on Dec 19, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Bill - sorry, forgot:

Two on the 14th...

Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 MRA 5.6 (build 03278) Firefox/3.6.3 sputnik 2.1.0.18 WebMoney Advisor

Opera/9.80 (Windows NT 5.1; U; Edition Campaign 09; ru) Presto/2.10.229 Version/11.64

On the 18th (the one in the OP with half-dozen hits)...

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.1634 Safari/535.19 YE

All through the same (compromised?) IP.

The 203 IP shows in WHOIS as belonging to GOOGLECN, which could be anyone but probably isn't. Its rDNS follows the same format as Googlebot's. But since it's in China, it's blocked anyway.
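Matching the hostname format alone proves little, since whoever controls an IP's reverse zone can set any PTR record they like. The usual check is forward-confirmed reverse DNS: look up the PTR name, check its domain, then resolve that name and confirm it points back to the same IP. A rough Python sketch; the function name and the googlebot.com-only suffix list are assumptions (Google's own verification guide lists more than one domain), and `socket.gethostbyname_ex` only returns IPv4 addresses.

```python
import socket
from typing import Callable, Iterable, List

# Assumption: accept only googlebot.com names, matching the
# crawl-203-208-60-198.googlebot.com format above. Extend as needed.
GOOGLEBOT_SUFFIXES = (".googlebot.com",)

def _ptr(ip: str) -> str:
    return socket.gethostbyaddr(ip)[0]          # reverse (PTR) lookup

def _a_records(host: str) -> List[str]:
    return socket.gethostbyname_ex(host)[2]     # forward lookup, IPv4 only

def forward_confirmed_rdns(ip: str,
                           suffixes: Iterable[str] = GOOGLEBOT_SUFFIXES,
                           reverse: Callable[[str], str] = _ptr,
                           forward: Callable[[str], List[str]] = _a_records) -> bool:
    """True only if ip's PTR name is in an accepted domain AND resolves back to ip."""
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:                 # socket.herror / gaierror: no usable PTR
        return False
    if not host.endswith(tuple(suffixes)):
        return False                # PTR points outside the accepted domains
    try:
        return ip in forward(host)  # anyone can set a PTR; the A record must agree
    except OSError:
        return False
```

The lookups are passed in as parameters so the logic can be tried without network access. Run it against the connecting IP, not against an address taken from a forwarded-for header.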

Keyplyr - yes, that's why I was questioning its validity. But using a remote, possibly compromised IP as a proxy requires the forwarded-for IP (the 8's) to actually do the forwarding, i.e. to select the target and pass data to the ultimate IP. How could Joe Hacker arrange to access web sites via a Google proxy and a probably compromised broadband IP? And do it consistently? The hits on the 14th and 18th all used the same IP, which has open ports suggesting a compromised or deliberately open machine.

And...

I (belatedly) tried the 8s in Robtex (enter only three 8s and end with a period to get the class C net). There are a LOT of domains in the list that look remarkably like web sites, so probably one of those is compromised or deliberately scraping/posting. For a small (/24), non-Google-owned, Google-used IP range this is a very odd setup!

keyplyr

7:07 am on Dec 20, 2012 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I guess I just don't see what G has to gain by this, nor why they'd spend the effort when they can just access your files normally. OTOH a user up to no good would have an advantage hiding in this manner, but it still seems like a lot of work when they could just use an anonymous proxy with a spoofed UA.

thetrasher

11:15 am on Dec 20, 2012 (gmt 0)

10+ Year Member



"The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy."
"forwarded-for IP (the 8's)"

Did you see

X-Forwarded-For: 8.8.8.8

in the request headers? If so, you may have been fooled by a distorting proxy [webmasterworld.com...]. Don't trust X-Forwarded-For.
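To illustrate the point: X-Forwarded-For is plain text that any client or proxy can fill with whatever it likes, so the only address you can rely on is the TCP peer. A minimal Python sketch of the common approach, assuming you know which proxies in front of your own server are yours (`client_ip` and `trusted_proxies` are hypothetical names):

```python
import ipaddress
from typing import List, Optional

def client_ip(remote_addr: str, xff_header: Optional[str],
              trusted_proxies: List[str]) -> str:
    """Best-guess client IP for a request.

    remote_addr is the TCP peer address, the only value not taken from the
    request text itself. Each proxy appends to X-Forwarded-For, so only the
    rightmost entries, added by proxies you run yourself, can be believed.
    """
    trusted = [ipaddress.ip_network(p) for p in trusted_proxies]

    def is_trusted(ip: str) -> bool:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            return False          # garbage in the header is never trusted
        return any(addr in net for net in trusted)

    # Peer is not one of our proxies: the header is just client-supplied text.
    if not is_trusted(remote_addr) or not xff_header:
        return remote_addr

    hops = [h.strip() for h in xff_header.split(",") if h.strip()]
    # Walk right to left past our own proxies; the first other hop is the client.
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return hops[0] if hops else remote_addr

print(client_ip("192.0.2.10", "8.8.8.8", []))                                # 192.0.2.10
print(client_ip("10.0.0.5", "8.8.8.8, 198.51.100.7", ["10.0.0.0/8"]))      # 198.51.100.7
```

Here 192.0.2.10 is a documentation address standing in for the 61.49.40.nn host: with no trusted proxies configured, a request from it carrying `X-Forwarded-For: 8.8.8.8` is attributed to the peer, and the 8's are ignored. In the second call, a spoofed leftmost entry is skipped in favour of the hop your own proxy recorded.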

dstiles

8:13 pm on Dec 20, 2012 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Different situation. I'm more inclined to think the forwarder has a VPS registered there.