Google scraping through China?

     
10:29 pm on Dec 18, 2012 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:May 14, 2008
posts:3125
votes: 4


Just found a group of half a dozen or so hits from IP 61.49.40.nn, a Chinese broadband (I think) IP but with three open ports: 22, 80 and 8080.

The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy.

All hits bar one were to guestbook pages (and rejected), which suggests a form-spamming attack.

8.8.8.0 - 8.8.8.255 is a Level 3 sub-range assigned to Google, with rDNS of google-public-dns-a.google.com. This suggests it may be a public DNS service, but if so, why is it behaving like a scraper? And if it is a public general-purpose IP, why is it allowed to do this? (Although given G's practices it would not be surprising.)

Come to think of it, the IP (all 8's) was suggested to me by my broadband provider recently as a way of proving whether I had an external DNS problem or not...
1:17 am on Dec 19, 2012 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


What was the user agent?

I recently encountered the following and it appears to be legit.

203.208.60.198 (China) - crawl-203-208-60-198.googlebot.com

inetnum: 203.208.32.0 - 203.208.63.255
netname: GOOGLECN
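
For anyone who wants to match hits against that whois range, here is a minimal sketch using Python's standard ipaddress module. It only shows how the inetnum range above collapses into CIDR form and how a membership test might look; the helper name in_range is my own, not anything from the whois record.

```python
import ipaddress

# Range copied from the whois "inetnum" line quoted above.
first = ipaddress.ip_address("203.208.32.0")
last = ipaddress.ip_address("203.208.63.255")

# summarize_address_range yields the smallest set of CIDR blocks covering the range.
networks = list(ipaddress.summarize_address_range(first, last))


def in_range(ip: str) -> bool:
    """Return True if ip falls inside any of the summarized networks (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


print([str(n) for n in networks])   # ['203.208.32.0/19']
print(in_range("203.208.60.198"))   # True
```

Being inside the range only says who the block is registered to, not whether the visitor really is the crawler.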
5:08 am on Dec 19, 2012 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6665
votes: 130


What leads you to think it may be G? More likely an individual using the open network IMO.
9:18 pm on Dec 19, 2012 (gmt 0)

Senior Member from GB 

dstiles


Bill - sorry, forgot:

Two on the 14th...

Mozilla/5.0 (Windows; U; Windows NT 6.1; ru; rv:1.9.2.3) Gecko/20100401 MRA 5.6 (build 03278) Firefox/3.6.3 sputnik 2.1.0.18 WebMoney Advisor

Opera/9.80 (Windows NT 5.1; U; Edition Campaign 09; ru) Presto/2.10.229 Version/11.64

On the 18th (the one in the OP with half-dozen hits)...

Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.1634 Safari/535.19 YE

All through the same (compromised?) IP.

The 203 IP shows in DNS as belonging to googlecn - which could be anyone but probably isn't. The IP resolves to the same format as googlebot. But since it's China it's blocked anyway.
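
Since a hostname that merely looks like googlebot proves little on its own, the usual test is a forward-confirmed reverse DNS lookup: resolve the IP to a name, check the name's suffix, then resolve the name back and confirm it returns the same IP. A rough sketch follows, assuming crawler hostnames end in .googlebot.com as in the example quoted earlier in this thread. The function name and the injectable reverse/forward parameters are my own invention so the logic can be tried without live DNS.

```python
import socket

# Assumption: genuine crawler hostnames end with this suffix, as in the
# crawl-203-208-60-198.googlebot.com example quoted in this thread.
CRAWLER_SUFFIX = ".googlebot.com"


def is_verified_crawler(ip, reverse=None, forward=None):
    """Forward-confirmed reverse DNS check (hypothetical helper).

    reverse(ip) returns a hostname; forward(hostname) returns a list of IPv4 strings.
    Both default to live lookups via the socket module.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        host = reverse(ip).lower().rstrip(".")
        if not host.endswith(CRAWLER_SUFFIX):
            return False
        # Whoever controls the PTR record can put any name in it,
        # so the name must resolve back to the same address.
        return ip in forward(host)
    except OSError:
        # No PTR record or a failed lookup: treat as unverified.
        return False
```

Calling is_verified_crawler("203.208.60.198") with the defaults performs real DNS queries, so the answer depends on what DNS returns at the time. Passing your own reverse and forward functions lets you exercise the logic offline.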

Keyplyr - yes, that's why I was questioning its validity. But: to use a remote and possibly compromised IP as a proxy requires the forwarded-for IP (the 8's) to actually do the forwarding - i.e. to select and pass data to the ultimate IP. How could joe hacker arrange to access web sites via a google proxy and a probably compromised broadband IP? And do it consistently - the hits on 14th and 18th all used the same IP, which has open ports suggesting a compromised or deliberately open machine.

And...

I (belatedly) tried the 8s in robtex (enter only three 8s and terminate with a period to get the cnet). There are a LOT of domains in the list that look remarkably like web sites, so probably one of those is compromised or deliberately scraping/posting. For a small (/24) non-google-owned google-used IP range this is a very odd setup!
7:07 am on Dec 20, 2012 (gmt 0)

Moderator This Forum from US 

keyplyr


I guess I just don't see what G has to gain by this, nor why they'd spend the effort when they can just access your files normally. OTOH a user up to no good would have an advantage hiding in this manner; however, it still seems like a lot of work when they could just use an anonymous proxy with a spoofed UA.
11:15 am on Dec 20, 2012 (gmt 0)

Junior Member

10+ Year Member

joined:June 25, 2005
posts:179
votes: 1


dstiles wrote: "The hits were actually from a proxy at 8.8.8.n (guess the fourth number!) using the Chinese IP as a (presumably) open proxy."

dstiles wrote: "forwarded-for IP (the 8's)"

Did you see a header like "X-Forwarded-For: 8.8.8.8"? If so, you may have been fooled by a distorting proxy [webmasterworld.com...]. Don't trust X-Forwarded-For.
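
To make that concrete: the header is just text supplied by whoever sent the request, so it should only be honoured when the connection itself comes from a proxy you run. Below is a minimal sketch of that rule. TRUSTED_PROXIES, the 10.0.0.0/8 range and the client_ip function are all hypothetical placeholders, not part of any particular server or framework.

```python
import ipaddress

# Hypothetical allow-list: only reverse proxies you operate yourself belong here.
TRUSTED_PROXIES = [ipaddress.ip_network("10.0.0.0/8")]


def _is_trusted(addr):
    return any(addr in net for net in TRUSTED_PROXIES)


def client_ip(remote_addr, xff_header=None):
    """Pick the address to log or block on (hypothetical helper).

    remote_addr is the TCP peer address of the connection.
    X-Forwarded-For is consulted only when that peer is one of our own proxies.
    """
    peer = ipaddress.ip_address(remote_addr)
    if not xff_header or not _is_trusted(peer):
        # Header supplied by an unknown sender: ignore it entirely.
        return remote_addr
    # Walk the chain right to left, skipping our own proxies. The first
    # other address is the one our proxy actually saw connecting.
    for hop in reversed([h.strip() for h in xff_header.split(",")]):
        try:
            addr = ipaddress.ip_address(hop)
        except ValueError:
            return remote_addr  # malformed header: fall back to the peer
        if not _is_trusted(addr):
            return hop
    return remote_addr
```

With this rule, a request arriving directly from an outside address such as 198.51.100.7 (a documentation address used here as a stand-in) and carrying "X-Forwarded-For: 8.8.8.8" is still attributed to 198.51.100.7; the 8s never enter the picture.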
8:13 pm on Dec 20, 2012 (gmt 0)

Senior Member from GB 

dstiles


Different situation. I'm more inclined to think the forwarder has a VPS registered there.
 
