
Search Engine Spider and User Agent Identification Forum

    
Google Web Preview
wilderness




msg:4478565
 10:50 pm on Jul 24, 2012 (gmt 0)

There's an old thread on this from 2010 [webmasterworld.com]; however, that thread is closed.

I've noticed a strange correlation between a Level3 IP and Google Web Preview.

Earlier requests from the same Level3 IP were duplicates generated within somebody's workplace environment, in which another IP was the primary visitor.

I suppose the correlation could be related to the user's toolbar; however, the coincidence is most interesting.

8.28.16.zzz - - [24/Jul/2012:21:30:53 +0100] "GET /MyFolder/MyPage.html HTTP/1.1" 403 559 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
209.85.224.95 - - [24/Jul/2012:21:30:53 +0100] "GET /SameFolder/SamePage.html HTTP/1.1" 403 559 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.8 (KHTML, like Gecko; Google Web Preview) Chrome/19.0.1084.36 Safari/536.8"
8.28.16.zzz - - [24/Jul/2012:21:30:54 +0100] "GET /MyFolder/MyPage1.html HTTP/1.1" 403 559 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
209.85.224.95 - - [24/Jul/2012:21:30:54 +0100] "GET /SameFolder/SamePage1.html HTTP/1.1" 403 559 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.8 (KHTML, like Gecko; Google Web Preview) Chrome/19.0.1084.36 Safari/536.8"
8.28.16.zzz - - [24/Jul/2012:21:30:54 +0100] "GET /MyFolder/MyPage2.html HTTP/1.1" 403 559 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1; .NET CLR 2.0.50727; .NET CLR 1.1.4322; MS-RTC LM 8; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
209.85.224.95 - - [24/Jul/2012:21:30:55 +0100] "GET /SameFolder/SamePage2.html HTTP/1.1" 403 559 "http://www.google.com/search" "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.8 (KHTML, like Gecko; Google Web Preview) Chrome/19.0.1084.36 Safari/536.8"
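The pairing above (a Level3 request and a Google Web Preview request for a matching page within the same second or so) can be spotted mechanically. This is a minimal sketch, assuming the log is in Apache's "combined" format as the lines suggest; the function names `parse` and `correlate` and the one-second window are illustrative choices, not anything from the thread.

```python
import re
from datetime import datetime, timedelta

# Apache "combined" format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse(line):
    """Return a dict of fields for one combined-format line, or None if it doesn't match."""
    m = COMBINED.match(line)
    if not m:
        return None
    entry = m.groupdict()
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

def correlate(lines, marker="Google Web Preview", window=timedelta(seconds=1)):
    """Pair each request whose user agent contains `marker` with requests
    from other hosts that arrived within `window` of it."""
    entries = [e for e in map(parse, lines) if e]
    previews = [e for e in entries if marker in e["agent"]]
    others = [e for e in entries if marker not in e["agent"]]
    pairs = []
    for p in previews:
        for o in others:
            if o["host"] != p["host"] and abs(o["time"] - p["time"]) <= window:
                pairs.append((o["host"], o["request"], p["request"]))
    return pairs
```

The host field is matched as any non-space token, so masked addresses such as 8.28.16.zzz still parse.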

 

wilderness




msg:4478604
 12:54 am on Jul 25, 2012 (gmt 0)

FWIW, SamePage1-2-3 were actually three different pages.

wilderness




msg:4478981
 6:46 am on Jul 26, 2012 (gmt 0)

2nd FWIW.

This Level3 IP has been unusually active (on its own) since I posted this.

It's making specific page requests without any duplicated requests from another user and/or IP.

dstiles




msg:4479198
 7:32 pm on Jul 26, 2012 (gmt 0)

I blocked 8.28.16.0 - 8.28.17.255 about a year ago.

Running 8.28.16. as a cnet search on robtex returned a Blue Coat cache; perhaps that's it? It included at least a couple of webprotection dot com domains (robtex returns only samples, so your results may differ from mine). It may be worth doing a full DNS lookup on all IPs in that range if you're worried, but for me the 8.28.16.0/23 block suffices. As far as I'm concerned it's a virus protection "bot".
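The range 8.28.16.0 - 8.28.17.255 is exactly one /23, and the "full DNS lookup" suggested above amounts to a reverse (PTR) lookup of every address in it. A minimal sketch, assuming Python's standard `ipaddress` and `socket` modules; `range_to_cidrs` and `reverse_dns_sweep` are hypothetical helper names, and the resolver is injectable so the sweep can be tested offline or rate-limited.

```python
import ipaddress
import socket

def range_to_cidrs(first, last):
    """Collapse an inclusive IPv4 range into the minimal list of CIDR blocks."""
    return list(ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last)))

def reverse_dns_sweep(network, resolver=socket.gethostbyaddr):
    """Reverse-resolve every address in `network` (all of them, not just .hosts()).
    Returns {ip: hostname or None}. The default resolver does one live PTR lookup per address."""
    results = {}
    for ip in ipaddress.ip_network(network):
        try:
            results[str(ip)] = resolver(str(ip))[0]
        except OSError:  # socket.herror / socket.gaierror: no PTR record or lookup failed
            results[str(ip)] = None
    return results
```

For example, `reverse_dns_sweep("8.28.16.0/23")` would issue 512 live lookups, so in practice you may want to pass a throttled resolver.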

keyplyr




msg:4479200
 7:40 pm on Jul 26, 2012 (gmt 0)


For over a year I've been blocking:

8.0.0.0 - 8.255.255.255
8.0.0.0/8
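The two lines above are the same block written two ways: a dotted range and its CIDR equivalent. A short sketch using Python's standard `ipaddress` module shows how the endpoints and sizes compare with the much narrower 8.28.16.0/23 block mentioned earlier in the thread; the `BLOCKS` list is illustrative only.

```python
import ipaddress

BLOCKS = [
    ipaddress.ip_network("8.0.0.0/8"),     # whole Class A
    ipaddress.ip_network("8.28.16.0/23"),  # single Level3 sub-range
]

for net in BLOCKS:
    # network_address and broadcast_address are the first and last addresses covered
    print(f"{net} = {net.network_address} - {net.broadcast_address}, "
          f"{net.num_addresses:,} addresses")
```

This prints `8.0.0.0/8 = 8.0.0.0 - 8.255.255.255, 16,777,216 addresses` and `8.28.16.0/23 = 8.28.16.0 - 8.28.17.255, 512 addresses`, so the /8 is 32,768 times larger than the /23.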

motorhaven




msg:4479836
 2:37 am on Jul 29, 2012 (gmt 0)

Keyplyr, you're a lot braver than I am, knocking out an entire /8 range. I have many legit users in the 8.x.x.x range, so I'm fairly selective with blocks there. :)

wilderness




msg:4479843
 3:04 am on Jul 29, 2012 (gmt 0)

I have both the 4 & 8 Class A's denied as well.

I'm sure keyplyr has the 4 denied as well; I seem to recall him mentioning it previously.

Level3 has been a PITA since I've been a webmaster.

It's easy to make exceptions for legitimate users; however, there are too many networks in the Level3 ranges that simply duplicate requests without ever reading or complying with anything resembling compliance.
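"Deny the Class A, but make exceptions for legitimate users" is a deny list with an allow list that takes precedence. A minimal sketch in Python's standard `ipaddress` module; the `DENY`/`ALLOW` contents and the `is_blocked` name are hypothetical, and 4.5.6.0/24 is only a placeholder exception, not a known-good range.

```python
import ipaddress

# Hypothetical policy: deny both Class A ranges discussed in the thread...
DENY = [ipaddress.ip_network(n) for n in ("4.0.0.0/8", "8.0.0.0/8")]
# ...but carve out exceptions for legitimate users (placeholder range only).
ALLOW = [ipaddress.ip_network(n) for n in ("4.5.6.0/24",)]

def is_blocked(ip):
    """Allow-list exceptions win over the broad deny ranges."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in ALLOW):
        return False
    return any(addr in net for net in DENY)
```

Checking the allow list first keeps the exceptions narrow and explicit, which is the scalpel-within-a-butcher-knife arrangement described above.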

motorhaven




msg:4479847
 4:13 am on Jul 29, 2012 (gmt 0)

I do some selective blocking in 8.x.x.x, nothing in 4.x.x.x.

The 4.x.x.x I show literally thousands of posts from long-time forum users (as opposed to single-post users, which would indicate spammers) in pretty much every single 4.x.0.0/16 range. It looks like it's a huge dynamic IP pool for broadband, because the same user will bounce around between 4.x.0.0/16 addresses over time. I would lose a huge amount of legit traffic by blocking this range.

motorhaven




msg:4479849
 4:22 am on Jul 29, 2012 (gmt 0)

Quick follow-up. I did a scan of the last 30 days for 4.x.x.x IPs. Countless legit hits; just a handful of hits got snagged by my traps, and at first glance they appear to be exploited PCs, not server farms/crawlers.
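A scan like this can be summarized per 4.x.0.0/16 to see how widely the hits spread across the pool. A minimal sketch, assuming you have already extracted the client IPs for the 30-day window from your logs (the date filtering is left out); `hits_per_slash16` is a hypothetical helper built on Python's standard `ipaddress` and `collections` modules.

```python
import ipaddress
from collections import Counter

FOUR_SLASH_8 = ipaddress.ip_network("4.0.0.0/8")

def hits_per_slash16(ips):
    """Count hits from 4.0.0.0/8, grouped by their enclosing /16 (4.x.0.0/16)."""
    counts = Counter()
    for ip in ips:
        try:
            addr = ipaddress.ip_address(ip)
        except ValueError:
            continue  # skip masked or garbled entries such as "8.28.16.zzz"
        if addr in FOUR_SLASH_8:  # IPv6 addresses simply don't match
            counts[ipaddress.ip_network(f"{addr}/16", strict=False)] += 1
    return counts
```

Many /16s each showing steady, trap-free hits would be consistent with the dynamic broadband pool described above; a few /16s dominating would point elsewhere.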

wilderness




msg:4479850
 4:27 am on Jul 29, 2012 (gmt 0)

"The 4.x.x.x I show literally thousands"


You know what they say about assume ;)

I'm assuming a large share of your visitors are west of the Mississippi?

I get more widget visitors from the Oceanic countries than I do from west of the Mississippi, and I've even narrowed down the Oceanic countries.

In the end, what is beneficial for me and/or keyplyr may not be ideal for your own websites.

Level3 has always been a PITA and there are other folks that have the ranges denied as well.

motorhaven




msg:4479852
 4:36 am on Jul 29, 2012 (gmt 0)

Most of my list visitors (depending on which of my several sites) are from the US, Canada, western Europe, Australia, New Zealand, Mexico, a smattering of Middle Eastern and African countries, and a couple of countries in Asia.

99% of my problems originate from China and other Asian countries, eastern Europe, Israel, the Netherlands, and a couple of South American countries. Also some host farms in Germany/England and a boatload of hosting companies in the USA (none of them in the 4.x.x.x range). Even with problem areas such as Asia/eastern Europe, I take a scalpel rather than a butcher-knife approach. I've taken a butcher-knife approach before and watched AdSense revenue drop as a result.

keyplyr




msg:4479868
 8:05 am on Jul 29, 2012 (gmt 0)

Yes, I also block:

4.0.0.0 - 4.255.255.255
4.0.0.0/8

I may rethink my butcher-knife approach soon, as my traffic is down 30% from a month ago, but I also dropped to page 2 in the Google SERPs for my main keyword, so I would assume that accounts for most of the traffic loss (uh-oh, you know what wilderness says about assuming).

dstiles




msg:4479942
 7:48 pm on Jul 29, 2012 (gmt 0)

Here in the UK I have about two dozen IPs or sub-ranges within 4/8 and 8/8 banned. My total of banned Level3 IP ranges (e.g. as servers) is about 30, and another ten or so IPs outside the 4 and 8 ranges are blocked. Not really a big deal, and as far as I know my customers get a fair number of legit hits from Level3.

On the other hand, I have well over 1000 Comcast user IPs logged as blocked and about 400 Road Runner IPs (most have since been re-opened; these figures are the total IPs blocked over the past couple of years). This compares with approximately 1100 UK BT IPs blocked over the same period (most of my sites have a mainly UK audience).

Balanced against that is a predominance of US and CA server farms being ill-used, probably more than RU and UA put together as far as my sites see it.

My rough analysis of the blocked IPs is: they were either used by botnets or by ambitious idiots who have no idea what they are doing.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved