Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
"DAPPER-HOST-IP": why Baidu?
"DAPPER-HOST-IP" user agent, Baidu
aboshakeeb




msg:4612982
 6:01 pm on Sep 26, 2013 (gmt 0)

Hello,

Any idea what "DAPPER-HOST-IP" is? It shows up in the user agent of requests from Baidu, a very strange and annoying bot.

Here is the problem.

Headers as logged:
GET: /somepage.html (a page that is not on my website)
Host: subdomain.Not-my-domain.com
Agent: Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)DAPPER-HOST-IP:67.228.##.### (not my IP)

Crawler IP: 180.76.4.137 (this is Baidu!)

I think this IP (67.228.##.###) belongs to subdomain.Not-my-domain.com, but I'm not sure. What the heck is going on?

 

keyplyr




msg:4613253
 4:59 pm on Sep 27, 2013 (gmt 0)


Any idea what is that "DAPPER-HOST-IP"


Sorry... don't know, don't care. I block Baidu:

119.63.192.0 - 119.63.199.255
119.63.192.0/21

123.112.0.0 - 123.127.255.255
123.112.0.0/12

180.76.0.0 - 180.76.255.255
180.76.0.0/16

185.10.104.0 - 185.10.107.255
185.10.104.0/22

What I do care about is whether I'm missing any Baidu ranges.
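To test a logged address against the four CIDR blocks listed in this post, a minimal Python sketch with the standard `ipaddress` module could look like the following. The names `BAIDU_BLOCKS` and `is_blocked` are made up for illustration, and the list only copies the ranges above; it is not a verified or complete list of Baidu ranges.

```python
import ipaddress

# CIDR blocks copied from the post above (not a complete or verified Baidu list).
BAIDU_BLOCKS = [
    ipaddress.ip_network("119.63.192.0/21"),
    ipaddress.ip_network("123.112.0.0/12"),
    ipaddress.ip_network("180.76.0.0/16"),
    ipaddress.ip_network("185.10.104.0/22"),
]


def is_blocked(ip: str) -> bool:
    """Return True if ip falls inside any of the listed blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BAIDU_BLOCKS)


if __name__ == "__main__":
    # 180.76.4.137 is the crawler IP quoted in the first post.
    print(is_blocked("180.76.4.137"))
    # 67.228.244.106 is one of the DAPPER-HOST-IP values seen in this thread.
    print(is_blocked("67.228.244.106"))
```

In practice the same blocks would more likely go into a firewall or server config; the sketch is only handy for checking log entries after the fact.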

bhukkel




msg:4613292
 7:27 pm on Sep 27, 2013 (gmt 0)

@keyplyr

I have this range also:

103.6.76.0 - 103.6.79.255

It is allocated but not in use at the moment.

dstiles




msg:4613295
 7:34 pm on Sep 27, 2013 (gmt 0)

If it's Japan I let it through. If it's China it's blocked. One or more of my customers gets trade from Japan but not from China. I know it's likely the results are pooled but that's the way it is. :)

I've also had baiduspider (Hong Kong) visit from the European (RIPE) range of 185.10.104.0/22.

Why do you think 123.112.0.0/12 is baidu? DNS shows it as Unicom and it LOOKS as if it's (mostly) a DSL range. I have two baidu sub-ranges within that at 123.125.66.0/24 and 123.125.71.0/24. I have notes on another couple of /24 in that neighbourhood but no rDNS found (this was two or three years ago). Needless to say, China: blocked.

bhukkel - thanks for the new range!

[edited by: dstiles at 7:36 pm (utc) on Sep 27, 2013]

bhukkel




msg:4613296
 7:36 pm on Sep 27, 2013 (gmt 0)

@aboshakeeb

I have the same log entries, only with a different page and a different DAPPER-HOST-IP. In my case the page exists. Perhaps it's some kind of preview or translation service?

aboshakeeb




msg:4613322
 8:40 pm on Sep 27, 2013 (gmt 0)

@bhukkel: the Host in the request header is not my domain!

@keyplyr: I do care about Baidu. I get my share of traffic from Baidu and hao123, so I can't block it all. The spider's IP and these Dapper requests are in the same IP range.

My server can handle the requests and simply returns 404, but I want to know whether this is safe. Are we missing something? There are about 5k requests a day!

lucy24




msg:4613327
 8:53 pm on Sep 27, 2013 (gmt 0)

fwiw: if you go to dapper.net you get redirected to a Yahoo! Advertising page with that purple new* logo. Dapper also seems to be a Linux server.

67.228-229 is SoftLayer, another Shoot To Kill range.

don't know, don't care

That about sums it up.


* Idle query: would a non-native speaker recognize that this usage is wrong when not used for humorous effect?

keyplyr




msg:4613395
 3:51 am on Sep 28, 2013 (gmt 0)

Why do you think 123.112.0.0/12 is baidu?

dstiles - in the early days of Baidu, it crawled from different spots in this range before it had its own assignments (you may be correct about your sub-ranges). However, since it is all China, I just broadened the block at some point. My notes say I added this range in 2005.

As I block absolutely everything from China I find, even if Baidu no longer uses this range, it matters not to me. It remains blocked, along with the other China Unicom and Chinanet ranges. But thanks for the heads-up. Always good to update my notes with current/accurate info :)


Thanks bhukkel, didn't have that one.

67.228-229 is SoftLayer, another Shoot To Kill range

Lucy - I have 67.228.0.0/16 as SoftLayer (blocked), but 67.229.0.0/16 I have as VPLS.net, a reseller biz hosting service, with the mothership being Krypt Technologies (blocked), a nefarious dedi/cloud server farm.

dstiles




msg:4613483
 5:50 pm on Sep 28, 2013 (gmt 0)

I didn't have the Krypt one but I do now. Thanks, Lucy. :)

A slight digression from topic but my full VPLS_Krypt list is now:

66.186.32.0 - 66.186.63.255
67.198.128.0 - 67.198.255.255
67.229.0.0 - 67.229.255.255
98.126.0.0 - 98.126.255.255
100.43.128.0 - 100.43.191.255
107.6.192.0 - 107.6.255.255
110.34.128.0 - 110.34.255.255
173.214.0.0 - 173.214.127.255
174.139.0.0 - 174.139.255.255
184.75.176.0 - 184.75.191.255
184.83.0.0 - 184.83.255.255
184.164.192.0 - 184.164.223.255
209.11.240.0 - 209.11.255.255
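Ranges written as "first - last" like the ones above can be turned into CIDR notation with Python's standard `ipaddress.summarize_address_range`. This is a minimal sketch; the helper name `range_to_cidrs` is made up for illustration, and it assumes each line holds exactly two IPv4 addresses separated by a hyphen.

```python
import ipaddress
from typing import List


def range_to_cidrs(line: str) -> List[str]:
    """Convert 'first - last' into the shortest list of CIDR blocks covering it."""
    first, last = (part.strip() for part in line.split("-"))
    nets = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in nets]


if __name__ == "__main__":
    for line in ("66.186.32.0 - 66.186.63.255", "67.229.0.0 - 67.229.255.255"):
        print(line, "->", range_to_cidrs(line))
```

A range that does not fall on a single CIDR boundary comes back as several blocks, which is why the function returns a list.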

I try to be fair about China (Ukraine, Russia, Korea, etc.). I mark them as potential trouble and block them from accessing some UK-only web sites. On others I let them through. If an IP causes trouble it gets temporarily blocked. If several IPs in a sub-range or complete range give trouble, the offending range is permanently blocked. Occasionally I'll check UCE-Protect and block permanently if they have a serious listing there. This does not, of course, pertain to servers, which are blocked anyway.
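The escalation described above (temporary block for one bad IP, permanent block once several IPs in the same sub-range misbehave) might be sketched roughly as follows. Everything specific here is an assumption, not the poster's actual setup: "several" is taken as three distinct addresses, a "sub-range" as a /24, and the class name `Escalator` is invented. Expiry of temporary blocks is left out.

```python
import ipaddress
from collections import defaultdict

RANGE_THRESHOLD = 3   # assumption: "several IPs" means three distinct addresses
SUBRANGE_PREFIX = 24  # assumption: a "sub-range" is treated as a /24


class Escalator:
    def __init__(self):
        self.offenders = defaultdict(set)  # network -> distinct offending IPs
        self.temp_blocked = set()          # single IPs, meant to be temporary
        self.perm_blocked = set()          # whole networks, permanent

    def report(self, ip: str) -> str:
        """Record one offence by ip and return the action taken."""
        addr = ipaddress.ip_address(ip)
        net = ipaddress.ip_network(f"{ip}/{SUBRANGE_PREFIX}", strict=False)
        if net in self.perm_blocked:
            return "already-permanent"
        self.offenders[net].add(addr)
        if len(self.offenders[net]) >= RANGE_THRESHOLD:
            self.perm_blocked.add(net)
            return "permanent-range"
        self.temp_blocked.add(addr)
        return "temporary-ip"
```

Repeat offences from the same address count only once toward the range threshold, since the offenders are kept in a set.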

I do find that some Chinese districts seem to be worse than others either due to more aggressive "operators" or (more likely) more prone to getting viruses and hence becoming botnet members.

keyplyr




msg:4613500
 7:12 pm on Sep 28, 2013 (gmt 0)



Thanks for the Krypt ranges dstiles. Didn't have 2 of those.

Well, China is indeed a complicated subject. Unfair as it may be, I just don't have the time to micro-manage those endless ranges.

lucy24




msg:4613512
 8:36 pm on Sep 28, 2013 (gmt 0)

I try to be fair about China (Ukraine, Russia, Korea, etc). I mark them as potential trouble and block them from accessing some UK-only web sites. On others I let them through. If an IP causes trouble it gets temporarily blocked.

I have some parts of the world on a "one-strike" rule. If I meet a bad robot from anywhere that turns out to be from Eastern Europe (in practice = Poland + former Soviet, greater leeway for Baltic) the range is generally blocked. Someone hereabouts said that IP addresses in this geographical area tend to be mixed human ISPs and servers, so you can't readily classify them as one or the other. The same may well apply to parts of southeast Asia, but I haven't met enough of them to bother with.

Don't know if other people's experience is similar. But robots from Brazil or Vietnam or whatnot tend to be one-offs, while a Ukrainian robot once established will keep hammering away forever.

67.229.0.0/16 I have as VPLS.net a reseller biz hosting service, with the mothership being Krypt Technologies

Well, if you want to spend time searching the corpses for ID, it's your lookout ;)

dstiles




msg:4613630
 6:15 pm on Sep 29, 2013 (gmt 0)

I haven't seen a great deal of mixed human/server anywhere as a policy. Depending on the country a lot of people decide to run servers from their broadband-based machines or have servers run for them by trojans/botnets. Some of this is evil users but the botnet stuff is usually lack of computer literacy or lack of a good OS (eg Windows clones, which MS will not (understandably) feed Updates to).

You are probably correct about Brazil, Vietnam etc and I would class those as probably "computer illiteracy" in some form.

Shame about Brazil because, in conjunction with some European countries and a few others, it's trying to build a new and safer internet. Already in demo, from what I can gather, but it will probably be a few more years yet. Certainly it's way past time someone built a decent internet. :(

Poland - I had a lot of dynamic IP hits last year and the year before but very little this year. It's not one I block as I would UA or RU, for example. My semi-blocked countries are:

RIPE...

il:Israel
lv:Latvia
ro:Romania
rs:Serbia
ru:Russia
tr:Turkey
ua:Ukraine

APNIC/LACNIC...

br:Brazil
cn:China
id:Indonesia
in:India
kr:Korea
my:Malaysia
ph:Philippines
pk:Pakistan
th:Thailand
tw:Taiwan
vn:Vietnam

robzilla




msg:4615400
 5:26 pm on Oct 8, 2013 (gmt 0)

Host: subdomain.Not-my-domain.com

This just means someone (or something, i.e. the bot) has set up his/her/its DNS or local hosts file to point this domain to your IP address. Because the request is for "subdomain.Not-my-domain.com", it will show up as such in your logs. God knows to what purpose, but there you go. It's harmless.
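One way to spot such requests is to compare the Host header against the hostnames the server actually serves. The sketch below is only an illustration: `ALLOWED_HOSTS` and `host_is_ours` are hypothetical names, `example.com` stands in for your own domain, and bracketed IPv6 literals in the Host header are not handled.

```python
# Hypothetical: replace with the hostnames your server really serves.
ALLOWED_HOSTS = {"example.com", "www.example.com"}


def host_is_ours(host_header: str) -> bool:
    """Return True if the Host header names one of our own sites."""
    # Drop an optional port ("example.com:8080"), a trailing dot, and case.
    hostname = host_header.strip().lower().split(":", 1)[0].rstrip(".")
    return hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    print(host_is_ours("www.example.com:80"))
    print(host_is_ours("subdomain.Not-my-domain.com"))
```

Most web servers can do the same thing natively with a default catch-all virtual host, which is usually the simpler route.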

aboshakeeb




msg:4615452
 8:49 pm on Oct 8, 2013 (gmt 0)

@robzilla, finally someone sees how dangerous this is. But why is Baidu doing this?

robzilla




msg:4615472
 10:12 pm on Oct 8, 2013 (gmt 0)

The IP addresses I've seen don't seem to be connected to Baidu (other than being Chinese). Example:

202.46.61.122 - - [08/Oct/2013:18:04:34 +0200] "GET /contatoEstabelecimento.cfm HTTP/1.1" 404 1229 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)DAPPER-HOST-IP:67.215.242.52"
202.46.50.140 - - [08/Oct/2013:18:11:22 +0200] "GET /felfel62003/calendar/20100416 HTTP/1.1" 404 1229 "-" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)DAPPER-HOST-IP:67.228.244.106"

The "DAPPER-HOST-IP" addresses look pretty random to me.
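To collect the client IP and the embedded DAPPER-HOST-IP value from log lines like the two above, a small parser is enough. This is a sketch that assumes the combined log format shown in this post; the names `LOG_RE`, `DAPPER_RE` and `parse_dapper` are made up for illustration.

```python
import re

# Combined log format as it appears in the two sample lines above.
LOG_RE = re.compile(
    r'^(?P<client>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)
# Dotted-quad IPv4 address following the DAPPER-HOST-IP marker.
DAPPER_RE = re.compile(r"DAPPER-HOST-IP:(\d{1,3}(?:\.\d{1,3}){3})")


def parse_dapper(line: str):
    """Return (client_ip, dapper_ip), or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if not m:
        return None
    d = DAPPER_RE.search(m.group("agent"))
    if not d:
        return None
    return m.group("client"), d.group(1)
```

Lines whose user agent lacks the marker return `None`, so the function can be run over a whole access log and the results filtered.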

aboshakeeb




msg:4615614
 12:31 pm on Oct 9, 2013 (gmt 0)

It is not random: a steady 20k hits/day from the same subnet. The whois data is:

China Beijing Beijing Baidu Netcom Science And Technology Co. Ltd.

When I blocked that subnet it stopped for 2 days, then restarted from another IP registered to Baidu Netcom Science And Technology.

Either they are phishing their own customers or they have a messed-up proxy server!

bhukkel




msg:4615638
 3:02 pm on Oct 9, 2013 (gmt 0)

Yesterday I had 40k DAPPER-HOST-IP hits from different subnets, but all from China.
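To see how such hits spread across subnets, the client IPs can be grouped by network. This is a rough sketch: the function name `hits_per_subnet` is invented, and the /24 grouping is an assumption, since real allocations may be larger or smaller.

```python
import ipaddress
from collections import Counter


def hits_per_subnet(client_ips, prefix=24):
    """Count hits per enclosing network of the given prefix length."""
    counts = Counter()
    for ip in client_ips:
        # strict=False lets us pass a host address and get its network back.
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts


if __name__ == "__main__":
    sample = ["202.46.61.122", "202.46.50.140", "202.46.61.5"]
    for net, n in hits_per_subnet(sample).most_common():
        print(net, n)
```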

robzilla




msg:4615826
 9:20 am on Oct 10, 2013 (gmt 0)

Might want to block user agents containing "DAPPER-HOST-IP" then.
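As an illustration of that suggestion, here is a minimal Python sketch that refuses such requests in a WSGI application. The names `is_dapper_agent` and `block_dapper` are made up, and answering with 403 is an assumption; the same match is normally done in the web server's own config instead.

```python
def is_dapper_agent(user_agent: str) -> bool:
    """Return True if the user agent contains the DAPPER-HOST-IP marker."""
    return "dapper-host-ip" in (user_agent or "").lower()


def block_dapper(app):
    """Wrap a WSGI app and answer 403 to matching user agents."""
    def middleware(environ, start_response):
        if is_dapper_agent(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

The match is case-insensitive, so a variant such as "dapper-host-ip" would also be caught.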

