
Search Engine Spider and User Agent Identification Forum

    
your-server.de hosts bad bots --
findfiles.net, heritrix, Mr. X, GrubNG, Eurobot, etc.
Pfui

Msg#: 3963600 posted 6:56 am on Aug 1, 2009 (gmt 0)

Today:

static.88-198-7-nn.clients.your-server.de
findfiles.net/0.96 (Robot;test_robot@gmx-topmail.de)
robots.txt? Yes BUT ignored it

Since May, partial listing:

static.108.75.46.nn.clients.your-server.de
Mozilla/5.0 (compatible; heritrix/2.0.2 +http://seekda.com)
robots.txt? YES

static.108.75.46.nn.clients.your-server.de
Mozilla/5.0 (compatible; heritrix/${pom.version} +http://seekda.com)
robots.txt? YES
Fake ref? YES

static.47.84.46.nn.clients.your-server.de
Mr. X (Nutch spiderman; [agenteX.googlepages.com...] ; MyEmail)
robots.txt? Yes BUT ignored it

static.84.69.46.nn.clients.your-server.de
IE 4.01 Win98
(yeah, sure)

static.213-239-214-nn.clients.your-server.de
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)
robots.txt? YES

213-239-212-nn.clients.your-server.de
GrubNG 20080128
robots.txt? NO

static.165.71.46.nn.clients.your-server.de
Eurobot/Nutch-1.0-dev (1.0)
robots.txt? Yes BUT ignored it

static.88-198-50-nnn.clients.your-server.de
Mozilla/5.0 (Windows; U; Windows NT 5.0; de; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5
robots.txt? Yes BUT ignored it

 

dstiles

Msg#: 3963600 posted 8:21 pm on Aug 1, 2009 (gmt 0)

A problem with masking one of the IP octets as nnn: it's useless for looking anything up on that IP if the host name lists the octets in reverse. Some of the IP ranges you give are actually of the form nnn.46.75.108 :)

The ones written the correct way around fall in the ranges below:

213.239.192.0 - 213.239.223.255
88.198.0.0 - 88.198.255.255
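A minimal Python sketch of the reversal problem, assuming (from the samples in this thread only) that your-server.de host names with dash-separated octets list the IP in normal order, while the dotted `static.a.b.c.d` form lists it reversed. The function names are hypothetical, and the first range above is treated as the /19 block 213.239.192.0/19, i.e. ending at 213.239.223.255.

```python
import ipaddress
import re

# The two ranges quoted above, written as CIDR blocks.
YOUR_SERVER_RANGES = [
    ipaddress.ip_network("213.239.192.0/19"),
    ipaddress.ip_network("88.198.0.0/16"),
]

# Assumption drawn from this thread's samples: dashes = normal order,
# dots after "static." = reversed order.
DASHED = re.compile(
    r"^(?:static\.)?(\d{1,3})-(\d{1,3})-(\d{1,3})-(\d{1,3})\.clients\.your-server\.de$")
DOTTED = re.compile(
    r"^static\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.clients\.your-server\.de$")


def host_to_ip(host):
    """Recover the IPv4 address embedded in a your-server.de host name, or None."""
    host = host.lower()
    m = DASHED.match(host)
    if m:
        octets = m.groups()
    else:
        m = DOTTED.match(host)
        if not m:
            return None  # unknown format, or a masked name like "...-nn..."
        octets = tuple(reversed(m.groups()))  # undo the reversal
    try:
        return ipaddress.IPv4Address(".".join(octets))
    except ipaddress.AddressValueError:
        return None  # e.g. an "octet" above 255


def in_known_ranges(host):
    """True if the host name's embedded IP falls in one of the ranges above."""
    ip = host_to_ip(host)
    return ip is not None and any(ip in net for net in YOUR_SERVER_RANGES)
```

With made-up example names, `host_to_ip("static.4.3.2.1.clients.your-server.de")` gives 1.2.3.4, while a masked name such as `static.88-198-7-nn.clients.your-server.de` gives `None`, since the missing octet is exactly what the lookup needs.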

Any chance of the true initial IP portion so they can be tracked down? I may already have them blocked (as with those above) but it would be nice to check.

Pfui

Msg#: 3963600 posted 12:12 am on Aug 2, 2009 (gmt 0)

This forum's Charter [webmasterworld.com] requires the obfuscation:

-----
Any IP address or reverse DNS information not expressly belonging to a search engine should be masked as follows:

Example IP: 111.222.333.nnn
Example DNS: nnn.333.222.111.example.com

Additionally, the IPs should be obscured when discussing distributed crawlers that are run from volunteer computers.
-----
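The two charter examples amount to two masking rules: a plain IP loses its last octet, and a reverse-order DNS name loses its leading octet label, which holds that same real last octet. A minimal Python sketch of both, where `mask_host` and its `reversed_order` flag are hypothetical names; the flag has to come from the poster, because the host name alone does not say whether the provider writes the octets backwards.

```python
import re

# Four octets separated consistently by "." or "-", e.g. 4.3.2.1 or 88-198-0-1.
OCTET_RUN = re.compile(r"(\d{1,3})([.-])(\d{1,3})\2(\d{1,3})\2(\d{1,3})")


def mask_ip(ip):
    """Mask the last octet of a dotted IPv4 address (111.222.333.nnn style)."""
    head, _, _ = ip.rpartition(".")
    return head + ".nnn"


def mask_host(host, reversed_order):
    """Mask whichever embedded octet is the real address's last octet."""
    m = OCTET_RUN.search(host)
    if not m:
        return host  # no embedded IP found; nothing to mask
    a, sep, b, c, d = m.groups()
    if reversed_order:
        # Reversed names carry the real last octet first (nnn.333.222.111 style).
        masked = sep.join(["nnn", b, c, d])
    else:
        masked = sep.join([a, b, c, "nnn"])
    return host[:m.start()] + masked + host[m.end():]
```

For example, `mask_host("static.4.3.2.1.clients.your-server.de", True)` yields `static.nnn.3.2.1.clients.your-server.de`, which still lets readers reconstruct the block (x.2.3.4) without exposing the exact machine.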

I have the full Host info, of course, but bot-hunting/tracking is too OCD/time-consuming as-is :) so please don't Sticky me for the missinnng details. If you know a server reverses IPs in its Host names, I guess you'll have to swap 'em around yourself as need be, sorry.

blend27

Msg#: 3963600 posted 1:48 am on Aug 2, 2009 (gmt 0)

My tracking script shows not just bad bots there, but a plethora of open proxies and hacked servers that are constantly used for comment-spam attempts. your-server.de is slowly becoming NETDIREKT.

Pfui

Msg#: 3963600 posted 2:57 am on Aug 2, 2009 (gmt 0)

Same bot, same webserver space, within two seconds of each other. Note the Hosts...

(If life were like the movie TIMECOP and "The same matter cannot occupy the same space" vis-a-vis scourge hosts/farms/clouds*, these two would be gone in a flash. Forever ;)

ec2-[yada-yada].compute-1.amazonaws.com
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)
robots.txt? YES

08/01 06:20:24

static.47.34.46.nn.clients.your-server.de
Mozilla/5.0 (compatible; proximic; +http://www.proximic.com)
robots.txt? Yes BUT ignored it

08/01 06:20:25
08/01 06:20:26

*see also:
amazonaws.com plays host to wide variety of bad bots [webmasterworld.com]
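A rough Python sketch of spotting the pattern above: the same UA arriving from different providers within a couple of seconds. It assumes log hits are already parsed into (timestamp, host, UA) tuples, and it uses the last two host labels as a crude provider key, which is a simplification (it would, for instance, lump everything under a two-part suffix like co.uk together). The function names are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def provider(host, labels=2):
    """Crude provider key: the last `labels` labels of the host name."""
    return ".".join(host.lower().split(".")[-labels:])


def multi_provider_bursts(hits, window=timedelta(seconds=2)):
    """Map each UA seen from 2+ providers within `window` to those providers.

    hits: iterable of (datetime, host, user_agent) tuples.
    """
    by_ua = defaultdict(list)
    for ts, host, ua in hits:
        by_ua[ua].append((ts, provider(host)))
    flagged = {}
    for ua, seen in by_ua.items():
        seen.sort()  # chronological order
        for i, (ts, prov) in enumerate(seen):
            for ts2, prov2 in seen[i + 1:]:
                if ts2 - ts > window:
                    break  # later hits are even further away
                if prov2 != prov:
                    flagged.setdefault(ua, set()).update({prov, prov2})
    return flagged
```

Fed the three proximic hits above (dated Aug 1, 2009), it flags the UA with the providers `amazonaws.com` and `your-server.de`.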

janharders

Msg#: 3963600 posted 7:01 am on Aug 2, 2009 (gmt 0)

btw, your-server.de is actually hetzner.de, and it's no surprise there are a lot of rogue bots on their net, since they are one of the cheapest providers in Germany, offering quite powerful dedicated servers with unlimited traffic (though bandwidth is reduced after the first 2 TB in a month, iirc) for a low price.
I've found them quite reasonable in dealing with complaints, so if you're tracking open proxies etc. anyway, you might consider handing them a list and asking for action.

dstiles

Msg#: 3963600 posted 4:21 pm on Aug 2, 2009 (gmt 0)

pfui - I know the charter says to obfuscate, but you obfuscated the wrong bit. It should have been (for some of them) nnn.123.124.125 not 125.124.123.nnn. It is not possible to reverse the IP to detect the offending range if the vital bit is obscured. It is better to give the IP rather than the rDNS.

For example: 47.34.46.nnn resolves to Bell in Canada. The correct IP range should begin nnn.46.34.47, but it's difficult to discover what nnn is and hence which block your-server.de resides on in that instance. nnn could be anything from 80 to 95 but isn't, nor is it in the 21n range. There are several other possibilities, including 77, 78 and 79, and in fact it appears to resolve to 78.46.32.0 - 78.46.63.255. I actually have the whole block 78.46.nnn.nnn already blocked. :)
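The reasoning above (reverse the octets, then find the WHOIS block that contains the result) can be sketched with Python's standard `ipaddress` module. `range_to_cidrs` and `reversed_host_ip` are hypothetical helper names, and the octets in the usage example are made up.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive IPv4 range, as WHOIS reports it, into CIDR blocks."""
    return [str(net) for net in ipaddress.summarize_address_range(
        ipaddress.IPv4Address(first), ipaddress.IPv4Address(last))]


def reversed_host_ip(octets_in_host):
    """Octets as they appear in a reversed host name -> the real dotted IP."""
    return ".".join(reversed(octets_in_host.split(".")))
```

`range_to_cidrs("78.46.32.0", "78.46.63.255")` returns `["78.46.32.0/19"]`, so a reversed name carrying the made-up octets `1.34.46.78` maps to 78.46.34.1, which sits inside that /19. A range that does not collapse to one tidy block usually signals a mistyped end address.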

I agree it is not always obvious when to reverse the numbers and I appreciate your time is valuable. I also appreciate your postings. :)

blend27 - I almost wrote the same thing about netdirekt, which is a known exploit source. :)

janharders - life is too short to compile a list of exploits from there. :)

Pfui

Msg#: 3963600 posted 8:02 pm on Aug 3, 2009 (gmt 0)

dstiles: Again I get what you're saying about which nnn bits to block, or not, in posts.

Here's the thing:

We do rDNS on the server so my Apache ELF entries show visitors by Host name. Plain IPs only appear when there's no Host.

That's why, after white- or blacklisting by UA, I then 403 by Host, and thereafter 403 by IP/CIDR if need be. And that's why the majority of my bot-sighting posts show Host info, not IPs: I don't need to WHOIS every bot-running Host I spot prior to blocking. And I don't have time to WHOIS them just to post.
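The layered order described above (UA allow/deny first, then Host, then IP/CIDR) can be sketched as a single decision function. Pfui does this in Apache; this Python version is only an illustration of the ordering, and every list in it is a hypothetical placeholder built from names mentioned in this thread. Note that a UA whitelist on its own trusts whatever the client claims to be.

```python
import ipaddress

# Hypothetical example lists; substitute your own.
UA_ALLOW = ("Googlebot",)
UA_DENY = ("heritrix", "GrubNG", "Nutch", "findfiles.net")
HOST_DENY_SUFFIXES = (".your-server.de", ".amazonaws.com")
CIDR_DENY = [ipaddress.ip_network("78.46.32.0/19")]


def decide(user_agent, ip, host=None):
    """Return an HTTP status: UA checked first, then rDNS Host, then IP/CIDR."""
    ua = user_agent.lower()
    if any(tok.lower() in ua for tok in UA_ALLOW):
        return 200  # whitelisted UA short-circuits the later checks
    if any(tok.lower() in ua for tok in UA_DENY):
        return 403
    if host and host.lower().endswith(HOST_DENY_SUFFIXES):
        return 403  # blocked by rDNS Host name
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in CIDR_DENY):
        return 403  # fallback when there is no (or no blocked) Host
    return 200
```

The Host check only helps when rDNS is enabled, as in Pfui's logs; with IP-only logging the CIDR list has to do all the work.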

So where does how we do things leave you in terms of lookups and/or nnn reversals?

You're on your own:)

FWIW, at least vis-a-vis hits to our Class C, the vast majority are by Hosts, and the worst trouble-making Hosts do not reverse IPs in their names.

dstiles

Msg#: 3963600 posted 10:01 pm on Aug 3, 2009 (gmt 0)

Ah. I now understand.

All of my logs show IP not rDNS. It's faster, although speed is not so much of a problem now (always provided the DNS server doesn't bottleneck).

The only time the server does rDNS is for stats analysis - which I took ages setting up per site and none of the b clients uses! :(

I suppose a problem in blocking by host rather than IP is that server farms often have rDNS set up to the clients' domains (mine all are), so you would need to block a lot of domains instead of a range of IPs. Obviously more selective but in my case much more server-time consuming.

So: I'm on my own. No problem now I understand. :)

Leosghost

Msg#: 3963600 posted 10:53 pm on Aug 3, 2009 (gmt 0)

Just to let you guys know, others are watching these threads and do appreciate your efforts, and sometimes we can even glean "tangential info". janharders' post #3964072 pointed me to a hitherto unknown (to me) dedi server facility in Germany, at reasonable prices for the spec.

I was looking for one (not urgently, but for a future project); hetzner.de will do nicely :) and I promise not to run bots off it.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved