NerdByNature.Bot
keyplyr

8:01 pm on Aug 19, 2011 (gmt 0)

176.9.0.*** Mozilla/5.0 (compatible; NerdByNature.Bot; [nerdbynature.net...]

rDNS: static.121.0.9.176.clients.your-server.de

robots.txt: yes

Pfui

11:12 pm on Aug 19, 2011 (gmt 0)

"Very aggressive bot. Generates huge server loads..." Comments continue here [projecthoneypot.org]

Neighboring IPs have HEAD-hit / at least twice, never asking for robots.txt:

static.120.0.9.176.clients.your-server.de - [01/Aug/2011] HEAD
static.119.0.9.176.clients.your-server.de - [24/Jul/2011] HEAD

Mozilla/5.0 (compatible; NerdByNature.Bot; http://www.nerdbynature.net/bot)

robots.txt? NO

Aside: your-server.de is home to many bad UA eggs.

keyplyr

11:49 pm on Aug 19, 2011 (gmt 0)

Thanks Pfui, I agree. Just putting it up since there was no mention of this bot.

I have 176.9.0.0/16 blocked as well as 188.40.73.239 and 78.46.43.100 from your-server.de. I might broaden the blocked ranges on the latter two.
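For anyone mirroring those blocks in an Apache .htaccess, a minimal sketch using 2.2-era mod_access syntax (the addresses are the ones named above; the surrounding directives are standard boilerplate):

Order Allow,Deny
Allow from all
# your-server.de / Hetzner addresses named above
Deny from 176.9.0.0/16
Deny from 188.40.73.239
Deny from 78.46.43.100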

Pfui

12:01 am on Aug 20, 2011 (gmt 0)

Thanks for reporting it so I could chime in with info.

We all spot so many bad bots day in and day out that OPs could easily be a part-time gig! (I'd do more but I hate the post-delay;)

lucy24

1:12 am on Aug 20, 2011 (gmt 0)

I met them on the third. Their info page says they grab the first 50 pages, which in my case was literally true. 53 HEAD (three of them belonging to big fat pages which they decided not to bother with), 50 GET, generally 2/second.

Their exact sequence of hits meant that they could have picked up a page in an off-limits directory-- it would have come about 2/3 of the way down their list-- but they didn't.

I put them in the "no skin off my nose" category.

dstiles

7:46 pm on Aug 20, 2011 (gmt 0)

As with keyweb, block hetzner...

46.4.0.0/16
78.46.0.0/15
85.10.192.0/18
88.198.0.0/16
176.9.0.0/16
178.63.0.0/16
188.40.0.0/16
213.133.96.0/19
213.239.192.0/18

keyplyr

8:57 pm on Aug 20, 2011 (gmt 0)

@ dstiles

Where are you putting all these denies... in the server config? Surely not an .htaccess file, given the seemingly vast number of ranges you deny.

While my .htaccess file is only about 10k, there are over 120 ranges denied, two dozen rewrites denying by UA, and IP white-list filters for the top 5 SEs. Add to that several URL rewrites and a couple of other goodies, and I'm always concerned about response times, especially since Google has made speed an issue.

I guess if you're serious about blocking Google [webmasterworld.com...], this is a non-issue?
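For reference, the UA-based rewrite denials mentioned above are usually short mod_rewrite blocks; a sketch for this thread's bot, assuming mod_rewrite is available:

RewriteEngine On
# Send a 403 to anything identifying itself as NerdByNature.Bot
RewriteCond %{HTTP_USER_AGENT} NerdByNature\.Bot [NC]
RewriteRule .* - [F]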

wilderness

9:32 pm on Aug 20, 2011 (gmt 0)

Where are you putting all these denies... in the server config? Surely not an .htaccess file, given the seemingly vast number of ranges you deny.


keyplyr,
IP denials by themselves don't cause any delay or server load.

I had a quickly-functioning htaccess that was 115k and 2300 lines; it worked just fine and caused no server load.

keyplyr

4:04 am on Aug 21, 2011 (gmt 0)

I had a quickly-functioning htaccess that was 115k and 2300 lines; it worked just fine and caused no server load.

Amazing (it's times like this I wish we had emoticons.)

wilderness

4:53 am on Aug 21, 2011 (gmt 0)

keyplyr,
Although it may seem unusually large to most, it's certainly not an obscure case.

I recall an old thread where some were comparing htaccess sizes, and others had files larger than mine.

BTW, I consolidated IPs twice to shrink the numbers, else I'd surely have had double that.

keyplyr

6:00 am on Aug 21, 2011 (gmt 0)

Sorry for the off-topic.

Last year my htaccess file was about 22k, and the GWT site speed tool started telling me "your site is slower than 50%..."

I consolidated rewrites, condensed ranges, deleted most redirects, switched to white-list filters (along the lines sketched below) and got the htaccess file under 8k. I immediately saw site speed improve by 20%.
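A white-list filter of that kind typically pairs a claimed search-engine UA with its known IP range, denying impostors. A sketch only: Googlebot's oft-published 66.249 range is used as the example, but real filters cover more engines and should use verified, current ranges:

RewriteEngine On
# Anything claiming to be Googlebot but not coming from the
# 66.249.x.x range gets a 403 (illustrative range only)
RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
RewriteCond %{REMOTE_ADDR} !^66\.249\.
RewriteRule .* - [F]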

I do understand it's not just the size of the file, but the complexity and the number of processes the file directs the server to perform. But to my understanding, the entire htaccess has to be read for every request, so if there are 2k deny lines, each one has to be read, right?

wilderness

1:18 pm on Aug 21, 2011 (gmt 0)

But to my understanding, the entire htaccess has to be read for every request, so if there are 2k deny lines, each one has to be read, right?


At least until the visitor in question hits a matching denial, or makes it past all of the rules.

dstiles

10:18 pm on Aug 21, 2011 (gmt 0)

Keyplyr - my last comment re: Google was aimed at the "nasty insidious" bit, not at blocking Google in general - although that would be favourite. :(

I run IIS, not (sadly) Apache, so I have no htaccess capability.

I built a home-made bot-blocker over many years. About two years back I switched from a text file of blocked IPs to MySQL, which contains mostly ranges I've had hassle from over several years, plus quite a few single IPs that auto-block themselves by unwanted actions; usually scrapers, badly-behaved browsers and virus-compromised "home" machines. There are currently about 23,000 records in the database, a large proportion of them "singles". Access time for the database, including a quite extensive test script to determine bad bots etc., is about 10-30 ms.

I do not believe Google can accurately determine page delivery time. They have to rely on desktops reporting it via the Google toolbar and Google Analytics, neither of which a lot of people have. The result depends on how fast any specific desktop computer is, how busy it is, how fast the broadband or other connection is - a whole load of things, including mom/pop users who only visit a small range of sites on last century's very slow computer.

Google may try to make people think they have slow site delivery, but from observations in the Google forum I'm more inclined to think they can't really tell, and in any case they're more concerned with their own crawler bandwidth.