Forum Moderators: DixonJones
Thanks
Jay
For about a year now I have been 'classifying' every visit to my site according to whether it comes from a robot, a machine or a human user.
To do this I wrote a 'log database' in which I log every visit with its server variables (IP, proxy, referrer, query string, the visitor's country, the page visited, etc.).
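A log table along these lines could hold that data. This is a minimal sketch assuming SQLite through Python's standard `sqlite3` module; the table name `access_log`, its column names and the helpers `open_log`/`log_request` are hypothetical, not the poster's actual schema.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one row per request, holding the server variables
# mentioned in the post (IP, proxy, referrer, query string, country, page).
SCHEMA = """
CREATE TABLE IF NOT EXISTS access_log (
    id           INTEGER PRIMARY KEY AUTOINCREMENT,
    ts           TEXT NOT NULL,  -- ISO 8601 timestamp (UTC)
    ip           TEXT NOT NULL,
    proxy        TEXT,           -- e.g. an X-Forwarded-For value, if any
    referrer     TEXT,
    query_string TEXT,
    country      TEXT,           -- from a GeoIP lookup, if available
    page         TEXT NOT NULL,
    session_id   TEXT            -- NULL if no session was accepted
)
"""

def open_log(path=":memory:"):
    """Open (or create) the log database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn

def log_request(conn, ip, page, proxy=None, referrer=None,
                query_string=None, country=None, session_id=None):
    """Insert one request into the log table."""
    conn.execute(
        "INSERT INTO access_log (ts, ip, proxy, referrer, query_string,"
        " country, page, session_id) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), ip, proxy, referrer,
         query_string, country, page, session_id),
    )
    conn.commit()
```

Keeping the raw rows means the later per-IP analysis is a plain SQL query such as `SELECT ip, COUNT(*) FROM access_log GROUP BY ip`.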
If I see an unusual number of page views from one IP, I look at the pages that IP viewed. If, for example, 1,000 pages were requested in 2 minutes without a session ever being accepted, it is clearly a bot. After doing a whois lookup, I add the IP to the robot table, and if it comes back to the site it is treated according to the classification I assigned: maybe it can never enter the page because it is a spammer, maybe it has been 'accredited' as a search engine spider, and so on.
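The rate check could be automated roughly as follows. This is a sketch under assumptions: the thresholds simply reuse the example of 1,000 pages in 2 minutes, "accepting sessions" is approximated as "ever sent a session ID", and the names `suspicious_ips`, `robot_table`, `handle_visit` and the classification labels are hypothetical. The whois check and the final classification stay manual, as described above.

```python
from collections import defaultdict

# Hypothetical thresholds, reusing the example of ~1,000 pages in 2 minutes.
MAX_PAGES = 1000
WINDOW_SECONDS = 120

def suspicious_ips(requests, max_pages=MAX_PAGES, window=WINDOW_SECONDS):
    """Return IPs that requested at least `max_pages` pages within some
    `window`-second span and never sent a session ID.

    `requests` is an iterable of (ip, unix_timestamp, session_id) tuples.
    """
    times = defaultdict(list)
    had_session = set()
    for ip, ts, session_id in requests:
        times[ip].append(ts)
        if session_id:
            had_session.add(ip)

    flagged = set()
    for ip, stamps in times.items():
        if ip in had_session:
            continue  # a client that accepts sessions is treated as human here
        stamps.sort()
        start = 0
        # Sliding window over the sorted timestamps.
        for end in range(len(stamps)):
            while stamps[end] - stamps[start] > window:
                start += 1
            if end - start + 1 >= max_pages:
                flagged.add(ip)
                break
    return flagged

# Hypothetical robot table, filled in by hand after a whois check.
robot_table = {"198.51.100.23": "spammer", "192.0.2.10": "search_engine"}

def handle_visit(ip):
    """Decide how to treat a visit based on the manual classification."""
    kind = robot_table.get(ip, "human")
    if kind == "spammer":
        return "deny"
    if kind == "search_engine":
        return "serve_without_counting"  # spider hits are not billed to clients
    return "serve_and_count"
```

Flagged IPs are only candidates; whether one ends up as "spammer" or "search_engine" is still decided after looking it up.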
At the beginning it was quite a lot of work to classify all the spiders and spammers, but now there are only 2 or 3 new spiders or spammers per week.
If one of them causes too much trouble, I block it directly on the server.
I did all this because my clients pay me to be presented on the page, and neither I nor they want them to pay for traffic caused by spammers or search engine robots.
Two or three weeks ago I noticed an increase in requests for pages that do not exist and never existed, probing for files like owssvr.dll or files in a _vti_bin directory, etc.
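Such probes can be recognised by their request path. A minimal sketch, assuming the log provides (IP, path, HTTP status) tuples; the signature list only contains the two examples named above, and `PROBE_SIGNATURES`, `is_probe` and `probing_ips` are hypothetical names.

```python
# Only the two examples from the post; a real list would be longer.
PROBE_SIGNATURES = ("owssvr.dll", "/_vti_bin/")

def is_probe(path):
    """True if the requested path contains a known probe signature (case-insensitive)."""
    lowered = path.lower()
    return any(sig in lowered for sig in PROBE_SIGNATURES)

def probing_ips(log_entries):
    """Return the IPs that requested a known probe path for a page that does not exist.

    `log_entries` is an iterable of (ip, path, http_status) tuples;
    "does not exist" is taken to mean the server answered 404.
    """
    return {ip for ip, path, status in log_entries
            if status == 404 and is_probe(path)}
```

Matching on the path as well as the 404 keeps ordinary broken links from being counted as attacks.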
So I wrote another database to catch the IPs doing this, backed by a provider database containing the providers and their IP ranges.
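Looking up the provider for an IP by range can be done with Python's standard `ipaddress` module. A sketch, assuming the provider database is a mapping from provider name to CIDR ranges; the provider names are invented, the ranges are reserved documentation prefixes standing in for real allocations, and `provider_for`, `record_attack` and `attacks_per_provider` are hypothetical helpers.

```python
import ipaddress
from collections import Counter

# Hypothetical provider table (name -> CIDR ranges). The ranges are
# documentation prefixes (RFC 5737), not real provider allocations.
PROVIDERS = {
    "Example Net": ["192.0.2.0/24", "198.51.100.0/24"],
    "Sample Telecom": ["203.0.113.0/24"],
}

# Parse the ranges once so each lookup is a simple membership test.
_RANGES = [(ipaddress.ip_network(cidr), name)
           for name, cidrs in PROVIDERS.items() for cidr in cidrs]

attacks = {}  # ip -> provider name (None if unknown), filled automatically

def provider_for(ip):
    """Return the provider whose range contains `ip`, or None."""
    addr = ipaddress.ip_address(ip)
    for net, name in _RANGES:
        if addr in net:
            return name
    return None

def record_attack(ip):
    """Store an attacking IP together with its provider."""
    attacks[ip] = provider_for(ip)
    return attacks[ip]

def attacks_per_provider():
    """Count recorded attacking IPs per known provider, most frequent first."""
    return Counter(p for p in attacks.values() if p is not None).most_common()
```

A linear scan is fine for a few hundred ranges; a much larger provider table would call for a sorted structure or a database index on the range boundaries.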
The first week was hard: nearly every hour I had to classify some of these violent IPs. But now it seems that the 'most important' providers that do nothing against spam and attacks are already classified, and new attacks are entered automatically into the database. Another effect: the number of these unusual attacks has dropped to roughly 10% of the attacks on port 80 I had before.
Maybe they don't love me anymore ...
If you are nowhere near 250k hits/month, then your site might benefit from some optimizing... I just ran my site through this validator and must've fixed anywhere from 30 to 100 thousand errors in the past 2 days (don't laugh too hard until you're error-free yourself, hehe).
[htmlhelp.com]
(you can check a box for 'Validate entire site')
Also when things get hairy, I use this as a backup:
[netmechanic.com]
(scroll down a bit).
By optimizing your code (validating the entire site against an error-free specification), you can considerably reduce server overhead; this usually prolongs my stay with a host for some time and noticeably extends my upgrade deadline.
Hope that helps.
p.s.: a dedicated server ("dedi") can handle roughly 10 to 15 million data requests/month for a highly optimized site (98%+ error-free, W3C-compliant, standards-based code).
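To put these monthly figures into perspective, they can be converted to an average request rate. A back-of-the-envelope sketch, assuming a 30-day month and evenly spread traffic; real traffic is bursty, so peak rates will be considerably higher than these averages. `average_rate` is just an illustrative helper.

```python
# Assumes a 30-day month; real months vary slightly.
SECONDS_PER_MONTH = 30 * 24 * 3600  # 2,592,000 seconds

def average_rate(requests_per_month):
    """Average requests per second if traffic were spread evenly over the month."""
    return requests_per_month / SECONDS_PER_MONTH

# The 250k figure from the earlier post and the 10-15 million dedicated-server range.
for monthly in (250_000, 10_000_000, 15_000_000):
    print(f"{monthly:,} requests/month ~ {average_rate(monthly):.2f} requests/s")
```

This works out to roughly 0.1, 3.9 and 5.8 requests per second on average, which is why the burst behaviour, not the monthly total, usually decides when a server is saturated.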