blend27 - 3:42 am on Jan 21, 2013 (gmt 0)
@lucy24
Logs of course have one advantage over real-time activity: you can see what the next request will be.
Now take the knowledge you have gained, create a MySQL/MSSQL schema, and log all that info:
request headers
robots.txt access
URI requested/QueryString/Referrer
UAs
IPs (including rDNS)
hosting ranges
country ranges (2 indexed views: search the allowed view first; if not found, search the not-allowed view (log data, block))
media files access
speed of access
Errors, redirects / Click Path / Scrape Path
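For anyone wondering what the two-step country range check in the list could look like outside the database, here is a minimal sketch in Java. It is not my production code: the class name, the method names and the sample ranges are all made up for illustration, the two TreeMaps just stand in for the two indexed views, and it assumes IPv4 only with non-overlapping ranges. What to do with an IP found in neither view is left open (UNKNOWN).

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical two-step range check; names and data are illustrative only. */
class CountryRangeCheck {
    enum Verdict { ALLOWED, BLOCKED, UNKNOWN }

    // Each map plays the role of one indexed view: range start -> range end (inclusive).
    private final TreeMap<Long, Long> allowed = new TreeMap<>();
    private final TreeMap<Long, Long> notAllowed = new TreeMap<>();

    /** Converts a dotted-quad IPv4 string to an unsigned 32-bit value held in a long. */
    static long toLong(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ipv4);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in: " + ipv4);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    void allow(String from, String to) { allowed.put(toLong(from), toLong(to)); }

    void disallow(String from, String to) { notAllowed.put(toLong(from), toLong(to)); }

    /** True if ip falls inside the range with the greatest start <= ip. */
    private static boolean contains(TreeMap<Long, Long> ranges, long ip) {
        Map.Entry<Long, Long> entry = ranges.floorEntry(ip);
        return entry != null && ip <= entry.getValue();
    }

    /** Search the allowed ranges first; only on a miss search the not-allowed ranges. */
    Verdict check(String ip) {
        long n = toLong(ip);
        if (contains(allowed, n)) return Verdict.ALLOWED;
        if (contains(notAllowed, n)) return Verdict.BLOCKED; // caller would log and block here
        return Verdict.UNKNOWN;
    }

    public static void main(String[] args) {
        CountryRangeCheck check = new CountryRangeCheck();
        check.allow("192.0.2.0", "192.0.2.255");          // documentation range, not real country data
        check.disallow("198.51.100.0", "198.51.100.255"); // documentation range, not real country data
        System.out.println(check.check("192.0.2.10"));
        System.out.println(check.check("198.51.100.7"));
        System.out.println(check.check("203.0.113.5"));
    }
}
```

Checking the allowed set first means the common case (a normal visitor) costs one lookup, and the second lookup only happens for traffic you are already suspicious of.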
You will be surprised how much real-time data matters and how useful it is nowadays, and how much faster it is.
I have 9 tables with 3GB of data in MSSQL, with a sub-domain on one of the busiest sites that is used for web services that spit out all that data live to other sites I own. 7 queries, all together, run in under a second. Authenticated access only.
I could tell you how many times Googlebot had crawled URI #672 in the second week of April 2004, or when a particular UA first showed up on the site, and which geographical area in the US or CA was more interested in "curly red widgets" on Black Friday/Cyber Monday of 2008. Oh, and that iPhone- and iPad-based UAs send request headers in a different order altogether :).
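On the header order point: one simple way to make that comparable across requests is to store the header names, in arrival order, as a single string next to the UA. A rough Java sketch follows. The class name, the ">" separator and the sample header lists are my own placeholders, not what any real device sends and not how my tables are actually laid out.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical helper that turns the order of request header names into a comparable string. */
class HeaderOrderSignature {

    /** Lower-cases each header name and joins them in the order they arrived. */
    static String of(List<String> headerNamesInArrivalOrder) {
        return headerNamesInArrivalOrder.stream()
                .map(name -> name.trim().toLowerCase(Locale.ROOT))
                .collect(Collectors.joining(">"));
    }

    public static void main(String[] args) {
        // Made-up orders, only to show that the same headers in a different order differ.
        String first = of(List.of("Host", "User-Agent", "Accept"));
        String second = of(List.of("Host", "Accept", "User-Agent"));
        System.out.println(first);
        System.out.println(second);
        System.out.println(first.equals(second));
    }
}
```

Logged per request, a string like this can be grouped and counted per UA just like any other column.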
I could ban/unban an IP/range based on that info on more than 2 dozen sites via a custom BlackBerry app that I wrote.
It's a lot more fun that way.
And no, it's not on an Apache/PHP platform, sorry ;)
-----------------------------------------------------------
@incrediBILL
I have a function that does lookups and takes advantage of Java classes.
In short:
function rdnsLookUp(address) {
    // Load the java.net.InetAddress class
    var iaclass = CreateObject("java", "java.net.InetAddress");
    // Resolve the address into an InetAddress object
    var addr = iaclass.getByName(address);
    // Return the canonical host name (the IP text comes back if no name resolves)
    return addr.getCanonicalHostName();
}
The problem with running rDNS requests against IPs that have no rDNS record is that the lookup USUALLY takes 4-5 seconds. So if someone ran a DDoS-style scrape, that would slow down the server a bit. So I time out the requests after 2 seconds (no more), and then a scheduled task that runs on the back burner (diff app pull) picks them up. Mostly these are hosting ranges.
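To give an idea of the timeout-then-defer pattern in plain Java (the function above is CFScript calling into Java, so this is a translation of the idea, not my actual code): run the lookup on a worker thread, wait at most the timeout, and if it has not come back, park the IP in a queue for the background job. TimedRdns, the deferred queue and the injectable resolver are all names I made up for this sketch.

```java
import java.net.InetAddress;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical rDNS helper: give up after a timeout and defer the IP for a later pass. */
class TimedRdns {
    // Daemon threads so a lookup that is still hanging never keeps the JVM alive.
    private final ExecutorService pool = Executors.newCachedThreadPool(runnable -> {
        Thread t = new Thread(runnable, "rdns-lookup");
        t.setDaemon(true);
        return t;
    });
    // IPs that did not resolve in time; a scheduled task would drain this later.
    private final Queue<String> deferred = new ConcurrentLinkedQueue<>();
    private final long timeoutMillis;

    TimedRdns(long timeoutMillis) { this.timeoutMillis = timeoutMillis; }

    /** Real lookup, same Java calls as the CFScript function. */
    Optional<String> lookup(String ip) {
        return lookup(ip, () -> InetAddress.getByName(ip).getCanonicalHostName());
    }

    /** Same logic with the resolver passed in, so it can be exercised without DNS. */
    Optional<String> lookup(String ip, Callable<String> resolver) {
        Future<String> pending = pool.submit(resolver);
        try {
            return Optional.of(pending.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // cancel() only stops us waiting; a native DNS call may keep running on its thread.
            pending.cancel(true);
            deferred.add(ip);
            return Optional.empty();
        } catch (ExecutionException e) {
            deferred.add(ip); // resolver failed; let the background pass retry it
            return Optional.empty();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    Queue<String> deferred() { return deferred; }

    public static void main(String[] args) {
        TimedRdns rdns = new TimedRdns(2000); // 2 seconds, no more
        System.out.println(rdns.lookup("127.0.0.1").orElse("(deferred)"));
    }
}
```

One caveat with this sketch: the pool is unbounded, so under a real flood you would probably want a fixed-size pool or a cache of recent answers in front of it, otherwise the hanging lookups just pile up as threads instead of as slow requests.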