lucy24 - 10:25 am on Jul 16, 2011 (gmt 0)
Now here's the part that gives your log-wrangling script a workout: How many of those robots went on to read robots.txt and abide by its instructions? Some robots have fooled me by picking it up faithfully on every visit-- and then merrily going wherever they want to go. Some are so entranced by robots.txt-- whose subject matter is not gripping-- that they never get around to picking up any real files. (The bingbot does this consistently. When it does get bored, it varies the menu by landing on the nearest 301. It is a mystery to me how it finds them or why it wants to, since it never follows the redirect. Maybe it's in secret communication with the googlebot.)
And then there was the robot I should by all rights have locked out on sight, because it grabbed everything it could lay its hands on and never even pretended to look at robots.txt... but it faithfully obeyed all "nofollow" directives! Score a victory for the belt-and-suspenders principle.
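For anyone who wants to try the robots.txt check on their own logs, here is a rough sketch in Python. It is not my actual script, just one way it could be done, and it rests on a few assumptions: the log is in Apache "combined" format, the robots.txt rules are passed in as plain text, and only the `User-agent: *` section is checked. The names `audit_robots` and `LOG_LINE` are made up for this example.

```python
import re
from collections import defaultdict
from urllib.robotparser import RobotFileParser

# Assumption: Apache "combined" log format. Adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def audit_robots(log_lines, robots_txt):
    """Group requests by user agent and report, for each agent, whether it
    fetched robots.txt, how many other files it requested, and which of
    those requests hit a path the wildcard rules disallow."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    report = defaultdict(
        lambda: {"read_robots": False, "pages": 0, "violations": []}
    )
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        agent = match.group("agent")
        path = match.group("path")
        entry = report[agent]
        if path.split("?", 1)[0] == "/robots.txt":
            entry["read_robots"] = True
            continue
        entry["pages"] += 1
        # Simplification: check only the "User-agent: *" rules. Sections
        # aimed at one named robot are ignored in this sketch.
        if not rules.can_fetch("*", path):
            entry["violations"].append(path)
    return dict(report)
```

Reading the result: `read_robots` true with a non-empty `violations` list is the robot that picks up robots.txt faithfully and then goes wherever it wants; `read_robots` true with `pages` at 0 is the one that never gets around to any real files; `read_robots` false is the one that never pretended to look. Obeying "nofollow" can't be checked this way, since that depends on the links in the pages, not on the log alone.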