I've read the Wheels thread and spent a night reading incrediBILL's website. [webmasterworld.com
Then I started programming but found out it was more difficult than I thought. I'm checking everyone who accesses robots.txt, and I set up some hidden links (not yet blocked in robots.txt) to see who is crawling my website. After a couple of hours I've already got more than 50 bots, though for some of them I don't know if they really are bots (I can't think of a way to reach certain hidden links unless you are a bot).
IncrediBILL recommended using a whitelist to give access to bots, but how would you implement that? Do you give access only to the bots on your whitelist and block all other bots by default?
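To make the question concrete, this is roughly how I picture the "block by default" idea. It is only a minimal sketch in Python: the names `ALLOWED_BOT_TOKENS`, `BOT_HINTS` and `classify` and the listed entries are my own placeholders, not anything incrediBILL published, and it only looks at the User-Agent string, which can be spoofed.

```python
# Hypothetical whitelist of bots I want to let in (illustrative entries only).
ALLOWED_BOT_TOKENS = ("googlebot", "bingbot")

# Hypothetical hints that a client is identifying itself as some kind of bot.
BOT_HINTS = ("bot", "crawler", "spider", "slurp")


def classify(user_agent: str) -> str:
    """Return "allow" or "block" for a User-Agent string.

    Whitelisted bots are allowed, every other self-declared bot (or an
    empty User-Agent) is blocked, and anything else is treated as a browser.
    """
    ua = user_agent.lower()
    if any(token in ua for token in ALLOWED_BOT_TOKENS):
        return "allow"
    if not ua or any(hint in ua for hint in BOT_HINTS):
        return "block"
    return "allow"
```

With this approach the list I maintain is the short list of wanted bots instead of an ever-growing list of unwanted ones. Is that the idea?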
And how would you block the unwanted bots without using too many resources? If I store the IPs of all unwanted bots in the database, the list will grow massively at this pace. The same goes if I add them to .htaccess.
The only way to do it effectively would be to detect "bot behaviour" and block based on that. I've searched the internet for the "best way" but couldn't find anything satisfying to start with.
I would like to know an efficient way to detect how fast a crawler is accessing my website. Do you do this in a database, in sessions, or...? I did find some code snippets in other threads, but none of them worked "out of the box" so that I could test them and alter them for my website. ( [webmasterworld.com
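What I have in mind for the speed check is something like the sliding-window counter below. Again, this is just a sketch in Python under my own assumptions: `RateTracker`, `max_hits` and `window_seconds` are names and thresholds I made up, and it keeps everything in the memory of a single process, so on a real site the timestamps would probably have to live in a database or a shared cache instead.

```python
import time
from collections import defaultdict, deque


class RateTracker:
    """Tracks recent request times per IP (hypothetical helper)."""

    def __init__(self, max_hits=10, window_seconds=10.0):
        self.max_hits = max_hits          # allowed requests per window
        self.window = window_seconds      # window length in seconds
        self.hits = defaultdict(deque)    # ip -> timestamps of recent requests

    def is_too_fast(self, ip, now=None):
        """Record one request and report whether this IP exceeds the limit."""
        now = time.monotonic() if now is None else now
        recent = self.hits[ip]
        recent.append(now)
        # Forget requests that are older than the window.
        while recent and now - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_hits
```

Usage would be one call per page request, for example `tracker.is_too_fast(client_ip)`, and blocking or slowing down the client when it returns `True`. Only the last few timestamps per IP are kept, so the storage does not grow with every request.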
Anyone who can help?