I've noticed that someone trying to scrape my content identifies himself this way, and I had hoped to block him easily on that basis. A closer look at the logs, though, shows that a few apparently legitimate visitors do the same. Are they in fact legitimate?
Thank you for your comments.
Peter.
So, a better way to reject these scrapers is to validate the Windows version. "Windows NT" on its own, not followed by a version number, is invalid, so testing for it is an obvious way to catch them without blocking search engine robots.
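
As a rough illustration of that check, here is a minimal Python sketch. The function name `has_invalid_windows_version` is hypothetical, and the idea that a valid version looks like `major.minor` (for example `5.1`) is an assumption of this sketch, so adjust the pattern to match what your own logs show.

```python
import re

# Assumption: a valid token is "Windows NT" followed by a space and a
# dotted version such as "5.1" or "6.0". Any "Windows NT" that is NOT
# followed by such a version is treated as invalid.
_BARE_WINDOWS_NT = re.compile(r"Windows NT(?! \d+\.\d+)")


def has_invalid_windows_version(user_agent: str) -> bool:
    """Return True if the user-agent contains "Windows NT" with no version number."""
    return _BARE_WINDOWS_NT.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT)",
        "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
    ]
    for ua in samples:
        verdict = "reject" if has_invalid_windows_version(ua) else "allow"
        print(verdict, "|", ua)
```

User-agents that never mention Windows at all, such as search engine robots, pass straight through, which is the point of testing the version rather than the protocol.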
Jim
I seem to get a fair number of legitimate visitors with slightly unlikely user-agents, and I wouldn't normally want to refuse them just for that. What surprised me was to find a few **apparently** legitimate visitors (in addition to my scraping friend) presenting a combination of fake MSIE and HTTP/1.0.
Peter.
I suggest you use the invalid user-agent screening described above. It has worked well for me on dozens of servers for 9+ years, and I have never seen an example of a legitimate client with an invalid Windows version.
Jim