Welcome to WebmasterWorld Guest from 22.214.171.124
So I changed my mind and decided to no longer allow Nutch. At first I denied it in robots.txt, but most Nutch variants ignored it, so I pulled the plug altogether and banned all UAs containing "nutch" via .htaccess.
How do I know Nutch is not being used to scrape content or copy entire sites to remote servers in other countries that I will never know about? I find my content infringed on web sites, forums and blogs all the time and almost always have DMCA papers in action.
All the threads I find at WW are pretty old, and before we had much info on this bot. What's the latest? Any hard data Nutch is being used for nefarious purposes?