

Ban All Nutch Variants?

8:30 am on Jul 29, 2007 (gmt 0)

keyplyr, WebmasterWorld Senior Member



I've seen Nutch being used by almost everyone over the last few years: school CS projects, Yahoo, Overture, the Internet Archive, and dozens of unknown sources. At one time I supported the NutchOrg project's efforts and allowed access to all Nutch agents; however, it really got out of hand. I was seeing it come from everywhere.

So I changed my mind and decided to stop allowing Nutch. At first I disallowed it in robots.txt, but most Nutch variants ignored that, so I pulled the plug altogether and banned every UA containing "nutch" via .htaccess.
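The gap between those two steps can be sketched in Python (the thread itself only uses robots.txt and .htaccess, so the language is an assumption). robots.txt is a request that a crawler may honor or ignore; a server-side User-Agent check is enforced on every request regardless. The robots.txt contents, the agent name `Nutch`, and the helpers `robots_allows` and `should_block` are hypothetical. Individual Nutch deployments can advertise their own agent names, so a single robots.txt record will not necessarily match them all.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Nutch-named crawlers to stay out, allow everyone else.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /

User-agent: *
Disallow:
"""


def robots_allows(user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler with this UA may fetch url.

    Only crawlers that choose to read and obey robots.txt ever make this check.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


def should_block(user_agent: str) -> bool:
    """Server-side ban: refuse any UA containing 'nutch', case-insensitively.

    Roughly what a mod_setenvif rule such as
    `SetEnvIfNoCase User-Agent nutch bad_bot` plus a deny on that variable
    does (one common .htaccess pattern; the thread does not say which
    directives were used). It does not depend on the crawler cooperating.
    """
    return "nutch" in user_agent.lower()
```

`robots_allows("Nutch/1.0", "http://example.com/page.html")` returns `False`, but that answer only matters to a crawler that asks. `should_block` runs on the server for every request, which is why a UA ban also catches the variants that skip robots.txt (though not ones that drop "nutch" from their UA string).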

How do I know Nutch isn't being used to scrape content or copy entire sites to remote servers in other countries that I will never know about? I find my content infringed on websites, forums, and blogs all the time and almost always have DMCA notices in progress.
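Neither robots.txt nor a UA ban reveals what happens to content after it is fetched, but the access log does show which hosts identify themselves as Nutch. A minimal sketch, assuming Apache's "combined" log format; the regex, the helper name `nutch_hits_by_ip`, and the sample lines (documentation IP ranges) are illustrative assumptions. It only counts requests that admit to being Nutch; a scraper spoofing a browser UA will not appear.

```python
import re
from collections import Counter

# Assumption: Apache "combined" LogFormat
#   %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def nutch_hits_by_ip(lines):
    """Count requests whose User-Agent contains 'nutch' (any case), per client IP."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and "nutch" in match.group("ua").lower():
            counts[match.group("ip")] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical log lines using documentation IP ranges.
    sample = [
        '192.0.2.10 - - [29/Jul/2007:08:30:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "MyCrawler/Nutch-0.9"',
        '192.0.2.10 - - [29/Jul/2007:08:30:05 +0000] "GET /a.html HTTP/1.1" 200 2048 "-" "MyCrawler/Nutch-0.9"',
        '198.51.100.7 - - [29/Jul/2007:09:00:00 +0000] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"',
    ]
    for ip, hits in nutch_hits_by_ip(sample).most_common():
        print(ip, hits)  # prints: 192.0.2.10 2
```

The per-IP counts are only a starting point; the addresses still have to be looked up to learn who is running each crawler.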

All the threads I find at WW are pretty old and date from before we had much info on this bot. What's the latest? Is there any hard data showing Nutch is being used for nefarious purposes?

5:34 pm on Jul 29, 2007 (gmt 0)

wilderness, WebmasterWorld Senior Member



keyplyr,
I've had all versions of Nutch denied for more than four and a half years.

I seem to recall Jim having some formal discussions with the Nutch folks, and when those failed to produce a solution, Jim added a deny for Nutch as well?

Don
