


Ban All Nutch Variants?

8:30 am on Jul 29, 2007 (gmt 0)

keyplyr (Moderator, WebmasterWorld Administrator, from US)

joined: Sept 26, 2001

Over the last few years I've seen Nutch used by almost everyone: school CS projects, Yahoo, Overture, the Internet Archive, and dozens of unknown sources. At one time I supported the NutchOrg project's efforts and allowed access to all Nutch agents; however, it really got out of hand. I was seeing it come from everywhere.

So I changed my mind and decided to no longer allow Nutch. At first I denied it in robots.txt, but most Nutch variants ignored it, so I pulled the plug altogether and banned all UAs containing "nutch" via .htaccess.
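The post doesn't show the actual rules, so here is a minimal sketch, assuming the .htaccess ban is a case-insensitive substring match on "nutch" in the User-Agent header (the kind of test Apache's `RewriteCond %{HTTP_USER_AGENT} nutch [NC]` or `SetEnvIfNoCase User-Agent nutch ...` performs). It contrasts that with a robots.txt deny, which only takes effect if the crawler reads and honors it. The robots.txt contents, function names, and User-Agent strings below are hypothetical.

```python
import re
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt asking Nutch-based crawlers to stay out.
# robots.txt is advisory: only well-behaved crawlers consult it.
ROBOTS_TXT = """\
User-agent: Nutch
Disallow: /

User-agent: *
Disallow:
"""

# Case-insensitive match on "nutch" anywhere in the User-Agent,
# similar in spirit to an .htaccess rule using the [NC] flag.
NUTCH_PATTERN = re.compile(r"nutch", re.IGNORECASE)


def is_banned(user_agent: str) -> bool:
    """Server-side check: True if the UA contains "nutch" in any case."""
    return bool(NUTCH_PATTERN.search(user_agent))


def robots_allows(user_agent: str, url: str) -> bool:
    """What a compliant crawler would conclude from ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for ua in ("Nutch/0.9", "ExampleCrawler/1.0 (Apache NUTCH)", "Mozilla/5.0"):
        print(ua, "banned:", is_banned(ua))
```

The design difference is who enforces the rule: `robots_allows` only describes what a polite bot *should* do, while `is_banned` models a check the server applies to every request, which is why it also catches variants that ignore robots.txt (as long as they still identify as Nutch).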

How do I know Nutch is not being used to scrape content or copy entire sites to remote servers in other countries that I will never know about? I find my content infringed on websites, forums, and blogs all the time, and I almost always have DMCA takedown notices in progress.

All the threads I can find on WW are pretty old and predate most of what we now know about this bot. What's the latest? Is there any hard data showing Nutch being used for nefarious purposes?

5:34 pm on July 29, 2007 (gmt 0)

wilderness (WebmasterWorld Senior Member)

joined: Nov 11, 2001

I've had all versions of Nutch denied for more than four and a half years.

I seem to recall that Jim had some formal discussions with the Nutch folks, and when those failed to produce a solution, he added a deny for Nutch as well?
