penguin26.parc.xerox.com - - [25/Jul/2006:03:52:01 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:03:54:18 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:08:58:00 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:01:18 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:01:33 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:02:03 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:02:23 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:02:26 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:03:19 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:04:25 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:37:39 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:38:56 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:40:59 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:09:48:09 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
penguin26.parc.xerox.com - - [25/Jul/2006:11:28:31 -0700]
"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"
Any user agent that starts with NutchCVS/ gets banned immediately. I love it when the user doesn't bother to change the default user agent.
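For anyone who filters in application code rather than in the server config, the prefix rule above could be sketched roughly like this in Python. The names `BANNED_PREFIXES` and `is_banned` are made up for illustration, and the check assumes the user agent arrives as a plain string, possibly still wrapped in the quotes the access log puts around it.

```python
# Hypothetical helper: names and structure are illustrative only.
BANNED_PREFIXES = ("NutchCVS/",)


def is_banned(user_agent: str) -> bool:
    """Return True if the user agent starts with a banned prefix."""
    # Drop surrounding whitespace and the double quotes used in the log format.
    cleaned = user_agent.strip().strip('"')
    # str.startswith accepts a tuple, so more prefixes can be added later.
    return cleaned.startswith(BANNED_PREFIXES)


if __name__ == "__main__":
    sample = '"NutchCVS/0.8-dev (Nutch; [lucene.apache.org...] nutch-agent@lucene.apache.org)"'
    print(is_banned(sample))
    print(is_banned("Mozilla/5.0 (compatible)"))
```

Note this only catches crawlers that leave the default string in place; anyone who edits the user agent will slip past a prefix match.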
[edited by: GaryK at 11:13 pm (utc) on July 25, 2006]
Wonder where they found out what to hit? Search engines?!
From nutch-0.7.2.tar.gz\nutch-0.7.2\src\engines\Google.src:
# Google plugin
<search
name="Google"
description="Google Search"
method="GET"
action="http://www.google.com/search"
update="http://www.google.com/mozilla/google.src"
updateCheckDays=1
>
<input name="q" user>
<input name="sourceid" value="mozilla-search">
<inputnext name="start" factor="10">
<inputprev name="start" factor="10">
<interpret
resultListStart="<body"
resultListEnd="</body>"
resultItemStart="<p class=g>"
resultItemEnd="<br>"
>
</search>
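To make the plugin a bit more concrete: read as a Mozilla-style search plugin, it describes a GET request to the `action` URL with the `<input>` names as query parameters. A rough Python sketch of the URL that would result, assuming the `user` input carries the search terms and `start` is the paging offset (the function name `build_search_url` is made up here, and this is not Nutch's own code):

```python
from urllib.parse import urlencode

# Values copied from the Google.src plugin quoted above.
ACTION = "http://www.google.com/search"
SOURCE_ID = "mozilla-search"


def build_search_url(query, start=None):
    """Build the GET URL the plugin definition appears to describe."""
    params = {"q": query, "sourceid": SOURCE_ID}
    if start is not None:
        # Assumption: inputnext/inputprev step this offset by the factor (10).
        params["start"] = start
    return ACTION + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("nutch engines"))
    print(build_search_url("nutch engines", start=10))
```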
www.google.com/search?q=cache:oqN_1H25-s4J:cvs.sourceforge.net/viewcvs.py/nutch/nutch/engines/+nutch+engines+%22google.src%22&hl=de&lr=&strip=1
This subdirectory contains Altavista.src, FAST.src, Google.src and Inktomi.src.
[google.com ]
[edited by: jatar_k at 6:06 pm (utc) on July 26, 2006]