| 6:57 pm on Dec 16, 2012 (gmt 0)|
All you need to know is Nutch and any leading or trailing name is possible, as well as any action.
Nutch should be in your UA denials?
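For anyone who handles this in application code rather than server config, here is a minimal sketch of that kind of UA denial. The names `is_nutch` and `status_for` are made up for illustration; the only thing taken from the thread is that "Nutch" can sit anywhere in the string, with any leading or trailing name.

```python
import re

# Case-insensitive match on "nutch" anywhere in the User-Agent,
# so any leading or trailing name is still caught.
NUTCH_RE = re.compile(r"nutch", re.IGNORECASE)

def is_nutch(user_agent: str) -> bool:
    """Return True if the User-Agent string mentions Nutch anywhere."""
    return bool(NUTCH_RE.search(user_agent or ""))

def status_for(user_agent: str) -> int:
    """Hypothetical helper: 403 for Nutch-based agents, 200 otherwise."""
    return 403 if is_nutch(user_agent) else 200
```

An unanchored substring match is deliberate here: anchoring on the start or end of the string would miss agents that wrap "Nutch" in their own name.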
| 8:02 pm on Dec 16, 2012 (gmt 0)|
|we extract new or unknown words, and we analyze statistical information such as word frequency. Utilizing this information, we develop highly accurate statistical machine translation systems, text-to-speech systems and so on. |
This reeks of SEO.
| 10:18 pm on Dec 16, 2012 (gmt 0)|
I have 188.8.131.52/24 listed for TOS (Toshiba) as a bot with the attribute Kill!
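A rough sketch of what a range-based kill list like that might look like in Python, using the standard `ipaddress` module. The ranges below are placeholder documentation blocks, not the addresses discussed here, and `DENIED_RANGES` and `is_denied` are hypothetical names.

```python
import ipaddress

# Placeholder ranges from the reserved documentation blocks,
# NOT the real addresses mentioned in this thread.
DENIED_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

def is_denied(remote_addr: str) -> bool:
    """Return True if the client address falls inside any denied range."""
    addr = ipaddress.ip_address(remote_addr)
    return any(addr in net for net in DENIED_RANGES)
```

The same list can hold wider blocks such as a /17 or /18; only the prefix length in the string changes.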
| 10:29 pm on Dec 16, 2012 (gmt 0)|
no nutch is good nutch
| 12:15 am on Dec 17, 2012 (gmt 0)|
I'm not sure you can put my site and SEO into the same sentence.
My only mental association with Toshiba is TV, so I'm thinking TVs that produce their own closed captioning and/or subtitles on the fly. Matter of fact about half of the inner pages they've picked up to date say something about translation-- but in my case this is not statistically meaningful ;)
They seem to be especially interested in one subgroup of paintings. I'll have to see if I consistently use some word that has an alternate meaning.
| 12:20 am on Dec 17, 2012 (gmt 0)|
All you need to know is Nutch
I second that. And yes, they came by starting in September and got blocked.
| 12:29 am on Dec 17, 2012 (gmt 0)|
Good Nutch is a 403d Nutch.
Apparently it comes from several IPs: [projecthoneypot.org...] . Showed up on my sites mid October or so.
| 1:29 am on Dec 17, 2012 (gmt 0)|
wonder if it's somehow related to start.toshiba.com
just found a referral from them..
Looks like their search is powered by Google, but look at how the "related searches" come up in the results... interesting.
| 3:19 am on Dec 17, 2012 (gmt 0)|
I get these tobi's refers fairly regularly (I guess they like me), coming from Cellco users.
Also here's a 2009 UA:
"Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; PeoplePC 1.0; Toshiba; (R1 1.5))"
| 5:56 am on Dec 17, 2012 (gmt 0)|
I first searched my logs for "Toshiba" but that was a red herring. It shows up as part of the referer-query string in g### mobile searches: tablet-android-toshiba, ms-android-toshiba and so on.
| 10:01 am on Dec 17, 2012 (gmt 0)|
Postscript, in case it makes a difference to anyone:
I fired off an e-mail and, thanks to time difference which I'd, ahem, forgotten about, received an almost immediate reply. Toshiba says it is their robot, in spite of the generic IP. (Toshiba's website is also splat in the middle of a random bunch of others sharing the same address. Were they standing behind the door when IP ranges were handed out?)
:: idly wondering whether A Very Big Registrar would even care that someone* wrote "State" in the line that asks for "State or Province" ::
* Irritating but wholly unrelated referer spam. Couldn't Fairpoint have taken a whole /8 to themselves somewhere, so I wouldn't have to keep swatting them by /17s and /18s?
| 8:38 pm on Dec 17, 2012 (gmt 0)|
I've blocked all "nutch" for years. IMO if they can't get their own bot and name it accordingly, why should I recognize them.
| 6:55 pm on Dec 25, 2012 (gmt 0)|
:: bump ::
Now here's an interesting coincidence. After months of nibbling at a page here, a page there, Toshiba has taken to gulping up to 40 pages at once. Deep enough that I can be pretty sure they are honoring robots.txt. (I have two directories that are fully accessible to humans, but off-limits to robots.) Pages only, no other stuff.
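If you want to check the same thing without waiting for the logs, a small sketch with Python's standard `urllib.robotparser` shows how a compliant robot would read that kind of setup. The directory names `/private-a/` and `/private-b/` and the agent name `ExampleBot` are invented for illustration; substitute your own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: two directories open to humans
# but off-limits to all robots.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private-a/
Disallow: /private-b/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def allowed(user_agent: str, path: str) -> bool:
    """Return True if robots.txt permits this agent to fetch the path."""
    return parser.can_fetch(user_agent, path)
```

A robot that requests plenty of deep pages and never touches a path where `allowed` returns False is at least consistent with honoring robots.txt, which is the inference made above.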
We'll call it a coincidence because my e-mail IP is unrelated to my www IP, my signature never includes the domain name, and the log entry I quoted does not include a domain name. Quick detour to g### confirms that I appear to have the only site in the world with the exact pagename I randomly quoted. Oops.