68.180.139.*** - - [05/Feb/2008:00:59:37 -0500] "GET /index.html HTTP/1.0" 403 462 "-" "LTI/LemurProject Nutch Spider/Nutch-1.0-dev (Research spider using Nutch; http://www.lemurproject.org; mhoy@cs.cmu.edu)"
FYI: IP is a Yahoo business account hosting server.
It requests robots.txt (where it is correctly disallowed), then eats a couple hundred 403s (since my generic "Nutch" ban catches it), then comes back a few hours later from a new address, with the C and/or D octet of its IP changed, and does the same... day after day.
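One way to make that IP-hopping pattern visible in your own logs is to tally the 403s per network instead of per address. Below is a rough Python sketch. It assumes Apache's Combined Log Format, like the log line quoted above. A /24 groups addresses that differ only in the D octet, and a /16 also absorbs changes in the C octet. The function name `blocked_by_network` is my own, hypothetical choice.

```python
import ipaddress
import re
from collections import Counter

# Apache Combined Log Format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def blocked_by_network(lines, agent_substring, prefix=24):
    """Count 403 responses per IPv4 network for requests whose user-agent
    contains agent_substring (case-insensitive)."""
    counts = Counter()
    needle = agent_substring.lower()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m.group("status") != "403":
            continue  # unparseable line, or not a denied request
        if needle not in m.group("agent").lower():
            continue
        try:
            addr = ipaddress.IPv4Address(m.group("host"))
        except ipaddress.AddressValueError:
            continue  # skip masked or non-IPv4 hosts such as "68.180.139.***"
        # strict=False lets us build the enclosing network from a host address
        net = ipaddress.IPv4Network(f"{addr}/{prefix}", strict=False)
        counts[str(net)] += 1
    return counts
```

Run it with `prefix=24` to see which /24 blocks keep coming back. If the bot also rotates its C octet, rerun it with `prefix=16` so those visits collapse into one entry.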
I've emailed them with log snippets. No reply.
For Yahoo IPs, how can one distinguish between Yahoo activity vs 3rd party businesses that are hosted with Yahoo?
Umbra,
I'm going to need to buy a new keyboard, since I can't get the drool off mine ;)
Yahoo (and most other SE providers) have such a vast quantity of tools coming from so many different ranges that it's impossible to stay abreast of them.
I did the following and thus far haven't seen anything detrimental.
[webmasterworld.com...]
A newer thread
[webmasterworld.com...]
Well, to use an example, I just spotted this (denied) request:
68.180.176.114
libwww-perl/5.803
Is this Yahoo being stupid, or a 3rd party hosted on a Yahoo server? The whois record indicates Yahoo, the reverse ip gives mproc15.data.corp.sk1.yahoo.com, but I don't know where to go from there.
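A reverse lookup on its own can't be trusted, because whoever controls an IP block also controls its PTR records. The usual next step is a forward-confirmed reverse DNS check. Take the hostname from the reverse lookup, check that it ends in a domain you trust, and then resolve that hostname forward to confirm it maps back to the same IP. Below is a minimal Python sketch of that check. The function name and the suffix lists are hypothetical. Passing the check only shows the address really belongs to that domain's owner. It does not tell you which of their tools is calling, or whether the tool is a crawler at all.

```python
import socket


def forward_confirmed_rdns(ip, allowed_suffixes,
                           reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the verified hostname for ip, or None if the check fails.

    reverse/forward default to the real resolver calls; they are
    parameters only so the logic can be exercised without DNS.
    """
    try:
        host = reverse(ip)[0]  # (hostname, aliaslist, ipaddrlist)
    except OSError:  # socket.herror / socket.gaierror subclass OSError
        return None  # no PTR record at all
    host = host.rstrip(".").lower()
    # The PTR name must sit under one of the trusted domains.
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return None
    try:
        addrs = forward(host)[2]  # (hostname, aliaslist, ipaddrlist)
    except OSError:
        return None  # PTR points at a name that doesn't resolve
    # Forward confirmation: the claimed hostname must map back to this IP.
    return host if ip in addrs else None
```

For the request above you might call `forward_confirmed_rdns("68.180.176.114", ["yahoo.com"])`. With a narrower suffix list, such as whatever domain a search engine documents for its crawler, the same call becomes a crawler whitelist.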
IMO, in no way, shape, or form does that include libwww-perl (and many others).
All SEs are sending such a mass of IP ranges and so many different tools at our websites that we must begin to wonder whether all this excess brings any benefit to our websites at all.
Don
For Yahoo IPs, how can one distinguish between Yahoo activity vs 3rd party businesses that are hosted with Yahoo? - Umbra
Yahoo does use libwww-perl for some purpose (I never did determine exactly what), and I see it occasionally. But because of the potential for abuse, I deny all libwww-perl requests and only allow select IP addresses to use it, via a whitelist in mod_rewrite.
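In Apache this kind of rule is typically written as a `RewriteCond` on `%{HTTP_USER_AGENT}`, further `RewriteCond` lines excluding the whitelisted `%{REMOTE_ADDR}` values, and a `RewriteRule` with the `[F]` (forbidden) flag. The actual rules used here weren't posted. Below is a small Python sketch of the same decision logic, for testing a whitelist before you translate it into rewrite rules. The function name is hypothetical. The whitelisted networks come from the RFC 5737 documentation ranges and are placeholders, not real Yahoo addresses.

```python
import ipaddress

# Hypothetical whitelist: placeholder documentation ranges, not real
# Yahoo or crawler addresses. Replace with the networks you trust.
LIBWWW_WHITELIST = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def allow_request(remote_ip, user_agent, whitelist=LIBWWW_WHITELIST):
    """Return False (i.e. serve a 403) for libwww-perl requests that do not
    come from a whitelisted network; allow everything else."""
    if "libwww-perl" not in user_agent.lower():  # case-insensitive, like [NC]
        return True
    try:
        addr = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False  # unparseable address: deny
    # An IPv6 address is simply "not in" an IPv4 network, so it is denied.
    return any(addr in net for net in whitelist)
```

With these placeholder ranges, the `68.180.176.114` / `libwww-perl/5.803` request quoted earlier in the thread would be denied. The same user-agent from `192.0.2.5` would be allowed.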