We do not want to leave our server open to Java scrapers, but we do want Yahoo to crawl our site.
Does anyone have a solution to this issue?
Example log:
66.228.167.32 - - [28/Jan/2008:16:20:18 -0500] "GET /foo.html HTTP/1.0" 406 460 "-" "Java/1.5.0_11"
IP 66.228.167.32 resolves to fsdev1000.yst.corp.yahoo.com
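For anyone who wants to check this sort of thing from the logs rather than by hand, here is a rough Python sketch. It pulls the fields out of a combined-format log line like the one above and checks whether a hostname sits under a given domain. The function names (parse_line, in_domain, verified_hostname) are made up for illustration, and the reverse lookup depends on whatever DNS returns at the time you run it, so treat it as a starting point only.

```python
import re
import socket

# Apache combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def in_domain(hostname, domain):
    """True if hostname is the domain itself or a subdomain of it."""
    hostname = hostname.rstrip(".").lower()
    domain = domain.lower()
    return hostname == domain or hostname.endswith("." + domain)


def verified_hostname(ip):
    """Reverse-resolve ip, then confirm the name resolves back to the same ip.

    Returns the hostname on success, None on any lookup failure or mismatch.
    Needs working DNS; results can change over time.
    """
    try:
        hostname = socket.gethostbyaddr(ip)[0]
        forward_ips = socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return None
    return hostname if ip in forward_ips else None
```

In use, you would pass parse_line(...)["ip"] to verified_hostname and then test the result with in_domain(name, "yahoo.com"). The suffix check is done on a dot boundary so that a name like yahoo.com.example.net does not pass.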
[edited by: volatilegx at 9:35 pm (utc) on Feb. 12, 2008]
#keep-out; or whatever term you use
SetEnvIf User-Agent Java keep-out
Deny from env=keep-out
OR
RewriteCond %{HTTP_USER_AGENT} Java
RewriteRule .* - [F]
66.228.167.aa is an Overture (formerly Goto.com) range.
I've had 66.228.166. denied for an eternity.
Goto has always been a pest.
Don
It seems to be accepted practice for SE's to cloak, yet webmasters are penalized for similar, visible practices.
All the major SE's have so many tools available to users, grabbing pages from so many different IP ranges, that it's almost a joke. In fact it would be one if webmasters didn't have to deal with their farces.
As an aside, yesterday I saw a bunch of requests with blank referers and UAs from the 131.107. range (which I haven't seen in some time).
This IP range gets denied at my sites regardless of what UA they use (and has been for quite a while; very OLD MSN thread here).
131.107.0.*** - - [31/Jan/2008:17:49:06 -0600] "GET /Myfolder/Mypage.html HTTP/1.1" 403 - "-" "-"
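If you want to see how often the denied ranges show up in old logs, a range test can be sketched in Python as below. The DENIED_NETWORKS list and the is_denied name are illustrative only. Reading "131.107." as a /16 and "66.228.166." as a /24 is my assumption about what those prefixes are meant to cover.

```python
import ipaddress

# Hypothetical deny list built from the prefixes mentioned in this thread.
# The prefix lengths are an assumption: "131.107." read as /16,
# "66.228.166." read as /24.
DENIED_NETWORKS = [
    ipaddress.ip_network("131.107.0.0/16"),
    ipaddress.ip_network("66.228.166.0/24"),
]


def is_denied(ip):
    """True if ip falls inside any network on the deny list."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in DENIED_NETWORKS)
```

This only reports which addresses would match. The actual blocking still has to be done on the server itself.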
[edited by: volatilegx at 9:37 pm (utc) on Feb. 12, 2008]
My widgets derive visitors from the oddest of sources.
I recall one from "NYSE" that had a nasty habit of going through hundreds of pages (very slow and time-consuming) looking for materials, rather than learning how to properly use the search options and quotes for proper or multiple names ;)
I get regular visitors from the .MIL sites as well. Early on with my websites, I thought it was some government conspiracy ;)
All these and many more oddities would seem peculiar, however the referring searches are on topic and validate their interest.
Unfortunately the 131.107. range has never provided valid referers:
2003:
[webmasterworld.com...]
[webmasterworld.com...]
Additionally, there were a couple more MSN threads in Forum 11 around the same time, which I failed to bookmark.
Don
they are monitoring it for kw's of interest
That may well be spot-on.
I have about a hundred articles online from a talented widget writer who died in 1947.
His articles offer an amazing depth, which today's periodicals seem to lack given their subscribers' attention spans.
I recall a series of articles by the writer (it ran weekly for nearly three months; not online) that was very interesting. However, there was very little about my widgets in it (at least after the initial article of rants).
Rather, the articles' topics took a very sharp turn in the direction of John Hunt Morgan, the Civil War guerrilla.
67.195.44.108 - - [11/Feb/2008:12:05:33 -0600] "GET /robots.txt HTTP/1.0" 200 4549 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]
Followed by a sub-directory file read from a different Class D.
I ask because this IP range is outside the normal ranges used by Yahoo, although I have no doubt as to the authenticity of Yahoo.
My question is: why the additional and NEW (at least to me and other webmasters) IP range?
Is there some source or specific tool in use for Yahoo users which would result in Yahoo spidering from this IP range?
67.195.44.108 - - [11/Feb/2008:02:00:06 -0600] "GET /robots.txt HTTP/1.0" 200 4549 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; [help.yahoo.com...]