Does anyone have the bot UA for the US version of Yellow Pages, please?
I have a bot UA of clj-httpc coming from IP 12.175.178.nnn (which corresponds to YP's IP range). This UA seems unreasonable and is currently being blocked.
(This is in UK)
blend27
4:38 pm on Jan 28, 2011 (gmt 0)
I was just going to vent about this PEST. For the past two weeks or so it has been grabbing robots.txt and then ignoring it. After that it goes for the home page only. All requests come from the nnn.nnn.nnn.250 IP and hit the home page only, after robots.txt has been read, where it gets a Disallow.
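For comparison, this is roughly the check a well-behaved crawler would make before requesting the home page. It is only a sketch using Python's standard urllib.robotparser. The robots.txt body and the example.com URL are assumptions for illustration, since the actual file is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body (not the real file from this thread):
# every user agent is disallowed from the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant bot asks this question and skips the request when the answer is False.
allowed = parser.can_fetch("clj-httpc", "http://www.example.com/")
print("clj-httpc may fetch the home page:", allowed)
```

A bot that reads the file and requests the page anyway has skipped this step or ignored its result.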
dstiles
9:48 pm on Jan 28, 2011 (gmt 0)
Yep, that's the IP.
I have hits from it going back to August but not sure what the UA was that far back without doing some back-tracking.
I'm inclined to let it stay blocked, except that a couple of my clients expect customers from the US. I wonder how much damage continuing to block it could do.
blend27
1:19 pm on Jan 29, 2011 (gmt 0)
I got 2 more user agents for that IP:
Mozilla/5.0 (compatible; heritrix/1.14.3.r6601+http://www.buddybuzz.net/yptrino) - CAUTION this URL in UA leads to CASINO Type Site.
ua is from November of 2010
and before that it was:
libwww-perl/5.805
ua is from January of 2011
So maybe that IP is not the Yellow Pages bot after all.
dstiles
9:02 pm on Jan 29, 2011 (gmt 0)
DNS gives the full /24 as Yellow Pages. If it's running something else as well I'm definitely keeping it blocked.
Looking at the /24 in robtex, there are several rDNS entries that look as if YP is selling server space on its range; includes AT&T Interactive.
caribguy
8:06 am on Jan 30, 2011 (gmt 0)
From my ipf filter:
block in log first quick on bge0 from 12.175.178.0/24 to any # Anywho / Yellowpages, US
I'm tempted to do the whole /20
wilderness
8:59 am on Jan 30, 2011 (gmt 0)
I'm tempted to do the whole /20
Some years back I had the entire /8 denied ;)
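To give a sense of scale between the prefixes mentioned here, a small Python sketch with the standard ipaddress module. It assumes "the whole /20" and "the entire /8" mean the blocks that enclose 12.175.178.0/24, which nobody in the thread actually spells out. Whether those wider blocks belong to Yellow Pages is a separate question that this does not answer.

```python
import ipaddress

# The /24 from the ipf rule quoted above.
yp_24 = ipaddress.ip_network("12.175.178.0/24")

# Assumption: the wider blocks are the ones enclosing that /24.
yp_20 = yp_24.supernet(new_prefix=20)
yp_8 = yp_24.supernet(new_prefix=8)

# Show how many addresses each rule would cover.
for net in (yp_24, yp_20, yp_8):
    print(f"{net}: {net.num_addresses} addresses")
```

Each step up the list blocks far more addresses than the last, so the chance of catching unrelated visitors grows accordingly.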
caribguy
8:21 am on Jan 31, 2011 (gmt 0)
Makes me feel so conservative! Only one /8 and a handful of /10s, if anybody cares...
block in log first quick on bge0 from 116.0.0.0/8 to any
block in log first quick on bge0 from 58.192.0.0/10 to any
block in log first quick on bge0 from 59.0.0.0/10 to any
block in log first quick on bge0 from 61.128.0.0/10 to any # China Telecom CN
block in log first quick on bge0 from 78.192.0.0/10 to any # Proxad, FR
block in log first quick on bge0 from 79.192.0.0/10 to any # dip.t-dialin.net, DE
block in log first quick on bge0 from 183.0.0.0/10 to any # Chinanet, CN
block in log first quick on bge0 from 218.0.0.0/10 to any
block in log first quick on bge0 from 220.192.0.0/10 to any
And I think most of PSI 38.0.0.0/8 is 403'd in my httpd.conf
Mokita
12:10 pm on Feb 2, 2011 (gmt 0)
In case anyone might be feeling inclined to copy caribguy's blocklist verbatim, I would suggest significant caution and prior thorough investigation with regard to your "audience/clients" before doing so.
e.g. the 116.0.0.0/8 range contains IPs from both Australia and New Zealand.
I do block 38.0.0.0/8 and all the exclusively China located IP ranges, but definitely not any that are located in Oceania (Australia, New Zealand, Papua New Guinea and Pacific Islands).
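One way to do that kind of investigation is to run addresses of known visitors against a list before adopting it. The Python sketch below uses only the standard ipaddress module and reports which listed range, if any, covers an address. The ranges are the ipf ones quoted in this thread. The function name and the sample addresses are made up for illustration and are not real visitor data.

```python
import ipaddress

# Ranges copied from the ipf rules quoted in this thread.
BLOCKED = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "12.175.178.0/24",
        "116.0.0.0/8",
        "58.192.0.0/10",
        "59.0.0.0/10",
        "61.128.0.0/10",
        "78.192.0.0/10",
        "79.192.0.0/10",
        "183.0.0.0/10",
        "218.0.0.0/10",
        "220.192.0.0/10",
    )
]


def matching_range(address):
    """Return the first blocked network containing address, or None."""
    ip = ipaddress.ip_address(address)
    for net in BLOCKED:
        if ip in net:
            return net
    return None


# Arbitrary sample addresses (hypothetical, chosen only to exercise the check).
for sample in ("12.175.178.1", "116.50.0.1", "192.0.2.10"):
    print(sample, "->", matching_range(sample))
```

Feeding it addresses pulled from your own access logs would show which real customers a given range would have shut out.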
caribguy
4:37 pm on Feb 2, 2011 (gmt 0)
Thanks for the heads up Mokita!
I would suggest significant caution and prior thorough investigation with regard to your "audience/clients"
That is always important, and it was key to my decision to whitelist part of 38.0.0.0/8 (mostly offices in NYC) and not to care so much about the 116. range.