Yellow Pages USA

what is its bot UA?

dstiles

4:48 pm on Jan 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does anyone have the bot UA for the US version of Yellow Pages, please?

I have a bot UA of clj-httpc coming from IP 12.175.178.nnn (which corresponds to YP's IP range). This UA seems unreasonable and is currently being blocked.

(This is in UK)

blend27

4:38 pm on Jan 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I was just about to vent about this PEST. For the past 2 weeks or so it has been grabbing robots.txt and then ignoring it. After that it goes for the home page only. All requests are made from the nnn.nnn.nnn.250 IP, and they hit the home page only after robots.txt has been read, where it gets a disallow.
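For anyone who wants to confirm what a compliant crawler should have done, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `example.com` URL are assumptions for illustration only; the actual file on the affected site is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real rules on the affected site are not shown here.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

def may_fetch(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to request url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# A bot that honoured the disallow would skip the home page entirely.
home_page_allowed = may_fetch("clj-httpc", "http://example.com/")
```

If `home_page_allowed` is false and the log still shows a home page request from that UA right after the robots.txt fetch, the bot is reading the file without obeying it.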

dstiles

9:48 pm on Jan 28, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yep, that's the IP.

I have hits from it going back to August but not sure what the UA was that far back without doing some back-tracking.

I'm inclined to leave it blocked, except that a couple of my clients expect customers from the US. I wonder how much damage continuing to block it could do.

blend27

1:19 pm on Jan 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I got 2 more user agents for that IP:

Mozilla/5.0 (compatible; heritrix/1.14.3.r6601+http://www.buddybuzz.net/yptrino) - CAUTION: the URL in this UA leads to a casino-type site.

ua is from November of 2010

and before that it was:

libwww-perl/5.805

ua is from January of 2011


So maybe that IP is not the Yellow Pages bot after all.

dstiles

9:02 pm on Jan 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



DNS gives the full /24 as Yellow Pages. If it's running something else as well I'm definitely keeping it blocked.

Looking at the /24 in robtex, there are several rDNS entries that look as if YP is selling server space on its range; includes AT&T Interactive.

caribguy

8:06 am on Jan 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



From my ipf filter:

block in log first quick on bge0 from 12.175.178.0/24 to any # Anywho / Yellowpages, US

I'm tempted to do the whole /20
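Before widening a rule like that, it helps to see exactly what the /20 would cover. A rough sketch with Python's standard `ipaddress` module follows; it only does the arithmetic and says nothing about who operates the extra addresses, which this thread does not establish.

```python
import ipaddress

# The /24 from the ipf rule above.
current = ipaddress.ip_network("12.175.178.0/24")

# The enclosing /20, i.e. the block a wider rule would cover.
wider = current.supernet(new_prefix=20)

first, last = wider[0], wider[-1]
extra_24s = len(list(wider.subnets(new_prefix=24))) - 1

print(f"{current} sits inside {wider} ({first} to {last})")
print(f"{wider.num_addresses} addresses, {extra_24s} additional /24 blocks")
```

The enclosing block works out to 12.175.176.0/20, so the wider rule would take in fifteen /24s beyond the one already blocked. Checking rDNS or whois for those neighbours first would show whether they belong to the same operator.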

wilderness

8:59 am on Jan 30, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I'm tempted to do the whole /20"


Some years back I had the entire /8 denied ;)

caribguy

8:21 am on Jan 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Makes me feel so conservative! Only one /8 and a handful of /10s, if anybody cares...

block in log first quick on bge0 from 116.0.0.0/8 to any
block in log first quick on bge0 from 58.192.0.0/10 to any
block in log first quick on bge0 from 59.0.0.0/10 to any
block in log first quick on bge0 from 61.128.0.0/10 to any # China Telecom CN
block in log first quick on bge0 from 78.192.0.0/10 to any # Proxad, FR
block in log first quick on bge0 from 79.192.0.0/10 to any # dip.t-dialin.net, DE
block in log first quick on bge0 from 183.0.0.0/10 to any # Chinanet, CN
block in log first quick on bge0 from 218.0.0.0/10 to any
block in log first quick on bge0 from 220.192.0.0/10 to any

And I think most of PSI 38.0.0.0/8 is 403'd in my httpd.conf
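To get a feel for how much address space those rules cover, and to test a given visitor address against them before copying the list, something like the following sketch may help. It uses Python's standard `ipaddress` module and only the ranges from the ipf rules above; the function names and the sample addresses in the usage lines are made up for illustration.

```python
import ipaddress

# CIDR ranges copied from the ipf rules above.
BLOCKED = [ipaddress.ip_network(cidr) for cidr in (
    "116.0.0.0/8",
    "58.192.0.0/10",
    "59.0.0.0/10",
    "61.128.0.0/10",
    "78.192.0.0/10",
    "79.192.0.0/10",
    "183.0.0.0/10",
    "218.0.0.0/10",
    "220.192.0.0/10",
)]

def matching_rule(addr):
    """Return the blocked network containing addr, or None if it is not blocked."""
    ip = ipaddress.ip_address(addr)
    for net in BLOCKED:
        if ip in net:
            return net
    return None

def total_blocked():
    """Number of IPv4 addresses covered by all rules (the ranges do not overlap)."""
    return sum(net.num_addresses for net in BLOCKED)

# Illustrative addresses only, not taken from any log.
print(matching_rule("116.1.2.3"))
print(matching_rule("58.191.255.255"))
print(total_blocked())
```

Running a sample of real customer IPs from your own logs through `matching_rule` is a quick way to see whether a range this wide would lock out people you actually want.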

Mokita

12:10 pm on Feb 2, 2011 (gmt 0)

10+ Year Member



In case anyone might be feeling inclined to copy caribguy's blocklist verbatim, I would suggest significant caution and prior thorough investigation with regard to your "audience/clients" before doing so.

e.g. the 116.0.0.0/8 range contains IPs from both Australia and New Zealand.

I do block 38.0.0.0/8 and all the ranges located exclusively in China, but definitely not any that are located in Oceania (Australia, New Zealand, Papua New Guinea and the Pacific Islands).

caribguy

4:37 pm on Feb 2, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for the heads up, Mokita!

"I would suggest significant caution and prior thorough investigation with regard to your 'audience/clients'"

That is always important, and it was key to my decision to whitelist part of 38.0.0.0/8 (mostly offices in NYC) and not care so much about the 116. range.