keyplyr

msg:4516526 | 9:31 pm on Nov 6, 2012 (gmt 0) |
I block all known AMAZON-EC2 ranges, including this one: 23.20.0.0 - 23.23.255.255 23.20.0.0/14
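For anyone who keeps ranges in start/end form and wants the CIDR notation, here is a minimal sketch using Python's standard `ipaddress` module. The `is_blocked` helper name is my own, purely for illustration, and is not part of anyone's actual setup.

```python
import ipaddress

# Collapse the start/end form of the range into CIDR notation.
first = ipaddress.ip_address("23.20.0.0")
last = ipaddress.ip_address("23.23.255.255")
blocks = list(ipaddress.summarize_address_range(first, last))

def is_blocked(ip, networks=blocks):
    """Return True if ip falls inside any of the given networks (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)

print(blocks)
print(is_blocked("23.22.1.1"))
```

The same check works for any other range once it is added to the list.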
|
lucy24

msg:4516558 | 10:20 pm on Nov 6, 2012 (gmt 0) |
:: quick detour to whois :: Yup, still listed as 23.20.0.0/14. As long as you're there, I've also got the adjoining 23.19 blocked. Server farm, apparently. And, in the other direction: if 23.24.0.0/14 is still Comcast Business, they're probably expendable as well. Come to think of it, I don't see much of anything in 23 except for a couple of Canadian IPs. Most of the range is-- or was until recently-- unassigned, so I'd expect a lot of filling-in over the next couple of years.
|
keyplyr

msg:4516602 | 1:00 am on Nov 7, 2012 (gmt 0) |
Thanks for the Ubiquity range. Just an FYI - Comcast Biz also includes all those employees who surf from their desks. We had a lengthy discussion about this last year. I spot checked through a year's logs and found thousands of human hits from inside that range (I had also considered blocking it.)
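A spot check like that can be roughly scripted. This is only a sketch, assuming an Apache-style combined log where the client IP is the first field and the user agent is the last quoted field. The `hits_in_range` name and the sample lines are made up for illustration, and 23.24.0.0/14 is the Comcast Business range mentioned above.

```python
import ipaddress
from collections import Counter

def hits_in_range(log_lines, cidr):
    """Count requests per user agent for client IPs inside cidr (hypothetical helper)."""
    network = ipaddress.ip_network(cidr)
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if ip in network:
            # Combined format: request, referer and user agent are the quoted fields.
            parts = line.split('"')
            agent = parts[5] if len(parts) >= 7 else "-"
            counts[agent] += 1
    return counts

# Made-up sample lines, not taken from any real log.
sample = [
    '23.24.1.10 - - [06/Nov/2012:21:31:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '23.27.9.9 - - [06/Nov/2012:21:32:00 +0000] "GET /a HTTP/1.1" 200 100 "-" "ExampleBot/1.0"',
    '23.28.0.1 - - [06/Nov/2012:21:33:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]
print(hits_in_range(sample, "23.24.0.0/14"))
```

Telling humans from bots in the tally still takes a manual look at the user agents and request patterns.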
|
lucy24

msg:4516606 | 2:13 am on Nov 7, 2012 (gmt 0) |
| Comcast Biz also includes all those employees who surf from their desks. |
| Ah ha. I don't think I've formally blocked any of their (many, many, many) ranges for that very reason: Just when you think it's completely useless, you get a bona fide human. And in my case it can be very hard to tell if they're goofing off or looking up something work-related ;)
|
brokaddr

msg:4530903 | 3:52 am on Dec 25, 2012 (gmt 0) |
IP Address: 72.30.142.253 User Agent: NING/1.0 Seems Yahoo is using this user agent, as well?
|
wilderness

msg:4530907 | 4:12 am on Dec 25, 2012 (gmt 0) |
| IP Address: 72.30.142.253 |
| I have this noted (and denied) as Inktomi Cache. Please see the WayBack thread and reference to NOARCHIVE in the thread immediately below this thread.
|
not2easy

msg:4530910 | 4:26 am on Dec 25, 2012 (gmt 0) |
72.30.0.0 - 72.30.255.255 72.30.0.0/16 Listed as INKTOMI-BLK-5 maintained by Yahoo.
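If you keep notes on ranges like these, a small lookup saves a trip to whois. A minimal sketch, again with the standard `ipaddress` module; the `NOTED_RANGES` table and `label_for` function are hypothetical names, and the labels are just the ones quoted in this thread.

```python
import ipaddress

# Ranges and labels as noted in this thread; extend with your own notes.
NOTED_RANGES = {
    "23.20.0.0/14": "AMAZON-EC2",
    "72.30.0.0/16": "INKTOMI-BLK-5 (Yahoo)",
}

def label_for(ip):
    """Return the note for the first range containing ip, or None (hypothetical helper)."""
    addr = ipaddress.ip_address(ip)
    for cidr, label in NOTED_RANGES.items():
        if addr in ipaddress.ip_network(cidr):
            return label
    return None

print(label_for("72.30.142.253"))
```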
|
brokaddr

msg:4530911 | 4:59 am on Dec 25, 2012 (gmt 0) |
wilderness, this topic: [webmasterworld.com...] - I didn't see anything mentioned, specifically? Pardon my ignorance if it's plain as day. I searched the NING/1.0 user-agent and this topic came up. Nothing about Wayback machine/noarchive did. What's their relation?
|
wilderness

msg:4530917 | 5:21 am on Dec 25, 2012 (gmt 0) |
My apologies, brokaddr. My reference was to the perils of allowing SE's and others to "cache" pages as per the NOARCHIVE link provided by Bill and repeated by myself. Slurp/Inktomi and Yahoo all cache pages and even send their bots on solitary page requests (with full supporting files) for their cache. It's a good idea to separate the valid SE bots from all the SE's auxiliary tools, and then only allow the valid bots.
|
brokaddr

msg:4530931 | 7:21 am on Dec 25, 2012 (gmt 0) |
| It's a good idea to separate the valid SE bots from all the SE's auxiliary tools, and then only allow the valid bots. |
| I wasn't even aware Yahoo did that. Is there an easy way to decipher which is which?
|
wilderness

msg:4531104 | 11:03 am on Dec 26, 2012 (gmt 0) |
I'm not aware of any Yahoo crawlers; rather, AFAIK Yahoo uses the crawls by MSN/Bing for their SERPs. Somebody else may be able to provide IP's. All I have documented since reactivation in February are Yahoo utilities, which I do not allow. I haven't had any full crawls from anything that has identified itself as Yahoo, Slurp, or Inktomi since same reactivation.
|
keyplyr

msg:4531197 | 8:35 pm on Dec 26, 2012 (gmt 0) |
| I'm not aware of any Yahoo crawlers |
| There are actually several different bots still run by Yahoo, mostly in Europe and Asia, but Inktomi and Slurp still hit my US site, possibly because I have a lot of inbound from Europe.
|
wilderness

msg:4531210 | 9:22 pm on Dec 26, 2012 (gmt 0) |
keyplyr, I'm not getting any full crawls, rather solitary page requests from a few select pages (with complete accompanying files), and the same pages are repeating.
|
keyplyr

msg:4531287 | 2:51 am on Dec 27, 2012 (gmt 0) |
I'm not getting full crawls either. Never did from the 2nd level bots, although Slurp and Inktomi did full site crawls a couple of years ago. They may have been re-assigned for other purposes.
|
|