Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Ning/1.0
eat my 403
Bewenched




 
Msg#: 4516476 posted 7:39 pm on Nov 6, 2012 (gmt 0)

23.22.67.164

 

keyplyr




 
Msg#: 4516476 posted 9:31 pm on Nov 6, 2012 (gmt 0)

I block all known AMAZON-EC2 ranges, including this one:

23.20.0.0 - 23.23.255.255
23.20.0.0/14
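To illustrate the range above, here is a minimal Python sketch that checks whether a logged address falls inside a blocked CIDR block. The `BLOCKED_NETWORKS` list and the `is_blocked` helper are names made up for this example, and the list holds only the one range quoted in this post, not a complete set of EC2 ranges.

```python
import ipaddress

# Only the range quoted in this thread; a real block list would be longer.
BLOCKED_NETWORKS = [ipaddress.ip_network("23.20.0.0/14")]


def is_blocked(ip):
    """Return True if ip falls inside any network in BLOCKED_NETWORKS."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BLOCKED_NETWORKS)


# The address from the first post is inside the /14.
print(is_blocked("23.22.67.164"))  # True
# The first address past the end of the block is not.
print(is_blocked("23.24.0.0"))     # False
```

The same membership test is what a server-level deny rule performs; the sketch is only meant for spot-checking log entries offline.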

lucy24




 
Msg#: 4516476 posted 10:20 pm on Nov 6, 2012 (gmt 0)

:: quick detour to whois ::

Yup, still listed as 23.20.0.0/14

As long as you're there, I've also got the adjoining 23.19 blocked. Server farm, apparently.

And, in the other direction: if 23.24.0.0/14 is still Comcast Business, they're probably expendable as well.

Come to think of it, I don't see much of anything in 23 except for a couple of Canadian IPs. Most of the range is, or was until recently, unassigned, so I'd expect a lot of filling-in over the next couple of years.

keyplyr




 
Msg#: 4516476 posted 1:00 am on Nov 7, 2012 (gmt 0)


Thanks for the Ubiquity range.

Just an FYI: Comcast Biz also includes all those employees who surf from their desks. We had a lengthy discussion about this last year. I spot-checked a year's worth of logs and found thousands of human hits from inside that range (I had also considered blocking it).

lucy24




 
Msg#: 4516476 posted 2:13 am on Nov 7, 2012 (gmt 0)

Comcast Biz also includes all those employees who surf from their desks.

Ah ha. I don't think I've formally blocked any of their (many, many, many) ranges for that very reason: Just when you think it's completely useless, you get a bona fide human. And in my case it can be very hard to tell if they're goofing off or looking up something work-related ;)

brokaddr



 
Msg#: 4516476 posted 3:52 am on Dec 25, 2012 (gmt 0)

IP Address: 72.30.142.253
User Agent: NING/1.0

Seems Yahoo is using this user agent, as well?

wilderness




 
Msg#: 4516476 posted 4:12 am on Dec 25, 2012 (gmt 0)

IP Address: 72.30.142.253


I have this noted (and denied) as Inktomi Cache.

Please see the WayBack thread and the reference to NOARCHIVE in the thread immediately below this one.

not2easy




 
Msg#: 4516476 posted 4:26 am on Dec 25, 2012 (gmt 0)

72.30.0.0 - 72.30.255.255
72.30.0.0/16
Listed as INKTOMI-BLK-5 maintained by Yahoo.
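For anyone converting a whois start/end range into CIDR notation by hand, the conversion can be sketched with Python's standard `ipaddress` module. The helper name `range_to_cidrs` is made up for this example; it simply wraps `ipaddress.summarize_address_range`.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Collapse an inclusive first-last address range into CIDR strings."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


# The Inktomi block listed above.
print(range_to_cidrs("72.30.0.0", "72.30.255.255"))  # ['72.30.0.0/16']
# The Amazon EC2 block from earlier in the thread.
print(range_to_cidrs("23.20.0.0", "23.23.255.255"))  # ['23.20.0.0/14']
```

A range that does not align to a single prefix comes back as several CIDR blocks, which is why the function returns a list.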

brokaddr



 
Msg#: 4516476 posted 4:59 am on Dec 25, 2012 (gmt 0)

wilderness, this topic: [webmasterworld.com...] - I didn't see anything mentioned, specifically?

Pardon my ignorance if it's plain as day. I searched for the NING/1.0 user-agent and this topic came up, but nothing about the Wayback Machine or noarchive did.
What's their relation?

wilderness




 
Msg#: 4516476 posted 5:21 am on Dec 25, 2012 (gmt 0)

My apologies brokaddr.

My reference was to the perils of allowing SEs and others to "cache" pages, as per the NOARCHIVE link provided by Bill and repeated by myself.

Slurp/Inktomi and Yahoo all cache pages, and even send their bots on solitary page requests (with full supporting files) for their cache.

It's a good idea to separate the valid SE bots from all the SE's auxiliary tools, and then only allow the valid bots.
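One way to picture that separation is a small policy check keyed on both the claimed user agent and the source address. This is only a sketch under stated assumptions: `ExampleBot` and the 192.0.2.0/24 documentation range are placeholders, not real crawler data, and `classify` is a name invented for the example. The only real entry is the Inktomi block discussed in this thread.

```python
import ipaddress

# Hypothetical allow list: UA token -> networks that crawler is trusted from.
# "examplebot" and 192.0.2.0/24 are placeholders, not real crawler data.
ALLOWED_CRAWLERS = {
    "examplebot": [ipaddress.ip_network("192.0.2.0/24")],
}

# Denied outright: the Inktomi block that the NING/1.0 request came from.
DENIED_NETWORKS = [ipaddress.ip_network("72.30.0.0/16")]


def classify(ip, user_agent):
    """Return 'deny', 'allow', or 'other' for one request."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in DENIED_NETWORKS):
        return "deny"
    ua = user_agent.lower()
    for token, networks in ALLOWED_CRAWLERS.items():
        if token in ua:
            # The UA claims to be a known crawler: accept it only from
            # that crawler's own ranges, otherwise treat it as spoofed.
            return "allow" if any(addr in net for net in networks) else "deny"
    # Not a denied range and not claiming to be a listed crawler.
    return "other"


print(classify("72.30.142.253", "NING/1.0"))       # deny
print(classify("192.0.2.10", "ExampleBot/2.0"))    # allow
print(classify("198.51.100.7", "ExampleBot/2.0"))  # deny
```

Requests classified as `other` are ordinary visitors or unlisted agents and would fall through to whatever rules the site already applies.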

brokaddr



 
Msg#: 4516476 posted 7:21 am on Dec 25, 2012 (gmt 0)

It's a good idea to separate the valid SE bots from all the SE's auxiliary tools, and then only allow the valid bots.

I wasn't even aware Yahoo did that. Is there an easy way to decipher which is which?

wilderness




 
Msg#: 4516476 posted 11:03 am on Dec 26, 2012 (gmt 0)

I'm not aware of any Yahoo crawlers; AFAIK Yahoo uses the crawls by MSN/Bing for their SERPs.

Somebody else may be able to provide IPs.

All I have documented since reactivation in February are Yahoo utilities, which I do not allow.
I haven't had any full crawls from anything that has identified itself as Yahoo, Slurp, or Inktomi since that reactivation.

keyplyr




 
Msg#: 4516476 posted 8:35 pm on Dec 26, 2012 (gmt 0)


I'm not aware of any Yahoo crawlers

There are actually several different bots still run by Yahoo, mostly in Europe and Asia, but Inktomi and Slurp still hit my US site, possibly because I have a lot of inbound from Europe.

wilderness




 
Msg#: 4516476 posted 9:22 pm on Dec 26, 2012 (gmt 0)

keyplyr,
I'm not getting any full crawls, just solitary page requests for a few select pages (with complete accompanying files), and the same pages keep repeating.

keyplyr




 
Msg#: 4516476 posted 2:51 am on Dec 27, 2012 (gmt 0)

I'm not getting full crawls either, and never did from the second-level bots, although Slurp and Inktomi did full site crawls a couple of years ago. They may have been reassigned to other purposes.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved