Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Ning/1.0
eat my 403
Bewenched




msg:4516478
 7:39 pm on Nov 6, 2012 (gmt 0)

23.22.67.164

 

keyplyr




msg:4516526
 9:31 pm on Nov 6, 2012 (gmt 0)

I block all known AMAZON-EC2 ranges, including this one:

23.20.0.0 - 23.23.255.255
23.20.0.0/14
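
For anyone who wants to double-check the CIDR math, here is a minimal sketch using Python's standard `ipaddress` module. It only confirms that the dotted range and the /14 describe the same block and that the IP from the opening post falls inside it; the variable names are just for illustration.

```python
import ipaddress

# Range quoted above: 23.20.0.0 - 23.23.255.255
first = ipaddress.ip_address("23.20.0.0")
last = ipaddress.ip_address("23.23.255.255")

# summarize_address_range yields the smallest set of CIDR blocks covering the range
blocks = list(ipaddress.summarize_address_range(first, last))

# Check whether the IP from the opening post sits inside the /14
network = ipaddress.ip_network("23.20.0.0/14")
visitor = ipaddress.ip_address("23.22.67.164")
visitor_in_block = visitor in network

print(blocks, visitor_in_block)
```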

lucy24




msg:4516558
 10:20 pm on Nov 6, 2012 (gmt 0)

:: quick detour to whois ::

Yup, still listed as 23.20.0.0/14

As long as you're there, I've also got the adjoining 23.19 blocked. Server farm, apparently.

And, in the other direction: if 23.24.0.0/14 is still Comcast Business, they're probably expendable as well.

Come to think of it, I don't see much of anything in 23 except for a couple of Canadian IPs. Most of the range is-- or was until recently-- unassigned, so I'd expect a lot of filling-in over the next couple of years.

keyplyr




msg:4516602
 1:00 am on Nov 7, 2012 (gmt 0)


Thanks for the Ubiquity range.

Just an FYI: Comcast Biz also includes all those employees who surf from their desks. We had a lengthy discussion about this last year. I spot-checked a year's logs and found thousands of human hits from inside that range (I had also considered blocking it).
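
A spot check like that could be sketched roughly as follows. This is only an illustration, not the method actually used: it assumes each log line starts with the client IP (as in the common/combined log formats), and the function name `count_hits_in_range` is made up. Telling humans from bots would still take a look at the user agents and request patterns of whatever it finds.

```python
import ipaddress
from collections import Counter

def count_hits_in_range(log_lines, cidr):
    """Count requests per client IP for lines whose first field is inside cidr."""
    network = ipaddress.ip_network(cidr)
    hits = Counter()
    for line in log_lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            ip = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; skip the line
        if ip in network:
            hits[str(ip)] += 1
    return hits

# Usage (assumed file name): count hits from the Comcast Business /14 mentioned above
# with open("access.log") as log:
#     print(count_hits_in_range(log, "23.24.0.0/14").most_common(20))
```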

lucy24




msg:4516606
 2:13 am on Nov 7, 2012 (gmt 0)

Comcast Biz also includes all those employees who surf from their desks.

Ah ha. I don't think I've formally blocked any of their (many, many, many) ranges for that very reason: Just when you think it's completely useless, you get a bona fide human. And in my case it can be very hard to tell if they're goofing off or looking up something work-related ;)

brokaddr




msg:4530903
 3:52 am on Dec 25, 2012 (gmt 0)

IP Address: 72.30.142.253
User Agent: NING/1.0

Seems Yahoo is using this user agent, as well?

wilderness




msg:4530907
 4:12 am on Dec 25, 2012 (gmt 0)

IP Address: 72.30.142.253


I have this noted (and denied) as Inktomi Cache.

Please see the WayBack thread and the reference to NOARCHIVE in the thread immediately below this one.

not2easy




msg:4530910
 4:26 am on Dec 25, 2012 (gmt 0)

72.30.0.0 - 72.30.255.255
72.30.0.0/16
Listed as INKTOMI-BLK-5 maintained by Yahoo.

brokaddr




msg:4530911
 4:59 am on Dec 25, 2012 (gmt 0)

wilderness, this topic: [webmasterworld.com...] - I didn't see anything mentioned, specifically?

Pardon my ignorance if it's plain as day. I searched the NING/1.0 user-agent and this topic came up. Nothing about Wayback machine/noarchive did.
What's their relation?

wilderness




msg:4530917
 5:21 am on Dec 25, 2012 (gmt 0)

My apologies brokaddr.

My reference was to the perils of allowing SE's and others to "cache" pages as per the NOARCHIVE link provided by Bill and repeated by myself.

Slurp/Inktomi and Yahoo all cache pages and even send their bots on solitary page requests (with full supporting files) for their cache.

It's a good idea to separate the valid SE bots from all the SE's auxiliary tools, and then only allow the valid bots.
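
One way to picture that separation is a small classifier that checks both the user agent and the source range. This is a sketch under stated assumptions, not anyone's actual rule set: `ExampleBot` and 192.0.2.0/24 (a documentation-only range) are placeholders for crawlers you have verified yourself, and the only denied entry is the NING/1.0 agent and 72.30.0.0/16 range reported in this thread.

```python
import ipaddress

# Hypothetical allowlist: (user-agent token, CIDR) pairs for crawlers you have verified.
# "examplebot" and 192.0.2.0/24 are placeholders, not real crawler data.
ALLOWED_CRAWLERS = [("examplebot", ipaddress.ip_network("192.0.2.0/24"))]

# Auxiliary tool reported in this thread: NING/1.0 seen from 72.30.0.0/16.
DENIED_TOOLS = [("ning/1.0", ipaddress.ip_network("72.30.0.0/16"))]

def classify(ip_text, user_agent):
    """Return "deny", "allow", or "unknown" for a single request."""
    ip = ipaddress.ip_address(ip_text)
    ua = user_agent.lower()
    # Deny when either the user agent or the source range matches a listed tool.
    for token, network in DENIED_TOOLS:
        if token in ua or ip in network:
            return "deny"
    # Allow only when the user agent AND the source range both match a verified crawler.
    for token, network in ALLOWED_CRAWLERS:
        if token in ua and ip in network:
            return "allow"
    return "unknown"

print(classify("72.30.142.253", "NING/1.0"))
```

Requiring both signals on the allow side means a request that merely claims a crawler's user agent from an unlisted IP stays "unknown" rather than being let through.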

brokaddr




msg:4530931
 7:21 am on Dec 25, 2012 (gmt 0)

It's a good idea to separate the valid SE bots from all the SE's auxiliary tools, and then only allow the valid bots.

I wasn't even aware Yahoo did that. Is there an easy way to decipher which is which?

wilderness




msg:4531104
 11:03 am on Dec 26, 2012 (gmt 0)

I'm not aware of any Yahoo crawlers; AFAIK Yahoo uses the crawls by MSN/Bing for their SERPs.

Somebody else may be able to provide IP's.

All I have documented since reactivation in February are Yahoo utilities, which I do not allow.
I haven't had any full crawls from anything that has identified itself as Yahoo, Slurp, or Inktomi since that reactivation.

keyplyr




msg:4531197
 8:35 pm on Dec 26, 2012 (gmt 0)


I'm not aware of any Yahoo crawlers

There are actually several different bots still run by Yahoo, mostly in Europe and Asia, but Inktomi and Slurp still hit my US site, possibly because I have a lot of inbound from Europe.

wilderness




msg:4531210
 9:22 pm on Dec 26, 2012 (gmt 0)

keyplyr,
I'm not getting any full crawls, rather solitary page requests from a few select pages (with complete accompanying files), and the same pages are repeating.

keyplyr




msg:4531287
 2:51 am on Dec 27, 2012 (gmt 0)

I'm not getting full crawls either. Never did from the 2nd level bots. Although Slurp and Inktomi did full site crawls a couple years ago. They may have been re-assigned for other purposes.
