
Search Engine Spider and User Agent Identification Forum

    
is there a data centre ip list
topr8




msg:3887517
 9:39 pm on Apr 7, 2009 (gmt 0)

anyone know if there is a maintained list of ip addresses allocated to data centres that are used exclusively by servers (eg hosting companies)...

which i'm rather assuming that i can block at the firewall level, as any request from one of these ip's would most likely be a scraper or unwanted bot (i can always punch holes in the firewall for exceptions if i wanted)

i'm happy to subscribe to a service if a free list isn't available.
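To make the idea concrete, here is a minimal sketch of a deny list with "punched holes", written in Python rather than as real firewall rules. The ranges are documentation-only placeholder blocks, and the names DENY_RANGES, ALLOW_EXCEPTIONS and is_blocked are hypothetical, not taken from any existing list or service.

```python
import ipaddress

# Placeholder ranges for illustration only (reserved documentation blocks),
# not real hosting-company allocations.
DENY_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]

# "Holes" punched in the deny list for clients you still want to let in.
ALLOW_EXCEPTIONS = [
    ipaddress.ip_network("192.0.2.10/32"),
]


def is_blocked(ip: str) -> bool:
    """Return True if ip is in a denied range and not covered by an exception."""
    addr = ipaddress.ip_address(ip)
    # Exceptions are checked first so they always win over the deny list.
    if any(addr in net for net in ALLOW_EXCEPTIONS):
        return False
    return any(addr in net for net in DENY_RANGES)
```

Checking exceptions first means a hole can be added without touching the deny list itself.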

 

wilderness




msg:3887990
 1:39 pm on Apr 8, 2009 (gmt 0)

Could you expand on the term "data centers"?

There are different types of backbone providers.

Some have subnet ranges clearly assigned to the registries, while others (Level3, Road Runner, Verizon and Comcast, to name a few that come to mind) are actual data centers in which the IP range works behind the scenes (so to speak).
If there is a method to deny traffic from IPs that are not exposed within website visitor logs (i.e., tracert or ping),
then I'm not aware of it (others may be).

Or does your definition of "data centers" simply mean internet provider IP ranges and/or User Agents (UA) taken together?
If so, no recent lists are being accumulated in this forum. Close to perfect [webmasterworld.com] may help you.

I've a list of colos that I began accumulating in mid-2006; however, I don't see how that would be beneficial to anybody else. In addition, the reference files contain actual logs from my websites, which I'm not too keen on providing to an unknown party. Nor would I be interested in providing a public list of colos (it has also been my belief that simply mentioning their names in effect provides free advertising).

So please expand on precisely what you're looking for.

Don

blend27




msg:3888439
 9:31 pm on Apr 8, 2009 (gmt 0)

topr8,

Exposing a list like that to the public would raise a lot of security issues for hosting companies.

Maintaining that same list is a very time-consuming task in the beginning; I am sure Don will agree with me on that one. I started my list at the beginning of 2006 and it is up to 1500 ranges: RIPE and ARIN, some APNIC. I was doing some statistics a month ago and found out that using header information to determine that info is much more effective.

1500 ranges adds about 1.5 seconds (SQL round trip) to the response, whereas the headers filter takes 0.01 ms.
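For comparison, a range check does not have to go through SQL at all. The sketch below is not the header filter described above; it is an assumed alternative that keeps the ranges in memory and finds a match by binary search. The names build_index and in_ranges are hypothetical, and the CIDR blocks in the usage example are documentation placeholders.

```python
import bisect
import ipaddress


def build_index(cidrs):
    """Merge CIDR strings and return parallel sorted lists of range starts and ends."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    # collapse_addresses merges overlapping and adjacent networks,
    # so the resulting ranges never overlap.
    merged = sorted(ipaddress.collapse_addresses(nets))
    starts = [int(n.network_address) for n in merged]
    ends = [int(n.broadcast_address) for n in merged]
    return starts, ends


def in_ranges(ip, starts, ends):
    """Return True if ip falls inside one of the indexed ranges."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range whose start is <= value, then compare with its end.
    i = bisect.bisect_right(starts, value) - 1
    return i >= 0 and value <= ends[i]


# Usage with placeholder ranges; the two /25 blocks merge into one /24.
starts, ends = build_index(["198.51.100.0/24", "192.0.2.0/25", "192.0.2.128/25"])
print(in_ranges("192.0.2.200", starts, ends))
```

Each lookup is a binary search over the sorted starts, so the cost grows with the logarithm of the number of ranges and no database round trip is involved.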

Blend27

topr8




msg:3888888
 11:47 am on Apr 9, 2009 (gmt 0)

ok, thanks for your input guys.

basically i was looking for ip ranges controlled by hosting companies/colos, which i could then block from accessing my sites.

i assume, perhaps wrongly, that any requests from such places would only be from scrapers and robots that i don't want - i could always make exceptions.

i have a small collection of ip ranges which i have collected from my own logs over the years, so i guess carrying on with my own collection is the way to go.

GaryK




msg:3889116
 5:15 pm on Apr 9, 2009 (gmt 0)

i assume, perhaps wrongly, that any requests from such places would only be from scrapers and robots that i don't want - i could always make exceptions.

They're not all bad, but certainly a large percentage of them are. Still, except for Amazon I look at them all on a case-by-case basis, because if I ban them all there's no way for me to see which ones should be whitelisted. Amazon has proven itself over and over again to host nothing but scum.

tangor




msg:3890399
 6:42 am on Apr 11, 2009 (gmt 0)

Check your logs (I know, preaching to the choir!) and ban the ones that are offensive, ignore the less objectionable, and forget about the one-offs who hit from time to time. Life is too short.
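As a rough sketch of what "check your logs" can look like in practice, the snippet below counts requests per client address. It assumes an access log in Common or Combined Log Format, where the client IP is the first whitespace-separated field; the function name top_clients and the sample lines are made up for illustration.

```python
from collections import Counter


def top_clients(log_lines, limit=10):
    """Count requests per client IP and return the busiest ones first."""
    counts = Counter()
    for line in log_lines:
        # Assumption: the client address is the first field on each line.
        parts = line.split(None, 1)
        if parts:
            counts[parts[0]] += 1
    return counts.most_common(limit)


# Made-up sample lines using documentation-only addresses.
sample = [
    '192.0.2.7 - - [11/Apr/2009:06:42:00 +0000] "GET / HTTP/1.1" 200 512',
    '192.0.2.7 - - [11/Apr/2009:06:42:01 +0000] "GET /a HTTP/1.1" 200 512',
    '198.51.100.3 - - [11/Apr/2009:06:43:00 +0000] "GET / HTTP/1.1" 200 512',
]
print(top_clients(sample, limit=2))
```

For a real file, passing the open file object as log_lines works the same way, since it yields one line at a time.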

Megaclinium




msg:3919585
 4:46 am on May 25, 2009 (gmt 0)

as tangor mentions, check your logs. I block individual IPs that scrape, then immediately check whether they are coming from a hosting company. If possible I go to the company's website, if I can find it, and often you can see they don't offer services to individuals at home.

then I add the whole range to my deny list.
For some bots that resolve to provinces of large countries that are not too quick to take down bots, I block the whole network range. If I see a scraper come back after a ban from a different IP, e.g. at a small ISP in a smaller developing country, I lose no sleep over banning that range.
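Registry lookups usually report an allocation as a first and last address, while deny lists often want CIDR blocks. Here is a small sketch of that conversion using Python's standard ipaddress module; the function name range_to_cidrs is hypothetical and the addresses are documentation placeholders, not a real allocation.

```python
import ipaddress


def range_to_cidrs(first: str, last: str):
    """Convert an inclusive first-last address range into a list of CIDR strings."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first),
        ipaddress.ip_address(last),
    )
    return [str(net) for net in networks]


# Placeholder range; a real one would come from the registry record.
print(range_to_cidrs("192.0.2.0", "192.0.2.255"))
```

A range that does not line up on a single block boundary comes back as several CIDR entries, each of which can be added to the deny list.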

I've seen major, supposedly legit Chinese search engine bots where APNIC won't resolve to the company name (it resolves only to a provincial telecom network). If they, as a major company, are too stupid to have their IP address range registered with the internics, then I have no problem deep-sixing them. Off with their heads, screamed the Red Queen! :)

And thanks someone else for pointing out what renlifangbot means. I thought m$ botmasters had suddenly decided to watch wolf or shark nature videos or something.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved