
Search Engine Spider and User Agent Identification Forum

Blocking Monitoring Services
Looking for a list of IPs.
Brett_Tabke
msg:400447
2:36 pm on Apr 9, 2003 (gmt 0)

Does anyone have a comprehensive list of website monitoring service IP addresses?

I've got a few collected from the forum here, but I just thought maybe someone had a full list of these rogue unwanted bots.

At a minimum, do you know these four:
www.aignes.com
www.watchthatpage.com
www.trackengine.com
www.infominder.com

Aside from blocking them, what kind of "fun" can be had via cloaking with these bots?

<added>

Website monitoring services take users away. Why would a user visit the site if they can "monitor" it from elsewhere? That defeats mission-critical branding, it defeats promotion efforts, it defeats advertising, and that defeats your site's goals.

When a user visits your site and does not find updated content, they may find content or advertising they have not been exposed to. It's like channel surfing: it is how visitors are exposed to content they may not have seen before.

On the technical side, if people don't visit a site, you are also not counted by traffic counters such as the Google, Yahoo, and Alexa toolbars. That in turn may hurt your search engine rankings.

By not actively blocking these monitors, you are allowing and endorsing the poaching of your visitors. Website monitors are worse than Gator to me. At least with Gator, users have to visit your site.

Aside from the approved bots and partnerships with search engines, we do not allow unauthorized programmed querying of the site.

 

Spannerworks
msg:400448
2:58 pm on Apr 9, 2003 (gmt 0)

InternetSeer.com

But this one is fairly polite, only hitting the server once every couple of hours.

Can't quite see how you could cloak 'em. If you pretended your site was not there, you'd just generate lots of spam emails notifying you of their "service".

Brett_Tabke
msg:400449
3:09 pm on Apr 9, 2003 (gmt 0)

My one thought on that is to send them the same page forever. Users use their service and get stale pages.
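A minimal sketch of that idea in PHP, assuming the page is served through a script; the address prefix and snapshot filename are placeholders, not a real monitor's range:

<?php
// Serve a frozen snapshot to a known monitor's address range.
if (strpos($_SERVER['REMOTE_ADDR'], '192.0.2.') === 0) {
    readfile('stale-snapshot.html'); // hypothetical saved copy of the page
    exit;
}
?>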

jdMorgan
msg:400450
3:13 pm on Apr 9, 2003 (gmt 0)

...Or use the current time and date to generate pseudo-random content, and send them something different every millisecond. Then, assuming they send e-mail notices, they'll spam their subscribers...
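In PHP that joke is nearly a one-liner; a sketch (every fetch renders differently, so a diff-based monitor reports a change each time):

<?php
// "Content" derived from the current time, different on every request.
header('Cache-Control: no-cache');
echo '<p>Latest update: ', md5(microtime()), '</p>';
?>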

Must have had some "evil" coffee this A.M. ;)

Jim

wilderness
msg:400451
3:39 pm on Apr 9, 2003 (gmt 0)

Jim,
Although I don't recall the reference, I do recall mention of redirecting a visitor to a sort of "null land" where the bot is held in space for a very long delay.

Any details or reference on that?
TIA

top5jamaica
msg:400452
3:46 pm on Apr 9, 2003 (gmt 0)

InternetSeer is the worst: every hour, almost on the hour, and sometimes your requests for them to stop go ignored, even robots.txt! Tired of sending emails to those guys.
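For reference, the sort of robots.txt rule they are accused of ignoring would look like this (the UA token is an assumption; InternetSeer has been reported to identify as "InternetSeer.com"):

User-agent: InternetSeer.com
Disallow: /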

wilderness
msg:400453
3:52 pm on Apr 9, 2003 (gmt 0)

Yesterday I saw an interesting thread on Usenet about copyright infringement.
Somebody provided this link (for the US):
http://www.copyright.gov/onlinesp/list/

It would be nice if some similar accepted compliance scheme were in place for spidering.
It would sure clean up these pests.

jdMorgan
msg:400454
3:56 pm on Apr 9, 2003 (gmt 0)

Don,

SSI include to slow down bad_bots.
<!--#exec cmd="sleep 20" -->
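To avoid delaying normal visitors, the same trick can be wrapped in an SSI conditional; a sketch assuming mod_include's expression support, keying on InternetSeer's UA substring:

<!--#if expr="$HTTP_USER_AGENT = /InternetSeer/" -->
<!--#exec cmd="sleep 20" -->
<!--#endif -->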

Jim

creative craig
msg:400455
3:59 pm on Apr 9, 2003 (gmt 0)

I use InternetSeer, and it only hits every hour on average for my site. I think it's a good free service.

jdMorgan
msg:400456
4:08 pm on Apr 9, 2003 (gmt 0)

I use InternetSeer on a few sites, too. But if you are not a subscriber, they access your site anyway, and send you a marketing e-mail pitch if your site goes down.

Has anyone tried 403ing them?

(I think this thread needs to be split - Brett is asking about "web site content update monitoring services" and we are apparently veering off into "server uptime monitoring services".)

Jim

wilderness
msg:400457
4:10 pm on Apr 9, 2003 (gmt 0)

Jim
ONLY SSI? No option in other modules?
TIA
Don

mayor
msg:400458
4:22 pm on Apr 9, 2003 (gmt 0)

Two other abusers ...

Zeus
Turnitin

Markus
msg:400459
4:29 pm on Apr 9, 2003 (gmt 0)

changedetection.com
changedetect.com

But they appear to be rather polite.

wilderness
msg:400460
4:30 pm on Apr 9, 2003 (gmt 0)

Brett
A Google search turned up these:
1stMonitor Web Site Monitoring
Site Vigil
YourSiteUp.com
NodeBlue.net
Uptime100 Professional Website Monitoring
Alertra Web Site Monitoring
WatchDog
Affiliate Selling and Marketing Software
Atomic Watch
WebSitePulse
Peer-to-Peer distributed
InternetSeer
SiteProbe
WatchMouse
NetWhistle
PingAlink
Elk Fork
Uptime100

I also used to get pestered by a Canadian one which stopped on the first dead link it hit and reported MANY dead links or some such nonsense.
It was named twentyfourseven or something similar.

daisho
msg:400461
4:53 pm on Apr 9, 2003 (gmt 0)

In PHP: <?php sleep(20); ?>

Every script language will have something like that.

A guy by the name of Gary Keith keeps an updated browscap file, and it has a flag for web strippers, along with search engines and every obscure web browser out there.

As links are not permitted, do a Google search for: gary keith browscap.ini

I use this all the time in my PHP scripts using the PHP get_browser function.
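A sketch of that check, assuming php.ini's browscap setting points at Gary Keith's file and that the file defines a stripper flag for the matching UA:

<?php
// Look up the current visitor's User-Agent in browscap.ini.
$info = get_browser($_SERVER['HTTP_USER_AGENT'], true); // array of properties
if (!empty($info['stripper'])) {
    sleep(20);                        // slow the bad bot down first
    header('HTTP/1.0 403 Forbidden'); // then refuse the request
    exit;
}
?>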

carfac
msg:400462
4:56 pm on Apr 9, 2003 (gmt 0)

Hi:

Off the original topic, but here goes...

>>> Has anyone tried 403ing them?

I do. They go away after a while.
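A minimal .htaccess sketch of the 403 approach, assuming mod_setenvif and matching on a UA substring:

SetEnvIfNoCase User-Agent "InternetSeer" ban
<Limit GET POST HEAD>
order allow,deny
allow from all
deny from env=ban
</Limit>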

>>> redirecting a visitor to a sort of "null land"

Send them to a "black hole": a simple mod_rewrite rule to catch certain offenders (requests for default.ida or formmail.cgi, a UA of InternetSeer, a referer of iaee.org, a whole bunch of stuff you do not want to bother with and do not want bothering you!) and redirect them to a non-existent IP address. That will hang each request for about 20 seconds, and all your server does is send out a redirect.
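A sketch of such a ruleset with mod_rewrite; the patterns come from the examples above, and 192.0.2.1 is a placeholder non-routable address:

RewriteEngine On
RewriteCond %{REQUEST_URI} (default\.ida|formmail\.cgi) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer [NC,OR]
RewriteCond %{HTTP_REFERER} iaee\.org [NC]
RewriteRule .* http://192.0.2.1/ [R,L]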

I have a post on this in Webmaster General, and someone there suggested an IP range that is pretty good!

dave

Brett_Tabke
msg:400463
4:58 pm on Apr 9, 2003 (gmt 0)

IPs, baby. Who's collected IPs?

fiestagirl
msg:400464
5:17 pm on Apr 9, 2003 (gmt 0)

FairAd Client: 217.226.85.248
tracerlock: 209.61.182.37
Cyveillance: 63.148.99.224 - 63.148.99.255 and 65.118.41.192 - 65.118.41.223
TurnitinBot: 64.140.48.25 - 64.140.48.27
NameProtect: 12.148.196.128 - 12.148.196.255
LinkWalker: 209.167.50.16 - 209.167.50.31
BDFetch: 204.92.59.0 - 204.92.59.255
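A sketch of how those ranges might go into an .htaccess ban list (mod_access syntax; the CIDR masks are my conversions of the ranges above):

<Limit GET POST HEAD>
order allow,deny
allow from all
deny from 217.226.85.248
deny from 209.61.182.37
deny from 63.148.99.224/27
deny from 65.118.41.192/27
deny from 64.140.48.25 64.140.48.26 64.140.48.27
deny from 12.148.196.128/25
deny from 209.167.50.16/28
deny from 204.92.59.
</Limit>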

frankray
msg:400465
5:30 pm on Apr 9, 2003 (gmt 0)

I know a lot about web page monitoring services.

Some of you are mixing up website monitoring services and web page monitoring services.

Brett asked about web page monitoring services, which are entirely different from a robot that spiders/downloads your entire website over and over again.

Web page monitoring services are not:
> email harvesters
> domain spiders or
> trademark search tools

Web page monitoring services allow your visitors to check a single web page on your site once a day.

So web page monitoring is not in the same class as something like InternetSeer, Zeus, or Turnitin.

Brett: Your list of web page monitoring services is nearly complete if you add changedetect.com and changedetection.com (similar names, but not related), as Markus pointed out:

Markus said:
changedetect.com changedetection.com
But they appear to be rather polite.

And ChangeDetect is indeed a polite web page monitoring service that does not negatively impact bandwidth. Here is a quote from the website:

The ChangeDetect automated page monitor tool is a "good bot" (robot). No matter how many users monitor a single page on your website, your web server opens only one session per page. ChangeDetect runs only once a day to monitor the page.

http://www.changedetect.com/?page=reduce-bandwidth-website

Why would someone block a tool like this?... Brett?

cfx211
msg:400466
5:55 pm on Apr 9, 2003 (gmt 0)

I think that Turnitin is a plagiarism-fighting service.

I'd love to drop InternetSeer. Those guys are a pain in the neck, especially for really small, new sites. Nothing more annoying than opening up a 7kb log file and seeing half of it is InternetSeer and Nimda scans...

wilderness
msg:400467
6:03 pm on Apr 9, 2003 (gmt 0)

<snip>Why would someone block a tool like this?... Brett?</snip>

Frank
Change detection comes as an accessory to my SiteMeter.
I have a particular page whose content is only valid from late May to late October.
I mistakenly added a page date which resolves to a new date daily. The change detection sends a notification, and the visitor views the page EVERY DAY.
I attempted removing the "new date daily" from the HTML, and it had no effect on the change detection or this visitor. Guess there are exceptions to everything.
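For what it's worth, a stamp like the following (assuming SSI was the mechanism here) is enough to make a change monitor fire daily, because the rendered page differs on every date:

<!--#config timefmt="%B %d, %Y" --><!--#echo var="DATE_LOCAL" -->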

Rugles
msg:400468
6:13 pm on Apr 9, 2003 (gmt 0)

I have seen three different incarnations of Turnitin bots over the last couple of years. They do obey the robots.txt file, if anyone cares.

ia_archiver can chew up a lot of bandwidth if you do not ban it. However, you will see your site in the Wayback Machine.

Sorry Brett, slightly off topic.

daisho
msg:400469
6:30 pm on Apr 9, 2003 (gmt 0)

What would the need be for IP addresses? If there is a good purpose, I'll whip up a small script to start logging the IP and UA for UAs that are considered strippers.

Then we can parse the log to get the IPs.

But I don't see how this is better than just checking the UA in the first place, since strippers may be run from an xDSL account where the IP changes. The next person who gets that IP may be a valid user; I wouldn't want to block him.

Brett_Tabke
msg:400470
6:33 pm on Apr 9, 2003 (gmt 0)

See what I added to the lead post above.

IPs are needed to block the bots in an .htaccess file. We will add them to the close-to-perfect htaccess ban list [webmasterworld.com] that has spread across the net over the last year.

daisho
msg:400471
6:49 pm on Apr 9, 2003 (gmt 0)

OK, I'll spend some time writing a little script that will process an Apache log using browscap.ini to find IP addresses. I'll post it when there is something to show.

frankray
msg:400472
6:54 pm on Apr 9, 2003 (gmt 0)

Brett,

I noted your update to the original message. Interesting fears, but they do not really match up with my experiences.

Actually, a web page monitoring service increased my site traffic and generated repeat visitors.

Have you seen those alert services or "monitor this page" forms on a content page?

Sure you have... because you even have an alert service for this forum. ;) Well a page monitoring service is really nothing different.

Anyway, I do not have a forum, but I was able to use a web page monitoring service to replace a giant mailing list. I now push my content/news to a targeted audience with zero spamming and no newsletter publishing.

However since you have a zero-tolerance policy toward automated programs and already have an alert service for this forum, I somewhat understand your wanting to block any and all page monitoring services.

For me though, I consider them a good way to generate repeat visitors.

Frank

[edited by: frankray at 7:40 pm (utc) on April 9, 2003]

Mikkel Svendsen
msg:400473
7:24 pm on Apr 9, 2003 (gmt 0)

As much as I understand your concern, Brett, I have a few problems with this.

1) These programs (for the most part) are helping users solve a huge problem on the Net: keeping up with all the information. Software that is very helpful to users tends to stick around for a long time, even if whole webmaster communities or governments try to block it. Just look at Gator, KaZaA, etc. Many people, companies, and governments have done all they can to close these down, but they're still here because users want them so much.

2) I am not exactly sure which programs and services you want to block. Is it all services that request your site outside of a "normal" browser? Is it programs that grab only part of your page? If so, one very large new project is going to cause you a great deal of headache: the new Lycos Europe personalization and "clipping" feature. I think you can only see it on Min.Jubii.dk (Danish) now, but it will soon be rolled out all over Lycos Europe. With this tool you can "clip" any part of any website you wish and show all the clippings on one page. The full page is downloaded, using your local IP and browser agent name, but only the part you selected is shown.

Personally I love this service, and I use it to monitor news sites and forums that I use frequently. It gives me access to more information in less time and helps me pick the right news to read and the right discussions to participate in. In fact, this thread was one that "popped up" in my personalized min.jubii.dk portal; who knows when I would have found the time to come here and see it otherwise :)

daisho
msg:400474
7:59 pm on Apr 9, 2003 (gmt 0)

OK, I've whipped up a little PHP shell script that will process an Apache log and find unique IPs with UAs that my browscap.ini considers "strippers".

Sticky me if you would like the URL for the script along with my output file.

My log isn't finished processing yet, and I have found 199 unique IPs so far...
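Since the script itself is only available by sticky, here is a rough sketch of the approach described, assuming a combined-format access log and a browscap.ini that defines a stripper flag:

#!/usr/bin/php
<?php
// Collect unique IPs whose User-Agent browscap classifies as a stripper.
$ips = array();
$log = fopen('access_log', 'r');
while (($line = fgets($log)) !== false) {
    // combined log format ends with the quoted User-Agent string
    if (!preg_match('/^(\S+).*"([^"]*)"$/', rtrim($line), $m)) continue;
    $info = get_browser($m[2], true);
    if (!empty($info['stripper'])) $ips[$m[1]] = true;
}
fclose($log);
foreach (array_keys($ips) as $ip) echo $ip, "\n";
?>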

Tapolyai
msg:400475
10:12 pm on Apr 9, 2003 (gmt 0)

Mikkel_Svendsen,

Gator and KaZaA are "still here" because:
a. The users don't even know they are installed.
b. The users know, but they cannot be uninstalled by regular methods.

One of the losses for web owners with offline readers is the loss of page views. That is, if a group of people look at cached pages, my advertisements and other links do not register on my web site. In essence, the caching is depriving me of income. Another is registering actual visits, which, as we all know, is a key selling factor for advertisers.

I believe Brett is concerned about non-friendly systems that do not take the visited web site into consideration.

JudgeJeffries
msg:400476
10:37 pm on Apr 9, 2003 (gmt 0)
Is this one of them?
http://www.seventwentyfour.com
I find it very useful. It tells me about broken links for free.