I've got a few collected from the forum here, but I just thought maybe someone had a full list of these rogue unwanted bots.
At a minimum, do you know these 4:
Aside from blocking them, what kind of "fun" can be had via cloaking with these bots?
Website monitoring services take users away from your site. Why would a user visit the site if they can "monitor" it from elsewhere? That defeats mission-critical branding, it defeats promotion efforts, it defeats advertising, and that defeats your site's goals.
When a user visits your site and does not find updated content, they may still find content or advertising they have not been exposed to. It's like channel surfing - it is how visitors are exposed to content they may not have seen before.
On the technical side, if people don't visit a site, it also means you are not counted by page counters such as the Google, Yahoo, and Alexa toolbars. That in turn may hurt your search engine rankings.
By not actively blocking these monitors, you are allowing and endorsing the poaching of your visitors. Website monitors are worse than Gator to me. At least with Gator, they have to visit your site.
Aside from the approved bots and partnerships with search engines, we do not allow unauthorized programmed querying of the site.
But this one is fairly polite, only hitting the server once every couple of hours.
Can't quite see how you could cloak 'em. If you pretend your site was not there, you'd just generate lots of spam emails notifying you of their "service".
Must have had some "evil" coffee this A.M. ;)
It would be nice if a similar, widely accepted compliance standard were in place for spidering.
It would sure clean up these pests.
Has anyone tried 403ing them?
(I think this thread needs to be split - Brett is asking about "web site content update monitoring services" and we are apparently veering off into "server uptime monitoring services".)
I also used to get pestered by a Canadian one, which stopped at the first dead link it hit and reported MANY dead links or some such nonsense.
It was named twentyfourseven or something similar.
Every scripting language will have something like that.
A guy by the name of Gary Keith keeps an updated browscap file and he has a flag for web strippers. Along with search engines and every obscure web browser out there.
As links are not permitted, do a Google search for "gary keith browscap.ini".
I use this all the time in my PHP scripts using the PHP get_browser function.
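For readers not on PHP, here is a minimal Python sketch of the kind of lookup get_browser performs against a browscap-style file. Assumptions: the file flags site strippers with a `Stripper=true` property (check your copy), entries inherit from a `Parent` section, and "longest matching pattern wins" stands in as a simplification of how get_browser picks the most specific entry. `SAMPLE_INI`, `lookup` and `is_stripper` are made-up names, and the three-entry sample is not real browscap data.

```python
import configparser
import re

# Made-up sample in the browscap.ini style; a real file is far larger.
SAMPLE_INI = """
[DefaultProperties]
Browser="DefaultProperties"
Stripper=false
Crawler=false

[WebStripper*]
Parent=DefaultProperties
Browser="WebStripper"
Stripper=true

[Mozilla/5.0 (*]
Parent=DefaultProperties
Browser="Generic Mozilla"
"""


def load_browscap(ini_text):
    """Parse browscap-style INI text without interpolation (values may contain '%')."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep property names such as "Stripper" case-sensitive
    parser.read_string(ini_text)
    return parser


def _matches(pattern, user_agent):
    """Treat '*' and '?' in a section name as wildcards; everything else is literal."""
    regex = re.escape(pattern).replace(r"\*", ".*").replace(r"\?", ".")
    return re.fullmatch(regex, user_agent, re.IGNORECASE) is not None


def lookup(parser, user_agent):
    """Return properties of the longest matching pattern, filling gaps from Parent sections."""
    matches = [s for s in parser.sections() if _matches(s, user_agent)]
    section = max(matches, key=len) if matches else None
    props = {}
    seen = set()
    while section is not None and section not in seen:
        seen.add(section)
        for key, value in parser.items(section):
            props.setdefault(key, value.strip('"'))  # child values win over parent values
        parent = parser.get(section, "Parent", fallback="").strip('"')
        section = parent if parser.has_section(parent) else None
    return props


def is_stripper(parser, user_agent):
    """True if the matched entry (or one of its parents) sets Stripper=true."""
    return lookup(parser, user_agent).get("Stripper", "false").lower() == "true"
```

With the sample loaded, `is_stripper(load_browscap(SAMPLE_INI), "WebStripper/2.62")` is true, while a generic Mozilla UA inherits `Stripper=false` from `DefaultProperties`.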
Off the original topic, but here goes...
>>> Has anyone tried 403ing them?
I do. They go away after a while.
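A 403 based on the User-Agent can be sketched as a small WSGI wrapper. This is only an illustration in Python, not the poster's setup (which would more likely be .htaccess rules); the token list and the name `block_bots` are hypothetical, and the tokens must be lowercase because the UA is lowercased before comparison.

```python
# Hypothetical, lowercase user-agent fragments to refuse; adjust to your own logs.
BLOCKED_UA_TOKENS = ("internetseer", "webstripper")


def block_bots(app, tokens=BLOCKED_UA_TOKENS):
    """Wrap a WSGI app so requests whose User-Agent contains a blocked token get a 403."""
    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in tokens):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        return app(environ, start_response)
    return middleware
```

Matching on the UA rather than the IP avoids locking out whoever inherits a dynamic address later, at the cost of missing bots that fake a browser UA.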
>>> redirecting a visitor to a sort of "null land"
Send them to a "black hole." A simple mod_rewrite rule can catch certain offenders (requests for default.ida or formmail.cgi, a UA of internetseer, a referer of iaee.org, and a whole bunch of other stuff you do not want to bother with and do not want bothering you!) and send them to a non-existent IP address. That will hang each request for 20 seconds, and all your server does is send out a redirect.
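The black-hole idea boils down to "match the offender, answer with a redirect to an address nobody answers on." Here is a minimal Python/WSGI sketch of that logic, not the mod_rewrite rules themselves. Assumptions: `192.0.2.1` (TEST-NET-1, reserved for documentation by RFC 5737 and not routed on the public Internet) stands in for the non-existent address, the offender patterns are taken from the post above, and `is_offender`/`black_hole` are hypothetical names. How long the client hangs depends on its own connect timeout, and a bot that ignores redirects simply stops there.

```python
import re

# Hypothetical black-hole target: TEST-NET-1 (RFC 5737), not routed on the public Internet.
BLACK_HOLE = "http://192.0.2.1/"

OFFENDER_PATHS = re.compile(r"default\.ida|formmail\.cgi", re.IGNORECASE)
OFFENDER_UA_TOKENS = ("internetseer",)
OFFENDER_REFERER_TOKENS = ("iaee.org",)


def is_offender(path, user_agent, referer):
    """True if the request path, User-Agent, or Referer matches a known offender pattern."""
    ua, ref = user_agent.lower(), referer.lower()
    return (
        OFFENDER_PATHS.search(path) is not None
        or any(token in ua for token in OFFENDER_UA_TOKENS)
        or any(token in ref for token in OFFENDER_REFERER_TOKENS)
    )


def black_hole(app):
    """Wrap a WSGI app so offenders get a cheap 302 to the black-hole address."""
    def middleware(environ, start_response):
        if is_offender(
            environ.get("PATH_INFO", ""),
            environ.get("HTTP_USER_AGENT", ""),
            environ.get("HTTP_REFERER", ""),
        ):
            start_response("302 Found", [("Location", BLACK_HOLE), ("Content-Length", "0")])
            return [b""]
        return app(environ, start_response)
    return middleware
```

The server's cost is one tiny response; the waiting happens on the client side while it tries to connect to the unreachable address.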
I have a post on this in Webmaster General, and someone there suggested an IP range that is pretty good!
Some of you are mixing up website monitoring services and webpage monitoring services.
Brett asked about webpage monitoring services, which are entirely different from a robot that spiders/downloads your entire website over and over again.
Web page monitoring services are not:
> email harvesters
> domain spiders or
> trademark search tools
Web page monitoring services allow your visitors to check a single web page on your site once a day.
So web page monitoring is not in the same class as something like internetseer, zeus or tunitin.
Brett: Your list of web page monitoring services is near complete if you add changedetect.com and changedetection.com (similar names, but not related) as Markus pointed out:
But they appear to be rather polite.
And ChangeDetect is indeed a polite web page monitoring service that does not negatively impact bandwidth. Here is a quote from the website:
The ChangeDetect automated page monitor tool is a "good bot" (robot). No matter how many users monitor a single page on your website, your web server opens only one session per page. ChangeDetect runs only once a day to monitor the page.
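The "one session per page, once a day" claim describes a fan-out design: the service fetches each monitored page once per run and notifies every subscriber from that single copy. Here is a minimal Python sketch of that design, assuming change is detected by hashing the raw page bytes; this is not ChangeDetect's actual code, and `PageMonitor`, `subscribe` and `run_daily` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class PageMonitor:
    """Hypothetical once-a-day monitor: one fetch per URL, shared by all subscribers."""
    fetch: Callable[[str], bytes]
    subscribers: Dict[str, Set[str]] = field(default_factory=dict)
    last_digest: Dict[str, str] = field(default_factory=dict)
    fetch_count: int = 0

    def subscribe(self, url: str, email: str) -> None:
        self.subscribers.setdefault(url, set()).add(email)

    def run_daily(self) -> Dict[str, List[str]]:
        """Fetch each URL once; return {url: sorted emails} for pages whose bytes changed."""
        notifications: Dict[str, List[str]] = {}
        for url, emails in self.subscribers.items():
            body = self.fetch(url)  # one request no matter how many subscribers
            self.fetch_count += 1
            digest = hashlib.sha256(body).hexdigest()
            previous = self.last_digest.get(url)
            self.last_digest[url] = digest
            if previous is not None and previous != digest:
                notifications[url] = sorted(emails)
        return notifications
```

Three subscribers to the same page still cost the site one request per run. A byte-level comparison like this also treats any edit as a change, so a date regenerated on the page every day would trigger a daily notification.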
Why would someone block a tool like this?... Brett?
I'd love to drop internetseer. Those guys are a pain in the neck, especially for really small new sites. Nothing is more annoying than opening up a 7kb log file and seeing that half of it is internetseer and nimda scans...
Change detection comes as an accessory to my sitemeter.
I have a particular page in which the content is only valid from late May to late October.
I mistakenly added a page date that resolves to a new date daily. Change detection sends a notification, and the visitor views the page EVERY DAY.
I attempted removing "new date daily" from the html and it had no effect on change detection or this visitor. Guess there are exceptions to everything.
ia_archiver can chew up a lot of bandwidth if you do not ban it. However, if you let it in, you will see your site in the Wayback Machine.
Sorry Brett, slightly off topic.
Then we can parse the log to get the IPs.
But I don't see how this is better than just checking the UA in the first place since strippers may be run from an xDSL account where the IP changes. The next person that gets that IP may be a valid user. I wouldn't want to block him.
IPs are needed to block the bots in an .htaccess file. We will add them to the close-to-perfect .htaccess ban list [webmasterworld.com] that has spread across the net over the last year.
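Pulling bot IPs out of a log for an .htaccess ban list can be sketched in a few lines of Python. Assumptions: the log is in Apache's combined format, bots are identified by lowercase UA fragments you supply, and the output uses the Apache 1.3/2.2-style `Deny from` directive (Apache 2.4 uses `Require not ip` instead). `bot_ips` and `deny_lines` are hypothetical names, and the caveat about dynamic xDSL addresses still applies to every IP this produces.

```python
import re

# Apache combined log: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def bot_ips(lines, ua_tokens):
    """Return the sorted unique client IPs whose User-Agent contains any given token."""
    tokens = [t.lower() for t in ua_tokens]
    found = set()
    for line in lines:
        match = COMBINED.match(line)
        if match and any(t in match.group("ua").lower() for t in tokens):
            found.add(match.group("ip"))
    return sorted(found)


def deny_lines(ips):
    """Format IPs as Apache 1.3/2.2-style .htaccess 'Deny from' lines."""
    return ["Deny from " + ip for ip in ips]
```

Lines that do not parse as combined-format entries are skipped rather than raising, so a partly malformed log still yields a usable list.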
I noted your update to the original message. Interesting fears that do not really match up with my experiences.
Actually, a web page monitoring service increased my site traffic and generated repeat visitors.
Have you seen those alert services or "monitor this page" forms on a content page?
Sure you have... because you even have an alert service for this forum. ;) Well a page monitoring service is really nothing different.
Anyway, I do not have a forum, but I was able to use a web page monitoring service to replace a giant mailing list. I now push my content/news to a targeted audience with zero spamming and no newsletter publishing.
However since you have a zero-tolerance policy toward automated programs and already have an alert service for this forum, I somewhat understand your wanting to block any and all page monitoring services.
For me though, I consider them a good way to generate repeat visitors.
[edited by: frankray at 7:40 pm (utc) on April 9, 2003]
1) These programs (for the most part) are helping users solve a huge problem on the Net: keeping up with all the information. Software that is very helpful to users tends to stick around for a long time - even if whole webmaster communities, or governments for that matter, try to block it. Just look at Gator, KaZaA, etc. Many people, companies and governments have done all they can to shut these down, but they are still here because users want them so much.
2) I am not exactly sure what programs and services it is you want to block. Is it all services that request your site outside of a "normal" browser? Is it programs that grab only part of your page? If so, one very large new project is going to cause you a great deal of headache: the new Lycos Europe personalization and "clipping" feature. I think you can only see it on Min.Jubii.dk (Danish) now, but it will soon be rolled out all over Lycos Europe. With this tool you can "clip" any part of any website you wish and show all the clippings on one page. The full page is downloaded, using your local IP and browser agent name, but only the part you selected is shown.
Personally I love this service, and I use it to monitor news sites and forums that I frequently use. It gives me access to more information in less time and helps me pick the right news to read and the right discussions to participate in. In fact, this thread was one that "popped up" in my personalized min.jubii.dk portal - who knows when I would have found the time to come here and see it :)
Sticky me if you would like the url for the script along with my output file.
My log isn't finished processing yet and I have found 199 unique IPs so far...
Gator and KazaA are "still here" because:
a. The users don't even know it.
b. The user knows it but it cannot be uninstalled with regular methods.
One of the losses for web site owners with offline readers is the loss of page views. That is, if a group of people look at cached pages, my advertisements and other links do not register on my web site. In essence, the caching is depriving me of income. Another is the loss of registered visits, which, as we all know, are a key selling factor to advertisers.
I believe Brett is concerned about non-friendly systems that do not take the visited web site into consideration.