| 2:58 pm on Apr 9, 2003 (gmt 0)|
But this one is fairly polite, only hitting the server once every couple of hours.
Can't quite see how you could cloak 'em. If you pretend your site was not there, you'd just generate lots of spam emails notifying you of their "service".
| 3:09 pm on Apr 9, 2003 (gmt 0)|
My one thought on that is to send them the same page forever. Users use their service, and get stale pages.
| 3:13 pm on Apr 9, 2003 (gmt 0)|
...Or use the current time and date to generate pseudo-random content, and send them something different every millisecond. Then, assuming they send e-mail notices, they'll spam their subscribers...
Must have had some "evil" coffee this A.M. ;)
| 3:39 pm on Apr 9, 2003 (gmt 0)|
Although I don't recall the reference, I do recall someone mentioning redirecting a visitor to a sort of "null land" where the bot is held for a very long delay.
Any details or reference on that?
| 3:46 pm on Apr 9, 2003 (gmt 0)|
InternetSeer is the worst. Every hour, almost on the hour, and sometimes your requests for them to stop get ignored, even robots.txt! Tired of sending emails to those guys.
| 3:52 pm on Apr 9, 2003 (gmt 0)|
Yesterday I saw an interesting thread on Usenet about copyright infringement.
Somebody provided this link (for US)
It would be nice if some similar accepted compliance was in place for spidering.
It would sure clean up these pests.
| 3:56 pm on Apr 9, 2003 (gmt 0)|
SSI include to slow down bad_bots.
<!--#exec cmd="sleep 20" -->
| 3:59 pm on Apr 9, 2003 (gmt 0)|
I use InternetSeer and it only hits every hour on average for my site. I think it's a good free service.
| 4:08 pm on Apr 9, 2003 (gmt 0)|
I use InternetSeer on a few sites, too. But if you are not a subscriber, they access your site anyway, and send you a marketing e-mail pitch if your site goes down.
Has anyone tried 403ing them?
(I think this thread needs to be split - Brett is asking about "web site content update monitoring services" and we are apparently veering off into "server uptime monitoring services".)
| 4:10 pm on Apr 9, 2003 (gmt 0)|
ONLY SSI? No option in other modules?
| 4:22 pm on Apr 9, 2003 (gmt 0)|
Two other abusers ...
| 4:29 pm on Apr 9, 2003 (gmt 0)|
But they appear to be rather polite.
| 4:30 pm on Apr 9, 2003 (gmt 0)|
A google turned up these:
1stMonitor Web Site Monitoring
Uptime100 Professional Website Monitoring
Alertra Web Site Monitoring
Affiliate Selling and Marketing Software
I also used to get pestered by a Canadian one which stopped on the first dead link it hit and reported MANY dead links or some such nonsense.
It was named twentyfourseven or something similar.
| 4:53 pm on Apr 9, 2003 (gmt 0)|
In PHP <?sleep(20);?>
Every script language will have something like that.
A guy by the name of Gary Keith keeps an updated browscap file, and it has a flag for web strippers, along with search engines and every obscure web browser out there.
As links are not permitted, do a Google search for gary keith browscap.ini
I use this all the time in my PHP scripts using the PHP get_browser function.
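For anyone not on PHP, here is a minimal Python sketch of the same idea: look at the User-Agent, and stall only the clients that look like strippers. The marker list and the function names are illustrative assumptions, not taken from the browscap file; a real list would come from a maintained source.

```python
import time

# Illustrative markers only (an assumption); a maintained browscap-style
# list would be far more complete.
STRIPPER_MARKERS = ("httrack", "webzip", "teleport", "offline explorer")


def is_stripper(user_agent: str) -> bool:
    """Return True if the User-Agent contains a known stripper marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in STRIPPER_MARKERS)


def delay_for(user_agent: str, penalty_seconds: float = 20.0) -> float:
    """Seconds to stall before answering this client (0 for normal visitors)."""
    return penalty_seconds if is_stripper(user_agent) else 0.0


def handle_request(user_agent: str, sleep=time.sleep) -> float:
    """Stall suspected strippers, then return the delay that was applied."""
    delay = delay_for(user_agent)
    if delay:
        sleep(delay)
    return delay
```

Keep in mind that every stalled request still ties up a server process for the full delay, so this costs you something too.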
| 4:56 pm on Apr 9, 2003 (gmt 0)|
Off the original topic, but here goes...
>>> Has anyone tried 403ing them?
I do. They go away after a while.
>>> redirecting a visitor to a sort of "null land"
Send them to a "black hole": a simple mod_rewrite rule that takes certain offenders (requests for default.ida, formmail.cgi, UA of internetseer, referer of iaee.org, a whole bunch of stuff you do not want to bother with, and do not want bothering you!) and sends them to a non-existent IP address. That will hang each request for 20 seconds, and all your server does is send out a redirect.
I have a post on this in Webmaster General, and someone there suggested an IP range that is pretty good!
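The decision logic behind that kind of black-hole rule could be sketched in Python roughly as below. This is not the actual mod_rewrite rule set, just an illustration of the matching. The target address is a placeholder from the reserved documentation range (an assumption, and not the range suggested in the other thread), and all names are hypothetical.

```python
from typing import Optional, Tuple

# Placeholder target (assumption): 192.0.2.0/24 is reserved for documentation
# and should not be routed on the public Internet.
BLACK_HOLE_URL = "http://192.0.2.1/"

# Offender patterns taken from the examples in the post above.
BAD_PATH_SUFFIXES = ("/default.ida", "/formmail.cgi")
BAD_UA_MARKERS = ("internetseer",)
BAD_REFERER_MARKERS = ("iaee.org",)


def black_hole_response(path: str, user_agent: str = "",
                        referer: str = "") -> Optional[Tuple[int, str]]:
    """Return (status, location) for offenders, or None to serve normally."""
    # Ignore the query string so "/default.ida?NNNN..." still matches.
    bare_path = path.lower().split("?", 1)[0]
    if (bare_path.endswith(BAD_PATH_SUFFIXES)
            or any(m in user_agent.lower() for m in BAD_UA_MARKERS)
            or any(m in referer.lower() for m in BAD_REFERER_MARKERS)):
        return (302, BLACK_HOLE_URL)
    return None
```

The server only sends a short redirect; whether the client then hangs depends on it actually following the redirect and waiting for the connection to time out.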
| 4:58 pm on Apr 9, 2003 (gmt 0)|
IPs, baby - who's collected IPs?
| 5:17 pm on Apr 9, 2003 (gmt 0)|
126.96.36.199 - 188.8.131.52
184.108.40.206 - 220.127.116.11
18.104.22.168 - 22.214.171.124
126.96.36.199 - 188.8.131.52
| 5:30 pm on Apr 9, 2003 (gmt 0)|
I know a lot about web page monitoring services.
Some of you are mixing up website monitoring services and webpage monitoring services.
Brett asked about webpage monitoring services which are entirely different than a robot that spiders/downloads your entire website over and over again.
Web page monitoring services are not:
> email harvesters
> domain spiders or
> trademark search tools
Web page monitoring services allow your visitors to check a single web page on your site once a day.
So web page monitoring is not in the same class as something like internetseer, zeus or tunitin.
Brett: Your list of web page monitoring services is near complete if you add changedetect.com and changedetection.com (similar names, but not related) as Markus pointed out:
|Markus said: |
But they appear to be rather polite.
And ChangeDetect is indeed a polite web page monitoring service that does not negatively impact bandwidth. Here is a quote from the website:
|The ChangeDetect automated page monitor tool is a "good bot" (robot). No matter how many users monitor a single page on your website, your web server opens only one session per page. ChangeDetect runs only once a day to monitor the page. |
Why would someone block a tool like this?... Brett?
| 5:55 pm on Apr 9, 2003 (gmt 0)|
I think that Turnitin is a plagiarism fighting service.
I'd love to drop internetseer. Those guys are a pain in the neck, especially for really small new sites. Nothing more annoying than opening up a 7kb log file and seeing half of it is internetseer and nimda scans...
| 6:03 pm on Apr 9, 2003 (gmt 0)|
<snip>Why would someone block a tool like this?... Brett?</snip>
Change detection comes as an accessory to my sitemeter.
I have a particular page in which the content is only valid from late May to late October.
I mistakenly added a page date which resolves to a new date daily. The change detection sends a notification and the visitor views the page EVERY DAY.
I attempted removing "new date daily" from the html and it had no effect on change detection or this visitor. Guess there are exceptions to everything.
| 6:13 pm on Apr 9, 2003 (gmt 0)|
I have seen three different incarnations of turnitin bots over the last couple of years. They do obey the robots.txt file if anyone cares.
ia_archiver can chew up a lot of bandwidth if you do not ban it. However, if you allow it, you will see your site in the Wayback Machine.
Sorry Brett, slightly off topic.
| 6:30 pm on Apr 9, 2003 (gmt 0)|
What would the need be for IP addresses? If there is a good purpose I'll whip up a small script to start logging the IP and UA for UAs that are considered strippers.
Then we can parse the log to get the IPs.
But I don't see how this is better than just checking the UA in the first place since strippers may be run from an xDSL account where the IP changes. The next person that gets that IP may be a valid user. I wouldn't want to block him.
| 6:33 pm on Apr 9, 2003 (gmt 0)|
see what I added to the lead post above.
IPs are needed to block the bots in an htaccess file. We will add them to the close to perfect htaccess ban list [webmasterworld.com] that has spread across the net the last year.
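As a rough illustration of how collected IPs end up in an htaccess file, the Python sketch below turns a list of addresses into the classic Order/Allow/Deny block. The function name and the sample addresses are hypothetical, and this is not the format of the ban list mentioned above; newer Apache versions use different access directives.

```python
import ipaddress


def htaccess_deny_block(addresses):
    """Build an Order/Allow/Deny block from a list of IPv4 address strings.

    Duplicates are dropped and addresses are sorted numerically.
    Invalid addresses raise ValueError.
    """
    unique = sorted({ipaddress.IPv4Address(a) for a in addresses})
    lines = ["Order Allow,Deny", "Allow from all"]
    lines += [f"Deny from {ip}" for ip in unique]
    return "\n".join(lines)


# Sample addresses from the reserved documentation range (not real offenders).
print(htaccess_deny_block(["192.0.2.20", "192.0.2.3", "192.0.2.20"]))
```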
| 6:49 pm on Apr 9, 2003 (gmt 0)|
Ok I'll spend some time to write a little script that will process an apache log using browscap.ini to find IP addresses. I'll post it when there is something to show.
| 6:54 pm on Apr 9, 2003 (gmt 0)|
I noted your update to the original message. Interesting fears that do not really match up with my experiences.
A web page monitoring service actually increased my site traffic and generated repeat visitors.
Have you seen those alert services or "monitor this page" forms on a content page?
Sure you have... because you even have an alert service for this forum. ;) Well a page monitoring service is really nothing different.
Anyway, I do not have a forum, but I was able to use a web page monitoring service to replace a giant mailing list. I now push my content/news to a targeted audience with zero spamming and no newsletter publishing.
However since you have a zero-tolerance policy toward automated programs and already have an alert service for this forum, I somewhat understand your wanting to block any and all page monitoring services.
For me though, I consider them a good way to generate repeat visitors.
[edited by: frankray at 7:40 pm (utc) on April 9, 2003]
| 7:24 pm on Apr 9, 2003 (gmt 0)|
As much as I understand your concern, Brett, I have a few problems with this.
1) These programs (for the most part) are helping users solve a huge problem on the Net: keeping up with all the information. Software that is very helpful to users tends to stick around for a long time, even if whole webmaster communities or governments for that matter try to block it. Just look at Gator, KaZaA etc. Many people, companies and governments have done all they can to close these down, but they are still here because users want them so much.
2) I am not exactly sure what programs and services it is you want to block. Is it all services that request your site outside of a "normal" browser? Is it programs that grab only part of your page? If so, one very large new project is going to cause you a great deal of headache: the new Lycos Europe personalization and "clipping" feature. I think you can only see it on Min.Jubii.dk (Danish) now, but it will soon be rolled out all over Lycos Europe. With this tool you can "clip" any part of any website you wish and show all the clippings on one page. The full page is downloaded, using your local IP and browser agent name, but only the part you selected is shown.
Personally I love this service, and I use it to monitor news sites and forums that I frequently use. It gives me access to more information in less time and helps me pick the right news to read and the right discussions to participate in. In fact, this thread was one that "popped up" in my personalized min.jubii.dk portal. Who knows when I would have found the time to go here and see it :)
| 7:59 pm on Apr 9, 2003 (gmt 0)|
Ok I've whipped up a little PHP shell script that will process an Apache log and find unique IPs with UAs that my browscap.ini considers "Strippers".
Sticky me if you would like the url for the script along with my output file.
My log isn't finished processing yet and I have found 199 unique IPs so far...
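The general shape of such a log pass, sketched in Python rather than PHP, might look like this. It is not the script mentioned above. It assumes the Apache combined log format and uses a short illustrative marker list instead of browscap.ini; the names are hypothetical.

```python
import re

# Combined Log Format (assumed):
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

# Illustrative markers only (an assumption); browscap.ini would supply
# a much longer, maintained list.
STRIPPER_MARKERS = ("httrack", "webzip", "teleport")


def stripper_ips(log_lines, markers=STRIPPER_MARKERS):
    """Return the set of client IPs whose User-Agent matches a marker."""
    found = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ua = match.group("ua").lower()
        if any(marker in ua for marker in markers):
            found.add(match.group("ip"))
    return found
```

Called with an open file, for example `stripper_ips(open("access_log"))`, it reads line by line, so a large log does not have to fit in memory.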
| 10:12 pm on Apr 9, 2003 (gmt 0)|
Gator and KaZaA are "still here" because:
a. The users don't even know it.
b. The user knows it but it cannot be uninstalled with regular methods.
One of the losses for site owners with offline readers is the loss of page views. That is, if a group of people look at cached pages, my advertisements and other links do not register on my web site. In essence the caching is depriving me of income. Another is registering actual visits which, as we all know, is a key selling factor to advertisers.
I believe Brett is concerned about non-friendly systems that do not take the visited web site into consideration.
| 10:37 pm on Apr 9, 2003 (gmt 0)|
|Is this one of them.|
I find it very useful. It tells me about broken links for free.