multiBlocker browser - IP blocker for Spam, Fraud + Snoop Protection
Is this another vendor trying to sell garbage or a real agent?
As for the product... uhhhh.. $99 for configuring your .htaccess - Why didn't I think of that business model before?
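(For the curious: denying a single IP in Apache takes about three lines of .htaccess - a minimal sketch, with a made-up documentation address standing in for the offender:

    # Block one specific IP address (192.0.2.17 is a placeholder)
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.17

Hardly a $99 job.)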
So is it advertising? Yes and no: we are scanning the whole Internet in regular cycles for site accessibility and response rate checks. Very interesting data, btw - gives you an indication of how many webmasters are actually double checking logs for unknown referrers, typical response rates (some people will take up to three months to react), stats software distribution (usually, though not always, discernible from their referrers), etc. (No personalized data collected, only aggregated numbers.) You won't get that sort of information if your log footprint features something boring like "hello world".
Again, this is a mere accessibility check - no whacking job, no attempt to access restricted, password-protected content, no proxy hijacking, no harvesting of email addresses or personal data, no intrusion of any kind. Also, to minimize bandwidth use, contact is effected only once per total Internet run. Moreover, to keep matters as transparent as possible, our spider leaves a clear footprint, indicating where it comes from.
And yes, volatilegx, your multiBlocker(TM) analysis is quite precise, sleep cycling and all.
Key_Master: I'm not sure your assessment is correct. If you know of a product offering similar features on hotscripts (or elsewhere for that matter) for free, I'd sure appreciate it if you could point it out to us.
Anyway, no need to go for IP blocking yourself: if you object to your IP(s) being spidered that way, just drop me a sticky mail and we'll exclude it (them) from our subsequent runs.
I personally have a "Terms" page which prohibits any visitor from using my resources for commercial purposes and/or revenue.
Of course, the bots don't bother reading the "Terms", and as a result they are denied.
I'm afraid that includes fantomaster.
That's what robots.txt is there for. If you want to claim that you're running a legitimate bot, then you probably should use that. There's no way that webmasters can individually notify all spider operators that they don't want robots fetching pages from their sites. It's your job to make sure that you're welcome.
That's true on the surface. But then, you shouldn't be using "various UserAgents" for a bot that always does the same thing anyway. Stick to one unique signature bot name (and place any variable info into parens, if you must). You should also respect the "User-Agent: *" entries in the robots.txt files, whatever your UA at that time may be.
it's a mere header call
That's not what I'm seeing in my log files. Are you sure you told your programmers that you only want to request headers? ;)
In fact, if you actually did what you're now just saying here, then that would reduce the gravity of what I'm criticizing quite a bit.
Personally, I don't have much respect for spiders that leave invalid referrers (aka advertisements).
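For anyone who hasn't watched this trick in their logs: such a spider simply stuffs the URL it wants to advertise into the Referer header. A hypothetical raw request might look like this (host and referrer URL are made up for illustration):

    GET / HTTP/1.0
    Host: www.example.com
    Referer: http://some-promoted-domain.example/
    User-Agent: multiBlocker browser

Whoever reviews their referrer stats then sees that URL and, with luck, clicks through - that's the "advertisement".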
That's true on the surface. But then, you shouldn't be using "various UserAgents" for a bot that always does the same thing anyway.
Technically easily possible, of course, but it would seriously skew the results because of the response time lag already mentioned.
You should also respect the "User-Agent: *" entries in the robots.txt files, whatever your UA at that time may be.
Again, we never hit anything but the index page. You're not suggesting that you've blocked your site for any UA out there, or am I misreading you?
it's a mere header call
That's not what I'm seeing in my log files.
Hm, I'll check this with our tech dept. and I won't rule out a glitch without having analyzed the matter more thoroughly, but that's what was intended in the first place at least.
Key_Master:
A link to info about your spider on the site would probably have alleviated some of the dissent given here. Plus, you've got to admit spiders crawling from dialup IPs do tend to throw up red flags.
You're right about that spider info link, and we're going to implement it next week.
As for the red flags, I'm not so sure: whackers will usually crawl from dialups as well. While we don't particularly like them, either, we're not alarmed when they come. After all, you've got to judge a spider by its overall behavior - a dialup IP may be a first indicator of something fishy going on, perhaps, but it's not as if our spiders were running rampant on people's sites or anything.
As for "invalid referrers", I beg to differ. Nothing invalid about it, as far as I can see. And yes, there's advertising involved as well (as stated above), but again you can't really expect any decent response data unless you place some bait in people's way.
As for the "IP blocker for Spam, Fraud + Snoop Protection" string, we've deleted it now because we did get quite a few mails from concerned webmasters who thought we were actively blocking them or something, possibly commissioned to do so by competitors of theirs - this, of course, was quite unintentional. (An interesting reaction nevertheless ...)
Tell you what: I'll be happy to either post a list of our spiders' UAs here or post it to a URL where you can download it, if you're all game. That way you folks can block them at your own discretion if you don't want us to exclude your IPs by default.
Just give me till Monday to do so as I'm currently busy holding an in-house weekend seminar.
Marcia:
That includes me; I clicked on the link and joined up. It wasn't this one, it was another. You've got some industrial-strength marketing there, good stuff.
Thanks for your appreciation - and for signing up! Nice to talk to someone belonging to that elite of 0.1% of webmasters who actually take their access logs seriously enough to check them out. :)
I'll be happy to either post a list of our spiders' UAs here or post it to a URL where you can download it, if you're all game. That way you folks can block them at your own discretion if you don't want us to exclude your IPs by default.
Very kind. Should you put up such a list, I certainly wouldn't ban your spiders (I only ban bandwidth leeches and email scanners), but I would like to be able to add them to my list of known UAs.
Thanks :)
I can't quite follow you here. You've been using "Mozilla/4.0 (fantomBrowser)" for a long time now [with a few "(stealthBrowser)" thrown in], apparently without compromising your results. I may have missed some others, but what exactly do you gain by changing to "MultiBlocker browser" now, other than advertising exposure?
Was or is contextbase/polyserve the same company, btw?
or am I misreading you?
There are sites out there (not mine) that don't want to get crawled by automated processes at all. There's any number of good reasons for such a wish, but that's another discussion. Those sites use the "User-Agent: *" entry in their robots.txt. That's not a block, but a polite request, and every well behaved robot should respect it. I assume you are familiar with the robot exclusion standards?
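For reference, the minimal robots.txt that politely asks all robots to stay out reads:

    User-agent: *
    Disallow: /

Any crawler claiming good manners should fetch that file first and honor it, whatever UA string it happens to be wearing.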
Hm, I'll check this with our tech dept. and I won't rule out a glitch without having analyzed the matter more thoroughly, but that's what was intended in the first place at least.
Your robot has exclusively been using GET requests at least since early in 2000. Microsoft would call it a "feature"... ;)
If I have an even remotely realistic idea about the volume of your spidering, then using HEAD requests will reduce your bandwidth requirements by a tremendous amount. Not to mention that your crawling runs might finish in a fraction of the time.
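If your tech dept. wants a starting point, here's a rough Python sketch of the difference (the hostname is a placeholder; a HEAD response returns the status line and headers only, with no body):

    import http.client

    def probe(host, method):
        """Issue one request and report status code and body size."""
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request(method, "/")
        resp = conn.getresponse()
        body = resp.read()  # b"" for HEAD - the server sends no body
        conn.close()
        return resp.status, len(body)

    # Placeholder host, purely for illustration.
    for method in ("GET", "HEAD"):
        status, size = probe("www.example.com", method)
        print(f"{method}: status {status}, {size} body bytes")

For an accessibility check that only cares about the response code, the HEAD variant transfers a few hundred bytes instead of the whole page.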
>>> Your robot has exclusively been using GET requests at least since early in 2000.
My guess would be that fantomaster is looking for sites that use fantom scripts. But it's only a guess.
Now compare that to the estimated 20 bots a day that hit here without ever asking or leaving a trail back to themselves. Unregulated bots are the #1 concern of many webmasters. When you compare system resources used, a single page view by a random bot isn't even worth mentioning compared to the thousands to millions of views by rogue aggressive bots many of us see.
I also take partial blame/credit for giving fanto the idea a few years ago. However, that was back when you could still get 2-3% click back rates. It's nice stealth promotion.
However, that was back when you could still get 2-3% click back rates.
Interestingly enough we do get about 2% click back from ODP listed sites. It's the others which push the stats down.
Click-thru rates on the target pages, in case anyone's interested, range between 18 and 22.5%.
It's nice stealth promotion.
So it is. However, there's a lot more to it as well:
1. It helps us check overall site accessibility.
2. It helps us check click back rates, which constitutes valuable marketing data.
3. We're currently building a database of international domains, assigning them their proper IPs. This may or may not turn into a lookup service some day, we're still evaluating.
4. It facilitates finding open (non-password-protected) stats and access log pages out there - many, many, many. :)
5. It increases link pop via all those open log pages esp. with FAST/Alltheweb and, to a lesser extent, Google.
6. Finally, we are seriously looking into the possibility of building an SEO-industry-friendly, truly international search engine (submissions for SEO agencies only - both free and commercial, the rest of the Web to be crawled automatically), allowing for regulated cloaking, doorways, redirects, etc.
This project is as yet tentative and it will probably require some pretty heavy outside funding, which in turn requires exactly the sort of hard statistical data we're collecting this way.
volatilegx: We've set up that page listing our spiders' UserAgents here - [fantomaster.com]. We may update this list on occasion, but most probably not within the year.
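In case anyone wants to act on that list right away, a minimal .htaccess sketch along these lines blocks by UserAgent (requires Apache's mod_setenvif; the pattern shown is the UA from this thread):

    # Tag any request whose User-Agent contains "multiBlocker", then deny it
    SetEnvIfNoCase User-Agent "multiBlocker" block_this_bot
    Order Allow,Deny
    Allow from all
    Deny from env=block_this_bot

Adjust or add SetEnvIfNoCase lines for whatever other UAs end up on the list.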
bird: what exactly do you gain by changing to "MultiBlocker browser" now, other than advertising exposure?
Because these are repeat runs hitting all sites, we rotate different footprint entries to determine a more precise statistical average.
Was or is contextbase/polyserve the same company, btw?
No, contextbase is both one of our domains (contextbase.com) and the name of our dedicated US based server. Polyserve is a different company.
Your robot has exclusively been using GET requests at least since early in 2000. Microsoft would call it a "feature"...
I checked it with our tech dept. and you're right - my mistake, and sorry if that was misleading: we've been using the GET command so far and were actually in the process of implementing header-only calls today (should be finalized in about half an hour's time or so). This again is a check-mode feature: checking response rates depending on whether we call the full page or the header only.
It will, of course, preclude robots.txt calls as well.
Key_Master: Since my last response to this thread my site was hit (GET request) by user agent "multiBlocker browser". Was it you, fantomaster, or someone else? I can't tell because it came from a dialup IP. I really don't care, but it doesn't look very professional.
Yes, that was probably our spider (unless someone's started to spoof it).
We actually used a static IP from our US based server in the beginning, but a) bandwidth costs were exorbitant, and b) we experienced quite a few (unsuccessful) crack attempts and the odd (equally unsuccessful) DoS attack or two in the course of those runs. So we set up that ADSL box instead, which is a flat rate service. Another reason for working with dynamic IPs from the same server is our fairly extensive search engine submission setup.
My guess would be that fantommaster is looking for sites that use fantom scripts. But it's only a guess.
Not so. Too much hassle. Not that we haven't considered it, though ... :)