multiBlocker browser - IP blocker for Spam, Fraud + Snoop Protection
Is this another vendor trying to sell garbage or a real agent?
As for the product... uhhhh.. $99 for configuring your .htaccess - Why didn't I think of that business model before?
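(For the curious: denying a single IP in Apache takes about three lines of .htaccess - a minimal sketch, with a made-up documentation address standing in for the offender:

    # Block one specific IP address (192.0.2.17 is a placeholder)
    Order Allow,Deny
    Allow from all
    Deny from 192.0.2.17

Hardly a $99 job.)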
So is it advertising? Yes and no: we are scanning the whole Internet in regular cycles for site accessibility and response rate checks. Very interesting data, btw - gives you an indication of how many webmasters are actually double checking logs for unknown referrers, typical response rates (some people will take up to three months to react), stats software distribution (usually, though not always, discernible from their referrers), etc. (No personalized data collected, only aggregated numbers.) You won't get that sort of information if your log footprint features something boring like "hello world".
Again, this is a mere accessibility check - no whacking job, no attempt to access restricted, password-protected content, no proxy hijacking, no harvesting of email addresses or personal data, no intrusion of any kind. Also, to minimize bandwidth use, contact is effected only once per total Internet run. Moreover, to keep matters as transparent as possible, our spider leaves a clear footprint, indicating where it comes from.
And yes, volatilegx, your multiBlocker(TM) analysis is quite precise, sleep cycling and all.
Key_Master: I'm not sure your assessment is correct. If you know of a product offering similar features on hotscripts (or elsewhere for that matter) for free, I'd sure appreciate it if you could point it out to us.
Anyway, no need to go for IP blocking yourself: if you object to your IP(s) being spidered that way, just drop me a sticky mail and we'll exclude it (them) from our subsequent runs.
I personally have a "Terms" page which prohibits any visitor from using my resources for commercial purposes and/or revenue.
Of course, the bots don't bother reading the "Terms", and as a result they are denied.
I'm afraid that includes fantomaster.
That's what robots.txt is there for. If you want to claim that you're running a legitimate bot, then you probably should use that. There's no way that webmasters can individually notify all spider operators that they don't want robots fetching pages from their sites. It's your job to make sure that you're welcome.
That's true on the surface. But then, you shouldn't be using "various UserAgents" for a bot that always does the same thing anyway. Stick to one unique signature bot name (and place any variable info into parens, if you must). You should also respect the "User-Agent: *" entries in the robots.txt files, whatever your UA at that time may be.
it's a mere header call
That's not what I'm seeing in my log files. Are you sure you told your programmers that you only want to request headers? ;)
In fact, if you actually did what you're now just saying here, then that would reduce the gravity of what I'm criticizing quite a bit.
Personally, I don't have much respect for spiders that leave invalid referrers (aka advertisements).
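For anyone who hasn't watched this trick in their logs: such a spider simply stuffs the URL it wants to advertise into the Referer header. A hypothetical raw request might look like this (host and referrer URL are made up for illustration):

    GET / HTTP/1.0
    Host: www.example.com
    Referer: http://some-promoted-domain.example/
    User-Agent: multiBlocker browser

Whoever reviews their referrer stats then sees that URL and, with luck, clicks through - that's the "advertisement".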
That's true on the surface. But then, you shouldn't be using "various UserAgents" for a bot that always does the same thing anyway.
Technically easily possible, of course, but it would seriously skew the results because of the response time lag already mentioned.
You should also respect the "User-Agent: *" entries in the robots.txt files, whatever your UA at that time may be.
Again, we never hit anything but the index page. You're not suggesting that you've blocked your site for any UA out there, or am I misreading you?
it's a mere header call
That's not what I'm seeing in my log files.
Hm, I'll check this with our tech dept. and I won't rule out a glitch without having analyzed the matter more thoroughly, but that's what was intended in the first place at least.
Key_Master:
A link to info about your spider on the site would probably have alleviated some of the dissent given here. Plus, you've got to admit spiders crawling from dialup IPs do tend to throw up red flags.
You're right about that spider info link, and we're going to implement it next week.
As for the red flags, I'm not so sure: whackers will usually crawl from dialups as well. While we don't particularly like them, either, we're not alarmed when they come. After all, you've got to judge a spider by its overall behavior - a dialup IP may be a first indicator of something fishy going on, perhaps, but it's not as if our spiders were running rampant on people's sites or anything.
As for "invalid referrers", I beg to differ. Nothing invalid about it, as far as I can see. And yes, there's advertising involved as well (as stated above), but again you can't really expect any decent response data unless you place some bait in people's way.
As for the "IP blocker for Spam, Fraud + Snoop Protection" string, we've deleted it now because we did get quite a few mails from concerned webmasters who thought we were actively blocking them or something, possibly commissioned to do so by competitors of theirs - this, of course, was quite unintentional. (An interesting reaction nevertheless ...)
Tell you what: I'll be happy to either post a list of our spiders' UAs here or post it to a URL where you can download it, if you're all game. That way you folks can block them at your own discretion if you don't want us to exclude your IPs by default.
Just give me till Monday to do so as I'm currently busy holding an in-house weekend seminar.
Marcia:
That includes me; I clicked on the link and joined up. It wasn't this one, it was another. You've got some industrial-strength marketing there, good stuff.
Thanks for your appreciation - and for signing up! Nice to talk to someone belonging to that elite of 0.1% of webmasters who actually take their access logs seriously enough to check them out. :)
I'll be happy to either post a list of our spiders' UAs here or post it to a URL where you can download it, if you're all game. That way you folks can block them at your own discretion if you don't want us to exclude your IPs by default.
Very kind. Should you put up such a list, I certainly wouldn't ban your spiders (I only ban bandwidth leeches and email scanners), but I would like to be able to add them to my list of known UAs.
Thanks :)
I can't quite follow you here. You've been using "Mozilla/4.0 (fantomBrowser)" for a long time now [with a few "(stealthBrowser)" thrown in], apparently without compromising your results. I may have missed some others, but what exactly do you gain by changing to "MultiBlocker browser" now, other than advertising exposure?
Was or is contextbase/polyserve the same company, btw?
or am I misreading you?
There are sites out there (not mine) that don't want to get crawled by automated processes at all. There's any number of good reasons for such a wish, but that's another discussion. Those sites use the "User-Agent: *" entry in their robots.txt. That's not a block, but a polite request, and every well behaved robot should respect it. I assume you are familiar with the robot exclusion standards?
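For reference, the minimal robots.txt that politely asks all robots to stay out reads:

    User-agent: *
    Disallow: /

Any crawler claiming good manners should fetch that file first and honor it, whatever UA string it happens to be wearing.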
Hm, I'll check this with our tech dept. and I won't rule out a glitch without having analyzed the matter more thoroughly, but that's what was intended in the first place at least.
Your robot has exclusively been using GET requests at least since early in 2000. Microsoft would call it a "feature"... ;)
If I have an even remotely realistic idea about the volume of your spidering, then using HEAD requests will reduce your bandwidth requirements by a tremendous amount. Not to mention that your crawling runs might finish in a fraction of the time.
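If your tech dept. wants a starting point, here's a rough Python sketch of the difference (the hostname is a placeholder; a HEAD response returns the status line and headers only, with no body):

    import http.client

    def probe(host, method):
        """Issue one request and report status code and body size."""
        conn = http.client.HTTPConnection(host, timeout=10)
        conn.request(method, "/")
        resp = conn.getresponse()
        body = resp.read()  # b"" for HEAD - the server sends no body
        conn.close()
        return resp.status, len(body)

    # Placeholder host, purely for illustration.
    for method in ("GET", "HEAD"):
        status, size = probe("www.example.com", method)
        print(f"{method}: status {status}, {size} body bytes")

For an accessibility check that only cares about the response code, the HEAD variant transfers a few hundred bytes instead of the whole page.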
>>> Your robot has exclusively been using GET requests at least since early in 2000.
My guess would be that fantomaster is looking for sites that use fantom scripts. But it's only a guess.
Now compare that to the estimated 20 bots a day that hit here without ever asking or leaving a trail back to themselves. Unregulated bots are the #1 concern of many webmasters. When you compare system resources used, a single page view by a random bot isn't even worth mentioning compared to the thousands to millions of views by rogue aggressive bots many of us see.
I also take partial blame/credit for giving fanto the idea a few years ago. However, that was back when you could still get 2-3% click back rates. It's nice stealth promotion.
However, that was back when you could still get 2-3% click back rates.
Interestingly enough we do get about 2% click back from ODP listed sites. It's the others which push the stats down.
Click-thru rates on the target pages, in case anyone's interested, range between 18 and 22.5%.
It's nice stealth promotion.
So it is. However, there's a lot more to it as well:
1. It helps us check overall site accessibility.
2. It helps us check click back rates, which constitutes valuable marketing data.
3. We're currently building a database of international domains, assigning them their proper IPs. This may or may not turn into a lookup service some day, we're still evaluating.
4. It facilitates finding open (non-password-protected) stats and access log pages out there - many, many, many. :)
5. It increases link pop via all those open log pages esp. with FAST/Alltheweb and, to a lesser extent, Google.
6. Finally, we are seriously looking into the possibility of building an SEO-industry-friendly, truly international search engine (submissions for SEO agencies only - both free and commercial, the rest of the Web to be crawled automatically), allowing for regulated cloaking, doorways, redirects, etc.
This project is as yet tentative and it will probably require some pretty heavy outside funding, which in turn requires exactly the sort of hard statistical data we're collecting this way.
volatilegx: We've set up that page listing our spiders' UserAgents here - [fantomaster.com]. We may update this list on occasion, but most probably not within the year.
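In case anyone wants to act on that list right away, a minimal .htaccess sketch along these lines blocks by UserAgent (requires Apache's mod_setenvif; the pattern shown is the UA from this thread):

    # Tag any request whose User-Agent contains "multiBlocker", then deny it
    SetEnvIfNoCase User-Agent "multiBlocker" block_this_bot
    Order Allow,Deny
    Allow from all
    Deny from env=block_this_bot

Adjust or add SetEnvIfNoCase lines for whatever other UAs end up on the list.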
bird: what exactly do you gain by changing to "MultiBlocker browser" now, other than advertising exposure?
Because these are repeat runs hitting all sites, we rotate different footprint entries to determine a more precise statistical average.
Was or is contextbase/polyserve the same company, btw?
No, contextbase is both one of our domains (contextbase.com) and the name of our dedicated US based server. Polyserve is a different company.
Your robot has exclusively been using GET requests at least since early in 2000. Microsoft would call it a "feature"...
I checked it with our tech dept. and you're right - my mistake, and sorry if that was misleading: we've been using the GET command so far and were actually in the process of implementing header-only calls today (should be finalized in about half an hour's time or so). This again is a check-mode feature: checking response rates depending on whether we call the full page or the header only.
It will, of course, preclude robots.txt calls as well.
Key_Master: Since my last response to this thread my site was hit (GET request) by user agent "multiBlocker browser". Was it you, fantomaster, or someone else? I can't tell because it came from a dialup IP. I really don't care, but it doesn't look very professional.
Yes, that was probably our spider (unless someone's started to spoof it).
We actually used a static IP from our US based server in the beginning, but a) bandwidth costs were exorbitant, and b) we experienced quite a few (unsuccessful) crack attempts and the odd (equally unsuccessful) DoS attack or two in the course of those runs. So we set up that ADSL box instead, which is a flat rate service. Another reason for working with dynamic IPs from the same server is our fairly extensive search engine submission setup.
My guess would be that fantommaster is looking for sites that use fantom scripts. But it's only a guess.
Not so. Too much hassle. Not that we haven't considered it, though ... :)