
Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Missigua Locator 1.9 is Baaack
Spidered my entire site.

 11:51 am on Jan 4, 2005 (gmt 0)

A deprecated thread (one message only) from 2003 says in part:

419 scammers' use of Missigua Locator 1.9 is again a problem from Mozambique, Nigeria, Côte d'Ivoire, Togo & Benin.

Deutsche Telekom: In most cases, their backbone is used by other countries and while they should be informed of scammers/spammers using their network, they are not the culprit.

Israel: Runs a satellite backbone that feeds some African countries. Again, we can not blame a backbone.

UK: Same notation. Feeds many African countries.

Massive IP blocks may lock out "friendly" visitors, therefore be highly selective in your blocks and htaccess.

- - -

I found that post on Google, after the same, uh, entity crawled my entire site.
It never looked at robots.txt (not banned there) and took every html file I had, but no images.

All hits were from the same IP # 64.165.***.** which traces back to Pacific Bell
(now SBC) in San Francisco.

Having only 140 pages or so, I wasn't hit too badly.
But, does anybody know who/what this actually is,
and what purpose they have in crawling a UFO site?

What is a "419 scammer" anyway?

- Larry

[edited by: volatilegx at 2:04 pm (utc) on Jan. 4, 2005]
[edit reason] obscured IP address [/edit]



 2:06 pm on Jan 4, 2005 (gmt 0)

From U.S. Secret Service website [secretservice.gov]:

The perpetrators of Advance Fee Fraud (AFF), known internationally as "4-1-9" fraud after the section of the Nigerian penal code which addresses fraud schemes...

For the record, I want to say that just because certain 419 fraudsters may have used Missigua Locator, it doesn't mean every use of the UserAgent can be linked with 419 fraud.

[edited by: volatilegx at 4:02 pm (utc) on Jan. 19, 2005]


 5:15 am on Jan 5, 2005 (gmt 0)

Thanks Volatile, just another tag for general Nigerian style scams I take it.

I don't think that applies here though. Somebody used or spoofed the 'Missigua Locator 1.9'
tag to spider my entire site. No images, but all 140 .html files instead.

Does anyone know who this is, and what's their game? - Larry


 3:36 pm on Jan 9, 2005 (gmt 0)

Seems to be listed (in its most recent incarnation) as a referer and comment spam-bot on a lot of sites.

It hit a blog I host and headed straight for the bot-trap.


 6:30 pm on Jan 9, 2005 (gmt 0)

I have the following from May 2004: - - [30/May/2004:22:48:06 -0700] "GET /myfolder/mypage.htm HTTP/1.1" 200 24705 "-" "Missigua Locator 1.9"

Hardly coming from any non-NA IP range.

I've also had constant spidering attempts from a variety of sources in Mississauga, Ontario.

In addition the following thread appears to reference blogs:

IMO, all three provide reasons for denial.


 7:13 am on Jan 10, 2005 (gmt 0)

Missigua is back yet again!

Four days running, spidered entire site yet again ..
THIS time the IP # is 69.228.XX.XX.
I put that into the ADDRESS bar and (slowly) up comes the ZEND corporation, "the .php people".
Zend is headquartered in Cupertino, CA i.e. nearby.
I emailed their webmaster (via their form mail).

This has me mystified. The only thing in common is the Missigua Locator (phony?) user agent.

Previously, they came in from IP #s
67.119.xxx.xxx -and- 67.119.yyy.yyy (different last two #s).

All come in via the SBC backbone (our local phone company) but that tells me nothing.

Whoever this is, they really cover their tracks.
I can't ban them based on IP #s if they keep changing them that much.

Any clues at all? - Larry


 7:15 am on Jan 10, 2005 (gmt 0)

Some people have referred to Missigua as a spam operation.
That makes no sense to me .. Why spider my entire site over and over?
They don't generate 404 errors looking for email vulnerabilities etc.
I'm stuck for an answer. - Larry


 3:49 pm on Jan 10, 2005 (gmt 0)

I'm stuck for an answer

Harvesting is not a difficult task to comprehend!
Take a look at ALL the software created in the past to harvest:

The reason the users continue harvesting is apparently that it generates revenue. Accept it and make corrections in how you both create and allow access to your pages.

Denying access to visitors based on IP ranges is a secondary choice, ALWAYS.
The best and most effective option is to deny visitors based on a User Agent which is used by few visitors and reduces the denial of innocents.

The Locator thing creates an easy solution.
SetEnvIf User-Agent Missigua keep_out
SetEnvIf User-Agent Locator keep_out
(keep_out can be whatever variable name you use)

You may also use Rewrite for UA's, I do not.

RewriteCond %{HTTP_USER_AGENT} Missigua [OR]
RewriteCond %{HTTP_USER_AGENT} Locator
RewriteRule .* - [F]
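On its own, a SetEnvIf line only sets a variable; something still has to deny the request. A minimal sketch of the complete mod_access version follows, assuming the Apache 1.3/2.0 servers current in this thread and an .htaccess file where Limit overrides are allowed. The variable name keep_out is the one used in the post above. SetEnvIfNoCase is swapped in for SetEnvIf so the match is case-insensitive.

```apache
# Flag any request whose User-Agent contains "Missigua" (case-insensitive).
# keep_out is an arbitrary variable name.
SetEnvIfNoCase User-Agent "Missigua" keep_out

# Allow everyone, then deny any request that carries the flag.
Order Allow,Deny
Allow from all
Deny from env=keep_out
```

With Order Allow,Deny a request that matches both an Allow and a Deny line is denied, so flagged requests get a 403 and everyone else is unaffected.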


 3:44 pm on Jan 19, 2005 (gmt 0)

Yeah, and with a vengeance. It tripped the trap and was fed EVERY 403 there is, but kept right on trying all the way through my domain. - - [19/Jan/2005:07:04:18 -0800] "GET / HTTP/1.1" 200 20407 "-" "Missigua Locator 1.9"

No robots.txt.


 7:28 pm on Jan 19, 2005 (gmt 0)

Persistent sob, I'll say that. Came back and hammered every file ( trap tripped again ) before I had a chance to:

RewriteCond %{HTTP_USER_AGENT} Missigua [OR]

Drat! But it's in there nooowwwww....


 2:47 pm on Jan 28, 2005 (gmt 0)

got it here. just noticed it hammering my site today. thousands upon thousands of hits in my logs.

tracerouted source back to

irc109.hostcentric.com ( 179.944 ms 167.879 ms 167.906 ms

irc109? some sort of IRC bot maybe?


 1:26 pm on Jan 30, 2005 (gmt 0)

Http Code: 200 Date: Jan 30 07:20:29 Http Version: HTTP/1.1 Size in Bytes: 23308
Referer: -
Agent: Missigua Locator 1.9

Hitting my site every 30 seconds to grab a page.


 2:49 pm on Jan 31, 2005 (gmt 0)

Same here... - - [31/Jan/2005:09:23:15 -0500] "GET / HTTP/1.1" 200 28799 "-" "Missigua Locator 1.9"

Banned the i.p. but it is likely a dhcp isp address and will probably change.

Was also misfiring: - - [31/Jan/2005:09:23:03 -0500] "GET /directory/keyword-keyw...%A0 HTTP/1.1" 404 986 "-" "Missigua Locator 1.9"

<added> Now banned it is generating a 403 request every 10-12 seconds. Slow learner. </added>


 4:12 pm on Feb 6, 2005 (gmt 0)

Got a bunch of (failed) track back spams from
User agent:
Missigua Locator 1.9

I'd disabled the script earlier this week, so it never went through. But this is the same IP as someone had noticed earlier in this thread.


 8:10 pm on Mar 6, 2005 (gmt 0)

Missigua Locator 1.9 fell into a bot trap today from 66.111.***.***.

Along with Ask :). Ask just never learns! I freed it a few days ago, and it's back into the trap.


 8:45 pm on Mar 6, 2005 (gmt 0)

Rather than expending a lot of effort at chasing these guys or trying to figure out what they want, I suggest:
  1. Block by user-agent using mod_access or mod_rewrite on Apache, or ISAPI Rewrite on IIS.
  2. Block by IP address or IP address range, using mod_access or mod_rewrite on Apache, or ISAPI Rewrite on IIS. If you administer your own server, ipchains is also an option, as is blocking at the firewall.
  3. Install Key_master's bad-bot script [webmasterworld.com] (Perl), or birdman's PHP version [webmasterworld.com] of it. These scripts trap malicious visitors based on robots.txt violations. Adding exclusions to avoid banning *any* major 'bot is a good idea, as moltar's experience indicates.
  4. Install xlcus' PHP script [webmasterworld.com] to ban runaway crawlers. This script detects excessively fast consecutive requests.
  5. Don't worry about *why* they try to harvest your site any more.
  6. If they don't learn after getting 403'ed, then redirect their requests to a forbidden file in a subdirectory which contains a zero-byte custom 403 Error page. This conserves bandwidth for the really egregious harvesters.
  7. Relax.
By combining the above methods, you can reduce the number of successful harvests to almost zero.
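
As a rough illustration of option 2 with mod_access, the sketch below blocks one address range and one single address from an .htaccess file. The addresses are placeholders from the 192.0.2.0/24 block reserved for documentation, not addresses reported in this thread; substitute whatever shows up in your own logs. It assumes Apache 1.3/2.0 style access control (newer 2.4 servers use Require directives instead).

```apache
# Allow everyone by default, then deny the listed sources.
Order Allow,Deny
Allow from all

# Placeholder range in CIDR notation (documentation-only addresses).
Deny from 192.0.2.0/24

# Placeholder single address.
Deny from 198.51.100.23
```

Keep ranges as narrow as the logs justify, since a wide block also locks out ordinary visitors who share the same provider.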



 12:30 am on Apr 1, 2005 (gmt 0)

i could handle a request every 12 seconds or so.

they were hitting me multiple times per second, tho! ;)

mod_dosevasive started giving them 403's right away, though, and they're now banned by UA...
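
For anyone who has not set up mod_dosevasive, a rough sketch of the kind of configuration that produces this behaviour follows. The directive names are the module's own, but the thresholds are illustrative guesses, not the settings used on the site above. The IfModule name assumes an Apache 1.3 build (the 2.0 build registers as mod_dosevasive20.c), and the block belongs in the main server configuration rather than in .htaccess.

```apache
<IfModule mod_dosevasive.c>
    # Size of the per-child hash table used to track clients.
    DOSHashTableSize   3097

    # More than 5 requests for the same page within 1 second trips the block.
    DOSPageCount       5
    DOSPageInterval    1

    # More than 50 requests for anything on the site within 1 second also trips it.
    DOSSiteCount       50
    DOSSiteInterval    1

    # Blocked clients receive 403s for this many seconds.
    DOSBlockingPeriod  60
</IfModule>
```

A crawler hitting several times per second would cross the page or site threshold almost at once, while one fetching a page every 12 seconds would not.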


 7:14 pm on Apr 25, 2005 (gmt 0)

Seeing this bot over the weekend: - - [24/Apr/2005:06:10:34 -0400] "GET / HTTP/1.1" 200 14187 "-" "Missigua Locator 1.9"

I am leaning toward this being some form of "for purchase" automated content scraper or harvesting program that leaves this bot description, because the request came from an apparently dynamic i.p. in this particular i.p. range. I have banned similar dynamic i.p. ranges from this provider in the past for scraping content. Does anybody know definitively what this bot is from?

[edited by: idoc at 7:44 pm (utc) on April 25, 2005]


 7:24 pm on Apr 25, 2005 (gmt 0)

Came from on April 24. Kept going for 8 hours, without being too aggressive. It displayed typical bot behavior, including trying to load
feed:http etc, triggering a 404.

I loaded the IP number as a website. Title is:
Top 10 Search engine placement _ Need Web Traffic?

But the only visible content is a green background and a simple form field/submit with this text above it:
To be removed, enter your email address below and click REMOVE

My guess would be - e-mail harvesters.


 11:45 am on May 5, 2005 (gmt 0)

im getting visits on which i banned through htaccess



 12:18 pm on May 5, 2005 (gmt 0)

That's a full featured IIS server, but no website I can see. Probably used for bots, rather than websites.


 10:45 am on May 24, 2005 (gmt 0)

Dammit, Missigua Locator 1.9 is back yet again!

OK, it's time to ban it. Problem is, I'm scared to death of screwing up my .htaccess file.

Here's what I have so far, just a little code to 301 redirect non-www to my full www URL:

= = =

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.mysite\.net [NC]
RewriteRule ^(.*)$ http://www.mysite.net/$1 [R=301,L]

= = =

First off, is the code above perfect in every way? I'm already using it.

NOW, I want to ban any and all requests from anything with 'missigua' in name of user-agent.

Can somebody write me the code to do this?

If so, PLEASE put it together with the code above so it's all in one piece. I don't know what comes first,
the 301 rewrite or the missigua ban, and I don't know how to stitch those together.

I'd like to make it case insensitive, and just use the one word missigua in case they change version
numbers etc.

Once that's in place, I will do my best to figure it all out, and ban other abusive bots on my own.

Any help much appreciated. The last time I tried to ban a bot I shut my whole site down!

Sticky me if that will help. Many thanks in advance - Larry

PS: Assuming this works, will missigua show up in my error_log files, like mistyped URLs etc do now?
I'd really like that. Sort of like hearing the bug-zapper go snap! -LH


 10:55 am on May 24, 2005 (gmt 0)

Yes, your 301 is fine.
I added a Missigua ban. The bot will be served a 403 and show up in your error log.

RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.mysite\.net [NC]
RewriteRule ^(.*)$ http://www.mysite.net/$1 [R=301,L]
RewriteCond %{HTTP_USER_AGENT} ^Missigua [NC]
RewriteRule .* - [F]
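
A possible variant of the rules above, offered as a sketch and not as a correction: the ban is placed first, so a banned bot asking for the non-www host gets the 403 straight away without being sent a 301 first, and the pattern is left unanchored so it matches "missigua" anywhere in the user-agent string, as requested. The host name mysite.net is the placeholder from the original post.

```apache
RewriteEngine On

# Ban first: any user-agent containing "missigua", in any letter case.
RewriteCond %{HTTP_USER_AGENT} missigua [NC]
RewriteRule .* - [F]

# Then send non-www requests to the www host with a permanent redirect.
RewriteCond %{HTTP_HOST} !^www\.mysite\.net [NC]
RewriteRule ^(.*)$ http://www.mysite.net/$1 [R=301,L]
```

Each RewriteCond applies only to the RewriteRule that immediately follows it, which is why the two blocks can sit in either order without interfering with each other.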


 11:19 am on May 24, 2005 (gmt 0)

THANK you Span, ever so much. (Fast reply too!)

I just sent it up and tested it. My site didn't vanish or anything awful.
Now I can't wait for Missigua monster to come back around.
If all goes well, my error_log file should be really big. Thanks again! - Larry


 11:35 am on May 24, 2005 (gmt 0)

since im not so familiar with .htaccess and 301's id like to ask if i could just copy span's code into my .htaccess to achieve banning of Missigua?


 11:46 am on May 24, 2005 (gmt 0)

Hi stef: I don't see why not. Be sure to replace mysite with the name of your site, and change
the .net to .com or whatever top-level-domain you use.

If you use it all, you will be redirecting NON-www urls to the www.yoursite type, which is considered
a good thing in general. - Larry


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved