Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Microsoft URL Control
A warning and a request for information

 3:45 am on Jan 2, 2002 (gmt 0)

I found the following two lines in my log files:

- - [01/Jan/2002:03:31:31 +0000] "GET /cgi-bin/formmail.pl?recipient=etest6969@yahoo.com,&subject=Hey&email=e7382john@yahoo.com&=http://mydomain.co.uk/cgi-bin/formmail.pl HTTP/1.1" 302 375 "-" "Microsoft URL Control - 6.00.8862"
- - [01/Jan/2002:03:31:46 +0000] "GET /cgi-bin/formmail.pl?recipient=etest6969@yahoo.com,&subject=Hey&email=e7382john@yahoo.com&=http://mydomain.co.uk/cgi-bin/formmail.pl HTTP/1.1" 200 661 "-" "Microsoft URL Control - 6.00.8862"

it's pretty obvious that someone is using Microsoft URL Control to automatically check websites for the existence of a basic form to email script which they can use, probably for spamming. this is a very good reason to block Microsoft URL Control.

for those who don't know, formmail.pl is a freely available Perl/CGI script for sending emails from forms on web pages. many people use this script as it is without renaming it, and they use it in the default location of the cgi-bin.

if you are using any kind of form to email script, do not name it something obvious like this. if you are using a freely available script, rename it.

ideally you should take other security measures as well. does anyone have any tips?

i'm too tired right now to look up the instructions for blocking Microsoft URL Control, but i believe they have been posted in this forum. if anyone can help out with advice etc, it would be much appreciated.



 3:50 am on Jan 2, 2002 (gmt 0)

More info: Did I get hacked? [webmasterworld.com]


 4:07 am on Jan 2, 2002 (gmt 0)

Stops em dead.

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*NEWT [OR]
RewriteCond %{HTTP_USER_AGENT} ^MSFrontPage [OR]
RewriteCond %{HTTP_USER_AGENT} ^[Ww]eb[Bb]andit [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla.*Indy [OR]
RewriteCond %{HTTP_USER_AGENT} ^Zeus.*Webster [OR]
RewriteCond %{HTTP_USER_AGENT} ^Microsoft.URL [OR]
RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^sitecheck.internetseer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^InternetSeer.com [OR]
RewriteCond %{HTTP_USER_AGENT} ^Ping [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteCond %{HTTP_USER_AGENT} ^ia_archiver [OR]
RewriteCond %{HTTP_USER_AGENT} ^DIIbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^psbot [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailCollector
RewriteRule ^.* - [F]
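
For anyone wondering what those conditions actually match, here is a rough Python sketch of the same test. It is only an illustration, not Apache itself: it assumes Python's `re` behaves like the server's regex engine for these simple patterns, and the function name `is_blocked` is made up for the example. Each pattern is anchored with `^`, so it only matches at the start of the user agent string, and the [OR] chain means one match is enough to get the 403 from [F].

```python
import re

# Patterns copied from the RewriteCond lines above. Matching is case-sensitive,
# as it is in mod_rewrite when no [NC] flag is given.
BLOCKED_PATTERNS = [
    r"^Mozilla.*NEWT",
    r"^MSFrontPage",
    r"^[Ww]eb[Bb]andit",
    r"^Mozilla.*Indy",
    r"^Zeus.*Webster",
    r"^Microsoft.URL",
    r"^Wget",
    r"^sitecheck.internetseer.com",
    r"^InternetSeer.com",
    r"^Ping",
    r"^Link",
    r"^ia_archiver",
    r"^DIIbot",
    r"^psbot",
    r"^EmailCollector",
]


def is_blocked(user_agent: str) -> bool:
    """Return True if any pattern matches, mirroring the [OR] chain."""
    return any(re.search(pattern, user_agent) for pattern in BLOCKED_PATTERNS)


if __name__ == "__main__":
    # The first user agent is the one from the log lines that started this thread.
    for ua in (
        "Microsoft URL Control - 6.00.8862",
        "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)",
    ):
        print(is_blocked(ua), ua)
```

Note that a short prefix such as `^Link` or `^Ping` catches every user agent that begins with those letters, so it is worth checking your own logs before copying the whole list.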


 4:12 am on Jan 2, 2002 (gmt 0)

Well obviously any script that allows a person to send an arbitrary message to an arbitrary address is a bad idea.


 4:27 am on Jan 2, 2002 (gmt 0)

There is *supposed to be* a line in the script that restricts users to only certain domains. Apparently it doesn't work without further measures being taken.


 4:58 am on Jan 2, 2002 (gmt 0)

According to my logs, someone was hunting for formmail.pl on one of my sites last week. Since I don't use it, they failed.


 11:44 am on Jan 2, 2002 (gmt 0)

thanks guys. very useful info there.

are there other ways to protect form to mail scripts?


 12:49 pm on Jan 2, 2002 (gmt 0)

Use "POST" to post the data (most script bots won't bother).

If you are using standard off-the-shelf scripts, then rename the core file. Do whatever you have to do to reference the new filename. Often changing .pl to .cgi or vice versa is enough. Just something so that a snoop bot like the above can't find it.


ms url control is just a generic name for a particular COM/OLE object.


 2:29 am on Mar 12, 2002 (gmt 0)

Just an update on this, as Brett said, change method to post, rename the file...But I just checked my logs and on half the sites I maintain, I'm getting:

POST /cgi-bin/formmail.pl
POST /cgi-bin/formmail.cgi
POST /cgi-bin/FormMail.pl
POST /cgi-bin/FormMail.cgi

From this ip address:

So it looks like the change to POST won't be enough, nor will a change to .cgi or .pl. These sites are returning 404 because I renamed the script to FM.cgi. Now I think even that is not really safe, so I'm changing it to skek97eis.cgi or something. If you rename it, try something really off the wall. Any other well-known packaged script should probably be treated the same way; 'send this page to a friend' scripts especially could become targets in the future.

[Forgot to add for toolman, UA is]
Mozilla/2.0 (compatible; MSIE 2.0; Windows 3.11; DigExt).

I have the host name and I ran a whois (Telecom Italia), but I don't know how to report abuse. Anybody?
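
If you want to check your own logs for the same kind of probing, a small script can pull out every request for the well-known FormMail names. This is only a sketch in Python: the function name `find_formmail_probes` and the two regular expressions are my own assumptions, and it assumes each log line contains the method and path separated by a space, as in the lines quoted in this thread.

```python
import re

# Method followed by the request target, e.g. 'POST /cgi-bin/FormMail.cgi'.
REQUEST_RE = re.compile(r"\b(GET|POST|HEAD) (\S+)")

# The four file names probed above, matched without regard to case.
PROBE_RE = re.compile(r"/formmail\.(?:pl|cgi)$", re.IGNORECASE)


def find_formmail_probes(log_lines):
    """Return (method, path) for every request that targets a FormMail name."""
    hits = []
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        method, target = match.groups()
        path = target.split("?", 1)[0]  # drop any query string
        if PROBE_RE.search(path):
            hits.append((method, path))
    return hits


if __name__ == "__main__":
    # Request lines as listed in the post above.
    sample = [
        "POST /cgi-bin/formmail.pl",
        "POST /cgi-bin/formmail.cgi",
        "POST /cgi-bin/FormMail.pl",
        "POST /cgi-bin/FormMail.cgi",
    ]
    for method, path in find_formmail_probes(sample):
        print(method, path)
```

Keeping the matching lines together also gives you something concrete to attach when you write to the abuse contact.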


 2:19 pm on Mar 12, 2002 (gmt 0)

Reporting abuse:



AFAIK, postmaster@somedomain has to exist.

Otherwise do a whois lookup, see who's in charge (the technical contact), and tell them.


 2:22 pm on Mar 12, 2002 (gmt 0)

btw: ia_archiver respects robots.txt.

I already posted this in another forum, but if you run Apache 1.3.x, the following URL might be the solution:


It's a DSO module for Apache that blocks robots that don't respect robots.txt.



 9:48 pm on Jun 18, 2002 (gmt 0)

Hey Toolman,

what does your rewrite stuff do exactly? Does it go in the .htaccess in public_html or every directory?

I'm looking to stop the Microsoft URL Control stuff. At the moment I'm blocking a proxy IP address which someone (or something) is using to launch these suspicious Microsoft URL Control requests. Obviously blocking the IP is not good because it's stopping genuine visitors too.


 12:08 pm on Jun 19, 2002 (gmt 0)

Microsoft URL Control is also used by the freeware program "SpecSite" (Chicago Test Systems) "[...] to verify that a web site's internal and external links and image URLs are well intact". I have used version 1.2.0 to test the external links on my site from time to time.

So for Microsoft URL Control it's a case of what Woz wrote about "URL Spider Pro":

"[...] it could be a friend or a foe, depending on who is using it and why." [webmasterworld.com...] (Please see msg#:4 for the complete remark)

Banning Microsoft URL Control altogether is "Again a case of Babies and Bathwater..."


 12:19 pm on Jun 19, 2002 (gmt 0)

But would I be right in saying that anyone who is using a DHCP proxy to do some link checking or whatever has to be suspect?

If the IP address was fixed and traceable to a bona fide company then ok.

I get suspicious of proxies.


 2:58 pm on Jun 19, 2002 (gmt 0)

>I get suspicious of proxies. <

Frank I would just keep a good eye on your logs and see what they do. If they are looking for formmail.pl as the initial example is in this thread, and you have it, then do something to stop them because they are looking to exploit it. If they are looking for something else they may also be looking for an in somehow rather than just link checking.

Keep the log relating to them in case you need to complain to their isp if they do try to hack or abuse your server.



 4:01 pm on Jun 19, 2002 (gmt 0)

Well it does look like either a homebrew search engine or someone trying to mirror my site. The logfile is full of hundreds of requests for pages but the jerky has the wrong paths in there:

- - [14/Jun/2002:22:32:35 +0100] GET /../../index.html HTTP/1.1 400 360 - Microsoft URL Control - 6.00.8169 0 www.mysite.co.uk
- - [14/Jun/2002:22:32:35 +0100] GET /../freestuff/freestuff.html HTTP/1.1 400 368 - Microsoft URL Control - 6.00.8169 0 www.mysite.co.uk

It's like he configured the wrong starting point. The ../../ are two levels out. But how do I tell the dude to sort it out? I have complained to NTL, and I have blocked the IP address (but this also blocks genuine visitors).

Before I blocked it the spider or whatever would spend about an hour 5 times a day trying to locate about 500 pages. Because he has the paths wrong I get thousands of 400 errors each day.

Similar thread here:


[edited by: Frank_Rizzo at 4:36 pm (utc) on June 19, 2002]


 4:13 pm on Jun 19, 2002 (gmt 0)

I think he may be trying to use your server as a proxy but doesn't know how, or simply hasn't managed it yet. The idea would be to get your server to request pages from other sites and pass them on to him, keeping his identity secret from those sites.

It's interesting that he is getting 400 360 and that your server is not delivering your default or index page in place of whatever he is requesting.

Someone with a clue will be along shortly hopefully :-)



 9:29 pm on Jun 19, 2002 (gmt 0)

More visits again today. Here are the latest ones:

- - [19/Jun/2002:22:15:44 +0100] "GET /xxxx_newsletters.html HTTP/1.1" 403 2422 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:15:56 +0100] "GET /sitemap.html HTTP/1.0" 403 2422 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:16:16 +0100] "GET /xxxx_newsletters.html HTTP/1.0" 403 2422 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:16:58 +0100] "GET /members/xxxx/xxxx/xxxx.html HTTP/1.1" 403 2422 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:17:07 +0100] "GET /sitemap.html HTTP/1.0" 403 2422 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:17:14 +0100] "GET /members/xxxx/xxxx/xxxx.html HTTP/1.1" 403 2422 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:17:49 +0100] "GET /members/xxxx/xxxx/xxxx.html HTTP/1.0" 403 2422 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:18:02 +0100] "GET /members/xxx/xxxx/xxx.html HTTP/1.1" 403 2434 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:18:14 +0100] "GET /samples/xxxx/xxxx/xxxx.html HTTP/1.1" 403 2434 "-" "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:19:22 +0100] "GET / HTTP/1.1" 403 2434 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" 0 www.mysite.co.uk "-"
- - [19/Jun/2002:22:19:26 +0100] "GET / HTTP/1.1" 403 2434 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)" 0 www.mysite.co.uk "-"

403 errors because I'm blocking the IP address. I guess I need to implement the rewrite stuff.

Come on guys. Can anyone shed a little more light on this? Is it some homebrew search engine? Is it someone checking for site plagiarism? Is it a rival competitor? Is it someone trying to crack the members area?


 12:17 am on Jun 20, 2002 (gmt 0)

The only thing I can say about MS URL Control is that I've had it visit about four times, from totally unrelated sources. Once it had the profile of a spam bot, twice looking for formmail.pl (several times over weeks), and twice acting like a normal web browser.

I don't use formmail.pl or CGI generally - is there something more 'useful' I could put in place as formmail.pl for these circumstances? (like an autoreporting script :) )


 4:29 pm on Jun 21, 2002 (gmt 0)

frank - block by user agent with mod rewrite. somewhere in webmasterworld is a mod rewrite script for blocking bad bots - try a search for mod_rewrite

kev - use formmail v1.9 or above. rename your formmail.pl script to something very obscure. make sure you use the referrers. set the script up to only send email to you, nobody else.
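
The "only send email to you" part is the one that matters most, so here is a rough idea of what that check looks like, sketched in Python rather than Perl. The address `webmaster@example.com` and the function name `recipients_allowed` are placeholders I made up; the point is only that the script compares the submitted recipient against a fixed list instead of trusting the form.

```python
# Hypothetical allowlist: replace with the one or two addresses that should
# ever receive mail from the form.
ALLOWED_RECIPIENTS = frozenset({"webmaster@example.com"})


def recipients_allowed(raw_recipient, allowed=ALLOWED_RECIPIENTS):
    """Accept the request only if every submitted address is on the allowlist."""
    # Line breaks in the field could be used to smuggle in extra mail headers.
    if "\r" in raw_recipient or "\n" in raw_recipient:
        return False
    # The field may carry a comma-separated list, as in the probe at the top
    # of this thread, so check each address on its own.
    addresses = [part.strip().lower() for part in raw_recipient.split(",")]
    addresses = [address for address in addresses if address]
    return bool(addresses) and all(address in allowed for address in addresses)


if __name__ == "__main__":
    print(recipients_allowed("webmaster@example.com"))
    print(recipients_allowed("etest6969@yahoo.com,"))
```

Simpler still is to hard-code the recipient in the script and ignore whatever the form sends.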


 4:45 pm on Jun 21, 2002 (gmt 0)

If you rename the formmail.pl, does anything else in supporting files need to be changed? I know our system uses it, but I am new to it - just learning.


 10:20 am on Jun 26, 2002 (gmt 0)

Hey guys!
I had enough of those sucking home-made spiders and decided to go to war on them some time ago.
What I do is write "deny from IP" to my .htaccess for the browsers listed in my %bad_browsers hash.
My pages use one file to build the bottom of the page, so no matter what you hit, you're going to get it.
It checks the IP. If it is known, the user is fine (a log is written for each visitor, so if the IP is there, say as a file like /home/blah_blan/IP, it is known). If no log is present, I check the browser ident against the bad browsers. I also do some stretched guessing against something like this as a browser: JDKLFDKJFH (deleting all no-good characters with $browser =~ s/\W//g; and checking the length of the ident and the presence of digits).

If the ident falls into the bad category, I write it to .htaccess as deny from IP (if the IP exists; yeah, I've seen visits with no IP???).
Once a day I re-create the original .htaccess by program. My .htaccess forwards errors (404s etc.) to the Perl file, and if there is something fishy about the browser ident of a visitor looking for any Micro$soft crap or formmail.pl or .cgi, I deny them too.

Works great and requires updating only one file, the one with the %bad_browsers hash, which all domains on my server read! Very easy to maintain, and if someone wants to post to my guestbook via a spider, they'd have to pass through my error checking, POST method verification and preview, which makes it virtually impossible to get through... they have to click on Submit twice and I wonder how they'll do it :-)))
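
The ident check described in this post could look something like the following. It is a guess at the idea written in Python, not the poster's Perl: the `KNOWN_BAD` tuple stands in for the %bad_browsers hash, the length threshold is an arbitrary assumption, and treating "too short" or "no digits" as suspicious is my reading of "check the length of the ident and the presence of digits".

```python
import re

# Stand-in for the %bad_browsers hash; the entries are examples from this thread.
KNOWN_BAD = ("Microsoft URL Control", "EmailCollector")

# Assumed threshold: idents shorter than this after cleanup look made up.
MIN_IDENT_LENGTH = 12


def looks_suspicious(user_agent):
    """Flag listed bad browsers, plus idents that are very short or have no digits."""
    lowered = user_agent.lower()
    if any(bad.lower() in lowered for bad in KNOWN_BAD):
        return True
    # Same cleanup as the Perl one-liner $browser =~ s/\W//g;
    ident = re.sub(r"\W", "", user_agent)
    too_short = len(ident) < MIN_IDENT_LENGTH
    no_digits = not any(ch.isdigit() for ch in ident)
    return too_short or no_digits


if __name__ == "__main__":
    for ua in ("JDKLFDKJFH", "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)"):
        print(looks_suspicious(ua), ua)
```

A rule this loose will also flag legitimate tools that send a short ident without a version number, so log what it catches for a while before wiring it to an automatic deny.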



 8:38 am on Oct 13, 2002 (gmt 0)

"RewriteCond %{HTTP_USER_AGENT} ^Wget [OR]
RewriteCond %{HTTP_USER_AGENT} ^Link [OR]
RewriteRule ^.* - [F]"

Why do you want to ban Links users anyway?

I know that Wget has options for recursive retrieval, but I'm using it for legitimate downloads. I would get 403'd too if I were to download a file off your server.


 1:46 pm on Dec 2, 2002 (gmt 0)

It may be benign...

Whenever this application [desktop.accuweather.com...] hits our server, it comes in as Microsoft URL Control. This is probably because the developer used the IE shell when making the app. However, I guess this could mean someone is trying to spam as well, but not necessarily.


Cybervox, Voice of the Future


 2:09 pm on Dec 2, 2002 (gmt 0)

Some time ago I thought about the fact that forms are rather insecure and it's only a matter of time until a spammer exploits them. There are other ways they can be abused as well... I never posted this anywhere because why give ideas...


 2:41 pm on Dec 2, 2002 (gmt 0)

Hi everybody!
Responding to the question if there are any other ways to protect e-mail scripts from being used by spiders etc.
I use different custom-made Perl scripts. The best way is to show a preview of the message to be sent and write a log of the preview. Write, say, the e-mail address of the recipient in the log and verify it when the form is submitted after the preview.
This way it's harder to spam from your server and it takes a really good programmer to overcome it. Most of the guys who are trying to use your mail server to send spam are AOL-like lamers.
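
Here is one way the preview-then-verify idea could be sketched, in Python rather than Perl. The class name `PreviewLog`, the in-memory dictionary standing in for the log file, and the random one-time token are all my assumptions, not the poster's code; the post only says that the recipient is logged at preview time and checked again on submit.

```python
import secrets


class PreviewLog:
    """Remember what was previewed so the final submit can be checked against it."""

    def __init__(self):
        # token -> (recipient, message); a real script would write this to a file.
        self._pending = {}

    def record_preview(self, recipient, message):
        """Log the previewed mail and return a one-time token for the submit form."""
        token = secrets.token_urlsafe(16)
        self._pending[token] = (recipient, message)
        return token

    def confirm(self, token, recipient, message):
        """Allow sending only if the token is known and nothing changed since preview."""
        expected = self._pending.pop(token, None)  # each token works once
        return expected == (recipient, message)


if __name__ == "__main__":
    log = PreviewLog()
    token = log.record_preview("webmaster@example.com", "Hello")
    print(log.confirm(token, "webmaster@example.com", "Hello"))
```

A bot that posts straight to the final step has no token from a logged preview, so its request fails the check without any mail being sent.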


 5:17 pm on Dec 6, 2002 (gmt 0)


You could also add a simple line or two like this:

# In the head of the script:
@referers = ("www.yoursite.com", "your.ip.address");

# In the body of the script:
sub check_url {
    # Stays 0 unless the referring URL matches one of @referers.
    local($check_referer) = 0;

    if ($ENV{'HTTP_REFERER'}) {
        foreach $referer (@referers) {
            if ($ENV{'HTTP_REFERER'} =~ m|https?://([^/]*)$referer|i) {
                $check_referer = 1;
                last;
            }
        }
    }
    else {
        # No referer header was sent at all: the request is let through.
        $check_referer = 1;
    }

    if ($check_referer != 1) { &error('bad_referer') }
}

# At the base of the script:
sub error {
    ($error, @error_fields) = @_;

    print "Content-type: text/html\n\n";

    if ($error eq "bad_referer") {
        print "<HTML>\n <HEAD>\n<TITLE>Access Denied</TITLE>\n </HEAD>\n";
        print "<BODY BGCOLOR=\"#FFFFFF\" TEXT=\"#000000\" LINK=\"#0000FF\" VLINK=\"#800080\" ALINK=\"#FF0000\">\n";
        print "Your Error Page Here\n";
        print "</BODY>\n</HTML>\n";
    }

    # Stop here so that no mail is sent after an error.
    exit;
}



 2:06 am on Dec 20, 2002 (gmt 0)

Regarding FormMail scripts: while changing default filenames can help, it is not the solution.

A thorough--although possibly too technical for the general reader--explanation of vulnerabilities in FormMail can be found at Anonymous Mail Forwarding Vulnerabilities in FormMail 1.9 [monkeys.com] [PDF].

As discussed in the aforementioned paper, such scripts are traditionally woefully insecure. Worthy of mention is Matt Wright's infamous version -- even the latest release, which many users aren't running anyway, contains insecure code. These problems inspired the NMS [nms-cgi.sourceforge.net] initiative, a group of proficient programmers who have created drop-in replacements for Wright's scripts. The result is a useful, secure, and well-supported suite of scripts. I highly suggest that webmasters use the NMS FormMail script rather than the others floating about the web. :)

(No, I'm not affiliated with either of the above URLs).


 12:45 am on Dec 23, 2002 (gmt 0)


A while back I was asked by the people who host my sites to change over from PERL scripts to PHP because apparently there are no security issues as with PERL. (Don't shoot me down in flames please. I am not a PERL or PHP expert).

I changed over and have had no problems. There are many PHP email scripts. The one I use can be found doing a search in Google on RomcoMail php. I found it easy to change over.



All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved