I'm using this method to ban bad bots.
SetEnvIfNoCase User-Agent "^badbot1" bad_bot
SetEnvIfNoCase User-Agent "^badbot2" bad_bot
SetEnvIfNoCase User-Agent "^badbot3" bad_bot
SetEnvIfNoCase User-Agent "^badbot4" bad_bot
<Limit GET POST>
Order Allow,Deny
Allow from all
Deny from env=bad_bot
</Limit>
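For reference, as I understand it the same rules without the <Limit> wrapper would apply to every request method, not just GET and POST; something like:
SetEnvIfNoCase User-Agent "^badbot1" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot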
But I'm a bit confused as to how best to ban a bot that is using Microsoft URL Control to hammer a script on my site. The more I research, the more confused I get.
Can I use:
SetEnvIfNoCase User-Agent "^Microsoft" bad_bot
Will this be enough, or am I likely to ban all things with Microsoft in the name?
Thanks
MFC Foundation Class Library*
MFHttpScan
MSN Feed Manager
MSProxy/*
The MSN bots pretty much all start with msnbot. At least all the ones that are confirmed as being from MSN. I have seen the following two bots coming from MSFT IP Addresses however my contact at MSFT tells me they are not official:
lanshanbot/1.0*
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT; MS Search 4.0 Robot, crawler)
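If one wanted to block those two as well, the same SetEnvIf approach should catch them; a sketch (the second pattern matches on the "MS Search 4.0 Robot" fragment, since that UA doesn't start with anything distinctive):
SetEnvIfNoCase User-Agent "^lanshanbot" bad_bot
SetEnvIfNoCase User-Agent "MS\ Search\ 4\.0\ Robot" bad_bot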
I find remark statements just make things harder to read (my opinion) and don't generally use them.
The only references I was able to find were to four-digit numbers matched with an "ends with" pattern.
These numbers were referenced in 2002 and still seem to be effective.
However, in all fairness, I don't believe we are seeing the volume of Microsoft URL Control in the UAs that we once did.
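An "ends with" match in SetEnvIf is just a pattern anchored with $. A sketch using one of those four-digit numbers (the variable name is only illustrative):
SetEnvIf User-Agent 8862$ keep_out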
bouncybunny,
could you possibly provide the FULL UA which would include the numbers that accompany the words?
Don
I'm sorry, I have to be honest, I'm not a big techie and I find your post quite hard to understand.
But the full user agent for two of these visitors is:
Microsoft URL Control - 6.00.8862
Microsoft URL Control - 6.00.8169
There are a couple of others too, I think. To be honest, they are not grabbing huge sections of my site any more, just apparently unconnected pages, as well as some incorrect, seemingly randomly generated or incomplete URLs.
For example, they might grab something like http://www.webmasterworld.com/foru
Very odd.
Are you asking me to supply the full line from my logs where the Microsoft URL Control user agent is? I.E. IP number and so on?
I let all this garbage in because I need to see what it's up to for my project.
I'm not sure about this UA being on the decline. The last time I saw it was April 8, 2007. It's visited close to 400 times since I first saw it several years ago.
EDIT: No problem BB. ;)
SetEnvIfNoCase User-Agent "^Microsoft\ URL\ Control" bad_bot
Jim
The ending numbers were sufficient.
I have these old numbers denied:
SetEnvIf User-Agent 30630$ keep_out
SetEnvIf User-Agent 0425$ keep_out
SetEnvIf User-Agent 47$ keep_out
SetEnvIf User-Agent 48$ keep_out
SetEnvIf User-Agent 51$ keep_out
SetEnvIf User-Agent 53$ keep_out
SetEnvIf User-Agent 63$ keep_out
SetEnvIf User-Agent 8862$ keep_out
SetEnvIf User-Agent 8877$ keep_out
SetEnvIf User-Agent 39\)$ keep_out
SetEnvIf User-Agent 4319$ keep_out
Although I'm sure not all of them are related to URL Control.
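Note that these lines only set the variable; to actually refuse the requests, a matching Deny is needed, along the lines of the block in the first post:
Order Allow,Deny
Allow from all
Deny from env=keep_out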
Many thanks bouncybunny.
Looks like it's popular with scrapers and email harvesters.
I would only block the entire user agent "Microsoft URL Control" to avoid any accidental whacking of legit things from MS.
Many thanks Gary.
Perhaps the software only visits where access is allowed (after the initial visit)?
kind of like the libwww-perl thing.
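If you wanted to shut it out the same way, the libwww-perl UA starts with the library name, so something like this sketch should do it:
SetEnvIfNoCase User-Agent "^libwww-perl" bad_bot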
I would only block the entire user agent "Microsoft URL Control" to avoid any accidental whacking of legit things from MS.
Thanks. I've implemented jdMorgan's method:
SetEnvIfNoCase User-Agent "^Microsoft\ URL\ Control" bad_bot
and that seems to work a treat when I test it with Firefox and the User Agent Switcher plugin. It blocks "Microsoft URL Control", but lets through "Microsoft".
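You can run the same check from the command line with curl, if you have it (www.example.com stands in for the real site):
curl -s -o /dev/null -w "%{http_code}\n" -A "Microsoft URL Control - 6.00.8862" http://www.example.com/
A 403 confirms the block; a normal browser UA should still get a 200.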
Am I right in understanding that most of you agree this UA should be banned in most cases? It seems to be a bit of an unknown in some of the threads I've been reading.
As best I can tell, it seems to target files that are likely to have e-mail addresses in them.
In my case, visits from most IPs only requested a few pages here and there. But one IP repeatedly requested the same two pages dozens of times. These two pages are connected (i.e. page 1 and 2 of the same article), but do not contain any email addresses. They do, however, contain the names of many people, so perhaps this was interpreted as possible contact information?
Microsoft URL Control is the user-agent given to applications that use the MSInet API under Visual C++ (source page not recorded; the date was 1998).
Whether this is the same tool, someone else may be able to determine.
[microsoft.com...]
kind of like the libwww-perl thing

ActiveX control - see post #3063810 from msndude: