tide526.microsoft.com - - [09/May/2007:10:52:15 -0500] "GET / HTTP/1.1" 200 1436 "http://search.live.com/result.aspx?q=tramadol&mrt=en-us&FORM=LVSP" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Win64; x64; SV1)"
The search term in the entry above was tramadol, but I see similar entries for other trashy pharmaceuticals. Since these words do not appear on any website I own, I know this is an illegitimate log entry.
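For anyone who wants to find entries like this in bulk, one option is to parse each access-log line and pull the search term out of the referrer. This is only a minimal sketch, assuming the Apache combined log format used in the entry above. The function name `referrer_query` and the choice of `q` as the search parameter are illustrative assumptions taken from that one entry, not anything live.com documents.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Apache "combined" format:
# host ident user [time] "request" status bytes "referrer" "user-agent"
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def referrer_query(line, param="q"):
    """Return (host, referrer, search term), or None if the line does not parse.

    The search term is None when the referrer has no `param` in its query string.
    """
    m = LOG_RE.match(line)
    if m is None:
        return None
    referrer = m.group("referrer")
    terms = parse_qs(urlsplit(referrer).query).get(param, [])
    return m.group("host"), referrer, (terms[0] if terms else None)

# The log entry quoted in this thread.
line = (
    'tide526.microsoft.com - - [09/May/2007:10:52:15 -0500] "GET / HTTP/1.1" 200 1436 '
    '"http://search.live.com/result.aspx?q=tramadol&mrt=en-us&FORM=LVSP" '
    '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Win64; x64; SV1)"'
)
print(referrer_query(line))
```

The extracted term can then be compared against your own page text to flag referrals for words that appear nowhere on the site.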
I have read some speculation elsewhere on this forum
[webmasterworld.com...]
[webmasterworld.com...]
that these log entries are part of Microsoft quality control.
First off, given the state of MS's Vista, it's hard to believe that MS has any quality control department anywhere.
Secondly, why would Microsoft want to associate itself with sleazy pharmaceutical or even pornographic keywords?
Thirdly, if they are going to associate themselves with such seemingly unethical activities, shouldn't MS issue a statement to the webmaster community? To force the words "quality control" into the URL would be a hint, but nooooo...
Why is the "GET" request a mere slash?
There is something very wrong... am I crazy? It makes no sense.
So this would mean that someone did a query on live.com for tramadol, and while viewing the SERP for the query, the person decided to visit your site.
Which probably means that your site showed up in the SERP, or you are running an advert that was displayed to the visitor.
Another option: the visitor could be someone who already knew about your site, wanted some information on tramadol, and after getting the information, decided to pay you a visit while he or she was still viewing the SERP.
A type-in doesn't generate a HTTP_REFERER header.
Sounds to me like your site is showing up in the SERPs for tramadol. Maybe somebody posted a blog comment on your site about tramadol, or somebody linked to your site from a tramadol site, or somebody linked to your site using "tramadol" as link text... any number of possible explanations.
I think that may be the case. The point here is that we're seeing referrals from searches for terms that do not appear anywhere on the site: not in the page body, not in URLs, and, in my case at least, not in any 3rd-party advertising (I've seen this on several of my informational/special-interest sites, which have no advertising whatsoever).
They may in fact be checking dupe content or scraping that they have found on another site --not yours-- that has copied your content for use with (or to cloak) the search term... If I get the time, I'll have to go do some quoted-phrase searches to find out.
Jim
A type-in doesn't generate a HTTP_REFERER header.
You are absolutely right. But I was thinking more about someone using the history / favourites / bookmark / cache feature of the browser. I checked, and RFC 2616 says:
The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.
Which sounds like it covers someone using the bookmark feature as well, unless the bookmark is published and has its own uri, or the browser is not compliant :)
They may in fact be checking dupe content or scraping that they have found on another site --not yours-- that has copied your content for use with (or to cloak) the search term... If I get the time, I'll have to go do some quoted-phrase searches to find out.
Assuming this is the case, the log entry in question would indicate that the scraped content contains a link back to the site that was scraped. I don't know if a scraper can be that careless, but if I were scraping, I'd certainly make sure that the scraped content didn't contain any link to the original source.
Note that the HTTP_REFERER quoted in the above log example leads to a 404. Spoof?
The referral uri? Yup, could be spoofed. Although what JdM suggested is also possible, I'll lean more towards it being spoofed.
If it isn't spoofed, though, and if live.com hasn't changed its uri format since the log entry was recorded, then it would have to be something internal to live.com that isn't meant to be used from outside (which would mean jdM is very likely right).
Note that "result.aspx" is definitely bogus, as referrals from live.com use "results.aspx".
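That check is easy to automate. Here is a minimal sketch that takes the observation above at face value (genuine live.com referrals use `results.aspx`). The host name `search.live.com` comes from the log entry in this thread; the function name and the second test URL are made up for illustration, and the real referral format may well have changed since.

```python
from urllib.parse import urlsplit

# Assumptions taken from this thread, not from any live.com documentation:
EXPECTED_HOST = "search.live.com"
EXPECTED_PATH = "/results.aspx"

def looks_like_fake_live_referrer(referrer):
    """True if the referrer claims to be a live.com search but uses an unexpected path."""
    parts = urlsplit(referrer)
    if (parts.hostname or "").lower() != EXPECTED_HOST:
        return False  # not a live.com search referrer at all
    return parts.path.lower() != EXPECTED_PATH

# Referrer from the log entry quoted at the top of the thread.
print(looks_like_fake_live_referrer(
    "http://search.live.com/result.aspx?q=tramadol&mrt=en-us&FORM=LVSP"))
# Hypothetical referrer in the "results.aspx" form.
print(looks_like_fake_live_referrer(
    "http://search.live.com/results.aspx?q=widgets"))
```

The first call flags the entry from the log and the second does not, which is enough to filter these hits out of referral stats without blocking real search visitors.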
Assuming that the referral is indeed spoofed, it makes me wonder what their objective might be. For example if the goal is to inflate stats for live.com, shouldn't they be sending a proper referral then?
What is the point of a bad guy sending an incorrectly formatted referral that arouses suspicion? And why would a bad guy want to inflate stats for live.com anyway?
This referral spam just doesn't make sense to me - unless there is a vulnerability in the wild that we don't know anything about and they are trying to exploit it.
On second thought, since all the search keywords in the referrals are chosen to be inflammatory, this spam could be a form of attack on Microsoft, the goal being to degrade the **perceived** quality of the live.com search engine.
I.e. if enough webmasters keep seeing this garbage apparently coming from live.com, the attacker hopes they will conclude that Microsoft's search engine sucks or is badly broken.
Hard to say, but it's not showing up in my logs and I'm feeling left out.
I just got back some junk with all these domains filled with words Brett would shoot me for posting, but this is one I can post:
MICROSOFT.COM.ZZZOMBIED.AND.HACKED.BY.WWW.WEB-HACK.COM
run a trace and got routed nowhere near Microsoft
tried the same with Yahoo.com and got similar junk
tried a few of my domains and everything is normal
whats up here? did crsnic.net get hacked? just weird stuff I have not seen before...
[edited by: Drag_Racer at 6:32 pm (utc) on May 17, 2007]
Maybe the goal is no more complicated than getting words like "tramadol" in your referral tracking to get people to click back to the SE and maybe on their ads while trying to figure out how you got there from LIVE in the first place, or to show up in blog referral lists for the same purpose.
You see, that's the part that I don't quite understand: the spoofer's uri is not in the referrer, and if you do click back to the SE through the referrer, what you get is a 404 - not a page with a link to the spoofer's site. Even if we publish the referral logs on our blogs, the spoofer still doesn't benefit because the referrer uri is broken.
The only beneficiary of this spam that I can think of is live.com (if the spam is an attempt to inflate stats) or someone who doesn't like live.com (if the spam is an attempt to degrade the perceived quality of live.com). Or maybe live.com really is broken and the search engine is misbehaving? Hard to tell.
The "Live.com usage inflation" theory doesn't fly either, because the very group it would be targeted at would discover it to be a fake, simply by following the referral link... and noticing that the time-on-site after the initial request is zero.
Jim
Or again, someone at microsoft looking at your site because a spam site targeting that keyword has scraped your content, and they want to see if you are involved in the spam "network."
This does make sense. But it makes sense to me only if the scraped content contains a link back to the original source, which is hard to imagine unless the scraper is thoroughly clueless.
Besides, Microsoft must know that webmasters and log analyzers take referrals seriously, so sending a broken or internal referrer that webmasters can't click through is kind of worrying, especially given the keywords that are being used. If Microsoft really is doing this, it had better not come out or there might be a lawsuit or two :)
The referrer is definitely a fake. But MS could easily find your (our) sites by simply copying a good long portion of the scraped content into their search box and slapping quotes around it. Get the URL, fake up a referrer, and grab your page -- or the automated equivalent.
I think I'll just block this whole subnet and see what happens. I've got all of their broken 'bots blocked now anyway, until they figure out how to do proper robots.txt prefix-matching... Hint "msnbot/" in robots.txt *does not* match "msnbot-media"... :)
Jim
what you get is a 404 - not a page with a link to the spoofer's site. Even if we publish the referral logs on our blogs, the spoofer still doesn't benefit because the referrer uri is broken.
Been there before.
I witnessed a huge influx like this and the domain names suddenly went live weeks later and that's when it was timed to be in the search results.
You have to stop thinking like a webmaster and think like a spammer.
I witnessed a huge influx like this and the domain names suddenly went live weeks later and that's when it was timed to be in the search results.
I bet that when this happened, the referrer (1) was not broken and (2) did refer to a page which directly or indirectly linked to the spammer's yet-to-be-launched site. But in the case we are looking at right now, neither (1) nor (2) is true, so I still fail to see how the spammer hopes to gain from this particular spam.
You have to stop thinking like a webmaster and think like a spammer.
maybe these come from MS employees working on a beta version checking serps on a busted algo they are trying?
This sounds quite plausible, so I decided to follow up on it. I found one entry in my bot database without referrer spam:
accept: */*
accept-language: en-us
connection: Keep-Alive
ua-cpu: x86
user-agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Tablet PC 1.7; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; MS-RTC LM 8)
via: 1.0 RED-PRXY-21
x-botcaps: 8
x-botserial: 07D70503101A0A232
x-icookies: yes
x-peer: 131.107.0.71
x-revdns: 131.107.0.71[tide501.microsoft.com]
The data is suspiciously similar to what I posted in my "Is this really Yahoo's slurp" thread, where someone appeared to be using a search engine host as a proxy.
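A reverse-DNS name on its own only tells you what the PTR record for that address says, so one extra check is to resolve the name forward again and see whether it comes back to the same IP. This is a rough sketch of that forward-confirmed lookup, not a definitive test. The function name and the injectable `reverse`/`forward` parameters are my own illustration (they default to the standard `socket` calls), and it only handles IPv4 addresses as `gethostbyname_ex` returns them.

```python
import socket

def forward_confirmed(ip, domain_suffix,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """True if `ip` reverse-resolves to a host under `domain_suffix`
    and that host name resolves forward to a list containing `ip`."""
    try:
        name = reverse(ip)[0]          # (hostname, aliases, addresses)
    except OSError:                    # covers socket.herror / socket.gaierror
        return False
    name = name.rstrip(".").lower()
    suffix = domain_suffix.lower()
    # Require an exact match or a dot boundary, so that
    # "microsoft.com.example.net" does not pass as "microsoft.com".
    if name != suffix and not name.endswith("." + suffix):
        return False
    try:
        addresses = forward(name)[2]   # (hostname, aliases, addresses)
    except OSError:
        return False
    return ip in addresses

# Example call (performs live DNS lookups, so the result depends on current DNS):
# forward_confirmed("131.107.0.71", "microsoft.com")
```

Even a passing check only says the address belongs to that domain's network. It says nothing about whether the request was made by a crawler, an employee, or someone relaying through a proxy there.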
A little research on Google suggests the problem may be related to Microsoft Security Bulletin MS04-039 (http://www.microsoft.com/technet/security/Bulletin/MS04-039.mspx). I still don't know for sure, though.
run a whois on that domain
I just got back some junk with all these domains filled with words Brett would shoot me for posting, but this is one I can post:
MICROSOFT.COM.ZZZOMBIED.AND.HACKED.BY.WWW.WEB-HACK.COM
run a trace and got routed nowhere near Microsoft
tried the same with Yahoo.com and got similar junk
Which domain did you run whois against?
"Vulnerability in ISA Server 2000 and Proxy Server 2.0 Could Allow Internet Content Spoofing (888258)"
I read that "This vulnerability could enable an attacker to spoof trusted Internet content. Users could believe they are accessing trusted Internet content when in reality they are accessing malicious Internet content, for example a malicious Web site. However, an attacker would first have to persuade a user to visit the attacker’s site to attempt to exploit this vulnerability."
These log entries do not appear to be an efficient way to persuade people to visit a malicious website; they would reach only a few curious webmasters.
How might all this relate to the tide526.microsoft.com log entries?
I'll be the first to admit that this is a bit farfetched, but what if a large corporate network running one of the products listed in the bulletin is compromised? Users on the network will think they are visiting some site A when in fact they are being served spoofed made-for-ad content. If a user gets curious and clicks a link on the spoofed page, the spammer makes a buck or two and the OP gets a visitor along with a spoofed referral. (This assumes of course that live.com has something similar to adsense or ypn.)