More on tide526.microsoft.com - Crawler, Spider, and User Agent ID forum at WebmasterWorld - WebmasterWorld

More on tide526.microsoft.com

helleborine

5:54 pm on May 14, 2007 (gmt 0)

10+ Year Member

Like many other webmasters, I found logs such as these recently:

tide526.microsoft.com - - [09/May/2007:10:52:15 -0500] "GET / HTTP/1.1" 200 1436 "http://search.live.com/result.aspx?q=tramadol&mrt=en-us&FORM=LVSP" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Win64; x64; SV1)"

The above was a Tramadol search term, but there are other trashy pharmaceuticals that look like the one above. Since these words do not appear on any website I own, I know this is an illegitimate log entry.

I have read some speculation elsewhere on this forum
[webmasterworld.com...]
[webmasterworld.com...]
that these log entries are part of Microsoft quality control.

First off, given the state of MS's Vista, it's hard to believe that MS has any quality control department anywhere.

Secondly, why would Microsoft want to associate itself with sleazy pharmaceutical or even pornographic keywords?

Thirdly, if they are going to associate themselves with such seemingly unethical activities, shouldn't MS issue a statement to the webmaster community? To force the words "quality control" into the URL would be a hint, but nooooo...

Why is the "GET" request a mere backslash?

There is something very wrong... am I crazy? It makes no sense.

keyplyr

7:28 am on May 15, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Why is the "GET" request a mere backslash?

That's requesting the default page (index.html, index.htm, index.php, etc.)

volatilegx

2:27 pm on May 15, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

This is just a pet peeve of mine, but that's not a backslash, it's a forward slash (or just a "slash").

botslist

2:55 pm on May 15, 2007 (gmt 0)

10+ Year Member

I'm not sure what you meant by "illegitimate log entry". The tramadol url looks like a referral, not a direct request to your server (which is the root document according to the log entry you posted).
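
To make the request/referrer distinction concrete, here is a minimal Python sketch that splits the posted log line into its fields. It assumes the server writes the common Apache "combined" format, which the line appears to follow; the regex and the `parse_line` helper are illustrative, not taken from any particular log analyzer.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return the named fields of one combined-format log line, or None."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

line = ('tide526.microsoft.com - - [09/May/2007:10:52:15 -0500] "GET / HTTP/1.1" 200 1436 '
        '"http://search.live.com/result.aspx?q=tramadol&mrt=en-us&FORM=LVSP" '
        '"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Win64; x64; SV1)"')

entry = parse_line(line)
print(entry["path"])     # what was actually requested from the server
print(entry["referer"])  # what the client claimed it came from
```

The request itself is only the root document; the search URL lives in the referrer field, which is whatever the client chose to send.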

So this would mean that someone did a query on live.com for tramadol, and while viewing the SERP for the query, the person decided to visit your site.

Which probably means that your site showed up in the SERP, or you are running an advert that was displayed to the visitor.

Another option: the visitor could be someone who already knew about your site, wanted some information on tramadol, and after getting the information, decided to pay you a visit while he or she was still viewing the SERP.

volatilegx

4:57 pm on May 15, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Another option: the visitor could be someone who already knew about your site, wanted some information on tramadol, and after getting the information, decided to pay you a visit while he or she was still viewing the SERP.

A type-in doesn't generate a HTTP_REFERER header.

Sounds to me like your site is showing up in the SERPs for tramadol. Maybe somebody posted a blog comment on your site about tramadol, or somebody linked to your site from a tramadol site, or somebody linked to your site using "tramadol" as link text... any number of possible explanations.

malachite

11:49 am on May 16, 2007 (gmt 0)

10+ Year Member

I've been getting these occasional weird referrals too. Always a one-word 'search' for a pill or pr0n and like the OP, these terms DO NOT appear on my site.

Trying to replicate the search produces zero clues.

Staffa

12:08 pm on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I just had a request on a new site of mine. The word searched for does not appear on my site, but it is rather famous as a production of a site owned by someone else, whose name is along the same lines as the name of my site. However, the content of the two sites is totally different.
Therefore it makes me wonder if MS was checking for scraping or dupe content.

jdMorgan

5:05 pm on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

> Therefore it makes me wonder if MS was checking for scraping or dupe content.

I think that may be the case. The point here is that we're seeing referrals from searches for terms that do not appear anywhere on the site -- Not in the page body, not in URLs, and -- in my case at least, not in any 3rd-party advertising (I've seen this on several of my informational/special-interest sites which have no advertising whatsoever).

They may in fact be checking dupe content or scraping that they have found on another site --not yours-- that has copied your content for use with (or to cloak) the search term... If I get the time, I'll have to go do some quoted-phrase searches to find out.

Jim

botslist

7:17 pm on May 16, 2007 (gmt 0)

10+ Year Member

A type-in doesn't generate a HTTP_REFERER header.

You are absolutely right. But I was thinking more about someone using the history / favourites / bookmark / cache feature of the browser. I checked and RFC 2616 says:

The Referer field MUST NOT be sent if the Request-URI was obtained from a source that does not have its own URI, such as input from the user keyboard.

Which sounds like it covers someone using the bookmark feature as well, unless the bookmark is published and has its own uri, or the browser is not compliant :)

botslist

7:36 pm on May 16, 2007 (gmt 0)

10+ Year Member

They may in fact be checking dupe content or scraping that they have found on another site --not yours-- that has copied your content for use with (or to cloak) the search term... If I get the time, I'll have to go do some quoted-phrase searches to find out.

Assuming this is the case, the log entry in question would indicate that the scraped content contains a link back to the site that was scraped. Don't know if a scraper can be that careless, but if I were scraping, I'd certainly make sure that the scraped content doesn't contain any link to the original source.

volatilegx

8:43 pm on May 16, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Note that the HTTP_REFERER quoted in the above log example leads to a 404. Spoof?

botslist

10:39 pm on May 16, 2007 (gmt 0)

10+ Year Member

Note that the HTTP_REFERER quoted in the above log example leads to a 404. Spoof?

The referral uri? Yup, could be spoofed. Although what JdM suggested is also possible, I'll lean more towards it being spoofed.

If it isn't spoofed, though, and if live.com hasn't changed its uri format since the log entry was recorded, then it would have to be something internal to live.com that isn't meant to be used from outside (which would mean jdM is very likely right).

incrediBILL

12:34 am on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Note that "result.aspx" is definitely bogus as referrals from live.com are "results.aspx"
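
A rough way to automate that check, sketched in Python with only the standard library. The expected path `/results.aspx` is taken from the observation above rather than from any Microsoft documentation, and `inspect_referrer` is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs

# Assumption from the post above: genuine live.com search referrals use /results.aspx
EXPECTED_PATH = "/results.aspx"

def inspect_referrer(referrer):
    """Split a search referrer into host, path, and query term; flag an unexpected path."""
    parts = urlsplit(referrer)
    query = parse_qs(parts.query)
    return {
        "host": parts.hostname,
        "path": parts.path,
        "term": query.get("q", [None])[0],
        "suspicious": parts.path.lower() != EXPECTED_PATH,
    }

info = inspect_referrer("http://search.live.com/result.aspx?q=tramadol&mrt=en-us&FORM=LVSP")
print(info)
```

For the referrer in the original log entry this flags the path as suspicious, since it is `/result.aspx` rather than `/results.aspx`.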

botslist

2:31 pm on May 17, 2007 (gmt 0)

10+ Year Member

Note that "result.aspx" is definitely bogus as referrals from live.com are "results.aspx"

Assuming that the referral is indeed spoofed, it makes me wonder what their objective might be. For example if the goal is to inflate stats for live.com, shouldn't they be sending a proper referral then?

What is the point of a bad guy sending an incorrectly formatted referral that arouses suspicion? And why would a bad guy want to inflate stats for live.com anyway?

This referral spam just doesn't make sense to me - unless there is a vulnerability in the wild that we don't know anything about and they are trying to exploit it.

botslist

2:50 pm on May 17, 2007 (gmt 0)

10+ Year Member

This referral spam just doesn't make sense to me - unless there is a vulnerability in the wild that we don't know anything about and they are trying to exploit it.

On second thought, since all the search keywords in the referrals are chosen to be inflammatory, this spam could be a form of attack on Microsoft, the goal being to degrade the **perceived** quality of the live.com search engine.

I.e. if enough webmasters keep seeing this garbage apparently coming from live.com, the attacker hopes they will conclude that Microsoft's search engine sucks or is badly broken.

incrediBILL

4:56 pm on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Maybe the goal is no more complicated than getting words like "tramadol" in your referral tracking to get people to click back to the SE and maybe on their ads while trying to figure out how you got there from LIVE in the first place, or to show up in blog referral lists for the same purpose.

Hard to say, but it's not showing up in my logs and I'm feeling left out.

helleborine

5:57 pm on May 17, 2007 (gmt 0)

10+ Year Member

If you google "tide526.microsoft.com" you'll see that the phenomenon is quite widespread, and is puzzling many a webmaster.

Drag_Racer

6:08 pm on May 17, 2007 (gmt 0)

10+ Year Member

maybe these come from MS employees working on a beta version checking serps on a busted algo they are trying?

Drag_Racer

6:26 pm on May 17, 2007 (gmt 0)

10+ Year Member

run a whois on that domain

I just got back some junk with all these domains filled with words Brett would shoot me for posting, but this is one I can post

MICROSOFT.COM.ZZZOMBIED.AND.HACKED.BY.WWW.WEB-HACK.COM

run a trace and got routed nowhere near Microsoft

tried the same with Yahoo.com and got similar junk

tried a few of my domains and everything is normal

what's up here? did crsnic.net get hacked? just weird stuff I have not seen before...

[edited by: Drag_Racer at 6:32 pm (utc) on May 17, 2007]

Drag_Racer

7:00 pm on May 17, 2007 (gmt 0)

10+ Year Member

I went through a bunch of the serps at Google looking at logs, and all have a UA string with Windows 2003, IE6 or 7, on a 64-bit machine

maybe a new virus has hit the 64bits...
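
For anyone who wants to check their own logs for the same pattern, here is a small Python sketch that pulls the relevant tokens out of a User-Agent string. It relies on "Windows NT 5.2" being the version that Windows Server 2003 reports (64-bit XP reports it too); `describe_agent` is a made-up helper name, and the string is only what the client claims, so it can be spoofed like the referrer.

```python
import re

def describe_agent(user_agent):
    """Pull the MSIE version, Windows NT version, and 64-bit tokens out of a UA string."""
    msie = re.search(r"MSIE (\d+\.\d+)", user_agent)
    winnt = re.search(r"Windows NT (\d+\.\d+)", user_agent)
    tokens = [t.strip() for t in user_agent.split(";")]
    return {
        "msie": msie.group(1) if msie else None,
        "windows_nt": winnt.group(1) if winnt else None,
        "is_64bit": "Win64" in tokens or "x64" in tokens,
    }

ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; Win64; x64; SV1)"
print(describe_agent(ua))
```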

botslist

8:17 pm on May 17, 2007 (gmt 0)

10+ Year Member

Maybe the goal is no more complicated than getting words like "tramadol" in your referral tracking to get people to click back to the SE and maybe on their ads while trying to figure out how you got there from LIVE in the first place, or to show up in blog referral lists for the same purpose.

You see, that's the part that I don't quite understand: the spoofer's uri is not in the referrer, and if you do click back to the SE through the referrer, what you get is a 404 - not a page with a link to the spoofer's site. Even if we publish the referral logs on our blogs, the spoofer still doesn't benefit because the referrer uri is broken.

The only beneficiary of this spam that I can think of is live.com (if the spam is an attempt to inflate stats) or someone who doesn't like live.com (if the spam is an attempt to degrade the perceived quality of live.com). Or maybe live.com really is broken and the search engine is misbehaving? Hard to tell.

jdMorgan

9:12 pm on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Or again, someone at microsoft looking at your site because a spam site targeting that keyword has scraped your content, and they want to see if you are involved in the spam "network."

The "Live.com usage inflation" doesn't fly either, because the very group it would be targeted at would discover it to be a fake, simply by following the referral link... and noticing that the time-on-site after the initial request is zero.

Jim

botslist

10:05 pm on May 17, 2007 (gmt 0)

10+ Year Member

Or again, someone at microsoft looking at your site because a spam site targeting that keyword has scraped your content, and they want to see if you are involved in the spam "network."

This does make sense. But it makes sense to me only if the scraped content contains a link back to the original source, which is hard to imagine unless the scraper is thoroughly clueless.

Besides, Microsoft must know that webmasters and log analyzers take referrals seriously, so sending a broken or internal referrer that webmasters can't click through is kind of worrying, especially given the keywords that are being used. If Microsoft really is doing this, it had better not come out or there might be a lawsuit or two :)

jdMorgan

10:19 pm on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

> This does make sense. But it makes sense to me only if the scraped content contains a link back to the
> original source, which is hard to imagine unless the scraper is thoroughly clueless.

The referrer is definitely a fake. But MS could easily find your (our) sites by simply copying a good long portion of the scraped content into their search box and slapping quotes around it. Get the URL, fake up a referrer, and grab your page -- or the automated equivalent.

I think I'll just block this whole subnet and see what happens. I've got all of their broken 'bots blocked now anyway, until they figure out how to do proper robots.txt prefix-matching... Hint "msnbot/" in robots.txt *does not* match "msnbot-media"... :)

Jim

incrediBILL

10:37 pm on May 17, 2007 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

what you get is a 404 - not a page with a link to the spoofer's site. Even if we publish the referral logs on our blogs, the spoofer still doesn't benefit because the referrer uri is broken.

Been there before.

I witnessed a huge influx like this and the domain names suddenly went live weeks later and that's when it was timed to be in the search results.

You have to stop thinking like a webmaster and think like a spammer.

botslist

11:00 pm on May 17, 2007 (gmt 0)

10+ Year Member

I witnessed a huge influx like this and the domain names suddenly went live weeks later and that's when it was timed to be in the search results.

I bet that when this happened, the referrer (1) was not broken and (2) did refer to a page which directly or indirectly linked to the spammer's yet-to-be-launched site. But in the case we are looking at right now, neither (1) nor (2) is true, so I still fail to see how the spammer hopes to gain from this particular spam.

You have to stop thinking like a webmaster and think like a spammer.

Ah, but that is exactly what we are doing in this forum right now :)

botslist

11:51 pm on May 17, 2007 (gmt 0)

10+ Year Member

maybe these come from MS employees working on a beta version checking serps on a busted algo they are trying?

This sounds quite plausible, so I decided to follow up on it. I found one entry in my bot database without referrer spam:

accept: */*
accept-language: en-us
connection: Keep-Alive
ua-cpu: x86
user-agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Tablet PC 1.7; .NET CLR 1.0.3705; .NET CLR 1.1.4322; .NET CLR 2.0.50727; InfoPath.2; MS-RTC LM 8)
via: 1.0 RED-PRXY-21
x-botcaps: 8
x-botserial: 07D70503101A0A232
x-icookies: yes
x-peer: 131.107.0.71
x-revdns: 131.107.0.71[tide501.microsoft.com]

If the headers are correct and I'm not misinterpreting the data, this looks like a normal browser visiting my site through the RED-PRXY-21 proxy, which somehow suggests that tide501.microsoft.com is or has been turned into a proxy.
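
As a rough illustration of how the proxy hop can be read out of headers like these, here is a minimal Python sketch. It assumes the headers are stored as plain "name: value" lines; the x-* fields are specific to my own bot database and are not standard HTTP headers, and the helper names are made up for the example.

```python
def parse_headers(raw):
    """Turn 'name: value' lines into a dict keyed by lower-cased header name."""
    headers = {}
    for line in raw.strip().splitlines():
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers

def proxy_hops(headers):
    """Return the hops listed in a Via header as (protocol, received-by) pairs."""
    hops = []
    for hop in headers.get("via", "").split(","):
        fields = hop.split()
        if len(fields) >= 2:
            hops.append((fields[0], fields[1]))
    return hops

raw = """
accept: */*
via: 1.0 RED-PRXY-21
x-peer: 131.107.0.71
x-revdns: 131.107.0.71[tide501.microsoft.com]
"""
headers = parse_headers(raw)
print(proxy_hops(headers))
```

A non-empty result only shows that some intermediary added a Via header; it says nothing about who operates that intermediary.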

The data is suspiciously similar to what I posted in my "Is this really Yahoo's slurp" thread, where someone appeared to be using a search engine host as a proxy.

A little research on Google suggests the problem may be related to Microsoft Security Bulletin MS04-039 (http://www.microsoft.com/technet/security/Bulletin/MS04-039.mspx). I still don't know for sure, though.

botslist

11:57 pm on May 17, 2007 (gmt 0)

10+ Year Member

run a whois on that domain
I just got back some junk with all these domains filled with words Brett would shoot me for posting, but this is one I can post
MICROSOFT.COM.ZZZOMBIED.AND.HACKED.BY.WWW.WEB-HACK.COM
run a trace and got routed nowhere near Microsoft
tried the same with Yahoo.com and got similar junk

Which domain did you run whois against?

helleborine

11:36 pm on May 18, 2007 (gmt 0)

10+ Year Member

The Security Bulletin says:

"Vulnerability in ISA Server 2000 and Proxy Server 2.0 Could Allow Internet Content Spoofing (888258)"

I read that "This vulnerability could enable an attacker to spoof trusted Internet content. Users could believe they are accessing trusted Internet content when in reality they are accessing malicious Internet content, for example a malicious Web site. However, an attacker would first have to persuade a user to visit the attacker's site to attempt to exploit this vulnerability."

These log entries do not appear to be the most efficient way to persuade people to visit a malicious website - only a few curious webmasters.

How might all this relate to the tide526.microsoft.com log entries?

botslist

4:33 pm on May 19, 2007 (gmt 0)

10+ Year Member

How might all this relate to the tide526.microsoft.com log entries?

I'll be the first to admit that this is a bit farfetched, but what if a large corporate network running one of the products listed in the bulletin is compromised? Users on the network will think they are visiting some site A when in fact they are being served spoofed made-for-ad content. If a user gets curious and clicks a link on the spoofed page, the spammer makes a buck or two and the OP gets a visitor along with a spoofed referral. (This assumes of course that live.com has something similar to adsense or ypn.)