This looks like old fashioned referrer spam to me.
They are hitting my sites mostly with keywords that appear on the site.
(With the exception of the porn keywords that are bad enough that I won't list them here.)
Sending porn referrers to Web sites on a massive scale cannot seriously be considered a "quality check", unless we are talking about the low quality of the concept.
My thought is that Microsoft wants webmasters to click through on the referrer links which will boost live.com's traffic. Microsoft might be thinking, "if we send them keywords that they are interested in, maybe they will start using Live.com Search".
|My thought is that Microsoft wants webmasters to click through on the referrer links which will boost live.com's traffic. Microsoft might be thinking, "if we send them keywords that they are interested in, maybe they will start using Live.com Search". |
I may be cynical by nature, but the exact same theory has passed my mind too.
I'm struggling to find any other reason.
My logs show I'm still being hammered by these 'fake' searches, even though they've been eating 403s for almost a month.
The good news though is that despite banning the nuisance IP range, we've seen an upturn (up from 1% to 3% of total) in the number of genuine searches coming from Live Search and the real bot is happily spidering away and updating cached pages.
|I may be cynical by nature, but the exact same theory has passed my mind too. |
* Microsoft is sending keywords that match the subject matter of the Web sites
* They also insert some porn-related keywords
* This exactly matches referrer-spam behavior
* "MSNdude" ambiguously calls this a "quality check" and will not directly reply to concerns about this problem
* What kind of "quality check" would require sending of referrer strings to the webmasters' logs? Why does a webmaster need to see these referrers? Is MSN going to interview us about it when they are done?
Come on... there are a lot of experienced webmasters here. Can anyone think of any possible reasons why a "quality check" would require sending referrer strings that match the content of the site, or referrers that include porn words?
This is referrer spam.
|Come on... there are a lot of experienced webmasters here. Can anyone think of any possible reasons why a "quality check" would require sending referrer strings that match the content of the site, or referrers that include porn words? |
I can think of 2 non-legit reasons, and while they might both be a bit out there and tin foil hattish, I cannot think of any legit ones that would make more sense.
1) They are trying to boost their market shares by tricking millions of webmasters into checking their rankings on live.com
2) They are inflating the impressions in people's AdSense reports, lowering their eCPM, in anticipation of releasing their own contextual advertising program for publishers, since if people start making less on G they would be more likely to switch.
Any other theories?
But then why would they continue "hammering" sites with no ads at all? It makes no sense. ;)
|But then why would they continue "hammering" sites with no ads at all? It makes no sense. ;) |
Maybe because it would be too obvious if the only sites they hammered were ones with AdSense on them. :-)
Have 2 sites that run AdSense...no problem here...
|They are inflating the impressions of people AdSense reports, lowering their eCPM... |
For those who can't or won't filter:
A quick `grep -v 'search.live.com/results.aspx'` before processing the logs may help make the referrals part of the statistics look as it did before, and as it should: no sign of that live.com thingie any more, and hey, it finally dropped from 0.x% to nowhere to be seen. Looks good now.
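For anyone who would rather do this in a script than with grep, here is a minimal Python sketch of the same idea. It assumes the fake hits can be recognized by a referrer containing `search.live.com/results.aspx`, as in the log lines posted in this thread; the function name and the sample lines are made up for illustration.

```python
def drop_live_referrer_spam(lines, marker="search.live.com/results.aspx"):
    """Return only the log lines that do not contain the marker string.

    This mirrors `grep -v`: a plain substring test, no log parsing.
    The default marker is an assumption based on referrers seen in the logs.
    """
    return [line for line in lines if marker not in line]


# Two made-up log lines: one ordinary msnbot fetch, one fake-referrer hit.
sample = [
    'host1 - - [22/Nov/2007:10:00:00 +0000] "GET /a.htm HTTP/1.0" 200 100 '
    '"-" "msnbot/1.0"',
    'host2 - - [22/Nov/2007:10:00:05 +0000] "GET /a.htm HTTP/1.0" 403 213 '
    '"http://search.live.com/results.aspx?q=special&FORM=LIVSOP" "Mozilla/4.0"',
]

kept = drop_live_referrer_spam(sample)
```

Note that this, like the grep, also throws away genuine Live Search referrals. If the `FORM=LIVSOP` code really is unique to the fake hits, passing `marker="FORM=LIVSOP"` should keep the real ones, but that is an assumption you would want to check against your own logs.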
Am I negatively biased from history? Well, probably. But this is too much, M$, way too much.
And while reading the other recent thread "MSN Live Needs Serious Work" ... it just made me check the logs to see that this MSN fake-referral-log-spam-bot is still active, as of today ...
Still active for me too. Seriously, this is very annoying. I cannot fathom why MSN would need that kind of test when other engines never did anything of the sort. Referrer spam is not the best way to get into the good graces of the webmaster community. For a day or two, it could be acceptable, but for months? Come on.
At least the latest lot of fake referrals from Microsoft did not include porn/spam words completely irrelevant to my site. But I remain annoyed with this spamming of our weblogs. Is it really necessary? What on earth is it achieving?
"What is it achieving?"
It pumps up their traffic numbers when Webmasters click through on their referrers to MSN Live.com.
I've already contacted a couple of media organizations about it. Hopefully the story will be investigated and publicized.
It's corrupting my stats.
Maybe there should be a "ban LVSP" movement. Someone should write a WordPress plugin to block LVSP referrers...
I see the same thing here on our sites, though it's not an issue for us. The interesting part is that every page that was visited by this spider all of a sudden starts to rank for its intended LONG Tail. So maybe it is a QC check. Now if we could get the users to use it, this would be Fantastico.
As far as blocking this spider, not taking the Bandwidth into consideration, wouldn’t the entry still appear in the log file, but just with different status code returned? So would this be kind of useless as far as not having the entries in the log file?
|The interesting part is that every page that was visited by this spider all of a sudden starts to rank for its intended LONG Tail. So maybe it is a QC check. |
I highly doubt it. If that were the case, why would MS send a referrer?
Sending a fake referrer = referrer spam.
|wouldn’t the entry still appear in the log file, but just with different status code returned? |
If you block with .htaccess rules, yes, you are right, the logs get polluted with that referrer spam, regardless of whether it comes with a 200 or a 403 status code.
A better medicine would be grounding their IP addresses in the firewall, so the web server wouldn't see them at all.
That, however, prevents you from seeing what is going on.
So, I let my webservers send them a 403, watch their tail through the logs, and later cut them out with a grep -v before processing the statistics.
This thread isn't dead yet ...
we(not me) trick - they trick, more: we see they do, do you think they do?
... and they are still active.
At least the IP address of the referrer-spam-bot now resolves to a valid "...search.live.com" name instead of the former invalid ".....phx.gbl". Did not check if they just changed IP addresses or finally fixed their DNS records.
livebot-65-55-210-70.search.live.com - - [22/Nov/2007:xx:xx:17 +0000] "GET /special/page.htm HTTP/1.0" 200 9820 "-" "msnbot/1.0 (+http://search.msn.com/msnbot.htm)" 188.8.131.52
livebot-65-55-165-98.search.live.com - - [22/Nov/2007:xx:xx:49 +0000] "GET /special/page.htm HTTP/1.0" 403 213 "http://search.live.com/results.aspx?q=special&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)" 184.108.40.206
Apparently, they don't get the message ...
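To tell the two kinds of hits above apart automatically, something like the following Python sketch might do. It assumes an Apache combined-style log line, that the fake hits carry `FORM=LIVSOP` in the referrer, and that the real crawler's user agent starts with `msnbot`. The regex and the `classify` function are my own illustration, not anything Microsoft documents.

```python
import re

# Combined-log-style pattern; anything after the user-agent field is ignored.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def classify(line):
    """Label a log line as 'fake-referrer', 'msnbot', 'other' or 'unparsed'."""
    m = LOG_RE.match(line)
    if m is None:
        return "unparsed"
    if "FORM=LIVSOP" in m.group("referrer"):  # assumed marker of the fake hits
        return "fake-referrer"
    if m.group("agent").startswith("msnbot"):  # assumed real crawler UA prefix
        return "msnbot"
    return "other"


real_bot = (
    'livebot-65-55-210-70.search.live.com - - [22/Nov/2007:xx:xx:17 +0000] '
    '"GET /special/page.htm HTTP/1.0" 200 9820 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"'
)
fake_hit = (
    'livebot-65-55-165-98.search.live.com - - [22/Nov/2007:xx:xx:49 +0000] '
    '"GET /special/page.htm HTTP/1.0" 403 213 '
    '"http://search.live.com/results.aspx?q=special&mrt=en-us&FORM=LIVSOP" '
    '"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"'
)

labels = [classify(real_bot), classify(fake_hit)]
```

Counting the labels per day would give a rough picture of how much of the "Live" traffic is the real spider and how much is log spam, without having to ban anything first.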
Well, it's been three months since msndude issued his "If you ban us, we might ban you!" ultimatum. If you have banned the "quality check" visits, have you noticed any adverse effects?
I ask only out of curiosity, as threatening to ban a peeved-off webmaster is no threat at all... Never forget! They need us more than we need them! :)
|... have you noticed any adverse effects? |
Hard to tell, as I got very little traffic from MSN/search.live before, and when looking at the raw logs now, I still see some real users occasionally coming from MSN, after all.
So it doesn't look like a hard ban. Another piece of FUD from MSN?
Since the logs are screwed up by their ongoing* log spam and are therefore worthless, I can't tell for sure if it is the same amount of traffic as before, or more, or less.
*ongoing: yes, this infamous operation is still ongoing. Not high volume, but annoying.
I unbanned the bad bot yesterday, to see if they are still doing the AdSense thing. They've been hitting the site and getting the 403's all along, and I've still been getting phantom Mediapartners bot visits, but I can't line the two up while they're banned. Will check later to see if they are still correlating.
|One of the biggest challenges with relevancy is how to distinguish legitimate information from various forms of search spam. This is one area that we've made especially good progress in over the last 8 months through a suite of tools that helps us detect, evaluate and manage spam. One of these tools is an extension to MSNBot, giving us an additional way to detect cloaking. |
rustybrick, I don't know about you, but I am not satisfied at all with what they wrote. There was some misinformation in there, as well as some answers that just don't fit. For one, they claimed that they did in fact follow robots.txt, by using a cached version of what they had. This cannot be true: if they had, they never would have downloaded the AdSense scripts in the first place, since those are all blocked by robots.txt. There are other points as well that are lacking.
You know what.... At least they are honest about what they are doing...
Google and Yahoo don't say a word about this...
Supposedly, search engines still won't spider the page with these spiders if their bot is blocked. But who knows...
Microsoft is still learning and Webmasters are much more advanced these days compared to the older days when Google was developing technology like this...
|I don't know about you, but I am not satisfied at all with what they wrote. |
They had to make a public response so they are trying to save face -- I'm not buying it either.
* Why send referrers? The fact that they use an easily identifiable referrer (LIVSOP) means that it should be very easy to cloak against.
* Does anyone seriously spend a lot of effort cloaking to MSN? MSN/Live often sends little more traffic than Ask.com.
* Why did it take so long for Microsoft to come up with an explanation to the problem? And why doesn't MSNdude respond here?
It seems more likely to me that it was at least partially designed as a referrer spam campaign.
A client hosting a pharmacy website with us just reported this issue: 900! "searches" this month for the keyword "drugs", similar to this one:
Http Code: 200 Date: Dec 19 05:39:12 Http Version: HTTP/1.0 Size in Bytes: 11209
Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
Considering the large amount of traffic on these secondary pages, a redirect script was added to send all external visitors (that are not referred from a local page) to the main page.
Also the customer complained about huge amounts of traffic being eaten up by bots. :¦
[edited by: engine at 12:03 pm (utc) on Dec. 19, 2007]
[edit reason] See TOS [webmasterworld.com] [/edit]
|As if MSRBOT, Microsoft Data Access, and Microsoft URL Control weren't rude enough...|
Two of my sites are just now getting hammered by the so-called MSLIVSOP bot log-spamming with fake, site-specific keyword referers of the form previously reported --
-- all hailing from MS corporate hosts/IPs --
-- all supposedly running --
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)
-- and some, but not all, hitting js in addition to html.
I decided the quickest solution was to allow only msnbot and/or msnbot-media from .search.live.com. Live human visitors don't show as coming from that host anyway.
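Expressed as code, the rule above might look roughly like this Python sketch. It assumes you already have the visitor's reverse-DNS hostname and user-agent string, and that the real crawlers identify themselves with a user agent beginning `msnbot/` or `msnbot-media/`. The function name `should_block` is hypothetical.

```python
def should_block(hostname, user_agent):
    """Return True for hits from *.search.live.com that are not msnbot.

    Visitors from any other host are never blocked by this rule.
    The user-agent prefixes are assumptions about how the crawlers identify.
    """
    from_live = hostname.lower().endswith(".search.live.com")
    is_msnbot = user_agent.lower().startswith(("msnbot/", "msnbot-media/"))
    return from_live and not is_msnbot


fake = should_block(
    "livebot-65-55-165-98.search.live.com",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)",
)
real = should_block(
    "livebot-65-55-210-70.search.live.com",
    "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
)
```

Keep in mind a user agent is trivial to fake, so this only works as long as the "quality check" hits keep announcing themselves as MSIE.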
And I thought I had enough problems with occasional MS employee abusers via tideXX.microsoft.com. Alleged MS corporate abuse using MSLIVSOP is significantly more resource-wasting.
[edited by: Pfui at 12:24 am (utc) on Jan. 14, 2008]
The other day I noticed a pattern:
msnbot grabs robots.txt
msnbot grabs a page
Referral comes in the form of LVSP.
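If you want to check your own logs for that sequence, here is a rough Python sketch. It assumes "LVSP" means the `FORM=LIVSOP` referrer posted earlier in the thread, and that you have already reduced each log entry to a (user agent, path, referrer) tuple in time order. The function name and the sample events are made up.

```python
def pages_followed_by_fake_referral(events, marker="LIVSOP"):
    """List pages that msnbot fetched and that later got a marker referrer hit.

    `events` is an iterable of (user_agent, path, referrer) in time order.
    """
    crawled = set()   # paths msnbot has fetched so far
    flagged = []      # paths later hit with the marker referrer, in order seen
    for agent, path, referrer in events:
        if agent.startswith("msnbot"):
            crawled.add(path)
        elif marker in referrer and path in crawled and path not in flagged:
            flagged.append(path)
    return flagged


events = [
    ("msnbot/1.0", "/robots.txt", "-"),
    ("msnbot/1.0", "/page.htm", "-"),
    ("Mozilla/4.0", "/page.htm",
     "http://search.live.com/results.aspx?q=page&FORM=LIVSOP"),
    ("Mozilla/4.0", "/other.htm",
     "http://search.live.com/results.aspx?q=other&FORM=LIVSOP"),
]

flagged = pages_followed_by_fake_referral(events)
```

A page that shows up here was spidered first and "searched for" afterwards, which is the pattern described above. A fake hit on a page msnbot never fetched in the same log (like `/other.htm` in the sample) is not flagged.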
For a company their size, their approach to search engine technology is a joke. Their support is a series of "canned" responses. Their index is full of spam. Thankfully all their problems are completely obvious to end users.
If Live weren't the default on many browsers, they'd have close to zero share of search.
|If Live weren't the default on many browsers, they'd have close to zero share of search. |
They already are pretty close to zero... maybe 1% of traffic on my sites. There are usually just a couple dozen more hits from Live & MSN combined than from Ask.com.
I like this solution: