I've seen a few of these, with searches for random keywords (one was 'sex' bizarrely) that could not have found the site in question. In each case, the IP of the visitor was owned by Microsoft.
I think it's some type of internal testing, although I haven't figured out of what. I think the actual referrer is faked.
I've been seeing this since yesterday. Half of them are on honeypot pages. When will they learn that links matching robots.txt rules are a no-no? Not even as a URL-only reference in the SERP. Apparently this set of robots is using URLs from their index database. That's what I call eating one's own s..t :)
Once again a broken "technology" from Microsoft.
[edited by: Achernar at 10:37 am (utc) on Aug. 17, 2007]
|In each case, the IP of the visitor was owned by Microsoft. |
I have found they are all owned by MS also. This has been going on for a long time and yes, the terms are often 'sex related'. Why, I have no idea, and I'm sure they get a kick out of giving this type of referrer spam.
Yes, it has been going on for some time now. But it was once or twice a week. Currently it's 3 or 4 per hour.
Thanks for the feedback. I have now had several thousand requests over multiple sites in the last few days, so I have a clearer picture of what is happening: mainly single-word requests, most of them drug related. The requests hit pages carrying AdSense, with the result that not only my website logs but also my AdSense statistics are completely messed up by all the 'false' visitors. Conspiracy theory time!
Anyways, the requests all seem to come from the IP range 65.55.165.* so I have denied them through .htaccess as follows:
# Allow,Deny order: Deny rules are evaluated last, so they override "allow from all"
order allow,deny
allow from all
deny from 65.55.165.
I have also emailed Live support - let's see if I get a response!
Any more cases of this out there?
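For anyone who would rather test the range in a script than in .htaccess, here is a minimal sketch. It assumes the prefix "65.55.165." from the rule above means the whole /24 network; the function name `is_blocked` is made up for illustration.

```python
import ipaddress

# Assumption: the .htaccess prefix "65.55.165." is equivalent to 65.55.165.0/24.
BLOCKED_NETWORK = ipaddress.ip_network("65.55.165.0/24")


def is_blocked(ip: str) -> bool:
    """Return True if the address falls inside the blocked range."""
    try:
        return ipaddress.ip_address(ip) in BLOCKED_NETWORK
    except ValueError:
        # Not a parseable IP address (e.g. a "-" placeholder from a log line).
        return False


if __name__ == "__main__":
    for candidate in ("65.55.165.42", "65.55.166.1"):
        print(candidate, is_blocked(candidate))
```

Comparing parsed addresses rather than string prefixes avoids accidentally matching something like 65.55.1650.x in malformed input.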
I'm experiencing the same. All to a subdirectory that is excluded in my robots.txt file. Most of the search terms are pharmacy related and my site is travel related.
Yes, I've seen this too. Just another sign of the mess at Microsoft.
... they are using 'strange' key words from the grey side : cartier volkswagen codeine zithromax lodging cell+phone mazda ringtone pron make+money+online mazda ... etc.
It is offensive to run such crap against my sites, which are not about such a thing as 'ringtones'.
It is a parallel operation. It looks automated --> a bot. It is not looking for /robots.txt. It does not show a robot UA either, but hides behind a faked browser USER_AGENT. And furthermore, they do not even have valid rDNS PTR records: a thing like 'bl2sch1082116.phx.gbl' is not a valid domain name and as such is unprofessional.
This is just rude and nasty and showing poor recognition of basic netiquette. Bad manners.
What do they think? Not much, probably.
So this dubious operation is excluded from my sites now, which -- ironically -- are on Linux systems.
Are we having fun yet?
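The rDNS complaint above can be checked mechanically with a forward-confirmed reverse DNS lookup: take the PTR name for the IP, resolve that name forward, and see whether it maps back to the same IP. The sketch below is one possible way to do it, not anything Microsoft or this forum prescribes. The function name and the injectable `reverse`/`forward` parameters are hypothetical and only exist so the logic can be exercised without a network.

```python
import socket


def forward_confirmed_rdns(ip, reverse=socket.gethostbyaddr,
                           forward=socket.gethostbyname_ex):
    """Return the PTR hostname if it resolves back to ip, otherwise None."""
    try:
        hostname = reverse(ip)[0]          # (hostname, aliases, addresses)
        addresses = forward(hostname)[2]   # (hostname, aliases, addresses)
    except OSError:
        # Covers socket.herror and socket.gaierror: no PTR record, or the
        # PTR name (e.g. something under .phx.gbl) does not resolve publicly.
        return None
    return hostname if ip in addresses else None
```

A name like 'bl2sch1082116.phx.gbl' would be expected to fail the forward step, so the function would return None for it.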
I have received an email from Microsoft Support requesting additional information, and I pointed them at this thread and asked them to comment.
Consequently, if anyone can add further details and examples then this would be appreciated, particularly information about the scale of the issue and its effects.
Thanks in advance.
Is it possible it is through one of the filters.... are people using any of the demographic filters to cause this?
Is this organic traffic? Or are the sites in question buying traffic from MS and they are testing some bizarre behavioral or demographic filter?
|Is this organic traffic? Or are the sites in question buying traffic from MS and they are testing some bizarre behavioral or demographic filter? |
Nope, never paid for any traffic myself and this is coming from inside MS.
Based on my experience it's not paid or anything else. The one I noticed had to do with a prescription diet pill - which is a long way from my topic (finance).
I thought I read something about MS checking new results with this type of query - if that's true they've got big problems.
I've also been getting these strange search strings with no correlation to my site or any of my clients. It's been happening for months now.
All have Microsoft IPs and were related to drugs; now they've switched to fairly offensive sex terms. I checked the referring pages and there aren't any site results showing at all for the search terms used today.
I did notice that I had another Microsoft IP in that range show up with the same Agent string but with no referral info.
Made me curious enough to see if anyone else was seeing the same thing.
I had a more detailed look into this for one site. Here's some info about this MS 'visitor':
- Landing pages appear to be pseudo-random (seemed to request 'batches' of related pages)
- IPs changeable in the range 65.55.165.*
- Spoofed referrers are search.live.com/result.aspx?q=[KEYWORD]&mrt=en-us&FORM=LVSP. However, the bot also makes requests without a referrer, sometimes in the course of the same visit
- Keywords are usually single words, varying from obvious commercial (but inoffensive) to drug names and pr0n-related words
I'm not seeing any flood of requests, but this bot appears to visit on a pattern of a number of pages a day. It then requests a handful of pages within a short time frame, but from different IP addresses and with cookies reset.
Now, this isn't a huge problem, but as it seems extremely likely this is automated traffic, it really isn't good for MS to make it appear human, and thus skew website stats. In addition, some of the keywords are adult, and should not be showing up indiscriminately in non-adult sites' logfiles.
I am definitely interested in what the MS justification/explanation of this is.
[edited by: Receptional_Andy at 2:24 pm (utc) on Aug. 20, 2007]
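The pattern described in that post (IP in 65.55.165.*, referrer on search.live.com/result.aspx with FORM=LVSP) can be turned into a small log filter. This is only a sketch under the assumption that the logged referrer carries an http:// scheme; the function name `spoofed_live_keyword` is made up, and the set of form codes can be extended if other values turn up.

```python
import ipaddress
from urllib.parse import parse_qs, urlsplit

# Assumptions taken from the post above: the range and the FORM value.
SUSPECT_NETWORK = ipaddress.ip_network("65.55.165.0/24")
SUSPECT_FORMS = {"LVSP"}


def spoofed_live_keyword(ip, referrer):
    """Return the q= keyword if the hit matches the reported pattern, else None."""
    try:
        if ipaddress.ip_address(ip) not in SUSPECT_NETWORK:
            return None
    except ValueError:
        return None  # unparseable client address
    parts = urlsplit(referrer)
    if parts.netloc.lower() != "search.live.com":
        return None
    if parts.path.lower() != "/result.aspx":
        return None
    query = parse_qs(parts.query)
    if not SUSPECT_FORMS.intersection(query.get("FORM", [])):
        return None
    keywords = query.get("q")
    return keywords[0] if keywords else None


if __name__ == "__main__":
    ref = "http://search.live.com/result.aspx?q=codeine&mrt=en-us&FORM=LVSP"
    print(spoofed_live_keyword("65.55.165.12", ref))  # prints "codeine"
```

Requests from the same range that arrive with no referrer at all will return None here, so this only catches the hits that carry the fake search URL.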
I've seen this also - on a small scale on a small, mostly personal website. I assumed these were spoofed IPs when I first started seeing the referrers since the referring SERP is surely bogus. I haven't checked logs on any larger sites but I can share this:
Several referrals with a search query term that is irrelevant to site and landing page. Always generic search term. Things like "cash" and "payday advance" and "airline" and "nokia" are all terms this little website shouldn't and will never rank for.
Two IP ranges:
In August, all hits were from 65.55.165.*
In July, hits were from 131.107.0.*
Both of which are supposedly inside Microsoft Redmond.
And, of course, as reported here, none of the referring pages actually work. Page Not Found messages.
Here are examples:
That's possible, if the bot doesn't need to retrieve any information. However this bot/whatever it is does request files linked from a particular page like stylesheets and images, so it seems unlikely that the IP is fake, unless it is collecting data from another source/a real IP.
I can confirm the alternative IP range for July (different U/A then too).
One other note, I did find an older thread which seems to be about the same bot:
Possible Bot or Spammer? [webmasterworld.com]
I'm not sure the QC explanation will work here, since I can't see how the sites in question could ever appear for the sorts of words in the referrers (since they don't mention any of the words in question at all).
|this bot/whatever it is does request files linked from a particular page like stylesheets and images |
To add to it, they started hammering my poetry site on Thursday, and they were also downloading my AdSense blocks, completely inflating my stats. MyBlogLog, however, didn't recognize them as visitors, so the discrepancy was immediately obvious.
I blocked the IP range as soon as I saw them, but am concerned that there might at some point be legit MSN traffic from that range. If anyone knows for sure, or if an MS rep could reply to this thread that would be great. :)
Live Search Technical Support have replied to my second email with the following:
"Thank you for writing back to Live Search Technical Support. This is Marichu and I understand that you would like to know what is form LSVP and the purpose of this Microsoft activity. I realize the importance of this matter.
I have forwarded your concern to the Live Search Product Specialist Team so it may be given due attention. I understand the importance of this issue. Rest assured that we are doing everything within our means to remedy the situation.
We appreciate your continued support as we strive to provide you with the highest quality service available. Thank you for using Live Search."
Looks like a pass the parcel response to me or are they recognising a 'situation'? I will wait for a response but I am not hopeful of receiving one. Time will tell.
I have scoured Live to try and find form 'LSVP', searched everywhere that I can think of.
First of all, you mixed up some of the letters. Your URL says "LVSP" but you are searching for "LSVP"
Next, LVSP is an acronym for Linux Virtual Server Project
So I did a search on Netcraft, and found that search.live.com is running Linux.
Is Microsoft running Linux? *gasp* Is that the TRUE conspiracy here? NOT. Actually, they've been using Linux for years.
So my conclusion from all of this is that MSN is experimenting with Linux Virtual Server Project, and well, computers being computers, things aren't going as planned.
|So my conclusion from all of this is that MSN is experimenting with Linux Virtual Server Project, and well, computers being computers, things aren't going as planned. |
That does nothing to explain the referrer spam for terms that none of us rank for though, or why the bot would be downloading AdSense scripts from Google.
The IP addresses beginning with 131.107 indicate the same exploit discussed in this thread [webmasterworld.com], where someone is using MSN's "tide" proxy servers to make it appear that these requests are coming from Microsoft when in fact, they're proxied requests. These can easily be blocked by denying access if the REMOTE_HOST contains "tideNNN.microsoft.com", where NNN is a series of numbers. This is discussed in the previous thread.
The 65.55.165.*** address range resolves back to MSN as well, but no PTR records exist for this range so it's not possible to tell if these are also proxy servers at MSN. However, it may still be possible to block these requests by looking for Via or X-Forwarded-For headers on the requests.
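Both checks suggested above (a REMOTE_HOST containing tideNNN.microsoft.com, or the presence of Via / X-Forwarded-For headers) can be sketched in a few lines. The function name `looks_proxied` and the dict-of-headers input are assumptions for illustration; how you get at the host name and headers depends on your server setup.

```python
import re

# "tide" followed by digits, then .microsoft.com, anywhere in the host name.
TIDE_HOST = re.compile(r"tide\d+\.microsoft\.com", re.IGNORECASE)

# Headers that a forwarding proxy commonly adds.
PROXY_HEADERS = {"via", "x-forwarded-for"}


def looks_proxied(remote_host, headers):
    """Return True if the request shows either of the proxy signs above."""
    if remote_host and TIDE_HOST.search(remote_host):
        return True
    # Header names are case-insensitive, so compare in lowercase.
    present = {name.lower() for name in headers}
    return bool(PROXY_HEADERS & present)


if __name__ == "__main__":
    print(looks_proxied("tide123.microsoft.com", {}))
    print(looks_proxied("", {"Via": "1.1 someproxy"}))
```

A proxy that strips these headers will pass straight through, so this is a hint rather than proof either way.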
I am now seeing lots of the following type using a different form but emanating from the previously identified IP address range:
So a new form and a new set of keywords but the same IP range.
Still no reply from Microsoft - cannot say that I am surprised.
I see the same now, now it's LIVSOP
I am getting the same thing with the terms 'online' and then 'support', mostly 'support'. This morning I checked and found cc.msnscache.com/... as a referrer 112 times. The URL leads to a cache of a photo gallery that we removed from the site.
I'll bet it is from the live family safety beta.
Based on my experience I'm guessing it's not an MS test gone wrong but a proxy service being abused.
I've seen similar crap happen with Yahoo and Google and it's usually the result of someone running a scraping operation via one of their proxy services, something like a translator, web accelerator, wireless services, etc.
This too looks like a scrape attack to me because the keyword in the query couldn't possibly resolve to the landing page being accessed, at least not based on the current Live SERPs so I'm thinking the actual QUERY string is what's being faked just to throw people off the trail of what's really happening.
Anyone know of a proxy service Live runs that could be used in such a manner?
Whois for the 65.55.165.* range shows MS but the reverse DNS resolved to something like bl2sch0000000.phx.gbl whatever that is.
The user agent is always:
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
Is that the same user agent you all see?
in my case the user agent is always:
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
33 requests from 25 distinct IPs since Aug 15th, 2007
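A tally like "33 requests from 25 distinct IPs" is easy to reproduce once the log has been reduced to (IP, user agent) pairs. The sketch below assumes that reduction has already been done; the function name `summarize_hits` and the sample data are made up.

```python
SUSPECT_UA = ("Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; "
              ".NET CLR 1.1.4322)")


def summarize_hits(hits, user_agent=SUSPECT_UA):
    """Return (request_count, distinct_ip_count) for one user agent.

    hits is an iterable of (ip, user_agent) tuples.
    """
    ips = [ip for ip, ua in hits if ua == user_agent]
    return len(ips), len(set(ips))


if __name__ == "__main__":
    # Hypothetical sample: three hits with the suspect UA from two IPs.
    sample = [
        ("65.55.165.10", SUSPECT_UA),
        ("65.55.165.11", SUSPECT_UA),
        ("65.55.165.10", SUSPECT_UA),
        ("192.0.2.5", "SomeOtherBrowser/1.0"),
    ]
    print(summarize_hits(sample))  # prints (3, 2)
```

A request count only slightly higher than the distinct IP count, as in the figures above, fits the earlier observation that the bot changes IP address between requests.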
I say zisis a scraypa
"Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)" <<<yep thats the one.