Forum Moderators: mack

Strange Referrer Activity

live

     
8:29 am on Aug 17, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:May 19, 2006
posts:109
votes: 0


I am getting thousands of hits where the items in my log show the referrer as follows:

http://search.live.com/result.aspx?q=KEYWORD&mrt=en-us&FORM=LVSP

When I load the referring page, I am told that there are no results. Also, there is no relationship between the keyword and the page requested. The keywords are single words and seem to be mainly concerned with the usual spam areas.

I have scoured Live to try and find form 'LSVP', and searched everywhere that I can think of.

Can anyone enlighten me as to what the heck form LSVP is? Have the spammers found another flaw? I am based in the UK.

Thanks in advance.

[edited by: engine at 10:30 am (utc) on Aug. 18, 2007]
[edit reason] delinked [/edit]

11:24 pm on Aug 28, 2007 (gmt 0)

Full Member

5+ Year Member

joined:Dec 3, 2006
posts:257
votes: 0


For the past 3-4 days, the pages requested with these referrer query strings have always been preceded by a valid msnbot (valid IP) request for the same URL. And some of the search terms are valid (like in the example below).

example:

65.55.209.48 - - [28/Aug/2007:22:55:11 +0200] "GET /aut.php?id=3244&bib=1 HTTP/1.0" 200 6152
[deleted lines]
65.55.165.11 - - [28/Aug/2007:22:56:14 +0200] "GET /aut.php?id=3244&bib=1 HTTP/1.0" 200 6152 "http://search.live.com/results.aspx?q=tetsuya&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.11 - - [28/Aug/2007:22:56:15 +0200] "GET /fct.js HTTP/1.0" 200 12745
65.55.165.11 - - [28/Aug/2007:22:56:16 +0200] "GET /skins/bdN.css HTTP/1.0" 200 3113 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
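For anyone who wants to count these in their own logs, here is a rough Python sketch of the pattern described above: a hit carrying a search.live.com referrer that follows an earlier fetch of the same URL within a short window. It assumes Apache combined-format lines like the excerpt (with the referrer and user-agent fields optional, since some lines above were trimmed). The function names and the 120-second window are my own assumptions, not anything Microsoft has documented.

```python
import re
from datetime import datetime, timedelta

# Apache common/combined log line; referrer and user agent are optional
# because trimmed lines (like some in the excerpt) may omit them.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not parse."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["time"] = datetime.strptime(rec["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return rec


def flag_live_referrals(lines, window=timedelta(seconds=120)):
    """List (ip, path, earlier_ip, gap_seconds) for live.com-referred hits
    that follow an earlier request for the same path within `window`.
    The window length is an assumption; tune it against your own logs."""
    last_seen = {}  # path -> (ip, time) of the latest hit without a live.com referrer
    flagged = []
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        referrer = rec["referrer"] or ""
        if "search.live.com/result" in referrer:
            earlier = last_seen.get(rec["path"])
            if earlier is not None and rec["time"] - earlier[1] <= window:
                gap = int((rec["time"] - earlier[1]).total_seconds())
                flagged.append((rec["ip"], rec["path"], earlier[0], gap))
        else:
            last_seen[rec["path"]] = (rec["ip"], rec["time"])
    return flagged
```

Feeding it the lines from the excerpt should pair the 65.55.165.11 hit with the earlier 65.55.209.48 fetch of the same page. Note that this only shows the timing pattern; it does not prove who is behind either request.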

11:55 pm on Aug 28, 2007 (gmt 0)

New User

10+ Year Member

joined:May 5, 2005
posts:13
votes: 0


It's good to know I'm not the only one seeing this. I've been seeing the same thing -- and have for a few weeks now.

The hits, which claim to be from Microsoft, list the user agent as: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)

These hits always come within 60 seconds of a visit from msnbot for the exact same page.

I don't get much traffic from live.com, which is one of the reasons that these hits stand out so much.

The other reason that they stand out is that the query string listed in the referring pages from live.com shows really interesting search terms. For example, I got one earlier today for "insurance." I wish I was getting real traffic for a term like that!

I'm reluctant to block the traffic, but I suppose if the volume of these hits increases, I might not have a choice.

What actions are you guys taking?

12:37 am on Aug 29, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14642
votes: 93


If it's not an MS service being exploited by a scraper, the only other possible answer I have is that they're actively checking for cloaking.

Hard to say either way and if it's the latter no MS people will tell us that's the case.

4:02 am on Aug 29, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Aug 8, 2001
posts:926
votes: 0


Getting this too! - around 2000 every 24 hours same IP range.

The keyword showing in my logs is "travel" - that's the sector the affected site is in.

Possibly M$ is beta testing some directory or context advertising system? - simulating or possibly running with thousands of PCs and watching people's surfing habits? Pure conjecture, of course.

Maybe they learnt something from Halo.... Wired Report [wired.com] on Halo game testing

10:22 am on Aug 29, 2007 (gmt 0)

Senior Member

joined:Mar 8, 2002
posts:2897
votes: 0


I'm reluctant to block the traffic, but I suppose if the volume of these hits increases, I might not have a choice. What actions are you guys taking?

Well I'm working on getting some feedback from Microsoft before it gets out of hand. Hopefully posted here.

10:37 am on Aug 29, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 29, 2002
posts:1954
votes: 0


Until they respond they are blocked... screwing up my stats..

[edited by: The_Contractor at 10:55 am (utc) on Aug. 29, 2007]

10:22 am on Aug 30, 2007 (gmt 0)

Senior Member

joined:Mar 8, 2002
posts:2897
votes: 0


Until they respond they are blocked... screwing up my stats..

Fair enough! Anyone else doing the same?

There are times when silence is a virtue from a big company. This is not one of them.

10:49 am on Aug 30, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 29, 2002
posts:1954
votes: 0


Seems this has to be a bot...not a smart one as it seems to like to be sent to a 403 page on the .net domain for the site.

These are all from the 65.55.165.xx range that use the adult/spam faked referrers.

I also see one coming from bl1sch2044210.phx.gbl at 65.55.235.217 with UA of msnbot-media/1.0 (+http://search.msn.com/msnbot.htm). It first showed up on 10:03:23 AM on Sunday, August 26, 2007 and has since come back for a total of 129 pages. I have not blocked this as it only hit less than 130 pages.

There also seems to be a msnbot-media/1.0 bot running with livebot-65-55-213-6.search.live.com (65.55.213.6) and one at livebot-65-55-235-202.search.live.com (65.55.235.202). These do not use the adult/spam/faked referrers.

[edited by: The_Contractor at 10:54 am (utc) on Aug. 30, 2007]
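A common way to tell a genuine crawler from something borrowing its user agent is a forward-confirmed reverse DNS check: look up the hostname for the IP, check the suffix, then resolve that hostname and confirm it maps back to the same IP. Below is a minimal Python sketch of that idea. The `.search.live.com` suffix is taken from the livebot hostnames quoted above and is only an assumption about what Microsoft uses; the function names are hypothetical, and the resolver functions are passed in as parameters so the logic can be tried without touching the network.

```python
import socket


def reverse_lookup(ip):
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname):
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(ip, suffixes=(".search.live.com",),
                        reverse=reverse_lookup, forward=forward_lookup):
    """Forward-confirmed reverse DNS check.

    True only if the IP reverse-resolves to a hostname ending in one of
    `suffixes` AND that hostname resolves back to the same IP.
    The default suffix is an assumption based on hostnames seen in logs.
    """
    try:
        host = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return False
    if not host.lower().endswith(tuple(suffixes)):
        return False
    try:
        return ip in forward(host)
    except OSError:
        return False
```

In practice you would cache the result per IP, since doing two DNS lookups on every request is slow.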

11:45 am on Aug 30, 2007 (gmt 0)

Senior Member

joined:Mar 8, 2002
posts:2897
votes: 0


So here's my not so techie guess.

Someone's running a scraper bot using an IP faker and false referrer. The content is getting jumbled and republished somewhere in a way that may or may not be getting indexed. My guess is that the spammer is cloaking this content to Googlebot and/or just putting it up there for the traffic value or any vague hope of link juice. (Find a unique word in your content, then search for it and go to the end of the results where the spam is).

Using Microsoft IPs and referrers is convenient, as blocking them might also block real traffic.

Question - why use Microsoft instead of Google? Have Google found a fix or does the spammer think Microsoft would result in less suspicion? OR... have they found a hole in the MS technology (not necessarily search) that lets them spoof or use Microsoft's IPs? JdMorgan points out [webmasterworld.com] that they are from tide.microsoft.com so maybe there's a technology there that's being exploited.

I still think Microsoft should confirm if this isn't them. There was a guy at AdChamps in the UK from MS that gave the most knowledgeable presentation on click fraud I have ever seen - from anywhere in the industry. So they have the expertise. Just not the publicity machine on the organic side to be able to tell us.

[edited by: Receptional at 11:47 am (utc) on Aug. 30, 2007]

2:39 pm on Aug 30, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14642
votes: 93


Someone's running a scraper bot using an IP faker and false referrer.

Everyone always thinks people fake IPs, but that has no value for data retrieval, so unless you're mounting an attack, faking does nothing.

Just a little light reading on spoofing [securityfocus.com]...

While some of the attacks described above are a bit outdated, such as session hijacking for host-based authentication services, IP spoofing is still prevalent in network scanning and probes, as well as denial of service floods. However, the technique does not allow for anonymous Internet access, which is a common misconception for those unfamiliar with the practice. Any sort of spoofing beyond simple floods is relatively advanced and used in very specific instances such as evasion and connection hijacking.

Basically, if someone was spoofing MS we'd all be sending the data BACK to MS and not the spoofer, get it? So if it's not MS IPs doing this then it's actually someone engaging all of our servers to mount an attack against MS, and even 403 errors send packets.

I'm still in the camp that thinks it's a) an MS project of some sort or b) a proxy service being abused.

Blocking it will probably have no repercussions unless it's a cloaking checker.

I'm running reverse cloaking so if any of the content collected from those IPs is actually used I'll know about it and let you know if it ever appears.

3:39 pm on Aug 31, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Aug 7, 2005
posts:52
votes: 0


Are people still seeing this activity? So far today would make the second day of not having any of the live searches.

3:49 pm on Aug 31, 2007 (gmt 0)

Senior Member

joined:Mar 8, 2002
posts:2897
votes: 0


Basically, if someone was spoofing MS we'd all be sending the data BACK to MS and not the spoofer, get it? So if it's not MS IPs doing this then it's actually someone engaging all of our servers to mount an attack against MS, and even 403 errors send packets.

I'm still in the camp that thinks it's a) an MS project of some sort or b) a proxy service being abused.

Yep - I get that now. Thanks for clarifying, Bill. Receptional_andy also pointed out the error of my thought process :).

8:49 pm on Sept 1, 2007 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member jab_creations is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 26, 2004
posts:3148
votes: 12


It's an interesting topic that I was not aware of though have confirmed after initially reading some of the posts here. This is what I am seeing...

Semi-advanced robot. It initially looks like a human, but if you understand the patterns in log files and what they imply, you'll know that this is indeed a bot. I will not, however, go into any further detail on that aspect.

I'm not sure I can agree with the cloaking theory because after all you wouldn't want to make people aware that you're looking to figure out if they are cloaking?

The site scraper theory seems (without deep insight into my own logs) to make the most sense initially. Spammers aren't apologetic in the least about screwing up our statistical analysis.

Here is an important question: does Microsoft's Live spider support the application/xhtml+xml media type? I know Google does not. This bot is requesting pages with the following query on my site...

file.php?mime=axml

I think my site's media type switcher isn't functioning correctly (oh well it's well over a year old and soon to be replaced anyway) though I'm sure this has some implications?

Will blocking with the earlier mentioned Apache script block legitimate traffic and legitimate Microsoft Live spider crawling?

- John

6:29 pm on Sept 5, 2007 (gmt 0)

New User

5+ Year Member

joined:Sept 5, 2007
posts:7
votes: 0


"I'll bet it is from the live family safety beta."

That would be my guess as well. You might want to be careful when blocking these bots. You might be blocking your site from Live. Just my bit of worthless info.

6:52 pm on Sept 5, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


> Here is an important question, does Microsoft's Live spider support the application/xhtml+xml media type? I know Google does not.

I'm not sure what you base this statement on. Both Googlebot and Googlebot-Mobile regularly fetch and index my mobile-device pages, and all are of MIME type application/xhtml+xml. These mobile pages are also indexed in MSN, so I conclude that msnbot can handle that MIME type as well.

Also, that query string is meaningless to the server. It is just a query string, and unless your file.php makes use of it, it is ignored; it does not 'select' an application/xhtml+xml response unless your script interprets it as such.

Maybe I'm not seeing the same "Strange Referrer Activity" as the rest of the respondents to this thread, but I've managed to block all of these requests by denying access to Microsoft's "Tide" proxy servers, as I noted above.

Jim
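To illustrate the point about the query string, here is a small Python sketch (not the PHP from the site in question, which hasn't been posted). The `mime=axml` parameter comes from the earlier post; the mapping table and function name are hypothetical. The server hands the script the raw query string, and the response type only changes if the script chooses to read it.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical mapping from a "mime" query value to a Content-Type header.
MIME_CHOICES = {
    "axml": "application/xhtml+xml",
    "html": "text/html",
}


def content_type_for(url, honor_query=True):
    """Pick a Content-Type for a request URL.

    With honor_query=False the script ignores the query string entirely,
    which is what happens whenever the code never looks at it.
    """
    params = parse_qs(urlsplit(url).query)
    requested = params.get("mime", [""])[0]
    if honor_query and requested in MIME_CHOICES:
        return MIME_CHOICES[requested]
    return "text/html"  # default when the parameter is absent, unknown, or ignored
```

So `content_type_for("file.php?mime=axml")` yields application/xhtml+xml only because this function reads the parameter; with `honor_query=False` the same URL gets the default type.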

7:34 pm on Sept 5, 2007 (gmt 0)

Full Member

10+ Year Member

joined:May 23, 2004
posts:261
votes: 0


Thanks for all the feedback on this thread.

First, we appreciate the concerns and issues that have been raised and apologize for any inconvenience this might have caused.

Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.

thanks
- msndude (msd)

9:40 pm on Sept 5, 2007 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14642
votes: 93


The traffic you are seeing is part of a quality check we run on selected pages

I understand your need for quality checking, but trying to bypass site security just to check for cloaking is a bit much. Besides, the fact that it came from Microsoft IPs and was easily detectable (we all caught it) means it can also be easily cloaked, so if you think you're really doing quality control you're just fooling yourself.

FWIW, my bot blocker quarantined that IP range as a rogue bot a long time ago because your server kept asking for pages and couldn't answer the captcha.

9:55 pm on Sept 5, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 29, 2002
posts:1954
votes: 0


The traffic you are seeing is part of a quality check we run on selected pages

Sorry, but when you run through a proxy and use fake adult, spam, and s@x related referring query strings, you are blocked. Maybe you should be running a quality check on your engineers...

I'll risk a little traffic loss over the principle that a "real" company shouldn't use faked adult, spam, and s@x related referrers.

2:34 am on Sept 6, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


Now I'm really confused. I run a personal finance website and I get a quality check for phentermine? That word does not appear on my site - I guarantee it.

I guess that explains why my website doesn't show up in Live.com SERPs.

[edited by: BillyS at 2:36 am (utc) on Sep. 6, 2007]

6:51 pm on Sept 6, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:May 12, 2006
posts:112
votes: 0


After Msndude replied stating a quality check...

My interpretation is that it is running through all the top spam search terms and clearing a site of having those terms... therefore is viewed as a 'quality' website.

However, by really screwing over webmasters' logs and getting blocked in many cases ... (which may prevent webmasters' sites being shown on LIVE serps), Msn are actually making their serps worse by being denied access to legitimate sites.

Now to perform a quality check on a page it shouldn't be interfering with webmasters' logs ... an idea would be to cache the page on msn servers and run quality checks on the cached copies and NOT on the webmaster's site.

Simple process - download the sitemap file, retrieve updated/modified pages, compare pages with existing cached copy, evaluate page changes with quality check on cached pages and reassign scoring rank based on the cached page.
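The compare-against-cache step in that process could look something like the Python sketch below. It is only an illustration of the idea, with hypothetical function names and a plain dict standing in for the cache; nothing here reflects how Live Search actually works. Pages are hashed after fetching, and only those whose hash differs from the cached one are queued for a fresh quality check.

```python
import hashlib


def content_digest(body: bytes) -> str:
    """Stable fingerprint of a page body."""
    return hashlib.sha256(body).hexdigest()


def pages_to_recheck(fetched, cache):
    """Return URLs whose content changed since the last crawl.

    fetched: dict of url -> page body (bytes) from the latest crawl
    cache:   dict of url -> digest from the previous crawl (updated in place)
    """
    changed = []
    for url, body in fetched.items():
        digest = content_digest(body)
        if cache.get(url) != digest:
            changed.append(url)
            cache[url] = digest  # remember the new version
    return changed
```

The quality check would then run on the stored copy of each changed page, so the webmaster's server sees one ordinary crawler fetch and nothing else.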

8:26 pm on Sept 6, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:Mar 6, 2007
posts:48
votes: 0


I thought it was just me. Shows how out of touch I am. I've been getting weird referrals from Live.com and just kind of ignored it.
But it got me to thinking, why would MS Live associate my site with Prozac, Viagra and Sexy Bikinis? I even searched through my backlinks trying to find something. ARGH!
This has been going on since the beginning of August.
That's just plain inconsiderate. I read all about it, maybe I'm just dense but I still don't get the point of leaving this crap in people's log files!?
Oh, and none of those terms appear anywhere on my site or in my backlinks that I could find anyhow, can't be sure since they took out their linkdomain: operator.
Who needs em anyhow!

1:35 pm on Sept 8, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


I'm seeing increased activity of this sort this morning. The good news is the queries actually make sense - words that should be on my site and point to pages that best answer the query.

Maybe MSN will actually let me back in their SERPs.

12:55 pm on Sept 18, 2007 (gmt 0)

New User

5+ Year Member

joined:Sept 18, 2007
posts: 2
votes: 0


I am posting as administrator of around 200 commercial sites. My tasks include human behavioural tracking. In this case it is very important to filter out any robotic traffic.

For several weeks I have been annoyed by this "quality check", as it presents itself as real behaviour with full-fledged browser capabilities and a standard user agent. All of a sudden my customers are all excited over getting all this search engine traffic from Live Search, which they are in fact NOT getting.

I am very, very close to just blocking all Microsoft traffic in the 65.55.165.* segment now! So Live-guys, please, PLEASE state in the user agent that this is robotic traffic, e.g. "Live Search Quality Check Robot" or whatever, and by that give us a chance to deliver correct data to our customers.

Regards
Jesper
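Short of blocking, the hits can be filtered out at the reporting stage. Here is a minimal Python sketch, assuming the quality-check traffic stays inside the 65.55.165.* segment mentioned above (written as 65.55.165.0/24). The list of networks, the function names and the shape of the hit records are all assumptions for illustration.

```python
import ipaddress

# Assumed range, based on the 65.55.165.* segment reported in this thread.
QUALITY_CHECK_NETS = [ipaddress.ip_network("65.55.165.0/24")]


def is_quality_check_ip(ip, networks=QUALITY_CHECK_NETS):
    """True if the address falls inside one of the listed networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)


def human_hits(hits):
    """Drop hits from the listed networks before building visitor stats.

    `hits` is a list of dicts with at least an "ip" key (hypothetical shape).
    """
    return [h for h in hits if not is_quality_check_ip(h["ip"])]
```

This keeps the pages reachable for whatever Microsoft is doing while keeping the fake search referrals out of customer reports. If the range changes, only the network list needs updating.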

9:57 am on Sept 20, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Mar 29, 2002
posts:1954
votes: 0


Looks like they have a new one:

A visitor from 131.107.151.157 was logged nnn times,
starting at 10:17:08 PM on Wednesday, September 19, 2007.
The initial browser was MSRBOT (http://research.microsoft.com/research/sv/msrbot/).

6:09 am on Sept 22, 2007 (gmt 0)

Full Member

10+ Year Member

joined:Apr 28, 2005
posts: 221
votes: 0


msndude, unless I read a press release or a communication from Microsoft posted on one of Microsoft's websites, I'll treat what you say as questionable.

Although I did not do any research as to your genuine identity as a Microsoft employee, I generally question any poster's authenticity when a well-known company's new policy is posted on a discussion forum and is nowhere to be seen on their site or in official news anywhere.

This is an important issue affecting millions of companies' websites, as well as well-known and highly trafficked sites; surely MS should have posted something about it.

Spoofing search queries, referrers, domains and IPs in any manner will trigger security software such as mod_security, as well as other security systems and webmasters, to block IPs manually and automatically, which will in the end prevent MS and its bots from requesting and indexing the sites and pages of those millions of webmasters. Should that happen, and it looks to me like it's happening already, sooner or later 90% of the web will be inaccessible to any MS bot; hence the Live database will have only a few million lower-quality and unimportant sites / links.

I can't believe for one minute Microsoft will want that, and in my opinion, some smart hackjack is doing his/her bit to ruin MS. A competitor, or an insider employee acting as a Mole using a competitor's infrastructure and technology within one of the Microsoft buildings...

4:41 pm on Sept 23, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 5, 2005
posts: 66
votes: 0


Are you joking...?

msndude said:
Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

You're sending queries to Google AdSense, downloading and processing Javascript blocks using people's AdSense publisher ID, greatly inflating impressions, causing a much lower CTR, which for all we know is decreasing the per-click earnings on those accounts.

On top of that, now you are saying if we don't let you continue, we might not get included in MSN Live search?

How the hell is that Quality Control?

-Michael

5:47 pm on Sept 23, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member billys is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 1, 2004
posts:3181
votes: 0


If something big doesn't happen this week with Live.com I'm going to ban MS and all its robots from my site. I'm sick of this. Search is a distant fifth at best (behind Ask and Gigablast). In fact, I'm embarrassed as an MS shareholder that this is the best they can do after over two years of effort.

The fact that IE is installed on many new machines provides MS with a huge opportunity they cannot capitalize on. The average Joe knows it - Live stinks as a search engine.

10:40 pm on Sept 23, 2007 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 8, 2004
posts:176
votes: 0


Come on guys, let's give them some credit.
They have come up with yet another clever way of identifying low quality sites (a.k.a. spammers/affiliates/scrapers/etc) but their implementation sucks. Obviously they don't give a s..t about messing up your stats. What they should have done is use keywords normally sent from their SERPs. Also they should have sent only a few a day instead of hundreds of dumb referrals from the same IP block. This way very few people would have noticed their "quality checking". However, it seems they have chosen to cut a few corners here. I think by now most of the cloakers have adapted, which makes further probing kind of pointless.

11:57 pm on Sept 23, 2007 (gmt 0)

Junior Member

5+ Year Member

joined:May 2, 2006
posts: 59
votes: 0


"but their implementation sucks"

So what are you saying, it's OK for them to lie but they should just be better liars?

12:29 am on Sept 24, 2007 (gmt 0)

Senior Member

WebmasterWorld Senior Member jdmorgan is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Mar 31, 2002
posts:25430
votes: 0


dusky,

Although I did not do any research as to your genuine identity as a Microsoft employee...

You may be assured that all WebmasterWorld members claiming to be employees of well-known corporations are checked out thoroughly here.

Jim

This 135-message thread spans 5 pages.
 
