
Bing Search Engine News Forum

Strange Referrer Activity
live
confuscius




msg:3424478
 8:29 am on Aug 17, 2007 (gmt 0)

I am getting thousands of hits where the items in my log show the referrer as follows;

http://search.live.com/result.aspx?q=KEYWORD&mrt=en-us&FORM=LVSP

When I load the referring URL, Live tells me that there are no results. There is also no relationship between the keyword and the page requested. The keywords are single words and seem to come mainly from the usual spam areas.

I have scoured Live to try to find form 'LVSP' and have searched everywhere that I can think of.

Can anyone enlighten me as to what the heck form LVSP is? Have the spammers found another flaw? I am based in the UK.

Thanks in advance.

[edited by: engine at 10:30 am (utc) on Aug. 18, 2007]
[edit reason] delinked [/edit]

 

Achernar




msg:3434934
 11:24 pm on Aug 28, 2007 (gmt 0)

For the past 3-4 days, the pages requested with these referrer query strings have always been preceded by a valid msnbot (valid IP) request for the same URL. And some of the search terms are valid (as in the example below).

example:

65.55.209.48 - - [28/Aug/2007:22:55:11 +0200] "GET /aut.php?id=3244&bib=1 HTTP/1.0" 200 6152
[deleted lines]
65.55.165.11 - - [28/Aug/2007:22:56:14 +0200] "GET /aut.php?id=3244&bib=1 HTTP/1.0" 200 6152 "http://search.live.com/results.aspx?q=tetsuya&mrt=en-us&FORM=LIVSOP" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
65.55.165.11 - - [28/Aug/2007:22:56:15 +0200] "GET /fct.js HTTP/1.0" 200 12745
65.55.165.11 - - [28/Aug/2007:22:56:16 +0200] "GET /skins/bdN.css HTTP/1.0" 200 3113 "" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)"
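For anyone who wants to pull these pairs out of their own logs, here is a minimal sketch. It assumes Apache combined-format lines in chronological order, like the excerpt above. The function names and the 120-second window are assumptions for illustration, not anything Microsoft has documented. It flags any request carrying a search.live.com referrer that follows a recent request for the same path from a different IP.

```python
import re
from datetime import datetime, timedelta

# Apache "combined" log line; the referrer and user agent are optional
# because some lines in the excerpt above omit them.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["time"] = datetime.strptime(entry["ts"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

def find_followup_hits(lines, window_seconds=120):
    """Return entries with a search.live.com referrer that follow an earlier
    request for the same path from a different IP within the window."""
    window = timedelta(seconds=window_seconds)
    earlier = {}  # path -> list of (time, ip) for requests seen so far
    flagged = []
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not log entries
        referrer = entry["referrer"] or ""
        if referrer.startswith("http://search.live.com/"):
            for seen_time, seen_ip in earlier.get(entry["path"], []):
                gap = entry["time"] - seen_time
                if seen_ip != entry["ip"] and timedelta(0) <= gap <= window:
                    flagged.append(entry)
                    break
        earlier.setdefault(entry["path"], []).append((entry["time"], entry["ip"]))
    return flagged
```

The window is a parameter because the two requests in the excerpt above are 63 seconds apart, so a strict 60-second cutoff would miss that pair.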

orbiter




msg:3434954
 11:55 pm on Aug 28, 2007 (gmt 0)

It's good to know I'm not the only one seeing this. I've been seeing the same thing -- and have for a few weeks now.

The hits, which claim to be from Microsoft, list the user agent as: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322)

These hits always come within 60 seconds of a visit from msnbot for the exact same page.

I don't get much traffic from live.com, which is one of the reasons that these hits stand out so much.

The other reason that they stand out is that the query string listed in the referring pages from live.com shows really interesting search terms. For example, I got one earlier today for "insurance." I wish I was getting real traffic for a term like that!

I'm reluctant to block the traffic, but I suppose if the volume of these hits increases, I might not have a choice.

What actions are you guys taking?

incrediBILL




msg:3434976
 12:37 am on Aug 29, 2007 (gmt 0)

If it's not an MS service being exploited by a scraper the only other possible answer I have is that they're actively checking for cloaking.

Hard to say either way and if it's the latter no MS people will tell us that's the case.

gethan




msg:3435092
 4:02 am on Aug 29, 2007 (gmt 0)

Getting this too! - around 2000 every 24 hours same IP range.

The keyword showing in my logs is "travel" - that's the sector the affected site is in.

Possibly M$ is beta testing some directory or contextual advertising system, simulating or possibly running it with thousands of PCs and watching people's surfing habits? Pure conjecture, of course.

Maybe they learnt something from Halo.... Wired Report [wired.com] on Halo game testing

Receptional




msg:3435305
 10:22 am on Aug 29, 2007 (gmt 0)

I'm reluctant to block the traffic, but I suppose if the volume of these hits increases, I might not have a choice. What actions are you guys taking?

Well I'm working on getting some feedback from Microsoft before it gets out of hand. Hopefully posted here.

The Contractor




msg:3435315
 10:37 am on Aug 29, 2007 (gmt 0)

Until they respond they are blocked... screwing up my stats..

[edited by: The_Contractor at 10:55 am (utc) on Aug. 29, 2007]

Receptional




msg:3436397
 10:22 am on Aug 30, 2007 (gmt 0)

Until they respond they are blocked... screwing up my stats..

Fair enough! Anyone else doing the same?

There are times when silence is a virtue from a big company. This is not one of them.

The Contractor




msg:3436408
 10:49 am on Aug 30, 2007 (gmt 0)

Seems this has to be a bot...not a smart one as it seems to like to be sent to a 403 page on the .net domain for the site.

These are all from the 65.55.165.xx range that use the adult/spam faked referrers.

I also see one coming from bl1sch2044210.phx.gbl at 65.55.235.217 with a UA of msnbot-media/1.0 (+http://search.msn.com/msnbot.htm). It first showed up at 10:03:23 AM on Sunday, August 26, 2007 and has since come back for a total of 129 pages. I have not blocked this one, as it has hit fewer than 130 pages.

There also seems to be a msnbot-media/1.0 bot running with livebot-65-55-213-6.search.live.com (65.55.213.6) and one at livebot-65-55-235-202.search.live.com (65.55.235.202). These do not use the adult/spam/faked referrers.
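One common way to tell a genuine crawler from an impostor is forward-confirmed reverse DNS: look up the hostname for the IP, check its suffix, then resolve that hostname back and confirm it returns the same IP. The sketch below is a hedged illustration only. The `.search.live.com` suffix is taken from the hostnames quoted above, while the function name and the injectable lookup parameters are assumptions added so the logic can be exercised without network access.

```python
import socket

# Assumption: only the suffix seen in the hostnames quoted above is trusted.
TRUSTED_SUFFIXES = (".search.live.com",)

def is_verified_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        hostname = reverse_lookup(ip)[0]       # IP -> hostname
    except OSError:
        return False
    if not hostname.lower().endswith(suffixes):
        return False
    try:
        addresses = forward_lookup(hostname)[2]  # hostname -> list of IPs
    except OSError:
        return False
    return ip in addresses
```

Note that a host such as bl1sch2044210.phx.gbl would fail this check, since its name does not end in the trusted suffix; whether that is the right outcome depends on which hosts you decide to trust.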

[edited by: The_Contractor at 10:54 am (utc) on Aug. 30, 2007]

Receptional




msg:3436440
 11:45 am on Aug 30, 2007 (gmt 0)

So here's my not so techie guess.

Someone's running a scraper bot using an IP faker and false referrer. The content is getting jumbled and republished somewhere in a way that may or may not be getting indexed. My guess is that the spammer is cloaking this content to Googlebot and/or just putting it up there for the traffic value or any vague hope of link juice. (Find a unique word in your content, then search for it and go to the end of the results where the spam is).

Using Microsoft IPs and referrers is convenient, as blocking them might also block real traffic.

Question - why use Microsoft instead of Google? Has Google found a fix, or does the spammer think Microsoft would arouse less suspicion? OR... have they found a hole in the MS technology (not necessarily search) that lets them spoof or use Microsoft's IPs? JdMorgan points out [webmasterworld.com] that they are from tide.microsoft.com, so maybe there's a technology there that's being exploited.

I still think Microsoft should confirm if this isn't them. There was a guy from MS at AdChamps in the UK who gave the most knowledgeable presentation on click fraud I have ever seen - from anywhere in the industry. So they have the expertise. Just not the publicity machine on the organic side to be able to tell us.

[edited by: Receptional at 11:47 am (utc) on Aug. 30, 2007]

incrediBILL




msg:3436622
 2:39 pm on Aug 30, 2007 (gmt 0)

Someone's running a scraper bot using an IP faker and false referrer.

Everyone always thinks people fake IPs, but that has no value for data retrieval, so unless you're mounting an attack, faking does nothing.

Just a little light reading on spoofing [securityfocus.com]...
While some of the attacks described above are a bit outdated, such as session hijacking for host-based authentication services, IP spoofing is still prevalent in network scanning and probes, as well as denial of service floods. However, the technique does not allow for anonymous Internet access, which is a common misconception for those unfamiliar with the practice. Any sort of spoofing beyond simple floods is relatively advanced and used in very specific instances such as evasion and connection hijacking.

Basically, if someone was spoofing MS we'd all be sending the data BACK to MS and not the spoofer, get it? So if it's not MS IPs doing this, then it's actually someone engaging all of our servers to mount an attack against MS, and even 403 errors send packets.

I'm still in the camp that thinks it's a) an MS project of some sort or b) a proxy service being abused.

Blocking it will probably have no repercussions unless it's a cloaking checker.

I'm running reverse cloaking so if any of the content collected from those IPs is actually used I'll know about it and let you know if it ever appears.

chance1376




msg:3437761
 3:39 pm on Aug 31, 2007 (gmt 0)

Are people still seeing this activity? So far today would make the second day of not having any of the live searches.

Receptional




msg:3437778
 3:49 pm on Aug 31, 2007 (gmt 0)

Basically, if someone was spoofing MS we'd all be sending the data BACK to MS and not the spoofer, get it? So if it's not MS IPs doing this, then it's actually someone engaging all of our servers to mount an attack against MS, and even 403 errors send packets.

I'm still in the camp that thinks it's a) an MS project of some sort or b) a proxy service being abused.

Yep - I get that now. Thanks for clarifying, Bill. Receptional_andy also pointed out the error of my thought process :).

JAB Creations




msg:3438862
 8:49 pm on Sep 1, 2007 (gmt 0)

It's an interesting topic that I was not aware of though have confirmed after initially reading some of the posts here. This is what I am seeing...

Semi-advanced robot. It initially looks like a human, but if you understand the patterns in log files and what they imply, you'll know that this is indeed a bot. I will not, however, go into any further detail on that aspect.

I'm not sure I can agree with the cloaking theory because after all you wouldn't want to make people aware that you're looking to figure out if they are cloaking?

The site scraper theory seems (without deep insight into my own logs) to make the most sense initially. Spammers aren't apologetic in the least about screwing up our statistical analysis.

Here is an important question: does Microsoft's Live spider support the application/xhtml+xml media type? I know Google does not. This bot is requesting pages with the following query on my site...
file.php?mime=axml

I think my site's media type switcher isn't functioning correctly (oh well it's well over a year old and soon to be replaced anyway) though I'm sure this has some implications?

Will blocking with the earlier mentioned Apache script block legitimate traffic and legitimate Microsoft Live spider crawling?

- John

drummerboy9000




msg:3442196
 6:29 pm on Sep 5, 2007 (gmt 0)

"I'll bet it is from the live family safety beta."

That would be my guess as well. You might want to be careful when blocking these bots. you might be blocking your site from Live. Just my bit of worthless info.

jdMorgan




msg:3442211
 6:52 pm on Sep 5, 2007 (gmt 0)

> Here is an important question, does Microsoft's Live spider support the application/xhtml+xml media type? I know Google does not.

I'm not sure what you base this statement on. Both Googlebot and Googlebot-Mobile regularly fetch and index my mobile-device pages, and all are of MIME-type application/xhtml+xml. These mobile pages are also indexed in MSN, so I conclude that msnbot can handle that MIME-type as well.

Also, that query string is meaningless to the server. It is just a query string, and unless your file.php makes use of it, it is ignored; it does not 'select' an application/xhtml+xml response unless your script interprets it as such.
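To illustrate the point, here is a small sketch in Python rather than PHP. The alias table and the function name are hypothetical; nothing here is taken from the actual file.php. The `mime=axml` parameter only selects a content type because the script chooses to look it up.

```python
from urllib.parse import parse_qs

# Hypothetical alias table: the script, not the server, gives "axml" a meaning.
MIME_ALIASES = {
    "axml": "application/xhtml+xml",
    "html": "text/html",
}

def choose_content_type(query_string, default="text/html"):
    """Pick a response MIME type from a 'mime' query parameter, if present."""
    params = parse_qs(query_string)
    alias = params.get("mime", [None])[0]
    return MIME_ALIASES.get(alias, default)
```

A script that never reads the parameter would serve the same response whatever the query string says.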

Maybe I'm not seeing the same "Strange Referrer Activity" as the rest of the respondents to this thread, but I've managed to block all of these requests by denying access to Microsoft's "Tide" proxy servers, as I noted above.

Jim

msndude




msg:3442263
 7:34 pm on Sep 5, 2007 (gmt 0)

Thanks for all the feedback on this thread.

First, we appreciate the concerns and issues that have been raised and apologize for any inconvenience this might have caused.

Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

Please keep the feedback and thoughts coming as we will use this to help improve this process and make sure that it impacts your sites as little as possible.

thanks
- msndude (msd)

incrediBILL




msg:3442416
 9:40 pm on Sep 5, 2007 (gmt 0)

The traffic you are seeing is part of a quality check we run on selected pages

I understand your need for quality checking, but trying to bypass site security just to check for cloaking is a bit much. Besides, the fact that it came from Microsoft IPs and was easily detectable (we all caught it) means it can also be easily cloaked, so if you think you're really doing quality control you're just fooling yourself.

FWIW, my bot blocker quarantined that IP range as a rogue bot a long time ago because your server kept asking for pages and couldn't answer the captcha.

The Contractor




msg:3442426
 9:55 pm on Sep 5, 2007 (gmt 0)

The traffic you are seeing is part of a quality check we run on selected pages

Sorry, but when you run through a proxy and use fake adult, spam, and s@x related referring query strings, you are blocked. Maybe you should be running a quality check on your engineers...

I'll risk a little traffic loss over the principle that a "real" company shouldn't use faked adult, spam, and s@x related referrers.

BillyS




msg:3442598
 2:34 am on Sep 6, 2007 (gmt 0)

Now I'm really confused. I run a personal finance website and I get a quality check for phentermine? That word does not appear on my site - I guarantee it.

I guess that explains why my website doesn't show up in Live.com SERPs.

[edited by: BillyS at 2:36 am (utc) on Sep. 6, 2007]

Ganceann




msg:3443302
 6:51 pm on Sep 6, 2007 (gmt 0)

After Msndude replied stating a quality check...

My interpretation is that it is running through all the top spam search terms and clearing a site of having those terms... therefore is viewed as a 'quality' website.

However, by really screwing over webmasters' logs and getting blocked in many cases ... (which may prevent webmasters' sites being shown in LIVE serps), Msn are actually making their serps worse by being denied access to legitimate sites.

Now, a quality check on a page shouldn't be interfering with webmasters' logs ... an idea would be to cache the page on msn servers and run quality checks on the cached copies and NOT on the webmaster's site.

Simple process - download the sitemap file, retrieve updated/modified pages, compare pages with existing cached copy, evaluate page changes with quality check on cached pages and reassign scoring rank based on the cached page.

dan404




msg:3443409
 8:26 pm on Sep 6, 2007 (gmt 0)

I thought it was just me. Shows how out of touch I am. I've been getting weird referrals from Live.com and just kind of ignored it.
But it got me thinking: why would MS Live associate my site with Prozac, Viagra and Sexy Bikinis? I even searched through my backlinks trying to find something. ARGH!
This has been going on since the beginning of August.
That's just plain inconsiderate. I read all about it, maybe I'm just dense but I still don't get the point of leaving this crap in people's log files!?
Oh, and none of those terms appear anywhere on my site or in my backlinks that I could find anyhow, can't be sure since they took out their linkdomain: operator.
Who needs em anyhow!

BillyS




msg:3444967
 1:35 pm on Sep 8, 2007 (gmt 0)

I'm seeing increased activity of this sort this morning. The good news is the queries actually make sense - words that should be on my site and point to pages that best answer the query.

Maybe MSN will actually let me back in their SERPs.

jellegaa




msg:3453756
 12:55 pm on Sep 18, 2007 (gmt 0)

I am posting as administrator of around 200 commercial sites. My tasks include human behavioural tracking. In this case it is very important to filter out any robotic traffic.

For several weeks I have been annoyed by this "quality check", as it presents itself as real user behaviour with full-fledged browser capabilities and a standard user agent. All of a sudden my customers are excited about getting all this search engine traffic from Live Search, which in fact they are NOT getting.

I am very, very close to just blocking all Microsoft traffic in the 65.55.165.* segment now! So Live guys, please, PLEASE state in the user agent that this is robotic traffic, e.g. "Live Search Quality Check Robot" or whatever, and so give us a chance to deliver correct data to our customers.

Regards
Jesper
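Until the user agent identifies itself, the kind of filtering described above could be sketched as follows. This assumes each hit is a dict with `ip` and `referrer` keys, which is a hypothetical record shape. It treats 65.55.165.* as the /24 network 65.55.165.0/24, and separates the hits out for reporting rather than blocking them at the server.

```python
import ipaddress

# The 65.55.165.* segment mentioned in this thread, written as a /24 network.
QUALITY_CHECK_NET = ipaddress.ip_network("65.55.165.0/24")

def is_quality_check_hit(ip, referrer):
    """True if the hit comes from the segment above with a Live Search referrer."""
    in_range = ipaddress.ip_address(ip) in QUALITY_CHECK_NET
    from_live = referrer.startswith("http://search.live.com/")
    return in_range and from_live

def split_traffic(hits):
    """Split hits into (human, robotic) lists for separate reporting."""
    human, robotic = [], []
    for hit in hits:
        if is_quality_check_hit(hit["ip"], hit.get("referrer", "")):
            robotic.append(hit)
        else:
            human.append(hit)
    return human, robotic
```

Filtering at the reporting layer keeps the statistics clean without refusing the requests themselves, which matters if blocking really does affect inclusion in the index.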

The Contractor




msg:3455787
 9:57 am on Sep 20, 2007 (gmt 0)

Looks like they have a new one:

A visitor from 131.107.151.157 was logged nnn times,
starting at 10:17:08 PM on Wednesday, September 19, 2007.
The initial browser was MSRBOT (http://research.microsoft.com/research/sv/msrbot/).

dusky




msg:3457836
 6:09 am on Sep 22, 2007 (gmt 0)

msndude, unless I read a press release or a communication from Microsoft posted on one of the Microsoft's websites, I'll treat what you say as questionable.

Although I did not do any research as to your genuine identity as a Microsoft employee, I generally question any poster's authenticity when it comes to any well known company's new policy posted on a discussion forum and nowhere for it to be seen on their site or official news anywhere.

This is an important issue affecting millions of company websites, including well-known and highly trafficked sites; surely MS should have posted something about it.

Spoofing search queries, referrers, domains and IPs in any manner will trigger security software such as mod_security, as well as other security systems and webmasters, to block IPs manually or automatically, which will in the end prevent MS and its bots from requesting and indexing the sites and pages of those millions of webmasters. Should that happen, which it looks to me is happening already, sooner or later 90% of the web will be inaccessible to any MS bot, and the Live database will hold only a few million lower-quality, unimportant sites and links.

I can't believe for one minute Microsoft will want that, and in my opinion, some smart hackjack is doing his/her bit to ruin MS. A competitor, or an insider employee acting as a Mole using a competitor's infrastructure and technology within one of the Microsoft buildings...

mvandemar




msg:3458653
 4:41 pm on Sep 23, 2007 (gmt 0)

Are you joking...?

msndude said:
Second, we want to explain what this is all about. The traffic you are seeing is part of a quality check we run on selected pages. While we work on addressing your concerns, we would request that you do not actively block the IP addresses used by this quality check; blocking these IP addresses could prevent your site from being included in the Live Search index.

You're sending queries to Google AdSense, downloading and processing Javascript blocks using people's AdSense publisher ID, greatly inflating impressions, causing a much lower CTR, which for all we know is decreasing the per-click earnings on those accounts.

On top of that, now you are saying if we don't let you continue, we might not get included in MSN Live search?

How the hell is that Quality Control?

-Michael

BillyS




msg:3458681
 5:47 pm on Sep 23, 2007 (gmt 0)

If something big doesn't happen this week with Live.com I'm going to ban MS and all its robots from my site. I'm sick of this. Search is a distant fifth at best (behind Ask and Gigablast). In fact, I'm embarrassed as a MS shareholder that this is the best they can do after over two years of effort.

The fact that IE is installed on many new machines provides MS with a huge opportunity they cannot capitalize on. The average Joe knows it - Live stinks as a search engine.

Shurik




msg:3458816
 10:40 pm on Sep 23, 2007 (gmt 0)

Come on guys, let's give them some credit.
They have come up with yet another clever way of identifying low quality sites (a.k.a. spammers/affiliates/scrapers/etc) but their implementation sucks. Obviously they don't give a s..t about messing up your stats. What they should have done is use keywords normally sent from their SERPs. Also they should have sent only a few a day instead of hundreds of dumb referrals from the same IP block. This way very few people would have noticed their "quality checking". However, it seems they have chosen to cut a few corners here. I think by now most of the cloakers have adapted, which makes further probing kind of pointless.

exposure




msg:3458848
 11:57 pm on Sep 23, 2007 (gmt 0)

"but their implementation sucks"

So what are you saying, it's OK for them to lie but they should just be better liars?

jdMorgan




msg:3458857
 12:29 am on Sep 24, 2007 (gmt 0)

dusky,

Although I did not do any research as to your genuine identity as a Microsoft employee...

You may be assured that all WebmasterWorld members claiming to be employees of well-known corporations are checked out thoroughly here.

Jim

mvandemar




msg:3458989
 7:09 am on Sep 24, 2007 (gmt 0)

Jim,

You may be assured that all WebmasterWorld members claiming to be employees of well-known corporations are checked out thoroughly here.

Periodically...? :)

Although I am saying it jokingly, it is actually a valid concept that employment situations change. :P
