Forum Moderators: mack

MSN Search Claims to Freeze Out Web Spam

   
7:26 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



[pcmag.co.uk...]

In a sample of one billion web pages, Microsoft claims that eight per cent are spam.

In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages, linked to a porn site.

7:38 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, I'll reserve judgement for when I actually see it happen.

I seem to remember something about G** founders saying much the same thing a few years back as well. ;)

7:43 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Senior Member shak is a WebmasterWorld Top Contributor of All Time 10+ Year Member



In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages, linked to a porn site.

You know that's a WebmasterWorld member, right?

;)

Shak

7:48 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages

Wow, bully for Microsoft researchers. We've been debating scraping [webmasterworld.com] ...here in WW [webmasterworld.com] for yonks now.

<off topic> PC Magazine in the UK went bust. PC Mag in the US is still operational. Strange that the URL is pcmag.co.uk. </off topic>

7:50 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's part of the research behind that report.

[webmasterworld.com...]

8:47 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Administrator skibum is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Only 8%? Seems like a very low number, maybe more like 25-50%? Or maybe they are crawling a different Internet than Google.

8:52 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Senior Member essex_boy is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Wicked! Depends on what they define as spam.

Would like to know just out of interest.

9:15 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member



8%? That's reassuring.
.
.
.
.
.
That means the other 32% flies under the radar.

10:28 pm on Jun 10, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> means the other 32% flies under the radar

I'd bust out laughing, but you're probably 100% correct, and well, the more I consider it, the less humor I find in it.
:(

12:02 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Read the paper and you will see that these guys are about 4-5 years behind in their studies.

12:07 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages, linked to a porn site.

Back to the drawing board I go...

12:44 am on Jun 11, 2004 (gmt 0)

10+ Year Member



It's a good feeling to know that MS only detected 8% of pages as spam because the real number is probably right around 99. Everyone is safe.

12:46 am on Jun 11, 2004 (gmt 0)

10+ Year Member



I was just doing a search on MSDN and got a window asking me if I'd like to complete a survey on their search results: "please enter email address and a survey will be sent". Seemed worth mentioning here, even though it's not web results, just MSDN in-house SERPs.

2:31 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting read... they characterize "spam pages" as completely useless to humans so that may help explain the low number.

I now understand why some WebmasterWorld'ers were complaining of the MS bot hitting their site so hard, over and over. Looks like they have been live testing their spam detectors at our expense. Nice. I'll remember that. Payback's a _____.

2:47 am on Jun 11, 2004 (gmt 0)

10+ Year Member



Only 8%? Seems like a very low number, maybe more like 25-50%? Or maybe they are crawling a different Internet than Google.

Technically they are crawling a different Internet. There must be a huge number of sites that currently block msnbot from crawling their webpages.

3:13 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Read the paper and you will see that these guys are about 4-5 years behind in their studies.

They don't seem to have much of a clue about search engine operations either. The presence of the query in the URL is important, as a lot of search engines tend to work on a set of weightings assigned to each element of a page. It is possible to tweak a search engine's algorithm by changing the weighting for the URL.

I've read the paper. It is interesting in that some of its claims are reasonably accurate. However, the reference to query strings being present in the URLs and thus influencing results as "folklore" shows these guys up to be a bunch of clueless academics who know less about search engine operations than the average SEO.

The wildcard DNS spam is fairly old and can easily be detected by filters. Also these guys do not seem to have much of a clue about geo-location of websites either. One of the side-effects of the geo-location process is that you build up a very efficient database of IPs and websites. It allows clusters to be detected with a very simple SQL query. If Microsoft thinks that this kind of research will impress anyone other than academic tossers, then it is in for a very rude awakening. Search engine spam is a far more complex and evolving problem. Such filtering will probably detect a pile of spam but the time and resources taken will probably be wasted as spammers will move on. This would give the spammer technique approximately a two-cycle (or two Google dance) lucrative operational lifetime. The funny thing is that the paper and research probably took longer to do than the lucrative operational lifetime of the spammer technique.
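As a rough illustration of the kind of "very simple SQL query" meant here, the sketch below groups host names by IP address and reports the IPs that serve an unusually large number of them. The `hosts` table, its columns, the `find_ip_clusters` helper, and the threshold are all assumptions made up for this example; nothing here is taken from the paper or from any real search engine's schema.

```python
import sqlite3


def find_ip_clusters(rows, min_hosts=3):
    """Return (ip, host_count) pairs for IPs serving at least min_hosts host names.

    rows is an iterable of (hostname, ip) tuples. The table layout and the
    threshold are hypothetical, chosen only to illustrate the idea.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hosts (hostname TEXT, ip TEXT)")
    conn.executemany("INSERT INTO hosts VALUES (?, ?)", rows)
    # One GROUP BY is enough: count distinct host names per IP address and
    # keep only the IPs at or above the threshold.
    clusters = conn.execute(
        """
        SELECT ip, COUNT(DISTINCT hostname) AS host_count
        FROM hosts
        GROUP BY ip
        HAVING COUNT(DISTINCT hostname) >= ?
        ORDER BY host_count DESC, ip
        """,
        (min_hosts,),
    ).fetchall()
    conn.close()
    return clusters


if __name__ == "__main__":
    sample = [
        ("a.example.com", "192.0.2.1"),
        ("b.example.com", "192.0.2.1"),
        ("c.example.com", "192.0.2.1"),
        ("www.example.org", "192.0.2.2"),
    ]
    for ip, count in find_ip_clusters(sample):
        print(ip, count)
```

A real filter would need to allow for shared hosting, where many unrelated and perfectly legitimate sites sit on one IP, so a raw count like this is only a starting point.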

Regards...jmcc

4:29 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A Modest Proposal for SEO

I just read the M$ research paper on search, and quite frankly, I am scared. My economic viability depends on my ability to outwit, outplay, and outlast my competitors in my niche. I must keep ahead of the search engines and secure top spots in the SERPs. Each day I generate tens of thousands of dollars of wealth via the top spots for thousands of keywords, knowing -- nay, expecting -- that even an arbitrary Google burp could stop it all dead instantly, or at best after a single cache update.

As if the pain and torture of this uncertainty weren't enough to drive one insane, the associated stats-checking addiction burns like acid on the wound.

Now the big boys - the monster monopoly - the Gates of Hellfire - are onto us. They've figured out every trick, technique, slip, hole, and nuance in the SEO bag o' tricks. It's all there - right in the research paper. Got a URL with more than 3 dashes or dots? NAILED - they proved it is a SPAMMY SITE - you're toast, dude. Got a long URL? WHAMMO! U-R-banned! No more cashola. Nada. No donuts, dude. You lose. They got that covered.. they showed SCIENTIFICALLY that a mere 0.173% of all domain names are over 45 characters, and "the vast majority of those appear to be spam". Man, this stuff freaks me out. I knew they were smart, and with Google hiring all those Ph.D.s I've gotten accustomed to going toe-to-toe with brainiacs, but I never suspected it would get this serious. These guys are unstoppable!
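For anyone wondering how little code the checks being mocked here would take, a minimal sketch follows. The thresholds (more than 3 dashes, more than 3 dots, more than 45 characters) are the figures quoted in this post; the `hostname_flags` function, the flag names, and the choice to count dashes and dots separately are assumptions for illustration, not the paper's actual method.

```python
from urllib.parse import urlsplit


def hostname_flags(url, max_separators=3, max_length=45):
    """Return a list of heuristic flags raised by the host name of url.

    The URL must include a scheme (e.g. "http://"), otherwise urlsplit
    finds no host name and no flags are raised. Thresholds are the ones
    quoted in the thread; the flag names are hypothetical.
    """
    host = urlsplit(url).hostname or ""
    flags = []
    if host.count("-") > max_separators:
        flags.append("many_dashes")
    if host.count(".") > max_separators:
        flags.append("many_dots")
    if len(host) > max_length:
        flags.append("long_hostname")
    return flags


if __name__ == "__main__":
    for url in (
        "http://www.example.com/",
        "http://cheap-best-free-hot-deals.example.com/",
        "http://a.b.c.d.example.com/",
    ):
        print(url, hostname_flags(url))
```

Which is rather the point of the joke: a check this simple catches only the laziest offenders, and plenty of innocent host names along with them.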

Look at that paper - I beg you to pay attention! Got subdomains? They're onto you. Nobody uses subdomains except spammers! NAILED! These guys have achieved a level of sophistication so advanced, so beyond anything you have ever cogitated, that in their M$ research labs even "casual inspections" reveal spammy sites! Listen up guys, they are so advanced that manual spam site detection is so easy it ISN'T EVEN WORK FOR THEM! In their inner circles they are calling some SEOs "naive" and referring to sites as "templatic". Read the writing on the walls.... we are doomed.

Thought you could run circles around whois by rolling your own DNS? Gotcha covered! Go ahead and TRY to point 5,000 domains to a single IP - they'll catch you for sure! They even found one guy with 8,967,154 host names resolving to a single IP address! Is no one safe? Next thing you know they'll probably even figure out who it is! Is there nothing left?

Is your "average rate of page mutation" too high? Do your webpages completely change with every GET or POST to the same URI? They can even detect sites where almost every page changes almost completely every week. THERE IS NO WAY TO HIDE FROM THESE GUYS! (note to self - remove that updated $Dates/$time script thingy from the header pronto).
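To make "average rate of page mutation" a bit more concrete, here is one hedged way such a number could be computed. It uses the Jaccard distance between the word sets of successive snapshots of the same URL as a stand-in measure; that choice, and the function names `mutation_rate` and `average_mutation_rate`, are assumptions for this sketch and not the measure used in the paper.

```python
def mutation_rate(old_text, new_text):
    """Jaccard distance between the word sets of two snapshots.

    0.0 means the same set of words, 1.0 means no words in common.
    This is a stand-in measure chosen for illustration only.
    """
    old_words = set(old_text.lower().split())
    new_words = set(new_text.lower().split())
    union = old_words | new_words
    if not union:
        return 0.0  # two empty pages count as unchanged
    return 1.0 - len(old_words & new_words) / len(union)


def average_mutation_rate(snapshots):
    """Average the mutation rate over consecutive pairs of snapshots."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return 0.0  # fewer than two snapshots: nothing to compare
    return sum(mutation_rate(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    weekly = [
        "welcome to my page about widgets",
        "welcome to my page about widgets",
        "random scraped text linked elsewhere",
    ]
    print(average_mutation_rate(weekly))
```

A page whose only change is a date stamp in the header would score close to 0.0 under this measure, while a page regenerated from random scraped text on every fetch would score close to 1.0.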

THERE IS NO FUTURE FOR SEO!

Or is there? Perhaps we are not doomed as a species, fated to succumb to the superior intellect of the great M$ machine after all. Here I propose a CALL TO ACTION for every white, grey, gray, or black hatted SEO. Stand now, defend your ground! Preserve our race! Secure my independent wealth, and that of my cat!

I here propose something more diabolical, more hideously insane and absolutely unspeakable than anything ever uttered across the dark, beer-stained table tops of Pubcon. Hear me this, WebmasterWorld'ers. United in our resolve, and unified in our practice standards, we shall overcome this plageous assault. We have one last chance. One final moment to make a stand before the harbingers of righteous annihilation.

Starting today, you must make your websites seem normal. That's right - natural. Organic almost. As if, nourished by divine intervention, they were growing in the webring patches. Observe your daft amateur peers closely, and EMULATE. Start using Fro*n**age. Use <b> instead of <strong> -- every other time. Open your tags with <P>, and if you close them, close them with "little P". Forget you ever heard of <br/> - consider it of the curse! Hide <b> tags within <ol>'s - fit in with the crowd! Leave content stale, to rot for years so as to avoid detection. It may be your only hope for survival.

Bring back the dancing bears and busy shovels of "Under Construction"! Show these goliaths you know NOT the way of W3C, standards, and performance! Seek not the <abbrev> tag and 508 compliance, and bring forth the gruel! Slide those prancing shadow boxes! Bevel your bevels! Hide those subnavmenus deep, deep within the bowels of your site. Better yet - delink the nav randomly, as do your peers and the average man. Security through obscurity is the call! BE LIKE EVERYONE ELSE AND YOU TOO SHALL BE SAVED.

Praise the DNS honeypots in Hong Kong, Moldavia, Ivory Coast, and Tuvalia, for they busy the giants while we collect the coins. Like the chameleon and the Phasmida, our survival hinges on our ability to adapt, yet like the Dung beetle that adaptation must enable us to blend in with the crap. Heed the call, I beg of you.

4:48 am on Jun 11, 2004 (gmt 0)

10+ Year Member



paybacksa,

ROFLMAO, Thank you for a very (funny) post.

I think you hit the nail on the head - we are all doomed - M$ are way too smart for us dumbos - I mean when did we ever come up with anything new or inventive?

Ah well, it's back to selling newspapers on a cold street corner for me - thanks bill...

4:55 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Bring back the dancing bears and busy shovels of "Under Construction"!

Lol - that post is an absolute classic payback!

6:45 am on Jun 11, 2004 (gmt 0)

10+ Year Member



Leave content stale, to rot for years so as to avoid detection. It may be your only hope for survival.

Bring back the dancing bears and busy shovels of "Under Construction"!

RALMAO you've made my day!

7:37 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Administrator robert_charlton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



toe-to-toe

Watch those dashes... you're getting close to the magic number.

10:56 am on Jun 11, 2004 (gmt 0)

10+ Year Member



My favorite: "Bevel your bevels!"

11:09 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Excellent post, paybacksa! Very funny indeed :)

11:39 am on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member leosghost is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Magical post paybacksa.. Magical!

2:31 pm on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



funny post, indeed ;) But then those guys probably are not ALL THAT DUMB ....

Maybe they just want a couple of cracks to show off a bit? And don't forget: Everything they really wanted to get so far, they got it.

nerd

2:38 pm on Jun 11, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great post payback...I laughed...I'm scared too! ;)

3:12 pm on Jun 11, 2004 (gmt 0)



I knew never working out how to build self-replicating pages would be my saviour in the end.

Bring...it...on!

Dixon.

12:33 am on Jun 12, 2004 (gmt 0)

10+ Year Member



This could have the makings of a bad horror movie... MSN thinks they are analyzing a dead corpus, until it lunges off the table at them.

1:11 am on Jun 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Laugh all you want guys, but you're neglecting to account for a couple of things:

1. the bar is going to be raised

SEO isn't going to be a schmoe's game anymore.

You're going to have to develop a certain amount of sophistication when you've got 3 companies breathing down each other's necks to have the least amount of spam in their engines.

Yahoo, Google, and MSFT are all going to be learning and improving on one another.

2. That paper is just a baseline.

That research may have been 'naive' but you've got to realise that it's a baseline for them. They're not going to publish trade secrets on how to fight spam like that. The tricks they have up their sleeves, are just that - they are up their sleeves.

2:28 am on Jun 12, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



<shhhh....as always, the joking is a distraction.... lots of good clues out there about what's in the works....looking forward to the next phase for sure... >

This 48-message thread spans 2 pages.