Forum Moderators: mack
In a sample of one billion web pages, Microsoft claims that eight per cent are spam. In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages, linked to a porn site.
<off topic> PC Magazine in the UK went bust. PC Mag in the US is still operational. Strange that the URL is pcmag.co.uk. </off topic>
[webmasterworld.com...]
I now understand why some WebmasterWorld'ers were complaining of the MS bot hitting their site so hard, over and over. Looks like they have been live testing their spam detectors at our expense. Nice. I'll remember that. Payback's a _____.
Read the paper and you will see that these guys are about 4-5 years behind in their studies. They don't seem to have much of a clue about search engine operations either. The presence of the query in the URL is important, as a lot of search engines tend to work on a set of weightings assigned to each element of a page. It is possible to tweak a search engine's algorithm by changing the weighting for the URL.
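To make the weighting point concrete, here is a minimal sketch of per-element scoring. The field names, the weight values, and the `score` function are all made up for illustration; no real engine publishes its weights, and real ranking is far more involved than counting substrings.

```python
# Hypothetical per-field weights (assumed values, not taken from any real engine).
FIELD_WEIGHTS = {"url": 3.0, "title": 2.0, "body": 1.0}


def score(page, query, weights=FIELD_WEIGHTS):
    """Sum, over fields, weight * number of query-term occurrences in that field."""
    terms = query.lower().split()
    total = 0.0
    for field, weight in weights.items():
        text = page.get(field, "").lower()
        # Simple substring counting; a real engine would tokenize properly.
        total += weight * sum(text.count(term) for term in terms)
    return total


page = {
    "url": "example.com/blue-widgets",
    "title": "Blue widgets for sale",
    "body": "We sell widgets in blue and red.",
}
print(score(page, "blue widgets"))
```

Raising or lowering the `"url"` weight changes how much a query term in the URL counts, which is the kind of tweak described above.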
I've read the paper. It is interesting in that some of its claims are reasonably accurate. However, the reference to query strings being present in the URLs and thus influencing results as "folklore" shows these guys up to be a bunch of clueless academics who know less about search engine operations than the average SEO.
The wildcard DNS spam is fairly old and can easily be detected by filters. Also, these guys do not seem to have much of a clue about geo-location of websites either. One of the side-effects of the geo-location process is that you build up a very efficient database of IPs and websites. It allows clusters to be detected with a very simple SQL query. If Microsoft thinks that this kind of research will impress anyone other than academic tossers, then it is in for a very rude awakening. Search engine spam is a far more complex and evolving problem. Such filtering will probably detect a pile of spam, but the time and resources taken will probably be wasted as spammers will move on. This would give the spammer technique approximately a two-cycle (or two Google dance) lucrative operational lifetime. The funny thing is that the paper and research probably took longer to do than the lucrative operational lifetime of the spammer technique.
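As a rough illustration of the "very simple SQL query" idea, the sketch below loads hostname/IP pairs into an in-memory SQLite table and groups by IP. The table name `sites`, its columns, the `min_hosts` threshold, and the sample rows (documentation-range IPs) are all assumptions for the example, not anything from the paper or a real geo-location database.

```python
import sqlite3


def hosts_per_ip(rows, min_hosts=3):
    """Return (ip, host_count) pairs for IPs serving at least min_hosts hostnames."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one row per hostname with the IP it resolves to.
    conn.execute("CREATE TABLE sites (hostname TEXT PRIMARY KEY, ip TEXT)")
    conn.executemany("INSERT INTO sites VALUES (?, ?)", rows)
    query = """
        SELECT ip, COUNT(*) AS host_count
        FROM sites
        GROUP BY ip
        HAVING COUNT(*) >= ?
        ORDER BY host_count DESC, ip
    """
    result = conn.execute(query, (min_hosts,)).fetchall()
    conn.close()
    return result


rows = [
    ("a.example.com", "192.0.2.10"),
    ("b.example.com", "192.0.2.10"),
    ("c.example.com", "192.0.2.10"),
    ("www.example.org", "198.51.100.7"),
]
print(hosts_per_ip(rows))  # [('192.0.2.10', 3)]
```

A high count on its own is not proof of spam, since shared hosting also puts many sites on one IP, so the threshold would need tuning against real data.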
Regards...jmcc
I just read the M$ research paper on search, and quite frankly, I am scared. My economic viability depends on my ability to outwit, outplay, and outlast my competitors in my niche. I must keep ahead of the search engines and secure top spots in the SERPs. Each day I generate tens of thousands of dollars of wealth via the top spots for thousands of keywords, knowing -- nay, expecting -- that even an arbitrary Google burp could stop it all dead instantly, or at best after a single cache update.
As if the pain and torture of this uncertainty weren't enough to drive one insane, the associated stats checking addiction burns like acid on the wound.
Now the big boys - the monster monopoly - the Gates of Hellfire - are onto us. They've figured out every trick, technique, slip, hole, and nuance in the SEO bag o' tricks. It's all there - right in the research paper. Got a URL with more than 3 dashes or dots? NAILED - they proved it is a SPAMMY SITE - you're toast, dude. Got a long URL? WHAMMO! U-R-banned! No more cashola. Nada. No donuts, dude. You lose. They got that covered.. they showed SCIENTIFICALLY that a mere 0.173% of all domain names are over 45 characters, and "the vast majority of those appear to be spam". Man, this stuff freaks me out. I knew they were smart, and with Google hiring all those Ph.D.s I've gotten accustomed to going toe-to-toe with brainiacs, but I never suspected it would get this serious. These guys are unstoppable!
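For anyone who wants to see how little machinery those URL checks need, here is a rough sketch. The two thresholds are the figures quoted above; counting dashes and dots separately, looking only at the hostname, and the `url_flags` name itself are my assumptions, not the paper's actual method.

```python
from urllib.parse import urlsplit

MAX_HOST_LENGTH = 45  # "over 45 characters", as quoted above
MAX_SEPARATORS = 3    # "more than 3 dashes or dots", as quoted above


def url_flags(url):
    """Return a list of heuristic flags raised by the URL's hostname."""
    # urlsplit only fills in hostname when the URL includes a scheme.
    host = urlsplit(url).hostname or ""
    flags = []
    if len(host) > MAX_HOST_LENGTH:
        flags.append("long-hostname")
    if host.count("-") > MAX_SEPARATORS:
        flags.append("many-dashes")
    if host.count(".") > MAX_SEPARATORS:
        flags.append("many-dots")
    return flags


print(url_flags("http://www.example.com/"))
print(url_flags("http://cheap-blue-widgets-buy-now.a.b.c.example.com/"))
```

The first call raises no flags; the second trips both the dash and the dot checks. Flags like these would only ever be one signal among many, since plenty of legitimate hostnames contain several dashes.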
Look at that paper - I beg you to pay attention! Got subdomains? They're onto you. Nobody uses subdomains except spammers! NAILED! These guys have achieved a level of sophistication so advanced, so beyond anything you have ever cogitated, that in their M$ research labs even "casual inspections" reveal spammy sites! Listen up guys, they are so advanced that manual spam site detection is so easy it ISN'T EVEN WORK FOR THEM! In their inner circles they are calling some SEOs "naive" and referring to sites as "templatic". Read the writing on the walls.... we are doomed.
Thought you could run circles around whois by rolling your own DNS? Gotcha covered! Go ahead and TRY to point 5,000 domains to a single IP - they'll catch you for sure! They even found one guy with 8,967,154 host names resolving to a single IP address! Is no one safe? Next thing you know they'll probably even figure out who it is! Is there nothing left?
Is your "average rate of page mutation" too high? Do your webpages completely change with every GET or POST to the same URI? They can even detect sites where almost every page changes almost completely every week. THERE IS NO WAY TO HIDE FROM THESE GUYS! (note to self - remove that updated $Dates/$time script thingy from the header pronto).
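A back-of-the-envelope version of that "page mutation" measurement might look like the sketch below. It compares successive snapshots of one URL by the Jaccard distance between their word sets and averages the result. The metric, the function names, and the sample snapshots are assumptions for illustration; the paper may well measure change differently.

```python
def change_ratio(old, new):
    """Jaccard distance between two snapshots' word sets (0 = same, 1 = disjoint)."""
    a, b = set(old.lower().split()), set(new.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def average_mutation(snapshots):
    """Mean change ratio over consecutive snapshots of the same URL."""
    pairs = list(zip(snapshots, snapshots[1:]))
    if not pairs:
        return 0.0
    return sum(change_ratio(old, new) for old, new in pairs) / len(pairs)


stable = ["welcome to my site", "welcome to my site"]
churning = ["alpha beta gamma", "delta epsilon zeta", "eta theta iota"]
print(average_mutation(stable))    # 0.0
print(average_mutation(churning))  # 1.0
```

A page that only swaps out a date stamp scores close to 0, while one stitched together from fresh random text on every fetch scores close to 1.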
THERE IS NO FUTURE FOR SEO!
Or is there? Perhaps we are not doomed as a species, fated to succumb to the superior intellect of the great M$ machine after all. Here I propose a CALL TO ACTION for every white, grey, gray, or black hatted SEO. Stand now, defend your ground! Preserve our race! Secure my independent wealth, and that of my cat!
I here propose something more diabolical, more hideously insane and absolutely unspeakable than anything ever uttered across the dark, beer-stained table tops of Pubcon. Hear me this, WebmasterWorld'ers. United in our resolve, and unified in our practice standards, we shall overcome this plageous assault. We have one last chance. One final moment to make a stand before the harbingers of righteous annihilation.
Starting today, you must make your websites seem normal. That's right - natural. Organic almost. As if, nourished by divine intervention, they were growing in the webring patches. Observe your daft amateur peers closely, and EMULATE. Start using Fro*n**age. Use <b> instead of <strong> -- every other time. Open your tags with <P>, and if you close them, close them with "little P". Forget you ever heard of <br/> - consider it of the curse! Hide <b> tags within <ol>'s - fit in with the crowd! Leave content stale, to rot for years so as to avoid detection. It may be your only hope for survival.
Bring back the dancing bears and busy shovels of "Under Construction"! Show these goliaths you know NOT the way of W3C, standards, and performance! Seek not the <abbrev> tag and 508 compliance, and bring forth the gruel! Slide those prancing shadow boxes! Bevel your bevels! Hide those subnavmenus deep, deep within the bowels of your site. Better yet - delink the nav randomly, as do your peers and the average man. Security through obscurity is the call! BE LIKE EVERYONE ELSE AND YOU TOO SHALL BE SAVED.
Praise the DNS honeypots in Hong Kong, Moldavia, Ivory Coast, and Tuvalia, for they busy the giants while we collect the coins. Like the chameleon and the Phasmida, our survival hinges on our ability to adapt, yet like the Dung beetle that adaptation must enable us to blend in with the crap. Heed the call, I beg of you.
ROFLMAO, Thank you for a very (funny) post.
I think you hit the nail on the head - we are all doomed - M$ are way too smart for us dumbos - I mean when did we ever come up with anything new or inventive?
Ah well, it's back to selling newspapers on a cold street corner for me - thanks Bill...
Bring...it...on!
Dixon.
1. the bar is going to be raised
SEO isn't going to be a schmoe's game anymore.
You're going to have to develop a certain amount of sophistication when you've got 3 companies breathing down each other's necks to have the least amount of spam in their engines.
Yahoo, Google, and MSFT are all going to be learning and improving on one another.
2. That paper is just a baseline.
That research may have been 'naive' but you've got to realise that it's a baseline for them. They're not going to publish trade secrets on how to fight spam like that. The tricks they have up their sleeves are just that - they are up their sleeves.