
Bing Search Engine News Forum

This 48 message thread spans 2 pages.
MSN Search Claims to Freeze Out Web Spam
Brett_Tabke




msg:1537521
 7:26 pm on Jun 10, 2004 (gmt 0)

[pcmag.co.uk...]

In a sample of one billion web pages, Microsoft claims that eight per cent are spam.

In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages, linked to a porn site.


 

Philosopher




msg:1537522
 7:38 pm on Jun 10, 2004 (gmt 0)

Well, I'll reserve judgement for when I actually see it happen.

I seem to remember something about G** founders saying much the same thing a few years back as well. ;)

Shak




msg:1537523
 7:43 pm on Jun 10, 2004 (gmt 0)

In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages, linked to a porn site.

You know that's a WebmasterWorld member, right?

;)

Shak

Macro




msg:1537524
 7:48 pm on Jun 10, 2004 (gmt 0)

In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages

Wow, bully for Microsoft researchers. We've been debating scraping [webmasterworld.com] ...here in WW [webmasterworld.com] for yonks now.

<off topic> PC Magazine in the UK went bust. PC Mag in the US is still operational. Strange that the URL is pcmag.co.uk. </off topic>

msgraph




msg:1537525
 7:50 pm on Jun 10, 2004 (gmt 0)

here's part of the research on that report.

[webmasterworld.com...]

skibum




msg:1537526
 8:47 pm on Jun 10, 2004 (gmt 0)

Only 8%? Seems like a very low number, maybe more like 25-50%? Or maybe they're crawling a different Internet than Google.

Essex_boy




msg:1537527
 8:52 pm on Jun 10, 2004 (gmt 0)

Wicked! Depends on what they define as spam.

Would like to know just out of interest.

Webwork




msg:1537528
 9:15 pm on Jun 10, 2004 (gmt 0)

8%? That's reassuring.
.
.
.
.
.
That means the other 32% flies under the radar.

kevinpate




msg:1537529
 10:28 pm on Jun 10, 2004 (gmt 0)

> means the other 32% flies under the radar

I'd bust out laughing, but you're probably 100% correct, and well, the more I consider it, the less humor I find in it.
:(

msgraph




msg:1537530
 12:02 am on Jun 11, 2004 (gmt 0)

Read the paper and you will see that these guys are about 4-5 years behind in their studies.

edit_g




msg:1537531
 12:07 am on Jun 11, 2004 (gmt 0)

In one case, the Microsoft researchers claim to have found a webpage in Germany that would constantly create pages filled with pieces of text that were copied from random web pages, linked to a porn site.

Back to the drawing board I go...

iblaine




msg:1537532
 12:44 am on Jun 11, 2004 (gmt 0)

It's a good feeling to know that MS only detected 8% of pages as spam because the real number is probably right around 99. Everyone is safe.

TheDave




msg:1537533
 12:46 am on Jun 11, 2004 (gmt 0)

Just doing a search on MSDN and got a window asking me if I'd like to complete a survey on their search results, "please enter email address and a survey will be sent". Seemed worth mentioning here, even though it's not web results, just MSDN in-house SERPs.

paybacksa




msg:1537534
 2:31 am on Jun 11, 2004 (gmt 0)

Interesting read... they characterize "spam pages" as completely useless to humans so that may help explain the low number.

I now understand why some WebmasterWorld'ers were complaining of the MS bot hitting their site so hard, over and over. Looks like they have been live testing their spam detectors at our expense. Nice. I'll remember that. Payback's a _____.

Kerrin




msg:1537535
 2:47 am on Jun 11, 2004 (gmt 0)

Only 8%? Seems like a very low number, maybe more like 25-50%? Or maybe they're crawling a different Internet than Google.

Technically they are crawling a different Internet. There must be a huge number of sites that currently block msnbot from crawling their webpages.

jmccormac




msg:1537536
 3:13 am on Jun 11, 2004 (gmt 0)

Read the paper and you will see that these guys are about 4-5 years behind in their studies.
They don't seem to have much of a clue about search engine operations either. The presence of the query in the URL is important, as a lot of search engines tend to work on a set of weightings assigned to each element of a page. It is possible to tweak a search engine's algorithm by changing the weighting for the URL.

I've read the paper. It is interesting in that some of its claims are reasonably accurate. However, the reference to query strings being present in the URLs and thus influencing results as "folklore" shows these guys up to be a bunch of clueless academics who know less about search engine operations than the average SEO.

The wildcard DNS spam is fairly old and can easily be detected by filters. Also, these guys do not seem to have much of a clue about geo-location of websites either. One of the side-effects of the geo-location process is that you build up a very efficient database of IPs and websites. It allows clusters to be detected with a very simple SQL query. If Microsoft thinks that this kind of research will impress anyone other than academic tossers, then it is in for a very rude awakening. Search engine spam is a far more complex and evolving problem. Such filtering will probably detect a pile of spam, but the time and resources taken will probably be wasted as spammers will move on. This would give the spammer technique approximately a two-cycle (or two Google dance) lucrative operational lifetime. The funny thing is that the paper and research probably took longer to do than the lucrative operational lifetime of the spammer technique.
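For readers wondering what such a "very simple SQL query" might look like, here is a minimal sketch, not the poster's actual setup. It assumes a hypothetical table `site_ip(hostname, ip)` built during geo-location, and the function name `hosts_per_ip` and the sample addresses (documentation ranges) are made up for illustration. It groups host names by IP and reports the IPs that serve many of them.

```python
import sqlite3


def hosts_per_ip(rows, min_hosts=2):
    """Return (ip, host_count) pairs for IPs serving at least min_hosts host names.

    rows is an iterable of (hostname, ip) tuples. The table layout is an
    assumption made for this sketch, not something described in the thread.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE site_ip (hostname TEXT PRIMARY KEY, ip TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO site_ip (hostname, ip) VALUES (?, ?)", rows)
    # Cluster detection: count host names per IP and keep the crowded ones.
    clusters = conn.execute(
        """
        SELECT ip, COUNT(*) AS host_count
        FROM site_ip
        GROUP BY ip
        HAVING COUNT(*) >= ?
        ORDER BY host_count DESC, ip
        """,
        (min_hosts,),
    ).fetchall()
    conn.close()
    return clusters


# Hypothetical sample data: three host names share one IP, one stands alone.
sample = [
    ("a.example.com", "192.0.2.10"),
    ("b.example.com", "192.0.2.10"),
    ("c.example.com", "192.0.2.10"),
    ("www.example.org", "198.51.100.7"),
]
print(hosts_per_ip(sample))
```

A high count is only a hint, since shared hosting also puts many legitimate sites on one IP, so the `min_hosts` threshold would need tuning against real data.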

Regards...jmcc

paybacksa




msg:1537537
 4:29 am on Jun 11, 2004 (gmt 0)

A Modest Proposal for SEO

I just read the M$ research paper on search, and quite frankly, I am scared. My economic viability depends on my ability to outwit, outplay, and outlast my competitors in my niche. I must keep ahead of the search engines and secure top spots in the SERPs. Each day I generate tens of thousands of dollars of wealth via the top spots for thousands of keywords, knowing -- nay, expecting -- that even an arbitrary Google burp could stop it all dead instantly, or at best after a single cache update.

As if the pain and torture of this uncertainty weren't enough to drive one insane, the associated stats checking addiction burns like acid on the wound.

Now the big boys - the monster monopoly - the Gates of Hellfire - are onto us. They've figured out every trick, technique, slip, hole, and nuance in the SEO bag o' tricks. It's all there - right in the research paper. Got a URL with more than 3 dashes or dots? NAILED - they proved it is a SPAMMY SITE - you're toast, dude. Got a long URL? WHAMMO! U-R-banned! No more cashola. Nada. No donuts, dude. You lose. They got that covered.. they showed SCIENTIFICALLY that a mere 0.173% of all domain names are over 45 characters, and "the vast majority of those appear to be spam". Man, this stuff freaks me out. I knew they were smart, and with Google hiring all those Ph.D.s I've gotten accustomed to going toe-to-toe with brainiacs, but I never suspected it would get this serious. These guys are unstoppable!
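For the curious, the host name checks being lampooned above can be sketched in a few lines. This is a toy illustration, not the paper's classifier: the function name `hostname_flags` is made up, the thresholds (more than 3 dashes or dots, more than 45 characters) are taken from the post's own paraphrase, and counting dashes and dots separately is an assumption.

```python
def hostname_flags(hostname, max_punct=3, max_length=45):
    """Return the list of heuristics a host name trips.

    Thresholds follow the post's paraphrase and are assumptions, not the
    paper's actual parameters. Dashes and dots are counted separately.
    """
    flags = []
    if hostname.count("-") > max_punct:
        flags.append("dashes")
    if hostname.count(".") > max_punct:
        flags.append("dots")
    if len(hostname) > max_length:
        flags.append("length")
    return flags


# Hypothetical host names for illustration only.
for name in (
    "www.example.com",
    "buy-cheap-widgets-online-now.example.com",
    "a.b.c.d.example.com",
):
    print(name, hostname_flags(name))
```

Any one flag on its own says little, since plenty of ordinary sites have long or hyphenated names, which is rather the point the post is making.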

Look at that paper - I beg you to pay attention! Got subdomains? They're onto you. Nobody uses subdomains except spammers! NAILED! These guys have achieved a level of sophistication so advanced, so beyond anything you have ever cogitated, that in their M$ research labs even "casual inspections" reveal spammy sites! Listen up guys, they are so advanced that manual spam site detection is so easy it ISN'T EVEN WORK FOR THEM! In their inner circles they are calling some SEOs "naive" and referring to sites as "templatic". Read the writing on the walls.... we are doomed.

Thought you could run circles around whois by rolling your own DNS? Gotcha covered! Go ahead and TRY to point 5,000 domains to a single IP - they'll catch you for sure! They even found one guy with 8,967,154 host names resolving to a single IP address! Is no one safe? Next thing you know they'll probably even figure out who it is! Is there nothing left?

Is your "average rate of page mutation" too high? Do your webpages completely change with every GET or POST to the same URI? They can even detect sites where almost every page changes almost completely every week. THERE IS NO WAY TO HIDE FROM THESE GUYS! (note to self - remove that updated $Dates/$time script thingy from the header pronto).

THERE IS NO FUTURE FOR SEO!

Or is there? Perhaps we are not doomed as a species, fated to succumb to the superior intellect of the great M$ machine after all. Here I propose a CALL TO ACTION for every white, grey, gray, or black hatted SEO. Stand now, defend your ground! Preserve our race! Secure my independent wealth, and that of my cat!

I here propose something more diabolical, more hideously insane and absolutely unspeakable than anything ever uttered across the dark, beer-stained table tops of Pubcon. Hear me this, WebmasterWorld'ers. United in our resolve, and unified in our practice standards, we shall overcome this plageous assault. We have one last chance. One final moment to make a stand before the harbingers of righteous annihilation.

Starting today, you must make your websites seem normal. That's right - natural. Organic almost. As if, nourished by divine intervention, they were growing in the webring patches. Observe your daft amateur peers closely, and EMULATE. Start using Fro*n**age. Use <b> instead of <strong> -- every other time. Open your tags with <P>, and if you close them, close them with "little P". Forget you ever heard of <br/> - consider it of the curse! Hide <b> tags within <ol>'s - fit in with the crowd! Leave content stale, to rot for years so as to avoid detection. It may be your only hope for survival.

Bring back the dancing bears and busy shovels of "Under Construction"! Show these goliaths you know NOT the way of W3C, standards, and performance! Seek not the <abbrev> tag and 508 compliance, and bring forth the gruel! Slide those prancing shadow boxes! Bevel your bevels! Hide those subnavmenus deep, deep within the bowels of your site. Better yet - delink the nav randomly, as do your peers and the average man. Security through obscurity is the call! BE LIKE EVERYONE ELSE AND YOU TOO SHALL BE SAVED.

Praise the DNS honeypots in Hong Kong, Moldavia, Ivory Coast, and Tuvalia, for they busy the giants while we collect the coins. Like the chameleon and the Phasmida, our survival hinges on our ability to adapt, yet like the Dung beetle that adaptation must enable us to blend in with the crap. Heed the call, I beg of you.

PhraSEOlogy




msg:1537538
 4:48 am on Jun 11, 2004 (gmt 0)

paybacksa,

ROFLMAO, Thank you for a very (funny) post.

I think you hit the nail on the head - we are all doomed - M$ are way too smart for us dumbos - I mean when did we ever come up with anything new or inventive?

Ah well, it's back to selling newspapers on a cold street corner for me - thanks bill...

edit_g




msg:1537539
 4:55 am on Jun 11, 2004 (gmt 0)

Bring back the dancing bears and busy shovels of "Under Construction"!

Lol - that post is an absolute classic payback!

sanity




msg:1537540
 6:45 am on Jun 11, 2004 (gmt 0)

Leave content stale, to rot for years so as to avoid detection. It may be your only hope for survival.

Bring back the dancing bears and busy shovels of "Under Construction"!

RALMAO you've made my day!

Robert Charlton




msg:1537541
 7:37 am on Jun 11, 2004 (gmt 0)

toe-to-toe

Watch those dashes... you're getting close to the magic number.

ubaldo




msg:1537542
 10:56 am on Jun 11, 2004 (gmt 0)

My favorite: "Bevel your bevels!"

Macro




msg:1537543
 11:09 am on Jun 11, 2004 (gmt 0)

excellent post, paybacksa! Very funny indeed :)

Leosghost




msg:1537544
 11:39 am on Jun 11, 2004 (gmt 0)

Magical post paybacksa.. Magical!

the_nerd




msg:1537545
 2:31 pm on Jun 11, 2004 (gmt 0)

funny post, indeed ;) But then those guys probably are not ALL THAT DUMB ....

Maybe they just want a couple of cracks to show off a bit? And don't forget: Everything they really wanted to get so far, they got it.

nerd

stuntdubl




msg:1537546
 2:38 pm on Jun 11, 2004 (gmt 0)

Great post payback...I laughed...I'm scared too!;)

Receptional




msg:1537547
 3:12 pm on Jun 11, 2004 (gmt 0)

I knew never working out how to build self replicating pages would be my saviour in the end.

Bring...it...on!

Dixon.

sean




msg:1537548
 12:33 am on Jun 12, 2004 (gmt 0)

This could have the makings of a bad horror movie... MSN thinks they are analyzing a dead corpus, until it lunges off the table at them.

blaze




msg:1537549
 1:11 am on Jun 12, 2004 (gmt 0)

Laugh all you want guys, but you're neglecting to account for a couple of things:

1. the bar is going to be raised

SEO isn't going to be a schmoe's game anymore.

You're going to have to develop a certain amount of sophistication when you've got 3 companies breathing down each other's necks to have the least amount of spam in their engines.

yahoo, google, and msft are all going to be learning and improving on one another.

2. That paper is just a baseline.

That research may have been 'naive' but you've got to realise that it's a baseline for them. They're not going to publish trade secrets on how to fight spam like that. The tricks they have up their sleeves, are just that - they are up their sleeves.

paybacksa




msg:1537550
 2:28 am on Jun 12, 2004 (gmt 0)

<shhhh....as always, the joking is a distraction.... lots of good clues out there about what's in the works....looking forward to the next phase for sure... >

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved