
Forum Moderators: mack


Flavors of Spam

Some Kinds Are Worse Than Others

     

msndude

8:16 pm on Jun 24, 2006 (gmt 0)

10+ Year Member



Everyone complains about Spam, but the single term hides a multitude of different problems, and different people often seem to mean different things when they use it. I know how we use the term here at Microsoft, but I would be interested to hear your ideas about it; there seems to be enough difference of opinion to make for an interesting discussion.

Here are a few questions to get us started:

Does it make sense to talk about a hierarchy of spam? For example, at the bottom we could put pages that are so bad they’re completely useless. (E.g. a page of gibberish surrounded by ads.) At the top would be quality or authority pages that look great until you view the source or look at the inbound links.

CAN a quality or authority result ever really be spam?

Are affiliate sites “spam by definition?”

Is spam “worse than useless?” Is it worth losing a quality or authority result to get rid of a spam result?

I think we had a very productive discussion about quality and authority last week, so I’m hoping we can repeat that.

TypicalSurfer

7:28 pm on Jun 25, 2006 (gmt 0)

10+ Year Member



Spam is just an unwanted or unexpected result, one that has no use to the searcher.

You cannot define spam by how it got into a result set.

msndude

7:41 pm on Jun 25, 2006 (gmt 0)

10+ Year Member



The term "Spam" comes from the Monty Python song and refers to the repetition that drives most (or all?) Spam. A fair definition would be a bad search engine result caused by someone doing something over and over again where once should have been enough, but I'm not sure it'll work to just call ALL bad results "spam."

gregbo

9:28 pm on Jun 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I haven't seen any definition yet that comes close to achieving consensus on what 'spam' is and I doubt that I ever will. I do know that a majority can agree on what is relevant to a particular query.

Certainly, in the email world, there is no consensus on what spam is. The solutions proposed have all drawn their share of contention from parties that feel aggrieved by them.

steveb

9:48 pm on Jun 25, 2006 (gmt 0)

WebmasterWorld Senior Member steveb is a WebmasterWorld Top Contributor of All Time 10+ Year Member



Spam in a search engine context is just a rule. It's not something that needs consensus. It's a pronouncement. Google's definition covers everything perfectly. MSN could make a similar definition, or something else. If Joe Smith says "hidden text isn't spam", that doesn't matter. Joe can do what he wants. MSN can do what they want.

As we can see, some people would love to teleport to a planet where weak content is called spam, where you get shot for writing a mediocre movie review. That's just not helpful or sensible. "Spam" does not equal sucky quality. It's not exactly just a coincidence that this is the case, but the two aren't joined at the hip. (Other aspects of the algo deal with quality; "spam" does not need to be the ONE WORD that covers every reason for not ranking something.)

What I wish msndude would have asked is: "People use a lot of tactics to try and trick us into ranking their websites higher than they deserve. Which of these tactics are worse than others?"

Top of my list: any search result for site1.com that when you click the result you go to site2.com/?trackinglink1234
These results never have any merit.

msndude

9:52 pm on Jun 25, 2006 (gmt 0)

10+ Year Member



Okay :-)

People use a lot of tactics to try and trick us into ranking their websites higher than they deserve. Which of these tactics are worse than others?

Obscure JavaScript redirects are definitely more painful to US, of course.

katheesue

10:34 pm on Jun 25, 2006 (gmt 0)

10+ Year Member



I gave two really good specific spam examples in private messages to msndude around June 14th or 15th.

The first one, which is obvious spam, has moved from being number 1 to being number 2 for the search term, while the "real site" has moved from number 2 to number 22.

This is the case of the hacker from Poland who acquired several hundred Yahoo related properties and converted them to "nefarious purposes."

The other specific example has moved up from the third page of msn results to number 20. This is better than the number 1 ranking it had for the last two years but it still astounds me that the topic in question could be considered on topic for the kind of sites which are being promoted in the hidden links.

That site is still ranked number one on Yahoo.

Hidden links are always spam, whether done with CSS, matching text/background colors, or whatever.

Yahoo, MSN and Google all failed on this one, and Google continues to be the most colossal failure since they own the site and can turn it off any time they want.

idolw

10:34 pm on Jun 25, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



keyword-stuffed subdomains.
i like subdomains and use them but just for better organisation of my sites.
but MSN is full of single pages located on subdomains. and that sucks.

pages waiting for content
the 2nd largest travel site (that is what they call themselves) uses that technique to get more pages indexed and gets its pages ranked for search terms in google. try searching for some not very popular travel destinations in europe. the domain itself has a lot of trust so pages go up automatically.

BTW. MSNDude, do they force you to use MSN Search at work? ;-)

TypicalSurfer

11:38 pm on Jun 25, 2006 (gmt 0)

10+ Year Member



msndude, you're shifting gears here. You start broad with: what is spam?

Does it make sense to talk about a hierarchy of spam? For example, at the bottom we could put pages that are so bad they’re completely useless. (E.g. a page of gibberish surrounded by ads.) At the top would be quality or authority pages that look great until you view the source or look at the inbound links.

now you are fishing for something else: out spam techniques :0

People use a lot of tactics to try and trick us into ranking their websites higher than they deserve. Which of these tactics are worse than others?

Spam techniques will evolve with your ranking algo; on your end it's a moving target. You could just cave in and do a sandbox kind of thing like some other McEngine and return less than stellar results, or you could just watch crawl logs go by (tail), see it live, and make calls from that. It requires human intervention, but at the end of the day it's about humans vs. humans, not algos (which can be beat).

Ivory tower IR won't work with whole web crawls because there is no reliable classification system.

But I will give one tip:

go light on PhDs ;)

RichTC

12:04 am on Jun 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"Are you arguing that there is no spam other than junk pages?"

Nope, what I'm saying is that the majority of so-called spam is in fact junk: pages of no real use to the end user, like the one I sent you by sticky mail (junk rather than spam).

IMO it's only a small number of sites on the net that are spam sites, designed to get traffic to sell on or just to harvest email addresses whilst offering no genuine, original or useful content to the end user.

I don't think MSN needs to worry about spam as such but more about quality control. You have far more junk sites, sites with little content, blogs, sub-sub-sub-domain junk, non-authority sites ranking over authority sites and general dross listed in your serps than you have spam sites. I just think the word "Spam" gets associated with "Junk".

If a serps result takes me to site a) and I get JavaScript-redirected to site b), then fair enough, take the relevant action.

If a site has nothing but location subdomains listed on it, it's spam; take the relevant action.

If a site is thin content, coming soon, a blog, low quality or of no use to the end user, then it's junk; again, take the relevant action.

If a site has lots of genuine content and links to it, but perhaps has high keyword density on a page and has relevant outbounds, then it's unlikely to be spam imo. But this kind of situation can be easily mistaken: remember, spam teams can easily check the density levels of autogenerated pages, whilst a webmaster adding content to their own site is not as likely to, imo.
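For what it's worth, the density check mentioned above is simple to approximate. Here is a minimal Python sketch, assuming "density" means the share of words on a page that match a single keyword. The function name `keyword_density` and the tokenizing regex are illustrative assumptions, not how any search engine's spam team actually measures it:

```python
import re
from collections import Counter


def keyword_density(text, keyword):
    """Return the fraction of words in `text` equal to `keyword` (case-insensitive)."""
    # Crude tokenizer (an assumption): runs of letters, digits and apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)


page = "blue widgets blue widgets buy blue"
print(keyword_density(page, "blue"))
```

A number like this only says how often a word repeats; as the post argues, a high value on a page with genuine content is not by itself proof of spam.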

What you don't want is good, content-deep sites being held back in your serps because they trip a couple of the filters you put in to stop spam, whilst thin-content junk sites slip through, which is currently what I see in a number of search results.

Quality control is key imo. Perhaps striking a deal with Yahoo to use their directory data (which imo is the only large unbiased directory on the net, rather than outdated dmoz) OR better still start working on your own directory would be a way to introduce some additional quality control to your serps.

Any automated system you introduce will struggle imo to weed out all junk and spam because you don't have the history to your search data or link data that Google and Yahoo have.

katheesue

12:12 am on Jun 26, 2006 (gmt 0)

10+ Year Member



>OR better still start working on your own
>directory would be a way to introduce some
>additional quality control to your serps.

Now there is an interesting comment.

MSN does have a directory for listing small business sites but it does not appear to be valued very much by Y, G, or M.

msndude

5:31 am on Jun 26, 2006 (gmt 0)

10+ Year Member



IdolW: Nope. Microsofties are free to use whatever Search Engine they please. We do encourage people to try ours first and only use a competing one if ours fails them, but Microsoft doesn’t compel anyone to do this.

TypicalSurfer: I shifted gears in response to a request from steveb. Really, though, I don’t mean to dominate the discussion. And I certainly didn’t expect people to tell me their secret SEO tricks. :-)

RichTC: You are correct that there is far more junk than spam. And it is very hard to keep from throwing the baby out with the bathwater.

oaktown

5:12 pm on Jun 26, 2006 (gmt 0)

10+ Year Member



FWIW, I tend to think of SERPs as being composed of five elements:

1) Great Stuff: I searched for "buy blue widgets" and I get a search result with a link to a site where I can "buy" "widgets" that are "blue". The selection of "widgets" is broad and the content is recent.

2) Good Stuff: I searched for "buy blue widgets" and I get a search result with a link to a site where I can "buy" a "widget" that is "blue".

3) Junk: I searched for "buy blue widgets" and I get a search result with a link to a site with an article about a company that made "widgets" declaring bankruptcy, that was written in 2004.

4) Crap: I searched for "buy blue widgets" and I get a search result with a link to a site where I can "learn about" "vaguely widget-like thingamajiggers" that are "green".

5) Spam: I searched for "buy blue widgets" and I get search results with a link to a site where I can pick from hundreds of links to hundreds of one-page sites or sub-domains, each of which channels me into whatever affiliate program the "Spammer" is promoting. (Not all affiliate programs are spam.)

(Ends rant, climbs down from soapbox)

timster

5:41 pm on Jun 26, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The worst kind of spamming is not gibberish, but when someone steals content and then uses some SEO tricks to (try to) drown out the original content.

Gibberish pages are reprehensible, but not as bad as really stealing content, because stealing content does more harm to the parties who create the content that makes the web useful.

At first blush, a page of "borrowed" content might not seem terrible (from the user's perspective at least) but who will make tomorrow's content if the fruits of labor get stolen? (Less new original content is bad for the search engine, too.)

rohitj

6:53 pm on Jun 26, 2006 (gmt 0)

10+ Year Member



An interesting issue brought up in another forum (#*$!) was that France's legal treatment of spam is quite different. They allow certain businesses to send unsolicited emails to businesses that may be in the same niche and/or be linked to their product. The wording was somewhat unclear since my French is hazy, but this did cause some headaches for web hosts. Some ultimately chose to abide by the laws where their server is located, whereas others cited their AUP/TOS, wherein such disparities were discussed.

econman

6:54 pm on Jun 26, 2006 (gmt 0)

10+ Year Member



Generally, when I think of Spam in a search engine context, I think about fake content which has been assembled in great volume. But, that isn't the only type of SE Spam.

SE Spam usually involves deception (such as fake content) for the purpose of tricking the search engines into giving a site more visibility than it would otherwise have.

Spam and Quality are two concepts that tend to go hand in hand, but they are not the same thing.

A site may have high quality but it may rely on Spam techniques to get more traffic (e.g. creating thousands of fake web pages to inflate the apparent size of the site, or to create the appearance of more link popularity than actually exists).

Spam is detrimental to the search engines and their customers because it creates "noise" or pollution, which makes it harder to find the best documents matching any given query.

Spam is also detrimental to society for roughly the same reasons that vandalism, theft, and mislabeling of merchandise are detrimental – resources are poured into activities that do not contribute any real value. These unproductive activities are profitable for the entity engaging in the activity. Furthermore, if the Spam is successful, it becomes harder, or impossible, for competing sites that don't engage in deceptive practices to survive or prosper.

This 51-message thread spans 4 pages.
 
