| This 51 message thread spans 2 pages: < < 51 ( 1  ) || |
|Flavors of Spam|
Some Kinds Are Worse Than Others
Everyone complains about Spam, but the single term hides a multitude of different problems, and different people often seem to mean different things when they use it. I know how we use the term here at Microsoft, but I would be interested to hear your ideas about it; there seems to be enough difference of opinion to make for an interesting discussion.
Here are a few questions to get us started:
Does it make sense to talk about a hierarchy of spam? For example, at the bottom we could put pages that are so bad they’re completely useless. (E.g. a page of gibberish surrounded by ads.) At the top would be quality or authority pages that look great until you view the source or look at the inbound links.
CAN a quality or authority result ever really be spam?
Are affiliate sites “spam by definition?”
Is spam “worse than useless?” Is it worth losing a quality or authority result to get rid of a spam result?
I think we had a very productive discussion about quality and authority last week, so I’m hoping we can repeat that.
>> CAN a quality or authority result ever really be spam?
>> Are affiliate sites “spam by definition?”
>> Is it worth losing a quality or authority result to get rid of a spam result?
My definition of SE spam is when I search for a particular kw and get a high-ranking result (well, any webpage) that is not on the kw subject. I also consider as spam websites/pages which are designed only to get you in the door, where your only option is then to select from a list of other links to maybe find something of use elsewhere. This includes MFA sites.
I think it's kind of a meaningless question in some sense because "spam" doesn't really mean all that much in the context of search engines. It's an imported term, and in its broadest sense people are just using it to mean "anything that deliberately attempts to alter search engine results from their pristine state of nature."
I don't think trying to define it is a useful question to be asking from a search engine's perspective. Arguing about what is spam and what is junk and whether there is a difference semantically doesn't really help. The question I would be asking is "What attempts to alter our SERPS do we want to do something about, and if so what?"
That's probably harder to hold a public discussion about (at least for the latter part of the question). But what I would suggest the important things a search engine needs to keep in mind when fighting "spam" are:
1) The purpose of fighting spam is to benefit the end-user in the sense of improving the quality of results.
Some kinds of "spam" are inherently pointless to the user. An affiliate site with a bunch of text copied from a datafeed is just clutter. A scraper site with random things thrown in so they can show adsense on every conceivable term on a topic is utterly useless. On the other hand, some kinds of "spam" are conditional. A reciprocal link exchange that puts an "authority" site at the top of the results is manipulating search results, but not damaging them. On the other hand, a reciprocal link exchange for a crappy site is damaging their quality.
2) There are gradations in the harmfulness to the quality of results that "spam" produces, and the focus should be on things which most reduce that quality.
You probably can't get rid of all kinds of spam, but my suggestion is the focus should be on the ones that most degrade the results. It has to be about prioritizing. I would put scraper sites as the most damaging, followed by sites with duplicate content / feeds that have no additional content, but you might have different ideas. At any rate, I would actively try to decide what techniques / tactics are most damaging to the quality of what is being shown to the end user, and focus on stopping those.
3) On-page factors are most useful in determining what "spam" should be most penalized.
If you are going to accept that there is a gradation in spam, there should probably also be a gradation in terms of penalty for spamming. I'll put a caveat here by saying I don't have the technical background to advise you as an engineer. But it seems to me that in the age of automated web page creation, the search engines need to go retro and put some more focus on on-page factors that used to drive rankings. What is the only sure-fire way to guarantee that spam is not damaging to the results? To me, I would guess by developing better ways to identify which pages are "junk" in the sense of being useless to the end user. I'm not saying that you should abandon the new use of links in ranking - just that if you can come up with an algorithm that does a good job of determining what is a human produced page, and penalizes things that aren't, you will be able to kill off all the spam that is MOST damaging.
4) You might not care about some things that are "spam" because they don't hurt your users.
CAVEAT: this is obviously self-serving. But in the end, there might be some kinds of spam that you don't want to bother dealing with. Some of these we can probably all agree on. Example: using title tags to target a specific search. If I write a good web page answering the question "What is spam?" and then use title tags to try to manipulate the results so that I rank higher for it, you probably don't care. You want that page to be higher in the engine. This may be more or less true for different techniques - but I would suggest that if a page is an ideal result in the sense of answering the question of the user, you don't care if it got its position by trying to manipulate the search engine. You don't particularly care WHO answers a question a user poses in their query, just that it gets answered. If the webmasters want to fight it out amongst themselves for a top spot but are providing useful pages, I think a search engine should be fine with it. The problem comes when that fight spills over into harming the users - which is why I think that with an on-page focus or something else, your algorithm should be focused on determining what is good, not necessarily what is "spam" in the sense of manipulation.
|Are you arguing that there is no spam other than junk pages? |
A good example of when spam pages are not necessarily junk is typospam. I've seen instances when the page lists a long, long paragraph of common misspellings of a search term at the bottom. This could be used on a page with otherwise good content. Typos could also be introduced subtly into alt text and so forth. So in theory at least, spam is not junk.
However, you're far more likely to come across spammy techniques like this on a site of pure junk. This is exactly what I found the other day at the bottom of a cloaked, keyword-stuffed page using multiple subdomains and random nonsense text. I also find that a lot of the referrer spam in my logs is on domain names that are obvious typos of common words.
What I want to know is, how feasible is it to put a spellchecker in your algo, to search for text with more than a certain % of typos?
As a merchant, I despise sites occupying our industry's SERPs when the site is just a collection of junk pages full of adsense and affiliate links. The fact that search engines aren't able to tell that "widgets.thebestcoolsites.cz" is not a valid page for an American web surfer looking for widgets is disheartening. The fact that you can put AdSense ads on a page full of CJ links is insane. Pages with stolen content just shouldn't be able to rank.
I believe that Google has ruined the Internet with AdSense, and Overture, EBay, and everyone else are pitching in. While there are great content sites that have benefitted from AdSense, they are just a few drops in the bucket compared to the spammers and trash domain owners who are generating the real profits.
Let me say it first: Google is writing cheques to the spammers that they say they hate. As long as Google's making money off it they will not stop. If there was no CJ, and no AdSense, the vast majority of spammers would stop in one week. The genie's out of the bottle, though, and it will never be corrected. Only the search engines, who created the problem in the first place, have any chance of fixing it.
As a Google user, I have to agree with jamesa when he says that he doesn't care how he gets there, as long as he gets to the right destination. I would say most search engine users are just trying to get where they're going. They don't know that they ended up on a splog that stole copyrighted content in order to jam in AdSense ads. They probably don't care, either. They just eventually want to get to a site that sells red widgets, and they'll keep blindly clicking on things until they get there.
If it is what you are looking for, helps you find what you are looking for, or shows you that there is something even better (more info, something you didn't know, a discount you didn't know you could get) or provides some added value, it is good to go.
Spam is stacking the top-10 results with pages that all essentially look the same to the end user, or off-topic sites coming up on searches. Sometimes this may be based on how the pages look. Maybe you search for news and you see CNN, MSNBC, ABCNews, and so on down the line, all with essentially the same stories. You could call that spam, but since they look nice and the presentation is different on each, you probably wouldn't. If that was 10 pharma affiliates all pointing to the same end merchant, that would probably be called spam.
Make it relevant to the search and provide some variety and search results should be good to go.
This is some great feedback! Let me add a couple of ideas.
To oversimplify our discussion on quality and authority, a quality result is one that satisfies a customer – even if a spammer did produce it. Building on that, I suggest the following three rules:
1) “Spam is a low-quality page that ranks above a high-quality page” works pretty well, as long as you agree that Spam is a bug in the engine, not a moral failing in someone else. The best part about that point of view is that you have a better chance at fixing your own code than you do of elevating anyone else’s morals.
2) Even a quality result is only quality the first time it appears in the results. The repeats are all Spam.
3) A page with stolen content is always Spam. Even though it might satisfy a customer short-term, it undermines the entire enterprise in the long run because it discourages people from producing original content.
I think this is compatible with most of the feedback so far. Did I leave anything out?
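To make rule 2 a bit more concrete, here is a rough Python sketch of collapsing repeats in a ranked result list. It is only an illustration, not how any engine actually does it: the function names, the 3-word shingles and the 0.8 threshold are all made up for the example, and a real engine would work from far more than page text.

```python
import re

def shingles(text, k=3):
    """Set of k-word sequences in the text (k=3 is an arbitrary choice)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Overlap between two shingle sets: 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def collapse_repeats(results, threshold=0.8):
    """Keep only the first of any group of near-identical results.

    `results` is a ranked list of (url, text) pairs. The threshold is a
    hypothetical value that would need tuning on real data.
    """
    kept = []
    kept_shingles = []
    for url, text in results:
        s = shingles(text)
        if any(jaccard(s, seen) >= threshold for seen in kept_shingles):
            continue  # a repeat of something already shown higher up
        kept.append((url, text))
        kept_shingles.append(s)
    return kept
```

Note that this only catches repeated text. It says nothing about who published the pages or which copy is the original, which is where most of the disagreement in this thread sits.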
User A searches for: Yellow Widget Pictures.
a.) View Yello Widget Pictures
(...sounds interesting, lets click it).
A long time ago I had YELLOW WIDGET PICTURES and then I went to school. When I was done with school, YELLOW WIDGET PICTURES crossed my mind again but I forgot about it until I met someone who DOWNLOAD YELLOW WIDGET PICTURES. I thought, this is great. I will DOWNLOAD YELLOW WIDGET PICTURES and think about YELLO WIDGETS at the same tiem. My friends will be so interested in YELLOW WIDGET PICTURES, it will be great.
Throw a couple blocks of AdSense in there, and there you go: a spam site. Absolutely no relevance to what the searcher is looking for at all.
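For what it's worth, text like that is easy to measure crudely. Below is a minimal Python sketch that computes how much of a page's text is taken up by one repeated phrase. The function names and the 0.2 cutoff are assumptions for illustration only; nobody outside the engines knows what signals or thresholds they actually use.

```python
import re

def phrase_density(text, phrase):
    """Fraction of the words in `text` that belong to an occurrence of `phrase`."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    target = phrase.lower().split()
    if not words or not target:
        return 0.0
    n = len(target)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)
    # Overlapping matches could push the ratio past 1.0, so cap it.
    return min(1.0, hits * n / len(words))

def looks_stuffed(text, phrase, threshold=0.2):
    """Hypothetical check: flag text where one phrase exceeds the threshold."""
    return phrase_density(text, phrase) >= threshold
```

A page that genuinely is about yellow widget pictures will also repeat the phrase, so a number like this could only ever be one signal among many.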
> Everyone complains about Spam, but the single term hides a multitude of different problems, and different people often seem to mean different things when they use it.
Agreed. Search spam is one of those things the majority of people agree on when they see it but with little consensus as to why they agree.
> Does it make sense to talk about a hierarchy of spam?
No. Spam does its damage by obscuring other, better sites. It doesn't really matter what the spam is. Spam is spam.
My quick but non-inclusive definition: Pages optimized to score well for a particular search published with the intention of displacing other more usable or relevant results from a user's SERP, as determined from the user's POV.
What is spam can vary depending on the user's search goals. If I am searching for general information about a subject then sites only offering to sell products related to the subject are spam. But if I'm looking to buy a product... well then somehow spam never seems to be a problem.
Some types of SERs which are spam:
1) Multiple sites published by or originated by the same source to flood the top SERP positions and obscure other more unique and varied results by pushing them deeper.
2) Sites optimized to score high for a particular search query but designed not to supply any useable answer but rather to pass the searcher on to another site in return for compensation.
3) Sites optimized to score for a topic combined with any geographic placename regardless of whether the site actually has any locally relevant content to the placename.
> Are affiliate sites "spam by definition?"
As long as they only publish one site per affiliation affiliates are not by definition spammers as they're only trying to get a piece of the pie. But the affiliate system itself is by definition spam. So unless affiliate sites rise above their base definition by offering truly unique useable content the sites are spam.
> CAN a quality or authority result ever really be spam?
Yes. Type 1 sites especially. Sometimes the multiple sites when considered individually are quality and deserve their high placements. It's their repetition and intent which make them collectively spam.
> Is it worth losing a quality or authority result to get rid of a spam result?
A one to one exchange rate? Run for the hills! :) There's obviously some point at which the exchange rate would become worthwhile, but it's probably subjective for every user.
Google "Webmaster World". The third result on the front page is total spam. Links with nothing to MFA's... I think it's rather pathetic, but that's just me.
"'Spam is a low-quality page that ranks above a high-quality page'" works pretty well, as long as you agree that Spam is a bug in the engine, not a moral failing in someone else."
Nooooo way, a low quality page doing nothing wrong can easily rank higher than a high-quality page that is badly seo'ed, which is very common. Low quality has no reflection on spam whatsoever. An engine has to get that out of its head.
Spam is deceiving the algo. The algo working absolutely sensibly can rank a poorer quality page above a higher quality page, and there is nothing wrong with that at all. The point is that in the main, the best quality results win out, and that when pages are similar in structure, a high quality page titled right will win out over a weaker quality site also titled right.
"Spam is deceiving the algo. The algo working absolutely sensibly can rank a poorer quality page above a higher quality page, and there is nothing wrong with that at all."
I totally agree. Some seem to think that a low quality page that is ranking is automatically considered spam. Unless they are doing something to purposefully deceive the algo they should not be considered spam. If a low quality site just happens to rank without any deception then it is the downfall of the algo, or the seemingly "quality" site may not be providing what is necessary for the search engine to rank it on its merits (bad seo?).
|3) A page with stolen content is always Spam. Even though it might satisfy a customer short-term, it undermines the entire enterprise in the long run because it discourages people from producing original content. |
What if that content was presented in a more user-friendly way? For example, someone takes the trouble to illustrate, with images/sound/video/animation, complex, key-points made in (someone else's) dialogue?
As an example, suppose, back in the cold war days of the nuclear arms race, Govt advice on self-protection was released as uninspiring and dry text: Authoritative? Yes; Quality? Yes... until someone "steals" that content and makes it easier to understand with coloured headings and images etc...
The question then I suppose is; can a site/page rise from being a source of quality, to being both a source of authority AND quality, because people (understandably) credit it with links etc...?
It also raises questions about dupe content issues too!
T'is an interesting poser you seed MSNDUDE; each question begets another, and this thread alone outlines just how differently people define spam... I pity you transcribing the many definitions of spam into an algo :-p
|A good example of when spam pages are not necessarily junk is typospam. I've seen instances when the page lists a long, long paragraph of common misspellings of a search term at the bottom. |
Okay, I know what you are talking about, but to get to your destination, you have to go down a very slippery slope.
Tell me how you would plan a travel page for SEs when the text is full of location names that have to be transliterated and there are several different accepted romanized spellings.
I live in a place that could be transliterated as Ban or Baan plus Krud, Krut, Krood, Kruit. Yep, 8 different spellings. Which is the misspelling? And yes, they are all deliberate.
If someone searches on "Ban Krood", I want to have that on my page.
Going with the consensus in this thread.
SPAM = "Anything I don't like":)
Thank you for the chance to post my views. Here are a few examples of spam:
A few days ago, I did a search on the name (a completely unique name - not a keyword name) of one of my newer sites with very little traffic. On all four top engines, including MSN, there were four results that all redirected to the same site. They were there because they had scraped my site and picked up my site name. They had spots one and two, as well as two lower down. SPAM
In recently researching keywords for a new site, I found on many searches, particularly on Google, that the top ten showed a site that did not exist, eg a placeholder page, with the keywords as the domain name and displaying a Google ad (a handy service Google offers to large registrars and domain brokers). SPAM
When I click on a top ten result and find a pseudo-directory with nothing but a Google ad and searchbox on it - SPAM
I am an affiliate marketer. My sites are not spam. They provide interesting content and a way for people to find products from a variety of sources in a very focused niche. My largest site, which has been trashed by Google for over a year apparently for affiliate links (who knows), has over 90% of visitors adding to favorites. That's right. NOT spam.
Thanks for the chance to vent. As you can probably tell, I loathe Google, and I hope with all my heart that you beat them out in the end.
take away the money, the spam will go away
but i like money so i'll keep on generating spammy sites
personally i don't care if my spammy sites have no benefit to the searcher - i make money out of them so i'm happy
|1) “Spam is a low-quality page that ranks above a high-quality page” works pretty well |
"Quality" is too subjective for your purposes.
|2) Even a quality result is only quality the first time it appears in the results. The repeats are all Spam. |
Let's rephrase this: Even an authoritative result is only relevant the first time...
Nope. A page with stolen content may be a great resource. Just because it's infringing on another's copyright doesn't make it spam or junk. Especially if the content is stolen from a real authority.
|3) A page with stolen content is always Spam... |
I don't think it's up to the SE's to decide what's "stolen" and what came first and all that. Honestly, I don't think they're capable.
Bottom line is, if I search for "widget clips" or "widget wmv" I'm not likely to get the official site of the film Widget, nor will I get the studio or anything official or authoritative.
I will get a bunch of pages with stolen, literally illegal and pirated clips of the film. Some will just have the trailer scraped from the original widgetthemovie.com site (which was not optimized at all, since it was an indie movie and there was no budget for it).
Of course, this stolen, illegal content is exactly what I wanted, and that, as a user, is not spam.
I think most webmasters distinguish between quality and spam; they aren't the same thing; as well, there is a difference between a flawed set of SERPs and Spam.
So, how about this modification to msndude's proposed definition:
“Spam is typically a low-quality page that tricks the search engine into ranking it above higher-quality pages through deceptive techniques”
I agree that a quality result is only quality the first time it appears in the results. However, repetition isn't the same thing as Spam. Repetition could be due to flaws in the Search Engine's Algos, and not due to deceptive techniques or "spamming" activities.
Stealing or duplicating content can be part of a Spamming effort, particularly if the Search Engine is deceived into thinking the document in question is the original source of the content in question. But, we shouldn't try to squeeze all these different morally dubious concepts into a single catch-all name "Spam."
|A good example of when spam pages are not necessarily junk is typospam. I've seen instances when the page lists a long, long paragraph of common misspellings of a search term at the bottom. |
|Okay, I know what you are talking about, but to get to your destination, you have to go down a very slippery slope. |
|Tell me how you would plan a travel page for SEs when the text is full of location names that have to be transliterated and there are several different accepted romanized spellings. |
It's all in the amount. I don't see how any algorithm can detect typos that make a small percentage of the text. That would penalise people who use new jargon, fail to spellcheck, use the odd foreign word or phrase, and so on. But when typos make up 30% of the text, especially if it's crowded together in one section, you know it's likely to be spam.
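Roughly what I have in mind, as a Python sketch. Treat everything here as an assumption: `known_words` stands in for whatever dictionary the engine would use, the 30% overall figure is the one I mentioned above, and the 80% "clustered" figure and 20-word window are numbers I picked just so the example runs.

```python
import re

def tokenize(text):
    """Lowercase alphabetic words only; a real system would need more care."""
    return re.findall(r"[a-z]+", text.lower())

def typo_ratio(words, known_words):
    """Share of words not found in the dictionary."""
    if not words:
        return 0.0
    unknown = sum(1 for w in words if w not in known_words)
    return unknown / len(words)

def densest_window_ratio(words, known_words, window=20):
    """Highest share of unknown words in any run of `window` consecutive words."""
    if len(words) <= window:
        return typo_ratio(words, known_words)
    flags = [w not in known_words for w in words]
    current = sum(flags[:window])
    best = current
    for i in range(window, len(flags)):
        # Slide the window one word to the right.
        current += flags[i] - flags[i - window]
        best = max(best, current)
    return best / window

def looks_like_typospam(text, known_words, overall=0.30, clustered=0.80, window=20):
    """Flag text with many unknown words overall, or a dense block of them."""
    words = tokenize(text)
    return (typo_ratio(words, known_words) >= overall
            or densest_window_ratio(words, known_words, window) >= clustered)
```

A travel page that mentions "Ban Krood" once or twice among ordinary text stays well under both thresholds, while a paragraph of nothing but misspellings at the bottom of a page trips the window check even if the rest of the page is clean. New jargon and foreign words would still count as "unknown", so the dictionary matters as much as the thresholds.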
With all spam-detection there's a danger of throwing the baby out with the bathwater. I really don't envy the people who have to write the algorithms to sort the wheat from the chaff. You say looking for typos would put us on a slippery slope, but I think we are already sliding down it.
I have been following this thread for a while and have not heard anyone talk about misleading domain name spam.
What about sites like christian-widget-family-travel.org/ that leads to a viagra sales page or some other totally unrelated site? Does the domain name that has keywords in it have to match the content at all?
I am also seeing people that buy advertising pages on very high profile sites. For instance what if I could host some "ad space" on msn.com with a url like msn.com/viagra/... it does not take much to get this to rank high as it is on an "authority" site... is that wrong? Can those pages be called spam if they have proper info or product sales? It is clear the person setting this up is doing so to have their site be an instant hit with the search engines and so this is in fact playing the algos... is it not?