Well, as an editor, am I expected to look for:
1. Spam issues such as hidden text, CSS manipulations, repeated words.
2. Thin affiliate sites that have predominantly scraped content.
3. Over-implementation of SEO techniques.
A few things are apparent at first look. For a few more, you have to go looking. Having been in the SEO industry, I am afraid I am not thinking like an ordinary editor and need help.
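For what it's worth, the "repeated words" item in the list above is the one check that is easy to rough out in code. Below is a minimal sketch in Python, not anything a directory actually uses: the function name `top_word_share` and the `min_len` cutoff are made up for illustration, and it only looks at plain text you have already pulled from the page.

```python
import re
from collections import Counter

def top_word_share(text, min_len=4):
    """Return (word, share) for the most frequent word of at least min_len letters.

    share is that word's count divided by the number of counted words.
    min_len is an illustrative cutoff to skip short filler such as "the" or "and".
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if len(w) >= min_len]
    if not words:
        return None, 0.0
    word, count = Counter(words).most_common(1)[0]
    return word, count / len(words)

# Hypothetical page text, for illustration only.
word, share = top_word_share("cheap cheap cheap hotels book")
print(word, share)
```

A high share only says the page leans heavily on one word. Whether that makes the site unlistable is still the editor's call.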
Overuse of SEO methods and hidden-text spam would not usually be grounds for denying a listing, but sites that engage in such activities are often more likely to be doing other things that make them unlistable under the editing guidelines.
I play a part in formulating the rules/policy of the directory.
"Non-unique content would get the site, any site, a bar to listing at the ODP for example".
This would mean using tools to see whether the site is duplicating content; if it is, then it doesn't rightfully own the content, and so on and so forth. Does each editor follow these rigours, or are they doing the job of search engines?
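To make "using tools" concrete, here is a minimal sketch of one common way to compare two pieces of text for overlap: break each into overlapping three-word runs ("shingles") and measure how many they share (Jaccard similarity). It is an assumption about what such a tool might do, not a description of any directory's actual process; the names `shingles` and `jaccard` and the choice of `k=3` are illustrative.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Text shorter than k words: treat the whole text as a single shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b, k=3):
    """Jaccard similarity of the shingle sets of a and b: 0.0 (disjoint) to 1.0 (same set)."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical snippets, for illustration only.
print(jaccard("one two three four", "one two three five"))
```

A score near 1.0 suggests copied text, but it cannot tell you which site is the copy, and shuffled or reworded text will score low. So it narrows down what to look at rather than making the decision.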
"and more experienced spam-sniffers pursuivant can walk you through the process of the traditional tests of fire, water, and trial by combat"
This process sure can't be automated for a directory editor. Apart from an editor's main job (making sure relevant sites are listed in appropriate categories, in the specified format), to what extent should one go looking for things that may fall in the grey zone?
For example, when I see an affiliate site, I first look for unique content and/or whether it adds any value. Is this done by a typical editor, or required of one?
Thanks for all your feedback :)
With a little practice, the thinscraper sites will start jumping out at you like popping popcorn -- it'll be obvious that they have unique word-shuffling but no unique information or unique knowledge or unique perspective. And you can spend more time on the sites that don't shout (so loud) "I'm spam! can me!"
Also, contemplating the concepts of "authenticity" and "authoritativeness" will make a lot of editing decisions easier.
But take it internally: that's where the real knowledge always is.
If you are setting up a directory you should make it policy to completely ignore spam.
Remember that we are talking about a directory targeted to human visitors, not one targeted to a SE audience. What do those visitors expect to find? If they use your directory and choose sites to visit, do they care if those sites have hidden keywords? I don't think so, they care about the content they find. If that content warrants a listing, list the site.
Apart from that, I think it's the job of the SEs to care about their types of spam, not the job of directory editors. Why do people optimize this way? Because of poorly designed SEs. Maybe SEs need to redesign their algorithms, but it's not "our" fault as directory editors if SEs can be fooled.
(Judging by the number of people complaining about listed sites with hidden keywords and stuff, a directory with "certified hidden keyword free" sites could target a niche market. But I guess approx. 100% of those complaints come from competitors, so that might not be a good idea at all ;-) )
Far as originality, who defines this?
Find me a site you think is original, find me the MOST original site on the Internet and I will find you its duplicate.
"Please define spam (seriously... but don't go over the edge)."
Spam is unsolicited information, of any kind, that is not required by the recipient. (Some people like mass mailouts or brochures shoved in their letterboxes, but for those who do not, it's spam.)
"Far as originality, who defines this?"
There are many possible answers to this, but I believe you mean in terms of a choice between multiple sites presenting the same, similar, or plagiarised content. For this the real world answer is that the first site someone sees with that content is the 'original' (unless or until proven otherwise).
In terms of a directory, the originality of the content is not the primary consideration, but rather 'would adding this source of information add to the sum of information already listed/available in the category?', i.e. is the information unique relative to what the category already lists? If someone comes along and says "that content is copied from the CIA factbook", well, then the directory can choose whether to replace the listing with the more authoritative source, or continue to list the copy.
Each directory will have its own rules on how to handle this type of situation.
Directories that are commercial (trying to make money for themselves) are going to have more of a dilemma where a non-commercial and a commercial site with the same content exist, and the latter is prepared to pay for the listing (or pay more than the non-commercial). An example would be a hotel with its own site, and a hotel booking service with a micro site for the establishment. Both have substantially the same content; the only difference is who takes the bookings.
Much of the time I don't want it because I've seen it dozens of times already. But you might be the first person to send me an e-mail offering to sell me stock in a Tibetan Cross-country Canal Corporation -- and you'd still be a spammer.
Most of the time you don't know (or care) that I don't want it because you're broadcasting it. But you might pick me out of millions, to send your one and only Nigerian Inheritance offer -- and you'd still be a spammer.
Most of the time you'd be doing it for crass commercial purposes. But you could be looking for converts for the First United Mosque of Messianic Wicca, and it would still be spam (unless, of course, I'd posted on some newsgroup that I was looking for an eclectic yet exoteric religion that didn't involve making odd postures in public places.)
The electronic medium really doesn't matter. E-mail, fax, doorway pages in search engines or directories, it's all the same if it's between me and what _I_ wanted to use that medium to receive.