Forum Moderators: Robert Charlton & goodroi
My 3 employees and I are totally dependent on income from my web sites. Life has been pretty good as a web site owner, and I want to build on what we've already established. We want to be in this business long-term.
During this week of exile I began to look at my web sites differently. If my banned site was given that golden 30 second review by a real human at the 'plex, what would they see? Would they shrug their shoulders and say the Internet would be better off without my web site? Would they think that my web site looked exactly like 100 others that they had reviewed previously that day? Or would they see an active, vibrant web site that at least appeared to have a real purpose to serve in cyberspace?
Then I started looking at other sites differently. Would CNN.com ever get banned accidentally? How about Adobe.com or WellsFargo.com? Why wouldn't they? They are web sites just like my web sites, except they have a LOT more visitors, and some sort of a brick-and-mortar presence, but the G-bot doesn't know that. I want my sites to be as stable in the SERPs as those sites!
As I started looking for the earmarks of these obviously high-quality sites, I started to notice that the "Signals of Quality" that have been alluded to in many of the discussions here at WW kept coming up. How long are their domain names registered for? How many outbound links do they have? How often are their pages updated? How fast do the pages load? Do the pages validate? These are the kinds of things that separate Adobe.com from my sites. As unfair as we may think these signals of quality are, I don't think there are any webmasters at CNN worrying about whether their web site is going to be banned overnight. I don't want to have to worry about it either.
I immediately started to compile a list of the "signals of quality" that we've read about in the Google patents, the ones that we've heard about through our discussions with the Google engineers, and the ones that we know as a matter of common sense. I would like to share my list in the hopes that others will share theirs too. I realize that some of these signals are controversial, but I don't want to get off on any tangents. Some of these can't be proven, but anecdotal evidence at least suggests they may have some bearing. Please share your lists.
Domain name registered for more than 1 year, preferably 10.
Fast loading pages.
Dedicated IP address.
Hosted by a "trusted host".
Low link "churn" (links not changing too fast on a page).
Correctly formatted and validated web pages.
Web site regularly growing in size (not by large spurts).
Backlinks regularly growing in number (not by large spurts).
Real activity visible from the home page.
Home page not overtaken by advertising, AdSense or otherwise, particularly above the fold.
Session IDs in URL not required for viewing web site.
Valid use of Robots.txt file.
Low or no duplicate content.
Low number of affiliate links.
No site-wide external linking.
CNN.com, Adobe.com, and WellsFargo.com would show these signals of quality. What else should be on this list?
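Just as a thought experiment (nothing Google has confirmed), one crude way to picture how such signals might be combined is a weighted checklist. The signal names and weights below are entirely made up for illustration:

```python
# Hypothetical signal weights -- invented for illustration only.
SIGNAL_WEIGHTS = {
    "domain_registered_1yr_plus": 2.0,
    "fast_loading_pages": 1.5,
    "dedicated_ip": 0.5,
    "valid_markup": 1.0,
    "low_duplicate_content": 2.0,
    "no_sitewide_external_links": 1.0,
}

def quality_score(observed: dict) -> float:
    """Return the fraction of total weight earned by the observed signals."""
    total = sum(SIGNAL_WEIGHTS.values())
    earned = sum(w for name, w in SIGNAL_WEIGHTS.items() if observed.get(name))
    return earned / total

# A site showing every signal scores 1.0; a site with none scores 0.0.
```

Of course the real algo, whatever it is, would be far subtler than a flat checklist, but it makes the point: no single signal bans or saves you; it's the overall profile.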
I will try to fight my way thru items 38-40 of the patent.
No promises of comprehension, the Stella Artois beer is getting the better of me. [burp!]
I take it a 5-year registration should be enough to indicate a non-throw-away domain.
I see no advantage to 10 year registration with you-know-wholutions.
5-year reg offers some good savings over 1 year in any case.
As for 'legitimacy of the domain name', I suppose that means obviously dodgy domains
which a genuine content site wouldn't normally use, and/or names
far too suspiciously similar to major trademarked businesses. Best -Larry
That's one of the greatest strengths of Bayesian filtering. As the "known good" and "known bad" samples change, so does the weighting and scoring. Again, it's not a black-and-white system, but one of percentages. When a good signal of quality becomes one that is manipulated, then it is automatically given less weight.
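To make that self-correcting property concrete, here is a toy sketch (my own invention, not anything from the patent) of a Bayesian-style filter whose per-signal weights are re-derived from the labeled samples, so re-training on new "known good" and "known bad" sites shifts the scores automatically:

```python
from collections import defaultdict

class SignalFilter:
    """Toy Bayesian-style filter: per-signal scores derived from labeled samples."""

    def __init__(self):
        self.bad = defaultdict(int)   # signal -> occurrences on known-bad sites
        self.good = defaultdict(int)  # signal -> occurrences on known-good sites
        self.n_bad = 0
        self.n_good = 0

    def train(self, signals, is_bad):
        """Record one labeled sample (a set of signal names)."""
        if is_bad:
            self.n_bad += 1
            counts = self.bad
        else:
            self.n_good += 1
            counts = self.good
        for s in signals:
            counts[s] += 1

    def badness(self, signal):
        """Score in (0, 1): near 1.0 if the signal mostly appears on bad sites."""
        p_bad = (self.bad[signal] + 1) / (self.n_bad + 2)    # Laplace smoothing
        p_good = (self.good[signal] + 1) / (self.n_good + 2)
        return p_bad / (p_bad + p_good)
```

The key behavior: once spammers start faking a "good" signal, the next round of labeled samples dilutes that signal's discriminating power toward 0.5 without anyone hand-tuning a weight.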
Then using Bayesian filtering to implement those "signals of quality" aspects of that patent in an algo is inherently flawed.
I won't get into Bayesian filtering until I learn more about it.
My limited understanding is that it is self-correcting, i.e. some mathematical feedback mechanism.
The reason for this post is pretty small, just a reflection on the name of it:
"Signals of Quality .. what are they?"
The TITLE of the post is the worst criticism I can make.
ANYbody who has ever pored thru trade magazines, professional journals
and/or (gawd help the living) educational tracts, has seen titles like this.
They almost inevitably come from experts who plant trees upside down,
musical savants who cannot play an instrument, and in one notable case
from an educator whose best known quote is:
"WE needs must revise upwards our standards of excellence .."
Obviously a screaming idiot. He can't say: "We need to raise our standards." .. too simple.
The difference is that you gave some specifics to chew on, darned good ones.
Now please invent a better title. We should revise upwards our WW thread titles. -Larry
The goal of SEO is to emulate the profile of quality, and the job of the search engines is to separate the emulators from the emulated.
The flaw in Bayesian filtering is that its profiles are developed from history, not current practice. The emulators who learn faster than the Bayesian filters will thrive and churn spam sites faster than the filters can profile them. But there is a point of diminishing returns for the spammers. When they look for the next play, scraping will then be dead. Finally.
The beat goes on.
I understood most of the potential "signals of quality" that have been mentioned, but didn't understand this one:
"...having an MX record associated with a domain is a big deal, in my opinion. A quality domain usually has an associated email address."
I can see why email policies could provide an indication of quality, or the absence of quality. For instance, the people who steal content sometimes use fake registrations, making it difficult or impossible to track them down.
I'd never heard of an "MX record" before. After a bit of research, I found this cryptic explanation:
"An MX record is simply the method used by DNS to route mail bound for one machine to another instead. An MX record is created by a single line in one of your 'named' files: .... This line says that all mail destined for hostA in your domain should instead be delivered to hostB in your domain. ..."
Now, can someone please explain what is being suggested? For instance, suppose I operate 3 websites:
abcnichesite.com
xyzforum.com
and
mycorpsite.com
People contact me by sending emails to
webguy@mycorpsite.com
What else would I need to do, in order to ensure that all 3 of these sites have this possible "MX" related signal of quality?
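For what it's worth, an MX record is set per domain, so each of the three zones would need its own entry; they can all point at the same mail host. A sketch of what the relevant lines might look like in BIND-style zone files (the mail host name and IP are made up for the example; most DNS providers set this up through a control panel rather than raw zone files):

```
; zone file for abcnichesite.com -- hypothetical mail host
abcnichesite.com.    IN  MX  10  mail.mycorpsite.com.

; zone file for xyzforum.com
xyzforum.com.        IN  MX  10  mail.mycorpsite.com.

; zone file for mycorpsite.com
mycorpsite.com.      IN  MX  10  mail.mycorpsite.com.
mail.mycorpsite.com. IN  A   203.0.113.10
```

The "10" is just a preference value used when a domain lists multiple mail servers. Whether having an MX record actually influences ranking is, of course, the speculation this thread is about.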