Forum Moderators: Robert Charlton & goodroi
My 3 employees and I are totally dependent on income from my web sites. Life has been pretty good as a web site owner, and I want to build on what we've already established. We want to be in this business long-term.
During this week of exile I began to look at my web sites differently. If my banned site was given that golden 30-second review by a real human at the 'plex, what would they see? Would they shrug their shoulders and say the Internet would be better off without my web site? Would they think that my web site looked exactly like 100 others that they had reviewed previously that day? Or would they see an active, vibrant web site that at least appeared to have a real purpose to serve in cyberspace?
Then I started looking at other sites differently. Would CNN.com ever get banned accidentally? How about Adobe.com or WellsFargo.com? Why wouldn't they? They are web sites just like my web sites, except they have a LOT more visitors, and some sort of brick-and-mortar presence, but the G-bot doesn't know that. I want my sites to be as stable in the SERPs as those sites!
As I started looking for the earmarks of these obviously high-quality sites, I started to notice that the "Signals of Quality" that have been alluded to in many of the discussions here at WW kept coming up. How long are their domain names registered for? How many outbound links do they have? How often are their pages updated? How fast do the pages load? Do the pages validate? These are the kinds of things that separate Adobe.com from my sites. As unfair as we may think these signals of quality are, I don't think there are any webmasters at CNN worrying about whether their web site is going to be banned overnight. I don't want to have to worry about it either.
I immediately started to compile a list of the "signals of quality" that we've read about in the Google patents, the ones that we've heard about through our discussions with the Google engineers and the ones that we know as a matter of common sense. I would like to share my list in the hopes that others will share theirs too. I realize that some of these signals are controversial, but I don't want to get off on any tangents. Some of these can't be proven, but through anecdotal evidence at least it seems that they might have some bearing. Please share your lists.
Domain name registered for more than 1 year, preferably 10.
Fast loading pages.
Dedicated IP address.
Hosted by a "trusted host".
Low link "churn" (links not changing too fast on a page).
Correctly formatted and validated web pages.
Web site regularly growing in size (not by large spurts).
Backlinks regularly growing in size (not by large spurts).
Real activity visible from the home page.
Home page not overtaken by advertising, AdSense or otherwise, particularly above the fold.
Session IDs not required in the URL for viewing the web site.
Valid use of Robots.txt file.
Low or no duplicate content.
Low number of affiliate links.
No site-wide external linking.
CNN.com, Adobe.com, and WellsFargo.com would show these signals of quality. What else should be on this list?
I wonder, if Google developed a new system for getting close to spam-free SERPs, whether that technique could be carried back over to email spam.
I wonder how Google views domains that have their information registered as private. I used to have it public, but I received so many phone calls and so much mail from people begging me to advertise on my site (that's when I ranked very well in Google) that I decided to make my information private. I wonder if Google frowns on that. I do have an e-mail contact on my page though.
[yahoo.com...]
I guess Google doesn't consider Yahoo a quality site. ;)
it appears that what Google is now doing is a sort of Bayesian filtering that takes into account external factors such as domain registration and the reputation of the web host.
It doesn't appear that way to me at all. Just because a patent is floating around doesn't mean it is being employed in an algo.
I have a site that has consistently ranked at the top in a highly competitive market that has survived virtually untouched every Google algo update since before Florida.
I renew the domain automatically every year.
It does not have a dedicated IP, but has been on the same shared IP for years.
Links are not added frequently. I don't chase links and am particular about what I link to from this site.
The content doesn't change often, as some info doesn't need to be updated.
It doesn't have a robots.txt file, as I don't need to exclude anything.
What this site does have is most everything on Steve's list. 95% of the interior pages rank well for their specific and relevant page titles. Signs of quality manifest themselves sitewide. Just look at Adobe and see how many pages other than the home page are PR10.
I can create sites all day long that utilize every item on your list, but to create sites that employ all aspects of Steve's list takes time and some circumstances beyond my control. That outside objectivity is crucial.
If you want to look at patents, then go back to the beginning. The off page concepts behind page rank are still key and IMO are the primary deciding factors in determining quality as well as authoritative status.
Very well said Kirby. But even with that, those circumstances beyond your control are not quite as beyond it as you might think. That's what makes steveb's list so good. It's not a spammer formula, you can't achieve this without creating a quality site, with quality content, over time. And if spammers did follow it, well, then they'd stop being spammers, and start making quality sites. Since there's no danger of that ever happening, spammers always being out for a quick buck, a quick fix, minimum work for maximum revenue, that list will continue to be a pretty good indicator for some time to come.
Here are my inputs:
The Google Patent-
The Google patent may just be a ruse to find more SEOs out there... Let's say that 1% of sites have a ten-year domain registration... now let's say that after the patent that number goes up to 25%... Google can take the extra 24% and penalize them for manipulation of search results...
Some of it makes a lot of sense, and I'm sure they didn't release it for that reason... But again, I have not and will not change my webmastering practices based on this patent.
Robots.Txt -
As far as the robots.txt file goes, that one is just ridiculous. You only need a robots.txt file if you have pages you don't want indexed. Most of these big guys have thousands of pages that may meet this criterion for bandwidth purposes, etc...
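For reference, when a site does need one, a "valid use" of robots.txt is only a few lines. This is a generic sketch, and the paths are made up for illustration:

```text
# Applies to all crawlers
User-agent: *
Disallow: /cgi-bin/
Disallow: /search/

# Extra rules just for Googlebot
User-agent: Googlebot
Disallow: /print/
```

An empty `Disallow:` line under `User-agent: *` is also perfectly valid; it simply excludes nothing.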
Privacy Statements -
I have to agree with this one... I think other engines such as Yahoo even go so far as to say you should have one. I can't find the quote, but I read it somewhere. I noticed the same coincidence on my sites: the ones with a privacy statement for the most part maintained ranking, and the ones without lost ranking and, in Yahoo's case, got booted...
Sitewide Links -
I think a site can safely employ a few of these but too many may indicate that the site is a link farm in disguise.
Low Link Churn -
This one was spot on... It's an easy filter on the quality of sites, and I'm sure they are developing link-churn filters.
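To make the idea concrete, here is a toy definition of link churn, entirely my own, not anything from a patent: the fraction of a page's distinct outbound links that changed between two crawls.

```python
# Toy sketch (my own definition, not Google's): "link churn" as the
# fraction of a page's outbound links that changed between two crawls.
def link_churn(old_links, new_links):
    """Return churn in [0, 1]: 0 = identical link set, 1 = fully replaced."""
    old, new = set(old_links), set(new_links)
    if not old and not new:
        return 0.0
    changed = len(old ^ new)          # links added or removed
    return changed / len(old | new)   # normalised by the union of both crawls

# Example: one link dropped and one added out of four distinct links
print(link_churn(["/a", "/b", "/c"], ["/a", "/b", "/d"]))  # -> 0.5
```

A filter like this would presumably flag pages whose churn stays high across many crawls, which is the pattern of rotating paid links rather than editorial ones.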
Dedicated IP -
The majority of pages on the net are on shared IPs... I doubt this is a signal of quality. It's just a signal of how deep the webmaster's pockets are.
The companies in question do have one thing that most of us don't.. A STRONG OFFLINE PRESENCE.
They are advertised in just about every medium, which creates visitors, which then creates links, which then boosts ranking and PR... It's just the fact that they can "buy" the signals of quality. With a large budget, anyone could get any useful, high-quality site ranking number one for any term... Just spend it on advertising, and provided it's a quality site, it too can become one of these mega-authorities.
I promote my sites equally as hard offline as I do online... I have created two strong pr8 sites over the last two years by doing this that are rock solid in Google Serps. I promote these sites as if search engines did not exist.. I never buy links nor do I participate in link exchanges, etc... That is key.. Build a quality site, don't try to manipulate search results, and get traffic through all means available (online and offline).
To sum up, I agree with almost everything posted in the initial post. Nice post.:)
Not any searcher I have ever seen or heard of! C'mon. People search for things that apply helpfully to their query, not things that just happen to be about the query.
Relevance is a 1998 issue. That's over. Anybody can find relevance. Quality and accuracy are what people want, not merely being on-topic.
Dataguy, you are confusing "signals of likeliness of non-spam" with "signals of quality". There is a massive difference between those. Just because a site is not a piece of utterly useless garbage doesn't mean it is "quality".
It's like what I mentioned about link text and title and H1. It is a red flag for garbage to have a large amount of link text totally different from the page title and H1, but it is not really much of a signal of quality to have them conform.
In other words, the absence of cancer doesn't mean a person is healthy.
Dedicated IP, blue chip host. 99.99% uptime
Domain registered from 2002-2011, public contact info.
Has been indexed by Google since mid-2002, PR5 since 2003.
A well developed policy page (incl. privacy) and many conversational pages about who I am and what my site is about.
I deleted robots.txt because AdSense seemed to have some problem with it. I have since replaced it.
I began AdSense in late '04 and had no affiliates or ads before that. Tried and dropped Amazon a few months ago.
I have hand coded my own template, created all my own graphics.
All static html or shtml pages...
I have used excerpts from sites to describe them, including meta descriptions; this may be where I ran into trouble. I have denied Googlebot these pages in robots.txt and with Googlebot "noindex, nofollow" meta tags. I can't delete them; they do OK with MSN and Yahoo...
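For anyone comparing notes, the two blocking mechanisms described there look roughly like this (the /excerpts/ path is made up for illustration):

```text
# robots.txt -- stops Googlebot from crawling the excerpt pages
User-agent: Googlebot
Disallow: /excerpts/

<!-- per-page meta tag -- stops indexing of a page that IS crawled -->
<meta name="googlebot" content="noindex, nofollow">
```

One caveat worth knowing: the two can work against each other. If robots.txt blocks a page, the crawler never fetches it, so it never sees the noindex meta tag on that page.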
::: sigh ::: still banned, fingers crossed.
I do think that some thought needs to be given to the idea that what has been mentioned by Google staff and in patents as "signals of quality" may have nothing to do with positioning in search results.
I know, if it doesn't have to do with search positioning, then why is it important? I'm not sure how to answer that, but I can say that GoogleGuy mentioned looking at "signals of quality" when he said they were about to perform an update that resulted in Google banning a whole lot of web sites on July 28th.
There have been numerous mentions of conversations with the Google engineers where the engineers talked about factors that are looked at which don't affect search positioning. I think the two that were most talked about were the use of click-through data and the use of surfing habits recorded through the toolbar.
What are they doing with this data if it doesn't affect search positioning? I overheard these conversations myself. The reason given for not using this data directly was because of previous patents, and because it would make gaming the system too easy. OK, easily understood, but then why are they collecting this data?
Could private Whois info be a problem with Google?
Probably not by itself, but it could be one more nail in the coffin if other negative "signals of quality" (or non-quality) exist.
And certainly the bots can err on one or more factor...
I renew mine every year. No problems so far, it's been 4 years.
Counting that variable as a sign of quality assumes that those who do not have enough money for 10 years' registration fees are of poor quality. Would you buy a bus pass for the next 10 years just so you can get a seat on the bus in the morning? Do you buy a 10-year membership at the tennis club? Do you buy a 10-year subscription to Time magazine? Do you sign a 10-year rental agreement for your 1-bedroom apartment?
Not really a good assumption quite frankly.
Not really a good assumption quite frankly.
Again, most of these are not mere assumptions. The registration signal is stated directly in one of Google's patents. Also I don't think there is a lot of assuming going on in the 'plex. I would imagine that each element of any equation would go through quite a bit of statistical testing before it could be considered.
Now you are going off down another road. Just because something is used in ranking doesn't make it a "signal of quality". No way. A page title is used in ranking, but in most cases there is zero signal of quality in a title. (Some titles could be negative signals, like repeating one pharm word eight times.) A page title of "George Bush" offers ZERO quality signal. It offers a relevance signal, but relevance is trivial. If you have a million pages titled simply "George Bush", they all offer a signal of relevance, but they are interchangeable in terms of quality, and that is why signals of quality are needed.
The ten year registration is a small signal of quality. Alone it means very little, but combined with other more significant signals, it can contribute to helping judge something better.
positively no downside at all;
And then Google says, "Hey! We want everyone to do this"... another $100?
Google hasn't said that.
In fact, Google almost certainly wouldn't want every business to rush out and buy a 10-year domain renewal for SEO purposes, because length of domain registration would then become useless as a signal of quality.
Once some attribute is discussed here and on the other forums and at the "secret SEO conferences" its value begins to degrade. They all go the way of the back link, bought and sold.
Just when was the patent published, with "10-year registration" mentioned?
Anybody have a link (URL) so I can see just how that's worded?
How about 5-year registrations? I registered for 5 years with the big former monopoly.
Definitely not a throw-away domain, its very similar to my own name.
Due to expire a year from now, I'm wondering whether to renew for 5 years again, or for 10. - Larry
Company Name
Address
Tel, Fax etc
Company Number
VAT (Tax) Number
Unique Content
Code and CSS that Validates
Speed at which pages load
Ability to click back to last site
Site not over optimised
Time spent by surfers or customers on the site
Up to date security certificate
Customer / Supplier Testimonials
Regular Updates, amendments
Time specific data
Disability Access
Disability Access Statement
Membership(s) and links to trusted Trade Organisations and Bodies
Well written Privacy Policy
FAQs
About Us
Complaints Procedure
Quality on-topic links inbound
Quality on-topic links outbound
Footnote:
Since the acts of terrorism in London we’ve had a few very well established online businesses wobble because they only concentrated on London widgets. Think about spreading risk if your company or product is based on one location. All the people we speak to say they never thought it would happen here and therefore had not prepared. Also consider your suppliers risk and risk throughout the value chain / network.
**ANY** metric, once it becomes known as part of the Google algo is no longer a signal of quality but is then a signal of SEO.
That's one of the greatest strengths of Bayesian filtering. As the "known good" and "known bad" samples change, so does the weighting and scoring. Again, it's not a black-and-white system, but one of percentages. When a good signal of quality becomes one that is manipulated, then it is automatically given less weight.
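A toy illustration of that self-correcting property, entirely my own sketch and nothing to do with Google's actual system: derive each signal's weight from how often it appears on known-good versus known-spam sample sites. As spammers adopt a signal, its good-to-spam ratio drifts toward 1 and its weight automatically shrinks toward zero.

```python
from math import log

def signal_weights(good_counts, spam_counts, n_good, n_spam):
    """Per-signal log-likelihood ratios (naive-Bayes style), add-one smoothed.

    good_counts/spam_counts: how many of the n_good/n_spam sample sites
    exhibit each signal. Positive weight = the signal leans "quality".
    """
    weights = {}
    for sig in set(good_counts) | set(spam_counts):
        p_good = (good_counts.get(sig, 0) + 1) / (n_good + 2)
        p_spam = (spam_counts.get(sig, 0) + 1) / (n_spam + 2)
        weights[sig] = log(p_good / p_spam)
    return weights

def score(site_signals, weights):
    """Positive = resembles the known-good sample; negative = resembles spam."""
    return sum(weights.get(sig, 0.0) for sig in site_signals)

# Hypothetical training counts out of 100 good and 100 spam sites:
good = {"long_registration": 90, "valid_html": 80}
spam = {"long_registration": 5, "valid_html": 40}   # spammers copy valid_html
w = signal_weights(good, spam, n_good=100, n_spam=100)
print(w["long_registration"] > w["valid_html"] > 0)  # -> True
```

The point of the sketch is just the retraining behaviour: bump the spam count for `long_registration` (spammers buying 10-year registrations) and its weight drops on the next fit, with no manual rule change.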
Larry, The google patent is found HERE [appft1.uspto.gov]
It's not easy reading by any stretch. Items 38 to 40 have to do with scoring based on criteria from the 'associated domain' of a document. #39 has to do with the legitimacy of the domain name (what the heck does that mean?). #40 mentions scoring based on the expiration date of the associated domain. The 10-year term is not stated, but it's what is often used as an example because it is the longest term that a name can be registered for.
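Since the patent gives no formula, here is one toy reading of claim 40, purely my own guesswork: treat years-until-expiry as a small score, capped so that anything beyond the 10-year maximum term adds nothing.

```python
from datetime import date

def registration_signal(expiry, today=None, cap_years=10):
    """Tiny capped score in [0, 1] from years until domain expiry.

    A hypothetical reading of the patent's claim 40 -- the patent only
    says expiration date may feed into scoring, not how.
    """
    today = today or date.today()
    years_left = (expiry - today).days / 365.25
    return max(0.0, min(years_left, cap_years)) / cap_years

# A domain paid up through 2015 vs one expiring mid-2006
# (dated from this thread, late 2005):
print(registration_signal(date(2015, 12, 1), today=date(2005, 12, 21)))  # ~1.0
print(registration_signal(date(2006, 6, 1), today=date(2005, 12, 21)))   # ~0.04
```

Under a scheme like this, a 5-year registration would score around 0.5: a middling positive signal, which fits the thread's sense that the term matters only in combination with other signals.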
[edited by: lawman at 4:40 am (utc) on Dec. 21, 2005]