Forum Moderators: Robert Charlton & goodroi


Signals of Quality - What are they?

Here's my list

         

dataguy

9:50 pm on Aug 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Many of us - perhaps most of us - have been hit by the recent spate of updates from Google. I have a site of my own that was completely de-indexed and thankfully was re-included after only a week. During that week I did some real soul-searching.

My 3 employees and I are totally dependent on income from my web sites. Life has been pretty good as a web site owner, and I want to build on what we've already established. We want to be in this business long-term.

During this week of exile I began to look at my web sites differently. If my banned site were given that golden 30-second review by a real human at the 'plex, what would they see? Would they shrug their shoulders and say the Internet would be better off without my web site? Would they think that my web site looked exactly like 100 others they had reviewed earlier that day? Or would they see an active, vibrant web site that at least appeared to have a real purpose to serve in cyberspace?

Then I started looking at other sites differently. Would CNN.com ever get banned accidentally? How about Adobe.com or WellsFargo.com? Why wouldn't they? They are web sites just like my web sites, except they have a LOT more visitors and some sort of a brick-and-mortar presence, but the G-bot doesn't know that. I want my sites to be as stable in the SERPs as those sites!

As I started looking for the earmarks of these obviously high-quality sites, I noticed that the "Signals of Quality" that have been alluded to in many of the discussions here at WW kept coming up. How long are their domain names registered for? How many outbound links do they have? How often are their pages updated? How fast do the pages load? Do the pages validate? These are the kinds of things that separate Adobe.com from my sites. As unfair as we may think these signals of quality are, I don't think there are any webmasters at CNN worrying about whether their web site is going to be banned overnight. I don't want to have to worry about it either.

I immediately started to compile a list of the "signals of quality" that we've read about in the Google patents, the ones that we've heard about through our discussions with the Google engineers and the ones that we know as a matter of common sense. I would like to share my list in the hopes that others will share theirs too. I realize that some of these signals are controversial, but I don't want to get off on any tangents. Some of these can't be proven, but through anecdotal evidence at least it seems that they might have some bearing. Please share your lists.

Domain name registered for more than 1 year, preferably 10.
Fast loading pages.
Dedicated IP address.
Hosted by a "trusted host".
Low link "churn" (links not changing too fast on a page).
Correctly formatted and validated web pages.
Web site regularly growing in size (not by large spurts).
Backlinks regularly growing in size (not by large spurts).
Real activity visible from the home page.
Home page not overtaken by advertising, AdSense or otherwise, particularly above the fold.
Session IDs in URL not required for viewing the web site.
Valid use of a robots.txt file.
Low or no duplicate content.
Low number of affiliate links.
No site-wide external linking.

CNN.com, Adobe.com, and WellsFargo.com would show these signals of quality. What else should be on this list?

borisbaloney

3:58 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



Great thread dataguy. I've suspected that "signal of quality" dependent Bayesian filters / bonuses have been the X factor in Google results. You just have to look at the different techniques being applied to email spam filtering for other ideas on how Google might attempt to clean its results.

I wonder whether, if Google developed a new system that got close to spam-free SERPs, that technique could be crossed back over to email spam.
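For anyone curious what a Bayesian-style filter over such signals might look like, here's a minimal sketch. The signal names and probabilities are entirely invented for illustration; nothing here comes from a Google patent or statement:

```python
from math import log

# Toy Bayesian-style scorer: each observed signal contributes the log-odds
# of that signal appearing on known-spam vs. known-good sites.
# All probabilities below are made up for illustration.
SIGNAL_ODDS = {
    # signal: (P(signal | spam site), P(signal | good site))
    "one_year_registration": (0.70, 0.40),
    "sitewide_external_links": (0.60, 0.10),
    "privacy_policy": (0.20, 0.80),
    "validated_html": (0.30, 0.60),
}

def spam_score(signals):
    """Sum of log-odds over observed signals; positive leans spammy."""
    score = 0.0
    for s in signals:
        p_spam, p_good = SIGNAL_ODDS[s]
        score += log(p_spam / p_good)
    return score

spammy = spam_score(["one_year_registration", "sitewide_external_links"])
clean = spam_score(["privacy_policy", "validated_html"])
```

The appeal of this kind of scoring is that no single signal decides anything; it's the combined weight of evidence that tips a site one way or the other.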

sunflower12

4:12 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



My site is over 5 years old and it has always had a privacy policy posted. It was banned on July 28. I had a shared IP, so I switched hosts and now I have a dedicated IP. I was told that was probably the main problem. That was changed last week, but it does not seem to have helped yet. I have also corrected many HTML errors.

I wonder how google views domains that have their information registered as private. I used to have it public, but I received so many phone calls and so much mail from people begging me to advertise on my site (that's when I ranked very well in google) that I decided to make my information private. I wonder if google frowns on that. I do have an e-mail contact on my page, though.

GeorgeK

5:59 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



Yahoo doesn't have a robots.txt file on their main site:

[yahoo.com...]

I guess Google doesn't consider Yahoo a quality site. ;)

dataguy

6:24 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, Yahoo's homepage PR did change from PR10 to PR9. ;)

wanna_learn

7:01 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



dataguy,
One more major difference in ideology I noted between your list and steveb's: he talks more about how to sustain good positioning (in other words, how to remain a quality site in Google's eyes in the long run), while you talk about the minimum basics that need to be present in any site to give it a good start in the eyes of Google.

Kirby

7:27 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



it appears that what Google is now doing is a sort of Bayesian filtering that takes into account external factors such as domain registration and the reputation of the web host.

It doesn't appear that way to me at all. Just because a patent is floating around doesn't mean it is being employed in an algo.

I have a site that has consistently ranked at the top in a highly competitive market that has survived virtually untouched every Google algo update since before Florida.

I renew the domain automatically every year.
It does not have a dedicated IP, but has been on the same shared IP for years.
Links are not added frequently. I don't chase links and am particular about what I link to from this site.
The content doesn't change often, as some info doesn't need to be updated.
It doesn't have a robots.txt file as I don't need to exclude anything.

What this site does have is most everything on Steve's list. 95% of the interior pages rank well for their specific and relevant page titles. Signs of quality manifest themselves sitewide. Just look at Adobe and see how many pages other than the home page are PR10.

I can create sites all day long that utilize every item on your list, but to create sites that employ all aspects of Steve's list takes time and some circumstances beyond my control. That outside objectivity is crucial.

If you want to look at patents, then go back to the beginning. The off page concepts behind page rank are still key and IMO are the primary deciding factors in determining quality as well as authoritative status.

2by4

8:25 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"but to create sites that employ all aspects of Steve's list takes time and some circumstances beyond my control."

Very well said Kirby. But even with that, those circumstances beyond your control are not quite as beyond it as you might think. That's what makes steveb's list so good. It's not a spammer formula, you can't achieve this without creating a quality site, with quality content, over time. And if spammers did follow it, well, then they'd stop being spammers, and start making quality sites. Since there's no danger of that ever happening, spammers always being out for a quick buck, a quick fix, minimum work for maximum revenue, that list will continue to be a pretty good indicator for some time to come.

JaySmith

9:31 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



Great Thread.. This is one of the better threads started here at WebmasterWorld.

Here are my inputs:

The Google Patent-

The Google patent may just be a ruse to find more SEOs out there.. Let's say that 1% of sites have a ten-year domain registration.. now let's say after the patent that number goes up to 25%.... Google can take the extra 24% and penalize them for manipulation of search results...

Some of it makes a lot of sense and I'm sure they didn't release it for that reason.. But again, I have not and will not change my webmastering practices based on this patent.

Robots.Txt -

As far as the robots.txt file goes, that one is just ridiculous. You only need a robots.txt file if you have pages you don't want indexed. Most of these big guys have thousands of pages that may meet that criterion for bandwidth purposes, etc...

Privacy Statements -

I have to agree with this one... I think other engines such as Yahoo even go so far as to say you should have one. I can't find the quote, but I read it somewhere. I noticed the same coincidence on my sites.. The ones with a privacy statement for the most part maintained ranking, and the ones without lost ranking and, in Yahoo's case, got booted..

Sitewide Links -

I think a site can safely employ a few of these but too many may indicate that the site is a link farm in disguise.

Low Link Churn -

This one was spot on... This is an easy filter on the quality of sites, and I'm sure they are developing link-churn filters.

Dedicated IP -

The majority of pages on the net are on shared IPs... I doubt this is a signal of quality. It's just a signal of how deep the webmaster's pockets are.

The companies in question do have one thing that most of us don't.. A STRONG OFFLINE PRESENCE.

They are advertised in just about every medium, which creates visitors, which then creates links, which then boosts ranking and PR... It's just the fact that they can "buy" the signals of quality. With a large budget, anyone could get any useful, high-quality site ranking number one for any term... Just spend it on advertising and, provided it's a quality site, it too can become one of these mega-authorities.

I promote my sites equally as hard offline as I do online... I have created two strong pr8 sites over the last two years by doing this that are rock solid in Google Serps. I promote these sites as if search engines did not exist.. I never buy links nor do I participate in link exchanges, etc... That is key.. Build a quality site, don't try to manipulate search results, and get traffic through all means available (online and offline).

To sum up, I agree with almost everything posted in the initial post. Nice post.:)

Eltiti

9:37 pm on Aug 17, 2005 (gmt 0)

10+ Year Member



You only need a robots.txt file if you have pages you don't want indexed.

I just use one to keep "meaningless" 404s out of my logs; I don't think it's a "quality indicator"...

steveb

9:56 pm on Aug 17, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"A search engine must balance the two because what searchers want is primarily relevance"

Not any searcher I have ever seen or heard of! C'mon. People search for things that apply helpfully to their query, not things that just happen to be about the query.

Relevance is a 1998 issue. That's over. Anybody can find relevance. Quality and accuracy are what people want, not merely being on-topic.

Dataguy, you are confusing "signals of likeliness of non-spam" with "signals of quality". There is a massive difference between those. Just because a site is not a piece of utterly useless garbage doesn't mean it is "quality".

It's like I mentioned about link text and title and H1. It is a red flag for garbage to have a high amount of link text totally different from the page title and H1, but it is not really much of a signal of quality to have them conform.

In other words, the absence of cancer doesn't mean a person is healthy.

andrea99

10:11 pm on Aug 17, 2005 (gmt 0)



Banned July 28:

Dedicated IP, blue-chip host, 99.99% uptime.
Domain registered from 2002-2011, public contact info.
Has been indexed by Google since mid-2002, PR5 since 2003.

A well developed policy page (incl. privacy) and many conversational pages about who I am and what my site is about.

I deleted robots.txt because AdSense seemed to have some problem with it. I have since replaced it.

I began AdSense in late '04 and had no affiliate links or ads before that. Tried and dropped Amazon a few months ago.

I have hand coded my own template, created all my own graphics.

All static html or shtml pages...

I have used excerpts from sites to describe them, including meta descriptions; this may be where I ran into trouble. I have since denied Googlebot these pages in robots.txt and with Googlebot "noindex, nofollow" meta tags. I can't delete them; they do OK with MSN and Yahoo...

::: sigh ::: still banned, fingers crossed.

dataguy

3:27 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You can disagree with the initial list, that's fine. The point of a discussion is to bat around ideas.

I do think that some thought needs to be given to the idea that what has been mentioned by Google staff and in patents as "signals of quality" may have nothing to do with positioning in search results.

I know, if it doesn't have to do with search positioning, then why is it important? I'm not sure how to answer that, but I can say that GoogleGuy mentioned looking at "signals of quality" when he said they were about to perform an update that resulted in Google banning a whole lot of web sites on July 28th.

There have been numerous mentions of conversations with the Google engineers where the engineers talked about factors that are looked at which don't affect search positioning. I think the two that were most talked about were the use of click-through data and the use of surfing habits recorded through the toolbar.

What are they doing with this data if it doesn't affect search positioning? I overheard these conversations myself. The reasons given for not using this data directly were previous patents and that it would make gaming the system too easy. OK, easily understood, but then why are they collecting this data?

JuniorOptimizer

4:15 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The trouble is there's been nothing but vague discussions and vague comments from GG or Google engineers, and nothing specific.

sunflower12

7:04 pm on Aug 19, 2005 (gmt 0)

10+ Year Member



"public contact info"

I asked this in an earlier post, but did not get an answer. Could private WhoIs info be a problem with Google? I do have contact info on my website: an e-mail address.

Kirby

8:03 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



then why are they collecting this data?

Knowledge (information) is power.

europeforvisitors

8:14 pm on Aug 19, 2005 (gmt 0)



Could private info with Whois be a problem with google

Probably not by itself, but it could be one more nail in the coffin if other negative "signals of quality" (or non-quality) exist.

andrea99

8:27 pm on Aug 19, 2005 (gmt 0)



I think there is a whole constellation of attributes that are signals of quality. They are weighted differently and scored based on how serious they are and how often they appear on a given site. There is probably a total threshold score: one very egregious offense can wipe out a lot of "quality" points, and a lot of borderline scores taken together can also cross the threshold. This is why no single factor seems to be the trigger; it is the combined score.

And certainly the bots can err on one or more factor...
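A toy sketch of the threshold model described above; the signals, penalty weights, and threshold are all hypothetical, just to make the arithmetic concrete:

```python
# Hypothetical weighted-penalty model: one egregious offense can cross the
# threshold alone, and several borderline offenses can add up to the same.
PENALTIES = {
    "hidden_text": 10.0,          # egregious on its own
    "scraped_content": 8.0,
    "fast_link_churn": 2.5,       # borderline signals
    "heavy_affiliate_links": 2.0,
    "session_ids_in_urls": 1.5,
    "thin_duplicate_pages": 2.5,
}
BAN_THRESHOLD = 8.0

def crosses_threshold(observed):
    """True if the summed penalties reach the (made-up) ban threshold."""
    return sum(PENALTIES[s] for s in observed) >= BAN_THRESHOLD

# One serious offense trips the filter on its own...
egregious = crosses_threshold(["hidden_text"])
# ...and a pile of borderline ones combined does too.
borderline = crosses_threshold(["fast_link_churn", "heavy_affiliate_links",
                                "session_ids_in_urls", "thin_duplicate_pages"])
```

Under a model like this, no one factor is "the" trigger, which would explain why sites with very different profiles get hit by the same update.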

Pico_Train

8:36 pm on Aug 19, 2005 (gmt 0)

10+ Year Member



From my experience, the term of registration of a domain name (1 year, 2 years, 10 years) isn't that important.

I renew mine every year. No problems so far, it's been 4 years.

Counting that variable as a sign of quality assumes that those who do not have enough money for a 10-year registration fee are of poor quality. Would you buy a bus pass for the next 10 years just so you can get a seat on the bus in the morning? Do you buy a 10-year membership at the tennis club? Do you buy a 10-year subscription to Time magazine? Do you sign a 10-year rental agreement for your 1-bedroom apartment?

Not really a good assumption, quite frankly.

dataguy

8:48 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not really a good assumption quite frankly.

Again, most of these are not mere assumptions. The registration signal is stated directly in one of Google's patents. Also I don't think there is a lot of assuming going on in the 'plex. I would imagine that each element of any equation would go through quite a bit of statistical testing before it could be considered.

steveb

8:52 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"if it doesn't have to do with search positioning, then why is it important"

Now you are going off onto another road. Just because something is used in ranking doesn't make it a "signal of quality". No way. A page title is used in ranking, but in most cases there is zero signal of quality in a title. (Some titles could be negative signals, like repeating one pharma word eight times.) A page title of "George Bush" offers ZERO quality signal. It offers a relevance signal, but relevance is trivial. If you have a million pages titled simply "George Bush", they all offer a signal of relevance, but they are interchangeable in terms of quality, and therefore signals of quality are needed.

The ten year registration is a small signal of quality. Alone it means very little, but combined with other more significant signals, it can contribute to helping judge something better.

texasville

9:12 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I really think the word "perceived" needs to be added to "signals of quality", because we all know that in the real world almost none of these things has anything to do with either a "quality" site or the "quality" of the business the site represents. This is the problem with algorithms and controls that involve no human interaction. Smacks of the scenario of machines ruling the world.

randle

9:26 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Correct WhoIs information and registering your domain out five or ten years; does it matter? Who knows for sure, but it's easy, not very expensive, and doing it won't harm you. Possible upside with positively no downside at all; why wouldn't you?

andrea99

9:40 pm on Aug 19, 2005 (gmt 0)



positively no downside at all;

There is one. If the domain is banned by Google for ten years you've wasted your money.

texasville

9:42 pm on Aug 19, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What? Another $100? Go to your clients (small businesses) and tell them it's another $100. And then Google says, hey! We want everyone to do this... another $100? Where does it stop? Google needs to wake up and stop lumping everyone in the millionaire category.
Small businesses keep getting noodled to death.

europeforvisitors

2:47 am on Aug 20, 2005 (gmt 0)



And then google says Hey! We want every one to do this...another $100?

Google hasn't said that.

In fact, Google almost certainly wouldn't want every business to rush out and buy a 10-year domain renewal for SEO purposes, because length of domain registration would then become useless as a signal of quality.

andrea99

3:24 am on Aug 20, 2005 (gmt 0)



**ANY** metric, once it becomes known as part of the Google algo is no longer a signal of quality but is then a signal of SEO.

Once some attribute is discussed here and on the other forums and at the "secret SEO conferences" its value begins to degrade. They all go the way of the back link, bought and sold.

texasville

4:00 am on Aug 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



To tell the truth, I don't know what to think about Google anymore. I look at the top site for my niche category (a commercial widget sold from brick-and-mortar stores; a big industry, but sold through smaller franchise types) and the signals of quality are there. However, they also have the hidden links that fool the SEs into thinking they are linking to related sites, and they have hidden text. But they also have been online for a while and are listed in DMOZ. What really gets me is that all their information and safety pages are scraped directly from a government site, verbatim, pics and all! So where are the dupe-content filters? Where are the filters that detect the hidden stuff?
Meanwhile I toil on and phfffft! I can't get this site out of the sandbox because I have tripped some unknown filter. Tops for all my keywords and phrases in Yahoo and MSN, but Google seems to hate me. Privacy policy, terms of service and all.

larryhatch

6:18 am on Aug 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Dataguy (or anyone else)

Just when was the patent published, with "10-year registration" mentioned?
Anybody have a link (URL) so I can see just how that's worded?

How about 5-year registrations? I registered for 5 years with the big former monopoly.
Definitely not a throw-away domain; it's very similar to my own name.
Due to expire a year from now, I'm wondering whether to renew for 5 years again, or for 10. - Larry

TravelDog

10:46 am on Aug 20, 2005 (gmt 0)

10+ Year Member



Signals of Quality:

Company Name
Address
Tel, Fax etc

Company Number
VAT (Tax) Number

Unique Content
Code and CSS that Validates
Speed at which pages load
Ability to click back to last site
Site not over optimised
Time spent by surfers or customers on the site
Up to date security certificate

Customer / Supplier Testimonials
Regular Updates, amendments
Time specific data
Disability Access
Disability Access Statement
Membership(s) and links to trusted Trade Organisations and Bodies
Well written Privacy Policy
FAQs
About Us
Complaints Procedure
Quality on-topic links inbound
Quality on-topic links outbound

Footnote:

Since the acts of terrorism in London we've had a few very well-established online businesses wobble because they only concentrated on London widgets. Think about spreading risk if your company or product is based on one location. All the people we speak to say they never thought it would happen here and therefore had not prepared. Also consider your suppliers' risk and risk throughout the value chain / network.

dataguy

1:07 pm on Aug 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



**ANY** metric, once it becomes known as part of the Google algo is no longer a signal of quality but is then a signal of SEO.

That's one of the greatest strengths of Bayesian filtering. As the "known good" and "known bad" samples change, so does the weighting and scoring. Again, it's not a black-and-white system, but one of percentages. When a good signal of quality becomes one that is manipulated, then it is automatically given less weight.
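To make that concrete, here's a toy illustration of how recomputing a signal's weight from fresh "known good" / "known bad" samples automatically discounts a signal once spammers adopt it. The ten-year-registration example and all the sample data below are hypothetical:

```python
from math import log

def signal_weight(spam_sites, good_sites, signal):
    """Log-odds weight for a signal, recomputed from the current samples.
    Negative counts in a site's favor; Laplace smoothing avoids division by zero."""
    p_spam = (sum(signal in s for s in spam_sites) + 1) / (len(spam_sites) + 2)
    p_good = (sum(signal in s for s in good_sites) + 1) / (len(good_sites) + 2)
    return log(p_spam / p_good)

# At first, mostly quality sites carry ten-year registrations...
good = [{"ten_year_reg"}, {"ten_year_reg"}, set()]
spam = [set(), set(), set()]
before = signal_weight(spam, good, "ten_year_reg")  # negative: counts in a site's favor

# ...then spammers read the patent and pile in. Re-training shifts the weight.
spam = [{"ten_year_reg"}, {"ten_year_reg"}, {"ten_year_reg"}]
after = signal_weight(spam, good, "ten_year_reg")   # no longer counts in a site's favor
```

As the samples change, so do the weights; no manual rule update is needed for a manipulated signal to lose its value.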

Larry, the Google patent is found HERE [appft1.uspto.gov]

It's not easy reading by any stretch. Items 38 to 40 have to do with scoring based on criteria from the 'associated domain' of a document. #39 has to do with the legitimacy of the domain name (what the heck does that mean?). #40 mentions scoring based on the expiration date of the associated domain. The 10-year term is not stated, but it's what is often used as an example because it is the longest term that a name can be registered for.

[edited by: lawman at 4:40 am (utc) on Dec. 21, 2005]

This 68-message thread spans 3 pages.