Forum Moderators: Robert Charlton & goodroi


Testing methodologies - discovering how Google is actually working


Marcia

11:28 am on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The following 8 messages were moved here from another discussion:
Experiments in keyword rich links to Home [webmasterworld.com]

Let's say you test something on a minimum 30 or so sites and it works on all 30 sites.

And those 30 test sites are 100% equal to each other in every single respect, with regard to every single relevant algo factor (without exception), including the number, relevance, percentage, ratio and age of inbound link, as well as age, update and freshness factors? Right?

Is that how such tests can be - and have been - conducted using 30 test sites? Everything is completely identical other than the one factor being tested, right?

[edited by: tedster at 7:14 pm (utc) on Dec. 18, 2008]

Marcia

11:42 am on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Has anyone done an exact match anchor text search at Google on any sites, for their internal anchor text, like:

inanchor:"fuzzy green widgets"

I just have, for a site that's way over the top for repetition on a per-page level and that's no longer pulling Yahoo! or Live traffic of any significance. Doing such a search, there's no sign of that internal backlink anchor text for the site, even though all the pages are indexed.

All told, it's a perfect time to start some internal backlink testing with it.

whitenight

11:51 am on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And those 30 test sites are 100% equal to each other in every single respect, with regard to every single relevant algo factor (without exception), including the number, relevance, percentage, ratio and age of inbound link, as well as age, update and freshness factors? Right?

Is that how such tests can be - and have been - conducted using 30 test sites? Everything is completely identical other than the one factor being tested, right?

No, that would defeat the point.

It's no different than any other statistical testing; the most obvious case would be voting polls.

You can basically wrap the entire populace into 30-40 "different" sub-classes with enough differences and similarities to account for everyone (within a small margin of error).

Therefore if one can test a theory on 30 sites with meaningful differences (of course there are similarities within EVERY site, just as there are similarities with every HUMAN) then you can give a statistically significant CERTITUDE of how the theory works.

Add a dash of intuition, critical thinking, and in-depth analysis of other sites with your database (if available) and presto!, you have the test down to a very small percentage of "outliers" and it's no longer a "test"
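The polling-style margin of error invoked above can be sketched in a few lines of Python. This is an illustrative fragment, not from the thread: it uses the textbook normal approximation for a sample proportion, and the 27-of-30 figure is invented for the example.

```python
import math

def proportion_with_ci(successes, n, z=1.96):
    """Observed success rate plus an approximate 95% confidence interval,
    using the normal approximation pollsters use for margin of error."""
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# e.g. a tweak that worked on 27 of 30 varied test sites
rate, low, high = proportion_with_ci(27, 30)
print(f"{rate:.0%} worked, 95% CI roughly {low:.0%}-{high:.0%}")
# → 90% worked, 95% CI roughly 79%-100%
```

The point of the sketch: 30 varied sites can never prove a factor, but they can bound how often a tweak works, which is all the "sub-classes" argument claims.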

btw- Have I mentioned setting up a database of a minimum of 100 varied keyword SERPs (that would be 1000-2000 total sites) recently? And how the algo becomes crystal clear after 3 months of studying it?

If you REALLY want to have fun, you then set up 3,4,5-deep sites SIMILAR to each of the original 30 sites so you can conduct tests between nearly SIMILAR sites AND different sites.

(these would also be the "back ups of the back ups" i often refer to)

Marcia

1:22 pm on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Therefore if one can test a theory on 30 sites with meaningful differences (of course there a similarities within EVERY site, just as there are similarities with every HUMAN) then you can give a statistically significant CERTITUDE of how the theory works.

Meaningful differences provide certitude of decisive factors in comparisons? Isn't that somewhat akin to comparing the Vitamin C content of oranges, lamb chops and tennis shoes?

Unless there are differences in testing that are as controlled, reliable and relevant, much as those used in clinical trials of medications, I'm afraid this thread has now officially jumped the shark.

whitenight

1:33 pm on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Meaningful differences provide certitude of decisive factors in comparisons? Isn't that somewhat akin to comparing the Vitamin C content of oranges, lamb chops and tennis shoes?

Unless there are differences in testing that are as controlled, reliable and relevant, much as those used in clinical trials of medications, I'm afraid this thread has now officially jumped the shark.

I'm assuming my whole bit about statistical testing was ignored....

And without officially "jumping the shark" a la the Fonz, then I really suggest either re-reading what I wrote, doing a quick study of how statistical polling works via MOE, outliers, and scientific methodology, OR re-reading what i wrote :)

Unless you're just being ornery?
Then just ignore it.
Someone else might find it useful for setting up basic testing measurements when millions of webpages are taken into consideration.

[edited by: whitenight at 1:37 pm (utc) on Dec. 10, 2008]

Shaddows

1:37 pm on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I think the point is that you can either know something works in a particular situation (not great as transferable knowledge, or for giving general advice), or you can test it in as many different situations as possible, such that you can give it a likelihood of working in an unknown situation.

With a bit of experience, knowledge etc, you can find space between the two. So you know the likelihood in general, you know the specifics of the situation, and you can then further narrow your expectations by combining these.

As Ted mentions every now and then- the algo isn't a scorecard. It's complex. You need to have rules-of-thumb, and for that you need varied test-beds (or, personally, analysis of lots of different sites)

Marcia

3:12 pm on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think for evaluating the value of any testing or analysis we need to look at what constitutes the scientific method [biology.clc.uc.edu].

And as far as the value of reading patents is concerned: Does anyone think that Google files and applies for patents because of them having some relation to pragmatic (practical) reality to search algos and factors (either past, present or future), or is it because paying attorneys for filing patent applications gives them fresher breath and whiter teeth?

whitenight

3:28 pm on Dec 10, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Marcia,

The "test" you would like to conduct DOES NOT EXIST.
Not in ANY scientific discipline.
Not in this universe or any parallel universes.
Nor will it ever exist.

EVER

No "test", theory, scientific experiment is EVER the same.
Aside from BASIC quantum physics which states the testOR is affecting the testED at some sub-atomic level;

Even the MOST SIMPLE of tests are done at different TIMES, with the sun, moon, weather
and a TRILLION other factors slightly changing the results.
(whether the test records those factors or not)
There is NO SUCH THING as a "perfectly controlled repeatable" test.

You can't enter the same stream twice. EVER.

Even if you could hold Time itself still. The INDIVIDUAL doing the test would have changed (physically, emotionally, mentally, etc. just from doing the test the FIRST time) and therefore the test would be DIFFERENT.

Basically, in order for concrete "proof"
You want the exact same website?
with the exact same domain name?
on the exact same IP address?
with the exact same links?
with the exact same content?
ALL tested at the exact same moment in time?

to prove it works?! :)

Good luck with that.

WAIT! There IS a way

THE ONLY WAY to do YOUR "definition" of that particular test is to TEST IT ON YOUR site. :)

Butttttt.... you only get to "test" it ONCE :(
(by your definition of the scientific method)

[edited by: tedster at 9:36 pm (utc) on Dec. 10, 2008]

Marcia

6:39 am on Dec 11, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



TEST

Actually, you're advising people to test; however, in all those times, you never have defined or clarified what you mean by test. All that's been mentioned is that you use 30 sites (that are all different?), but that isn't a process and it isn't a definition. It really isn't saying anything; for example, anyone who's never heard of SEO testing wouldn't have a clue what that advice means.

Marcia

2:39 pm on Dec 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The title of this thread starts with:

Testing methodologies

Notice that it's in the plural: methodologies. Hopefully, those who contribute comments to this thread will strive to contribute their insights into methodologies in order to maintain focus.

For the benefit of anyone tuning in late and being somewhat befuddled by any dissonance, I think it might help for anyone who contributes to offer definitive clarity to stay on focus.

Very simply put, there's a general process that divides testing into three simple, separate (though integrated) phases, or components:

1) Define and state the hypothesis: the theory, the supposition, and/or the purpose or intended goal.
2) Take some kind of action related to what's been hypothesized.
3) Assess and document tangible results or outcomes.

It really can be that simple; and it really can work on only one single site, for testing a specific factor.
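The three phases above could be captured in a minimal record. A hypothetical sketch (the class name, fields, and example strings are my own, not anything from the thread):

```python
from dataclasses import dataclass, field
import datetime

@dataclass
class SeoTest:
    hypothesis: str    # 1) the theory, supposition, or intended goal
    action: str        # 2) the single change made to test it
    outcome: str = ""  # 3) tangible results, documented after the fact
    started: datetime.date = field(default_factory=datetime.date.today)

test = SeoTest(
    hypothesis="Varying internal anchor text lifts long-tail rankings",
    action="Rewrote nav anchors on one section of the site",
)
```

Writing the hypothesis down before acting is the whole discipline; the outcome field stays empty until the results are in.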

learning how Google is actually working

That might not be the same yesterday as it is today, or will be tomorrow, but all we have to work with is what we see and know right now. :)

[edited by: Marcia at 2:42 pm (utc) on Dec. 12, 2008]

Yoshimi

2:48 pm on Dec 12, 2008 (gmt 0)

10+ Year Member



Surely you can only get a reliable result by testing on a single site if your hypothesis starts "I believe that on this site when I do X, Y will happen".

If your hypothesis starts "I believe that throughout the internet", or "I believe that Google works by", you need to test on a range of sites that are as near to representative of the internet as possible. Testing on one site is like testing a new medicine on one person and claiming it a miracle cure when they don't die.

wheel

3:04 pm on Dec 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do people actually do anything as refined as A-B testing? I find the idea difficult. At one point I'd considered building some sort of multivariate testing tool but in the end it didn't seem worth the effort.

Seems to me that there are two types of 'tests' that people are likely to be running.

One is 'trying s*** out'. That's about all the testing I do. I hear something works, I might set up a website to try it and see what happens even if I have no plans of ever using it on my main websites. I find this educational in general terms. So I might try a network, or a site with a bunch of paid links, or 301'ing or dropped domains, stuff like that (not that I've necessarily tried any of those). Just to dabble so I know what's going on.

The second type of testing that I don't do, that I suspect the high end folks are doing isn't so much testing as deconstructing. I suspect (but don't know) that some folks deconstruct things about their competitor's sites to get ideas on what's working. Link graphs or keyword density would be the most simplistic of these things.

Not that I'm a pro, but I kinda take the attitude that while others are reviewing reports, time is better spent doing link development. Still, I must say that I find the idea of secret testing potions that would help me rank to be as attractive as the next person :).
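The A/B testing wheel mentions finding difficult can at least be made repeatable with deterministic assignment. An illustrative sketch, not anything wheel built: hash each page (or site) ID so that re-running the analysis assigns the same unit to the same arm.

```python
import hashlib

def assign_variant(unit_id: str, variants=("A", "B")):
    """Deterministically map a page or site ID to a test variant by
    hashing it, so repeated runs always pick the same arm per unit."""
    digest = hashlib.sha256(unit_id.encode("utf-8")).digest()
    return variants[digest[0] % len(variants)]

# the same page always lands in the same variant
assert assign_variant("example.tld/page-1") == assign_variant("example.tld/page-1")
```

Deterministic hashing avoids keeping a separate assignment table, which is most of the effort in a home-grown multivariate tool.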

Shaddows

4:29 pm on Dec 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've started deconstruction, simply because I can no longer spot trends using only my 'patch'.

I'd love to test some new ideas (I'm having a lot recently since my knowledge has been improving pretty rapidly since joining), but I don't have disposable domains, and I don't currently have time to get some launched (especially the mooted 30)

The problem with single-site testing is that you don't know if it's the change you make that is having an effect, or just algo tweaks or even upstream PR adjustments.

wheel

5:01 pm on Dec 12, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem with single-site testing is that you don't know if it's the change you make that is having an effect, or just algo tweaks or even upstream PR adjustments.

Which is why I'll just generally try stuff. And if it works, then I assume generally that what I tried works.

That doesn't tell me stuff like is link A better than link B.

But here's something you could test as an example (I've not tested this). If you buy an old domain with some nice backlinks to it and repurpose it into something completely different, will it rank on the old or new terms, or neither? That tells you something in general.

BradleyT

4:10 pm on Dec 18, 2008 (gmt 0)

10+ Year Member



And those 30 test sites are 100% equal to each other in every single respect, with regard to every single relevant algo factor (without exception), including the number, relevance, percentage, ratio and age of inbound link, as well as age, update and freshness factors? Right?

I'm not as involved in testing as you all are but if the test "worked" on all 30 sites couldn't we make an assumption that none of those above factors mattered [much]?

tedster

5:38 pm on Dec 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



if the test "worked" on all 30 sites couldn't we make an assumption that none of those above factors mattered [much]?

Yes, you can make that assumption - and to a certain degree of significance (or confidence level, or margin of error, etc.) Bayesian analysis gets very strong when you analyze over a data set of some size. Unavoidable variations in starting conditions actually become a PLUS when the final results are consistent, even though other starting factors still vary. You're tossing them away, discovering they are not in play for this particular question you are testing.

So it's a GOOD idea to have variation across the test cases, so that you don't overlook other possible causation factors.

You can have a situation where one factor was unintentionally consistent throughout the test set - it was always there. If that factor was the actual cause (or part of the cause) then its role will be hidden if it never varies. In physics, for example, the inaccuracy of Newton's model was hidden until measurement became possible on very large and very small scales.
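The Bayesian analysis tedster mentions can be made concrete for the exact case under discussion (a tweak that worked on all 30 sites). A sketch under assumptions of my own: a uniform Beta(1, 1) prior over the true success rate, and the special all-successes case, where the posterior Beta(n + 1, 1) has the closed-form CDF t ** (n + 1).

```python
def prob_rate_exceeds(threshold, n):
    """P(true success rate > threshold) after n successes in n trials,
    under a uniform Beta(1, 1) prior. The posterior is Beta(n + 1, 1),
    whose CDF at t is simply t ** (n + 1)."""
    return 1 - threshold ** (n + 1)

# after 30-for-30, odds the technique works on >80% of comparable sites
print(round(prob_rate_exceeds(0.8, 30), 3))  # → 0.999
```

So consistent results across 30 genuinely varied sites really do support a strong probabilistic claim, which is the sense in which the starting-condition variation is "a PLUS".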

Lorel

7:59 pm on Dec 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Let's say you test something on a minimum 30 or so sites and it works on all 30 sites.

If you are the tester and you adjust those sites according to your test, I don't see how this can be a scientific test because you designed all those sites yourself and thus they will have similar design features or techniques or level of SEO knowledge.

It seems to me this would have to be done on 30 sites designed by 30 different people who make their own changes for testing purposes.

Simsi

10:47 pm on Dec 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Marcia:

Everything is completely identical other than the one factor being tested, right?

I'm a little confused. If you have identical text, Google's duplicate content algo would have a say in which one got the "nod", surely? And if you alter the text, you alter the density and probably the relevance. I don't see how you can do this accurately myself, even if you launched 30 identical sites simultaneously on the same shared host etc.

EDIT: I guess rather than English text you could have gobbledegook and throw in the odd on-topic word maybe? And the domains would have to be meaningless too. IE: wkjkh4c.tld kjflj4f.tld etc

[edited by: tedster at 12:04 am (utc) on Dec. 19, 2008]
[edit reason] change example domain names to .tld [/edit]

whitenight

12:02 pm on Dec 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



So much confusion. So little time.

There are basically two ways to "TEST" something.

A. The Just Try It Out Method

As explained by wheel, this is the "final" step anyways. If it works, great. If it doesn't, then change it back or change the test. This is how many SEOs or webmasters with few sites do it.
It's perfectly legitimate BUT it doesn't make for giving ABSOLUTISM ADVICE on forums read by many people.

All too often, SEO's who use the "try it out" method then make posts stating it as FACT. When it's only FACT for THEIR site(s).

It leads to SEO's being confused about "what works"
ESPECIALLY when, like many topics, different "trusted" posters give CONTRARY information regarding their experiences about their "test"

.

B. Wider Statistical Testing

If one is WORRIED about the OPPOSING IDEAS they are reading on forums, work for a corporate clients, or have a highly lucrative MONEY SITE that they don't want to make "mistakes" with -
then THIS is the testing one should be engaged in FIRST.

This testing is NO DIFFERENT than "medical studies", how they figure out "99.999997% it's the baby's daddy" on talk shows =), or the many, many different scientific disciplines.

One simply gets a probability of a technique working.
One can ELIMINATE various factors being "anomalies," "site dependent," the algo "dancing," etc., by this type of testing.

AND MOST IMPORTANTLY
if conducted correctly, this type of testing APPLIES to nearly ALL SITES
(NOTHING IS EVER 100% in ANYTHING, SEO or life...).

But at the end of the day, even THIS type of testing on TEST-ONLY sites, has to be RE-tested by the FIRST METHOD, by "trying it out" on the site you'd like to GET THE BENEFITS OF THE TEST.

---------------
So when I say "test it". I mean, both, or either. It does not matter until you GET THE RESULTS YOU WANT for YOUR site.

Sitting around being scared you'll screw up your site (Especially if you've already gotten a penalty of some sort),
is a good way to be out of the SERPs within a year.

If you're scared, then DO STATISTICAL TESTING FIRST to either build confidence or rule out the theory for one's "main" site.

EVERYTHING you have ever learned about in SEO or life itself, someone tested it first and then someone else, until enough people realized it worked to a certain degree of high probability for all sites.

Shaddows

12:37 pm on Dec 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Great post.

Tedster posted similar on the PR Sculpting thread when it started to stall

whitenight

12:38 pm on Dec 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great post.

Tedster posted similar on the PR Sculpting thread when it started to stall

Thanks :)

----------------
lol too slow. Edited to add.

One of the GREATEST benefits of Statistically Testing v. "Try it Out" testing is:

That one gets an exponential increase in understanding of why something does or does not work.

For example - if something works on 56% of a statistically significant population of sites, one starts to figure out why it does NOT work on the other 44%.

This leads to insights that are BEYOND Value.

It's as if once you've discovered the figurative "wheel", you're already moving on to using rubber tires, while everyone else is still using wooden and stone wheels.

You're using a plane to get across the country, when everyone else is using covered wagons.

It goes without saying this understanding equals MONEY AND RANKINGS that others are simply unable to mimic, as you "know" that little secret "tweak" that no one else has seen, because you really only "SEE" it when doing statistical testing.
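The 56%/44% split described above is usually chased by segmenting the test population and comparing success rates per segment. An illustrative sketch, with invented segment labels and data:

```python
from collections import defaultdict

def success_rate_by_segment(results):
    """results: iterable of (segment_label, worked) pairs. Returns the
    observed success rate per segment, to spot where a tweak fails."""
    tally = defaultdict(lambda: [0, 0])  # segment -> [successes, total]
    for segment, worked in results:
        tally[segment][0] += int(worked)
        tally[segment][1] += 1
    return {seg: s / t for seg, (s, t) in tally.items()}

rates = success_rate_by_segment([
    ("old domain", True), ("old domain", True),
    ("new domain", False), ("new domain", True),
])
# rates["old domain"] == 1.0, rates["new domain"] == 0.5
```

When one segment's rate diverges sharply from the overall figure, that segment attribute is the candidate for the "why it does NOT work" follow-up test.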

tedster

9:40 pm on Dec 20, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It seems to me this would have to be done on 30 sites designed by 30 different people who make their own changes for testing purposes.

You're right - if one person creates all the sites that they test on, certain common footprints are naturally going to creep in. That creates a kind of "unconscious bias" that can lead to overlooking a factor that is actually causative, or additionally causative.

Networking and collaboration is a solid approach to testing. Even just comparing notes with others who've already taken the "just try it" approach around the same algo factors can lead to more confident conclusions.

Having a trusted collaborator who can double check your testing logic can be a big boon. In science they call it peer review, and the principle crosses over to SEO quite well.

whitenight

3:59 pm on Dec 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That creates a kind of "unconscious bias" that can lead to overlooking a factor that is actually causative, or additionally causative

98% of the time, it's a non-factor as one is ALSO working with the same "Quantum field" effect with the site they would like to see the RESULTS with.

Remember, we aren't trying to reverse engineer the algo for the sake of building our own SE.
We are doing these tests to see HOW they would affect OUR SITES
(that are already affected by the same unconscious bias)

If the placebo, "unconscious", quantum field effect happens, it happens.

By the way, it is impossible to avoid it anyways.

If it's "unconscious," the TEST itself contains that information when other "peer review" and collaborators' unconscious minds "check" the data and are "unconsciously" affected by it.

lol, hard science has just started recognizing this in the past few decades
(although the "theory" has existed for a century and sages have alluded to it for ages).

If it becomes an excuse to NOT test, then it's a poor one.

As said above, the testOR is by definition affecting the testED.
It can be no other way.

Shaddows

10:26 am on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



98% of the time, it's a non-factor as one is ALSO working with the same "Quantum field" effect with the site they would like to see the RESULTS with.

Ah but... then you are falling towards the 'test it and see [TIAS]' camp WRT advising others. Effectively you are saying "this works within my quantum field, thus it must work within yours". The whole point of Methodology Testing is that it produces 'stateless' results.

Of course it would be flawed logic to expand "Methodology testing retains bias" to "Methodology testing is as biased as TIAS"

Clearly, it depends on your aims. If you want academically pure results (or claim to give advice without caveats required), group and collaborative testing is ideal. If you are looking for something that works on multiple sites, all authored by you, you will probably be best served by your own test groups, with your own biases necessarily built in.

If you just have the one site, and feel no need to mitigate risk, TIAS coupled with good instinct will probably give you the best results for the effort expended. Bang For Your Bucks, to borrow a phrase.

Crush

12:29 pm on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



heh, how Google works is not hard.

1) unique content
2) nice link profile with some from juicy sites.
3) lots of links, lots of content, some relevant titles, some internal linking.

Experiments like this are OK if you have too much time on your hands, but I think deep down we all know what works best.

Shaddows

1:09 pm on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Right, so you cover your three-step basics, and have a site.

Now, how do you improve? How do you push that top site off the top spot? Break into the above-the-fold listing on a competitive phrase?

You can add more links, or you could harness the power you have. Internal navigation- breadcrumb or universal. How do you structure (or eliminate) your mega-menu, but maintain usability?

Maybe experience makes these and others no-brainers to some. But I guarantee there will be an A/B situation that you have not come across before. You're left with TIAS or testing. If you decide to test, how do you go about it - and that's the point of this thread.

Crush

1:55 pm on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you guys think too much. It is basically about links and content. You are never going to work out what happens behind the scenes. I find the unexplainable situations in SEO remain unexplainable. Look at these -50 -950 threads, they go on for months with no real conclusion.

Google holds the keys to that. You can try and work out where you went wrong but it is either the content is too thin or the links are too weak. Work on unique content with little dupe and good links and you should be fine.

Shaddows

2:40 pm on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can try and work out where you went wrong but it is either the content is too thin or the links are too weak. Work on unique content with little dupe and good links and you should be fine.

Sure, but I don't find myself penalised that often. And as I've said before, people who are penalised have usually done something that G doesn't like without realising it. Repetitive anchor text in menus and inbounds being a huge one.

And I'm not trying to figure the algo necessarily, just get the best out of my site. And also, when I craft a new page, I want to know how to do that best. Where do I put my keywords (both in tags and within content), which attributes do I pay most attention to?

I've only begun to test, and I'll have to wait to rank on anything but obscure terms. So I analyse. And I can tell you that content and links TEND to the top of SERPS. But very annoyingly, so do thin scrapers with spammy inbounds. As do strong content sites with relatively few links.

But I digress. Take a hypothetical:
Two sites with great content and a large equal number of links. I want mine to win. Adding links will be insignificant compared to the total number of links. I CAN change my content. I CAN change presentational and semantic markup. I CAN do various SEO things. Testing/analysis tells me what to put my effort into first.

Content is king. Links are key. Navigation is important. Let's assume high-ranking webmasters know this- so how do you make the final push, reach the top- that's the goal and aim of testing.

Incidentally, you can't possibly be suggesting that it is not possible to infer algorithmic rules? The fact that there is advice to be shared surely belies that idea.

[edited by: Shaddows at 3:03 pm (utc) on Dec. 22, 2008]

tedster

4:57 pm on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One of the ways in which the try-it-on-one-site approach fails is that people try all kinds of things at basically the same time. You see such comments quite often - "I shortened my menu, dropped the keyword from 30% of the links, did a url rewrite and changed my feeder domains so they just 301 redirect."

No matter what kind of ranking changes you see after a flurry of actions like that, you still have learned almost nothing. Another kind of report you see is a sudden change in rankings but the person has no record of what significant changes they made in the past couple weeks.

I try to make significant changes one at a time, and keep a changelog of what they were. That way, whether trouble strikes or improvements come along, I have some idea what changes might have triggered the results. I also have a better idea whether Google changed something or it was me that caused the shift.

So I recommend keeping a site changelog for key changes - especially sitewide changes, server configurations and that kind of thing. It can be a lifeline.
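The changelog tedster recommends can be as simple as an append-only CSV. A hypothetical sketch (the file name, column order, and example entry are my own):

```python
import csv
import datetime

def log_change(path, description, scope="sitewide"):
    """Append one dated entry to a plain CSV changelog:
    date, scope of the change, and what was changed."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [datetime.date.today().isoformat(), scope, description]
        )

log_change("changelog.csv", "Shortened main menu to 8 links", scope="navigation")
```

One entry per significant change, made one change at a time, is what lets you later line ranking shifts up against your own actions rather than guessing whether Google moved.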

whitenight

8:09 pm on Dec 22, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you guys think too much. It is basically about links and content.

lol I've heard this argument twice in this thread and several times in other threads.

The above argument goes without saying and is BEST left to OTHERS you train to do this FOR YOU. So you have more time to THINK SMARTER, not HARDER

If one is spending one's own time writing content, that's utterly useless.
And if one is spending any time on all but the MOST CRUCIAL LINKS, then that's wasted time-energy as well.

Make the world a better place... build a BUSINESS (not hobby) and use your time more wisely.
The world and your bank account will thank you for it :)

This 44-message thread spans 2 pages.