New Google Patent Details Many Google Techniques - Google Search and SEO forum at WebmasterWorld - WebmasterWorld

Forum Moderators: Robert Charlton & goodroi

New Google Patent Details Many Google Techniques

msgraph

3:47 pm on Mar 31, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Probably one of the best bits of information released by them in a patent.

A large number of inventors are listed on it, even Matt Cutts, the guy who attends those SE conferences. It explains a bit about what is already known through experience, as well as comments made by search engine representatives.

Example:

[0039] Consider the example of a document with an inception date of yesterday that is referenced by 10 back links. This document may be scored higher by search engine 125 than a document with an inception date of 10 years ago that is referenced by 100 back links because the rate of link growth for the former is relatively higher than the latter. While a spiky rate of growth in the number of back links may be a factor used by search engine 125 to score documents, it may also signal an attempt to spam search engine 125. Accordingly, in this situation, search engine 125 may actually lower the score of a document(s) to reduce the effect of spamming.
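The arithmetic behind the example in [0039] can be sketched in a few lines. This is only an illustration: the patent text quoted here gives no formula, so the `link_growth_rate` function and the simple per-day average are assumptions, not Google's method.

```python
def link_growth_rate(back_links: int, age_days: float) -> float:
    """Average back links gained per day since the inception date.

    Hypothetical measure: the quoted patent text does not specify a formula.
    """
    if age_days <= 0:
        raise ValueError("age_days must be positive")
    return back_links / age_days


# Document A: inception date yesterday, 10 back links.
new_doc_rate = link_growth_rate(back_links=10, age_days=1)

# Document B: inception date 10 years ago, 100 back links.
old_doc_rate = link_growth_rate(back_links=100, age_days=10 * 365)

print(new_doc_rate > old_doc_rate)  # True: 10 per day vs. about 0.027 per day
```

On this measure the newer document wins despite having a tenth of the links, which is the point the paragraph makes before it adds the caveat about spam.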

USPTO version [appft1.uspto.gov]

< Note: the USPTO has at times either moved or removed this
patent. If that happens again, here's an online back-up copy:
Information retrieval based on historical data [webmasterwoman.com]>

[edited by: tedster at 3:04 am (utc) on April 10, 2008]

MHes

12:12 pm on Apr 2, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>What isn't in the patent... a method of discerning niche authority

Would this need to be in? Once a set of results for a keyword/phrase has been found, the 'authority' sites could then be identified, resulting in the 'niche authority' for that search.

rich42

12:49 pm on Apr 2, 2005 (gmt 0)

10+ Year Member

The big story is the suggestion that high CTR in adwords can influence search engine ranking. Am I dreaming, or weren't there some "over my dead body" posts from google representatives concerning linkage between adwords and search engine ranking?

I've seen claims of that nature by various google entities - and I'm still prone to believe them.

just because they've patented it doesn't mean they're doing it.

I also suspect there's a lot of stuff they -are- doing that's not in that doc.

SlyOldDog

4:18 pm on Apr 2, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I think that was Adsense CTR if you use it on your site. Not Adwords on Google's pages.

SlyOldDog

4:20 pm on Apr 2, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

The entry about spikiness gives it away. They do not say how they will differentiate the spam from the genuine article.

There would seem little point in patenting the method then.

mrMister

6:18 pm on Apr 2, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

So Google is basing SERP rankings on the Toolbar?

I think the patent relates more to the Google browser/OS/Application that they're rumoured to be producing.

I think the patent describes certain features of a search application as well as a search engine.

If Google had a browser of their own, most of this user tracking would be very simple.

henry0

11:41 pm on Apr 2, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I hate the idea of registering for 100 years!

Any thoughts about those who cannot spend $100 or more just on registering?

Plus, when you try out an idea, do you always register for many years?
I always experiment and give myself a year or two before reaching a final decision.
So a good trial site might be thrown out of the SE just for not being registered many years ahead of time.
That's rubbish.

roadhazard

11:51 pm on Apr 2, 2005 (gmt 0)

10+ Year Member

Important for us AMs, Google in 2003 became affiliated with Amazon.com

"The multiyear agreement will make Google's search technology and targeted sponsored links available on Amazon.com within the next several months. In fact, sponsored links are already available on a selection of Amazon.com Web pages."

I have a lot of Amazon links on my affiliate gifts site, but you can believe it, I am adding a TON more.

I contacted my host about paying for several years for my domain name.

Just when I was thinking my AM site would never "make it"...maybe it will now...

Vadim

12:22 am on Apr 3, 2005 (gmt 0)

10+ Year Member

I contacted my host about paying for several years for my domain name.

I would be cautious. It may give the opposite effect. We first should figure out *how* Google will use its claims (if it will).

For example, it does not make much sense for a regular site to register its domain many years in advance. However, it does make sense for a domain that is for sale, if the domain has a good, easily remembered name. It is a clear sign that you will have to pay to get it.

So Google might begin to treat your site, registered 100 years in advance, as a site whose content is going to vanish very soon.

As far as I can understand from the Google recommendation to webmasters any optimization that the user cannot benefit from is considered as spam.

Vadim.

antonaf

1:48 am on Apr 3, 2005 (gmt 0)

10+ Year Member

The truth is that if webmasters concentrate on what benefits and better targets the consumers/visitors, they will rank fairly. Google knows this, and the patent confirms it. Of course you need to create sites for robots as well as visitors, but without trickery or an emphasis on the SE; the concentration should be on the visitors.

But of course SEO isn't about ranking... it's about ranking fast, because anyone can rank the slow and easy way. ;)

danny

3:28 am on Apr 3, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

This is getting ridiculous. I'm going to keep on doing what I've done for ten years - make my web site as easy to use as possible - and let Google do whatever it does.

Otherwise I'll be running around in circles like a headless chook.

Reid

4:10 am on Apr 3, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

I agree Danny. Make a good website and it will become popular. SEO should only be in the back of your mind while making a good website is in the front of your mind.

rogoff

5:51 pm on Apr 3, 2005 (gmt 0)

10+ Year Member

... and that's probably exactly the effect they were hoping this patent application would have ;)

Lorel

8:41 pm on Apr 3, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

google toolbar with page rank option turned on is enough for G to know how long you spend there and if you go to Amazon from there.
They have got all the bases covered.

Not quite. The google tool bar has not been designed for the Mac yet and that leaves out a lot of web designers and particularly graphic designers.

Re the release date just before April Fools--isn't it the patent office that controls when info is posted and not the applicant?

sunzon

11:42 pm on Apr 3, 2005 (gmt 0)

10+ Year Member

Consider the alexa toolbar using sample data to determine traffic rank for a website.
Check out WebmasterWorld: alexa.com/data/details/main?q=&url=http://www.webmasterworld.com or your own site.
Consider also that alexa is amazon.

Leosghost

5:57 am on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Top Contributors Of The Month

Biggest surprise in all this is how this

That must be why 'br*tney sp*ars n*de' pages are all over the first page for many terms that have nothing to do with her or nudity.

<altered in the name of "good taste">
got past Brett's "snip fingers" or the "auto censor"..

Which knobs and buttons they push has always depended on variables in your site and its category... so what's new?

Example: hidden text is the easiest thing in the world to algo out, but they only do it (penalize) sometimes. We all know pages where it's there, so you'll just drive yourselves crazy trying to work out at what point the wires are tripped for any or none of this stuff.

Game was always on...

And anyone's toolbar was always spyware...

marin

9:13 am on Apr 4, 2005 (gmt 0)

10+ Year Member

For instance, search engine 125 may monitor whether physically correct address information exists over a period of time, whether contact information for the domain changes relatively often, whether there is a relatively high number of changes between different name servers and hosting companies, etc

Four months ago we changed our company's name; two weeks ago our hosting company was bought by a big player, so our DNS will change soon.
Should our website be punished for these reasons?

[0088] According to an implementation consistent with the principles of the invention, information relating to traffic associated with a document over time may be used to generate (or alter) a score associated with the document. For example, search engine 125 may monitor the time-varying characteristics of traffic to, or other "use" of, a document by one or more users. A large reduction in traffic may indicate that a document may be stale (e.g., no longer be updated or may be superseded by another document).

Could this explain the recent acquisition of Urchin?

elguiri

10:08 am on Apr 4, 2005 (gmt 0)

10+ Year Member

Top Contributors Of The Month

Having spent the weekend re-reading and reflecting, I return to the point someone made earlier (can't find it to reference it - I know it's there somewhere): the processing requirements make these claims near impossible to implement wholly.

The conclusion from reading the original Hilltop document was that you would need a two-step process to rank the pages. Doing that on Google's universe of 8 billion documents would be impossible. This is true not least because G's response time is possibly its most competitive factor.

That is no less true with these claims. While many relate to static valuations of a document - such as inception or discovery date - and could be built into a constant in the ranking algo, others are query-dependent (like the age of anchor text, where query terms are included) and would need to be calculated on the fly. Even with massive improvements in processing power, that would be impossible without a decrease in performance.

So if these claims are being implemented to any degree, it's probably over a small subset of "money terms".

That's probably little comfort to many of the people who participate in these pages. Nonetheless, given that many of these claims are either a) in conflict with other search objectives or b) mad, it shouldn't be difficult to extract a short, workable list of points to be drawn from the patent, such as:

1. watch for linking "spikiness" (what a word!). Build links constantly and steadily. Plan long-term.
2. determine whether your target phrases are better served by stale pages or new pages (how? testing?), then act accordingly.
3. use hosting and nameservers perceived as quality.
4. bookmark your pages
5. hide affiliate links that are perceived as poor quality. Show those perceived as quality. (I imagine we're talking about more than just Amazon)
6. If you use the G toolbar or allow cookies, be aware that you're being watched (we knew that anyway!)
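For point 1, one way to picture "spikiness" is to compare each period's new links against the site's own history. The sketch below is purely illustrative: the patent does not say how a spike would be detected, so the median baseline, the `factor` threshold, and the sample numbers are all assumptions.

```python
from statistics import median


def spiky_periods(new_links_per_period, factor=5.0):
    """Return indexes of periods whose new-link count exceeds `factor`
    times the median of all earlier periods.

    Hypothetical heuristic, not taken from the patent.
    """
    flagged = []
    for i in range(1, len(new_links_per_period)):
        baseline = median(new_links_per_period[:i])
        if baseline > 0 and new_links_per_period[i] > factor * baseline:
            flagged.append(i)
    return flagged


steady = [10, 12, 11, 13, 12, 14]   # links gained per month, built steadily
bursty = [10, 12, 11, 200, 12, 14]  # same site with one bought-in burst

print(spiky_periods(steady))  # []
print(spiky_periods(bursty))  # [3]
```

A real system would still have to tell a spam burst from a page that genuinely got popular overnight, which is exactly the gap SlyOldDog pointed out earlier in the thread.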

claus

11:40 am on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>> processing requirements

Not all processing needs to be done on-the-fly, some can be prepared beforehand. Your example with a table of the different variations of anchor text could easily be prepared as a batch job.

elguiri

12:38 pm on Apr 4, 2005 (gmt 0)

10+ Year Member

Top Contributors Of The Month

Your example with a table of the different variations of anchor text could easily be prepared as a batch job.

But for what % of search terms? And for terms outside the top, say, 1%, ordered by commercial value, how often could you run the batch?

We're talking about over 8 billion documents. And I've heard the statistic of 50% of search terms being unique. (which, curiously enough, are the ones I target ;) )

SteveJohnston

4:51 pm on Apr 4, 2005 (gmt 0)

10+ Year Member

Thanks SlyOldDog, Elguiri and Claus, for the most incisive comments and conclusions.

I particularly liked your five types of rating/ranking theory, Claus.

I may try and struggle through the whole patent, but feel much less need now.

Thanks people.

Steve

claus

5:25 pm on Apr 4, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

>> But for what % of search terms?

I did not mean search terms. For 100% of anchor text terms. It could run.. well, i don't know... perhaps every time any page was indexed: If link found, then add anchor text + URL to the table.

If you then wanted to do something at search time, then it will be a simple lookup in a table, no expensive calculation needed.
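A minimal sketch of that idea, assuming an in-memory table keyed by anchor-text term. The names `anchor_table`, `index_page` and `urls_for_query` are made up for illustration, and nothing here is claimed to match how Google stores anchor text.

```python
from collections import defaultdict

# Hypothetical lookup table: anchor-text term -> set of target URLs.
anchor_table = defaultdict(set)


def index_page(links):
    """Index-time step: record each (anchor_text, target_url) pair found on a page."""
    for anchor_text, target_url in links:
        for term in anchor_text.lower().split():
            anchor_table[term].add(target_url)


def urls_for_query(query):
    """Search-time step: a plain table lookup, with no per-query recomputation."""
    term_sets = [anchor_table.get(term, set()) for term in query.lower().split()]
    return set.intersection(*term_sets) if term_sets else set()


# Two crawled pages, each contributing the links found on it.
index_page([("cheap blue widgets", "http://example.com/widgets"),
            ("blue widgets", "http://example.org/blue")])
index_page([("widget reviews", "http://example.net/reviews")])

print(sorted(urls_for_query("blue widgets")))
```

The expensive part (walking every link on every page) happens once per crawl, while the query side only intersects a few precomputed sets.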

>> 8 billion documents

I personally suspect that at least 10% of those are not documents as such. Still, even for the part that are real documents - it's a lot of documents. But then again, Google has got a whole lot of capacity, and they increase it continually.

ownerrim

8:21 pm on Apr 5, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

re:"What bothers me:
1. You can buy sitewide links with totally unrelated terms and your competitor is toast. Someone could do it to you. If you read this, links and anchor text are still king; either directly or indirectly they make up about 90% of the ranking. They can make or break you or your competitor. The prices will definitely go down now, so it's not expensive to nuke your competitor."

Are you saying that sitewide links are advisable, or inadvisable? I'm just wondering because I've seen a couple sites go to the top of the MSN serps using site-wide text links purchased from other sites. These same sites don't do well at all in google and yahoo though, which makes me wonder if google and yahoo can counteract this attempt to artificially boost position (game the system) a lot better than msn can.

Webmeister

2:14 pm on Apr 6, 2005 (gmt 0)

10+ Year Member

Google has always implied that there is no way your competitors can harm your ranking. If getting too many links too quickly can hurt someone's ranking, then we could all add links to our competitors on every page of several 5000-page websites and squash them off the search engines.

It will never happen. I honestly believe that most of the verbiage in this patent is old material that Google has been writing for years. Their way around too many links too quickly is called "the sandbox".

digitalje5u5

4:06 pm on Apr 6, 2005 (gmt 0)

What is the patent number?

ronburk

3:31 pm on Apr 7, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Many (most?) of the posts in this thread are implicitly from the view that Google has "an" algorithm, that human beings are putting individual variables into a complex function and then trying to tune those parameters to get good SERPs.

You can see this implicit point of view in everything from comments like "that's too many variables -- this just must be a laundry list of things they might like" to the more refined "too many control variables leads to instability, so I doubt this is worth reading".

However, the Google founders are coming from a data mining background, and while there almost certainly is some formulaic aspect to how Google calculates SERPs, it is increasingly unfruitful to try to understand Google's SERPs without understanding data mining.

From the point of view of data mining, there is no great problem adding more variables (certainly up to hundreds) so long as you have computing power (which Google can manage). From the point of view of data mining, there is no "algorithm", at least not in the sense that any human being understands how a particular SERP ranking was calculated.

It's better to think of data mining as a giant black box. You pour in variables you think might be relevant in the top, you give the machine a sample set of pages and how they *should* be ranked, then you let that baby grind away. Data mining machines can learn how to find incredibly complex associations all on their own -- they just need the horsepower to grind away. In this case, the associations are between an ever-growing number of variables that Google engineers can supply, and a sort order that a real human agrees makes sense. IOW, this is more a process of "training" than "tuning", which is why there's no particular problem of instability that results from throwing in a few dozen more variables.
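To make "training" rather than "tuning" concrete, here is a toy sketch. It uses a pairwise perceptron purely as a stand-in for whatever learning method a search engine might use; the three input variables, the sample pages and the function names are all invented for illustration.

```python
def train_ranker(pages, desired_order, epochs=100):
    """Learn one weight per input variable from a human-approved ordering.

    pages: dict of page name -> list of feature values.
    desired_order: page names, best first.
    A pairwise perceptron, used here only as a stand-in for "data mining".
    """
    n = len(next(iter(pages.values())))
    weights = [0.0] * n
    for _ in range(epochs):
        mistakes = 0
        for i, better in enumerate(desired_order):
            for worse in desired_order[i + 1:]:
                diff = [b - w for b, w in zip(pages[better], pages[worse])]
                # If the current weights do not rank `better` above `worse`,
                # nudge them in the direction that would.
                if sum(wt * d for wt, d in zip(weights, diff)) <= 0:
                    weights = [wt + d for wt, d in zip(weights, diff)]
                    mistakes += 1
        if mistakes == 0:
            break
    return weights


def score(weights, features):
    return sum(w * f for w, f in zip(weights, features))


# Hypothetical variables: [link_score, page_age_years, spam_signal]
pages = {
    "a": [0.9, 5.0, 0.0],
    "b": [0.7, 1.0, 0.1],
    "c": [0.8, 0.5, 0.9],
}
desired = ["a", "b", "c"]  # the order a human reviewer says is right

weights = train_ranker(pages, desired)
learned = sorted(pages, key=lambda p: score(weights, pages[p]), reverse=True)
print(learned)  # ['a', 'b', 'c']
```

Nobody chose the weights by hand: the engineer supplies variables and an example ordering, and the procedure finds weights that reproduce it. Adding another variable means adding another column, not re-tuning a formula.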

Just as a computer can beat you at chess by brute-force trying every possible move and exploring its implications (though, in practice, they take some shortcuts to shrink the solution space to a manageable size), data mining can devise a very good and complex algorithm that produces the desired results by brute-force trying all combinations of the input variables to see what works (though, in practice, they take some shortcuts to shrink the solution space to a manageable size).

Once the input variable list includes things like "does this page contain words about topic X" (e.g., real estate, travel, sex, etc.), the resultant behavior looks nothing like a simple formula, and can easily explain all the imagined cases of Google engineers personally tweaking knobs or manually altering behavior for specific types of websites or topic areas.

Can you still game Google's algorithm? Sure. But it's a good bet it will continue (as it already has) getting harder and harder to game, requiring constant attention. As Google gets better and better at recognizing good content, the dreaded alternative of simply incrementally building a website with good content over time becomes more attractive.

In the past, it was probably possible to actually use data mining yourself to get pretty close to Google's algorithm, particularly for small sets of keywords. However, as Google incorporates more variables that only they can calculate (such as historical analysis of page rank -- you probably haven't been storing a copy of the top 500 websites for each of your favorite keywords for the last 5 years that you can analyze), that too becomes more difficult.

In general, people that think Google is incredibly smart have underestimated how easily their algorithm can be gamed. But we're getting to the tipping point now where more people who work at gaming Google are underestimating how difficult they can make it, and this is largely a lack of understanding of how data mining can make a hugely complex formula easy to construct and manage (without any human being ever having to understand that formula). You can track this sea change by graphing the percentage of SEO posts of the form "but I did all the stuff I'm supposed to and still don't rank well".

elguiri

4:27 pm on Apr 7, 2005 (gmt 0)

10+ Year Member

Top Contributors Of The Month

Nice post.

What Google can do with pixels, SEO games with a broad brush and a tin of vinyl matt.

MHes

10:31 am on Apr 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

Great post ronburk

It's nice to see a well-written and new spin on things. I had always considered Google's algo to be simpler than we sometimes think, but this explains how baffling it can be.

neuron

12:13 pm on Apr 8, 2005 (gmt 0)

10+ Year Member

Most excellent post, ronburk.

IMHO google went black box with the Florida Update, nearly 17 months ago, [black box: A device or theoretical construct with known or specified performance characteristics but unknown or unspecified constituents and means of operation], and as such is impossible to reverse engineer.

The way I see them making the ideas in this new patent work is that they would take a set of known non-spammy sites and a set of spammy sites and, for each of the parameters listed in this patent application, do some preliminary statistical analysis of the sites for some tail chopping and/or for tagging. Then they would run what's left, with a high degree of manual review, to get a good training set, and then it would be off to the number crunching races.

We do a lot of note comparisons here at Webmaster World. While I've learned a great deal here, there is also a lot of confusion. I have sites in several industries, but they are not representative of all sites, nor of the web as a whole. Many others here have a site or two in some special niche, while others occupy virtually every industry that's in the money. We all view the feedback we get here through the eyes of our site(s), trying to put forth generic information and glean what we can from others, but nearly always as to how different things affect our own sites. I see many people say things like "content is king, all you need is the best content in your niche and people will link to you". That may be true for your particular niche, but it does not extrapolate to all sites. Other sites can benefit from a large amount of SE spam, while the same amount of spam on a different site might get it banned immediately.

What it all comes down to is that google is getting better and will continue to get better at detecting spam and removing those sites that violate its guidelines from the SERPs, by hook or by crook. This patent application gives us a pretty good idea of what they are looking at or intend to look at to do this. Those industries that are the most spammy will likely be the hardest hit. Other industries may be little affected, which will only add to the confusion.

The nicest thing I got out of reading the application is that google is still all about linking.

claus

7:10 pm on Apr 8, 2005 (gmt 0)

WebmasterWorld Senior Member

10+ Year Member

As outlined in msg #96 i think that there are query specific issues ("ranking") and non query specific issues ("rating").

There's some kind of limit to how many ranking parameters you would like to have at query time, but there's no limit to rating parameters as these can be calculated any time.

(just another way of looking at "data mining", essentially)
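The split between "rating" and "ranking" can be sketched like this. It is a toy under stated assumptions: the stored ratings, the page texts, and the way the two scores are multiplied together are all hypothetical and only show where the work happens.

```python
# Offline ("rating"): query-independent scores computed ahead of time.
# All names and numbers here are hypothetical.
precomputed_rating = {
    "http://example.com/a": 0.8,
    "http://example.com/b": 0.3,
}

page_text = {
    "http://example.com/a": "history of search engine patents",
    "http://example.com/b": "search engine patents explained: patents and more patents",
}


def query_relevance(query, text):
    """Query time ("ranking"): fraction of query terms found in the page text."""
    terms = query.lower().split()
    words = set(text.lower().replace(":", " ").split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0


def rank(query):
    """Combine the cheap query-time signal with the stored rating."""
    scored = {url: query_relevance(query, page_text[url]) * precomputed_rating[url]
              for url in page_text}
    return sorted(scored, key=scored.get, reverse=True)


print(rank("search patents"))
```

However many parameters go into `precomputed_rating`, the query-time cost stays the same: one lookup per candidate page.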

yycowns

12:02 am on Apr 12, 2005 (gmt 0)

10+ Year Member

Regarding "Stickiness". How will G measure this? Crawling logs or?

This 189 message thread spans 7 pages.
