A large number of inventors are listed on this one, even Matt Cutts, that guy who attends those SE conferences. It explains a bit about what is already known through experience, as well as comments made by search engine representatives.
 Consider the example of a document with an inception date of yesterday that is referenced by 10 back links. This document may be scored higher by search engine 125 than a document with an inception date of 10 years ago that is referenced by 100 back links because the rate of link growth for the former is relatively higher than the latter. While a spiky rate of growth in the number of back links may be a factor used by search engine 125 to score documents, it may also signal an attempt to spam search engine 125. Accordingly, in this situation, search engine 125 may actually lower the score of a document(s) to reduce the effect of spamming.
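To make the arithmetic in that example concrete, here is a minimal Python sketch. It assumes growth is measured as back links per day since inception, and that a "spike" means one day's new links exceeding some multiple of the earlier daily average. Both the measure and the `factor` threshold are illustrative assumptions; the patent does not say how search engine 125 computes either.

```python
from datetime import date


def link_growth_rate(backlinks: int, inception: date, today: date) -> float:
    """Back links gained per day since the document's inception date (assumed measure)."""
    age_days = max((today - inception).days, 1)  # treat same-day documents as one day old
    return backlinks / age_days


def is_spiky(daily_new_links: list[int], factor: float = 5.0) -> bool:
    """True if any day's new links exceed `factor` times the average of all earlier days.

    Days whose earlier average is zero are skipped, since there is no baseline yet.
    """
    for i in range(1, len(daily_new_links)):
        baseline = sum(daily_new_links[:i]) / i
        if baseline > 0 and daily_new_links[i] > factor * baseline:
            return True
    return False


today = date(2005, 4, 1)
new_doc = link_growth_rate(10, date(2005, 3, 31), today)   # 10.0 links/day
old_doc = link_growth_rate(100, date(1995, 4, 1), today)   # about 0.027 links/day
print(new_doc > old_doc)                                    # True
print(is_spiky([2, 3, 2, 40]), is_spiky([2, 3, 2, 3]))      # True False
```

With the patent's numbers, the one-day-old page grows at 10 links/day against roughly 0.03 links/day for the ten-year-old page, which is why it could score higher, and also why a sudden jump could just as easily be read as spam.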
USPTO version [appft1.uspto.gov]
< Note: the USPTO has at times either moved or removed this
patent. If that happens again, here's an online back-up copy:
Information retrieval based on historical data [webmasterwoman.com]>
[edited by: tedster at 3:04 am (utc) on April 10, 2008]
Would this need to be in? Once a set of results for a keyword/phrase has been found, the 'authority' sites could then be identified, resulting in the 'niche authority' for that search.
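One rough way to picture that "niche authority" step: once the result set for a phrase is retrieved, count how many other results in the same set link to each one. This is a hypothetical sketch, closer to a simple local in-degree count than to the actual Hilltop algorithm; the `links` map and the `top_n` cutoff are assumptions for illustration.

```python
from collections import Counter


def niche_authorities(results: list[str], links: dict[str, set[str]], top_n: int = 3) -> list[str]:
    """Order results by how many *other* results in the same set link to them.

    `results` is the result set for one keyword/phrase; `links` maps a URL to the
    URLs it links to. Both are hypothetical inputs.
    """
    in_set = set(results)
    counts = Counter()
    for source in results:
        for target in links.get(source, set()):
            if target in in_set and target != source:  # ignore outside and self links
                counts[target] += 1
    # sorted() is stable, so ties keep their original result order
    return sorted(results, key=lambda url: -counts[url])[:top_n]


results = ["a.com", "b.com", "c.com", "d.com"]
links = {"a.com": {"c.com", "x.com"}, "b.com": {"c.com", "d.com"},
         "c.com": {"a.com"}, "d.com": {"c.com"}}
print(niche_authorities(results, links))  # ['c.com', 'a.com', 'd.com']
```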
The big story is the suggestion that high CTR in adwords can influence search engine ranking. Am I dreaming, or weren't there some "over my dead body" posts from google representatives concerning linkage between adwords and search engine ranking?
I've seen claims of that nature by various google entities - and I'm still prone to believe them.
Just because they've patented it doesn't mean they're doing it.
I also suspect there's a lot of stuff they -are- doing that's not in that doc.
So Google is basing SERP rankings on the Toolbar?
I think the patent relates more to the Google browser/OS/Application that they're rumoured to be producing.
I think the patent describes certain features of a search application as well as a search engine.
If Google had a browser of their own, most of this user tracking would be very simple.
Any thoughts about those who cannot spend $100+ just on registration?
Plus, when you try out an idea, do you always register the domain for many years?
I always experiment and give myself a year or two before reaching a final decision.
So a good trial might be discounted by the SE because the domain was not registered many years ahead of time.
"The multiyear agreement will make Google's search technology and targeted sponsored links available on Amazon.com within the next several months. In fact, sponsored links are already available on a selection of Amazon.com Web pages."
I have a lot of Amazon links on my affiliate gifts site, but believe me, I am adding a TON more.
I contacted my host about paying for several years for my domain name.
Just when I was thinking my AM site would never "make it"...maybe it will now...
I contacted my host about paying for several years for my domain name.
I would be cautious. It may give the opposite effect. We first should figure out *how* Google will use its claims (if it will).
For example, it makes little sense for a regular site to register its domain many years in advance. However, it does make sense for a domain that is for sale, if it has a good, easy-to-remember name. That is a clear sign that you would have to pay to get it.
So Google might begin to treat your site, registered 100 years in advance, as a site whose content is going to vanish very soon.
As far as I can understand from Google's recommendations to webmasters, any optimization that the user cannot benefit from is considered spam.
But of course SEO isn't about ranking... it's about ranking fast, because anyone can rank the slow and easy way. ;)
The Google Toolbar with the PageRank option turned on is enough for G to know how long you spend there and whether you go to Amazon from there.
They have got all the bases covered.
Not quite. The Google Toolbar has not been made for the Mac yet, and that leaves out a lot of web designers, particularly graphic designers.
Re the release date just before April Fools--isn't it the patent office that controls when info is posted and not the applicant?
That must be why 'br*tney sp*ars n*de' pages are all over the first page for many terms that have nothing to do with her or nudity.<altered in the name of "good taste">
Which knobs and buttons they push has always depended on variables in your site and its category... so what's new?
Example: hidden text is the easiest thing in the world to algo out, but they only penalize it sometimes. We all know pages where it's there, so you'll just drive yourselves crazy trying to work out at what point the wires are tripped for any or none of this stuff.
The game was always on.
And anyone's toolbar was always spyware...
For instance, search engine 125 may monitor whether physically correct address information exists over a period of time, whether contact information for the domain changes relatively often, whether there is a relatively high number of changes between different name servers and hosting companies, etc.
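The monitoring described in that quote could be as simple as counting changes in a dated history of DNS or WHOIS snapshots. A minimal sketch, assuming such a history is available; the snapshot format, the helper names, and the `max_changes` threshold are hypothetical, not taken from the patent.

```python
from datetime import date


def count_changes(history: list[tuple[date, str]], since: date) -> int:
    """Count how often a recorded value (e.g. a name server) changed on or after `since`.

    `history` is a hypothetical list of (date observed, value) snapshots.
    """
    records = sorted(history)
    changes = 0
    for (_, prev_value), (day, value) in zip(records, records[1:]):
        if day >= since and value != prev_value:
            changes += 1
    return changes


def looks_unstable(history: list[tuple[date, str]], since: date, max_changes: int = 3) -> bool:
    """Flag a domain whose value changed more than `max_changes` times (assumed threshold)."""
    return count_changes(history, since) > max_changes


ns_history = [
    (date(2003, 1, 1), "ns1.hosta.example"),
    (date(2004, 6, 1), "ns1.hosta.example"),
    (date(2004, 9, 1), "ns1.hostb.example"),
    (date(2005, 2, 1), "ns1.hostc.example"),
]
print(count_changes(ns_history, date(2004, 1, 1)))   # 2
print(looks_unstable(ns_history, date(2004, 1, 1)))  # False
```

Under a scheme like this, a single name-server move (for example, after a host is bought out) counts as one change and stays well under any sensible threshold; only a "relatively high number" of moves would stand out.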
Four months ago we changed our company's name; two weeks ago our hosting company was bought by a big player, so our DNS will change soon.
Should our website be punished for these reasons?
[0088] According to an implementation consistent with the principles of the invention, information relating to traffic associated with a document over time may be used to generate (or alter) a score associated with the document. For example, search engine 125 may monitor the time-varying characteristics of traffic to, or other "use" of, a document by one or more users. A large reduction in traffic may indicate that a document may be stale (e.g., no longer be updated or may be superseded by another document).
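A "large reduction in traffic" could be detected by comparing two adjacent windows of visit counts. This is only a sketch: weekly buckets, the 4-week `window`, and the 50% `drop_ratio` are assumptions, since the patent gives no figures.

```python
def traffic_dropped(weekly_visits: list[int], window: int = 4, drop_ratio: float = 0.5) -> bool:
    """True if the last `window` weeks average below `drop_ratio` times the `window` weeks before.

    Window size and ratio are assumptions; the patent only speaks of "a large reduction".
    """
    if len(weekly_visits) < 2 * window:
        return False  # not enough history to compare two windows
    earlier = weekly_visits[-2 * window:-window]
    recent = weekly_visits[-window:]
    earlier_avg = sum(earlier) / window
    recent_avg = sum(recent) / window
    return earlier_avg > 0 and recent_avg < drop_ratio * earlier_avg


print(traffic_dropped([100, 110, 95, 105, 40, 35, 30, 25]))   # True
print(traffic_dropped([100, 110, 95, 105, 90, 100, 98, 97]))  # False
```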
Could this explain the recent acquisition of Urchin?
The conclusion from reading the original Hilltop document was that you would need a two-step process to rank the pages. Doing that on Google's 8-billion-document universe would be impossible, not least because G's response time is possibly its most competitive factor.
That is no less true with these claims. While many relate to static valuations of a document, such as inception or discovery date, and could be built into a constant in the ranking algo, others are query-dependent (like the age of anchor text, where query terms are included) and would need to be calculated on the fly. Even with massive improvements in processing power, that would be impossible without a decrease in performance.
So if these claims are being implemented to any degree, it's probably over a small subset of "money terms".
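The split described above (static valuations folded into a precomputed number, query-dependent factors computed at search time over a small candidate set) can be sketched like this. The scoring formulas, the `weight` parameter, and the helper names are assumptions for illustration, not Google's method.

```python
import math
from datetime import date


def static_score(inception: date, today: date) -> float:
    """Query-independent part: can be computed once per document, offline."""
    age_years = max((today - inception).days, 0) / 365.25
    return math.log1p(age_years)  # assumed: a modest, saturating boost for older documents


def query_dependent_score(query_terms: set[str], anchor_terms: set[str]) -> float:
    """Depends on the query, so it has to be computed at search time."""
    if not query_terms:
        return 0.0
    return len(query_terms & anchor_terms) / len(query_terms)


def rank(candidates: list[str], static: dict[str, float], anchors: dict[str, set[str]],
         query_terms: set[str], weight: float = 1.0) -> list[str]:
    """Step two: combine stored static scores with query-time scores for a small candidate set."""
    scored = [(static[d] + weight * query_dependent_score(query_terms, anchors[d]), d)
              for d in candidates]
    return [d for _, d in sorted(scored, reverse=True)]


today = date(2005, 1, 1)
static = {"old": static_score(date(1995, 1, 1), today),   # about 2.40, stored ahead of time
          "new": static_score(date(2005, 1, 1), today)}   # 0.0
anchors = {"old": {"widgets"}, "new": {"blue", "widgets"}}
print(rank(["old", "new"], static, anchors, {"blue", "widgets"}))  # ['old', 'new']
```

The expensive part here is only the query-dependent function, which is why limiting it to a subset of queries (or candidates) would keep response times down.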
That's probably little comfort to many of the people who participate in these pages. Nonetheless, given that many of these claims either a) conflict with other search objectives or b) are mad, it shouldn't be difficult to extract a short, workable list of points from the patent, such as:
1. watch for linking "spikiness" (what a word!). Build links constantly and steadily. Plan long-term.
2. determine whether your target phrases are better served by stale pages or new pages (how? testing?), then act accordingly.
3. use hosting and nameservers perceived as quality.
4. bookmark your pages
5. hide affiliate links that are perceived as poor quality. Show those perceived as quality. (I imagine we're talking about more than just Amazon)
6. If you use the G toolbar or allow cookies, be aware that you're being watched (we knew that anyway!)
Your example with a table of the different variations of anchor text could easily be prepared as a batch job.
But for what % of search terms? And for terms outside the top, say, 1%, ordered by commercial value, how often could you run the batch?
We're talking about over 8 billion documents. And I've heard the statistic of 50% of search terms being unique. (which, curiously enough, are the ones I target ;) )
I particularly liked your five types of rating/ranking theory, Claus.
I may try and struggle through the whole patent, but feel much less need now.
I did not mean search terms; I meant 100% of anchor text terms. It could run... well, I don't know... perhaps every time any page is indexed: if a link is found, add the anchor text + URL to the table.
If you then wanted to do something at search time, it would be a simple table lookup, with no expensive calculation needed.
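That index-time table with a search-time lookup might look something like the following. A minimal sketch, assuming the table stores the date each (anchor text, URL) pair was first seen; the class and method names are hypothetical.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


class AnchorTable:
    """Hypothetical index-time table of (anchor text, target URL) -> date first seen."""

    def __init__(self) -> None:
        self._first_seen: dict[tuple[str, str], date] = {}
        self._urls_by_anchor: defaultdict[str, set[str]] = defaultdict(set)

    def add_link(self, anchor: str, url: str, seen: date) -> None:
        """Called at index time whenever a link is found."""
        key = (anchor.lower(), url)
        # Keep the earliest date so re-crawls don't reset the anchor text's age
        if key not in self._first_seen or seen < self._first_seen[key]:
            self._first_seen[key] = seen
        self._urls_by_anchor[anchor.lower()].add(url)

    def anchor_age_days(self, anchor: str, url: str, today: date) -> Optional[int]:
        """Search-time lookup: a dictionary read, no recalculation."""
        seen = self._first_seen.get((anchor.lower(), url))
        return None if seen is None else (today - seen).days

    def urls_for(self, anchor: str) -> set[str]:
        """All target URLs that have been linked with this anchor text."""
        return set(self._urls_by_anchor.get(anchor.lower(), set()))


table = AnchorTable()
table.add_link("Blue Widgets", "http://www.example.com/", date(2004, 1, 1))
table.add_link("blue widgets", "http://www.example.com/", date(2005, 1, 1))  # re-crawl
print(table.anchor_age_days("blue widgets", "http://www.example.com/", date(2005, 1, 1)))  # 366
```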
>> 8 billion documents
I personally suspect that at least 10% of those are not documents as such. Still, even counting only the real documents, it's a lot of documents. But then again, Google has a whole lot of capacity, and they increase it continually.
Are you saying that sitewide links are advisable or inadvisable? I'm just wondering because I've seen a couple of sites go to the top of the MSN SERPs using sitewide text links purchased from other sites. These same sites don't do well at all in Google and Yahoo, though, which makes me wonder if Google and Yahoo can counteract this attempt to artificially boost position (game the system) a lot better than MSN can.
It will never happen. I honestly believe that most of the verbiage in this patent is old material that Google has been writing for years. Their way around too many links too quickly is called "the sandbox".
You can see this implicit point of view in everything from comments like "that's too many variables -- this just must be a laundry list of things they might like" to the more refined "too many control variables leads to instability, so I doubt this is worth reading".
However, the Google founders are coming from a data mining background, and while there almost certainly is some formulaic aspect to how Google calculates SERPs, it is increasingly unfruitful to try to understand Google's SERPs without understanding data mining.
From the point of view of data mining, there is no great problem adding more variables (certainly up to hundreds) so long as you have computing power (which Google can manage). From the point of view of data mining, there is no "algorithm", at least not in the sense that any human being understands how a particular SERP ranking was calculated.
It's better to think of data mining as a giant black box. You pour in variables you think might be relevant in the top, you give the machine a sample set of pages and how they *should* be ranked, then you let that baby grind away. Data mining machines can learn how to find incredibly complex associations all on their own -- they just need the horsepower to grind away. In this case, the associations are between an ever-growing number of variables that Google engineers can supply, and a sort order that a real human agrees makes sense. IOW, this is more a process of "training" than "tuning", which is why there's no particular problem of instability that results from throwing in a few dozen more variables.
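As a toy illustration of "training" rather than "tuning": give the machine pairs of pages where a human says which should rank higher, and let it learn the weights. This is a minimal pairwise perceptron under stated assumptions (the three features and their values are made up); it is not how Google actually does it, just the simplest instance of learning a ranking from judged examples.

```python
def score(weights: list[float], features: list[float]) -> float:
    """Linear score: weighted sum of a page's features."""
    return sum(w * f for w, f in zip(weights, features))


def train_pairwise(pairs: list[tuple[list[float], list[float]]], n_features: int,
                   epochs: int = 20, lr: float = 0.1) -> list[float]:
    """Pairwise perceptron: learn weights so every (better, worse) pair is ordered correctly."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - x for b, x in zip(better, worse)]
            if score(weights, diff) <= 0:  # misordered or tied: nudge toward the preferred page
                weights = [wt + lr * d for wt, d in zip(weights, diff)]
    return weights


# Hypothetical features: [link growth rate, domain age in years, has hidden text]
training_pairs = [
    ([0.2, 5.0, 0.0], [0.9, 0.5, 1.0]),  # a human judged the first page better
    ([0.1, 3.0, 0.0], [0.1, 3.0, 1.0]),
]
weights = train_pairwise(training_pairs, n_features=3)
print(all(score(weights, b) > score(weights, w) for b, w in training_pairs))  # True
```

Adding another feature just means one more column and one more weight to learn, which is the poster's point about more variables not causing instability in this setup.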
Just as a computer can beat you at chess by brute-force trying every possible move and exploring its implications (though, in practice, they take some shortcuts to shrink the solution space to a manageable size), data mining can devise a very good and complex algorithm that produces the desired results by brute-force trying all combinations of the input variables to see what works (though, in practice, they take some shortcuts to shrink the solution space to a manageable size).
Once the input variable list includes things like "does this page contain words about topic X" (e.g., real estate, travel, sex, etc.), the resultant behavior looks nothing like a simple formula, and can easily explain all the imagined cases of Google engineers personally tweaking knobs or manually altering behavior for specific types of websites or topic areas.
Can you still game Google's algorithm? Sure. But it's a good bet it will continue (as it already has) getting harder and harder to game, requiring constant attention. As Google gets better and better at recognizing good content, the dreaded alternative of simply incrementally building a website with good content over time becomes more attractive.
In the past, it was probably possible to actually use data mining yourself to get pretty close to Google's algorithm, particularly for small sets of keywords. However, as Google incorporates more variables that only they can calculate (such as historical analysis of page rank -- you probably haven't been storing a copy of the top 500 websites for each of your favorite keywords for the last 5 years that you can analyze), that too becomes more difficult.
In general, people that think Google is incredibly smart have underestimated how easily their algorithm can be gamed. But we're getting to the tipping point now where more people who work at gaming Google are underestimating how difficult they can make it, and this is largely a lack of understanding of how data mining can make a hugely complex formula easy to construct and manage (without any human being ever having to understand that formula). You can track this sea change by graphing the percentage of SEO posts of the form "but I did all the stuff I'm supposed to and still don't rank well".
IMHO Google went black box with the Florida Update, nearly 17 months ago [black box: A device or theoretical construct with known or specified performance characteristics but unknown or unspecified constituents and means of operation], and as such is impossible to reverse engineer.
The way I see them making the ideas in this new patent work is that they would take a set of known non-spammy sites and a set of spammy sites and, for each of the parameters listed in this patent application, do some preliminary statistical analysis of the sites for some tail chopping and/or tagging. Then they would run what's left, with a high degree of manual review, to get a good training set, and then it would be off to the number-crunching races.
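The "tail chopping" step could be as simple as trimming extreme values of each parameter before building the training set. A sketch under assumptions: the 5th/95th percentile cutoffs and the nearest-rank percentile method are illustrative choices, not anything stated in the patent.

```python
import math


def chop_tails(values: list[float], lower_pct: float = 5.0, upper_pct: float = 95.0) -> list[float]:
    """Drop values outside the [lower_pct, upper_pct] percentiles (nearest-rank method)."""
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)

    def nearest_rank(pct: float) -> float:
        k = max(1, math.ceil(pct * n / 100))  # 1-based rank
        return ordered[k - 1]

    low, high = nearest_rank(lower_pct), nearest_rank(upper_pct)
    return [v for v in values if low <= v <= high]


# e.g. back-link counts for a sample of known sites, with one extreme outlier
sample = list(range(1, 20)) + [5000]
kept = chop_tails(sample)
print(max(kept))  # 19
```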
We do a lot of note comparisons here at Webmaster World. While I've learned a great deal here, there is also a lot of confusion. I have sites in several industries, but they are not representative of all sites, nor of the web as a whole. Many others here have a site or two in some special niche, while others occupy virtually every industry that's in the money. We all view the feedback we get here through the eyes of our own site(s), trying to put forth generic information and glean what we can from others, but nearly always in terms of how different things affect our own sites. I see many people say things like "content is king; all you need is the best content in your niche and people will link to you". That may be true for your particular niche, but it just does not extrapolate to all sites. Other sites can benefit from a large amount of SE spam, while the same amount of spam on a different site might get it banned immediately.
What it all comes down to is that Google is getting better, and will continue to get better, at detecting spam and removing sites that violate its guidelines from the SERPs, by hook or by crook. This patent application gives us a pretty good idea of what they are looking at, or intend to look at, to do this. The most spammy industries will likely be the hardest hit. Other industries may be little affected, which will only add to the confusion.
The nicest thing I got out of reading the application is that google is still all about linking.
There's some kind of limit to how many ranking parameters you would like to have at query time, but there's no limit to rating parameters as these can be calculated any time.
(just another way of looking at "data mining", essentially)