|Google Files a New "PR" Patent|
Much of this "new" patent looks very familiar, but with subtle changes. First impression: parts of it seem focused on a spam-fighting algorithm. There are also direct references to using real traffic data.
|The method is particularly useful in enhancing the performance of search engine results for hypermedia databases, such as the world wide web, whose documents have a large variation in quality... |
Real usage data, when available, can be used as a starting point for the model and as the distribution for the alpha factor. This can allow this ranking model to fill holes in the usage data, and provide a more accurate or comprehensive picture. Thus, although this method of ranking does not necessarily match the actual traffic, it nevertheless measures the degree of exposure a document has throughout the web.
US Patent Office Reference [patft.uspto.gov]
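To make the "real usage data" idea a bit more concrete, here is a minimal sketch of a PageRank-style power iteration in which measured visit counts serve both as the starting vector and as the distribution for random jumps. This is my own reading of the quoted passage, not code from the patent: treating the "alpha factor" distribution as the jump distribution, the 0.85 damping value, and the toy link graph and visit counts are all assumptions for illustration.

```python
def pagerank(links, usage=None, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    `usage` is an optional dict of hypothetical visit counts. When given,
    it is normalised and used as the starting vector and as the
    random-jump distribution (an assumed reading of the "alpha factor").
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    if usage is None:
        jump = {p: 1.0 / len(pages) for p in pages}
    else:
        total = sum(usage.get(p, 0.0) for p in pages)
        if total <= 0:
            raise ValueError("usage data must contain at least one visit")
        jump = {p: usage.get(p, 0.0) / total for p in pages}

    rank = dict(jump)  # usage data (or uniform) as the starting point
    for _ in range(iterations):
        # Every page receives its share of the random jumps.
        new_rank = {p: (1.0 - damping) * jump[p] for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Pass the rest of this page's rank along its outlinks.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank by the jump distribution.
                for p in pages:
                    new_rank[p] += damping * rank[page] * jump[p]
        rank = new_rank
    return rank


# Toy graph: "d" has no inbound links but (hypothetically) lots of traffic.
links = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
visits = {"a": 10, "b": 10, "c": 10, "d": 970}

uniform = pagerank(links)
biased = pagerank(links, usage=visits)
print("uniform:", uniform)
print("usage-biased:", biased)
```

In this toy case the page with no inbound links scores higher once the traffic counts are fed in, which is one way link-based ranking could "fill holes" alongside usage data.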
I'm not sure if there is really anything new here. It appears to consolidate parts of the original patent filing of the same name with the second PageRank patent, Method for scoring documents in a linked database [patft.uspto.gov].
There are some very subtle differences in language in a few places, and the main addition that isn't in the first or second PageRank patents is a few extra paragraphs in the "Summary of the invention" section.
But the section about "real usage data" is in the second pagerank patent, which was granted in 2004.
I'm probably showing my ignorance about the US patent system, but why is the patent not in Google's name? Who is this body it is in the name of? And if it is Google, why is Google getting government grants?
"Lawrence Page" is Larry Page, one of the Google founders. He and Sergey Brin developed PageRank while they were students at Stanford -- which is how the government grant came into the picture and how Stanford's board of trustees became the assignee.
|the section about "real usage data" is in the second pagerank patent, which was granted in 2004 |
Right you are - I wasn't reading closely enough. So maybe this is just some legal CYA going on? I plan to compare them a bit more closely soon. I feel like they have just got to be up to something, here.
|So maybe this is just some legal CYA going on? I plan to compare them a bit more closely soon. I feel like they have just got to be up to something, here. |
There are a lot of changes, but many of them do seem to be adding material from the second patent to what appears in the first one, like the references cited.
The claims section in the third patent has been reduced considerably, but seems to be a melding of the claims in the first patent and the second one.
I see a number of really subtle changes. Some of them involve punctuation. Some of them are corrections where things were left out of the first patent or the second (and shouldn't have been).
Some are word switches, so that instead of "obtaining" pages, pages are "identified" instead.
The main change does seem to be the addition of a number of paragraphs in the "summary of the invention" section, that don't appear in either the first patent or the second. But I'm not sure how much those paragraphs really add. I will be looking over them again a few times to try to understand why they've been included.
I cannot really see anything significant here either, in this newest patent grant, but it does make the hair on the back of my neck stand up.
Method for node ranking in a linked database [patft.uspto.gov] Filed January 9, 1998, Granted September 4, 2001
Method for scoring documents in a linked database [patft.uspto.gov] Filed July 6, 2001, Granted September 28, 2004
Method for node ranking in a linked database [patft.uspto.gov] Filed July 2, 2001, Granted June 6, 2006
|Some are word switches, so that instead of "obtaining" pages, pages are "identified" instead. |
As I mentioned in another thread, this is a BIG deal.
As the BD infrastructure has shown us, Google HAS CHANGED THEIR MISSION STATEMENT.
They are no longer "organizing the world's information and making it universally accessible and useful."
They are "organizing SOME of the world's information and making SOME of it accessible according to guidelines that often have little to do with the 'usefulness' of that information"
I'm surprised more people haven't picked up on this.
And the issuing of a new patent also indicates a CHANGE in direction of what Google is doing.
They aren't simply doing this like a college paper that needs re-editing.
There's a REASON behind why they issued the new patent.
Edited to add. There's a big difference between:
Organizing and ranking ALL the world's information accordingly
Organizing and ranking MOST of the world's information because collecting ALL of it is just too costly and/or not profitable.
I don't know if this is significant, but the filing dates mean the third patent (by date granted) was really the second (by date filed). The second and third were filed within 4 days of each other but granted about 1 3/4 years apart, and both were filed before the first was granted.
That is normal for patents; sometimes they go right through, other times they take time. The first patent always takes time because the government has to do some research to make sure it is not granting duplicate patents. The second and third patents can be looked at as changes or add-ons, and they do not take as long to pass through.
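For anyone who wants to check that timeline, here is a quick sketch that works out the gaps from the filing and grant dates listed earlier in the thread. The labels "first", "second", and "third" follow the order of grant and are just names I picked for illustration.

```python
from datetime import date

# Filing and grant dates as listed in the thread, keyed by order of grant.
patents = {
    "first": {"filed": date(1998, 1, 9), "granted": date(2001, 9, 4)},
    "second": {"filed": date(2001, 7, 6), "granted": date(2004, 9, 28)},
    "third": {"filed": date(2001, 7, 2), "granted": date(2006, 6, 6)},
}

# Gap between the two 2001 filings, in days.
filing_gap_days = (patents["second"]["filed"] - patents["third"]["filed"]).days

# Gap between the second and third grants, in days and approximate years.
grant_gap_days = (patents["third"]["granted"] - patents["second"]["granted"]).days
grant_gap_years = grant_gap_days / 365.25

# Were both later patents filed before the first one was granted?
both_filed_before_first_grant = all(
    patents[name]["filed"] < patents["first"]["granted"]
    for name in ("second", "third")
)

print(filing_gap_days, grant_gap_days, round(grant_gap_years, 2))
print(both_filed_before_first_grant)
```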
|and how Stanford's board of trustees became the assignee. |
That explains it. So if someone infringed the patent it would be down to Stanford's board of trustees to sue, not Google?
>>Some are word switches, so that instead of "obtaining" pages, pages are "identified" instead.
There's a big difference between identifying the presence and/or existence of pages and "obtaining" the pages for inclusion in the index.
Bill, about that first patent you referenced (the older LP-authored one): I've read it a few times over the months, not for PR reasons but to identify link-value-related factors in other respects, and a thing or two mentioned in it rings a bell. It isn't impossible that we're seeing some of what's described in the VIPS/Block Level Link Analysis papers, even though those come from MSN; the common denominator is location.
It strikes me that, if sites were ranked according to traffic received, then the world's biggest brands would all rise to the top of the search engines for their natural search terms.
Whilst this would be logical, it would also mean that those companies could stop spending large sums of money on PPC advertising.
Now if I were CEO of Google, I think I would leave the traffic-driven ranking club in my bag!
Since this is Stanford's patent instead of Google's, I'm not sure how much incentive there is to build too much new into it.
But the issue that you raise is a good one. If you haven't looked at this patent application assigned to Google before, you might see some ideas in it that may be of interest:
Methods and systems for determining a meaning of a document to match the document to content [appft1.uspto.gov]
Here's a snippet from that patent application:
|If the source meaning of the web page has not been determined, the preprocessor 134 first identifies concepts contained in the web page and regions contained in the web page. For example, the preprocessor may determine that the web page has four regions corresponding to the title region, the story region, the banner ad region and the links region and that the web page contains concepts relating to salmon, fly fishing, Washington, automobiles, news, weather, and sports. The regions do not necessarily correspond to frames on a web page. The meaning engine then determines local concepts for each region and ranks all of the local concepts. A variety of weighing factors can be used to rank the concepts, such as, the importance of the region, the importance of the concept, the frequency of the concept, the number of regions the concept appears in, and the breadth of the concept, for example. |
While the inventors are two who came over during the Applied Semantics acquisition, and the concepts collected are most likely used to derive context for advertisements, it does show an ability on Google's part to break a page down into sections, and understand what links and text on pages are related to different parts of a page.
I probably should have added a snippet from this section of the patent application I referenced, since it's a little more illustrative of what actually happens when they try to segment parts of a page:
|Regions of the document can be determined, for example, based on certain heuristics, including formatting information. For example, for a source document that is a web page that comprises HTML labels, the labels can be used to aid in identifying regions. For example, text within <title> . . . </title> tags can be marked as text in a title region. Text in a paragraph where more than seventy percent of the text is within tags <a> . . . </a> can be marked as in a link region. The structure of the text can also be used to aid in identifying the regions. For example, text in short paragraphs or columns in a table, without the structure of a sentence, such as, for example, without a verb, too few words, or no punctuation to end the sentence, can be marked as being in a list region. Text in long sentences, with verbs and punctuation, can be marked as part of a text region. When the type of region changes, a new region can be created starting with the text marked with the new type. In one embodiment, if a text region gets more than twenty percent of the document, it can be broken in smaller pieces. |
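The heuristics in that snippet are simple enough to sketch. The following is only a rough illustration of the idea, not anything taken from Google: the seventy percent anchor-text threshold comes from the quoted text, but the regex-based tag handling, the five-word minimum, and the function names are my own assumptions, and the verb check is left out because it would need a part-of-speech tagger.

```python
import re
from itertools import groupby

ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
MIN_WORDS = 5  # assumed cut-off for "too few words"


def classify_paragraph(html):
    """Label one paragraph of HTML as 'link', 'list', 'text', or 'empty'."""
    text = TAG_RE.sub("", html).strip()
    if not text:
        return "empty"
    # Characters of visible text that sit inside <a> ... </a>.
    anchor_text = "".join(TAG_RE.sub("", inner) for inner in ANCHOR_RE.findall(html))
    if len(anchor_text.strip()) / len(text) > 0.70:
        return "link"
    # No sentence structure: too few words or no closing punctuation.
    if len(text.split()) < MIN_WORDS or text[-1] not in ".!?":
        return "list"
    return "text"


def segment(paragraphs):
    """Group consecutive paragraphs of the same type into regions."""
    labelled = [(classify_paragraph(p), p) for p in paragraphs]
    return [
        (kind, [p for _, p in group])
        for kind, group in groupby(labelled, key=lambda pair: pair[0])
    ]


page = [
    '<a href="/">Home</a> <a href="/news">News</a> <a href="/about">About us</a>',
    "The river was high this spring, so we fished the lower pools.",
    "We landed two fish before the rain started.",
    "Salmon flies",
]
regions = segment(page)
for kind, members in regions:
    print(kind, len(members))
```

A new region starts whenever the paragraph type changes, which matches the last part of the snippet; splitting a text region that exceeds twenty percent of the document is not attempted here.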
The Microsoft paper, Block-level Link Analysis [research.microsoft.com], relies upon DOM structure, as shown on this page from Microsoft:
VIPS: a VIsion based Page Segmentation Algorithm [ews.uiuc.edu]
Would Google use this type of segmentation to value some links over others? The idea is out there, from Microsoft's work. And they have some experience attempting to do something similar.
I don't think that Google is up to anything in particular (with regard to what is in this patent, that is) at this time.
All that has happened is that something that was written five years ago has finally made it through n layers of bureaucracy and eventually been published.
|Text in long sentences, with verbs and punctuation |
in very clear terms, what algorithms are configured to love. simple and powerful.
nothing new, but still a good reminder . . .
Another patent from Google but there is not much new in this one.
I guess that is because it was written just about five years ago.
All that has happened now is that it is officially published.