Home / Forums Index / Google / Google SEO News and Discussion
Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 42 message thread spans 2 pages: 42 ( [1] 2 > >     
Four new Google patents - April 2007
Traffic, Query Analysis, Link-Based Criteria, and Document Inception Date

 7:30 pm on Apr 26, 2007 (gmt 0)

I just learned about 4 new Google Search technology patents. I will be studying and posting about them -- feel free to get there ahead of me, I'm not sure how soon I'll have the time to dig in. That first one looks particularly interesting, doesn't it?

1. Document Scoring Based on Traffic Associated with a Document [appft1.uspto.gov] [April 19, 2007 - Steve Lawrence]

2. Document Scoring Based on Query Analysis [appft1.uspto.gov] [April 19, 2007 - Jeffrey Dean]

3. Document Scoring Based on Link-Based Criteria [appft1.uspto.gov] [April 26, 2007 - Anurag Acharya]

4. Document Scoring Based on Document Inception Date [appft1.uspto.gov] [April 26, 2007 - Matt Cutts]



 7:37 pm on Apr 26, 2007 (gmt 0)

I've long believed - and AFAIK, proven to myself in various ways - that the so-called sandbox involves aging of both the document and inbound links...so I don't personally find that to be newsworthy, although it's always nice to see something that seems to support work done.

But I agree tedster, the traffic one is most interesting. Looking more at that one right now.


 7:57 pm on Apr 26, 2007 (gmt 0)

Most of these criteria are mentioned in other patents - so the fact that these particular factors are involved is not going to be news. But the way in which the factors are being scored probably is.

The 2005 History and Age Data [webmasterworld.com] patent uncorked a lot of date-related stuff, link aging etc. So there's probably some juicy new stuff in these.


 9:23 pm on Apr 26, 2007 (gmt 0)

Just went through the first one. A mouthful as usual. Much of it variation or repetition of already published stuff.

One thing that is interesting to me is how they are determining what constitutes an advertisement, which is not specified. In any event, it was intriguing to read, since just this week we declined to run ads on one of our sites because we felt the advertiser was sub-par. Dunno if what's in this doc is or will go into actual practice, but it sorta supported the choice we made, to the extent that we made the choice honestly in part out of fear. Hate leaving that money on the table. But G seems to have concluded that if all your advertisers are low quality sites, then perhaps your site is as well.

Also, for those who actually read the doc: just because they note Amazon as a quality advertiser whose ads might appear on your site (perhaps even to your site's benefit), I personally would not assume that G's algo likes seeing sites where the majority of links out are to Amazon, or any other quality advertiser/affiliate merchant. Just a caution. Not a knock. :P

Another thing that intrigues me here is that this has the potential to further aggravate what I see as an already spiraling problem. Namely that webmasters have become so nervous about linking to smaller sites, or so often link to sites like wikipedia, to prove that they themselves are quality, that it's becoming impossible for small gem sites to rank. Everybody's afraid.

And now people are going to start accepting ads based in part on the same logic (as my own story above implies). G's compulsive focus on only links to and from big trusted authority sites, even now WRT advertisements, is very, very bad for the Web, and has already led us to a place where WAY too many listings high in the SERP's include wikipedia and a small number of other overly trusted, overly valued sites.


 9:28 pm on Apr 26, 2007 (gmt 0)

the traffic one is most interesting. Looking more at that one right now.

When I first read the title of the traffic one, I wanted to read it thoroughly because of the potential SEO aspects.

But after diving into it, it looks like the patent is all about advertising and not any organic SEO. So it looks like the "traffic" part has to do with PPC, which makes sense: show ads from advertisers that have more traffic (so Google can make more money).


 9:38 pm on Apr 26, 2007 (gmt 0)

That is what I thought at first, when looking at the top of the doc, but upon reading it fully, it seems to be more about using regular ads on a site as another factor in site ranking, and assigning qualitative measures to the advertisers based on traffic and other available info about the advertisers. That's my read anyway.

And that is only one small part of the full doc, even though it's the focus of the intro. A bit confusing in that respect. To me anyway.


 9:55 pm on Apr 26, 2007 (gmt 0)

You're right, caveman--that appears to be the emphasis of the first part of the patent.

Add the social bookmarking aspect to this: it would make sense that if a site gets a lot of traffic via social bookmarking then "real visitors" are saying that they like the document. So a search engine should reciprocate by ranking the document accordingly.


 10:15 pm on Apr 26, 2007 (gmt 0)

Yep. Sure looks like a few more steps towards being able to use all available data to rank sites.

I'd be more appreciative of their achievements if they had not had such a profoundly negative effect on the Web overall, in the past few years. Sad, really.

Also, more than a bit hypocritical that they publicly eschew commercial sites (i.e., in favor of informational sites), run adsense on clearly-spammy-and-entirely-commercial sites, and now regard ads on informational sites as quality indicators. Hehe. Talk about irony.

Hey, I'm thinking of placing a bunch of free ads promoting various G services on some of my sites, to get my rankings up. Just kidding. :p


 12:15 am on Apr 27, 2007 (gmt 0)

1- I wonder if this doesn't describe G's internal QC on adsense. Maybe I should read it three times before drawing conclusions though.

2- the last methods describe everflux as I've observed it, and as discussed in a Tedster thread a few weeks ago:

    23. A system, comprising: means for identifying a document that appears as a search result document for a plurality of discordant search queries; means for determining a score for the document; means for negatively adjusting the score for the document; and means for ranking the document with regard to at least one other document based, at least in part, on the negatively-adjusted score.

    24. The system of claim 23, further comprising: means for determining whether the document is authoritative; and means for bypassing the negative adjustment of the score when the document is determined to be authoritative.
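For anyone who finds claim language hard to parse, here is a rough Python sketch of what claims 23 and 24 seem to describe. It is only an illustration: the patent does not say how "discordant" is measured or how large the negative adjustment is, so the `query_topics` field, the `max_topics` cutoff and the `penalty` multiplier are all assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    url: str
    score: float
    query_topics: frozenset      # hypothetical: topics of the queries the doc ranks for
    authoritative: bool = False  # claim 24: authoritative docs skip the penalty


def adjusted_score(doc, max_topics=3, penalty=0.5):
    """Negatively adjust the score of a doc that ranks for many unrelated queries.

    Assumption: "discordant" is approximated as "more than max_topics topics".
    """
    discordant = len(doc.query_topics) > max_topics
    if discordant and not doc.authoritative:
        return doc.score * penalty
    return doc.score


def rank(docs):
    """Order documents by adjusted score, best first."""
    return sorted(docs, key=adjusted_score, reverse=True)
```

With these made-up numbers, a page that ranks for four unrelated topics loses half its score, while an authoritative page with the same query spread keeps its full score.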


 12:24 am on Apr 27, 2007 (gmt 0)

It depends on manual or automatic spam detection.


 12:40 am on Apr 27, 2007 (gmt 0)

Good find tedster, I am glad it's explaining links.


 12:41 am on Apr 27, 2007 (gmt 0)

Social bookmarking is already well spamfested. Now they are advocating it... I expect social bookmarking sites to be worthless in no time at all.

Google has a habit of destroying the good things on the net.

Take for instance MFA sites. Google are the creators of this spam and yet still allow it to flourish.

The sooner google stops INNOVATING on the net ... the better the net will become!


 2:39 am on Apr 27, 2007 (gmt 0)

Social bookmarking is already well spamfested. Now they are advocating it

Probably everyone at the plex is very excited about web 2.0 and wants to make google more moment-oriented.
Unfortunately, that means
a) undue emphasis from a small number of social bookmarking sites
b) Social bookmarking sites are really easy to spam, so therefore google is easy to spam
c) Social bookmarking sites are clogging up with spam because everyone is using them to rank in google
d) this will be the death of social bookmarking.

Google is like King Midas: everything he touched turned to gold. He thought this was great, until he hugged his daughter and turned her into a golden statue.
You don't want to be embraced by Google. You want to be invisible to them, lest they apply their golden touch.


 7:16 am on Apr 27, 2007 (gmt 0)

Nice analogy callivert!


 8:49 am on Apr 27, 2007 (gmt 0)

Am I right to assume that for traffic factors G is relying on the Google Analytics data?


 9:09 am on Apr 27, 2007 (gmt 0)

Am I right to assume that for traffic factors G is relying on the Google Analytics data?

I think that they may also use:

  • the google toolbar
  • cookies when you do a search and go back to it
  • adsense ads
jecasc

     12:38 pm on Apr 27, 2007 (gmt 0)

    Big news. And look at that - I think they will even use computers to implement all that stuff:

    [0027] FIG. 2 is an exemplary diagram of a client or server entity (hereinafter called "client/server entity"), which may correspond to one or more of clients 110 and servers 120-140, according to an implementation consistent with the principles of the invention. The client/server entity may include a bus 210, a processor 220, a main memory 230, a read only memory (ROM) 240, a storage device 250, one or more input devices 260, one or more output devices 270, and a communication interface 280. Bus 210 may include one or more conductors that permit communication among the components of the client/server entity.

    What a load of hot air.


     1:16 pm on Apr 27, 2007 (gmt 0)

    traffic factors G is relying on the Google Analytics data

    Perhaps, but I would point first towards Google Toolbar data.


     1:23 pm on Apr 27, 2007 (gmt 0)

    Isn't this traffic patent paving the way for Google to understanding a paid link?
    What if this will allow sites to rank with paid links even if they don't pass any parameters?

    --- skip this part, go down to "chronology" ---

    The whys and hows that I can imagine.

    Paid links and ads that link to cr@p:
    The current problem is, paid links are generating some automated parameters and make the target document rank just because it receives n number of relevant, quality inbounds, while in fact it may be that the actual content has no value related to the targeted keyphrase. Or in other words, people who do a search for that phrase wouldn't be satisfied with what they find. Especially if it installs something on their computer. In AdSense there's little telling of what user satisfaction is after a click - for PPC anyways - unless they pair the data up with Analytics or whatever data they find lying around.

    Paid links, ads that link to value:
    The amount of references and amount of passed parameters that a specific, smaller site/page/business would need currently makes it impossible to break into Google even if the given content would be much more useful for the users, for it's local / more specific / newer, whatever. But exactly because of these qualities ( it is too specific to touch enough web-sensitive people ) it will not generate the amount of natural links needed to pass the threshold for trust, or simply not get past relevancy calculations. Instead you'll see seven nonexistent Wikipedia pages. In AdSense the big players may easily outbid the locals with equally relevant pages just to keep them from being able to build momentum. Even if the global company offers nothing but a landing page that was very well optimized. While people would in fact love the local service, but... did not find it.

    That's why most of the sites bought links in the first place, so if they invent something to counter these effects, that's like saving two birds from the same stone.

    The new system would take note of the reference from the advertiser, and see if people were fond of the advertisement ( clicked it or not ) and/or the actual content ( hit 'back' within 10 seconds / did not browse further / closed browser or not ).

    This would apply to advertisements that google identifies as advertisements, which I'm not sure if would include AdSense. But why not. Why not include every single kind of ad, including banners, javascript links, flash ads, doubleclick ads ...

    It may include paid links from the major players. Text link brokers, news sites, even paid directories. It would know of the relation, unless the links are hidden, but including the instances of a link passing no parameters.

    It would further analyze the links, and where it doesn't look on relevancy, pagerank, trustrank ( as it does for links it finds "natural" ) it would look at the traffic a link generates.

    --- Chronology ---

    'nofollow' panic originating from MC blog.
    Webmasters flag advertisements for Google with a nofollow.
    Links that are clear as daylight to be paid ads and do not have nofollow are devalued, and flagged as ads. Practically speaking, Google applies that nofollow for you if you forgot to.
    Links that aren't detected as paid text links continue to pass PR, relevancy and TrustRank ( if they aren't devalued by the ultra-strict thematic trust-relevance checks, see -950 thread )

    Links that are now identified as paid ads get a new, traffic based ranking, just for them.
    Ads that perform well and ads with high user satisfaction generate a score for probably both ends. Sites that are on-topic, ads that are on-topic get a higher quality ranking. Sites with certain patterns in their ad campaigns and sites with certain patterns in the ads they show will either see more or less of this score. Sites with only a few but effective ads on them, and sites with only a few but effective ads to them are probably better than having 500.000 sitewides on barely relevant pages.

    Even without passing "natural" link related parameters text links, and other ads are integrated into the SERPs to offset the current balance, allowing smaller sites to perform better for specific searches.

    Google users are more satisfied.
    Text link ads are both devalued and keep their worth, but start working for their algo. Any kind of ad in fact will be working for them in regards of determining user satisfaction in the areas where most of the references HAVE to be ads, for the given area is so specific, or exactly the opposite... it's that competitive.
    Webmasters are better off than not having this in place.
    SEO isn't really changed for this doesn't really relate to SEO, at least it doesn't provide new dimensions to it. It's rather the content that counts.


    Everyone is happy.


    And then I woke up.


     4:55 pm on Apr 27, 2007 (gmt 0)

    4 clearly describes the intentions underlying the invention of assessing document inception dates:

    ...There are several factors that may affect the quality of the results generated by a search engine. For example, some web site producers use spamming techniques to artificially inflate their rank. Also, "stale" documents (i.e., those documents that have not been updated for a period of time and, thus, contain stale data) may be ranked higher than "fresher" documents (i.e., those documents that have been more recently updated and, thus, contain more recent data). In some particular contexts, the higher ranking stale documents degrade the search results.

    [0009] Thus, there remains a need to improve the quality of results generated by search engines.


     6:15 pm on Apr 27, 2007 (gmt 0)

    "stale" documents... may be ranked higher than "fresher" documents

    I seem to remember a recent thread, in which scraping was discussed. The consensus was that google uses the age of the document to figure out who the scraper is. It was also concluded that if you update your page, google resets the age, which can land you in the supplementals while the scrapers get the glory (because their page is older). If this is true, then there may be a tradeoff for webmasters of holding onto proprietary content versus freshness.


     6:26 pm on Apr 27, 2007 (gmt 0)

    I was also focusing on that area, Bentler. Freshness of backlinks is also being used to score freshness of the document -- and since the backlinks are also occurring on documents, it now sounds like freshness may be an iterative calculation rather than a simple scoring.
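To make the "iterative calculation" idea concrete, here is a minimal Python sketch. It is purely a guess at the shape of such a computation, not anything taken from the patent: the function name, the 0 to 1 freshness scale, the `weight` blend factor and the fixed number of rounds are all assumptions for illustration. Each document's freshness is a blend of its own freshness and the average current freshness of the documents linking to it, recomputed over several rounds.

```python
def iterate_freshness(own, backlinks, weight=0.5, rounds=20):
    """Blend each doc's own freshness with the freshness of its backlink sources.

    own:       {doc: own freshness in [0, 1]}   (hypothetical scale)
    backlinks: {doc: [docs that link to it]}
    weight:    assumed share of the score that comes from backlinks
    """
    fresh = dict(own)
    for _ in range(rounds):
        updated = {}
        for doc, base in own.items():
            sources = backlinks.get(doc, [])
            if sources:
                # Average freshness of linking docs, using the previous round's values
                link_part = sum(fresh[s] for s in sources) / len(sources)
                updated[doc] = (1 - weight) * base + weight * link_part
            else:
                # No backlinks: the doc keeps its own freshness
                updated[doc] = base
        fresh = updated
    return fresh
```

Because the linking documents have freshness values of their own that depend on their backlinks in turn, one pass is not enough when pages link to each other, which is why the loop repeats until the values settle.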


     8:50 pm on Apr 27, 2007 (gmt 0)

    hmm i wonder if #4 would be important in relation to my news releases. We tend to host all of the press releases we do on our site, but often times we'll put it up on newswires before we get the time to add it to our site. Maybe now I'll update our site with the news release before 'releasing' it. Does this make sense?

    thanks for the info good thread


     8:38 pm on Apr 28, 2007 (gmt 0)

    cookies when you do a search and go back to it

    I'm not following you on that one. How does that work exactly?

    This is an interesting discussion. It sounds like Google tried to do the same thing with the ridiculous nofollow tag.

    Yesterday Wikipedia decides not to use nofollow. Their outlinks mean a lot. Today they turn on nofollow, suddenly all the SERPS are changed radically. How arbitrary and stupid is that?


     6:40 pm on Apr 30, 2007 (gmt 0)

    I was keeping it for myself ;) (traffic as a factor), but now it's already known, let me note that I consider it as a step back. The old chicken and egg question.

    I prefer the old Yahoo algo, where new sites/pages had a preference in the SERPS for some time, just for measuring users' responses.


     6:50 pm on Apr 30, 2007 (gmt 0)

    Note what this "traffic patent" is about -- it's the quality of advertisers shown on a page and the traffic their ads are generating. From the Google patent [appft1.uspto.gov]:

    A system determines:

  • an extent to which advertisements are presented
    or updated within a document,

  • a quality of an advertiser associated with an
    advertisement provided within the document,

  • whether an advertisement in the document relates
    to an advertising document that has more than a
    threshold amount of traffic,

  • and/or an extent to which an advertisement provided
    within the document generates user traffic to an
    advertising document related to the advertisement.

    (bullets added)

    [edited by: tedster at 7:15 pm (utc) on April 30, 2007]
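The four determinations in that Abstract can be read as four per-page signals. Here is a minimal Python sketch of how they might be computed. Everything concrete in it is an assumption for illustration: the `Ad` fields, the idea of measuring "extent" as ads per page block, the traffic threshold of 1000 and the feature names are invented here, and the Abstract does not say how the signals would be combined into a score, so the sketch stops at the raw features.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser_quality: float  # hypothetical 0..1 rating of the advertiser
    target_traffic: int        # visits to the advertised (landing) document
    impressions: int           # times this ad was shown on the page
    clicks: int                # visits the ad sent to the landing document


def ad_features(ads, page_blocks, traffic_threshold=1000):
    """Compute one illustrative feature per bullet of the Abstract."""
    if not ads:
        return {"ad_extent": 0.0, "avg_advertiser_quality": 0.0,
                "share_high_traffic_targets": 0.0, "click_through": 0.0}
    n = len(ads)
    impressions = sum(a.impressions for a in ads)
    return {
        # extent to which advertisements are presented within the document
        "ad_extent": n / page_blocks,
        # quality of the advertisers behind the ads
        "avg_advertiser_quality": sum(a.advertiser_quality for a in ads) / n,
        # share of ads whose landing document exceeds the traffic threshold
        "share_high_traffic_targets": sum(a.target_traffic > traffic_threshold for a in ads) / n,
        # extent to which the ads generate traffic to their landing documents
        "click_through": sum(a.clicks for a in ads) / impressions if impressions else 0.0,
    }
```

Keeping the four values separate, rather than folding them into a single number, avoids implying a weighting that the patent text quoted here does not give.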


     7:11 pm on Apr 30, 2007 (gmt 0)

    I have noticed a lot of pointers that the same patent applies in some rate to organic results too.


     7:20 pm on Apr 30, 2007 (gmt 0)

    You're right --

    [0087] In summary, search engine 125 may generate (or alter) a score associated with a document based, at least in part, on information corresponding to individual or aggregate user behavior relating to the document over time.

    The text of these four new patents seems particularly chaotic - each one contains sections that are about the other three patents' areas, and that text seems disconnected -- as if it was copy/pasted in. So the relationships between all four patents are not at all clear to me right now.


     7:28 pm on Apr 30, 2007 (gmt 0)

    And not only advertising traffic.
    Look at the summary [0081]-[0084].


     7:51 pm on Apr 30, 2007 (gmt 0)

    Yep, that same section appears in the other patents, too. The bullet-point quote I pasted in above is the Abstract for the traffic patent. Somehow these four patents are entangled, with each being a piece of a bigger purpose. The stated purposes of the patents, according to their Abstracts, seem to be:

    1. Traffic: scoring based on advertising on the page
    2. Query Analysis: which search result gets the click
    3. Link-Based: freshness, staleness and churn around links
    4. Document Inception Date: methods for determining when a document was published

    Does this all point to methods for an attack on paid links?
