It's not the trademarked ghost dataset that went missing, and it wasn't a rebuild like the Halloween update.
No, but the overall technique has a familiar feel to it. More than one dataset may be involved this time - and perhaps many more. Interesting that three weeks ago we were hearing reports of googlebot spidering like crazy, and in recent days, reports of googlebot not even showing up for some sites.
You want to prove something that can't be proven. (for most people)
You also want to argue the proof of a negative, i.e., how can you prove something does NOT exist?
It's backwards circular logic
(aka pure untested and untestable chicken-bone throwing),
but go at it.
(and certainly hasn't stopped you before)
I know you and steveb have been waiting to get at me for some time now... here's your chance, eh? ;)
Have fun....
oh btw...you two might want to pass on some knowledge that would help the rest of the board in the meantime.
Impart your wise analysis of patents (or of the infinitely more important ALGO and current update) to those who need help.
I'm done...and/or waiting...
I think the '-950' affair has stopped on-site over-optimisation by those in the game, so things like 'run-of-site keyword footer backlinks' or 'heavily competitive keyworded sidebar backlinks' could be next up.
Disclaimer: I have no pals in the GooglePlex and am not too pushed about reading patents, 'specially as in the USA it seems you can try to patent any simplistic notion (which could also be a cheap and sneaky way to dangle a few red herrings before techy webmasters. Or protect software you don't actually have working yet. Just a thought.)
You also want to argue the proof of a negative
And yet you've claimed, repeatedly, to have done just that -- proven that Google doesn't use traffic as a ranking factor. By your own admission, you can't have done that, and even if you did things might have changed since then.
I'm not saying I think it is or it isn't*, but attempting to prevent the idea from even being discussed at all can't be productive.
"It is ethical to doubt. It is unethical to be certain." - David Garcia
And, generally speaking, my advice would be to find a large quantity of something illegal, smoke it, and then get a massage or something. Sheesh.
(edit)
*My opinion, for what it's worth, is that it probably isn't, on the grounds that that would tend to suppress novelty, which is probably not Google's goal. But even if you're right, that's no excuse for trying to shout down anyone who disagrees.
Google Updates and SERP Changes for July 2009.
The change that stands out a bit to us is the emergence of interior pages ranking well for some strong keywords whose top SERPs were historically ruled by home pages.
potential for a 13-item SERP is still with us
We're in the US Northeast and we have not seen this at all (although we keep looking!).
but let's not see that turn into "Goog is ranking based on traffic"
We don't see any indications of this, which in itself is important. Obviously there has been a significant change here, but there's just no real evidence you can point to regarding traffic factors influencing position. However, take great care not to dismiss this outright. There's no question they're capable of injecting this into their ranking algorithm, and if they can do it, rest assured they will want to experiment with it, and probably already have. It's just the nature of the beast; Google has a constant and unbending urge to tinker with this thing.
[0036] The frequency-of-visit score equals log2(1 + log(VF)/log(MAXVF)). VF is the number of times that the document was visited (or accessed) in one month, and MAXVF is set to 2000. A small value is used when VF is unknown. The unique-user score: if UU is less than 10, it equals 0.5*UU/10; otherwise, it equals 0.5*(1 + UU/MAXUU). UU is the number of unique hosts/IPs that access the document in one month, and MAXUU is set to 400. A small value is used when UU is unknown. The path-length score equals log(K - PL)/log(K). PL is the number of `/` characters in the document's path, and K is set to 20.
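Just to make those numbers concrete, here's a quick Python sketch of the three scores as quoted in [0036]. The function names and the example page are mine, not the patent's:

import math

# Constants as given in the patent text [0036]
MAXVF = 2000   # cap on monthly visit frequency
MAXUU = 400    # cap on monthly unique users
K = 20         # path-length constant

def frequency_of_visit_score(vf):
    # log2(1 + log(VF)/log(MAXVF)); VF = visits in one month
    return math.log2(1 + math.log(vf) / math.log(MAXVF))

def unique_user_score(uu):
    # piecewise score; UU = unique hosts/IPs in one month
    if uu < 10:
        return 0.5 * uu / 10
    return 0.5 * (1 + uu / MAXUU)

def path_length_score(path):
    # log(K - PL)/log(K); PL = number of '/' chars in the path
    pl = path.count("/")
    return math.log(K - pl) / math.log(K)

# A made-up page: 500 visits, 120 unique hosts, path /widgets/blue
print(frequency_of_visit_score(500))       # ~0.86
print(unique_user_score(120))              # 0.65
print(path_length_score("/widgets/blue"))  # ~0.96

Worth noticing: the visit-frequency score saturates quickly (500 visits a month already gets you ~0.86 of the maximum 1.0 at 2000 visits), and the path-length score quietly favors shallow URLs.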
In this same patent, BTW, Google states that search engines must go beyond term-based methods and link-based methods to improve search algorithms:
[0007] Conventional methods of determining relevance are based on matching a user's search terms to terms indexed from web pages. More advanced techniques determine the importance of a web page based on more than the content of the web page. For example, one known method, described in the article entitled "The Anatomy of a Large-Scale Hypertextual Search Engine," by Sergey Brin and Lawrence Page, assigns a degree of importance to a web page based on the link structure of the web page.
[0008] Each of these conventional methods has shortcomings, however. Term-based methods are biased towards pages whose content or display is carefully chosen towards the given term-based method. Thus, they can be easily manipulated by the designers of the web page. Link-based methods have the problem that relatively new pages usually have fewer hyperlinks pointing to them than older pages, which tends to give a lower score to newer pages.
[0009] There exists, therefore, a need to develop other techniques for determining the importance of documents.
Again IMHO, one important element of these "other techniques" is the query-based factor (i.e., how a page responds to search queries).
This is mentioned in several parts of the previously quoted patent, and also in this other patent [patft.uspto.gov]:
Yet another query-based factor may relate to the extent to which a document appears in results for different queries. In other words, the entropy of queries for one or more documents may be monitored and used as a basis for scoring. For example, if a particular document appears as a hit for a discordant set of queries, this may (though not necessarily) be considered a signal that the document is spam, in which case search engine 125 may score the document relatively lower.
SO,
If your site's topic is [widgets], and many people search for [widgets] at this time of year, then your pages will get many hits for [widgets]. Provided the algorithm doesn't identify any "discordant set of queries", your pages will gain scoring points for [widgets] and move up the SERPs accordingly.
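For anyone who wants to play with that "entropy of queries" idea, here's a toy Python sketch. The query logs and the threshold interpretation are entirely made up; the patent only says high discordance may (not must) be treated as a signal:

import math
from collections import Counter

def query_entropy(queries):
    # Shannon entropy (in bits) of the queries a document appears for;
    # a high-entropy ("discordant") query set may be a spam signal
    counts = Counter(queries)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Made-up query logs: a focused page vs. one hit by unrelated queries
focused = ["widgets"] * 80 + ["blue widgets"] * 20
discordant = ["widgets", "cheap loans", "celebrity news", "mp3 download"] * 25

print(query_entropy(focused))     # ~0.72 bits: coherent query set
print(query_entropy(discordant))  # 2.0 bits: discordant, possible flag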
To me, these two patents explain much of the current Google algo and SERPs.
A classic example is that truly monster patent on historical data [webmasterworld.com] that Google applied for in 2005. It's got everything in there but the kitchen sink, and it scared the bejabbers out of SEOs when it was first published. But only some of the factors crept into the live algo over time, at least as far as you could test and verify them.
And that's one value of reading patents, as far as I can see. It can give us ideas about what factors to test - factors that we never would have thought up without reading the patent. Another value is the way it exposes the mechanics of the backroom and how data might be handled.
I would have to agree with that statement from my experience over the last couple of weeks. My sense is still that a greater level of semantics has been applied to various search phrases.
To me, there is an obvious connection between search phrase semantics, seasonality and click-through rate. The three seem to fit together quite naturally, where any change in one of these three elements would also affect how Google handles the other two.
Does anyone else see this in the sectors they watch, where terms that were previously dominated by ecommerce or informational websites are now sharing a larger piece of either?
For a single-word phrase I watch, which semantically is split between commerce and information, about 40% more information-based pages have risen to the top, which is understandable if Google has been (and always is) getting better at applying semantics to results.
I see more inner informational pages from very aged sites, and also product inner pages from very aged sites.
And I am still seeing a lot of results on pages 1-3 that do not have the keywords in the title but instead something synonymous; for example, searching for laptops, the title of the result will say computers but have laptop in the snippet.
Coupled with this rollout, I believe, is also a reduction of link value from off-topic sites; things like the Yahoo directory possibly not counting for anything, despite it being a human-edited signal, for example.
The net result as I see it is that some sites have taken a smack where:
The home page is rich with keywords and semantically related phrases. Perhaps a collective semantic keyword density level plays a part? (See the rough sketch after this list.)
Inbound links are measured in the same way, taking word associations into account.
Good sites that had human-reviewed links no longer get any value from them and are treated the same as some spammy site without any.
Hence a site called "Blue Widgets Today" finds it harder to rank for the term "blue widgets" because of the above factors and penalty issues, while a site called "Blue GoGo" that's not so on-topic rises higher for the term "blue widgets" because it doesn't have penalties.
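In case anyone wants to see what a "collective semantic keyword density" might even mean, here's a rough Python sketch. The variant list and the whole metric are my guesswork, not anything Google has confirmed:

import re
from collections import Counter

def collective_keyword_density(text, terms):
    # fraction of page tokens that are the keyword or one of its
    # semantically related variants (a hypothetical metric)
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    hits = sum(counts[t] for t in terms)
    return hits / max(len(tokens), 1)

page = ("Blue widgets for every budget. Our widget range includes "
        "azure gadgets and navy doodads, reviewed by widget experts.")

# 'widget(s)' plus made-up semantic variants
print(collective_keyword_density(
    page, ["widget", "widgets", "gadgets", "doodads"]))  # ~0.28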
The laugh of all this is that G once said, build sites for users, not search engines. The reality now, following these changes, is that webmasters have to look even closer at their sites and adjust them if they want to rank in Google.
In order of relevancy it's now Bing and Yahoo before G, unless it's a very long-tail search.
As more and more people find they can't easily locate what they're looking for on the first page anymore, they are going to use an engine that gives them just that. Kind of like Google did a few years ago.
In trying to make their sauce the best, Google has overthought it to the point of breaking it. Their thinking is that when a person searches for "blue widgets" they really don't want a site about blue widgets, and instead they show them sites about "purple blue small widgets". Trying to think for the user is Google's biggest downfall, because most users don't think like a room full of overpaid PhD types.
Inbound links are and will always be part of the equation. But let's face it, if Google can better determine, without inbound links, WHO your website should cater to best by analyzing user behavior for given search phrases based on your domain history, then they are one step closer to relying less on links, which are of course very easy to manipulate.
It would seem to me that as the Internet expands at a large rate, and each keyword becomes more and more competitive, the best direction to go would be toward better understanding the actual meaning of keywords by crunching the numbers, then comparing against website performance historically.
Any thoughts?
WHO your website should cater to best by analyzing user behavior for given search phrases based on your domain history
I think that is the direction a group of PhDs sitting around a room would go. How are they going to monitor user behavior once the visitor clicks through to the site? Unless the user has the toolbar, as far as I know, they can't. So their search results would be based on the behavior of users who have a toolbar, and I'm pretty sure the number of users who use Google's toolbar is not representative of the number of users who use Google.
If that is what they are going for, then while it all sounds good in theory, it's not going to work, and if the results I'm seeing for the stuff I search on are the result of that, then Google will not be the top search engine for long.
At least that's my way of thinking, but then again I don't have a PhD either.
I don't have the stats nearby, but last I checked, over 70% of competitive websites on the Internet had Google Analytics installed. Perhaps someone can provide more quantifiable numbers for the purpose of this discussion?
There is an obvious reason why Google acquired Urchin and provided such a good product - so that businesses could provide verifiable data back to, well, Google. They are continuously gathering this data and analyzing it.
Even if they do not have exact stats from your website, they have enough information rolling from others like yours, and complex statistics on bounce rates for other websites in your genre.
The only part left for them to do is to align the actual keywords better with user intention, and I believe this is part of what we see going on.
I am going to put my two cent bet down on Google being here a long, long time. :)
I am going to put my two cent bet down on Google being here a long, long time
I'll agree; I don't think they're going anywhere anytime soon.
70% seems really high to me. I manage many servers and hosting accounts for myself and clients in competitive markets, and not a single one has Urchin on it.
Don't you actually have to activate Urchin for it to gather stats?
But what is considered a low bounce rate? My sites average about a 40% bounce rate; is that too high?
I think the reason for this is that Google is better at hitting the right page first time in the SERPs, whereas Bing tends to land the user one click away from the info they're looking for. So could a high bounce rate in fact be a good thing from the SE's perspective, if the user finds their answer quickly?
Perhaps the next action is the key here; a related search or a click on the next search result would be the differentiator?
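Put another way, maybe it's not the bounce itself but what the user does next. A toy Python sketch of that "next action" idea; the event names are invented, since nobody outside Google knows what they actually log:

def classify_click(events):
    # events = ordered user actions after clicking a SERP result,
    # e.g. ["back_to_serp", "click_other_result"] or ["session_end"]
    if not events or events[0] == "session_end":
        return "satisfied"        # quick exit, but the query was answered
    if events[0] == "back_to_serp":
        if "click_other_result" in events or "refined_query" in events:
            return "unsatisfied"  # pogo-sticking: the bounce that hurts
        return "satisfied"        # came back but stopped looking
    return "browsing"             # stayed on the site

print(classify_click(["session_end"]))                         # satisfied
print(classify_click(["back_to_serp", "click_other_result"]))  # unsatisfied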