Forum Moderators: Robert Charlton & goodroi
"Okay, either I'm going nuts and seeing things or Google is playing tricks on me."
No, you are not going nuts. I think it's Google that has deployed something that makes us go nuts.
Please welcome Google Virtual Datacenters :-)
I saw serps today which I couldn't see on any of the DCs that we know. Therefore I posted today msg #:149 about [google.sk...]
[edited by: Brett_Tabke at 2:25 pm (utc) on April 26, 2006]
Well... that has been the case since November-December 2005. I.e. Google's serps have been random for around 5-6 months. Have you heard any complaints from the general public? ;-)
I hear what you're saying and I tend to agree, but on the other hand I have actually heard people complaining about Google's lack of relevancy. And these people are not webmasters like us! I honestly get a chuckle when somebody tells me they started using a different search engine and then proceeds to tell me why without me even asking. Of course I elaborate on my own thoughts to them afterwards, but I've been told this by non-techies as well. The funny part is I keep hearing that MSN and Yahoo are losing ground to Google, and it makes me ask myself, "Why?". I hear the complaints from other people, I hear them from webmasters, and yet Google is still gaining ground. I mean honestly, if you can't find what you're looking for on a certain search engine, what is the first thing you and everyone else does? WE and THEY jump ship to MSN or Yahoo. Correct? Am I alone in this if you say that is not the case?
I'm not barking at ya Reseller, I'm just stating a point mate! I am obviously with you and trying to adjust to the new schemes of Google and yes of course these DC flops are wreaking havoc on my brain! :>~
At the moment it looks like there are 2 main sets of results in the datacenters, and for my keywords they are only a little different - [64.233.167.99...] is an example of one set and the other is [64.233.161.99....] I expect within 24 hours all the datacenters will be showing one or the other.
I've been banking on 64.233.167.99 from the start.
I hope you're right and they spread as opposed to the other set. Last night they had vanished again though (note again).
FWIW I think, having looked at the SERPS for some of the keyword combinations that I track, that 64.233.167.99 is a move towards "on page" text relevancy and semantic webs.
Since Florida they have been trying to rank pages with somewhat more emphasis on relevancy, by focusing on the richness of the language used in certain areas of on page text and on the richness and topicality of the language used on the pages that are linked from, and that link to, a page. They clearly already use semantics, and particularly stems of words, in assessing richness and relevance of language. Recent technology purchases confirm that this is a direction that Google is going to continue in the future.
The increased emphasis on rich language and semantic webs means that the importance of other elements of the algorithm will be reduced. So paying for inbound links from irrelevant sites and pages will be less beneficial than producing good comprehensive content and having links from and to other pages that have good comprehensive content on the same or closely related topics.
If they get this right then the only way for spammers to spam Google is to produce good comprehensive and relevant content linked to other good relevant comprehensive content. Which means that it will no longer be spam.
Best wishes
Sid
As pointed out, it also has a democratic element of course, however it could get tricky for Google to provide consistently good SERPS were it to stay with us longer term.
So my guess is, Random SERPS, if they stick, are a means to an end, but not the end product in itself.
Surely Google wants people to come back for more, since then there is more chance of them clicking one or more Adwords ads.
What they want to provide is paid for and organic relevance.
More quality = more money.
High traffic, sticky organic sites presented high in SERPS cause polarisation, less Google search usage and reduced Adwords clicking.
Sid
On another front, I guess Google is also reshaping its spam-fighting techniques. Fewer manual interventions and more dependence on filters and algos. Of course several innocent sites will suffer too, as the algos will no doubt affect those sites as well. You might have noticed that Matt and GoogleGuy aren't so eager anymore in asking for spam reporting. Algos and filters should do the job ;-)
Therefore, crawling/spidering is done from one or a couple of spots (alert: theory forming in process) within the plex. It makes sense in my mind, since they are now running two bots to crawl the internet, that they would take a crawl data set from a particular time-series and push it to a single data center for rendering.
A single push is probably many gigabytes of data. It's possible for them to keep crawl data sets at a given data center and to mix and match, swap and flop them at will.
Keep in mind that Google's primary business is historical records, thus crawl data sets are always going to be stored on their network and will probably be crunched against more current data sets for years to come.
100% consistent serps would require completely sync'd data sets across the globe. 25 billion sites, let's figure an average of 5 pages per site. I'd average keywords, link info, page cache and images at roughly 50KB per site.
That puts it at roughly 1.25 PB (1250 Terabytes). Even at an 8 billion sized cache we're at roughly 400 Terabytes. They can compress it, sure, but you get my point.
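For anyone who wants to check the arithmetic above, here is a quick sketch in Python. It assumes decimal units (1 KB = 1,000 bytes) and takes the 50KB figure as a per-site total, i.e. the 5 pages per site are assumed to be already folded into it. The function and constant names are made up for illustration.

```python
# Back-of-the-envelope index size, using the rough figures from the post.
# Assumption: decimal units (1 KB = 10**3 bytes, 1 TB = 10**12, 1 PB = 10**15).
KB = 10**3
TB = 10**12
PB = 10**15

# Assumption: 50KB is the average for a whole site (all of its pages together).
BYTES_PER_SITE = 50 * KB


def index_size_bytes(sites: int, bytes_per_site: int = BYTES_PER_SITE) -> int:
    """Bytes needed for one full, uncompressed copy of the data set."""
    return sites * bytes_per_site


full = index_size_bytes(25 * 10**9)     # 25 billion sites
smaller = index_size_bytes(8 * 10**9)   # 8 billion sites

print(f"25 billion sites: {full / PB:.2f} PB ({full / TB:.0f} TB)")
print(f" 8 billion sites: {smaller / TB:.0f} TB")
```

That is the size of a single copy. Keeping every datacenter in sync would mean moving that much data to each of them, which is the point being made.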
In five datacenters I'm ranked 11-13. In the other 50 datacenters (using the McDar tool), I'm between 54 and 63.
I see the slight change in each datacenter being due to the everflux, but there are two distinct data set ranges.
I have monitored the datacenters for the past two weeks and have seen an increase from 2 to 5 for the good dataset (where the good dataset means - good for me).
You others out there who see these two divergent sets of data, do you see the same ratio, or are you getting better results than 5 good to 50 bad?
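If you are tracking this by hand, the split into two data sets can be sketched roughly like this in Python: sort your rank per datacenter and cut at the widest gap. This is only one simple way to do it, not how the McDar tool works. The datacenter labels and rank numbers below are made up, and the function name is hypothetical.

```python
def split_at_largest_gap(ranks_by_dc):
    """Split datacenters into two groups at the widest gap between sorted ranks.

    ranks_by_dc maps a datacenter label to the position your page holds there.
    Returns (better, worse) as two dicts; small everflux wobbles stay inside a group.
    """
    ordered = sorted(ranks_by_dc.items(), key=lambda item: item[1])
    if len(ordered) < 2:
        return dict(ordered), {}
    # Difference between each rank and the next one up in the sorted order.
    gaps = [ordered[i + 1][1] - ordered[i][1] for i in range(len(ordered) - 1)]
    cut = gaps.index(max(gaps)) + 1
    return dict(ordered[:cut]), dict(ordered[cut:])


# Made-up sample: three "good" datacenters and three "bad" ones.
sample = {"dc-a": 11, "dc-b": 13, "dc-c": 12, "dc-d": 54, "dc-e": 63, "dc-f": 58}
better, worse = split_at_largest_gap(sample)

print("good set:", sorted(better))
print("bad set: ", sorted(worse))
print(f"ratio: {len(better)} good to {len(worse)} bad")
```

Running it daily on your own numbers would show whether the good set is spreading (2 to 5, as above) or shrinking.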
The homepage was cached on April 24, 2006. It looks like the pages I'm looking at became supplemental and are cached from last August 2005. Some of them are cached from July 2005.
On 64.233.167.99, I have fewer total pages indexed, but not a single one of them is supplemental. Also, my ranking is up significantly on that DC compared to others.
Crossing fingers and hoping this represents an advance...
very few pages indexed
A recent comment by Matt Cutts may have some relevance here. In that comment, he suggests emailing for such crawl problems, I think:
...the crawl caching proxy is (in my opinion) completely different from the issue of some people’s sites not being crawled as much in Bigdaddy. I was aware of the latter, but not the former. Regarding the latter, GoogleGuy mentioned that you can email to bostonpubcon2006 at gmail dot com with a subject line of “crawlpages” (all one word) to mention your site. Someone is going to look through that feedback.
[mattcutts.com...]
Good evening Folks
These days remind me of 2-3 February 2005, the Allegra Update. I call it Black February. During those hard days, Forum 30 was crowded with threads and posts which were to some extent similar to the current ones. I don't have the exact figure, but I recall that the sites of several kind fellow members were either hit hard or killed.
If you take a look today at this thread and the other related threads, you will see that they are in fact connected. They are all about the health of Google Datacenters and how sites look on them: lost rankings, lost pages, deindexing, supplementals, canonicals, random serps etc.
And it seems that our lovely Google Datacenters are suffering and bleeding. And we Google Datacenters Watchers feel that suffering too.
Having said that, I'm an optimistic person who doesn't forget to look at the bright side of life too.
I still believe in Google and our good friends at the plex, the youngster Googlers and the older ones like Matt Cutts and GoogleGuy. Those friends aren't going to allow our beloved Google Datacenters to keep suffering.
Tomorrow for sure shall be a better day.
Good night and God bless.
Very nice sunshine and a great morning here. Wish you the same.
"I see that [64.233.167.99...] now has the experimental results that have previously been on [72.14.207.99...] and on [72.14.207.104...] for the last few weeks (but are not at those two now)."
I see some "affiliates friendly" serps on [72.14.207.99...] and [72.14.207.104...] . Do you see the same?
Things look calmer this morning on the DCs. Few serps sets. But it's Friday, ya know. And anything might happen on weekends :-)
Thanks.
Good Morning Reseller,
I think you're right about the weekends, especially when associated with a bank holiday. This weekend is May Day Bank Holiday here in the UK and I'm expecting another Google change.
I especially like the look of 64.233.179.99 & 64.233.179.104 at present. I'm getting some strange results for my local market on 64.233.187.99
All the best
Col :-)
I see that [64.233.167.99...] now has the experimental results
Take a close look into the future of the Google algorithm.
In my niche that DC produces the most relevant, clean results I've ever seen on Google.
Sid
C:\>nslookup 202.43.196.230
DNS request timed out.
timeout was 2 seconds.
*** Can't find server name for address 216.148.227.68: Timed out
DNS request timed out.
timeout was 2 seconds.
*** Can't find server name for address 204.127.202.4: Timed out
Server: ns8.seren.com
Address: 64.85.239.20
Name: w1.search.vip.tpe.yahoo.com
Address: 202.43.196.230
"Take a close look into the future of the Google algoritm.
In my niche that DC produces the most relevant, clean results I've ever seen on Google."
Clean results! You must be kidding. On that particular DC, within Top 10 sites for my search keywords there is a site with hidden text which I have already reported to Google WebSpam Team several times.
Right Matt :-)
During update Jagger there was 'blending' of two sets of results to arrive at the final set.
If you compare these datasets with your searches would you agree that 64.233.187.104 is a blend of the other two?
It may be totally different depending on your search terms but that's the way it looks to me.
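One rough way to test the "blend" idea on your own searches, sketched in Python: take the top results from the suspected blend and count how many of them also show up in each of the other two sets. This only checks overlap, it says nothing about how Google actually combines results. The URLs are placeholders and the function name is made up.

```python
def blend_breakdown(candidate, set_a, set_b):
    """Count where each result of the candidate list also appears.

    Returns a dict with how many candidate results are found in both other
    sets, only in set_a, only in set_b, or in neither of them.
    """
    a, b = set(set_a), set(set_b)
    counts = {"both": 0, "only_a": 0, "only_b": 0, "neither": 0}
    for url in candidate:
        if url in a and url in b:
            counts["both"] += 1
        elif url in a:
            counts["only_a"] += 1
        elif url in b:
            counts["only_b"] += 1
        else:
            counts["neither"] += 1
    return counts


# Placeholder result lists for one search term on three datacenters.
results_a = ["a.example", "b.example", "c.example", "d.example"]
results_b = ["a.example", "e.example", "f.example", "g.example"]
suspected_blend = ["a.example", "b.example", "e.example", "c.example", "x.example"]

breakdown = blend_breakdown(suspected_blend, results_a, results_b)
print(breakdown)
```

A true blend would show decent counts under both "only_a" and "only_b" and very little under "neither".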
Clean results! You must be kidding. On that particular DC, within Top 10 sites for my search keywords there is a site with hidden text which I have already reported to Google WebSpam Team several times.
So they need to learn how to automatically spot hidden text and filter out pages employing that kind of crap. In my area the SERPS on that DC are the "most relevant, clean(est) results" I've seen.
The point is that the algo on the DC in question seems to have somewhat more emphasis on on page and neighbourhood relevance and IMHO that is what Google will be increasingly doing in the future.
Sid