
Google News Archive Forum

    
What is up with SERPs?
Back to last month's data.
Powdork
msg:155441 - 5:51 am on Oct 29, 2002 (gmt 0)

I have a site that I was very happy to see move into first place for my target keywords. Now it is back to last month's cache and position (#15). This is the first time this month it has moved back, although there have been many periods in which the freshness date was gone but the cache was still recent and the position did not change. I'm confident it will move back, and I know this doesn't mean the update has started, but...
Is the reversion to old caches significant, i.e. is it Google getting ready? Has it happened prior to previous updates?

 

Brett_Tabke
msg:155442 - 6:53 am on Oct 29, 2002 (gmt 0)

We call it Everflux: it can act mysteriously at times.

Here's the short story on it:

Google is constantly crawling and updating selected pages that meet some predetermined criteria. Those criteria may involve last-modified dates and PR values.
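To sketch the kind of rule that might be involved - a toy guess in Python, with the criteria and thresholds invented, not anything Google has confirmed:

from datetime import datetime, timedelta

def should_recrawl(last_modified, pagerank, now):
    """Toy freshness rule: revisit pages that changed recently or
    carry enough PR to be worth keeping fresh. Thresholds invented."""
    recently_changed = now - last_modified < timedelta(days=7)
    high_value = pagerank >= 6
    return recently_changed or high_value

# A PR7 page untouched for a month would still qualify for a fresh crawl.
print(should_recrawl(datetime(2002, 9, 29), 7, datetime(2002, 10, 29)))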

Google has many data centers and runs a distributed load-sharing system across more than 10k PCs running Linux with 80 GB drives, at last report. Somehow, the copy of the index must get transferred to all those hard drives in all those data centers. You ever transfer 80 GB across the net? And then distribute that 80 GB down onto thousands of hard drives?

All of that takes a great deal of time, and it's a constant process for Google. More than likely, the daily updates copy out only those parts of the index that have actually changed. That's yet another point where new and old data could get mixed.
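A minimal sketch of that kind of delta push, assuming the index can be treated as a plain key-value mapping (it surely can't, but the idea is the same):

def index_delta(old, new):
    """Ship only changed/added entries plus the deleted keys,
    rather than recopying the whole 80 GB index."""
    changed = {k: v for k, v in new.items() if old.get(k) != v}
    deleted = [k for k in old if k not in new]
    return changed, deleted

def apply_delta(index, changed, deleted):
    index.update(changed)        # new and updated entries overwrite
    for k in deleted:
        index.pop(k, None)       # drop entries removed upstream

old = {"page-a": "v1", "page-b": "v1"}
new = {"page-a": "v2", "page-c": "v1"}
changed, deleted = index_delta(old, new)
apply_delta(old, changed, deleted)
assert old == new                # the replica now matches the master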

Load sharing works transparently. You do a search on Google and the request is routed via DNS magic to either the nearest data center or the nearest data center with the least load (we don't know their load-distribution criteria).
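Conceptually something like this - the data center names, distances, and loads are all made up, and the real policy is unknown to us:

DATACENTERS = [                  # (name, distance in ms, load 0-1) - invented
    ("dc-east", 20, 0.90),
    ("dc-west", 70, 0.30),
    ("dc-eu", 120, 0.10),
]

def pick_datacenter(centers, max_load=0.8):
    """Guessed policy: nearest center unless it is overloaded,
    in which case fall back to the least-loaded one."""
    nearest = min(centers, key=lambda c: c[1])
    if nearest[2] <= max_load:
        return nearest
    return min(centers, key=lambda c: c[2])

print(pick_datacenter(DATACENTERS)[0])   # dc-eu: dc-east is over the load cap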

Lastly, they could be working on the index: rolling indexes back, switching parts of the index, backing up parts of the index, rewriting some offending part of the index, deleting parts of the index - or a multitude of other actions or problems that only Google could know about.

Combine not knowing which box you are going to connect to, and which index it may have, with the possibility of daily updating going on at the same time, and the results may be unpredictable. There could be dozens of different indexes floating around the various data centers - we have no clue.

One minute you'll get one copy of an index during a search, and the next you'll get another. Sometimes that could be yesterday's crawl, or last month's crawl, or a crawl from four months ago.

Powdork
msg:155443 - 8:02 am on Oct 29, 2002 (gmt 0)

Thanks for the excellent info, Brett.

>Somehow, the copy of the index must get transferred to all those hard drives in all those data centers. You ever transfer 80 GB across the net? And then distribute that 80 GB down onto thousands of hard drives?

No, but I once downloaded a large image-editing program over LimeWire with a 28k modem, after 3 months on a computer. I can't imagine that amount of cussing times 400 mouths. ;)

>Lastly, they could be working on the index: rolling indexes back, switching parts of the index, backing up parts of the index, rewriting some offending part of the index, deleting parts of the index - or a multitude of other problems that only Google could know about.

The site in question insists on using their own server, and the site is down frequently (I guess they're not very good at it). Would being down during a 'freshness' visit possibly cause the cache to revert to a state prior to the previous 'freshness' visit?
Also, as pertains to 'rolling indexes back', 'switching parts of the index', and 'backing up parts of the index': is there any way to isolate any of these to see if there is any correlation to the update, just for future updates?
Sorry about:
1. I didn't see thread 6394, which was two doors down when I started this, and...
2. I pretty much knew the answer before I posted, as I've read many similar posts. It doesn't quite sink in until it's your site, though.

On the flip side, I (and hopefully others) did gain from your post, Brett.

P.S. Can we repost an update time once our previous one expires, or is there no hope of ever obtaining a mousepad? Wait a minute - I think I can use that image-editing program, the picture of the mousepad, my wife's rounded-corner cutter, the special paper, and backing. I may not be able to duplicate Google, but I bet I can put out a black-market mousepad without any venture capital at all.

rfgdxm1
msg:155444 - 8:21 am on Oct 29, 2002 (gmt 0)

>Google has many data centers and runs a distributed load-sharing system across more than 10k PCs running Linux with 80 GB drives, at last report. Somehow, the copy of the index must get transferred to all those hard drives in all those data centers. You ever transfer 80 GB across the net?

I gather, Brett, you've never run a Usenet server that carries binaries? With all the pr0n and copyright-challenged stuff on Usenet, the daily feed is 500 GB. Thus a Usenet server has to be able to handle taking in 80 GB in just a matter of hours.
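The arithmetic backs that up, using just the figures quoted above:

daily_feed_gb = 500                    # quoted daily Usenet feed
gb_per_hour = daily_feed_gb / 24       # ~20.8 GB/hour sustained
hours_for_80gb = 80 / gb_per_hour      # ~3.8 hours
print(f"{gb_per_hour:.1f} GB/h -> 80 GB in {hours_for_80gb:.1f} hours")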

Brett_Tabke
msg:155445 - 8:40 am on Oct 29, 2002 (gmt 0)

If you consider that each byte of the index is then transferred 10k times over, probably via 100 Mbps Ethernet, that's on the order of terabytes (maybe petabytes) moved/copied daily.
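Back-of-envelope, taking the 80 GB and 10k figures at face value:

index_gb = 80                     # reported per-machine index copy
machines = 10_000                 # reported machine count
total_tb = index_gb * machines / 1000
print(f"{total_tb:,.0f} TB (~{total_tb / 1000:.1f} PB) per full push")

# At 100 Mbps (~12.5 MB/s) a single 80 GB copy alone takes
# 80_000 MB / 12.5 MB/s / 3600 s/h, i.e. roughly 1.8 hours per link,
# before any fan-out or multicast tricks.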

rfgdxm1
msg:155446 - 8:58 am on Oct 29, 2002 (gmt 0)

Of course, if they had fatter pipes between the machines, moving that sort of data around your own data center could be very quick, given that there aren't the congestion problems the net has. Google probably doesn't see this as necessary, but technically they could do it.

shady
msg:155447 - 10:43 pm on Oct 29, 2002 (gmt 0)

I have around 300 pages on my site, all at a depth of two clicks from the home page.

For the last two months, the majority of these pages have appeared for around 5 days and then disappeared.

How long does this go on for, before the pages stick in the index?

Also, the cache of my index page is over two months old - from before the links to these pages even existed!

clickclick
msg:155448 - 11:00 pm on Oct 29, 2002 (gmt 0)

There is no doubt that Google has hods of data to shift nearly all the time, and I'm not going to take Brett on over the nitty-gritty. There is, however, one thing that they manage to keep constant - 657,000!
