Has google crashed?

         

Craig_W

1:03 am on May 10, 2003 (gmt 0)

10+ Year Member



Without question google has lost a significant amount of information. Googleguy will never admit that they did, because they have enough data, and had enough foresight to maintain multiple indices, to continue posting results, albeit outdated ones, and it is not in Google's interest to admit a data loss. Let's face it: the internet is one big data management problem, and google's business is to deal with it. This time they had a problem! My personal site went from having an accurate cache, to having an outdated cache, to having absolutely NO cache, and now back to having an outdated cache. This also explains the loss of backward links. Not bad really: the public won't really know, because they will just get some outdated links for a while, and the existence of some updated information points to the fact that there was not a total loss of data. Fortunately for the entire web community the loss is not permanent; the googlebot WILL return to regain the data lost.

abcdef

1:07 am on May 10, 2003 (gmt 0)

10+ Year Member



it's a conspiracy

steve128

1:09 am on May 10, 2003 (gmt 0)



That is interesting... makes me think...mm

steve128

1:14 am on May 10, 2003 (gmt 0)



But then again, how did my new site gain entry...albeit for a short while?

Back to the drawing board

pmac

1:15 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome [webmasterworld.com] to WebmasterWorld Craig_W!

theBear

1:33 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In part Craig_W set in stone:

"Without question google has lost a significant amount of information."

I don't see where Google lost any information.

Would you care to add on to that statement?

I looked over SJ a bit for a couple of sites I work on and found the content to come close in count to what my logs showed as being grabbed by freshbot.

There were no deepbot pages but if Google does multiple partial updates, they can bring in the rest of the pages at a later time.

Cheers,

Craig_W

1:45 am on May 10, 2003 (gmt 0)

10+ Year Member



Well, in all sincerity, the words 'without question' should have been more carefully chosen. I titled the thread "Has google crashed?" because it is my opinion that google has lost some data somewhere. What led me to this deduction is the fact that at some point I sought to obtain cache information and learned that no snapshot of my website was available (not an outdated version, simply nothing!). Further, the consistent posting of outdated snapshots of my website, taken at the latest in March, leads me to believe that somewhere along the line google has lost information pertaining to my site. I saw the same types of data loss on other sites. I feel that it's in google's own interest to index the freshest data possible, and when google already had fresh data and reverts to outdated data, it points to some type of data loss or at the very least a failure to update. However, since the outdated cache dates back to March 2003, I honestly believe google lost some data here. I don't think they lost all of it.

reneewood

1:46 am on May 10, 2003 (gmt 0)

10+ Year Member



They lost my information. I'd buy into Craig's thinking. I sound like a broken record. What happened to my homepage? What happened to my homepage? What happened to my homepage.......

hamster77

1:54 am on May 10, 2003 (gmt 0)

10+ Year Member



reneewood:

This might cheer you up a bit maybe - our disappeared site turned up on -fi this evening and is still there now, so maybe your homepage will make it too.

theBear

1:55 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They don't need to make a cache copy available on the web from each and every server.

Depending where they are in the process they don't even need all of the page content.

It is possible for the PR calculations to be done without all of the page content. The same can be said about returning SERPs: on a server doing that, there is no need to keep all that HTML, XML, XHTML crud that would be required to render the cache.

Cheers,
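
To illustrate the point about PR not needing page content, here is a minimal sketch of a PageRank-style power iteration that works from link data alone. The toy link graph, the 0.85 damping factor, and the function name are all assumptions for illustration; nothing here is known about how Google actually stores or computes it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from a link graph only; no page content is used.

    links maps each page to the list of pages it links to (toy data).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical three-page site: only the links are stored, not the HTML.
toy_links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
}
ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

The only input is "who links to whom", which is why a server doing this step could get by without the full HTML needed to render a cached copy.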

1milehgh80210

1:59 am on May 10, 2003 (gmt 0)

10+ Year Member



Something has been lost.......
Some webmasters' minds!

theBear

2:04 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1milehgh80210,

In some circles even that would be a non starter, can't lose what ain't there to start with ;) hmmmmm, I may qualify for that remark :).

33+ years of rolling code would indicate something was missing somewhere.

Cheers,

steveb

2:04 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The March deepcrawl did miss a huge amount of links in both the Dmoz and Google directories. There is "no question" of that. It's easy to check. Was this "missed crawl" phenomenon limited to those two directories, or more widespread across the Internet? That isn't so easy to check.

Did the April deepcrawl do a better job than the March one? We'll see eventually.

-sj is just one more data point that something not-good has happened to Google's data the past two months. It may all correct itself and be fine and dandy when the update occurs in the next few days/hours, but at this point all we can see outside the Googleplex is an incomplete/not-good most recent deep crawl and the public showing of a very poor quality work in progress on -sj that was in no way ready for prime time.

It makes you go "hmmmm?", but beyond that we should wait for the update to actually update and draw some conclusions then.

theposter

2:10 am on May 10, 2003 (gmt 0)

10+ Year Member



ok Craig_W, which rival search engine do you work for? ;) Your post reminds me of one of the oldest tricks in Public Relations. Kick your competition when they are down.

I repeat myself...the update has not started. The past week we just got a glimpse into the internal workings of what makes up an update. Updates of such large magnitude are an incremental process. Knowing the amount of work that goes into our distributed databases, I know I am right here!

rfgdxm1

2:25 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seems like data loss to me also. In addition to the missing backlinks some have reported, for one of my sites Google lost all the inbound anchor text on links, causing it to plummet in the SERPs. I have clear proof from the allinanchor text that this happened. Since these backlinks show up on the link: command, clearly this wasn't a case of sites linking to me being down at the time of the crawl. It may not be that Google crashed, but that the algo is buggy in that it can't handle all the data being thrown its way, and thus chokes on part of it.

deanril

2:52 am on May 10, 2003 (gmt 0)

10+ Year Member



Ok, I'm a newbie, but I don't think google has "Lost" anything. I mean, come on, ever heard of a RAID 1 array? You think it's like your hard drive, 1 copy and that's it?

Not hardly. Google has 10,000 computers with 80 GB drives on each. I'll quote the rest:

[webmasterworld.com...]


We call it Everflux: it can act mysteriously at times.
Here's the short story on it:
Google is constantly crawling and updating selected pages that meet some predetermined criteria. That may involve last modified dates and PR values.

Google has many data centers and runs a distributed load sharing system across more than 10k pc's running linux with 80 gig drives at last report. Somehow, the copy of the index must get transferred to all those hard drives in all those data centers. You ever transfer 80gig across the net? And then distribute that 80 gig down into thousands of hard drives?

All of that takes a great deal of time. It's a constant process for Google. More-than-likely, the daily updates only copy out those parts of the index that are really updated. That's yet another possibility where new and old data could get mixed.

Load sharing works transparently. You do a search on Google and the request is routed via dns magic to either the nearest data center or the nearest data center with the least load (we don't know their load distribution criteria on that).

Lastly, they could be working on the index, rolling indexes back, switching parts of the index, backing up parts of the index, rewriting some offending part of the index, deleting parts of an index - or a multitude of other actions or problems that only Google could know about.

Take those combinations of not knowing which box you are going to connect to and which index it may have, and the possibility of daily updating going on at the same time, and results may be unpredictable. There could be dozens of different indexes floating around various data centers - we have no clue.

One minute you'll get one copy of an index during a search, and the next you'll get another. Sometimes that could be yesterday's crawl, or last month's crawl, or the crawl from four months ago.

Losing data for Google would be like your credit card company losing the amount you owe: it ain't never going to happen.
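
For anyone who wants to see the Everflux explanation quoted above in action, here is a small simulation, offered as a rough sketch only. The data center names, index dates, and page sets are made up, and a random pick stands in for the DNS-based load sharing, whose real criteria nobody outside Google knows.

```python
import random

# Hypothetical data centers, each holding a different copy of the index.
DATA_CENTERS = {
    "dc-a": {"index_date": "2003-05", "pages": {"example.com/", "example.com/new"}},
    "dc-b": {"index_date": "2003-03", "pages": {"example.com/"}},
    "dc-c": {"index_date": "2003-01", "pages": set()},
}


def search(url, rng):
    """Look up a URL on a randomly chosen data center.

    The random choice is an assumed stand-in for load sharing; the
    searcher has no control over which copy of the index answers.
    """
    name = rng.choice(sorted(DATA_CENTERS))
    center = DATA_CENTERS[name]
    return name, center["index_date"], url in center["pages"]


rng = random.Random(0)  # fixed seed so repeated runs behave the same way
results = [search("example.com/new", rng) for _ in range(10)]
for name, index_date, found in results:
    status = "found" if found else "missing"
    print(f"{name} (index {index_date}): {status}")
```

The same page can look "found" on one query and "missing" on the next without any data being lost, purely because a different copy of the index answered.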

theposter

2:52 am on May 10, 2003 (gmt 0)

10+ Year Member



ok, this rant is for no one in particular.

Please R E L A X. dang...the update has not started. Give it a break people. Think of the next deep crawl. Get back to work. sheeeez.

skipfactor

3:37 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>Welcome to WebmasterWorld Craig_W!

Nice to see that without a scolding underneath it for a change...hey, maybe that's why I never got a "Welcome..." :)

Cheer up everyone, I'm showing 1 backlink on -sj. If it were 20 I'd be worried.

Kackle

3:42 am on May 10, 2003 (gmt 0)



On a site with 107,000 pages, Google has been ending up with about 50,000 - 60,000 pages shown in the site:xxx xxx command for the last 2.5 years. Home page between 6 and 7. Deep pages between 0 and 2.

The March crawl seemed normal, but the pages that showed up on April 11, when the update kicked in, were 35,000 instead of the usual 50,000 to 60,000. What was more convincing is that my traffic has been cut in half since April 11.

Okay, so I look at www-sj. It showed 95,000 pages indexed a week ago. Yesterday it switched to 91,700 pages indexed. Sure, I'm excited. Even the slight dip didn't worry me. My backlinks are one-third what they should be, but so are everyone else's.

But now I decided to look deeper. Everything looks screwed up. The 90,000+ counts appear to be bogus. It's quite likely that the update will not be better for me at all.

In fact, I cannot explain the 35,000 number now. When I actually list the pages in one of the many directories, the current www situation looks like it beats the www2 situation easily. Not only are more pages showing up in www, but the ones that do show up don't have nearly as many URL-only links (such a link means that Google noticed the link, but didn't put the page in the index).

Am I confused? You betcha. Because the traffic still stinks.

Do I think Google engineers are confused? Yep, I think the whole thing is almost out of control out there. I don't know if it's a problem of bureaucratic non-communication, or if there are too many cooks in their gourmet kitchen, or if the Web has finally surpassed Google's ability to handle it all. But I'm losing a lot of confidence in anything I see on Google by way of numbers.

The only thing that counts is that little "percent of total traffic" number that comes from Google, by virtue of referrals in my logs. It's been flat-lined for the last 30 days. It's almost to the point where a Declaration of Independence is indicated. Until it creeps up and stays there, I'm going to stop obsessing over Google.
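
The "percent of total traffic" number mentioned above can be worked out from the referrer field of a server log. The snippet below is a minimal sketch that assumes the referrers have already been pulled out of the log into a list (with "-" for visits that have no referrer); the sample values and the function name are invented for illustration.

```python
from urllib.parse import urlparse


def google_share(referrers):
    """Return the percentage of visits whose referrer host is a Google domain."""
    total = 0
    from_google = 0
    for referrer in referrers:
        total += 1
        # urlparse gives hostname None for "-" or an empty referrer.
        host = urlparse(referrer).hostname or ""
        if host == "google.com" or host.endswith(".google.com"):
            from_google += 1
    return 100.0 * from_google / total if total else 0.0


# Made-up referrer values, one per visit.
sample = [
    "http://www.google.com/search?q=widgets",
    "http://www.example.org/links.html",
    "-",
    "http://www.google.com/search?q=blue+widgets",
]
share = google_share(sample)
print(f"Google referrals: {share:.1f}% of total traffic")
```

Tracking this one figure day by day shows whether an update actually changed anything, regardless of what the page counts on any particular data center claim.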

rfgdxm1

3:44 am on May 10, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>Losing data for Google would be like your credit card company losing the amount you owe: it ain't never going to happen.

Consider loss due to buggy software, not hard drive crashes.
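
To make the distinction concrete, here is a toy model of a mirrored store in the spirit of a RAID 1 array. It is only a sketch: the class and its methods are hypothetical and say nothing about how Google's storage really works. It shows that a mirror survives a dead drive, but faithfully copies a mistaken delete to both sides.

```python
class Mirror:
    """Toy RAID 1-style mirror: every operation is applied to both copies."""

    def __init__(self):
        self.disks = [{}, {}]

    def write(self, key, value):
        for disk in self.disks:
            if disk is not None:
                disk[key] = value

    def delete(self, key):
        # A delete is mirrored just like a write, bug or no bug.
        for disk in self.disks:
            if disk is not None:
                disk.pop(key, None)

    def fail_disk(self, index):
        # Simulate a hardware failure of one drive.
        self.disks[index] = None

    def read(self, key):
        for disk in self.disks:
            if disk is not None and key in disk:
                return disk[key]
        return None


# Case 1: a drive dies. The surviving copy still has the data.
hardware_case = Mirror()
hardware_case.write("example.com/", "cached page")
hardware_case.fail_disk(0)
after_disk_failure = hardware_case.read("example.com/")

# Case 2: buggy software deletes the entry. Both copies lose it.
software_case = Mirror()
software_case.write("example.com/", "cached page")
software_case.delete("example.com/")
after_bug = software_case.read("example.com/")

print("after disk failure:", after_disk_failure)
print("after buggy delete:", after_bug)
```

Redundancy of this kind protects against hardware faults; recovering from a software mistake needs something else, such as an older backup or a fresh crawl.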

dididudu

4:38 am on May 10, 2003 (gmt 0)

10+ Year Member



I agree with this topic too. I'm starting to suspect google has lost some data, and here is my theory:

I have 2 sites, with different URLs, stored on the same ip. They are in completely different fields, unrelated whatsoever. Now both of them are missing on google. It makes sense to me that google crawls sites and collects (sorts) data according to ip address, and the data from my ip was lost, which caused both of the sites to be lost from sj.

If you say one of them is banned, then how do you explain the other one? And again, I don't spam!

jranes

4:49 am on May 10, 2003 (gmt 0)

10+ Year Member



"Consider loss due to buggy software, not hard drive crashes. "

What, the "delete index" function went out of control?

reneewood

4:54 am on May 10, 2003 (gmt 0)

10+ Year Member



This is an embarrassingly newbie question....

If they lost it, can they find it? Is the "old" information cached on a server somewhere and they can just download/retrieve it again? (Told you it was a newbie question!)

1milehgh80210

5:03 am on May 10, 2003 (gmt 0)

10+ Year Member



There's something no system is safe from..
"buggy people"

why2kit

5:29 am on May 10, 2003 (gmt 0)

10+ Year Member



Kackle - that's an interesting rant - but not one that should be ignored.