
Google SEO News and Discussion Forum

Google CEO admits, "We have a huge machine crisis "
Google admits that they have a problem with storing the world's information
Enkephalin420

10+ Year Member



 
Msg#: 34147 posted 4:48 pm on May 3, 2006 (gmt 0)

Google CEO admits - "We have a huge machine crisis - those machines are full".

I was reading the New York Times article Microsoft and Google Set to Wage Arms Race [nytimes.com], and a paragraph on page 2 caught my eye: it quotes Eric Schmidt (Google's CEO) admitting that they have problems storing more website information because their "machines are full".

I am a webmaster who has had problems getting and keeping my webpages indexed by Google. I follow Google's guidelines to the letter and have not practiced any black-hat SEO techniques.

Here are some of the problems I have been having:

1. Established websites having 95%+ of their pages dropped from Google's index for no apparent reason.
2. New pages published on established websites not being indexed (some launched as long as 6-8 weeks ago).
3. New websites launched and still not showing up in the SERPs (some for as long as 12 months).

We're all well aware that Google's algorithms have problems handling simple directives such as 301 and 302 redirects, duplicate indexing of www and non-www pages, canonical issues, etc.

Does anybody think that Google's "huge machine crisis" has anything to do with any of the problems I mentioned above?

[edited by: tedster at 5:03 pm (utc) on May 3, 2006]
[edit reason] fix side scroll potential [/edit]
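Since canonical and redirect handling comes up in the post above: for anyone who wants to test their own setup, here is a minimal sketch (not from the thread; example.com is a placeholder hostname) that checks whether the www and non-www variants of a homepage answer with a single 301 to one canonical host.

import http.client

def first_response(host, path="/"):
    # Return (status, Location) of the first response; redirects are not followed.
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

for host in ("example.com", "www.example.com"):
    status, location = first_response(host)
    print(f"{host}: {status} -> {location}")
    # Ideally one variant answers 200 and the other answers a single 301
    # pointing at the canonical hostname, with no 302s or redirect chains.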

 

legallyBlind

5+ Year Member



 
Msg#: 34147 posted 1:51 pm on May 5, 2006 (gmt 0)

I'm sorry, but what kind of publicly traded technology company fails to monitor its storage needs and increase capacity as needed over time? Why wait until the last minute and then tell everybody, "sorry guys, we ran out of hard drive space"? Don't they still have programmers working on the search algorithm who could see that coming?

JuniorOptimizer

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34147 posted 1:57 pm on May 5, 2006 (gmt 0)

I think Blogger did 'em in.

drall

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34147 posted 2:24 pm on May 5, 2006 (gmt 0)

You have to admit this is pretty funny. A company that is solely in the business of storing data runs out of room? Lol, that's like Ford running out of metal or McDonald's running out of hamburger.

joeduck

10+ Year Member



 
Msg#: 34147 posted 2:44 pm on May 5, 2006 (gmt 0)

35 trillion IP enabled devices...potentially

That seems very high. This would be about 5,000 devices for every person on earth.

But a key question about content is how much of it is worthy of indexing? Seems to me the best search applications will be those that know what NOT to index in the first place rather than those that try to index everything and then sort it out later.

europeforvisitors



 
Msg#: 34147 posted 2:58 pm on May 5, 2006 (gmt 0)

I'm sorry, but what kind of publicly traded technology company fails to monitor its storage needs and increase capacity as needed over time? Why wait until the last minute and then tell everybody, "sorry guys, we ran out of hard drive space"?

IMHO, it's usually a mistake to take hyperbolic remarks literally.

If the guy had said "We're getting killed by our electricity supplier," we'd probably see a thread here with the title: "Google staff electrocuted, bodies pile up at the plex." :-)

theBear

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34147 posted 3:03 pm on May 5, 2006 (gmt 0)

IPv6 addresses are 128 bits long, or four times the size of IPv4 addresses.

The theoretical number of IPv6 addresses, about 3.4 × 10^38, is almost unimaginably large.

Me thinks that is larger than 35 trillion by a few.
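Not part of theBear's post, just a quick sanity check of those two numbers in Python:

ipv6_space = 2 ** 128                      # size of the IPv6 address space
devices = 35 * 10 ** 12                    # the "35 trillion devices" figure
print(f"{ipv6_space:.3e} addresses")       # ~3.403e+38
print(f"{ipv6_space // devices:.1e} times 35 trillion")   # ~9.7e+24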

decaff

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34147 posted 5:58 pm on May 5, 2006 (gmt 0)

Thanks Bear...
I was fascinated when I first read about IPv6 several years back... and the number of IP addresses that can be assigned to devices is staggering. Everything can be connected to the web, and I mean everything: embedded devices, RFID chips, sites, cars, watches, cell phones, each with its own IP address. The possibilities are staggering... and so are the problems.
I have seen several different numbers quoted on this...

Example of an IPv6 address:
1080:0000:0000:0000:0000:0034:0000:417A

Here's the math:
2^128, or about 3.403 × 10^38 unique host interface addresses. That translates into 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses.

That's a 39-digit number (roughly 340 undecillion).

IPv6 officially goes live in 2008 (though there is already an active IPv6 network in place, and all Linux distributions currently support the protocol).
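A small sketch (not part of the original post) that reproduces those figures with Python's standard ipaddress module, using the example address above:

import ipaddress

total = 2 ** 128
print(total)               # 340282366920938463463374607431768211456
print(len(str(total)))     # 39 digits

addr = ipaddress.IPv6Address("1080:0000:0000:0000:0000:0034:0000:417A")
print(addr.exploded)       # full form, as written above
print(addr.compressed)     # shorthand form: 1080::34:0:417a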

LuckyGuy

5+ Year Member



 
Msg#: 34147 posted 6:19 pm on May 5, 2006 (gmt 0)

Hey everyone,

Matt Cutts did a post about the dropped pages today.

[mattcutts.com...]

He says all is fine! I think it's a lie.

europeforvisitors



 
Msg#: 34147 posted 6:32 pm on May 5, 2006 (gmt 0)

Matt Cutts did a post about the dropped pages today.
He says all is fine! I think it's a lie.

I can't find that post. Can you identify it by timestamp?

LuckyGuy

5+ Year Member



 
Msg#: 34147 posted 6:34 pm on May 5, 2006 (gmt 0)

This is Matt Cutts' post on his blog:

maxD, last week when I checked there was a double-digit number of reports to the email address that GoogleGuy gave (bostonpubcon2006 [at] gmail.com with the subject line of “crawlpages”).

I asked someone to read through them in more detail and we looked at a few together. I feel comfortable saying that participation in Sitemaps is not causing this at all. One factor I saw was that several sites had a spam penalty and should consider doing a reinclusion request (I might do it through the webmaster console) but even that wasn’t a majority. There were a smattering of other reasons (one site appears to have changed its link structure to use more JavaScript), but I didn’t notice any definitive cause so far.

There will be cases where Bigdaddy has different crawl priorities, so that could partly account for things. But I was in a meeting on Wednesday with crawl/index folks, and I mentioned people giving us feedback about this. I pointed them to a file with domains that people had mentioned, and pointed them to the gmail account so that they could read the feedback in more detail.

So my (shorter) answer would be that if you’re in a potentially spammy area, you might consider doing a reinclusion request–that won’t hurt. In the mean time, I am asking someone to go through all the emails and check domains out. That person might be able to reply to all emails or just a sampling, but they are doing some replies, not only reading the feedback.

Doesn't sound so great.

europeforvisitors



 
Msg#: 34147 posted 6:45 pm on May 5, 2006 (gmt 0)

I saw that post, but it didn't say anything about everything being fine, so I'm still wondering what LuckyGuy thinks Mr. Cutts was lying about.

LuckyGuy

5+ Year Member



 
Msg#: 34147 posted 6:48 pm on May 5, 2006 (gmt 0)

europeforvisitors,

He says that the posted sites either have a spam penalty or the bot has a different way of spidering them. Conclusion: all is fine!

europeforvisitors



 
Msg#: 34147 posted 7:28 pm on May 5, 2006 (gmt 0)

That may be your conclusion, but even if it is, accusing him of "lying" (when the conclusion is in your head) seems a bit over the top.

Given the amount of nastiness and bile that people like Matt Cutts and GoogleGuy have to take from unhappy Webmasters, it's a wonder they're willing to communicate at all.

StupidScript

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34147 posted 11:27 pm on May 5, 2006 (gmt 0)

Reading the NYTimes article through to the end reminds us of the true issue at hand, regardless of whether the storage crunch at Google points to an operations snafu, as some people think.

As John Battelle, the editor of SearchBlog, stated:

"In the long run, it's about whether you have the best service."

Those of you who are pointing to recent shifts in the algo results as "evidence" of a storage problem would do well to look back through the recent past and remember that there is a shuffling every time algo changes are made, and we just experienced one. Common sense tells us to wait until things stabilize at all of the data centers before freaking out.

Google does not disclose technical details, but estimates of the number of computer servers in its data centers range up to a million.

And that's before spending an additional $1.5 billion on more.

Has anyone EVER ... in the history of the Earth ... tried to manage a project like that? Anyone? No, you haven't, and all of your wisdom on that topic (predicting need, rolling out infrastructure, etc.) is pretty frail in the face of the simply staggering numbers that this enterprising company is dealing with.

YOU GO, GOOGLE! ROCK OUR WORLD!

trinorthlighting

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34147 posted 12:39 am on May 6, 2006 (gmt 0)

Before they buy new servers, they just need to clean up the index a bit. There is no reason to store very old and outdated data. Google knows this and I am sure they are working on it.

Start by dropping cached pages from eBay auctions from a year ago...

legallyBlind

5+ Year Member



 
Msg#: 34147 posted 3:13 am on May 6, 2006 (gmt 0)

Very simple. Google has become an ad agency; their main goal is to collect and store old data. Analysis of old data can help search results increase PPC revenue.

Free search is free, and you get what you pay for, that's all. It's time for all of us to wake up, smell the coffee, and admit that Google has become a global advertising agency. Free/organic searches are for collecting stats and demographics.

We are a bit crazed with Google, are we not? Is this normal?

Stefan

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34147 posted 3:15 am on May 6, 2006 (gmt 0)

Has anyone EVER ... in the history of the Earth ... tried to manage a project like that? Anyone? No, you haven't, and all of your wisdom on that topic (predicting need, rolling out infrastructure, etc.) is pretty frail in the face of the simply staggering numbers that this enterprising company is dealing with.

Well yes, they've assembled the largest pile of internet garbage to date, and yes the numbers involved with that giant pile are truly staggering. But the thing about garbage is - even if you sort it carefully, and wash it before piling it, in the end it's still just garbage.

And EFV, I mostly agree with your view on all of this, but remember that Matt/GG/whomever are not posting those comments here and in blogs because they care about us webmasters - they're doing it as part of a PR operation for a giant, profit-minded company, and it's in their financial interest to do so.

cabowabo

WebmasterWorld Senior Member 10+ Year Member



 
Msg#: 34147 posted 3:22 am on May 6, 2006 (gmt 0)

Google will get huge discounts, though.

You are obviously not from the Bay Area. PG&E gives discounts to no one.

Cheers,

CaboWabo

simonmc

5+ Year Member



 
Msg#: 34147 posted 11:58 am on May 6, 2006 (gmt 0)


YOU GO, GOOGLE! ROCK OUR WORLD!

They are certainly rocking worlds. I will give you that observation.

mattg3

WebmasterWorld Senior Member 5+ Year Member



 
Msg#: 34147 posted 12:11 pm on May 6, 2006 (gmt 0)

You are obviously not from the Bay Area. PG&E gives discounts to no one.

No, I am not, but I have an aunt who lives there.
Google hopefully does not have data centers only in major earthquake areas ... :O

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 34147 posted 5:57 pm on May 6, 2006 (gmt 0)

>> Before they buy new servers, they just need to clean up the index a bit. <<

For the last few days on the "experimental" DC, a search for a phone number that was completely removed from the web a few years ago, which previously returned 900 vague supplemental results (results that didn't even match the query) instead of just a few dozen relevant supplemental results (for deleted pages and expired domains), has occasionally returned zero results. Zero is the correct result, if Google has really cleaned up the old supplementals.

Today, many DCs return zero results every time for this and several other similar queries for stuff that Google should have cleaned up long ago.

Now, is this a DC that has been cleaned up of old Supplemental Results, or is it a DC that has the Supplemental data missing and Google is going to add it back in again, in the next few days?

Time will tell.

.

A large website whose domain expired two weeks ago had 12,000 pages listed, many of them supplemental, for the last few years. The root shows a "domain expired" message. All other pages are gone from the site.

Google reindexed the site and overnight the number of listed pages has been reduced to under 100 on the "experimental" DC. It seems like Google is aggressively throwing away old data, whereas before they would have held on to it for years and years...

On the old "normal" DCs, Google still shows 12,000 pages listed.
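For readers wondering how these cross-datacenter comparisons were done: below is a rough, hypothetical sketch of the kind of check g1smd describes. The datacenter IPs and the "of about N" result-count pattern are placeholders based on 2006-era result pages, and scripted querying of Google may not be permitted by its terms of service, so treat this purely as an illustration.

import re
import urllib.parse
import urllib.request

DATACENTERS = ["64.233.0.1", "66.249.0.1"]   # placeholder datacenter IPs
QUERY = '"01234 567890"'                     # placeholder phone-number query

def estimated_results(dc_ip, query):
    # Fetch the result page from one datacenter and pull out the estimated count.
    url = "http://%s/search?%s" % (dc_ip, urllib.parse.urlencode({"q": query}))
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urllib.request.urlopen(req, timeout=10).read().decode("latin-1", "replace")
    match = re.search(r"of about <b>([\d,]+)</b>", html)   # assumed 2006-era wording
    return int(match.group(1).replace(",", "")) if match else 0

for dc in DATACENTERS:
    print(dc, estimated_results(dc, QUERY))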

bradical

5+ Year Member



 
Msg#: 34147 posted 8:02 am on May 9, 2006 (gmt 0)

Time to invest in Rackable Systems.

ulysee

10+ Year Member



 
Msg#: 34147 posted 2:52 pm on May 9, 2006 (gmt 0)

So basically, some links are not being counted, some pages are not being added, and some webmasters are bleep out of luck.

heisje

5+ Year Member



 
Msg#: 34147 posted 3:36 pm on May 9, 2006 (gmt 0)

.

Statistics from over 50 domains show that the volume of crawling/indexing by Google has been a fraction of that by Yahoo, Ask, or MSN (individually) since June 2005.

does not smell nice, does it now . . .

heisje

.
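heisje doesn't say how those crawl statistics were gathered, but the usual way is to count search-engine bot hits in the raw access logs. A minimal sketch, assuming an Apache/NCSA-style combined log at a hypothetical path and the common 2006-era crawler user-agent strings:

from collections import Counter

BOTS = {
    "Googlebot": "Googlebot",
    "Yahoo Slurp": "Slurp",
    "msnbot": "msnbot",
    "Ask/Teoma": "Teoma",
}

counts = Counter()
with open("/var/log/apache2/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        # Attribute the hit to the first crawler whose user-agent substring matches.
        for label, needle in BOTS.items():
            if needle in line:
                counts[label] += 1
                break

for label, hits in counts.most_common():
    print(f"{label}: {hits}")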

optimist

10+ Year Member



 
Msg#: 34147 posted 4:02 pm on May 9, 2006 (gmt 0)

50 domains does not sound like a lot. Where is this data from?

Ellio

5+ Year Member



 
Msg#: 34147 posted 5:07 pm on May 9, 2006 (gmt 0)

Since we were "fixed" in Big Daddy, we have seen Googlebot-Mozilla overtake the MSN and Yahoo bots to take the No. 1 spot.

The site is now fully spidered every day. Whether this equates to TrustRank status or not, who knows? We are PR6.

rbacal



 
Msg#: 34147 posted 7:44 pm on May 9, 2006 (gmt 0)

Maybe their capacity plans didn't allow for a flood of multimillion-page, template-based sites from Webmaster World members. :-)

I know you're kidding, but in all seriousness, can there be any doubt that the AdSense program has probably been THE major contributor to the development of millions of useless websites and pages that have to be indexed by Google search?

Oops. Talk about the law of unintended consequences.

whoisgregg

WebmasterWorld Senior Member whoisgregg is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 34147 posted 7:49 pm on May 9, 2006 (gmt 0)

the adsense program has probably been THE major contributor

SE spammers were already building massive throwaway sites well before AdSense. Any efficient monetization program would have had the same effect of increasing the quantity of those types of sites (regardless of whether it was Google's program or not).

Case in point: MFA sites are as much a problem for Yahoo and MSN (if not more so) as they are for Google.

g1smd

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member



 
Msg#: 34147 posted 9:38 pm on May 9, 2006 (gmt 0)

Hmm, I added the phone number to a page that has never had that phone number on it before, and in the experimental DCs that page now appears in the index after just one day, with the telephone number showing in the snippet, but pointing to a six-day-old cache that pre-dates the edit.

In the older "BigDaddy" datacentres that page has failed to appear in the index for that search term (but still appears for the other search terms that it has ranked for, for the last 2 years).

I am guessing that those versions of the "BigDaddy" index are not being maintained, and will be phased out soon. Those "BigDaddy" datacentres usually return a higher number of pages, but are littered with ancient supplemental results.

In contrast, the "experimental" datacentres show a lower number of fully-indexed pages, but all of the supplemental results from before June 2005 are now gone. In their place some sites now show supplemental results that are 2 to 10 months old instead. Supplemental results are for pages that no longer exist, or they are ghosts of pages that still exist but represent a previous version of their content.

Ellio

5+ Year Member



 
Msg#: 34147 posted 9:44 pm on May 9, 2006 (gmt 0)

g1smd,

Remind me of a few of the "New Big Daddy" DCs so I can do some comparisons.

Thanks

kk5st

10+ Year Member



 
Msg#: 34147 posted 11:43 pm on May 10, 2006 (gmt 0)

With all the conjecture over hardware requirements and their pitfalls, it might be of interest to get the straight dope right from the horse's mouth, to butcher a metaphor.

[zdnet.co.uk...] and [zdnet.co.uk...]

cheers,

gary
