| 1:51 pm on May 5, 2006 (gmt 0)|
I'm sorry, but what kind of publicly traded technology company fails to monitor its storage needs and increase capacity as needed over time? Why wait until the last minute and then tell everybody, "Sorry, guys, we ran out of hard drive space"? Don't they still have programmers working on the search algorithm who could have seen that coming?
| 1:57 pm on May 5, 2006 (gmt 0)|
I think Blogger did 'em in.
| 2:24 pm on May 5, 2006 (gmt 0)|
You have to admit this is pretty funny. A company whose whole business is storing data runs out of room? LOL, that's like Ford running out of metal or McDonald's running out of hamburger.
| 2:44 pm on May 5, 2006 (gmt 0)|
|35 trillion IP enabled devices...potentially |
That seems very high. This would be about 5,000 devices for every person on earth.
But a key question about content is how much of it is worth indexing. It seems to me the best search applications will be those that know what NOT to index in the first place, rather than those that try to index everything and sort it out later.
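A quick sanity check of the "about 5,000 devices per person" figure, as a minimal Python sketch. The world population of roughly 6.5 billion is an assumption (a rough mid-2000s figure), not a number given in the thread.

```python
# Rough check of the devices-per-person estimate quoted above.
DEVICES = 35 * 10**12             # 35 trillion IP-enabled devices, as quoted
WORLD_POPULATION = 6_500_000_000  # ASSUMPTION: approximate mid-2000s population


def devices_per_person(devices: int, population: int) -> float:
    """Return the average number of devices per person."""
    return devices / population


ratio = devices_per_person(DEVICES, WORLD_POPULATION)
print(f"{ratio:,.0f} devices per person")
```

With that population assumption the result lands a little above 5,000, so the poster's estimate holds up.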
| 2:58 pm on May 5, 2006 (gmt 0)|
|I'm sorry, but what kind of a publicly traded technology company is capable of not monitoring it's storage needs and increasing it as needed with time. Why wait for the last minute and then tell everybody: sorry guys we ran out of hard drive space |
IMHO, it's usually a mistake to take hyperbolic remarks literally.
If the guy had said "We're getting killed by our electricity supplier," we'd probably see a thread here with the title: "Google staff electrocuted, bodies pile up at the plex." :-)
| 3:03 pm on May 5, 2006 (gmt 0)|
IPv6 addresses are 128 bits long, or four times the size of IPv4 addresses.
The theoretical number of IPv6 addresses, about 3.4 × 10^38, is almost unimaginably large.
Methinks that is larger than 35 trillion by a few.
| 5:58 pm on May 5, 2006 (gmt 0)|
I was fascinated when I first read about IPv6 several years back... the number of IP addresses that can be assigned to devices is staggering. Everything can be connected to the web, and I mean everything: embedded and RFID chips, sites, cars, watches, cell phones, each with its own IP address. The possibilities are staggering... so are the problems.
I have seen several different numbers quoted on this...
Example of an IPv6 address:
Here's the math:
2^128, or about 3.403 × 10^38 unique host interface addresses. That translates into 340,282,366,920,938,463,463,374,607,431,768,211,456 addresses.
That's 39 digits, or roughly 340 undecillion.
IPv6 officially goes live in 2008 (though there is already an active IPv6 network in place, and all Linux distributions currently support the protocol).
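For anyone who wants to check the math in these posts, here is a minimal Python sketch using the standard-library `ipaddress` module. It takes the size of each address space from the "/0" network that covers every address, and compares the IPv6 total with the 35 trillion device estimate quoted earlier in the thread.

```python
import ipaddress

# A /0 prefix covers every possible address in each protocol version.
ipv4_space = ipaddress.IPv4Network("0.0.0.0/0").num_addresses  # 2**32
ipv6_space = ipaddress.IPv6Network("::/0").num_addresses       # 2**128

# The device estimate quoted earlier in the thread (35 trillion).
quoted_devices = 35 * 10**12

print(f"IPv4 addresses: {ipv4_space:,}")
print(f"IPv6 addresses: {ipv6_space:,}")
print(f"Digits in the IPv6 count: {len(str(ipv6_space))}")
print(f"IPv6 addresses per quoted device: {ipv6_space / quoted_devices:.2e}")
```

Note that "four times the size" refers to the address length (128 bits vs. 32 bits); the number of addresses grows by a factor of 2^96.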
| 6:19 pm on May 5, 2006 (gmt 0)|
Matt Cutts posted about the dropped pages today.
He says all is fine! I think it's a lie.
| 6:32 pm on May 5, 2006 (gmt 0)|
|Matt Cutts did a post to the dropped pages today. He says all is fine! I think its a lie. |
I can't find that post. Can you identify it by timestamp?
| 6:34 pm on May 5, 2006 (gmt 0)|
This is the post Matt Cutts made on his blog:
maxD, last week when I checked there was a double-digit number of reports to the email address that GoogleGuy gave (bostonpubcon2006 [at] gmail.com with the subject line of “crawlpages”).
There will be cases where Bigdaddy has different crawl priorities, so that could partly account for things. But I was in a meeting on Wednesday with crawl/index folks, and I mentioned people giving us feedback about this. I pointed them to a file with domains that people had mentioned, and pointed them to the gmail account so that they could read the feedback in more detail.
So my (shorter) answer would be that if you’re in a potentially spammy area, you might consider doing a reinclusion request–that won’t hurt. In the mean time, I am asking someone to go through all the emails and check domains out. That person might be able to reply to all emails or just a sampling, but they are doing some replies, not only reading the feedback.
Doesn't sound so great.
| 6:45 pm on May 5, 2006 (gmt 0)|
I saw that post, but it didn't say anything about everything being fine, so I'm still wondering what LuckyGuy thinks Mr. Cutts was lying about.
| 6:48 pm on May 5, 2006 (gmt 0)|
He says the sites people posted either have a spam penalty or are being spidered differently by the bot. Conclusion: all is fine!
| 7:28 pm on May 5, 2006 (gmt 0)|
That may be your conclusion, but even if it is, accusing him of "lying" (when the conclusion is in your head) seems a bit over the top.
Given the amount of nastiness and bile that people like Matt Cutts and GoogleGuy have to take from unhappy Webmasters, it's a wonder they're willing to communicate at all.
| 11:27 pm on May 5, 2006 (gmt 0)|
Reading the NYTimes article through to the end reminds us of the true issue at hand, regardless of whether some people think the storage crunch at Google points to an operations snafu.
As John Battelle, the editor of SearchBlog, stated:
"In the long run, it's about whether you have the best service."
Those of you pointing to recent shifts in the algo results as "evidence" of a storage problem would do well to look back through the recent past. There is a shuffling every time algo changes are made, we just experienced one, and common sense tells us again and again to wait until things stabilize across all of the data centers before freaking out.
|Google does not disclose technical details, but estimates of the number of computer servers in its data centers range up to a million. |
And that's before spending an additional $1.5 billion on more.
Has anyone EVER ... in the history of the Earth ... tried to manage a project like that? Anyone? No, you haven't, and all of your wisdom on that topic (predicting need, rolling out infrastructure, etc.) is pretty frail in the face of the simply staggering numbers that this enterprising company is dealing with.
YOU GO, GOOGLE! ROCK OUR WORLD!
| 12:39 am on May 6, 2006 (gmt 0)|
Before they buy new servers, they just need to clean up the index a bit. There is no reason to store very old and outdated data. Google knows this and I am sure they are working on it.
Start with the cached pages of eBay auctions from a year ago...
| 3:13 am on May 6, 2006 (gmt 0)|
Very simple. Google has become an ad agency; its main goal is to collect and store old data. Analyzing old data helps it tune search results to increase PPC revenue.
Free search is free, and you get what you paid for, that's all. It's time for all of us to wake up, smell the coffee and admit that Google has become a global advertising agency. Free/organic searches are for collecting stats and demographics.
We are a bit crazed about Google, are we not? Is this normal?
| 3:15 am on May 6, 2006 (gmt 0)|
|Has anyone EVER ... in the history of the Earth ... tried to manage a project like that? Anyone? No, you haven't, and all of your wisdom on that topic (predicting need, rolling out infrastructure, etc.) is pretty frail in the face of the simply staggering numbers that this enterprising company is dealing with. |
Well yes, they've assembled the largest pile of internet garbage to date, and yes the numbers involved with that giant pile are truly staggering. But the thing about garbage is - even if you sort it carefully, and wash it before piling it, in the end it's still just garbage.
And EFV, I mostly agree with your view on all of this, but remember that Matt/GG/whoever are not posting those comments here and in blogs because they care about us webmasters; they're doing it as part of a PR operation for a giant, profit-minded company, and it's in their financial interest to do so.
| 3:22 am on May 6, 2006 (gmt 0)|
|Google will get huge discounts, though. |
You are obviously not from the Bay Area. PG&E gives discounts to no one.
| 11:58 am on May 6, 2006 (gmt 0)|
|YOU GO, GOOGLE! ROCK OUR WORLD! |
They are certainly rocking worlds. I will give you that observation.
| 12:11 pm on May 6, 2006 (gmt 0)|
|You are obviously not from the Bay Area. PG&E gives discounts to no one. |
No I am not, but I have an aunt that lives there.
Hopefully G doesn't have data centers only in major earthquake areas... :O
| 5:57 pm on May 6, 2006 (gmt 0)|
>> Before they buy new servers, they just need to clean up the index a bit. <<
For the last few days on the "experimental" DC, a search for a phone number that has been completely removed from the web over the last few years has occasionally returned zero results. That is the correct result, if Google has finally cleaned up the old supplementals. Previously, that same search erroneously returned 900 vague supplemental results that didn't match the query, instead of just the few dozen relevant supplemental results (for deleted pages and expired domains).
Today, many DCs return zero results every time for this and several other similar queries for stuff that Google should have cleaned up long ago.
Now, is this a DC that has been cleaned up of old Supplemental Results, or is it a DC that has the Supplemental data missing and Google is going to add it back in again, in the next few days?
Time will tell.
A large website whose domain expired two weeks ago had 12 000 pages listed, many of them supplemental for the last few years. The root shows a "domain expired" message. All other pages are gone from the site.
Google reindexed the site, and overnight the number of listed pages dropped to under 100 on the "experimental" DC. It seems Google is aggressively throwing away old data, whereas before it would have held on to it for years and years...
On the old "normal" DCs, Google still shows 12 000 pages listed.
| 8:02 am on May 9, 2006 (gmt 0)|
Time to invest in Rackable Systems.
| 2:52 pm on May 9, 2006 (gmt 0)|
So basically, some links are not being counted, some pages are not being added, and some webmasters are bleep out of luck.
| 3:36 pm on May 9, 2006 (gmt 0)|
Statistics from over 50 domains show that since June 2005 the volume of crawling/indexing by Google has been a fraction of that by Yahoo, Ask or MSN (each individually).
Doesn't smell nice, does it now...
| 4:02 pm on May 9, 2006 (gmt 0)|
50 domains doesn't sound like a lot. Where is this data from?
| 5:07 pm on May 9, 2006 (gmt 0)|
Since we were "fixed" in Big Daddy we have seen Googlebot Mozilla overtake the MSN and Yahoo bots to take No.1 spot.
The site is now fully spidered every day. Whether this equals TrustRank status or not, who knows? We are PR6.
| 7:44 pm on May 9, 2006 (gmt 0)|
|Maybe their capacity plans didn't allow for a flood of multimillion-page, template-based sites from Webmaster World members. :-) |
I know you're kidding, but in all seriousness, can there be any doubt that the AdSense program has probably been THE major contributor to the development of millions of useless websites and pages that Google search has to index?
Oops. Talk about the law of unintended consequences.
| 7:49 pm on May 9, 2006 (gmt 0)|
|the adsense program has probably been THE major contributor |
SE spammers were already building massive throwaway sites well before AdSense. Any efficient monetization program would have had the same effect of increasing the number of those types of sites, regardless of whether it was Google's program or not.
Case in point: the MFA sites are as much a problem for Yahoo and MSN as they are for Google (if not more so).
| 9:38 pm on May 9, 2006 (gmt 0)|
Hmm, I added the phone number to a page that had never had it before, and in the experimental DCs that page now appears in the index after just one day, with the phone number showing in the snippet but pointing to a 6-day-old cache that predates the edit.
In the older "BigDaddy" datacentres that page has failed to appear in the index for that search term (but still appears for the other search terms that it has ranked for, for the last 2 years).
I am guessing that those versions of the "BigDaddy" index are not being maintained, and will be phased out soon. Those "BigDaddy" datacentres usually return a higher number of pages, but are littered with ancient supplemental results.
In contrast, the "experimental" datacentres show a lower number of fully indexed pages, but all of the supplemental results from before June 2005 are now gone. In their place, some sites now show supplemental results that are 2 to 10 months old. Supplemental results are either pages that no longer exist, or "ghosts" of pages that still exist, representing a previous version of their content.
| 9:44 pm on May 9, 2006 (gmt 0)|
Remind me of a few of the "New Big Daddy" DCs so I can do some comparisons.
| 11:43 pm on May 10, 2006 (gmt 0)|
With all the conjecture over hardware requirements and their pitfalls, it might be of interest to get the straight dope right from the horse's mouth, to butcher a metaphor.
[zdnet.co.uk...] and [zdnet.co.uk...]