The cost of crawling...

Google's lost revenue may have slowed things down a bit

         

stcrim

3:19 pm on Feb 29, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's a lot of talk about Google not crawling or not crawling much. And they are certainly not looking hard at new sites.

The answer may be as simple as cost. Now that Yahoo is doing their own thing, Google may have to conserve resources a bit.

-s-

mayor

8:03 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



stcrim >> Has anyone put up a new site or new pages in a new directory in the last 14 days that has been crawled and included by Google?

I've put two new sites on the web and added new pages to an existing site in the past 21 days, and only the index page of each new site was crawled and put in the serps. None of the other new pages have been included in the Google serps.

They all have links for Googlebot to crawl to find them.

This behavior has happened before, like last August, for me, but most of the time new sites or pages have been getting into the Google serps within 3 weeks of being put on the web.

Maybe our geographic location has an effect. Like maybe our Google local service data center is assigned other tasks or overloaded at the moment.

My sites are hosted in the Northeast. How about yours, stcrim? See any patterns?

Liane

8:13 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yeah ... my index page is crawled and refreshed about every other day, and second-level pages get crawled about every three or four days. Past that, it is now taking a long time (weeks) for deeper pages to be crawled and included.

woop01

8:21 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



They have all but stopped crawling new sites over the past 2 weeks. And they have all but stopped looking at new pages in new directories on old sites.

I've got a new site (purchased the domain from a squatter January 1st), PR1, that has been hit for a minimum of 1,000 page views per day by Google over the past two weeks. Google has registered 19,706 hits during that period.

stcrim

8:21 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



My sites are being hosted in the Mountain Time Zone in the midwest

-s-

SEOPTI

8:46 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Right now it is this way:

1) Brand new sites get into the index for 9 days. After this period they disappear from the index.

It doesn't matter how many quality links you have, they will be trashed until the next update.

2) Old sites - major spider slowdown.

johnser

9:18 pm on Mar 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



1,200 fetched by Ask Jeeves/Teoma today
650 Gbot...

Is this a sign of things to come?

Am seeing major Gbot changes across multiple independent sites.

Several PR 4/5 sites have not been crawled for several weeks. Other pages, rebuilt a month ago, were crawled deeply last Monday and the Monday before, yet the cache is still showing very old versions.

J

grnidone

10:25 pm on Mar 1, 2004 (gmt 0)



Stcrim:

New site with 15K pages. Launched a little more than 2 weeks ago and I can't get the thing crawled to save my soul.

Weird thing is, it is crawled by Mediaplex, but not by Googlebot.

Fischerlaender

12:01 am on Mar 2, 2004 (gmt 0)

10+ Year Member



Coming back to the original question as stated in the thread title: The cost of crawling.

It is just a simple arithmetic problem: Google's index contains 4,000,000,000 documents. Let's assume that they need to crawl 50% more, so there are 6,000,000,000 pages to crawl. The average HTML document is about 7 KB, a number you can find in several papers. Google is also indexing some bigger PDF files, so let's use 10 KB per page as a rough number. My calc.exe tells me that 6,000,000,000 pages x 10 KB gives 60,000 GB per crawl. I don't know the price per GB in the US, but here in Germany you can get traffic for about $1 per GB.

So a complete crawl would cost Google about $60,000. That is nothing for a company heading for an IPO worth several billion dollars!

Whatever is causing Google to slow down its crawling (if it is slowing down at all), it has nothing to do with traffic costs.
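
For anyone who wants to plug in their own numbers, here is the same back-of-envelope calculation as a short Python sketch. It uses only the figures quoted in this post. The function name `crawl_cost` and the decimal conversion (1 GB = 1,000,000 KB) are my own assumptions for illustration, and it counts raw traffic only, not servers, storage or staff.

```python
# Back-of-envelope crawl traffic cost, using only the figures quoted in the post.
INDEXED_DOCS = 4_000_000_000   # documents in Google's index (from the post)
EXTRA_CRAWL_PERCENT = 50       # "crawl 50% more" (from the post)
KB_PER_PAGE = 10               # rough average size including PDFs (from the post)
USD_PER_GB = 1                 # traffic price quoted for Germany (from the post)
KB_PER_GB = 1_000_000          # assumption: decimal units, 1 GB = 10^6 KB


def crawl_cost(indexed_docs=INDEXED_DOCS,
               extra_percent=EXTRA_CRAWL_PERCENT,
               kb_per_page=KB_PER_PAGE,
               usd_per_gb=USD_PER_GB,
               kb_per_gb=KB_PER_GB):
    """Return (pages_to_crawl, traffic_gb, cost_usd) for one full crawl."""
    # Integer arithmetic keeps the page count exact.
    pages = indexed_docs * (100 + extra_percent) // 100
    traffic_gb = pages * kb_per_page / kb_per_gb
    cost_usd = traffic_gb * usd_per_gb
    return pages, traffic_gb, cost_usd


pages, traffic_gb, cost_usd = crawl_cost()
print(f"{pages:,} pages, {traffic_gb:,.0f} GB, ${cost_usd:,.0f} per crawl")
```

Even if the real price per GB were ten times higher, `crawl_cost(usd_per_gb=10)` still comes to only $600,000 per full crawl, so the conclusion does not depend much on the exact price.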

mr_strong

11:30 am on Mar 2, 2004 (gmt 0)

10+ Year Member



grnidone >> New site with 15K pages. Launched a little more than 2 weeks ago and I can't get the thing crawled to save my soul.

Weird thing is, it is crawled by Mediaplex, but not by Googlebot.

That's exactly what I'm seeing. A site launched on the 15th of Feb, with several good incoming links, has only had its index page indexed.

I am starting to panic a bit, I must admit. I am wondering whether to contact Google to check if there's a problem.

Maybe Googleguy could advise?

europeforvisitors

11:44 am on Mar 2, 2004 (gmt 0)



I'd published a few stories several days ago that weren't yet in the Google index. On Monday morning I linked to them directly from the home page; they were in the index by Monday night, while other pages that didn't have homepage links were still waiting.