Deep Crawls at Google

Deepbot ran off with my wife. I sure do miss him...

Arnett

2:12 am on Sep 17, 2003 (gmt 0)

10+ Year Member



Now that deepbot has been retired, how deeply can we expect to be crawled? I've added thousands of pages to my sites, but they are not directly accessible from the home page, and the index pages that link to the new pages are themselves not accessible from the home page. By my reckoning, the new pages are "two levels down" from the home page. It's been months and they are still not in Google's index, even though Googlebot has been all over my site. When will the new pages be indexed? Will they be added at all?

Marcia

3:00 am on Sep 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great thread description, Arnett. :)

What's the PR of the index page and the next level in? And any other pages that might be on what could be considered the third level - what's the PR of those?

Arnett

3:30 am on Sep 17, 2003 (gmt 0)

10+ Year Member



Thanks. It's an old Southernism.

The site home page has a PR4. It was a PR5 before Dominic/Esmerelda, so I wanted to add the pages to boost PR, traffic, and sales. There are thousands of new pages. Each is linked from an index page with 100 links. The master index is one level below the home page and was already in Google. The new pages have not appeared in the index; I've been waiting since late July.

Arnett

3:41 am on Sep 17, 2003 (gmt 0)

10+ Year Member



Here's a diagram:

Home Page (PR4)
 |
New Pages Master Index (PR3)
 |
Index Page X (PR0)
 |
Any page from any index page (PR0)

Arnett

3:47 am on Sep 17, 2003 (gmt 0)

10+ Year Member



Here's a diagram:

Home Page (PR4)
 |
New Pages Master Index (PR3)
 |
Index Page X (PR0)
 |
Any page from any index page (PR0)

In theory, the 5,000 pages at the lowest level should have a PR1. They all link to their index page and the Master Index. The first 1,000 pages are in Google's index; the other 4,000 are not. The first 1,000 pages have been in place for over a year; I added the other 4,000 from June through August.
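The "in theory" arithmetic can be checked against the published PageRank formula, PR(p) = (1 - d) + d * sum(PR(q)/C(q)) with damping d = 0.85. A minimal Python sketch under stated assumptions: the hierarchy is scaled down (5 index pages of 10 links instead of 50 of 100), the node names are made up for illustration, and toolbar PR was log-scaled, so this only shows how little raw PR flows down to the leaf pages, not toolbar numbers:

```python
def pagerank(links, d=0.85, iters=50):
    """Naive power iteration of the classic PageRank formula.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {p for outs in links.values() for p in outs}
    pr = {p: 1.0 for p in pages}
    for _ in range(iters):
        pr = {p: (1 - d) + d * sum(pr[q] / len(outs)
                                   for q, outs in links.items() if p in outs)
              for p in pages}
    return pr

# Build the hierarchy from the diagram: home -> master index -> index
# pages -> leaf pages, with each leaf linking back up to its index page
# and to the master index.
links = {'home': ['master'],
         'master': [f'idx{i}' for i in range(5)]}
for i in range(5):
    leaves = [f'idx{i}_p{j}' for j in range(10)]
    links[f'idx{i}'] = leaves
    for leaf in leaves:
        links[leaf] = [f'idx{i}', 'master']

pr = pagerank(links)
# Every page keeps at least the (1 - d) baseline, but the leaf pages end
# up far below their index pages -- the "two levels down" dilution.
```

Running this, `pr['master'] > pr['idx0'] > pr['idx0_p0']` holds, i.e. the deeper a page sits, the less PR reaches it, which is consistent with the complaint that the lowest level barely registers.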

Arnett

3:53 am on Sep 17, 2003 (gmt 0)

10+ Year Member



Index Page X (PR0)
 |
Any page from any index page (PR0)

There are around 50 index pages, each with 100 links to individual pages. None of the newly added pages appear in Google's index.

Arnett

4:00 am on Sep 17, 2003 (gmt 0)

10+ Year Member



To complicate the issue further, I was told that since the pages already in Google's index were reachable under both a virtual-hosting URL and the domain URL, I was creating a duplicate-content penalty with every page added:

#1 [webhost...]
#2 [www...]

My webhost and I spent weeks working out the right way to exclude Googlebot from virtual URLs like #1 and to set up a permanent redirect to the domain URLs like #2. Google should have deleted all the #1-style URLs and followed the redirects to the #2 URLs. That should have forced an add for each virtual-style URL that was in the index. Once in, all the URLs should have been crawled and indexed.

These are all static html files. I've only just started looking into php sites.
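A sketch of the kind of canonical-host redirect being described, assuming Apache with mod_rewrite enabled and `.htaccess` overrides allowed; the hostnames here are placeholders, not the actual URLs from the thread (those were elided above):

```apache
# .htaccess in the site root: 301-redirect any request arriving on the
# webhost's virtual-hosting name (or any other alias) to the one
# canonical www hostname.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

With the 301 in place, a spider requesting a page under the old virtual-hosting name is pointed at the www URL, which is what should let the duplicate #1-style listings drop out of the index over time.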

Arnett

4:02 am on Sep 17, 2003 (gmt 0)

10+ Year Member



...Then I remembered that DeepBot was gone and that an army of FreshBots had taken over. This was followed by a period of grief and despair when I realized that, at the lower levels, FreshBot may never see my new pages...

Arnett

4:06 am on Sep 17, 2003 (gmt 0)

10+ Year Member



If it were up to me, I'd bring DeepBot back from retirement and put him back to work full-time. Things were better for large sites when they were being crawled and indexed on a regular schedule.

Jesse_Smith

5:14 pm on Sep 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Er, the deepcrawl never retired; the freshbot did. You get deepcrawled daily. Notice your listings don't get dumped after a few days? That's because it's the deepcrawl, not the freshbot.

Arnett

1:39 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



Are you sure? Where are the fresh listings coming from, then? I've heard and read that deepbot is gone, and so is IP crawling. I'm pretty sure that came from GG.

mipapage

2:21 pm on Sep 18, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



so I wanted to add the pages to boost PR

Arnett, from what I understand, you need to get links to increase PR. Increasing content just gets you more real estate in the search engines.

There are many single page sites out there that have decent PR, and it comes from backlinks. (www.pr10.com, for example)

Netizen

2:49 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



The old DeepBot IP address range was retired and the old FreshBot became DeepFreshMoveOverDarlingBot.

As far as your main problem goes - has GoogleBot spidered all the new pages? And if so, when?

Arnett

4:39 pm on Sep 18, 2003 (gmt 0)

10+ Year Member



As far as your main problem goes - has GoogleBot spidered all the new pages? And if so, when?

My logs don't show the pages that Googlebot spiders. The latest logs show Googlebot with over 10,000 hits to the domain. They may finally be getting around to adding the new pages.

I put an SSI date call in the footer of my pages so I can check the cached image in Google to gauge the crawl date. As I recall from the pre-Dominic days, if a page has a PR less than 4 it gets visited every 90 days; if it has a PR4 or higher, it gets visited more often. That may be the cause of the delay.
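The SSI date call mentioned here can be as small as a one-line footer fragment. A minimal sketch, assuming Apache's mod_include is enabled and the page is parsed for SSI (e.g. a .shtml file); the `timefmt` string and markup are just an example:

```html
<!-- Footer fragment: mod_include expands these directives server-side,
     so the date baked into Google's cached copy is the date the page
     was actually fetched by the crawler. -->
<!--#config timefmt="%B %d, %Y" -->
<p>Page served on <!--#echo var="DATE_LOCAL" --></p>
```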

Netizen

10:12 am on Sep 19, 2003 (gmt 0)

10+ Year Member



Well, in your original post you said GoogleBot had been all over your site - which is misleading if you don't have a breakdown by page in your logs. Are your logs done by day? Are you saying that GoogleBot hit your site thousands of times on any one day? If so then it probably has spidered the content but when it will appear in the index is anyone's guess lately.

Arnett

5:15 pm on Sep 19, 2003 (gmt 0)

10+ Year Member



Well, in your original post you said GoogleBot had been all over your site

"All over" means that Googlebot has spidered files in every directory of the site. My logs don't report who accessed a file, just that it has been accessed. Since June I have added 4,000 static pages to the site, and the record of Googlebot accesses has grown monthly right along with the number of new pages added. Even so, NONE of the new pages have been added to the index.

...but when it will appear in the index is anyone's guess lately.

At least you're addressing the actual issue. Thanks for your valuable insight.

Arnett

2:35 am on Sep 20, 2003 (gmt 0)

10+ Year Member



I thought about it some and did a search for "site:www.domain.com domain". The search results header said there were 3,350 pages found. I did another search for "site:domain.com domain"; this time the header said there were 7,850 pages found. Only 1,000 are listed, so I checked them out, and there were some www. results included.

I put an SSI date call in the footer of my pages. The cache shows dates from late August to early this month. In the past I could tell the spider date this way, even though the page didn't show up in the index until the "dance". The pages are getting spidered now and will probably finish entering the index in the next few weeks. Now all I have to wait for is for Google to get around to calculating all the backlinks and updating PR.

Thanks to all of you for your helpful comments.

Arnett

6:32 pm on Sep 20, 2003 (gmt 0)

10+ Year Member



Arnett, from what I understand, you need to get links to increase PR. Increasing content just gets you more real estate in the search engines.

The topic wasn't about PR.

mipapage

9:39 pm on Sep 20, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It was a PR5 before Dominic/Esmerelda, so I wanted to add the pages to boost PR, traffic and sales.

Sorry! Maybe not the topic but just trying to help where I could.

Arnett

10:44 pm on Sep 20, 2003 (gmt 0)

10+ Year Member



Sorry! Maybe not the topic but just trying to help where I could.

NP. ALL incoming links contribute to PR, whether they are from other pages on the site or from outside it. Off-site links get more weight. Now you know.

Jesse_Smith

12:01 am on Sep 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



:::The topic wasn't about PR.

Next time don't say.....

:::so I wanted to add the pages to boost PR

You mention PR in a Google thread, and others are going to give you an answer about PR. That's like talking about cooking and then saying you're not talking about food! :)

Arnett

9:28 pm on Sep 21, 2003 (gmt 0)

10+ Year Member



You mention PR in a Google thread, and others are going to give you an answer about PR. That's like talking about cooking and then say your not talking about food! :)

Not one post in this thread solved the problem. I wound up doing it myself. Thanks for all the help.

wmburke

12:43 am on Sep 22, 2003 (gmt 0)

10+ Year Member




As far as I know, and I've got a couple of Senior Programmer friends at Google, the deepbot's alive and well.

What brings on this consensus that it's gone?

plumsauce

2:58 am on Sep 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member





Not one post in this thread solved the problem. I wound up doing it myself.

and the solution was?

Arnett

3:09 am on Sep 22, 2003 (gmt 0)

10+ Year Member



I thought about it some and did a search for "site:www.domain.com domain". The search results header said there were 3,350 pages found. I did another search for "site:domain.com domain"; this time the header said there were 7,850 pages found. Only 1,000 are listed, so I checked them out, and there were some www. results included.

I put an SSI date call in the footer of my pages. The cache shows dates from late August to early this month. In the past I could tell the spider date this way, even though the page didn't show up in the index until the "dance". The pages are getting spidered now and will probably finish entering the index in the next few weeks. Now all I have to wait for is for Google to get around to calculating all the backlinks and updating PR.

Thanks to all of you for your helpful comments.

Arnett

3:11 am on Sep 22, 2003 (gmt 0)

10+ Year Member



As far as I know, and I've got a couple of Senior Programmer friends at Google, the deepbot's alive and well.

What brings on this consensus that it's gone?

A lot of posts have mentioned it, including some from GoogleGuy. You'll have to use the message search to find them.