I never get a deep crawl by Googlebot

sorry if this question has been asked a hundred million times!

Typester

4:13 pm on Dec 11, 2002 (gmt 0)

Hi all,

I've looked through the knowledge base and searched for the answer to my problem. I kind of asked this question a while back but because of my newbie-ness, I didn't ask it correctly.

My site has been around since May 2002, indexed by DMOZ late summer. I guess you would consider my site fairly new but I've yet to see a deep-crawl by Googlebot. She comes around sometimes 5 or 6 times a day, grabs 1 or 2 pages and then she's gone.

I use robots.txt but I also use this: meta name="robots" content="all" (redundant/useless?). The robots.txt validates. I also used the spider simulator and everything looks fine to me. I'm kinda stumped.
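On the "redundant/useless?" question: in the original robots meta tag convention, "all" is shorthand for "index,follow", which is also what a crawler assumes when the tag is absent. A minimal sketch of that normalization, using Python's standard html.parser; the class and function names are made up for illustration, and this does not claim to mirror how Googlebot itself reads the tag:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" ...> tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        for token in (attr.get("content") or "").split(","):
            token = token.strip().lower()
            if token == "all":
                # "all" is shorthand for "index,follow"
                self.directives |= {"index", "follow"}
            elif token == "none":
                # "none" is shorthand for "noindex,nofollow"
                self.directives |= {"noindex", "nofollow"}
            elif token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the normalized set of robots directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives
```

Under this reading, content="all" and content="Index,Follow" normalize to the same set, so neither tag should change how an otherwise crawlable page is treated.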

My site is very small (30 pages) but it sure would be nice to get all of my pages indexed at some point.

Any ideas? Sorry for the redundant question but I know from visiting this forum that patience abounds, although I hope it's not wearing thin with questions like this one!

Typester

ciml

4:35 pm on Dec 11, 2002 (gmt 0)

Hi Typester

The default for Robots Exclusion Protocol is to allow spidering, so I don't see a problem there.
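The allow-by-default behaviour can be checked locally. This is a rough sketch using Python's standard urllib.robotparser; the example.com URLs and the /private/ rule are placeholders, not taken from the site being discussed:

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Report whether `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty robots.txt contains no rules, so nothing is excluded.
EMPTY = ""

# A hypothetical file that excludes a single directory for every robot.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

if __name__ == "__main__":
    print(allowed(EMPTY, "http://www.example.com/page.html"))
    print(allowed(BLOCK_PRIVATE, "http://www.example.com/private/x.html"))
    print(allowed(BLOCK_PRIVATE, "http://www.example.com/page.html"))
```

Pasting the real robots.txt contents into a call like this is a quick second opinion alongside an online validator.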

Getting crawled every day is normal for a page in the ODP, but seven months is quite a long time and 30 pages is quite small, so I am surprised that Google doesn't index you fully.

How high is your PageRank and do you have simple URLs (no "?", "&", etc.)?

Calum

Lots0

4:39 pm on Dec 11, 2002 (gmt 0)

There are lots of different reasons why Googlebot would act this way, so it is very hard to say exactly why.

My first guess would be your robots.txt file. I don’t believe that the meta tag you mention matters one way or the other for Google. My second thought would be to validate your HTML; some code errors can cause this kind of behavior.

You said you validated your robots.txt. How did you do this? Some validators are better than others.

One thing you might try is to remove all the text from your robots.txt file and just use a blank file for a month or so. With a small site like yours, and wanting all the pages indexed, I would not even use a robots.txt, but that is my personal opinion; others may feel differently.

Typester

7:26 pm on Dec 11, 2002 (gmt 0)

I have a PageRank of 5 and I use simple URLs.

The reason I use a robots.txt file is so my logs aren't full of 404 errors. I validated it with the one found at this forum.

To be honest, my life doesn't depend on whether my site is deep-crawled or not, but I keep thinking how much more traffic I'd get if all of my pages were indexed. I don't make my money from my website, but use my site as a *stepping stone* for people to purchase my products (in other words, they go elsewhere to buy my stuff). So, it could be important for me to figure this out!

Thanks for any help, it's appreciated. Any other ideas?

Shakil

7:34 pm on Dec 11, 2002 (gmt 0)

typester,

although the subject is a bit advanced for me, and there are gurus lurking who know this stuff :), can you share the following with us please:

are your backlinks showing up?
how many pages of your site are actually showing in google?

I base the above question on the fact that you have a PR5 rank.

Shak

Macguru

7:40 pm on Dec 11, 2002 (gmt 0)

Hi Typester,

Please try the following. Replace "yoursite" with your actual domain name, without "www" or ".com":

allinurl:yoursite site:www.yoursite.com

So what do we see?

People sometimes believe that not all of their pages are in the index when they click on the "More results from" link in Google SERPs.

That's because Google builds the query "site:www.yoursite.com your query", and "your query" does not appear on every page of the site...

Was that the case?
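The effect described in this post can be illustrated with a toy model. The sketch below is not Google's matching logic; the page paths, page texts, and function names are all invented for the example. It only shows why a site-restricted query with extra words returns fewer pages than the site actually has in the index:

```python
def site_query(host: str, words: str = "") -> str:
    """Build a site-restricted query string like the one behind "More results from"."""
    return f"site:{host} {words}".strip()


def matching_pages(pages: dict, words: str) -> list:
    """Toy matcher: a page matches only if every query word appears in its text."""
    terms = words.lower().split()
    return sorted(
        url
        for url, text in pages.items()
        if all(term in text.lower().split() for term in terms)
    )


# Hypothetical three-page site.
PAGES = {
    "/index.html": "blue widgets for sale",
    "/about.html": "about the widget maker",
    "/contact.html": "contact us",
}

if __name__ == "__main__":
    print(site_query("www.yoursite.com", "widgets"))
    print(matching_pages(PAGES, "widgets"))  # only pages containing the word
    print(matching_pages(PAGES, ""))         # no extra words: every page matches
```

With no extra words every page matches, which is why a query restricted only by site or URL gives the real count of indexed pages.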

jady

8:33 pm on Dec 11, 2002 (gmt 0)

Try using this tag - <META NAME="ROBOTS" CONTENT="Index,Follow">

Also looking at your site's code, I noticed that your links to other pages just contain /pagename.html. When we experienced a similar problem, we added the WHOLE url and page name to the HREF and got good results.

Worth a shot? Otherwise it looks just like how our sites are set up, and they all get deep crawled...
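For reference, a root-relative link such as /pagename.html and the whole URL resolve to the same address under standard URL resolution rules. A small sketch with Python's urllib.parse.urljoin, using www.example.com as a stand-in domain; it shows only the resolution rules and says nothing about how any particular crawler treated such links in practice:

```python
from urllib.parse import urljoin


def resolve_links(page_url: str, hrefs: list) -> list:
    """Resolve each href found on a page against that page's own URL."""
    return [urljoin(page_url, href) for href in hrefs]


if __name__ == "__main__":
    page = "http://www.example.com/products/index.html"
    hrefs = [
        "/pagename.html",                        # root-relative
        "http://www.example.com/pagename.html",  # whole URL
        "other.html",                            # relative to the current directory
    ]
    for resolved in resolve_links(page, hrefs):
        print(resolved)
```

The first two entries come out identical, so writing the whole URL mainly removes any dependence on the crawler resolving the link correctly.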

Typester

10:30 pm on Dec 11, 2002 (gmt 0)

Wow, thanks everyone for all the advice! This forum rocks!

ciml, your suggestion (which you stickied to me) revealed all the pages from my site. MacGuru's similar suggestion worked just as well. So, my pages are all there and I can't tell you how happy I am about that. I knew what I wanted to see in the search results, I just didn't know how to find it. I've learned so much today and I appreciate it a lot.

Jady, that also is a good suggestion to add the whole URL. I've read that before and will look into changing that. Hey, it can't hurt, right?

What a bedrock of information this place is, and how willing everyone is to help. Thanks again :)

Typester

Shakil

10:43 pm on Dec 11, 2002 (gmt 0)

typester,

glad your problem is fixed, wish mine were less complicated.

told you there were gurus lurking around.

anyway there is a magic button round here called the donate button, all weird and wonderful things start happening once you click that (and I am not joking).

Shak