The whole site is seven years old, but everything other than the home page was supplemental until January, due to complex dynamic URL issues.
Google crawls my 500 or so second-tier pages (two levels deep) but refuses to index them, and it has been 10 weeks.
Since Google quickly picks up changes on my home page (in less than a week), I was thinking of rotating my deep pages onto the home page, say 10-15 at a time, rather than waiting for the deep pages to be indexed.
I presume once my internal pages are indexed, they will stay in the index.
Should I just wait and pray for a deep crawl or is it better to do some work on the dock, while waiting for my Google ship to come in?
Does anybody see a downside to this deep-page rotation on the home page?
Since Google does not index PR0 pages, or takes a long time to do so, would this mean that it takes a PR update to get each successively deeper level of pages indexed?
I was thinking of using a sitemap, but I have heard some horror stories about Google messing up sitemaps, and thought that rotating deep pages on the home page might be safer.
I use URL rewrites to make my dynamic site look static.
So when Google grabs my dynamic URL, it gets a 301 redirect to my static-looking URL (via mod_rewrite).
Does everything look proper, or does anybody see a problem with this?
----
66.249.72.129 - - [12/Mar/2006:00:18:16 +0000]
"GET /cgi-bin/script.pl?page=widget.html&cart_id=yyy HTTP/1.1" 301 600 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.72.129 - - [12/Mar/2006:00:18:17 +0000]
"GET /widget.html HTTP/1.1" 200 15145 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
----
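For the curious, a rule along these lines would produce the 301 shown in the log above. This is a minimal sketch reconstructed from the log entries, not the poster's actual file; it assumes Apache 2.x, an .htaccess at the document root, and /cgi-bin resolving under that root.
----
# Sketch only: redirect the old dynamic form to the static-looking URL.
RewriteEngine On

# Match the old dynamic URL and capture the value of the page= parameter.
RewriteCond %{QUERY_STRING} (^|&)page=([^&]+)

# 301 to the static-looking URL; the trailing "?" drops cart_id and the
# rest of the query string, matching the second log entry.
RewriteRule ^cgi-bin/script\.pl$ /%2? [R=301,L]
----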
Have you tried a Googlebot emulator to see how it handles your site?
I'm not a big fan of Google Sitemaps yet. My philosophy is: with software, stay up to date, but with the web, stay a few years back. Let the rest of the world beta test Sitemaps for a while.
One good old-fashioned sitemap page, with a static link from your home page and static, "real deal" links on the sitemap page itself, will bypass any anomalies that Googlebot may be facing.
Remember, Googlebot is not a browser; it is just a bot. Have you tried Google's suggestion and looked at your site through a Lynx browser? You may be surprised at how slight a code change can be invisible to browsers but stop a bot in its tracks.
These pages may have been requested for years but there must be a reason they are not in the index.
One thing I found out, just as an example:
<a href="blahblah">
is quite different from
<a href= "blahblah">
So when Google grabs my dynamic URL, it gets a 301 redirect to my static-looking URL (via mod_rewrite).
This may sound like a dumb question, but are your internal links pointing to dynamic or static URLs?
I personally rewrite static to dynamic without R=301 and make the dynamic URL completely inaccessible.
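In .htaccess terms, that approach might look something like the sketch below. It reuses the script.pl and page= names from the log earlier in the thread; the rules themselves are an illustration of the idea, not this poster's actual configuration.
----
RewriteEngine On

# Refuse any request where the client itself asked for the dynamic URL.
# THE_REQUEST holds the original request line, so this fires only on
# direct outside requests, never on internal rewrites.
RewriteCond %{THE_REQUEST} /cgi-bin/script\.pl
RewriteRule .* - [F]

# Internally map the static-looking URL to the script with no redirect,
# so visitors and bots only ever see /widget.html.
RewriteRule ^([^/]+\.html)$ /cgi-bin/script.pl?page=$1 [L]
----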
Create a sitemap for Google.
Or, you could google for this:
matt cutts "classic crawling strategies"
and then Ctrl-F to "classic crawling strategies" and read his reply to roughly the same question, which may well discourage you from hoping that a sitemap will do anything to address the problem.
And then was spun a wondrous tale about how
A bot is a bot
and therefore, despite the fact that Google assures us that the Mediapartners bot does not share data with Googlebot, just getting a visit from the AdSense bot must get you indexed. This odd idea is probably bolstered by an inability to imagine how easily two different, completely unrelated robots can share the same IP address.
So, I goes to look in me logs, and with only a little poking around I am able to isolate a URL that was visited by the Mediapartners bot about a month ago but has never been visited by Googlebot. Sure enough, it is not yet indexed. Mediapartners != Googlebot, just as Google assures us.
The subject of getting crawled is strewn with people's completely unscientific, unproven, and (as it usually turns out) flat-out wrong theories. Eventually, most pages do get crawled. When they do, people are prone to look at whatever significant event occurred around the same time and (wrongly) conclude there is a causal relationship.
Superstition ain't the way.
I actually now know what Google has been doing.
My .htaccess file has been doing two things:
1) converting static URLs to the dynamic form for my script
2) converting the old dynamic form to static, for all those thousands of supplemental dynamic URLs that have been there for years (I did this to catch users who happen to click on one of those supplementals, rather than give them a 404 error)
Of course, I had some conditionals to prevent 1) and 2) from going into an infinite loop.
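Reconstructed as a sketch, those two rules plus a loop guard might look like the following. The script name and page= parameter come from the log earlier in the thread; the THE_REQUEST check is one common way to keep 1) and 2) from chasing each other, not necessarily the conditionals actually used, and the \s shorthand assumes Apache 2.x (PCRE).
----
RewriteEngine On

# 2) Old dynamic URLs requested from outside (the years' worth of
#    supplementals) get a 301 to the static-looking form. Checking
#    THE_REQUEST is the loop guard: it matches only when the client
#    itself asked for the dynamic URL, not when rule 1) below has
#    internally rewritten a static URL to the script.
RewriteCond %{THE_REQUEST} \s/cgi-bin/script\.pl\?page=([^&\s]+)
RewriteRule .* /%1? [R=301,L]

# 1) Static-looking URLs are internally rewritten to the script,
#    with no redirect, so the dynamic form never reaches a browser.
RewriteRule ^([^/]+\.html)$ /cgi-bin/script.pl?page=$1 [L]
----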
So for the past two months, Google has been trying to perform 2), thus rescanning my supplementals and ignoring them.
In the past few days, I see that my old supplementals are now gone (hurray), and hopefully in the next few weeks Google will follow the proper links and crawl my site as static URLs (without making them supplementals).