The whole site is seven years old, but everything other than the home page was supplemental until January, due to complex dynamic URL issues.
Google crawls my 500 or so second-tier pages (two levels deep) but refuses to index them, and it has been 10 weeks.
Since Google quickly picks up changes on my home page (in less than a week), I was thinking of rotating my deep pages onto the home page, say 10-15 at a time, rather than waiting for the deep pages to be indexed.
I presume once my internal pages are indexed, they will stay in the index.
Should I just wait and pray for a deep crawl or is it better to do some work on the dock, while waiting for my Google ship to come in?
Does anybody see a downside to this deep-page rotation on the home page?
Since Google does not index PR0 pages, or takes a long time to do so, would this mean that it takes a PR update to get each successively deeper level of pages indexed?
I was thinking of using a sitemap, but I have heard some horror stories about Google messing up sitemaps, and thought that rotating deep pages on the home page might be safer.
I use URL rewrites to make my dynamic site look static.
So when Google grabs my dynamic URL, it gets a 301 redirect to my static-looking URL (via mod_rewrite).
Does everything look proper, or does anybody see a problem with this?
----
66.249.72.129 - - [12/Mar/2006:00:18:16 +0000]
"GET /cgi-bin/script.pl?page=widget.html&cart_id=yyy HTTP/1.1" 301 600 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
66.249.72.129 - - [12/Mar/2006:00:18:17 +0000]
"GET /widget.html HTTP/1.1" 200 15145 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
----
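For the curious, a rule along these lines would produce the 301 shown in the log above. This is a minimal sketch reconstructed from the log entries, not the poster's actual file; it assumes Apache 2.x, an .htaccess at the document root, and /cgi-bin resolving under that root.
----
# Sketch only: redirect the old dynamic form to the static-looking URL.
RewriteEngine On

# Match the old dynamic URL and capture the value of the page= parameter.
RewriteCond %{QUERY_STRING} (^|&)page=([^&]+)

# 301 to the static-looking URL; the trailing "?" drops cart_id and the
# rest of the query string, matching the second log entry.
RewriteRule ^cgi-bin/script\.pl$ /%2? [R=301,L]
----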
Have you tried a Googlebot emulator to see how it handles your site?
I'm not a big fan of Google Sitemaps yet. My philosophy is: with software, stay up to date, but with the web, stay a few years back. Let the rest of the world beta test Sitemaps for a while.
One good old-fashioned sitemap page, with a static link from your home page and static, "real deal" links on the sitemap page itself, will bypass any anomalies that Googlebot may be facing.
Remember, Googlebot is not a browser; it is just a bot. Have you tried Google's suggestion and looked at your site through a Lynx browser? You may be surprised at how slight a code change can be invisible to browsers but stop a bot in its tracks.
These pages may have been requested for years but there must be a reason they are not in the index.
One thing I found out, just as an example:
<a href="blahblah">
is quite different from
<a href= "blahblah">
So when Google grabs my dynamic URL, it gets a 301 redirect to my static-looking URL (via mod_rewrite).
This may sound like a dumb question, but are your internal links pointing to dynamic or static URLs?
I personally rewrite static to dynamic without R=301 and make the dynamic URL completely inaccessible.
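In .htaccess terms, that approach might look something like the sketch below. It reuses the script.pl and page= names from the log earlier in the thread; the rules themselves are an illustration of the idea, not this poster's actual configuration.
----
RewriteEngine On

# Refuse any request where the client itself asked for the dynamic URL.
# THE_REQUEST holds the original request line, so this fires only on
# direct outside requests, never on internal rewrites.
RewriteCond %{THE_REQUEST} /cgi-bin/script\.pl
RewriteRule .* - [F]

# Internally map the static-looking URL to the script with no redirect,
# so visitors and bots only ever see /widget.html.
RewriteRule ^([^/]+\.html)$ /cgi-bin/script.pl?page=$1 [L]
----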
Create a sitemap for Google.
Or, you could google for this:
matt cutts "classic crawling strategies"
and then Ctrl-F to "classic crawling strategies" and read his reply to roughly the same question, which may well discourage you from hoping that a sitemap will do anything to address the problem.
And then was spun a wondrous tale about how
A bot is a bot
and therefore, despite the fact that Google assures us that the Mediapartners bot does not share data with Googlebot, just getting a visit from the AdSense bot must get you indexed. This odd idea is probably bolstered by an inability to imagine how easily two different, completely unrelated robots can share the same IP address.
So, I goes to look in me logs, and with only a little poking around I am able to isolate a URL that was visited by the Mediapartners bot about a month ago but has never been visited by Googlebot. Sure enough, it is not yet indexed. Mediapartners != Googlebot, just as Google assures us.
The subject of getting crawled is strewn with people's completely unscientific, unproven, and (as it usually turns out) flat-out wrong theories. Eventually, most pages do get crawled. When they do, people are prone to look at whatever significant event occurred around the same time and (wrongly) conclude there is a causal relationship.
Superstition ain't the way.
I actually now know what Google has been doing.
My .htaccess file has been doing two things:
1) converting static URLs to the dynamic form for my script
2) converting the old dynamic form to static, for all those thousands of supplemental dynamic URLs that have been there for years (I did this to catch users who happen to click on one of those supplementals, rather than give them a 404 error)
Of course, I had some conditionals to prevent 1) and 2) from going into an infinite loop.
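Reconstructed as a sketch, those two rules plus a loop guard might look like the following. The script name and page= parameter come from the log earlier in the thread; the THE_REQUEST check is one common way to keep 1) and 2) from chasing each other, not necessarily the conditionals actually used, and the \s shorthand assumes Apache 2.x (PCRE).
----
RewriteEngine On

# 2) Old dynamic URLs requested from outside (the years' worth of
#    supplementals) get a 301 to the static-looking form. Checking
#    THE_REQUEST is the loop guard: it matches only when the client
#    itself asked for the dynamic URL, not when rule 1) below has
#    internally rewritten a static URL to the script.
RewriteCond %{THE_REQUEST} \s/cgi-bin/script\.pl\?page=([^&\s]+)
RewriteRule .* /%1? [R=301,L]

# 1) Static-looking URLs are internally rewritten to the script,
#    with no redirect, so the dynamic form never reaches a browser.
RewriteRule ^([^/]+\.html)$ /cgi-bin/script.pl?page=$1 [L]
----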
So for the past two months, Google has been trying to perform 2), thus rescanning my supplementals and ignoring them.
In the past few days, I see that my old supplementals are now gone (hurray), and hopefully in the next few weeks Google will follow the proper links and crawl my site as static URLs (without making them supplementals).