Each link in the sitemap is of the second type and Google has never spidered any of them. We've linked a static sitemap page, listing some of our most popular pages, from our homepage. Google found the sitemap but again doesn't follow the links.
On one page, say that static sitemap, change "ID" to something else, like "product" for a few of the links.
Googlebot should be doing a full crawl again within the next two weeks, so you should be able to tell fairly quickly whether it worked.
Try running a few pages as is through the sim-spider here [searchengineworld.com...] just to see if anything flaky pops up.
I looked at the sim-spider of that page and it doesn't include all those links. (Well, duh, you knew that already, you're not getting spidered!)
I took a peek at the code and noticed that the links in the dynamically generated sections use single quotes around the URL:
<a href='http://yadda.com'>Yadda</a>
rather than the usual double quotes:
<a href="http://yadda.com">Yadda</a>
I copied the page, did a replace-all on the single quotes, and ran it through the sim-spider again. This time it did show all your links.
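If you'd rather not replace every single quote on the page by hand, here is a rough Python sketch of the same idea that only touches href attributes. The function name and the sample link are made up for illustration, and it assumes the URLs themselves contain no quote characters; it's not a full HTML parser.

```python
import re

# Matches href='...' where the value contains no quote characters.
# (Assumption: URLs on the page never contain ' or " themselves.)
SINGLE_QUOTED_HREF = re.compile(r"""href\s*=\s*'([^'"]*)'""", re.IGNORECASE)

def normalize_href_quotes(html: str) -> str:
    """Rewrite href='...' as href="..." and leave everything else untouched."""
    return SINGLE_QUOTED_HREF.sub(r'href="\1"', html)

if __name__ == "__main__":
    sample = "<a href='http://yadda.com'>Yadda</a>"
    print(normalize_href_quotes(sample))
```

Run it over a saved copy of the page first and put the result through the sim-spider before changing the templates themselves.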
I wouldn't think it should matter, but apparently it does. If it weren't that a great deal of your site is non-indexed because of it, I'd say you found a nice glitch...
P.S. when I've looked at vbulletin sites, their content does not get indexed very often...
We have a static sitemap of our most popular pages that's been in the Google index for a few months too, in an effort to solve this problem. The page.cfm?ID=12345678 pages have again not been spidered, although we've made some specific static copies and linked them from here - the static copies have been spidered. So it definitely seems to be something to do with the second type of dynamic URL.
As explained previously, each page of content has a dynamic URL but the dynamic URL doesn't change for that page, there's no sessionID or anything. So there shouldn't be a problem with the spider getting into infinite loops.
Would the Apache mod_rewrite system help in our case by getting rid of the question marks? How do you go about doing that - I don't know anything about server-side programming but we have technicians who do.
We're a health information database, not a database of products, and we have no banner advertising or popups. We just want people to have access to our content from the search engines because it's been ten years in development and it's a really comprehensive resource. Any further help or suggestions gratefully received...
Have you tried the sim spider? It's awesome and helped me figure out that a page that wasn't being followed didn't have a good status code so I was able to fix it!
Does anyone have experience with using mod_rewrite with this kind of URL?
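You should try something like this:
RewriteEngine On
# The leading "/" in the patterns applies in server or virtual-host config; drop it in .htaccess. The [P] flag needs mod_proxy.
RewriteRule ^/list\.cfm/(.*)/(.*)$ http://www.domain.com/list.cfm?ID=$1&start=$2 [P]
RewriteRule ^/page\.cfm/(.*)$ http://www.domain.com/page.cfm?ID=$1 [P]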
These rules will rewrite http://www.domain.com/list.cfm/u/1 to http://www.domain.com/list.cfm?ID=u&start=1 and http://www.domain.com/page.cfm/-12345 to http://www.domain.com/page.cfm?ID=-12345
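To sanity-check the mapping before touching the server, the two patterns can be mirrored in a few lines of Python. This is only a sketch: the RULES list and the rewrite function are hypothetical names, the dots are escaped, it works on the path portion only, and it does not reproduce Apache's proxying or flag handling.

```python
import re

# Python mirror of the two rewrite patterns, for checking mappings only.
# Each entry is (pattern on the request path, replacement template).
RULES = [
    (re.compile(r"^/list\.cfm/(.*)/(.*)$"), r"/list.cfm?ID=\1&start=\2"),
    (re.compile(r"^/page\.cfm/(.*)$"), r"/page.cfm?ID=\1"),
]

def rewrite(path: str) -> str:
    """Return the rewritten path for the first matching rule, else the path unchanged."""
    for pattern, target in RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path

if __name__ == "__main__":
    for p in ("/list.cfm/u/1", "/page.cfm/-12345", "/index.cfm"):
        print(p, "->", rewrite(p))
```

A path that matches neither pattern comes back unchanged, which is also what you would want from the real rules.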
Are you getting a 200 success code?