The cached copy of the front-page news at Google is usually only 1-2 days old, but I've never been able to figure out why the news archive doesn't get indexed. Maybe someone here could hazard a guess?
The archive links do use a query-string on the url, but it is only a single variable like so:
[domain.com...]
and Google has no problem seeing this other URL on the site, which also has a query string:
[domain.com...]
So I'm basically stumped. Any ideas?
It also might be something along the lines of too much content (I know it sounds crazy). But deepbot is the bot that actually adds content to Google's database. Freshbot acknowledges existence and in some cases modifies the database, but from what I can tell it never really adds directly to the Googlebase.
Considering that deepbot comes around only about once a month, that could easily be your problem.
Idea/Recommendation: Archive by weeks.
I will try moving toward a .php extension instead of .php3. I also noticed, after seeing a comment in a different thread here, that each page in the news archive is not getting its own title. They all have the same title, and that may be having a negative effect.
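For what it's worth, giving each archive page its own title can be as simple as building the title from the post date and headline. Here is a rough sketch in Python (the site itself runs PHP, so treat this as an illustration of the idea only). The function name `archive_page_title`, the separator, and the "News for ..." wording are all made up for the example.

```python
from datetime import date


def archive_page_title(site_name: str, post_date: date, headline: str = "") -> str:
    """Build a per-page <title> string for one day's news archive page.

    All names and the title layout here are hypothetical; the point is only
    that the date (and headline, if any) make each page's title unique.
    """
    parts = []
    if headline.strip():
        # Put the most specific text first so it shows up in search results.
        parts.append(headline.strip())
    parts.append(f"News for {post_date.isoformat()}")
    parts.append(site_name)
    return " | ".join(parts)


if __name__ == "__main__":
    print(archive_page_title("Example Comic", date(2003, 5, 12), "New strip"))
```

Two different days then always get two different titles, even when a post has no headline.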
It doesn't make sense for us to archive by weeks as suggested. The site is a webcomic, and the newsposts coincide with the comic strip for the given day. Also, the main newspost for a comic is typically 500 words, and the additional posts that squeeze in before the next comic can often add another 1500 words, so I wouldn't want to push a whole week's worth at a user all at once.
We tried URL rewriting for a couple of months about a year ago, but we changed hosts and the new host did not have rewriting configured properly. We hadn't seen the archives show up by then anyway, so we just took the rewriting out.
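For anyone unfamiliar with what URL rewriting does here, this is a minimal sketch of the mapping, written in Python rather than as a server config. The path layout `/news/YYYY/MM/DD`, the script name `/news.php`, and the parameter name `date` are assumptions made up for the example, not the site's real URLs.

```python
import re
from typing import Optional

# Hypothetical "clean" archive path: /news/2003/05/12
_ARCHIVE_PATH = re.compile(r"^/news/(\d{4})/(\d{2})/(\d{2})/?$")


def rewrite_archive_path(path: str,
                         script: str = "/news.php",
                         param: str = "date") -> Optional[str]:
    """Translate a query-free archive path into the query-string URL the
    script actually expects. Returns None when the path is not an archive
    path. The script and parameter names are placeholders."""
    match = _ARCHIVE_PATH.match(path)
    if match is None:
        return None
    year, month, day = match.groups()
    return f"{script}?{param}={year}{month}{day}"


if __name__ == "__main__":
    print(rewrite_archive_path("/news/2003/05/12"))
    print(rewrite_archive_path("/about"))
```

The spider only ever sees the clean path, and the server does the translation internally.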