Visited the site many times, but this is my first post. You guys and girls are on my level!
Anyway, my head really really hurts at the moment and I would wade through the site a bit more for my answers, but perhaps one of you clearer-headed people could please just answer a few questions to save my brain... If I can get these things cleared up, I'll send a bottle of champagne somewhere!
This is my concern:
I have a non-commercial motorcycle site which has been established for over a decade now (bar a couple of URL changes) - you can find it by searching for "Yamaha RD", it's no. 1. It has a forum which is imo very popular. When I used a static ".html" based message forum, the site was always deep crawled and was a fountain of info for bikers around the world. It also seemed to refresh on Google every few hours. The info was after all about mechanics and such, and was very useful on a daily basis. I've also concentrated on only linking from relevant sites.
However, at one point earlier this year, I installed some new software (phpBB) which uses MySQL, and hence topics started to be linked using dynamic pages - "topic.php?ID=3213" etc.
After a couple of months I noticed that my page results on Google dropped from around 20,000 to approximately 19. :(
After much faffing around I noticed that it was because Google didn't like session ids.. Fair enough. So about 3 or 4 weeks ago, I modified the forum manually (learnt a lot of PHP too), and allowed Googlebot to crawl through without being assigned a session id.
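For anyone wanting to try the same thing, the idea can be sketched roughly like this. This is Python rather than the actual phpBB change, and the `sid` parameter name, the `CRAWLER_TOKENS` list and both function names are assumptions made up for illustration:

```python
from urllib.parse import urlencode

# Hypothetical list of user-agent substrings to treat as crawlers.
CRAWLER_TOKENS = ("googlebot",)


def is_crawler(user_agent):
    """Return True if the user-agent string looks like a known crawler."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in CRAWLER_TOKENS)


def topic_url(topic_id, session_id, user_agent):
    """Build a topic link, leaving the session id off for crawlers."""
    params = {"ID": topic_id}
    if not is_crawler(user_agent):
        # Assumed parameter name; ordinary visitors keep their session id.
        params["sid"] = session_id
    return "topic.php?" + urlencode(params)
```

The point is only that the crawler sees one stable URL per topic, while everyone else still gets a session.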
So after waiting patiently till the next dance, I find that only about 400 pages have been picked up.. and to my disappointment, they only seem to be member profiles or search results. Argh. Anyway, I've modified the forum further so that guests can't reach those areas.. so that should sort that. I have also put a link from the front page to a sitemap.php file which indexes only the topics.
However, I am quite concerned that only 400 pages have been picked up, and also that they don't seem to come up on Google searches anyway.
So my questions:
Is this because I'm impatient and the dance hasn't finished, or will the site be crawled more next time?
Is it because I have been downgraded in some way to a non-important and "not worth deep crawling" site?
Is this something to do with the new 'fresh crawl'? (Incidentally I'm not worried about that, as long as the monthly crawl does the job.)
Are dynamic forums deemed "less important" than news sites for example?
If I convert my forum to a .html static-page style output, will this be deemed a more important and crawlable site? Or will that even penalise me in some way for trying to hide a forum?
Am I being penalised, or likely to be penalised, for things like the sitemap?
Phew....
I know this post has probably been a drag but I need some help to clear my head... please, GoogleGuy or anyone.. help me with those specifics and I'll be a much less stressed man. :))
Thank you all anyway for all the input.
Take care.
Alek.
[edited by: TeMPeST at 11:11 pm (utc) on Nov. 3, 2002]
Bateman_ap - thanks, removed the link from the post, and yes I think you are right about the loop thing, but I would have hoped my site map would have got round some of that - hence my more specific questions. Is it likely that even file extensions affect PR? E.g. is a .php without parameters deemed less important than a .html?
Any more definitive answers?
As for why your pages haven't been picked up, Google does a deep crawl once a month, but this doesn't mean it will crawl every page. I don't completely understand what makes Google crawl a website completely, but it can't hurt to have all your pages no deeper than, say, 2-3 links away from home. I have a directory.shtml page that looks static but is dynamically created, and it generates a link to every other dynamically generated page on the site. It has helped get my dynamic stuff picked up. I hope that helps.
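A directory page like that can be sketched along these lines. It's Python purely for illustration; the `build_directory_page` name and the `topic.php?ID=` link format (borrowed from the first post) are assumptions, not how directory.shtml is actually built:

```python
from html import escape


def build_directory_page(topics):
    """Return a plain HTML page with one link per topic.

    `topics` is an iterable of (topic_id, title) pairs, e.g. rows
    pulled from the forum database (assumed shape).
    """
    items = []
    for topic_id, title in topics:
        # No session id or other per-visitor parameters in the link.
        link = f"topic.php?ID={int(topic_id)}"
        items.append(f'<li><a href="{link}">{escape(title)}</a></li>')
    return "<html><body><ul>\n" + "\n".join(items) + "\n</ul></body></html>"
```

With one link from the home page to this page, every topic ends up two links from home.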
Good luck.
Now, if you redirected all the original pages, this won't be the problem.
On all of my sites that have sid=somerandomstring or id=somerandomstring in the URL, Google will not traverse these pages. I have had to give Google some non-sid pages to work with to get these sites indexed.
This has been the situation with any site I have moved from static to dynamic. I have to use mod_rewrite to tell Google the pages have moved to a different location on the site, using a 301 (Moved Permanently) redirect. This does not completely resolve the problem first time around. It takes two deep crawls from Google before it will follow the moves.
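The mapping behind such a redirect looks roughly like this. It's a Python sketch of the logic only, not mod_rewrite config, and the old `/forum/messages/<number>.html` layout is a made-up example since the real old URLs aren't given:

```python
import re

# Hypothetical layout of the old static forum pages.
OLD_TOPIC = re.compile(r"^/forum/messages/(\d+)\.html$")


def redirect_for(path):
    """Return (301, new_location) for an old static topic URL, else None."""
    match = OLD_TOPIC.match(path)
    if match is None:
        return None
    # 301 tells the crawler the move is permanent.
    return 301, f"/forum/topic.php?ID={match.group(1)}"
```

This only works if the old page numbers can be mapped onto the new topic IDs; if they can't, a lookup table from old path to new URL would be needed instead of a pattern.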
Server downtime and server overload aside, one of the two situations above is most likely your problem.
Let us know if your logs do show the 404 errors and I can tell you how to set up some type of 301 move for your server.
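If it helps with checking the logs, here is a rough Python sketch that counts 404s served to Googlebot. It assumes the Apache "combined" log format with the user agent as the last quoted field; the `googlebot_404s` name is made up, and the pattern would need adjusting for other log formats:

```python
import re
from collections import Counter

# Matches the request, status, size, referrer and user-agent fields of a
# combined-format access log line (assumed format).
LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses per path for requests made by Googlebot."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if match is None:
            continue  # not a GET/HEAD line in the expected format
        if match.group("status") == "404" and "googlebot" in match.group("agent").lower():
            counts[match.group("path")] += 1
    return counts
```

Feeding it an open log file, e.g. `googlebot_404s(open(log_path))` with `log_path` pointing at your access log, gives a per-URL count; lots of hits on the old static pages would point at the missing-redirect situation.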
Oh, and by the way, all of my Post-Nuke and PHP-Nuke websites are deep crawled without any modifications. phpBB is not.