Deep crawl help! Specific site example. Please... my head hurts.

Deep Crawl help

         

TeMPeST

10:50 pm on Nov 3, 2002 (gmt 0)



Hi everyone..

Visited the site many times, but this is my first post. You guys and girls are on my level!

Anyway, my head really, really hurts at the moment. I would wade through the site a bit more for my answers, but perhaps one of you clearer-headed people could please just answer a few questions to save my brain... If I can get these things cleared up, I'll send a bottle of champagne somewhere!

This is my concern:

I have a non-commercial motorcycle site which has been established for over a decade now (bar a couple of URL changes). You can find it by searching for "Yamaha RD"; it's No. 1. It has a forum which is, imo, very popular. When I used a static ".html" based message forum, the site was always deep crawled and was a fountain of info for bikers around the world. It also seemed to refresh on Google every few hours. The info was, after all, about mechanics and such, and was very useful on a daily basis. I've also concentrated on only getting links from relevant sites.

However, at one point earlier this year, I installed some new software (phpBB) which uses MySQL, and hence topics started to be linked using dynamic pages: "topic.php?ID=3213" etc.

After a couple of months I noticed that my page results on Google dropped from around 20,000 to approximately 19. :(

After much faffing around I noticed that it was because Google didn't like session IDs. Fair enough. So about 3 or 4 weeks ago, I modified the forum manually (learnt a lot of PHP too) and allowed Googlebot to crawl through without being assigned a session ID.
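To illustrate the kind of change involved, here is a minimal sketch in Python rather than phpBB's PHP. The names `is_crawler` and `topic_url`, the `sid` parameter, and the user-agent pattern are all assumptions made up for illustration; this is not the actual phpBB mod.

```python
import re

# Assumed crawler signature; a real list would likely contain more bots.
CRAWLER_PATTERN = re.compile(r"googlebot", re.IGNORECASE)


def is_crawler(user_agent):
    """Return True if the User-Agent header looks like a known crawler."""
    return bool(user_agent) and bool(CRAWLER_PATTERN.search(user_agent))


def topic_url(topic_id, user_agent, session_id=None):
    """Build a topic link, appending the session ID only for normal visitors."""
    url = f"topic.php?ID={topic_id}"
    if session_id and not is_crawler(user_agent):
        # 'sid' is a hypothetical parameter name for the session ID.
        url += f"&sid={session_id}"
    return url
```

The idea is simply that a crawler always sees the same clean URL for a topic, while ordinary visitors keep their session.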

So after waiting patiently till the next dance, I find that only about 400 pages have been picked up, and to my disappointment, they only seem to be member profiles or search results. Argh. Anyway, I've modified the forum further so guests can't reach those areas, so that should sort that. I have also put a link from the front page to a sitemap.php file which indexes only the topics.

However, I am quite concerned that only 400 pages have been picked up, and also that they don't seem to come up on Google searches anyway.
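For anyone wondering what such a sitemap page amounts to, here is a minimal Python sketch (the real sitemap.php is PHP and reads from MySQL). The function name `render_sitemap` and the sample topic are hypothetical; the only thing taken from the forum itself is the "topic.php?ID=..." link format.

```python
from html import escape


def render_sitemap(topics):
    """Render a plain HTML page linking to every topic.

    topics: iterable of (topic_id, title) pairs, e.g. rows from a database.
    """
    lines = ["<html><body><h1>Forum topics</h1><ul>"]
    for topic_id, title in topics:
        # Links carry no session ID, so each topic has one stable URL.
        lines.append(
            f'<li><a href="topic.php?ID={int(topic_id)}">{escape(title)}</a></li>'
        )
    lines.append("</ul></body></html>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_sitemap([(3213, "Carb jets & needles")]))
```

With thousands of topics, it would probably make sense to split the list over several linked pages rather than put everything on one.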

So my questions:

Is this because I'm impatient and the dance hasn't finished, or will the site be crawled more next time?

Is it because I have been downgraded in some way to a non-important and "not worth deep crawling" site?

Is this something to do with the new 'fresh crawl'? (Incidentally, I'm not worried about that, as long as the monthly crawl does the job.)

Are dynamic forums deemed "less important" than news sites for example?

If I convert my forum to a .html static-page style output, will this be deemed a more important and crawlable site? Or will that even penalise me in some way for trying to hide a forum?

Am I being penalised or likely to be penalised for things like the sitemap for example?

Phew....

I know this post has probably been a drag but I need some help to clear my head... please, GoogleGuy or anyone, help me with those specifics and I'll be a much less stressed man. :))

Thank you all anyway for all the input.

Take care.
Alek.

[edited by: TeMPeST at 11:11 pm (utc) on Nov. 3, 2002]

Helpmebe1

10:56 pm on Nov 3, 2002 (gmt 0)

10+ Year Member



I wish I could help. I use static HTML, so I don't know much about your scenario, but you are REALLY making me miss my bike :(

bateman_ap

10:58 pm on Nov 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I would put the site in your profile and get rid of it from your message, as the board doesn't allow posting of URLs for help. As for your question, I think I am right in saying that although Google is getting better at following dynamic links, it really depends on the PR of the page containing the link. My guess is that it will only follow dynamic links one level deep, to stop it getting caught in a loop.

TeMPeST

11:15 pm on Nov 3, 2002 (gmt 0)



Helpmebe1 - heh... yeah, I get told that a lot. Go get another one! :)

Bateman_ap - thanks, removed the link from the post, and yes, I think you are right about the loop thing, but I would have hoped my sitemap would have got round some of that, hence my more specific questions. Is it likely that even file extensions affect PR? E.g. is a .php without parameters deemed less important than a .html?

Any more definitive answers?

pshempel

11:30 pm on Nov 3, 2002 (gmt 0)

10+ Year Member



Here is a link on phpBB's website related exactly to this topic

www.phpbb.com/phpBB/viewtopic.php?p=252424

[edited by: engine at 1:18 am (utc) on Nov. 4, 2002]
[edit reason] delinked [/edit]

Helpmebe1

11:31 pm on Nov 3, 2002 (gmt 0)

10+ Year Member



Tempest.. I hear you.. had a rice rocket..hoping for a harley next summer.. :)

As far as your question on php and such.. I have read that it doesn't matter. Whether it is .php, .htm or .html, Google pays no attention to the extension.

TeMPeST

12:18 am on Nov 4, 2002 (gmt 0)



pshempel, thanks.. but I've been involved in the discussion on that thread already, and unfortunately I need more specific Google answers than those provided there. It only describes the mod used to 'allow' Google to crawl without session IDs. It doesn't explain why only 400 pages were crawled, though.

born2drv

12:29 am on Nov 4, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think you will be penalized at all for making a static-outputting website forum. Look at WebmasterWorld :) I'm pretty sure this site is database driven.

As for why your pages haven't been picked up, Google does a deep crawl once a month, but this doesn't mean it will crawl every page. I don't completely understand what makes Google crawl a website completely, but it can't hurt to have all your pages no deeper than say 2-3 links away from home. I have a directory.shtml static looking page that is dynamically created to generate a link to every other dynamically generated page on the site. It has helped get my dynamic stuff picked up. I hope that helps.

Good luck.

Ambiorix

12:32 am on Nov 4, 2002 (gmt 0)

10+ Year Member



One of my sites is a PostNuke CMS package. It is dynamic in PHP with a MySQL database.

It uses ? in the URL for its parameters...

My results are that Google doesn't go deep enough to get the info. The content is all articles organised under categories, and few of them are spidered...

pshempel

1:17 am on Nov 4, 2002 (gmt 0)

10+ Year Member



I have about 40 sites that are created dynamically. With your new site, many of the problems could be related to the index Google built from the original static site. I do know that Google tends to follow the index (top page) of your site from the previous month. If you look in your logs you may find many 404 errors related to your old site.
Logs are your best friend and will always tell you more than anyone or anything can. I do know that if Google gets more than a certain number of 404s it will stop working with the index it has of your site, request a new index and begin again.
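A quick way to check is to count the 404s served to Googlebot. The Python sketch below assumes the server writes the common Apache "combined" log format; the function name `googlebot_404s` and the file name in the usage example are made up for illustration, so adjust them to your own setup.

```python
import re
from collections import Counter

# Matches the request, status, size, referer and user-agent fields of a
# "combined" format log line (an assumption about the server's log layout).
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_404s(lines):
    """Count 404 responses per requested path for requests from Googlebot."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.search(line)
        if m is None:
            continue  # not a line in the expected format
        if m.group("status") == "404" and "googlebot" in m.group("agent").lower():
            counts[m.group("path")] += 1
    return counts


if __name__ == "__main__":
    # 'access.log' is a placeholder path.
    with open("access.log", encoding="utf-8", errors="replace") as fh:
        for path, hits in googlebot_404s(fh).most_common(20):
            print(hits, path)
```

If the top entries are all pages from the old static forum, that points to the missing-redirect situation rather than a crawl-depth problem.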

Now if you redirected all the original pages this problem won't be it.

On all of my sites that have sid=somerandomstring or id=somerandomstring in the URL, Google will not traverse these pages. I have had to give Google some non-sid pages to work with to get these sites indexed.

This has been the situation with any site I have moved from static to dynamic. I have to use mod_rewrite to tell Google the pages have moved to a different location on the site, using a 301 (Moved Permanently) redirect. This does not completely resolve the problem the first time around. It takes two deep crawls from Google before it will follow the moves.

One of the two situations above, leaving aside server downtime and server overload, is most likely your problem.
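The mapping behind such a redirect can be sketched in Python as follows. On a real server this would be a mod_rewrite rule. The old URL layout "/forum/messages/<number>.html" and the function name `redirect_for` are purely hypothetical, and the sketch assumes old message numbers carry over to the new topic IDs, which may well not be true after a forum migration.

```python
import re

# Hypothetical layout of the old static forum pages.
OLD_TOPIC = re.compile(r"^/forum/messages/(\d+)\.html$")


def redirect_for(path):
    """Return (301, new_location) for an old static URL, or None if no match."""
    m = OLD_TOPIC.match(path)
    if m is None:
        return None
    # Assumes the old message number equals the new topic ID.
    return 301, f"/forum/topic.php?ID={m.group(1)}"
```

For example, `redirect_for("/forum/messages/3213.html")` gives `(301, "/forum/topic.php?ID=3213")`, and any path that doesn't match the old layout is left alone.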

Let us know if your logs do show the 404 errors and I can tell you how to set up some type of 301 move for your server.

Oh, and by the way, all of my PostNuke and PHP-Nuke websites are deep crawled without any modifications. phpBB is not.