If you change the urls in any way, each of those is a new page and starts from scratch. If the old urls all return 404 or 410, you will really be in the hole for a period of time. If you 301 redirect old urls to appropriate new urls, that will help a bit -- but just a bit.
Googlebot will definitely crawl your new urls, assuming all is technically solid. Depth of crawl will probably be frustrating for a while.
But getting crawled and getting ranked are not the same thing. In short, I think you have a major bump in the road coming. I would highly suggest letting the old urls resolve as they are and having a link to that cluster of old pages from the new Home Page -- maybe call it "archive". Don't remove the old pages until the new pages are ranking for you.
I had thought of doing that, but what concerns me is that the content on the old pages will also be on the new pages - I'm worried that I'll get penalized for duplicate content... does it matter if the dupes are from the same site?
In that case I would suggest you 301 redirect from the old url to the new url. The old url does not go 404 or 410, which is what you don't want.
There wouldn't be any "penalty"; that's a commonly used word that isn't really accurate. But Google would make an algorithmic decision as to which version of the duplicate content to show in the SERP -- and most likely the old url would win and the new one wouldn't get established.
If you have a big old pile of pages, maybe work through a search engine referer report from your server logs and prioritize which urls matter the most for your site.
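For what it's worth, here is a rough sketch of that kind of report in Python. It assumes the server writes the Apache/NCSA "combined" log format, and the list of search engine hosts is only illustrative; the sample lines and paths are made up.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: Apache/NCSA "combined" log format; adjust the pattern for other servers.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "(?P<referer>[^"]*)"'
)

# Assumption: illustrative host fragments only, not an exhaustive list.
SEARCH_HOSTS = ("google.", "bing.", "yahoo.")


def search_referred_counts(lines):
    """Count requests per url path whose referer host looks like a search engine."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that don't fit the assumed format
        host = urlparse(match.group("referer")).netloc.lower()
        if any(fragment in host for fragment in SEARCH_HOSTS):
            counts[match.group("path")] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '1.2.3.4 - - [10/Jan/2006:10:00:00 +0000] "GET /widgets.asp HTTP/1.1" 200 512 "http://www.google.com/search?q=widgets" "Mozilla/4.0"',
    '1.2.3.5 - - [10/Jan/2006:10:01:00 +0000] "GET /widgets.asp HTTP/1.1" 200 512 "http://www.google.co.uk/search?q=blue+widgets" "Mozilla/4.0"',
    '1.2.3.6 - - [10/Jan/2006:10:02:00 +0000] "GET /about.asp HTTP/1.1" 200 300 "-" "Mozilla/4.0"',
    '1.2.3.7 - - [10/Jan/2006:10:03:00 +0000] "GET /about.asp HTTP/1.1" 200 300 "http://www.bing.com/search?q=about" "Mozilla/4.0"',
]

report = search_referred_counts(sample)
print(report.most_common())  # [('/widgets.asp', 2), ('/about.asp', 1)]
```

The urls at the top of that list are the ones to get right first when you set up redirects.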
I did something very similar at the beginning of the year. The old site was an ASP scripted template that delivered static pages. The new site is PHP dynamic pages.
All of the old pages were 301 redirected to the equivalent new page. Changes were kept to the minimum needed to fit the new format. I added new content gradually after the bots had a good chance to spider the new pages.
The old pages continued to show up on Google with their old ranking, with the new pages showing a TBPR (toolbar PageRank) of 0. The new pages have recently started to get ranked and the old pages are disappearing from the index. The site still has over 10% of the old pages listed, even though the cache shows the new page. These tend to be the highest ranking pages so I'm not complaining. The new pages mostly have a lower ranking than the old ones, but this is improving.
I believe that redirecting the old pages to the new ones was critical to the success of the move. Everything else is pretty much guesswork. The results have been mostly good, though, and that's what counts.
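To make the one-to-one mapping concrete, here is a minimal sketch in Python (not the actual script used on that site). The paths in the table are hypothetical. It assumes the old ASP urls were served case-insensitively and that query strings can be dropped, which may not hold for every site.

```python
from urllib.parse import urlsplit

# Hypothetical old-to-new table; a real one would come from the site's own url inventory.
REDIRECTS = {
    "/default.asp": "/",
    "/products.asp": "/products.php",
}


def resolve(request_url):
    """Return (301, headers) for a known old url, or None for anything else.

    Returning None lets the normal handler for the new site take over, so
    only urls that are actually in the table get redirected.
    """
    # Assumption: old paths are matched case-insensitively and the query string is ignored.
    path = urlsplit(request_url).path.lower()
    target = REDIRECTS.get(path)
    if target is None:
        return None
    return 301, {"Location": target}


print(resolve("/Products.ASP?id=7"))  # (301, {'Location': '/products.php'})
print(resolve("/contact.php"))        # None
```

The point is just that each old url answers with a 301 and a Location header pointing at its equivalent new page, rather than a 404 or 410.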
Please let us know how everything worked for you after a period of time.
Sandboxed, no, but as the others said, be sure not to lose the URLs that are ranking well and bringing traffic. You can do redirects or even just keep the same URL. For those that aren't, who knows, maybe a shuffle would do some good. Whether a site is html or php or whatever shouldn't make a bit of difference.
Yes the sandbox will kick in within 48h and sandbox your whole site for the next 2 years, believe me.
The fullest answer is that people have differing experiences. How much change and how fast must it be introduced to trigger a re-sandbox? I don't know that anyone has pinned that down precisely. It can happen, but it doesn't always.
I'll be facing the same problem soon, since I have to move a 2000-page website from plain default.asp html pages with a few include scripts to a dynamically driven php site.
I have a script which will catch all the old links and redirect them to the new locations, but I wonder which route would be best to take with Google: let it slowly discover the change, or dump a brand new sitemap on it and hope for the best. :-/
Also, I don't know if it is better to include the old links in the sitemap so they get dumped from the index quickly, to avoid a duplicate content penalty.
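This doesn't settle which route is better, but if the sitemap option is tried, generating the file is the easy part. A minimal sketch in Python, listing only new urls (whether to list the old ones too is the open question above); the example.com addresses are placeholders.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per url."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


# Placeholder urls, for illustration only.
new_urls = [
    "http://www.example.com/",
    "http://www.example.com/products.php",
]
print(build_sitemap(new_urls))
```

The output is a single urlset element; a real file would also want an XML declaration at the top before being submitted.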