Forum Moderators: Robert Charlton & goodroi


Best plan to salvage rankings after major site restructuring mess?


JS_Harris

7:35 pm on Jun 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've inherited a site from a frustrated cousin. She's about to leave on a tour of duty and gave me carte blanche to "fix any problems" I found. Boy, does this site have problems right now; any advice welcome.

It's an 18-month-old site with just under 100 articles, all of outstanding quality. It is currently receiving ZERO visits from Google because of several major issues, as follows.

- 45 days ago she changed her permalinks (WordPress site) from example.com/category/category-name/article-name/postID# to example.com/category-name/article-name-postID#.html

- The 404 page is a blank page that returns a 200 status.

- Google has kept the old URI/description/link on about half the articles and has either de-listed the others or reduced them to a title only, without a description or URI, in the SERPs.

- Recent articles are not being indexed at all; old articles that appear to still be indexed cannot be found even with an exact-match search for their titles.

- All links on the site use the new link structure, leading Google to believe the pages are duplicates, so they are also ignored.

- She added /category/ to her robots.txt 45 days ago, causing Google to begin removing those pages, but the new category pages are also still considered duplicate. I assume that's because Google has seen them before via the old URIs.

- She changed some category names and deleted some little-used categories.
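For context, the robots.txt entry she added was presumably something like this (a reconstruction; I don't have the exact file):

```
User-agent: *
Disallow: /category/
```

The trap is that a disallowed URL is never re-crawled, so Google can't discover a 301 placed on it later; the old pages linger as half-remembered duplicates instead of being cleanly replaced.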

The site is a mess. My game plan is:

- Create a proper 301 redirect for each article (not easy; the category names aren't the same).
- Remove the robots.txt entries for now so that Google re-crawls the articles and finds the 301s.
- Fix the broken 404 page so that it returns a 404 status instead of 200.
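Since the category names changed, the redirects can't be done with one pattern; a per-article map in .htaccess is probably the safest route. A hypothetical sketch (article and category names below are made up, not the site's real ones):

```apache
# .htaccess - one explicit 301 per article, because the old and new
# category names don't line up. All names here are invented examples.
Redirect 301 /category/blue-widgets/widget-care-guide/123 /widget-tips/widget-care-guide-123.html
Redirect 301 /category/old-news/spring-roundup/124 /news/spring-roundup-124.html
```

mod_alias `Redirect` matches the URL path before WordPress's own rewrites run, so the old addresses keep returning 301 even though the permalink structure has changed underneath.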

My question is: should I do anything else first, do any of this in a particular order, or do something else altogether? The de-listing and ranking changes look to be happening right now, so time is short, and given the site's current state I don't want to make matters worse.

I'm also wondering if she's done permanent damage to some pages' rankings. I'd like to think it can all be sorted out, but Google may not attempt to re-crawl pages it has de-listed anytime soon, and so it will not rank the same page under a new URI... which makes the internal link values on the site a real mess too.

Any suggestions?

tedster

8:42 pm on Jun 6, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you fix the errors, I'd say Google will re-index over time. I'd probably start by fixing the robots.txt, then the 404 status - those two should be really quick and could be done essentially at the same time. Then do the 301 work.
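On a WordPress site, a blank 404 that returns 200 usually means the theme's 404 template is missing or broken. A minimal sketch of a 404.php (this is a guess at a fix, not the site's actual file; WordPress normally sends the 404 status itself, but forcing it is a safe belt-and-braces measure if something has been answering with a blank 200):

```php
<?php
/**
 * 404.php - minimal theme template sketch, not the site's actual file.
 * If the theme or a plugin has been serving a blank page with a 200
 * status, explicitly sending the correct status fixes the crawl signal.
 */
status_header( 404 );
nocache_headers();
get_header();
?>
<h1>Page not found</h1>
<p>Sorry, that article isn't here. Try the categories or the search box.</p>
<?php get_footer(); ?>
```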

JS_Harris

6:42 am on Jun 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I did all of those this morning, fingers crossed. It's all on Google now, I wonder how long it will take to sort out.

The site won't get any new content for at least six months now. I'm tempted to write a couple of guest posts a week to help Google learn the new link structure, but I'm not sure whether that would help or hurt until the mess is sorted out a bit more.

g1smd

9:00 am on Jun 7, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm keeping an eye on a similar mess elsewhere.

The client migrated a bunch of sites over to a new CMS (each uses the same domain as before) but:
- didn't install the redirect file to redirect all old URLs to the new ones.
- installed a robots.txt [Disallow: /] on the day of the move so that Google would de-list all the old URLs. Yeah, really!

The new CMS uses rewrites with 'friendly' URLs; the old one had multiple redundant parameters and infinite duplicate-content issues.
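For old parameter-style URLs, a pattern-based 301 can often cover a whole class at once rather than needing a line per page. A hypothetical mod_rewrite sketch (the parameter name and new path scheme are invented for illustration):

```apache
# Hypothetical: map old /index.php?id=123 style URLs onto the new
# 'friendly' structure with a single 301 rule.
RewriteEngine On
RewriteCond %{QUERY_STRING} (?:^|&)id=(\d+)
RewriteRule ^index\.php$ /article/%1? [R=301,L]
```

The trailing `?` on the target strips the old query string so the redundant parameters don't follow the page to its new address.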

As you can imagine, with the robots.txt [Disallow: /] file in place, Google wasn't indexing the new URLs!

On some sites, the robots.txt file had already been up for more than a week, and on others for just a couple of days (the CMS upgrade had been done in two batches, a week apart).

For the sites that were changed a week or more ago, Google was already turning many of the entries in a site: search to URL-only. None of the new URLs were listed because of the robots.txt [Disallow: /] rule.

Within 48 hours of removing the robots.txt file and adding the redirects, Google is already showing a few URLs from the new site-structure, and continuing to drop the old ones. Traffic *is* down by a fair amount.

For the sites updated only a few days ago, Google is now showing a few URLs from the new site structure, but has not yet started to drop any of the old, now-redirected URLs. Traffic has not dropped by more than a few percent so far.

The old URLs will turn to URL-only entries in a site: search, and/or drop to the bottom of the site: listings, and/or drop out of that list over the next few weeks or more. A few will persist for several months. That is never a problem and nothing to worry about, as long as each old URL really does return a 301 or 404.

The main measure of success is getting most of the new URLs indexed before all of the old URLs have dropped out of the index. The real measure of success is keeping traffic levels stable, and then seeing a rise due to the many improvements in the new CMS.

JS_Harris

7:46 am on Jun 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thinking that robots.txt should be used on the old copy of pages you are moving is an easy mistake to make, isn't it? I mean, you do want the old pages gone. A 301 will remove them and get the new pages indexed; robots.txt will remove them and block the new pages for a time.

Update: 24 hours after making the changes mentioned in the original post, Google is indexing the new pages surprisingly quickly. Hopefully it gets them all; time will tell.

[edited by: JS_Harris at 7:47 am (utc) on June 8, 2009]

g1smd

9:07 am on Jun 8, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, you do want the old pages gone, but not before the new ones are listed and ranking: otherwise you have no traffic.

The redirect fixes this.

The new pages get listed quite quickly. The old pages get de-listed a bit more slowly. If anyone clicks a link to an old URL, they don't hit a dead end; they get redirected to the new content.

In the case mentioned above, the robots.txt file was blocking access to all URLs.