Forum Moderators: Robert Charlton & goodroi
It's an 18 month old site with just under 100 articles, all of outstanding quality. It is currently receiving ZERO visits from Google because of several major issues as follows.
- 45 days ago she changed her permalinks (WordPress site) from example.com/category/category-name/article-name/postID# to example.com/category-name/article-name-postID#.html
- The 404 page is a blank page that returns a 200 status.
- Google has kept the old URI/description/link on about half the articles and has either de-listed the others or changed them to a title-only listing, without a description or URI, in the SERPs.
- Recent articles are not being indexed at all, and old articles that appear to still be indexed cannot be found even with an exact-match search for their titles.
- All links on the site use the new link structure, leading Google to believe the pages are duplicates, and thus they are also ignored.
- She added /category/ and to her robots.txt 45 days ago, causing Google to begin removing those pages, but the new category pages are also still considered duplicates. I assume that's because Google has seen them before via the old URIs.
- She changed some category names and deleted some little used categories.
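One way to confirm the blank-404 problem from the list above is to request a URL that cannot exist and look at the raw status code. This is only a rough sketch: the function names are made up for illustration, and the probe URL in the usage note is a placeholder, not a real page on the site.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Return the raw HTTP status for url, without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        return conn.getresponse().status
    finally:
        conn.close()


def classify_missing_page(status):
    """Judge the status returned for a URL that is known not to exist."""
    if status in (404, 410):
        return "ok"          # server correctly reports the page as missing
    if status == 200:
        return "soft-404"    # error page served with a success code
    return "other"           # redirect, server error, etc.; inspect by hand
```

Usage would be something like classify_missing_page(fetch_status("http://example.com/this-page-should-not-exist-12345")). A result of "soft-404" matches the symptom described above.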
The site is a mess, my gameplan is...
- Create a proper 301 redirect for each article (not easy, category names aren't the same).
- Remove robots.txt entries for now so that Google re-crawls the articles to find the 301 codes.
- Fix the broken 404 page so that it returns 404 instead of 200.
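The redirect step in the plan above could be prototyped as a small mapping function before it is turned into server rules. This is a sketch under several assumptions: that postID# is purely numeric, that the old URLs may or may not end in a slash, and that the renamed and deleted category names (invented here) are filled in by hand. Anything it cannot map returns None so it can be redirected manually.

```python
import re

# Old pattern from the post: /category/category-name/article-name/postID#
OLD_PATH = re.compile(
    r"^/category/(?P<cat>[^/]+)/(?P<slug>[^/]+)/(?P<post_id>\d+)/?$"
)

# Hypothetical examples; replace with the real renamed and deleted categories.
RENAMED = {"old-news": "news"}   # old category slug -> new category slug
DELETED = {"misc"}               # categories that no longer exist


def new_path(old_path, renamed=RENAMED, deleted=DELETED):
    """Map an old permalink path to the new one, or None if unsure."""
    match = OLD_PATH.match(old_path)
    if match is None:
        return None  # not an old-style article URL
    cat = match.group("cat")
    if cat in deleted:
        return None  # needs a hand-picked target
    cat = renamed.get(cat, cat)
    # New pattern from the post: /category-name/article-name-postID#.html
    return f"/{cat}/{match.group('slug')}-{match.group('post_id')}.html"
```

For example, new_path("/category/widgets/blue-widget/123/") gives "/widgets/blue-widget-123.html". Running every old URL through a function like this first makes it easy to spot the ones that still need manual attention.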
My question is: should I do anything else first, or do any of this in a particular order, or something else altogether? The de-listing and ranking changes look to be happening right now, so time is short, and because of the site's current state I don't want to make matters worse.
I'm also wondering if she's done permanent damage to some pages' rankings. I'd like to think it can all be sorted out, but Google may not attempt to re-crawl pages it has de-listed anytime soon, and so it will not rank the same page under a new URI, which makes the internal link values on the site a real mess too.
Any suggestions?
The site won't get any new content for at least 6 months now. I'm tempted to write a couple of guest posts a week to help Google learn the new link structure, but I'm not sure if that would help or hurt until the mess is sorted out a bit more.
Client migrated a bunch of sites over to a new CMS (each uses the same domain as before) but:
- didn't install the redirect file to redirect all old URLs to new.
- installed robots.txt [Disallow: /] on day of move so that Google would delist all the old URLs. Yeah, really!
The new CMS uses rewrites with 'friendly' URLs, the old had multiple redundant parameters and infinite Duplicate Content issues.
As you can imagine, with the robots.txt [Disallow: /] file in place, Google wasn't indexing the new URLs!
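The effect of that rule can be checked locally with Python's standard urllib.robotparser module. This is just an illustrative sketch: the robots.txt contents and the example URL are assumptions, not the client's actual files.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed file contents: the blanket block described in the post,
# and an empty Disallow, which permits everything.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"
```

With BLOCK_ALL, allowed(BLOCK_ALL, "http://example.com/new-friendly-url") is False for every URL on the site, so a crawler that obeys robots.txt never requests the new pages and never sees any redirects either.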
On some sites, the robots.txt file had already been up for more than a week, and on others for just a couple of days (the CMS upgrade had been done in two batches, a week apart).
For the sites that were changed a week or more ago, Google was already turning many of the entries in a site: search to URL-only. None of the new URLs were listed because of the robots.txt [Disallow: /] rule.
Within 48 hours of removing the robots.txt file and adding the redirects, Google is already showing a few URLs from the new site-structure, and continuing to drop the old ones. Traffic *is* down by a fair amount.
For the sites only updated a few days ago, Google is now showing a few URLs from the new site structure, but has not yet started to drop any of the old, now redirected, URLs. Traffic has not dropped by more than a few % so far.
The old URLs will turn to URL-only entries in a site: search, and/or drop to the bottom of the site: listings, and/or drop out of that list over the next few weeks or more. A few will persist for several months. That is never a problem, and nothing to worry about, just as long as the old URL really does return 301 or 404.
The first measure of success is getting most of the new URLs indexed before all of the old URLs have dropped out of the index. The real measure of success is keeping the traffic levels stable, and then seeing a rise due to the many improvements in the new CMS.
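The "really does return 301 or 404" condition can be written down as a simple check. This sketch assumes the status code and Location header for each old URL have already been collected without following redirects; the function name and the verdict strings are invented for illustration.

```python
def old_url_verdict(status, location=None, expected=None):
    """Judge the response an old URL gives after the migration.

    status   -- raw HTTP status code returned by the old URL
    location -- value of the Location header, if any
    expected -- the new URL the old one should point to, if known
    """
    if status == 301:
        if expected is not None and location != expected:
            return "301 but wrong target"
        return "ok: permanent redirect"
    if status in (404, 410):
        return "ok: gone"
    if status in (302, 307):
        return "temporary redirect: change to 301"
    if status == 200:
        return "still serving content: duplicate risk"
    return "unexpected status"
```

Any old URL whose verdict does not start with "ok" is worth fixing before worrying about how long it lingers in a site: search.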
Update: 24 hours after making the changes mentioned in the original post, Google is indexing the new pages surprisingly quickly. Hopefully it gets them all; time will tell.
[edited by: JS_Harris at 7:47 am (utc) on June 8, 2009]
The redirect fixes this.
The new pages get listed quite quickly. The old pages get delisted a bit slower. If anyone clicks a link to an old URL they do not get to see that content, they get redirected to the new content.
In the case mentioned above, the robots.txt file was blocking access to all URLs.