| 5:13 pm on Jul 15, 2010 (gmt 0)|
How about checking the 404 error handling?
Similarly check any log-in challenges (401) to make sure the correct status is returned instead of "soft" handling via redirects. Same thing for 403 Access Forbidden.
Also, even though the canonical link "should" handle this problem, I'd still lock down the https: protocol by using a subdomain, such as secure.example.com.
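These status checks are easy to script. A minimal sketch in Python - example.com and the paths below are placeholders, so substitute your own error and login URLs:

```python
import urllib.error
import urllib.request

def fetch_status(url):
    """Return the raw HTTP status for url WITHOUT following redirects,
    so a "soft" 302-to-homepage shows up as 302 rather than 200."""
    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None  # refuse to follow any 3xx

    opener = urllib.request.build_opener(NoRedirect)
    try:
        return opener.open(url, timeout=10).getcode()
    except urllib.error.HTTPError as e:
        return e.code

# What to check (placeholder URLs -- use your own):
# fetch_status("http://example.com/no-such-page")  # want 404, not a soft 200/302
# fetch_status("http://example.com/members/")      # want 401
# fetch_status("http://example.com/private/")      # want 403
```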
| 6:11 pm on Jul 15, 2010 (gmt 0)|
Thanks! I'll put that on the list. This is the first time I've overseen a relaunch quite this complicated largely by myself.
| 4:17 am on Jul 16, 2010 (gmt 0)|
Using either Yahoo Site Explorer or a commercial backlink database, I'd check out existing external backlinks and try to get as many of them as possible changed after the relaunch.
| 8:25 am on Jul 16, 2010 (gmt 0)|
Also watch your log files carefully for 404 errors, and fix them ASAP with 301 redirects. A custom 404 error page will catch those visitors, but it's better to redirect them to their intended landing page.
| 8:51 am on Jul 16, 2010 (gmt 0)|
make sure you have the default directory index document 301 redirected to the trailing slash url.
http://example.com/category/index.html --> http://example.com/category/
check wildcard subdomains and make sure they return either a 404 or a 301, as appropriate.
if the server is on a dedicated IP make sure requests for the IP redirect to the canonical domain.
if there is a development domain make sure it is behind basic authentication (401).
make sure you don't have domain aliasing canonicalization issues - such as example.net/.org/.info
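Most of those items reduce to a handful of mod_rewrite rules. A sketch in .htaccess form - example.com and its aliases are placeholders, and the patterns will need adjusting for a real site:

```apache
RewriteEngine On

# directory index document -> trailing-slash URL
# (the THE_REQUEST test keeps internal DirectoryIndex subrequests from looping)
RewriteCond %{THE_REQUEST} \ /([^\ ]*/)?index\.html
RewriteRule ^(.*/)?index\.html$ http://example.com/$1 [R=301,L]

# canonical hostname: catches requests for the bare IP address
# and for aliased domains (example.net, example.org, example.info)
RewriteCond %{HTTP_HOST} !^example\.com$ [NC]
RewriteRule ^(.*)$ http://example.com/$1 [R=301,L]
```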
| 1:49 pm on Jul 16, 2010 (gmt 0)|
Thanks all! Already have the domain aliasing stuff and the trailing slashes taken care of. Ran into a glitch with redirecting to lower case because it breaks the admin and the back end (we didn't write it, so we can't change it). The IP is something I hadn't thought of - thanks for that. I'm working on a script that pulls the 404s out of the log and maybe mails them to me. I'll take a look for backlinks; this is a B2B ecommerce site that has some links to its home page, but not very many to internal product pages. I can check though.
This is great, thanks!
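In case it helps anyone else, the log-mining part of that script could start as simple as this (Python; assumes Apache combined-format logs, and the filename is a placeholder):

```python
import re
from collections import Counter

# Pulls the request path and status code out of an
# Apache combined-format log line.
LINE_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) HTTP/[\d.]+" (\d{3}) ')

def find_404s(lines):
    """Return a Counter mapping each 404'd URL path to its hit count."""
    hits = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m and m.group(2) == "404":
            hits[m.group(1)] += 1
    return hits

# Usage:
# with open("access.log") as f:
#     for path, count in find_404s(f).most_common(20):
#         print(count, path)
```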
| 1:58 pm on Jul 16, 2010 (gmt 0)|
Those sites that I have been mentioning here at WebmasterWorld over the last few weeks, sites that were decimated by MayDay, were ZenCart sites - a close relative to osCommerce.
I believe the core URL structure, the multiple Duplicate Content issues, and several other important factors work against Google properly getting to grips with any ZenCart or osCommerce site.
There are similar issues with vBulletin and phpBB sites. I wonder if many of those also got blown away, just about two months ago?
| 2:37 pm on Jul 16, 2010 (gmt 0)|
Very likely. The only ZenCart sites I have left (meaning that I have clients running them, not me personally) only have a handful of products and pages in small niches, and they've dropped a bit but never had a lot of traffic to begin with. The OSCommerce sites dropped like a ton of bricks. That's why I'm trying to get my ducks in a row BEFORE we launch on the new platform.
| 9:56 am on Jul 17, 2010 (gmt 0)|
|Ran into a glitch with redirecting to lower case because it breaks the admin and the back end |
assuming you are using mod_rewrite, you should typically be able to add a RewriteRule that skips lowercase redirects for your admin urls.
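something like this sketch - the admin paths and the lc map name here are assumptions, and the tolower RewriteMap itself has to be defined in the server config, since RewriteMap isn't allowed in .htaccess:

```apache
# in the server/virtual-host config (not .htaccess):
#   RewriteMap lc int:tolower

RewriteEngine On

# skip rule: leave admin and back-end URLs untouched
RewriteRule ^(admin|backend)/ - [L]

# redirect any remaining URL containing uppercase letters to its lowercase form
RewriteCond $1 [A-Z]
RewriteRule ^(.*)$ http://example.com/${lc:$1} [R=301,L]
```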
| 2:57 pm on Jul 17, 2010 (gmt 0)|
|so set up a database that looks up the old product id and 301's it to the new product id for the product pages. Seems to work very fast |
That is usually a big task, but one that matters a lot. What I usually do is add a new global include file that records each accessed URL into a DB, then do a Xenu run (or two) on the site. After that, export the results into an Excel file (or not) and go over them with three toothbrushes in each hand, just to make sure I've caught all the patterns.
|We now have completely new URLs with actual product names in them |
It helps. Not sure how you set up the NEW URLs, but there is one thing that I keep seeing over and over: the rewrite pattern implemented on the new site matches only the numeric product id and ignores the rest of the path.
Nothing is wrong with that at all, until Slurp comes in and tries to index:
/products_id/100/ or /products_id/100
and your site spits out content for the /products_id/100/blue-shiny-widget/
So if you haven't implemented that check and have similar new URI structure, it should be on the list as well, I think.
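The fix is to compare the requested slug against the product's canonical slug and 301 when they differ. A sketch of the idea only (Python; the URL layout and function names are hypothetical, not anyone's actual implementation):

```python
def check_slug(product_id, requested_slug, canonical_slug_for):
    """If the requested slug isn't the product's canonical slug,
    301 to the canonical URL instead of serving duplicate content."""
    actual = canonical_slug_for(product_id)
    if actual is None:
        return (404, None)                      # unknown product id
    if requested_slug != actual:
        return (301, f"/products_id/{product_id}/{actual}/")
    return (200, None)                          # serve the page
```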
Just my 2 tooth brushes...
| 4:17 pm on Jul 17, 2010 (gmt 0)|
One of the first things I do is run Xenu, first in the testing environment and then again as soon as the site goes live, just in case. It is amazing how many problems I have found that way over the years.
| 2:44 am on Jul 19, 2010 (gmt 0)|
|We now have completely new URLs with actual product names in them |
Unless you 301 old pages to these new ones they will all be considered new, in fact your entire site will appear new with no rankings whatsoever. It's a tough enough mountain to climb, I'd hate to start over at the bottom netmeg.
| 3:50 pm on Jul 26, 2010 (gmt 0)|
Yes, it's all 301'd, I thought I mentioned that above. We have a database that maps the old product id to the new one, and redirects (via 301) to the correct new page.
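For anyone building the same thing, the core of such a mapping is tiny (a Python sketch; the table contents and URL shapes are made up for illustration - in practice the table would come from the database):

```python
# Old product id -> new product URL, e.g. loaded from the mapping database.
OLD_TO_NEW = {
    "100": "/widgets/blue-shiny-widget/",
    "101": "/widgets/red-matte-widget/",
}

def redirect_for(old_id):
    """Return (301, new_url) for a mapped id, or (410, None) if the
    product is genuinely gone and has no replacement."""
    new_url = OLD_TO_NEW.get(old_id)
    if new_url is None:
        return (410, None)
    return (301, new_url)
```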
| 3:52 pm on Jul 26, 2010 (gmt 0)|
I know this is somewhat off-topic... but some of the best things I have ever done to avoid the BotShock, are:
|What am I forgetting? I know there must be something. |
- Start communicating with your client base NOW, 30 days in ADVANCE of the change! Explain the coming changes, show screenshots, do ANYTHING you possibly can to prepare a full section of the site all about the new build and interface!
- Start preparing a series of press-releases about the new site, "Who-What-When-Where-Why" and on the day of the new launch, start distributing them!
- REALLY prepare the site owners for a +30 day ride! :-)
[edited by: mhansen at 4:05 pm (utc) on Jul 26, 2010]
| 3:58 pm on Jul 26, 2010 (gmt 0)|
One approach I've been very successful with is to 301 redirect only the important URLs - those with lots of search traffic, or direct landing pages, or strong backlinks. I just let the rest of the URLs go 404 and assume that Google can crawl the new site and find and rank that content just fine.
Caveat: I know that Matt Cutts has recommended against this approach and Google would prefer to see all the 301s for all migrated content. However, in practice I've found that a website full of 301 redirects seems to take much longer for Google to reprocess and trust-check than a site where just the cream of the crop gets a 301. YMMV - and if you take this approach it does require that you avoid as many technical errors on the site as you can - from day #1.
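The triage itself can be scripted: rank the old URLs by whatever signals you have and only build 301s for the top slice. A hedged sketch (Python; the data sources and the visit threshold are placeholders):

```python
def urls_worth_redirecting(traffic, backlinked, min_visits=50):
    """Pick the URLs that earn a 301: anything with real search or
    direct traffic, plus anything with external backlinks.
    `traffic` maps URL -> visit count; `backlinked` is a set of URLs.
    Everything else is allowed to go 404."""
    keep = {url for url, visits in traffic.items() if visits >= min_visits}
    return sorted(keep | set(backlinked))
```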
| 5:51 pm on Jul 26, 2010 (gmt 0)|
You should also do a stress test, because if all three major search engine bots come to visit your site and start to reindex it, it might take your server(s) offline. We've had this problem once. An SEO company recommended "blocking" Yahoo and MSN for the first few days and letting Google do its work first. The search engines had to reindex thousands of URLs at a new domain, but with the same URL structure.
| 6:58 pm on Jul 26, 2010 (gmt 0)|
|One approach I've been very successful with when I use it is only to 301 redirect the important URLs |
I was thinking of something like that - there are a lot of category pages that never got any traction in the search engines anyway, plus we've reorganized some product lines. So it's probably more work than necessary to 301 them to the proper page. But I just hate to 404 stuff. I really really do. Seems like such a waste.
|You should also do a stress test. |
We're planning to do one for user load on the server anyway; that's an interesting idea on the bots. I will talk it over with my peeps.
| 7:15 pm on Jul 26, 2010 (gmt 0)|
|I manually mapped all the category and static pages, and they will be fed into the .htaccess. Since the old pages were served out of a /catalog directory, I *think* we can put the redirects there instead of in the root directory .htaccess. |
Yes, you can, with adjustments to the rule patterns. However, make sure that no internal rewrites will be applied in the higher-level .htaccess file(s) or in any config files. If an internal rewrite is invoked before an external redirect, you will find that the internally-rewritten filepath will be 'exposed' as a URL to the client by the subsequent external redirect -- and in the case of search engine robot clients, that would be a very bad thing...
The rule is, starting with the server config files and proceeding down through all .htaccess files that will execute for any particular HTTP request, make sure that all redirects execute before any internal rewrites.
Where the RewriteRule patterns matching requested URL-paths are insufficient to guarantee this behaviour, it can be enforced by adding exclusions (negative-match RewriteConds) to those rules, by adding 'skip rules' (as previously mentioned by phranque) ahead of those rules, or by moving all redirects to the highest-level .htaccess or config file where any internal rewrites are invoked (and putting these relocated redirects ahead of the internal rewrite rules in that file).
Use a server headers checker to test URLs which have multiple problems, such as an obsolete /catalog URL with a non-canonical hostname and a couple of casing errors. Make sure that the server responds with a single 301 redirect to the final, correct URL. While it may not be feasible to ensure this for *all* possible old URLs, make sure that it happens for all of the most important ones.
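That header-check pass can be automated to assert "exactly one 301 hop" per important URL. A sketch (Python; `fetch` is a stand-in for whatever HTTP client you use, returning the status code and Location header):

```python
def redirect_chain(url, fetch, max_hops=10):
    """Follow redirects by hand and return the list of (status, url)
    hops, ending at the first non-redirect response."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((status, url))
        if status not in (301, 302) or not location:
            return hops
        url = location
    return hops

def single_hop_301(url, fetch, final_url):
    """True if `url` answers with exactly one 301 straight to `final_url`."""
    chain = redirect_chain(url, fetch)
    return (len(chain) == 2
            and chain[0][0] == 301
            and chain[1] == (200, final_url))
```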
| 7:38 pm on Jul 26, 2010 (gmt 0)|
If you can find the time, I'd get the Google Product feed updated pre-launch if it drives any significant amount of traffic. We recently changed our site from a bunch of subdomains back to just a single (www) subdomain, and it required updating the feed. We sent the new feed the day of launch and had no drop in traffic. In the past, when the feed has been shut off for a time and then restarted, it seemed to take longer to regain the traffic. Having said that, our feed builds dynamically, so we didn't need to do much to autogenerate it with the new URLs - and for us it was worth doing, as it drives a fairly significant % of our overall traffic.
| 8:01 pm on Jul 26, 2010 (gmt 0)|
Put a similar checklist together a few years ago..
| 8:46 pm on Jul 26, 2010 (gmt 0)|
What do people think about the OLD sitemap vs the NEW sitemap in tools? Do you just do a 301 and then notify them?
| 12:30 am on Jul 27, 2010 (gmt 0)|
Thanks again, all - this is absolutely great. Plus I'm getting all this great info before we launch, and not rushing to fix stuff afterwards - always a better way to go.
| 1:54 am on Jul 27, 2010 (gmt 0)|
|What do people think about the OLD sitemap vs the NEW sitemap in tools? Do you just do a 301 and then notify them? |
At launch time I would overwrite the old sitemap with the new - but using the same URL. I would point to that URL in robots.txt, and also ping Google directly through Webmaster Tools. But I'd say it's a bad idea to give googlebot an XML Sitemap that includes any URL that 301 redirects. So that's another good relaunch checklist item: make sure the XML Sitemap is being generated correctly.
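One way to enforce that checklist item is to filter the generated URL list against live response codes before writing the file. A sketch (Python; `status_of` stands in for a real HTTP check, such as a HEAD request):

```python
def clean_sitemap_urls(candidates, status_of):
    """Keep only URLs that answer 200 -- no 301s, 404s, or other
    surprises ever land in the XML Sitemap."""
    return [url for url in candidates if status_of(url) == 200]

def sitemap_xml(urls):
    """Render a minimal urlset document for the cleaned list."""
    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
            f"{entries}\n</urlset>")
```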
| 6:35 am on Jul 27, 2010 (gmt 0)|
|I'd say it's a bad idea to give googlebot an xml Sitemap that includes any URL that 301 redirects. |
how about in the case where you want googlebot to discover a 301 for the purpose of removing an incorrectly indexed url such as the case referred to by jdMorgan [webmasterworld.com] in another thread?
| 7:06 am on Jul 27, 2010 (gmt 0)|
In the case of that thread where jdMorgan posted, he was not directly talking about a sitemap - just making sure that googlebot saw one link to the IP address somewhere.
I've read a few SEO bloggers who recommend a "put the 301 URL in the sitemap" approach - but I have also read Google saying that they don't want to see any redirects in the sitemap. I guess you can take your choice. For me, I think a sitemap essentially says "these are my good URLs". On a large dynamic site, that can already be a challenge to generate cleanly.
| 7:37 am on Jul 27, 2010 (gmt 0)|
i wasn't implying that a sitemap was suggested there, but the thought certainly occurred to me.
| 3:08 pm on Jul 27, 2010 (gmt 0)|
No no no, don't plan on having any of the redirecting URLs in the sitemap. I didn't actually research, just kind of intuited that would be wrong.
The new sitemap url is slightly different; I'll see if we can change that. Had planned on adding it to the robots.txt; I generally do that anyway.
| 4:31 pm on Jul 27, 2010 (gmt 0)|
A few days ago I did what tedster suggested - uploaded the sitemap with the new urls for the old domain - in this case, simply the non-www to the www version. I'll let you know.
Tedster - what do you mean by:
"I would point to that URL in robots.txt" - Do you mean on the OLD site point to the NEW url in the robots.txt on the OLD site?
| 4:40 pm on Jul 27, 2010 (gmt 0)|
We're talking about a new version of a site being developed on the same domain - so I'm talking about the Sitemap URL for the domain's robots.txt file.
| This 31 message thread spans 2 pages |