
Forum Moderators: phranque


best practices for CDN configuration

avoiding duplicate content or other SEO issues

8:42 am on Oct 6, 2012 (gmt 0)

New User

5+ Year Member

joined:Oct 25, 2006
posts: 37
votes: 0

Most websites store static content in directories scattered throughout the docroot, e.g. /images/, /scripts/, /css/.

If you want to serve all your static content (css, js, images) from a CDN, the easy solution is to set the CDN's "pull zone" to the root of your own domain, e.g. example.com. This means that your entire site's content is also available at, for example, cdn.example.com, which makes it very easy to rewrite requests for static files so that they get served out of the CDN.
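The rewrite itself can live in the templating layer: every static asset reference gets passed through a small helper that prefixes the CDN host. A minimal PHP sketch (the hostname cdn.example.com and the name cdn_url are placeholders for illustration):

```php
<?php
// Hostname of the CDN pull zone -- an assumption, substitute your own.
define('CDN_HOST', 'http://cdn.example.com');

// Rewrite a docroot-relative asset path to the CDN host.
function cdn_url($path) {
    // Only rewrite site-relative paths; leave absolute URLs untouched.
    if ($path === '' || $path[0] !== '/') {
        return $path;
    }
    return CDN_HOST . $path;
}
```

Templates would then emit, e.g., `<img src="<?php echo cdn_url('/images/logo.png'); ?>">` instead of the bare path.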

However, this also means that all your non-static content pages are available from the CDN as well. E.g., your homepage is reachable at both example.com and cdn.example.com.

Maybe your CDN provider lets you set up a custom robots.txt file on the CDN. If so, you can prevent search engines from indexing the CDN's copy of your website. This might prevent disastrous search problems due to duplicate content, but won't help you if you want your images to show up in Google's Image Search.
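If the provider does allow it, the CDN-side robots.txt can be a blanket disallow (which, as noted, also blocks image crawling):

```
User-agent: *
Disallow: /
```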

I've seen a lot of chatter around the web regarding complex mod_rewrite solutions for this problem, but I don't see how that helps. At least for the CDN I'm using (maxcdn), requests from the CDN are not tagged in any way -- there's no way for Apache, mod_rewrite, or PHP to determine that the current request has been proxied through the CDN.

So, I see two solutions:

#1 - create separate pull zones for every static content directory, with different subdomains for each, e.g.:
cdn-images.example.com pulls from example.com/images/
cdn-scripts.example.com pulls from example.com/scripts/

This is tedious to set up, and requires more DNS lookups for every first-time visitor. It solves the problem of preventing content pages from being duplicated in the CDN, but it's ugly.
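Each per-directory zone would be wired up in DNS as its own CNAME pointing at the CDN, which is where the extra lookups come from. A sketch (the provider-side target names are made up for illustration):

```
cdn-images.example.com.   IN  CNAME  pullzone-images.cdnprovider.example.
cdn-scripts.example.com.  IN  CNAME  pullzone-scripts.cdnprovider.example.
```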

#2 - set up a second site on your local server to serve as the origin for the CDN pull zone. Call it origin.example.com. For every static content directory in the main docroot (example.com), create a symlink in the origin's docroot. This allows the server at origin.example.com to serve content for those static directories.
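The symlink setup can be scripted. A sketch in shell, where the paths under /tmp stand in for the real docroots and the directory list is illustrative:

```shell
# Placeholder docroots -- substitute the real paths on your server.
MAIN=/tmp/www/example.com
ORIGIN=/tmp/www/origin.example.com

mkdir -p "$MAIN/images" "$MAIN/scripts" "$MAIN/css" "$ORIGIN"

# Symlink each static directory into the origin docroot, so that
# origin.example.com can serve exactly those trees and nothing else.
for dir in images scripts css; do
    ln -sfn "$MAIN/$dir" "$ORIGIN/$dir"
done
```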

Then create a 404 handler for the new 'origin' site that issues a 301 redirect for any other URI. So, requests for files within those symlinked directories would be served directly, but everything else would redirect.

This is basically a one-line PHP script (plus an exit so nothing is sent after the redirect):

<?php
header("Location: http://example.com{$_SERVER['REQUEST_URI']}", true, 301);
exit;
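Wiring that script in as the origin site's 404 handler might look like this in an Apache vhost (the docroot path and the redirect.php name are assumptions):

```
<VirtualHost *:80>
    ServerName origin.example.com
    DocumentRoot /var/www/origin.example.com

    # Anything not found among the symlinked static directories
    # falls through to the redirecting PHP script.
    ErrorDocument 404 /redirect.php
</VirtualHost>
```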

This seems to solve all the identified problems:
- don't allow content pages in the CDN
- do issue 301s for CDN requests for content pages
- don't block spidering of the CDN

Before I go this way, I wanted to ask here whether I'm overlooking some other problem. I've seen the question asked before, but I've never found a suggestion of what 'best practices' might be.