best practices for CDN configuration
avoiding duplicate content or other SEO issues
mcglynn

Msg#: 4504781 posted 8:42 am on Oct 6, 2012 (gmt 0)

Most websites store static content throughout the docroot, e.g.:
example.com/images/
example.com/scripts/
example.com/wp-content/

If you want to serve all your static content (css, js, images) from a CDN, the easy solution is to set the CDN's "pull zone" to the root of your own domain, e.g. example.com. This means that your entire site's content is also available at, for example, cdn.example.com, which makes it very easy to rewrite requests for static files so that they get served out of the CDN.
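
For the "rewrite requests" step, here's a minimal sketch of the kind of helper I mean. The function name cdn_url() and the cdn.example.com hostname are my own placeholders, not anything the CDN dictates:

<?php
// Hypothetical helper: prefix a local static path with the
// hostname your pull zone answers on.
function cdn_url($path) {
    return 'http://cdn.example.com' . $path;
}

// In a template:
echo '<img src="' . cdn_url('/images/logo.png') . '">';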

However, this means that all your non-static content pages are *also* available from the CDN; e.g., your homepage is reachable at both example.com and cdn.example.com.

Maybe your CDN provider lets you set up a custom robots.txt file on the CDN. If so, you can prevent search engines from indexing the CDN's copy of your website. This might prevent disastrous search problems due to duplicate content, but won't help you if you want your images to show up in Google's Image Search.
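
For what it's worth, the blanket version of that custom robots.txt -- assuming the provider really does let you override the file on the CDN hostname -- is just:

User-agent: *
Disallow: /

...which is exactly why Image Search breaks: it blocks the spiders from the CDN's copies of your images along with everything else.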

I've seen a lot of chatter around the web regarding complex mod_rewrite solutions for this problem, but I don't see how that helps. At least for the CDN I'm using (maxcdn), requests from the CDN are not tagged in any way -- there's no way for Apache, mod_rewrite, or PHP to determine that the current request has been proxied through the CDN.

So, I see two solutions:

#1 - create separate pull zones for every static content directory, with different subdomains for each, e.g.:
cdn-images.example.com pulls from example.com/images/
cdn-scripts.example.com pulls from example.com/scripts/
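
Concretely, that's one pull zone plus one CNAME per directory -- something like the following, where the netdna-cdn.com targets are hypothetical zone hostnames from the CDN control panel:

cdn-images   IN  CNAME  zone1.example.netdna-cdn.com.
cdn-scripts  IN  CNAME  zone2.example.netdna-cdn.com.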

This is tedious to set up, and requires more DNS lookups for every first-time visitor. It prevents content pages from being duplicated in the CDN, but it's ugly.

#2 - set up a second site on your local server to serve as the origin for the CDN pull zone. Call it origin.example.com. For every static content directory in the main docroot (example.com), create a symlink in the origin's docroot. This allows the server at origin.example.com to serve content for those static directories.
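
The symlinking is a one-time job; here's a quick sketch in PHP (the paths are assumptions -- adjust to your layout):

<?php
// One-time setup: expose only the static directories by
// symlinking them from the main docroot into the origin docroot.
$main   = '/var/www/example.com';
$origin = '/var/www/origin.example.com';

foreach (array('images', 'scripts', 'wp-content') as $dir) {
    $link = $origin . '/' . $dir;
    if (!is_link($link)) {
        symlink($main . '/' . $dir, $link);
    }
}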

Then create a 404 handler for the new 'origin' site that issues a 301 redirect for any other URI. So, requests for files within those symlinked directories would be served directly, but everything else would redirect.

(This is basically a one-line PHP script, plus an exit so nothing else gets sent after the redirect:

header("Location: http://example.com{$_SERVER['REQUEST_URI']}", true, 301);
exit;
)

This seems to solve all the identified problems:
- don't allow content pages in the CDN
- do issue 301s for CDN requests for content pages
- don't block spidering of the CDN

Before I go this route, I wanted to ask here whether I'm overlooking some other problem. I've seen the question asked before, but I've never found a suggestion of what 'best practices' might be.

 
