Server-side "Duplicate Content" Issues

When people think about SEO, they think about content - on the page, across the site, and in backlink URLs on other domains. But there is a bedrock area that accounts for more trouble than most webmasters realize. That bedrock is the server technology itself and the platform used to serve the domain. You don't need to be a sysadmin to know enough to make certain that your site is not built on quicksand. Google works with you and tries to guard against the common problems, but really - this area is your responsibility.

Here are some of the problems that I see routinely, all of them in the general region of duplicate URL troubles. Really, well over half the domains that I look at have some of these issues. We've discussed some of them in various threads, but I thought there would be value to devoting a thread just to this one topic.

If your previously well-ranked site suddenly begins to develop ranking troubles - check here first!

Basic Rule
If two different URLs both return a 200 OK but serve the same document, then you are getting into duplicate content country and that can create major problems. If any two URLs are not an EXACT character match then they are different.
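
To make the basic rule concrete, here is a rough sketch of how you might spot the problem in crawl data you have already collected. The function name and the shape of the input (a dict of URL to status and body) are my own assumptions for illustration, not the output format of any particular crawler.

```python
import hashlib
from collections import defaultdict


def find_duplicate_urls(responses):
    """Group URLs that return 200 OK with byte-identical bodies.

    `responses` is a hypothetical mapping of URL -> (status_code, body_bytes).
    URLs are compared as exact strings, so any character difference
    counts as a different URL.
    """
    by_digest = defaultdict(list)
    for url, (status, body) in responses.items():
        if status == 200:
            by_digest[hashlib.sha256(body).hexdigest()].append(url)
    # Only groups with more than one URL are duplicate-content candidates.
    return [sorted(urls) for urls in by_digest.values() if len(urls) > 1]
```

Anything that comes back in a group of two or more deserves a closer look. Real pages often differ by a timestamp or a session token, so an exact hash match like this will under-report; treat it as a starting point only.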

1. Dynamic URLs
What happens if the order of two parameters is reversed? Only one order should result in a 200 status.
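
One way to enforce a single parameter order is sketched below. It assumes alphabetical order by parameter name is the "right" order and that the wrong order gets a 301 to the right one; both choices are mine for the sake of the example, and returning a 404 for the wrong order would satisfy the rule just as well.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def canonical_url(url):
    """Return the URL with its query parameters sorted by name."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params.sort(key=lambda pair: pair[0])  # stable sort keeps repeated keys in order
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(params), parts.fragment)
    )


def status_for(url):
    """Return (status, location): 200 only for the canonical parameter order."""
    target = canonical_url(url)
    if url == target:
        return 200, None
    return 301, target
```

With this in place, `?size=9&color=red` and `?color=red&size=9` can no longer both return a 200.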

2. Rewrite Schemes
Did you take the lazy man's route and key off a number in the URL - and then just throw a keyword into the filepath so that you have it in the URL? What happens if the number is correct but the keyword is a typo, or even total garbage? With any rewrite scheme, and especially on a site of some complexity, test your set-up with lots of creatively "bad" URLs - really kick that server around.
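
Here is a minimal sketch of a stricter lookup for this kind of number-plus-keyword scheme. The `/articles/<id>/<keyword>` layout, the `resolve` name and the `slugs_by_id` table are all hypothetical; the point is only that the whole path has to match the one true URL before you return a 200.

```python
import re

# Hypothetical URL layout: /articles/<numeric id>/<keyword slug>
ARTICLE_PATH = re.compile(r"^/articles/(\d+)/([^/]+)$")


def resolve(path, slugs_by_id):
    """Return (status, location) for a requested path.

    `slugs_by_id` is a hypothetical mapping of article id -> correct slug.
    """
    match = ARTICLE_PATH.match(path)
    if match is None:
        return 404, None
    article_id = int(match.group(1))
    expected = slugs_by_id.get(article_id)
    if expected is None:
        return 404, None
    canonical = f"/articles/{article_id}/{expected}"
    # Compare the full path, so "/articles/0042/..." does not slip through.
    if path == canonical:
        return 200, None
    return 301, canonical
```

Whether a garbage keyword should get a 301 to the correct URL or a plain 404 is a judgment call; what matters is that it does not get a 200.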

3. "Custom" 404
If the HTTP status in the header for the error page you serve is not 404 (or 410 Gone), then it doesn't matter what the title or body content of that page says. Common problems in this area come from using a 302 redirect when the URL is not found. A 302 redirect on the same domain will usually result in the requested URL being indexed, but with the content of your "custom error page." Over time, every bad URL ever requested can be indexed as a duplicate URL for that one page. Eventually, your entire domain looks like garbage - and I do mean eventually. This kind of error can be a time bomb.
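
A quick way to see what your error page really returns is to request a URL you know is bad and look at the raw status, without following redirects. The sketch below uses Python's standard `http.client`, which does not follow redirects on its own; the function names and the verdict labels are mine.

```python
import http.client
from urllib.parse import urlsplit


def judge_not_found(status):
    """Classify the status returned for a URL that should not exist."""
    if status in (404, 410):
        return "ok"
    if status in (301, 302, 303, 307, 308):
        return "redirect"
    if status == 200:
        return "soft-404"
    return "other"


def fetch_status(url, timeout=10):
    """Fetch one URL and return (status, Location header), redirects not followed."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()
```

Something like `judge_not_found(fetch_status("http://example.com/no-such-page-xyz")[0])` should come back as "ok". A "redirect" or "soft-404" verdict is the timebomb described above.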

4. Double Slashes
By default, Apache ignores double slashes in the file path and treats them as a single slash, so the double-slash version of a URL returns the same document with a 200. It's best to address this with a rewrite rule.
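
The logic such a rule needs is simple. This is not Apache configuration, just a sketch of the decision in Python, with a function name I made up and a 301 as my assumed response: collapse any run of slashes in the path and redirect if the result differs from what was requested.

```python
import re


def slash_redirect(path):
    """Return (301, cleaned_path) if the path has repeated slashes, else (None, path)."""
    cleaned = re.sub(r"/{2,}", "/", path)
    if cleaned != path:
        return 301, cleaned
    return None, path
```

Apply it to the path only, not the full URL, or the `//` after `http:` gets collapsed too.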

5. Two Levels of Error Handling
It's common for the server itself to have one native level of error handling, while the platform that serves a URL has a second level. One example: IIS itself handles some basic errors, but .NET handles errors in the query string. I've seen similar issues with PHP/MySQL and also .jsp/Tomcat websites. I'm pretty sure it can happen on ColdFusion, too. So make sure that BOTH kinds of not-found errors are returning a true 404 HTTP status.
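
To test both levels you need two kinds of bad URL: one with a file path that doesn't exist, and one with a real script but a garbage query string. Here is a small sketch that builds both from a URL you know is good. The function name, the `id` parameter and the token are placeholders, and which layer actually answers each probe depends on your own setup.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode


def not_found_probes(known_good_url, bad_param="id", token="zz-no-such-thing-zz"):
    """Build one bad-path URL and one bad-query URL from a known good URL."""
    parts = urlsplit(known_good_url)
    # A path that should not exist anywhere on the site.
    bad_path = urlunsplit((parts.scheme, parts.netloc, "/" + token, "", ""))
    # The real script path, but with a parameter value that matches nothing.
    bad_query = urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode({bad_param: token}), "")
    )
    return {"server_level": bad_path, "platform_level": bad_query}
```

Request both URLs and check that each one returns a true 404, not just the first.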

Anyone have more?

tedster
