
Home / Forums Index / Code, Content, and Presentation / Content Management
Forum Library, Charter, Moderators: ergophobe

Content Management Forum

    
URL variations and duplicate content
MustardDan




msg:4445961
 12:42 pm on Apr 26, 2012 (gmt 0)

Hi guys, I've just been going through Webmaster Tools and found that under HTML suggestions I have many instances of duplicate titles, descriptions, etc. A lot of this is coming from two versions of the same URL being indexed. The only difference is that one has an uppercase letter in the URL whereas the other has a lowercase letter.

For example, the URLs may be /green-widgets.htm and /Green-widgets.htm, and they are being flagged as duplicates.

Would this affect the rankings of a particular page in any way, and is there any way of de-indexing the incorrect version without affecting the correct version?

I would really appreciate any help on this issue as it is worrying me slightly that I am being penalised for duplicate content.

 

ergophobe




msg:4446156
 6:55 pm on Apr 26, 2012 (gmt 0)

Hey Dan,

Windows server?

Can I just turn it around a bit and start by saying that there are penalties and there are penalties and people use the terms loosely.

Strictly speaking, a penalty is when Google, algorithmically or manually, flags your page as in violation of guidelines and puts in a true penalty that pushes your page lower down in the results than it should be. These are things like the "minus 950", "minus 30" and "minus 50" penalties. (see the Penalties topic here: [webmasterworld.com...] ).

I doubt this is happening to you.

But what you may be doing is splitting the authority of those pages. So in other words, as a single page, it might warrant ranking #8 for the main term, but instead you have a page at 50 and a page at 100. It's not a *penalty* per se, but it is bad.

So, a few things you can do:

- Solve the problem. In other words, stop this from happening. I'm guessing it's a Windows server and some inbound URLs are using mixed case, and because Windows is case-insensitive, both versions resolve to the same page. Find those inbound URLs and change them! I can't say I'm familiar with this exact problem, but it's like any other dupe URL problem (essentially like with and without a trailing slash), and anything you do to patch it afterwards will be less effective than stopping it at the source.

- Set a "canonical" tag on each page so it tells the SEs what the canonical URL for that page is. This tells the search engine that you've settled on a URL and which one it is, but it doesn't necessarily send the signal that the old URL is bad.

- Do a 301 redirect from the mixed-case URL to the all-lowercase page. This tells the SE to take the old URL out of the index and just use the canonical one.

Offhand, those are the three things I would focus on.
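
For the "find those URLs" step above, here is a minimal sketch of one way to spot case-only variants in a list of URLs. It assumes you can export the URLs from somewhere (a crawl, server logs, Webmaster Tools). The function name `find_case_duplicates` and the example.com sample URLs are made up for illustration and are not from any particular tool.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def find_case_duplicates(urls):
    """Group URLs whose paths differ only by letter case.

    Returns a dict mapping (host, lower-cased path) to the sorted list of
    distinct path spellings seen, keeping only entries with more than one.
    """
    groups = defaultdict(set)
    for url in urls:
        parts = urlsplit(url)
        # Host names are case-insensitive anyway, so only the path matters here.
        key = (parts.netloc.lower(), parts.path.lower())
        groups[key].add(parts.path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # Hypothetical sample; in practice this would come from a crawl or log export.
    sample = [
        "http://www.example.com/green-widgets.htm",
        "http://www.example.com/Green-widgets.htm",
        "http://www.example.com/blue-widgets.htm",
    ]
    for (host, path), spellings in find_case_duplicates(sample).items():
        print(host, path, spellings)
```

Each entry it reports is a page reachable under more than one spelling, which gives you the list of links to fix at the source.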

g1smd




msg:4446244
 9:55 pm on Apr 26, 2012 (gmt 0)

The redirect method is non-obvious. Rewrite (rewrite, not redirect) the upper-case requests to a PHP or other script that works out what the correct URL should be and then issues the correct 301 header.
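
As a rough illustration of the kind of script meant here, the sketch below uses Python rather than PHP. It only shows the "work out the correct URL and issue a 301" part; the function name `lowercase_redirect` is hypothetical, and it assumes the all-lowercase path is always the correct one, which is what this thread is about but may not hold for every site. A real script would probably also build an absolute URL with the site's hostname.

```python
from urllib.parse import urlsplit, urlunsplit


def lowercase_redirect(request_uri):
    """Return (status, headers) for a 301 if the path has upper-case letters, else None."""
    parts = urlsplit(request_uri)
    canonical_path = parts.path.lower()
    if canonical_path == parts.path:
        return None  # already canonical, serve the page normally
    # Only the path is lower-cased; the query string is passed through untouched.
    location = urlunsplit(("", "", canonical_path, parts.query, ""))
    return "301 Moved Permanently", [("Location", location)]


if __name__ == "__main__":
    print(lowercase_redirect("/Green-widgets.htm"))
    print(lowercase_redirect("/green-widgets.htm"))
```

Because the function returns nothing for a URL that is already lowercase, the redirect cannot loop back on itself.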

ergophobe




msg:4446258
 10:06 pm on Apr 26, 2012 (gmt 0)

Because he can't do a case-sensitive match for the offending URL using whatever rewriting tool he has available on his server?

(still don't know what type of server it is).

MustardDan




msg:4446432
 9:20 am on Apr 27, 2012 (gmt 0)

Hi guys, and thank you for your input. Yes, it is a Windows server, and I did have the same thought as you about page authority being split instead of it all being pushed to the correct page version. As I speak, the onsite link structure is being amended to make sure only the lowercase versions are linked. I have a strong feeling that this is where the issue came from and that it isn't from an external source. I will keep you updated on any effects this has.

lucy24




msg:4446618
 5:44 pm on Apr 27, 2012 (gmt 0)

Because he can't do a case-sensitive match for the offending URL using whatever rewriting tool he has available on his server?

If there's only a small number of URLs involved then sure, you can redirect them to the correct form. In Apache it would be done with a RewriteCond checking for at least one upper-case letter in the URL as requested. But Regular Expressions don't have case-changing built in, so you can't say "take any input and change it all to lower case". For that you need some kind of outside script. Unless, ahem, your name is jdMorgan.

ergophobe




msg:4446635
 6:43 pm on Apr 27, 2012 (gmt 0)

Right, I was thinking of a limited number of URLs, not rewriting everything.

In any case, the IIS URL Rewriter module has had lowercase conversion built in since about 2009, I think ( [iis.net...] ). I have no clue how to do it, but Ruslan Yakushev does ;-)


<rule name="Convert to lower case" stopProcessing="true">
    <match url=".*[A-Z].*" ignoreCase="false" />
    <action type="Redirect" url="{ToLower:{R:0}}" redirectType="Permanent" />
</rule>

source: [ruslany.net...] (Ruslan Yakushev is the program manager for IIS FastCGI and PHP and was behind the release of the URL Rewriter module for IIS 7, so he probably has about the best answer I'd find.)

In an Apache context, of course Jim Morgan knows how!

[webmasterworld.com...]
[webmasterworld.com...]


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved