Duplicate pages: mass rewrite for several thousand URLs?

duplicate URLs, ecommerce

         

amavity

7:29 pm on Oct 14, 2010 (gmt 0)

10+ Year Member





Hi peeps, sorry to impose on your forum. I have an issue with duplicate URLs on our site, which came about when we tried to clean up the URLs on a mass scale.

We basically had URLs which contain /? and we decided to get rid of the / in all URLs. We also tried getting rid of the question mark but had little to no joy at all.

The urls in question look very much like this.


http://www.example.com/Deus-Ex/?asin=B0PRT8RMB1 duplicate url

http://www.example.com/Deus-Ex?asin=B0PRT8RMB1 correct url.

The problem we face is that we have 50,000 pages like this, all of which are keyworded, and each one is reachable under both the /? form and the ? form.

Would it be possible to mass redirect, and how would I go about doing such a task?

Would I have to match the 50,000 URLs to their correct versions first, or could I add some code to simply remove these /? URLs as Google finds them in Webmaster Tools?

Thanks a lot. I have had this issue for a long time and it is really playing havoc with my Webmaster Tools reports, as you can imagine.

Alex

g1smd

7:49 pm on Oct 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Use an end-anchored RewriteRule pattern that checks the incoming request has a trailing slash, and captures all but the trailing slash in a backreference. Use a preceding RewriteCond to look at QUERY_STRING and make sure one is present.

The redirect target should include the protocol and the canonical domain name. Redirect using the captured backreference as the new URL path. Append the original query string to the new path. Use the [R=301,L] flags.

The solution for all of the URLs is two lines of trivial mod_rewrite code. This can redirect all URLs matching the specified pattern.
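
As a rough sketch only, the two lines described above might look like the following. It assumes the code goes in the .htaccess file in the document root, that mod_rewrite is available, and that www.example.com over plain http is the canonical hostname; none of those details are confirmed in this thread.

```apache
# Assumption: this lives in the root .htaccess, where the pattern
# sees the URL-path without its leading slash.
RewriteEngine On

# Only act when the request carries a non-empty query string.
RewriteCond %{QUERY_STRING} .
# End-anchored pattern: capture everything before the trailing slash.
# The target contains no "?", so mod_rewrite re-appends the original
# query string to the redirect automatically.
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
```

With this in place, a request for /Deus-Ex/?asin=B0PRT8RMB1 should be answered with a 301 to http://www.example.com/Deus-Ex?asin=B0PRT8RMB1. In the main server config or a virtual host, the pattern would need a leading slash instead. If any of the slash-terminated paths are real directories, mod_dir may add the slash back and cause a redirect loop, so test on a few URLs before relying on it.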

amavity

7:58 pm on Oct 14, 2010 (gmt 0)

10+ Year Member



Thanks for the useful information. Regarding the trivial mod_rewrite code, how would it be written to match a specific pattern?

Thanks again.

jdMorgan

6:09 pm on Oct 15, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem with getting replies here has to do with two things.

First, please take a look at our Forum Charter -- and note the links at the end, too.

Second, it is not clear which parts of your URLs vary among those 50,000 URLs, and which parts are fixed. It is also not clear what the "character set" used in the varying parts might be. So no-one can make any useful suggestions for a rule or for the regular expression needed for that rule.

For example, does the "Deus-Ex" URL-path change, or just the "asin" number in the appended query string? Is the "asin" number related to the URL-path "name" in some way?

Several good solid examples of maximally-different incorrect URL-paths with query strings might be very helpful at getting a useful reply....
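
Purely to illustrate why those details matter, here is a hypothetical narrow rule. It assumes the varying part is a single path segment made of letters, digits, and hyphens, and that the query string is nothing but an "asin" value of uppercase letters and digits. Both assumptions are guesses based on the one example posted, and the rule is written for a root .htaccess file.

```apache
RewriteEngine On

# Assumption: the query string is exactly "asin=" plus uppercase
# letters and digits, as in the single example posted.
RewriteCond %{QUERY_STRING} ^asin=[A-Z0-9]+$
# Assumption: one path segment of letters, digits, and hyphens,
# followed by the unwanted trailing slash.
RewriteRule ^([A-Za-z0-9-]+)/$ http://www.example.com/$1 [R=301,L]
```

If the real URLs use other characters, deeper paths, or extra query parameters, this pattern would silently fail to match them, which is why the examples are needed.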

Jim