
Apache Web Server Forum

    
htaccess help
Large amount of 301s
Marketing Guy




msg:4552587
 2:14 pm on Mar 8, 2013 (gmt 0)

Hi all,

Spent the past few hours reading up on old posts, trying to figure out a problem. I've been tasked with sorting out old > new redirects for a large site and dealing with some legacy issues (2 previous URL structures and no real consistent format to the new / 3rd).

The client's old developer stuck 10k or so one-to-one Redirect Permanent rules into their htaccess file, which along with other stuff brings it to 1.2 MB in size. My knowledge of these things is pretty limited, but I'm guessing that isn't doing them any favours in terms of server performance and could be managed much better. Performance aside, some of the version 1 URLs haven't been redirected, so the client is losing out on a lot of old inbound link juice (their new site launch at the end of the year saw their rankings plummet because of this and other issues).

The site is split into 80 or so locations - the plan is to redirect all old pages (version 1 and version 2 URLs) to the hub page of each location. Each location is available in up to 9 languages so redirects are to language appropriate hub pages.

The two old URL formats were:

/en/pub/VERSION1/worldmap/LOCATION1/page.cfm
/en/pub/VERSION2/worldmap/LOCATION1/page.cfm

The new CMS doesn't use .cfm pages, so the idea is to grab any request for .cfm pages and redirect to the appropriate hub page (main landing page for the location), the structure for which is:

/en/continent/country/location2/brand-term/overview/

(this format is correct for all languages except German and French, where "overview" is replaced with "uebersicht" and "apercu" respectively).

The rule I've come up with so far is:

RewriteRule ^/en/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm /en/continent/country/location2/brand-term/overview/ [R=301,L]

RewriteRule ^/fr/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm /fr/continent/country/location2/brand-term/apercu/ [R=301,L]


However, this would require creating the rule for each location and each language (~800 lines, which is still better than the current 10k+, but far from ideal).

I would like to be able to reduce this to the bare minimum, but my regex knowledge is holding me back. :)

Some questions & points

  • The client decided to change the folder structure a couple of years ago, so that's why there's a VERSION1 and VERSION2 component to the old URLs. [^/]+ should be enough to cover this?
  • The old versions contained locations that are slightly different to the new ones. Imagine if 80 non tech people were allowed to name stuff and you'll be in the right ballpark. So the location can't be captured as a variable for reuse.
  • Could I replace the language folder /en/ with a stored variable /([^/]+)/ and variable reference in target URL /$1/? This would require an OR statement for the overview part of the URL, which would become:


    RewriteRule ^/([^/]+)/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm /$1/continent/country/location2/brand-term/overview|apercu|uebersicht/ [R=301,L]



Would this work?

Thanks,
Scott
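
A note on the rules quoted above, offered as a sketch rather than a tested configuration: if they live in .htaccess (as the thread suggests) rather than in the main server config, mod_rewrite matches the pattern against the path with the leading slash already stripped, so a pattern beginning with ^/ will not match there. Assuming the .htaccess file sits in the document root, the English rule might look like this. LOCATION1 and the target path are the placeholders from the post.

```apache
RewriteEngine On

# .htaccess (per-directory) context: the pattern sees "en/pub/..." with no
# leading slash. The trailing $ anchors the match at the end of ".cfm".
RewriteRule ^en/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm$ /en/continent/country/location2/brand-term/overview/ [R=301,L]
```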

 

phranque




msg:4552597
 2:21 pm on Mar 8, 2013 (gmt 0)

i would write a script to handle the redirect logic/response and then internally rewrite all requests for .cfm urls to that script.
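
A minimal sketch of that approach, assuming .htaccess in the document root. The script name /cfm-redirect.php is hypothetical, and the redirect logic itself would live in that script.

```apache
RewriteEngine On

# Internally rewrite every request ending in .cfm to one handler script.
# There is no R flag, so this is a rewrite, not a redirect: the script
# decides whether to answer with a 301 or an error status.
RewriteRule \.cfm$ /cfm-redirect.php [L]
```

The originally requested URL stays available to the script (in PHP, via $_SERVER['REQUEST_URI']), so it can look up the matching new URL from there.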

phranque




msg:4552599
 2:27 pm on Mar 8, 2013 (gmt 0)

i would also advise implementing one-to-one redirects in that script rather than redirecting large numbers of urls to a small number of hubs unless you truly believe that's the best experience for the visitor.

if you're doing it for PR reasons it won't go well.

your other option is to provide a 404 or 410 response as appropriate and specify a custom error document that provides sufficient navigation and search for the visitor to find what they were hoping to see when they clicked the link to your old url.
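
As a rough sketch of the error-document option, assuming .htaccess and that every remaining .cfm URL should be treated as gone. Both error page paths are hypothetical.

```apache
# Custom error pages with navigation and search; both paths are made up.
ErrorDocument 404 /errors/not-found.html
ErrorDocument 410 /errors/gone.html

RewriteEngine On

# Answer any request ending in .cfm with 410 Gone (the G flag).
RewriteRule \.cfm$ - [G]
```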

Marketing Guy




msg:4552606
 2:43 pm on Mar 8, 2013 (gmt 0)

The project is a bit of a mess - the client screwed up their redesign project and ended up ditching their developer, and now upper management are riding staff members and their new agency pretty hard. We've been working with them in another capacity for a couple of years now and are trying to sort this out for them, but have limited access (took us a month to get a copy of their htaccess file, for example).

It's certainly not for PR reasons, the site is pretty well established and they do loads of non SEO related marketing around the world on a regular basis (in fact non-brand search probably only accounts for 1-2% of their revenue).

In terms of legacy rankings, it was really only the location hub pages that ever ranked for anything anyway (it's hotels; the rest of the pages were just info, local fluff, etc).

Realistically given the amount of time we have to assign to the project and given what we know about the client and what they are likely to implement, it was felt a simplified htaccess solution would be best. It's easy for them to digest and the many to one redirects are decent for visitors (same language, same location, same intention behind any searches or click throughs).

We've got a phone call with their new developer next week - I'll chat with them about the script option, but tbh I can't see the client prioritising that at all.

Thanks for the input!
Scott

g1smd




msg:4552615
 3:03 pm on Mar 8, 2013 (gmt 0)

For the very oldest URLs, you're better off doing an internal rewrite such that when .cfm URLs are requested, a brand new PHP file is internally invoked. The logic in this file does all the fancy stuff for constructing the new URL and sending a 301 redirect out. It's important that this PHP script also correctly returns 404 status for non-valid .cfm requests. This reduces the .cfm request handling within htaccess to just a few lines. However you do it, do be aware that the 301 redirect must state both protocol and hostname as well as the new path.
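
If the redirect stays in .htaccess instead of a script, the point about protocol and hostname could be sketched like this. www.example.com is a placeholder hostname, and LOCATION1 and the target path are the placeholders used earlier in the thread.

```apache
RewriteEngine On

# Target spelled out with protocol and hostname, so the Location header
# does not depend on which hostname the request arrived on.
RewriteRule ^en/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm$ http://www.example.com/en/continent/country/location2/brand-term/overview/ [R=301,L]
```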

Marketing Guy




msg:4552634
 3:19 pm on Mar 8, 2013 (gmt 0)

Thanks g1smd, I'll discuss the script option with their developer next week.

Am I right in thinking that the existing setup (10k lines / 1.2mb) is probably causing significant server performance issues?

g1smd




msg:4552676
 6:02 pm on Mar 8, 2013 (gmt 0)

Not necessarily; it depends how good the code is.

If the rules are stuffed full of (.*) or (.+) elements at the beginning or in the middle of RegEx patterns, then it will be very inefficient.
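
To illustrate the difference, here are two ways of writing the same kind of rule, shown side by side for comparison only (just one of them would be used in practice). /placeholder-target/ is a made-up target.

```apache
# Greedy: each (.*) first runs to the end of the path and then backtracks
# until the rest of the pattern fits.
RewriteRule ^(.*)/worldmap/(.*)/(.*)\.cfm$ /placeholder-target/ [R=301,L]

# Tighter: [^/]+ stops at the next slash, so it cannot wander across
# path segments.
RewriteRule ^[^/]+/pub/[^/]+/worldmap/[^/]+/[^/]+\.cfm$ /placeholder-target/ [R=301,L]
```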

lucy24




msg:4552731
 10:41 pm on Mar 8, 2013 (gmt 0)

RewriteRule ^/([^/]+)/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm /$1/continent/country/location2/brand-term/overview|apercu|uebersicht/ [R=301,L]

You can't have pipes in the target. You can have them in the pattern; it's one of the most common forms of capturing:

blahblah/(onething|otherthing|thirdthing)/morestuff
>>
newblahblah/$1/morenewstuff

Your first post suggests that almost everything can be done in a couple of rules with appropriate captures. That's assuming for the sake of discussion that you want to do mass redirects. Even then, 80 targets is definitely better than one single target.
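
Applied to the URLs in this thread, that could look roughly like the sketch below, assuming .htaccess in the document root (hence no leading slash in the pattern). The two language codes are examples only, and LOCATION1 and the target path are the placeholders from the first post.

```apache
RewriteEngine On

# The substitution is a literal string, so "|" is not an "or" there.
# Alternation goes in the pattern, and the match is captured as $1.
RewriteRule ^(en|ru)/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm$ /$1/continent/country/location2/brand-term/overview/ [R=301,L]
```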

Marketing Guy




msg:4553359
 9:55 am on Mar 11, 2013 (gmt 0)

Thanks both!

Given there are 3 target formats per location, that would mean I would have to do 3 rules per location (240 in total) - one each for the overview, apercu and uebersicht targets?

So the language folder ([^/]+) would be replaced with (ar|en|ru|ko|tr|zh-cn), except for /de/ and /fr/ which would have their own rules. Would I be right in thinking it's more efficient to list the /de/ and /fr/ rules above the variable rule?

Thanks again!

lucy24




msg:4553372
 10:34 am on Mar 11, 2013 (gmt 0)

Yes. You're dealing with one of the basic principles: go from most specific to most general. If you have

rule for abcdefgijkmnopqrstuvwxyz
rule for l
rule for h

list them as

h blahblah
l blahblah
blahblah (won't affect l and h, which have already been dealt with, so you don't need to mention them at all)

g1smd




msg:4553377
 11:03 am on Mar 11, 2013 (gmt 0)

Yes, list the most specific rules first and the most general rules last.

Make sure that every rule has the [L] flag, so that when a particular rule matches, processing stops at that rule.
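
Putting the advice in this thread together, the three rules for one location might be sketched as follows. This assumes .htaccess in the document root (so patterns carry no leading slash); LOCATION1 and the target paths are the placeholders from the first post, and the language list is the one given above.

```apache
RewriteEngine On

# Most specific first: German and French have their own final segment.
RewriteRule ^de/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm$ /de/continent/country/location2/brand-term/uebersicht/ [R=301,L]
RewriteRule ^fr/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm$ /fr/continent/country/location2/brand-term/apercu/ [R=301,L]

# General rule last: the remaining languages all use "overview", and the
# captured language code is reused as $1.
RewriteRule ^(ar|en|ru|ko|tr|zh-cn)/pub/[^/]+/worldmap/LOCATION1/[^/]+\.cfm$ /$1/continent/country/location2/brand-term/overview/ [R=301,L]
```

Every rule carries [L], so processing stops at the first rule that matches.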

Marketing Guy




msg:4553381
 11:16 am on Mar 11, 2013 (gmt 0)

Thanks for the help - much appreciated! :)

Scott


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved