

htaccess rewrite: /oldurl.mma to /oldurl

         

DiscoStu

12:04 am on Feb 10, 2010 (gmt 0)

10+ Year Member



A friend of mine runs an MMA blog, and he decided to rewrite the URLs to add .mma at the end. I'm not sure exactly how he did it, but it seems to have caused indexing problems: hardly any new content was being indexed. Now he has abruptly removed it, but there are some old URLs that were indexed way back with .mma tacked on to them.

So how can he set things up so that all

www.domain.com/old-urls.mma

301s to

www.domain.com/old-urls

with a .htaccess rewrite?

Thanks!

jdMorgan

1:38 am on Feb 10, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Please post your best effort at coding this using mod_alias RedirectMatch or mod_rewrite RewriteRule as a basis for discussion.

Thanks,
Jim

DiscoStu

12:20 am on Feb 13, 2010 (gmt 0)

10+ Year Member



Quote (jdMorgan): "Please post your best effort at coding this using mod_alias RedirectMatch or mod_rewrite RewriteRule as a basis for discussion."


At first I was disappointed not to get a straight answer, but then I realized that this was a good excuse to finally grab the bull by the horns and look into how regular expressions work (which I guess was the point), so thanks for that :)

So here goes my first attempt:


RewriteEngine on
RewriteRule ^(.+)\.mma$ $1 [L,R=301]


This would mean that any URL ending in .mma would 301-redirect to the same URL without it, so

www.domain.com/any-url.mma

301s to

www.domain.com/any-url (right?)

The only problem I have when testing an equivalent of this on my own site is that it adds the whole server path (home/domain/public_html/) to the redirect target, so the end destination becomes:

www.domain.com/home/domain/public_html/any-url

which doesn't work. I'm guessing this is an easy fix?

jdMorgan

1:39 am on Feb 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There's a simple answer, but it raises another question, which potentially makes the whole 'project' not so simple.

The basis of the question is that your "any-url" example cannot exist as a physical file (since it has no file-extension, the server would not know what HTTP Content-Type header to send, so even if it did exist as a file, it would be difficult to serve it successfully via HTTP). So the question is, how does the server know what file to serve if you take the .mma off the URL? How did he serve these files *before* adding .mma to the URL-paths, and how will you serve them now/again once you take the .mma off?

This is where the complication comes in: you probably need two rules. The first internally rewrites extensionless URL requests to .mma filepaths (based on some as-yet-undefined criteria), and the second externally redirects only direct client requests for URLs having .mma extensions to URLs having no extension. (Note the critical distinctions between rewrites and redirects, and between filepaths and URLs.)

So these questions need to be addressed: How did the site use to work? How do you plan to make it work now? And, most importantly, what criteria can you use to 'decide' to add .mma to an extensionless URL request so that it can be resolved to an existing file?

For the last question, you could add .mma to any URL not ending with a file type (look for a period in the final path-part) and not ending in a slash. That may be sufficient in this case. Or, if all .mma files are in a particular subdirectory or subdirectory-path, then that path can be used as the criterion. Finally, and as a last resort, you can use "file-exists" checks. You can "rewrite if file exists when .mma is added to URL," or you can test for "file doesn't exist as requested, so try adding .mma," and other variations. This can be helpful if no other criteria can be identified, but its disadvantage is that each 'file exists' check calls the operating system to go check the disk, and this approach can really bog down a busy site, plus affect the lifespan of the hard drive on the server.
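For illustration, here is a minimal sketch of how those three criteria might look as internal rewrites (no R flag, so the URL the client sees does not change). It assumes the rules sit in a .htaccess file in the document root and that the .mma files physically exist on disk; the subdirectory name "fights" in option 2 is purely hypothetical. Only one option should be enabled at a time, so options 2 and 3 are commented out.

```apache
RewriteEngine on

# Option 1: final path-part contains no period, and the URL does not end in a slash.
# Internally rewrites /any-url to the file /any-url.mma.
# Caveat: a real directory requested without its trailing slash will also match.
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.mma [L]

# Option 2: only rewrite inside one subdirectory ("fights" is a hypothetical name).
#RewriteRule ^(fights/[^./]+)$ /$1.mma [L]

# Option 3 (last resort): rewrite only if the .mma file actually exists on disk.
# Every request that reaches this rule costs a filesystem check.
#RewriteCond %{REQUEST_FILENAME}.mma -f
#RewriteRule ^(([^/]+/)*[^./]+)$ /$1.mma [L]
```

With option 1 enabled, a request for /any-url would be served from any-url.mma, while requests such as /style.css, /images/ and / do not match the pattern and are left alone.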

We do do things differently around here, simply because there are not enough volunteer contributors to "write free code for the world." The goal, therefore, is to help you understand this aspect of server administration, so that even if you don't come back to contribute, at least the questions you may post in the future will be informed by better understanding, and therefore will be "more interesting" from the standpoint of a "discussion." As outlined in our Charter, our goal is to discuss and to educate, rather than to try to serve as a grossly understaffed "global coding help desk."

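To fix the code you've got so far, you need to add the protocol and domain. (In a .htaccess file, mod_rewrite strips the directory's filesystem path before matching and adds it back to a relative substitution, which is why /home/domain/public_html/ showed up in your redirect.)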

RewriteEngine on
RewriteRule ^(.+)\.mma$ http://www.example.com/$1 [R=301,L]

However, as noted above, this new rule may now interfere with the rewrite (or other mechanism such as content-negotiation) that will have to be used to serve a .mma file when an extensionless URL request is received, because it does not check for the "direct client request" aspect previously mentioned.
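One common way to add that "direct client request" check is to test THE_REQUEST, which holds the original request line sent by the client (for example, GET /any-url.mma HTTP/1.1) and is not changed by internal rewrites. A minimal sketch, keeping the www.example.com placeholder from the rule above:

```apache
RewriteEngine on

# Redirect only if the client itself asked for a .mma URL.
# THE_REQUEST looks like: GET /any-url.mma HTTP/1.1
# (or GET /any-url.mma?x=1 HTTP/1.1 when there is a query string).
# A URL-path that an internal rewrite turned into .mma does not change THE_REQUEST,
# so it will not trigger this redirect.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]+\.mma[?\ ]
RewriteRule ^(.+)\.mma$ http://www.example.com/$1 [R=301,L]
```

Because the substitution contains no "?", mod_rewrite passes any original query string through to the redirect target by default.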

If you can "fill in the blanks" indicated by the questions posted above, we can proceed more quickly to finding a rule-system to solve the whole problem.

Jim

DiscoStu

11:13 pm on Feb 15, 2010 (gmt 0)

10+ Year Member



I don't have direct access to the site, and the guy who runs it doesn't know much about any type of coding, SEO, etc. But basically, he changed it so that all non-.mma URLs now resolve, and all .mma URLs give a 404 response. Since he has links pointing to some of the old .mma URLs, I figured the easiest way was to redirect all /old-url.mma URLs (which all return 404 right now) to /old-url (which return 200).

jdMorgan

12:23 am on Feb 16, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yes, that's correct. But above, I'm warning you that doing so can create a loop, because there must be some mechanism in place to rewrite non-mma URL requests to mma files. If you redirect mma to non-mma, and that mechanism rewrites non-mma back to mma, you've got an 'infinite' redirect/rewrite loop, and your site won't work at all with either type of URL.

Therefore, it is critical to identify that mechanism so that a work-around for the loop can (hopefully) be provided.
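As an illustration of a loop-safe ordering for the two-rule approach described earlier in the thread, here is a minimal sketch. It assumes (unconfirmed for the friend's site) that the content is stored as physical .mma files in the document root and that nothing else rewrites requests to them; www.example.com is a placeholder. The external redirect comes first and is guarded by THE_REQUEST, then the internal rewrite follows.

```apache
RewriteEngine on

# 1) External 301: only for direct client requests ending in .mma.
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /[^?\ ]+\.mma[?\ ]
RewriteRule ^(.+)\.mma$ http://www.example.com/$1 [R=301,L]

# 2) Internal rewrite: extensionless URL-path to the .mma file, if that file exists.
RewriteCond %{REQUEST_FILENAME}.mma -f
RewriteRule ^(([^/]+/)*[^./]+)$ /$1.mma [L]

# Trace for a client request GET /any-url:
#   rule 2 rewrites internally to /any-url.mma, and .htaccess is re-run on the new path;
#   rule 1's pattern now matches, but THE_REQUEST still says /any-url, so no redirect;
#   rule 2 no longer matches (final path-part contains a period), so the file is served.
```

The guard on THE_REQUEST is what breaks the loop: without it, rule 1 would redirect the internally rewritten /any-url.mma back to /any-url on every request.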

Jim

DiscoStu

12:41 am on Feb 16, 2010 (gmt 0)

10+ Year Member



OK, thanks for the advice. I will try to figure out what the deal is on his end before I suggest he do any kind of redirect. In the meantime, I'm going to go read up a little more on this stuff :P