Forum Moderators: phranque
I recently restructured my site (it was necessary), and am looking for the best solution for redirection. Two programmers advised I use redirect 301 from the old index.html file to the new index.php file. However, the site is of course generating countless 404 errors because the other old filenames were not redirected to the new filenames. Those files are now deleted from the server, so when visitors request any of the old filenames, they currently get a custom 404 page with a meta-refresh to the new index.php page.
My restructure involved creating new folders, and putting new filenames into those folders. Since there are now new folders and new filenames, would it even be feasible for me to use htaccess to redirect those non-existent OLD files to the new folders and new filenames?
I have tried using various htaccess rewrite options, but none of them will prevent the 404's.
My current htaccess has a 301 redirect only for the old index.html to the new index.php:
RedirectMatch permanent ^/index.html$ [mysite.com...]
My problem is that I need to avoid continuous 404's from requests to non-existing files. (the current meta refresh on the 404 page is only a temp solution until I get this figured out)
My biggest problem is that my old site had more than 100 pages. ALL of them have changed fileNAMES and fileEXTENSIONS (html to php). Since the rewrite options won't cover the old pages, what would my solution be?
I am thinking of resorting to redirecting every (or most of, I can tolerate a few 404's) old html to the new filename.php in htaccess, but would 100+ pages crash my server or cause a considerable load time? If I did this, would the following code be correct?
redirect 301 /index.html [mysite.com...]
redirect 301 /oldfilename1.html [mysite.com...]
redirect 301 /oldfilename2.html [mysite.com...]
etc etc
Lastly, what is the difference between:
RedirectMatch permanent ^/index.html$ [mysite.com...]
and
redirect 301 /index.html [mysite.com...]
If I had to do it this way, I would prefer the redirects to be consistent. All old to new should reflect a permanent redirect if possible.
Thanks for taking the time to read this....this forum may be my last hope :)
Smile, everything you asked is possible.
There are a few things to consider, and questions I would ask before constructing (or being able to construct) a .htaccess file for you, so they may be good to ask yourself in the process of deciding on the best answer.
1. Is there any type of consistency between the old .html file names and the new .php file names?
2. Are you using any query strings (stuff after the ?) with your new .php files?
3. If you are, have you considered rewriting old .html to new .html, and then serving the .php to the new .html?
(Complicated question and process, but may help with Search indexing if you are using query strings)
To answer your question about server load/load times, the number of pages you are asking about should not cause any huge issue. Using proper placement of rules can also help the speed issue.
(Apache moves on as soon as a rule fails, so you can actually optimize your file to perform better in most cases.)
Further, the answer depends on the structure of your old pages compared to the new... If they are consistent throughout, you may only need 1 or 2 rules to account for all possible combinations and could use a very short .htaccess file.
As far as not being able to prevent the 404, there are a bunch of variables that go into making rules work.
The correct rewrite for this:
redirect 301 /oldfilename1.html [mysite.com...]
could/would be:
RewriteRule ^oldfilename1\.html$ /newdifferentfilename.php [R=301,L]
(If you are not changing domain names, the full path to the new file is necessary, but the full canonical http:// version of the URL is not. The leading / should not be included on the left side of your rule, but it should be included on the right side if you are not using a full http string.)
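For reference, here is a minimal sketch of how that rule might sit in a root-level .htaccess. It assumes mod_rewrite is available on the host, and /category1/newpage.php is a made-up target used only for illustration:

```apache
# Hypothetical example: both filenames are placeholders, not files from the real site.
RewriteEngine On

# In .htaccess the pattern is matched against the path WITHOUT its leading slash,
# so the left side starts at "oldfilename1", not "/oldfilename1".
# The right side starts with "/" because it is a path from the site root.
RewriteRule ^oldfilename1\.html$ /category1/newpage.php [R=301,L]
```

R=301 makes the redirect permanent, and L stops processing further rules for that request.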
Some other things to consider: (I am assuming you have adequate permissions to rewrite, and have the necessary general information in your .htaccess, since one of your rules is working.)
1. Where is your .htaccess file in relation to the pages you are trying to rewrite? (Before or on the same level as the files...) If it is on the same level, make sure the left side of your equation starts from where you are now.
EG if you were rewriting this URL yoursite.com/stuff/page.html from /stuff/ your left side rule would look like:
RewriteRule ^page\.html$
2. Are you using regular expressions or specific file names?
3. If you are using regular expressions, are you passing your variables correctly? EG () creates a variable, so is every variable being created? Are they referenced in the correct order? EG (variable-1)/no-var/(variable-2)\.html /$1/$2.html would rewrite to http://www.yoursite/variable-1/variable-2.html
4. Are you remembering to escape special characters with a \ (backslash), EG the . (dot)?
5. Regardless of where your .htaccess file is, are you remembering to use the full path to the new file on the right side of the equation?
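Points 3 and 4 can be sketched as a single rule. This is only an illustration, assuming a root-level .htaccess with mod_rewrite enabled; the "no-var" segment is the placeholder from point 3:

```apache
RewriteEngine On

# Each pair of parentheses captures one path segment:
# the first pair becomes $1, the second becomes $2.
# "no-var" is matched literally and is not captured; the dot is escaped with \.
RewriteRule ^([^/]+)/no-var/([^/]+)\.html$ /$1/$2.html [R=301,L]
```

With this rule, a request for /widgets/no-var/blue.html would be redirected to /widgets/blue.html.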
I probably missed a few things, but hopefully this will help you some, or at least point you in a direction.
Justin
I'll try to answer your q's to the best of my ability:
>1. Is there any type of consistency between the old .html file names and the new .php file names?
Only the content is the same (slightly modified to include updated info and of course, a new template and headers), but filenames are completely different. Keywords and descriptions are generally the same and are in each header in the new php pages.
>2. Are you using any query strings (stuff after the ?) with your new .php files?
I'm sorry, not sure what you mean.
>3. If you are, have you considered rewriting old .html to new .html, and then serving the .php to the new .html
Well, I'm assuming that would be redundant, as I no longer have the old html files (they were deleted).
>To answer your question about server load/load times, the number of pages you are asking about should not cause any huge issue.
I am breathing a sigh of relief, thank you :)
>Further, the answer depends on your answer to the structure of your old pages, compared to the new...
They are not consistent. The old structure was, for the most part, in the root (main directory), with only a few folders for images and graphics. The new structure is now better organized (still in the root directory) with new folders and new php pages inside those folders. The layout is now: main (index.php), category1/, category2/, category3/ and category4/. I have an index.php in each of those folders, along with their individual php content pages. Each folder contains its own images.
>The correct rewrite for this:
redirect 301 /oldfilename1.html [mysite.com...]
>could/would be:
RewriteRule ^oldfilename1\.html$ /newdifferentfilename.php [R=301,L]
I would prefer not to use the rewrite rule if possible.
>1. Where is your .htaccess file in relation to the pages you are trying to rewrite? (Before or on the same level as the files...) If it is on the same level, make sure the left side of your equation starts from where you are now.
The htaccess is in the root directory.
>2. Are you using regular expressions or specific file names?
I am not familiar with regular expressions and would prefer not to use them. At present, I only have the old index.html redirected to the new index.php as noted above, using the RedirectMatch line in htaccess. If I redirect my old pages to the new pages, I would be using specific file names.
>3.
Does not apply.
>4. Are you remembering to escape special characters with a \ (backslash), EG the . (dot)?
I would only need to backslash if using regular expressions, correct? If I use redirect 301, the forward slash would apply?
>5. Regardless of where your .htaccess file is, are you remembering to use the full path to the new file on the right side of the equation?
I haven't started redirecting yet, but yes, I definitely would include the full path, since many of them will now include a folder structure.
>I probably missed a few things, but hopefully this will help you some, or at least point you in a direction.
You were extremely thorough and helpful!
Although I still have a couple q's. Can I still redirect using the example below, would the code be correct, and can I use the 'redirect' in lowercase?
Example:
redirect 301 /index.html [mysite.com...]
redirect 301 /oldfilename1.html [mysite.com...]
redirect 301 /oldfilename2.html [mysite.com...]
etc
etc
And my last q, can I change the RedirectMatch 301 permanent ... line I have currently, to reflect the above? I was under the impression it wasn't necessary to use RedirectMatch when the above example would accomplish the same thing. Also, I'm not using regular expressions. I have it there now only because someone told me to do it that way.
I am not familiar at all with redirect 301, so after consulting the Apache documentation, it appears your syntax is correct, with the exception that, even though you are not directly using regular expressions, the . (dot) is still a reserved character and therefore should be preceded with a \ to keep it from matching 'any character, except for a line break.'
Added: It appears after looking further, that using redirect in either the root or on a per-directory basis, requires the full file path on both sides of the equation. Using the example above of redirecting http://www.yoursite.com/stuff/file.html from the directory /stuff/ the correct syntax would be:
Redirect permanent /stuff/file\.html /new-stuff/file.html
Added: You may also substitute 301 or any numerical code (between 300 and 400) in the place of the word permanent.
If all of your file names and paths are exactly the same length, this will not cause an issue, because the .(dot) in the file path you are redirecting is 'any character, except for a line break', and therefore is matched by .(dot) when used with or without a preceding \, but I believe it is always good practice.
As far as what I asked about the query string...
That is when a url looks like this:
http://www.yoursite.com/file.php?variable=stuff
As shown in the example above, the 'query string' is often used to pass variables from one php file to another, or to pass a different set of variables to a single php (or other dynamic) file.
Unfortunately most search engines cannot read (or choose not to read) anything after the ?, so even though the contents of the page change based on the information after the ?, the robot only sees a single page with ever-changing content, and so indexing that page is very difficult, if it happens at all.
The question I asked about the double rewrite has everything to do with the preceding paragraphs, and though it appears redundant, if you need to remove information from after the ?, it may be essential.
The way it would work is:
A request for the original (old) .html page is made.
Apache grabs that request and redirects it to a new .html page.
Apache then grabs the request for the new .html page and rather than redirecting, or making any visible changes, serves the information from the .php file via the new .html URL that was requested. (This type of redirect is often referred to as a silent redirect, because only you and your server know the information is actually coming from a .php page, not the .html page that is in the browser address bar.)
This type of double rewrite/redirect is only useful if you need to remove a 'query string' from a url, while keeping backlinks intact.
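The three steps above could be sketched roughly as follows, assuming mod_rewrite in a root-level .htaccess. The names old-page.html, new-page.html, page.php and the id=42 parameter are all hypothetical placeholders:

```apache
RewriteEngine On

# Step 1: visible (external) redirect from the old .html URL to the new .html URL.
# The browser address bar changes and search engines see a permanent move.
RewriteRule ^old-page\.html$ /new-page.html [R=301,L]

# Step 2: silent (internal) rewrite. No R flag, so the address bar keeps
# showing new-page.html while the content actually comes from the php script.
RewriteRule ^new-page\.html$ /page.php?id=42 [L]
```

The only difference between the two rules is the R=301 flag: with it Apache tells the browser to request a new URL, without it the substitution happens entirely inside the server.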
Sorry I cannot be of more help with the redirect portion, it is not anything I have ever had a need to use.
(Actually I would like to know if there is a time when it allows greater flexibility or other advantages over a full RewriteRule... maybe you can help me out with this one?)
Justin
[edited by: jd01 at 6:00 am (utc) on April 22, 2005]
No, I'm not using variables with the php pages. So, regular expressions are probably not useful for my needs (well, at least until I test my redirects!)
I think I will try the simpler redirect code first and test a few individuals first, will see how it goes...If I'm successful, I will post an update :)
Ughhh.
>Added: It appears after looking further, that using redirect in either the root or on a per-directory basis, requires the full file path on both sides of the equation. Using the example above of redirecting http://www.yoursite.com/stuff/file.html from the directory /stuff/ the correct syntax would be:
Redirect permanent /stuff/file\.html /new-stuff/file.html
Still not sure I understand, it's somewhat confusing to me. My old files never were in separate folders, they were all in the main directory. Are you saying I can use the following?:
redirect 301 /index\.html [mysite.com...]
redirect 301 /oldfilename\.html [mysite...]
redirect 301 /oldfilename2\.html [mysite.com...]
>If all of your file names and paths are exactly the same length, this will not cause an issue, because the .(dot) in the file path you are redirecting is 'any character, except for a line break', and therefore is matched by .(dot) when used with or without a preceding \, but I believe it is always good practice.
Line break, as in carriage return? I do have some old filenames that are longer than allows on one line. Otherwise, I am not sure what you mean.
Renamed the htaccess, reverted it back to how it was and renamed it again. Apparently, my syntax is totally off. I don't know if it's because of carriage returns for the longer files, or if there is a case-sensitive issue (only with the word "redirect") or what.
Justin, I would be grateful if you could write me a line or two example for my htaccess, (hopefully using my preference for redirect 301), using full pathfiles from my examples above. Do you think the long pathfiles were causing my 500 errors?
(Thank you)
Like I said before, this is not something I use, but I will make an attempt...
I would try each of these examples separately to try to get the syntax correct, and not take your site down in the process (hopefully):
Redirect permanent ^oldfilename\.html$ [mysite...]
Redirect permanent ^/oldfilename\.html$ [mysite...]
RedirectMatch permanent ^oldfilename\.html$ [mysite...]
RedirectMatch permanent ^/oldfilename\.html$ [mysite...]
Other than this I don't know where to go, except to make sure that the module mod_alias is loaded on your server, that you are running a version of Apache that supports status arguments (EG permanent, 301, etc.), and that any 'initiation' that must be included in your .htaccess file is present.
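One way to test the mod_alias question without risking a 500 error is to wrap a single redirect in an IfModule block. This is only a sketch; the filenames and www.example.com are placeholders, and it assumes the host allows this kind of directive in .htaccess (AllowOverride FileInfo):

```apache
# If mod_alias is not loaded, Apache skips this block silently
# instead of failing on an unknown directive.
<IfModule mod_alias.c>
    Redirect 301 /oldfilename.html http://www.example.com/category1/newfilename.php
</IfModule>
```

If the old URL still returns a 404 with this in place, that suggests mod_alias is not available or .htaccess overrides are restricted on the host.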
Sorry, but this is not a module I know or have any experience with.
While reading I did find that mod_rewrite is a "more powerful and *flexible* version" of what you are attempting to use.
If you decide to switch to a rewrite, I would be happy to help you.
Justin
The Redirect directive does not support regular expressions or regular-expressions notation.
This code:
Redirect 301 /index.html http://www.example.com/index.php
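Extending that to the layout described earlier in the thread, a list of one-to-one redirects might look like the sketch below. All of the old and new filenames are hypothetical, and www.example.com stands in for the real domain:

```apache
# Redirect takes a plain URL-path: no ^, no $, and no backslash before the dot.
# Each directive stays on a single line, however long the paths are.
Redirect 301 /oldfilename1.html http://www.example.com/category1/newfilename1.php
Redirect 301 /oldfilename2.html http://www.example.com/category2/newfilename2.php

# RedirectMatch is the variant that takes a regular expression,
# so anchors and escaped dots belong here and only here.
RedirectMatch 301 ^/index\.html$ http://www.example.com/index.php
```

Directive names are case-insensitive in Apache, so "redirect 301" and "Redirect 301" behave the same, and "301" and "permanent" are interchangeable as the status argument. The old files do not need to exist for any of these lines to work.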
Jim
Your latest responses prompted me to review my server's information, and lo and behold, cpanel has an option for "Manage Redirects". Why on earth I never thought to look there is beyond me. Why on earth my server's support team never directed me there is also beyond me.
Thank you both for taking the time to help and for the valuable advice, this forum is awesome. If nothing else, maybe this thread can be helpful for others who are finding themselves in similar situations.
Again, thanks!
I've been rehashing this with my server support for well over two days now. Apparently, everyone has a different opinion, and most of the tech support people don't even have experience with this issue. I don't know who is right.
Is it true that once you delete old files (i.e., an html file), if they don't exist, you cannot redirect those filenames? If that were the case, then why was I able to redirect my index page? The original html file for my index page was deleted as well, just replaced with php.
I attempted to use cpanel's redirect options, but it threw 500 errors again, and moments later, a 403 Forbidden where I couldn't access the site, forums, or my cpanel. Apparently, support fixed the problem, but they didn't tell me how.
I have removed the redirects I tried to use in cpanel. One of the "higher-ups" at support told me there is nothing I can do because my old html files were deleted. He told me the *only* solution was to do as I am doing now, a meta-refresh in my 404 page and put up with constant 404's.
If I understand correctly, I will probably be dropped from google and other search engines. I reset my 404 meta-refresh to 0 to immediately take visitors to my index page, but this is only a bandaid.
Was the support person right? Am I doomed?
As a matter of fact, the '.html page' you are reading right now does not exist. Requests for this page are redirected to a script which builds the page from a database.
It sounds like your cpanel and server configuration are conflicting with what you want to do. You might want to shop around for a more usable host with more-experienced support (we don't do hosting reviews here, though).
Look for a hosting service that offers you a unique IP address, and *no control panel*, unless you add one to re-sell your space later. [opinion]Control panels are used to make the administration of simple sites easy for non-technical people. They are also intended to keep the hosting company 'safe' from having to spend a lot of time on security and support issues. They are not good if you need a lot of control and flexibility in administrating your site.[/opinion]
The next time you decide to change your site's technology, just tell the server to parse html for php using AddHandler [httpd.apache.org], name any new php pages as either ".php" or ".html", and the server will handle them just fine.
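As a rough sketch of that approach: the handler name below is an assumption that is common on mod_php setups, but it differs between hosts (some use a versioned or host-specific name), so it should be confirmed with the host before use:

```apache
# Assumption: "application/x-httpd-php" is the PHP handler name on this server.
# This asks Apache to run .html files through PHP as well as .php files,
# so existing .html URLs can stay unchanged while the pages use PHP.
AddHandler application/x-httpd-php .html .php
```

With something like this in place, the old .html filenames could have been kept and no redirects would have been needed.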
One of the things that people do all too often is to change URLs. There are very few valid reasons to do it, and switching to php is not one of them. Only after going through the grief you're experiencing, and often losing their search engine ranking, do people see this idea as anything other than naive "Web purism," but there are very good practical reasons that "Cool URLs never change [webmasterworld.com]". (Thanks to pageoneresults for the link.)
If you are on a server setup where your URLs start with example.com/~username, you may have to specify that /~username part in the redirect code:
Redirect 301 /~username/index.html http://www.example.com/index.php
If it's any consolation, I find these 'conflict with host configuration' problems frustrating, too. Sorry for the rant above.
Jim
I find it *very* interesting that when I tried using the cpanel's url redirect, once I removed the ones I tested, they *are* still redirecting properly. Don't know if this is an update problem on the server or what, but why would it continue to redirect properly if it was removed from cpanel's url redirect?
I'm beginning to think it's a server problem, not anything I've done incorrectly. Unfortunately, tech support says otherwise. It doesn't help when one tech says do this, and another says no, you can't do that. They also have not answered me regarding my inquiry if the server is mod-alias enabled.
In answer to your inquiry why I don't get a new server: This one was chosen on the basis of what I needed, it's a basic package, but does provide everything I need. It was chosen also based on good reviews, and after searching for months for a good server, this one was chosen, actually, by a programmer friend who I considered experienced.
I can't afford to be looking for another server, nor worry about the hassle of another move.
My restructure was done because I felt there was a true need for it. The files were a mess, and someone more experienced than me had done some additional things that were eventually making matters worse. I am still in the process of trying to get rid of the latter. So, while I agree that restructures should be well thought out beforehand, with preparation and expectations considered, I also think they are necessary in some cases.
UGLY is all I can say. Obviously your 'tech' is wrong, because as my 'sticky' outlined, Apache gladly serves millions, if not billions of pages that do not exist every day. This is the only way database driven sites like this one are possible.
There are many reasons for serving 'non-existing' pages. A couple of them are:
1. Search engines generally ignore any information after the ? in a URL, meaning if this forum did not appear to be .html it would not be indexed, because in reality it is probably fewer than 10 real .php files.
2. Sites with thousands of pages (50,000 and up) would require so much hard disk space and time to create that it would not be possible. Simple math for the creation of a 50,000 page '.html' site: pretend you could build a page completely, including title, tags, description, content, etc., in 2 hours. In a normal work day of 8 hours you could build 4 pages. 50,000 / 4 = 12,500 days, or about 34.2 YEARS, to create the site.
(I have a 700+ page site indexed, it took me 3 months to fully complete 80 pages of html. The 6 php pages that appear to be html took less than a week, and account for the other 620+ indexed pages on my site.)
I am not sure 'tech' is appropriate for the person who told you otherwise.
* If any of your rewrites/redirects work, as appears to be the case from your previous posts, it stands to reason there are only two options:
The first is your server setting only allows redirects/rewrites from the httpd.conf file, which is usually where 'control panel' redirects are placed. (jdMorgan or someone more familiar with server settings can let you know of this possibility better than I can)
The second is there is something wrong with the code you are using, which could, among other things, be that mod_alias, which you are attempting to use is not loaded in your server configuration, and the redirects that are happening via the 'control panel' settings are actually mod_rewrites.
(The second option actually makes sense to me, because they appear to be *basically*, a redundant package, so why load them both? The Apache documentation states clearly:
"A more powerful and flexible set of directives for manipulating URLs is contained in the mod_rewrite module.")
In short logic says if *any* redirects are happening... There is the ability to redirect.
Obviously, as I did early this morning (about 4), I may have misread the manual or situation again. So, I will sign off of this thread. Should the situation change and you need help with something I am more familiar with I will be glad to give you a hand.
Hope you get things going.
I know this is not a solution, but a logical assessment says, the solution you are looking for is possible.
Keep looking and I am sure you will find it.
Justin