
Apache Web Server Forum

    
Remove the folder's name from the start of the file name
This is deceptively simple; I thought it would be a piece of cake.
nigelt74




msg:4365697
 3:56 am on Sep 22, 2011 (gmt 0)

OK, this has me stumped. I haven't had to use redirects for a while, and when I have, they have been fairly simple.

The existing URLs are in this form:

/a/b/c/d/e/f/cooked/cooked-haddock.html
/a/b/c/d/e/f/cooked/cooked-Hedgehog-in-sauce.html
/a/b/c/d/e/f/deep_fried/deep_fried-badger.html
/a/b/c/d/e/f/1-2_3grt/1-2_3grt-jhdfjkfhsk-sffksh.html

and they need to be rewritten to

/j/k/cooked/haddock.html
/j/k/cooked/Hedgehog-in-sauce.html
/j/k/deep_fried/badger.html
/j/k/1-2_3grt/jhdfjkfhsk-sffksh.html

Basically, I can't work out how to match and remove the folder name from the filename; the filename always starts with the folder name.

Edit: I should add that the .htaccess file is at the same level as the directories to be redirected, so:

/a/b/c/d/e/f/.htaccess

 

lucy24




msg:4365732
 6:17 am on Sep 22, 2011 (gmt 0)

Is the nesting always the same or did you just make it like that for your examples? If it's really that simple, all you need is a redirect from

/a/b/c/d/e/f/([^/]+)/[^-]+-([^.]+\.html)

to

http://www.example.com/j/k/$1/$2

And then come back and explain whether you meant Redirect or Rewrite, since you used both terms. Probably Redirect. But you will want to use mod_rewrite unless there are special circumstances.

And leave those hedgehogs alone. They're endangered.

Edit after closer inspection:
Oh, ###. Do you really have names that contain - (hyphen) within the part that's duplicated? ###. That makes it much more complicated. In non-htaccess circumstances you'd be looking at a ({blahblah})/\1 pattern, but this gets us into RegEx dialectal variation.

Someone will investigate.
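The `({blahblah})/\1` idea can be tried outside Apache first. The sketch below uses Python's `re` module, assumes paths relative to the .htaccess directory (as in the examples above), and uses a made-up helper name, `strip_folder_prefix`. Because `\1` must repeat the captured folder name exactly, hyphens inside the folder name (as in `1-2_3grt`) stop being a problem.

```python
import re

# Group 1 captures the folder name; \1 then requires the filename to begin
# with exactly that text plus a hyphen. Group 2 is the rest of the filename.
OLD_URL = re.compile(r"^([^/]+)/\1-([^/]+\.html)$")


def strip_folder_prefix(path):
    """Map 'folder/folder-rest.html' to '/j/k/folder/rest.html', else None."""
    match = OLD_URL.match(path)
    if match is None:
        return None
    return "/j/k/%s/%s" % match.groups()


if __name__ == "__main__":
    for old in ("cooked/cooked-Hedgehog-in-sauce.html",
                "deep_fried/deep_fried-badger.html",
                "1-2_3grt/1-2_3grt-jhdfjkfhsk-sffksh.html"):
        print(old, "->", strip_folder_prefix(old))
```

Whether the same back-reference behaves identically inside an Apache directive depends on the regex engine in use, so it is worth testing on a staging copy before relying on it.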

g1smd




msg:4365736
 6:31 am on Sep 22, 2011 (gmt 0)

The question is unclear.

If you really meant "rewrite" then one of those is a URL and the other is an internal folder path.

If you meant "redirect" both of those are URLs.

Please rephrase the question using example.com for the URLs.

"When user requests URL example.com/x I want to silently internally rewrite the request to fetch content from the internal server path at /y".

"When user requests URL example.com/y I want to externally redirect request to a different URL, example.com/x".

I suspect you have misunderstood how a rewrite works. Most people have it exactly backwards. A rewrite does not make new URLs. A rewrite accepts a URL request after a link is clicked and fetches the content from a place inside the server that is different to that suggested by the path part of the requested URL. URLs and files are not at all the same thing. They are related merely by the actions of a server. URLs are used "out there" on the web. Files are used "here" inside the server. A rewrite alters the default relationship of which file is served when a URL is requested.
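To make the distinction concrete, here is a toy Python model (not how Apache is implemented) of the two outcomes described above. The paths and the `handle` function are purely illustrative.

```python
# Toy model only: which URL paths are rewritten or redirected is made up.
REWRITES = {"/x": "/content/y.html"}             # URL path -> internal file path
REDIRECTS = {"/old-x": "http://example.com/x"}   # URL path -> new public URL


def handle(url_path):
    """Return (status, headers, internal_path_served) for a request."""
    if url_path in REDIRECTS:
        # External redirect: the browser is told to go and request another URL.
        return 301, {"Location": REDIRECTS[url_path]}, None
    if url_path in REWRITES:
        # Internal rewrite: the address bar keeps /x, but other content is served.
        return 200, {}, REWRITES[url_path]
    # Default relationship: the URL path maps straight onto the same file path.
    return 200, {}, url_path
```

Only the redirect branch changes what the visitor (or a search engine) sees as the URL; the rewrite branch changes only which file answers it.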

[edited by: g1smd at 7:03 am (utc) on Sep 22, 2011]

nigelt74




msg:4365743
 7:03 am on Sep 22, 2011 (gmt 0)

@Lucy - yep, that's my problem in a nutshell. I thought it would be dead simple, but some of the actual folder names contain hyphens. I can make specific rules for them, as they only make up 10% of cases, but I thought I could get away with an all-in-one solution.

@g1smd
OK, um,
I am fairly sure it is a redirect I want.

These files no longer exist:
/deep_fried/deep_fried-badger.html
/1-2_3grt/1-2_3grt-jhdfjkfhsk-sffksh.html

They have been replaced with:

/deep_fried/badger.html
/1-2_3grt/jhdfjkfhsk-sffksh.html

The nesting I had previously is unimportant; it was mainly there to show that (in addition to being renamed) they had been moved to a different section of the same website.

So yes:
"When user requests URL example.com/y I want to externally redirect request to a different URL, example.com/x".

g1smd




msg:4365749
 7:21 am on Sep 22, 2011 (gmt 0)

A simple RewriteRule with domain name included in the rule target and the [R=301,L] flags is what you require.

You need to make a list of all of the old URL formats:
example.com/products/<letters>/<letters or numbers>/<letters>-<5 digit number>/
and the corresponding new URL format for each.

Only once you have that detailed list can you begin to think about coding. URLs with hyphens are not a problem. You can use the hyphen as a delimiter in your pattern matching or just include it in the allowed characters and copy it through into the new URL.

You'll likely end up with several rules, one for each format. Each rule will in itself be responsible for redirecting dozens or hundreds of URLs. The rules will need to be sorted from "most specific" to "most general".
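A rough Python sketch of why the ordering matters, assuming a first-match-wins list (similar in spirit to each rule carrying [L]). The URL formats here are invented placeholders, not the poster's real URLs.

```python
import re

# Hypothetical rules, most specific first. The first matching rule wins,
# much as [L] ends the current pass through the RewriteRules.
RULES = [
    (re.compile(r"^products/special-offer\.html$"), "http://example.com/offers/"),
    (re.compile(r"^products/([a-z]+)/([a-z0-9]+)\.html$"),
     r"http://example.com/shop/\1/\2.html"),
    (re.compile(r"^products/.*$"), "http://example.com/shop/"),
]


def redirect_target(path):
    """Return the redirect URL for path, or None if no rule matches."""
    for pattern, template in RULES:
        match = pattern.match(path)
        if match:
            return match.expand(template)
    return None
```

If the catch-all `^products/.*$` rule were moved to the top, `products/special-offer.html` would be swallowed by it and never reach its dedicated rule.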

This question, or one very like it, is asked 1 to 3 times per day in this forum, so there are more than 5000 prior examples of the correct format for a RewriteRule configured to produce a redirect.
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]

nigelt74




msg:4365758
 8:07 am on Sep 22, 2011 (gmt 0)

OK, I get where you are coming from, but the issue is not hyphens or patterns (as there aren't any specific patterns). Redirecting the folder is dead easy, but from what I am reading I am going to have to create a rule for every single file.

These are the old files and folders that need to be changed. You cannot make a rule to detect a pattern in the filename without knowing what the folder name is, because the folder name is the pattern.

I know I am explaining this really badly, and I know you're sitting there interpreting it very differently to the way I mean it.

Example number 1:

/a-b/a-b-toasters.html
/a-b/a-b-bread-maker.html
/bob_jones/bob_jones-hats.html

would be changed to:

/a-b/toasters.html
/a-b/bread-maker.html
/bob_jones/hats.html


As you can see, there is no way to create a detailed list that will work without taking the folder name into account, as the filename can be anything and have any number of hyphens in it (most have one). The only real pattern is that every filename starts with the name of the folder it is in; e.g. in the folder bob, all filenames begin with bob-.

Does that make it clearer, or am I missing something big?

g1smd




msg:4365993
 7:00 pm on Sep 22, 2011 (gmt 0)

You might be able to have one rule per folder, if for /a-b/ it is a-b- that is stripped from the next part.

It will become clearer when you make a list of all the mappings.

The alternative is to rewrite (that's rewrite, not redirect) all of those requests to a special PHP script that uses the old URL request to look up the new URL and then sends the 301 header and the Location header back to the browser. You'll likely store the old and new URL pairs in an array.
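One way to produce that list of mappings is to generate it from the existing file paths and set aside anything that does not follow the folder-prefix convention for hand-written rules. This is a Python sketch; `build_mappings` and the `/new` base path are hypothetical.

```python
import posixpath


def build_mappings(old_paths, new_base="/new"):
    """Return (pairs, exceptions): automatic old->new pairs, plus paths
    that do not follow the folder-prefix convention and need manual rules."""
    pairs, exceptions = [], []
    for old in old_paths:
        directory, filename = posixpath.split(old.strip("/"))
        folder = posixpath.basename(directory)   # innermost folder only
        prefix = folder + "-"
        if folder and filename.startswith(prefix):
            new = "%s/%s/%s" % (new_base, folder, filename[len(prefix):])
            pairs.append((old, new))
        else:
            exceptions.append(old)
    return pairs, exceptions
```

The `pairs` list is the array of old and new URLs; the `exceptions` list shows at a glance how many cases still need individual attention.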

lucy24




msg:4366025
 8:23 pm on Sep 22, 2011 (gmt 0)

How many different folders are you stripping? That is, the ones you listed originally as "cooked", "deep_fried" etc. Can you get your Conditions down to this pattern? (This is assuming you're not allowed to match within the same Cond, which would make it much easier.)

RewriteCond %{REQUEST_URI} ^(?:(?:[^/.]+/){n})([^/.]+)/[^/.]+\.html
RewriteRule %1/%1(-[^/.]+\.html) http://www.example.com/j/k/%1$1 [R=301,L]

The part I showed as {n} has to be replaced with some specific number determined by the directory you're redirecting. If it is not safe to use ?: "no-capture" in .htaccess, then replace %1 with %3.

This translates as:
Condition: The requested address is a bunch of folders you don't care about, followed by one folder you do care about, winding up with a page that you also care about but you won't capture it here.
Rule: If you have two consecutive occurrences of the same thing (to be established in the Condition), first as a directory name and then as the beginning of a filename, then rewrite, throwing away the directory part. If you are allowed to put %1 inside the parentheses, it can become part of the $1.

Beware of \ for / typos in the above. I keep finding one more.
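For comparison, here is the "match within the same pattern" version, sketched with Python's `re`. As far as I know, mod_rewrite does not expand `%1` inside a RewriteRule pattern (only in the substitution and in the test string of later RewriteCond lines), so in practice the comparison has to happen inside a single regex. Here `n` is the number of leading directories to skip, and the function name `new_url` is hypothetical.

```python
import re


def new_url(path, n):
    """Redirect target for a path with n leading directories, else None."""
    pattern = (r"^(?:[^/.]+/){" + str(n) + r"}"   # skip n directories
               r"([^/.]+)/"                         # group 1: the folder
               r"\1-([^/.]+\.html)$")               # same name + '-' + rest
    match = re.match(pattern, path)
    if match is None:
        return None
    return "http://www.example.com/j/k/%s/%s" % match.groups()
```

For the original examples, `n` is 6 (the `a/b/c/d/e/f/` part).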


Dammit, Forums, when I said "Disable smileys" I meant keep them disabled!

nigelt74




msg:4366035
 8:45 pm on Sep 22, 2011 (gmt 0)

Ok i'm back

There are only around 20-30 folders in this section, so yes, I can create a rule for each folder, and I will probably do that while I look at the other options you both mentioned.

The main reason I was asking is that there are 4 or 5 more sections coming up that will need the same treatment, and these new sections have 200-300 folders each, so calling a PHP script might be a very, very good option.

OK, calling the PHP script: that's the RewriteMap function, isn't it? I'll have to have a good read on it.

Thank you

g1smd




msg:4366061
 9:37 pm on Sep 22, 2011 (gmt 0)

calling the php script thats the rewritemap function isn't it

No.

RewriteRule ^some-pattern /special-script.php?var=some-params [L]
where pattern matches specific URL requests, and the PHP script looks up the new URL for this request, and the PHP script sends the 301 and location HTTP headers.
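The script side of that arrangement can be sketched as a minimal Python WSGI app rather than PHP. It assumes the rule passes the old path in `var` (as in the example rule above) and uses a hypothetical one-entry `URL_MAP`; unknown paths get a 404 rather than an empty 200.

```python
from urllib.parse import parse_qs

# Hypothetical mapping; a real script would hold the full old -> new list.
URL_MAP = {"noodle/noodle-7.html": "http://example.com/new/noodle/7.html"}


def redirect_app(environ, start_response):
    """Read the old path from ?var=..., then answer with a 301 or a 404."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    old_path = params.get("var", [""])[0]
    new_url = URL_MAP.get(old_path)
    if new_url is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"Not Found"]
    # The status line and the Location header are what make this a redirect.
    start_response("301 Moved Permanently", [("Location", new_url)])
    return [b""]
```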

nigelt74




msg:4366104
 2:01 am on Sep 23, 2011 (gmt 0)

OK, here goes. It seems to work, but I have two questions.

My .htaccess:

RewriteEngine On
RewriteRule ^([^/.]+/)(.*) fixmyurl.php?args=$1&bob=$2 [R=301,L,QSA]


My fixmyurl.php:

<?php
$folder_name = str_replace('/','',$_REQUEST['args']);
$new_file_name = str_replace($folder_name.'-','',$_REQUEST['bob']);
$new_url = 'http://example.com/b/'.$folder_name.'/'.$new_file_name;

header("Location: $new_url");
?>


Question 1)
Is the [R=301,L,QSA] correct? From everything I have seen I believe it is, BUT I have a nagging feeling something's hinky. I just want to be sure that it is telling everyone that the old URL has been replaced by the new URL and NOT by the fixmyurl.php file. Make sense?

Question 2)
This one is embarrassing, but I can't seem to get the folder name without the / after it. It's not a massive issue, as I can use a str_replace in PHP to clean it up, but I know I should be able to do it in the .htaccess.
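The trailing slash ends up in `$1` because it sits inside the first group. Moving it outside the parentheses fixes that; in the .htaccess line that means `^([^/.]+)/(.*)` instead of `^([^/.]+/)(.*)`. A quick check of both patterns with Python's `re` (the regex syntax involved is the same):

```python
import re

AS_POSTED = re.compile(r"^([^/.]+/)(.*)")   # slash inside group 1
SLASH_OUT = re.compile(r"^([^/.]+)/(.*)")   # slash moved outside group 1

path = "noodle/noodle-7.html"
print(AS_POSTED.match(path).groups())   # ('noodle/', 'noodle-7.html')
print(SLASH_OUT.match(path).groups())   # ('noodle', 'noodle-7.html')
```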

nigelt74




msg:4366124
 3:16 am on Sep 23, 2011 (gmt 0)

OK, now that's odd. I just came back after being out for an hour and the rule isn't working.

Apparently it can't find the fixmyurl.php file.
------------------------------------
And now it's working again.

[edited by: nigelt74 at 3:22 am (utc) on Sep 23, 2011]

lucy24




msg:4366129
 3:21 am on Sep 23, 2011 (gmt 0)

apparently it can't find the fixmyurl.php file

If you left off the leading slash in the redirect, it doesn't know where to look. The leading slash means it's at the top level of your domain. If it's somewhere else, then of course you need to spell it out.

Speaking of which: Yes, you leave off the [R=301]. You're not redirecting, you're just taking a furtive detour to a php script. This script in turn will do the redirecting; there is no reason for the user to know you've been there.

QSA is correct if there is other stuff in the query that you need to keep. But if there is no pre-existing query, then you don't need it. It won't do any harm in this situation, but it isn't necessary.

nigelt74




msg:4366136
 3:44 am on Sep 23, 2011 (gmt 0)

OK, I think I have it now.

Does this look right? It certainly works.

.htaccess


RewriteEngine On
RewriteRule ^([^/.]+/)(.*) fixmyurl.php?args=$1&bob=$2


fixmyurl.php


<?php
$folder_name = str_replace('/','',$_REQUEST['args']);
$new_file_name = str_replace($folder_name.'-','',$_REQUEST['bob']);
$new_url = 'http://example.com/b/'.$folder_name.'/'.$new_file_name;

header("Location: $new_url",true,301);
?>


Edit: somehow I had appended .php to the end of the new URL.

[edited by: nigelt74 at 4:08 am (utc) on Sep 23, 2011]
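One thing worth checking in the script: PHP's `str_replace` removes every occurrence of `folder-`, not just the leading one. The Python sketch below (hypothetical function names) compares that behaviour with a prefix-only strip; in PHP the equivalent would test `strpos($name, $prefix) === 0` and then use `substr`.

```python
def str_replace_style(folder, filename):
    """Like PHP str_replace: removes EVERY occurrence of 'folder-'."""
    return filename.replace(folder + "-", "")


def strip_leading_prefix(folder, filename):
    """Removes 'folder-' only when the filename actually starts with it."""
    prefix = folder + "-"
    if filename.startswith(prefix):
        return filename[len(prefix):]
    return filename


print(str_replace_style("cooked", "cooked-twice-cooked-pork.html"))     # twice-pork.html
print(strip_leading_prefix("cooked", "cooked-twice-cooked-pork.html"))  # twice-cooked-pork.html
```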

lucy24




msg:4366140
 3:59 am on Sep 23, 2011 (gmt 0)

Always put an [L] at the end of each Rule, unless you have a clear and specific reason not to do so. (Or if you've got [F], which renders all other flags superfluous.)

And, er, I was talking through my hat when I said all that about the leading slash. Your location has already been determined by the ^ in the Rule, so the Rewrite will carry on from the same location. Oops.

I don't speak php. But unlike htaccess, php errors don't usually result in your whole site crashing to a halt, so it is safe to experiment.

g1smd




msg:4366194
 6:42 am on Sep 23, 2011 (gmt 0)

Compare your code with the instructions in post #4365993...

rewrite (that's rewrite not redirect) all of those requests to a special PHP script

and the code in post #4366061...

RewriteRule ^some-pattern /special-script.php?var=some-params [L]

Note the instruction to use a rewrite, the leading slash for the target and the [L] flag.

The R flag is NOT correct as it changes the ruleset to be a redirect.

nigelt74




msg:4367148
 11:09 pm on Sep 25, 2011 (gmt 0)

Thanks guys, I do have the [L] flag in there now.
As regards using the [R=301], I get it now. It just seemed wrong when I was first doing it, but the more I read, the more it makes sense.

g1smd




msg:4367162
 12:10 am on Sep 26, 2011 (gmt 0)

Each RewriteRule can be configured either as a redirect or as a rewrite. That confuses a lot of people.

lucy24




msg:4367185
 1:00 am on Sep 26, 2011 (gmt 0)

...especially when the Rewrite contains a Redirect within itself. You're not going off to the php file and staying there, it's simply a detour on the way to your real destination-- which will be determined by that same php file.

"Can I borrow your car to go to the liquor store?"
###, no.
"Can I borrow your car to go to the bank?"
Sure, here are the keys.

If you choose to stop by the liquor store on the way to the bank-- so you'll know just how much replenishing your wallet needs-- that is nobody's business but your own.

This may not be the single best analogy I have ever come up with.

nigelt74




msg:4367190
 1:25 am on Sep 26, 2011 (gmt 0)

OK, I am back and stumped.
Found one minor issue: the rule works fine, but I can't work out how to redirect the HTML files that are in the same directory as my .htaccess.

Everything I try either doesn't work or breaks the existing rule.

These are the contents of the relevant folder:

.htaccess
/index.html - doesn't redirect
/ruhg.html - doesn't redirect
/noodle/index.html - does redirect
/noodle/noodle-7.html - does redirect
/muddle/index.html - does redirect
/muddle/muddle-7.html - does redirect

Please just tell me if my logic here is correct.
I think I need a different rule for the files at the same level as the .htaccess, as they are redirected to a higher level than the ones in the subfolders. I also believe I need to create a RewriteCond that excludes the subfolders so the two rules work in harmony.
Is that correct?

lucy24




msg:4367203
 1:43 am on Sep 26, 2011 (gmt 0)

RewriteRule ^([^/.]+/)(.*) fixmyurl.php?args=$1&bob=$2

There's your culprit. Your RewriteRule mandates exactly one directory before the filename: the [^/.]+/ part. It's not caused by being in the same directory as the htaccess, except in the coincidental sense that they are all top-level files.

Can you put a ? at the end of the first grouping, and tweak the php file to deal with a possible null value for "args"? Or did I miss the reason why you needed a directory name in the first place?

nigelt74




msg:4367238
 5:12 am on Sep 26, 2011 (gmt 0)

Believe me, I have tried putting a ? in the rule as below (tweaking a PHP file is simple), which I thought was correct, but that just broke the redirect and somehow sent the fixmyurl page as one of its own parameters. The folder name is important for the files within a subfolder, though.

RewriteRule ^([^/.]+/)?(.*) fixmyurl.php?args=$1&bob=$2 [L]

I have also tried placing it in every other conceivable position, but couldn't get it to work; that's when I decided to use a rewrite condition.

Unfortunately, the rewrite condition didn't work any better, as it repeatedly broke the redirect. After 4 or 5 hours of staring at it, I think I'm ready to chuck it in till tomorrow.

lucy24




msg:4367245
 6:33 am on Sep 26, 2011 (gmt 0)

Oh, wait. You need to say something like

RewriteCond %{REQUEST_URI} !fixmyurl\.php

so it doesn't go around in circles.
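Here is a simplified Python model of why the optional `?` caused the loop. It assumes, as mod_rewrite does in .htaccess context, that a rewritten request is run through the ruleset again; the helper names and pass limit are made up, and query strings are simplified. Without the `?`, `fixmyurl.php` cannot match `^([^/.]+/)`, which is why the earlier rule did not loop.

```python
import re

# The rule with the folder group made optional.
RULE = re.compile(r"^([^/.]+/)?(.*)")


def rewrite_once(path, exclude_script):
    """One pass over the simplified ruleset; None means nothing was rewritten."""
    # Stand-in for: RewriteCond %{REQUEST_URI} !fixmyurl\.php
    if exclude_script and re.search(r"fixmyurl\.php", path):
        return None
    match = RULE.match(path)
    if match is None:
        return None
    return "fixmyurl.php?args=%s&bob=%s" % (match.group(1) or "", match.group(2))


def simulate(path, exclude_script, max_passes=5):
    """Re-run the ruleset on each rewritten path, recording every result."""
    history = []
    current = path
    for _ in range(max_passes):
        result = rewrite_once(current, exclude_script)
        if result is None:
            break
        history.append(result)
        new_path = result.split("?", 1)[0]
        if new_path == current:
            break  # simplification: stop once the path no longer changes
        current = new_path
    return history


print(simulate("ruhg.html", exclude_script=False))
print(simulate("ruhg.html", exclude_script=True))
```

Without the condition, the second pass rewrites the script's own name into `bob`, which matches the "sent the fixmyurl page as one of its own parameters" symptom; with it, the first rewrite is the last.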

g1smd




msg:4367256
 8:03 am on Sep 26, 2011 (gmt 0)

Someone else asked a similar sort of question around the same time you did, so the stuff in [webmasterworld.com...] might be useful, especially post #4367158 showing various patterns.

nigelt74




msg:4367565
 1:33 am on Sep 27, 2011 (gmt 0)

Cheers for that, I had been watching that thread.

I finally have it working. A major part of my problem turned out to be browser caching, as my code started working after I cleared my cache manually.

So here is my corrected code:

.htaccess


RewriteEngine On
RewriteCond %{REQUEST_URI} !fixit\.php
RewriteRule ^([^/.]+/)?(.*) fixit.php?args=$1&bob=$2 [L]


And here is my fixit.php:

<?php
if (($_REQUEST['args'])=='') {
$new_url = 'http://example.com/new/';
} else {
$folder_name = str_replace('/','',$_REQUEST['args']);
$new_file_name = str_replace($folder_name.'-','',$_REQUEST['bob']);
$new_url = 'http://example.com/new/'.$folder_name.'/'.$new_file_name;
}
header("Location: $new_url",true,301);
?>


Just want to say thank you very much, guys. It has been really helpful, and I am hoping some of this sinks in.

g1smd




msg:4367649
 6:48 am on Sep 27, 2011 (gmt 0)

Your rule may or may not capture the first folder level from the requested URL into $1. I'd expect the ruleset to randomly fail for some valid requests.

Additionally, requests for robots.txt are rewritten to the script. I assume that's not correct.

The fixit script looks way too simple compared to the URL examples in the original post.

nigelt74




msg:4368510
 8:27 pm on Sep 28, 2011 (gmt 0)

It seems to catch everything I have thrown at it, and I have tried dozens of different URLs and filenames trying to trip it up.
As to robots.txt, this whole section is within a subfolder, so there are no worries there. In fact, there should only be .html files in these folders; any other files in there are either out of date or invalid.

So cheers. I will definitely remember robots.txt in future, as it hadn't even crossed my mind, and it's something I will need to keep an eye on for any rewrites in the root.

g1smd




msg:4368574
 12:08 am on Sep 29, 2011 (gmt 0)

What happens if I request whatever.png in that folder? Or an HTML page that doesn't exist?

Does the PHP script return "404 Not Found", or does it return a blank page/empty file served as "200 OK"?

The "200 OK" status will kill your site indexing.

lucy24




msg:4368626
 4:39 am on Sep 29, 2011 (gmt 0)

i have tried dozens of different urls and filenames, trying to trip it up

But you want to trip it up. It should react differently to a spurious ("foobar.html") or malformed ("wid*get..pztf") name than to a real one. At a minimum, constrain the search to

\.html$
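Putting the `\.html$` constraint and the 404 point together, a Python sketch of the decision the script could make might look like this. The `known_folders` set and the `http://example.com/new/` targets are assumptions for illustration; a bare request for the section root (empty path) is sent to the new root explicitly so that the `.html` restriction does not break it.

```python
import re

# Optional single folder, then a filename that must end in .html.
HTML_ONLY = re.compile(r"^(?:([^/.]+)/)?([^/]+\.html)$")
NEW_ROOT = "http://example.com/new/"


def respond(path, known_folders):
    """Return (status, location) for a request path relative to the section."""
    if path == "":
        return 301, NEW_ROOT                 # bare request for the section root
    match = HTML_ONLY.match(path)
    if match is None:
        return 404, None                     # e.g. whatever.png or robots.txt
    folder, filename = match.groups()
    if folder is None:
        return 301, NEW_ROOT                 # top-level pages fold into the index
    if folder not in known_folders:
        return 404, None                     # never existed: do not redirect
    prefix = folder + "-"
    if filename.startswith(prefix):
        filename = filename[len(prefix):]
    return 301, NEW_ROOT + folder + "/" + filename
```

Spurious names such as `wid*get..pztf` fall through to a 404 here instead of producing a redirect or an empty 200 page.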

nigelt74




msg:4368635
 5:11 am on Sep 29, 2011 (gmt 0)

I see what you mean; however, it's not an issue as such on this site because of the way the site is created.

The old section (like the new section) is automatically created by a PHP script, and it only creates HTML files; there is nothing else there whatsoever.

What is happening with the current redirect is this:
Any incorrect/spurious filenames (old or deleted products) in the subfolders of the old section will be redirected to the new subfolders and will get a 404 there.
Anything in the root folder of the old section will be redirected to the root index file of the new section, as the functionality of all valid files at that level has been rolled into the index file.

Besides, when I tried limiting it to HTML files only, it broke on the root of the old section.

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved