Home / Forums Index / Code, Content, and Presentation / Apache Web Server

Apache Web Server Forum

    
Apache string replace www. with www/
JAB Creations




msg:4564793
 10:37 pm on Apr 14, 2013 (gmt 0)

I have the following in my .local.htaccess...

RewriteCond %{REQUEST_URI} !\.(gif|jpg|js|mp3|png)$
RewriteRule ^[^/]*/(images|search|scripts|sounds)(.+) $1$2 [QSA]


This allows me to keep resource files for multiple domains in the following manner...

http://localhost/version3/www.example1.com/

http://localhost/version3/www.example2.com/

http://localhost/version3/www.example3.com/


However, as I add more domains it's starting to flood the folder, so I'm trying to figure out how to adjust the rule so I can replace "www." with "www/" and my directory structure would instead reflect the following...

http://localhost/version3/www/example1.com/

http://localhost/version3/www/example2.com/

http://localhost/version3/www/example3.com/


How can I match www. and replace it with www/ in my existing rewrite?

- John

 

lucy24




msg:4564830
 1:07 am on Apr 15, 2013 (gmt 0)

Replacing www. with www/ is trivial; mod_rewrite, unlike some mods, doesn't give any special treatment to the / character. But I don't understand how it fits in with the quoted rule. I don't see any "www" in there, and I have no clue where it might occur.

Matter of fact, you won't need a "www" at all. You can call the subdirectory anything you like. It can be /www/ but certainly doesn't have to be.

Now that we've established that I'm not going to answer your question ;) --at least not right now-- let me toss out some other stuff.

#1 Wouldn't it be more efficient if the body of the rule specified the extensions the rule does want, instead of going to a condition to exclude the ones it doesn't want? The combination of /images and /sounds in the pattern, with !\.(gif|jpg|js|mp3|png)$ in the condition, definitely looks funny. Just what have you got in those directories, if not sounds and images?

#2 Why does the pattern start with ^[^/]*/ ? Are you getting a lot of malformed internal requests that have an extra / slash at the beginning? In htaccess, a normal request will start with a non-slash character, representing the first letter after "example.com/" (or some deeper directory if the htaccess file isn't in the root). As written, the rule will find either requests for a subdirectory-- throwing away the outermost directory, whatever it is-- or malformed requests for a top-level directory with doubled leading slash.

#3 A literal period in mid-URL isn't absolutely forbidden-- apache dot org themselves have buckets of them-- but it's really better if you can get rid of them. Then the only occurrence of a dot . will be right before the extension. This is often useful when you're constructing patterns.

#4
... (images|search|scripts|sounds)(.+) $1$2 [QSA]

Can I assume the first group is always followed by a directory slash and then more stuff? If so it could also be expressed as

((?:images|search|scripts|sounds)/.+)

resulting in a single $1 capture.

#5 The [QSA] isn't needed, because that's the default behavior unless you've explicitly added a new query. It also, of course, isn't needed if there was no query in the first place. Your rule doesn't give any hint about what the file extension is, only what it isn't, so I can't guess. What the rule does need-- and I'm surprised there haven't been unintended consequences-- is an [L] flag.


Oh and Psst! Although it has to be "example" in posts, it doesn't have to be dot com. You can say .org, .net, .uk, .xyz as much as you like if you need to distinguish among different domain names.

JAB Creations




msg:4564858
 4:59 am on Apr 15, 2013 (gmt 0)

Actually my goal was to keep the post extremely simple, now I have to explain the existing genius behind the setup in order for you to realize that there is in fact a "www." in there. ;)

My software is shared by multiple parked domains (multiple domains pointed to the same directory). My software simply chooses the database based on the domain name, so the content is in the database and the structure is in the PHP. One update to the structure and all sites using that software are automatically updated, so I don't have to write the same stuff twice. Unlike many sites, including though not limited to Yahoo mail *sigh*, I test locally first. In order to make changes quickly, the local structure mirrors the live one with some minor differences. Obviously I can't use $_SERVER['HTTP_HOST'] for localhost/192.168.*.*, so I make sure there is a www. in the URL...

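if ($_SERVER['HTTP_HOST'] != 'localhost' && substr($_SERVER['HTTP_HOST'], 0, 7) != '192.168')
{// live: the Host header is e.g. "www.example1.com", so $p0[1] is "example1.com"
$p0 = explode('www.', $_SERVER['HTTP_HOST'], 2);
}
else
{// local: the path is e.g. "/version3/www.example1.com/...", so $p0[1] is "example1.com/..."
$p0 = explode('www.', $_SERVER['REQUEST_URI'], 2);
$p1 = explode('/', $p0[1], 2); // $p1[0] is "example1.com"
}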
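A rough Python sketch of what the PHP above does, written as a plain function so the two branches can be checked in isolation. The function name `site_domain` is hypothetical, and, like the original, it assumes "www." is present in the Host header (live) or in the path (local); otherwise the `[1]` index fails, much as the PHP would hit an undefined index.

```python
def site_domain(http_host: str, request_uri: str) -> str:
    """Hypothetical helper mirroring the PHP: return the bare domain
    (e.g. "example1.com") that selects the site's database."""
    # Same test as the PHP: anything other than localhost / 192.168.* is live
    is_local = http_host == "localhost" or http_host.startswith("192.168")
    if not is_local:
        # live: "www.example1.com" -> "example1.com"
        return http_host.split("www.", 1)[1]
    # local: "/version3/www.example1.com/blog/" -> "example1.com/blog/"
    after_www = request_uri.split("www.", 1)[1]
    # keep only the first path segment -> "example1.com"
    return after_www.split("/", 1)[0]


print(site_domain("www.example1.com", "/blog/"))                     # example1.com
print(site_domain("localhost", "/version3/www.example1.com/blog/"))  # example1.com
```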


So a live domain path would look like...

http://www.example1.com/


...and a local domain path would look like...

http://localhost/Version 3/www.example1.com/


Assets

Obviously if there are multiple domains using the same software, asset files from different domains would conflict. For example, if two domains each have a file named crazy-penguins-on-my-front-lawn.png, the second domain's webmaster to upload it would normally overwrite the first webmaster's copy. That is obviously undesirable, as I don't want any site overwriting anyone else's content. So asset files are stored accordingly...

To Apache the full live domain path would look like...

http://www.example1.com/www.example1.com/images/


...but because of the live rewrite from Apache the image still appears to load (in your browser) from...

http://www.example1.com/images/


...while the local domain path would look like the following to Apache...

http://localhost/Version 3/www.example1.com/www.example1.com/images/


...when loaded (in your browser) the image would appear to load from the path...

http://localhost/Version 3/www.example1.com/images/


This way images appear to be loading from the same images path be it local or live.

However, as the number of domains using my software increases, it will start to bloat the public_html/ directory on my live server, and the same goes for my local directory for local testing.

So highlighted in blue below is where on localhost the domain is emulated...

RewriteCond %{REQUEST_URI} !\.(gif|jpg|js|mp3|png)$
RewriteRule ^[^/]*/(images|search|scripts|sounds)(.+) $1$2 [QSA]


...when I access a domain the following way...

http://localhost/Version 3/www.example.com/


...the file crazy-penguins-on-my-front-lawn.png is located (to Apache) at...

http://localhost/Version 3/www.example.com/www.example.com/crazy-penguins-on-my-front-lawn.png


...and appears to load in the browser from...

http://localhost/Version 3/www.example.com/crazy-penguins-on-my-front-lawn.png


I hope this makes sense? Yes, for synchronized local/live testing it does have to be this way, though it works beautifully and this complexity makes coding simpler. Hooray for irony!

#1 Wouldn't it be more efficient if the body of the rule specified the extensions the rule does want, instead of going to a condition to exclude the ones it doesn't want?


I think this post should clarify that question now.

#2 Why does the pattern start with ^[^/]*/ ?


If I recall correctly this means match the second directory depth. Again, I think this should make more sense now that I've clarified exactly what the Apache rewrite is doing.

#3 A literal period in mid-URL isn't absolutely forbidden-- apache dot org themselves have buckets of them-- but it's really better if you can get rid of them.


The efficiency is the ability to update multiple domains with a single update, so this setup is absolutely necessary. If you really want to suggest an improvement over the one I'm attempting, you're welcome to it, though I'd prefer to get the primary goal of this thread working first and then work on optimizations if possible, please. :)

#4
... (images|search|scripts|sounds)(.+) $1$2 [QSA]

Can I assume the first group is always followed by a directory slash and then more stuff? If so it could also be expressed as ...


Correct, we could do that, but it's midnight, so I'll play around with that modification tomorrow or whenever we get the primary goal working. I learn best when things work; once I've had the ah-ha! moment I tend to pick up on the rest. Apache rewrites have been one of the absolute hardest things I've ever learned programming-wise, so I can't brain that much right now.

#5 The [QSA] isn't needed, because that's the default behavior unless you've explicitly added a new query. It also, of course, isn't needed if there was no query in the first place. Your rule doesn't give any hint about what the file extension is, only what it isn't, so I can't guess. What the rule does need-- and I'm surprised there haven't been unintended consequences-- is an [L] flag.


There is an [L] flag in some other rewrites that are semi-related though not directly related; they make the blog|forum|themes folders shared. I'm trying to stay as on-topic and direct as possible right now without branching this out into something horribly complicated.

Oh and Psst! Although it has to be "example" in posts, it doesn't have to be dot com. You can say .org, .net, .uk, .xyz as much as you like if you need to distinguish among different domain names.


The primary reason I did this was to emphasize multiple domain names a bit more clearly whereas someone might miss the different domain suffixes. Time to go back to bed, I'll check back in about 14-16 hours or so if I'm free tomorrow afternoon. Getting this working would help me a lot in the long term and also allow me to finish writing some PHP code so I don't have to rewrite it down the road. Thanks for your reply!

- John

g1smd




msg:4564877
 6:56 am on Apr 15, 2013 (gmt 0)

#2 Why does the pattern start with ^[^/]*/ here?
If I recall correctly this means match the second directory depth.

^[^/]*/bar will match requests for example.com/foo/bar and example.com//bar
Change to
^[^/]+/bar to no longer match the latter.
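To see the difference, here is a small Python check using the `re` module, whose syntax agrees with Apache's PCRE for patterns this simple. It assumes the usual per-directory behaviour where a root .htaccess pattern sees the path with the leading slash removed, so example.com/foo/bar arrives as `foo/bar` and example.com//bar (if the doubled slash survives to that point) arrives as `/bar`.

```python
import re

# Paths as a root .htaccess would see them (leading slash already stripped)
FOO_BAR = "foo/bar"    # request for example.com/foo/bar
DOUBLE_SLASH = "/bar"  # request for example.com//bar

loose = re.compile(r"^[^/]*/bar")   # [^/]* may match zero characters
strict = re.compile(r"^[^/]+/bar")  # [^/]+ needs at least one character

for name, rx in (("[^/]*", loose), ("[^/]+", strict)):
    print(name, bool(rx.match(FOO_BAR)), bool(rx.match(DOUBLE_SLASH)))
# [^/]* True True
# [^/]+ True False
```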
lucy24




msg:4565091
 1:33 am on Apr 16, 2013 (gmt 0)

What he said ;)

Isn't there a fancier version of MAMP/WAMP that lets you run multiple pseudo-domains at once?

:: detour to check ::

Yup, thought so. It's $58 but that's not bad if this is your day job.

Obviously I can't use $_SERVER['HTTP_HOST'] for localhost/192.168.*.* so I make sure there is a www.

Someone hereabouts brilliantly suggested looking for a : in the HTTP_HOST. Won't work if your "live" site uses some fancy https setup where there's always a port number, but I did swipe the idea for one file that goes
if ($_COOKIE["silence"] || $newval != "index.html" || strpos($_SERVER['HTTP_HOST'],":"))
It means that if I'm viewing locally, I get the repeat-visitor version of the page instead of the first-time version. (Repeaters get more links.)
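The colon test boils down to a one-line check. A minimal Python version, assuming the local server is always reached on an explicit port (for example localhost:8888) and the live site never is; `looks_local` is a made-up name. In PHP, `strpos()` returns 0 for a match at the very start, which is falsy, but a Host header never starts with a colon, so the bare `strpos()` test in the line above is safe.

```python
def looks_local(http_host: str) -> bool:
    # Heuristic: a dev server on a non-default port sends "host:port" in the
    # Host header; a live site on port 80/443 normally sends the bare name.
    return ":" in http_host


print(looks_local("localhost:8888"))   # True
print(looks_local("www.example.com"))  # False
```

Note that plain "localhost" on port 80 slips through as not-local, which is the same blind spot the heuristic has in PHP.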

#1 Wouldn't it be more efficient if the body of the rule specified the extensions the rule does want, instead of going to a condition to exclude the ones it doesn't want?

I think this post should clarify that question now.

Your post clarifies a lot, but I'm still not getting this one. Just how many different extensions have you got? For comparison purposes, most of my RewriteRules end in
(^|/|\.html)$
Some rules may throw in a .pdf or .php (generally with [NS] flag). But other extensions like .css or .jpg simply aren't mentioned in the rule. If you need to capture, you throw in an intervening
(([^/]+/)*[^/.]+)
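Here is how those two fragments behave, checked with Python's `re` (same syntax as PCRE for these constructs). The combined pattern is an assumption about how the pieces fit together; with the capture group in front, the `^` alternative could never match, so it is dropped there.

```python
import re

# Ending used on its own: site root (empty path), "dir/" or "page.html"
page_end = re.compile(r"(^|/|\.html)$")

# Hypothetical combined form: the intervening group captures the path
# without the trailing "/" or ".html"
page_capture = re.compile(r"^(([^/]+/)*[^/.]+)(/|\.html)$")

for path in ["", "dir/", "a/page.html", "images/pic.jpg"]:
    m = page_capture.match(path)
    print(repr(path), bool(page_end.search(path)), m.group(1) if m else None)
# '' True None
# 'dir/' True dir
# 'a/page.html' True a/page
# 'images/pic.jpg' False None
```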

...when I access a domain the following way...
http://localhost/Version 3/www.example.com/

...the file crazy-penguins-on-my-front-lawn.png is located (to Apache) at...
http://localhost/Version 3/www.example.com/www.example.com/crazy-penguins-on-my-front-lawn.png

...and appears to load in the browser from...
http://localhost/Version 3/www.example.com/crazy-penguins-on-my-front-lawn.png


My head hurts :( Where does this rewrite take place? Obviously not in the rule we started out with, since that one specifically excludes image files.

So highlighted in blue below is where on localhost the domain is emulated...
RewriteCond %{REQUEST_URI} !\.(gif|jpg|js|mp3|png)$
RewriteRule ^[^/]*/(images|search|scripts|sounds)(.+) $1$2 [QSA]

I don't get the connection between the two blue parts. $1 is any one member of the set (images|search|scripts|sounds). If the domain name is buried anywhere in the request, it has to be either in the first [^/]+/ directory-- the one you've thrown away-- or after the four-way option group.

...and appears to load in the browser from...

Don't get this either. Non-page files don't "appear to load" from anywhere in particular; as far as the user is concerned they just show up. A non-page file's URL is only visible if you're tracking every step in Live Headers. Or, of course, if you ask to view the image in isolation. As long as the final destination is in the appropriate format, it doesn't matter if it's a rewrite, multi-chain redirect, proxy or worse.

:: down to business ::

http://localhost/Version 3/www.example.com/www.example.com/crazy-penguins-on-my-front-lawn.png

OK, so which of the two "www." elements is supposed to go into a fresh directory? And is that a simple split from a group of directories named
/www.example.com/
/www.example.org/
/www.example.net/
to a first-level directory
/www/
containing within itself all the previous directories, now renamed
/example.com/
/example.org/
/example.net/
? Or do you want to keep the element "www." so it's now
/www/www.example.com/www.example.com/
or possibly
www.example.com/www/www.example.com
?

I have to assume there's something more complicated than capturing
www\.example\.com
as
www\.([a-z-]+\.com)
or possibly
(www\.[a-z-]+\.com)
and rewriting to something involving
www/$1
or similar
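The basic capture-and-reinsert step, sketched in Python; `re.sub` with `\1` plays the role of `$1`. One caveat: `re.sub` edits the string in place, whereas a RewriteRule substitution replaces the whole path, so a real rule would also have to capture the rest of the path. The `.com`-only, letters-and-hyphens pattern is taken as-is from above, and `www_dot_to_dir` is an illustrative name.

```python
import re

domain = re.compile(r"www\.([a-z-]+\.com)")

def www_dot_to_dir(path: str) -> str:
    # Turn the first "www.<name>.com" into "www/<name>.com"
    return domain.sub(r"www/\1", path, count=1)


print(www_dot_to_dir("www.example.com/images/pic.png"))  # www/example.com/images/pic.png
```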

And, now, wait a minute. Where'd /images/ /sounds/ etc. go? I thought they were the whole point of the rewrite.

:: looking grumpily around for someone who speaks Apache and php probably wouldn't hurt either ::

JAB Creations




msg:4565123
 4:09 am on Apr 16, 2013 (gmt 0)

So let me try explaining it this way...

The first part only matches extensions that are supported. If it's not in the list then the file type isn't supported and webmasters can't upload them. Again, this is a shortened list because it's merely an example.

RewriteCond %{REQUEST_URI} !\.(gif|jpg|js|mp3|png)$


Here is the second rule...
RewriteRule ^[^/]*/(images|search|scripts|sounds)(.+) $1$2 [QSA]


----

The first part of the second rule...

^[^/]*


...selects the SECOND real directory on the server. That means SKIP the first subdirectory (or domain) and match the second string.

So it skips the first www.example.com/ and instead chooses the second one in this path on the server (this part does not show up in the browser)...

http://localhost/Version 3/www.example.com/www.example.com/


This part is a bit more obvious, the server will match the following folders...

(images|search|scripts|sounds)


----

So what does the actual folder structure look like before Apache does anything?

shared public root
Version 3/

specific domain, emulates live though accessed via localhost
Version 3/www.example.com/

assets for this domain stored in this folder
Version 3/www.example1.com/www.example1.com/

image assets
Version 3/www.example1.com/www.example1.com/images/

script assets
Version 3/www.example1.com/www.example1.com/scripts/

assets for this domain stored in this folder
Version 3/www.example2.com/www.example2.com/

image assets
Version 3/www.example2.com/www.example2.com/images/

script assets
Version 3/www.example2.com/www.example2.com/scripts/


----

I don't get the connection between the two blue parts.


Yeah, I goofed that up. The RewriteCond says the rewrite rule applies to only files with these extensions. The rewrite rule itself says this only applies to directories that are two deep thus making them appear to be one deep.

----

I think I may either have to update the rewrite to replace the "www." with "www/" or add another RewriteCond.

----

I should leave it at that as I'm dead tired. I'll be able to better reply in detail tomorrow. Thanks for trying to help me out with this!

- John

lucy24




msg:4565131
 4:50 am on Apr 16, 2013 (gmt 0)

The RewriteCond says the rewrite rule applies to only files with these extensions.

Uhm... From where I'm sitting, it says it applies only to files without these extensions.

!\.(gif|jpg|js|mp3|png)$


?
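In mod_rewrite a leading `!` negates the CondPattern, so this condition is met only when the requested URI does NOT end in one of the listed extensions. A quick Python illustration (`cond_passes` is an illustrative name, not part of Apache):

```python
import re

# CondPattern without its "!" prefix
listed_ext = re.compile(r"\.(gif|jpg|js|mp3|png)$")

def cond_passes(request_uri: str) -> bool:
    # "!" means the condition succeeds when the pattern does NOT match
    return listed_ext.search(request_uri) is None


print(cond_passes("/version3/www.example2.com/images/pic.png"))  # False
print(cond_passes("/version3/www.example2.com/blog/"))           # True
```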

JAB Creations




msg:4565275
 4:03 pm on Apr 16, 2013 (gmt 0)

ALRIGHT! I'm rested, have time to think and I can explain the two rules much better now. Please disregard my previous posts as this one should make a lot more sense...

This line...

RewriteCond %{REQUEST_URI} !\.(gif|jpg|js|mp3|png)$


...matches non-shared files that need to be in their own domain-specific subdirectories.

This line (now better revised!)...

RewriteRule ^[^/]*/(admin|blog|contact|forums)(.+) $1$2 [QSA]


...is necessary for shared files. Multiple domains may have active blogs/contact [forms]/forums; the structure is shared but their content is not (each site has a dedicated copy of a database).

So in essence these rules split the shared and non-shared content based on the context of the request.

So the blogs at these URLs...

http://www.example1.com/blog/
http://www.example2.com/blog/
http://www.example3.com/blog/



...are all (for localhost) located at...

http://localhost/version3/.local.htaccess
http://localhost/version3/blog/


...and all on the live host located at...

/public_html/.htaccess
/public_html/blog/
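To trace what the shared-file rule does to a request, here is a Python sketch of its match-and-substitute step. Assumptions: the .htaccess sits in version3/ (locally) or public_html/ (live), so the pattern sees a path like `www.example1.com/blog/post-1`; the RewriteCond is left out; and `rewrite_shared` is an illustrative name. Apache then adds the directory prefix back to the relative result, which is how every domain ends up at the single blog/ directory.

```python
import re
from typing import Optional

shared = re.compile(r"^[^/]*/(admin|blog|contact|forums)(.+)")

def rewrite_shared(path: str) -> Optional[str]:
    """Return "$1$2" for a matching path, or None when the rule does not apply."""
    m = shared.match(path)
    if m is None:
        return None
    # $1$2: the first directory (the domain) is dropped
    return m.group(1) + m.group(2)


for p in ["www.example1.com/blog/post-1", "www.example2.com/forums/", "www.example1.com/images/pic.png"]:
    print(p, "->", rewrite_shared(p))
```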


----

If you request crazy-penguins-on-my-front-lawn.png, the browser URL is rewritten by Apache to...

http://localhost/version3/www.example2.com/images/crazy-penguins-on-my-front-lawn.png


...yet the server location is...

http://localhost/version3/www.example2.com/www.example2.com/images/crazy-penguins-on-my-front-lawn.png


..and my goal is to have the file on the server stored at the following location instead...


http://localhost/version3/www.example2.com/www/example2.com/images/crazy-penguins-on-my-front-lawn.png
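Stated as a plain string transformation, the goal is to change only the second, storage copy of the domain. A Python sketch using a backreference so it only fires when the same domain appears twice in a row; `storage_path` is a made-up name, and this just pins down the target layout rather than being a drop-in mod_rewrite rule.

```python
import re

# "www.<domain>/www.<same domain>/" -> "www.<domain>/www/<domain>/"
repeated = re.compile(r"www\.([^/]+)/www\.\1/")

def storage_path(server_path: str) -> str:
    return repeated.sub(r"www.\1/www/\1/", server_path, count=1)


old = "version3/www.example2.com/www.example2.com/images/crazy-penguins-on-my-front-lawn.png"
print(storage_path(old))
# version3/www.example2.com/www/example2.com/images/crazy-penguins-on-my-front-lawn.png
```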


----

Does this make more sense? Getting the Apache rules to work required a lot of fumbling and testing, and so unfortunately explaining it as well. Eventually I was able to get them to work as desired and that is the critical part.

- John

lucy24




msg:4565364
 7:09 pm on Apr 16, 2013 (gmt 0)

OK. I've just deleted a massive post containing the latest version of

I don't get it :: whine ::

as the shoe finally drops and I understand that there are two layers of rewriting involved.

FIRST you've got your users at www.example.com being secretly rewritten to receive content that lives at
long-boring-path-name/www.example.com/rest-of-requested-path

AND THEN you've got your overloaded server with a top-level directory currently containing
www.example.com
www.example2.com
...
www.example97524.com

and you want to shovel all those directories into a single /www/ directory which can then quietly bloat up with
example.com
example2.com
...
example97524.com
and so on until the cows come home, without making it impossible for you to find stuff in the root.

Have I NOW got it right?

The form "browser URL" is confusing because supporting files don't have browser URLs-- that is, the user doesn't normally see where they're coming from. (Obvious example: Unless you look at the code, you have no idea why a page consisting entirely of hotlinks takes soooo long to load up.) But the browser knows where it's requesting them from, so by keeping it as a rewrite instead of a redirect, you save the browser from having to make a fresh request every single time.

Right so far?

So the first piece of the request-- the piece that contains the directory name-- is thrown away in the case of shared files. But how did it get into the request in the first place? If it didn't come from htaccess, it must have come from php. And why is php adding something that isn't going to be used?

See, I've got this lurking suspicion that this isn't an apache issue at all. It may be better solved at the php level. Unless you're particularly concerned about human users snooping into the html and finding the "real" paths laid bare before their eyes.

It keeps coming back to: Obviously the solution doesn't simply involve
www\.([a-z-]+\.com)
>>
www/$1

because you would have worked that out for yourself without all those sleepless nights ;)

JAB Creations




msg:4565369
 7:27 pm on Apr 16, 2013 (gmt 0)

and so on until the cows come home, without making it impossible for you to find stuff in the root.

Have I NOW got it right?


YES! :)

The form "browser URL" is confusing because supporting files don't have browser URLs-- that is, the user doesn't normally see where they're coming from.


In Firefox, right-click on an image and view its image properties to see its URL; that is what I'm talking about.

so by keeping it as a rewrite instead of a redirect, you save the browser from having to make a fresh request every single time.


A redirect would be messy and ultimately just not work in any reasonable way plus it would likely be horribly static requiring me to manually add image paths, completely unreliable at any reasonable volume of sites. By having this code dynamic it works all the time automatically.

So the first piece of the request-- the piece that contains the directory name-- is thrown away in the case of shared files. But how did it get into the request in the first place? If it didn't come from htaccess, it must have come from php. And why is php adding something that isn't going to be used?


The first piece makes exceptions for non-shared files (e.g. images). The second part is below where Apache hands things off to a rewrite.php file.

See, I've got this lurking suspicion that this isn't an apache issue at all. It may be better solved at the php level.


No, it's totally an Apache issue. This wouldn't be possible, or it would be exceptionally difficult, to pull off with PHP.

Unless you're particularly concerned about human users snooping into the html and finding the "real" paths laid bare before their eyes.


If you attempt to visit a double-domain path it's simply marked as a 404 unless I explicitly create a page at that address with my CMS so there is no need to hide anything.

This is where Apache hands things off to PHP for shared resources...

RewriteRule ^(scripts\/|themes\/) - [L]
RewriteCond %{REQUEST_URI} !.*/(admin|blog|contact|forums)
RewriteRule !\.(css|js|xml|zip)$ rewrite.php


because you would have worked that out for yourself without all those sleepless nights


I'll play around with that and see what, if anything, I can come up with. Yeah, this is a giant mind-warp, but once you figure it out it's friggin' gold.

- John

lucy24




msg:4565385
 8:25 pm on Apr 16, 2013 (gmt 0)

RewriteRule ^(scripts\/|themes\/) - [L]

Or rather:
RewriteRule ^(scripts|themes)/ - [L]
Directory slashes never need to be escaped in mod_rewrite. And no point in repeating an element that's used in all options.
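A quick way to convince yourself the two forms are equivalent: run both over a handful of sample paths with Python's `re` (PCRE-compatible here; `\/` is simply an escaped slash in both). The sample paths are made up.

```python
import re

original = re.compile(r"^(scripts\/|themes\/)")
simplified = re.compile(r"^(scripts|themes)/")

samples = ["scripts/app.js", "themes/dark/style.css", "scripts", "blog/themes/x.css", "scriptsx/a.js"]
for s in samples:
    # Both patterns accept and reject exactly the same paths
    print(s, bool(original.match(s)), bool(simplified.match(s)))
```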

RewriteCond %{REQUEST_URI} !.*/(admin|blog|contact|forums)

The .* element isn't needed unless you are capturing the part in parentheses-- which obviously you're not, since it's a negative. Doubly superfluous when there's no opening anchor.
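Because the pattern is unanchored, a regex search already looks for `/(admin|blog|contact|forums)` anywhere in the URI, so the leading `.*` changes nothing about whether the negated condition passes. A Python check over some sample URIs (made up for illustration):

```python
import re

with_dotstar = re.compile(r".*/(admin|blog|contact|forums)")
without = re.compile(r"/(admin|blog|contact|forums)")

uris = ["/version3/blog/", "/version3/www.example.com/forums/t1", "/version3/images/pic.png", "/admin"]
for uri in uris:
    # "!" in the RewriteCond: the condition passes when there is no match
    print(uri, with_dotstar.search(uri) is None, without.search(uri) is None)
```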

RewriteCond %{REQUEST_URI} !.*/(admin|blog|contact|forums)
RewriteRule !\.(css|js|xml|zip)$ rewrite.php

:: setting aside the duplicate ! negatives, because they make my head hurt ::

Are you working with sites that already exist, so all the files have got established URLs? If not, it seems as if you could save yourself a lot of bother by sorting things at the directory level. Any given directory name can contain either shared material or non-shared material, but never both.

JAB Creations




msg:4565393
 8:57 pm on Apr 16, 2013 (gmt 0)

because you would have worked that out for yourself without all those sleepless nights


Actually, I was doing this to get two domains to share this setup, so I wasn't thinking about the root public folder filling up at that point. So yes, I do still need help with this, as I'm far from mastering Apache rewrites.

Are you working with sites that already exist, so all the files have got established URLs?


I have several at the moment that are live.

If not, it seems as if you could save yourself a lot of bother by sorting things at the directory level.


...I need help with changing www. to www/. Everything else is established and working beautifully so why would I want to make more work for myself when this is already saving me a lot of hassle?

- John

lucy24




msg:4565400
 9:16 pm on Apr 16, 2013 (gmt 0)

why would I want to make more work for myself

The idea here was to make less work for yourself as well as the server. (Yes, both! This is not always the case.) Right now the rules have to test for two different things: location (name of directory) and filetype (extension). If you were setting up new sites, matching up filetypes and directories would save a lot of trouble, because each piece by itself would give your rule the information it needs. /images/ == \.(gif|png|jpg)$ and so on down the list.
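The directory-to-filetype idea can be sketched as a lookup table from which one pattern per directory is generated. The table is hypothetical (the thread only gives /images/ with gif|png|jpg as an example), `asset_pattern` is an illustrative name, and Python's `re` stands in for the PCRE matching mod_rewrite does.

```python
import re

# Hypothetical pairing: each asset directory holds exactly one family of file types
ASSET_TYPES = {
    "images": ("gif", "png", "jpg"),
    "sounds": ("mp3",),
    "scripts": ("js",),
}

def asset_pattern(directory, extensions):
    # e.g. ^images/[^/]+\.(gif|png|jpg)$ : the directory alone implies the type
    return "^" + re.escape(directory) + r"/[^/]+\.(" + "|".join(extensions) + ")$"


print(asset_pattern("images", ASSET_TYPES["images"]))  # ^images/[^/]+\.(gif|png|jpg)$
```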

If the sites already exist then obviously there's nothing more to do. Unless you've got two domains today but are looking forward to a happy future where you've got a thousand ;) Then it might still be worth the short-term bother of changing things.

And if you make the best possible rules at the beginning, you don't have to go back and fine-tune them later. So it saves aggravation in the long run.

JAB Creations




msg:4565403
 9:23 pm on Apr 16, 2013 (gmt 0)

Is this your way of saying you're not sure how to use Apache to do the www. to www/ rewrite or that you prefer to use PHP for this? What exactly is the benefit you are proposing? Besides saving time of course, and yes if there were to be changes better now than later but I'm only a short way away from having everything working exactly as desired.

- John

lucy24




msg:4565439
 12:26 am on Apr 17, 2013 (gmt 0)

The business about extensions and directories is tangential to the www.-to-www/ rewrite. So if it truly isn't practical, we'll just set it aside.

The rewrite is straightforward once you know what form the URL has at the time it reaches your htaccess. For example if your incoming request is for
http://www.example.com/www.example.com/morestuff

and you wanted it to be instead
http://www.example.com/www/example.com/morestuff

the rule would be (leaving out the bits about extensions and directories which you've already got in place)
RewriteRule www\.([\w-]+\.com)/(more-stuff-here) /www/$1/$2 [L]

Here I changed my earlier "a-z" to \w because numerals and lowlines can also occur in domain names, and both count as \w. You can keep it at "a-z" if your domain names are constrained to alphabetics.
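A Python sketch of that rule's match-and-substitute step, handy for feeding it sample paths before touching the live .htaccess. Assumptions: `(.+)` stands in for the `(more-stuff-here)` placeholder, the input is the path as a root .htaccess sees it (no leading slash), and `apply_rule` is an illustrative helper, not an Apache function. As in mod_rewrite, when the pattern matches, the whole path is replaced by the expanded substitution.

```python
import re
from typing import Optional

pattern = re.compile(r"www\.([\w-]+\.com)/(.+)")
substitution = "/www/$1/$2"

def apply_rule(path: str) -> Optional[str]:
    m = pattern.search(path)  # unanchored, like the rule as written
    if m is None:
        return None  # rule does not apply; path is left alone
    # Expand $1, $2 ... from the captured groups (unset groups become "")
    return re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))) or "", substitution)


print(apply_rule("www.example.com/morestuff"))    # /www/example.com/morestuff
print(apply_rule("www.example37.net/morestuff"))  # None (only .com is matched)
```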

BUT there's one very important thing to understand. The browser makes its requests based on where it "thinks" it is, not on where it "really" is. If
www.example37.net

is getting silently rewritten to
long-complicated-path/www/example37.net/more-stuff-here

then rewrites involving images or css have to be written as if the starting point was
www.example37.net/images/picname.jpg
NOT
long-complicated-path/www/example37.net/images/picname.jpg

In other words: that first rewrite, from domain name to physical directory, has to happen every time, not only on page requests. That's why it's important to know what form the URL has when it first lands in htaccess.
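The browser side of this can be shown with Python's `urllib.parse.urljoin`, which resolves a relative reference the way a browser does (RFC 3986). The page URLs are illustrative. The browser only ever combines an image reference with the URL it requested; the rewritten directory on the server never enters the calculation.

```python
from urllib.parse import urljoin

# Root-relative reference: always resolves from the domain root
print(urljoin("http://www.example37.net/blog/post.html", "/images/picname.jpg"))
# http://www.example37.net/images/picname.jpg

# Document-relative reference: resolves from the requested page's directory
print(urljoin("http://www.example37.net/blog/post.html", "images/picname.jpg"))
# http://www.example37.net/blog/images/picname.jpg
```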

Make sure all RewriteRules include an opening anchor. Two reasons: to make the rule run faster-- if it doesn't match right away, mod_rewrite can stop looking-- and to make sure you don't get into an infinite loop with more and more stuff being added to the front of the path. Sometimes this has to be done with another condition looking at %{THE_REQUEST}, but here you should be able to keep it all in the body of the rule.
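Why the anchor matters can be simulated in Python by feeding a rule its own output again, roughly what happens when a per-directory rewrite is re-processed. The rule, the `www/example.com/` prefix and the `rewrite_passes` helper are made up for illustration, and Apache itself gives up after `LimitInternalRecursion` passes (10 by default) with a 500 error rather than looping forever. The anchored version also narrows `.+` to `[^/]+`: an anchor only helps if what follows it can't match the already-rewritten path.

```python
import re

def rewrite_passes(pattern, substitution, path, max_passes=3):
    """Apply one rule repeatedly to its own output; return every path seen."""
    rx = re.compile(pattern)
    seen = [path]
    for _ in range(max_passes):
        m = rx.search(path)
        if m is None:
            break  # rule no longer matches: rewriting settles
        path = re.sub(r"\$(\d)", lambda ref: m.group(int(ref.group(1))) or "", substitution)
        seen.append(path)
    return seen


# Unanchored: the rewritten path matches again and keeps growing
print(rewrite_passes(r"(.+\.png)$", "www/example.com/$1", "pic.png"))
# Anchored to a top-level file: one rewrite, then no further match
print(rewrite_passes(r"^([^/]+\.png)$", "www/example.com/$1", "pic.png"))
# ['pic.png', 'www/example.com/pic.png']
```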

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved