Apache Web Server Forum

This 47 message thread spans 2 pages: 1 [2]
non-www to www WITH subdomains
need help with correct configuration
mihomes




msg:4588081
 7:41 am on Jun 27, 2013 (gmt 0)

I am using the following for non-www to www on my sites :


RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]


I want to add a subdomain such as blog.example.com. I have two issues to figure out :

1 - Exclude the subdomain from the above rewrite as it currently rewrites blog.example.com to www.example.com/blog/

2 - The actual subdomain location is /blog/, but that should never be accessible to people or engines. So, /blog/* should always be blog.example.com/*

I'm guessing this is a pretty common thing, but I have not used subdomains in ages. Any recommendations would be great... I would like to keep the original rewrite intact since it is the most complete non-www to www I was able to come across to date.

thanks

 

mihomes




msg:4592898
 12:18 am on Jul 14, 2013 (gmt 0)

Lucy,

By shorter I meant that literally - as in 'example.com/test/index.htm to example.com/test/'. Yes, the parameters are untouched so they will stay. This is the whole htaccess as I am testing on a domain I do not use. When I do use this I will be replacing the expires rules with my own of course.

phranque,

You were correct. I did not update my comments properly to reflect subdomain file location in the sub_ds location. The second rule was also written incorrectly as you pointed out and has been corrected. It was meant to be the same result as the first just a different incoming format.

Yes, index options are set on the server for pretty much all index types. As far as I know it is the default Apache setup, since I have never touched them.

The index redirect you posted works perfectly fine with one minor change :

# Externally redirect index.(php|html?) in any location, preserving parameters, to location root
RewriteCond %{THE_REQUEST} !^[A-Z]{3,9}\ /phpsite/
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)
RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]

I removed the '/' at the end of the second condition. With it there was an endless loop happening as before. Typo? There would never be a trailing slash after the file called.

Everything appears to be working as intended unless there is an argument otherwise. I haven't used subdomains in ages so I appreciate the assistance with this.

Of course I am all ears as far as recommendations and best practices go. I think I covered all the bases I wanted to with this.

[edited by: phranque at 12:57 am (utc) on Jul 14, 2013]
[edit reason] unlinked url [/edit]

phranque




msg:4592911
 12:57 am on Jul 14, 2013 (gmt 0)

that trailing backslash was supposed to be an escape of the trailing blank, which would essentially end-anchor the requested url path in THE_REQUEST.
i forgot it wouldn't show up well in the post.
it would exclude redirecting requests for "bad" urls such as:
GET /index.html-is-a-great-filename

lucy24




msg:4592957
 6:11 am on Jul 14, 2013 (gmt 0)

that trailing backslash was supposed to be an escape of the trailing blank, which would essentially end-anchor the requested url path in THE_REQUEST.

Does this work for you?! I tried it once and the server crashed. OK, I exaggerate. All requests got a 500-class error.

\b ought to work, though I'm told some servers don't like it. Similarly the [NS] flag should work to exclude mod_dir activity.

A request can always be anchored with trailing "\ HTTP". That's safe.
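To make the anchoring point concrete, here is a minimal sketch that runs the condition pattern from the rule above through Python's `re` module, with and without the trailing `\ HTTP`. It is only an approximation of what mod_rewrite does (Apache uses PCRE, and the request lines below are made-up examples), but for these simple patterns the matching behaviour should be the same.

```python
import re

# Condition pattern as posted, with no end anchor after the extension.
UNANCHORED = re.compile(r"^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)")

# Same pattern, end-anchored with a literal space followed by HTTP.
ANCHORED = re.compile(r"^[A-Z]{3,9}\ /(([^/]+/)*)index\.(php|html?)\ HTTP")


def matches(pattern, request_line):
    """RewriteCond patterns are unanchored searches, so use search()."""
    return pattern.search(request_line) is not None


# Hypothetical request lines, in the shape THE_REQUEST would hold.
good = "GET /test/index.html HTTP/1.1"
bad = "GET /index.html-is-a-great-filename HTTP/1.1"

for label, pattern in (("unanchored", UNANCHORED), ("anchored", ANCHORED)):
    print(label, "good:", matches(pattern, good), "bad:", matches(pattern, bad))
```

The unanchored pattern accepts the "bad" request because it stops looking after the extension; the anchored one requires the URL to end right there.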

phranque




msg:4592990
 8:18 am on Jul 14, 2013 (gmt 0)

the [NS] flag should work to exclude mod_dir activity

i was aware of a flag for that but forgot to look for it when i responded.

A request can always be anchored with trailing "\ HTTP". That's safe.

yes - that's what i should have suggested.

mihomes




msg:4592998
 8:40 am on Jul 14, 2013 (gmt 0)

Just saw this and it makes more sense, BUT, with \ HTTP if you have any parameters the url just stays the same.

This, however, does work :

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)(\?[^\ ]*)\ HTTP

Speaking of... is it good practice to end all requests this way?

phranque




msg:4592999
 8:47 am on Jul 14, 2013 (gmt 0)

is it good practice to end all requests this way?

did you mean "end all patterns for THE_REQUEST"?
i use this to avoid the ambiguity since you will never see an unencoded blank within the url in THE_REQUEST.

JD_Toims




msg:4594745
 2:09 am on Jul 20, 2013 (gmt 0)

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)(\?[^\ ]*)\ HTTP

I think I'd write it as "is not a space, followed by a space, followed by HTTP" and not worry about the literal ? or () since it's not being back-referenced and in your example a literal ? is required for there to be a match:

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)[^\ ]*\ HTTP

mihomes




msg:4594760
 6:41 am on Jul 20, 2013 (gmt 0)

I actually ended up with this... I guess I should have updated the thread :

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)(\?[^\ ]*)?\ HTTP

This makes the parameters part optional, but still prevents weird file endings like .html54678 from redirecting and throws the 404. Your example will redirect these strange file endings.
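As a rough way to compare the three variants discussed in this thread, the sketch below checks each one against three made-up request lines using Python's `re` module. This is not Apache itself, only a stand-in for the regex matching; the dictionary keys and the sample requests are names invented for the illustration.

```python
import re

PATTERNS = {
    # Query string group is mandatory, so a literal "?" must be present.
    "required_query": r"^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)(\?[^\ ]*)\ HTTP",
    # Anything that is not a space may follow the extension.
    "any_tail": r"^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)[^\ ]*\ HTTP",
    # Query string group is optional; nothing else may follow the extension.
    "optional_query": r"^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)(\?[^\ ]*)?\ HTTP",
}

# Hypothetical values of THE_REQUEST.
REQUESTS = {
    "plain": "GET /test/index.html HTTP/1.1",
    "query": "GET /test/index.php?id=5 HTTP/1.1",
    "junk": "GET /test/index.html56456 HTTP/1.1",
}


def match_table():
    """Return {pattern_name: {request_name: bool}} for every combination."""
    return {
        pname: {rname: re.search(p, r) is not None for rname, r in REQUESTS.items()}
        for pname, p in PATTERNS.items()
    }


for name, row in match_table().items():
    print(f"{name:15s} {row}")
```

Only the optional-query form accepts both the plain and the parameterised request while rejecting the odd file ending.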

JD_Toims




msg:4594764
 7:03 am on Jul 20, 2013 (gmt 0)

Hmmm, I know it will, but even if the "strange" file endings are redirected, they should still generate a 404 if the file does not exist and your server is configured properly. Plus, you then keep the subdomain totally separate from the main domain.

If you don't redirect "odd file endings" from the main domain to the subdomain that should have been requested, then the main domain is where the 404 occurs. When I run a subdomain alongside a main domain (I have a few), I personally want the sub totally separate from the main domain for any request, whether the file is present on the server or not.

* The easiest thing I've found to do to keep things separate without a redirect is point the subdomain to a directory on the same level as what is generally the public_html directory. So, rather than allowing cPanel (guessing that's what you're using) to set the "host directory" of the subdomain within the public_html directory, I create a directory for the subdomain on the same level as public_html, then add the subdomain and set that as the "host directory". It makes a bunch of headaches go away when you do it that way, in my experience, especially when rewriting and redirecting.

mihomes




msg:4594782
 8:24 am on Jul 20, 2013 (gmt 0)

This particular rule was for everything, subs included, and just removes index files to root folder to prevent dupes.

The server is set up right... it has nothing to do with whether the file exists or not. Because your condition was met and allowed the strange file endings, it redirects to the folder as it should. The one I posted, on the other hand, prevents those weird file endings.

/test/index.html56456 meets your condition and goes to /test/ where there is a valid index.php|html? so it loads with no error.

The other one prevents that.

lucy24




msg:4594788
 8:47 am on Jul 20, 2013 (gmt 0)

even if the "strange" file endings are redirected they should still generate a 404 if the file does not exist and your server is configured properly

Sure. But whenever possible, you want to avoid redirecting a request that's going to end up as something other than 200. That's why, to take the most obvious example, any RewriteRule ending in [F] goes before the ones leading to a redirect.

Sometimes you have to take the chance: for example you're not going to waste time and clog up the server by attaching -f and -d conditions to every single rule. But if you can filter something out simply by constraining the form of the request, go for it.

Then again, if you are getting hit with malformed requests for ".html56456", you may want to deal with them at some earlier stage. It falls under the heading of "problems you don't have to think about unless you have to think about them". Another one is // doubled slashes. Or garbage added after .html. Or bad queries. Or outside links that have got something in the wrong case. Or... et cetera.

JD_Toims




msg:4594824
 1:56 pm on Jul 20, 2013 (gmt 0)

/test/index.html56456 meets your condition and goes to /test/ where there is a valid index.php|html? so it loads with no error.

That's what I was meaning about there being something not correct with the server (or in thinking about it more, the blog software). It shouldn't 404 in one case and not another... The redirect to the correct subdomain should not have any bearing on a 404 being served.

* My guess is it's an issue with the blog software that should, imo, be fixed by forcing it to 404 rather than generating a page for those types of requests, because if it 200s those, then who knows what other currently unseen "ugly dupe" or "soft 404" issues will pop up somewhere down the road.

you want to avoid redirecting a request that's going to end up as something other than 200

As I said, I prefer to keep 404s that should be on subdomains on the subdomain, not the main domain, so I guess we'll have to agree to disagree about whether I want to avoid redirecting anything that doesn't end in a 200. You may not want to redirect those, but personally, I do. (It actually happens all the time when sites are moved and everything is redirected.)

We know today Google at least says they don't "ding" a site for 404s, but what about other search engines? What about tomorrow? What about next year? What about the custom error page(s) for the specific subdomain so a real visitor can find what they were looking for? What about making it so if there's an error a visitor can just delete everything except the sub.domain.tld and be at the home page of the sub they should be on rather than the main domain?

I can think of a number of reasons I want to redirect them, even if others can't.

mihomes




msg:4594878
 7:38 pm on Jul 20, 2013 (gmt 0)

What are you even talking about? This has nothing to do with subdomains or blog software of any kind.

The rule you wrote earlier was wrong... well, not necessarily wrong per se, but you left out a scenario where odd filenames could be typed.

I explained it above - you are welcome to try it out and see. You don't need to have a subdomain and you don't need any blog software either lol. Just a folder with an index.htm, .html, or .php inside it and your rewrite rule.

JD_Toims




msg:4594879
 7:42 pm on Jul 20, 2013 (gmt 0)

This has nothing to do with subdomains or blog software of any kind.

I guess I got really confused then after reading this.

2 - The actual subdomain location is /blog/, but that should never be accessible to people or engines. So, /blog/* should always be blog.example.com/*

Sorry I thought the preceding meant this had something to do with a subdomain and likely blogging software.

# Externally redirect client requests for www.<subdomain>.example.com/<URLpath> to <subdomain>.example.com/<URLpath>
RewriteCond %{HTTP_HOST} ^www\.(.+)$
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteRule (.*) http://%1%{REQUEST_URI} [R=301,L]


to

# Externally redirect client requests for www.<subdomain>.example.com/<URLpath> to <subdomain>.example.com/<URLpath>
RewriteCond %{HTTP_HOST} ^www\.(.+)\.example\.com$
RewriteRule ^subs/([^/]+)/(.*)$ http://$1.example.com/$2 [R=301,L]

I also seem to have been confused by this part due to it seeming to reference subdomains in some way.

* I do have to admit I did just read the OP and then skimmed and skipped to this page to see if the issue had been solved, and when I got here all I saw was a condition that would not work without a literal ? present, so I thought I'd help and post a solution for the condition.

###

My condition only redirects if you use the .* or some other non-explicit match in the rule as it appears you have:

RewriteRule .* http://%{HTTP_HOST}/%1 [R=301,L]

RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)[^\ ]*\ HTTP
RewriteRule index\.(php|html?)$ http://%{HTTP_HOST}/%1 [R=301,L]

Feel free to test it and let me know if it's not working correctly.

You actually get into a Ton of extra recursion and storage by using a "catch all" for the rule, because the first two patterns of the condition will be met for every single request for every single location on your whole site and every single subdomain. Without an explicit match in the rule the condition will not "break the matching" until index is not present on the end of any request made for any page, so an explicit rule match is much better / more efficient.
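The difference between the catch-all rule and the explicit one can be modelled roughly as follows. mod_rewrite tests the RewriteRule pattern first and only evaluates the RewriteCond lines when that pattern matches; the sketch below imitates that order in Python. It is a simplified model, not Apache: `redirect_target` is a hypothetical helper, it assumes a root-level .htaccess (leading slash stripped before the rule pattern is tested), and it ignores query strings.

```python
import re

# Condition from the post above: any non-space tail is allowed after the extension.
COND = re.compile(r"^[A-Z]{3,9}\ /(([^/]*/)*)index\.(php|html?)[^\ ]*\ HTTP")

CATCH_ALL_RULE = re.compile(r".*")
EXPLICIT_RULE = re.compile(r"index\.(php|html?)$")


def redirect_target(rule, host, path):
    """Return the redirect URL, or None when no redirect would be issued."""
    # Step 1: the rule pattern is tested against the path (no leading slash).
    if rule.search(path.lstrip("/")) is None:
        return None  # the condition is never evaluated
    # Step 2: only now is the condition tested against the request line.
    cond = COND.search(f"GET {path} HTTP/1.1")
    if cond is None:
        return None
    # %1 in the substitution is the first group of the condition.
    return f"http://{host}/{cond.group(1)}"


for rule_name, rule in (("catch-all", CATCH_ALL_RULE), ("explicit", EXPLICIT_RULE)):
    for path in ("/test/index.html", "/test/index.html56456", "/test/page.html"):
        print(rule_name, path, "->", redirect_target(rule, "www.example.com", path))
```

Under these assumptions the explicit rule pattern filters out both ordinary pages and the odd endings before the condition is ever consulted, while the catch-all leaves all of the filtering to the condition.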

And if you are saying mine is "broken" because a malformed request is redirected to / I have to wonder why it would bother you to land a visitor on the most likely location they were looking for rather than serving a 404?

If I typed (or clicked) a link to /index.htmll or index.php5 I'd be very happy I was redirected to the correct location rather than getting a 404 error, so why would that be a "wrong" redirect?

mihomes




msg:4594907
 9:03 pm on Jul 20, 2013 (gmt 0)

I see - yeah, that rule was just something I was throwing in extra - it really had nothing to do with the subdomain setup or original reason for the post. It just removes the index file from the url and preserves the parameters if any.

The reason I would not want a broken url going to the correct location is that it could be used against you. Say someone posts a ton of malformed index links to your site. This is one reason I am extra careful with these rules these days. It isn't enough to use the correct urls in your site. Say you don't have a redirect in place for :80 or :443; then someone can use that against you.

JD_Toims




msg:4594913
 9:36 pm on Jul 20, 2013 (gmt 0)

it really had nothing to do with the subdomain setup or original reason for the post. It just removes the index file from the url and preserves the parameters if any.

Gotcha! I was thinking when you said mine returned a 200 you were meaning it was at an equivalent malformed location on the subdomain or something like that for some reason initially, then I realized you just meant the /index.htmlAnything was stripped also.

Say someone posts a ton of malformed index links to your site.

I can understand that.

Personally, I redirect anything containing index to the correct location for visitors, because typos happen and if I get a ton of links like you're talking about if it turns into an issue I can either disavow or take the redirect down.

At the same time, I do definitely understand being "protective" and "careful" these days, so I think the answer for "to redirect or to not redirect" is based on "comfort level" and that's cool. Some may be comfortable "taking the chance" and others may not. We're in "each to their own" territory on this one I think.

JD_Toims




msg:4594915
 9:52 pm on Jul 20, 2013 (gmt 0)

if I get a ton of links like you're talking about if it turns into an issue I can either disavow or take the redirect down.

Of course, much more fun with a bunch of those links would be to anchor the current redirect, then write another one for malformed requests and 301 'em right back to one of the linking sites ;)
