

Apache Web Server Forum

    
.htaccess help, please
After spending most of a day trying, I am at a dead end and need help.
Komodo_Tale




msg:4343619
 4:40 pm on Jul 25, 2011 (gmt 0)
After working on this most of Sunday I am at a dead end and would appreciate help. Obviously I am not a coder. Thank you in advance.

Here is the logic I want to create:

If the URL is not exactly http://domain.net/robots.txt.
And {
If the URL is exactly http://domain.net or exactly http://domain.net/ go to http://domain.net.
ELSE
If the URL begins with http://domain.net/http://bbb, where bbb can be anything, go to http://domain.net/t/?url=bbb.
ELSE
If the URL begins with http://domain.net/yyy, where yyy can be anything, go to http://domain.net/r/?url=yyy.
}

==============

Current .htaccess
------------------------
Options +FollowSymLinks
RewriteEngine On
RewriteBase /

RewriteCond %{http_host} ^www\.domain\.net [NC]
RewriteRule ^(.*)$ http://domain.net/$1 [R=301,NC]

RewriteCond %{REQUEST_URI} !^/t/
RewriteCond %{REQUEST_URI} !^/r/
RewriteCond %{REQUEST_URI} ^/http:/
RewriteRule ^http:/(.*)$ http://domain.net/t/?url=$1 [R=301,L]

#=========================================================

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !^/t/
RewriteCond %{REQUEST_URI} !^/r/
RewriteCond %{REQUEST_URI} !^/http:/
RewriteRule ^(.*)$ http://domain.net/r/?url=$1 [R=301,L]

 

lucy24




msg:4343747
 9:48 pm on Jul 25, 2011 (gmt 0)

Oy.

What happens with your current htaccess? (Sure, someone could sit down and work it out from studying the lines of code, but it's a lot faster-- and, ahem, less annoying-- if you just say so.)

Here is the logic I want to create:

Welcome to htaccess. It does not work like any other programming language in the world. It is theoretically possible to combine the implicit AND (the default, which takes no flag) and [OR] in a single set of Rewrite Conditions, but I do not advise it unless you are rock-certain that you know what you are doing. In other words, don't try it ;)
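
For anyone who does need to mix them, here is a minimal sketch of how such a set groups. The /old/, /legacy/ and /new/ paths are made up for the example and have nothing to do with the OP's site. As far as I know, [OR] ties a condition to the one directly after it, and that pair is then ANDed with whatever follows:

```apache
# Hypothetical paths, for illustration only.
# Conditions are ANDed by default; [OR] joins a condition to the next one.
# The set below reads as: (A OR B) AND C.
RewriteCond %{REQUEST_URI} ^/old/ [OR]
RewriteCond %{REQUEST_URI} ^/legacy/
RewriteCond %{QUERY_STRING} !^$
RewriteRule ^ http://domain.net/new/ [R=301,L]
```

So a request for /old/page?x=1 or /legacy/page?x=1 would be redirected, while /old/page with no query string would be left alone.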

If the URL is not exactly http://domain.net/robots.txt.
And

Let's start by leaving off the periods, because they have meaning in RegEx-speak, and htaccess is 90% RegEx.


If the URL is not exactly http://domain.net/robots.txt
AND
If the URL
( is exactly

You can stop here, because if the url is exactly /robots.txt (can we please assume that people who ask for www.domain.net/robots.txt are also allowed in, since that particular rewrite hasn't happened yet?) then by definition it is not exactly anything else. So you only need to exclude it from those conditions that would potentially allow it.

If the URL
( is exactly http://domain.net
or
is exactly http://domain.net/ )
go to http://domain.net

Um, you want to delete the trailing slash that your server has just gone to the trouble of adding?

ELSE
If the URL
( begins with http://domain.net/http://bbb )
/* where bbb can be anything */

Say wha? Is that really the form of your urls? How on earth did they get to be that way? Are you mopping up someone else's mistakes?

go to http://domain.net/t/?url=bbb
ELSE
If the URL
( begins with http://domain.net/yyy )
/* where yyy can be anything */

meaning, anything other than http: ?

go to http://domain.net/r/?url=yyy
...
RewriteBase /

RewriteBase only matters when a rule's target is a relative path. Every target in this file is a full URL, so you don't need to specify it.

RewriteCond %{http_host} ^www\.domain\.net [NC]
RewriteRule ^(.*)$ http://domain.net/$1 [R=301,NC]

If they ask for www.domain.net, send them to domain.net. OK. But if you're on shared hosting, check the fine print. You may be able to do this kind of thing by clicking a button, and then you don't have to mess with it in htaccess. One less thing to keep track of.
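
If you do keep it in htaccess, a sketch of the same rule with one change: the L flag added next to R=301. The Apache documentation recommends using the two together, because a redirect rule without L hands its result on to the rules below it. The variable name is written in the documented upper-case form, and the NC flag on the rule itself is dropped because the pattern has no letters to match:

```apache
# Sketch only: the host redirect from above, with L added so that no later
# rule in this file runs once the redirect has been decided.
RewriteCond %{HTTP_HOST} ^www\.domain\.net [NC]
RewriteRule ^(.*)$ http://domain.net/$1 [R=301,L]
```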

RewriteCond %{REQUEST_URI} !^/t/
RewriteCond %{REQUEST_URI} !^/r/
RewriteCond %{REQUEST_URI} ^/http:/
RewriteRule ^http:/(.*)$ http://domain.net/t/?url=$1 [R=301,L]

If they don't ask for the /t/ directory, or the /r/ directory, but they do ask for something starting over again in /http:/ then send them to the /t/ directory, replacing the former query string-- if any-- with "url=/" followed by the whole remainder of the request (the second slash in http:// was included in your capture).

If they are asking for /http:/ then they are by definition not asking for /t/ or /r/ so you don't need to say so. But I kinda suspect that mod_rewrite will go bonkers if you try to feed it http:/ anywhere other than the target string or certain versions of %{THE_REQUEST}.
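
A sketch of that rule with the redundant conditions taken out. This assumes the htaccess file sits in the site root, where the rule's pattern is matched against the path without its leading slash, so the pattern on its own already restricts the rule to requests beginning with http:/

```apache
# Sketch: the pattern alone limits this rule to paths starting with "http:/",
# so the three RewriteCond lines above it can be dropped.
RewriteRule ^http:/(.*)$ http://domain.net/t/?url=$1 [R=301,L]
```

Exactly what ends up in $1 depends on how the server treats the doubled slash, so it is worth testing with a real request before relying on it.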

:: looking vaguely around for g1 ::

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/robots\.txt$
RewriteCond %{REQUEST_URI} !^/t/
RewriteCond %{REQUEST_URI} !^/r/
RewriteCond %{REQUEST_URI} !^/http:/
RewriteRule ^(.*)$ http://domain.net/r/?url=$1 [R=301,L]

If they ask for something that isn't an existing file, AND isn't an existing directory, AND isn't robots.txt (which presumably does exist, so you don't need to name it again), and isn't any of the same three directories listed above, then redirect them all over again to directory /r/, replacing the former query string-- if any-- with "url=everything they asked for".
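
A trimmed-down sketch of that last rule, under two assumptions: robots.txt exists as a real file (so the !-f test already lets it through), and the /http:/ rule earlier in the file keeps its L flag (so those requests never reach this point). The /t/ and /r/ exclusions are kept, merged into one condition:

```apache
# Sketch, assuming robots.txt is a real file and the earlier /http:/ rule
# ends with [L].
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteCond %{REQUEST_URI} !^/(t|r)/
RewriteRule ^(.*)$ http://domain.net/r/?url=$1 [R=301,L]
```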

Well. That was fun.

Actually I think your htaccess is pretty close, it's just got some lines that don't need to be there, and it isn't in the optimal order.

g1smd




msg:4343762
 11:00 pm on Jul 25, 2011 (gmt 0)

Please don't use URLs with http:// in the path. Colon is NOT a valid character to use in the path part of the URL. Slashes MUST imply a hierarchical structure.

Page 15 of [w3.org...] mentions " reserved = | ; | / | # | ? | : | space "

See also [w3.org...]

URLs with multiple http:// constructions can also trigger warnings from Internet Security programs.


In the original question, the words " go to " are used, but the meaning is ambiguous. Do you want the browser to be redirected to the parameter-based URL, or do you want an internal rewrite to serve the content?

Be sure you understand the differences between redirects and rewrites. Both use RewriteRule syntax but have totally different end functionality.
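
A rough sketch of the difference, using a made-up /go/ prefix so that neither rule can match its own output. The prefix, and the choice of /r/ as the target, are assumptions for illustration only. You would use one form or the other, not both, since the first rule that matches ends processing:

```apache
# Hypothetical /go/ prefix, for illustration only.

# External redirect: full URL plus R=301. The browser receives a 301 and
# makes a second request; the address bar changes.
RewriteRule ^go/(.*)$ http://domain.net/r/?url=$1 [R=301,L]

# Internal rewrite: local path, no R flag. The server serves /r/ directly;
# the browser never sees the parameter-based URL.
RewriteRule ^go/(.*)$ /r/?url=$1 [L]
```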

lucy24




msg:4343787
 1:18 am on Jul 26, 2011 (gmt 0)

" reserved = | ; | / | # | ? | : | space "

That reminds me! (OT, but it is bound to come up again.) If you are, let's say, cleaning up someone else's mess and therefore have inherited urls and/or queries with spaces, what do you do with them inside [abc] brackets? Escape as usual, leave them as-is, or hide behind a locution like \s? (Yes, lots of things can be \s -- but they're even worse than spaces in an url!)
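
For what it's worth, a sketch of two approaches that I believe work, since the directive's arguments are split on whitespace before the pattern ever reaches the RegEx engine: escape the space with a backslash, or put the whole pattern in double quotes. The file names are invented for the example:

```apache
# Hypothetical file names. Two ways to get a literal space into a pattern.

# 1. Backslash-escape the space.
RewriteRule ^old\ page\.html$ http://domain.net/new-page.html [R=301,L]

# 2. Quote the whole pattern; the space inside the brackets is then literal.
RewriteRule "^old[ _]page\.html$" http://domain.net/new-page.html [R=301,L]
```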

URLs with multiple http:// constructions can also trigger warnings from Internet Security programs.

Whew. I was pretty sure "mod_rewrite will go bonkers" was not quite the right technical term.

In the original question all the rules wind up with R=301, and this does seem to be the OP's intent.

:: wandering off to study w3 docs ::

Komodo_Tale




msg:4344272
 10:48 pm on Jul 26, 2011 (gmt 0)

Let's just say I want no 302 redirects in the chain and to keep it all SEO friendly.

As for the http:// in the URL, I know that is not kosher, but it is necessary. It's a method for input, the same as bit.ly uses. [bit.ly...] gives you a short URL for http://example.com.

I'm looking at some examples from open source URL shorteners since that is close to what I am doing. Right now everything seems to work 90% of the time but I want to compact my code and make it fullproof.

g1smd




msg:4344291
 11:54 pm on Jul 26, 2011 (gmt 0)

"Foolproof" involves not ignoring the HTTP specifications.

They set out what you can and cannot do. Period.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved