Redirect virtual URLs to published files

I need some information on url rewrite

         

mplacona

3:20 pm on May 20, 2009 (gmt 0)

10+ Year Member



Hi, I'm new to this forum and am looking for some information on URL rewriting.

Basically, my website uses SEO-friendly URLs like this:

http://www.example.com/blog/archives.cfm/category/adobe
Which in reality is: http://www.example.com/blog/archives.cfm?category=adobe

What I would like to do is:

When a user hits this URL, they should be shown a published flat HTML file I've already generated.

I've already accomplished this for the blog folder with:

RewriteRule ^blog/$ /blog/published/index.html [L]

Following the same logic, I tried this for my other file:

RewriteRule ^adobe$ /blog/published/adobe.html [L]

But that doesn't seem to work. Can somebody help me with this rewrite?

Thanks in advance,

Marcos Placona

[edited by: jdMorgan at 4:15 pm (utc) on May 20, 2009]
[edit reason] example.com [/edit]

jdMorgan

4:14 pm on May 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



^adobe$ matches only the exact URL-path "/adobe". It will not match /adobepage.html or /adobe/page, or anything else, just "/adobe".

Similarly, your first rule matches only exactly "/blog/" and nothing else.
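To illustrate the anchoring point, here is a minimal sketch of how the same pattern behaves with and without anchors. It assumes a root-level .htaccess file with mod_rewrite enabled, where the leading slash is stripped from the URL-path before matching. The first rule is the one from the original post; the two commented-out variations are hypothetical and shown only for comparison.

```apache
# Sketch only: assumes a root-level .htaccess file with mod_rewrite enabled.
RewriteEngine On

# Fully anchored: matches the request /adobe and nothing else.
RewriteRule ^adobe$ /blog/published/adobe.html [L]

# Hypothetical variations, commented out:
# Start-anchored only: would match /adobe, /adobepage.html, /adobe/page ...
# RewriteRule ^adobe /blog/published/adobe.html [L]
# End-anchored only: would match any URL-path ending in "adobe",
# such as /blog/archives.cfm/category/adobe
# RewriteRule adobe$ /blog/published/adobe.html [L]
```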

Take a look at the regular-expressions tutorial cited in our Apache Forum Charter [webmasterworld.com] and specifically look into the concept of "anchoring" for more information.

As for your larger problem, try searching WebmasterWorld for "rewrite URL to script rewriterule" and similar keyphrases.

There are also several threads in our Apache Forum Library which may be quite helpful. Among them, the thread titled Changing Dynamic URLs to Static URLs [webmasterworld.com] may be most useful to you.

Jim

mplacona

4:28 pm on May 20, 2009 (gmt 0)

10+ Year Member



Hi, I've already read the resources you mentioned. What I'm after really is the exact match, because I have other pages underneath "blog", for example, and wouldn't want to redirect them anywhere else.

In the "^adobe$ matches only the exact URL-path '/adobe'" case, it's totally fine, but it doesn't seem to work. /adobe is only part of the URL, and as you can see, the file itself is called archives.cfm. Only when it is followed by the string "/category/adobe" do I want to redirect to "/blog/published/adobe.html".

It all seems right in my opinion, but when I hit this URL nothing happens; it still doesn't load the flat HTML file.

Thanks again

jdMorgan

5:29 pm on May 20, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> In "^adobe$ matches only the exact URL-path "/adobe" case, it's totally fine, but it doesn't seem to work. /adobe is only part of the url...

This is the logical contradiction that is apparently preventing your rule from working.

The pattern must match the URL-path part of the requested URL in order for the rule to be invoked. If "'/adobe' is only part of the URL," then your pattern won't match. It will match if and only if the requested URL is example.com/adobe, no more, no less.

Based on what you posted above, I'd say you need to add "blog/archives\.cfm/category/" ahead of "adobe" in your new rule's pattern, and put this new rule ahead of the generic /blog rewrite, so that the specific /adobe rule is invoked first.
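As a rough sketch of that ordering, using the URL-path from the first post (so the "category/" segment is included), the two rules might be arranged as follows. This assumes a root-level .htaccess file and has not been tested against the poster's server.

```apache
# Sketch only: root-level .htaccess, mod_rewrite enabled.
RewriteEngine On

# Specific rule first: the full path is spelled out and anchored at both ends.
RewriteRule ^blog/archives\.cfm/category/adobe$ /blog/published/adobe.html [L]

# Generic /blog/ rule afterwards (unchanged from the first post).
RewriteRule ^blog/$ /blog/published/index.html [L]
```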

Again, I suggest a careful review of regex "anchoring."

Jim

[edited by: jdMorgan at 5:31 pm (utc) on May 20, 2009]

mplacona

11:39 am on May 21, 2009 (gmt 0)

10+ Year Member



Thank you so much Jim, I've finally got it to work doing:

RewriteRule ^blog/archives\.cfm/category/([a-zA-Z0-9-]+) /blog/published/category/$1.html [R,L]

That led me to two new questions:

It now really does redirect to the HTML page. What I wanted was to "read" from the HTML page while remaining at the same URL. I tried different flag combinations such as [PT,L] and [NC,L], but the only one that seemed to load my published page was [R,L].

The other problem is that I have a script that makes HTTP requests to my pages on a daily basis, in order to grab all the generated HTML and save it in my published folder. That works fine, but with this rewrite in place, when my HTTP request tries to "hit" http://www.example.com/blog/archives.cfm/category/adobe, it is automatically redirected to the already published page, because it falls into the rewrite.

I know a little about RewriteCond, and I was wondering if there's any way I could check the request to see whether it's coming from my own server or from anywhere else.

mplacona

1:12 pm on May 21, 2009 (gmt 0)

10+ Year Member



I'll just partially answer my own question.
For the second problem, I simply did:

RewriteCond %{REMOTE_ADDR} !^my_server's_ip$

before my RewriteRule, and it did the trick. Now I'm only stuck on the first problem, which is:

Why is it redirecting to the HTML page? I just want to display the HTML page but still stay at the same URL.
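For reference, the condition described above might look like the sketch below. The address 192.0.2.10 is a placeholder from the documentation range, not the real server IP, and the dots are escaped so they match literally. A RewriteCond applies only to the RewriteRule that immediately follows it.

```apache
# Sketch only. 192.0.2.10 is a placeholder address, not the poster's real
# server IP.
RewriteEngine On

# Skip the rule when the request comes from the server itself, so the
# publishing script still reaches the dynamic page.
RewriteCond %{REMOTE_ADDR} !^192\.0\.2\.10$
RewriteRule ^blog/archives\.cfm/category/([a-zA-Z0-9-]+) /blog/published/category/$1.html [R,L]
```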

Cheers

g1smd

1:21 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



By including [R] you are forcing a 302 redirect.

Omitting R gives a rewrite.

[Additional Note: If you include a domain name in the target, you will get a 302 redirect.]

[edited by: g1smd at 1:25 pm (utc) on May 21, 2009]

jdMorgan

1:25 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's doing an external (client) 302 redirect because you explicitly told it to do so...

What you apparently want is an internal rewrite, and the syntax for that is different:


RewriteRule ^blog/archives\.cfm/category/([0-9A-Za-z\-]+)$ /blog/published/category/$1.html [L]

Note that the [R]edirect flag has been removed, and the hyphen in the character class has been escaped.
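Side by side, the two forms differ only in the flags. The sketch below uses the pattern and target from this thread; the redirect form is commented out for comparison.

```apache
# External redirect: the client receives a 302 and requests the new URL,
# so the address bar changes.
# RewriteRule ^blog/archives\.cfm/category/([0-9A-Za-z\-]+)$ /blog/published/category/$1.html [R,L]

# Internal rewrite: the server maps the request to the file itself, and the
# client keeps the original URL.
RewriteRule ^blog/archives\.cfm/category/([0-9A-Za-z\-]+)$ /blog/published/category/$1.html [L]
```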

Jim

mplacona

1:37 pm on May 21, 2009 (gmt 0)

10+ Year Member



Hi JD and g1smd,

I did try without the R, but it simply doesn't do anything. I went further and looked for similar problems, and found this blog post, which I think describes something similar:

[petefreitag.com...]

I tried using [PT], [PT,L], and [L] alone, but none of them work; they all keep loading the dynamic page instead of the flat one.

Am I missing anything here? Is there a workaround for this problem?

Thanks

jdMorgan

2:32 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What exact URL-path did you request to test the new code?

Jim

mplacona

2:36 pm on May 21, 2009 (gmt 0)

10+ Year Member



The original URL is:

http://www.example.co.uk/blog/archives.cfm/category/adobe

The rule I tried is:
RewriteRule ^blog/archives\.cfm/category/([a-zA-Z0-9\-]+)$ /blog/published/category/$1.html [L]

Hitting the original URL should serve the contents of:
http://www.example.co.uk/blog/published/category/adobe.html

while staying on the original URL.

Thanks

[edited by: jdMorgan at 4:35 pm (utc) on May 21, 2009]
[edit reason] example.co.uk [/edit]

g1smd

3:59 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes it should.

What does Live HTTP Headers for Firefox have to say for this request?

Temporarily add the [R] flag back in just to ensure that the rule is generating the right filepath (you will see the URL change -- look at it and make sure the path part is the right one), and then take it back out again.

mplacona

4:24 pm on May 21, 2009 (gmt 0)

10+ Year Member



It does a request like this:

#request# GET http://www.example.co.uk/blog/archives.cfm/category/adobe
GET /blog/archives.cfm/category/adobe

If I put R back, it will do something like:

#request# GET http://www.example.co.uk/blog/archives.cfm/category/adobe
GET /blog/archives.cfm/category/adobe
#request# GET http://www.example.co.uk/blog/published/category/adobe.html

But then it makes two requests, which hurts performance, and it ends up at a URL I don't want to show.

So the rule is producing the right path, but the internal rewrite isn't happening.

Thanks in advance

[edited by: jdMorgan at 4:36 pm (utc) on May 21, 2009]
[edit reason] example.co.uk [/edit]

g1smd

4:30 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That shows that the rewrite is working, but why does it not pull content?

Is there really a physical file loaded at: /blog/published/category/adobe.html or is that feeding a rewrite itself?

mplacona

4:32 pm on May 21, 2009 (gmt 0)

10+ Year Member



No, this file really exists.

<snip>

It's my dynamic category page, generated as a flat HTML file.

[edited by: jdMorgan at 4:37 pm (utc) on May 21, 2009]
[edit reason] No URLs, please. See Terms of Service. [/edit]

jdMorgan

4:44 pm on May 21, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is odd. If the rule pattern matches for a redirect, then it should also match for a rewrite. So I suspect something else is going on behind the scenes here. Here are some steps to take:

If you're on Apache 2, disable AcceptPathInfo (See Apache core directives documentation).

Disable content negotiation/MultiViews if you've got them enabled and don't need them (use "Options -MultiViews").

Put the rule back in "internal rewrite" form, try the /blog/archives.cfm/category/adobe URL again, and check your error log for any warnings.

Review all of your .htaccess files, and make sure that there is/are no other rule(s) which interfere with or countermand this rewrite.
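The first three steps could be combined in a test configuration along these lines. This is a sketch only: it assumes a root-level .htaccess file and that the server's AllowOverride setting permits these directives there.

```apache
# Sketch only: root-level .htaccess; requires that AllowOverride permits
# these directives.
AcceptPathInfo Off
Options -MultiViews

RewriteEngine On
# The rule in "internal rewrite" form (no R flag).
RewriteRule ^blog/archives\.cfm/category/([0-9A-Za-z\-]+)$ /blog/published/category/$1.html [L]
```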

Jim

mplacona

9:30 am on May 22, 2009 (gmt 0)

10+ Year Member



Hi, thanks for the answers. I tried everything: disabling MultiViews, disabling AcceptPathInfo, and having one enabled and the other disabled. I also tried disabling all of my rewrites except this one, but nothing seems to do the job.

Is there anything else I should try?

Cheers

jdMorgan

1:54 pm on May 22, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Leave MultiViews and AcceptPathInfo disabled. Put the rule back into "internal rewrite mode." Then request your test URL and take a hard look at your server access log and your server error log files. Any errors reported? How does the byte count look compared to normal? What is the server response code when using Live HTTP Headers to watch the client-server transactions? -- 200-OK, other?

You say that your request "does not pull content." What happens if you change the rewrite to point to a simple "hello world" HTML file --say, in your Web root directory-- a file that has no dependencies on anything else (no scripts, no includes, no images, no css, no JS), just a static file that you have placed there...
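One way to set up that test is sketched below. The file name /hello.html is hypothetical; it stands for any static file placed in the web root with no dependencies.

```apache
# Sketch only: /hello.html is a hypothetical static test file in the web
# root, with no scripts, includes, images, CSS, or JS.
RewriteEngine On
RewriteRule ^blog/archives\.cfm/category/([0-9A-Za-z\-]+)$ /hello.html [L]
```

If the test file is served at the original URL, the rewrite itself is working and the problem lies with the published file or its path.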

I guess my summary of the above would be, "Stop changing the code and the server config, and instead test the set-up that you have very thoroughly, using all of the tools available."

Jim