P3P 404: Apache RedirectMatch for '3E' anywhere in URL?

         

JAB Creations

3:03 am on Jan 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I am not sure how to do a wild card match for the string '3E'. The specific problem is that the P3P validator is requesting...
http://www.example.com/%3E

...which is creating 404 error codes.

Given past experience, the poor documentation online, and the lack of a working example for me to *replicate* (learning being the detection of patterns), I'm asking directly: how do I do a wild card match for the string '3E' anywhere in the URL?

- John

jdMorgan

3:35 am on Jan 17, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Assuming you want to rewrite requests from the P3P validator only, and to the same URL-path, but dropping the "%3E" from the end, then something like this:

RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteCond %{THE_REQUEST} ^/([^\%]*)\%3E[^?]*\? [NC]
RewriteRule .* /%1 [L]

If the %3E is not always at the end, and is followed by more URL-path info, then a more-complicated pattern will be required.
Here, we're using THE_REQUEST to avoid the un-encoding that takes place for REQUEST_URI and the URL-path 'seen' by RewriteRule.

However, it might be possible to catch this with a simpler rule:


RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteRule ^([^>]*)> /$1 [L]

In either case, we do not look for "%3E" in query strings; Only a %3E in the URL-path is considered to be a match requiring a rewrite.

Jim

JAB Creations

6:49 pm on Jan 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks for posting. Unfortunately, nothing happens on the live server (Apache 1.3) with either script, though there the .htaccess file is in a sub-folder. Neither script works *locally* on Apache 2 either, even when placed at the root.

- John

* Edited - live to local, running Apache 2 locally.

[edited by: JAB_Creations at 7:28 pm (utc) on Jan. 18, 2008]

gergoe

7:45 pm on Jan 18, 2008 (gmt 0)

10+ Year Member



How does this character end up in the URL at all? It might sound like nonsense, but why not eliminate the cause, if that's possible? For example, a badly written parser could read http://example.com/> from the link <a href=http://example.com/> (note the missing quotes), so maybe just one line needs to be changed somewhere.

JAB Creations

8:02 pm on Jan 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The specific problem is that the P3P validator is requesting...
- John


jdMorgan

8:17 pm on Jan 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, I forgot to include the HTTP method in the pattern for THE_REQUEST:

RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^\%]*)\%3E[^?]*\? [NC]
RewriteRule .* /%1 [L]

Jim

gergoe

8:39 pm on Jan 18, 2008 (gmt 0)

10+ Year Member



Sure, I'm not blind yet; I just wanted to point out that validators and other programs don't usually put > signs at the end of URLs.

JAB Creations

9:57 pm on Jan 18, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks, Jim. I tried experimenting with it, this time testing everything I could on the live server (1.3.x), but the results were no different.

Gergoe, the validator is what's producing this request, and I'm not exactly sure what the symbol is (in the way that %20 stands for a space, since spaces can't appear in a URL?)

- John

jdMorgan

12:22 am on Jan 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Two things:

Make sure that the client user-agent is exactly "P3P Client" (case-sensitive).
If the client is capable of caching, disable caching if possible.

Jim

jdMorgan

1:04 am on Jan 19, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



And another bug: Missing second "?" in pattern due to forum auto-editing. The first is a literal, the second a regex token:

RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^\%]*)\%3E[^?]*\?? [NC]
RewriteRule .* /%1 [L]
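
Read piece by piece against a request line such as "GET /%3E HTTP/1.1", the condition pattern breaks down roughly as follows. This is an annotated sketch of the rule above, not a separately tested configuration:

```apache
# Request line as the server receives it:  GET /%3E HTTP/1.1
#
#   ^[A-Z]{3,9}\     the HTTP method (GET, HEAD, ...) and the space after it
#   /([^\%]*)        the leading slash, then everything before the first "%",
#                    captured as %1 (empty for /%3E)
#   \%3E             the still-encoded ">"; [NC] also accepts %3e
#   [^?]*            whatever follows, up to a "?" or the end of the line
#   \??              an optional literal "?" starting a query string
RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^\%]*)\%3E[^?]*\?? [NC]
# %1 is the text captured by the last matching RewriteCond
RewriteRule .* /%1 [L]
```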

Ugly problem, ugly code. ;)

Jim

[edited by: jdMorgan at 1:05 am (utc) on Jan. 19, 2008]

JAB Creations

4:50 am on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wow, yes, this just isn't pretty! I've tested this locally and live again. Live is running Apache 1.3.39, and I had made a slight mistake by overlooking the difference in the user agent while I was messing with the regular expressions. Anyway, what is funny is that I now get an error 500, but only when I use the exact user agent and request that exact URL (at the public root, of course). I'm accustomed to having to delete the .htaccess file immediately because it is throwing error 500s across the entire domain for everyone.

Which brings me to a good question: is there a way to have Apache reveal more detailed information when I encounter an error 500? Thanks for your help!

- John

jdMorgan

1:42 pm on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Look in your server error log -- It often tells you exactly what is wrong.
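
For rewrite problems in particular, mod_rewrite can also write its own trace. A minimal sketch for the Apache 1.3 and 2.2 versions mentioned in this thread; the log file name is an assumption, and both directives are only allowed in the main server config or a virtual host, not in .htaccess, so this is mainly useful on a local test server:

```apache
# httpd.conf (server config or <VirtualHost>), not .htaccess
# The path is relative to ServerRoot; the file name is only an example.
RewriteLog logs/rewrite.log
# 0 disables logging; higher values are more verbose (and slower).
RewriteLogLevel 3
```

An error 500 caused by an invalid directive in .htaccess is reported in the normal error log rather than in this file.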

Jim

Receptional Andy

2:04 pm on Jan 21, 2008 (gmt 0)



I'm not exactly sure what the symbol is

It's the encoded version of the greater than symbol (>) as George implied, which does make it seem like a bad link somewhere, maybe in the inline policy reference?

<meta http-equiv="P3P" content='policyref="http://www.example.com/w3c/p3p.xml"'>

[edited by: Receptional_Andy at 2:06 pm (utc) on Jan. 21, 2008]

JAB Creations

5:19 pm on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here is my P3P header...
header('P3P: policyref="http://www.example.com/w3c/p3p.xml"');

Locally I get the following message (running Apache 2.2.3)...

.htaccess: Invalid command 'RewriteEngine', perhaps misspelled or defined by a module not included in the server configuration

Then I removed this line...

RewriteEngine On

and then received this message...

Invalid command 'RewriteCond', perhaps misspelled or defined by a module not included in the server configuration

I've been waiting about twenty minutes for the live server to update the error message log.

- John

jdMorgan

5:42 pm on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



... looks like you do not have mod_rewrite installed on your local machine, as stated in the first error message.

Jim

gergoe

5:44 pm on Jan 21, 2008 (gmt 0)

10+ Year Member



Make sure that the line
LoadModule rewrite_module modules/mod_rewrite.so
is not commented out in your main server config (conf/httpd.conf).

JAB Creations

5:57 pm on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks. I stopped Apache, uncommented mod_rewrite in the httpd.conf file, and restarted Apache. I then switched the user agent using Chris Pederick's User Agent Switcher extension for Firefox and requested http://localhost/%3E. The first time I requested it I was redirected to http://localhost/, but when I requested http://localhost/%3E directly again I kept hitting 403 errors. Either result is fine, but what is the explanation for this? I presume it is related to caching?

The logs for my current live server update in the afternoon. There is no error.log file itself, just a list in the control panel which I presume will be updated when the server recompiles the monthly log.

- John

jdMorgan

6:12 pm on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, this gets even uglier. There may be an internal rewrite loop going on. And if so, it's not easy to fix:

RewriteCond %{ENV:rwDone} !^True$ [NC]
RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^\%]*)\%3E[^?]*\?? [NC]
RewriteRule . /%1 [E=rwDone:True,L]

We have to create and test an internal server variable to keep this rule from recursing. We can't use the RewriteRule pattern itself to do that, because RewriteRule cannot 'see' the %3E or ">" (judging by the fact that you chose the more complex rule from my first post).
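
One caveat, stated as an assumption rather than something tested in this thread: in .htaccess context the rewritten URL is handed back to the server as an internal redirect, and Apache then renames existing environment variables with a REDIRECT_ prefix, so on the second pass the flag may show up as REDIRECT_rwDone instead of rwDone. A commonly used alternative guard checks REDIRECT_STATUS, which is empty on the initial request and set after an internal redirect. A sketch along those lines, assuming Apache 2.x:

```apache
# Only act on the original request, not on an internally redirected one.
# REDIRECT_STATUS is unset until the server has performed an internal redirect.
RewriteCond %{ENV:REDIRECT_STATUS} ^$
RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^\%]*)\%3E[^?]*\?? [NC]
RewriteRule . /%1 [L]
```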

Jim

JAB Creations

7:11 pm on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I happened to test the first script, then the second, and then I think I tried tweaking the first again. It doesn't matter how we get this to work, just so long as the response code isn't a 404 (301 or 403 is fine by me). I would hope the P3P validator authors (one of whom I could not reach, and the other I have not yet received a reply from) are interested in any HTTP codes other than 200 that their validator might be encountering.

The second script now works live on Apache 1.3.39, though I'm not sure what I did differently. What is the rewrite rule doing in English? With my very rough understanding of operators, my best guess about the ^> part is that, since %3E is a > character, this is cropping off anything after this character and sending the URL before the greater than symbol?

What would be keeping Apache 2 from executing the simpler script? I use a slightly modified version of XAMPP locally.

- John

jdMorgan

9:09 pm on Jan 21, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> this is cropping off anything after this character and sending the URL before the greater than symbol?

Yes. But it's not "sending" anything. It is simply changing the URL-path that the server will next convert to a filepath -- by lopping off the ">" on the end.

If this were a search engine, I'd recommend using an external redirect. But if the validator is producing and requesting bad URLs, there's no guarantee that it would follow the redirect correctly either -- and I assume that your goal is to simply get the validation completed.
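
For comparison, the two variants of the simpler rule might look like this. A sketch only: the hostname is the thread's placeholder, and the external version is the one that would suit a search engine rather than the validator:

```apache
# Internal rewrite: the client never sees a new URL; the server just maps
# the path minus the trailing ">" to a file.
RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
RewriteRule ^([^>]*)> /$1 [L]

# External redirect: the client receives a 301 and must request the
# corrected URL itself. Use one variant or the other, not both.
#RewriteCond %{HTTP_USER_AGENT} ^P3P\ Client$
#RewriteRule ^([^>]*)> http://www.example.com/$1 [R=301,L]
```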

Jim