URL encoding and referrers

         

bitstearm

9:56 am on Feb 1, 2003 (gmt 0)

10+ Year Member



I have noticed that various XHTML strict validators don't like url encoding like: [somesite.cgi?data1=value&data2=value...]

I'm asked to replace & with &amp; but this causes some problems on the server side, where &amp; isn't always interpreted as &. Has anyone else seen this? Any ideas what to do?

Also, I have noticed that URL encoding seems to break the referrer on certain browsers, such as the Safari beta and Mozilla 0.7 on KDE on Red Hat Linux. Could this be a result of bugs in these beta versions, or something else?

Thanks

andreasfriedrich

10:16 am on Feb 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Writing & as &amp; in XML/SGML documents is necessary since & has special meaning: it marks the beginning of an entity reference. If you want to use it literally you need to use its entity reference, which happens to be &amp;.

This is the same as escaping quotes within a quote delimited string in programming languages. When you want to have a string like

'Aaron's party'
you need to escape the single quote in the middle. Otherwise it will be interpreted to mark the end of the string. The
party'
will then most likely cause a syntax error.

User agents have been more forgiving. They did not treat this as a syntax error when the ampersand looked like it was meant literally. Programming languages could do the same: they might look further down the line for quotes and, on finding another one, automatically escape the middle one. However, such forgiving parsing rules create a lot of ambiguities that are better avoided.
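To illustrate the kind of ambiguity meant here, a small Python sketch. It uses the standard library's html.unescape as a stand-in for a forgiving parser; real browsers apply extra rules inside attribute values, so treat this as an approximation. The parameter name "copy" is made up for the example.

```python
from html import unescape

# Properly escaped: the ampersand is written as an entity reference.
strict = "index.html?name=aaron&amp;copy=2"

# Unescaped: a forgiving parser has to guess what "&copy" means.
sloppy = "index.html?name=aaron&copy=2"

strict_result = unescape(strict)
sloppy_result = unescape(sloppy)

print(strict_result)  # the query string survives intact
print(sloppy_result)  # "&copy" is taken for the copyright sign entity
```

With the escaped form there is nothing to guess. With the raw ampersand, the parameter name happens to collide with an entity name and the query string is silently changed.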

When you write a link like

http://www.ac.com/index.html?name=aaron&amp;age=15
in an XML/SGML document, the user agent will see that entity reference and know that you want it interpreted as a literal ampersand. When the client requests that URI it will use
http://www.ac.com/index.html?name=aaron&age=15
so all your server will ever see is the literal ampersand. Anything else would be a different URI, which you may handle whatever way you want.
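The round trip can be sketched in Python. This is only a rough model of what a conforming user agent does, using standard library helpers; the link text "profile" is made up for the example.

```python
from html import escape, unescape
from urllib.parse import urlsplit, parse_qs

uri = "http://www.ac.com/index.html?name=aaron&age=15"

# Authoring side: escape the URI before placing it in an attribute value.
href = escape(uri, quote=True)
anchor = f'<a href="{href}">profile</a>'
print(anchor)

# User agent side: the parser resolves the entity reference
# before the URI is ever requested.
requested = unescape(href)

# Server side: only the literal ampersand arrives, so the query
# string splits into two parameters as usual.
params = parse_qs(urlsplit(requested).query)
print(params)
```

If the server really does receive &amp; in a request, the likely culprit is something that escaped the URI twice or a client that did not resolve the reference, not the markup itself.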

Andreas

bitstearm

10:26 am on Feb 1, 2003 (gmt 0)

10+ Year Member



Andreas,

Thanks for your explanation. It makes sense to me. Alright...better go and update my url encoding scripts right now :)

andreasfriedrich

10:30 am on Feb 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're welcome, bitstearm.

BTW is that a mistyped bitstream? ;)

Andreas

bitstearm

10:32 am on Feb 1, 2003 (gmt 0)

10+ Year Member



Yes it is...was too eager to sign up for this forum ;)

andreasfriedrich

10:35 am on Feb 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, you might want to change it while you are still young in WebmasterWorld terms. :)

bitstream

11:02 am on Feb 1, 2003 (gmt 0)

10+ Year Member



Done. Thanks for the reminder :D

BjarneDM

2:43 pm on Feb 1, 2003 (gmt 0)

10+ Year Member



I too have run into this problem with & required to be encoded as &amp; :(

My solution was to use ';' as the separator instead and to change the PHP configuration to reflect this, with arg_separator.input=";" in /usr/lib/php.ini.

Works perfectly without a hitch :)
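For anyone not on PHP, the same idea can be sketched in Python. This is not how PHP implements it; parse_query and its separators argument are hypothetical names for this example, loosely modelled on the way every character in arg_separator.input is treated as a separator.

```python
from urllib.parse import unquote_plus

def parse_query(query, separators=";"):
    """Split a query string on any of the given separator characters."""
    first = separators[0]
    # Normalise every accepted separator to the first one.
    for sep in separators[1:]:
        query = query.replace(sep, first)
    pairs = []
    for field in query.split(first):
        if not field:
            continue  # skip empty fields such as a trailing separator
        name, _, value = field.partition("=")
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

semicolon_only = parse_query("name=aaron;age=15")
both_accepted = parse_query("name=aaron&age=15", separators=";&")
ampersand_ignored = parse_query("name=aaron&age=15")

print(semicolon_only)
print(both_accepted)
print(ampersand_ignored)
```

Note the last call: with ';' as the only separator, an old-style link using & is no longer split into two parameters. Listing both characters keeps existing links working while new markup uses ';', which needs no escaping.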

andreasfriedrich

2:46 pm on Feb 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BTW that's the recommended W3C solution.