Double encoding - what is it?

         

biggles

11:05 am on Feb 17, 2003 (gmt 0)

10+ Year Member



A client's site developed in Vignette is not being re-indexed by PositionPro for either FAST or Inktomi listings. Apart from the homepage, none of the URLs have been re-indexed since 7 Jan, despite what is supposed to be a 48-hour refresh.

PositionPro tell me the reason is that the URLs are "double encoded", which is causing an issue with Inktomi and their system matching URLs. Apparently recoding the URLs properly should get the refresh dates generated for the URLs.

Would someone be good enough to explain to me in plain English what double encoding is, please? (I'm a marketer, not a techie.) I've not been able to find much that I can understand, and I have a meeting with the client shortly.

Current URL formats are as follows: [xyz.com...]

My understanding is that the double encoding issue is the fact that URLs contain the string "%25" (why, I have no idea). I have experimented and found that if I remove either the "25" or the "%25" and enter these modified URLs in a browser the pages still display.

i.e. both of these URLs work for the same page:
http://www.xyz.com/0,1234,SectionID%3D1234%26ContentID%3D1234,00.html
and [xyz.com...]

Any advice on what to do appreciated.

Thanks in advance.

Dreamquick

11:38 am on Feb 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Basically what double encoding/decoding means is that two stages of encoding have been applied to a string. Encoding in URLs refers to the use of % hex codes in place of characters which could cause some browsers to mis-interpret the link.

Your problem is that once something decodes the URL the first time it finds that there are *still* encoded variables inside it. Might be an issue or it might just be that someone upgraded their software and the upgrade flags this as an issue.

i.e. if we decode the following ("%25" = "%"):

http://www.xyz.com/0,1234,SectionID%253D1234%2526ContentID%253D1234,00.html

we get this, which is still using encoding:
http://www.xyz.com/0,1234,SectionID%3D1234%26ContentID%3D1234,00.html

Finally, if we decode that we get this ("%3D" = "=", "%26" = "&"):
http://www.xyz.com/0,1234,SectionID=1234&ContentID=1234,00.html
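
For anyone who wants to reproduce the two decoding steps above, here is a minimal sketch using Python's standard `urllib.parse.unquote`, which undoes exactly one layer of % encoding per call. The variable names are just for illustration; the URL is the example from this thread.

```python
from urllib.parse import unquote

# The double-encoded URL from the example above.
url = "http://www.xyz.com/0,1234,SectionID%253D1234%2526ContentID%253D1234,00.html"

# First pass: every "%25" becomes "%", leaving "%3D" and "%26" behind.
once = unquote(url)

# Second pass: "%3D" becomes "=" and "%26" becomes "&".
twice = unquote(once)

print(once)
print(twice)
```

The first decode still contains % codes, which is the situation described above.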

As to why this happened, or if this was a bug or feature - I have no idea. The URL you posted looks to me to have the same data included in it several times over, so presumably this redundancy is what keeps it working despite your changes to the URL structure.

- Tony

andreasfriedrich

1:13 pm on Feb 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Encoding is less about characters that might cause some browsers to mis-interpret the link and more about two specific cases: characters that are simply not allowed in a URI (those outside the unreserved class), and characters that have a special meaning (reserved characters) but that you want to use in their literal sense. In both cases the character is written as a hex code.

URIs may contain three groups of characters: reserved | unreserved | escaped. These groups are defined as follows.

reserved := ";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","

unreserved := alphanum | "-" | "_" | "." | "!" | "~" | "*" | "'" | "(" | ")"

escaped := "%" hex hex
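
As a rough illustration of the three groups above, the following Python sketch sorts a single character into one of them. The `classify` helper and the set names are made up for this example and are not part of any library; the character sets are copied from the definitions above.

```python
import string

# Character classes as listed above.
RESERVED = set(";/?:@&=+$,")
UNRESERVED = set(string.ascii_letters + string.digits + "-_.!~*'()")

def classify(ch):
    """Return which group a single character falls into (hypothetical helper)."""
    if ch in RESERVED:
        return "reserved"
    if ch in UNRESERVED:
        return "unreserved"
    # Anything else, including a literal "%", has to be written as "%" hex hex.
    return "must be escaped"

for ch in "&a %":
    print(repr(ch), classify(ch))
```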

The ampersand "&" is quite often used to delimit parameter=value pairs when passing data in a URI's query string. Now imagine you want to pass along the string "Aaron&Nick" as the value of the "test" parameter:


test.php?test=Aaron&Nick

would be interpreted as "test=Aaron" and "Nick", where the latter is a parameter name without a value. To get the literal meaning of the ampersand we need to escape it, as we would any reserved character used literally.


test.php?test=Aaron%26Nick

is interpreted as the parameter "test" with the value "Aaron&Nick".
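
The difference between the two query strings above can be checked with Python's standard `urllib.parse.parse_qs`. This is only a sketch of how one common parser behaves; `keep_blank_values=True` is passed so the value-less "Nick" parameter is not silently dropped. A PHP script or another server may present the result differently.

```python
from urllib.parse import parse_qs

# Unescaped ampersand: the parser splits the string into two parameters.
unescaped = parse_qs("test=Aaron&Nick", keep_blank_values=True)

# Escaped ampersand: a single parameter whose value contains "&".
escaped = parse_qs("test=Aaron%26Nick", keep_blank_values=True)

print(unescaped)
print(escaped)
```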

If you want a reserved character to stay escaped even after one round of unescaping, you just escape it again (double encoding). As you have seen from your own experience, escaping a URI more than once may cause problems, since every round of escaping or unescaping may change the URI's semantics (as the "Aaron&Nick" example above demonstrates).
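
To make the double escaping concrete, here is a small Python sketch using the standard `urllib.parse.quote` and `unquote`. The `looks_double_encoded` helper is a made-up heuristic for this post, not a library function: it only looks for "%25" followed by two hex digits, so it would also flag a value that legitimately contains literal text such as "%41" encoded once.

```python
import re
from urllib.parse import quote, unquote

value = "Aaron&Nick"

# safe="" means no reserved character is left unescaped.
once = quote(value, safe="")   # "&" becomes "%26"
twice = quote(once, safe="")   # the "%" of "%26" becomes "%25"

def looks_double_encoded(text):
    """Heuristic (assumption): an escaped "%" followed by two hex digits."""
    return re.search(r"%25[0-9A-Fa-f]{2}", text) is not None

print(once, looks_double_encoded(once))
print(twice, looks_double_encoded(twice))

# One decode of the double-escaped value gives back the single-escaped one.
print(unquote(twice) == once)
```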

See [faqs.org...] for details.

HTH Andreas

biggles

9:48 pm on Feb 17, 2003 (gmt 0)

10+ Year Member



Dreamquick & Andreas

Thanks very much for your helpful replies.

Cheers