Loop from url with an encoded unicode char

marciano

12:30 am on Aug 31, 2011 (gmt 0)

10+ Year Member



Hello,

I found that
www.mydomain.com/smth%F1odd
or
www.mydomain.com/smth%F3
send the user into a redirect loop (several redirects).

www.mydomain.com/smth%20odd works as expected
"smth odd" not found

I have tested from
subdomain.mydomain.com/smth%F1odd and it displays
"smthñodd" not found.

I can't work out what is causing that loop on www.mydomain.com.
I removed .htaccess for just a couple of seconds to check whether something there is the source of the problem.
Same behavior.
I have also compared the httpd.conf content for both the www and subdomain virtual servers, and nothing there seems to be the problem.

Do you have an idea what to look into?
Thank you

marciano

12:39 am on Aug 31, 2011 (gmt 0)

10+ Year Member



Sorry, just a minute after I posted this thread I found the answer from the dates in Google Webmaster Tools "Crawl Errors". I looked at what had changed since the date of the first error.
There's something wrong in my 404.php file. Not an Apache problem. Thanks

lucy24

12:45 am on Aug 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You should still deal with the underlying problem, which is neither apache nor php but file naming. Unless your entire site is in a non-Roman script, it's simply asking for trouble if file or directory names include anything other than alphanumerics, lowlines or hyphens.

marciano

1:03 am on Aug 31, 2011 (gmt 0)

10+ Year Member



Hi Lucy.
File names are all UTF-8.
I found requests containing characters other than a-zA-Z0-9:
%F1, %F3, etc.
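
A quick way to see what those escapes actually are: the sketch below (plain Python standard library, not taken from the site's code) decodes the path from the first post. It assumes the requests were encoded as Latin-1 (ISO-8859-1), which is consistent with the subdomain displaying "smthñodd" but is still a guess about the client that sent them. The helper name decodes_as_utf8 is made up for this example.

```python
from urllib.parse import quote, unquote_to_bytes

# Raw bytes behind the percent-escapes in the requested path.
raw = unquote_to_bytes("smth%F1odd")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xF1 on its own is not valid UTF-8, so a UTF-8 site cannot map it to a name.
is_utf8 = decodes_as_utf8(raw)

# Assumption: the sender used Latin-1, where 0xF1 is "ñ".
as_latin1 = raw.decode("latin-1")

# The same name percent-encoded from UTF-8 uses two bytes for "ñ".
utf8_form = quote(as_latin1)

print(is_utf8, as_latin1, utf8_form)
```

So a request for %F1 and a request for %C3%B1 are different byte sequences, even though both are meant as "ñ".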

g1smd

1:15 am on Aug 31, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You are not free to put any character you like in a URL. Only those listed in the HTTP specifications are allowed "as is".

Characters that are not in the OK list must use the encoded form.

As for diagnosis, use the Live HTTP Headers extension for Firefox to see what is going on.
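
If a browser extension is not to hand, the same hop-by-hop view can be approximated with a short script. This is only a sketch using the Python standard library, not a replacement for looking at the real headers: trace_redirects and http_next_location are names invented for this example, and the redirect status list is an assumption about which responses count as a hop.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit


def http_next_location(url: str) -> Optional[str]:
    """Send one HEAD request and return the redirect target, if any."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        location = resp.getheader("Location")
        if resp.status in (301, 302, 303, 307, 308) and location:
            # Location may be relative, so resolve it against the current URL.
            return urljoin(url, location)
        return None
    finally:
        conn.close()


def trace_redirects(
    start: str,
    next_location: Callable[[str], Optional[str]] = http_next_location,
    limit: int = 20,
) -> Tuple[List[str], bool]:
    """Follow redirects one hop at a time; report the chain and whether it loops."""
    chain = [start]
    seen = {start}
    url = start
    while len(chain) <= limit:
        target = next_location(url)
        if target is None:
            return chain, False  # final response, no further redirect
        chain.append(target)
        if target in seen:
            return chain, True  # came back to a URL already visited
        seen.add(target)
        url = target
    return chain, False  # gave up after `limit` hops without seeing a repeat
```

Calling trace_redirects("http://www.mydomain.com/smth%F1odd") would print nothing by itself; it returns the list of URLs visited and a flag that is True when a URL repeats, which is the loop described in the first post.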

marciano

2:16 am on Aug 31, 2011 (gmt 0)

10+ Year Member



Hi!
From Google's tools I found some HTTP requests containing non-standard characters in file names that were causing redirect loops.
I fixed my 404.php and now those requests get a 404 response, as they should.
These are isolated cases.
There are lots of query strings containing encoded accented vowels, for example, but this has never been a problem.
Thanks for your suggestion about checking the HTTP headers to see how those characters in a URL are handled.