Forum Library, Charter, Moderators: Ocean10000 & incrediBILL & phranque

Apache Web Server Forum

    
Need help replacing '%23' in incoming links
JWJonline




msg:1515139
 5:55 pm on Dec 26, 2005 (gmt 0)

I am getting a number of 404's caused by the '#' in some of my links being interpreted as '%23'. For example, my link http://www.example.com/poems.shtml#Leisure is being read as http://www.example.com/poems.shtml%23Leisure. As best as I can determine, this is because the links have been miscoded in some old cached pages somewhere. I have done all I can to find the source of the problem but cannot.

In the end I decided to simply use a redirect in htaccess to solve the problem, but that doesn't seem to be possible. At least, I've not figured out how. If a redirect isn't possible then perhaps the characters '%23' could be detected and replaced with a '#'. Can anyone suggest a suitable solution?
Thanks.

[edited by: jdMorgan at 6:26 pm (utc) on Dec. 26, 2005]
[edit reason] Example.com [/edit]

 

jdMorgan




msg:1515140
 6:29 pm on Dec 26, 2005 (gmt 0)

JWJonline,

Welcome to WebmasterWorld!

I'm not sure what problem you're having with mod_rewrite, since there's no code or log file entries to discuss. So, I'm guessing that the "#" has actually been double-escaped to %2523, since you say you're having trouble rewriting it. If that's the case, then RewriteRule will see "%23" in the requested URI instead of "#".

I'd suggest:

RewriteRule ^([^%]*)\%23(.*)$ http://www.example.com/$1#$2 [R=301,L]

as a start.

Jim
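
A side note on redirecting to a target that contains a literal '#': by default mod_rewrite escapes special characters in the substitution, so the '#' may go out in the redirect as '%23' again. The documented [NE] (noescape) flag suppresses that. A minimal sketch, assuming the code sits in the document-root .htaccess, that www.example.com stands in for the real host, and that the pattern sees the already-decoded '#' rather than '%23':

```apache
RewriteEngine On
# Capture the path before and after a decoded '#', then redirect with a real fragment.
# NE stops mod_rewrite from re-encoding the '#' in the redirect target.
RewriteRule ^([^#]*)#(.*)$ http://www.example.com/$1#$2 [R=301,NE,L]
```

Whether a given client honours the fragment after the redirect is up to the client; bots in particular may ignore it.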

JWJonline




msg:1515141
 6:37 pm on Dec 26, 2005 (gmt 0)

Thanks for the swift reply Jim (and sorry about the links). I'm not sure that I explained my problem too well but, in a nutshell, I want to redirect an incoming http://www.example.com/poems.shtml%23Leisure to http://www.example.com/poems.shtml#Leisure. I'll give your rewrite a try and see if that works. Thanks.

Later:
Nope. That still gives me a 404. Looking at the code am I right to believe it is placing everything before the %23 in $1 and everything after it in $2 and then writing them with a '#' between?

jdMorgan




msg:1515142
 6:59 pm on Dec 26, 2005 (gmt 0)

Yes, I'm assuming that the request is arriving at your server as "GET /<something>%2523<something> HTTP/1.1". RewriteRule in .htaccess would then see this as a request for "<something>%23<something>", and we rewrite that to "<something>#<something>".

It would be most helpful if you would post the relevant info from your server access log and server error log, and specify whether you're trying to put this code in .htaccess or httpd.conf, and whether you already have any currently-working rewrites in that file. At this point, we (here) don't know if you've even got mod_rewrite enabled.

This should be a trivial problem to fix, but there are many details that must be taken care of.

Jim

JWJonline




msg:1515143
 7:16 pm on Dec 26, 2005 (gmt 0)

The request is arriving at my server as "GET /<something>%23<something> HTTP/1.1". (Not %2523 as in your example). I do have mod_rewrite enabled. A typical log entry would be "81.174.131.187 - - [30/Nov/2005:10:20:21 +0000] "GET /poems.php%23Leisure HTTP/1.0" 404 3150 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Maxthon)"

Forgive my inexperience but if this is not the best way of posting my error log please give me some guidance.

I would be happy to post my htaccess file but again would need guidance unless a simple 'paste' would do it.

killroy




msg:1515144
 7:26 pm on Dec 26, 2005 (gmt 0)

I doubt you can redirect to a URL including a #, since as far as I know the # gets filtered out at the browser end and never gets sent to the server. The best you can hope for is to strip it in mod_rewrite.

JWJonline




msg:1515145
 7:40 pm on Dec 26, 2005 (gmt 0)

I'd be happy to do that ... strip it out. I just want to 'catch' the link and do something useful with it rather than leave it to go to a 404. Rewriting poems.shtml%23Leisure as poems.shtml would be great.

jdMorgan




msg:1515146
 10:13 pm on Dec 26, 2005 (gmt 0)

Well, in that case, I'd suggest:

RewriteRule ^([^#]*)# http://www.example.com/$1 [R=301,L]

and if that doesn't work, then

RewriteCond %{REQUEST_URI} ^/([^%]*)\%23
RewriteRule .+ http://www.example.com/%1 [R=301,L]

Jim
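
One possible reason the RewriteCond variant above may fail to match: per the Apache documentation, most mod_rewrite variables, REQUEST_URI included, hold the already-decoded path, so a request for /poems.shtml%23Leisure would appear there with a literal '#'. THE_REQUEST is documented as the raw, undecoded request line. A sketch that tests the raw line instead, assuming Apache 2.x (PCRE, for \s) and a document-root .htaccess; it has not been run against the poster's server:

```apache
RewriteEngine On
# THE_REQUEST looks like: GET /poems.shtml%23Leisure HTTP/1.1
# Capture everything between the leading slash and the first literal "%23".
RewriteCond %{THE_REQUEST} ^[A-Z]+\s/([^%\s]*)\%23
# Redirect to the page itself, dropping the broken anchor.
RewriteRule ^ http://www.example.com/%1 [R=301,L]
```

Here %1 refers to the group captured by the RewriteCond, so poems.shtml%23Leisure would be sent to poems.shtml.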

JWJonline




msg:1515147
 11:20 am on Dec 27, 2005 (gmt 0)

Thanks for those 2 suggestions Jim, but sadly neither of them solves the problem. I am placing them at the end of my htaccess ... is the order of things important?

jdMorgan




msg:1515148
 11:35 pm on Dec 27, 2005 (gmt 0)

With mod_rewrite, *everything* is important; one character can change the behaviour completely -- or take down your server.

I can't think of anything to recommend, other than posting *a lot* more details about what happens when you test this, what's in your access log, what's in your error log, etc. I'm afraid that "It doesn't work" is not any more useful in a server debug context than it would be if you were asking for remote help to repair a car or something... ;)

Jim

JWJonline




msg:1515149
 12:06 am on Dec 28, 2005 (gmt 0)

OK ... sorry about that. You've given me no information on how to post my log nor how to show you my htaccess file. There is a saying about the blind leading the blind. I wish I knew how to help you help me but I don't. Forget it .... and thanks for trying.

JWJonline




msg:1515150
 11:03 am on Dec 28, 2005 (gmt 0)

I don't wish to be put off using a great resource just because I don't understand enough to know what information is needed with regards to my problem, so I will start over in the hope that I explain myself in a better way this time around.

On my web site I use a number of Named Anchors, typically something like www.example.com/poems.shtml#Leisure. This works very well most of the time but I am frequently finding entries such as this in my access logs ... 81.79.143.133 - - [27/Dec/2005:16:38:37 +0000] "GET /poems.shtml%23Leisure HTTP/1.1" 404 3024 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"

In my efforts to maintain my site in good working order at all times I examine the cause of all 404's and try to resolve whatever is responsible. Working with this problem over many weeks I have satisfied myself that my pages are coded correctly, that browser visitors are not getting 404's and that this problem seems to be caused by some bots possibly using some old cached pages that have poorly encoded the '#' to '%23'. This means I cannot correct the problem at source and can only therefore detect it at destination and redirect it.

I would be very grateful for any suggestions from anyone as to things I might try to redirect these bad URLs to a meaningful page either by redirecting, by replacing the '%23' with '#' or by stripping away the bad anchor completely. I have established that conventional redirects in htaccess don't work (although I can't explain why) and the further suggestions in this thread don't have any effect either, and again, I'm afraid I can't explain why. The above posted log entry was the result of putting in place Jim's last suggestion and then typing www.example.com/poems.shtml%23Leisure in my browser's address box. The result was that my custom 404 page was displayed once again just as if the htaccess file wasn't there.

To avoid any issues brought about by the code being in the wrong place in the file relative to other things that are already there I have tried a new htaccess file that only contains the following :-

RewriteEngine On

RewriteCond %{REQUEST_URI} ^/([^%]*)\%23
RewriteRule .+ http://www.example.com/%1 [R=301,L]

(and yes, I did use my proper domain and not 'example' ;) )

If there is further information that I need to post in order to explain the problem better then please let me know and I will go get it.

Thanks.

jdMorgan




msg:1515151
 3:56 pm on Dec 28, 2005 (gmt 0)

In the previous post, you had qualified your requests for 'posting instruction' with statements like 'If this isn't the right way', etc. So, no-one has replied to that aspect simply because it's OK to post your relevant log entries and code, as long as identifiers for any personally-owned sites (yours or others') are obscured. We typically use 'example.com' and replace the second octet of IP address with '***', and short, relevant snippets are preferred over 'code dumps'.

I would suggest that you create a test page with two 'bad links' on it, one containing the '#' character and the other with the %23-encoded version. Then flush your browser cache and test the code (explicitly flushing the browser cache is required after *any* change to your server-side code in order to get valid results) by clicking on the links on your test page.

The problem with typing '%23' into your browser address bar is that your browser will encode the % to %25, sending %2523 to your server, and you're right back where you started again, because the current code is only looking for %23. If you just type '#' then the browser will encode that to %23.

There is a possibility that this is the cause; Maybe your testing method isn't accurately reproducing the same conditions as an actual malformed request. Another possibility is that the '#' in the code needs to be escaped with '\' or something. Other than that, there's nothing magical or mysterious about replacing or removing the '#' character -- as long as that is in fact the character that appears in the request at the point where a given mod_rewrite directive is processing it.

Basically, these character-encoding problems are very often confusing and difficult, and this is just another example of that. So don't give up.

Jim
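
For the double-encoding case described above, a single rule can cover both forms. This is a sketch only, assuming a document-root .htaccess and that no legitimate file name on the site contains '%' or '#': a request carrying %23 reaches the RewriteRule pattern as '#', and one carrying %2523 reaches it as '%23'.

```apache
RewriteEngine On
# Match the path up to the first decoded '#', or the first literal "%23"
# left over from double encoding, and redirect to the page without the anchor.
RewriteRule ^([^#%]*)(#|%23) http://www.example.com/$1 [R=301,L]
```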

JWJonline




msg:1515152
 6:39 pm on Dec 28, 2005 (gmt 0)

Thanks Jim. I *think* I have understood what you've advised and I can see that flawed testing may not be helping. This is what I've done.

I created a test page called httest.shtml. I created 3 links: a good link using '#' to a valid named anchor, a bad link using '#' but to a non-existent anchor, and a bad link using '%23' but to a valid anchor. My expectation would be that the first link would work fine, the second would simply resolve to the top of the current page (without errors) and that the third would return a 404. That is precisely what DID happen.

I then changed my htaccess file to use one containing the last of your suggested fixes and had the same result. It generated the following log entry when I tried the bad link. 84.64.79.*** - - [28/Dec/2005:17:51:14 +0000] "GET /%23badanchor HTTP/1.1" 404 3024 "http://www.example.com/httest.shtml" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.0.3705; .NET CLR 1.1.4322)"

Then I replaced your code with your first suggestion. I got the same result.

Next I used your second suggestion (RewriteRule ^([^#]*)# http://www.example.com/$1 [R=301,L]). To my amazement instead of a 404 I resolved to my home page. That isn't a 'perfect' solution but it's a darn good one that I can live with.

Finally, I added that 'working' fix to my original htaccess file and it doesn't work any more. Thinking that my original file needed RewriteEngine on I added that but then I received a server side message 403 'Forbidden'. Something enabled my Hotlink Protection and locked me out.

So, it would now seem that something in my htaccess file is nullifying the effect of your code. I paste my file here (with my domain replaced with example.com) in the hope you can spot what it is.

# To allow video

AddType audio/mpeg .mp3
AddType application/octet-stream .asz
AddType video/x-ms-asf asf asx
AddType audio/x-ms-wma wma
AddType audio/x-ms-wax wax
AddType video/x-ms-wmv wmv
AddType video/x-ms-wvx wvx
AddType video/x-ms-wm wm
AddType video/x-ms-wmx wmx
AddType audio/x-pn-realaudio ram rm
AddType audio/x-pn-realaudio-plugin rpm
AddType audio/x-realaudio ra
AddType application/smil smi smil
AddType audio/x-ms-wax wax

# Redirects from old sites to new and
# Redirects from html to php

Redirect permanent /AOL.htm http://www.example.com/projectremoved.shtml
Redirect permanent /artintro.htm http://www.example.com/artintro.php
Redirect permanent /Comparisons.htm http://www.example.com/poems.htm#Comparisons0
Redirect permanent /Contact.htm http://www.example.com/contact.php
Redirect permanent /contact.htm http://www.example.com/contact.php
Redirect permanent /Design.htm http://www.example.com/design.php
Redirect permanent /design.htm http://www.example.com/design.php
Redirect permanent /disasters.htm http://www.example.com/disasters.php
Redirect permanent /draughts.htm http://www.example.com/draughts.php
Redirect permanent /england.htm http://www.example.com/england.php
Redirect permanent /England.htm http://www.example.com/england.php
Redirect permanent /Explanation.htm http://www.example.com/projectremoved.shtml
Redirect permanent /fairies.htm http://www.example.com/fairies.php
Redirect permanent /Flagpin.htm http://www.example.com/flagpin.shtml
Redirect permanent /flagpin.htm http://www.example.com/flagpin.shtml
Redirect permanent /fuchsia_cuttings.htm http://www.example.com/fuchsia_cuttings.php
Redirect permanent /fuchsia_cuttings.shtml http://www.example.com/fuchsia_cuttings.php
Redirect permanent /gallery01.htm http://www.example.com/gall01.php
Redirect permanent /Gallery01.htm http://www.example.com/gall01.php
Redirect permanent /gallery02.htm http://www.example.com/gall02.php
Redirect permanent /Gallery02.htm http://www.example.com/gall02.php
Redirect permanent /gall03.htm http://www.example.com/recent-watercolours.php
Redirect permanent /gall04.htm http://www.example.com/gall04.php
Redirect permanent /gall05.htm http://www.example.com/gall05.php
Redirect permanent /garden.htm http://www.example.com/garden.php
Redirect permanent /humour.htm http://www.example.com/humour.php
Redirect permanent /Humour.htm http://www.example.com/humour.php
Redirect permanent /info.htm http://www.example.com/info.php
Redirect permanent /Info.htm http://www.example.com/info.php
Redirect permanent /inspiration.htm http://www.example.com/inspiration.php
Redirect permanent /Inspiration.htm http://www.example.com/inspiration.php
Redirect permanent /links.htm http://www.example.com/links.php
Redirect permanent /Links.htm http://www.example.com/links.php
Redirect permanent /mary.htm http://www.example.com/flagpin.htm
Redirect permanent /mask.htm http://www.example.com/mask.php
Redirect permanent /Mask.htm http://www.example.com/mask.htm
Redirect permanent /miscellany.htm http://www.example.com/miscellany.php
Redirect permanent /Musings.htm http://www.example.com/index.htm
Redirect permanent /musings.htm http://www.example.com/index.htm
Redirect permanent /MyGarden.htm http://www.example.com/mygarden.php
Redirect permanent /MyGarden.htm http://www.example.com/mygarden.php
Redirect permanent /Mygarden.php http://www.example.com/mygarden.php
Redirect permanent /nature.htm http://www.example.com/nature.php
Redirect permanent /Northampton.htm http://www.example.com/northampton.php
Redirect permanent /northampton.htm http://www.example.com/northampton.php
Redirect permanent /pencil.htm http://www.example.com/pencil.php
Redirect permanent /Pencil.htm http://www.example.com/pencil.php
Redirect permanent /photogallery.htm http://www.example.com/photogallery.php
Redirect permanent /Photogallery.htm http://www.example.com/photogallery.php
Redirect permanent /poems.php http://www.example.com/poems.shtml
Redirect permanent /poems2.php http://www.example.com/poems2.shtml
Redirect permanent /poems.htm http://www.example.com/poems.shtml
Redirect permanent /poems2.htm http://www.example.com/poems2.shtml
Redirect permanent /ponderables.htm http://www.example.com/ponderables.php
Redirect permanent /Ponderables.htm http://www.example.com/ponderables.php
Redirect permanent /recent-watercolours.shtml http://www.example.com//recent-watercolours.php
Redirect permanent /Results.htm http://www.example.com/projectremoved.shtml
Redirect permanent /scottish.htm http://www.example.com/scottish.php
Redirect permanent /Scottish.htm http://www.example.com/scottish.php
Redirect permanent /silverstone.htm http://www.example.com/silverstone.php
Redirect permanent /SiteMap.htm http://www.example.com/sitemap.php
Redirect permanent /sitemap.htm http://www.example.com/sitemap.php
Redirect permanent /smile.htm http://www.example.com/smile.php
Redirect permanent /story.htm http://www.example.com/story.php
Redirect permanent /Tired.htm http://www.example.com/humour.htm
Redirect permanent /tired.htm http://www.example.com/humour.htm
Redirect permanent /watercolours.htm http://www.example.com/watercolours.php
Redirect permanent /Watercolours.htm http://www.example.com/watercolours.php
Redirect permanent /wimp.htm http://www.example.com/wimp.php

# as a result of 404's

Redirect permanent /statmap.htm [statmap.example.com...]
Redirect permanent /StatMap.htm [statmap.example.com...]
Redirect permanent /newstatmap.htm [statmap.example.com...]
Redirect permanent /autoshrink_test.htm http://www.example.com/projectremoved.shtml
Redirect permanent /movieerror.jpg http://www.example.com/projectremoved.shtml
Redirect permanent /feedback.php http://www.example.com/index.php

# strip out %23 invalid anchors.

RewriteRule ^([^#]*)# http://www.example.com/$1 [R=301,L]

I've waffled on a lot in order to give you the fullest explanation I can, and for that I apologise, but to summarise, your line above DOES work in a file that contains ....

RewriteEngine On
RewriteRule ^([^#]*)# http://www.example.com/$1 [R=301,L]

but it DOES NOT work when I add it to my existing file above. If I add 'RewriteEngine On' to my file I get 403 Forbidden.

jdMorgan




msg:1515153
 7:14 pm on Dec 28, 2005 (gmt 0)

Let's take a step back then, and test only this 403-Forbidden problem, since that is an utterly unexpected and 'impossible' result in the context of only the .htaccess file itself.

Try this rule in your 'real' .htaccess file:

RewriteEngine on
RewriteRule ^foobar\.html$ http://www.example.com/ [R=301,L]

With this rule in place, request "foobar.html" from your server, and you should be externally redirected to the home page.

If you still get a 403, that tells us that something else in your config is wrong or interfering with mod_rewrite code in your .htaccess.

I should note that a 500-Server Error may result if you are missing some of the 'setup' lines for mod_rewrite. The whole thing together would be:

Options +FollowSymLinks
RewriteEngine on
RewriteRule ^foobar\.html$ http://www.example.com/ [R=301,L]

However, sometimes it's the case that the server already defaults to the correct Options setting and won't let you change it. In that case, the presence of the Options directive can cause a 500-Server Error. Fun, huh?

Also, where in your directory structure did you do the initial test, and where is the original .htaccess? I assume it's in your document root (home page directory), but let's be specific.

Another issue that can come up is if your host has 'aliased' your files into a directory different from the document_root that it defines. If this happens, you'll see a suspicious 'extra' filepath-part in your server error log entries when this code fails. For example, you expect to see '404 - example.com/file.html#anchor does not exist' but what's really there is '404 - example.com/username/file.html#anchor does not exist' or something similar -- there's an unexpected 'extra' piece of filepath info in there. This indicates an 'aliased' directory, and will require you to use mod_rewrite's RewriteBase directive to fix the problem.

Jim

JWJonline




msg:1515154
 8:12 pm on Dec 28, 2005 (gmt 0)

I think I've got to the bottom of the problem, although I don't know why it's doing what it's doing.

When I pasted my htaccess here I accidentally missed off the last few lines. It's those lines that are causing the problems. Without them in my file I do not get 403 forbidden and your code works fine at redirecting the %23's. The lines I failed to paste are :-

# Block unwanted crawler/spammer
RewriteCond %{HTTP_REFERER} (example\.net) [NC,OR]
RewriteRule .* - [F]

Testing this further it would seem that the last line is what is screwing things up. Can you tell me what it's supposed to be doing? Is it necessary?

By the way, my htaccess is in my root. I believe my host set up an alias to show public_html as www to make things easier for me. I'm guessing this is no longer of interest in view of the discovery of the wild goose chase I'd sent you on. Sorry.

[edited by: jdMorgan at 8:33 pm (utc) on Dec. 28, 2005]
[edit reason] Examplified. [/edit]

jdMorgan




msg:1515155
 8:32 pm on Dec 28, 2005 (gmt 0)

That code is malformed, as the *last* RewriteCond in a list of one or more RewriteConds *must not* contain an [OR] flag.

If you still want to block examplenet.net, then change it to:

# Block unwanted crawler/spammer
RewriteCond %{HTTP_REFERER} (examplenet\.net) [NC]
RewriteRule .* - [F]

This syntax error would have caused all requests to the server to be forbidden.

Jim

JWJonline




msg:1515156
 8:44 pm on Dec 28, 2005 (gmt 0)

Wow. Then I guess that when I've been logging into my server and clearing the hotlink protection to remove the 'forbidden' condition, I have been 'unsetting' those lines in some way. When I've re-ftp'd my htaccess next time I've made a change I've been setting them again. I've been tripping myself up for quite a while .... and probably NOT blocking examplenet at all.

I'm assuming from your explanation that if I had a list of domains to block then the [OR] would apply to all of them except the last one.

# Block unwanted crawler/spammer
RewriteCond %{HTTP_REFERER} (example1\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (example2\.net) [NC,OR]
RewriteCond %{HTTP_REFERER} (example3\.net) [NC]
RewriteRule .* - [F]

I'm feeling pretty stupid at the moment but I'm very grateful for the time you've spent unravelling my problem. Thank you.
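
An alternative that sidesteps the trailing-[OR] trap is to fold the list into a single condition using regex alternation. A minimal sketch, using the same placeholder domains as the list above:

```apache
# Block unwanted crawler/spammer referrers with one condition, so no [OR] is needed.
RewriteCond %{HTTP_REFERER} (example1|example2|example3)\.net [NC]
RewriteRule .* - [F]
```

With only one RewriteCond in the group there is no last-line [OR] to forget, at the cost of a longer pattern as the list grows.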

Pfui




msg:1515157
 9:39 pm on Dec 28, 2005 (gmt 0)

[Lurker Bit: Not to fret about feeling stupid, JWJonline. Or at least know that when it comes to messing with mod_rewrite, we all do, all too often.

I even have my rewrites in sections (robots, user agents, IPs, etc.) because I tend to forget to cut that last [OR] when merrily copy-pasting a new list entry. With sections, one errant [OR] only breaks one part of my rewrites rather than all of them. And sooner or later I spot my error(s) -- and feel slap-head stupid:)

So anyway, hang in there. Thankfully, Jim knows this stuff to a fare thee well, and his frustration tolerance level approaches super-human. I only wish I understood his comments and code the first 1,2,3,4-plus times I read them!]

jdMorgan




msg:1515158
 10:01 pm on Dec 28, 2005 (gmt 0)

> ...the [OR] would apply to all of them except the last one.

Yes, exactly. You can [OR] RewriteConds with each other, but you can't [OR] a RewriteCond with a RewriteRule; that would make no logical sense, since a RewriteCond or group of RewriteConds is implicitly [AND]ed with the matching function of RewriteRule.

.hthacker tip: In addition to using Pfui's 'partitioning' technique, I often go one further and --as the very last RewriteCond in each group-- add a line with a unique referrer value, such as 'jdM_test_1' and 'jdM_test_23'. In this way, I can test each section on a live server by spoofing my own referrer to one of those test values. This is especially helpful when trying to find a *missing* [OR] or ")" or "|" in a huge file.

You could do that based on referrer, user-agent, or even query string, depending on what seems most convenient for you. This is really only useful when you've got several hundreds of lines of code, though.

Jim
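
The test-referrer idea might look like the sketch below. The domains are placeholders, 'jdM_test_1' is the example value from the post, and the curl command in the comment is just one assumed way to send a spoofed Referer header:

```apache
# Block unwanted crawler/spammer
RewriteCond %{HTTP_REFERER} example1\.net [NC,OR]
RewriteCond %{HTTP_REFERER} example2\.net [NC,OR]
# Last condition in the group: a unique test value, with no [OR] flag.
RewriteCond %{HTTP_REFERER} ^jdM_test_1$
RewriteRule .* - [F]
# To test: curl -I -e jdM_test_1 http://www.example.com/
# A 403 response suggests this group of conditions is intact.
```

If an [OR] is missing somewhere above the test line, the conditions around the gap are ANDed instead, so the test request would be expected to get through rather than be forbidden.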

JWJonline




msg:1515159
 10:12 pm on Dec 28, 2005 (gmt 0)

LOL ... you've lost me, but don't bother trying to explain it, my brain's fried already.

Jim, one last question. Can the line of code that redirects the %23 be altered to resolve to another page rather than my index page ... poems.shtml for example, or whatever page happened to be in the URL just ahead of the anchor?

jdMorgan




msg:1515160
 10:15 pm on Dec 28, 2005 (gmt 0)

Yes, back to message #8.

Jim

JWJonline




msg:1515161
 10:31 pm on Dec 28, 2005 (gmt 0)

That's what I'm using.

RewriteRule ^([^#]*)# http://www.example.com/$1 [R=301,L]

It redirects to my home page only.

JWJonline




msg:1515162
 11:05 pm on Dec 28, 2005 (gmt 0)

Nope ... me being stupid again ... it works fine.
Thanks for sorting me out Jim.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved