Forum Moderators: phranque
http://www.example.com/test/post-1.html#IDComment12345
to
http://www.example.com/test/post-1.html
I just want to strip out #IDComment and the random numbers after it completely. Is this considered a query string? Since it has no = after #IDComment I'm a bit confused on how to write the rule.
I currently have this but it's clearly not working.
RewriteCond %{QUERY_STRING} ^IDComment
RewriteRule (.*) /$1? [R=301,L]
Any help is greatly appreciated.
[edited by: jdMorgan at 1:13 pm (utc) on Oct. 27, 2009]
[edit reason] example.com [/edit]
The "#IDComment12345" part is a URL fragment identifier. As such, it is neither a query string nor part of the URL-path, so you'll have to take special measures to detect it in mod_rewrite. This can be done by examining the raw client request, exactly as sent by the client (e.g. browser or search robot), and exactly as it appears in your raw server access log.
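To make the distinction concrete, here is a minimal sketch in Python (used purely for illustration; it is not part of the Apache configuration) that splits the example URL with the standard library's `urllib.parse.urlsplit`. It only shows how a generic URL parser classifies the pieces, not what any particular browser actually sends.

```python
from urllib.parse import urlsplit

# The example URL from the question.
url = "http://www.example.com/test/post-1.html#IDComment12345"

parts = urlsplit(url)

print(repr(parts.path))      # the URL-path
print(repr(parts.query))     # the query string (empty for this URL)
print(repr(parts.fragment))  # the fragment, i.e. the text after "#"
```

The text after "#" ends up in its own component, which is why a RewriteCond on %{QUERY_STRING} never sees it.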
Looking at your code, you should be aware that a RewriteCond establishes a condition under which a subsequent RewriteRule may be invoked. A RewriteCond cannot 'take action' all by itself, as you are attempting to use it to do. In a scripting language such as Perl or PHP, we might say that RewriteCond is an "If" clause, while RewriteRule is a "Then" directive.
I haven't tested this case myself, but I suspect that something like this might work for the specific case that you show:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /post-1\.html#IDComment[0-9]+(\?[^\ ]*)?\ HTTP/
RewriteRule ^post-1\.html$ http://www.example.com/post-1.html [R=301,L]
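As a rough way to see what that RewriteCond pattern would and would not match, the sketch below runs the same regular expression in Python. The request lines are hypothetical and assume the client really does send the "#" literally, which is not established in this thread; Python's `re` module is only standing in for Apache's regex engine, and for this particular pattern the syntax should behave the same.

```python
import re

# The RewriteCond pattern from the post, copied verbatim.
# "\ " is an escaped space and "#" is a literal character here.
PATTERN = re.compile(r"^[A-Z]+\ /post-1\.html#IDComment[0-9]+(\?[^\ ]*)?\ HTTP/")

# Hypothetical request lines (assumption: "#" arrives unencoded).
at_root = "GET /post-1.html#IDComment12345 HTTP/1.1"
in_subdir = "GET /test/post-1.html#IDComment12345 HTTP/1.1"
no_fragment = "GET /post-1.html HTTP/1.1"


def matches(request_line: str) -> bool:
    """Return True if the request line satisfies the condition pattern."""
    return PATTERN.search(request_line) is not None


for line in (at_root, in_subdir, no_fragment):
    print(line, "->", matches(line))
```

Note that the pattern is anchored to a page in the site root, so a request for the same page inside a subdirectory does not satisfy it.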
Jim
I tried that and got a 404 error. I have an existing mod_rewrite set up and it's working fine. I also have an existing rule that makes the domain work with or without www.
For any given post, only [mywebsite.com...] is static. post-1.html#IDComment12345 changes depending on the post and the comment ID. Basically I want to strip every #IDComment12345 that IntenseDebate adds, because it's causing a 404 error on all links with that at the end of the URL.
Example 1:
[mywebsite.com...]
should be
[mywebsite.com...]
Example 2:
[mywebsite.com...]
should be
[mywebsite.com...]
Example 3:
[mywebsite.com...]
should be
[mywebsite.com...]
The rule I posted has no provision whatsoever for 'categoryN' in the URL, which is probably why it won't work when tested with your specific URL. So, you should try this, although again it's likely more literal and specific than what we'll end up with:
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /category1/post-1\.html#IDComment[0-9]+(\?[^\ ]*)?\ HTTP/
RewriteRule ^category1/post-1\.html$ http://www.example.com/category1/post-1.html [R=301,L]
Try to fully describe the permissible characters and number of characters or digits as shown. This results in the most efficient and unambiguous patterns, and prevents unexpected operation and/or unintended restriction of your future freedom to define new URL-paths.
In some cases, the character-sets and numbers of characters in each variable field cannot be precisely defined. In those cases, it may become necessary to use exclusion rather than inclusion to prevent the rule from affecting URLs other than those desired. As posted, the rule is very specific as to one page in one category, but with any "#IDComment" number. But it can be loosened up -- possibly to the point where *any* .html URL with an #IDComment in *any* subdirectory will be rewritten. But without a full working definition of what is and is not to be rewritten, it's not yet possible without risking unintended consequences.
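For illustration only, here is one way the pattern might be loosened, again tested in Python rather than in Apache. The generalized expression is an assumption of this sketch, not something posted in the thread: it accepts any path ending in ".html", in any subdirectory, followed by "#IDComment" and digits, and captures the path without the fragment. The helper name `clean_path` and the sample request lines are hypothetical.

```python
import re

# Assumed generalization (not from the thread): any ".html" path in any
# subdirectory, followed by "#IDComment" and one or more digits.
LOOSE = re.compile(r"^[A-Z]+\ (/[^#?\ ]*\.html)#IDComment[0-9]+(\?[^\ ]*)?\ HTTP/")


def clean_path(request_line):
    """Return the path without the fragment, or None if the line does not match."""
    m = LOOSE.search(request_line)
    return m.group(1) if m else None


print(clean_path("GET /category1/post-1.html#IDComment12345 HTTP/1.1"))
print(clean_path("GET /category1/post-1.html HTTP/1.1"))
print(clean_path("GET /img/pic.gif#IDComment1 HTTP/1.1"))
```

Whether a pattern this broad is safe depends on the full definition of which URLs should and should not be redirected, as discussed above.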
Jim
I also tried to do this:
Redirect 301 /category/post.html#IDComment12345 [mywebsite.com...]
That also produced a 404 error, but if I do this (replacing # with - just to test out the redirect engine):
Redirect 301 /category/post.html-IDComment12345 [mywebsite.com...]
and type in [mywebsite.com...] it will redirect me correctly to [mywebsite.com...]
Does it have something to do with the # sign, since the hyphen works?
Well, that would imply that mod_rewrite is treating the "#" as part of the URL-path, then.
What's even more interesting to me is somehow # is being sent to the server by browsers other than Safari (and MSNBot)?
Is this new? I can't get Firefox to do it, and I only tried for a minute but couldn't redirect it when using Safari...
Really, how are you getting the # symbol and the information following it sent to the server in the first place? Knowing that might help us understand how to redirect it. Is it being encoded somehow, e.g. as %23? I know spaces are sometimes a bit of a hassle to redirect; could this be the same type of situation? I really can't understand how a major browser is sending it to the server.
I've heard Safari does, and Chrome might, but no other major browsers I've read about send the # symbol to the server... The only other thing I've read on the subject said MSNBot was doing so this summer, but I don't know if that's still the case or not.
ADDED: I can't even get the # symbol out of THE_REQUEST (or identify it) using Safari. Are you sure this isn't an AJAX Page State issue and why can't you just remove it from the source code of the page(s)?
My only questions are whether the "#" is treated as part of the URL-path or not (I'm sure it's not considered a query string, though) and whether, if treated as part of the URL-path, it is encoded or not.
If it's encoded, we will have to go back to the %{THE_REQUEST} method, and test for "\%(25)*23" to catch it.
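The idea behind that pattern can be checked with a small Python sketch: "%23" is an encoded "#", and "%25" is an encoded "%", so "%2523" is "%23" encoded a second time. The sample paths below are hypothetical, and the leading backslash from the Apache version is dropped because "%" needs no escape in Python's `re`.

```python
import re
from urllib.parse import unquote

# Same idea as the pattern above: "%", any number of "25", then "23".
ENCODED_HASH = re.compile(r"%(25)*23")

# Hypothetical paths: singly encoded, doubly encoded, and unencoded "#".
samples = [
    "/post-1.html%23IDComment12345",
    "/post-1.html%2523IDComment12345",
    "/post-1.html#IDComment12345",
]

found = [ENCODED_HASH.search(s) is not None for s in samples]
print(found)

# Each round of decoding peels off one layer of encoding.
print(unquote("%2523"))
print(unquote("%23"))
```

The unencoded "#" is deliberately not matched by this pattern; it would still need the literal "#" test shown earlier in the thread.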
Other than these two issues, it's not a particularly difficult problem -- I just don't have the time to experiment with it right now, so we're working through it here.
Jim