redirect # sign and numbers

         

jackdakota

4:40 am on Oct 27, 2009 (gmt 0)

10+ Year Member



I'm trying to redirect

http://www.example.com/test/post-1.html#IDComment12345
to
http://www.example.com/test/post-1.html

I just want to strip out #IDComment and the random numbers after it completely. Is this considered a query string? Since it has no = after #IDComment, I'm a bit confused about how to write the rule.

I currently have this but it's clearly not working.

RewriteCond %{QUERY_STRING} ^IDComment RewriteRule (.*) /$1? [R=301,L]

Any help is greatly appreciated.

[edited by: jdMorgan at 1:13 pm (utc) on Oct. 27, 2009]
[edit reason] example.com [/edit]

jdMorgan

1:11 pm on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is not a query string. It is a "URL fragment" when sent via HTTP, or a "named anchor" when used on an HTML page. It's also currently used as a state identifier in AJAX, although Google has proposed changing AJAX state identifiers to "#!" to make them distinct from fragments/named-anchors.

As such, it is neither a query string nor part of the URL-path, so you'll have to take special measures to detect it in mod_rewrite. This can be done by examining the raw client request, exactly as sent by the client (e.g. browser or search robot), and exactly as it appears in your raw server access log.

Looking at your code, you should be aware that a RewriteCond establishes a condition under which a subsequent RewriteRule may be invoked. A RewriteCond cannot 'take action' all by itself, as you are attempting to use it to do. In a scripting language such as Perl or PHP, we might say that RewriteCond is an "If" clause, while RewriteRule is a "Then" directive.
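As a sketch of that If/Then pairing, here is how the original attempt would look as two separate directives. The assumptions are that the code lives in an .htaccess file in the document root and that the request really carries a query string such as ?IDComment12345. It would not match a "#" fragment, so it only illustrates the structure and is not a fix for the URLs in question.

```apache
RewriteEngine On
# "If": the query string begins with IDComment ...
RewriteCond %{QUERY_STRING} ^IDComment
# "Then": redirect to the same URL-path; the trailing "?" clears the query string
RewriteRule ^(.*)$ /$1? [R=301,L]
```

Each RewriteCond applies only to the first RewriteRule that follows it, so the two directives must appear on separate lines, in this order.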

I haven't tested this case myself, but I suspect that something like this might work for the specific case that you show:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /post-1\.html#IDComment[0-9]+(\?[^\ ]*)?\ HTTP/
RewriteRule ^post-1\.html$ http://www.example.com/post-1.html [R=301,L]

However, I'm not sure which other parts of your requested URLs may be variable and which are fixed, so again, this addresses only the specific case you showed. You also didn't state whether you have any other working rules, so we can't be sure whether you've already got mod_rewrite set up and working. So, I'd suggest that you test this rule as-is with that specific URL, and then proceed to modify it only if that initial test is successful.

Jim

jackdakota

7:04 pm on Oct 27, 2009 (gmt 0)

10+ Year Member



Thank you Jim for that explanation.

I tried that and got a 404 error. I have an existing mod_rewrite set up and it's working fine. I also have an existing rule that makes the domain work with or without www.

For any given post, only [mywebsite.com...] is static. post-1.html#IDComment12345 changes depending on the post and the comment ID. Basically, I want to strip every #IDComment12345 that IntenseDebate adds, because it's causing a 404 error on all links that have it at the end of the URL.

Example 1:
[mywebsite.com...]
should be
[mywebsite.com...]

Example 2:
[mywebsite.com...]
should be
[mywebsite.com...]

Example 3:
[mywebsite.com...]
should be
[mywebsite.com...]

jdMorgan

8:19 pm on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One problem we have here is trying to generalize the URLs and to show which parts can change and which don't, what's a literal string, and what isn't...

The rule I posted has no provision whatsoever for 'categoryN' in the URL, which is probably why it won't work when tested with your specific URL. So, you should try this, although again it's likely more literal and specific than what we'll end up with:


RewriteCond %{THE_REQUEST} ^[A-Z]+\ /category1/post-1\.html#IDComment[0-9]+(\?[^\ ]*)?\ HTTP/
RewriteRule ^category1/post-1\.html$ http://www.example.com/category1/post-1.html [R=301,L]

If that works for the post-1.html page in category one, then we can proceed to the next step. For that, please thoroughly define the variable and non-variable parts of the URL-path, using a notation for requested URLs such as
"category<numbers 0 through 99>/<one or more lowercase letters or numbers>.html#IDComment<numbers 0 through 999999>"
where un-bracketed fields are literals and bracketed fields are variables as described.

Try to fully describe the permissible characters and number of characters or digits as shown. This results in the most efficient and unambiguous patterns, and prevents unexpected operation and/or unintended restriction of your future freedom to define new URL-paths.

In some cases, the character-sets and numbers of characters in each variable field cannot be precisely defined. In those cases, it may become necessary to use exclusion rather than inclusion to prevent the rule from affecting URLs other than those desired. As posted, the rule is very specific as to one page in one category, but with any "#IDComment" number. But it can be loosened up -- possibly to the point where *any* .html URL with an #IDComment in *any* subdirectory will be rewritten. But without a full working definition of what is and is not to be rewritten, it's not yet possible without risking unintended consequences.

Jim

jackdakota

10:10 pm on Oct 27, 2009 (gmt 0)

10+ Year Member



Thank you, Jim, but it doesn't work. I'm still getting a 404 error. The category names do not contain numbers, just letters. Like you said, it would be simpler to just remove #IDComment from any .html URL in any subdirectory. That's exactly what I want.

I also tried to do this:
Redirect 301 /category/post.html#IDComment12345 [mywebsite.com...]

That also produced a 404 error, but if I do this (replacing # with - just to test out the redirect engine):
Redirect 301 /category/post.html-IDComment12345 [mywebsite.com...]

and type in [mywebsite.com...] it will redirect me correctly to [mywebsite.com...]

Does it have something to do with the # sign, since the hyphen works?

jdMorgan

11:34 pm on Oct 27, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, that would imply that mod_rewrite is treating the "#" as part of the URL-path then, so you may be able to get away with a one-liner:

RewriteRule ^([a-z]+/[a-z0-9]+\.html)#IDComment[0-9]+$ http://www.example.com/$1 [R=301,L]
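Note that [a-z0-9]+ does not allow the hyphen that appears in a name like post-1.html. A variant that does might look like the sketch below. It is untested, and it assumes a single category directory made of lowercase letters and file names made of lowercase letters, digits, and hyphens; adjust the character classes if the real URLs differ.

```apache
# Assumption: category = lowercase letters only; file name may contain hyphens.
# The trailing "-" inside the brackets is a literal hyphen.
RewriteRule ^([a-z]+/[a-z0-9-]+\.html)#IDComment[0-9]+$ http://www.example.com/$1 [R=301,L]
```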

Jim

TheMadScientist

8:23 am on Oct 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Well, that would imply that mod_rewrite is treating the "#" as part of the URL-path then

What's even more interesting to me is somehow # is being sent to the server by browsers other than Safari (and MSNBot)?

Is this new? I can't get Firefox to do it, and I only tried for a minute but couldn't redirect it when using Safari...

Really, how are you getting the # symbol and the information following it sent to the server in the first place? Knowing that might help us understand how to redirect it. Is it being encoded somehow, e.g. as %23? I know spaces are sometimes a bit of a hassle to redirect; could this be the same type of situation? I really can't understand how a major browser is sending it to the server.

I've heard Safari does, and Chrome might, but no other major browsers I've read about send the # symbol to the server... The only other thing I've read on the subject said MSNBot was doing so this summer, but I don't know if that's still the case or not.

ADDED: I can't even get the # symbol out of THE_REQUEST (or identify it) using Safari. Are you sure this isn't an AJAX page-state issue? And why can't you just remove it from the source code of the page(s)?

jdMorgan

1:25 pm on Oct 28, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The other possibility (other than Safari & Chrome) is that these links are being followed by robots due to finding links in that form.

My only questions are whether the "#" is treated as part of the URL-path or not (I'm sure it's not considered a query string, though) and whether, if treated as part of the URL-path, it is encoded or not.

If it's encoded, we will have to go back to the %{THE_REQUEST} method, and test for "\%(25)*23" to catch it.
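If the "#" does turn out to arrive percent-encoded, the %{THE_REQUEST} approach might look something like the following. This is an untested sketch that assumes an .htaccess file in the document root, a single category directory of lowercase letters, and file names of lowercase letters, digits, and hyphens. The \%(25)*23 part matches %23 as well as multiply-encoded forms such as %2523.

```apache
RewriteEngine On
# "If": the raw request line contains an encoded "#" followed by IDComment<digits>.
# Group 1 (%1 below) captures the URL-path up to and including ".html".
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([a-z]+/[a-z0-9-]+\.html)\%(25)*23IDComment[0-9]+\ HTTP/
# "Then": redirect to the clean URL captured by the condition above.
RewriteRule \.html http://www.example.com/%1 [R=301,L]
```

The redirect target is taken from the RewriteCond back-reference %1 rather than from the RewriteRule pattern, so it does not depend on how the server decodes the path before matching the rule.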

Other than these two issues, it's not a particularly difficult problem -- I just don't have the time to experiment with it right now, so we're working through it here.

Jim