
Apache Web Server Forum

    
Remove query string from url with page anchor
twitterfeed&utm_medium=twitter removal from links
tweetfeeder




msg:4203835
 1:02 pm on Sep 19, 2010 (gmt 0)

If ?utm_source=twitterfeed&utm_medium=twitter is appended to your URL, then somebody has used twitterfeed to post your RSS feed to their Twitter page.

If it's your own Twitter page, then you may want to do what I'm currently trying to do: remove that query string in .htaccess.

This works for me...

# Enable mod_rewrite
Options +FollowSymLinks
#
# Turn on the rewriting engine
RewriteEngine on
#
# If query string is non-blank
RewriteCond %{QUERY_STRING} .
# redirect to remove query string
RewriteRule (.*) [example.com...] [R=301,L]

This works fine for

http://www.example.com/My_Blog/mypage.html?utm_source=twitterfeed&utm_medium=twitter

The only problem I've encountered is that if the link is to an anchor on a page

http://www.example.com/My_Blog/mypage.html#c123456?utm_source=twitterfeed&utm_medium=twitter

the redirect ceases to work.

Does anyone know how to change my code so that

http://www.example.com/My_Blog/mypage.html#c123456?utm_source=twitterfeed&utm_medium=twitter

redirects to

http://www.example.com/My_Blog/mypage.html#c123456

Thanks
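The rule above strips every query string from every URL on the site. A narrower sketch, assuming the only parameters you want gone are the ones twitterfeed adds (marked by utm_source=twitterfeed), redirects only those requests and leaves every other query string alone. www.example.com is a placeholder for your own host:

```apache
# Enable mod_rewrite (FollowSymLinks is required for rewriting in .htaccess)
Options +FollowSymLinks
RewriteEngine on
#
# Only act when twitterfeed's tracking parameter is present
# (assumption: other query strings on the site must be preserved)
RewriteCond %{QUERY_STRING} (^|&)utm_source=twitterfeed(&|$)
# Redirect to the same path; the trailing "?" discards the whole query string
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

Like the original, this cannot see a fragment identifier, so it still cannot repair links of the form #c123456?utm_source=...; the replies below explain why.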

 

phranque




msg:4203844
 1:31 pm on Sep 19, 2010 (gmt 0)

the link to the fragment identifier should be:
http://www.example.com/My_Blog/mypage.html?utm_source=twitterfeed&utm_medium=twitter#c123456

the request passed to your server for this link would then be:
http://www.example.com/My_Blog/mypage.html?utm_source=twitterfeed&utm_medium=twitter

the user agent would "keep" the fragment and use it appropriately for the document returned in the response.

the likely reason that your redirect doesn't work is that nothing after the hash mark is being sent with the request.
i.e. your request in this case would be:
http://www.example.com/My_Blog/mypage.html
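One way to confirm what the server actually receives is to log the raw request line. A minimal sketch, assuming you can edit the main server or virtual-host configuration (LogFormat and CustomLog are not allowed in .htaccess); the format nickname "reqline" and the log path are placeholders:

```apache
# Log only the client address and the first line of each request.
# %r is the request line as received, e.g.
#   GET /My_Blog/mypage.html?utm_source=twitterfeed&utm_medium=twitter HTTP/1.1
LogFormat "%h \"%r\"" reqline
CustomLog logs/request_line.log reqline
```

In a browser that does not send the fragment, following the malformed #c123456?utm_source=... link should log just GET /My_Blog/mypage.html with no query string, because everything after the "#" is treated as the fragment.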

Spiekerooger




msg:4203857
 2:03 pm on Sep 19, 2010 (gmt 0)

Hi,

this is a tough one.

Two problems: mod_rewrite doesn't even get the part after the #anchor, so the query string is empty. It's also not included in THE_REQUEST: Apache never sees the anchor, which only matters to the browser once it has the HTML file.

The URL is actually malformed; it should be ?foo=bar#anchor :/

But there's something positive as well: search engines ignore anchors too, so you won't get a duplicate-content issue here. The query string is just seen as part of the anchor, not the URL.

Edit: my testing and typing was too slow. ;) phranque already answered the technical part.

jdMorgan




msg:4203873
 3:26 pm on Sep 19, 2010 (gmt 0)

Actually, the problem is that most browsers do not send the URL-fragment identifier to the server, because named anchors are only processed "inside" a page; they are handled entirely within the browser, so there is no need to send them to the server.

However, if you still want to include them in the redirect path for those few browsers that do send them (some Apple WebKit implementations), you can use a RewriteCond that examines %{THE_REQUEST} and parse them out using something like:

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /My_Blog/mypage\.html(\?[^#\ ]*)?(#[^\ ]*)?\ HTTP/
RewriteRule ^My_Blog/mypage\.html$ http://www.example.com/My_Blog/mypage.html%2? [NE,R=301,L]

But again, this only works if the browser actually sends the URL-fragment identifier, and most don't.

Note that the fragment follows the query string as phranque pointed out, and that the "?" in the RewriteRule substitution string serves as an operator, not a literal. This explains the discrepancy in its position with respect to the re-appended URL-fragment in the substitution.

Jim

tweetfeeder




msg:4203880
 4:39 pm on Sep 19, 2010 (gmt 0)

Hello,

Thanks to all who responded so promptly to my newbie question. I'm now a WebmasterWorld convert...

phranque wrote:

>the link to the fragment identifier should be

OK, I'll accept that the URL folk click on at Twitter is malformed. I'll also accept that either Twitterfeed or Bit.ly has somehow created this malformed URL.

Nevertheless, IE8, Firefox and Opera all show these malformed URLs in the address bar:

http://www.example.com/My_Blog/post_title.html#c123456?utm_source=twitterfeed&utm_medium=twitter

But the person clicking through, expecting to read the comment identified by #c123456, lands at the top of the post_title.html page and has to wade through all the other comments first. So I'd like to pass a correctly formed URL.

>the likely reason that your redirect doesn't work is that
>nothing after the hash mark is being sent with the request.

If that were the case, would IE8, Firefox and Opera all show these malformed URLs in the address bar?

http://www.example.com/My_Blog/next_post_title.html#c123457?utm_source=twitterfeed&utm_medium=twitter

Apparently yes... and Spiekerooger confirmed:

>The url is actually malformed, it should be ?foo=bar#anchor :/

Does that mean that the only fixes are:

a. use something other than twitterfeed

b. get twitterfeed to fix the problem (could take a while)

And that there is nothing at the server I can do to reconstruct the correct URL?

>search engines do not care for anchors either, so you won't
>get a dc issue here

That's good to know. Thanks.

>the problem is that most browsers do not send the URL-fragment
>identifier to the server, because named anchors are only
>processed "inside" a page -- They are handled entirely within
>the browser and there is therefore no need to send them to the
>server.

So, if I understand you both (and later jdMorgan) correctly, IE8, Firefox and Opera all show these malformed URLs in the address bar

http://www.example.com/My_Blog/more_posts_.html#c123458?utm_source=twitterfeed&utm_medium=twitter

but my server is not actually seeing anything more than:

http://www.example.com/My_Blog/more_posts_.html

I wonder if that means that each new comment automatically fed to Twitter for...

http://www.example.com/My_Blog/more_posts_.html

will be seen as spam by Twitter or Google et al? That would be a pity as I thought feeding comments would liven up and make my Twitter page more relevant.

jdMorgan also wrote:

>if you still want to include them in the redirect path for
>those few browsers that do send them

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /My_Blog/mypage\.html(\?[^#\ ]*)?(#[^\ ]*)?\ HTTP/
RewriteRule ^My_Blog/mypage\.html$ http://www.example.com/My_Blog/mypage.html%2? [NE,R=301,L]

I guess there is little point pursuing this, unless there is any chance that a variation of your code would actually fix the problem for the majority.

I tested that, hard-coding it for a specific blog post page on a different blog.

It did not remove the query string in my first test.

In my second test, which redirected to another page I had put up at test.com just to see whether anything was happening at all, it also failed.

So there was no point in my hoping you might be wrong and that this might be a magic fix.

Thanks again. Maybe I'll see if Feedburner can pass a correctly formed URL, and then I'll presumably need different code again?

Spiekerooger




msg:4203974
 9:25 pm on Sep 19, 2010 (gmt 0)

For usability reasons I would think about using a javascript function to send users to the right anchor at your page.

Another solution would be to feed no URLs with anchors to twitterfeed, but to send them a URL like http://www.example.com/My_Blog/more_posts/c123458, send this first to your already working query-string killer, and then redirect it to http://www.example.com/My_Blog/more_posts_.html#c123458, e.g. by using this ruleset (the last line does the work of sending users to the right anchor):


# Enable mod_rewrite
Options +FollowSymLinks
#
# Turn on the rewriting engine
RewriteEngine on
#
# If query string is non-blank
RewriteCond %{QUERY_STRING} .
# redirect to remove query string
RewriteRule (.*) [example.com...] [R=301,L]
RewriteRule ^My_Blog/more_posts/([a-z0-9]+)$ /My_Blog/more_posts_.html#$1 [NE,L,R=301]


Not that elegant as a solution, but this would work with every browser...

g1smd




msg:4203978
 10:04 pm on Sep 19, 2010 (gmt 0)

I always understood that the # in these types of URL is picked up by a browser-side Javascript routine, as some sort of AJAX functionality - but I could be wrong.

jdMorgan




msg:4203982
 10:32 pm on Sep 19, 2010 (gmt 0)

Not necessarily. AJAX (mistakenly) adopted/absconded with the URL-fragment identifier, originally specified for HTML <a name="name"> tags, for its own use in call-backs, and this has been causing lots of compatibility problems. Google floated a proposal to use "#!" instead, to differentiate the two uses, but so far it has had little success (as far as I can see).

Jim

tweetfeeder




msg:4204000
 11:54 pm on Sep 19, 2010 (gmt 0)


Hello again,

You guys and gals are just great.

Spiekerooger said:

>For usability reasons I would think about using a javascript
>function to send users to the right anchor at your page.

If I were capable of it, I'm sure that would be much more elegant and satisfactory than the shameful (but working) kludge I've just implemented.

>Another solution would be to feed no urls w/ anchors to
>twitterfeed, but send them a URL like
>http://www.example.com/My_Blog/more_posts/c123458 and to send
>this first to your already working query string killer

Again, more elegant, and you are right: I could easily create a new RSS feed, purpose-built for twitterfeed, that does not have the hash "#" anchor.

>and then redirect this to
>http://www.example.com/My_Blog/more_posts_.html#c123458, e.g.
>by using this ruleset (the last line doing the work about
>sending the users to the right anchor):

# Enable mod_rewrite
Options +FollowSymLinks
#
# Turn on the rewriting engine
RewriteEngine on
#
# If query string is non-blank
RewriteCond %{QUERY_STRING} .
# redirect to remove query string
RewriteRule (.*) [example.com...] [R=301,L]
RewriteRule ^My_Blog/more_posts/([a-z0-9]+)$ /My_Blog/more_posts_.html#$1 [NE,L,R=301]

The problem is that I'd also have to add variables to the example above, so that it handles every page name rather than just "more_posts_.html", and every date-based subdirectory that will appear under /My_Blog/ in the future:

/My_Blog/2010/09/ et cetera.
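A single pattern can cover any page name and any date-based subdirectory. This is a minimal sketch, assuming a hypothetical feed-link scheme in which the comment ID is appended as an extra path segment (e.g. /My_Blog/2010/09/post_title/c123456 standing for /My_Blog/2010/09/post_title.html#c123456), and assuming no real page or directory on the site ends in /c followed by digits:

```apache
Options +FollowSymLinks
RewriteEngine on
#
# Hypothetical feed-link scheme (not defined by twitterfeed or any blog engine):
#   /My_Blog/<any/date/path>/<page_name>/c<digits>
# is redirected to
#   /My_Blog/<any/date/path>/<page_name>.html#c<digits>
# $1 captures everything before the last slash, $2 the comment ID.
# NE stops mod_rewrite from escaping "#" to "%23" in the redirect.
RewriteRule ^(My_Blog/.+)/(c[0-9]+)$ /$1.html#$2 [NE,R=301,L]
```

It belongs after the query-string-removal rule, as in Spiekerooger's ruleset, so the tracking parameters are stripped by the first redirect. Without that, mod_rewrite passes the original query string through unchanged and would likely append it after the fragment, rebuilding the malformed #c123456?utm_source=... form.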

>Not that elegant as a solution, but this would work with every
>browser...

Much more elegant than my kludge, which was inspired by what you folk wrote earlier. When you told me the browser was treating everything after the "#" as the anchor name, the solution seemed obvious: use the blog's site-building template to create an anchor whose name is everything after the "#".

The end result after rebuilding the site is that there are now two "names" for every comment, in the format

<a id="c123456">&nbsp;</a>

and

<a id="c123456?utm_source=twitterfeed&utm_medium=twitter">&nbsp;</a>

The first goes into the RSS feed and is fed to Twitter; both occur in the HTML page. So visitors from either source will have functional links.

Tested in IE8, Firefox and Opera, and when clicking on the links in Twitter - I land on the right comment.

The foreseeable problem is that a person coming from Twitter and then retweeting will create an even longer string, which will not link to the "#" anchor any more. But direct tweets don't add the string; only RSS feeds passed through twitterfeed (and others?) do.

Even if Twitter does add something later, hopefully the extra string will be ignored. What's important to me at the moment is that my own Twitter page will not have non-functioning links right now.

jdMorgan (Jim) wrote:

>originally specified for HTML <a name="name">

My memory on this is that from around version 7, all the major browsers started recognising div ids (and ids on other elements) as valid "#" anchor targets, which I've found useful, and which turns out to be standards-compliant: HTML 4 lets an id on any element act as a fragment target.

I think the standards now specify "a id=" and I believe we got some pages to validate with the W3C validator using that. Although I have to shamefully admit to only checking the validator when things are not working consistently in my 3 test browsers.

Thanks again.

jdMorgan




msg:4204343
 5:56 pm on Sep 20, 2010 (gmt 0)

I mentioned <a name="xyz"> in the context of the original use for the "#" character.

Using the modern <a id="xyz"> instead of the deprecated <a name="xyz"> doesn't change that assertion.

Jim

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved