htaccess question mark removal & Google analytics?

how to remove question marks and keep the gclid?

         

wzshop

8:25 am on Jul 19, 2011 (gmt 0)

10+ Year Member



Hi all,
For SEO I use the following code in my .htaccess to remove every question mark and the parameters that follow it.

#Redirect to remove query string from any directory-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteRule ^(.*)$ http://www.#*$!.com/$1? [R=301,L]


This makes every question mark and its parameters disappear. But it also removes the '?gclid' parameter (the one AdWords appends) from the URL, so my paid clicks do not show up in Analytics. Is it possible to exclude gclid from removal?

Thanks a lot for your help!

lucy24

6:54 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes, if you simply say
?gclid
it will replace the existing query string with this part only. This is the default behavior with question marks; you don't need to add any commands.

Also replace ^(.*)$ with unanchored (.+). Since you are not matching specific text in a specific location-- and since there will always be text-- all you need to say is "grab the whole thing and recycle it".

There is no need to swear at your own domain ;) A simple example.com will do.

wzshop

7:32 pm on Jul 19, 2011 (gmt 0)

10+ Year Member



Hi lucy24,
Thanks for your reply!
Problem is, I have no idea how .htaccess works. I found this code somewhere on this forum, but I do not really know what it does ;)

So if I understand you right, the code should look more like this:

#Redirect to remove query string from any directory-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteRule (.+) http://www.example.com/$1? [R=301,L]


Right? Now could you please explain to me how to exclude the 'gclid' part from being rewritten?

Thanks a lot in advance!
Kind regards, Robbert

lucy24

8:28 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Instead of ending your target with a bare question mark, end it with ?gclid

RewriteRule (.+) http://www.example.com/$1?gclid [R=301,L]

Without knowing the exact situation, I can't say if your RewriteCond lines are correct or even necessary, but someone else will come along and explain that part. I just do Regular Expressions ;)

g1smd

11:05 pm on Jul 19, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Add another
RewriteCond %{QUERY_STRING} (^|&)gclid=([^&]+)(&|$)
and then append
?gclid=%2
to the end of the target URL.
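For readers unfamiliar with back-references, the condition above can be annotated as follows. This is only a commentary sketch, and the sample query strings in the comments are made up for illustration.

```apache
# Group 1: (^|&)    start of the query string, or the "&" in front of gclid
# Group 2: ([^&]+)  the gclid value itself, later available as %2
# Group 3: (&|$)    the "&" after the value, or the end of the query string
#
# Hypothetical query strings and what %2 would hold:
#   gclid=abc123              -> abc123
#   page=2&gclid=abc123&x=1   -> abc123
#   mygclid=abc123            -> no match (gclid is not at the start or after "&")
RewriteCond %{QUERY_STRING} (^|&)gclid=([^&]+)(&|$)
```

The %N references come from the last RewriteCond that matched, so this condition would need to be the final one above the RewriteRule for %2 to hold the gclid value.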

lucy24

12:07 am on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



D'oh! That is, only use gclid if it was there in the first place. And its actual form is gclid={something} ?

Does Apache recognize \b ? If you're allowed to say \bgclid you don't have to worry about the parenthesized bit. That's assuming there's no possible query string containing _gclid or \dgclid.

Does the gclid=([^&]+) part mean that the gclid could be absolutely anything, so you can't express it positively as [{reasonable amount of stuff here}] which would run a little faster?

g1smd

12:12 am on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



([^&]+)
is read as: go until the next "&" and capture everything before it.

It will hit an &, or the end of the query string itself, and stop.

If you're sure that it will be only digits or only letters, you can use [0-9]+ or [A-Z]+ or similar.

I would not generally use \b and \d in patterns. That's PCRE syntax. For widest compatibility use POSIX.
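If the value really is limited to a known set of characters, the capture can be tightened along these lines. The sketch assumes, without any confirmation from Google, that gclid values contain only letters, digits, underscores and hyphens.

```apache
# Assumption: gclid values consist only of A-Z, a-z, 0-9, "_" and "-".
# With a restricted class, the trailing (&|$) is what guarantees that the
# whole value was matched, not just a leading part of it.
RewriteCond %{QUERY_STRING} (^|&)gclid=([A-Za-z0-9_-]+)(&|$)
```

If a value ever contained a character outside that class, the condition would fail and the request would be treated as if it had no gclid at all, which is the price of the stricter pattern.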

wzshop

9:07 am on Jul 20, 2011 (gmt 0)

10+ Year Member



Thanks again for all your replies. I tried some of your suggestions. This rule
RewriteRule (.+) http://www.example.com/$1?gclid [R=301,L] 
does not work. It redirects all the www.example.com?page= URLs to www.example.com?gclid and so on. I do not want those URLs redirected to a gclid version; I only want a URL that already contains '?gclid=' to be left alone, so that Google Analytics can recognise it as an AdWords URL.

I also tried the
RewriteRule (.+) http://www.example.com/$1?gclid [R=301,L] 

I think this conflicts with the following rule,
RewriteRule ^computer-repair/(.+)\.html$ /location.php?regio=$1
since that rule does not seem to work anymore... :s

Anyone can help me out?
Thanks in advance.
Robbert

lucy24

6:32 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



! You mean you don't want to do anything at all to requests whose query strings contain gclid? Fer hevvins sake, why didn't you say so? That makes it enormously easier. All you have to do is add

%{QUERY_STRING} !gclid

to each of your rewrite conditions. It means: skip this one when rewriting.

But what happens to requests that would otherwise be rewritten? Don't you risk having identical requests being sent to different places depending purely on whether they contain the gclid element? Or is this something that could never occur anyway?

The rules g1 gave don't have to be made into redirects. If you leave off the leading http://www.example.com and the [R=301], they revert to being simple rewrites.
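The difference between the two forms can be sketched as below. The file names old-page.html and new-page.html are hypothetical, and the two rules are alternatives rather than a pair meant to be used together.

```apache
# External redirect: the browser receives a 301 and requests the new URL,
# so the address bar changes.
RewriteRule ^old-page\.html$ http://www.example.com/new-page.html [R=301,L]

# Internal rewrite: the server quietly serves a different file and the
# address bar keeps showing old-page.html.
RewriteRule ^old-page\.html$ /new-page.html [L]
```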

([^&]+) is read as: go until the next "&" and capture everything before it.

:: lightbulb :: I always thought of it conceptually as "capture everything that isn't an ampersand, and stop cold when you do hit one". Technically it may be the same thing, but it feels different.

g1smd

7:33 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Stop when you hit ampersand OR run out of stuff to process.

lucy24

8:13 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Stop when you hit ampersand OR run out of stuff to process.

:: further lightbulb ::
This doesn't apply to Apache, but in my text editor, the [^foo] form adds \n line breaks to the search, even if you're otherwise in single-line mode. This can be almost as disastrous as making a mistake in .htaccess :o

Sorry, Robbert, slight digression there but it's all in a good cause. Now, about those rewrites...

wzshop

8:47 pm on Jul 20, 2011 (gmt 0)

10+ Year Member



Thanks again;)

Sorry, I am a noob at this .htaccess stuff. So I should add the %{QUERY_STRING} !gclid to my existing rewrite rule, right?

#Redirect to remove query string from any directory-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L] %{QUERY_STRING} !gclid


Something like this? Because it does not seem to work.
And lucy, you said

But what happens to requests that would otherwise be rewritten? Don't you risk having identical requests being sent to different places depending purely on whether they contain the gclid element? Or is this something that could never occur anyway?


gclid only appears in a URL when the visit comes from AdWords. My site does not use 'gclid' anywhere in any other URL. Does that answer your question?

OK, is there any chance that someone could post the complete working code for this noob here? I know it sounds lazy, but I really tried to understand. It would really help me out.

Thanks for your time.
Robbert

lucy24

9:36 pm on Jul 20, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



#Redirect to remove query string from any directory-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L] %{QUERY_STRING} !gclid

Something like this? Because it does not seem to work.

I should hope not :)
I meant that you have to add it to your Conditions as
RewriteCond %{QUERY_STRING} !gclid
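Put together, the placement described here would look roughly like this. It is a sketch that only reorders lines already posted in this thread; whether the first two conditions are the best choice for this site is not something it tries to settle.

```apache
#Redirect to remove query string from any directory-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
# New: skip this redirect whenever the query string mentions gclid
RewriteCond %{QUERY_STRING} !gclid
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
```

All RewriteCond lines sit directly above the RewriteRule they belong to and apply to that one rule only. A condition placed after the rule attaches to whatever RewriteRule comes next.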


And lucy, you said
But what happens to requests that would otherwise be rewritten? Don't you risk having identical requests being sent to different places depending purely on whether they contain the gclid element? Or is this something that could never occur anyway?

gclid only appears in a URL when the visit comes from AdWords. My site does not use 'gclid' anywhere in any other URL. Does that answer your question?

No, I meant it the other way around. You're obviously rewriting something or you wouldn't need this code at all. But anything containing the ?gclid query is exempt from all rewriting. It often helps to backtrack and say in English what you're trying to do. That is, what's being rewritten and why?

wzshop

8:20 am on Jul 21, 2011 (gmt 0)

10+ Year Member



Hi Lucy,
Thanks again for your great support. I understand what you're saying now. I do use another rewrite rule, and that one is indeed now blocking the !gclid condition.

My htaccess, now looks like this:

#Redirect to remove query string from any directory-paths
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
RewriteCond %{QUERY_STRING} !gclid

RewriteRule ^computer-repair/(.+)\.html$ /region.php?region=$1


So what happens now is that all the www.example.com/?gclid urls get redirected to www.example.com/region.php. Hope this gives you enough information to help me further?

Thank you so much.
Kind regards,
Robbert

lucy24

7:29 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm going to try to translate the existing htaccess into English, even though I'm pretty sure it isn't written optimally.

RewriteCond %{THE_REQUEST} ^[A-Z]+\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]


Computer arrives at this Rule and asks itself "is the input in the form '.*', meaning 'something or even nothing at all'?", which of course it always is. Then and only then does it backtrack to see whether the Conditions apply: "did the request come from outside, and does it include a query?" (the first Condition only matches a request line of the form 'GET something containing a question mark, optionally preceded or followed by other stuff and winding up with HTTP') and "does the requested path not begin with 'stats/'?" (that is, it's not in the "stats" directory). You don't need a back-reference here; %{REQUEST_URI} will do the same job.

If the conditions match, then it rewrites the request to keep it exactly the same except for chopping off the query, and then turns it into a Redirect.

Then the computer meets [L], meaning "stop here and go back to the very beginning of the htaccess file". Now there is no longer a query, so the conditions for rule #1 no longer apply, so it carries on to the second Rule:

RewriteCond %{QUERY_STRING} !gclid
RewriteRule ^computer-repair/(.+)\.html$ /region.php?region=$1


Here's where I got lost. Which requests would rule #1 not apply to? because those are the only ones that would still have a query string to evaluate.

g1smd

7:47 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Then the computer meets [L], meaning "stop here and go back to the very beginning of the htaccess file".

Not quite. The redirect causes the "301 redirect" HTTP header to be sent back to the browser and processing halts for this request.

If the browser/requester makes a new request (it might not do so) as a result of the redirect that was sent to it, then processing starts again from the top of the .htaccess file using the new request as input.

Nothing is remembered from the previous request.

wzshop

8:28 pm on Jul 21, 2011 (gmt 0)

10+ Year Member



Phew. Thanks again for all the replies, but this is very hard for me to understand. All the code, except for
RewriteRule ^computer-repair/(.+)\.html$ /region.php?region=$1
I got from this website, just by copying and pasting it. The rule above was written by a scripter whom I cannot contact.

What the htaccess is supposed to do is the following:
-Remove the question mark and everything after it, or really the dynamically generated URLs like
PHPsession
and URLs like:
www.example.com/computer-repair/region1-region2-region3.html?page=computer-repair&region=region1-region2-reion3
since this is messing up my SEO (indexing in the search engines).

-Then I had to make an exception for the /stats/ folder (my website stats), since the provided .htaccess code for some reason stopped it from counting visits. So I asked here, and someone gave me the solution above, which worked.

-Now in order to track the visits - generated by Adwords - in Google Analytics, urls like:
www.example.com?gclid=CMiA4avSzZkCFYZM5Qodaw3Muw
also need to be excluded from the rewrite conditions/rules.

The htaccess code i use now, might not really be suitable for the above. I made it myself with some copy/pasting... Now i am hoping that someone can help me out with the above, since this really is out of my league.

Thanks again for all your commitment.
Robbert

lucy24

10:58 pm on Jul 21, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, here's the part we need to pin down. Is it possible for an url (other than your stats, which are private) to contain both the gclid stuff-- which you need to keep-- and the other stuff-- which you don't want to keep? I don't mean "is it possible in theory", I mean does it ever occur in your own urls?

wzshop

8:57 am on Jul 25, 2011 (gmt 0)

10+ Year Member



Hi Lucy,
No, the gclid part only occurs when visits come from AdWords.

Thanks, Robbert

lucy24

4:18 pm on Jul 25, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Whew. Then all you need is the added

RewriteCond %{QUERY_STRING} !^gclid


in the list of Conditions for your main rewriting or redirecting rule. "If the query is ?gclid=something, don't do anything." I think that's been posted three times in this thread already, we were just running around in circles trying to figure out what you wanted to exclude and when.

The ^ anchor saves a few nanoseconds of computing time because the computer only has to read the first five characters of the query. It also covers the remote possibility that your ordinary query strings might some day happen to contain the random sequence of letters "gclid".
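The trade-off between the anchored and unanchored forms can be sketched like this. Only one of the three conditions would be used at a time, and the query strings in the comments are hypothetical.

```apache
# Unanchored: exempts any query string containing the letters "gclid"
# anywhere, e.g. ?gclid=abc, ?page=2&gclid=abc, and also ?mygclidthing=1
RewriteCond %{QUERY_STRING} !gclid

# Anchored: exempts only query strings that begin with "gclid",
# e.g. ?gclid=abc, but not ?page=2&gclid=abc
RewriteCond %{QUERY_STRING} !^gclid

# Middle ground: exempts gclid as a parameter name in any position,
# e.g. ?gclid=abc and ?page=2&gclid=abc, but not ?mygclidthing=1
RewriteCond %{QUERY_STRING} !(^|&)gclid=
```

The anchored form relies on gclid always being the first parameter, which matches Robbert's description of his URLs but may not hold if another parameter is ever placed in front of it.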

wzshop

8:22 am on Jul 26, 2011 (gmt 0)

10+ Year Member



Hi lucy,
I'm very sorry, now I understand what happens. When the URL contains the "gclid" thing, I would like it to get rewritten like any other URL, which is:
RewriteRule ^computer-repair/(.+)\.html$ /region.php?region=$1


That rewrite code makes the URL SEO friendly. Any chance to implement that?

thanks again!

g1smd

6:46 pm on Jul 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



That rewrite code makes the URL SEO friendly.

The code does not "make" anything at all.

URLs are defined in links. It is links to SEF URLs that "make" SEF URLs.

Mod_rewrite can only act after the link is clicked.

lucy24

9:00 pm on Jul 26, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: beating head against wall ::

When the URL contains the "gclid" thing, I would like it to get rewritten like any other URL

Are you now saying that you want to

#1 change the present static url into a dynamic url, putting part of the old static url into the query string of the new format while throwing away the rest of the query,

and

#2 do EXACTLY THE SAME THING if there is a gclid=blahblah query, only in this case the gclid part has to be added to the newly created query string

and

#3 bypass those urls that have already had #1 done to them (either by rewrite of outdated urls or because they were correctly formatted in the first place)

?

:: further beating of head ::

wzshop

9:26 am on Jul 27, 2011 (gmt 0)

10+ Year Member



Hi,
I'm sorry if I am being unclear about this. This is all new to me, so I'm not really sure what to mention and what not.

#1 All my URLs get rewritten to SEO friendly URLs:
RewriteRule ^computer-repair/(.+)\.html$ /region.php?region=$1
This will make the URLs like:
example.com/computer-repair/region1-region2-region3.html


#2 To track visitors from AdWords, Google adds the gclid=blabla thing to the above rewritten URLs. That is something I would like to keep, so I can separate these visitors in my Google Analytics account. These URLs look like:
example.com/computer-repair/region1-region2-region3.html?gclid=blabla


#3 Now sometimes that rewrite does not really seem to work, and the URL gets indexed the wrong way in Google, like:
www.example.com/computer-repair/region1-region2-region3.html?page=computer-repair&region=region1-region2-region3
Something goes wrong there. So I would like to remove this
?page=computer-repair&region=region1-region2-region3
part from the URLs so they get indexed the right way in Google:
example.com/computer-repair/region1-region2-region3.html


Hope that is clearer. To accomplish the above, I (guess I) added a rewrite rule that strips the question mark and everything behind it from the URLs. As mentioned, this works fine, but that way the ?gclid part gets stripped as well, which stops Google Analytics from tracking visitors from Google AdWords.

I hope you can help me out, and I'm sorry for your head.
Kind regards, Robbert

g1smd

10:27 am on Jul 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



All my URLs get rewritten to SEO friendly URLs
This will make the URLs like:

Again, mod_rewrite does not "make" anything. Links make URLs. A URL exists as soon as it appears in a link. Mod_rewrite deals with requests only after the link is clicked.

What you seem to have here is a redirect which exposes the server internal path, when what you needed was an internal rewrite.

This is a complex issue to fix, and the ordering of the rules will be crucial.

Your redirect that removes parameters should be altered to not remove the gclid parameter, and in fact preserve it while removing all others. In this case you'll need several redirects, each dealing with a specific case.

Your internal rewrite should work whether or not a gclid parameter is present. The failure you indicate may be caused by the rules being in the wrong order.

You can't (easily) do the first two rules as one, without creating an infinite redirect loop.

These rules go first:

# Capture gclid parameter and remove all preceding (and following)
# parameters when gclid is present in middle or at end of parameters.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteCond %{QUERY_STRING} &gclid=([^&]+)
RewriteRule (.*) http://www.example.com/$1?gclid=%1 [R=301,L]


# Capture gclid parameter and remove all following parameters
# when gclid is present as first parameter of multiple parameters.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteCond %{QUERY_STRING} ^gclid=([^&]+)&
RewriteRule (.*) http://www.example.com/$1?gclid=%1 [R=301,L]


# Remove all parameters except when gclid is present.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteCond %{QUERY_STRING} !gclid=
RewriteRule (.*) http://www.example.com/$1? [R=301,L]


# Canonical redirect.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]


The above redirects must be listed BEFORE your internal rewrites.

[edited by: engine at 1:59 pm (utc) on Aug 2, 2011]
[edit reason] fixed typo [/edit]

g1smd

11:02 am on Jul 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Possible replacement rule for the first two rules.

# Capture gclid parameter and remove all preceding and following
# parameters when gclid is present in query string with multiple parameters.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteCond %{QUERY_STRING} !^gclid=[^&]+$
RewriteCond %{QUERY_STRING} gclid=([^&]+)
RewriteRule (.*) http://www.example.com/$1?gclid=%1 [R=301,L]


What the RewriteConds above do, is as follows:
If THE_REQUEST contains a question mark (i.e. query string is present),
AND the request is NOT for the /stats/ folder,
AND the query string does NOT consist of just ONE parameter, the gclid parameter (i.e. it DOES have additional OR other parameters),
AND the query string DOES contain the gclid parameter (and maybe also contains additional parameters),
CAPTURE the gclid parameter value and redirect to the URL with the gclid parameter appended.

You'll still need rules 3 and 4 from the post above too.

# Remove all parameters except when gclid is present.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond $1 !^stats/
RewriteCond %{QUERY_STRING} !gclid=
RewriteRule (.*) http://www.example.com/$1? [R=301,L]


# Canonical redirect.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1? [R=301,L]


The above redirects must be listed BEFORE your internal rewrites.

[edited by: engine at 2:00 pm (utc) on Aug 2, 2011]
[edit reason] fixed typo [/edit]

g1smd

12:01 pm on Jul 27, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The two patterns
!^gclid=[^&]+$
(meaning "not exactly") and
gclid=([^&]+)
(meaning "contains") are crucial to correct operation of the first ruleset, as is
!gclid=
(meaning "does not contain") in the following ruleset.

Clear your browser cache before testing and ensure the redirects are listed BEFORE your internal rewrites.

g1smd

6:24 pm on Jul 29, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Ack. Looks like I wasted several hours here.

wzshop

12:42 pm on Aug 1, 2011 (gmt 0)

10+ Year Member



Guys, thanks a lot for all your replies. When I use g1smd's code, none of my URLs work anymore.

I guess I just have to ask a professional about this, and pay for it. Nonetheless, thanks a lot for all your time!

g1smd

7:15 pm on Aug 1, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You'll need to explain how the rules don't work, as barring a stupid typo somewhere the code should do exactly what you require.

It is likely there's an unwanted interaction with the rest of the code running your site, especially if rules are in the wrong order.
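One interaction that could produce this kind of failure, offered as a possibility rather than a diagnosis of Robbert's site: in .htaccess the rules are run again after an internal rewrite. By then %{QUERY_STRING} holds region=... instead of the gclid the visitor sent, while %{THE_REQUEST} still shows the original question mark, so on that second pass a "remove all parameters" rule that tests %{QUERY_STRING} could match and redirect the visitor to /region.php. The sketch below avoids this by looking for gclid in the original request line. It simply leaves gclid requests untouched and does not try to strip other parameters that arrive together with gclid.

```apache
# Strip the query string, but only when the request line the client
# actually sent contains no gclid parameter.
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /([^?#\ ]*)\?[^\ ]*\ HTTP/
RewriteCond %{THE_REQUEST} !\?([^\ ]*&)?gclid=
RewriteCond $1 !^stats/
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

# Internal rewrite, listed after the redirect.
RewriteRule ^computer-repair/(.+)\.html$ /region.php?region=$1 [L]
```

Because %{THE_REQUEST} never changes during internal rewrites, both conditions give the same answer on every pass, which is the reason for preferring it here over %{QUERY_STRING}.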