GA and RegEx

Separating Dr from Andrew etc.

chewy

5:20 pm on Mar 10, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi,

I basically get regular expressions, but I don't quite know how to apply them to Google Analytics beyond cookbook recipes, such as those on the Lunametrics site or in Brian Clifton's book on the same subject.

Let's say I want to find the occurrence of the term Dr. (with or without the period) in either search history or content viewing using the "Find" box at the bottom of a GA page.

I type in "DR" and I get every instance of DR, including the one inside Andrew. How do I filter out those other instances and match DR only when it is used as a title, say for a medical doctor?

Of course DR. also doesn't work, nor does DR with a space.

Thanks,

-C

ergophobe

10:02 pm on Mar 18, 2009 (gmt 0)


You want to search for complete words, which means using the zero-width word boundary assertion, \b (I don't know if GA supports it, but it's standard regex):

\bdr\.?\b

That will find only dr and dr. as whole words. In the PCRE flavor of regex, you can also set case insensitivity with the i flag. If that is not available, you can use character classes instead:

\b[dD][rR]\.?\b

That finds all case combinations (though dR is unlikely!).

Final alternative:
\b(Dr|DR|dr)\.?\b

That finds only the three options listed.
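If you want to try these patterns outside GA first, here is a minimal sketch using Python's re module as a stand-in. That is an assumption on my part: GA's regex flavor may not behave identically, and the sample strings are made up for illustration.

```python
import re

# Made-up sample strings; "Andrew" and "address" contain "dr" inside a word.
samples = ["Dr Smith", "dr. jones", "Andrew", "DR. WHO", "address", "ask the dr"]

# The three approaches from this post, written with Python's re module.
patterns = {
    "flag": re.compile(r"\bdr\.?\b", re.IGNORECASE),   # case-insensitive flag
    "classes": re.compile(r"\b[dD][rR]\.?\b"),         # character classes
    "alternation": re.compile(r"\b(Dr|DR|dr)\.?\b"),   # explicit list of spellings
}

def matching(pattern, texts):
    """Return the texts in which the pattern is found anywhere."""
    return [t for t in texts if pattern.search(t)]

for name, pattern in patterns.items():
    print(name, matching(pattern, samples))
```

All three skip "Andrew" and "address" because \b refuses to match between two letters. The only difference is that the alternation version would ignore an odd spelling like dR, which the other two accept.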

chewy

10:21 pm on Mar 18, 2009 (gmt 0)


Works fabulously!

I remember that WebmasterWorld converts the pipe character into the broken bar "¦", so type the real pipe yourself (on US keyboards it is Shift plus the backslash "\" key).

Amazing!

ergophobe

10:27 pm on Mar 18, 2009 (gmt 0)


>>WebmasterWorld somehow converts the pipe

Yes, thanks for remembering that!

>>Amazing!

I'm glad you share my sense of wonder at the power of regex. I tried to get a seminar of historians to understand why they absolutely had to learn them in order to find textual variants in old documents. They were blown away by some sample searches on a large corpus of texts, but I could see in their eyes that 30 minutes later not a one of them would know what a regex was.

chewy

10:45 pm on Mar 18, 2009 (gmt 0)


So, ah, is there a crib sheet on how to use this, say to combine one search with another, do negatives, etc.?

I know some of you guys "think" this way but for the life of me, this is very hard to figure out, remember etc.

g1smd

1:30 am on Apr 3, 2009 (gmt 0)


Is "Dr." or "Dr" always placed at the beginning of the string? If it is, you can also start-anchor the pattern.

ergophobe

4:36 pm on Apr 3, 2009 (gmt 0)


Generally the word boundary search (\b) should catch every case the start anchor (^) would.

They're both zero-width assertions, but the start anchor is the more specific of the two, so there's really no need for it unless you want Dr or Dr. matched at the beginning of a line only, and not in the main text.
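To make the difference concrete, here is a small sketch, again assuming Python's re module rather than GA itself, with invented search phrases. The start-anchored pattern only accepts the title at the very beginning of the string, while the word boundary version accepts it anywhere.

```python
import re

# Word boundary version: matches the title anywhere in the string.
word_boundary = re.compile(r"\bdr\.?\b", re.IGNORECASE)
# Start-anchored version: matches only when the string begins with the title.
start_anchored = re.compile(r"^dr\.?\b", re.IGNORECASE)

# Invented search phrases for illustration.
queries = ["dr smith reviews", "find a dr near me", "andrew smith"]

boundary_hits = [q for q in queries if word_boundary.search(q)]
anchored_hits = [q for q in queries if start_anchored.search(q)]

print(boundary_hits)
print(anchored_hits)
```

Everything the anchored pattern finds is also found by the word boundary pattern, but not the other way around.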

chewy

8:55 pm on Apr 24, 2009 (gmt 0)


Still hoping for a crib sheet on how to do simple booleans or exclusions using regex with GA.

How do I see "doctors widgets" and "doctors' widgets" separately from "doctors"?

-and I know some brains totally get this stuff - I just didn't happen to win that gene in the genetic lottery or something!