jdMorgan

msg:3403625 | 10:32 pm on Jul 24, 2007 (gmt 0) |
The patterns used in the PERL code are nothing more than extended regular expressions -- similar to those used in mod_rewrite, PHP, and many other scripting and "utility" languages. Try a search for PERL regular expressions, and you should find plenty of useful info.

One thing to beware of is that the latest Netscape 9 browser (released as a beta) has been reduced to little more than a 'skin' and a few extensions on top of Mozilla Firefox, and now carries a Firefox User-agent string with "Netscape Navigator" tacked onto the end. Example from a WinXP user in the US:

"Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.5pre) Gecko/20070712 Firefox/2.0.0.4 Navigator/9.0b2"

Yet another anomaly to deal with... :)

Jim
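A minimal sketch of one way to cope with that anomaly, assuming a hypothetical helper named `browser_family` (it is not part of AWStats): because the Netscape 9 string contains both tokens, the "Navigator/" test has to run before the "Firefox/" test.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; the name and the
# three categories are assumptions, not AWStats code.
sub browser_family {
    my ($ua) = @_;
    # Netscape 9 also carries "Firefox/", so check "Navigator/" first.
    return 'Netscape' if $ua =~ m{Navigator/\d};
    return 'Firefox'  if $ua =~ m{Firefox/\d};
    return 'Other';
}

my $netscape9 = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.5pre) Gecko/20070712 Firefox/2.0.0.4 Navigator/9.0b2';

print browser_family($netscape9), "\n";   # prints "Netscape"
```

Swapping the two tests would report this visitor as Firefox, which is exactly the miscount the post warns about.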
|
JAB Creations

msg:3403637 | 10:59 pm on Jul 24, 2007 (gmt 0) |
Now you've mentioned regular expressions...so here is my take on what I've read thus far...

According to Wiki...
| ^ Matches the beginning of a line or string. $ Matches the end of a line or string. |

So I attempted this...
| my $regverexampleMozilla5=^Mozilla/5.0$; |

It's obviously not valid if the script breaks. Is there simply no direct method of detecting an exact string in Perl?

- John
|
jdMorgan

msg:3403656 | 11:24 pm on Jul 24, 2007 (gmt 0) |
Well, one problem is that you'll rarely see exactly "Mozilla/5.0" because the user-agent string has all that other stuff bolted onto it, as shown in my Netscape UA example above.

Of course PERL has an exact match, but you've got to satisfy both the required regular-expression syntax and PERL's own syntax, which is why I suggested a search for PERL regular expressions. The second result on Google, for example, neatly answers your question about "/i" on the very first page...

Jim
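To make both points concrete, here is a small sketch using the Netscape user-agent string from earlier in the thread. The variable names are illustrative assumptions. It contrasts a pattern anchored at both ends, a pattern anchored only at the start, and a plain `eq` string comparison, which is Perl's direct way of testing for an exact string without any regular expression.

```perl
use strict;
use warnings;

my $real_ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.5pre) Gecko/20070712 Firefox/2.0.0.4 Navigator/9.0b2';
my $bare_ua = 'Mozilla/5.0';

# m{...} is the match operator with braces as delimiters, so "/" needs no escaping.
# ^ and $ anchor both ends: only the bare string matches.
# (Note: $ also tolerates one trailing newline; \z is the stricter end anchor.)
my $exact_real  = ($real_ua =~ m{^Mozilla/5\.0$}) ? 1 : 0;   # 0
my $exact_bare  = ($bare_ua =~ m{^Mozilla/5\.0$}) ? 1 : 0;   # 1

# Anchored at the start only: any string beginning with "Mozilla/5.0" matches.
my $prefix_real = ($real_ua =~ m{^Mozilla/5\.0}) ? 1 : 0;    # 1

# Plain string equality, no regular expression involved.
my $eq_bare     = ($bare_ua eq 'Mozilla/5.0') ? 1 : 0;       # 1

print "exact on real UA:  $exact_real\n";
print "exact on bare UA:  $exact_bare\n";
print "prefix on real UA: $prefix_real\n";
print "eq on bare UA:     $eq_bare\n";
```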
|
JAB Creations

msg:3403701 | 12:22 am on Jul 25, 2007 (gmt 0) |
| to satisfy both the required regular expressions syntax, and PERL's own syntax, |

I understand roughly a third of what this implies. I can follow individual characters and what each may mean, but my brain doesn't pick up patterns the way a developer's does. When it comes to programming I work best by replication; without a clear-cut example to replicate, I can only find my answer by accident. With design I don't need to rely on math, just visuals, so I can build what I need from scratch much more easily.

So I understand the basic implications of expressions and the basic implications of operators. I am clueless about how we are mixing them, as I only roughly understand what separates one group from the other, and in my head it is again a visual understanding in place of the logic a developer works with.

My guesses include this...
| ^ Match the beginning of the line |
| $ Match the end of the line |

So I'd adapt the string from... to...

I assume we must escape slashes (I understand that this is a filtering array of some sort, as I can create filters for a string, but that is only my best guess at the much more general situation)...so I would adapt it as so....

I understand that in PHP a . connects two things...working from Awstats's (key part here) already working example of this...
| my $regfavico=qr/\/favicon\.ico$/i; |
...it is my understanding that I must escape the dot as an operator. My adaptation mutates to this...

Still, unless the regular expression is doing an exact match with ^ and $, it is completely unclear whether I'm executing an exact match. Is this how to do an exact match using regular expressions (minus the fact that we're mixing operators)?

Perl's page may explain "/i" to you, but it does not to me. "i" is case insensitive...so I don't need it if I'm doing exact matching (to be exact about exact, it implies case sensitivity, as that is of course part of what exact means). I also still do not understand what "qr" is. "/" is used to escape...so what is the point of "/i"...escaping case sensitivity?

So by my designer's visual logic this is currently my best guess...
| my $regverMozilla5=^Mozilla\/5\.0$ |
...it breaks the script though. So hopefully you'll be able to explain to me what I'm missing and where my visual logic is failing at literal logic, and my understanding will align that way, I hope...

Thanks for your help!

- John
|
phranque

msg:3403714 | 1:03 am on Jul 25, 2007 (gmt 0) |
a backslash rarely hurts but the first one isn't necessary. you can escape any punctuation character with a preceding backslash (a backslash before a letter or digit can create a special sequence such as \d or \b instead), but it is only required to escape the following special characters:

you might want to put your regular expression string in quotes:
| my $regverMozilla5 = '^Mozilla/5\.0$'; |

now it is an exact match regular expression for:
|
phranque

msg:3403720 | 1:15 am on Jul 25, 2007 (gmt 0) |
| I don't know what qr/ and /i do exactly |

qr is a perl operator for quoting regular expressions. the /i is an option that makes the alphabetic characters in the regexp match case-insensitively.
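As a short illustration of both answers, the sketch below reuses the AWStats favicon pattern quoted earlier in the thread. The second pattern and the test paths are assumptions added for the example. The forward slashes are only delimiters that mark where the pattern begins and ends; the backslash is the escape character, and the trailing i is a modifier applied to the whole pattern.

```perl
use strict;
use warnings;

# qr compiles a pattern and stores it in a variable for later matching.
# Here "/" delimits the pattern, "\/" and "\." are escaped literals,
# "$" anchors the end, and the trailing "i" ignores letter case.
my $regfavico = qr/\/favicon\.ico$/i;

# The same pattern with braces as delimiters, so the inner "/" needs no backslash.
my $regfavico_braces = qr{/favicon\.ico$}i;

my @paths = ('/favicon.ico', '/FAVICON.ICO', '/faviconXico', '/favicon.ico?x=1');

for my $path (@paths) {
    my $result = ($path =~ $regfavico) ? 'match' : 'no match';
    printf "%-18s %s\n", $path, $result;
}
# /favicon.ico       match
# /FAVICON.ICO       match     (because of /i)
# /faviconXico       no match  (the escaped dot is a literal ".")
# /favicon.ico?x=1   no match  (the $ anchor requires the string to end there)
```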
|
JAB Creations

msg:3403741 | 2:28 am on Jul 25, 2007 (gmt 0) |
Thanks phranque, this works perfectly!
| my $regverMozilla5='^Mozilla/5\.0$'; |

However it wasn't without a custom string attached of mine! It's obviously a spoof, and spoofing is against my site's TOS. That translates into no access log lines with that useragent AND a normal code (200). So it took me a moment of playing around with a temporary access log, as Awstats does not display non-normal codes for things like browser hits.

I changed the response codes around for a specific string (301s). Apache redirects (I changed a txt file to php to enforce my TOS on my Adblock filter subscription) before PHP gets a chance to execute (not hard to figure out), so I just changed the 301 redirects to 200s for the sake of testing and it works fine. I have exactly 682 instances in my test case, and it detected exactly 682 instances.

Anyway, thanks to both of you for all the help. I have a better understanding of regular expressions and I'm a major step closer to exceptionally accurate browser statistics. :)

- John
|
|