Forum Moderators: DixonJones

Google toolbar log entries causing keyword analysis issues

Google toolbar log file entry keyword analysis problem.

         

Tonerman

4:02 am on Mar 26, 2005 (gmt 0)

10+ Year Member



I posted the following on a searchenginewatch forum and I am posting it here because I need info on this problem. Maybe it isn't a problem and I just don't know what I am doing!

I noticed that my Netracker Professional 7.5 log analyzer was not picking up all the Google keywords on initial referrals. I checked, and my free version of Webfunnel, which I use for quick traffic analysis during the day, wasn't picking up all the Google keywords either.

In addition to the above log analyzers, I also use some custom macros with Ultraedit, a powerful text editor, for special tasks. I looked into the log files and found incoming Google traffic with lines like this:

proxya.scott.af.mil - - [25/Mar/2005:11:27:29 -0500] "GET /online-store/scstore/p-02100.html HTTP/1.1" 200 12728 "http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLD,GGLD:2004-19,GGLD:en&q=HP+2200+toner+" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322)"

I've bolded the area of the log file that appears to be screwing up my log analyzers. They were not picking up "HP+2200+toner" as a keyword. From what I could find on the web, these log entries appear to originate from people doing searches with the Google toolbar. There are two formats of this type. One is the log entry above; the other is this:

dsl-KK-static-231.202.95.61.touchtelindia.net - - [25/Mar/2005:08:02:02 -0500] "GET /online-store/scstore/c-Omnifax.html HTTP/1.0" 200 11654 "http://www.google.com/search?hl=en&lr=&rls=GGLD%2CGGLD%3A2004-35%2CGGLD%3Aen&q=XEROX+OMNIFAX" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)"

I've bolded the key area of this log file entry also. In posts from other people looking at these log entries, the best guess was that the letters "rls" stood for "release" and that the following "2004-35" was the version and date of the specific Google toolbar being used. Sounds logical to me.

In any event, neither Netracker nor Webfunnel could pick up the keywords in these log entries. Losing all the keywords from Google toolbar users coming into my site was screwing up my Adwords ROI analysis and also messing with the tracking of my organic SERP SEO efforts.
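
For readers who want to check what a log line actually carries, here is a minimal Python sketch that pulls the search phrase out of a Google referrer, whatever other parameters (such as rls=) surround it. It assumes the combined log format shown in the entries above; the function name google_keywords and the pattern LOG_LINE are hypothetical names used only for this illustration, and this is not how Netracker or Webfunnel work internally.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Assumes the combined log format seen in the entries above:
# ... "request" status size "referrer" "user agent"
LOG_LINE = re.compile(
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$'
)

def google_keywords(log_line):
    """Return the search phrase from a Google referrer, or None."""
    match = LOG_LINE.search(log_line)
    if match is None:
        return None
    url = urlsplit(match.group("referrer"))
    host = url.hostname or ""
    if host != "google.com" and not host.endswith(".google.com"):
        return None
    # parse_qs decodes %2C, %3A and '+' and does not care about the
    # order of the parameters, so rls= ahead of q= is harmless here.
    values = parse_qs(url.query).get("q")
    if not values:
        return None
    return values[0].strip() or None
```

Because the query string is parsed into named parameters rather than matched positionally, both toolbar formats shown above are handled by the same code.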

I studied the structure of the entries and wrote a macro using Ultraedit that stripped out the toolbar part of the entry and rewrote it to look like a standard Google log file entry. Because there are two versions of these log entries, I had to write a separate macro for each format.

Running the macros against three weeks' worth of log data was incredibly slow: about 6 hours to run it twice on about 275,000 log entries. On the other hand, it was the only available tool to do this with - search and replace is useless with so many different Google toolbar release dates, etc.

I tested it today with both Netracker and Webfunnel on today's traffic, and the difference in keyword analysis was incredible. Before cleaning up the log entries, they both picked up about 85 keywords. After cleaning up the log file with the macros I wrote to rewrite the Google toolbar entries into traditional Google format, they picked up 124 keywords!

I am writing this post to alert anyone trying to do keyword analysis on their PPC ads, or anyone doing SEO, about this issue. If someone knows how to solve this problem when using log analyzers like Netracker, please let me know. If anyone has any other insight on this, please let me know also.

plumsauce

7:35 pm on Mar 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have been working on this exact thing for the last two days. So far the software is churning through this stuff at the rate of 500 entries/sec, looking at all search engines, not just Google. The performance is linear, so 275k lines should have taken 550 seconds, or just over 9 minutes.

Tonerman

8:59 pm on Mar 26, 2005 (gmt 0)

10+ Year Member



Not sure I understand. Do you mean you are working on the Google toolbar user data issue in the log files, or analyzing log data, or what? The macros I wrote go into the log files and reformat the lines in the log file from a log entry like this:

p86-135.acedsl.com - - [26/Mar/2005:10:43:59 -0500] "GET /online-store/scstore/p-NEC870MR.html HTTP/1.1" 200 11860 "http://www.google.com/search?hl=en&rls=GGLD,GGLD:2004-07,GGLD:en&q=nec+superscript+870+toner&spell=1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

into a log entry that looks like this:

p86-135.acedsl.com - - [26/Mar/2005:10:43:59 -0500] "GET /online-store/scstore/p-NEC870MR.html HTTP/1.1" 200 11860 "http://www.google.com/search?q=nec+superscript+870+toner&spell=1" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)"

Please note how it has rewritten the log entry to look similar to the usual non-toolbar Google log entry. This is a difficult find-and-replace problem, since the text in the area being rewritten varies widely.
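
The before/after pair above can be reproduced with a single regular expression, which copes with the varying release strings that defeat a plain search and replace. This is only a sketch in Python, not the Ultraedit macro itself; the names TOOLBAR_REFERRER and normalize_toolbar_line are made up for the example, and it assumes the referrer is a quoted http://www.google.* URL with rls= somewhere ahead of q=.

```python
import re

# Matches a quoted Google referrer whose query string has an rls=
# parameter somewhere before q=. Everything between "search?" and
# "q=" is captured as "extras" and dropped, as in the example pair.
TOOLBAR_REFERRER = re.compile(
    r'(?P<head>"http://www\.google\.[a-z.]+/search\?)'
    r'(?P<extras>[^"]*?\brls=[^"]*?&)'
    r'(?P<query>q=[^"]*")'
)

def normalize_toolbar_line(line):
    """Rewrite a toolbar-style log line; other lines come back unchanged."""
    return TOOLBAR_REFERRER.sub(r"\g<head>\g<query>", line, count=1)
```

Note that this drops every parameter ahead of q= (hl, ie, sourceid and so on), which mirrors the rewritten entry shown above. Parameters that follow q=, such as spell=1, are kept.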

Tonerman

9:05 pm on Mar 26, 2005 (gmt 0)

10+ Year Member



By the way - it only alters the specific Google toolbar entries. It runs on the entire log file across all engines and ignores lines that do not need changing. I've sped this process up to about 1/2 hour on 250K lines, including some pauses for me to load or save 65K chunks of data, reload a specific macro, and provide related human help for the macros.
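
As an illustration of the same "touch only the toolbar lines" idea without loading 65K chunks by hand, a script can stream the log one line at a time. The sketch below is an assumption-laden example, not the macros described here: rewrite_stream and TOOLBAR are hypothetical names, and the rewrite rule is a simplified stand-in that only looks at google.com referrers.

```python
import io
import re

# Stand-in rewrite rule (an assumption): drop the parameters ahead of
# q= in google.com referrers that carry an rls= parameter.
TOOLBAR = re.compile(r'(google\.com/search\?)[^" ]*?\brls=[^" ]*?&(q=)')

def rewrite_stream(src, dst, pattern=TOOLBAR):
    """Copy src to dst line by line; return (lines_read, lines_changed)."""
    read = changed = 0
    for line in src:
        read += 1
        if "rls=" in line:  # cheap test so most lines skip the regex
            line, hits = pattern.subn(r"\1\2", line, count=1)
            changed += hits
        dst.write(line)
    return read, changed

# Small in-memory demonstration; real use would pass open file objects.
demo_in = io.StringIO(
    'h - - [d] "GET / HTTP/1.1" 200 1 "http://search.yahoo.com/search?p=toner" "UA"\n'
    'h - - [d] "GET / HTTP/1.1" 200 1 "http://www.google.com/search?hl=en&rls=GGLD,GGLD:2004-07,GGLD:en&q=toner" "UA"\n'
)
demo_out = io.StringIO()
demo_counts = rewrite_stream(demo_in, demo_out)
```

Since only one line is held in memory at a time, the size of the log file does not matter, and the returned counts give a quick check of how many entries were actually altered.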

plumsauce

7:21 pm on Mar 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member




The process I am following is different from yours: it processes the queries/phrases/keywords directly into database tables. So I am not rewriting log files; instead I am writing new entries into three database tables.
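
To make the database approach concrete, here is a rough Python sketch using SQLite. The post does not describe the real schema, so the three tables (queries, phrases, keywords), their columns, and the function record_referrer are all assumptions made for illustration. The sketch also only understands the q parameter, so other engines that use a different parameter name would need their own handling.

```python
import sqlite3
from urllib.parse import urlsplit, parse_qs

# Hypothetical three-table layout; the actual schema is not given.
SCHEMA = """
CREATE TABLE queries  (id INTEGER PRIMARY KEY, engine TEXT, raw_query TEXT);
CREATE TABLE phrases  (id INTEGER PRIMARY KEY, phrase TEXT UNIQUE, hits INTEGER DEFAULT 0);
CREATE TABLE keywords (id INTEGER PRIMARY KEY, word TEXT UNIQUE, hits INTEGER DEFAULT 0);
"""

def record_referrer(db, referrer):
    """Store one referrer's query, phrase and words; return True if stored."""
    url = urlsplit(referrer)
    values = parse_qs(url.query).get("q")
    if not values:
        return False
    phrase = " ".join(values[0].lower().split())
    if not phrase:
        return False
    db.execute("INSERT INTO queries (engine, raw_query) VALUES (?, ?)",
               (url.netloc, url.query))
    db.execute("INSERT OR IGNORE INTO phrases (phrase) VALUES (?)", (phrase,))
    db.execute("UPDATE phrases SET hits = hits + 1 WHERE phrase = ?", (phrase,))
    for word in phrase.split():
        db.execute("INSERT OR IGNORE INTO keywords (word) VALUES (?)", (word,))
        db.execute("UPDATE keywords SET hits = hits + 1 WHERE word = ?", (word,))
    return True

db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
```

Going straight to tables sidesteps the referrer format problem, because the query string is parsed by parameter name and the extra toolbar parameters are simply ignored.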

Tonerman

12:17 am on Mar 29, 2005 (gmt 0)

10+ Year Member



Maybe it isn't a problem and I just don't know what I am doing! Yes - when all else fails reboot, follow directions, and get your head out of your google. After dumping all the data, reimporting it, and checking for keywords, all of them displayed fine. This thread, along with the poster, should be deleted. Tonerman