Extracting Japanese Search Terms from Logs

I can't seem to do it...

         

Josefu

3:00 pm on Sep 27, 2003 (gmt 0)

10+ Year Member



Dear all and everyone : )

My site is in Japanese and English, but it is hosted by an American company. I cannot seem to extract any of the search terms used to find my site: they always turn up as gobbledygook in the logs, no matter which app I use. It may be because the query is always "search=%U2%...."; perhaps the "=" sign is screwing up the encoding. Has anyone found a solution for this, or an application that can bypass the problem? Thanks in advance for any and all help.

Take care,

Josefu.

takagi

3:31 pm on Sep 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What if you just copy the referrer from the log, paste it into the address bar of your browser (Internet Explorer, Netscape Navigator, etc.) and press the Enter key?
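
What the browser does with the pasted address is percent-decoding: each %XX in the referrer becomes one byte, and the bytes are then read as text. A minimal Python sketch of that step follows. It assumes the search engine sent the query as UTF-8 and used a parameter named `search` (as in the original post); the function name and the example URL are made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def search_terms(referrer, param="search", encoding="utf-8"):
    """Return the decoded values of one query parameter in a referrer URL."""
    query = urlsplit(referrer).query
    # parse_qs percent-decodes the values; errors="replace" substitutes a
    # marker for undecodable bytes instead of raising an exception.
    values = parse_qs(query, encoding=encoding, errors="replace")
    return values.get(param, [])

# Hypothetical referrer, not taken from a real log.
example = "http://search.example.com/find?search=%E6%97%A5%E6%9C%AC%E8%AA%9E"
print(search_terms(example))
```

If the terms still look wrong, the referring site probably used another encoding; passing `encoding="shift_jis"` or `encoding="euc_jp"` is worth a try.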

Josefu

4:04 pm on Sep 27, 2003 (gmt 0)

10+ Year Member



D-oh! Sorry for not getting back to you right away but I was busy tapping my head against the wall...

LOL, a bit long but works like a charm. Thanks! : )

takagi

4:12 pm on Sep 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're welcome.

I'm sure there are better ways to handle this, if you have to do it quite often. But for now, at least you can see what is in the logs.

Josefu

4:26 pm on Sep 27, 2003 (gmt 0)

10+ Year Member



Yes, like knowing what odd things people are putting into their computer to find my site... "Frenchpub" - LOL.

Thanks again : )

David_M

2:13 am on Sep 29, 2003 (gmt 0)

10+ Year Member



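You may need to write your own script to process the log file.
I wrote a CGI script that uses the following:
require 'jcode.pl';                                        # assumption: jcode.pl is installed; it provides jcode'convert
$search =~ s/\\x([0-9a-f][0-9a-f])/\%$1/gi;                # rewrite \xNN escapes as %NN
$search =~ s/%([0-9a-f][0-9a-f])/pack("C",hex($1))/egi;    # replace each %NN with the byte it encodes
&jcode'convert(*search, 'sjis');                           # convert the resulting Japanese text to Shift_JIS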
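
For readers who do not use Perl, the three steps in the snippet above (turn `\xNN` escapes into `%NN`, turn each `%NN` into a raw byte, then interpret the bytes as Japanese text) can be sketched in Python. This is an illustration, not David_M's script: the function name and the list of encodings to try are assumptions, and it decodes to a Unicode string rather than converting to Shift_JIS.

```python
import re
from urllib.parse import unquote_to_bytes

# Encodings to try, in order; an assumption, adjust for your own traffic.
CANDIDATES = ("utf-8", "shift_jis", "euc_jp")

def decode_search(raw, candidates=CANDIDATES):
    """Decode a logged search string into readable text."""
    # Step 1: some logs write bytes as \xNN; normalise them to %NN.
    raw = re.sub(r"\\x([0-9a-fA-F]{2})", r"%\1", raw)
    # Step 2: turn every %NN into the byte it stands for
    # ("+" is how query strings usually encode a space).
    data = unquote_to_bytes(raw.replace("+", " "))
    # Step 3: take the first encoding that decodes without error.
    for name in candidates:
        try:
            return data.decode(name)
        except UnicodeDecodeError:
            continue
    # Nothing fitted: fall back to a lossy decode so the caller still gets text.
    return data.decode("utf-8", errors="replace")

print(decode_search("%E6%97%A5%E6%9C%AC%E8%AA%9E"))
```

Trying encodings in turn is only a guess: short strings can decode without error under the wrong encoding, so if you know which encoding a given search engine uses, pass just that one in `candidates`.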

Josefu

11:13 am on Sep 29, 2003 (gmt 0)

10+ Year Member



Whoo - thanks a lot David, but that is Waaaay over my head. Could you please explain a bit? I do understand the last line, but I've never done this sort of thing before.

Thanks : )

bill

12:22 am on Sep 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



A Japanese version of Analog on a Japanese OS will do what you're looking for quite nicely.

whats up skip

6:03 am on Sep 30, 2003 (gmt 0)

10+ Year Member



Analog on an English OS will give you most of it.

Josefu

9:53 am on Sep 30, 2003 (gmt 0)

10+ Year Member



"Analog on an English OS will give you most of it."

Yes, that is exactly the problem: "most of it". I need those search terms, and Analog won't extract them, even when I change the settings file to Shift-JIS, EUC, or any other form of Japanese encoding : P

Thanks all the same : )

I WILL look for the 'Japanese version' - but I'm on Mac OS X so perhaps there isn't one...

bill

2:01 am on Oct 1, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Japan Analog User Group [jp.analog.cx] might be able to help. They list Windows, Linux and Solaris binaries. It looks like a Mac binary is still in the works. Maybe you could emulate a Windows machine in a virtual PC to run something like this.

Josefu

7:13 am on Oct 1, 2003 (gmt 0)

10+ Year Member



Yes, I can! (I already use Virtual PC to look at pagerank : )

Thanks a lot for the address and the tips! I'll let you know how it goes - perhaps this could be useful to others in my predicament.

Take care, Greetings,

Josefu.

(added) Ping!

That's just it: incompatibilities between an Apache (Linux) server and Japanese encoding. It never knows how to translate. Thanks a million for that address! David_M: your character replacement CGI script seems to be right on the ball, but I'd have no idea where to stick it. Looks like ah'v got sum lernin' to do...
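
On the question of where a decoding step like this would go: one option is a small standalone script that reads the access log and tallies the decoded terms. The sketch below is hedged throughout. It assumes the Apache "combined" log format (referrer and user agent as the last two quoted fields), UTF-8 queries, and the parameter names `search`, `q` and `p`; the sample log line is invented.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# In the combined format the last two quoted fields are referrer and user agent.
LINE = re.compile(r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

def count_terms(lines, params=("search", "q", "p"), encoding="utf-8"):
    """Count decoded search terms found in the referrer field of log lines."""
    counts = Counter()
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue  # not a combined-format line, skip it
        query = urlsplit(match.group("referrer")).query
        fields = parse_qs(query, encoding=encoding, errors="replace")
        for name in params:
            for term in fields.get(name, []):
                counts[term] += 1
    return counts

# Invented sample line, for illustration only.
sample = [
    '1.2.3.4 - - [27/Sep/2003:15:00:00 +0000] "GET / HTTP/1.0" 200 512 '
    '"http://search.example.com/?search=%E6%97%A5%E6%9C%AC" "Mozilla/4.0"',
]
print(count_terms(sample).most_common())
```

To run it over a real log, pass an open file instead of `sample`, for example `count_terms(open("access.log", encoding="latin-1"))`, where the file name is a placeholder.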

asinah

12:06 pm on Oct 17, 2003 (gmt 0)

10+ Year Member



We have about 2000-3000 searches. I am using Analog, and the referrer log comes out nicely with it.

I zip the log file and use it on Japanese Windows. (Any 4bit should do; that includes GB and Big5, Korean and even Arabic & Thai.)

My site is hosted in Canada, and it runs on Red Hat Linux 9.

I even open the file with WordPad. Since I rotate the logs every three days, the files are not that large, and 2-3 MB can sometimes be loaded in Explorer.