Forum Moderators: coopster


Webpage tracking MySQL table growing too fast!

         

jezzer300

1:54 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Hi,

It's great having a MySQL table tracking all the users' progress, but the table is just going to get out of hand in no time.

With 100 visitors per day I estimate an 85MB table; currently it's only 14 per day. (My host's limit is 100MB.)

What's the normal practice for web tracking here?

1. Don't use MySQL; use a text log file and analyse that.
2. Archive older results? But I want to go back several months.
3. Archive onto a local server without MB limitations and analyse there.
4. I don't want to pay for a bigger DB!

Regards,

Jez.

bcolflesh

1:56 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Are you trying to track information that is not found in the normal log created by the server?

jezzer300

2:49 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Yes, I'm tracking the session id so I can determine how many users viewed each product / page (rather than raw page views, which could all come from the same user).

My table:
date_time datetime
page varchar(100)
referer varchar(200)
agent varchar(125)
ip varchar(15)
session_type char(1)
session_id varchar(50)

Index:
PRIMARY KEY on date_time (cardinality 18,357)

Space usage:
Data: 2,718 KB
Index: 227,328 bytes
Total: 2,940 KB

Row statistics:
Format: dynamic
Rows: 18,357
Avg. row length: 151
Avg. row size: 164 bytes

bcolflesh

3:06 pm on Feb 15, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In that case, maybe you should write a script that appends the info from the MySQL db to a text file and run it via cron - then delete the db entries so the thing doesn't grow out of hand.
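One way to sketch this archive-and-purge step in SQL itself (assuming the tracking table is named `tracking`, the MySQL user has the FILE privilege, and the cutoff date and file path are placeholders to adjust):

```sql
-- Export rows older than a cutoff to a tab-separated text file,
-- then purge them from the table. Names/paths here are illustrative.
SELECT date_time, page, referer, agent, ip, session_type, session_id
  INTO OUTFILE '/home/jez/archive/tracking-2005-01.txt'
  FIELDS TERMINATED BY '\t'
  FROM tracking
 WHERE date_time < '2005-02-01 00:00:00';

DELETE FROM tracking
 WHERE date_time < '2005-02-01 00:00:00';
```

Running both statements against the same cutoff keeps the export and the delete in step; the text file can grow freely on the web host's disk while the db stays small.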

jezzer300

4:01 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



I don't have access to cron.

Are you suggesting replacing the MySQL database with a text file? I have been thinking about it, but I assume access to the stats would be slow.

How do all these other website statistics tools manage to produce stats so quickly? I assume from a text file?

Nutter

5:18 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



This is just off the top of my head, although I will be doing something similar in the next few weeks for a site I'm working on.

What about a text log file for each day that you import into a database on your desktop computer? So, Tuesday morning you come in, download the stats from Monday and import them; then you can delete the Monday file. Then run stats on your local database.

I'm planning on doing something like this on my site. My thinking is that dumping the information to a text file is going to be faster than logging to SQL. And I'm really not too concerned with how fast I can access the stats, since it'll just be me looking. I'd rather have the main site faster.

- Ryan

Nutter

5:41 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



Sitting here I thought of a couple other things :-)

I don't think a text file will save you any space. Someone will probably come and correct me, but I'm thinking that a database may actually be smaller than a text file.

What is the session_type char(1) field?

You might consider changing the IP address field to a char(15). I'm just thinking that it may shave off a little space (admittedly, a very little bit of space). And they're all about the same length - you won't have one shorter than 7 or longer than 15 characters.
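Ryan's suggestion could look like the following (the table name `tracking` is an assumption; the column name matches the schema posted above). MySQL's INET_ATON()/INET_NTOA() pair, noted in the comment, is a standard alternative that packs an IPv4 address into 4 bytes instead of up to 15:

```sql
-- Fixed-width CHAR instead of VARCHAR for the IP column.
ALTER TABLE tracking MODIFY ip CHAR(15);

-- A more compact alternative: store the address as a 4-byte unsigned int,
-- converting with INET_ATON('192.168.0.1') on insert and INET_NTOA(ip) on read.
-- ALTER TABLE tracking MODIFY ip INT UNSIGNED;
```

Note that because the table already contains VARCHAR columns its row format is dynamic, so the CHAR change alone may save little; the INT UNSIGNED route saves space regardless.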

- Ryan

jezzer300

5:43 pm on Feb 15, 2005 (gmt 0)

10+ Year Member



It's an idea.

I'm "trying" to keep everything online so I or another user can view the stats without downloading etc., but it won't all fit in an SQL table with one row per page accessed.

Due to this I believe I have the following options:
1. Process a text file, with one file per day so I can use date ranges. But this could be slow if I want something like "the most popular page this year".

2. Only keep 3 months (for example) of session history, after which I could archive into daily totals for each required stat, i.e. page accesses or referers (the stats I want), which could be built each week. I would lose the ip, session id, agent etc., but I think I need to re-think what totals I actually require. The trouble with this is that I wouldn't be able to later ask for a new type of stat like you can with the raw data.

BTW - I don't mind about the size of the text file, my hosting company gives loads of space, it's the db that's restricted.

The session type is either bot, user or X (ignore).
If it's me testing, it's an X type.
A recognised bot is B, so I can view bot stats separately.
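Option 2 above can be sketched in SQL (table names `tracking` and `daily_page_totals` are assumptions; the roll-up query and the purge share the same 3-month cutoff):

```sql
-- Summary table: one row per (day, page), built from the detailed log.
CREATE TABLE daily_page_totals (
  day      DATE         NOT NULL,
  page     VARCHAR(100) NOT NULL,
  hits     INT          NOT NULL,
  sessions INT          NOT NULL,
  PRIMARY KEY (day, page)
);

-- Roll detailed rows older than 3 months into daily totals...
INSERT INTO daily_page_totals (day, page, hits, sessions)
SELECT DATE(date_time), page, COUNT(*), COUNT(DISTINCT session_id)
  FROM tracking
 WHERE date_time < DATE_SUB(NOW(), INTERVAL 3 MONTH)
 GROUP BY DATE(date_time), page;

-- ...then delete them from the detailed table.
DELETE FROM tracking
 WHERE date_time < DATE_SUB(NOW(), INTERVAL 3 MONTH);
```

The summary rows are tiny compared to the raw log, so "most popular page this year" stays a fast query - at the cost, as noted, of losing ip / session id / agent detail for the archived period.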

noonseclipse

3:28 pm on Feb 16, 2005 (gmt 0)

10+ Year Member



You are duplicating a huge amount of information. Keep the session ID etc. in one table, then have a second table with just the record ID and the page name visited. Use a unique auto-increment record ID to link the two tables. This will save you about 450 bytes per page.

jezzer300

3:55 pm on Feb 16, 2005 (gmt 0)

10+ Year Member



You're right, doh!

Thank you for your input.

I will now create 4 tables:

1. session - 1 record per session/cookie
session_index int(11) auto_increment
session_id varchar(50)
ip varchar(15)
agent varchar(125)
session_type char(1)
date_time datetime

2. page_visit - 1 record per page view
date_time datetime
session_index int(11) - links to session
page_index int(11) - links to page

3. page - 1 record per web page
page_index int(11) auto_increment
page varchar(150) - full web address
page_short varchar(30) - short-hand page reference for display

4. referer - 1 record per session
session_index int(11) - links to session
date_time datetime
referer varchar(200)
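With the split schema above, the original question - how many distinct sessions viewed each page - becomes a join. A sketch (assuming the table and column names as posted, and assuming 'U' is the letter used for a real user, alongside B for bots and X for ignore):

```sql
-- Distinct sessions and raw views per page this year, users only.
SELECT p.page_short,
       COUNT(*)                        AS page_views,
       COUNT(DISTINCT v.session_index) AS unique_sessions
  FROM page_visit v
  JOIN page    p ON p.page_index    = v.page_index
  JOIN session s ON s.session_index = v.session_index
 WHERE v.date_time >= '2005-01-01'
   AND s.session_type = 'U'   -- assumed letter for "user"; B = bot, X = ignore
 GROUP BY p.page_short
 ORDER BY unique_sessions DESC;
```

Because page_visit rows are now just a datetime and two ints, this kind of query scans far less data than the old single wide table.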

noonseclipse

4:18 pm on Feb 16, 2005 (gmt 0)

10+ Year Member



Hi,

I agree with table 1 for session. I would put the referrer in that table as well - the first time around, that would be the page they were visiting prior to coming to your site, which would be interesting to track.

In the second table I would put the page they are viewing on your site, with the ID linking to table 1. If you want to see how long they spend on a page, then put the time and date in (a timestamp would be perfect).

As they navigate through your site - say index.php, page1.php, page2.php - the referrer would be the external page, then index.php, page1.php, etc. So you would be recording that information twice. And I don't think you need the whole URL; just the script name would do. (I think $_SERVER['SCRIPT_NAME'] will do it.)

Unless of course you are passing parameters, in which case you could have a separate column for the query string ($_SERVER['QUERY_STRING']).

Cheers

hakre

4:20 pm on Feb 16, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops - these 4 tables wouldn't be my choice, but anyway, with a 100MB db limit, doing MySQL logging is quite useless I guess. I would suggest using mod_log_sql for Apache.

[edited by: coopster at 6:20 pm (utc) on Feb. 16, 2005]
[edit reason] removed url per TOS [webmasterworld.com] [/edit]