More Efficient MySQL Data Retrieval

NickCoons

5:21 am on Dec 21, 2003 (gmt 0)

10+ Year Member



To store all of my traffic information, my site keeps two sets of files: one, called the "counter," simply records the IP address of the client for the current page view in a file named with the current date; the other, called the "tracker," records information about every single page view sent by the client (including IP address, page accessed, referrer, etc.).

So far, this has worked well. To find the number of page impressions, I have the program count the number of IP addresses in the counter file. To find the number of visitors, I go by unique IP addresses.

I've decided that I would like more functionality, like being able to pull up certain dates, display graphs showing traffic trends, etc., so I've converted all of my flat files to MySQL. The new database currently has over 100,000 records, and will grow by one record for each page view of the site. With this method, I can now create just about any script that gives me the information I want.

Unfortunately, a simple lookup seems to take longer than I'd like. For instance, if I want to know the number of impressions for today, I would use this query:

SELECT * FROM Table WHERE Date = '$Date';

This takes about one second, which I can live with (I can't use COUNT(); you'll see why in a minute, but performance-wise it doesn't seem to matter anyway). But if I want to know the number of impressions from $StartDate through $EndDate broken out by day, I have to make multiple queries, one for each day. So if I wanted to find the amount of traffic from day 1 through day 30 of a given month, it would take about 30 seconds to retrieve this data. It would also take about that long to retrieve the data for a graph showing traffic trends throughout the month, and considerably longer to generate a graph showing traffic over a year or more. Then it gets more interesting when you find out that I have to check each record's user agent to make sure it's a real visitor (I don't count bots in my traffic).

Is this normal, or is there some magic query that I'm not thinking of? The number of records returned doesn't seem to be the factor, because I used Date and Time to narrow my search down to a single record, and the return time was still 1.10 seconds.
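There may indeed be such a query: a single GROUP BY can replace the per-day loop. A sketch, assuming the table really is named Table (backticks, since TABLE is a reserved word in MySQL) with columns Date, IP, and UserAgent; the '%bot%' filter is a stand-in for whatever the real bot check is:

```sql
-- One round trip for the whole date range, counted per day.
-- Column names and the bot filter are assumptions, not from the thread.
SELECT Date,
       COUNT(*)           AS Impressions,
       COUNT(DISTINCT IP) AS Visitors
FROM `Table`
WHERE Date BETWEEN '$StartDate' AND '$EndDate'
  AND UserAgent NOT LIKE '%bot%'   -- placeholder for the real bot test
GROUP BY Date
ORDER BY Date;
```

With an index on Date, this returns the whole month in one query instead of thirty.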

NickCoons

6:43 am on Dec 21, 2003 (gmt 0)

10+ Year Member



Perhaps a "nevermind" is in order. Creating an index for the Date field seemed to massively increase performance.

This being the first index that I've created (other than a primary key), what disadvantages does having an index cause? If I wanted to search by other fields as well (i.e. user agent), would it be a good idea to create indexes using all of those fields? Even though this would increase my search performance, will it cause other performance hits elsewhere, or take considerably more space?

figment88

7:00 am on Dec 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



An index will increase the amount of disk space required, but more importantly, it increases the overhead on inserts and updates.

Each site visitor will only see a tiny degradation of performance in return for your great increase in reporting efficiency, but IMHO it's not a good tradeoff to decrease performance for site visitors in order to speed up admin tools.

NickCoons

7:25 am on Dec 21, 2003 (gmt 0)

10+ Year Member



<but more importantly on the negative side it increases the overhead on inserts and updates.>

After a quick test, MySQL timed an insert operation at 0.00 seconds, so the index doesn't seem to add much overhead. An update test took the same amount of time.

<While each site visitor will only have a tiny degradation of performance in return for your great increase in efficiency, IMHO it does not seem like a good tradeoff to decrease the performance of site visitors to increase the performance of admin tools.>

I agree.. but it doesn't look like there will be any noticeable degradation for inserts or updates, so I think I'll be okay.

jamesa

12:51 pm on Dec 21, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> If I wanted to search by other fields as well (i.e. user agent), would it be a good idea to create indexes using all of those fields?

Yes. You can have any number of indexes and it's best to have indexes for the types of queries you commonly perform. Check out my post in this thread [webmasterworld.com] for more on that.
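As a sketch of what that might look like (the table and index names are hypothetical, and TABLE is a reserved word in MySQL, hence the backticks):

```sql
-- Single-column index for date-range queries.
CREATE INDEX idx_date ON `Table` (Date);

-- Composite index for queries filtering on Date and UserAgent together.
-- The UserAgent(32) prefix keeps the index small for a long VARCHAR column.
CREATE INDEX idx_date_agent ON `Table` (Date, UserAgent(32));
```

Note that MySQL can use the leftmost prefix of a composite index, so idx_date_agent would also serve queries that filter on Date alone, making the separate idx_date potentially redundant.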

>> on the negative side it increases the overhead on inserts and updates

That's true. So what would be the best way to build an app like this without losing performance on either end? According to the MySQL manual, LOAD DATA INFILE is the fastest way to insert, so I would guess that logging to a flat file and periodically importing it into the database would be the best bet. Any other ideas?
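A sketch of that import step, assuming a tab-delimited log file; the file path, table name, and column list are all placeholders:

```sql
-- Bulk-load the accumulated flat-file log in one statement; the logging
-- script can then rotate or truncate the file before the next import.
LOAD DATA INFILE '/var/log/site/tracker.log'
INTO TABLE `Table`
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
(IP, Date, Time, Page, Referrer, UserAgent);
```

One caveat: the file must be readable by the MySQL server process itself; LOAD DATA LOCAL INFILE reads from the client instead, at some cost in speed.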

daisho

2:50 pm on Dec 22, 2003 (gmt 0)

10+ Year Member



Why not look into a commercial logger? Apache already logs the IP address and page into its access log. Pick up Webalizer, or one of the many other analyzers, and it will process the log and give you lots of information.

daisho.