Forum Moderators: coopster
SELECT data.* FROM data INNER JOIN (SELECT MAX(id) AS id FROM data GROUP BY url) ids ON data.id = ids.id
I've been using this technique with success, but now I do feel the need to use max(timestamp) instead, since I'm adding entries with older dates. Using max(id) would make the older entries jump in front of the new ones since the old ones' ids would be higher. Shouldn't the updated code be like the one below?
SELECT data.* FROM data INNER JOIN (SELECT MAX(timestamp) AS id FROM data GROUP BY url) ids ON data.timestamp = ids.id
I understand now why this isn't working. max(id) was matching highest ids grouped by "urls" (from the linked example) with ids, but if I look for max(timestamp) it can match more than one row when inner joining timestamps, right?
Having timestamp as a key doesn't sound like a good idea. Any thoughts?
If you have build it like the above method but you have include also hours, minutes and seconds, then no... it will be hard to get more than a couple of records (at best, providing two records were inserted on the same second)
Of course you can GROUP BY a "name" column, to find 2 or more identical ones and using timestamp, to decide on what is older/newest
id¦ store--- ¦price ¦ timestamp
1 ¦ store A ¦ 1.23 ¦ 2005-08-04 13:03:12
2 ¦ store B ¦ 1.48 ¦ 2005-08-04 13:33:12
3 ¦ store C ¦ 1.20 ¦ 2005-08-04 15:03:12
4 ¦ store C ¦ 1.23 ¦ 2005-08-06 10:00:02
5 ¦ store A ¦ 1.50 ¦ 2005-08-10 13:03:12
6 ¦ store A ¦ 1.48 ¦ 2005-08-11 13:03:12
7 ¦ store A ¦ 1.05 ¦ 2005-08-12 13:03:12
8 ¦ store D ¦ 1.98 ¦ 2005-08-20 13:13:12
9 ¦ store A ¦ 1.05 ¦ 2005-08-31 13:03:12
ids are auto-incrementing, and my results should be the following:
id¦ store--- ¦price ¦ timestamp
2 ¦ store B ¦ 1.48 ¦ 2005-08-04 13:33:12
4 ¦ store C ¦ 1.23 ¦ 2005-08-06 10:00:02
8 ¦ store D ¦ 1.98 ¦ 2005-08-20 13:13:12
9 ¦ store A ¦ 1.05 ¦ 2005-08-31 13:03:12
I will manually add historic price data, so this same query would end up returning probably this (check out the year in the timestamp):
id ¦ store--- ¦price ¦ timestamp
10 ¦ store B ¦ 0.33 ¦ 1912-08-04 13:33:12
32 ¦ store D ¦ 0.75 ¦ 1983-08-06 10:00:02
45 ¦ store A ¦ 1.10 ¦ 2002-08-20 13:13:12
28 ¦ store C ¦ 0.75 ¦ 1990-08-31 13:03:12
this is why max(id) won't work for me, I need it to keep giving me the most recent data, like the second example. could you give me an example of the query you mean with your suggestions?
First I converted the timestamp to unix time. Then I ran the following query:
SELECT p1.store, p1.price, p1.timestamp FROM data as p1 LEFT JOIN data AS p2 ON p1.store = p2.store AND p1.timestamp < p2.timestamp WHERE p2.store IS NULL
it worked, but the problem is that it's just toooooooo slow. taking more than a minute to run on 20k records. any tips on indexing?
If or example you keep track of year-month-day, then there is a point into indexing the timestmap colum. Now that you track hours, minutes and seconds there is no point at all.
Idea: since you tried to convert your timestamp colum to unixtimestamp, try to do it again but this time, use only year, month, day, something like mktime(0, 0, 0, $month, $day, $year);
This will provide enough similar records for you to index.
Note: if you know that day is useless (if for example you get 2 orders per month,keeping track of days for indexing purposes is useless), try to negate day also, so as to generate a wider range timestamp for you to index.