Can anyone give me a good explination/idea of where to use indexing for maximum benefit. I've been looking into it a little and turned up not a lot so far.
My database is similar to a forum with tables; members, member posts, stats and suchlike. I've created all my tables with an auto incriment ID field at the start which I've been setting as the primary index purely because I was told you should always have an index somewhere and this is usually the best place.
I'm still not sure exactly of the nature of indexes and the different types and I read somewhere they can slow down INSERT statements if not used wisely.
I'm currently working on a table to log member posts, where they posted and when so that all their posts in different locations can be referenced easily. I seemed to make more sense than querying all the different tables for user id 'x' that may or may not contain posts from them. My plan is to log each post on this table with the member id that posted it, and the table and id of the corresponding post. Is that a good idea? :)
And of course, how best to index it?
Any thoughts more than welcome
1) By primary index, do you mean primary key?
2) Indices always slow down inserts. Its a matter of balancing faster reads against slower writes. Which does that table get more of?
3) Indexes should usually be created for columns that frequently appear in the WHERE clause of queries on large tables...
4).. except where it does not reduce the number of rows a query has to scan: e.g. its rarely optimal to index a boolean.
5) Indexes on small tables may not speed queries up at all, and could even slow them down. If you have ten rows in a tables (e.g. a list of forum sections) do not add indexes....
6) ...if you have hundreds of thousands of rows in a table, you almost certainly need indices on it.
Lots, lots, lots more, but those are important things that come to mind.
Having posts scattered across multiple tables wounds like bad design (not normalised) anyway. Have a posts table, with foreign keys on the member table, and where the post should appear (a sub-forums table, perhaps?).
I think you should post more details, of what you are trying to do.
Hi there, thanks for the info, that's great! I'd love to post more but I'm trying to keep my project to myself just now which makes it a bit tricky! :)
I do mean 'primary key' yes.
Being a user of smf I've taken a look at their table structure and of course you are indeed right in that it seems they have one table that contains all the posts on the whole forum. What's bugging me is that I still don't quite understand why it's better to have all my posts for seperate sections of the site in one big table instead of several more relevant ones.
If on a forum you have 20,000 users all posting and reading, and just one table that needs to be referenced and updated many many times in an instant, how is that beter than having 5 tables effectively 'spreading the load'?
Being totally new to databases and sql, and with but a vague smattering of php I'm using this project to help me learn more about them all. I've managed to pretty much tame the beast of querying, but it now turns out there's a lot more to good structuring than one might think.
Anyway, thanks again for your help, you've pointed me in the direction of taking a better look at what other forums do, and restructuring my db in a new fashion and writing some code for it. The one truely ultimately frustrating thing tho is that I'm not quite sure why I'm doing it that way :(
If you have a table for each section, each time you want to add a section you have to create a new table.
If you have one table that contains the name and ID of all the sections in your forum and another that contains all the posts in the forum (and which section they are in), all you need to do (if you've coded for it already) is add another entry in your sections table to create a new section.
Also by having things in separate tables you aren't really spreading the load as you still will be doing about the same number of read and writes.
If have separate tables and you then also want to add search functionality across the sections, you would have to search through each table, rather than just one.
Be a bit careful of the example of others. I have no experiences of forums, but there is bad database designs in some reasonably popular CMSs.
I suggest you read up on data base normalisation. Also foreign keys and transactions.
Databases keep your data safe, and structure it in a logical and maintainable way. You sometimes need to compromise that for performance, but most websites do not need to, even if they need to they should try caching first (and you can cache the results of queries, as well as web pages).
Slightly OT: if you are writing a forum from scratch, are you using something like plain PHP? Web frameworks are a lot more productive.
I am writing from scratch yes, using plain php. Not sure what you mean by web frameworks, but I'll look into it. Along with 'normalisation' and the rest. A friend keeps insisting I check out drupal (spelling?) but I've resisted because I really want to teach myself some php and I really want to know every inch of my code for maximum tweakability and something truely individual. I'm sure there may be some argumernts that I'm shooting myself in the foot somehow wih that attitude but it is a lot of fun chucking yourself in at the deep end and having to think much harder than usual sometimes. I know my project is a fairly tall order but I think it's doable :D ...sooner or later.
Back on topic I've also noticed that the smf forum 'post table' stores the poster ID, also their name and their email, which are also obviously on the 'members' table. I was led to believe that it was 'reasonably bad database practice' to be storing the same info twice, but maybe I've taken that a bit too literally? In some cases it would be a lot handier, and save the need to join tables and cross referencing when you just want a member name associated with a post. I've been trying to avoid at all costs storing anything twice, and stashing purely the member ID and then using that to reference the members table to get their name. Hmmm. I guess it's more a case of weighing up the need to store certain things a few times in different places for ease of retrievingthe info you want with minimal querying and table joining.
Thanks again for the pointers!
Glad to help. I am sure you are having a a lot this way. I once write my own (very simple, but reasonably fast and flexible) web server in TCL in somewhat a similar spirit.
The thing about not using a framework is that in order to keep things clean you are likely to end up writing one yourself. I use Django (Python not PHP, but Python is a *much* nicer language), the most popular is probably Ruby on Rails (written in Ruby). There are a number of PHP ones: Symfony looks most attractive to me. All of the above make working with databases easier as well.
|I was led to believe that it was 'reasonably bad database practice' to be storing the same info twice |
Depends what they do. I haven't checked the code but if you wanted to search posts along with the poster name and/or the poster has closed his account, you would still want to find the relevant posts. So the post table would keep information about the poster at the time of his post. It's history recording and not always the same as what's in the current members table.
Blimey! That's so obvious I never thought of it :D
Of course it makes total sense. Having just a member ID means if that record is deleted in the members table is scuppers everything that references their name.
Thanks for that! Definately worthy of note.
That's the trouble with being new to something and taking one person on their word about something so general.
Cheers again, I need to sort that right now. Funny how the simple things sometimes escape you. It'll simplify a lot of my queries too! :)
You've positively made my day, apart from the obvious rewrites I now have in hand.