Forum Moderators: open
I'm designing a search engine for a web app (in php/mysql), and I'm trying to figure out how to design the database to efficiently handle all of the INSERT statements.
Users will run the script to extract keywords from their documents. It extracts anywhere from 100-10,000 keywords at a time and inserts them into the database. There will be a function to extract related searches (kinda like a Technorati cloud).
My current plan is to structure the table like this:
id(int;primary;auto-increment), keyword (varchar 255), magic_number(int)
Each keyword extracted from the document will get it's own row.
The cloud() function will then do a SELECT %keyword% from the table to gather all of the rows with that keyword in it.
I was thinking that it would be faster to just lump all of the keywords into a single "text" field (one row per document), but I don't think I will be able to implement the cloud() that way.
So my question is... Will inserting 10,000 rows at a time bring my database server to its knees? Is there a better design I should consider for this sort of thing?
Thanks for your help!
Will inserting 10,000 rows at a time bring my database server to its knees?
10k is not that many and I haven't experienced any problems myself. But, it does depend on your setup such as hardware, indexes, memory, table type etc.
Is there a better design I should consider for this sort of thing?
I've had tables with billions of records in MySQL and life has been pretty good so I'd say it'll work for you just fine.
If you want to put the words into a text field then you can get decent performance IF you can cache the results of a fulltext query all the time. Otherwise queries start to get very slow rather quickly.
JAG
Since posting, I've been thinking about caching all of the data in flat files first then doing a LOAD DATA INFILE batch a couple of times a day.
Also - I should probably put an index on the keyword field since their will be a lot of wildcard searches on it right (%keyword%)? Will that slow down the inserts significantly?
The server is a P4 3.0GhZ with 1GB of RAM.
Thanks!
My usage estimates starting out are about 2million inserts per month.
Absolutely no worries then.
Since posting, I've been thinking about caching all of the data in flat files first then doing a LOAD DATA INFILE batch a couple of times a day.
You could.
Also - I should probably put an index on the keyword field since their will be a lot of wildcard searches on it right (%keyword%)? Will that slow down the inserts significantly?
As a general rule any inserts into a field that has an index will be slower than one without. I typically try to stay away from doing any wildcard queries when possible and in this case you may want to consider doing that as well. Unless of course you need to. Nonetheless you would still want an index on that field no matter what.
JAG
Insert (select 'MyKeyword' sequnce_nbr
from new_sequence_table)
into insterted_table
where new_sequence_table.sequence_nbr between 1 and 750
Not sure I have the syntax right, but the idea is to only send one SQL statement to insert 750 rows. This is a simplification of my problem, but hopefully the concept will help someone. It improved my perfomance by 100 fold.