Forum Moderators: coopster

Message Too Old, No Replies

How Do I Cull a Database .

.. to keep only those rows I need

         

old_expat

11:04 am on Sep 2, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have a spreadsheet that I downloaded. I can load it onto my mysql desktop. It has 65,000 rows (8+ mb). I only want probably 5% of them, based on certain keywords in a 'name' or 'description'.

But I want to delete any row that does not have those keywords .. or save any row, that does have those keywords, in a new table .. which will become the new database.

I'm really a newbie at this so please be gentle.

chrisjoha

11:22 am on Sep 2, 2005 (gmt 0)

10+ Year Member



I guess you could just load it all, then do

DELETE FROM tablename WHERE name not like '%criteria%' and description not like '%criteria%';

old_expat

4:30 am on Sep 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hello chrisjoha,

Thanks for the reply.

I guess I could load the original spreadsheet then do the query you suggested, save the results, reload the database and do a second query (different fields), save those results, then add both sets to a new database.

So I am sure to have some duplications.

Is there a query that looks for duplicate records in a database?

I tried to find such an animal once, and looked around quite a lot and couldn't find one.

grandpa

4:38 am on Sep 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would tend to want to save the spreadsheet into a delimited text file, and then build a script to read that file and populate the database. At the same time you can prevent duplicate entries.

If you want to load it all into a database and then delete the unwanted rows, you can define your keyword field as UNIQUE in the table. That should help prevent a lot of duplicate records. But for all the work that would be involved in removing unwanted rows, I'd really consider the first option. Do you really want to manually delete about 60000+ rows?

old_expat

10:22 am on Sep 3, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"I would tend to want to save the spreadsheet into a delimited text file..."

That's what I had planned on doing.

".. and then build a script to read that file and populate the database."

I think I know how to do that .. with phpmyadmin?

"At the same time you can prevent duplicate entries."

Now THAT .. I don't have a clue how to do.

coopster

8:07 pm on Sep 6, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Have you considered using a temporary table for the mid-way processing?
  1. Create a temporary table
  2. Import ALL your spreadsheet data into the table
  3. Create the permanent table (probably will have the same exact columns as the permanent table)
  4. Use a query statement to select only those rows that you want from the temporary table and insert them into the permanent table.
INSERT INTO perm_table 
SELECT * FROM temptable
WHERE mycolumn LIKE '%myvalue%'
;

[dev.mysql.com...]