homepage Welcome to WebmasterWorld Guest from
register, free tools, login, search, pro membership, help, library, announcements, recent posts, open posts,
Become a Pro Member

Home / Forums Index / Code, Content, and Presentation / Databases
Forum Library, Charter, Moderator: open

Databases Forum

Conditional mysql insert question - how to prevent duplicates

 9:58 pm on Aug 3, 2011 (gmt 0)

Scenario: I am tracking/saving search referrer data including the search $terms and their resulting $url from my site, each in their own column.

I DO want to add an entry even if the $terms OR the $url are already in the database but NOT if an entry has both already. If the referrer data $terms and $url match a previous entry I'd like to either not add a new entry OR increase the count on the previous entry, whichever is easier.

For the sake of simple how would you not insert a new entry if $terms AND $url are in a previous entry?

Current syntax
mysql_query("INSERT IGNORE INTO tablename (term, url)VALUES ('$terms', '$url');")


brotherhood of LAN

 10:09 pm on Aug 3, 2011 (gmt 0)

Is there a PRIMARY or UNIQUE key on the table already?

If not,

ALTER IGNORE TABLE `tablename` ADD PRIMARY KEY (`term`,`url` ) ;

This will add the key and also remove any duplicates.

INSERT IGNORE tablename (term,url) VALUES ('$term','$url') ON DUPLICATE KEY UPDATE countercolumn = countercolumn + 1

This will increment counter column by one if the term/url combo already exists. It relies on the primary/unique index and reverts to updating when the value already exists.


In general, you should try to avoid using an ON DUPLICATE KEY clause on tables with multiple unique indexes.


 10:14 pm on Aug 3, 2011 (gmt 0)

That looks like it should do the trick but there is a current primary key, it's an auto-increment ID field. Should I remove it? If the count increases on individual entries when a duplicate attempt is made then I don't need the auto-increment ID field anymore. Or do I if it's best not to use ON DUPLICATE.. ?

Thanks for the response.

brotherhood of LAN

 10:20 pm on Aug 3, 2011 (gmt 0)

Hard to say. Removing it puts you at risk of having duplicate values for the current PK as auto_increments have to be an index of some sort (in this case, a standard INDEX that accepts duplicates).

If you don't foresee that as a problem and you don't regularly query the current PK row then feel free to give it a try. If you don't reference the current PK field in INSERT's to the table then the duplicate problem should never arise.

If the table's primary role is to store unique $term/$url combinations and the current PK is merely a more compact reference to it, I'd say change to the new PK.


 10:31 pm on Aug 3, 2011 (gmt 0)

I changed to the new PK you suggested and it works, I didn't reference the ID anyway. One thing of note is that there is a 1000 byte limit on PKs so the combination of keywords and url's can't exceed 1000 characters. So - with varchar 255 on keyword and varchar 744 on url I shouldn't run into any problem, I don't think. I don't need long keyword strings and want to leave room for long urls in case of parameters etc...

Thanks your your help!

brotherhood of LAN

 10:39 pm on Aug 3, 2011 (gmt 0)

Indeed, I've never considered 1000 byte keys in practice, even if it's multi-byte text encoding...

You could probably get away with a 16 byte PRIMARY key by adding a BINARY(16) column for MD5 values.

INSERT IGNORE tablename (hash,term,url) VALUES (UNHEX(MD5('$term$url')),'$term','$url') ON DUPLICATE KEY UPDATE countercolumn = countercolumn + 1

// added

If you were to use MD5, you'd probably want to turn the $term URL to lowercase if case is unimportant for that field, as it would generate separate MD5's for lower and upper case.

[edited by: brotherhood_of_LAN at 10:43 pm (utc) on Aug 3, 2011]


 10:43 pm on Aug 3, 2011 (gmt 0)

I don't think I'll need the hash for this particular use but thats a great idea.

One last question re: timestamp

I'd like to add a timestamp column that uses ON UPDATE CURRENT_TIMESTAMP so that the data can be sorted by most recent, do you envision any conflict with the above? I don't need the date each search was performed and it would be complicated to store anyway with the above.

[edited by: Sgt_Kickaxe at 10:46 pm (utc) on Aug 3, 2011]

brotherhood of LAN

 10:45 pm on Aug 3, 2011 (gmt 0)

That would be fine. ON DUPLICATE KEY essentially is an INSERT or UPDATE in the one query. It's just like a regular insert/update.


 1:49 pm on Aug 4, 2011 (gmt 0)

It's working almost flawlessly.

The apparent remaining bug is that the url table is ignoring not null on urls occasionally. It seems to be getting ignored 1 time in 25 from Google only. I do have some php that first tries to extract the google web url variable (url) and if that fails it tries the image url variable (imgrefurl) but I don't see any spacing issues. Something has to be in there or the php would not open the database connection, a string perhaps?

Any ideas what might cause the empty value in a not null field?

brotherhood of LAN

 3:51 pm on Aug 4, 2011 (gmt 0)

If you think it's the value pairs that are going into the DB, try logging it in a tab delim list and have mysql_affected_rows() alongside the insert data

With ON DUPLICATE KEY UPDATE, the affected-rows value per row is 1 if the row is inserted as a new row and 2 if an existing row is updated.

So you'll see how it's behaving from the start.

Though I'm not quite sure by what you mean by
is ignoring not null on urls occasionally

If URL/term are a primary key, all unique combinations are inserted and the increment column is updated otherwise. If data is not being inserted/updated at all it's happening before this query or the query is returning an error (have you escaped the data?)


 9:19 am on Aug 5, 2011 (gmt 0)

I tracked the problem down to my handling of images in search that appear as thumbnails in regular web results. The database is fine.

Thanks again.

Global Options:
 top home search open messages active posts  

Home / Forums Index / Code, Content, and Presentation / Databases
rss feed

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
Home ¦ Free Tools ¦ Terms of Service ¦ Privacy Policy ¦ Report Problem ¦ About ¦ Library ¦ Newsletter
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved