|Php + Sql for a "price-comparison" website|
| 5:49 pm on Feb 2, 2009 (gmt 0)|
I'm currently trying to make a small price comparison website. To show all the prices, companies etc correctly with all the products I first created a table called "product". The table product contains all information regarding a specific product (showing stuff from the product tale works fine). I now also have a lot of XML files from retailers containing all their product information: price, name, productid, producturl, etc etc.
What I need to do is incorporate these XML files in my database in a correct and efficient way. For this I could really use some help :).
What I thought up was to import all the XML data in a "partner" table. Here I create an extra kolom called "partner.myID" where I insert the ID number found in the "product" table that matches the specific product (using a join). Then when I show a product page I create a join where it uses product.id to find the matching partner.id and then collects the information neccessarry from that "partner"-row.
However, I do not know if this is efficient and if you guyz might know a better way. For instance, I have no idea if things will start to jam up when I keep adding products in the partner table from 10+ shops. Also, isnt there a more efficient way then my join to do this?
Tips or experiences are greatly appreciated.
- Linux server
- Mysql database
- Dreamweaver to do my coding
- Navicat to make tables and insert XML feeds etc.
I'm a newbee when it comes to SQL databasing. So, even tips on what kind of variables I should use and how to structure everythg are welcome...
| 10:04 pm on Feb 2, 2009 (gmt 0)|
You certainly have more products that partners, so I'd suggested adding a partners_id field into product table, not vice versa:
create table products (
products_id int(11) not null auto_increment,
create table partners (
partners_id int(11) not null auto_increment,
To read all products by a certain partner:
select * from partners prt left join products prod using(partners_id) where prt.partners_id = 19;
To insert data, do not use joins. Just read partner table one-by one, process corresponding XML file, insert data from that file into products table (setting current partners_id).
When looking for a certain EAN:
select * from partners prt left join products prod using(partners_id) where prod.partners_ean = '0011232451234';
I suppose that config should go perfectly with thousands of partners and millions of titles (secondary indexes will make sure your DB won't slow down over time).
Hope that helps!
| 10:06 am on Feb 3, 2009 (gmt 0)|
Thanks. Few things still remain, this industry doesnt seem to use ean codes (why I really really dont understand :) ).
I'm going to explain how I think you ment to do this, because some stuff doesnt ring a bell to me (i'm realy not used to make complex database stuff).
1. I make two tables, one "product" and one "partners".
2. In both I index the "id" fields for easy and quick lookup.
3. I implement all the partners in the "partners" table
4. using a left join I mtch the product.id to all the matching partners.id to get the correct information.
I will manually have to insert the partner.id because of lack of ean codes and beacuse product names tend to differ slightly.
- What do you mean by "prt"
If I didnt get something please feel free to correct me.
Thanks for the help!
| 10:25 am on Feb 3, 2009 (gmt 0)|
Yep, that seems to be correct.
select ... partners >>prt<< ... >>prt<<.partners_id = 19;
So `prtī is table alias (can be whatever you want) -- you should use these when you have same name columns in different tables, to be able to differentiate partners_id from partners table and partners_id from products table.
| 10:52 am on Feb 3, 2009 (gmt 0)|
Oke thanks, Ill get on it then! :)
About PRT, wouldnt it be easier to call the partners.id partners.productid or something so I dont have to use an alias?
Edit, another small question: because of SEO (search engine optimization) reasons I do not ink directly to the partner Using the url. Instead, I first link to my link page which uses a GET to get the appropriate URL.
For instance: example.com/linkpage.php?id=XX is used where the link page uses this XX to get the URL and redirect.
What I would like to know is if it is possible to use the "row number" instead of the ID beacause it is theoretically possible that different partners use the same id. Or, should I make a new colum where I auto increment a unique number and use that? Just wondering what would be most efficient to do..
[edited by: Joppiesaus at 10:59 am (utc) on Feb. 3, 2009]
| 10:57 am on Feb 3, 2009 (gmt 0)|
Hmmm... I never tried this (it's just a matter of personal preference), but I suppose that should work as well.
So you certainly can try that! :-)
| 1:51 pm on Feb 3, 2009 (gmt 0)|
One final question,
I thought up the following because this would be the easiest thing for me to do: create 3 tables.
1. product (product.productid, rest of the product info)
2. link (link.id, link.productid, link.partnerid, link.row)
3. partner (partner.partnerid, partner.row, rest of the partner info)
This way I can more easilly manage the links by just keeping the link table updated. However, I need to make a left join on three tables. Is this:
2. efficient enough to work with large amounts of data?
[edited by: Joppiesaus at 1:52 pm (utc) on Feb. 3, 2009]
| 2:07 pm on Feb 3, 2009 (gmt 0)|
I'd personally inserted link either into partners table (if there is a link structure that can be used for all partner's products), or directly into products table.
However, keeping links in a separate table should not do any harm.
I personally used to have left joins on up to 5 tables with millons of products without performance degradation.
However, you have to be sure your tables contain either primary or secondary indexes for all fields used in 'on' statement (left join <table> on(some_id).
| 2:41 pm on Feb 3, 2009 (gmt 0)|
thanks for your help. Because of practical everyday handling it is really easier for me to manage three tables, so I got that working:
FROM (product LEFT JOIN link ON product.id = link.linkid) LEFT JOIN partner ON link.linkproductid = partner.productid
WHERE product.id = colname AND link.linkid IS NOT NULL
ORDER BY partner.productprijs ASC
What would finally really help me is a way to automate partner.shopname and partner.row.
Let me explain; first of all the information provided from partners does not contain one unique field. Mostly it does (partner.productid), but it also happends that different colors get the same partner.productid. So I thought fixing this with a kolom called row. Because this is such a pain to do I thought that a combination of my product.id and their partner.productid would be extremely effective. Is it possible to automate a kolom in the link and partner table that combines these two variables?
For instance product.id = 1000, partner.productid=12356, then the link.linkproductid and partner.uniqueid become 100012356
Secondly partners have no field in their XML sheets that contain their shopnames on each row. What would be the easiest way to incorporate this? Just copy it on every row in the respective partner table?
| 10:15 pm on Feb 3, 2009 (gmt 0)|
Hi, sorry for the last post, now that I think of it I do not need that. What I finally do ned is a way to automatically update the database using different xml feeds to which I have the URL information.
This is something i hdnt thought trough. Updating manually isnt going to work on the long run i guess. I have about 10 URL's to different XML feeds which I (think) i want to use in a php script to update the partner table. However, I really do not know where to start with this. The table looks like this:
-- Table "partner" DDL
CREATE TABLE `partner` (
`shopnaam` varchar(25) collate latin1_general_ci NOT NULL,
`id` int(15) default '0',
`productid` int(15) NOT NULL,
`naam` varchar(255) collate latin1_general_ci default NULL,
`url` varchar(1000) collate latin1_general_ci default NULL,
`productprijs` double(255,0) default NULL,
`productdescription` varchar(1000) collate latin1_general_ci default NULL,
`categorie` varchar(255) collate latin1_general_ci default NULL,
`subcategorie` varchar(255) collate latin1_general_ci default NULL,
`productmerk` varchar(255) collate latin1_general_ci default NULL,
KEY `id` (`id`,`shopnaam`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
The xml feed looks like this:
Any help in the right direction is greatly appreciated.
1. the partner.id field is not matched with the xml feed, here I need to manually fill in my own corresponding product.productid to make a match.
2. "shopnaam" has to be entered manually as well with the corresponding shopname because I see no way how else to retrieve the right shopname while linking to the page. The only way for me to know is to look inside the URL, but I dont think I can extract the name from there...
3. I dont use the image URL because I use my own picture database with my product table.
xml --- partner
<productURL> --> partner.url
<productID> --> partner.productid
<productNAME> --> partner.naam
<description> --> partner.productdescription
<productPRICE> --> partner.productprijs
<imageURL> --> none
<categorie> --> partner.categorie
<categorieSUB> --> partner.subcategorie
<productMERK> --> partner.productmerk
[edited by: Joppiesaus at 10:24 pm (utc) on Feb. 3, 2009]