Forum Moderators: open

Message Too Old, No Replies

Subquery/JOIN nightmare

Need to get through two levels of tables

         

groovenectar

3:29 pm on Jun 29, 2006 (gmt 0)

10+ Year Member



The point of this query is to display the full title of an academic program, which includes Level, Degree(s), College, Department, Major and Concentration. An example would be: Undergraduate, BSET, College of Engineering and Technology, Department of Engineering Technology, General Engineering Technology major, Electromechanical Systems concentration.

If it were as simple and consistent as it should be, then of course it would be no problem. Things can never be consistent though, right? I've got a table for each of the above-mentioned elements (besides concentrations which are always different), and they all come together in a lookup table called "programs":

id(int) ¦ program_levels_id(int) ¦ degrees_id(int) ¦ colleges_id(int) ¦ departments_id(int) ¦ majors_id(int) ¦ concentration(varchar)

The major headache is that for display purposes, some programs need to be combined by degree because the coursework is so similar. As an example, the graduate ME in Engineering Management is similar enough to the graduate MS in Engineering Management for them to list it as a MS/ME in Engineering Management. There are quite a few cases where this happens.

Each program is technically different, so that's one of a few reasons that they each need their own row in the programs table. I have decided to create another table that will serve to combine degrees if necessary, and store the data about each program (or combination of programs) such as a description. Here is what the program_data table looks like right now:

id(int) ¦ programs_id(int) ¦ programs_id2(int) ¦ description, etc.

So here's the (rather messy) plan for generating a list of programs:

First it will have to tap into the program_data table. It reads the programs_id and can then use that to extract the level, degree (we'll come back to that), college, department, major and concentration for the particular program.

Now, if programs_id2 is set, that indicates that this row needs to to tie in a second program and get the degree. E.g. - When it comes across the programs_id for MS in Engineering Management it is going find that programs_id2 is set. It looks up the value in programs_id2 in the programs table and finds the ME in Engineering Management. All it really needs to know is the "ME" part, and the final result should return as "MS/ME in Engineering Management).

Somehow I need to tap into the degree abbreviation from the degrees table *through* the programs table, *through* the program_data table.

Here is a query that ignores the programs_id2 column, so it returns all of the data except that:

SELECT program_data.id,
program_levels.name AS program_level,
degrees.abbr AS degree,
colleges.name AS college,
departments.name AS department,
majors.name AS major,
programs.concentration
FROM programs LEFT OUTER JOIN departments ON programs.departments_id = departments.id
LEFT OUTER JOIN degrees ON programs.degrees_id = degrees.id,
program_data,
program_levels,
colleges,
majors
WHERE programs.program_levels_id = program_levels.id AND
programs.colleges_id = colleges.id AND
programs.majors_id = majors.id AND
program_data.programs_id = programs.id

It seems to me that the only way to do this would be to use a subquery where degree.abbr is right now. I tried that, but could not get the subquery down to the necessary single row!

aspdaddy

6:18 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not sure I understand the problem fully, but I know enough about the system anyway.

1. Create a view called vwQualTitles thats joins all the tables to contain all the possible fields needed. Use inner joins unless the fields allow null, then use outer right joins with the main table on the left.

2.Make one composite field called "Title" thats suits the 80% rule

2. Include any fields in the view that will help with the decision making about exact title.

3. Use code or presentation logic (not SQL) to solve the non-database problem of displaying the actual title.

If this doesnt fix it, the database probably needs redesigning, or do your subquery thing.

HTH

john_k

6:34 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I guess I would add a ProgramGroup table that would contain the description. Then make each program row subordinate to a row in ProgramGroup. Most ProgramGroup rows would only have one program linked to them. Then join through the ProgramGroup table to get your program label of "MS/ME", etc.

You could initially populate the ProgramGroup description column by running a query to pull it from the program description column.

This approach has another benefit in that it can combine more than two programs without further modifications.

groovenectar

6:56 pm on Jun 29, 2006 (gmt 0)

10+ Year Member



Thanks for your help! This looks interesting, and I am now doing some research about some of these concepts you have mentioned. VIEW looks interesting; I'd never worked with that, and I'm trying to find what the 80% rule is. I'll be sure to post again if this does the trick!

I've actually re-written the question and stripped out all of the unnecessary stuff, so I might as well post it. As far as I can tell, this forum doesn't have a way to preserve spacing, so I used periods. Hopefully it will make more sense:

Let's say you have three tables - colors, items, and item_display:

colors:


+----+---------+
¦ id ¦ name . .¦
+----+---------+
¦ 1 .¦ red . . ¦
¦ 2 .¦ green . ¦
¦ 3 .¦ blue . .¦
+----+---------+

items:


+----+-----------+-------+
¦ id ¦ colors_id ¦ name .¦
+----+-----------+-------+
¦ 1 .¦ 2 . . . . ¦ grass ¦
¦ 2 .¦ 1 . . . . ¦ apple ¦
¦ 3 .¦ 2 . . . . ¦ apple ¦
¦ 4 .¦ 3 . . . . ¦ sky . ¦
¦ 5 .¦ . . . . . ¦ glass ¦
+----+-----------+-------+

item_display:


+----+---------+----------+-----------------------------------+
¦ id ¦ item_id ¦ item_id2 ¦ description . . . . . . . . . . . ¦
+----+---------+----------+-----------------------------------+
¦ 1 .¦ 1 . . . ¦ . . . . .¦ The grass is green . . . . . . . .¦
¦ 2 .¦ 2 . . . ¦ 3 . . . .¦ Both apples, but different colors ¦
¦ 3 .¦ 4 . . . ¦ . . . . .¦ Sky is blue during the day . . . .¦
¦ 4 .¦ 5 . . . ¦ . . . . .¦ We'll say the glass is colorless! ¦
+----+---------+----------+-----------------------------------+

.SQL file here: <url removed>

Here is the desired output from the final query:


+----+--------+--------+-------+-----------------------------------+
¦ id ¦ color1 ¦ color2 ¦ name .¦ description . . . . . . . . . . . ¦
+----+--------+--------+-------+-----------------------------------+
¦ 1 .¦ green .¦ . . . .¦ grass ¦ The grass is green . . . . . . . .¦
¦ 2 .¦ red . .¦ green .¦ apple ¦ Both apples, but different colors ¦
¦ 3 .¦ blue . ¦ . . . .¦ sky . ¦ Sky is blue during the day . . . .¦
¦ 4 .¦ . . . .¦ . . . .¦ glass ¦ We'll say the glass is colorless! ¦
+----+--------+--------+-------+-----------------------------------+

The fun stuff:

Since some items (e.g. - glass) will not have any color, there will need to be an outer join between colors and items. The following query takes care of that, but does not extract the second color for records that have a second color (e.g. - apples):


SELECT item_display.id,
. .. . colors.name AS color1,
. .. . items.name,
. .. . item_display.description
FROM item_display,
. .. items LEFT OUTER JOIN colors ON items.colors_id = colors.id
WHERE item_display.item_id = items.id;
+----+--------+-------+-----------------------------------+
¦ id ¦ color1 ¦ name .¦ description . . . . . . . . . . . ¦
+----+--------+-------+-----------------------------------+
¦ 1 .¦ green .¦ grass ¦ The grass is green . . . . . . . .¦
¦ 2 .¦ red . .¦ apple ¦ Both apples, but different colors ¦
¦ 3 .¦ blue . ¦ sky . ¦ Sky is blue during the day . . . .¦
¦ 4 .¦ . . . .¦ glass ¦ We'll say the glass is colorless! ¦
+----+--------+-------+-----------------------------------+

I think that extracting the second color will require a subquery. Here is an example of what *looks* right to me, but does not work:


SELECT item_display.id,
. .. . colors.name AS color1,
. . . (SELECT colors.name
. .. . FROM colors,
. . . . . . items,
. . . . . . item_display
. .. . WHERE items.colors_id = colors.id AND
. .. . . . . item_display.item_id = items.id) AS color2,
. .. . items.name,
. .. . item_display.description
FROM item_display,
. .. items LEFT OUTER JOIN colors ON items.colors_id = colors.id
WHERE item_display.item_id = items.id;

The reason it doesn't work is because the "Subquery returns more than 1 row" (to quote the error). Sooo, I think the only way to get it to work would be to somehow "tell" the subquery what the current item.id is for the parent query.

Hopefully this whole thing makes sense, but more examples and information can be provided. I think the example tables are set up the way the need to be, but I am certainly open to structural suggestions. The item_display table is purposefully linking items together rather than directly linking colors. For the sake of this example, a "red apple" is totally different from a "green apple," but they are similar enough to warrant a link when displaying all of the items along with descriptions.

[edited by: encyclo at 1:11 am (utc) on Oct. 14, 2007]

groovenectar

7:06 pm on Jun 29, 2006 (gmt 0)

10+ Year Member



Thanks john_k... I saw your reply right after I submitted my last one..

I wondering if that approach is similar to using the program_data table? As far as having columns to store program ids, and a column for the description? In the stripped-down example, would it be similar to the item_display table?

--
Also, as a side note to the stripped-down example.. "colors" represent degrees, "items" represents programs, and "item_diplay" represents program_data. It's too bad about the spacing issues!

syber

7:09 pm on Jun 29, 2006 (gmt 0)

10+ Year Member



You need to use MAX in the subquery to limit it to one row.

You also should probably add to the WHERE clause of the subquery a statement that prevents the same color from being shown twice (color1 & color2).

SELECT item_display.id,
. .. . colors.name AS color1,
. . . (SELECT MAX(colors.name)
. .. . FROM colors,
. . . . . . items,
. . . . . . item_display
. .. . WHERE items.colors_id = colors.id AND
. .. . . . . item_display.item_id = items.id) AS color2,
. .. . items.name,
. .. . item_display.description
FROM item_display,
. .. items LEFT OUTER JOIN colors ON items.colors_id = colors.id
WHERE item_display.item_id = items.id;

groovenectar

7:27 pm on Jun 29, 2006 (gmt 0)

10+ Year Member



syber... I'd say that's a step in the right direction, since it does limit the subquery to returning one row.. Buuut, the result of the query would be:

+----+--------+--------+-------+-----------------------------------+
¦ id ¦ color1 ¦ color2 ¦ name .¦ description . . . . . . . . . . . ¦
+----+--------+--------+-------+-----------------------------------+
¦ 1 .¦ green .¦ red . .¦ grass ¦ The grass is green . . . . . . . .¦
¦ 2 .¦ red . .¦ red . .¦ apple ¦ Both apples, but different colors ¦
¦ 3 .¦ blue . ¦ red . .¦ sky . ¦ Sky is blue during the day . . . .¦
¦ 4 .¦ . . . .¦ red . .¦ glass ¦ We'll say the glass is colorless! ¦
+----+--------+--------+-------+-----------------------------------+

So I think that it would always display the same color (red for MAX, and blue for MIN)..

Here is what the desired resultset would look like:


+----+--------+--------+-------+-----------------------------------+
¦ id ¦ color1 ¦ color2 ¦ name .¦ description . . . . . . . . . . . ¦
+----+--------+--------+-------+-----------------------------------+
¦ 1 .¦ green .¦ . . . .¦ grass ¦ The grass is green . . . . . . . .¦
¦ 2 .¦ red . .¦ green .¦ apple ¦ Both apples, but different colors ¦
¦ 3 .¦ blue . ¦ . . . .¦ sky . ¦ Sky is blue during the day . . . .¦
¦ 4 .¦ . . . .¦ . . . .¦ glass ¦ We'll say the glass is colorless! ¦
+----+--------+--------+-------+-----------------------------------+

It's kinda like a brain teaser! I know it was fun for me until I got to my wit's end :-]

john_k

8:04 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks john_k... I saw your reply right after I submitted my last one..
I wondering if that approach is similar to using the program_data table? As far as having columns to store program ids, and a column for the description? In the stripped-down example, would it be similar to the item_display table?

The tables may look similar, but definitely not the same. The program_data table pre-supposes that you will only ever have two programs to manage together. If you ever encounter a situation requiring three programs, then you will need to make changes to your table (add another id column) and also to your query that pulls the data. If there are any other queries running against the table, they'll need to get updated also.

With the ProgramGroup idea, there is no cap on the number that can be grouped together.

The program_data table is denormalizing your data and you will find that some queries will become difficult because the program id values are in the same row. For instance, if you want to query the program_data table to see how often a specific program_id is used, you will need to do a count on each of the two foreign key columns and then add them. That is a trivial example, but if you ever have to build in logic based on whether or not a program_id is in use in that table, it will become an extra headache and probably degrade the query performance.

The grouping concept is a standard way to handle parent-child type relationships. There are a lot of examples on the internet.

syber

9:11 pm on Jun 29, 2006 (gmt 0)

10+ Year Member



You also should probably add to the WHERE clause of the subquery a statement that prevents the same color from being shown twice (color1 & color2).

SELECT item_display.id,
. .. . colors.name AS color1,
. . . (SELECT MAX(c2.name)
. .. . FROM colors c2,
. . . . . . items,
. . . . . . item_display
. .. . WHERE items.colors_id = c2.id AND
. .. . . . . item_display.item_id = items.id AND
. . . . . . .c2.name <> colors.name) AS color2,
. .. . items.name,
. .. . item_display.description
FROM item_display,
. .. items LEFT OUTER JOIN colors ON items.colors_id = colors.id
WHERE item_display.item_id = items.id;

aspdaddy

9:29 pm on Jun 29, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



and I'm trying to find what the 80% rule is
Sorry! All I meant was that usually with stuff like this, programmers introduce unnecessary compleixty to cope with < 20% of the data. 80% usually works with simple queries, but programmers have this arrogance you know of the perfect solution :) Its not a very productive attitude in real life.

I'd just build it to work for the majority of qual titles and then handle the few minority with workarounds in the application code, its just efficient system development.

Good luck anyway!

Added.I just re-read and seems that actually you may have a field missing in the model: "MS/ME in Engineering Management." is a another field in the table called "Generic Title". Just select distinct by that field and the problem is solved.

groovenectar

3:50 pm on Jun 30, 2006 (gmt 0)

10+ Year Member



Thanks everyone for your help!

Using all these ideas.. Here are some solutions that seem to work.

For the colors one :


select i_d.id
, c1.name as color1
, c2.name as color2
, i1.name
, i_d.description
from item_display as i_d
left outer
join items as i1
on i1.id = i_d.item_id
left outer
join colors as c1
on c1.id = i1.colors_id
left outer
join items as i2
on i2.id = i_d.item_id2
left outer
join colors as c2
on c2.id = i2.colors_id

For the academics one:


SELECT pd.id,
pl.name AS program_level,
CONCAT_WS('/', d.abbr, d2.abbr) AS degree,
c.name AS college,
dept.name AS department,
m.name AS major,
p.concentration
FROM program_data as pd
LEFT OUTER JOIN programs AS p ON p.id = pd.programs_id AND pd.type = 3
LEFT OUTER JOIN degrees AS d ON d.id = p.degrees_id
LEFT OUTER JOIN programs AS p2 ON p2.id = pd.programs_id2
LEFT OUTER JOIN degrees AS d2 ON d2.id = p2.degrees_id
LEFT OUTER JOIN departments dept ON p.departments_id = dept.id
INNER JOIN program_levels AS pl ON p.program_levels_id = pl.id
INNER JOIN colleges AS c ON p.colleges_id = c.id
INNER JOIN majors AS m ON p.majors_id = m.id

I think this may actually be a pretty neat reference thread...

[edited by: encyclo at 1:10 am (utc) on Oct. 14, 2007]

groovenectar

3:52 pm on Jun 30, 2006 (gmt 0)

10+ Year Member



Also, the 80% rule is really interesting... I'm totally guilty of that, including with the issue in this thread ;-]