Forum Moderators: coopster
Can anyone help me with a little pseudo-code to recurse through a multi-dimentional array that looks something like this:
array(0 => array('id' => 1, parent => '0', title => 'Green'), 1 => array('id' => 2, 'parent' => 1, 'title' => 'Eggs'), 3 => array('id' => 3, 'parent' => 1, title => 'And'), array('id' => 4, 'parent' => '3', 'title' => 'Ham'))
I just need help with the psedo code, I'm finding it hard to grasp the concept here ane would really appreciate a helping hand to get me going ;)
I'd envisioned (for testing) a list like:
<ul>
<li>Green</li>
<ul>
<li>Eggs</li>
<li>And
<ul>
<li>Ham</li>
</ul>
</li>
</ul>
</ul>
With me?
Many thanks if anyone can see what I mean and help out! ;)
Nick
However, while the general concept of the sitepoint article is good (better than what I had in mind), IMNSHO the suggested implementation is seriously flawed! You have to rewrite the database everytime you add a new record (or with clever algorithms the best you can do is rewrite on average half of the database). If the database is small enough for that not to matter, then it is small enough to read the whole thing in, and do 2 passes (1st pass parsing it into a tree structure) instead of simulating a tree structure in the database. If the database is large, then instead of the left and right concept, I'd suggest storing a 'hierarchy_number' instead of 'left' and 'right'. Something like 1, 2, 3 for top levels, 1.1, 1.2, ..., 1.732, etc for children of 1, 1.2.1, ... 1.2.nnn for children of 1.2, etc. Then, to find all the descendents of 3, you do a select where hierarchy_number is between 3.0 and 4 (except it is a string comparison, not numeric)
Shawn
<Added> Oops, too late again. Beaten to the post.</added>
The code for inserting new elements looks something like this..
BEGIN
DECLARE right_most_sibling INTEGER;
SET right_most_sibling
= (SELECT rgt
FROM Personnel
WHERE emp = :your_boss);
UPDATE Personnel
SET lft = CASE WHEN lft > right_most_sibling
THEN lft + 2
ELSE lft END,
rgt = CASE WHEN rgt >= right_most_sibling
THEN rgt + 2
ELSE rgt END
WHERE rgt >= right_most_sibling;
INSERT INTO Personnel (emp, lft, rgt)
VALUES ('New Guy', right_most_sibling, (right_most_sibling + 1))
END;
Obviously in MySQL you might have to do some of this work in PHP, but its do-able and the performance benefits are wel worth it.
I guess you could be less precise about the numbers and use more of a binary tree approach (where not every number has a corresponding node). That would add some 'slack' that could allow you to add and remove elements without updating the entire tree.
However, it means the tree would not be perfectly balanced so would give sub-optimal performance, and it would require re-hashing from time to time.
<edit>tidied fomatting</edit>
>>>"...add some 'slack' that could allow you to add and remove elements ...However, it means the tree would not be perfectly balanced so would give sub-optimal performance, and it would require re-hashing from time to time..."
You'd have to periodically do a maintenance run to add back any slack taken up, but that wouldn't affect whether the tree was balanced or not. If you are using a tree to speed up bin sorts of ordered data then you can re-hash to balance the tree periodically. However, if you are using the tree to represent real hierarchical relationships, then the tree will be balanced if your hierarchy is balanced, and not balanced if the hierarchy is not balanced, regardless of what you use to represent 'left' and 'right'.
Anyway, while I may not like the implementation, I do like the concept, and when I need it one day I'll use it, albeit with a slight variation in implementation, so thanks!
Shawn
Well, one call to $this->list_categories() will return the array in my first post. And $this->get_children() will do exactly the same for the kids of any given category.
array(0 => array('id' => 1, parent => '0', title => 'Green'), 1 => array('id' => 2, 'parent' => 1, 'title' => 'Eggs'), 3 => array('id' => 3, 'parent' => 1, title => 'And'), array('id' => 4, 'parent' => '3', 'title' => 'Ham'))
id ¦ parent ¦ title
1 0 Green
2 1 Eggs
3 1 And
4 3 Ham
I'd like to use those prebuilt DB calls in the function in msg26.
>in your foreach.
Yep, I'm not completely stupid ;) - However, I've tried that and it just 'aint workin' If you're sure about that I'll get on with some more testing on it this morning and try to work out why it's misbehaving but perhaps my snippets above will help clarify what I'm trying to work with? ;)
Cheers!
Nick
Hmm.. I can't think of any way to improve that in terms of implementation. Whats your gripe with it Shawn? What would you like to change?
Yeah, there is definitely a hit when inserting thats not there with the adjacent rows model. But this is a write-occassionally read-often situation so it seems perfect to me.
I might do a bit of testing to see what kind of stats I can get for it. Nick: give me some figures, how deep is your tree and how many branches per sub-tree?
"...However, I've tried that and it just 'aint workin' ..."
Starting from what you had in msg#26, this is what I think should work:
function recursor($array)
{
global $f;
if(is_array($array))
{
print "<ol>"
foreach($array as $key => $val)
{
print("<li>".$val['title']."</li>\n");
if ($kids=$f->get_children($val['id']))
{
recursor($kids);
}
}
print "</ol>"
}
}
(I'd suggest you call the function something other than recursor... One day you may have a program that has more than one recursive function.)
Note, I am assuming that in addition to $this->list_categories() and $this->get_children(), you also have some function to give you an array of all the top-level ancestors. Even if there is only one, such as in your sample data, it must be an array. Also I assume that $this->get_children() returns only the direct children, not all descendents.
I must say, while this will work, I am a bit uncomfortable. You are using the recursive function to traverse the list. That traverses it once, but has the overhead of recursion. Now when your recursie function is traversing the list, it calls get_children() for each element which itself traverses the list. So you traverse the list n+1 times; hence you look at a record in your list (n-squared + n) times (Your algorithm is Order(n-squared) ). You could change it to order(n) (specifically 2 x n) by traversing it once to create a tree or sorted list, then traversing the tree or sorted list.
"...Hmm.. I can't think of any way to improve that in terms of implementation. Whats your gripe with it Shawn? What would you like to change? ..."
We're talking about the pre-ordered tree implementation, right?
I'd suggest storing a 'hierarchy_number' instead of 'left' and 'right'. Something like 1, 2, 3 for top levels, 1.1, 1.2, ..., 1.732, etc for children of 1, 1.2.1, ... 1.2.nnn for children of 1.2, etc. Then, to find all the descendents of 3, you do a select where hierarchy_number is between 3.0 and 4 (except it is a string comparison, not numeric). To find all the ancesters is not as easy, but that is not as big a deal: Say you are looking for the ancestors of 1.7.3.4, just search for hierarchy_number being one of '1', '1.7', '1.7.3', and '1.7.3.4'.
Shawn
Just briefly 'cause I gotta go for a while:
Options are:
I'd suggest number 2 for the implementing the permissions hierrarchy in your other monster post, and number 3 for this case.
Shawn
I would just check that any 3rd party class you use is a balanced tree by looking for a rotate function when an insert is done. Another good feature is a parent pointer, which speeds thing up when you need to traverse upwards.
If the tree is balanced, the performance is known to be log n.
I'd agree if the tree is being used within a search algorithm. The look-up time for an arbitrary element is log n. However, in this application, you need to traverse the whole tree, starting from the top. There is "zero" look-up time (you aren't searching for a child, just looking at each child in turn). You need to look at each element, so traversing the tree is order n.
Note: make sure that in the routine to populate the tree, you keep an index of the nodes you populate. If you have to search for an element every time you do an insert you'll turn it into order(n log(n)) instead of order(n).
>> If the tree is balanced
In this application, whether the tree is balanced or not depends on whether the hierarchy being represented is balanced or not.
...or have I missed your point?
Shawn
Thats is a good point, and not a reason why trees are generally used, in my experience.
If what nick is trying to achieve is to display the hierachy from top to bottom, not insert, delete or find specific nodes (which can be done very easily with SQL anyway) then maybe a tree is not the way to go?
Trees are used to do exactly this. Classic textbook 'compiler construction' problem. [google.com...]
But hey, it just might be that is the way I am thinking about it. There are other solutions. There are two perfectly good ones posted a few posts above. One is order(n-squared) instead of order(n), and the other requires the database to be pre-ordered and is a pain for insertion, but either is probably adequate.
Not kidding when I say 'classic text-book problem'. See:
[citeseer.nj.nec.com...]
[citeseer.nj.nec.com...]
[google.com...]
Shawn
Added: Just realised I could have used examples closer to home for us at WebmasterWorld. Have a look at how the internals of the perl HTML:: libraries are implemented (e.g. [search.cpan.org...] ). Browsers parse html files and turn them into tree structures represented by the DOM
Most of what is needed will be done in SQL, you wont be inserting,deleting or searching the tree. If its just to display the hierachy, you have all the overhead of building a tree, for just one query?
I think you can display the hierachy from a simple array, with a recursive function getPath and if you keep track of the depth, you can do the indenting or whatever to show the hierachy visually.
I dont know the PHP, but something like this in C, I think:) (untested!) should do it:
class cat
{
public:
string name;
int id;
int parent_id;
};int depth=0;
int temp=0;
void showPath ( int id, vector<cat> data)
{
depth++;
for (vector<cat>::iterator i=data.begin(); i!= data.end(); ++i)
{
if (i->parent_id == id )
{
cout << i->name << "(" << depth << ")\n";
temp = i->id;
showPath (temp, data);
}
}
depth--;
}
void display(int start, vector<cat> & data)
{
for (vector<cat>::iterator i=data.begin(); i!= data.end(); ++i)
{
if ( i->parent_id == start)
{
cout << i->name << "(" << depth << ")\n";
showPath( data[start].id, data );
}
}
}
Added: I just dont see the connection with those examples and this problem, they seem to me, to use trees for search performance and dynamic size.
Maybe you are right... Might be overkill. Depends on how comfortable you are with one way of doing it vs another.
I think the problems are pretty similar. In the compiler case you are sequentially traversing a flat file (source code) to produce a symbol table and code which understands the hierarchy (variable scope, etc) of the code. In the DB case you are sequentially traversing a list of records to produce a table representing the hierarchy of the DB.
The other two methods to do it shown in previous posts are:
>>>"...just don't see the connection with those examples and this problem, they seem to me, to use trees for search performance and dynamic size..."
Exactly. Trying to get a program of order (n squared) down to Order(n). In a complex programming language, the best that may be possible is O(n log n) instead of O(n), but in this case I think O(n) is possible. If someone isn't comfortable with traversing trees, or they don't have a tree class handy and have to build it from scratch, then it might be overkill. Heck, Nick has probably already built this thing ;)
Great discussion, great talent involved in the discussion. We've had some superior quality threads the last few days ;)
Thanks everyone, but don't let my departure stop you ;-)
Nick
Looking at it like that, it is very similar. But for performance, a compiler handles a lot of (millions?) elements & speed is the bottom line.
Complexity is a good measure of performance for that, but I wonder whats the practical trade-off between best case, say log n, and worst case, say expotential n , algorithms, with nicks data?
I dont know the answer, but possibly its not as much of an issue as we think.
>how comfortable you are with one way of doing it vs another
Personally, I will go with the simplist solution :) , experience tells me that one day I'm gonna have to come back to it and change something, or extend it and the simpler the code the better. How many specs never change :)
If the development time is a big issue, then even if its not an optimal solution, processors are cheaper than programming.
Obviously, if I was writing a compiler, I would have different objectives :)
btw - I think you mistunderstood me above, what I was saying is that if I were looking at obtaining a 3rd party tree, I would expect log n, and how I would check for it (rotate).
>a tree is maybe not the way to go
Meaning your array solutions solve it okay, without needing to traverse it and create a tree, just to bring it down to log n.