Forum Moderators: coopster & phranque

Message Too Old, No Replies

MySQL JOIN statement.

help needed, please

         

ikbenhet1

10:50 am on Jun 16, 2003 (gmt 0)

10+ Year Member



How can i JOIN the fields title and content from table crawl_database to the results of the query below, but only where linkout = crawl_database.url?
Thanks.

I have already tried to modify existing examples, but i'm getting error messages in phpmyadmin when i modify them and query them.

$sql = 'SELECT COUNT( * ) AS repetitions, `linkout` ';
$sql .= 'FROM `link_database` ';
$sql .= 'WHERE `anchor` ';
$sql .= 'LIKE "%'.trim($search);
$sql .= '%" GROUP BY linkout ';
$sql .= 'ORDER BY repetitions DESC LIMIT '.$start.' , '.$end;
$x = mysql_query( $sql );

grahamstewart

11:33 am on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I think you want something like this...

$sql = 'SELECT COUNT( * ) AS repetitions, `linkout`, title, content ';
$sql .= 'FROM `link_database`,crawl_database ';
$sql .= 'WHERE `anchor` ';
$sql .= 'LIKE "%'.trim($search).'%" ';
$sql .= 'AND linkout = url ';
$sql .= 'GROUP BY linkout ';
$sql .= 'ORDER BY repetitions DESC LIMIT '.$start.' , '.$end;

ikbenhet1

1:42 pm on Jun 16, 2003 (gmt 0)

10+ Year Member




I've tried the exact code in phpmyadmin.
It replied: Column: 'url' in where clause is ambiguous

Then i changed url into "http://www.example.com"
I get a "Connection lost error" few .
A few times i got a 404 page.

Any ideas?

I could also achief this by querying to mysql for each url to get the description and title of it, but that would be 10 queries extra, very inefficient or not?

BCMG_Scott

1:53 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



The reason you got the ambiguous error is because the field url exists in both tables.

Try this:

$sql = "SELECT COUNT(link.*) as repetitions, link.linkout, link.title, link.content ";
$sql .= "FROM link_database link ,crawl_database crawl ";
$sql .= "WHERE link.anchor LIKE '%".trim($search)."%' ";
$sql .= "AND link.linkout = crawl.url ";
$sql .= "GROUP BY link.linkout ";
$sql .= "ORDER BY link.repetitions DESC LIMIT ".$start." , ".$end";

Scott Geiger

ikbenhet1

2:41 pm on Jun 16, 2003 (gmt 0)

10+ Year Member




No luck. I get an error message.

You have an error in your SQL syntax near '* ) AS repetitions, link.linkout, link.title, link.content
FROM link_database ' at line 1

jatar_k

3:24 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



why the count? Why not select and then use mysql_num_rows to count rows?

ikbenhet1

4:18 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



Because there are duplicates values in link_out. Those are counted, all duplicates are removed (made to 1 record) and sorted desceningly by most number of duplicates up.

So basicly my simple ranking process is now finished.

All i need to do now is to join 'title' & 'content' from crawl_database into the result to i can echo then on the screen.

ikbenhet1

5:20 pm on Jun 16, 2003 (gmt 0)

10+ Year Member




Let me ask another question, if this is a query that can't be 'solved' easy.

Is is ok to call mysql_query() 10 times or 15 times per page?
Or would that slow down the page too much?

jatar_k

5:29 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



10 or 15 times

that would probably depend on how much data is being returned and how many people are hitting the page.

You could run a few tests and see.

ikbenhet1

6:34 pm on Jun 16, 2003 (gmt 0)

10+ Year Member




I've tested a few searches. It takes 12 second with me as only user before any results are returned. This is too slow.
The sql query in my first post works really fast.
that query returns.

repetitions ¦ linkout ¦
---------------------------------
¦ 57 ¦ www.example.com ¦
¦ 47 ¦ www.example.com/1 ¦

It only needs an extra join from the other table to get the title en content through the url.

The other table:
TABLE crawl_database

¦URL ¦TITLE¦ CONTENT¦ DESCRIPTION ¦ etc
--------------------------------------------------
¦www.example.com/ ¦ x ¦ x ¦ x ¦ x
¦www.example.com/1 ¦ x ¦ x ¦ x ¦ x

So the output would be:

repetitions ¦ linkout ¦ TITLE ¦ CONTENT ¦
--------------------------------------¦---------¦
57 ¦ www.example.com ¦ x ¦ x ¦

It's so much harder then i thought, i assumed 1 join could do the job.

jatar_k

6:53 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



have you tried just removing link. from the count and leaving it as * in the query BCMG_Scott posted?

ikbenhet1

7:06 pm on Jun 16, 2003 (gmt 0)

10+ Year Member




yes i did, i got this error message:

Unknown column 'link.title' in 'field list'

jatar_k

7:18 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



$sql = "SELECT COUNT(*) as repetitions, link.linkout, crawl.title, crawl.content ";
$sql .= "FROM link_database link ,crawl_database crawl ";
$sql .= "WHERE link.anchor LIKE '%".trim($search)."%' ";
$sql .= "AND link.linkout = crawl.url ";
$sql .= "GROUP BY link.linkout ";
$sql .= "ORDER BY link.repetitions DESC LIMIT ".$start." , ".$end";

that work? I am getting a little confused, hehe.

ikbenhet1

7:27 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



Yes. Mysql replies: unknown column 'link.repetitions'.
So i changed that to repetitions, and submitted the query, which is accepted. But now the computer has taken a lunch break, after i hit go it's just sitting there dooing nothing it seems. (i even used limit 0, 1)

jatar_k

7:31 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



sorry, i missed that reference at the end, beware cut and paste. ;)

looks like it doesnt really work then, the logic is broken somewhere.

jatar_k

7:42 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



what about a join

$sql = "SELECT COUNT( * ) AS repetitions, link.linkout, crawl.title, crawl.content";
$sql .= "FROM link_database link left join crawl_database crawl on link.linkout = crawl.url ";
$sql .= "WHERE link.anchor ";
$sql .= "LIKE '%".trim($search);
$sql .= "%' GROUP BY link.linkout ";
$sql .= "ORDER BY repetitions DESC LIMIT ".$start." , ".$end;

I'm so confused I may have done that wrong. :)

ikbenhet1

8:35 pm on Jun 16, 2003 (gmt 0)

10+ Year Member



I was taking a fresh nose. I tried the join code, it get's accepted by phpmyadmin, but this time i got a error saying too many connections.
Like it is trying yo say, your queries from today are not processed yet, you still have never ending mysql queries running.

I'm gonna leave it for today, i don't see how i can finish this by tonight, i'm confused. Tomorrow another day, another try.
Thanks for the input.

grahamstewart

11:31 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hang on - we almost had it with the one before the join. It was just complaining because there isn't a column called link.repetitions - its just an alias for count(*)

So try this..


$sql = "SELECT COUNT(*) as repetitions, link.linkout, crawl.title, crawl.content ";
$sql .= "FROM link_database link ,crawl_database crawl ";
$sql .= "WHERE link.anchor LIKE '%".trim($search)."%' ";
$sql .= "AND link.linkout = crawl.url ";
$sql .= "GROUP BY link.linkout ";
$sql .= "ORDER BY repetitions DESC LIMIT ".$start." , ".$end";

Sorry, I shouldn't have let you dangling but had to sleep :)

jatar_k

11:38 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



So i changed that to repetitions, and submitted the query, which is accepted. But now the computer has taken a lunch break, after i hit go it's just sitting there dooing nothing it seems. (i even used limit 0, 1)

I believe that is what ikbenhet1 tried that hung.

grahamstewart

11:55 pm on Jun 16, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops sorry - missed that bit.

Hmmm... that statement looks fine to me - not sure why its hanging.
Does it work without the count(*) bit in it?

Its worth pointing out that its never going to be a very fast search, because its using a double wildcard text search (e.g. %search%) and its using text strings as keys for the join.

ikbenhet1

12:16 am on Jun 17, 2003 (gmt 0)

10+ Year Member



Do you think it would choke in a 100 mb database?

Should i empty my database and then run the query to see if it does or does not work?

<add> i don't think it should choke, because i tested it when it was calling 10x mysql_query to retrieve title and description and it was finished within 12 seconds.

ikbenhet1

9:23 am on Jun 17, 2003 (gmt 0)

10+ Year Member




Would it be easier if we used 2 select statements instead of 1 . We could use the mysql query in post 1 as the first query. and then next query should be like:

SELECT 'title' , 'description' from crawl_database where url= array("url1","url2")

So instead of 10 query calls per page, i would have 2 queries to call, which would be ok.

ps. My Mysql version is : 3.23.56 Can that be the problem?

BCMG_Scott

2:08 pm on Jun 17, 2003 (gmt 0)

10+ Year Member



Ok I did some playing. I created two tables as follows:

CREATE TABLE linkout ( url varchar(50) );
CREATE TABLE crawl (url varchar(50), title varchar(50), content text);

I then inserted as follows:

INSERT INTO crawl VALUES ('www.one.com','ONE','blah');
INSERT INTO crawl VALUES ('www.two.com','TWO','blah blah');
INSERT INTO crawl VALUES ('www.three.com','THREE','blah blah blah');

INSERT INTO linkout VALUES ('www.one.com');
INSERT INTO linkout VALUES ('www.one.com');
INSERT INTO linkout VALUES ('www.one.com');
INSERT INTO linkout VALUES ('www.two.com');
INSERT INTO linkout VALUES ('www.three.com');
INSERT INTO linkout VALUES ('www.one.com');
INSERT INTO linkout VALUES ('www.three.com');
INSERT INTO linkout VALUES ('www.two.com');
INSERT INTO linkout VALUES ('www.one.com');
INSERT INTO linkout VALUES ('www.two.com');

Ok, now I did the following (results follow):

select count(*) as rep, b.url, b.title, b.content
from linkout a, crawl b
where b.url = a.url
group by b.url;

+-----+---------------+-------+----------------+
¦ rep ¦ url ¦ title ¦ content ¦
+-----+---------------+-------+----------------+
¦ 5 ¦ www.one.com ¦ ONE ¦ blah ¦
¦ 2 ¦ www.three.com ¦ THREE ¦ blah blah blah ¦
¦ 3 ¦ www.two.com ¦ TWO ¦ blah blah ¦
+-----+---------------+-------+----------------+
3 rows in set (0.00 sec)

Does this get us closer to what you are looking for? Are the tables as I have constructed them similar to yours?

Scott

ikbenhet1

5:08 pm on Jun 17, 2003 (gmt 0)

10+ Year Member



i'm amazed. It works with you. which version of mysql do you use?

I discovered the "process list" in phpmyadmin, and it had 5 select queries running all the time and a few more in the row waiting for idle time i think. I killed the selects, and the list was empty after that.

I renamed my 2 table without underscore. That didn't help either.
add> i just tried your query with no other processes, and the process seems to just "hang" in the operations screen.

BCMG_Scott

6:49 pm on Jun 17, 2003 (gmt 0)

10+ Year Member



If you have processes hanging then you might need to stop and start mysql. It could be that something has happened to the main processes and it is spawning off child processess that are already in a "bad" state. I would start with that.

Next, sometimes phpmyadmin can be funky - what I did was on the command line mysql tool. Try it there to rule out phpmyadmin weirdness. BTW, what version of pma are you running?

As for version, I am running 4.0.12, however, I have done similar queries on 3.23.56 (pretty sure I have anyway).

Scott

ikbenhet1

7:46 pm on Jun 17, 2003 (gmt 0)

10+ Year Member



I have done all queries in phpmyadmin so far.
i think i found out the problem.

phpmyadmin is showing this on the bottom:

Error
The additional Features for working with linked Tables have been deactivated. To find out why click here.
(when i click this comes up:)
PMA Database ... not OK[ Documentation ]
General relation features Disabled
So i've read documentation, it sais to set cfg['Servers'][$i]['pmadb'] string to the name of the database.

So i did that, but now it returns this:

$cfg['Servers'][$i]['pmadb'] ... OK
$cfg['Servers'][$i]['relation'] ... not OK [ Documentation ]
General relation features: Disabled
$cfg['Servers'][$i]['table_info'] ... not OK [ Documentation ]
Display Features: Disabled
$cfg['Servers'][$i]['table_coords'] ... not OK [ Documentation ]
$cfg['Servers'][$i]['pdf_pages'] ... not OK [ Documentation ]
Creation of PDFs: Disabled
$cfg['Servers'][$i]['column_info'] ... not OK [ Documentation ]
Displaying Column Comments: Disabled
Bookmarked SQL-query: Disabled
MIME ... not OK [ Documentation ]
$cfg['Servers'][$i]['history'] ... not OK [ Documentation ]
SQL-history: Disabled

So what is trying to say?

I think your queries must be ok if they work on your side.
So my phpmyadmin is not properly working?

I'm gooing to try to run it directly from a .php file, to see if that does work.

<added> I've tried it, and it does work! Thank you very much!

BCMG_Scott

3:11 pm on Jun 18, 2003 (gmt 0)

10+ Year Member



Hmmm - well I am glad it worked, but I am not sure why those errors in pma would cause it not to work. I tend to think that they are unrelated. Those errors are "optional" components of pma and should not have an impact on general SQL queries. For that very reason I tend to favor using the command line. pma is nice for somethings, but it can get in the way.

The docs should also give you the create table syntax to create those tables for the pma optional components. You'll not only need to have the database, but you will also need to create the required tables. I have created and re-created pma_table_coords and I still get that not Ok error - haven't dug into it to figure out why (not a priority).

Scott