Extracting value in between specific tag

Extracting value in between tags of a string

         

kenchix1

10:03 am on Mar 4, 2009 (gmt 0)

10+ Year Member



I have a MySQL database where people submit their messages. Part of those messages are images, which are enclosed in HTML tags. What I would like to do is extract ONLY the values that are inside the <img> tags.

What I did was explode the whole string into an array, loop through it, and check whether each chunk starts with "<img>" and ends with "</img>". But this process is very slow, especially if a user puts a lot of remarks in their post.

Is there a better and faster way to do it?

Thanks.

kenchix1

10:11 am on Mar 4, 2009 (gmt 0)

10+ Year Member



Edit:

It's not <img> but rather [img].

coopster

7:03 pm on Mar 4, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Are you asking how to locate rows with [img] codes in the stored column value? Or how to extract the codes after selecting and reading through the returned column value?

kenchix1

2:04 am on Mar 5, 2009 (gmt 0)

10+ Year Member



I am asking how to extract all values inside [img] tags after reading through the returned column value.

For example, if I have 20 rows and there are 50 [img] tags in them, then I should get all 50 values inside those [img] tags.

Thanks. :)

coopster

1:14 pm on Mar 5, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



OK. So you are retrieving rows from the table no matter if they have the [img] pattern in them or not. Then you loop through the result set and analyze a certain column's value for the pattern.

It sounds as if you are appending each column to a string value first, then analyzing the string. Perhaps it would be faster to analyze each column as you process it within the loop rather than append it to one very large string. Pseudocode ...

$imgs = array(); // initialize the result list
$pattern = <your pattern here, with subpattern capturing parentheses>;
while (looping through sql results) {
    if (preg_match_all($pattern, $row['columnWithPossibleImgs'], $matches)) {
        foreach ($matches[1] as $m) { // $matches[1] holds the captured subpattern
            $imgs[] = $m;
        }
    }
}

kenchix1

5:58 am on Mar 7, 2009 (gmt 0)

10+ Year Member



I think that pseudocode will really help a lot. I didn't use preg_match_all because I don't know how to create a pattern for "[img]" and "[/img]", which is why I ended up using substr and why I exploded the string into an array. I tried many times to create a pattern from the samples on the PHP site but still couldn't get it to work. I just loop over the array and look for "[img]" and "[/img]".

The preg_match_all pattern will be a great help.

Thanks for the help.

coopster

7:43 pm on Mar 7, 2009 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



$pattern = "/\[img]([^\[]+)\[\/img]/";
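A minimal sketch of how this pattern behaves with preg_match_all. The sample rows are made up for illustration and stand in for the column values read from the database; extractImgValues() is a hypothetical helper name, not something from this thread.

```php
<?php
// Hypothetical helper: collect every value found between [img] and [/img]
// across a list of strings (one string per database row).
function extractImgValues(array $rows)
{
    $pattern = "/\[img]([^\[]+)\[\/img]/";
    $imgs = array();
    foreach ($rows as $body) {
        // preg_match_all returns the number of full matches (0 if none)
        if (preg_match_all($pattern, $body, $matches)) {
            // $matches[1] holds the text captured by the parentheses
            foreach ($matches[1] as $m) {
                $imgs[] = $m;
            }
        }
    }
    return $imgs;
}

// Made-up sample data standing in for the "body" column
$rows = array(
    "Look at this [img]http://example.com/a.jpg[/img] and [img]http://example.com/b.png[/img]",
    "No images in this post.",
    "One more: [img]http://example.com/c.gif[/img]",
);

print_r(extractImgValues($rows));
```

The pattern is case-sensitive and needs at least one character between the tags, so "[IMG]...[/IMG]" and an empty "[img][/img]" are both skipped. If posts may contain uppercase tags, adding the i modifier after the closing delimiter makes the match case-insensitive.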

kenchix1

2:20 am on Mar 11, 2009 (gmt 0)

10+ Year Member



Thank you! :) I'll modify the code, remove the string-exploding and looping portion, and replace it with preg_match_all and your pattern.

Many thanks again sir.

kenchix1

3:58 pm on Mar 12, 2009 (gmt 0)

10+ Year Member



<?php
include "config.php"; // expected to define $hostname, $dbUser, $dbUserPw, $dbName

$conn = mysql_connect($hostname, $dbUser, $dbUserPw) or die(mysql_error());
mysql_select_db($dbName, $conn) or die(mysql_error());

$qry = "SELECT body FROM smf_messages WHERE id_board=1100";

$result = mysql_query($qry, $conn) or die(mysql_error());
echo mysql_num_rows($result);

$imgs = array(); // collected image URLs
$pattern = "/\[img]([^\[]+)\[\/img]/";
while (list($body) = mysql_fetch_row($result)) {
    if (preg_match_all($pattern, $body, $matches)) {
        foreach ($matches[1] as $m) { // $matches[1] holds the captured subpattern
            $imgs[] = $m;
        }
    }
}

foreach ($imgs as $img) {
    // createThumb() creates a local thumbnail and saves the filename in the database
    $mtmp2 = createThumb($img);
}
mysql_close($conn);

?>

I have to run the script regularly using cron to scan board #1100 (SMF), then save the thumbnails and the locations of the images in the database. My problems with this code are that I don't know whether an image URL is still alive, and I don't know how many server resources it will eat once the number of posted images gets large.

My purpose in writing this code is to scan board #1100, where users post their images. I inserted code in the boardindex that displays a few thumbnails, each linking to its original location, every time the page loads.

Any suggestions to improve it?

Thanks again.

BradleyT

7:01 pm on Mar 13, 2009 (gmt 0)

10+ Year Member



You're pretty much writing a webbot or spider.

The book Webbots, Spiders, and Screen Scrapers will tell you pretty much exactly what you need to do. It has a specific spider that is used to harvest images too. I haven't used that specific one but I've created a few others from the book and they're pretty great.
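Short of a full spider, the two concerns raised earlier (dead image URLs and a growing workload) could be approached roughly as sketched here. All three function names are hypothetical, and the list of already-processed URLs is assumed to come from wherever the thumbnails are recorded; none of this is taken from the book or from SMF.

```php
<?php
// Hypothetical helper: pull the final HTTP status code out of a header list
// such as the one get_headers() returns. Redirects add extra status lines,
// so the last "HTTP/..." line wins. Returns 0 if no status line is found.
function finalStatusCode(array $headers)
{
    $code = 0;
    foreach ($headers as $line) {
        if (is_string($line) && preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Hypothetical helper: true when the URL currently answers with a 2xx status.
function isUrlAlive($url)
{
    $headers = @get_headers($url); // false when the request fails outright
    if ($headers === false) {
        return false;
    }
    $code = finalStatusCode($headers);
    return $code >= 200 && $code < 300;
}

// Hypothetical helper: keep only URLs not handled on earlier runs, without duplicates.
function newUrlsOnly(array $found, array $alreadyProcessed)
{
    return array_values(array_diff(array_unique($found), $alreadyProcessed));
}
```

In the cron script, filtering with newUrlsOnly() before calling createThumb() would mean each run only touches images it has not seen before. isUrlAlive() sends a request to the remote server, so each check costs a network round trip and is best run on that reduced list.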

kenchix1

3:16 pm on Mar 15, 2009 (gmt 0)

10+ Year Member



Thanks for the info ! :)