Detecting a site's home page and grabbing links

flashkid

1:00 pm on Oct 26, 2003 (gmt 0)

I'm making a PHP script that grabs the links out of a given site, for example:
"http://www.microsoft.com"
I want to be able to:
1- detect the site's home page (even if the URL I'm given points inside another directory)
2- make a regular expression match for
<a ...some attributes here... href=....>LINK</a>
My regex is:
preg_match_all("!(<a *href=([^>]*)>)([^<]*)(<\/a>)!si",$string,$matches);
but this doesn't grab a link like:
<a class="mnpGlobalToolbarLink" dir="LTR" href="http://support.microsoft.com" target="_parent" guid="m1b3560fa9d8189ce386c2508f94c616f">Support</a>
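The failure can be reproduced against that sample tag. In the pattern above, " *" between "a" and "href" only allows spaces, so any attribute that comes before href (here class and dir) stops the match. A minimal sketch, assuming the only change wanted is to let "\s[^>]*" skip those leading attributes; the variable names are just for illustration:

```php
<?php
// Sample tag copied from the question.
$html = '<a class="mnpGlobalToolbarLink" dir="LTR" href="http://support.microsoft.com" target="_parent" guid="m1b3560fa9d8189ce386c2508f94c616f">Support</a>';

// Original pattern: " *" only allows spaces between "a" and "href".
$original = '!(<a *href=([^>]*)>)([^<]*)(<\/a>)!si';

// Adjusted pattern: "\s[^>]*" skips any attributes that come before href.
$fixed = '!(<a\s[^>]*href=([^>]*)>)([^<]*)(</a>)!si';

$before = preg_match_all($original, $html, $m1);
$after  = preg_match_all($fixed, $html, $m2);

echo "original: $before match(es)\n";                     // original: 0 match(es)
echo "fixed:    $after match(es), text = {$m2[3][0]}\n";  // fixed:    1 match(es), text = Support
```

Group 2 still swallows everything after "href=" up to the closing ">", including target and guid, so the URL itself needs one more step to isolate.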
Thanx
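For point 1, one possible reading of "home page" is the scheme and host of whatever URL is given, no matter which directory it points into. A sketch under that assumption, using PHP's parse_url(); the function name siteHomePage and the sample path are hypothetical, and a site whose "/" redirects somewhere else would need an actual HTTP request to confirm:

```php
<?php
// Hypothetical helper: treat "scheme://host[:port]/" as the site's home page.
// Returns null when the string is not an absolute URL with a host.
function siteHomePage(string $url): ?string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $scheme = $parts['scheme'] ?? 'http';
    $port   = isset($parts['port']) ? ':' . $parts['port'] : '';
    return $scheme . '://' . $parts['host'] . $port . '/';
}

// The directory and file in this URL are made up.
echo siteHomePage('http://www.microsoft.com/some/dir/page.html'), "\n";
// http://www.microsoft.com/
```

A bare "www.microsoft.com" without "http://" has no host as far as parse_url() is concerned, so it returns null here.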

[edited by: jatar_k at 5:32 pm (utc) on Oct. 27, 2003]
[edit reason] turned off smilies [/edit]

jatar_k

5:33 pm on Oct 27, 2003 (gmt 0)

Welcome to WebmasterWorld, flashkid.

Since I am no regex expert, maybe one of the folks will see this shameless mod *bump*. I turned off the smilies in your post so the pattern could be read properly.

vincevincevince

5:48 pm on Oct 27, 2003 (gmt 0)


// $url must hold the page's HTML; group 1 is the raw href value, quotes included
preg_match('/<\s*a\s[^>]*href\s*=\s*([^\s>]+)/i', $url, $matches);
echo $matches[1];

(untested)
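Following the same idea as the pattern above, preg_match_all() can collect every link on a page together with its text. A sketch that assumes reasonably well-formed tags (no ">" inside attribute values, no spaces inside href values); the function name extractLinks and the second sample link are hypothetical:

```php
<?php
// Sketch: collect the href value and link text of every <a ...> tag.
// Assumes no ">" inside attribute values and no spaces inside href values.
function extractLinks(string $html): array
{
    // Group 1: optional opening quote, group 2: href value,
    // \1: the matching closing quote, group 3: inner HTML of the link.
    $pattern = '/<a\s[^>]*?href\s*=\s*(["\']?)([^"\'\s>]*)\1[^>]*>(.*?)<\/a>/is';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $m) {
        // Strip any tags nested in the link text (e.g. <b>) and trim whitespace.
        $links[] = ['href' => $m[2], 'text' => trim(strip_tags($m[3]))];
    }
    return $links;
}

// The first tag is the sample from the question; the second is made up.
$html = '<a class="mnpGlobalToolbarLink" dir="LTR" href="http://support.microsoft.com" target="_parent" guid="m1b3560fa9d8189ce386c2508f94c616f">Support</a>'
      . ' <a href=/downloads/>Downloads</a>';

$links = extractLinks($html);
foreach ($links as $link) {
    echo $link['href'], ' => ', $link['text'], "\n";
}
// http://support.microsoft.com => Support
// /downloads/ => Downloads
```

Relative hrefs such as "/downloads/" come back exactly as written; prefixing them with the site's home page from point 1 of the question would turn them into absolute URLs.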