$line =~ m/http\:\/\/([^:\/]+.*\")/i;
$url = $1;
$wholeLink = "http\:\/\/" . "$url\n";
Trouble is, I'm still not very good with regex :) so I only half understand what I'm doing here. The link that gets extracted from the target file includes the trailing quote from the HTML, and I want to capture everything up to, but not including, that quote.
Any tips or pointers? (Must be something silly I'm missing, I know it!)
Try something like this:
m{href="(.*?)"}ig
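I use "href" instead of "http" to find my links so I don't miss the relative ones. The "g" after the regex is in case there's more than one link per line, and the "i" makes the match case-insensitive. The "?" makes the ".*" non-greedy, so the capture stops at the first closing quote instead of running on to the last quote on the line.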
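For what it's worth, here is a minimal sketch of how that pattern might be used to pull every link out of a line. The sub name extract_links and the sample line are made up for illustration, and it assumes the href values are always wrapped in double quotes:

```perl
use strict;
use warnings;

# Hypothetical helper: return every href value found in one line of HTML.
# Assumes attribute values are double-quoted; single-quoted or unquoted
# hrefs will not match.
sub extract_links {
    my ($line) = @_;
    # In list context with /g, the match returns all captured groups.
    # (.*?) is non-greedy, so each capture stops at the first closing quote.
    my @links = $line =~ m{href="(.*?)"}ig;
    return @links;
}

# Made-up sample line, just to show the call.
my $line = '<a href="http://example.com/a.html">A</a> <a HREF="/b.html">B</a>';
print "$_\n" for extract_links($line);
```

Because the quotes sit outside the parentheses, the captured text never includes them, which should take care of the trailing-quote problem in the original pattern.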
Hope it helps.