If you're curl-ing links from "some location", say, directory crons:
/var/www/virtuals/example.com/httpdocs/crons
The "base href" you are looking for, I am guessing, is actually httpdocs, which is also the domain root. index.html in httpdocs is your main page.
Your curl returns links:
<a href="some-file.html">Some file</a>
If you want those at the domain root, just use a PHP string function or preg_replace to add a leading slash:
<a href="/some-file.html">Some file</a>
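A minimal sketch of the preg_replace route, assuming the fetched page sits in the domain root so that every relative href can simply be prefixed with a slash. The function name rootRelativeLinks is made up for illustration, and the pattern only covers double-quoted href attributes.

```php
<?php
// Hypothetical helper: prefix relative hrefs with "/" so they resolve from the domain root.
// Assumption: the page the links came from lives in the domain root.
function rootRelativeLinks(string $html): string
{
    // Rewrite href="..." values that do not start with a scheme, "/", "#" or "?".
    return preg_replace(
        '@href="(?![a-z][a-z0-9+.-]*:|/|#|\?)([^"]+)"@i',
        'href="/$1"',
        $html
    );
}

echo rootRelativeLinks('<a href="some-file.html">Some file</a>'), "\n";
// prints: <a href="/some-file.html">Some file</a>
```

Links that are already absolute, root-relative, or that only carry a fragment or query string are left untouched.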
Hope that works, crystal ball is a little cloudy today . . .
THIS IS THE PROBLEM
4. properly structuring the links provided
Some example links:
/
here.html
/here.html
?gp=1
I'm trying to turn each link into a usable lead so the spider can run automatically. If I just took the links as they are, the cURL script would hit errors.
I have this so far. If anyone can add to it or show how to properly structure href links, I would greatly appreciate it.
I've even thought of only displaying the links and then using JavaScript to get them all. They would be properly formatted, but I really want to keep it server side so I can make it a cron job.
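For the server-side route, a minimal sketch of pulling the raw href values out of fetched HTML with PHP's DOMDocument, so no browser or JavaScript is involved. It assumes the dom extension is available; the helper name extractHrefs and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: collect raw href values from fetched HTML on the server side.
function extractHrefs(string $html): array
{
    $dom = new DOMDocument();
    // Real-world markup is often sloppy; keep libxml warnings out of the output.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $hrefs = [];
    foreach ($dom->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $hrefs[] = trim($a->getAttribute('href'));
        }
    }
    return $hrefs;
}

// Sample markup standing in for the body returned by cURL.
$html = '<p><a href="/">Home</a> <a href="here.html">Here</a> <a href="?gp=1">Page</a></p>';
print_r(extractHrefs($html));
```

The values come back exactly as written in the page, so they still need to be resolved against the URL of the page they were found on before being handed back to cURL.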
SNIPPET OF CURRENT LINK FORMATTING:
/*FORMAT URL LINKS: $url is the href from the page, $focusurl is the page cURL fetched*/
function newleads($url, $focusurl){
/*already absolute*/ if (preg_match('@^https?://@i', $url)){return $url;}
/*skip empty, fragment, mailto: and javascript: links*/ if ($url == '' || preg_match('@#|^(mailto|javascript):@i', $url)){return null;}
/*scheme + host of the current page (http or https)*/ if (!preg_match('@^https?://[^/?#]+@i', $focusurl, $matches)){return null;}
$root = $matches[0];
/*1st / : root-relative*/ if ($url[0] == '/'){return $root.$url;}
/*current page without query string or fragment*/ $page = preg_replace('@[?#].*$@', '', $focusurl);
/*1st ? : query string on the current page*/ if ($url[0] == '?'){return $page.$url;}
/*anything else: relative to the current page's directory*/
$path = substr($page, strlen($root));
$slash = strrpos($path, '/');
return $root.($slash === false ? '/' : substr($path, 0, $slash + 1)).$url;
}
When calling the above function, $url is the link obtained from DOM access and $focusurl is the current page accessed by cURL.
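One way to cover all four link shapes listed earlier ("/", "here.html", "/here.html", "?gp=1") is to resolve each href against the URL of the page it was found on, using parse_url. This is only a sketch under stated assumptions: resolveLead is a hypothetical name, credentials in the base URL are ignored, and the dot-segment handling is a simplification rather than a full RFC 3986 implementation.

```php
<?php
// Hypothetical resolver (not from the thread): turn an href found on the page
// $base into an absolute URL, or return null for links a spider should skip.
function resolveLead(string $href, string $base): ?string
{
    // Fragments never reach the server, so drop them first.
    $href = explode('#', trim($href), 2)[0];
    if ($href === '' || preg_match('@^(mailto|javascript|tel):@i', $href)) {
        return null;
    }
    if (preg_match('@^https?://@i', $href)) {
        return $href; // already absolute
    }

    $b = parse_url($base);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        return null; // the base itself must be absolute
    }
    $root = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';

    if (substr($href, 0, 2) === '//') {
        return $b['scheme'] . ':' . $href; // protocol-relative: //host/path
    }
    if ($href[0] === '?') {
        return $root . $basePath . $href; // same page, new query string
    }
    if ($href[0] !== '/') {
        // Relative link: resolve against the directory of the current page.
        $href = substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;
    }

    // Collapse "." and ".." segments in the path, leaving any query string alone.
    $parts = explode('?', $href, 2);
    $segments = [];
    foreach (explode('/', ltrim($parts[0], '/')) as $segment) {
        if ($segment === '..') {
            array_pop($segments);
        } elseif ($segment !== '.') {
            $segments[] = $segment;
        }
    }
    $query = isset($parts[1]) ? '?' . $parts[1] : '';
    return $root . '/' . implode('/', $segments) . $query;
}

// Example page URL (an assumption for illustration).
$page = 'http://example.com/dir/page.html';
foreach (['/', 'here.html', '/here.html', '?gp=1'] as $link) {
    echo $link, ' => ', resolveLead($link, $page), "\n";
}
```

With that example page, the four links resolve to http://example.com/, http://example.com/dir/here.html, http://example.com/here.html and http://example.com/dir/page.html?gp=1. Returning null for mailto:, javascript: and fragment-only links lets the calling loop skip them without a separate filter.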