Forum Moderators: coopster


Link scan script

Link-checker script in PHP

         

anshul

8:14 am on Oct 26, 2005 (gmt 0)

10+ Year Member



Hi all,

I need to create a PHP script for my company that generates all the information about the links on our Web site. It should show missing links, redirects, external links, etc.

I tried using Snoopy (from SourceForge): first to get all the links into an array, then connecting via fgets() to fetch the headers.

This process is absolute nonsense: 1) it's extremely slow, and 2) it causes a 500 error for most Web sites.

What can I do? There are already online (CGI) scripts out there that do the required task in a few seconds, but they're not free.

Can PHP (and the WebmasterWorld community) help me do what I want to?

soflution

8:15 pm on Oct 26, 2005 (gmt 0)

10+ Year Member



The first thing is to run it from your company web server; that way access times are going to be negligible.

You might be able to just wget -r the whole site, and then grep -P for a link pattern. Read the matches into an array, then loop over it with foreach and decide whether each link is internal or external.

Internal links can be checked with file_exists(); external links can be checked by requesting HEAD on the URL. The biggest hangup is probably going to be DNS failures before you get the HEAD result. Set a timeout that reflects the beefiness of your company server; 1 or 2 seconds should be fine.
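
Something along these lines, for example (an untested sketch; $base, $docroot, and the crude href regex are placeholders you'd adapt to your own site; relative links and query strings aren't handled properly here):

<?php
// Rough sketch: pull hrefs out of one page, then classify and check each one.
$base    = 'http://www.example.com';   // the site being checked
$docroot = '/var/www/html';            // its document root on this server

$html = file_get_contents($base . '/');
preg_match_all('/href="([^"]+)"/i', $html, $matches);

foreach ($matches[1] as $link) {
    if (preg_match('#^https?://#i', $link) && strpos($link, $base) !== 0) {
        // External link: request just the headers, with a short timeout.
        $parts = parse_url($link);
        $path  = isset($parts['path']) ? $parts['path'] : '/';
        $fp = @fsockopen($parts['host'], 80, $errno, $errstr, 2); // 2-second timeout
        if (!$fp) {
            echo "DEAD (no connection): $link\n";
            continue;
        }
        fputs($fp, "HEAD $path HTTP/1.0\r\nHost: {$parts['host']}\r\n\r\n");
        $status = fgets($fp, 128);   // status line, e.g. "HTTP/1.1 404 Not Found"
        fclose($fp);
        echo trim($status) . "  $link\n";
    } else {
        // Internal link: just look for the file on disk.
        $file = $docroot . '/' . ltrim($link, '/');
        echo (file_exists($file) ? 'OK   ' : 'DEAD ') . $link . "\n";
    }
}
?>

The status line already tells you about redirects (301/302) as well as missing pages (404), which covers what you asked for.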

anshul

6:30 am on Oct 27, 2005 (gmt 0)

10+ Year Member



I couldn't comprehend much of what you're saying.
What does "wget -r the whole site" mean?

Do you mean dumping all the HTML with file_get_contents() and searching for href?

Without using sockets?

We want this tool online on our company Web site for people to use; our site is hosted remotely.

Can you kindly show me some code/examples?

[edited by: coopster at 3:58 pm (utc) on Oct. 27, 2005]
[edit reason] removed url [/edit]

vincevincevince

11:43 am on Oct 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



wget -r probably refers to a recursive wget (web get), and you can run it using exec() or backticks (` `). It would download the whole website, following all links, and then allow you to search through the whole thing looking for those links.

By the way, wget also has a spider mode which just checks if a link exists...
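
Something like this, for example (untested; it assumes wget is installed on the server, and older wget versions don't spider recursively very well):

<?php
// Spider mode: wget checks the links but doesn't save the files.
// -r recurses, -l 2 limits the depth; 2>&1 folds wget's log (written
// to stderr) into the output array that exec() fills, one line each.
exec('wget --spider -r -l 2 http://www.example.com/ 2>&1', $log);

// wget reports each response with a line such as
// "HTTP request sent, awaiting response... 404 Not Found"
foreach ($log as $line) {
    if (strpos($line, '404 Not Found') !== false) {
        echo $line . "\n";
    }
}
?>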

anshul

10:47 am on Oct 31, 2005 (gmt 0)

10+ Year Member



I'm not sure I follow what you're saying; I tried:
// exec() alone returns only the last line of output, and wget writes its
// progress to stderr, so capture everything into an array instead:
exec('wget -r http://www.example.com/ 2>&1', $output);
print_r($output);

anshul

10:58 am on Nov 5, 2005 (gmt 0)

10+ Year Member



What about cURL?
How much can it help in this regard?
(I recently installed it on my local machine.)

There is insufficient documentation about it on the PHP Web site.
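
For what it's worth, a HEAD check with cURL looks roughly like this (untested; check_link is just an illustrative name, and the 5-second timeout is arbitrary):

<?php
// Minimal sketch of a link check via cURL: HEAD request, short timeout.
function check_link($url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD: headers only, no body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // don't echo the response
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);            // give up after 5 seconds
    curl_exec($ch);
    $code = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 on DNS/connect failure
    curl_close($ch);
    return $code;
}

echo check_link('http://www.example.com/');  // e.g. 200, 301, 404, or 0
?>

A 301 or 302 flags a redirect, 404 a missing page, and 0 usually means a DNS or connection failure, which covers everything the original post asked to report.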