I need to create a PHP script for my company that generates a report on all the links in a Web site. It should show broken links, redirects, external links, etc.
I tried using Snoopy (from SourceForge), first to gather all the links into an array, then connecting with fgets() to fetch the headers.
That approach turned out to be useless: 1) it's extremely slow, and 2) it returns a 500 error for most Web sites.
What can I do? There are online (CGI) scripts out there that do the job in a few seconds, but they are not free.
Can PHP (and the WebMasterWorld community) help me do what I want?
You might be able to just wget -r the whole site and then grep -P for a link pattern. Read the matches into an array, then foreach over it to decide whether each link is internal or external.
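In PHP that part might look something like this rough sketch (I'm assuming the site has already been mirrored, e.g. with wget -r, into a local directory I'm calling ./mirror/ here; the regex is deliberately crude):

<?php
// Rough sketch: pull href values out of mirrored HTML files and
// sort them into internal and external links.
// Only scans one directory level; a real mirror has subdirectories.

$internal = array();
$external = array();

foreach (glob('./mirror/*.html') as $file) {
    $html = file_get_contents($file);

    // Crude pattern: only catches double-quoted href attributes.
    preg_match_all('/href="([^"]+)"/i', $html, $matches);

    foreach ($matches[1] as $link) {
        if (preg_match('#^https?://#i', $link)) {
            $external[] = $link;   // absolute URL -> treat as external
        } else {
            $internal[] = $link;   // relative URL -> internal
        }
    }
}

$internal = array_unique($internal);
$external = array_unique($external);
?>

Note that an absolute URL pointing back at your own domain would be misclassified as external here; you'd want an extra check against your host name.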
Internal links can be checked with file_exists(); external links can be done by requesting HEAD on the URL. The biggest hangup is probably going to be DNS failures before you get the HEAD result. Set a timeout that reflects the beefiness of your company server; 1 or 2 seconds should probably be fine.
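Here's a rough, untested sketch of the HEAD check (head_status() is just a name I made up; it's port 80 only, the URL parsing is simplified, and on some PHP builds the connect timeout does not cover the DNS lookup itself, which is exactly the hangup mentioned above):

<?php
// Rough sketch: send a HEAD request and return the status line,
// e.g. "HTTP/1.1 200 OK", or false on failure.

function head_status($url, $timeout = 2) {
    $parts = parse_url($url);
    if (!isset($parts['host'])) {
        return false;
    }
    $host = $parts['host'];
    $path = isset($parts['path']) ? $parts['path'] : '/';

    $fp = @fsockopen($host, 80, $errno, $errstr, $timeout);
    if (!$fp) {
        return false;  // bad DNS, connection refused, or timed out
    }

    // Apply the same timeout to the read as to the connect.
    stream_set_timeout($fp, $timeout);
    fputs($fp, "HEAD $path HTTP/1.0\r\nHost: $host\r\nConnection: close\r\n\r\n");
    $status = fgets($fp, 128);  // first line is the status line
    fclose($fp);

    return $status;
}

// Example: anything other than a 200 gets flagged for the report.
echo head_status('http://example.com/');
?>

The internal links from the array can then simply be run through file_exists() against your document root instead.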
Do you mean dump all the HTML with file_get_contents() and search for href?
Not using sockets?
We want this tool to be available online on our company Web site for people to use; our site is hosted remotely.
Can you kindly show me some code/examples?
[edited by: coopster at 3:58 pm (utc) on Oct. 27, 2005]
[edit reason] removed url [/edit]
By the way, wget also has a spider mode which just checks if a link exists...