I am trying to set up a simple web crawler that goes out and crawls specific URLs. I have never built or configured a crawler before, so I could use a nudge in the right direction. I found some material on wget, but I am not sure that is the right approach. Any suggestions would be helpful in my quest for information on creating or configuring one. Thanks!
There are two indispensable books from O'Reilly: Perl & LWP and Spidering Hacks.
The first talks about building spiders and parsing pages. The second really gets into the specifics of getting information out of pages.
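To give a feel for the territory those books cover, here is a bare-bones page fetch with LWP; the URL and user-agent string are just placeholders, not anything the books prescribe:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch a single page; the URL is a placeholder.
my $ua  = LWP::UserAgent->new( agent => 'MyCrawler/0.1' );
my $res = $ua->get('http://example.com/');

if ( $res->is_success ) {
    print $res->decoded_content;    # raw HTML, ready for parsing
}
else {
    die 'Fetch failed: ' . $res->status_line . "\n";
}
```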
Others have mentioned using PHP and LWP. I really recommend WWW::Mechanize. It's based on LWP, so you can always fall back to it, but it adds a bunch of conveniences for following links and submitting forms.
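For instance, here is a minimal sketch of a crawler that walks a fixed list of URLs with WWW::Mechanize and prints every link it finds; the URLs are placeholders for whatever pages you actually want to crawl:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize;

# Placeholder URLs; substitute the specific pages you want to crawl.
my @urls = ( 'http://example.com/', 'http://example.org/' );

my $mech = WWW::Mechanize->new( autocheck => 0 );

for my $url (@urls) {
    $mech->get($url);
    unless ( $mech->success ) {
        warn "Failed to fetch $url: " . $mech->status . "\n";
        next;
    }

    # links() returns a WWW::Mechanize::Link object for each link on
    # the page; url_abs() resolves relative links to absolute URLs.
    print $_->url_abs, "\n" for $mech->links;
}
```

With autocheck turned off you handle failed fetches yourself; leave it on and get() will simply die on any HTTP error.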