

bulk fopen();

         

gilmour

4:54 am on May 13, 2003 (gmt 0)




My new pet project is a site directory, and I am working on some dead-link tools to automate its upkeep.

I wrote a script that iterates through the URLs, checks each one with fopen(), and flags and de-activates the dead ones. There are currently 2K links, and I don't expect it to grow beyond 10K.
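The check-and-flag loop described above could be sketched like this. It is a minimal sketch, not the original script: `link_is_alive()` and `find_dead_links()` are hypothetical helpers, and the rule that any 2xx/3xx status counts as "alive" is an assumption. The actual HTTP check is passed in as a callable, so the same loop works whether it is backed by fopen(), fsockopen() or cURL.

```php
<?php
// Hypothetical helpers for the "check, flag, de-activate" loop.

// Assumed rule: 2xx or 3xx counts as alive; a 4xx/5xx status, or no
// response at all (null: DNS failure, refused connection, timeout), is dead.
function link_is_alive(?int $status): bool
{
    return $status !== null && $status >= 200 && $status < 400;
}

// Run $check on every URL ($check returns a status code or null)
// and return the URLs that should be flagged and de-activated.
function find_dead_links(array $urls, callable $check): array
{
    $dead = [];
    foreach ($urls as $url) {
        if (!link_is_alive($check($url))) {
            $dead[] = $url;
        }
    }
    return $dead;
}

// Usage with a stand-in checker, so no network access is needed.
$fakeCheck = function (string $url): ?int {
    $known = ['http://a.example/' => 200, 'http://b.example/' => 404];
    return $known[$url] ?? null;
};
$dead = find_dead_links(
    ['http://a.example/', 'http://b.example/', 'http://c.example/'],
    $fakeCheck
);
```

Keeping the status-to-verdict rule in one small function makes it easy to change later, for example if you decide redirects should be flagged for review instead of treated as alive.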

fopen() seems to hang, even after I broke the while loop over my query results into smaller batches. From what I understand, fopen() tends to have trouble with paths that end in a directory rather than an explicit file name.
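One possible cause of the hang is that, with no timeout set, PHP's HTTP wrapper waits up to `default_socket_timeout` (60 seconds by default) on each unresponsive server. A minimal sketch, assuming `allow_url_fopen` is enabled, passes fopen() a stream context that sends a HEAD request with a short timeout, then reads the status code from the response headers. `check_with_fopen()` and `status_from_headers()` are hypothetical names.

```php
<?php
// Sketch only; assumes allow_url_fopen = On.

// Return the status code from the last "HTTP/x.y NNN ..." header line.
// When redirects are followed, PHP records every response's headers,
// so the last status line belongs to the final response.
function status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Open the URL with a HEAD request and a timeout; null means no response.
function check_with_fopen(string $url, float $timeout = 5.0): ?int
{
    $context = stream_context_create([
        'http' => [
            'method'        => 'HEAD',   // headers only, skip the body
            'timeout'       => $timeout, // seconds, instead of default_socket_timeout
            'ignore_errors' => true,     // still open the stream on 4xx/5xx
        ],
    ]);
    $fp = @fopen($url, 'r', false, $context);
    if ($fp === false) {
        return null; // DNS failure, refused connection, timeout, ...
    }
    $meta = stream_get_meta_data($fp);   // for http://, wrapper_data holds the headers
    fclose($fp);
    return status_from_headers($meta['wrapper_data'] ?? []);
}
```

With `ignore_errors` on, a 404 page still opens and reports 404, so a dead link is distinguishable from a server that never answered (null).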

fsockopen() seems to be more robust; however, I don't really need all of its features (detecting 302s, etc.).
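For comparison, a bare-bones fsockopen() check only needs to send a HEAD request and read the first line of the reply. This sketch handles plain `http://` URLs only (no TLS, no redirect following: a 302 is simply returned as 302 for the caller to judge). `build_head_request()` and `check_with_socket()` are hypothetical names. Note that a URL with no path, such as a bare domain, is requested as `/`.

```php
<?php
// Sketch only: plain http:// URLs, no TLS, no redirect handling.

// Turn a URL into [host, port, raw HEAD request], or null if unusable.
function build_head_request(string $url): ?array
{
    $parts = parse_url($url);
    if ($parts === false || ($parts['scheme'] ?? '') !== 'http' || !isset($parts['host'])) {
        return null;
    }
    $path = ($parts['path'] ?? '') === '' ? '/' : $parts['path']; // bare domain -> "/"
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }
    $port = $parts['port'] ?? 80;
    $hostHeader = isset($parts['port']) ? "{$parts['host']}:{$port}" : $parts['host'];
    $request = "HEAD {$path} HTTP/1.0\r\n"
             . "Host: {$hostHeader}\r\n"
             . "Connection: close\r\n\r\n";
    return [$parts['host'], $port, $request];
}

// Send the HEAD request and return the status code, or null on failure.
function check_with_socket(string $url, float $timeout = 5.0): ?int
{
    $target = build_head_request($url);
    if ($target === null) {
        return null;
    }
    [$host, $port, $request] = $target;
    $fp = @fsockopen($host, $port, $errno, $errstr, $timeout); // connect timeout
    if ($fp === false) {
        return null;
    }
    stream_set_timeout($fp, (int) ceil($timeout));             // read timeout
    fwrite($fp, $request);
    $statusLine = fgets($fp);                                  // e.g. "HTTP/1.1 200 OK"
    fclose($fp);
    if ($statusLine !== false && preg_match('#^HTTP/\S+\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}
```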

Any advice would be appreciated.

lorax

4:04 pm on May 13, 2003 (gmt 0)




I'm actually working on this very issue. You might want to read through [webmasterworld.com...], as there is a nice solution provided there using the cURL library.
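A cURL-based check might look like the sketch below. It assumes the curl extension is installed, and `check_with_curl()` is a hypothetical name, not the code from the linked thread. `CURLOPT_NOBODY` turns the request into a HEAD, and the connect and total timeouts keep one slow server from stalling the whole run.

```php
<?php
// Sketch only; assumes the curl extension is loaded.

// Return the final HTTP status for $url, or null if no response arrived.
function check_with_curl(string $url, int $timeout = 10): ?int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,     // HEAD request, don't download the page
        CURLOPT_RETURNTRANSFER => true,     // keep any output off stdout
        CURLOPT_FOLLOWLOCATION => true,     // follow 301/302 to the final page
        CURLOPT_MAXREDIRS      => 5,        // but not forever
        CURLOPT_CONNECTTIMEOUT => 5,        // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout, // cap on the whole request
    ]);
    $ok = curl_exec($ch);                              // false on transport failure
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);   // 0 if no HTTP response
    // The handle is released when $ch goes out of scope.
    return ($ok === false || $status === 0) ? null : (int) $status;
}
```

Because `CURLOPT_FOLLOWLOCATION` is on, a 302 that leads to a live page reports the final status (for example 200); turn it off if you would rather flag redirects separately.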

BCMG_Scott

9:21 pm on May 13, 2003 (gmt 0)




Another option: you can use the CPAN module WWW::Robot and write your own spider/lint/robot. I did this to help with indexing/spidering my sites (for a search engine) and to do some link checking. I used a MySQL database as the back end so that I can run nice SQL queries against the data. I have around 1,100 pages indexed and about 80,000 links.
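Storing check results in a database, as described above, is what makes the later "nice SQL queries" possible. The sketch below stays in PHP to match the rest of the thread and swaps MySQL for an in-memory SQLite database via PDO, only so it runs standalone. The `links` table, its columns and `record_check()` are hypothetical. `INSERT OR REPLACE` is SQLite syntax; in MySQL the equivalent would be `REPLACE INTO` or `INSERT ... ON DUPLICATE KEY UPDATE`.

```php
<?php
// Sketch only: SQLite in memory stands in for MySQL; names are hypothetical.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE links (
    url        TEXT PRIMARY KEY,
    status     INTEGER,                 -- last HTTP status, NULL if no response
    active     INTEGER NOT NULL DEFAULT 1,
    checked_at TEXT
)');

// Record one check result; de-activate anything that isn't 2xx/3xx.
function record_check(PDO $db, string $url, ?int $status): void
{
    $alive = $status !== null && $status >= 200 && $status < 400;
    $stmt = $db->prepare(
        "INSERT OR REPLACE INTO links (url, status, active, checked_at)
         VALUES (?, ?, ?, datetime('now'))"
    );
    $stmt->execute([$url, $status, $alive ? 1 : 0]);
}

// Example results, as a checker function might report them.
record_check($db, 'http://a.example/', 200);
record_check($db, 'http://b.example/', 404);
record_check($db, 'http://c.example/', null);

// Ad-hoc reporting is then a plain query: which links are dead?
$dead = $db->query('SELECT url FROM links WHERE active = 0 ORDER BY url')
           ->fetchAll(PDO::FETCH_COLUMN);
```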

Scott Geiger

gilmour

5:10 am on May 14, 2003 (gmt 0)




Thanks, lorax. I was hoping to avoid using the cURL library; it seemed a bit like sandblasting a soup cracker, but it appears to be the best tool for the job.

The O'Reilly PHP Cookbook seems to favor it over fopen(), fsockopen() and PEAR's HTTP_Request class, and they probably know what they're talking about.

lorax

2:19 pm on May 14, 2003 (gmt 0)




>> sandblasting a soup cracker

LOL. If you consider the library as a whole, yeah. But we're talking about a limited selection of functions that should provide a quick and relatively painless solution. I suppose you could also write something in C++ or Java and execute it server-side if you had the wherewithal to develop and implement it. But why would you, when you have a ready-made set of functions available? ;)