Check if remote files exist.

Efficiently.

         

gosman

3:31 pm on Nov 13, 2008 (gmt 0)

10+ Year Member



I have a database that contains over 60,000 image URLs. Some of the image URLs are no longer valid and I want to clean up the database. I need to do this monthly.

I have created a script to parse the database and then use fopen($url) to check if each of the images exists. This method is taking a long time to process, approx 3 hours for the entire database. Can anyone advise if there is a more efficient way of doing this?

Thank you in advance.

jatar_k

3:50 pm on Nov 13, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



3 hrs per month isn't so bad

you could split it up

another idea might be the more often an image has been checked, the less likely it is that it has been changed or moved

this might allow you to select the urls that have never been checked and work up to the most checked.

just an idea, not sure it would help as I don't know how often images become invalid

andrewsmd

5:15 pm on Nov 13, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree with jatar_k, 3 hours is not bad. You could limit your query to a set number of rows, stamp each row with the date it was last verified, and then select only the rows that have not been checked in x amount of time. Here is the pseudo of what I am saying. I'm assuming you have 60,000 files, so 60,000 / 30 = 2,000 per day:
select imageUrl from imageTable where dateLastChecked < currentDate - 30 days limit 2000
Run a check on what that returns, update dateLastChecked for those rows, and just run it daily. It would take a while, but eventually everything would be getting checked once a month. As far as working with the dates, I would use a Unix timestamp, because dealing with seconds would be the easiest way to do it.
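
To make the pseudo query a bit more concrete, here is a rough PHP sketch using PDO. The table and column names come from the pseudo query; the `id` column, the function names, and the idea that never-checked rows hold a dateLastChecked of 0 are all assumptions for illustration, so adjust them to the real schema.

```php
<?php
// Sketch only: imageTable, imageUrl and dateLastChecked follow the pseudo
// query; the `id` column and both function names are hypothetical.

const SECONDS_PER_DAY = 86400;

/** Return up to $limit rows whose last check is older than $maxAgeDays. */
function fetchDueImages(PDO $db, int $now, int $maxAgeDays = 30, int $limit = 2000): array
{
    $stmt = $db->prepare(
        'SELECT id, imageUrl FROM imageTable
         WHERE dateLastChecked < :cutoff
         ORDER BY dateLastChecked ASC
         LIMIT :lim'
    );
    // Bind as integers so LIMIT is not quoted as a string by the driver.
    $stmt->bindValue(':cutoff', $now - $maxAgeDays * SECONDS_PER_DAY, PDO::PARAM_INT);
    $stmt->bindValue(':lim', $limit, PDO::PARAM_INT);
    $stmt->execute();
    return $stmt->fetchAll(PDO::FETCH_ASSOC);
}

/** Stamp a row as checked so it stays out of the daily batches until it ages past the cutoff again. */
function markChecked(PDO $db, int $id, int $now): void
{
    $stmt = $db->prepare('UPDATE imageTable SET dateLastChecked = :now WHERE id = :id');
    $stmt->execute([':now' => $now, ':id' => $id]);
}
```

A daily cron job would call fetchDueImages($db, time()), run the existence check on each returned URL, and call markChecked() for each row it has processed. Ordering by dateLastChecked means the oldest (or never-checked) rows go first.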

gosman

5:34 pm on Nov 13, 2008 (gmt 0)

10+ Year Member



Hi Guys.

Thanks for the replies. I've managed to get it down to about 90 mins using curl instead of fopen.
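
For anyone reading later, here is a minimal sketch of what a curl-based existence check can look like. It is not the exact script described above: it assumes a HEAD request (no body download) is enough to decide whether an image exists, and the function names, the timeouts, and the "any 2xx status counts as existing" rule are illustrative choices.

```php
<?php
// Sketch only: function names, timeouts, and the 2xx rule are assumptions,
// not details taken from the thread.

/** Treat any 2xx final status as "the file exists". */
function isSuccessStatus(int $code): bool
{
    return $code >= 200 && $code < 300;
}

/** Send a HEAD request and report whether the URL answered with a 2xx status. */
function remoteFileExists(string $url, int $timeoutSeconds = 5): bool
{
    $ch = curl_init($url);
    if ($ch === false) {
        return false;
    }
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: headers only, no image data
        CURLOPT_RETURNTRANSFER => true,  // do not echo anything to output
        CURLOPT_FOLLOWLOCATION => true,  // follow redirects to the final URL
        CURLOPT_MAXREDIRS      => 5,
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
    ]);
    $ok   = curl_exec($ch) !== false;    // false on DNS, connect, or timeout errors
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return $ok && isSuccessStatus($code);
}
```

Skipping the response body and putting a hard timeout on dead hosts are the two things most likely to save time here. One caveat: some servers refuse HEAD requests, so a URL that fails this check may deserve a second look with a normal GET before it is deleted.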

jatar_k

5:44 pm on Nov 13, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



nice

I am a little surprised it made such a difference