Msg#: 4425029 posted 12:35 pm on Mar 5, 2012 (gmt 0)
I have a folder (images) in my website that contains, in a rather unstructured way, all the images used on the site. Over time this folder has grown, and many of the files in it are no longer used. It is time to clean it up, and I need to choose the best strategy for removing the unused files while preserving the ones still in use.
The best idea I have come up with is to write a program that crawls the whole website, follows every link, and records in a file every request made to the images folder. I would then delete every file that is not in that list.
Before I start reinventing the wheel, I would like to know whether this idea makes sense and whether there are already libraries that perform part of this task. Any suggestions are welcome!
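The last step of that idea (comparing the recorded requests with what is actually in the folder) could look roughly like the Python sketch below. The function name `find_unused` and the `/images/` URL prefix are assumptions made for illustration, and the sketch only reports candidates; it deletes nothing.

```python
from pathlib import Path


def find_unused(images_dir, requested_paths, url_prefix="/images/"):
    """Return files under images_dir that never appear in requested_paths.

    requested_paths is any iterable of request URLs or paths, for example
    the lines of the file written by the crawler.
    """
    root = Path(images_dir)

    # Reduce each requested URL to a path relative to the images folder,
    # ignoring query strings and fragments.
    used = set()
    for url in requested_paths:
        path = url.split("?", 1)[0].split("#", 1)[0]
        if path.startswith(url_prefix):
            used.add(path[len(url_prefix):])

    # Every regular file on disk, as a relative POSIX-style path.
    on_disk = {
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        if p.is_file()
    }

    # Whatever is on disk but was never requested is a removal candidate.
    return sorted(on_disk - used)
```

It is probably safer to move the reported files to a backup folder first and delete them only after the site has been checked again.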
Msg#: 4425029 posted 2:32 pm on Mar 5, 2012 (gmt 0)
Thanks for the interesting reply! This sounds like really cool software. I have a question, though, in case you are an experienced user: I just read the software's specs, and I wonder whether it can detect that an image is required through the CSS declaration background-image: url(myUrl). It doesn't look like it checks the GET requests made for files.
If it does, would you be so kind as to advise me on how to configure it for my purpose? Also, my site only runs locally for now, so I guess no FTP is needed.
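For the style sheet question, one way to see which images a CSS file refers to, independently of any link checker, is to pull the url(...) references out of the CSS text. This is only a rough sketch: the name `css_urls` is made up for illustration, and a regular expression like this one does not handle every legal CSS form (escaped characters, for example).

```python
import re

# Matches url(...) with optional single or double quotes around the target.
CSS_URL = re.compile(r"""url\(\s*(['"]?)(.*?)\1\s*\)""", re.IGNORECASE)


def css_urls(css_text):
    """Return the targets of all url(...) references found in css_text."""
    return [m.group(2) for m in CSS_URL.finditer(css_text)]
```

The returned targets are relative to the style sheet's own location, so they still have to be resolved against it before being compared with the contents of the images folder.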
Msg#: 4425029 posted 2:56 pm on Mar 5, 2012 (gmt 0)
After Xenu scans the website via HTTP (the site therefore needs to be running on an HTTP server such as Apache), it asks for the FTP credentials so it can look in all the folders for any files that were not accessed during the HTTP scan. Those are the unused files.
I have no idea if Xenu looks for files mentioned in style sheets. I have never considered that possibility. I would hope that it does. It is quite easy to test whether it does or not.
Msg#: 4425029 posted 7:51 am on Mar 6, 2012 (gmt 0)
Hmm... It seems it won't work for JS and CSS:
"Please be careful with removing files when listed in an orphan report. Especially navbar mouseover images will be seen as “orphans”, because Xenu cannot find links to it." (from [integralworld.net...])
Xenu is cool, but it doesn't seem to be the perfect solution in this case. I would need something that checks the GET requests made to the server. That way all requests triggered by CSS, JS and AJAX would be captured.
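Checking the GET requests on the server side could be sketched in Python as below, by reading the web server's access log after the site has been browsed. The sketch assumes the log uses Apache's common or combined format; the name `requested_images` and the `/images/` prefix are illustrative, not part of any tool mentioned in this thread.

```python
import re
from urllib.parse import unquote, urlsplit

# Request line and status code as they appear in Apache's common/combined
# log format, e.g. "GET /images/logo.png HTTP/1.1" 200
REQUEST = re.compile(r'"GET (\S+) HTTP/[\d.]+" (\d{3})')


def requested_images(log_lines, url_prefix="/images/"):
    """Collect the image paths that were successfully requested via GET."""
    seen = set()
    for line in log_lines:
        match = REQUEST.search(line)
        if not match:
            continue  # not a GET request, or not a recognisable log line
        target, status = match.group(1), match.group(2)
        # Drop the query string and decode %xx escapes.
        path = unquote(urlsplit(target).path)
        # Count only files that were actually served (200) or confirmed
        # unchanged in the browser cache (304).
        if path.startswith(url_prefix) and status in ("200", "304"):
            seen.add(path)
    return seen
```

Note that the log only contains what a client actually requested: a crawler that does not run JavaScript or trigger mouseovers will not produce those requests, so the pages would need to be visited in a real browser (with a cleared cache) for the log to be complete.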