
Forum Moderators: Webwork & skibum


How I Maintain My Directory's URL / Listings Validity

How do you maintain your links and avoid link rot?

1:58 pm on May 16, 2009 (gmt 0)

Junior Member

10+ Year Member

joined:Dec 16, 2007
votes: 0

This builds on another recent thread, and I thought I would show how I maintain a directory of 14,000 links in a specific vertical market.

1. All of our links (along with each company name, address, and phone number) are stored in a FileMaker database. If you do not have such a database, you could use Xenu to generate the list of valid URLs or otherwise extract them from your website.

2. At the beginning of each month I run Xenu against a local copy of our website. This picks up non-responding URLs, and URLs returning 403, 404, 301, and 302 status codes. It does not identify HTML meta refreshes, nor websites that have expired and become an MFA or other junk advertising site.
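Xenu does this from a GUI, but the same status-code sweep can be sketched in a few lines. This is a hypothetical stand-in (not the author's tool, and in Python rather than the Perl used later in this post), using only the standard library; it requests each URL without following redirects so 301s and 302s surface alongside the 403s and 404s:

```python
# Minimal sketch of the status-code sweep: HEAD-request each URL without
# following redirects, and record anything that is not a plain 200.
from urllib.request import Request, HTTPRedirectHandler, build_opener
from urllib.error import HTTPError, URLError

class NoRedirect(HTTPRedirectHandler):
    # Returning None makes 301/302 responses surface as HTTPError
    # instead of being followed silently.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

def check(urls):
    """Return {url: status or None} for every URL that needs a look."""
    opener = build_opener(NoRedirect)
    problems = {}
    for url in urls:
        try:
            resp = opener.open(Request(url, method="HEAD"), timeout=10)
            code = resp.status
        except HTTPError as e:      # 403, 404, 301, 302 all land here
            code = e.code
        except URLError:            # DNS failure, refused connection, timeout
            code = None
        if code != 200:
            problems[url] = code
    return problems
```

Like Xenu, this only sees HTTP-level problems; meta refreshes and expired-domain junk need the content checks described in step 3.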

3. In the middle of the month, to find the problem sites, I use a Perl script I wrote that runs on our Unix server using:

use LWP;
use LWP::UserAgent;    # fetch each URL
use HTML::HeadParser;  # parse the <title> and <meta> tags from each page

The script returns two files, which I open in a simple database. The first file contains every URL along with its HTML meta refresh, if one exists; here I look for cross-domain transfers done by web designers who did not know how to do a proper 301 server redirect when moving a site to a new domain. The second file contains each page title and URL. Pulling this into the simple database, I scan the list for duplicate titles and obvious problems, such as expired websites and other typical trouble indicators. I tag the record for every blank title, default "index" title, etc.
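The duplicate-title and blank-title scan can be automated too. A minimal sketch (hypothetical function and field names, in Python rather than the author's Perl):

```python
# Flag titles that appear more than once, plus blank and default titles --
# common signs of expired or parked domains.
from collections import Counter

DEFAULT_TITLES = {"", "index", "untitled document", "home"}

def suspicious(records):
    """records: list of (title, url) pairs. Return URLs worth a manual look."""
    counts = Counter(title.strip().lower() for title, _ in records)
    flagged = []
    for title, url in records:
        key = title.strip().lower()
        if key in DEFAULT_TITLES or counts[key] > 1:
            flagged.append(url)
    return flagged
```

Two sites sharing the exact same title is a strong hint that both have been parked or taken over by the same advertising template.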

This script misses URL changes done by web designers who use a JavaScript redirect to the new domain.
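The title and meta-refresh extraction that the Perl script (LWP plus HTML::HeadParser) performs can be sketched with Python's standard library; this is an equivalent illustration with hypothetical names, not the author's code, and it also greps crudely for the JavaScript redirects the original misses:

```python
# Sketch of step 3: pull each page's <title> and any <meta http-equiv=
# "refresh"> target, and crudely flag JavaScript redirects.
import re
from html.parser import HTMLParser

class HeadScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.refresh = ""        # URL from a meta refresh, if any
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("http-equiv", "").lower() == "refresh":
            # content looks like: "0; url=http://newdomain.example/"
            m = re.search(r"url\s*=\s*(\S+)", attrs.get("content", ""), re.I)
            if m:
                self.refresh = m.group(1)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

def scan_page(html):
    """Return (title, meta-refresh URL, has-JS-redirect) for one page."""
    s = HeadScanner()
    s.feed(html)
    js_redirect = bool(re.search(r"(window\.location|location\.href)\s*=", html))
    return s.title.strip(), s.refresh, js_redirect
```

The JavaScript check is only a substring heuristic; it catches the plain `window.location = ...` case, not redirects assembled dynamically.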

4. I export this tagged list of questionable URLs to a file and take snapshots of them using Snapshotter. My thanks to a previous thread where the idea of viewing snapshots of websites to check validity was mentioned; this posting is my payback to that contributor.

I used Tweak UI from Microsoft to change the default size of the thumbnails viewed in the directory to the maximum of 256 pixels wide. On my 1650-pixel-wide monitor I can see five in each row, three rows high.

This is large enough to identify problem pages. I save each snapshot at 600 pixels wide, although I seldom look at the full-size images and delete them once the run is completed.

I scan the thumbnail images and look for sites that have become MFA (made-for-AdSense) sites or search engine feeds, as well as other problems.

In this month's pass I identified about 50 problem sites using Xenu (out of 14,000), and another 10 using the script above. This saves a manual look at every site. My father-in-law had been reviewing the 1,000 or so questionable sites every quarter, which took him about 8 hours; this took me about 1 hour.

12:53 pm on May 19, 2009 (gmt 0)

Moderator This Forum

WebmasterWorld Administrator webwork is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 2, 2003
votes: 101

Helping others by sharing effective techniques for performing important tasks, like keeping directory links current, is always a class act.

Kudos ColinG! ;)

