How I Maintain My Directory's URL / Listings Validity
How do you maintain your links and avoid link rot?
ColinG
1:58 pm on May 16, 2009 (gmt 0)

This builds on another recent thread and I thought I would show how I maintain a directory of 14,000 links in a specific vertical market.

1. All of our links (along with company name, address and phone) are in a FileMaker database. If you do not already have the links in a database, you could use Xenu to generate the list of valid URLs or otherwise extract them from your website.

2. At the beginning of each month I run Xenu against a local copy of our website. This picks up non-responding URLs and URLs returning 403, 404, 301 and 302 status codes. It does not identify HTML refreshes, nor websites that have expired and become an MFA or other junk advertising site.
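If you prefer a command-line check over Xenu, the same kind of status pass can be approximated in a few lines of LWP. A rough sketch only (urls.txt is just a placeholder for wherever your list of links lives):

#!/usr/bin/perl
# Quick status pass: report anything that does not come back as a plain 200.
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new( timeout => 20, max_redirect => 0 );

open my $fh, '<', 'urls.txt' or die "Cannot open urls.txt: $!";
while ( my $url = <$fh> ) {
    chomp $url;
    next unless $url;
    my $res = $ua->head($url);    # a HEAD request is enough for a status check
    # With max_redirect => 0 the 301s and 302s show up as-is instead of being followed.
    printf "%s\t%s\n", $res->code, $url if $res->code != 200;
}
close $fh;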

3. In the middle of the month, to find the problem sites, I use a Perl script I wrote that runs on our Unix server using:

use LWP;
use LWP::UserAgent;
use HTML::HeadParser;

The script produces two files, which I open in a simple database. The first file contains every URL along with its HTML refresh target, if one exists; here I look for cross-domain transfers done by web designers who don't know how to do a proper 301 server redirect when moving domains. The second file contains the page title and URL. Pulling this into the simple database, I scan the list for duplicate titles and obvious problems such as expired websites and other typical trouble indicators. I tag the record for every blank title, default "index" title, etc.
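The core of that kind of script is just a fetch loop that pulls the <title> and any meta refresh out of each page head. A rough sketch of the approach, with placeholder file names:

#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::HeadParser;

my $ua = LWP::UserAgent->new( timeout => 30 );

open my $in,        '<', 'urls.txt'    or die $!;
open my $refreshes, '>', 'refresh.txt' or die $!;   # URL plus meta refresh target, if any
open my $titles,    '>', 'titles.txt'  or die $!;   # page title plus URL

while ( my $url = <$in> ) {
    chomp $url;
    next unless $url;

    my $res = $ua->get($url);
    next unless $res->is_success;

    # HTML::HeadParser pulls <title> and <meta http-equiv="refresh"> out of the head
    my $p = HTML::HeadParser->new;
    $p->parse( $res->decoded_content );

    my $title   = $p->header('Title')   // '';
    my $refresh = $p->header('Refresh') // '';

    print $refreshes "$url\t$refresh\n";
    print $titles    "$title\t$url\n";
}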

This script misses the URL changes done by very dumb web designers who use a JavaScript redirect to a new domain.
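If you wanted to flag some of those as well, a crude grep of the page body for the usual location assignments would catch many of them. A standalone sketch, separate from the script above:

#!/usr/bin/perl
# Crude check for JavaScript redirects: grep the body for location assignments.
# Anything more creative than this still needs the eyeball pass.
use strict;
use warnings;

sub js_redirect_target {
    my ($html) = @_;
    if ( $html =~ /(?:window\.|document\.)?location(?:\.href)?\s*=\s*["']([^"']+)["']/i ) {
        return $1;
    }
    return;
}

# Example: feed it the body fetched in the loop above
my $html = q{<script>window.location.href = "http://example.com/new-home";</script>};
print "possible JS redirect to: ", js_redirect_target($html) // "(none)", "\n";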

4. I export this tagged list of questionable URLs to a file and take a snapshot of each URL using Snapshotter. My thanks to a previous thread where the idea of viewing snapshots of websites to check validity was mentioned. This posting is my payback to that contributor.

I used Tweak UI from Microsoft to change the default thumbnail size in the folder view to the maximum of 256 pixels wide. On my 1650-pixel-wide monitor I can see 5 thumbnails per row and 3 rows high.

This is large enough to identify problem pages. I save the snapshots at 600 pixels wide, although I seldom look at the full-size images and delete them once the run is complete.

I scan the thumbnail images of the websites and look for sites that have become MFA (made for AdSense) pages or search engine feeds, as well as other problems.

In this month's pass I identified about 50 problem sites using Xenu (out of 14,000) and another 10 using the script above. This saves manually eyeballing the sites: my father-in-law had been reviewing the 1,000 or so questionable sites every quarter, which took him about 8 hours. I did this in about 1 hour of my time.

 

Webwork
12:53 pm on May 19, 2009 (gmt 0)

Helping others by sharing an effective technique for an important task, like keeping directory links current, is always a class act.

Kudos ColinG! ;)
