Forum Moderators: martinibuster

Hyperlink Maintenance

         

student25

3:03 pm on Jul 12, 2003 (gmt 0)

10+ Year Member



Hi!
I'm a student who is currently writing a thesis on the management of hyperlinks on web sites. I've decided to study the managerial aspect of how to best conduct hyperlink maintenance, but I have found the literature to be very limited. Technical solutions seem to be the main focus, but I'm sure that the managerial issues (such as routines, structured tasks and guidelines on how to avoid broken links and how to best perform updating of content on a website) are of great importance for companies in order to be able to deliver quality web information for customers.

I was wondering if anyone in this discussion forum might have ideas or tips on how I can get more information on this topic or if any webmasters will share their procedures and routines on how they perform hyperlink maintenance.

For example:

Does your company have a guideline/framework for how to perform hyperlink maintenance?

Did your company develop a guideline?

What is included in this guideline?
- routines
- procedures
- how often should hyperlinks be checked
- are the tasks structured

I will be most grateful for all help.

Regards, student25

claus

4:00 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi student25, welcome to WebmasterWorld :)

Could you sticky me with a little more info on the thesis, i might like to cooperate, but i don't have the time right now :)

One small point, though. There's a vast difference between directory-type sites, other sites with a link directory, sites that "have some links", and sites that e.g. use links as part of other types of content (newspapers linking out and such)

/claus

<added>one more important point: do you mean inbound links (to the site), outbound links (to other sites), or both? it's important to be clear on this. I understand it as outbound only, correct me if i'm wrong</added>

[edited by: claus at 4:09 pm (utc) on July 12, 2003]

Hawkgirl

4:01 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



One day when clicking around my site doing some copy proofreading and editing, I clicked through a link and noticed that one of our main partners' sites was down.

They hadn't bothered to call me - in fact, they didn't even know they were down!

So we installed a link-checker program pronto - and check all of our links automagically, once an hour.

The guideline was informal, but dictatorial. "Nothing should ever be down. Check early and check often."

claus

4:11 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



hawkgirl
>> once an hour

- really? recently banned one that checked every 10 minutes. Once a day ought to do it i should think ;)

/claus

Hawkgirl

4:14 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We've lost enough sales by having a partner site be down all day that we want to know by the hour.

:)

Plus it's really fun to call someone else and say, "Gee, did you guys know your site is down?"

fathom

7:11 pm on Jul 12, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does your company have a guideline/framework for how to perform hyperlink maintenance?

Yes - as a consultant for many website owners I use site management software (Maxamine).

Did your company develop a guideline?

I would think most don't - if you manage your own site the only real problem is external links when other sites revise their site moving or removing dated pages.

What is included in this guideline?
- routines

The software is automated: it crawls each web site, formats reports and emails the owner the problem links or potential upcoming problems (if a re-direct was added)

- procedures

depress start :)

- how often should hyperlinks be checked

Weekly for myself - usually Sundays so that the client receives the broken URLs for fixing on Monday morning.

- are the tasks structured

N/A

- version control

N/A

- documentation routines

The email sent gives site pages linked, error found, reported URL, and potential reasons.

- absolute or relative linking

hmmm... for design purposes all internal links are relative (for me anyway) with the exception of the mainpage link where one per page is absolute.

- amount of effort paid to maintenance of links

If regularly done, a few minutes - but less frequent maintenance means more work all at once as well.

Also, what are your views on broken links and how important do you think it is to pay attention to?

Extremely important! No one likes getting a 404 page they didn't ask for.

claus

3:09 pm on Jul 14, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hawkgirl:
>> we want to know by the hour

- did you get a tool that only sends out a "HEAD" request, or "PING" then, or do you use a tool that (GET) requests the whole page? 24 times a day is still 720 times each month, multiplied by the KB of the page in non-user bandwidth, and i bet you're not the only one checking links ;)
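To put rough numbers on that, here is a minimal sketch of the arithmetic. The 720 requests per month come from the post; the 50 KB page size and the 0.5 KB header size are assumptions picked only for illustration, and `monthly_check_traffic_kb` is a made-up helper name.

```python
def monthly_check_traffic_kb(checks_per_day: int, response_kb: float, days: int = 30) -> float:
    """Traffic that one link checker causes on one linked page per month, in KB."""
    return checks_per_day * days * response_kb


PAGE_KB = 50.0  # assumption: size of the full page when fetched with GET
HEAD_KB = 0.5   # assumption: size of the headers returned for a HEAD request

# Hourly checks: 24 * 30 = 720 requests per month.
get_total = monthly_check_traffic_kb(24, PAGE_KB)
head_total = monthly_check_traffic_kb(24, HEAD_KB)

print(f"GET:  {get_total:.0f} KB per month")
print(f"HEAD: {head_total:.0f} KB per month")
```

With these assumed sizes the hourly GET check costs the linked site a hundred times more traffic than the hourly HEAD check, and that is for a single checker on a single page.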


student25:
Thanks for info, this one i can just as well post, there's no need for secrecy. Plus, it might just kick off a good discussion, one never knows. I'll stick to general concerns, though, as there are almost as many ways of doing things as there are webmasters.

First, there's three (seven) different types of links to manage, each being there for separate reasons, having separate possible problems, and requiring different actions.

The seven types of links, according to Claus Schmidt (*)

  1. On-site links: Links on the site, pointing to other parts of the site.
  2. Outbound links: Links on the site, pointing to other sites than the one the link is on.
  3. Inbound links: Links on other sites, pointing to the site in question or to separate pages within the site.
  4. Inbound bookmarks: Links stored on a PC, pointing to the site in question or to separate pages within the site.
  5. Site-specific software-embedded inbound links: Not bookmarks, but links that are coded into software. Mostly to specific pages within the site (eg. a help/update page or a product fact sheet).
  6. Non-site-specific software-embedded inbound links: Some types of software assume that if there's a website, some files will also exist (eg. "favicon.ico" for IE, or "robots.txt" for a Search Engine spider).
  7. Off-line links: Links to this or other sites printed on e.g. business cards, advertising, product fact sheets, user guides, contracts, physical products, etc.

The "easy parts" and those normally attended to by webmasters, are 1 and 2. These are the ones that are under the webmaster's full control and can be changed any time. Members of this forum also have a very high interest in 3 (and 6) due to the need to be found, and to rank well in search engines. The term "Link Development" (forum title) normally refers to type 3, or a combination of types 2+3 (reciprocal links; i link to you, you link to me).

Link types 3 to 7 are generally "risky" for webmasters. They can be implemented without the webmaster knowing anything about it and without him/her being able to influence it in any way. Types 4 to 7 are especially risky, as their "life" is normally longer than types 1-3 (they are not changed easily or often). Printed matter, especially, tends to be stored for ages.
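For anyone who wants to tag links by type in an inventory or a report, the list and the two groupings above could be written down roughly like this. It is only a sketch: the enum member names and the helper functions are invented here, while the numbering and the groupings (1-2 controlled, 3-7 risky, 4-7 long-lived) follow the post.

```python
from enum import Enum


class LinkType(Enum):
    ON_SITE = 1
    OUTBOUND = 2
    INBOUND = 3
    INBOUND_BOOKMARK = 4
    SITE_SPECIFIC_EMBEDDED = 5
    NON_SITE_SPECIFIC_EMBEDDED = 6
    OFF_LINE = 7


def is_webmaster_controlled(link_type: LinkType) -> bool:
    """Types 1 and 2 can be changed by the webmaster at any time."""
    return link_type in (LinkType.ON_SITE, LinkType.OUTBOUND)


def is_risky(link_type: LinkType) -> bool:
    """Types 3 to 7 can be created without the webmaster knowing about it."""
    return link_type.value >= 3


def is_long_lived(link_type: LinkType) -> bool:
    """Types 4 to 7 are not changed easily or often."""
    return link_type.value >= 4


for link_type in LinkType:
    print(link_type.value, link_type.name,
          is_webmaster_controlled(link_type), is_risky(link_type), is_long_lived(link_type))
```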

(*) Note: This list builds on my personal experience including all seven types. For types 5-7 especially during the last 5-6 years or so. If you re-publish it, include a link to this forum thread and my name (the link will then become a type 7, or 3 if it's on www). There's an understanding in here, that we do not provide self-promoting links, but i do use this list for other purposes, that's why i ask.

On-site and outbound links

As these are clearly your scope, you should limit yourself explicitly to these, or ...well, the amount of work will increase ;) I can only scratch the surface on these two points here. Anyway, your case company needs to be informed in some way or another that types 3 to 7 also exist, and can potentially also influence user perception of their site.

Consider the case of a site with 1 million users per month. This is a fairly large site, although nowhere near the amazons and yahoos and the like. Let's say that only 10% of the users have a bookmark to the site. That's 100,000 links to portions of the site that you simply cannot control - it might very well be that there are more inbound links than there are pages. And this is just link type 4.

1. On-site
It's helpful to distinguish between "navigation" and "context" (or "content").

Some software used in web development will automatically check that site navigation is okay as new pages are added, old pages are removed and some pages change name and/or location. That is: as soon as you are on the site, you will not experience problems getting from page a to page b. This does not safeguard you against link type 3-7 problems though.

You should always make sure that whenever a page shifts location or name, there is always a replacement page (or other mechanism) that will point to the new location. Over time, this "redirection system" will indeed become a system in its own right, further complicated by pages that are deleted for whatever purpose.

You might want to read Cool URIs don't change [w3.org] by Tim Berners-Lee on this topic.
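A "redirection system" of this kind can be as simple as a table of old and new locations plus a list of deliberately deleted pages. The sketch below is one possible shape, not a description of any particular server: the paths are hypothetical, and the choice of 301 for moved pages and 410 for deleted ones is an assumption about how a site might want to answer.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical paths: each key is an old location, each value is where it moved to.
MOVED: Dict[str, str] = {
    "/products/old-widget.html": "/products/widget.html",
    "/products/widget.html": "/catalog/widget.html",
}
# Hypothetical pages that were deleted on purpose and have no replacement.
GONE: Set[str] = {"/press/1999-launch.html"}


def resolve(path: str) -> Tuple[int, Optional[str]]:
    """Return (HTTP status, target) for a requested path.

    301 plus the final location for moved pages (chains are followed to the end),
    410 for deleted pages, 200 plus the path itself when no rule applies.
    """
    target = path
    seen = {path}
    while target in MOVED:
        target = MOVED[target]
        if target in seen:
            raise ValueError(f"redirect loop involving {target}")
        seen.add(target)
    if target in GONE:
        return 410, None
    if target != path:
        return 301, target
    return 200, path


print(resolve("/products/old-widget.html"))
print(resolve("/press/1999-launch.html"))
```

Following the chain to its end means an old bookmark gets one redirect to the current location instead of hopping through every location the page has ever had.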

For context/content, the problem is somewhat different. Typically some type of article is written, and it includes one or more links. As time goes by, the whole site changes, pages get relocated, and they might even get different names. Even if a navigation system is in place, the article may still point to a page that has been moved.

2. Outbound
For outbound, there's the "context" issue, same as above. Only, now you don't always know that the documents you link to have another location than when the article was written. An example could be this forum. When i post a link such as the one above, it will probably not be changed if the location should change at some point. Should this thread move, or should the whole forum move, the links pointing to it probably will change.

Then there's the "directory" - if links are assembled into some directory-type section, these will often get checked, but you still tend to forget the "context" links. Third, there's the content partners, acting as part of your site (seen from the end user perspective) but really being external sites that you link to - if they change locations or fail, it will harm your site. There's probably even more kinds.

Maintenance

Step one, in both cases, is to realize what links you actually have, and where they are stored. Then, you can begin to check them.
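As an illustration of step one, here is a small sketch that pulls the links out of one page and sorts them into on-site and outbound, using only Python's standard library. The URLs and the sample HTML are made up, and comparing hostnames is an assumption about what counts as "the site" (a content partner on another host would land under outbound).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page's own URL."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.page_url, href))


def inventory(page_url, html):
    """Return the page's http(s) links, split into on-site and outbound."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    site_host = urlsplit(page_url).hostname
    result = {"on_site": [], "outbound": []}
    for link in collector.links:
        parts = urlsplit(link)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript: and similar
        kind = "on_site" if parts.hostname == site_host else "outbound"
        result[kind].append(link)
    return result


SAMPLE_HTML = (
    '<p>See our <a href="../tos.html">terms</a>, '
    '<a href="http://www.example.org/">a partner</a> and '
    '<a href="mailto:info@example.com">mail us</a>.</p>'
)
print(inventory("http://www.example.com/articles/story.html", SAMPLE_HTML))
```

Run over every page, article and template, the result is the list of "context" links that otherwise tends to get forgotten.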

Some kind of guideline or framework is actually a must, but i do not see it employed (often). I see people checking their "directory section" at some arbitrary frequency (or issue a statement like "these links worked when we planted them, they might not do so forever") and rely on the software for making sure their internal navigation works, and then they just plainly forget the rest.

- so how should you do it?
An automatic link checker, as mentioned by Hawkgirl, is a good starting point. There are several different ones, and i will not recommend one above the rest. Make sure, though, that the chosen tool does not use too much unnecessary bandwidth at the site that you link to.
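For a feel of what a bandwidth-friendly check looks like, here is a minimal sketch using Python's standard `urllib`. It is not how any of the tools mentioned in this thread work. It sends a HEAD request, does not follow redirects (so a moved page is reported as such), and sorts the answer into a few buckets; the bucket names are invented here, and some servers do not answer HEAD properly, in which case a GET fallback would be needed.

```python
import urllib.error
import urllib.request
from typing import Optional


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Report redirects instead of following them, so a moved page shows up as 3xx."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # urllib then raises HTTPError carrying the 3xx code


def head_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Send one HEAD request; return the status code, or None if the host is unreachable."""
    opener = urllib.request.build_opener(NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx answers arrive here
    except OSError:
        return None  # DNS failure, refused connection, timeout


def classify(status: Optional[int]) -> str:
    """Map a status code to a report bucket (bucket names are made up for this sketch)."""
    if status is None:
        return "unreachable"
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "moved"
    if status in (404, 410):
        return "broken"
    return "error"


# Usage (makes a real network request, so it is left commented out):
# print(classify(head_status("http://www.example.com/")))
```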

Starting point, because: It is not always enough. There are two types of problems here.

1) The page that you link to changes address.
2) The page that you link to changes scope.

Number one will be caught by a program, but if you link to eg. www.example.com, and this link works great, this does not mean that you can just plainly forget it. Sometimes sites are closed, and popular ones will often sell their URL, so that other parties can use it for their (not necessarily related) purpose. Even less popular ones will not necessarily terminate, rather they will turn into commercials for some hosting company or domain name service.

Manual inspection is the only 100% fool-proof solution. You can not easily program yourself out of this situation.

This, in turn, implies that either you should not have that many outbound links, or you should use more resources on them, or (as the last option) you should simply post one of these "these links worked when"-statements.

The latter might be okay for some types of sites, but for certain other types (eg. banks, isp's, authorities, shops, associations, etc. etc.) it's simply not an option to link to, say, pornography by accident (because the link was pointing at something else once and it still works).

- so who should do it?
The automatic link check, the navigation etc. - all that concerns "the site" (eg. link types 1 and 2) is normally the responsibility of the webmaster (this might be a whole department in some cases).

Manual link check, though - no, i'd rather hire a few students to do this. Especially if you have a lot of links. It would not be feasible to spend (high) webmaster salary on this.

- so how often should you do it?
The classical response: It depends.

Navigation (#1, sub-type 1) should be working always. Literally. You simply do not put up a new page, move it, or delete it, without implementing the changes in your navigation system.

Context-links (#1, sub-type 2) should be treated as navigation when they were "on-site". This means always. If you have some old article pointing to some old terms of service, then, when the TOS changes, the link in the old article should point to the new one, or to the older version if that is the right case. There can be a lot of work in this one.

Outbound links (sub-types 1 and 2) will differ. If you link to the main domain, my experience is that you don't have to check as often as when linking "deep" (e.g. www.example.com/dir/subdir/page.html). Your main interest will be to make sure that your users are not met by a 404 or by unwanted content.

For non-critical URLs i'd say a frequency of max once per day is enough. A link to the White House does not need to be checked at all, as it will simply be the right one (these types rarely change). On the other hand, a personal homepage might need a check every month or perhaps even more frequently.

Once per day, i'd employ only when we are getting close to something "critical" (eg. Outbound, sub-type 3). An example could be your hosted shopping cart solution or hosted payment gateway. If these sites are not up and running, you will lose sales. Even though they are "external" sites, often they will let you customize layout and integrate them into your own sites, but they remain external.
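Written down as a schedule, the "it depends" could look something like the sketch below. The category names and the exact intervals are assumptions chosen to mirror the examples above (a critical hosted service daily, a personal homepage monthly, a very stable address not at all); a real site would set its own.

```python
from datetime import datetime, timedelta

# Assumed categories and intervals, for illustration only.
CHECK_INTERVAL = {
    "critical": timedelta(days=1),   # e.g. hosted shopping cart or payment gateway
    "personal": timedelta(days=30),  # e.g. a personal homepage
    "stable": None,                  # e.g. a government main page: no automatic check
}


def is_due(category: str, last_checked: datetime, now: datetime) -> bool:
    """True when a link in this category should be checked again."""
    interval = CHECK_INTERVAL[category]
    if interval is None:
        return False
    return now - last_checked >= interval


now = datetime(2003, 7, 14, 12, 0)
print(is_due("critical", datetime(2003, 7, 13, 11, 0), now))
print(is_due("personal", datetime(2003, 7, 1, 0, 0), now))
```

Keeping the intervals in one table makes the guideline itself visible: anyone can read off how often each kind of link is supposed to be checked, and change it in one place.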

I hope some of these thoughts were useful. As i said, it's only a "scratch in the surface" - there's no way you can make a "one size fits all"-recommendation on this subject, so others may even have views that are directly opposite to mine.

/claus

student25

12:23 pm on Jul 21, 2003 (gmt 0)

10+ Year Member



thanx very much! U've been of loads of help to me!
If anyone wishes to comment on claus' reply, I'll be most grateful