In many cases the pages haven't been updated since the late 1990s, and some go back to 1995. And often, where a page purports to be a "resource" (a links page), many of those old links now lead nowhere. Plus the webmaster email addresses are defunct. Basically, there are many "forgotten" pages ranking high in Google, and they're holding back new sites and pages that aren't in the old loop of links and don't have the benefit of educational-site PageRank.
Now, since the topic area is by definition very old (medieval) the content of some of the pages I'm referring to is as valid today as it was in the mid-1990s and will still be valid in the years to come, but it means that Google's SERPS show a degree of stagnancy with new material at a disadvantage, especially as there no longer seems a way to contact anyone involved with those old pages to get one's newer pages added to their links.
I've read that long-established web pages are "bomb-proof" in Google, or likened to "good wine", but I've also read that Google likes fresh content. So the ideal would be fresh content on an old page, but it doesn't seem to happen that way. I'm curious how Google will deal with this in the long term - say ten years from now, when there are pages that have been lying around for 20 years or more, no longer anyone's active responsibility (and, it should be said, produced to very different technical standards from those that will exist in 10 years). At the very least I hope that they detect broken links and maybe feed that into SERPS as a ranking factor - this would seem like a pretty good indicator.
For Google, high PR + lots of external backlinks works wonders in the SERPs. You may have a point that in the future Google may see pages like this causing an excessive amount of stagnancy in the SERPs. However, at the moment Google probably doesn't get enough complaints about this to see it as a priority.
at the moment Google probably doesn't get enough complaints about this to see it as a priority
Surely Google is thinking ahead and not simply reacting to complaints? In some search areas their SERPS are littered with "resource" pages full of links to pages that no longer exist or aren't maintained any more, and as has been said, some of those pages enjoy big PR and lots of inbound links from days gone by.
I often get emails from a site that monitors millions of sites for broken links, so it can't be too hard for Google to do likewise (if they don't already). Then all they have to do is adjust their SERPS accordingly, on the fairly sound basis that a significant number of broken links is an indicator of a page which has less relevancy than when those same links were active.
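The broken-link check suggested here is simple to sketch. The Python below is only an illustration, not a description of how Google or any monitoring service works: `extract_links`, `broken_link_fraction`, and the `is_alive` callback are hypothetical names, and the callback stands in for whatever HTTP check a real crawler would make.

```python
from html.parser import HTMLParser
from typing import Callable, List


class _LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str) -> List[str]:
    """Return the outgoing link targets found in an HTML document."""
    parser = _LinkCollector()
    parser.feed(html)
    return parser.links


def broken_link_fraction(html: str, is_alive: Callable[[str], bool]) -> float:
    """Fraction of a page's outgoing links that fail the is_alive check.

    is_alive is an assumed callback (URL -> bool); a page with no links
    scores 0.0 rather than being treated as broken.
    """
    links = extract_links(html)
    if not links:
        return 0.0
    dead = sum(1 for url in links if not is_alive(url))
    return dead / len(links)
```

A search engine could, in principle, feed a number like this into ranking, but what threshold counts as "significant" is a judgment the sketch does not make.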
...a significant number of broken links is an indicator of a page which has less relevancy...
I like this idea myself. I should think this also applies to other areas like small business web sites. The insurance agent web site that is up-to-date with internal and external links intact would suggest a higher-quality web site than one with broken links and old content. It should keep webmasters on their toes anyway!
I compete with a few "informational" sites that are, quite frankly, ancient. I understand that the information may still be valid, but I also believe that the Internet is WAY too cluttered. How many sites do you really need on the same topic? Why not give precedence to those that are maintained regularly? I say, let's start emptying out the back of the closet!
One would imagine that recent content would be deemed more relevant than old and that well organized sites would be considered more important. Maybe Google HAS already thought of this. Who knows what the future holds!
Its only content is "Site under construction, will return in September 2000"
All the internal navigation leads you to different pages with the same message.
I've emailed Google hundreds of times about it and they do nothing.
A site like this enjoys great position for years and a site like mine with fresh content and hundreds of useful pages gets penalized for god knows what.
They have their head up their ass if you ask me.
Sometimes the directory pages I come across while researching link possibilities are insidious, because they have date displays on them that suggest that they were just updated, causing me to waste effort in trying to contact them.
Patrick Taylor's idea of using broken links as an indicator of relevancy is an interesting one. Patrick... maybe you should enter one of those Google programming contests to bring this to their attention. ;)
I say, let's start emptying out the back of the closet!
And I say be careful before doing anything like that. Old information can be very useful. For example:
You are considering buying a recent biography? The reviews (if any) will be found in recent pages. But you want to know how it compares to the 'standard' biography written years ago but still in print? An old page may provide a review. Who cares if some of the links are broken?
The internet is more than a means of peddling goods, it's also a reference library.
What is the current thinking on how much search engines, especially Google, punish sites with broken links? For example, is it about number or percentage, or is it split between effects for internal and external links? If pages go dead at a rate of 0.25%-0.5% a week, I would think an "authority" site would have trouble keeping up. Given that link-checking has to happen from time to time, does it pay to have some sort of *automated link removal* program?
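To put that weekly rate in perspective, here is a back-of-the-envelope calculation in Python. It assumes, purely for illustration, that every link dies independently at the same constant weekly rate; `surviving_fraction` is a hypothetical helper name.

```python
def surviving_fraction(weekly_death_rate: float, weeks: int) -> float:
    """Fraction of links still alive after `weeks`.

    Assumes a constant weekly death rate applied independently to
    every link, which is a simplification.
    """
    return (1.0 - weekly_death_rate) ** weeks


# Compare the two ends of the 0.25%-0.5% range over one year.
for rate in (0.0025, 0.005):
    dead = 1.0 - surviving_fraction(rate, 52)
    print(f"{rate:.2%} per week -> {dead:.1%} of links dead after a year")
```

Under that assumption, a links page left alone for a year would have roughly 12% to 23% of its links dead, which is why an unmaintained "authority" page decays quickly.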
In this connection, has anyone read the 2004 WWW Conference paper "Sic Transit Gloria Telae: Towards an Understanding of the Web’s Decay" by Ziv Bar-Yossef et al.? (URL: [wwwconf.ecs.soton.ac.uk...] ) It presents a concept of "decay" richer than "are there broken links." For example, although Yahoo gets rid of broken links almost immediately, their directory links to many pages which *themselves* link to broken pages, or reside in "neighborhoods" with high concentrations of broken links. The authors also have good ideas about detecting soft failures: broken pages where you don't get a 404.
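One heuristic for catching such soft failures is to ask the same server for a URL that almost certainly does not exist and see whether it answers with a 200 and much the same content. The Python below is a rough sketch in that spirit, not a reproduction of the paper's algorithm; `looks_like_soft_404`, the injected `fetch` callback, and the 0.9 similarity threshold are all assumptions made for the example.

```python
import difflib
import uuid
from typing import Callable, Tuple
from urllib.parse import urljoin

# Assumed fetch interface: URL -> (HTTP status code, response body text).
Fetch = Callable[[str], Tuple[int, str]]


def looks_like_soft_404(url: str, fetch: Fetch, threshold: float = 0.9) -> bool:
    """Guess whether `url` is a soft failure (an error page served with 200)."""
    status, body = fetch(url)
    if status != 200:
        return False  # a hard failure, not a soft one

    # Probe a sibling URL with a random name that should not exist.
    probe_url = urljoin(url, "no-such-page-" + uuid.uuid4().hex)
    probe_status, probe_body = fetch(probe_url)
    if probe_status != 200:
        return False  # the server reports missing pages honestly

    # If the real page looks almost identical to the response for a
    # nonsense URL, it is probably the server's generic error page.
    similarity = difflib.SequenceMatcher(None, body, probe_body).ratio()
    return similarity >= threshold
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library and makes it easy to try against canned responses.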
Last question. Has anyone observed how *Googlebot* deals with dead pages? Does it try again every time it spiders the site, or does it stop trying?
I don't understand how this helps anyone who uses the Google or Yahoo! search engines. Google or Yahoo! likely won't do anything about this because it would be very difficult to find sites that are old but don't have useful information. It would be too difficult for them to automate this and they won't hire anyone to do it; they like the cheap automated way.
Also, don't even bother providing feedback to them. They get hundreds of thousands of feedback messages, so what makes you think they will look at yours?
But should a maintained, quality site have to compete with one that hasn't been looked at in years and has half its links (or more) broken?
Yes.
That's no problem for a well-maintained, quality site. If the SEs are doing their job it will feature higher in the serps, whereas the older site will wither because it loses links. But if people still link to it, it's still valid.
if people still link to it, it's still valid
Not true in some areas where the loop is an old one (and the damage is greater where the pages have high PR - such as on educational sites). The assumption, if it's built into the Google ranking system without taking account of "decay" (a good way to describe it), results in SERPS with a higher than necessary proportion of useless pages.
If a webmaster of a so-called resource page can't be bothered to look after it even to check their own links then the so-called resource doesn't exist any more. And because there is no longer any way to make contact it's not possible to ask them to link to one's own page. So they're holding up progress, basically. Google could easily adjust the ranking of pages where many - or all - the links are broken. They should, because they're the ones who made links so important in the first place.
Google has two problems: malicious webmasters doing artificial promotion, and careless webmasters doing too little quality control of their own. I'd guess these days the former is considered much more harmful, and as a result of concern for it, the latter is not getting the attention that it might otherwise have.
I suspect Google is more concerned with the deleterious effects of NEW links on quality: but knowing Google, they've already tested both theories and got whatever good they thought they could get out of them, which probably wasn't much.
MSN seems to be several years behind the times: you might suggest this to them.
Google could easily adjust the ranking of pages where many - or all - the links are broken. They should, because they're the ones who made links so important in the first place.
Google made incoming links important, not outgoing links.
If the incoming links rot away, so will the importance of the page. If the incoming links remain but come from a poorly maintained site which is itself beginning to rot, this will have the same effect. Why not let nature take its course?
Penalizing pages with a high percentage of broken links is not a bad idea. It may already be happening for all I know. But for most sites this will only affect links pages, which do not often appear in serps anyway.
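The "let nature take its course" argument can be illustrated with a toy link graph. The Python below is a simplified textbook PageRank iteration, not Google's production ranking; the `pagerank` function, the 0.85 damping value, and the page names are assumptions chosen for the example.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Simplified PageRank over a dict of page -> list of outgoing links."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# "old" starts with three incoming links; later two of them move to "new".
before = pagerank({"a": ["old"], "b": ["old"], "c": ["old"], "old": ["a"]})
after = pagerank({"a": ["old"], "b": ["new"], "c": ["new"],
                  "old": ["a"], "new": []})
print(before["old"] > after["old"])
```

In this toy model the old page's score falls only when its incoming links actually go away. The score says nothing about whether the page's own outgoing links are broken, which is the gap the earlier posts are complaining about.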
[0008] Each of these conventional methods has shortcomings, however. Term-based methods are biased towards pages whose content or display is carefully chosen towards the given term-based method. Thus, they can be easily manipulated by the designers of the web page. Link-based methods have the problem that relatively new pages have usually fewer hyperlinks pointing to them than older pages, which tends to give a lower score to newer pages.
from a Google patent: "Methods and apparatus for employing usage statistics in document retrieval" [appft1.uspto.gov]
as earlier discussed here: [webmasterworld.com...]