Forum Moderators: Robert Charlton & goodroi
I get a few incoming links with extra punctuation on the end (usually period, occasionally comma, and very occasionally something else) often formed by poorly designed URL auto-linking routines in common forum, blog and CMS software. It is not readily possible to search for those, and so I already have a rule in my .htaccess that sends a 301 redirect to the same URL with the trailing junk stripped off.
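For reference, a sketch of what such a trailing-punctuation rule might look like (this is an illustration, not the poster's actual rule, and the exact set of punctuation stripped is an assumption):

```apache
# Sketch: 301-redirect any URL ending in one or more periods or
# commas to the same URL with the trailing punctuation removed.
# (Illustrative only; extend the character class as needed.)
RewriteEngine on
RewriteRule ^(.+?)[.,]+$ /$1 [R=301,L]
```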
Even so, I still get a few links each month that make very little sense; whoever linked was really not paying attention to what they were doing. The duff URLs show in the server logs as a Googlebot (or other bot) access (and therefore *without* any referrer information) and then a few days later appear in the Google WMT 404 report. Very often, those are the only places they show... because no human has clicked the link. I hope that someone eventually clicks on one so I can capture the referrer information, but it often does not happen.
At this point, a Google search for the duff URL occasionally finds the site where the problem link was posted, but this only works if the anchor text is the same as the link URL and the typo does not involve punctuation. A great many links remain impossible to find, mainly because you can't search for what is inside the href of a page, so links with wordy anchor text and a duff href can't be found. Even more importantly, many of the duff incoming links have weird punctuation on the end, and Google just will not return results for a URL search with an underscore or a quote mark on the end.
However... with this new feature, the list of 404 errors is now much more useful. Now that information *can* be found - and very easily. What a great feature!
I have for a long time had various duff incoming links which were for a valid URL but with an additional underscore on the end, so they would fail to a 404 error. I have added a redirect for those on most sites with the problem, but some remain listed in WMT. I now discover that all of the duff links of that type come from Word documents scattered all over the web. Why this is so, I have no idea; but at least I can now look into it.
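A minimal sketch of the kind of redirect described here, assuming the stray underscore sits immediately before the .html extension (illustrative, not the poster's actual rule):

```apache
# Sketch: 301 /validpage_.html to /validpage.html.
RewriteEngine on
RewriteRule ^(.+)_\.html$ /$1.html [R=301,L]
```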
Again, this is a great feature. I think people will be shocked at how many duff links they have pointing at their site and how careless the average netizen is when they cut and paste links. My pet peeve is people who post links with lots of unnecessary parameters in them, including session IDs, and, for Google searches, stuff like &client=Firefox or &client=Opera when I am using something else - and the totally ridiculous &rls=GGGL,GGGL;GGGL:2006-17,GGGL;GGGL:en stuff you see users of Firefox posting without any thought whatsoever.
I see that WMT no longer reports the *date* of last Home Page access. It just says the site was "visited", but it does now link through to the graphical crawl report.
I noticed Google reporting a broken link of the form www.example.com/gi,
Now that I can see where Google gets the error from, I can see that Google is actually extracting this "link" from a JavaScript regex replace, as below:
value.replace(/"/gi,"&quot;")
Can you elaborate on that?
Yeh - Google lists www.example.com/gi, as a broken link; I never tracked this down via spidering myself. Google now lists in WMT the linking page as a page on the site itself: a page that doesn't actually link anywhere, but merely contains the reference below:
value.replace(/"/gi,"&quot;")
So, my assumption is that in addition to looking for references to http:// to discover URLs, it is also looking at references within quotes that start with a forward slash - in this instance "/gi,". Simple pattern matching that has gone wrong in this instance. There was nothing prefixing the reference Googlebot picked up on, such as an href or src attribute, that implied it was a URL; I believe it actually confused regex delimiters with a file path.
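To illustrate that theory, here is a small sketch of a naive extractor that treats any quoted string starting with a forward slash as a site-relative link. The function name and regex are my own illustration, not Google's actual heuristic:

```javascript
// Naive link discovery: collect every double-quoted string that
// begins with a forward slash, as if it were a site-relative URL.
// (Illustrative sketch only; not Google's actual code.)
function extractRelativeLinks(source) {
  const links = [];
  const quotedPath = /"(\/[^"]*)"/g;
  let match;
  while ((match = quotedPath.exec(source)) !== null) {
    links.push(match[1]);
  }
  return links;
}

// In the regex replace that confused the crawler, the characters
// "/gi," happen to sit between two double quotes, so a naive
// extractor reads them as a relative link.
const snippet = 'value.replace(/"/gi,"&quot;")';
console.log(extractRelativeLinks(snippet)); // → [ '/gi,' ]
```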
[edited by: Receptional_Andy at 12:54 pm (utc) on Oct. 14, 2008]
I have a number of pages where the "Linked From" column is "Unavailable".
It would be interesting to know where these URLs have come from, if they are not linked from another page.
Does it mean Google no longer has the original linking page in its index or is it more complex than that ?
Perhaps the URL was manually submitted or maybe Google sees a pattern in the URLs on your site and tries to guess some additional URLs.
I recently had a message via webmaster tools saying Google had found an excessive number of URLs on my site. Upon investigation this had been caused by a bug in one of my scripts which caused it to generate an infinite number of pages.
I fixed the bug and I now generate 404s for these URLs so they now appear in the WMT 404 report.
The interesting thing is the seed pages which started the infinite page creation all show up in WMT as Linked From - Unavailable.
And more interestingly, they bear more than a passing resemblance to the pages noted by g1smd in his original post.
I have for a long time had various duff incoming links which were for a valid URL but with an additional underscore on the end
My duff pages take the form
www.mysite.com/validpage_.html where www.mysite.com/validpage.html exists.
Further, I have a number of pages on my site which take the form www.mysite.com/dir/yyyy.html where yyyy is numeric.
WMT also shows some 404s with no linking page for www.mysite.com/dir/zzzz.html where zzzz is a random number.
Is Google guessing at pages on my site ?
I see that WMT no longer reports the *date* of last Home Page access. It just says the site was "visited"...
[edited by: tedster at 6:26 pm (utc) on Oct. 14, 2008]
They never showed that. The date was for only the Home Page last visit. The crawl graphs give a much better idea - but the best data of all is in your server logs.
[edited by: tedster at 6:27 pm (utc) on Oct. 14, 2008]
Does it mean Google no longer has the original linking page in its index or is it more complex than that ?
Is Google guessing at pages on my site ?
For instance, I used to see the link reports update some 2 to 3 days after the "Home Page last visited" date was updated - and the reported date for the last Home Page Visit, at the moment the date was updated, always reported as being some 2 to 5 days (sometimes more) ago.
Most of the reports only update every few days, maybe only once per week, and the period varies, but the date on the Content Analysis report changes every day if there is nothing to report. If there is something to report, then the date changes only once every few days, or maybe only weekly, and again the period randomly changes.
But some are unique typo problems, so I simply 301 each one to the proper page. The redirect will of course work if someone else makes the same typo, but that is unlikely in a few of these cases.
This has been a great find. I've now fixed close to 50 legit inbound links. Granted I have thousands and thousands of total inbound links, but hey, every link counts!
I've opened up the page online and hit view source in the browser. Did a control-F to find the non-existing page that it was supposedly linking to and got zero results.
I've also done a whole site search with Dreamweaver looking for a document that it says I'm linking to - got nothing. There haven't been any changes since their reported date.
Is anyone having similar problems?
Great find. Thanks for sharing, I've been wanting this feature for a while and never understood why it wasn't available.
Ditto.
About time this was added. I kept seeing 404s that I knew were not from the site but elsewhere.
One really annoying thing about this tool: when you press the "OK" button, the page jumps back to the top, forcing you to scroll down and find where you last were.
This is a phenomenon I have observed quite often in GWT's diagnostics over the past years.
I know that Google recommends using absolute hrefs wherever possible, and I am continuously trying to improve my site accordingly. But relative links are perfectly valid under the general W3C standards.
For instance, years ago, when dial-up modems were still quite common, I used to burn my website (HTML pages) on CD and send it to some customers as a substitute for a paper catalogue. A weird idea nowadays, I know, but in those days it made sense. That's where those relative links (../../../widgets.html) came from.
With all the things you reported on Googlebot following JavaScript links, I must say the engineers simply got some priorities wrong.
The tool in itself might of course be interesting and helpful, but what's the use of diagnostics from a doctor who makes far more mistakes than I do myself? The negative mental implications of Google telling me "you have a really dirty website, naughty, naughty" are quite frustrating, particularly if it simply isn't correct (or is outdated).
So, normally I don't look at this section.
A couple of months ago, I created a redirect in my .htaccess to redirect all index.html pages to the folder root. The code is below and live headers show that the code is working.
RewriteEngine on
#
# Redirect requests for index.html in any directory to "/" in the same directory
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html\ HTTP
RewriteRule ^(.+/)?index\.html$ http://www.example.com/$1 [R=301,L]
#
I only use .html for page extensions. It appears that Google Webmaster Tools is trying to find index.htm links. (?) It says that the source is from my own page - but there is no such link on the source page. Is there a way to force both an index.html AND index.htm redirect to the root folder with the rewrite code?
[edited by: tedster at 3:35 am (utc) on Oct. 22, 2008]
[edit reason] switch to example.com - it can never be owned [/edit]
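For what it's worth, one way to cover both extensions is to make the final "l" in the pattern optional. A sketch based on the rule above (untested, and assuming only index.html and index.htm need handling):

```apache
RewriteEngine on
#
# Redirect requests for index.html OR index.htm in any directory to "/"
#
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /(.+/)?index\.html?\ HTTP
RewriteRule ^(.+/)?index\.html?$ http://www.example.com/$1 [R=301,L]
```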