lucy24 - 6:09 pm on Nov 26, 2012 (gmt 0)
That sound you heard in the distance was me shooting myself in the foot. The subsequent longer and louder sound was me realizing I've just handed g### four bad links on a plate, and it will take more than a simple click on "Fixed" to make them go away.
I can thank a robot for drawing it to my attention. One of those random robots with "link" in its name that drifts by once or twice a year, does its stuff and moves on. In the middle of its list of exactly 100 requests-- dynamically generated apparently, because some of them are quite new files-- came a block of four 404s, all ending in jsp.
Although they were obviously not my pages, I recognized the set of names. My first thought was that for some reason it deleted the http://www.example.com/ part of some external links and treated them as local links instead. So I pulled up the likely culprit page and searched the raw html for any and all "href". This brought me smash into this passage where I'm, ahem, poking fun at someone else's 1999-vintage html.*
:: pause for gratuitous comments about pots, kettles and glass houses ::
My own raw html-- quoting the original-- says, in part,
&lt;tr>&lt;td align="right" valign="top">&lt;img SRC="/images/selector.gif" width=8 height=15>&lt;/td>&lt;td>&lt;font face="verdana, helvetica, arial,sans-serif" size="2" color="#000080">&lt;a href="/backgroundandhistory.jsp">&lt;b>Background and History&lt;/b>&lt;/font>&lt;/a>&lt;/td>&lt;/tr>&lt;tr>&lt;td align="right" valign="top">
... et cetera for a total of four links. (And, yes, four <font face... tags.)
When you're reading the page, nothing comes through as a link-- or any other kind of tag. That, of course, was the point of changing every < to &lt;. But all a crawling robot cares about is 'href="/blahblah.xtn"' which comes through loud and clear.
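For anyone who wants to see the mechanism, here is a rough Python sketch. It assumes the robot simply pattern-matches href="..." in the raw source, which is my guess at its behavior and not something I can confirm. The regex and the AnchorCollector class are made up for illustration, and the snippet is a shortened version of the passage above.

```python
import re
from html.parser import HTMLParser

# The passage as it sits in the page source: every "<" written as "&lt;".
raw = ('&lt;td>&lt;a href="/backgroundandhistory.jsp">'
       '&lt;b>Background and History&lt;/b>&lt;/a>&lt;/td>')

# Hypothetical naive robot: scan the raw text for href="..." without
# checking whether it sits inside a real tag.
naive_links = re.findall(r'href="([^"]+)"', raw)


class AnchorCollector(HTMLParser):
    """Collect href values from genuine <a> start tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(v for k, v in attrs if k == "href" and v)


# A real HTML parser sees no "<" at all, so it finds no tags and no links.
collector = AnchorCollector()
collector.feed(raw)
collector.close()
parsed_links = collector.links

print(naive_links)
print(parsed_links)
```

The pattern matcher picks up the jsp path, while the parser, which treats the escaped passage as plain text, picks up nothing. That would explain a request for a page that no browser would ever show as a link.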
Oh, ###. Oh, ###. Oh, ###.
As I type this, it occurs to me that maybe it will work if I change a few things into numerical entities. Say, one letter in "href" and all the non-alphabetics in the links. That's assuming the googlebot extracts all its information from the raw html. Can't do anything about the visible text, since that's the whole point of the passage.
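A minimal sketch of that idea in Python, under the same assumption that the robot reads the raw html and does not decode entities before looking for links. The obfuscate function is a hypothetical helper, and the choice of the "r" in "href" is arbitrary.

```python
import html
import re


def obfuscate(snippet: str) -> str:
    """Swap the 'r' in href, and every non-alphanumeric character inside
    the quoted link, for numeric character references."""

    def encode_link(match):
        path = "".join(
            c if c.isalnum() else f"&#{ord(c)};" for c in match.group(1)
        )
        return f'h&#114;ef="{path}"'

    return re.sub(r'href="([^"]*)"', encode_link, snippet)


raw = '&lt;a href="/backgroundandhistory.jsp">'
hidden = obfuscate(raw)

# What a raw-source pattern matcher would find after the change.
links_after = re.findall(r'href="([^"]+)"', hidden)

# What a browser displays: entities decoded back to characters.
visible_before = html.unescape(raw)
visible_after = html.unescape(hidden)

print(hidden)
print(links_after)
print(visible_before == visible_after)
```

The visible text is unchanged, since the browser decodes the entities, but the literal string href="/..." no longer exists in the source. A robot that decodes entities first and then scans for links would still find the path, so this only helps against the raw-text kind.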
:: mutter, grumble ::
* It really dates from 2001, but 1999 sounds better.