|ouch, ouch, ouch|
That sound you heard in the distance was me shooting myself in the foot. The subsequent longer and louder sound was me realizing I've just handed g### four bad links on a plate, and it will take more than a simple click on "Fixed" to make them go away.
I can thank a robot for drawing it to my attention. One of those random robots with "link" in its name that drifts by once or twice a year, does its stuff and moves on. In the middle of its list of exactly 100 requests-- dynamically generated apparently, because some of them are quite new files-- came a block of four 404s, all ending in jsp.
Although they were obviously not my pages, I recognized the set of names. My first thought was that for some reason it deleted the http://www.example.com/ part of some external links and treated them as local links instead. So I pulled up the likely culprit page and searched the raw html for any and all "href". This brought me smash into this passage where I'm, ahem, poking fun at someone else's 1999-vintage html.*
:: pause for gratuitous comments about pots, kettles and glass houses ::
My own raw html-- quoting the original-- says, in part,
<tr><td align="right" valign="top"><img SRC="/images/selector.gif" width=8 height=15></td><td><font face="verdana, helvetica, arial,sans-serif" size="2" color="#000080"><a href="/backgroundandhistory.jsp"><b>Background and History</b></font></a></td></tr><tr><td align="right" valign="top">
... et cetera for a total of four links. (And, yes, four <font face... tags.)
When you're reading the page, nothing comes through as a link-- or any other kind of tag. That, of course, was the point of changing every < to &lt;. But all a crawling robot cares about is 'href="/blahblah.xtn"', which comes through loud and clear.
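A rough sketch of how that can happen, in Python. The extractor here is hypothetical: it stands in for any robot that scans the raw source for href="..." instead of parsing tags. Nothing in the thread says how the real robot works, so treat the regex and the function name as assumptions.

```python
import re

# Hypothetical naive extractor: looks for href="..." anywhere in the raw
# source, without checking that it sits inside a real <a> tag.
HREF_RE = re.compile(r'href="([^"]*)"')


def naive_links(raw_source: str) -> list[str]:
    """Return every href value found by a plain text scan."""
    return HREF_RE.findall(raw_source)


# One of the quoted links, shortened for the example.
quoted = '<a href="/backgroundandhistory.jsp"><b>Background and History</b></a>'

# Escape only the opening angle brackets, as described above.
page_source = quoted.replace("<", "&lt;")

# No tag survives in the source, but the href pattern is untouched.
print(page_source)
print(naive_links(page_source))
```

The escaped source contains no tags at all, yet the scan still reports /backgroundandhistory.jsp as a link, and a robot resolving it against the page's own host would request it as a local file.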
Oh, ###. Oh, ###. Oh, ###.
As I type this, it occurs to me that maybe it will work if I change a few things into numerical entities. Say, one letter in "href" and all the non-alphabetics in the links. That's assuming the googlebot extracts all its information from the raw html. Can't do anything about the visible text, since that's the whole point of the passage.
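Here is a minimal sketch of that idea in Python, under the same assumption stated above: the robot reads the raw html and does not decode entities before looking for links. The function names and the choice of which letter to encode are made up for illustration, and a robot that decodes entities first would still see the links.

```python
import html
import re

# Same hypothetical raw-text scan as a link-hunting robot might use.
HREF_RE = re.compile(r'href="([^"]*)"')


def naive_links(raw_source: str) -> list[str]:
    return HREF_RE.findall(raw_source)


def to_entity(ch: str) -> str:
    """Numeric character reference for one character, e.g. '/' -> '&#47;'."""
    return f"&#{ord(ch)};"


def obscure(snippet: str) -> str:
    """Escape a snippet for display and break up every href="..." in it."""
    out = []
    pos = 0
    for m in HREF_RE.finditer(snippet):
        # Ordinary escaping for everything outside the href attribute.
        out.append(html.escape(snippet[pos:m.start()], quote=False))
        # Encode every non-alphabetic character of the link itself.
        url = "".join(c if c.isalpha() else to_entity(c) for c in m.group(1))
        # Encode one letter of "href" so the literal word never appears.
        out.append("h" + to_entity("r") + 'ef="' + url + '"')
        pos = m.end()
    out.append(html.escape(snippet[pos:], quote=False))
    return "".join(out)


quoted = '<a href="/backgroundandhistory.jsp"><b>Background and History</b></a>'
hidden = obscure(quoted)

print(hidden)
print(naive_links(hidden))            # what the raw-text scan finds
print(html.unescape(hidden) == quoted)  # what a browser would display
```

The visible text is unchanged once the entities are decoded, which is the point of the passage, while the raw source no longer contains the literal string href="/.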
:: mutter, grumble ::
* It really dates from 2001, but 1999 sounds better.
So here's another reason to use example.com for example code and links.
You transfer the problem to example.com, and they don't and won't give a hoot.
Wouldn't work this time, since it turned out there was no example dot com. All four offending links-- and also the images-- use the unimpeachable /site-absolute form. Can't change them, since I'm quoting verbatim.
Make that semi-impeachable. One of the four is for /index.jsp by name. And on the real site, / can stand for any of five domain names, each in with-or-without-www form. When you are the sole occupant of your niche, Duplicate or even Decaplicate (do not cite me as an authority for this word) Content is not a problem.
Hm. Wonder how many times a day the googlebot requests a page from www.example.com? Links from WebmasterWorld alone... ;)
I'd hope that Google engineers had read RFC whateveritis and filtered that one out. :)
RFC 2606 - Reserved Top Level DNS Names:
|3. Reserved Example Second Level Domain Names |
The Internet Assigned Numbers Authority (IANA) also currently has the following second level domain names reserved which can be used as examples.
example.com
example.net
example.org
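For what it's worth, the kind of filter hoped for above could look something like this Python sketch. The function name is made up, and nothing here is known about what Google actually does. The name lists come from RFC 2606 itself: the three second level example domains in section 3 and the four reserved top level names in section 2.

```python
from urllib.parse import urlsplit

# Reserved names from RFC 2606.
RESERVED_SLDS = {"example.com", "example.net", "example.org"}
RESERVED_TLDS = {"test", "example", "invalid", "localhost"}


def is_reserved_example(url: str) -> bool:
    """True if the URL's host falls under a name reserved by RFC 2606."""
    host = (urlsplit(url).hostname or "").rstrip(".").lower()
    if not host:
        # Site-absolute or relative links carry no host of their own.
        return False
    labels = host.split(".")
    if labels[-1] in RESERVED_TLDS:
        return True
    return ".".join(labels[-2:]) in RESERVED_SLDS


for link in ("http://www.example.com/", "http://example.org/page",
             "/index.jsp", "http://notexample.com/"):
    print(link, is_reserved_example(link))
```

Note that a site-absolute link such as /index.jsp has no host at all, so a filter like this would not have caught the four links in question.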
Well, google has always had that idiot savant quality...