I have a website with longish page URLs (all shorter than 300 characters, so there should be no issues with servers or browsers; the URL lengths these can cope with seem to be in the four figures).
Users and linking sites seem to have no problem with my site. I have noticed one pattern of links to pages of my site, though, that puzzles me.
- these links refer to existing pages of my site
- but the URL is curiously abbreviated, mostly to 86 characters (counting the three dots), sometimes also to 93 characters, by replacing a middle part of the string with three dots.
Linked-to URL (from unknown site) is
What I also know of these wrong URLs:
- they are requested by both the Googlebot and the Yahoo! Slurp robots.
- they show up as "not found" URLs in my Google Webmaster Tools report
Taken together, these two circumstances suggest that this is not a case of defective or malicious user agents requesting malformed URLs (I get a lot of those too), but that there are bona fide HTML links to these URLs on pages somewhere.
- I have looked in vain in the Google Webmaster Tools reports for links to these URLs.
- I have found no requests for these abbreviated URLs from user agents other than search engine spiders. This would indicate that the pages carrying the links are not high-traffic: search engine spiders follow these links, but no actual humans seem to click on them.
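For reference, this is roughly how such requests can be pulled out of an access log. It is only a sketch: it assumes the server writes Apache-style "combined" log lines, and the names LOG_LINE, abbreviated_requests and agents_by_count are made up for this example.

```python
import re
from collections import Counter

# Assumption: Apache/NCSA "combined" log format, i.e.
# host ident user [time] "METHOD path PROTO" status size "referer" "user-agent"
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def abbreviated_requests(lines):
    """Yield (path, referer, agent) for requests whose path contains '...'."""
    for line in lines:
        match = LOG_LINE.search(line)
        if match and "..." in match.group("path"):
            yield match.group("path"), match.group("referer"), match.group("agent")


def agents_by_count(lines):
    """Count how often each user agent requested an abbreviated URL."""
    return Counter(agent for _, _, agent in abbreviated_requests(lines))
```

Feeding it the lines of a log file (for example `agents_by_count(open("access.log"))`, where the file name is a placeholder) gives a tally per user agent; the referer field is worth a look too, although spiders usually leave it empty.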
Any ideas on what kind of software or user behaviour could generate this kind of abbreviated link in web pages? (I have seen link URL abbreviation on some forums, but it always abbreviated the displayed text, not the linked-to URL, in the manner of <a href="http://www.example.com/word1-word2-word3-word4.html">http://www.example.com/word1-...-word4.html</a>. Reformatting the local part of linked-to URLs looks like a pretty brain-dead thing for any software to do.)
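To make the distinction concrete, here is a small sketch of what I imagine is going on. It is purely hypothetical: the function names, the way the kept characters are split between head and tail, and the `buggy` switch are my own assumptions, and only the 86-character limit comes from what I see in my logs.

```python
import html


def abbreviate_middle(url, max_len=86, marker="..."):
    """Shorten url to max_len characters by replacing its middle with marker."""
    if max_len < len(marker):
        raise ValueError("max_len must be at least as long as the marker")
    if len(url) <= max_len:
        return url
    keep = max_len - len(marker)   # characters of the original that survive
    head = (keep + 1) // 2         # assumption: split roughly evenly
    tail = keep - head
    return url[:head] + marker + url[len(url) - tail:]


def render_link(url, buggy=False):
    """Build an <a> element; the buggy variant also abbreviates the href."""
    display = abbreviate_middle(url)
    href = display if buggy else url
    return '<a href="{}">{}</a>'.format(html.escape(href), html.escape(display))
```

With `buggy=False` only the visible text is shortened, which is what the forums I have seen do. With `buggy=True` the shortened string ends up in the href as well, which would produce exactly the kind of link a spider follows into a "not found" page.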