Requests for Malormed URLS

Crazy characters appended to some requested URL

         

webcentric

6:51 am on Apr 24, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



OK, first of all, that should say "Malformed" not "Malormed".

Anyway, I keep seeing 404s after Bingbot requests URLs like the following. Is this just Bingbot having a bad day? It's really irritating to sift through errors and find something I have no idea how to fix.

/directory/sometopic/anothertopic/pageÃ¢â‚¬â€¹

lucy24

8:11 pm on Apr 24, 2016 (gmt 0)




Well, if you've got my background, you looked at the specimen and instantly said: Aha, it's UTF-8 getting reinterpreted as Latin-1. (Ã is C3, the first half of many things in the Latin-1 range, and â is similarly E2, meaning a character in the three-byte range. The thing that looks like a comma is actually a “low-9” single quote.)

:: detour to text editor ::

If you do the convert-to-reinterpret dance twice, passing the intermediate â€‹ stage, you end up with ... ###, I miss that every time. It's the Zero Width Space (U+200B, UTF-8 bytes E2 80 8B), a close cousin of the Byte Order Mark (U+FEFF, the Zero Width No-Break Space).
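For anyone who wants to repeat the dance without a text editor, here is a minimal Python sketch. It assumes the "Latin-1" misreading was really Windows-1252 (cp1252), since the low-9 quote only exists there; the function name `undo_mojibake` is made up for illustration.

```python
# Assumption: the garbling was UTF-8 bytes misread as Windows-1252 (cp1252).
# ISO-8859-1 proper has no low-9 quote or euro sign, so cp1252 is the likely culprit.

def undo_mojibake(text: str, passes: int = 2) -> str:
    """Reverse 'UTF-8 read as cp1252' garbling, one pass per layer."""
    for _ in range(passes):
        # Turn the characters back into the bytes they were misread from,
        # then read those bytes as the UTF-8 they originally were.
        text = text.encode("cp1252").decode("utf-8")
    return text

# The garbled tail of the requested URL, written with escapes so it survives copy-paste.
garbled = "\u00c3\u00a2\u00e2\u201a\u00ac\u00e2\u20ac\u00b9"

intermediate = undo_mojibake(garbled, passes=1)  # the three-character stage
original = undo_mojibake(garbled, passes=2)      # a single invisible character

print([f"U+{ord(ch):04X}" for ch in intermediate])
print([f"U+{ord(ch):04X}" for ch in original])
print(original.encode("utf-8").hex())
```

Two passes are needed because the character was garbled twice: each pass peels off one layer.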

In any case, I don't think this is entrapment; I think it's the robot following links where a bit of the subsequent text has gotten garbled in. I see similar things now and then from the Googlebot, though generally without the file-encoding complication. Just the other day they were persistently asking for "/directory/subdir/ website" like that, with a telltale space saying where they went astray.
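If the goal is just to recognize these requests when sifting through 404s, something along the following lines might help. It is only a sketch: `clean_path` and `ZERO_WIDTH` are hypothetical names, cp1252 is again an assumption, and it presumes the log shows the path percent-encoded as UTF-8.

```python
from urllib.parse import unquote

# Invisible characters that tend to hitch a ride on copied links (illustrative set).
ZERO_WIDTH = {"\u200b", "\u200c", "\u200d", "\ufeff"}

def clean_path(raw_path: str, max_passes: int = 3) -> str:
    """Undo cp1252/UTF-8 garbling in a logged path, then drop zero-width characters."""
    text = unquote(raw_path)  # percent-decoding uses UTF-8 by default
    for _ in range(max_passes):
        try:
            repaired = text.encode("cp1252").decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # not (or no longer) garbled in this particular way
        if repaired == text:
            break  # pure ASCII, nothing to undo
        text = repaired
    return "".join(ch for ch in text if ch not in ZERO_WIDTH)

logged = ("/directory/sometopic/anothertopic/page"
          "%C3%83%C2%A2%C3%A2%E2%80%9A%C2%AC%C3%A2%E2%82%AC%C2%B9")
print(clean_path(logged))
```

If the cleaned path matches a real page, the 404 was most likely a garbled link picked up elsewhere rather than a problem on the site itself. Paths with legitimate accented characters fail the round trip on the first pass and are left alone.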

webcentric

4:28 am on Apr 25, 2016 (gmt 0)




Thanks lucy. I had a hunch this had something to do with character encoding though I'm pretty sure they didn't pick up that link directly from my site. It's not a huge problem obviously but I see it fairly regularly. Once a week or so and yes, I've seen it from Google too.

keyplyr

10:04 am on Apr 25, 2016 (gmt 0)




Bingbot has been doing this for 5 years or more. There is at least one other long thread about it.