Weird error

     
2:59 pm on Jun 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


I was checking stats for my website in Bing, and in the crawl errors section one of the 404 errors showed my website's address followed by these characters:

example.com/â​âÃ&#​195;’à &#​195;€™​;Ãâ&​#172; â​;€&#​195;„¢​;ƒÆ’Ã​4;â‚&​#172; Ã​8;¢Ã​â€​7;¬Ã&​#194;â€​;žÂ¢​95;ƒÆ’Ã&#​226; â€​;™ÃƒÂ​¢Ã¢​5;€š&#​195;¬Ã​6;¡Ã​ĉ&​#195;‚¬​5;¡Ãƒ​€š​5;‚¯​;ƒÆ’Ã​6; â€&​#226;Ã​62;€ Ã​¢â‚​94;â„​Ã​¢&​#195;ƒÂ¢​95;¢â​¬Å¡​95;‚¬​5;ƒâ€​¦Ã‚Â&​#194;ÃÆ​26;à ​2;€™Ã​ƒÂ¢Ã&#​194;â€​šÂ¬​5;…¡​Ã​¢â‚&#​194;Å¡​ƒâ€&#​197;ÂÂ​94;ÃÆ​;à â​€™Ã&#​198;â€​; â​62;‚¬â​„¢Ãƒ&​#195;’Â​¢ÃƒÂ&​#194;â​;‚¬Å&​#194;ÂÂ&#​194;Ãâ​;€¦Ã&#​226;¡Ã​98;Æ’Ã&#​160;â€​ÃÂ&​#162;ââ​;€šÂ&#​194;Ã…​;¡ÃƒÆ​’â&#​162;‚¬Å​4;Ãâ&​#226;šÃ​6;½Â​;ÂÂ​

Does anyone know what would give this type of error?

[edited by: goodroi at 4:05 pm (utc) on Jun 13, 2017]
4:21 pm on June 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


Adding that my website was recently (5/19) migrated to https.
5:14 pm on June 13, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4053
votes: 249


Looks like a mismatch of server settings for the character set, but it may be due to the Bing crawler's parsing of your encoding. It is hard to say where the mismatch is without information about the platform and structure of the site. On HTML pages you would have a meta tag in the head section declaring the charset. If the content is served from a database, a setting in the SQL tables could misstate the encoding. If this is only seen in Bing reports, I would look into what you can find out about its crawler's compatibility with your character encoding.
5:26 pm on June 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


Ok, thank you. I don't know how to track down what you described. All I know is that the header information now is the same as it was before https was implemented on the site, except that "https" has replaced the "http" protocol. Here is a link to my site (I guess it is ok to share links to our site? If not, please remove):

https:// example dot com/

Thanks for your help.

[edited by: phranque at 7:09 pm (utc) on Jun 13, 2017]
[edit reason] exemplified domain [/edit]

5:41 pm on June 13, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15244
votes: 691


The short answer is: If it's an isolated error, do not waste even one second worrying about it. It almost certainly comes from the robot misreading a link on someone else's site, and is simply not your problem.

But my goodness, what a lot of garbage! &#8203; is the zero-width space, a cousin of one version of the BOM. Everything else is what you'd get if you repeatedly toggled between Windows-Latin-1 encoding (not ISO-Latin-1) and UTF-8, with detours into decimally encoded characters (but why, when they're all in the same character set?). Someone on one of the more technically oriented subforums may be able to figure out exactly what was done--and how--and what the underlying text is.
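
For anyone who wants to see the toggling effect for themselves, here is a minimal Python sketch. It assumes the damage comes from UTF-8 bytes being read as Windows-1252 and then saved as UTF-8 again, over and over; the function names (garble, ungarble, cp1252_decode, cp1252_encode) are made up for illustration, and nothing here is confirmed to be what actually happened to the URL above.

```python
def cp1252_decode(data: bytes) -> str:
    """Decode bytes as Windows-1252. The five bytes Python's codec leaves
    undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) are kept as the code point
    with the same number, which is roughly what browsers do."""
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(byte))
    return "".join(chars)


def cp1252_encode(text: str) -> bytes:
    """Inverse of cp1252_decode: unmappable characters below U+0100 pass
    through as a single byte; anything else raises UnicodeEncodeError."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) > 0xFF:
                raise
            out.append(ord(ch))
    return bytes(out)


def garble(text: str, rounds: int = 1) -> str:
    """Simulate `rounds` passes of 'UTF-8 bytes misread as Windows-1252'."""
    for _ in range(rounds):
        text = cp1252_decode(text.encode("utf-8"))
    return text


def ungarble(text: str, max_rounds: int = 10) -> str:
    """Undo the misreading one layer at a time until it no longer applies."""
    for _ in range(max_rounds):
        try:
            candidate = cp1252_encode(text).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # this layer is not misread UTF-8, so stop here
        if candidate == text:
            break  # plain ASCII: nothing left to undo
        text = candidate
    return text


if __name__ == "__main__":
    print(garble("é", 3))            # three misreadings of a single "é"
    print(ungarble(garble("é", 3)))  # back to "é"
```

Three rounds turn a single "é" into "ÃƒÆ’Ã‚Â©", and fragments such as "ƒÆ’Ã" do show up in the string posted above. The sketch does not handle the numeric-entity detours or the zero-width characters, so it will not decode that string as it stands.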

Does WMT say where they found the URL?

Edit: OK, we overlapped. Your website name will shortly be deleted, but as long as it's there I should point out that the HTML 5 DTD
<!DOCTYPE html>
calls for the HTML 5 charset declaration, which is simply
<meta charset = "UTF-8">
Browsers will know how to interpret your version, which is the HTML 4 form, but you should update it anyway.

An annoying quirk of in-document charset declarations is that they can be overridden by a charset declared globally, for example in htaccess. (This is backward, but I don't make the rules.) However, that long string you posted goes way beyond a simple charset misreading. It involves multiple, repeated back-and-forth togglings, probably in someone else's database.

:: idly wondering if it would be possible for your product line to tap into the obvious second and far more numerous market, assuming you would wish to do so ::
6:01 pm on June 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


I am not sure where to find that info. I was going to attach some screen shots but I don't see a way of doing this in this message?
6:12 pm on June 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


<<OK, we overlapped. Your website name will shortly be deleted, but as long as it's there I should point out that the HTML 5 DTD
<!DOCTYPE html>
calls for the HTML 5 charset declaration, which is simply
<meta charset = "UTF-8">
Browsers will know how to interpret your version, which is the HTML 4 form, but you should update it anyway.>>

Ok, I am not sure I am understanding what you mean...are you instructing that:

Instead of:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

I should have:

<meta charset= "UTF-8">

?

[edited by: DChan at 6:15 pm (utc) on Jun 13, 2017]

6:14 pm on June 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


:: idly wondering if it would be possible for your product line to tap into the obvious second and far more numerous market, assuming you would wish to do so ::

Not sure what you mean but I am open to suggestions. Sadly, since migrating to https my site has almost flat lined.
6:17 pm on June 13, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15244
votes: 691


<<I am not sure where to find that info.>>

Did you mean the information about how Bing found this URL? I dunno; it may not even exist.

I just checked for myself at Bing's wmt, and unfortunately they don't show any current 400-class errors so I don't know what extra information they would give. Maybe they just don't say, so that doesn't take you much further. (The Inbound Links area only lists links to actual, current pages; none of that via this intermediate link business you find at G###.)

But really, if it's just that one gibberish link, it's almost certainly not worth bothering about. You might check your access logs and see how often the bingbot has actually requested the URL. I don't know about Bing, but I've observed that if the Googlebot has never received anything but a 404 for a given URL (as happens if there's a typo link from someone else's site), they will not request it very often.
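
If you want to do that log check without reading the file by eye, here is a rough Python sketch. It assumes your server writes the common Apache "combined" log format and that the crawler identifies itself with "bingbot" in the user-agent string; the function name bot_404s and the sample lines are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable

# One line of Apache "combined" log format (an assumption about your server).
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_404s(lines: Iterable[str], agent_substring: str = "bingbot") -> Counter:
    """Count 404 responses per requested path for one crawler."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        is_bot = agent_substring.lower() in match["agent"].lower()
        if is_bot and match["status"] == "404":
            hits[match["path"]] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines; in practice, pass an open log file instead.
    sample = [
        '203.0.113.5 - - [13/Jun/2017:14:59:00 +0000] "GET /%C3%A2%E2%82%AC HTTP/1.1" '
        '404 512 "-" "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"',
        '203.0.113.9 - - [13/Jun/2017:15:01:00 +0000] "GET / HTTP/1.1" '
        '200 2048 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    ]
    for path, count in bot_404s(sample).most_common():
        print(count, path)
```

To run it on a real log, open the file and pass the file object as `lines`. A garbled URL that shows up only once or twice is a good sign it can be ignored.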

Instead of:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

I should have:
<meta charset= "UTF-8">

Yes, exactly. This is a non-lethal error, because browsers are supposed to be forgiving. It's like when someone uses bad grammar: you know what they meant even if it's technically wrong.
6:32 pm on June 13, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


Ok, thank you. The reason I was worried about it was that the url seemed to be my .com url, my index page. I will try not to continue to worry about that error.

Thank you again for your instruction on the charset. I had no idea. It may be the editing program I use (Expression Web) that set the charset this way, or it may have been in the template that I am using (I got it from the zero theme website). I will get this changed asap.
11:16 pm on June 13, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


Actually, the syntax of the charset meta tag is probably not the issue. I agree with not2easy and suspect it's a mismatch between the page meta tag and somewhere else that declares a different charset (or even the same one**).

You don't need charset meta tags on your HTML mark-ups (despite what some validators may say.) You only need to declare it in your root level htaccess file*:
AddDefaultCharset utf-8
*Unless of course some of your pages require a different language.

It's prudent to keep as little as possible in the HEAD section. Keep your pages lean & fast loading.

**Having the charset declared in two places, even if exactly the same, may cause browsers to renegotiate, slowing rendering.
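
To check for that kind of mismatch, a small Python sketch along these lines may help. It compares the charset in a Content-Type response header with the one declared in the page's meta tag. The function names are hypothetical, and you have to supply the header value and the HTML yourself (for example from your browser's developer tools). As far as I know, when both are present browsers go by the HTTP header rather than the meta tag.

```python
from email.message import Message
from html.parser import HTMLParser
from typing import Optional, Tuple


def charset_from_content_type(value: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, if any."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


class MetaCharsetFinder(HTMLParser):
    """Remember the first charset declared in a <meta> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.charset: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attr = dict(attrs)
        if attr.get("charset"):
            # Short form: <meta charset="UTF-8">
            self.charset = attr["charset"].strip().lower()
        elif (attr.get("http-equiv") or "").lower() == "content-type":
            # Long form: <meta http-equiv="Content-Type" content="...">
            self.charset = charset_from_content_type(attr.get("content") or "")


def compare_charsets(content_type_header: str, html: str) -> Tuple[Optional[str], Optional[str], bool]:
    """Return (header charset, meta charset, whether they are consistent).
    A missing declaration on either side counts as consistent."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    header_cs = charset_from_content_type(content_type_header)
    meta_cs = finder.charset
    consistent = header_cs is None or meta_cs is None or header_cs == meta_cs
    return header_cs, meta_cs, consistent


if __name__ == "__main__":
    page = '<html><head><meta charset="UTF-8"><title>t</title></head></html>'
    print(compare_charsets("text/html; charset=ISO-8859-1", page))
```

If the third value comes back False, the server and the page disagree, and that is the first thing to fix.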
1:34 am on June 14, 2017 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15244
votes: 691


<<*Unless of course some of your pages require a different language.>>

Different character set, since the whole point of utf-8 is that it covers any and all languages, including some seriously dead ones.

Dunno about others, but the w3 validator doesn't care* about charset. All it insists on in the <head> section is the <title>.

In any case I certainly didn't mean to imply that the problem was with the syntax of the charset declaration. In fact, I believe I specifically said it wasn't.

Horse's mouth [httpd.apache.org] says:
"AddDefaultCharset should only be used when all of the text resources to which it applies are known to be in that character encoding and it is too inconvenient to label their charset individually."

:: irritably wondering why AddDefaultCharset is core while AddCharset is mod_mime ::


* My fingers typed "chare". This is really true.
1:48 am on June 14, 2017 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 890


<<Dunno about others, but the w3 validator doesn't care* about charset.>>
Yes it does. If it doesn't find a charset declaration in the page mark-up, it gives this warning (not an error, however):
"No character encoding information was found within the document, either in an HTML meta element or an XML declaration. It is often recommended to declare the character encoding in the document itself, especially if there is a chance that the document will be read from or saved to disk, CD, etc."
So what I said above is, pay no attention to that warning and just set it in the headers site-wide via htaccess. However if you allow your pages to be downloaded and saved on the user's machine, it might help to have the charset declaration on the page in case some program needs it to render the page.
9:47 am on June 14, 2017 (gmt 0)

New User

Top Contributors Of The Month

joined:June 6, 2017
posts: 40
votes: 1


Ok, thank you all for your help, I really appreciate it.
 
