
HTML Forum

    
Language identifier "&lang=" in URL being interpreted as "<="
Both Google and Yahoo are caching &LANG= in URL as <=
Fiver




msg:3881691
 2:56 pm on Mar 30, 2009 (gmt 0)

I haven't run into this before, but perhaps someone else has. I'm consulting on a multilingual site that uses the variable name "LANG" in the URL.

This results in URLs like

domain.com/blah.php?arg1=34&lang=eng

Both Google and Yahoo are caching these URLs to read

domain.com/blah.php?arg1=34<=eng

Now, I understand that &lang; is the HTML entity for the left angle bracket (the less-than-like character that is appearing), but the URL does not terminate it with a semicolon, so I find it more than a little strange. The fact that both Yahoo and Google are misinterpreting it is even stranger.

I suppose I could recommend changing the language variable name, but is there a more elegant solution that wouldn't require a back-end change and the complications that come with it?

Thanks,
Naoise

 

encyclo




msg:3881714
 3:17 pm on Mar 30, 2009 (gmt 0)

Unfortunately, the search engines are entirely correct: the semicolon is not obligatory for an entity reference to be recognized. You get the same issue with a variable named &copy, which gets turned into a copyright sign.

This is why the ampersands in all variables in URL links must be encoded as &amp; - in most cases the browser (or SE bot) can handle unescaped ampersands, but not always.

So you need to modify your code to use &amp; in on-page links (i.e. in the HTML) at all times:

example.com/blah.php?arg1=34&amp;lang=eng

The W3C validator will show the unescaped ampersands as errors if you validate the generated page. This is actually a good example which shows that search engine bots really do respect standards and prefer valid HTML. :)
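
For anyone who wants to see the escaping in action, here is a minimal sketch in Python rather than the site's PHP. It only illustrates the idea: the URL is modelled on the one in this thread, and the "English" link text and the variable names are my own assumptions.

```python
from html import escape, unescape

# Hypothetical URL modelled on the thread's example.
raw_url = "domain.com/blah.php?arg1=34&lang=eng"

# Escape the URL before writing it into an href attribute:
# every "&" becomes "&amp;".
href_value = escape(raw_url, quote=True)
link = f'<a href="{href_value}">English</a>'

# A parser that decodes the attribute value gets the original URL back.
decoded = unescape(href_value)

# The &copy case mentioned above. Python's unescape follows the HTML5
# rules, which accept some entity names without a trailing semicolon,
# so the unescaped ampersand is swallowed into a copyright sign.
mangled = unescape("domain.com/blah.php?arg1=34&copy=1")

print(link)
print(decoded)
print(mangled)
```

The point is that the escaped form survives a decode intact, while the unescaped form can be altered by whatever parser reads the page.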

Fiver




msg:3881729
 3:39 pm on Mar 30, 2009 (gmt 0)

Thanks for the clarification encyclo - thorough as usual. Looks like no quick fix for this site, though admittedly this isn't causing huge problems at the moment. Cost/Benefits time.

swa66




msg:3881761
 4:28 pm on Mar 30, 2009 (gmt 0)

Indeed, an excellent example of why validation should be done.

Fiver: why not do a global substitution in URLs? Every "&" that is not followed by "amp;" gets replaced by "&amp;".
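
A rough sketch of that substitution, in Python for illustration only. The function name and the sample markup are hypothetical, and it assumes double-quoted href attributes. Regex-based rewriting of HTML is fragile, so treat this as a stopgap rather than a replacement for fixing the templates.

```python
import re

# An ampersand that is not already the start of "&amp;".
BARE_AMP = re.compile(r"&(?!amp;)")

# Double-quoted href attribute values only (a simplifying assumption).
HREF = re.compile(r'(href=")([^"]*)(")')


def escape_href_ampersands(html_text: str) -> str:
    """Replace bare "&" with "&amp;" inside href values only."""
    def fix(match: re.Match) -> str:
        # Keep the href=" prefix and closing quote, rewrite the URL part.
        return match.group(1) + BARE_AMP.sub("&amp;", match.group(2)) + match.group(3)

    return HREF.sub(fix, html_text)


page = '<a href="blah.php?arg1=34&lang=eng">English</a>'
fixed = escape_href_ampersands(page)
print(fixed)
```

Because of the negative lookahead, running it a second time leaves already-escaped links alone. Text outside href values is not touched.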

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved