|Brackets vs Encoded URLs|
Is inconsistent linking to bracketed *and* encoded URLs duplicate content?
Scenario: Internal links point to two different versions of the same URL: one with literal brackets and one with the brackets percent-encoded as %5B%5D.
Version 1: site.com/test?hello[]=all&howdy[]=all&ciao[]=all
Version 2: site.com/test?hello%5B%5D=all&howdy%5B%5D=all&ciao%5B%5D=all
Question: Will search engines view these as duplicate content? Technically the characters differ, but only because one version encodes the brackets and the other does not (see: http://www.w3schools.com/tags/ref_urlencode.asp).
We are asking the developer to encode ALL URLs because this seems cleaner, but they are telling us that Google will see zero difference. We aren't sure this is true, since search engines can get hung up on even a single-character difference.
We don't want to unnecessarily fracture the site's internal link structure, so any feedback is welcome. Thank you. :)
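To make the "difference in characters" concrete, here is a minimal Python sketch using the two query strings from the scenario. A plain string comparison treats them as different, while decoding the percent-escapes with `urllib.parse.unquote` turns one into the other. This only shows what the strings contain; it says nothing about whether a particular search engine performs the same decoding.

```python
from urllib.parse import quote, unquote

# The two versions from the scenario: literal brackets vs. %5B%5D escapes.
bracketed = "site.com/test?hello[]=all&howdy[]=all&ciao[]=all"
encoded = "site.com/test?hello%5B%5D=all&howdy%5B%5D=all&ciao%5B%5D=all"

# Character for character, these are different strings.
print(bracketed == encoded)  # False

# Percent-decoding the encoded version gives back the bracketed one.
print(unquote(encoded) == bracketed)  # True

# quote() encodes "[" and "]" as %5B and %5D.
print(quote("[]"))  # %5B%5D
```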
There is an easy solution: use a canonical tag that points to one, and only one, version of the URL.
In this case, if any search engine were to treat these URLs differently, the canonical tag would resolve the issue.
I personally would not rely on all search engines treating such URLs as identical, even if they probably should.
Take the safe road and go for canonical; I certainly would. It is very easy to implement.
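A canonical tag only helps if every variant of a page points at the same single URL. Here is a minimal Python sketch of that idea, assuming the bracketed form is the one chosen as canonical and that only the %5B/%5D escapes need normalizing (other escapes are left alone). The function names and the `https://site.com` base are hypothetical placeholders.

```python
import re
from html import escape
from urllib.parse import urlsplit, urlunsplit

# Matches %5B / %5D in either hex case (%5b and %5d are also valid escapes).
_BRACKET_ESCAPE = re.compile(r"%5[BD]", re.IGNORECASE)
_LITERAL = {"%5B": "[", "%5D": "]"}


def canonical_url(url: str) -> str:
    """Hypothetical helper: rewrite escaped brackets in the query as literal ones."""
    parts = urlsplit(url)
    query = _BRACKET_ESCAPE.sub(lambda m: _LITERAL[m.group(0).upper()], parts.query)
    return urlunsplit(parts._replace(query=query))


def canonical_link_tag(url: str) -> str:
    """Build the <link rel="canonical"> element, HTML-escaping the href (& becomes &amp;)."""
    return f'<link rel="canonical" href="{escape(canonical_url(url))}">'


# Both variants produce the same tag.
print(canonical_link_tag("https://site.com/test?hello%5B%5D=all&howdy%5B%5D=all"))
print(canonical_link_tag("https://site.com/test?hello[]=all&howdy[]=all"))
```

Emitting the tag from one function means both link variants end up declaring the same canonical URL, whichever form the internal link happened to use.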
You are going to laugh, but we actually already have canonical tags implemented that use only bracketed URLs.
The original goal of canonicals was actually to deal with excess search parameters, not character-based URL inconsistencies (that issue reared its head unexpectedly during development).
But you are totally right, the canonical tags will still help consolidate everything. Thanks again. ;)
URW (you're welcome). Make sure all your URL parameters are configured properly in Google Webmaster Tools.
By default, these are set to 'Let Google Choose', but this may be dangerous if some of your parameters are session IDs or anything else that has no impact on the page content.
Although in your case canonical was about dealing with excess search parameters, Google may have found out about these extra parameters through other means. If so, the corresponding GWT page will show that it knows about them. Double-check how these parameters are configured. Bing Webmaster Tools offers a similar configuration.
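The Webmaster Tools parameter settings are made in the UI, so there is no code for that part. As a site-side complement, the site can also drop parameters that don't change the content before it builds canonical URLs. Here is a minimal Python sketch, assuming hypothetical session parameters named `sessionid` and `sid`, and keeping brackets literal in the output.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names of parameters that do not change the page content.
IGNORED_PARAMS = {"sessionid", "sid"}


def strip_ignored_params(url: str) -> str:
    """Remove content-neutral parameters and re-serialize the rest."""
    parts = urlsplit(url)
    # parse_qsl decodes escapes, so "hello%5B%5D" and "hello[]" become the same key.
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in pairs if k.lower() not in IGNORED_PARAMS]
    # safe="[]" keeps brackets literal instead of re-encoding them as %5B%5D.
    return urlunsplit(parts._replace(query=urlencode(kept, safe="[]")))


print(strip_ignored_params("https://site.com/test?hello%5B%5D=all&sessionid=abc123&ciao[]=all"))
```

Because the query is decoded and then re-encoded in one consistent way, this also normalizes the bracket encoding as a side effect.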
Better safe than sorry with Google... lol
Rewind several steps...
Why are you using URLs with parameters and with brackets at all?
|We are asking the developer to encode ALL URLs because this seems cleaner but they are telling us that Google will see zero difference. |
What I'd really like to hear is the developer telling you that those are dreadful URLs either way ;) Is it possible he has tried telling you this and you won't listen? If that isn't the case, maybe it's time to shop for a new developer.
We are using URLs with parameters and brackets because of form submissions (long story). Brackets are (apparently) necessary because users can select more than one option in the search form.
Otherwise, we're doing our best to help engines discover the most important URLs that can result from form submissions (cue canonical tags), deal with pagination, and handle URL inconsistencies caused by encoded vs. non-encoded characters. Whew. ;)
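One likely source of the encoded variant is the form itself. When a browser submits a GET form, it serializes the fields as application/x-www-form-urlencoded, which percent-encodes `[` and `]`. Meanwhile, links written by hand or by templates may keep the brackets literal. Python's `urlencode` treats brackets the same way, so it can stand in for the browser here. The field names and values below are hypothetical, modeled on the scenario.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical multi-select submission: two options picked for "hello[]".
fields = [("hello[]", "all"), ("hello[]", "some"), ("howdy[]", "all")]

# Form-style serialization percent-encodes the brackets.
query = urlencode(fields)
print(query)  # hello%5B%5D=all&hello%5B%5D=some&howdy%5B%5D=all

# Decoding groups repeated keys into lists; "[]" is simply part of the key name.
print(parse_qs(query))  # {'hello[]': ['all', 'some'], 'howdy[]': ['all']}
```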
If even one character of the requested URL differs (added, missing, or changed), it is a different URL and therefore non-canonical.
If the requested URL appears to be percent-encoded (check your server access log), then something went wrong with the percent-encoding on the page.
You should redirect these requests to the canonical URL with a 301 (Moved Permanently) response.
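Here is a minimal Python sketch of that redirect as a WSGI app, assuming the bracketed form is the canonical one and that escaped brackets are the only variation to fix. The app and helper names are hypothetical. A real site would more likely do this in its framework or web server configuration.

```python
import re

# Matches %5B / %5D in either hex case.
_ESCAPED_BRACKET = re.compile(r"%5[BD]", re.IGNORECASE)


def canonical_query(query: str) -> str:
    """Hypothetical helper: replace escaped brackets with literal ones."""
    return _ESCAPED_BRACKET.sub(lambda m: "[" if m.group(0)[-1] in "Bb" else "]", query)


def app(environ, start_response):
    """WSGI app: 301 to the canonical URL if the query uses escaped brackets."""
    path = environ.get("PATH_INFO", "/")
    query = environ.get("QUERY_STRING", "")  # raw, still percent-encoded
    target = canonical_query(query)
    if target != query:
        start_response("301 Moved Permanently", [("Location", f"{path}?{target}")])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"canonical page\n"]
```

To try it locally, pass `app` to `wsgiref.simple_server.make_server` and request `/test?hello%5B%5D=all`. The response should be a 301 pointing to `/test?hello[]=all`.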
Thanks again to everyone for the helpful responses. I suspected as much about the character differences but needed confirmation.