WordPress URL Hashtags and Canonical Mess

Hi, I haven't posted in a while, so I hope this is the right forum. I'm seeking an .htaccess solution, or any solution, to a convoluted mess which I'll do my best to explain.

#1 - WordPress still does not handle canonical tags properly on paginated tag archive results (it points the canonical on every page to the first page). I solved that problem with a rather simple theme functions file entry from here - [wordpress.stackexchange.com...]. I don't want to debate the issue of canonical pointing to page one on multi-page posts because Google has stated you should not and, on this particular site, each page is different. Think of a list of products with each jump showing a different product from the group.

#2 - The browserstate functions in history.js still do not degrade well on html4 browsers (IE9 etc.) because those browsers do not have an implementation of the session history API. The result is that it will turn example.com/some-content-here/4 into example.com/some-content-here/#some-content-here/4?_suid1234567 which, unfortunately, breaks my solution to #1. I end up with a canonical tag on all paginated content pointing to page one. More on history.js from GitHub here - [github.com...]

INTERESTING
#3 - I wanted to view the problem through Google's eyes, so I fetched as Googlebot. I fetched example.com/some-content-here/#some-content-here/4?_suid1234567 which, as I said, has a rel-canonical pointing to page 1 instead of page 4, and Googlebot fetches page 1. Googlebot ignores everything after the hashtag and returns page 1, even though following my link leads a person to page 4. Of course this concerns me: it's unintentionally showing a visitor a different page than a search engine, but that's not my fault... I can't make Googlebot ignore my url and load my canonical suggestion; they do that on their own when visitors do not.
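
To make the fetch behaviour concrete: the part of a url after the hashtag (the fragment) is handled by the browser and is not included in the HTTP request, so any fetcher asks the server for the same resource. A minimal sketch of that, where serverVisibleUrl is a hypothetical helper name and the http:// scheme is added only for illustration:

```typescript
// Hypothetical helper (not part of WordPress or history.js):
// returns the portion of a URL that is actually sent to the server,
// i.e. everything before the first "#".
function serverVisibleUrl(href: string): string {
  const hashAt = href.indexOf("#");
  // No "#" means the whole URL is visible to the server.
  return hashAt === -1 ? href : href.slice(0, hashAt);
}

// The example URL from this post, with an assumed http:// scheme.
const fetched = serverVisibleUrl(
  "http://example.com/some-content-here/#some-content-here/4?_suid1234567"
);
// fetched is the page 1 address: "http://example.com/some-content-here/"
```

Under that assumption the server, and any .htaccess rule running on it, only ever sees the page 1 address for these links.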

As you can tell, this isn't your everyday problem, but it does affect anyone using WordPress with a plugin that uses the history.js file, such as Theia Post Slider for example. MY CONCERN IS TO ENSURE PROPER INDEXING - I will eventually fix WordPress or fix the history.js degradation bug on html4 browsers, but I need to make sure I am getting indexed properly, so....

Since Googlebot is ignoring paginated content with hashtags and _suid appended and jumping to page 1, I need to make sure that anyone who follows a link to example.com/some-content-here/#some-content-here/4?_suid1234567 also ends up on page 1 of the multi-page post or archive. Thus I need a redirect to send people to the non-hashtag version of urls.

But it's not that simple. As you can see from my example url, page 4 is actually in the url (/4?_) but after the hashtag portion. Oddly enough, that's enough to send a visitor to page 4 even though the canonical reports page 1. This is causing some analytics nightmares and various other problems. If anyone has a suggestion to fix that I'm all ears, BUT what I really need first is a redirect to strip the hashtag portion of the url without removing the page number that comes after it. I don't want visitors to be unable to switch pages. As you might guess, if I remove the hashtag portion of the url then WordPress gets canonical right, so I no longer need to redirect to page one, and in fact I shouldn't.
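
For illustration only, here is a rough, untested-on-the-site sketch of the rewrite described above. Because the hashtag portion is never sent to the server, this kind of rewrite would have to run in the browser rather than in .htaccess. The function name cleanHashUrl is hypothetical, the http:// scheme is added for the example, and the sketch assumes the fragment holds a root-relative path followed by a "?" and the _suid marker, as in my example url:

```typescript
// Hypothetical helper: maps a history.js-style fallback URL such as
//   http://example.com/some-content-here/#some-content-here/4?_suid1234567
// to its clean form
//   http://example.com/some-content-here/4
// Returns null when the URL does not look like such a fallback URL.
function cleanHashUrl(href: string): string | null {
  const hashAt = href.indexOf("#");
  if (hashAt === -1) return null; // no fragment, nothing to do

  const fragment = href.slice(hashAt + 1);
  // Only touch fragments carrying the _suid marker, so ordinary
  // in-page anchors like "#comments" are left alone.
  if (fragment.indexOf("_suid") === -1) return null;

  // Scheme and host, e.g. "http://example.com".
  const origin = /^(https?:\/\/[^\/?#]+)/.exec(href);
  if (origin === null) return null;

  // Assumption: everything after "?" in the fragment is the _suid
  // marker and can be dropped; the rest is a root-relative path.
  const path = fragment.split("?")[0].replace(/^\/+/, "");
  return origin[1] + "/" + path;
}
```

In a page script the result, when not null, could be handed to window.location.replace so the visitor lands on the non-hashtag url with the page number intact. That would not stop history.js from adding the hashtag again on the next page change in an html4 browser.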

Round and round we go: someone coming in on a hashtag link needs to land on a non-hashtag page, but as soon as they go to the next page the hashtag appears, and since that's not server side they don't get redirected.... and they create more links to hashtag pages... bleh.

I don't yet have a best effort for this; I was presented with the problem this morning and am still trying to wrap my head around the implications. The site also has htaccess code to strip a .html ending from pages, as well as your typical non-www to www redirect. A hashtag removal solution would need to work with both of those, and I am only looking for a hashtag removal htaccess solution (for now). I hope you found this set of issues interesting; I suspect it affects a lot more sites than people realize.

What are your thoughts?

mod note: I linked to the two reference sites because they offer a non-promotional, plugin-free solution to a common problem and insight into a bigger problem when the two issues collide. They are not my sites but are reputable; I hope they satisfy the linking policy on WW.

JS_Harris
