We'd rather not have these links indexed by a search engine, because the tags sometimes change (for instance, during A/B testing) and we don't want to introduce all of these unique URLs for a single page of content.
We were thinking about just creating a JavaScript function that would let us do something like this:
<a href="http://www.widgets.com/page" onClick="addTag(this, 'foo');">
The idea is that search engines would crawl and index the plain href but wouldn't execute the JavaScript, so they would only ever see clean URLs, and we would never have to worry about analytics-laden URLs competing with the canonical URL for a page. Human visitors, whose browsers do run the script, would get the tag added to the link when they click it.
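For illustration, here is a minimal sketch of what the `addTag` function in that link might look like. It assumes the tag travels in a query parameter named `tag` (a name borrowed from the `page=widgets&tag=1` examples later in the thread; the original doesn't specify it). Because the click handler runs before the browser follows the link, the visitor lands on the tagged URL, while the `href` in the page markup stays clean.

```javascript
// Hypothetical helper matching the addTag(this, 'foo') call in the example link.
// Assumption: the tag is carried in a query parameter named "tag".
function addTag(link, tag) {
  // On an <a> element, link.href is the resolved absolute URL, so URL can parse it.
  const url = new URL(link.href);
  // set() replaces any existing value, so repeated clicks don't stack tags.
  url.searchParams.set("tag", tag);
  link.href = url.toString();
  // Returning true leaves the browser's normal navigation in place.
  return true;
}
```

Since only the clicked element is rewritten, and only in the visitor's browser, the HTML served to everyone (crawlers included) contains just `http://www.widgets.com/page`.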
This seems harmless enough, but is it considered kosher by search engines, or does it look cloaky to have all of these links with JavaScript additions?
Thanks,
Steve
[edited by: tedster at 7:38 am (utc) on Jan. 15, 2008]
[edit reason] switch to example.com - it can never be owned [/edit]
page=widgets
page=widgets&tag=1
page=widgets&tag=2
etc.
We just want search engines to have one URL, page=widgets, associated with the widgets content, and not think there are three or more different URLs pointing to the same content.
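Concretely, "one URL" means every tagged variant should reduce to the same address once the tracking parameter is dropped. A minimal sketch, assuming `tag` is the only tracking parameter and using `www.example.com/index` as a placeholder host and path (both hypothetical):

```javascript
// Hypothetical: assumes "tag" is the only tracking parameter, as in the
// page=widgets&tag=N examples. Other analytics parameter names would go here.
const TRACKING_PARAMS = ["tag"];

// Reduce any tagged variant to the single URL search engines should see.
function canonicalUrl(href) {
  const url = new URL(href);
  for (const name of TRACKING_PARAMS) {
    url.searchParams.delete(name);
  }
  return url.toString();
}
```

With this mapping, `...?page=widgets&tag=1` and `...?page=widgets&tag=2` both come out as `...?page=widgets`.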
Putting a nofollow on the link with the analytics tag means the link won't be followed or pass ranking mojo, which is a bad thing; that link might be part of our navigation, for example. The same goes for blocking the tagged URLs in robots.txt. We do want the link crawled and the destination indexed. We just don't want URL permutations that all point to the same content.
If we could do it through the JavaScript mechanism described in the first post, the search engine would follow the link but not execute the JS (and hence would not add the analytics tag to the link, whereas users' browsers would). But this is an option only if it doesn't raise any red flags with search engines.
I suppose we could test it out on some unimportant links and see what the effects are. But if others have been down this road before...
Thanks,
Steve
Personally, I would not use the JS solution, for a few reasons. But if you want to go that way, I'd first test it to make sure I understood how each of the major search engines treats your specific implementation. Even then, use caution.
The reasons I would not go the JS route are:
1) I believe there is some chance that using JS on all of your internal links would raise a red flag with one or more of the engines, or might even trip filters or cause problems with the engines' algorithms. JS has been used in sneaky ways over the years, ways the SEs don't like, and I would not be surprised if the engines had their radar up for extensive use of it on internal links where algorithms and rankings are concerned.
I also know that MC said at one point: "If you make lots of pages, don’t put JavaScript redirects on all of them." I know you're not exactly talking about JS redirects here, but again, even if you test your solution and it works, I'd worry about extensive use of JS in association with links. One can run afoul of algos despite the best of intentions, by finding new ways to do things that look like problems to automated systems.
2) The engines' ability to follow JS has changed over the years. I have no idea exactly how each of the three big SEs is handling JS right now. What I can say is that Google does a pretty good job of following JS these days, and I know MC acknowledged that recently.
How about IP delivery? Search engines don't like cloaking, but my impression is that they've come around to defining cloaking as serving different content to users than to bots. Using IP delivery to serve clean URLs to the bots and tracking URLs to users seems a very, very legitimate way to operate; many large, well-known sites do exactly that. The engines don't want to kill sites; they just want to prevent sneaky and deceptive practices.
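A rough Node.js sketch of the IP delivery idea, for illustration only. The thread doesn't say how the bots are identified; this version swaps in the reverse-then-forward DNS check that Google documents for verifying Googlebot (the PTR hostname must end in googlebot.com or google.com and must resolve back to the same IP). `isVerifiedCrawler`, `linkHref`, `CRAWLER_SUFFIXES`, and the `tag` parameter name are all hypothetical, and other engines publish their own crawler hostnames.

```javascript
const dns = require("dns").promises;

// Hypothetical suffix list; only Google's documented Googlebot domains are shown.
const CRAWLER_SUFFIXES = [".googlebot.com", ".google.com"];

// Reverse-then-forward DNS check: the PTR name must belong to a crawler domain
// AND resolve back to the same IP, so a spoofed PTR record is rejected.
async function isVerifiedCrawler(ip) {
  try {
    const hostnames = await dns.reverse(ip);
    for (const host of hostnames) {
      if (!CRAWLER_SUFFIXES.some((suffix) => host.endsWith(suffix))) continue;
      const addresses = await dns.lookup(host, { all: true });
      if (addresses.some((entry) => entry.address === ip)) return true;
    }
  } catch (err) {
    // No PTR record or a failed lookup: treat the request as an ordinary visitor.
  }
  return false;
}

// Clean URL for verified crawlers, tracking URL for everyone else.
function linkHref(path, tag, crawler) {
  if (crawler || !tag) return path;
  const separator = path.includes("?") ? "&" : "?";
  return path + separator + "tag=" + encodeURIComponent(tag);
}
```

In a page template you would call `isVerifiedCrawler` once per request (caching the answer per IP, since DNS lookups are slow) and pass the result to `linkHref` for each tracked link.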
Another option is to capture the tracking info and then use 301 redirects to send users and bots on to the canonical pages. But you might end up with a heck of a lot of 301s that way.
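The redirect approach might look roughly like this in Node.js. It's a sketch under assumptions: the tracking parameter is called `tag`, `console.log` stands in for whatever analytics store you actually use, and the plain-text body stands in for real page content. Every tagged click costs the visitor an extra round trip, which is the "lot of 301s" trade-off mentioned above.

```javascript
const http = require("http");

// Tagged URLs: record the tag and 301 to the clean URL. Clean URLs: serve normally.
function planResponse(requestUrl) {
  // req.url is only a path, so parse it against a placeholder base.
  const url = new URL(requestUrl, "http://placeholder.invalid");
  const tag = url.searchParams.get("tag");
  if (tag === null) return { status: 200, tag: null };
  url.searchParams.delete("tag");
  return { status: 301, tag, location: url.pathname + url.search };
}

function handler(req, res) {
  const plan = planResponse(req.url);
  if (plan.status === 301) {
    // Hypothetical analytics hook: replace with real logging.
    console.log("tag=%s path=%s", plan.tag, req.url);
    res.writeHead(301, { Location: plan.location });
    res.end();
    return;
  }
  res.writeHead(200, { "Content-Type": "text/plain" });
  res.end("page content for " + req.url + "\n");
}

// Not started automatically; call startServer(8080) to try it locally.
function startServer(port) {
  return http.createServer(handler).listen(port);
}
```

Because bots also receive the 301, they end up indexing only the clean URL, at the cost of one extra hop per tagged link.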
We have actually been using an IP delivery mechanism for the pages that are driven primarily by application code, to avoid getting lost in URL variation loops for the same content.
However, for other content we use a content management system, and putting that same application code into the content violates the CMS framework enough that we started looking for a nicer compromise. A JS onClick mechanism is a bit less offensive for the occasional link within content that we want to keep "clean" but still trackable.
And then there are links that fall in between these two areas where we will need to make a call as to what mechanism to use.
Steve