Forum Moderators: open
I'm having a problem with this. Any ideas?
Our application uses Tomcat and Struts with Hibernate to deliver dynamic web pages. The JSP pages use the c:url tag to automatically append the JSESSIONID to links when the client has cookies disabled. For SEO purposes, we've tried several approaches to handling JSESSIONIDs so that search engines don't incorrectly weight pages based on the transient JSESSIONID values. Ultimately, we had to remove all the c:url tags so that JSESSIONIDs are no longer generated, and add cloaking rules to our Apache web server to strip the JSESSIONID from incoming links. However, this has turned out to be a nightmare.

The problem is that a bot like Google can visit our website over a thousand times in an hour. Add the other bots, and search crawlers account for a significant fraction of the total traffic to our website. That's normally fine; however, with the JSESSIONIDs removed from the links, a crawler jumping from page to page on our website is interpreted as a new unique user on every request. Every new unique user is automatically assigned a session object by the servlet container. In addition, our use of Struts with Hibernate makes it a requirement to have a session object available (since Hibernate uses a unique session id to track persistent connections). Our system setup, combined with removing the JSESSIONIDs from all URLs on the website, overwhelms our servlet container, because a new session object is spawned every single time a bot hits the site.

The only way we've found to handle this is to set the session-timeout value in web.xml to something very small (e.g. 1 minute), but then this creates the additional problem of our users inadvertently being logged out after a minute of inactivity. There seems to be no solution to this problem. If I cater to the search bots, I screw the users. And vice versa. Any suggestions?
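For what it's worth, the web.xml session-timeout is a global setting, but the Servlet API also lets you set the timeout per session via HttpSession.setMaxInactiveInterval(). A sketch of one possible workaround, under the assumption that crawlers can be recognized by User-Agent: keep the normal timeout for humans and shorten it only for known bots. The class name, bot list, and timeout values below are illustrative, not from the original thread:

```java
import java.util.Locale;

public class BotSessionPolicy {
    // Illustrative list of User-Agent substrings for common crawlers.
    private static final String[] BOT_MARKERS = {
        "googlebot", "bingbot", "slurp", "baiduspider"
    };

    // Returns true if the User-Agent header looks like a search crawler.
    public static boolean isBot(String userAgent) {
        if (userAgent == null) return false;
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String marker : BOT_MARKERS) {
            if (ua.contains(marker)) return true;
        }
        return false;
    }

    // Timeout in seconds to pass to HttpSession.setMaxInactiveInterval():
    // 60 seconds for crawlers, 30 minutes for everyone else (assumed values).
    public static int timeoutSecondsFor(String userAgent) {
        return isBot(userAgent) ? 60 : 30 * 60;
    }

    public static void main(String[] args) {
        System.out.println(timeoutSecondsFor("Googlebot/2.1 (+http://www.google.com/bot.html)"));
        System.out.println(timeoutSecondsFor("Mozilla/5.0 (Windows NT 5.1)"));
    }
}
```

In a servlet Filter you would call session.setMaxInactiveInterval(timeoutSecondsFor(request.getHeader("User-Agent"))) on each request; that way bot sessions expire quickly without logging real users out after a minute of inactivity.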
We're primarily focused on identifying search engine spiders here, not providing technical support.
You might have better luck with your question in one of the more technology-related forums here on WebmasterWorld.
Best of luck to you. :)
The only valid reason I can even think of for adding JSESSIONIDs to the URL would be to allow someone with cookies disabled to use a cart in ecommerce.
If this is in fact the case (I've implemented this in the past), the solution is simple.
Do the following:
1) You only enable JSESSIONIDs when someone who has cookies disabled adds a product to the cart.
2) You disallow all bots from crawling the cart page and checkout process, so no JSESSIONIDs will ever be displayed to the bots.
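Step 2 would be done in robots.txt. Assuming the cart and checkout live under paths like /cart and /checkout (hypothetical paths — substitute your own), it would look something like:

```
User-agent: *
Disallow: /cart
Disallow: /checkout
```

Combined with step 1, a crawler never sees a page that adds to the cart, so it never receives a URL with a JSESSIONID appended.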
If this isn't the case, and your site is being monetized via something like AdSense, the JSESSIONIDs will completely mess up your ad targeting.