Forum Moderators: DixonJones
Can anyone recommend a method of creating a crawler-friendly path through a site, and advise whether new pages should be created for this?
My concern is that if duplicate 'spiderable' pages are created, and the bot somehow picks up a page with a session ID, there may be a duplicate content penalty assessed, or the pages may get kicked out of the index.
Anyone have any experience creating a "bot-friendly" path for a site that uses session IDs? Is there a way to use existing pages and 'turn off' IDs for the bots?
Also, the information is essentially static (never changing, no variables), but the pages are dynamically delivered (.jsp). Is it possible that these pages could still get indexed with the session ID, considering the relatively small number of pages (fewer than 100)?
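One common approach in a JSP setup is to skip URL rewriting for requests whose User-Agent looks like a crawler, so the output of `response.encodeURL()` (with its `;jsessionid=...`) never reaches a bot while normal visitors still keep their sessions. A minimal sketch, assuming User-Agent detection is acceptable for your site; the bot token list, class, and method names here are illustrative, not from this thread:

```java
import java.util.Locale;

public class BotAwareLinks {
    // Illustrative, incomplete list of crawler User-Agent tokens.
    private static final String[] BOT_TOKENS = {"googlebot", "slurp", "bingbot", "msnbot"};

    /** Returns true if the User-Agent header looks like a known crawler. */
    public static boolean isBot(String userAgent) {
        if (userAgent == null) return false;
        String ua = userAgent.toLowerCase(Locale.ROOT);
        for (String token : BOT_TOKENS) {
            if (ua.contains(token)) return true;
        }
        return false;
    }

    /**
     * In a JSP you would pass both the plain URL and response.encodeURL(url):
     * bots get the clean URL (no jsessionid), browsers get the encoded one
     * so cookieless sessions still work.
     */
    public static String linkFor(String url, String userAgent, String encodedUrl) {
        return isBot(userAgent) ? url : encodedUrl;
    }

    public static void main(String[] args) {
        String clean = "/products/list.jsp";
        String encoded = "/products/list.jsp;jsessionid=ABC123";
        // Crawler gets the clean link, browser gets the rewritten one.
        System.out.println(linkFor(clean, "Mozilla/5.0 (compatible; Googlebot/2.1)", encoded));
        System.out.println(linkFor(clean, "Mozilla/5.0 (Windows NT 10.0)", encoded));
    }
}
```

Note this only controls links your pages emit; it doesn't stop a bot that already holds a session URL from requesting it.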
Are there no settings for how the session is handled?
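There can be, depending on the container. If the container supports the Servlet 3.0 spec (a later-era option; older containers have vendor-specific equivalents), session tracking can be restricted to cookies in web.xml so the container never appends a jsessionid to URLs at all:

```xml
<!-- web.xml (Servlet 3.0+): track sessions via cookies only,
     so URLs are never rewritten with ;jsessionid=... -->
<session-config>
    <tracking-mode>COOKIE</tracking-mode>
</session-config>
```

The trade-off is that visitors with cookies disabled lose their sessions.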
Could this post be moved to Google News?
It's not really Google News though, is it?
Also, have you looked in your logs to see what urls spiders are requesting? Have you seen them requesting urls with a session?
There are fewer than 100 pages in this site, so I may just take a chance with the IDs. If it doesn't get indexed, Plan B will be a site rewrite into HTML. I'm still wary of a rewrite because of the potential to get a duplicate content penalty if the session ID strings get indexed.
Have any members ever "turned off" cookies and sessions through one directory of a site to allow the bots to crawl? Is this even viable?
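It's viable in a servlet container: map a filter to that directory which wraps the response so `encodeURL()` returns the URL unchanged, and avoid calling `request.getSession()` there, so no jsessionid is created or appended under that path. The core normalization (stripping an ID that's already in a URL) is plain string surgery; the class and method names below are illustrative, not from this thread:

```java
public class SessionIdStripper {
    /** Remove a ";jsessionid=..." path parameter from a URL, keeping any query string. */
    public static String strip(String url) {
        int semi = url.indexOf(";jsessionid=");
        if (semi < 0) return url;           // nothing to strip
        int query = url.indexOf('?', semi); // query string (if any) survives
        return (query < 0) ? url.substring(0, semi)
                           : url.substring(0, semi) + url.substring(query);
    }

    public static void main(String[] args) {
        System.out.println(strip("/docs/page.jsp;jsessionid=ABC123?x=1")); // /docs/page.jsp?x=1
        System.out.println(strip("/docs/page.jsp"));                       // unchanged
    }
}
```

In a filter you'd apply this inside an HttpServletResponseWrapper whose encodeURL/encodeRedirectURL simply return their argument for requests under the bot-friendly directory.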
Also, has anyone ever delivered relevant content pages by IP? It seems like IP delivery may be the only viable solution, even though there is no intention of deceiving searchers. It's still probably too risky for me to implement. Just wondering if anyone has tried it.