Forum Moderators: Robert Charlton & goodroi

Googlebot not obeying robots.txt for Session IDs?

         

grandma genie

5:48 pm on Nov 22, 2010 (gmt 0)

10+ Year Member



Oh help! For some strange reason, Googlebot has started to index my osCommerce pages with session ID numbers included. There is a setting in osCommerce that is supposed to prevent this, and it used to work, but now Google is indexing my pages with the session ID in the URL, so it could potentially index the same page over and over again, each time with a different session ID. They already have over 12000 of these pages indexed. This is terrible. Anyone coming to my site using those links will have all kinds of trouble checking out. I have this rule for Googlebot in robots.txt:

User-agent: googlebot
Disallow: /*osCsid=*

How do I get rid of the already indexed pages and stop them from continuing? How do I even contact them? They have made contacting them almost impossible.

Here is a sample of one of the links I found in Analytics:

mywebsite.com/osc/product_info.php?cPath=&products_id=1294&osCsid=98e1e61ebe8eb261801dcd4ce6066a39

- Grandma_genie
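
[Editor's sketch] One way to sanity-check a rule like the one above is to test it against the sample URL. The snippet below is a minimal sketch, assuming Google-style matching where `*` matches any run of characters, a trailing `$` anchors the end, and everything else is a prefix match on the path plus query string. The function name `rule_matches` is hypothetical and not part of any library.

```python
import re

def rule_matches(pattern, path_and_query):
    """Return True if a robots.txt path pattern matches the given path + query.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, and otherwise the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    regex = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start only, which gives the prefix behaviour.
    return re.match(regex, path_and_query) is not None

# The rule and sample URL from the post (host removed, path + query only).
rule = "/*osCsid=*"
sample = "/osc/product_info.php?cPath=&products_id=1294&osCsid=98e1e61ebe8eb261801dcd4ce6066a39"
print(rule_matches(rule, sample))
print(rule_matches(rule, "/osc/product_info.php?cPath=&products_id=1294"))
```

Under these assumptions the rule covers the sample URL, and the same URL without the osCsid parameter is left crawlable. Matching here is case-sensitive, so a parameter spelled `oscsid` would not be caught.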

grandma genie

6:50 pm on Nov 22, 2010 (gmt 0)

10+ Year Member



Sorry for the second post. I was wrong. Googlebot is not indexing pages with a session ID number. What I found in Google Analytics was 12000 links from my official website (www.mywebsite.com) to the same site without the www. It was those links that had the session ID number in them. In the Links to Your Site section, I found more than 12000 links like that. I don't understand how that happened. Where is Google finding those links? When you click on them, they take you to the osCommerce section of my site, but the page is empty saying Product Not Found. Can someone explain what is happening? I do have links from one part of my site to the osCommerce part, but they do not include session ID numbers. They do include the cPath numbers. The session ID number is the osCsid= parameter.
Grandma_genie

tedster

7:06 pm on Nov 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ah yes, that mess of data that gets dumped into the Webmaster Tools link report these days. It's been a technical boondoggle since the day it began.

All you need to do is make sure your server properly handles those URLs - with a 404 status if they really shouldn't resolve, or a 301 redirect to the canonical version of the URL if they should resolve. It's also a good idea to use the canonical link tag in the <head> section of the page as a back-up strategy.

Then let Google sort out their technically challenged data dump as they get around to it and don't be concerned.
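
[Editor's sketch] The 301-to-canonical advice above can be illustrated with a small helper that works out where a session-ID URL should redirect. This is only a sketch under stated assumptions: that the www host is the canonical one, and that dropping osCsid is all that is needed. `CANONICAL_HOST`, `canonical_url`, and `redirect_target` are hypothetical names, and a real osCommerce site would do this in PHP or in server rewrite rules rather than Python.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

SESSION_PARAM = "osCsid"              # session parameter named in the thread
CANONICAL_HOST = "www.mywebsite.com"  # placeholder host, assumed canonical

def canonical_url(url):
    """Return the URL with the session ID removed and the canonical host applied."""
    parts = urlsplit(url)
    # keep_blank_values=True so an empty parameter such as 'cPath=' survives.
    query = [(key, value)
             for key, value in parse_qsl(parts.query, keep_blank_values=True)
             if key != SESSION_PARAM]
    # Any fragment is dropped; browsers do not send it to the server anyway.
    return urlunsplit((parts.scheme, CANONICAL_HOST, parts.path, urlencode(query), ""))

def redirect_target(url):
    """Return the URL to 301-redirect to, or None if the URL is already canonical."""
    target = canonical_url(url)
    return target if target != url else None

sample = ("http://mywebsite.com/osc/product_info.php"
          "?cPath=&products_id=1294&osCsid=98e1e61ebe8eb261801dcd4ce6066a39")
print(redirect_target(sample))
```

The same canonical URL is what would go in the canonical link tag, so the redirect and the tag never disagree.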

grandma genie

7:58 pm on Nov 22, 2010 (gmt 0)

10+ Year Member



Hi tedster,
Thank you. I was panicking, but I feel much better now. I'll take care of it.
- Jeannie

tedster

8:00 pm on Nov 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



they take you to the osCommerce section of my site, but the page is empty saying Product Not Found

Just make sure that those pages are sent with a 404 status in the HTTP response and you're fine.
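
[Editor's sketch] To confirm what status a "Product Not Found" page really sends, one option is to request it without following redirects and look at the status code. The sketch below uses Python's standard `http.client`; the function names `fetch_status` and `verdict` are hypothetical, and the URL you pass in would be one of the links from the report.

```python
import http.client
from urllib.parse import urlsplit

def fetch_status(url):
    """Request the URL once, without following redirects.

    Returns (status_code, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()

def verdict(status, location=None):
    """Classify a status code against the advice in this thread."""
    if status in (404, 410):
        return "ok: reported as gone"
    if status == 301 and location:
        return "ok: permanent redirect to " + location
    if status == 200:
        return "check: a 'Product Not Found' page served as 200 is a soft 404"
    return "check: unexpected status %d" % status
```

Usage would look like `status, location = fetch_status(url)` followed by `print(verdict(status, location))`. An error page that comes back with status 200 is the case to fix, since the visible text says "not found" while the server reports success.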