Forum Moderators: Robert Charlton & goodroi


Keeping googlebot out of my shopping cart

rjwmotor

4:54 pm on Jan 23, 2010 (gmt 0)

10+ Year Member



I recently upgraded my shopping cart and it appears that Gbot is adding things to the cart. Once an item is in there, there are links to update it that are also being followed. These are URLs that look like they could definitely cause a duplicate content penalty.

I've read about the "nofollow" attribute, but will that work? I can't simply block access to the cart page, as the whole site runs through PHP.

What's the best way to deal with this and what type of code do I add to stop Gbot?

Thanks
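[For reference on the "nofollow" question: rel="nofollow" goes on individual links, while a meta robots tag goes on the page itself, and both are hints to crawlers rather than hard blocks. A hypothetical cart link and cart-page tag might look like this (parameter names shortened from the URL quoted later in the thread):

```html
<!-- on product pages: ask crawlers not to follow the add-to-cart link -->
<a href="/store.php?add_to_cart=1&amp;prod_rn=1686" rel="nofollow">Add to cart</a>

<!-- in the <head> of the cart page itself: keep it out of the index -->
<meta name="robots" content="noindex, nofollow">
```
]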

tedster

7:55 pm on Jan 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



First thing - there is no duplicate penalty. However, duplicate issues can definitely hurt crawling and indexing. Still, there will not be a "penalty" against your domain.

That said, you are correct to keep these URLs out of the index and out of the regular crawl. Is there some obvious pattern to your shopping cart URLs that you can block via robots.txt?

rjwmotor

9:03 pm on Jan 23, 2010 (gmt 0)

10+ Year Member



When I add something to the cart this is the display URL:

http://www.example.com/store.php?add_to_cart=1&discount_price=&discounted_qty=&discounted_qty_in_cart=
&prod_rn=1686&microtime=0.21675700+1264280015&edit_item=&b_price=0.00
&option_0=0&option_1=0&option_2=0&option_3=&quantity=1

Wasn't sure if I could modify robots.txt or .htaccess to block these URLs.

The biggest problem is that G continues through checkout and then backs out, often carrying a session ID (only used during checkout). I also have a view-cart section with links to the cart and to the product, so G is following those URLs as well. For cart purposes the product is listed with query strings in the checkout and view-cart areas, and G is indexing these.
The site now uses SEO-friendly URLs that 301-redirect from the query-string versions, except once the checkout process has started.
Store.php is the base for every URL that gets redirected.

My solution (I think) would be to simply block G from starting checkout; that would alleviate the rest of the problem. Not sure how to achieve this...

[edited by: tedster at 9:12 pm (utc) on Jan. 23, 2010]
[edit reason] line breaks added to prevent side scrolling [/edit]

tedster

9:16 pm on Jan 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Googlebot supports wildcards in the robots.txt file. This is a non-standard (so far) extension of the protocol. So you could use a line like this in robots.txt to block spidering of all your shopping cart URLs:

User-agent: googlebot
Disallow: /*add_to_cart

You could also exclude any URL with the add_to_cart parameter from indexing in Webmaster Tools.
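[To make the wildcard semantics concrete, here is a minimal sketch, in Python purely for illustration, of how a Googlebot-style Disallow pattern is matched: "*" matches any run of characters and a trailing "$" anchors the end of the URL. The cart URL below is a shortened version of the one quoted earlier in the thread.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt Disallow pattern against a URL path+query,
    using Googlebot's wildcard extension: '*' = any run of characters,
    a trailing '$' = end-of-URL anchor."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything except '*', which becomes '.*'.
    regex = "^" + ".*".join(re.escape(piece) for piece in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.search(regex, path) is not None

cart_url = "/store.php?add_to_cart=1&prod_rn=1686&quantity=1"
print(rule_matches("/*add_to_cart", cart_url))      # True: cart URL blocked
print(rule_matches("/*add_to_cart", "/store.php"))  # False: plain page still crawlable
```
]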

rjwmotor

9:37 pm on Jan 23, 2010 (gmt 0)

10+ Year Member



Listing googlebot and disallow together makes me a little nervous. ;)
Like I said, several of these cart-viewing URLs that include the product URL have already been crawled and indexed. They don't contain the add_to_cart parameter and have a different format, so they don't follow the 301 to the regular product page. Should I just remove them with G's removal tool?

I fear other bots may do, or are already doing, the same thing. Would this code effectively block them all?

User-agent: *
Disallow: /*add_to_cart

tedster

10:10 pm on Jan 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It will - as long as they support pattern-matching wildcards in robots.txt, which I believe the majors all do by now.

Should I just remove them w/ g removal tool?

Sure.

rjwmotor

10:22 pm on Jan 23, 2010 (gmt 0)

10+ Year Member



I also have some URLs that look like this:

http://www.example.com/store.php?rn=2051&action=show_detail&edit_item=

Would this work as well?

User-agent: *
Disallow: /*rn

tedster

10:44 pm on Jan 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I don't think so - every URL with "rn" anywhere in it would be blocked. I'd try this:

User-agent: *
Disallow: /*?rn
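[A quick way to see the difference, sketched in Python for illustration using Googlebot's documented "*" wildcard behavior: "/*rn" matches "rn" anywhere in the URL, while "/*?rn" requires it to come right after the "?".

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Googlebot-style robots.txt match: '*' = any run of characters."""
    regex = "^" + ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.search(regex, path) is not None

detail_url = "/store.php?rn=2051&action=show_detail&edit_item="

print(rule_matches("/*?rn", detail_url))            # True: the intended block
print(rule_matches("/*rn", "/corner-shelf.html"))   # True: "rn" in "corner" - too broad!
print(rule_matches("/*?rn", "/corner-shelf.html"))  # False: no "?rn" present
```

The hypothetical /corner-shelf.html page shows why the bare "/*rn" pattern is dangerous: any path containing those two letters gets blocked.]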

rjwmotor

11:18 pm on Jan 23, 2010 (gmt 0)

10+ Year Member



So the code below should (hopefully) work, right?

User-agent: *
Disallow: /*?rn
Disallow: /*add_to_cart

rjwmotor

11:21 pm on Jan 23, 2010 (gmt 0)

10+ Year Member



Or could I use
User-agent: *
Disallow: /*add_to_cart
Disallow: /*edit_item

That parameter (edit_item) is in the URL as well, and in another variation that I don't want indexed or spidered.

tedster

11:28 pm on Jan 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Inside a Webmaster Tools account, under Site configuration >> Crawler access, there is a pretty helpful robots.txt testing utility you can use to make sure your rules are doing what you want them to.

rjwmotor

11:44 pm on Jan 23, 2010 (gmt 0)

10+ Year Member



Thanks, didn't know that was there...

dstiles

12:18 am on Jan 24, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I block all forms, including all product/item order forms, from search engines.

I detect whether the access comes from a browser, a search engine, or a possible bad bot, and depending on the actual page or form I display the forms, remove them, or return a blank page. For the shop, I display products without the order forms.

This is in ASP but it's at least as easy to do in PHP.
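[A rough sketch of that idea, in Python for illustration only since the poster's site is PHP and dstiles' is ASP: check the User-Agent header and render the order form only for visitors that don't look like crawlers. The bot tokens below are illustrative, not a complete list, and User-Agent strings can be spoofed.

```python
# Illustrative substrings of common crawler User-Agent strings.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "slurp", "baiduspider")

def looks_like_bot(user_agent: str) -> bool:
    """Crude check: does the User-Agent contain a known crawler token?"""
    ua = user_agent.lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)

def render_product_page(user_agent: str) -> str:
    """Render the product page; include the order form only for non-bots."""
    html = "<h1>Widget</h1><p>Product details...</p>"
    if not looks_like_bot(user_agent):
        # Only human-looking visitors get the add-to-cart form.
        html += '<form action="/store.php" method="post">...</form>'
    return html

print("<form" in render_product_page("Mozilla/5.0 (compatible; Googlebot/2.1)"))  # False
print("<form" in render_product_page("Mozilla/5.0 (Windows NT 6.1)"))             # True
```
]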

Reading your initial post, I get the impression your products are submitted to the cart through links rather than forms - is that so? If that is the case, very bad! If not, sorry I mentioned it. :)