googlebot perseveration on a single page

Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

googlebot perseveration on a single page

what should be done when G hammers away on one page?

arnarn

4:40 pm on Apr 26, 2006 (gmt 0)

For some days now, G is pulling the same page (which turns out to be our login/account page when a customer checks out of our store).

I can understand that since most products have a direct link to this (checkout) page, that G would attempt to index it, but why the repeated hammering away on the same page? You'd think once it grabbed the page it wouldn't continue to read it, over and over again; sometimes as often as 1/sec.

Unfortunately, we can't use a NOINDEX instruction on this main checkout page because we are using a storefront that controls the header.

The main question now is, will this affect how we get indexed / ranked? If there is a negative effect, are there any suggestions as to how to fix the problem given our lack of control over the header?

tedster

3:42 am on Apr 27, 2006 (gmt 0)

Is it always requesting the exact same url, or is there a query string that is varying?

arnarn

4:38 am on Apr 27, 2006 (gmt 0)

It's the "same" url with the exception of the Session_ID option (&Session_ID=different value).

Based on the unique Session_ID, it looks like googlebot is trying to follow the "checkout" or "buy one now" option.

I've gone to the log files and tracked back to where any given Session_ID first appeared and it was when googlebot was first looking at a product.

A normal user would have one session id for all of their activities, but with Google (I guess because each access from them looks like a new user, our storefront issues a unique session ID).

About 4 hours after getting the Product info, googlebot then hits the "checkout" page -- ONCE for EACH ONE of the products it looked at earlier (each hit to the checkout page having a different Session_ID, associated with the earlier Product hit).

Is this something to be concerned with? Is it something googlebot should be smart enough to know about, or maybe it does know about it but finds looking at the same page important for some reason?

tedster

5:36 am on Apr 27, 2006 (gmt 0)

You asked if there can be a negative effect -- and the answer is a resounding "Yes." Having the session ID in the url is a dangerous practice.

From Google's Technical Guidelines for Webmasters [google.com]

Allow search bots to crawl your sites without session IDs or
arguments that track their path through the site. These technique
are useful for tracking individual user behavior, but the access
pattern of bots is entirely different. Using these techniques may
result in incomplete indexing of your site, as bots may not be
able to eliminate URLs that look different but actually point to
the same page.

Essentially, you are creating a situation where a potentially infinite number of urls are possible for a single bit of content -- and as the bot starts trying these, the pages actually active in the index will start to erode. Track this kind of thing with a cookie if you must, but best practice is to keep it out of the url.

arnarn

7:25 am on Apr 27, 2006 (gmt 0)

thanks for the info.. but ouch! life is getting too complicated!

Unfortunately for us, our storefront software (do I dare say miva?) does the session_id assignment. On invoking a page (e.g. product info retrieval), you usually don't see/use the Session_ID as a parameter/option, it's only when https:// is invoked (such as for account setup or purchase processing "checkout").

On each store page, there are links to store functions: "Storefront", "Account", "Search", "product List", "Basket Contents" and "Checkout". The "Account" and "Checkout" links use https and use Session_ID. So, whenever G looks at a product, it sees the "Account" and "Checkout" link (each is a single page and would only "look" different to G because of the Session_ID being different).

Given the above, do you think that G will be "..able to eliminate URLs that look different but actually point to the same page."?

Is it time to change our storefront software?

whiterabbit

8:51 am on Apr 27, 2006 (gmt 0)

If you use the rel=nofollow on the actual links you don't want googlebot to follow, it sould work out ok (MSN and Yahoo also support this directive)

arnarn

3:00 pm on Apr 27, 2006 (gmt 0)

thanks for the suggestion wr

Unfortunately, the offending links are part of the storefront code and we have no control (other than the image displayed) over their structure. The specific links are for setting up a username/account and for checkout, and on these links the SessionID gets set (again, by the storefront code).

We can set the main/base part of the URL for the two functions, but if you try to add anything, it gets embedded before the options (one of which is the &Session_ID).

daveVk

3:04 am on Apr 28, 2006 (gmt 0)

As you can not use 'nofollow', consider using robot.txt file rule to exclude google from link (if site structure suitable).

tedster

3:15 am on Apr 28, 2006 (gmt 0)

I'm almost 100% sure that Miva Merchant does support session tracking through cookies rather than through a url parameter. Check in with their support and user community forums.

If there were no workaround, then no Miva sites would succeed in the search engines. Although they have had trouble, nevertheless approaches have been found that allow Miva sites to rank.

arnarn

3:47 pm on Apr 28, 2006 (gmt 0)

thanks daveVka & tedster

My understnding about session ID was that it was required to maintaining the integrity of the shopping basket and ensure security during the checkout / purchase process. I will give them a call to ask about any options of using cookies vs a url parameter/option, and will post back here if i find out anything new.

Knock on wood, we're doing OK in rankings with this m store (albeit we're not happy with the last week or so with the flux / update or whatever you want to call it). It's been a real PAIN to do some of the things we have done to get m store rankings.

It's just that G hammering away on the same page had us concerned and what might happen to the rest of our rankings if we can't get them to ignore the two pages with the problem.

arnarn

8:41 pm on Apr 28, 2006 (gmt 0)

Have just checked with m regarding the session ids. There is no direct solution, but a possibility that one of the 3rd party consulting groups could make the mods to use cookies vs sessionIDs.

Trying to keep on topic: will G hurt our store's current rankings if we have a situation where all (almost all) product pages in our store point to a "checkout" page using a unique session id. And, where G is now repeatedly crawling the same page ("checkout") because of the unique session id.

tedster pointed earlier to G's technical guidelines for webmasters and if I read correctly: "Allow search bots to crawl your sites without session IDs or arguments that track their path through the site...".

Well in our case, G CAN crawl our site without the session IDs (i.e. they can use the url and just not use the session id). Actually, we'd PREFER that they not use the sessionID as part of the url they crawl. So I think in our case, since we're not PREVENTING them from crawling without SessionIDs, we're OK w.r.t. to G's guidelines?

tedster

9:00 pm on Apr 28, 2006 (gmt 0)

Where is googlebot picking up the sessid urls? Plug up that leak and you may be ok.

arnarn

9:41 pm on Apr 28, 2006 (gmt 0)

tedster

I think that's where the 3rd party consulting comes in, IF the problem can be fixed.

The session ID comes in as follows: Every store page that's viewed has a store navigation bar near the top of the page. In the navigation bar there are 2 "offending" links: "Account" and "Checkout". The store software, (beyond our control), builds the page and as part of that build adds a Session_ID to the "Account" and "Checkout" links.

We can set the url for these links in our administrative area for the store, but we cannot change the options/parameters added to the URL or add any Link Specific Regulation (such as was suggested earlier, to use the REL=NOFOLLOW).

There's also one other area where on some product listings, there's a "buy one now" option and that as well has a session_id added to the link. But, I think that the most offensive links are the "Account" and "Checkout" links found on virtually EVERY page of the store since it's part of the store function navigation bar.

g1smd

11:40 pm on Apr 28, 2006 (gmt 0)

That's dumb thinking on behalf of the designers. Bots should be prevented from trying to access any pages that do not ever need to appear in search results.

For a forum, for example, you would stop robots trying to crawl any "log in" and "new reply" and "new post" and "edit post" and "edit profile" and "send PM" and "bookmark this thread" and "print friendly" pages as they should never appear in search results anyway.

For a shopping cart, bots should index product pages, but should never be allowed to try to access "buy" links.

arnarn

12:24 am on Apr 29, 2006 (gmt 0)

well, here's one more update for those who might be interested:

With all the hammering away on the "checkout" page (which is [oursite.com...] a site search on G now returns [oursite.com...] i.e. site:oursite.com returns [oursite.com...] The only way I can think that's possible is from all the [oursite.com...] links followed to "checkout" from our product pages.

In checking mcdar, https shows up as #1 in about 80% of the dc's with http on the others

tedster

4:41 am on Apr 29, 2006 (gmt 0)

I would fix it, asap. There are more than a few third party modules that can make your app more SE friendly. I know of one business that stayed with Miva but got those sessid strings handled and also fixed a bunch more of the shortcomings. It can be done - check around in the Miva support community for help on this, because it's not our purpose here, it's Miva business.

arnarn

6:29 am on Apr 29, 2006 (gmt 0)

it's already on my todo list! thanks!