
Limiting the depth of Google's crawl?

Can robots.txt be used to limit crawl depth?

daveozzz

7:20 am on Apr 27, 2007 (gmt 0)




Hi, I'm pretty new to SEO, so I may be trying to do something impossible here... any advice would be appreciated.

I have a site which was well indexed by Google and used to draw a nice amount of traffic. It consists of about 350 main pages, each with various sublinks.
I submitted a sitemap to Google listing the 350 main pages, which was picked up fine.

Recently I noticed that Google had spidered much further into the site and had indexed the checkout page for every item, plus all the other links off the main pages, giving a total of about 950 indexed pages for my site. Most of the indexed subpages contain the same keywords as the main page and hence may be viewed as duplicate content.

Around that time the traffic dropped off and I noticed most of my main pages had gone supplemental. Obviously I'd like to get my traffic back!

My main pages look like this:
[mysite.com...]
The various sublinks are ugly and formatted something like this:
[mysite.com...]

Is it possible to write a rule in robots.txt that would disallow anything that is not "main.php?g2_itemId="?

If it were possible, do you think it would do any good? Or am I barking up the wrong tree?

goodroi

3:48 pm on Apr 27, 2007 (gmt 0)




Welcome to WebmasterWorld daveozzz!

Google does allow pattern matching (aka wildcards) in robots.txt. Using that, you could probably block the pages you want blocked.
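
Something along these lines might do it - just a rough sketch, assuming the item pages are the only URLs containing g2_itemId and that Googlebot is the bot you care about:

User-agent: Googlebot
# block every URL that carries a query string...
Disallow: /*?
# ...but let the item pages back in - for Googlebot the longer, more specific rule wins
Allow: /main.php?g2_itemId=

I'd test it with the robots.txt tool in Google's webmaster tools before relying on it, since the Allow line and the wildcards are Google extensions rather than part of the original robots.txt standard.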

Side note - I find the best remedy for supplemental results is getting more links to your key pages.

pageoneresults

3:52 pm on Apr 27, 2007 (gmt 0)




Welcome to WebmasterWorld daveozzz!

Or am I barking up the wrong tree?

Ruff! Ruff! ;)

The ideal scenario in this situation is to make those links invisible to the bots via JavaScript or some other solution (for example, .NET PostBacks).
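
Roughly speaking, something like this keeps a checkout link out of sight of any crawler that doesn't execute scripts (just a sketch - the /checkout.php URL is made up for the example, not your actual path):

<script type="text/javascript">
// the link only exists for visitors whose browsers run JavaScript;
// a crawler that doesn't execute scripts never sees an anchor tag to follow
document.write('<a href="/checkout.php?item=123">Checkout</a>');
</script>

And don't put a plain fallback link inside a <noscript> block, or you've handed the URL straight back to the bots.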

On larger sites, I know some use IP-delivery-based content to hide those links from the bots; you don't want the bots to have access to those pages, for just the reason you point out above.

I wouldn't rely on the robots.txt file to do what you are attempting. Keep in mind that other search engines may not support the same extensions to the protocol that Google does when it comes to advanced robots.txt usage.

daveozzz

6:57 pm on Apr 27, 2007 (gmt 0)




Glad I'm at least thinking along the right lines. Thanks for the replies.
I'll need to look into the JavaScript thing because I have zero knowledge of that at the moment.

In the meantime I was doing some reading... would using this meta tag on the main pages do any good?
<meta name="robots" content="index,nofollow">
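
From what I've read, the tag just goes in the <head> of each main page, something like this (assuming I've understood it right - the title here is just an example):

<head>
  <title>Example item page</title>
  <!-- index this page, but don't follow any of its links -->
  <meta name="robots" content="index,nofollow">
</head>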

So the main pages will be indexed but none of the links will be followed?

pageoneresults

7:02 pm on Apr 27, 2007 (gmt 0)




So the main pages will be indexed but none of the links will be followed?

Yes, but then "none" of the links will be followed, including both internal and external ones.

daveozzz

7:21 pm on Apr 27, 2007 (gmt 0)




Yes, but then "none" of the links will be followed, including both internal and external ones.

I think that's what I want. My site's a simple tree structure. Once you drill down to a main item, all the links are Checkout, View Cart, Login and other things I don't really care about being indexed.

I'm thinking that as long as there is a clear route from the home page down to all the main item pages, and as long as they get indexed, then I can safely use index,nofollow to turn them into dead ends... maybe? It's highly likely I'm completely misunderstanding how all this works, though!