I have a site which was well indexed by Google and used to draw a nice amount of traffic. It consists of about 350 main pages, each with various sublinks.
I submitted a sitemap to Google listing the 350 main pages, which was picked up fine.
Recently I noticed that Google had spidered much further into the site and had indexed the checkout page for every item plus all the other links off the main pages, giving a total of about 950 indexed pages for my site. Most of the subpages indexed contain the same keywords as the main page and hence may be viewed as duplicate content.
Around then the traffic dropped off and I noticed most of my main pages had gone supplemental. Obviously I'd like to get my traffic back!
Is it possible to write a rule in robots.txt which would disallow anything which is not "main.php?g2_itemId=..."?
If it was possible, do you think it would do any good? Or am I barking up the wrong tree?
Google does support pattern matching (wildcards) in robots.txt. Using that, you could probably block the pages you want blocked.
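As a rough illustration of how such rules might look and behave, here is a minimal Python sketch. It embeds a hypothetical robots.txt for this site and checks a few URLs against it, using the matching behaviour Google documents: `*` is a wildcard, a trailing `$` anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. The paths, the `Allow: /$` line that keeps the home page crawlable, the checkout URL, and the helper names `parse_rules`, `to_regex` and `is_allowed` are all assumptions made for the example, not anything taken from the actual site. Other search engines may treat these rules differently.

```python
import re

# Hypothetical rules for the site in this thread; every path is an assumption.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /$
Allow: /main.php?g2_itemId=
Disallow: /
"""

def parse_rules(text):
    """Collect (kind, pattern) pairs from Allow/Disallow lines.

    This sketch ignores User-agent grouping and treats all rules as one group.
    """
    rules = []
    for line in text.splitlines():
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules

def to_regex(pattern):
    """Translate a robots.txt pattern: '*' is a wildcard, a trailing '$' anchors the end."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_allowed(url_path, rules):
    """Longest matching pattern wins; on a tie, Allow wins."""
    best_kind, best_pattern = "allow", ""  # no matching rule means allowed
    for kind, pattern in rules:
        if not to_regex(pattern).match(url_path):
            continue
        longer = len(pattern) > len(best_pattern)
        tie_for_allow = len(pattern) == len(best_pattern) and kind == "allow"
        if longer or tie_for_allow:
            best_kind, best_pattern = kind, pattern
    return best_kind == "allow"

if __name__ == "__main__":
    rules = parse_rules(ROBOTS_TXT)
    for path in ("/", "/main.php?g2_itemId=123", "/main.php?g2_view=checkout"):
        print(path, "allowed" if is_allowed(path, rules) else "blocked")
```

Under these assumptions the home page and the item pages stay crawlable, while anything else, including a checkout URL that does not begin with `main.php?g2_itemId=`, is blocked. Test any real rules with the search engine's own robots.txt checker before relying on them.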
Side note: I find the best remedy for supplemental pages is getting more links to your key pages.
Or am I barking up the wrong tree?
Ruff! Ruff! ;)
On larger sites, I know they might use IP-based content delivery to hide those links from the bots. You don't want the bots to have access to those, for just the reason you point out above.
I wouldn't rely on the robots.txt file to do what you are attempting to do. Keep in mind that the other SEs may not support the protocol that Google does when it comes to advanced robots.txt usage.
In the meantime I was doing some reading. Would using this meta tag on the main pages do any good?
<meta name="robots" content="index,nofollow">
So the main pages will be indexed but none of the links will be followed?
Yes, but then none of the links on those pages will be followed, internal or external.
I think that's what I want. My site's a simple tree structure. Once you drill down to a main item, all the links are Checkout, View Cart, Login and other things I don't really care about being indexed.
I'm thinking as long as there is a clear route from the home page down to all the main item pages, and as long as they get indexed, then I can safely use index,nofollow to turn them into dead ends... maybe? It's highly likely I'm completely misunderstanding how all this works, though!