
Sitemaps, Meta Data, and robots.txt Forum

    
Limiting the depth of Google's crawl?
Can robots.txt be used to limit depth
daveozzz (5+ Year Member)
Msg#: 3323482 posted 7:20 am on Apr 27, 2007 (gmt 0)

Hi, I'm pretty new to SEO, so I may be trying to do something impossible here. Any advice would be appreciated.

I have a site that was well indexed by Google and used to draw a nice amount of traffic. It consists of about 350 main pages, each with various sublinks.
I submitted a sitemap to Google listing the 350 main pages, and it was picked up fine.

Recently I noticed that Google had spidered much further into the site and had indexed the checkout page for every item, plus all the other links off the main pages, giving a total of about 950 indexed pages for my site. Most of the indexed subpages contain the same keywords as their main page and so may be viewed as duplicate content.

Around then the traffic dropped off, and I noticed most of my main pages had gone supplemental. Obviously I'd like to get my traffic back!

My main pages look like this:
[mysite.com...]
The various sublinks are ugly and formatted something like this:
[mysite.com...]

Is it possible to write a rule in robots.txt that would disallow anything that is not "main.php?g2_itemId="?

If it was possible, do you think it would do any good? Or am I barking up the wrong tree?

 

goodroi (WebmasterWorld Administrator, WebmasterWorld Top Contributor of All Time, 10+ Year Member, Top Contributors Of The Month)
Msg#: 3323482 posted 3:48 pm on Apr 27, 2007 (gmt 0)

Welcome to WebmasterWorld daveozzz!

Google does allow pattern matching, aka wildcards ("*" for any run of characters, "$" for end of URL), in robots.txt. Using that, you could probably block the pages you want blocked.
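As a rough illustration of how such a rule could work, here is a minimal Python sketch of Google-style matching: "*" matches any characters, a trailing "$" anchors the end, the longest matching pattern wins, and Allow wins a tie. It assumes the main pages are "/main.php?g2_itemId=<number>" (inferred from the question) and that none of the sublinks begin with that same prefix; the checkout URL used below is hypothetical, since the real sublink format is truncated above. The equivalent robots.txt would be "User-agent: Googlebot", then "Allow: /main.php?g2_itemId=", then "Disallow: /main.php".

```python
import re

# Hypothetical rule set for this thread (Googlebot group only):
#   User-agent: Googlebot
#   Allow: /main.php?g2_itemId=
#   Disallow: /main.php
RULES = [("allow", "/main.php?g2_itemId="), ("disallow", "/main.php")]


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end of
    the URL; everything else is literal. Matching starts at the path's start.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Return True if `path` (path plus query string) may be crawled.

    The longest matching pattern wins; on a tie, "allow" beats "disallow".
    A path that matches no rule at all is allowed.
    """
    best_len, best_allow = -1, True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty Disallow blocks nothing
        if pattern_to_regex(pattern).match(path):
            allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and allow):
                best_len, best_allow = len(pattern), allow
    return best_allow


if __name__ == "__main__":
    for path in ["/main.php?g2_itemId=42",
                 "/main.php?g2_view=checkout&g2_itemId=42",  # hypothetical sublink
                 "/"]:
        print(path, is_allowed(RULES, path))
```

With these rules the item page and the home page stay crawlable while the hypothetical checkout URL is blocked, even though it also carries g2_itemId, because only the shorter Disallow pattern matches it. If any real sublink starts with "main.php?g2_itemId=", this rule would not catch it, and crawlers without wildcard support may read the patterns differently.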

Side note: I find the best remedy for supplemental results is getting more links to your key pages.

pageoneresults (WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member)
Msg#: 3323482 posted 3:52 pm on Apr 27, 2007 (gmt 0)

Welcome to WebmasterWorld daveozzz!

> Or am I barking up the wrong tree?

Ruff! Ruff! ;)

The ultimate solution in this situation is to make those links invisible to the bots via JavaScript or some other technique (for example, .NET PostBacks).

On larger sites, I know they might use IP-based content delivery to hide those links from the bots. You don't want the bots to have access to those pages, for exactly the reason you point out above.

I wouldn't rely on the robots.txt file to do what you are attempting. Keep in mind that other search engines may not support the extended robots.txt syntax (such as wildcards) that Google does.

daveozzz (5+ Year Member)
Msg#: 3323482 posted 6:57 pm on Apr 27, 2007 (gmt 0)

Glad I'm at least thinking along the right lines. Thanks for the replies.
I'll need to look into the JavaScript option, because I have zero knowledge of that at the moment.

In the meantime, I was doing some reading. Would using this meta tag on the main pages do any good?
<meta name="robots" content="index,nofollow">

So the main pages will be indexed but none of the links will be followed?

pageoneresults (WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member)
Msg#: 3323482 posted 7:02 pm on Apr 27, 2007 (gmt 0)

> So the main pages will be indexed but none of the links will be followed?

Yes, but then "none" of the links will be followed, internal and external alike.
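To make the "none of the links" point concrete, here is a small Python sketch, using only the standard-library html.parser, of how a crawler that honours the robots meta tag might decide which links on a page to follow. The class name, function name and sample page are hypothetical, and real crawlers apply more rules than this simplified model.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect <meta name="robots"> directives and <a href> targets."""

    def __init__(self):
        super().__init__()
        self.directives = set()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip())
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def followable_links(html: str) -> list:
    """Return the links a meta-robots-aware crawler would follow.

    "nofollow" (or "none") drops every link on the page, internal or external.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    if parser.directives & {"nofollow", "none"}:
        return []
    return parser.links


# Hypothetical main page with one internal and one external link.
PAGE = """<html><head><meta name="robots" content="index,nofollow"></head>
<body><a href="/main.php?g2_view=checkout">Checkout</a>
<a href="http://example.com/">Partner site</a></body></html>"""

if __name__ == "__main__":
    print(followable_links(PAGE))
```

With "index,nofollow" the function returns an empty list; switching the tag to "index,follow" returns both the internal checkout link and the external link.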

daveozzz (5+ Year Member)
Msg#: 3323482 posted 7:21 pm on Apr 27, 2007 (gmt 0)

> Yes, but then "none" of the links will be followed including both internal/external.

I think that's what I want. My site's a simple tree structure. Once you drill down to a main item, all the links are Checkout, View Cart, Login and other things I don't really care about being indexed.

I'm thinking that as long as there is a clear route from the home page down to all the main item pages, and as long as they get indexed, I can safely use index,nofollow to turn them into dead ends... maybe? It's highly likely I'm completely misunderstanding how all this works, though!
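The dead-end idea can be checked with a toy breadth-first crawl in Python over a hypothetical tree (home page, categories, item pages, then checkout, cart and login pages). It assumes the crawler discovers URLs only by following links on these pages. In practice, URLs listed in a sitemap, linked from other sites, or already known from earlier crawls can still be crawled, and index,nofollow on the item pages does not by itself remove checkout pages that are already indexed, so this models link discovery only.

```python
from collections import deque

# Hypothetical site: URL -> (meta robots content, outgoing links).
SITE = {
    "/": ("index,follow", ["/category/a", "/category/b"]),
    "/category/a": ("index,follow", ["/item/1", "/item/2"]),
    "/category/b": ("index,follow", ["/item/3"]),
    "/item/1": ("index,nofollow", ["/checkout/1", "/cart", "/login"]),
    "/item/2": ("index,nofollow", ["/checkout/2", "/cart", "/login"]),
    "/item/3": ("index,nofollow", ["/checkout/3", "/cart", "/login"]),
    "/checkout/1": ("index,follow", []),
    "/checkout/2": ("index,follow", []),
    "/checkout/3": ("index,follow", []),
    "/cart": ("index,follow", []),
    "/login": ("index,follow", []),
}


def crawl(site, start="/"):
    """Breadth-first crawl obeying index/noindex and follow/nofollow.

    Returns the set of URLs that would end up indexed.
    """
    indexed = set()
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url not in site:
            continue  # link to a page we have no data for
        content, links = site[url]
        directives = {d.strip().lower() for d in content.split(",")}
        if "noindex" not in directives:
            indexed.add(url)
        if "nofollow" in directives:
            continue  # dead end: none of this page's links are queued
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return indexed


if __name__ == "__main__":
    print(sorted(crawl(SITE)))
```

Under these assumptions the crawl indexes the home page, both categories and all three item pages, but none of the checkout, cart or login pages; change the item pages back to "index,follow" and every page in SITE is reached.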

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved