Forum Moderators: Robert Charlton & goodroi


Forum link strategy

Helping googlebot crawl a forum


bluegray

10:56 am on Feb 13, 2007 (gmt 0)

10+ Year Member



I'd like to get some comments on the link strategy used in the Simple Machines Forum.
The default template has the following types of links:

http://forum.com/index.php?action=someaction
http://forum.com/index.php?board=some_board_id
http://forum.com/index.php?topic=some_topic_id

These work fine and googlebot has no problem crawling them, and it is easy to prevent crawling of the action URLs with robots.txt.
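For example, a robots.txt rule like the following would keep crawlers away from the action URLs (a minimal sketch, assuming the forum lives at the site root as in the URLs above):

```
# robots.txt — block the action URLs while leaving boards and topics crawlable
User-agent: *
Disallow: /index.php?action=
```

Since robots.txt rules match URL prefixes, this covers every `?action=` variant without touching the board or topic links.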

But the topic links can also carry extra parameters:

http://forum.com/index.php?topic=some_topic_id;prev_next=next

is a link to the next or the previous topic, and

http://forum.com/index.php?topic=some_topic_id.msg454

is a link to a specific message in the topic. But the page generated is the same page as for the main topic.

So the same page can be reached through a number of different URLs. SMF tries to correct the problem by adding

<meta name="robots" content="noindex" />

in the head section of pages with .msg or prev_next in the URL, to tell crawlers not to index those pages.

This seems to work. Google will only index the main topic pages and duplicate content should be minimized.

But (and this is my real question ;) ), I still see googlebot crawling all the different URLs. I know I can stop this with either robots.txt or by including nofollow in the robots meta tag, but I am hesitant to interfere too much with googlebot's natural crawling.
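If I did want to block them, I believe the robots.txt rules would look roughly like this (a sketch, assuming the URL forms shown above; note that the `*` wildcard is a Googlebot extension and not every crawler supports it):

```
# robots.txt — keep Googlebot off the duplicate topic URL variants
User-agent: Googlebot
Disallow: /*prev_next=
Disallow: /*.msg
```

The alternative would be extending the existing meta tag on those pages to `content="noindex,nofollow"`, which stops crawling from spreading through those links rather than blocking the URLs outright.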

Would it be ok if most of the links that googlebot encounters on a page are closed to it, and only the main topics are open?

Thanks for any feedback