Many of these are obvious and well documented in other threads; I'm just posting a summary of what I've been doing for the last couple of weeks (and it seems to work).
The tips are for folks who run dynamic websites like forums, link engines, etc.
1) Take careful note of what content you want spidered. With forums (particularly vBulletin) a lot of the content is meaningless once it ends up in a search engine.
Good pages to get indexed are your forum summary pages and the threads themselves. These have good anchor text and good content of their own, which together give you an excellent cluster of keywords.
2) Take careful note of what content you DO NOT want spidered: graphics files, the newreply, newthread and registration pages, member lists, and any other pages without content that would attract new visitors.
EXCLUDE as many of these pages as possible in your robots.txt, so the spiders are targeted at your content- and keyword-rich pages.
3) Remove session hashes from your targeted pages. Since most of these pages are read-only magnets for visitors, you really don't need to carry a session id in the URL. For vBulletin, search for s=$session[sessionhash] (I think) and remove it from as many places as possible.
4) Remove as many redundant links to the same content as possible. For example, forum summaries offer a couple of ways to reach the same page (typically "view thread" and "view last topic in thread"). Eliminate these as far as you can: they give the spiders several different routes to the same content, and many of them issue redirects that are meaningless from a spider's perspective.
5) As far as possible, make sure the spider can reach all your content with no more than one variable in the URL, such as threadid, linkid, etc.
6) Eliminate signatures for new users. You don't need PR / visitor leakage through signatures containing URLs.
7) Master mod_rewrite. Try to get your forum and link directory links into the format www.widgets.com/Thread_Title_ID23232/ or www.widgets.com/Link_Title_ID23232 instead of /forum/showthread.php?threadid=29879. It makes the URL more clickable from a user's perspective. This technique is very well documented on one of the SQL Links software's forum sites.
8) Remove things like "-- powered by blah blah" from your <title> tags. They're meaningless and dilute your placements.
9) Make judicious use of heading tags (H1 at the very minimum). In forums / link engines these should match your title or add to it.
10) Think dumb... try to figure out how to reach the majority of your content in as few clicks as possible.
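The robots.txt exclusions from tip 2 can be generated and checked before you upload the file. This is a minimal sketch assuming the forum lives under /forum/; the script names (newreply.php, newthread.php, register.php, memberlist.php) and the images/ directory are hypothetical vBulletin-style paths, so substitute your own. It uses Python's standard urllib.robotparser to confirm that thread pages stay crawlable while the excluded scripts are blocked.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical vBulletin-style paths, assuming the forum lives under /forum/.
EXCLUDED_PATHS = [
    "/forum/images/",
    "/forum/newreply.php",
    "/forum/newthread.php",
    "/forum/register.php",
    "/forum/memberlist.php",
]

def build_robots_txt(paths):
    """Return robots.txt text that disallows every listed path for all user agents."""
    lines = ["User-agent: *"] + [f"Disallow: {path}" for path in paths]
    return "\n".join(lines) + "\n"

robots_txt = build_robots_txt(EXCLUDED_PATHS)
print(robots_txt)

# Sanity-check the rules with the standard-library parser before uploading the file.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("*", "http://www.widgets.com/forum/showthread.php?threadid=29879"))  # True
print(parser.can_fetch("*", "http://www.widgets.com/forum/newreply.php?threadid=29879"))    # False
```

Disallow lines are prefix matches, so a single entry for newreply.php also covers every newreply.php?threadid=... variant.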
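For tip 3, the real fix belongs in the forum templates, but the same idea can be expressed as a URL filter, for example to clean links you generate yourself or to audit a crawl log. This sketch assumes the session hash travels in a query parameter named s, as in the s=$session[sessionhash] pattern mentioned in that tip; strip_session_param is a hypothetical helper, not part of vBulletin.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def strip_session_param(url, param="s"):
    """Return url without the session-hash query parameter; other parameters keep their order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != param]
    # An empty query string makes urlunsplit drop the "?" entirely.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment))

print(strip_session_param("http://www.widgets.com/forum/showthread.php?s=0123abcd&threadid=29879"))
# http://www.widgets.com/forum/showthread.php?threadid=29879
```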
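Tips 4 and 5 amount to giving each piece of content one canonical URL that carries a single identifying variable. Below is a minimal sketch of that normalisation; the script paths and parameter names in CANONICAL_PARAM (threadid, forumid, linkid) are assumptions standing in for whatever your forum and link engine actually use, and order=desc in the usage example is a made-up extra parameter. Unknown scripts map to None, a hint that the page may belong in robots.txt rather than in your link structure.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical mapping: script path -> the one variable that identifies its content.
CANONICAL_PARAM = {
    "/forum/showthread.php": "threadid",
    "/forum/forumdisplay.php": "forumid",
    "/links/showlink.php": "linkid",
}

def canonical_url(url):
    """Reduce a URL to its path plus the single identifying variable.

    Returns None for unknown scripts or when the identifying variable is missing.
    """
    parts = urlsplit(url)
    key = CANONICAL_PARAM.get(parts.path)
    if key is None:
        return None
    values = parse_qs(parts.query).get(key)
    if not values:
        return None
    # Everything else (session hash, sort order, fragment) is dropped.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode({key: values[0]}), ""))

variants = [
    "http://www.widgets.com/forum/showthread.php?threadid=29879",
    "http://www.widgets.com/forum/showthread.php?s=0123abcd&threadid=29879",
    "http://www.widgets.com/forum/showthread.php?threadid=29879&order=desc",
]
print({canonical_url(u) for u in variants})
# {'http://www.widgets.com/forum/showthread.php?threadid=29879'}
```

If every link your templates emit goes through a function like this, the spider only ever sees one address per thread.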
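The URL format in tip 7 needs two halves: code that builds the friendly URL from a title and id, and a rewrite rule that maps it back to the real script. Here is a sketch of both, assuming threads are served by /forum/showthread.php?threadid=N; thread_slug_url and thread_id_from_path are hypothetical helpers, and the RewriteRule in the comment is an untested illustration of the same pattern rather than a drop-in config.

```python
import re

def thread_slug_url(title, thread_id, host="www.widgets.com"):
    """Build a Thread_Title_ID23232/ style URL from a thread title and numeric id."""
    # Collapse every run of non-alphanumeric characters into a single underscore.
    slug = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_") or "Thread"
    return f"http://{host}/{slug}_ID{thread_id}/"

# Anchored at the end, so the id always comes from the final "_ID<digits>" segment.
SLUG_PATH = re.compile(r"^/[A-Za-z0-9_]+_ID([0-9]+)/?$")

def thread_id_from_path(path):
    """Return the numeric thread id encoded in a slug path, or None if the path doesn't match."""
    match = SLUG_PATH.match(path)
    return int(match.group(1)) if match else None

# Roughly equivalent mod_rewrite rule for an .htaccess in the document root (untested;
# assumes RewriteEngine On and that threads are served by /forum/showthread.php):
#   RewriteRule ^[A-Za-z0-9_]+_ID([0-9]+)/?$ /forum/showthread.php?threadid=$1 [L]

print(thread_slug_url("Best blue widgets?", 23232))
# http://www.widgets.com/Best_blue_widgets_ID23232/
print(thread_id_from_path("/Best_blue_widgets_ID23232/"))
# 23232
```

Only the trailing number is used when mapping back, so a renamed thread keeps working under its old URL. If threads and link-directory entries share this pattern, you would need a distinguishing prefix or directory so the rule knows which script to call.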
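Tip 8 can be applied wherever the page title is assembled. This is a minimal sketch assuming the credit is appended as one or more dashes followed by "powered by ..."; other wordings won't be caught, and clean_title is a hypothetical helper. Check your forum licence before removing the credit.

```python
import re

# A trailing "- powered by ..." / "-- Powered by ..." credit, matched case-insensitively.
POWERED_BY = re.compile(r"\s*-+\s*powered by\b.*$", re.IGNORECASE)

def clean_title(title):
    """Return the page title with a trailing 'powered by' credit removed."""
    return POWERED_BY.sub("", title).strip()

print(clean_title("Blue widget prices -- powered by vBulletin"))
# Blue widget prices
```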
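One way to act on tip 10 is to measure click depth: a breadth-first search over the site's link graph gives the fewest clicks from the home page to every page. The graph below is a made-up example; in practice you would build it from your own crawl or link tables. Pages with a large depth, or missing from the result entirely, are the ones that need more direct links.

```python
from collections import deque

def click_depths(links, start):
    """Return {page: fewest clicks from start} for every page reachable from start."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:  # first visit in BFS order is the shortest path
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

# Hypothetical link graph: each page maps to the pages it links to.
site = {
    "/": ["/forum/", "/links/"],
    "/forum/": ["/forum/widgets/"],
    "/forum/widgets/": ["/Blue_Widgets_ID23232/"],
    "/links/": [],
    "/Orphan_Thread_ID777/": [],
}

depths = click_depths(site, "/")
too_deep = sorted(page for page, d in depths.items() if d > 2)
unreachable = sorted(set(site) - set(depths))
print(too_deep)     # ['/Blue_Widgets_ID23232/']
print(unreachable)  # ['/Orphan_Thread_ID777/']
```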
OK, anyone else want to share?
I had Inktomi and Google go through a 20GB crawl of one of these sites last month (luckily I'd spotted it and fixed it with redirects, but still, tons of errors were detected).
If you're using a free forum engine, these "powered by" credits are often required under the terms of the licence.
Personally, we allow users to post URLs where relevant. If you're worried about PR leakage (which can always be combatted with good page design instead) or about linking to a banned site, edit the source so that links are left as plain text.
URLs are useful to users; they should not be blocked.
TJ
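The suggestion above to edit the source so that posted links are left as plain text could look like this, assuming posts are stored as BBCode with [url]...[/url] and [url=...]...[/url] tags. render_links_as_text is a hypothetical stand-in for the relevant part of a forum's post parser; it handles only [url] tags and HTML-escapes everything, so user text can't inject markup.

```python
import html
import re

# [url]http://...[/url] and [url=http://...]label[/url], case-insensitive, may span lines.
URL_TAG = re.compile(r"\[url(?:=([^\]]+))?\](.*?)\[/url\]", re.IGNORECASE | re.DOTALL)

def render_links_as_text(post):
    """Render [url] tags as escaped plain text instead of <a href> links."""
    pieces, pos = [], 0
    for match in URL_TAG.finditer(post):
        pieces.append(html.escape(post[pos:match.start()]))
        target, label = match.group(1), match.group(2)
        # Show the address next to the label so readers can still copy it.
        text = f"{label} ({target})" if target and target != label else label
        pieces.append(html.escape(text))
        pos = match.end()
    pieces.append(html.escape(post[pos:]))
    return "".join(pieces)

print(render_links_as_text("See [url=http://www.widgets.com/]my site[/url] & more"))
# See my site (http://www.widgets.com/) &amp; more
```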