I have a huge wwwboard forum and want to make sure as much of the important stuff as possible is indexed by search engines, especially Google, as they will not index everything now, it seems. And I want to make sure I do not pick up any de facto penalties because of the nature of the beast.
If you know the wwwboard, you will know each message produces a specific html file. I do not want to put "noindex" on these, as people often find the forum courtesy of individual messages. Also, I keep / archive messages for 12 months (so people can search the archives) and then delete them. These archived messages are referenced from archived discussion board area / directory pages (so they are not orphans and can be found after having been reindexed from the message file into the appropriate archive file). What I do is, when one month (say Feb 2002) is replaced by another month (in this case, Feb 2003), I put a permanent redirect in the .htaccess file (which I presume is advantageous, Google PR wise) from one to the other (rather than put a permanent moved just for the old directory).
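For anyone wanting to script the monthly rollover, here is a minimal sketch of generating the .htaccess line. It assumes Apache's mod_alias "Redirect permanent" directive is what sits in the .htaccess file; the function name, directory names and example.com domain are made up for illustration and are not my real paths.

```python
def redirect_line(old_dir: str, new_url: str) -> str:
    """Build an Apache mod_alias directive that answers requests under
    old_dir with a 301 (permanent) redirect to new_url."""
    return f"Redirect permanent {old_dir} {new_url}"


if __name__ == "__main__":
    # Hypothetical archive layout: one directory per archived month.
    print(redirect_line("/archive/2002-02/",
                        "http://www.example.com/archive/2003-02/"))
```

The printed line would be appended to the .htaccess file when the old month is deleted. A full URL is used as the target here because older Apache versions expect one.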
However, I wondered if I should use the META HTTP-EQUIV="Expires" tag on the archive directories and / or messages (my feeling is just to do it on the messages). I also wondered (as these messages will never change once they have been archived, as no-one can post to them anymore), whether I should also put a META NAME="revisit-after" 366 days (by which time it will be deleted), so as to allow the Google bot a chance to crawl other pages / messages. I think this is good, but will the two (revisit and expires) tags conflict?
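To make the question concrete, this is roughly how the two tags could be written into an archived message, as a sketch only. The function name and the archive date are hypothetical, and it makes no claim about whether any search engine actually honors either tag; it only shows the two values lining up on the same 366 days.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def archive_meta_tags(archived_on: datetime, keep_days: int = 366) -> str:
    """Return an Expires tag and a revisit-after tag for an archived message.

    archived_on must be a timezone-aware UTC datetime, because the Expires
    value is written as an HTTP-style GMT date.
    """
    expires = format_datetime(archived_on + timedelta(days=keep_days), usegmt=True)
    return (
        f'<META HTTP-EQUIV="Expires" CONTENT="{expires}">\n'
        f'<META NAME="revisit-after" CONTENT="{keep_days} days">'
    )


if __name__ == "__main__":
    # Hypothetical: a message archived on 1 Feb 2003.
    print(archive_meta_tags(datetime(2003, 2, 1, tzinfo=timezone.utc)))
```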
I have already moved the boards (there are two on one domain) to main.html (so the index.html pages do not have 500+ links on them). Due to the nature of the current messages, I was thinking of putting a deny on the /messages/ directories (as each message can be changed slightly, but not really, as replies to each message will add a link to the html).
Any help *really* appreciated :-)
Thanks. What about denying bots from the messages directory?
My fear is that, sometimes, people hit the "post message" key twice, which means two messages are the same (which may be seen as spam). Plus, also, the bots might downrank the site as messages change (date wise, whenever someone replies, but all that gets added is another link to the new reply message). Maybe SEs don't like this, as the original message remains the same for all intents and purposes, but keeps getting "renewed" (a spammer's trick to show fresh content when there is none).
Do you think a deny for the (current, active) messages is a good idea? (Doing this also helps me with my site map plans.)
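For reference, the kind of deny I have in mind could be checked like this before going live. This is only a sketch: the robots.txt rule, the sample paths and the is_allowed helper are illustrative assumptions, and it uses Python's standard urllib.robotparser rather than anything a real search engine runs.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that denies all bots the active messages directory.
ROBOTS_TXT = """\
User-agent: *
Disallow: /messages/
"""


def is_allowed(path: str, agent: str = "Googlebot") -> bool:
    """Report whether the rules in ROBOTS_TXT let `agent` fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(is_allowed("/messages/1234.html"))          # active message
    print(is_allowed("/archive/2003-02/1234.html"))   # archived copy
```

With a rule like this the active messages are blocked while the archive pages stay crawlable, which is the split I am asking about.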
RSVP / :-)
GrinninGordan
Search engines do filter for duplicates and sort them out, each in their own way. But I wouldn't worry about accidental dupes causing a penalty. It takes some pretty severe and intentional spamming to make that happen, and search engines are used to sorting out dupes all the time (and being very lenient about it, from what I can see in the SERPs).
I'd think you would WANT immediate spidering from "FreshBot" style programs if you can get it. These WebmasterWorld boards often show in the Google search very quickly and that is quite a benefit. Sometimes I'm researching a question and the very thread I'm working with already shows up on the Google search I use. I think this is to be desired, not avoided.
You have probably noticed that many spammers have used wwwboard to gain link pop. My thinking is that Google picked up on this and put a slight penalty on directories titled "wwwboard", as this is the convention when running this script.
Maybe you could test by renaming the directory -
Thanks, but ahead of you on that one. I do not have wwwboard directory or doc names, and my scripts encode any URLs that are posted into numeric cgi click-through target URLs (with a deny covering the cgi path in the robots text file). Even email addresses are encoded to stop any PR loss risk and to stop harvesters (my scripts replace @ with AT and . with DOT). I am even stopping people from being able to enter text domains (no hyperlinked domains) in the message body (so they can not post [,...] www, .com, etc.), just in case search engines can read these as links.
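The email encoding is nothing clever. A minimal sketch of the idea, not my actual script: the function name and the spaces around AT and DOT are assumptions made for the example.

```python
def obfuscate_email(address: str) -> str:
    """Replace @ with AT and . with DOT so the address is still readable
    by people but no longer looks like a plain email address to harvesters."""
    return address.replace("@", " AT ").replace(".", " DOT ")


if __name__ == "__main__":
    print(obfuscate_email("someone@example.com"))
```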
But thanks, I appreciate the thoughts :-)
Any more tips for wwwboards appreciated :-)