Forum Moderators: phranque


SEOptimizing a Forum Site

         

gershon

2:41 am on Jul 7, 2005 (gmt 0)

10+ Year Member



I know that a lot of forum / message board sites seem to fare very poorly (compared to where they should be) in Google and other search engines. So, my question is: why? More importantly, what should be done to solve this?

Some things that I know:

  • Eliminate duplicate content - don't offer multiple links to the same thread (even if formatted differently). If you must do so, use 301s to drive it back to a prime URL.
  • Eliminate non-content URLs - they waste PR, confuse spiders, etc. - Hide them, using nofollow, robots, JS, whatever
  • Use tight HTML - not sure if this is important, but forums generally waste lots of bytes on tables info, etc, as opposed to content
  • URLs? - I don't think any contemporary spiders get scared of URLs with question marks in them, but it can't hurt to rewrite to alphanumeric.html URLs
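To illustrate the duplicate content point in the list above, here is a minimal sketch of mapping every variant of a thread link back to one prime URL and answering the variants with a 301. The parameter name `t`, the `view=print` variant, and the function names are assumptions made for the example, not taken from any particular forum package.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Assumption: "t" is the only query parameter that identifies a thread.
CONTENT_PARAMS = ("t",)


def canonical_url(url):
    """Return the prime URL for a thread link, dropping formatting-only parameters."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k in CONTENT_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def redirect_for(url):
    """Return (301, prime_url) for a duplicate, or None if url is already prime."""
    prime = canonical_url(url)
    return None if prime == url else (301, prime)
```

With these assumptions, a printer-friendly link such as `viewtopic.php?t=34&view=print` would be redirected to `viewtopic.php?t=34`, while the prime URL itself is served normally.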

I'm sure there's a lot more that I'm missing, and so I'm all ears.

PS The one exception to the "forums don't fare well on SE's" rule is, can you guess? WebmasterWorld! I tried to describe some of its techniques above... So, if anyone can share more, I'd be most obliged.

2by4

3:04 am on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They rank poorly because you have to do a lot of customizing of the code to make them rank well. If you do this customizing they rank the same as any other page, better in some ways because the content changes all the time. Most forum software puts out massively bloated HTML; you can literally cut it down to about 1/3 of the size with no difference visible to the user, but it's not easy to do. Same with getting rid of extra links: there are mods to do that, but then you have to do more to really clean it up, which again involves going into the code. Of course most forums put session ids in the URL if you have cookies turned off, which all search bots do, so of course they don't get indexed at all.
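As a rough illustration of the session id problem, the sketch below strips session parameters from a link before it is written into a page. The parameter names in `SESSION_PARAMS` are assumptions; each forum package uses its own.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode, urlunsplit

# Assumption: these are the query parameters the forum uses for session ids.
SESSION_PARAMS = {"sid", "s", "PHPSESSID"}


def strip_session_id(url):
    """Return url without any session id parameters in its query string."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in SESSION_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )
```

A spider that has no cookies would then see `/viewtopic.php?t=34` on every visit instead of a new `sid` value each time.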

Forums that do this type of stuff, like WebmasterWorld for example, rank well, forums that don't, don't rank well. Lots of forums do it, but most don't. But IMHO nobody does it better than WebmasterWorld on a technical level, not even close, but that's because WebmasterWorld doesn't run generic forum software.

gershon

3:10 am on Jul 7, 2005 (gmt 0)

10+ Year Member



2by4, what else do you feel WW does (besides the 4 points I mentioned)?

Also, is it good enough to block extra links with nofollow or robots.txt, or must they actually be removed?

2by4

3:19 am on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Basically, WebmasterWorld does it all, and more; the more you look, the more you will see. It's not good enough to block extra links with robots.txt, although it's a start. Since almost nobody does this right, people who do have a very large advantage over those who don't or won't or can't. Removing extra links is much easier than most people realize if you're using the right forum software, for example phpBB.

martinibuster

3:28 am on Jul 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



I just used a phpbb hack that instantly creates a sitemap of the forum. Four days after implementation, Google is showing all of the forum posts as indexed instead of in supplemental.

There are other hacks (for a customized version of phpbb) that will slip your forum topics into a spider friendly url (forum-topic.html), as well as slip it into the title tags etc. Really cool stuff. Would love to see that for the regular phpbb. I suppose I could hack the hack, maybe someday. hehe

2by4

3:30 am on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hacking the hack gives excellent results, but there's the drawback that the more hacks you have running, the harder it is to update the code, and the more careful you have to be when you do update it, so it's definitely something to think about before proceeding.

gershon

3:51 pm on Jul 7, 2005 (gmt 0)

10+ Year Member



Okay, so I'll add to the list:
  • Site Map - so spiders find all the posts (any tips on what makes a good one?)

Also, I've found:
  • Session ID's - keep 'em out of urls
  • h tags - put thread title in them
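A small sketch of the "thread title in h tags" idea, which also puts the title into the title tag as mentioned earlier in the thread. The function name and the board name are made up for the example.

```python
from html import escape


def thread_header(title, board_name="Example Forum"):
    """Build the <title> and <h1> markup for a thread page from its title."""
    safe_title = escape(title)
    return "<title>{0} - {1}</title>\n<h1>{0}</h1>".format(
        safe_title, escape(board_name)
    )
```

Escaping matters here because thread titles are user input and may contain characters such as `&` or `<`.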

What else?

PS martinibuster - I would think that forum-topic.html is no better these days - it seems that these days all spiders can handle forum.php?t=xyz or whatever.

2by4

5:00 pm on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"I would think that forum-topic.html is no better these days - it seems that these days all spiders can handle fourm.php?t=xyz or whatever."

topic-34.html is better than topic.php?topic=34. There is no case where not using mod rewrite is better. Why? Because with the rewrite, the spider is receiving a list of independent pages, e.g. topic-34.html, topic-35.html, etc. Without the rewrite, it's receiving what looks like a single page, topic.php, with different content. Does it matter? Definitely, no question. On one site I'd been lazy: the gallery section was done with simple query string parameters, and Google only showed the main page in the SERPs. All the pages were indexed to some degree, but they weren't performing at all. I switched it to mod rewrite, and all pages now perform. What a spider can handle and what is best are not the same thing. You want to give the spider unique pages, so it doesn't get confused.
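On a real server this mapping would normally live in mod rewrite rules, which are not shown in this thread. The Python sketch below only mirrors the idea: the spider sees topic-34.html, and the server translates that back to the script URL internally. The function names are hypothetical.

```python
import re

# Matches the static-looking address that is shown to visitors and spiders.
STATIC_TOPIC = re.compile(r"^/topic-(\d+)\.html$")


def to_internal(path):
    """Map /topic-34.html to the script URL actually served, or None if no match."""
    match = STATIC_TOPIC.match(path)
    return "/topic.php?topic=" + match.group(1) if match else None


def to_static(topic_id):
    """Build the static-looking link that gets written into forum pages."""
    return "/topic-{}.html".format(int(topic_id))
```

Every link the forum prints would go through something like `to_static`, so the query string form never appears in the HTML a spider reads.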

The question is not: what is the least I can do to get some of this working, but rather: what is the most I can do to get it all working? Happily, almost everyone chooses the former.

martinibuster

5:18 pm on Jul 7, 2005 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



...would think that forum-topic.html is no better these days - it seems that these days all spiders can handle forum.php?t=xyz or whatever.

Yeah, I thought so too until I saw all of the forum posts in supplemental. :(

The site map hack can be found by Googling phpbb hacks site map. It's the Advanced Site Map. I'm not endorsing it. I'm just saying what my experience with it has been so far. I find that it is a simple and non-intrusive solution. My main concern is that it needs to break down the map into smaller chunks. But so far it's worked very well. Over seven thousand pages indexed. Makes me giddy to think about it.
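The chunking concern can be sketched as follows. This is not how the Advanced Site Map hack works internally; it is only an illustration of splitting a long list of topic links into smaller site map pages. The link format and the default of 100 links per page are assumptions.

```python
def chunk_site_map(topic_ids, per_page=100):
    """Split topic ids into site map pages, each an HTML list of topic links."""
    pages = []
    for start in range(0, len(topic_ids), per_page):
        items = [
            '<li><a href="/topic-{0}.html">Topic {0}</a></li>'.format(topic_id)
            for topic_id in topic_ids[start:start + per_page]
        ]
        pages.append("<ul>\n" + "\n".join(items) + "\n</ul>")
    return pages
```

For a forum with seven thousand topics, 100 links per page would give 70 site map pages, which could in turn be linked from a single index page.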

There is an html error in the sitemap.php file, and you might want to comment out the section that displays your member list (I commented it out so that if I screwed something up I can just un-comment it). When I first set it up I received an error message but the message tells you what line is going wrong, and it was a folder location variable that needed to be fixed. The usual script thing, easy to configure.

2by4

5:42 pm on Jul 7, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I just realized there's an error in the main SEO mods: no 301 to the new pages in .htaccess. I think that's why the pages only come up as supplemental if the forum has already been indexed. However, it doesn't seem to majorly affect the SERP results. I'm going to fix that and see if it resolves the supplemental result issue.
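The missing 301 could look roughly like the sketch below: requests for the old query string address get a permanent redirect to the new static-looking address. The paths `/topic.php` and `/topic-N.html` are taken from the example earlier in the thread, and the function name is hypothetical. The actual fix would be a rule in .htaccess.

```python
from urllib.parse import urlsplit, parse_qs


def redirect_old_topic_url(url):
    """Return (301, new_path) for an old-style topic URL, or None otherwise."""
    parts = urlsplit(url)
    if parts.path != "/topic.php":
        return None
    ids = parse_qs(parts.query).get("topic", [])
    # Only redirect when there is exactly one numeric topic id.
    if len(ids) != 1 or not ids[0].isdigit():
        return None
    return (301, "/topic-{}.html".format(ids[0]))
```

One thing to watch in a real setup: the redirect must apply only to what the visitor requested, not to the internal rewrite from topic-34.html back to topic.php, otherwise the two rules chase each other in a loop.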

maccas

6:06 pm on Jul 7, 2005 (gmt 0)

10+ Year Member



phpBB can be made very search engine friendly. Besides the mod rewrite hack (I customised mine to remove the duplicates, i.e. Goto page 1, 2, etc.) and the remove session id hack, you can hide certain bits from unregistered users, i.e. next and previous messages (duplications) and all the fluff like who's online. Get the add extra fields hack and add a few keywords to the title tag of different topics...

gershon

3:48 pm on Jul 8, 2005 (gmt 0)

10+ Year Member



Some follow up questions:
  • You said earlier that robots.txt is not enough to avoid PR leak - what about rel="nofollow"? (My guess would be that this would suffice, as the whole point is to keep any benefit from spammers etc.)
  • How important is it to condense the non-displaying HTML (some bb's have hundreds of lines of this stuff)?
  • Do spiders get confused by <a> tags without hrefs? Some bb's use these for JavaScript controls.
  • Some bb's repeat the subject, in bold, for each reply. Is this bad (might it be considered spamming)?

And a general question: many bb's slightly modify the HTML when a spider is detected. Is this bad? Would it be considered cloaking?

2by4

5:30 pm on Jul 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Brett's 26 steps to site success [searchengineworld.com] are excellent guidelines. If you look at WebmasterWorld as a clear example of that method, that should answer most of your questions.

Re page size:

D) Page Size:
The smaller the better. Keep it under 15k if you can. The smaller the better. Keep it under 12k if you can. The smaller the better. Keep it under 10k if you can - I trust you are getting the idea here. Over 5k and under 10k. Ya - that bites - it's tough to do, but it works. It works for search engines, and it works for surfers.
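A rough way to check a page against a budget like the one quoted above is to measure its total size and how much of that is actual text. The sketch assumes "10k" means 10 * 1024 bytes and uses only the Python standard library; the function name is made up.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text between tags (this includes script and style bodies)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def page_size_report(html, budget_bytes=10 * 1024):
    """Report total page bytes, text bytes, and whether the page fits the budget."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    text = "".join(collector.chunks).strip()
    total = len(html.encode("utf-8"))
    return {
        "total_bytes": total,
        "text_bytes": len(text.encode("utf-8")),
        "within_budget": total <= budget_bytes,
    }
```

Comparing `text_bytes` with `total_bytes` gives a quick feel for how much of a forum page is table markup rather than content.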

Top in priority is to get rid of query string sids. If you are going to run AdSense, you also need to get rid of them for logged in users. Again, simply look at how it's done here: logins by cookie only. If you're willing to sacrifice old browser support, you can drop page size down even more.

"You said earlier that robots.txt is not enough to avoid PR leak - what about rel="nofollow"? (My guess would be that this would suffice, as the whole point is to keep any benefit from spammers etc.)"

I didn't say this was to drop PR leak, never mentioned PR.

G) Outbound Links:
From every page, link to one or two high ranking sites under that particular keyword. Use your keyword in the link text (this is ultra important for the future).

Obviously you can succeed without this, see these boards for example, but his advice was solid then, and it's solid now.

As I said, you can approach this two ways: what is the least you can get away with, or what is the most you can do to achieve long term success. You seem to be leaning towards the former.

gershon

6:26 pm on Jul 8, 2005 (gmt 0)

10+ Year Member



2by4, thanks. I'm going to check out that article

you can approach this two ways: what is the least you can get away with, or what is the most you can do to achieve long term success. You seem to be leaning towards the former.

Ouch! Well, I appreciate criticism... But you should know that this is not the case - I'm just trying to direct my efforts as efficiently as possible, and combine SEO with functionality. That's why I asked whether I need to remove the links (which some users might find useful), or whether it's enough to nofollow them.

Don't get me wrong, though - if you think I'm doing something wrong, by all means, please let me know!

2by4

6:39 pm on Jul 8, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"That's why I asked if I need to remove the links (which some users might find useful), or it's enough to nofollow them."

Links are something you have to decide for yourself, there's pluses and minuses, using nofollow of course means you can worry less about undesirable spammy type links, but mods in general have to watch for spammy stuff no matter what. Using link redirectors like WebmasterWorld does is another way to do it, then blocking the redirect page in robots.txt. It just depends on what you want the forums to do long term, what type of audience you'll be getting, I don't think there's any hard and fast rules re links and forums, I'd say it's a case by case thing.
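The link redirector approach can be sketched like this: outbound links are rewritten to pass through a local redirect page, and that page is then blocked in robots.txt (for example with a line such as `Disallow: /redirect`). The host name, the `/redirect` path, and the `url` parameter are all assumptions made for the example; this is not a description of how WebmasterWorld implements it.

```python
from urllib.parse import quote, urlsplit

OWN_HOST = "www.example.com"   # hypothetical: the forum's own host name
REDIRECT_PATH = "/redirect"    # hypothetical: the page blocked in robots.txt


def outbound_href(url):
    """Route external links through the redirect page; leave internal links alone."""
    host = urlsplit(url).netloc
    if host in ("", OWN_HOST):
        return url
    return REDIRECT_PATH + "?url=" + quote(url, safe="")
```

A real redirect page should also validate the target it is given, so that it cannot be abused as an open redirect.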

"I'm just trying to direct my efforts as effeciently as possible, and combine SEO with functionality"

Functionality and SEO go hand in hand from my experience, the better on page stuff is, the better seo is.

aditd

7:09 pm on Jul 9, 2005 (gmt 0)

10+ Year Member



Somebody mentioned a phpBB hack. Where did you get it?

bradley phil

11:45 pm on Jul 11, 2005 (gmt 0)

10+ Year Member



newbie post alert!

A massive advantage of the mod_rewrite hack I employ on my forums (Invision) is that the URL is rewritten to http://www.example.com/boards/example_keywords_.html

[example_keywords] is the title of the thread. This has had a noticeable advantage: Google AdWords ad relevancy has increased, so I assume the page relevancy as a whole has increased. Certainly my pages are doing well in Yahoo and MSN (though they're not 'doing' at all in Google for an unrelated reason, which I'll pick your collective brains about later on).
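Turning a thread title into a keyword file name can be sketched as below. The underscore separator follows the example URL above; the lowercasing, the `topic.html` fallback, and the function name are assumptions, not details of the Invision hack.

```python
import re


def thread_slug(title):
    """Turn a thread title into a keyword file name such as example_keywords.html."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        return "topic.html"  # assumed fallback for titles with no usable words
    return "_".join(words) + ".html"
```

Two threads with the same title would get the same file name, so a real hack would probably need to add the thread id somewhere in the URL.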

I'm not certain about the benefit of the standard topic.html mod_rewrite hack, but at least it kills session IDs.

I'd be very interested to see a highly (successfully) hacked Invision 2.0 board. I probably wouldn't bother doing that with mine though, as 2.1 is fast approaching.

[edited by: Woz at 12:07 am (utc) on July 12, 2005]
[edit reason] Examplified, please see TOS#13 [/edit]