Forum Moderators: open
Googlebot's been eating it up, for breakfast, lunch, dinner and midnight snacks. My traffic via Google has roughly doubled in the past week. Everything is going great, right?
Maybe not... I just checked Google for a unique phrase on the forums. 6 (!) copies of it in the index, because Google hasn't (so far) been able to drop the variables in the URLs, meaning that for every possible way there is to reach the same content, Google's indexed it.
I originally altered my robots.txt file to ensure that only things such as showthread, forumdisplay, printthread and archive could be reached. BIG mistake, because now I am somewhat concerned about the duplicate content filter.
I've just now altered the robots.txt file to prevent Googlebot from indexing anything but the main page and the search-engine friendly (drops all sessionids and variables) archive.
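For anyone wanting to try the same thing, a robots.txt along these lines is one way to do it. Note that robots.txt can only Disallow, so in practice you block the dynamic scripts and leave the front page and archive alone. The paths here are assumptions based on a typical vB install, so adjust them to your own layout:

```
User-agent: Googlebot
# Block the dynamic scripts that generate duplicate URLs
# (paths are illustrative - match your own install)
Disallow: /forum/showthread.php
Disallow: /forum/forumdisplay.php
Disallow: /forum/printthread.php
# Nothing blocks / or /forum/archive/, so those stay crawlable
```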
*keeps fingers crossed*
I am also trying to get forum listing in Google. It is the only area of our site that will not get deep crawled.
I think the best way to go is like WW, where every page is home/forums/arts/artists/thread1.html. The only thing is that you will have to build the pages 3, 4 or 5 times a day to keep things current. Or just have the built pages for search engines only.
In time things will be better for the spiders... For now they have some issues with complex IDs and characters like ?, + and # in URLs.
PS Huge kudos to RoadRash, who I'm guessing is the person who tipped me off on the vB forums that I might have a problem.
I doubt Google would ban for duplicate content; worst case, it would drop 5 copies and keep 1.
Seems to me that mod_rewrite might be an option. Hopefully, though, Google will index the Archive section that vB has provided for its users. The Archive section is a "copy" of the forums produced as static HTML, without session IDs.
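If you do go the mod_rewrite route, a rule along these lines in .htaccess maps a static-looking URL onto the dynamic script. This is an untested sketch - the URL pattern, script name and parameter name are assumptions for illustration, not vB's actual ones:

```
RewriteEngine On
# Serve /threads/123.html from the dynamic script behind the scenes
# (pattern and query parameter are hypothetical examples)
RewriteRule ^threads/([0-9]+)\.html$ showthread.php?threadid=$1 [L]
```

The spider only ever sees the clean .html URL, so there is one address per thread instead of six.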
Please excuse my 'newbie' question, but what are "session Ids"? Also, I have a message board that is powered by "XMB 1.8 Partagium Final" from "Adventure Media". How do I find out if, and how many, of these pages Google crawls?
Thanks for any help.
Dave
I did a Google search for those XMB boards - I browsed some as a guest and didn't see any sessionids, so I think they are safe in that respect.
As to viewing if something from a board has been indexed by Google, I usually take a unique phrase from posts that are a few weeks old, place quotes around it, and search for it in Google.
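For example, if you pull a distinctive sentence out of an older post, a search like this shows whether Google has that page (adding site: narrows it to your own domain - example.com is a placeholder here):

```
"some unique phrase from an older post" site:example.com
```

If it comes back more than once, you likely have the same duplicate-URL issue described above.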
Doh! I told you it was a newbie question :o)
Thanks danielm, using that method proves that Google is grabbing the pages. Not sure if I am happy (as they are spidered) or dissapointed (as I'm getting as much traffic as possible) now.
Dave
The only thing is that you will have to build the pages 3,4 or 5 times a day to keep things current.
Do look into what a bit of PHP / mod_rewrite can do - I don't know what forum script you're using, but it's often possible to dynamically rewrite the forum pages to output URLs using slashes (/) instead of ? and & characters.
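The idea is simple enough to sketch in a few lines - here in Python purely for illustration, since the actual rewriting would live in your forum's PHP templates. The parameter names (forumid, threadid) and the output path scheme are assumptions, not any particular forum script's real names:

```python
from urllib.parse import urlparse, parse_qs

def rewrite_forum_url(url):
    """Map a query-string forum URL to a path-style one that
    spiders handle more readily. Parameter names and the output
    layout are hypothetical examples."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    forum = params.get("forumid", ["0"])[0]
    thread = params.get("threadid", ["0"])[0]
    return f"/forums/{forum}/threads/{thread}/"

print(rewrite_forum_url("showthread.php?forumid=3&threadid=42"))
# -> /forums/3/threads/42/
```

Run your templates' link output through something like this, pair it with a matching mod_rewrite rule on the server side, and the ? and & characters never reach the spider.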
Everything else on the site seems to be going just peachy, but the forum pages won't get deep crawled. Why would this be? More importantly, what should I do to ensure my forum pages are picked up by Google?
Dave