Google and vBulletin 3

Words of caution to people using vB3

         

danielm

2:51 am on Jul 26, 2003 (gmt 0)

10+ Year Member



Well, I've happily been tooling along, playing with the beta 4 release of vBulletin 3 (which no longer uses sessionids when it recognizes a spider!).

Googlebot's been eating it up, for breakfast, lunch, dinner and midnight snacks. My traffic via Google has roughly doubled in the past week. Everything is going great, right?

Maybe not... I just checked Google for a unique phrase on the forums. 6 (!) copies of it in the index, because Google hasn't (so far) been able to drop the variables in the URLs, meaning that for every possible way there is to reach the same content, Google's indexed it.

I originally altered my robots.txt file to ensure that only things such as showthread, forumdisplay, printthread and archive could be reached. BIG mistake, because now I am somewhat concerned about the duplicate content filter.

I've just now altered the robots.txt file to prevent Googlebot from indexing anything but the main page and the search-engine friendly (drops all sessionids and variables) archive.
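For reference, a robots.txt along those lines might look like the sketch below. The paths and script names are assumptions based on a default vBulletin 3 install in a /forum/ directory, so adjust them to your own layout:

```
# Hypothetical robots.txt for a vBulletin 3 forum under /forum/:
# block the dynamic scripts that generate multiple URLs for the same
# thread, leaving the index page and the static-friendly archive crawlable.
User-agent: *
Disallow: /forum/showthread.php
Disallow: /forum/forumdisplay.php
Disallow: /forum/printthread.php
Disallow: /forum/member.php
Disallow: /forum/search.php
```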

*keeps fingers crossed*

RoadRash

6:09 am on Jul 26, 2003 (gmt 0)

10+ Year Member



Sounds like a good plan. I recommended the same solution at the official vB forums. I would hate to be banned for duplicate content!

GoogleGuy

6:24 am on Jul 26, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Cool--thanks for mentioning that danielm!

RobbieD

6:35 am on Jul 26, 2003 (gmt 0)

10+ Year Member



GG do you ever sleep? ;)

I am also trying to get forum listings into Google. It is the only area of our site that will not get deep crawled. I think the best way to go is like WW, where every page is home/forums/arts/artists/thread1.html.

The only catch is that you will have to rebuild the pages three, four, or five times a day to keep things current. Or just build the static pages for search engines only.

In time things will get better for the spiders... For now they have some issues with complex IDs and characters like ?, +, and #.
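If you go the pre-built route, the rebuild could be scheduled with cron; the build-script name and paths here are purely hypothetical:

```
# Hypothetical crontab entry: rebuild the static forum pages four
# times a day (midnight, 6am, noon, 6pm).
0 0,6,12,18 * * * /usr/bin/php /var/www/forum/build_static_pages.php
```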

danielm

6:55 am on Jul 26, 2003 (gmt 0)

10+ Year Member



By "cool", I hope GG means that I've nothing to worry about. ;-) Either that, or Google will alter the indexing of vB sites somehow? It really only makes sense to do the latter, since 90%+ of vB users (I'm guessing) don't have any notion of duplicate content filters, and shouldn't be punished for their ignorance.

PS Huge kudos to RoadRash, who I'm guessing is the person who tipped me off on the vB forums that I might have a problem.

deanril

7:14 am on Jul 26, 2003 (gmt 0)

10+ Year Member



I would think... if there are 6 pages from one site with different ways of reaching the same content (URLs), then at worst Google would drop the 5 and keep the 1. There would be no reason to show all 6 if it's all the same content, and if Google is sharp enough, which I'm sure they are, that is what will happen.

I doubt it would ban for duplicates; worst case, it just drops 5 and keeps 1.

John_Creed

4:39 pm on Jul 26, 2003 (gmt 0)

10+ Year Member



Deanril is correct. Although nowadays you never totally know what Google will ban for, I think it's safe to assume that vB users will be fine as far as the duplicate content issue goes.

Otherwise hundreds (or thousands?) of innocent sites that use the newer version of vB would be banned.

RoadRash

11:56 pm on Jul 26, 2003 (gmt 0)

10+ Year Member



You do, however, want the one Google keeps to be the best-optimized version, i.e., the archive. You still did the best thing by blocking the bot from all other versions.

Bradley

12:26 am on Jul 27, 2003 (gmt 0)

10+ Year Member



Interesting that this thread was started, because I made a post over at vBulletin as I am interested in their software product. In that post I gave reference to Google indexing 4-6 copies of a page.

Seems to me that mod_rewrite might be an option. Hopefully, though, Google will index the Archive section that vB has provided for its users. The Archive section is a "copy" of the forums produced as static HTML, without session IDs.
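As a rough sketch of the mod_rewrite idea (the rule patterns and vBulletin parameter names here are assumptions, not tested against vB3), an .htaccess file could map static-looking URLs onto the dynamic scripts:

```
# Hypothetical .htaccess rules for Apache mod_rewrite; assumes
# showthread.php takes the thread id in the "t" parameter and
# forumdisplay.php takes the forum id in "f".
RewriteEngine On
RewriteRule ^threads/([0-9]+)\.html$ showthread.php?t=$1 [L]
RewriteRule ^forums/([0-9]+)\.html$ forumdisplay.php?f=$1 [L]
```

The forum's own templates would still need to emit the clean URLs, otherwise the spider never sees them.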

Dave_Hawley

3:09 am on Jul 27, 2003 (gmt 0)



Hi Guys

Please excuse my 'newbie' question, but what are "session IDs"? Also, I have a message board powered by "XMB 1.8 Partagium Final" from "Adventure Media". How do I find out whether, and how many of, these pages Google crawls?

Thanks for any help.

Dave

danielm

7:48 am on Jul 27, 2003 (gmt 0)

10+ Year Member



Correct me if I'm wrong, but session IDs are cookie-less ways for a PHP forum to know that a guest (i.e., not a member) has already viewed a particular thread or post. However, the URLs end up looking something like www.example.com/index.php?s=23490845asdf913fg09132a930f0, and Googlebot will have none of that kind of crap. ;-)

I did a Google search for those XMB boards - I browsed some as a guest and didn't see any session IDs, so I think they are safe in that respect.

As to viewing if something from a board has been indexed by Google, I usually take a unique phrase from posts that are a few weeks old, place quotes around it, and search for it in Google.

Dave_Hawley

8:26 am on Jul 27, 2003 (gmt 0)



As to viewing if something from a board has been indexed by Google, I usually take a unique phrase from posts that are a few weeks old, place quotes around it, and search for it in Google.

Doh! I told you it was a newbie question :o)

Thanks danielm, using that method proves that Google is grabbing the pages. Not sure if I am happy (as they are spidered) or disappointed (as I'm not sure I'm getting as much traffic as possible) now.

Dave

vincevincevince

8:41 am on Jul 27, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RobbieD:

The only thing is that you will have to build the pages 3,4 or 5 times a day to keep things current.

Do look into what a bit of PHP / mod_rewrite can do - I don't know what forum script you're using, but it's often possible to dynamically rewrite the forum URLs to use / separators instead of ?, &, and = characters.

danielm

3:43 am on Jul 30, 2003 (gmt 0)

10+ Year Member



Just an update. I originally made the change to the vBulletin forums on Friday night, disallowing robots from all pages except the archive. Googlebot found the first page the following evening (I noticed it in the index on Monday with a fresh date of Saturday). Today (Tuesday), it's chewing through the archive like there's no tomorrow.

Everything seems to be going just peachy.

GoogleGuy

4:50 am on Jul 30, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Glad it's going well for you, danielm.

Dave_Hawley

5:08 am on Jul 30, 2003 (gmt 0)



Can anyone help with this? A few days ago I searched for some 'exact text' taken from our forum, and Google matched it and found the page the text was on. Now, when I do this, I keep getting "Google could not find a match for..."

Why would this be? More importantly, what should I do to ensure my forum pages are picked up by Google?

Dave