Forum Moderators: Robert Charlton & goodroi

Huge decrease in number of forum pages indexed

from 350.000 to 150.000 in two weeks

     
10:25 am on Feb 9, 2006 (gmt 0)

New User

10+ Year Member

joined:Dec 14, 2005
posts:21
votes: 0


We own a two-year-old forum. For the past two weeks, the result count for a site:ww.ourwidgetforum.com search has decreased every day. Two weeks ago we had 350.000 pages; now we have 150.000. Traffic is now 40% of what we had two weeks ago.

Have you noticed anything similar with your site or forum? (By the way, we use vBulletin.)

7:44 pm on Feb 9, 2006 (gmt 0)

Senior Member from KZ 

WebmasterWorld Senior Member lammert is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 10, 2005
posts:2952
votes: 35


One of the things I have seen since the introduction of the first Bigdaddy datacenter is that Google reported far more search results on that datacenter than on the others. Some weeks later the number decreased, although it is still larger than on the previous datacenters. My idea is that Google first put every single URL they know of separately into the Bigdaddy datacenters and then ran an algorithm which compares the different URLs and combines them where necessary (canonicalization, as Google calls it), which decreases the number of unique URLs in the index.
8:23 pm on Feb 9, 2006 (gmt 0)

Administrator

WebmasterWorld Administrator rogerd is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Aug 2, 2000
posts:9687
votes: 1


vBulletin can produce plenty of duplicate content, with links using different query strings to access the same page, etc. In the best case, maybe some of the pages you lost were dupes.

Since your traffic sank, though, that's probably not the only factor at work. Can you tell anything about what kind of pages were dropped?

9:23 pm on Feb 9, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 20, 2002
posts:889
votes: 0


On the non-BigDaddy datacentres there's been a big change for my site. My claimed number of pages has dropped from a few thousand to 560. But all of these pages are fully indexed. Last week there may have been thousands, but they were supplementals from page 35 onwards. It feels to me like there's been a big clear-out of supplementals.
11:05 am on Feb 10, 2006 (gmt 0)

New User

10+ Year Member

joined:Dec 14, 2005
posts:21
votes: 0


"Since your traffic sank, though, that's probably not the only factor at work. Can you tell anything about what kind of pages were dropped?"

I'm sure it's the main factor. I agree, vBulletin creates a lot of duplicate URLs: we have 90.000 posts and 8000 threads, and that generated (two weeks ago) 350.000 pages in Google!

My guess is that the new Big Daddy algo is filtering duplicate content. As a collateral effect, unfortunately, our traffic is 50% of what we had two weeks ago.

1:47 pm on Feb 10, 2006 (gmt 0)

New User

5+ Year Member

joined:Sept 7, 2010
posts:7
votes: 0


The same thing happened on my vBulletin forum. Traffic decreased by 35% and indexed pages dropped by 80%. The decline started on January 24th.
3:24 pm on Feb 10, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Jan 10, 2005
posts:124
votes: 0


My page count has gone from 420,000 in normal DCs to 820 in Big Daddy!
3:35 pm on Feb 10, 2006 (gmt 0)

New User

10+ Year Member

joined:May 6, 2005
posts:3
votes: 0


My forums have dropped from 1,040,000 pages in normal Google DCs to only 276 pages in Big Daddy. I am seeing a dramatic decrease in traffic.
4:17 pm on Feb 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The "&goto=nextoldest" and "&goto=nextnewest" links in vbulletin are your biggest nightmare.

Not only do they present duplicate content, but as each thread is bumped the same URL does not point to the same thread any more.

Think about it.

That is a major design flaw with vbulletin, one that they seem in no hurry to fix. There are thousands of poorly indexed forums out there running that software (and other software with similar major flaws).

The correct response for those links should be to issue a 301 redirect to the correct thread number. That would mean that the URLs with the &goto parameter would never be indexed, so it would not matter that they pointed to ever-changing content. Their presence would still allow easy site indexing without confusing the bots.
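
To make the redirect idea concrete, here is a minimal sketch in Python (not vBulletin's actual PHP code). It assumes a hypothetical list of thread IDs ordered from least to most recently bumped; the function name `resolve_goto` and the thread numbers are made up for illustration.

```python
from typing import List, Optional, Tuple


def resolve_goto(thread_id: int, goto: str, order: List[int]) -> Optional[Tuple[int, str]]:
    """Map a "&goto=" request to a 301 redirect to the neighbouring thread.

    `order` is a hypothetical list of thread IDs, oldest bump first.
    Returns (status, location), or None when there is no neighbour.
    """
    if goto not in ("nextoldest", "nextnewest") or thread_id not in order:
        return None
    i = order.index(thread_id)
    j = i - 1 if goto == "nextoldest" else i + 1
    if j < 0 or j >= len(order):
        return None
    # Redirect to the URL that carries the real thread number of the target.
    return 301, f"/forum/showthread.php?t={order[j]}"


# Illustrative thread order, oldest bump first.
order = [34567, 54321, 87654]
print(resolve_goto(34567, "nextnewest", order))
print(resolve_goto(87654, "nextoldest", order))
```

Both calls resolve to the same thread URL, so a crawler following either link would only ever see the one URL that contains the target's own thread number.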

4:21 pm on Feb 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Search engines also try accessing thousands of non-content pages within forums.

Ideally you want just the thread or topic lists and the individual threads indexed, and nothing else (maybe member profile pages, but that invites "forum profile link drop spammers" even more).

You can also help things along by using the robots.txt file in a sensible manner:

Here's one I made earlier:

# Vbulletin Robots File.
User-agent: *
Disallow: /forum/newthread.php
Disallow: /forum/newreply.php
Disallow: /forum/showpost.php
Disallow: /forum/printthread.php
Disallow: /forum/member.php
Disallow: /forum/memberlist.php
Disallow: /forum/calendar.php
Disallow: /forum/sendmessage.php
Disallow: /forum/subscription.php
Disallow: /forum/search.php
Disallow: /forum/report.php
Disallow: /forum/misc.php
Disallow: /forum/register.php
Disallow: /forum/login.php
Disallow: /forum/forumdisplay.php?page
Disallow: /forum/forumdisplay.php?sort
Disallow: /forum/forumdisplay.php?order
Disallow: /forum/forumdisplay.php?pp
Disallow: /forum/forumdisplay.php?daysprune
Disallow: /forum/forumdisplay.php?do
# Allow: /forum/forumdisplay.php?f=
# Allow: /forum/forumdisplay.php
Disallow: /forum/showthread.php?mode
Disallow: /forum/showthread.php?goto
Disallow: /forum/showthread.php?post
Disallow: /forum/showthread.php?page
Disallow: /forum/showthread.php?pp
Disallow: /forum/showthread.php?p
# Allow: /forum/showthread.php?t=
# Allow: /forum/showthread.php
Disallow: /forum/profile.php?do
# Allow: /forum/profile.php
# Allow: /forum/showprofile.php
# Allow: /forum/announcement.php
# Allow: /forum/faq.php
# Allow: /forum/index.php
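
Before relying on a file like this, it may be worth checking which URLs the rules actually match. The sketch below is a simplified model in Python that assumes plain prefix matching on the path plus query string and ignores User-agent grouping; real crawlers may behave differently. The helper names and the sample URLs are illustrative only.

```python
from typing import List

# A few of the rules from the file above, for illustration.
RULES = """\
User-agent: *
Disallow: /forum/showpost.php
Disallow: /forum/printthread.php
Disallow: /forum/showthread.php?mode
Disallow: /forum/showthread.php?goto
Disallow: /forum/showthread.php?p
"""


def disallowed_prefixes(robots_txt: str) -> List[str]:
    """Collect every non-empty Disallow value, ignoring comments."""
    prefixes = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()
        if line.lower().startswith("disallow:"):
            value = line.split(":", 1)[1].strip()
            if value:
                prefixes.append(value)
    return prefixes


def is_blocked(url_path: str, prefixes: List[str]) -> bool:
    """Plain prefix match: blocked if the URL starts with any Disallow value."""
    return any(url_path.startswith(p) for p in prefixes)


prefixes = disallowed_prefixes(RULES)
for url in (
    "/forum/showthread.php?t=54321",
    "/forum/showthread.php?goto=lastpost&t=54321",
    "/forum/showthread.php?t=34567&goto=nextnewest",
    "/forum/printthread.php?t=54321",
):
    print("blocked" if is_blocked(url, prefixes) else "allowed", url)
```

Under this simplified model a rule such as "Disallow: /forum/showthread.php?goto" only matches when goto is the first parameter in the query string, so it is worth testing the URL forms your own forum really emits.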

There are many flaws within vBulletin itself that can only be fixed by the authors. There are certain pages that should have <meta name="robots" content="noindex"> by default, and the design should take more care to present one canonical URL for each piece of content.

4:37 pm on Feb 10, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1189
votes: 6


I had a similar problem with phpbb. The board was heavily modded to be SE friendly but I think the search engines keep changing the rules.

You can never stop learning at this game. Here are three tips I learnt this week.

1. Change the topic header to H1 tags.
phpBB just displays the topic header in a td, but changing it to H1 should help a lot.

2. Place rel="nofollow" on ALL hrefs in all templates.
This will stop the SEs wandering about.

3. If you can, serve a different template based on the UA.
If you use something like the eXtreme Styles mod you can set it to serve a different template to bots. Regular visitors read the properly formatted page with links here and there, and SEs read a plain unformatted page rich in content, low on links and with zero graphics.

I don't think this is a penalty situation as the content is exactly the same for SE and viewer.

2:18 pm on Feb 13, 2006 (gmt 0)

New User

10+ Year Member

joined:Dec 14, 2005
posts:21
votes: 0


Just an update: Google is behaving like a roller coaster. site:mywidgetforum showed 135.000 results in the morning and 210.000 in the afternoon (I remind you, I had 350.000 two weeks ago).
3:12 pm on Feb 13, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Google has 80 datacentres. You have vastly different numbers of pages indexed, depending on which datacentre you look at.

Make a note of the IP number by running your mouse over the "cache" link.

12:23 am on Mar 31, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Vbulletin by default allows every post and thread to have at least 10 different URLs that can access it. For example, a post on a vbulletin forum could be expressed as:

/forum/showthread.php?t=54321
/forum/showthread.php?t=54321&p=22446688
/forum/showthread.php?t=54321&page=2
/forum/showthread.php?mode=hybrid&t=54321
/forum/showthread.php?p=22446688&mode=linear#post22446688
/forum/showthread.php?p=22446688&mode=threaded#post22446688
/forum/showthread.php?t=34567&goto=nextnewest
/forum/showthread.php?t=87654&goto=nextoldest
/forum/showthread.php?goto=lastpost&t=54321
/forum/showpost.php?p=22446688
/forum/showpost.php?p=22446688&postcount=45
/forum/printthread.php?t=54321

and that is without introducing URLs that include the page parameter, for threads that are more than one page long, and the pp parameter for changing the default number of posts per page; either or both of which can be added to most of the URLs above too.

It is important to keep as many of those out as possible other than the basic /forum/showthread.php?t=54321 version.
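
As a rough illustration of that rule, the Python sketch below treats a thread URL as canonical only when it is showthread.php with t as its sole parameter and no fragment. The function name `is_canonical_thread_url` is hypothetical, and the check is one possible reading of the rule, not something vBulletin provides.

```python
from urllib.parse import parse_qs, urlsplit


def is_canonical_thread_url(url: str) -> bool:
    """True only for the basic /forum/showthread.php?t=N form."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    return (
        parts.path == "/forum/showthread.php"
        and set(params) == {"t"}
        and not parts.fragment
    )


urls = [
    "/forum/showthread.php?t=54321",
    "/forum/showthread.php?t=54321&page=2",
    "/forum/showthread.php?p=22446688&mode=linear#post22446688",
    "/forum/showpost.php?p=22446688",
    "/forum/printthread.php?t=54321",
]
# Everything that is not the basic form is a candidate for noindex or a robots.txt rule.
to_keep_out = [u for u in urls if not is_canonical_thread_url(u)]
print(to_keep_out)
```

A check like this could drive either a noindex meta tag in the thread template or a report of which URL forms a crawler is still finding.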

Another big problem is the "next" and "previous" links that cause massive duplicate content issues because they allow a thread like
/forum/showthread.php?t=54321 to be indexed as
/forum/showthread.php?t=34567&goto=nextnewest and as
/forum/showthread.php?t=87654&goto=nextoldest too.

Additionally if any of the three threads is bumped, the "next" and "previous" links that are indexed no longer point to the same thread, because they contain the thread number of the thread that they were ON (along with the goto parameter), not the real thread number of the thread that they actually pointed to.

This is a major programming error by the people that designed the forum software. The link should either contain the true thread number of the thread that it points to, or else clicking the "next" and "previous" links should go via a 301 redirect to a URL that includes the real true canonical thread number of the target thread.

[edited by: tedster at 1:57 am (utc) on Mar. 31, 2006]
[edit reason] splice from another spot [/edit]

2:08 am on Mar 31, 2006 (gmt 0)

Preferred Member

10+ Year Member Top Contributors Of The Month

joined:June 19, 2005
posts:369
votes: 18


g1smd, you have brought up a very major problem that I am experiencing. For two of my large forums, Google has freaked out and indexed only the "print thread" version.
3:36 am on Mar 31, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:July 25, 2003
posts:608
votes: 0


I had both the print page and the actual news article showing. I used the rel="nofollow" in the print page link and it worked well.
4:21 am on Mar 31, 2006 (gmt 0)

Preferred Member

10+ Year Member

joined:May 27, 2005
posts:614
votes: 0


I used some free software to make a Google sitemap and since then have recorded many more visitors and more Googlebot visits than before.

Maybe a Google sitemap would help with getting indexed.

2:33 pm on Apr 2, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


Put the <meta name="robots" content="noindex"> tag on the print page itself.
1:27 am on Apr 3, 2006 (gmt 0)

Junior Member

10+ Year Member

joined:Mar 3, 2003
posts:170
votes: 0


I have implemented some of these changes to a couple of my sites.

Now the BIG question is how long before the changes take effect.

3:25 pm on Apr 6, 2006 (gmt 0)

Senior Member

WebmasterWorld Senior Member g1smd is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:July 3, 2002
posts:18903
votes: 0


The proper content pages will be better indexed within about a month.

Pages to be delisted will take several months to fade out of the index.

Anything already listed as a Supplemental Result will hang around with an old cache for a few years.

 
