Forum Moderators: Robert Charlton & goodroi

Message Too Old, No Replies

Looks like Forums are being de-indexed?

         

van_zant

7:53 pm on Mar 19, 2006 (gmt 0)

10+ Year Member



<snip>

Using this site, I noticed that my site, which is mainly a forum, dropped from its original 70 000+ indexed results to around 300+, with about 3 DCs showing 80 000+.

I did some snooping around with some other forums and the issue seems to be the same.

<snip>

3 DCs at 100 000

14 DCs at 9000

<snip>

4 DCs at 250 000

13 DCs at 30 000

<snip>

10 DCs at 500 000

7 DCs at 1 000 000+

Most of the ones with high results are from

www-gv
www-lm
www-kr

[edited by: lawman at 8:07 pm (utc) on Mar. 19, 2006]
[edit reason] No Tools, No Links Please [/edit]

europeforvisitors

6:51 pm on Mar 22, 2006 (gmt 0)



U have to differentiate

Do you have any specific university in mind, or are you speaking of universities in general?

phantombookman

6:51 pm on Mar 22, 2006 (gmt 0)

10+ Year Member



Google is clearly in a state at the moment, with lots of changes underway.
Surely it is better to let them settle things down before starting to panic.

I still do not see the point in the Big Daddy thread: unless you know that things have been 'finalised', what can constructively be concluded?

g1smd

10:26 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hacking may be an issue. Hackers use Google to find forums of a particular type to hack, and then replace the content with junk.

But maybe spammers are more the issue. Spammers find forums of a certain type and then auto-bot-post thousands of links to c*sino, p*lls, and p*rn sites in junk posts. Many forum owners don't take the time to clean that junk up.

Google might want to limit the damage caused to such forums by not showing them in the index, making them more difficult for spammers to find. Or maybe they are dropping forums that already have multiple links to bad neighbourhoods from post signatures or, more likely, from member profile pages.

Go look at almost any vBulletin, phpBB, or Invision forum. Find all the posters with zero to five posts. I'll pretty much guarantee that 90% or more of those are just utilising their forum profile page as a free link to some spammy website. Look deeper and you'll find that those same people have pulled the same trick on many thousands of forums.

So, it is possible that a great swathe of forum-land is now classed as a bad neighbourhood. If you are a forum Admin, go back and delete all members with zero to two posts who joined more than 3 months ago and have not logged in for 3 months. That will probably remove some 75% or more of the spam that your site is linking to.
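For admins comfortable at the database level, that clean-up can be scripted. A minimal sketch in PHP follows; the table and column names are hypothetical (check your own forum's schema), and back up the database before running anything like it.

<?php
// Sketch: prune never-active accounts. Table/column names are
// hypothetical; adjust them to your forum's schema. BACK UP first.
$db = new mysqli('localhost', 'forum_user', 'secret', 'forum_db');

// Members with 0-2 posts, who registered more than 3 months ago
// and have not logged in for more than 3 months.
$cutoff = time() - 90 * 24 * 3600;
$stmt = $db->prepare(
    'DELETE FROM members
      WHERE post_count <= 2
        AND join_date  < ?
        AND last_login < ?');
$stmt->bind_param('ii', $cutoff, $cutoff);
$stmt->execute();
echo $stmt->affected_rows . " stale accounts removed\n";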

g1smd

10:52 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Most of them have "duplicate content" issues: the same content at more than one URL within the site.

Many don't cater well for the "unique title and description per page" requirement.

A lot have code that does not validate and may be causing bots to falter.

.

With that type of software, duplicate content arises when a page of information can be reached in multiple ways. There is the obvious www and non-www to think about, but also many other ways caused by the way that the software has been written.

For example, a post on a vbulletin forum could be expressed as:

/forum/showthread.php?t=54321
/forum/showthread.php?t=54321&p=22446688
/forum/showthread.php?t=54321&page=2
/forum/showthread.php?mode=hybrid&t=54321
/forum/showthread.php?p=22446688&mode=linear#post22446688
/forum/showthread.php?p=22446688&mode=threaded#post22446688
/forum/showthread.php?t=34567&goto=nextnewest
/forum/showthread.php?t=87654&goto=nextoldest
/forum/showthread.php?goto=lastpost&t=54321
/forum/showpost.php?p=22446688
/forum/showpost.php?p=22446688&postcount=45
/forum/printthread.php?t=54321

and that is without introducing the page parameter, for threads that run to more than one page, or the pp parameter for changing the default number of posts per page; either or both of which can be added to most of the URLs above too.
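One defensive fix (a sketch only, not code from vBulletin or any other package; the parameter names follow the example URLs above) is to 301-redirect any request carrying non-canonical parameters back onto the one true URL:

<?php
// Sketch: collapse duplicate thread URLs onto one canonical form,
// /forum/showthread.php?t=<id> (plus &page=<n> for long threads).
// Hypothetical code, not taken from any real forum package.
$allowed = array('t', 'page');       // parameters allowed to survive
$keep = array_intersect_key($_GET, array_flip($allowed));

// URLs carrying only a post id (p=...) would first need a database
// lookup to resolve the owning thread id; that step is omitted here.
if ($keep != $_GET && isset($keep['t'])) {
    $url = '/forum/showthread.php?' . http_build_query($keep);
    header('HTTP/1.1 301 Moved Permanently');
    header('Location: ' . $url);
    exit;
}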

.

Oh, and did I mention the duplicate content caused by session IDs too?

Do not hand out a session ID until someone actually logs in. Session IDs are one of the biggest causes of failure in indexing forums.
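In PHP terms that means two things: keep the session ID out of URLs, and never call session_start() for anonymous visitors. A minimal sketch, assuming cookie-based sessions and a standard login form (the field names are illustrative):

<?php
// Sketch: keep PHPSESSID away from crawlers and anonymous visitors.
ini_set('session.use_trans_sid', '0');    // never append the ID to URLs
ini_set('session.use_only_cookies', '1'); // ignore IDs passed in URLs

// Only start a session when someone is actually logging in, or
// already carries a session cookie from an earlier login.
$loggingIn = ($_SERVER['REQUEST_METHOD'] === 'POST'
              && isset($_POST['username'], $_POST['password']));

if ($loggingIn || isset($_COOKIE[session_name()])) {
    session_start();
}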

.

Another big problem is the "next" and "previous" links, which cause massive duplicate content issues because they allow a thread like /forum/showthread.php?t=54321 to be indexed as /forum/showthread.php?t=34567&goto=nextnewest and as /forum/showthread.php?t=87654&goto=nextoldest too. Worse, if any of the three threads is bumped, the indexed "next" and "previous" URLs no longer point to the same thread, because they contain the thread number of the thread they were ON (along with the goto parameter), not the thread number of the thread they actually pointed to.

This is a major design error by the people who wrote the forum software. The link should either carry the true thread number of the thread it points to, or else clicking "next" or "previous" should go via a 301 redirect to a URL containing the true canonical thread number of the target thread.
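The 301 variant of that fix might look something like this sketch (table and column names are hypothetical; the goto values follow the URLs quoted earlier):

<?php
// Sketch: resolve goto=nextnewest / nextoldest to the real thread id
// and 301 there, so only canonical thread URLs ever get indexed.
$db = new mysqli('localhost', 'forum_user', 'secret', 'forum_db');

if (isset($_GET['goto'], $_GET['t'])) {
    $newer = ($_GET['goto'] === 'nextnewest');
    $op    = $newer ? '>'   : '<';
    $order = $newer ? 'ASC' : 'DESC';
    $stmt  = $db->prepare(
        "SELECT thread_id FROM threads
          WHERE last_post_time $op
                (SELECT last_post_time FROM threads WHERE thread_id = ?)
          ORDER BY last_post_time $order
          LIMIT 1");
    $t = (int) $_GET['t'];
    $stmt->bind_param('i', $t);
    $stmt->execute();
    $stmt->bind_result($target);
    if ($stmt->fetch()) {
        header('HTTP/1.1 301 Moved Permanently');
        header('Location: /forum/showthread.php?t=' . $target);
        exit;
    }
}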

g1smd

11:08 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Make sure that you get all the "newtopic" and "newreply" and "sendPM" and "login" and "printthread" and "editprofile" and "markforumread" type URLs completely out of the index.

They are presenting hundreds of millions of almost identical "you are not logged in" error messages, many thousands of each per site. Search engines do not ever need to see those pages.

Get a <meta name="robots" content="noindex"> tag on all those pages, use the rel="nofollow" attribute on all links that point to such pages, and/or disallow those pages in the robots.txt file. A combination of those things is usually the best way.
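For a typical installation, the robots.txt half of that might look like the lines below, with the meta tag emitted by the page template. Script names here are illustrative; match them to your own package.

User-agent: *
Disallow: /forum/newreply.php
Disallow: /forum/newthread.php
Disallow: /forum/login.php
Disallow: /forum/printthread.php
Disallow: /forum/private.php
Disallow: /forum/profile.php

<?php
// Sketch: emit noindex on every "action" page (illustrative list).
$blocked = array('newreply.php', 'newthread.php', 'login.php',
                 'printthread.php', 'private.php', 'profile.php');

if (in_array(basename($_SERVER['SCRIPT_NAME']), $blocked)) {
    echo '<meta name="robots" content="noindex">' . "\n";
}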

pageoneresults

11:10 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yikes! Can you say footprints on a large scale?

How long has this been like this g1smd?

g1smd

11:13 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Several years, but getting the people that write forum software to WAKE UP is beyond impossible.

[P1R: you have an interesting link coming your way by PM...]

Lorel

11:50 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Wfernley:


I have gone through and made them completely SE friendly. The only way you see variables or IDs is if you're a visitor. SEs see clean /forums/post2312.html files instead of /forums/post.php?id=2312&uid=n309jd03k...

Could these be considered cloaking? This could pull a site down.

g1smd

11:54 pm on Mar 22, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If all the other non-SE-friendly URLs serve a <meta name="robots" content="noindex"> tag on the page (easy to do with scripting) then this will never be an issue.

You could use robots.txt to disallow the "parameter-based" URLs but that isn't always as clean. They might still appear as URL-only entries in the search results.
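"Easy to do with scripting" could be as simple as keying off the query string, assuming the rewritten SE-friendly URLs arrive without one (a sketch, not wfernley's actual setup):

<?php
// Sketch: noindex any request that still carries raw parameters.
// Assumes the clean rewritten .html URLs have an empty query string,
// so only the duplicate parameter-based URLs ever match.
if (!empty($_SERVER['QUERY_STRING'])) {
    echo '<meta name="robots" content="noindex">' . "\n";
}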

maccas

12:08 am on Mar 23, 2006 (gmt 0)

10+ Year Member



"Could these be considered cloaking?" no it is showing different content to all users that are not logged in not just to robots. If it was just to robots i.e by ip or useragent then yes.

lammert

12:09 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I have gone through and made them completely SE friendly. The only way you see variables or IDs is if you're a visitor. SEs see clean /forums/post2312.html files instead of /forums/post.php?id=2312&uid=n309jd03k...

This is the best way to create your own duplicate content problem. How do you think people will link to your forum posts? With the link they see in the address bar, so all incoming links will be of the form /forums/post.php?... whereas Googlebot on its normal crawls sees the same pages under a clean .html name.

Also, don't forget the toolbar and Google's Mediabot. I have seen more than once that Googlebot tried to index a page which was definitely not visible from the outside. The only reference Google could have had was from me viewing the page in my browser with the toolbar installed. Of course, the toolbar sees the user's URL, not the search engine URL.

I also don't trust the Mediabot. Although it doesn't crawl pages to store them in the index, it is certainly possible that the URL list created by this bot is now and then compared with the URL list the regular bot finds. Mediabot (just like the toolbar) sees the user URL, not your manually created search-engine-friendly URL.

What would Google do when Mediabot and the toolbar consistently see different URLs for a site than Googlebot does? Probably drop that site from the index on suspicion of cloaking or duplicate content.

gcc_llc

12:29 am on Mar 23, 2006 (gmt 0)

10+ Year Member



@lammert

"This is the best way to create your own duplicate content problem. How do you think people will link to your forum posts? With the link they see in the address bar, so all incomming links will be of the form /forums/post.php?... whereas Googlebot on its normal crawls sees the same pages with a clean .html name."

Look at the address bar on this page. Then check how many pages are indexed here.

EDIT: Sorry. I thought you were implying the rewrites wouldn't work. I didn't see his "visitor" part. Why he's doing that I have no idea.

lammert

12:44 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Look at the address bar on this page. Then check how many pages are indexed here.

Totally different situation. WebmasterWorld serves SE friendly URLs to both users and search engines, which is good and is done by many sites. According to message #26, wfernley is feeding SE friendly URLs to the search engines but SE unfriendly URLs to the visitor. Or maybe my interpretation of "The only way you see variables or IDs is if you're a visitor" is different than yours?

All incoming links which carry PR value will point to SE unfriendly URLs, yet the bot only sees the files with the .html extension. Where does this incoming PR go?

[added]We posted at the same time :) I'll leave this post here for clarity to the other readers[/added]

gcc_llc

1:27 am on Mar 23, 2006 (gmt 0)

10+ Year Member



What timing ;)

kamikaze Optimizer

1:48 am on Mar 23, 2006 (gmt 0)

10+ Year Member



g1smd:

Go look at almost any vBulletin, phpBB, or Invision forum. Find all the posters with zero to five posts. I'll pretty much guarantee that 90% or more of those are just utilising their forum profile page as a free link to some spammy website...

Your guarantee does not hold any weight on my busy (500+ users on the site as I type this) forum with 25K registered members. Just not true. I get 1-2 spammy posts per week and they are easy to deal with.

...If you are a forum Admin, go back and delete all members with zero to two posts that have been a member for more than 3 months and who have not logged in for more than 3 months. That will probably fix some 75% plus of the spam that your site is linking to.

That is not the solution. You can do so many other things to a sig or post link:

1) Make it visible only to logged-in members.
2) Program the rel="nofollow" attribute into all outbound links.
3) Use a CGI outbound redirect script (this is what I do) and deny robots access to the cgi folder in robots.txt; it works well (see the sketch below).
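The real setup described in point 3 is CGI, but the same idea in PHP is only a few lines. A sketch (out.php is a hypothetical name; the robots.txt rule keeps crawlers off the hop entirely):

# robots.txt
User-agent: *
Disallow: /out.php

<?php
// Sketch of an outbound-link redirect script (out.php). Links in
// posts point at /out.php?url=..., so the external site never gets
// a direct, crawlable link from the forum.
$url = isset($_GET['url']) ? $_GET['url'] : '/';

// Only forward well-formed http/https URLs; send anything else home.
// A production script should validate destinations further to avoid
// becoming an open redirect.
if (!preg_match('#^https?://#i', $url)) {
    $url = '/';
}
header('Location: ' . $url, true, 302);
exit;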

g1smd

1:53 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



>> Your guarantee does not hold any weight on my busy (500+ users on the site as I type this) forum with 25K registered members. Just not true. I get 1-2 spammy posts per week and they are easy to deal with. <<

I wasn't talking about spammy posts. I was talking about the thousands of Zero Post lurkers who are only there for the Link back to their own site from the Profile page. Many forums have them, because many forum Admins are not very tech savvy, and have never bothered to check exactly who is registered and what they have on their Profile page.

But yes, the rel="nofollow" tag, and so on, can help. Some forums have a policy of no links out until you have a reasonable (say, 50 or 100) number of posts. Others allow no links at all. However there are thousands of forums that are abused by spammers in multiple ways. It is wise to check things out, and dejunk at the earliest possible opportunity.
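One way to implement that "no followed links until N posts" policy is at the template level. A sketch, with the threshold and function name both illustrative rather than taken from any real package:

<?php
// Sketch: only drop rel="nofollow" from signature links once a
// member has a track record. The threshold is illustrative.
define('TRUSTED_POST_COUNT', 50);

function signature_link($href, $text, $memberPostCount) {
    $rel = ($memberPostCount >= TRUSTED_POST_COUNT)
         ? '' : ' rel="nofollow"';
    return '<a href="' . htmlspecialchars($href) . '"' . $rel . '>'
         . htmlspecialchars($text) . '</a>';
}

echo signature_link('http://example.com/', 'My site', 3), "\n";
// -> <a href="http://example.com/" rel="nofollow">My site</a>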

gcc_llc

2:10 am on Mar 23, 2006 (gmt 0)

10+ Year Member



So in other words, those forum owners who don't SEO their sites will get penalized. Same as those who don't put much effort into their webpages. Not my problem :)

Signatures, profiles, and outbound links are easily taken care of on the fly, along with every other part you don't want followed (register, usercp, etc...). The programs and plugins available to do this are numerous. Owners who don't pay attention will suffer, but that argument applies to every part of SEO. This really doesn't apply here unless you've ignored SEO to begin with.

kamikaze Optimizer

2:10 am on Mar 23, 2006 (gmt 0)

10+ Year Member



Well, for my site, the cgi script takes care of that. Prior to the install of the cgi script (about a year ago), only logged-in members (no bots) could access the profiles.

I guess what I am trying to say is that this issue of forum spamming (posts, sigs, profiles...) was dealt with long ago by almost all forum software companies, and by forum admins with even the smallest clue about SEO, well before the bloggers ever found their solutions.

gcc_llc

2:14 am on Mar 23, 2006 (gmt 0)

10+ Year Member



Exactly.

The funny part is, the supplemental results I see now are from when I did ignore SEO. Lots of php?= URLs. Once I wised up and applied some of the most basic principles, my indexed count was about 10x what it was before, but now I get thrown back into supplemental hell for my previous mistakes. It's a bit frustrating after the amount of work I've put in over the last 6 months.

kamikaze Optimizer

2:23 am on Mar 23, 2006 (gmt 0)

10+ Year Member



The supplemental issue will fix itself in time; it might be six more months or even longer.

Unless the site started out correctly from scratch, every forum that has been SEO'ed has been through the period that you are in now.

gcc_llc

3:22 am on Mar 23, 2006 (gmt 0)

10+ Year Member



That actually does make me feel better.

g1smd

11:11 am on Mar 23, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The comments made at [webmasterworld.com ] are also relevant to this discussion.