Forum Moderators: goodroi
[edited by: goodroi at 1:38 am (utc) on Nov. 26, 2006]
User-agent: *
Disallow:
You might want to dig around on your server and make sure that you really want to allow all directories to be spidered. Many sites contain cgi-bin or stats directories that you should not allow to be spidered.
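To see what the two-line file above actually permits, here is a minimal Python sketch using the standard-library `urllib.robotparser`. The `example.com` URLs, the helper name `is_allowed`, and the `/cgi-bin/` and `/stats/` rules are illustrative assumptions, not taken from anyone's real server.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="*"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# An empty Disallow value excludes nothing, so every URL is allowed.
ALLOW_ALL = "User-agent: *\nDisallow:\n"

# Hypothetical variant that keeps robots out of two directories.
BLOCK_SOME = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /stats/\n"

for url in (
    "http://example.com/index.html",
    "http://example.com/cgi-bin/form.cgi",
    "http://example.com/stats/",
):
    print(url, is_allowed(ALLOW_ALL, url), is_allowed(BLOCK_SOME, url))
```

Running a candidate file through a check like this before uploading it is a cheap way to catch a rule that blocks more, or less, than intended.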
Ref: A Standard for Robot Exclusion [robotstxt.org]
Jim
Welcome to WebmasterWorld :) In general, people use Disallow in their robots.txt to keep the bots out of admin areas, development and testing pages, and pages with heavy duplicate content.
When you say you want all of your pages indexed, I think you mean you want all of your content pages indexed (feel free to correct me :)). Personally, I would not want the engines indexing the admin parts of my forums. To get the engines to index the content pages, get links pointing to those deep pages and not only to your home page. By default the engines will index as much as possible, so just get those links pointing to your good stuff.
As for a specific example of disallowing stuff, you may want to double-check your regular pages vs. printer-friendly pages; that was a problem with other forum programs. It might also help to poke around other sites running Invision and see what they put into their robots.txt.
Good luck and happy indexing.
User-agent: *
Disallow: /advertise/
Disallow: /forum/index.php?act=idx
Disallow: /forum/index.php?act=Login
Disallow: /forum/index.php?act=Search
Disallow: /forum/index.php?act=Shoutbox
Disallow: /forum/index.php?act=Reg
Disallow: /forum/index.php?act=Msg
Disallow: /forum/index.php?act=Mail
Disallow: /forum/index.php?act=Forward
Disallow: /forum/index.php?act=Track
Disallow: /forum/index.php?act=Post
Disallow: /forum/index.php?act=Print
Disallow: /forum/index.php?act=ST
Disallow: /forum/index.php?act=boardrules
Disallow: /forum/index.php?act=Help
Disallow: /forum/index.php?act=Stats
Disallow: /forum/index.php?act=Members
Disallow: /forum/index.php?act=Online
Disallow: /forum/index.php?act=calendar
Disallow: /forum/index.php?act=SR
Disallow: /forum/index.php?act=ICQ
Disallow: /forum/index.php?act=MSN
Disallow: /forum/index.php?act=AOL
Disallow: /forum/index.php?act=AIM
Disallow: /forum/index.php?act=SC
Disallow: /forum/index.php?act=task
Disallow: /forum/index.php?act=findpost
Disallow: /forum/index.php?act=UserCP
Disallow: /forum/index.php?&act=
Disallow: /forum/index.php?act=report
Disallow: /forum/index.php?act=buddy
Disallow: /forum/index.php?act=legends
Disallow: /forum/index.php?CODE=
Disallow: /forum/index.php?automodule
Disallow: /forum/index.php?act=attach
Disallow: /forum/index.php?&&CODE=
Disallow: /forum/index.php?&debug=1
Disallow: /forum/index.php?act=Profile
Disallow: /forum/index.php?showuser
Disallow: /forum/index.php?s=
Disallow: /*&mode=linear$
Disallow: /*&mode=threaded$
Disallow: /*&mode=linearplus$
Disallow: /*&p=
Disallow: /*&pid=
[edited by: Asia_Expat at 8:42 pm (utc) on Nov. 26, 2006]
[edited by: Asia_Expat at 8:46 pm (utc) on Nov. 26, 2006]
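Most of the rules in the file above contain no wildcard, so a crawler treats them as plain, case-sensitive prefixes of the URL path plus query string. The following is a minimal Python sketch of that prefix test. The function name `blocked_by_prefix` and the sample URLs (including the `f`, `t` and `showtopic` parameters) are assumptions made for illustration.

```python
def blocked_by_prefix(path_and_query, disallow_rules):
    """Plain prefix matching: blocked if the URL starts with any non-empty rule."""
    return any(rule and path_and_query.startswith(rule) for rule in disallow_rules)


# Three rules copied from the file above.
RULES = [
    "/forum/index.php?act=Print",
    "/forum/index.php?act=Login",
    "/forum/index.php?s=",
]

samples = [
    "/forum/index.php?act=Print&f=2&t=15",       # starts with a rule
    "/forum/index.php?showtopic=15&act=Print",   # same parameter, but not first
    "/forum/index.php?act=print&f=2&t=15",       # lower-case "print"
    "/forum/index.php?s=0123abcd&showtopic=15",  # session ID as first parameter
    "/forum/index.php?showtopic=15",             # ordinary topic URL
]

for url in samples:
    print(blocked_by_prefix(url, RULES), url)
```

The second and third samples slip through, which suggests why the file also lists variants such as `?&act=` and `?&&CODE=`: a prefix rule only catches the exact parameter order and spelling it names.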
I'd like some additional info on your Invision Forum as well. I just added my /forum/ to robots.txt to disallow all robots, as I feel that two years of my forum issuing session IDs, plus a problem with the config file, have pulled my entire site down.
I'd like to eventually have the forum indexed, but not at the cost of the rest of the site. I've been using the IPB Portal as the destination URL for links to the forum from inside my site, and the Portal only issues URLs with session IDs, so I guess I will eventually have to dump it, since going anywhere from that page puts a session in the URL.
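To illustrate why session IDs in URLs cause trouble, the sketch below shows how two differently-sessioned links collapse to the same page once the session parameter is removed. It assumes the session ID travels in a query parameter named `s` (as the `index.php?s=` rule earlier in the thread suggests); the function name `strip_session_id` and the sample URLs are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_session_id(url, param="s"):
    """Return `url` with every occurrence of the session parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != param
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


a = "http://www.example.com/forum/index.php?s=aaa111&showtopic=42"
b = "http://www.example.com/forum/index.php?s=bbb222&showtopic=42"

# Different URLs to a crawler, but the same page once the session is dropped.
print(strip_session_id(a))
print(strip_session_id(a) == strip_session_id(b))
```

Every visit by a spider can be handed a fresh session value, so without some form of exclusion or canonicalisation the same topic can appear under many distinct URLs.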
.mydomain.com
(add your own domain of course, and don't forget the dot at the beginning)
... and the session ID problems simply vanished into thin air. Also, if you look closely at the robots file above, IPB session IDs are disallowed, so eventually any session IDs you have indexed should be dropped from the index.
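The post does not say which setting the `.mydomain.com` value goes into. Assuming it is the board's cookie domain, the leading dot would matter because of how cookie domains are matched against hostnames. The sketch below is a deliberately simplified version of that matching (real browsers apply more rules, see RFC 6265), and the function name `cookie_domain_matches` is hypothetical.

```python
def cookie_domain_matches(host, cookie_domain):
    """Simplified check: would a cookie scoped to `cookie_domain` be sent to `host`?"""
    host = host.lower()
    bare = cookie_domain.lower().lstrip(".")
    # Match the bare domain itself or any subdomain of it.
    return host == bare or host.endswith("." + bare)


# A cookie scoped to the whole domain reaches both hostnames.
print(cookie_domain_matches("www.mydomain.com", ".mydomain.com"))
print(cookie_domain_matches("mydomain.com", ".mydomain.com"))

# A cookie scoped only to the www host is not sent to the bare domain.
print(cookie_domain_matches("mydomain.com", "www.mydomain.com"))
```

Under that assumption, a cookie that works on every hostname of the site would leave the forum with less reason to fall back to session IDs in the URL.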
If you have never attempted to manage the indexing of your forum, I would think it's most certainly causing you serious issues. All I can suggest is that you implement the above robots file and see what happens. It also includes wildcard exclusions to take care of 'Printer Friendly' versions and all the post 'Snapback' URLs. It's a very comprehensive robots file for IPB and takes care of just about everything, I think. (NOTE: I still have to add the 'Getlastpost' wildcard... check back later).
I think you should let the bots back into your forum straight away and let them see this robots file because I noticed good results very quickly.
Wait for me to add the 'Getlastpost' exclusion though.
[edited by: Asia_Expat at 10:56 pm (utc) on Nov. 26, 2006]
If anyone can see any issues with my robots file, I'd be really grateful if you could let me know, preferably by PM...
User-agent: *
Disallow: /forum/index.php?act=idx
Disallow: /forum/index.php?act=Login
Disallow: /forum/index.php?act=Search
Disallow: /forum/index.php?act=Shoutbox
Disallow: /forum/index.php?act=Reg
Disallow: /forum/index.php?act=Msg
Disallow: /forum/index.php?act=Mail
Disallow: /forum/index.php?act=Forward
Disallow: /forum/index.php?act=Track
Disallow: /forum/index.php?act=Post
Disallow: /forum/index.php?act=Print
Disallow: /forum/index.php?act=ST
Disallow: /forum/index.php?act=boardrules
Disallow: /forum/index.php?act=Help
Disallow: /forum/index.php?act=Stats
Disallow: /forum/index.php?act=Members
Disallow: /forum/index.php?act=Online
Disallow: /forum/index.php?act=calendar
Disallow: /forum/index.php?act=SR
Disallow: /forum/index.php?act=ICQ
Disallow: /forum/index.php?act=MSN
Disallow: /forum/index.php?act=AOL
Disallow: /forum/index.php?act=AIM
Disallow: /forum/index.php?act=SC
Disallow: /forum/index.php?act=task
Disallow: /forum/index.php?act=findpost
Disallow: /forum/index.php?act=UserCP
Disallow: /forum/index.php?&act=
Disallow: /forum/index.php?act=report
Disallow: /forum/index.php?act=buddy
Disallow: /forum/index.php?act=legends
Disallow: /forum/index.php?CODE=
Disallow: /forum/index.php?automodule
Disallow: /forum/index.php?act=attach
Disallow: /forum/index.php?&&CODE=
Disallow: /forum/index.php?&debug=1
Disallow: /forum/index.php?act=Profile
Disallow: /forum/index.php?showuser
Disallow: /forum/index.php?s=
Disallow: /*&view=getnewpost$
Disallow: /*&view=getlastpost$
Disallow: /*&mode=linear$
Disallow: /*&mode=threaded$
Disallow: /*&mode=linearplus$
Disallow: /*&p=
Disallow: /*&pid=
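The last seven rules use `*` and `$`, which are extensions honoured by some crawlers and not part of the original robots exclusion standard. As a rough illustration of how such a pattern is usually interpreted (`*` matches any run of characters, a trailing `$` anchors the end of the URL), here is a minimal Python sketch. The function names and sample URLs are assumptions, and individual crawlers may differ in detail.

```python
import re


def rule_to_regex(rule):
    """Translate a robots.txt path rule with `*` and a trailing `$` into a regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece, then join the pieces with "any characters".
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def blocked_by_wildcard(path_and_query, rules):
    """True if the URL matches any rule, testing from the start of the URL."""
    return any(rule_to_regex(rule).match(path_and_query) for rule in rules)


# Four of the wildcard rules from the file above.
RULES = ["/*&view=getlastpost$", "/*&mode=linear$", "/*&p=", "/*&pid="]

samples = [
    "/forum/index.php?showtopic=15&view=getlastpost",        # ends with the pattern
    "/forum/index.php?showtopic=15&view=getlastpost&st=20",  # `$` rule no longer fits
    "/forum/index.php?showtopic=15&mode=linearplus",         # not covered by these four
    "/forum/index.php?showtopic=15&p=301",                   # snapback-style link
    "/forum/index.php?showtopic=15",                         # ordinary topic URL
]

for url in samples:
    print(blocked_by_wildcard(url, RULES), url)
```

The third sample shows why the full file carries a separate `&mode=linearplus$` rule: with the `$` anchor, `&mode=linear$` does not cover it.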
So far, not a problem for any of the SEs except for Google.