Forum Moderators: goodroi
User-agent: *
Disallow: /admin/
Disallow: /attach_mod/
Disallow: /db/
Disallow: /files/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /common.php
Disallow: /config.php
Disallow: /glance_config.php
Disallow: /groupcp.php
Disallow: /login.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /posting.php
Disallow: /printview.php
Disallow: /privmsg.php
Disallow: /ranks.php
Disallow: /search.php
Disallow: /statistics.php
Disallow: /tellafriend.php
Disallow: /viewonline.php

Do you have session IDs disabled for bots?
Disallow: /supportbbs/admin
Disallow: /supportbbs/cache
Disallow: /supportbbs/docs
Disallow: /supportbbs/db
Disallow: /supportbbs/images
Disallow: /supportbbs/includes
Disallow: /supportbbs/language
Disallow: /supportbbs/templates
Disallow: /supportbbs/memberlist.php
Disallow: /supportbbs/profile.php
Disallow: /supportbbs/search.php
Disallow: /supportbbs/groupcp.php
Disallow: /supportbbs/faq.php
The last five lines are what I started to add, in reference to my original post. Would it be better to use the wildcard as an extension to block more URLs?
i.e.: Disallow: /supportbbs/faq.*?
But since robots.txt matching is by prefix, you can't disallow all files of a specific type. A rule such as
Disallow: *.php
is invalid for most search engines.
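A quick way to see this prefix behavior is Python's standard-library robots.txt parser (a sketch for illustration only; example.com and the sample rules are placeholders, not the real site):

```python
import urllib.robotparser

# Parse an in-memory robots.txt; no network fetch is needed.
rp = urllib.robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /memberlist.php
Disallow: *.php
""".splitlines())

# A Disallow rule is a path prefix, so query-string variants are caught too.
print(rp.can_fetch("*", "http://example.com/memberlist.php"))          # False (blocked)
print(rp.can_fetch("*", "http://example.com/memberlist.php?sid=abc"))  # False (blocked)

# "Disallow: *.php" is read as a literal prefix, not a wildcard, so under
# the original spec other .php pages remain crawlable.
print(rp.can_fetch("*", "http://example.com/search.php"))              # True (allowed)
```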
However, just to make matters more complicated, Google has defined some extensions to robots.txt that let you disallow by file type and more; see their Webmaster Help section. You can use these special extensions within a robots.txt record specifically addressed to Googlebot, but you'll need to find another solution for all the other robots that visit your site.
For example, this would stop Googlebot from indexing Excel files:
User-agent: Googlebot
Disallow: /*.xls$
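The standard parsers ignore these Googlebot-only extensions, so if you want to predict what such a pattern will block, one rough approach is to translate it into a regular expression. The translation below is my own sketch of the documented `*`/`$` semantics, not Googlebot's actual matcher:

```python
import re

def google_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Roughly translate a Google-style robots.txt pattern into a regex.

    '*' matches any run of characters; a trailing '$' anchors the pattern
    to the end of the URL path. Simplified sketch only.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal segment; each '*' becomes '.*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(regex + ("$" if anchored else ""))

rule = google_pattern_to_regex("/*.xls$")
print(bool(rule.match("/files/report.xls")))   # True: blocked
print(bool(rule.match("/files/report.xlsx")))  # False: '$' anchors the end
```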
Jim
The goal is to stop the bots from needlessly indexing certain web pages on the bulletin board that do not really contain content useful in a web search.
So I am going to try adding this:
Disallow: /supportbbs/memberlist
Disallow: /supportbbs/profile
Disallow: /supportbbs/search
Disallow: /supportbbs/groupcp
Disallow: /supportbbs/faq
Disallow: /supportbbs/posting
Disallow: /supportbbs/privmsg
and hopefully, since these are prefix matches (as you say), any page in the supportbbs folder whose URL begins with one of the listed phrases will not be indexed. It seems to be a good start, thanks again.
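Those extension-less rules can be sanity-checked with the standard-library parser before deploying (a sketch; example.com stands in for the real host):

```python
import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /supportbbs/memberlist
Disallow: /supportbbs/profile
Disallow: /supportbbs/search
""".splitlines())

# The extension-less prefixes catch the .php pages and their
# session-id / query-string variants...
print(rp.can_fetch("*", "http://example.com/supportbbs/memberlist.php"))  # False
print(rp.can_fetch("*", "http://example.com/supportbbs/profile.php?mode=viewprofile&u=2"))  # False
# ...while ordinary topic pages stay crawlable.
print(rp.can_fetch("*", "http://example.com/supportbbs/viewtopic.php?t=1"))  # True
```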
I am not sure if this is of use to you, but I achieve the results you are after by using mod_rewrite to remove the dynamic look of my pages.
EXAMPLE:
green-widget.php?items=16
to
green-widget/16/
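For what it's worth, a rewrite like that can be done with a rule along these lines in .htaccess (a sketch; the script name and `items` parameter are taken from the example above, and your actual rule may differ):

```apache
RewriteEngine On
# Map the clean URL back onto the real dynamic script:
# /green-widget/16/  ->  /green-widget.php?items=16
RewriteRule ^green-widget/([0-9]+)/?$ green-widget.php?items=$1 [L]
```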
I then use a wildcard pattern to prevent Google from indexing any page with a question mark in it, like so:
User-agent: Googlebot
Disallow: /*?
Although most validators will complain about this usage, the source is Google's own Webmaster Tips:
from Google: [google.com...]
------------------------------------
12. How do I tell Googlebot not to crawl dynamically generated pages on my
site?
The following robots.txt file will achieve this.
User-agent: Googlebot
Disallow: /*?
------------------------------------