Forum Moderators: goodroi
Do you have session IDs disabled for bots?
The last five lines are what I started to add, as mentioned in my original post. Would it be better to use a wildcard as the extension to block more URLs?
I.e.: Disallow: /supportbbs/faq.*?
But since Disallow is a prefix match, you can't disallow all files of a specific type; a pattern that tries to do so is invalid for most search engines.
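To illustrate the prefix matching described above, here is a rough sketch. Only the /supportbbs/faq path comes from this thread; the file names in the comments are made up for the example, and the commented-out line just shows the general shape of a file-type pattern that the original standard has no syntax for.

```
# Original robots.txt standard: each Disallow value is a plain URL-path prefix.
User-agent: *
# Matches any URL whose path starts with this string, for example
# /supportbbs/faq.php or /supportbbs/faq.php?topic=3 (hypothetical names).
Disallow: /supportbbs/faq

# Not meaningful under the original standard: there are no wildcards,
# so a robot that implements only the standard will not read this
# as "all .gif files".
# Disallow: /*.gif
```

Because the match is on the start of the path, a short prefix already covers every variation that follows it, so no trailing wildcard is needed.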
However, just to make matters more complicated, Google has defined some extensions to robots.txt that let you disallow by file type and more; see their Webmaster Help section. You can use these extensions within a robots.txt record specifically addressed to Googlebot, but you'll need to find another solution for all the other robots that visit your site.
For example, this would stop Googlebot from indexing Excel files:
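The snippet is missing at this point. Going by Google's documented pattern-matching extensions, it was presumably along these lines, where `*` matches any run of characters and `$` anchors the end of the URL:

```
# Google-specific extensions, so the record is addressed to Googlebot only.
User-agent: Googlebot
# Any URL that ends in .xls
Disallow: /*.xls$
```

Robots that follow only the original standard may ignore or misread this pattern, which is why it sits in a Googlebot record rather than under `User-agent: *`.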
The goal is to stop the bots from needlessly indexing certain pages on the bulletin board that do not really contain content deemed useful in a web search.
So I am going to try adding this:
and hopefully any page in the supportbbs folder whose URL begins with one of the listed phrases ('prefix matching', as you say) will not be indexed. It seems to be a good start; thanks again.
I am not sure if this is of use to you, but I achieve the results you are after by using mod_rewrite to get rid of my dynamic-looking URLs.
I then use a wildcard directive to prevent Google from indexing any page with a question mark in it, like so:
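The poster's snippet is missing here; assuming it followed the Google tip cited just below, it would have been something like:

```
User-agent: Googlebot
Disallow: /*?
```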
Although most validators will moan at you for this syntax, the source is Google's own Webmaster Tips:
from Google: [google.com...]
12. How do I tell Googlebot not to crawl dynamically generated pages on my site?
The following robots.txt file will achieve this.
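The file itself is cut off in the quote. As far as I can tell, Google's answer is a single Googlebot record using the `*` wildcard. A sketch of it follows; the comments and the example URLs in them are added here for illustration and are hypothetical.

```
# Addressed to Googlebot only, because * inside a path is a Google extension.
User-agent: Googlebot
# "/" then any characters then "?": any URL that contains a question mark.
#   blocked: /supportbbs/viewtopic.php?t=12   (hypothetical URL)
#   allowed: /supportbbs/topic-12.html        (hypothetical URL)
Disallow: /*?
```

This pairs naturally with the mod_rewrite approach above: once the real pages are reachable at URLs without a query string, the rule blocks only the leftover dynamic versions.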