| 5:44 am on Nov 11, 2004 (gmt 0)|
This is my robots.txt for my phpBB:
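The file itself did not survive in this copy of the thread. As an illustration only (not the poster's actual file), a typical phpBB2 robots.txt of the era blocked the stock member and posting scripts:

```
User-agent: *
Disallow: /phpbb/profile.php
Disallow: /phpbb/groupcp.php
Disallow: /phpbb/memberlist.php
Disallow: /phpbb/login.php
Disallow: /phpbb/posting.php
Disallow: /phpbb/privmsg.php
Disallow: /phpbb/search.php
```

The script names above are the standard phpBB2 filenames; the actual paths would depend on where the board is installed.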
Do you have session IDs disabled for bots?
| 6:51 am on Nov 11, 2004 (gmt 0)|
Hi - No, I don't believe I do. How do you go about that?
Also - aren't you disallowing files in the root folder if you don't include /phpbb/whatever.php in front like that?
| 3:28 pm on Nov 11, 2004 (gmt 0)|
This is what I have in my root folder. 'supportbbs' is the phpBB2 folder.
The last 5 lines are what I started to add in reference to my original post. Would it be better to use a wildcard as an extension to block more URLs?
IE: Disallow: /supportbbs/faq.*?
| 12:22 am on Nov 13, 2004 (gmt 0)|
The Standard does not support "wildcards." As specified, robots.txt uses prefix matching, so

Disallow: /supportbbs/faq.

is already equivalent to your

Disallow: /supportbbs/faq.*?

But because it is only a prefix match, you can't disallow all files of a specific type (for example, every .php file regardless of path); that kind of pattern is invalid for most search engines.
However, just to make matters more complicated, Google has defined some extensions to robots.txt that allow you to disallow by file type and more; see their Webmaster Help section. You can use these special extensions within a robots.txt record specifically addressed to Googlebot, but you'll need to find another solution for all the other robots that visit your site.
For example, this would stop Googlebot from indexing Excel files:
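The example itself is missing from this copy of the thread; based on Google's documented wildcard syntax (`*` matches any sequence, `$` anchors the end of the URL), it would have read:

```
User-agent: Googlebot
Disallow: /*.xls$
```

Note that this record is addressed to Googlebot only; other crawlers following the original Standard would ignore the wildcard.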
| 12:40 am on Nov 13, 2004 (gmt 0)|
Thank you. (I was under the impression it would think these were folders.)
The goal is to stop the bots from needlessly indexing certain pages on the bulletin board that don't really contain content useful in a web search.
So I am going to try adding this:
and hopefully, with the prefix matching you describe, any page in the supportbbs folder beginning with the listed phrases will not be indexed. It seems to be a good start; thanks again.
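The rules the poster actually added were not preserved here. Going by the discussion, they would have been prefix-style entries along these lines (the supportbbs path is from the thread; the specific filenames are illustrative guesses):

```
User-agent: *
Disallow: /supportbbs/faq.
Disallow: /supportbbs/profile.
Disallow: /supportbbs/memberlist.
Disallow: /supportbbs/login.
Disallow: /supportbbs/posting.
```

Because robots.txt uses prefix matching, each entry blocks every URL that begins with that string, with any extension or query string after it.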
| 4:54 am on Nov 16, 2004 (gmt 0)|
I am not sure if this is of use to you, but I achieve the results you are after by using mod_rewrite to make my dynamic-looking pages look static.
I then use a wildcard rule to prevent Google from indexing any page with a question mark in it, like so:
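The rule itself is absent from this copy; using Google's wildcard extension, a rule blocking every URL containing a question mark would read:

```
User-agent: Googlebot
Disallow: /*?
```

This matches any URL with a `?` anywhere in it, which is why it only makes sense once mod_rewrite ensures the pages you do want indexed have query-string-free URLs.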
Although most validators will moan at you for this syntax, the source is Google's own Webmaster Tips:
from Google: [google.com...]
12. How do I tell Googlebot not to crawl dynamically generated pages on my site?
The following robots.txt file will achieve this.
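The quoted file is missing from this copy of the thread; Google's published answer to that FAQ at the time was the same question-mark wildcard rule:

```
User-agent: Googlebot
Disallow: /*?
```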