I have read that search engines like Google will stop indexing a site if it has too many dynamic pages. My site uses a bulletin board, phpBB2, which uses '?' in the URLs of its dynamic pages. How do I tell a robot to index only the outermost pages, like just the topics and replies, and not everything else? I have already started declaring specific pages in a few lines of my current robots.txt, but if I have to list every single one it could get quite tedious, and I'm not even sure it will work. Is there a simpler way than disallowing the entire folder the board is in? The support website for phpBB2 said to come here for help. :/
The Standard does not support wildcards. As specified, robots.txt uses prefix matching, so Disallow: /faq already does what you meant by Disallow: /faq.*
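For instance, a plain prefix rule (the paths here are illustrative, not from your site):

```
User-agent: *
Disallow: /faq
```

This blocks /faq, /faq.html, /faq/index.html, and even /faq-archive/ -- anything whose URL path begins with /faq.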
But because matching is by prefix only, you can't disallow all files of a specific type: a rule like Disallow: *.php is invalid for most search engines.
However, just to make matters more complicated, Google has defined some extensions to robots.txt to allow you to disallow by filetype and more -- See their Webmaster Help section. You can use their special extensions within a robots.txt record specifically addressed to Googlebot, but you'll need to find another solution for all the other robots that visit your site.
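Applied to your question about phpBB2's '?' URLs, a Googlebot-only record using Google's wildcard extension might look like this (a sketch only -- note that this pattern blocks every URL containing a query string, so it would also block the topic and reply pages you want indexed; you would need more selective patterns for those):

```
# Google extension: '*' matches any sequence of characters.
# Only Googlebot honors this; other robots may ignore or misread it.
User-agent: Googlebot
Disallow: /*?
```

Keep this in a record addressed specifically to Googlebot, and keep your standard prefix rules in a separate record for everyone else.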
For example, this would stop Googlebot from indexing Excel files:
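A sketch of such a record, using Google's wildcard (*) and end-of-URL anchor ($) extensions:

```
# Google extensions: '*' matches any characters, '$' anchors the end of the URL.
User-agent: Googlebot
Disallow: /*.xls$
```

Other crawlers should skip a record addressed to Googlebot, but since these extensions are nonstandard, test the behavior before relying on it.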