Do you have a Webmaster Tools account?
There is a tool there to test whether a URL is allowed by your robots.txt.
Check Webmaster Tools to see whether you have placed your robots.txt file correctly.
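If you want a quick local check as well, here is a minimal sketch using Python's standard urllib.robotparser. The rules and URLs below are made up for illustration and are not the poster's actual file. Also note that Python's parser only does plain prefix matching, so it will not behave like Googlebot on wildcard rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only, not the poster's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /forum/wap2/
Disallow: /forum/print/
"""


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given rules let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com and both paths are placeholders.
    for url in (
        "http://example.com/forum/topic-123.html",
        "http://example.com/forum/wap2/topic-123.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, url) else "blocked"
        print(url, "->", verdict)
```

Paste in your own rules and a handful of real topic URLs to see at a glance whether anything you care about is blocked.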
Yes, I have Webmaster Tools, and it shows a list of pages that are disallowed. However, many of those disallowed pages still appear among the 25,000 results in the site:domain/forum check.
Would any of you be willing to cast an eye over my robots.txt, just to put my mind at rest that all is well and I haven't blocked any of the site topics by mistake?
Are you able to explain the discrepancy between site:domain, which shows far fewer than 1,000 pages, and site:domain/forum, which shows 25,000 pages?
Since you are talking about over 25,000 pages, it will probably take Googlebot some time to revisit (or, I should say, attempt to revisit) all of those pages, discover they are now blocked, and drop them from its system. If you want to speed up that process, increase the number of links pointing to those pages. Also, please remember that Google's site: search is not always accurate. Google has said that the total results number is more of an estimate.
If you have used Google's robots.txt tool and it verified that your robots.txt is correctly allowing and disallowing what you want, then you just need to wait. While you are waiting, go work on your internal and external links. That will help your search rankings and will probably also increase the speed at which Google indexes your pages.
Thanks for your reply. I am still a little confused, though. I'll try to explain why.
I closely monitor my site stats, Webmaster Tools, Analytics, and HitTail. I regularly check how many pages I have listed in Google. In the 4 years my site has been running, the maximum number of pages I've had listed is 3,500, made up of around 500 pages from the main domain and 3,000 from the forum. More recently I had roughly 1,500 pages listed from the whole forum, and they all featured very well for local, low-competition keyword searches.
The thing that has me alarmed at the moment is the figure of 25,000 pages listed for the forum. Since I disallowed the wap 2 pages, I expected them to be deindexed and replaced with normal topics. The figure of 25,000 has really thrown me, and the fact that those pages are not being returned in searches has me even more concerned.
Also, just to put my mind at rest, would you possibly take a few minutes to look at my robots.txt and check that there are no glaring errors?
I documented a load of Duplicate Content issues with forums back in 2004 or 2005 or so. I think I might have also included some example robots.txt files to help counteract some of the problems.
You are probably looking at "URLs restricted by robots.txt" in the Web crawl Diagnostics tab.
Have you also checked the "Analyze robots.txt" option in the Tools menu?
The pages at those URLs are already indexed.
Someone might actually be linking to those URLs.
You have to decide what signal you really want to send the bot.
For example, you can tell the bot "go away" (Disallow) or "there's nothing here" (404) or "go somewhere else" (301/302) or "take this" (200).
There are other response options, but Disallow doesn't mean "forget about it". It only stops the bot from fetching the URL; it does not remove a URL that is already indexed.
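To make the difference between those signals concrete, here is a rough sketch that maps what the bot encounters to the message it gets. The function name and the wording of the messages are mine, purely for illustration; this is not how any search engine is actually implemented.

```python
def crawler_signal(blocked_by_robots: bool, status: int) -> str:
    """Describe the signal a bot receives for one URL.

    `blocked_by_robots` is True when robots.txt disallows the URL.
    `status` is the HTTP status the server would return if fetched.
    """
    if blocked_by_robots:
        # The bot never fetches the page, so it never sees the status code.
        return "go away (not fetched; an already indexed URL may stay listed)"
    if status == 200:
        return "take this"
    if status in (301, 302):
        return "go somewhere else"
    if status == 404:
        return "there's nothing here"
    return "other response"


if __name__ == "__main__":
    for blocked, status in ((True, 404), (False, 200), (False, 301), (False, 404)):
        print(blocked, status, "->", crawler_signal(blocked, status))
```

The first case is the important one: with a Disallow in place, the bot cannot see a 404 or a 301 on that URL, so whatever it already knows about the URL can linger.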
Thanks for that, most helpful. I hadn't realised there was an analyse tool.
It appears my pages are allowed.
Just one more question with regard to specific forum pages. Is it normal for forum pages to be viewed as a directory?
The analysis tool says :
Detected as a directory; specific files may have different restrictions
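One plausible reading of that message is that the URL tested looks like a folder, and files underneath it can match more specific rules. Here is a small sketch of that situation using Python's urllib.robotparser. The rule and the paths are hypothetical, and Python's parser only approximates how Googlebot matches rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule: the forum folder is open, one subfolder is not.
RULES = """\
User-agent: *
Disallow: /forum/private/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(path: str) -> bool:
    """Check a path on the placeholder host example.com."""
    return parser.can_fetch("Googlebot", "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/forum/", "/forum/topic-1.html", "/forum/private/notes.html"):
        print(path, "->", "allowed" if allowed(path) else "blocked")
```

So a folder URL being allowed does not guarantee every file under it is allowed, which is why it is worth testing a few full topic URLs as well.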
[edited by: goodroi at 2:05 pm (utc) on Jan. 8, 2009]
[edit reason] examplified [/edit]
HMM.. valuable information