I have a fairly old and popular blog (started in 2007) that ran on Movable Type. We migrated to WordPress during Christmas last year. Unfortunately, I did not use Google Webmaster Tools (GWT) actively until we witnessed a huge drop in traffic in Nov 2012, by almost 60%, which we have still not recovered from. One thing that surprises me in GWT is the number of URLs monitored:
Parameter    URLs monitored
page         184,095
p            15,358
To put this in perspective, my blog has 14,000 posts, 10 categories and close to 1,000 tags. The number of URLs monitored is far larger than that, and all of them are invalid links.
Movable Type paginates by adding a query variable, ?page=<page number>, so a URL would be mywebsite.com/index.php?page=2. WordPress, however, paginates in this fashion: mywebsite.com/page/2/
What is happening now is that Google is combining both schemes and crawling thousands of irrelevant URLs like:
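To make the difference between the two schemes concrete, here is a rough Python sketch of how an old-style URL would map onto the new one. The function name is hypothetical, and the assumption that a section such as watches/ keeps the same path under WordPress is mine; the real permalink structure may differ.

```python
from urllib.parse import urlsplit, parse_qs


def movable_type_to_wordpress(url: str) -> str:
    """Map an old ?page=N URL to the /page/N/ form (hypothetical helper)."""
    parts = urlsplit(url)
    page = parse_qs(parts.query).get("page", [None])[0]
    if page is None or not page.isdigit():
        return url  # no usable page number; leave the URL untouched

    # Drop a trailing index.php and keep the directory part of the path.
    path = parts.path
    if path.endswith("index.php"):
        path = path[: -len("index.php")]
    if not path.endswith("/"):
        path += "/"

    # Only re-attach scheme and host when the input actually had them.
    prefix = f"{parts.scheme}://{parts.netloc}" if parts.netloc else ""
    return f"{prefix}{path}page/{page}/"


print(movable_type_to_wordpress("mywebsite.com/index.php?page=2"))
```

With the example above, this prints mywebsite.com/page/2/, which is the form WordPress uses.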
index.php?page=2
?page=2
watches/index.php?page=3
?page=3
index.php?page=4
?page=4
index.php?page=5
?page=5
index.php?page=6
?page=6
How can I stop Google from indexing any URL with the variable ?page= or ?p= using robots.txt? I have configured WMT not to crawl any of these URLs, but with no effect. I want to do it using robots.txt now.
Since Google is treating them as individual pages, the PR would be diluted, correct?
According to GWT, it has indexed 33,000 pages on my website.
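To be precise about which URLs I mean, here is a small Python sketch of the check I have in mind. It is only an illustration of the matching rule (any URL whose query string carries a page or p parameter), not a robots.txt rule, and the names UNWANTED_PARAMS and has_unwanted_param are made up for this example.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters that GWT reports and that I want Google to ignore.
UNWANTED_PARAMS = {"page", "p"}


def has_unwanted_param(url: str) -> bool:
    """Return True if the URL's query string contains ?page= or ?p=."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(name in UNWANTED_PARAMS for name in query)


for candidate in ["index.php?page=2", "?page=3", "mywebsite.com/page/2/"]:
    print(candidate, has_unwanted_param(candidate))
```

The first two examples match and the WordPress-style mywebsite.com/page/2/ does not, which is the behaviour I would like the robots.txt rule to reproduce.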