We are having trouble getting pages out of the Google index and I would appreciate some thoughts.
We have an item page that takes in parameters as follows:
item.php?param=A&param=B&param=C&param=D ...
It was never our intention for the item page to be indexed, but unfortunately it was. Between all the parameter combinations crawled from external sites, we currently have 149,000 item.php pages in the index.
To remove the pages from the index, we have added the following to robots.txt:
Disallow: /item.php
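For context, the relevant part of our robots.txt is essentially this (assuming the rule is meant to apply to all crawlers):

```
User-agent: *
Disallow: /item.php
```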
In Google Webmaster Tools, we have also submitted a removal request for:
/item.php
Google has processed the request, however, when we do:
site:domain.com/item.php
We are still seeing the 149K pages. We have since added noindex, nofollow meta tags to the pages themselves, but Google will need to crawl them again to see those tags, and all of those parameter combinations may never occur exactly again.
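For reference, what we added is the standard robots meta tag in the page head (our actual markup may differ slightly in attribute order):

```html
<meta name="robots" content="noindex, nofollow" />
```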
We've also added a canonical link to the page:
<link rel="canonical" href="http://www.domain.com/item.php" />
My questions are:
1. /item.php in robots.txt should have removed all item.php pages from Google, including ones with parameters, correct?
2. Why are we still seeing 149K results for site:domain.com/item.php if Google has processed our Webmaster Tools request and picked up our robots.txt change? Is there a lag between the site: command and Google reporting the request as processed?
3. Does anyone have direct experience with site: still showing results (including cached pages) that are no longer really in the Google index?
4. Is there anything else we can do to keep those pages from appearing in site: results?
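On question 1, as a quick sanity check, Python's urllib.robotparser at least agrees that a Disallow rule for /item.php also matches parameterized item.php URLs by prefix (a minimal sketch; the URLs are made up):

```python
from urllib import robotparser

# Parse the same rule we put in robots.txt.
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /item.php",
])

# A parameterized item.php URL should be disallowed...
blocked = not rp.can_fetch("*", "http://www.domain.com/item.php?param=A&param=B")
# ...while an unrelated page should still be crawlable.
allowed = rp.can_fetch("*", "http://www.domain.com/other.php")
print(blocked, allowed)
```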
Thanks.