A bit of background:
The site has a number of products listed across a number of pages, with 10 products per page. The product listing can be sorted by four criteria, let's say A, B, C and D, each ascending or descending. Pagination used to be done in JavaScript, and pages past page 1 were never crawled by Google.
We asked for the JavaScript to be removed and for a NOINDEX, FOLLOW meta tag to be put on all pages with sort parameters in the URL.
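For clarity, that means something along these lines in the head of each such page (the standard meta robots tag):

<meta name="robots" content="noindex, follow" />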
We then noticed that "URLs restricted by robots.txt" in WMT started to grow, most of them being URLs with search parameters. We calculated that there could be anything up to 300,000 such URLs.
We then asked for the sort parameters not to be passed back to the server as part of the URL, and for a 301 redirect to be done for every requested page that has a sort parameter in the URL, pointing to the same base URL without the sort parameter. We also excluded URLs with the search parameter via robots.txt.
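For illustration, the redirect is roughly along these lines in the page's C# code-behind (a simplified sketch using the page and Sort parameter names from the example URLs further down, not the actual code - note that a plain Response.Redirect would send a 302, so the 301 status is set explicitly):

protected void Page_Load(object sender, EventArgs e)
{
    // If the request still carries a Sort parameter, permanently redirect
    // to the same URL without it (sketch only - error handling omitted)
    if (!String.IsNullOrEmpty(Request.QueryString["Sort"]))
    {
        string page = Request.QueryString["page"] ?? "1";
        string target = Request.Url.AbsolutePath + "?page=" + Server.UrlEncode(page);
        Response.Clear();
        Response.StatusCode = 301;
        Response.Status = "301 Moved Permanently";
        Response.AddHeader("Location", target);
        Response.End();
    }
}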
So now the sort parameters are posted back to the server using JavaScript. All the JavaScript does is call Form.submit with a reference to sort button A, B, C or D, depending on which one was clicked. The server side works out what should be shown and whether the order is ascending or descending, as it knows what it showed before (it keeps sessions). The site runs on IIS (not sure if this matters).
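Roughly, the server side does something like this (again a simplified C# sketch - the button wiring, session keys and BindProducts call are made-up names to show the idea, not our actual code):

protected void SortButton_Click(object sender, EventArgs e)
{
    // Hypothetical handler wired to sort buttons A, B, C and D
    string clicked = ((Button)sender).CommandArgument;   // "A", "B", "C" or "D"
    string previous = (string)Session["SortColumn"];
    bool ascending = (previous == clicked)
        ? !(bool)Session["SortAscending"]                 // same column clicked again: flip direction
        : true;                                           // new column: start ascending
    Session["SortColumn"] = clicked;
    Session["SortAscending"] = ascending;
    BindProducts(clicked, ascending);                     // re-bind the listing; the URL stays parameter-free
}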
However, even though it is now a week after the change, the number of URLs restricted by robots.txt reported in WMT is still growing.
I would have expected the number to slowly start to drop, as Google should not be able to find URLs with sort parameters any more - there is no reference to such URLs anywhere on the page.
I have also tested the redirect using Fiddler and it all seems fine, e.g.
www.example.com/Product.aspx?page=1&Sort=ABCD
redirects to
www.example.com/Product.aspx?page=1
and there is no reference to a URL with &Sort=ABCD anywhere on the page.
How can Google find pages with &Sort=ABCD when these URLs no longer exist on the site and the only way to reach them is to actually type them into the address bar? Or did Google perhaps already have all these URLs somewhere in its index, and now that it tries to check them, it hits robots.txt and reports this? Theoretically, someone could have linked to some of these URLs, but not to 10,000 of them!
Or do we just remove the sort parameter from robots.txt and let the 301 do the job over time, once Google realises that there are no more such URLs?
Or is it just a waiting game and eventually they will start to drop?
Using the URL removal tool would be a nightmare with such a high number of URLs.
Any advice?
Many thanks
remove the sort parameter from robots.txt and let the 301 do the job over time
I would definitely do that. One benefit will be that Google can actually request and crawl those sort URLs and get rid of them eventually - it may take many, many weeks or months to complete. Just be 100% sure that the HTTP status for the redirect is 301 in all cases.
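A quick way to verify is a HEAD request from the command line (assuming curl is available; the URL is just the example from the post above):

curl -I "http://www.example.com/Product.aspx?page=1&Sort=ABCD"

The first response line should read HTTP/1.1 301 Moved Permanently, with a Location header pointing at the parameter-free URL. A plain ASP.NET Response.Redirect returns a 302, which Google handles differently, so it is worth checking.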
How can Google find pages with &Sort=ABCD when these URLs no longer exist on the site
Google has mysterious ways when it comes to URL discovery. Those URLs may have been queued up for the future, to be crawled as your site's crawl budget allows. And yes, once Google gets a taste of a parameter like that, it may "invent" other values for it to see how your server handles them.