I seem to be having a few issues with Google not removing old content from my site and ignoring robots.txt.
If you run
and run through some of the pages, you will notice that pages like:
still show up even though they are old pages that now return a 404 error.
They have now been gone for a long time (~3 months).
In Google Webmaster Tools I have also manually removed them to try to help things along, but this didn't make a difference.
Also if you click on "repeat the search with the omitted results included" at the end of the results and then go towards the end of the indexed pages you will notice pages like:
are being indexed even though they are excluded in robots.txt!
The robots.txt file is being fetched OK and its syntax is valid:
# Don't bother crawling any online forms
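For anyone who wants to double-check that a set of robots.txt rules really excludes a given URL, here is a minimal sketch using Python's standard urllib.robotparser. The rules and paths below are made-up placeholders (the "/forms/" directory and the two test URLs are assumptions, not the real file from my site), and the is_blocked helper is just a name chosen for this illustration. It only shows how the rules are interpreted by this parser, not how Google itself treats them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT the real robots.txt from the site.
SAMPLE_RULES = """\
# Don't bother crawling any online forms
User-agent: *
Disallow: /forms/
"""


def is_blocked(rules_text, url, user_agent="Googlebot"):
    """Return True if the given rules disallow user_agent from fetching url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(rules_text.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URLs; substitute the pages you expect to be excluded.
    for url in (
        "http://example.com.au/forms/contact",
        "http://example.com.au/about",
    ):
        state = "blocked" if is_blocked(SAMPLE_RULES, url) else "allowed"
        print(url, state)
```

To test a live file instead, paste its contents into SAMPLE_RULES and swap in the URLs that are showing up in the index.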
Another issue I have noticed is that Google says it has crawled 172,000 pages when you do a "site:example.com.au" search, but the results only go up to 270 pages (if you navigate manually to the last page). This is especially worrying.
Any help will be greatly appreciated. It's driving me nuts.
[edited by: pageoneresults at 2:52 am (utc) on May 19, 2008]
[edit reason] Examplified URI References - Please Refer to TOS [/edit]