| 11:25 pm on May 24, 2004 (gmt 0)|
*cough* Has anyone else been experiencing this problem?
| 11:55 pm on May 24, 2004 (gmt 0)|
Have you tried a site search [google.com] on the subject?
| 1:05 am on May 25, 2004 (gmt 0)|
It's comforting to realise that I'm not alone in this... Not so comforting to realise that there's no evident solution to this problem!
On some of my sites roughly half the indexed URLs are (and have always been) explicitly excluded by robots.txt. I notice they have even been given PageRank.
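For reference, the rules themselves are nothing exotic; something along these lines (the paths here are invented, but the pattern is the same):

User-agent: *
# example paths only
Disallow: /search/
Disallow: /print/

Googlebot honours the Disallow and never fetches those pages, yet the bare URLs still turn up in the index.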
| 1:09 am on May 25, 2004 (gmt 0)|
Read a few threads from that search... There is a solution.
[edit] OK, found it here [webmasterworld.com]. [/edit]
| 1:14 am on May 25, 2004 (gmt 0)|
The solution is to let the bot crawl the pages, but include a meta noindex tag in the head.
<meta name="ROBOTS" content="NOINDEX">
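The catch is that the bot can only see the tag if it is allowed to fetch the page, so the matching Disallow has to come out of robots.txt first. For example (hypothetical path), delete

Disallow: /search/

from robots.txt and put this in the head of every page under /search/:

<head>
<!-- hypothetical page; the tag must appear in the head -->
<meta name="ROBOTS" content="NOINDEX">
</head>

Once the bot recrawls a page and sees the tag, that URL drops out of the index.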
| 1:19 am on May 25, 2004 (gmt 0)|
Thank you both :)
This problem is doing my brain in, but the solution makes sense -- in a messy kind of way...
If the only way to get URLs out of the index is to ALLOW them to be crawled, does robots.txt have any real use other than to limit the bandwidth that crawlers consume?
| 2:03 am on May 25, 2004 (gmt 0)|
I had a similar problem (URL-only listings of disallowed dynamically generated pages). In my case, using the "Google Automated Removal" feature worked fine, and removed the URL-only listings within a day or two (based on the robots.txt disallow).
| 3:20 am on May 25, 2004 (gmt 0)|
Thanks for the tip Robert. I'm testing both techniques using different sites... will see what I come up with! :)
| 1:02 am on May 26, 2004 (gmt 0)|
Just 1 day later...
Removal Technique 1 (site A: automated removal tool)
Result: robots.txt still disallows the URLs, and ALL disallowed URLs are OUT of the index.
Removal Technique 2 (site B: meta noindex)
Result: disallowed URLs are now crawlable by G (with noindex tags), yet ALL of them are still IN the index (presumably the bot hasn't recrawled them to see the tag yet).
I'm off to use the removal tool again...!
| 1:15 am on May 26, 2004 (gmt 0)|