celenoid

msg:45862 | 11:25 pm on May 24, 2004 (gmt 0) |
*cough* Has anyone else been experiencing this problem?
|
jdMorgan

msg:45863 | 11:55 pm on May 24, 2004 (gmt 0) |
Yes, many. Have you tried a site search [google.com] on the subject? Jim
|
celenoid

msg:45864 | 1:05 am on May 25, 2004 (gmt 0) |
Thanks Jim. It's comforting to realise that I'm not alone in this... Not so comforting to realise that there's no evident solution to this problem..! On some of my sites roughly half the indexed URLs are (and have always been) explicitly excluded by robots.txt. I notice they have even been given PageRank. sigh.
|
jdMorgan

msg:45865 | 1:09 am on May 25, 2004 (gmt 0) |
Read a few threads from that search... There is a solution. Jim [edit] OK, found it here [webmasterworld.com]. [/edit]
|
TheDave

msg:45866 | 1:14 am on May 25, 2004 (gmt 0) |
The solution is to let the bot crawl the pages, but include a meta noindex tag in the head. <meta name="ROBOTS" content="NOINDEX">
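To confirm a page actually serves that tag before waiting on a recrawl, here is a minimal sketch using only the Python standard library. The class and function names (`RobotsMetaParser`, `has_noindex`) and the sample markup are made up for illustration; it only inspects HTML you hand it and says nothing about how any particular search engine parses pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            directive = part.strip().lower()
            if directive:
                self.directives.add(directive)


def has_noindex(html: str) -> bool:
    """Return True if the markup carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" in parser.directives


# Hypothetical sample page using the tag from this post.
sample = '<html><head><meta name="ROBOTS" content="NOINDEX"></head><body></body></html>'
print(has_noindex(sample))
```

The name and content values are compared case-insensitively, so `ROBOTS`/`NOINDEX` and `robots`/`noindex` are treated the same.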
|
celenoid

msg:45867 | 1:19 am on May 25, 2004 (gmt 0) |
Thank you both :) This problem is doing my brain in, but the solution makes sense -- in a messy kind of way... If the only way to get URLs out of the index is to ALLOW them to be crawled, does robots.txt have any real use other than to limit the bandwidth that crawlers consume?
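The catch can be sketched with Python's standard `urllib.robotparser`: a crawler that obeys a Disallow rule never fetches the page, so it never gets to read a noindex tag inside it. The rules, the `example.com` URLs, and the `may_crawl` helper below are hypothetical, and this only models the robots.txt check, not how any search engine decides what to list.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
rules = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())


def may_crawl(url: str, agent: str = "Googlebot") -> bool:
    """Return True if robots.txt lets this user agent fetch the URL."""
    return parser.can_fetch(agent, url)


for url in (
    "http://example.com/private/page.html",
    "http://example.com/public/page.html",
):
    if may_crawl(url):
        print(url, "-> crawlable, so a noindex meta tag can be read")
    else:
        print(url, "-> blocked, so any noindex meta tag is never seen")
```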
|
Robert Thivierge

msg:45868 | 2:03 am on May 25, 2004 (gmt 0) |
I had a similar problem (URL-only listings of disallowed, dynamically generated pages). In my case, using the "Google Automated Removal" feature at [services.google.com:8882...] worked fine, and removed the URL-only listings within a day or two (based on the robots.txt disallow).
|
celenoid

msg:45869 | 3:20 am on May 25, 2004 (gmt 0) |
Thanks for the tip Robert. I'm testing both techniques using different sites... will see what I come up with! :)
|
celenoid

msg:45870 | 1:02 am on May 26, 2004 (gmt 0) |
Just 1 day later...
Removal Technique 1 (site A) [services.google.com:8882...]
Result: robots.txt still disallows the URLs, and ALL disallowed URLs are OUT of the index.
Removal Technique 2 (site B)
Result: the disallowed URLs are now crawlable by G (with noindex tags), and ALL of them are still IN the index.
I'm off to use the removal tool again.......!
|
jdMorgan

msg:45871 | 1:15 am on May 26, 2004 (gmt 0) |
Use both. Jim
|