All pages of my site http://www.example.com have a standard "mailto:" link named "kontakt".
I noticed that if I ask Google for:
site:www.example.com
it returns "about 6140 results". But if I ask for
site:www.example.com kontakt
it reports only 2190 results.
Where's the rest?
Let's see what happens if I ask for
site:www.example.com -kontakt
- we obtain "Results 1 - 3 of about 4,380"
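As a rough sanity check on the three counts quoted above, here is the arithmetic in a few lines of Python. The variable names are mine, and Google labels all of these figures "about", so they are estimates and a mismatch is not surprising:

```python
# Counts as reported by Google in the three queries above (all "about" figures).
total = 6140            # site:www.example.com
with_kontakt = 2190     # site: query plus "kontakt"
without_kontakt = 4380  # site: query with -kontakt

# If the counts were exact, the pages without "kontakt" would be the difference.
expected_without = total - with_kontakt
print(expected_without)                    # 3950
print(without_kontakt - expected_without)  # 430 more than expected
```

Either way, roughly two thirds of the reported pages do not contain the word "kontakt".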
<snip>
Apart from one RTF file and one forgotten file that nothing links to, the rest are pages that are PROHIBITED BY THE ROBOTS.TXT FILE!
The clue is here: the prohibited pages are on port 2317, which has a different robots.txt file: http://www.example.com:2317/robots.txt
User-Agent: *
Disallow: /
I can prove that I did not modify the ":2317/robots.txt" file: port 2317 is used by "GeneWeb 4.10", a specialized Web server for drawing genealogical trees, which is part of the standard Debian Linux distribution. The robots.txt is produced by the server itself and cannot be edited.
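To illustrate that each port has its own robots.txt, here is a minimal sketch using Python's standard urllib.robotparser. The rules for port 2317 are the ones quoted above; the rules for the main site on port 80 are an assumption (an allow-everything file), since I did not post that file here. The helper name may_crawl and the sample URLs are made up for the example:

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# robots.txt bodies keyed by (host, port). The port 2317 entry is the file
# quoted above; the port 80 entry is an ASSUMED allow-all file.
ROBOTS_BY_ORIGIN = {
    ("www.example.com", 80): "User-agent: *\nDisallow:\n",
    ("www.example.com", 2317): "User-Agent: *\nDisallow: /\n",
}


def may_crawl(url, user_agent="*"):
    """Check a URL against the robots.txt of its own host:port only."""
    parts = urlsplit(url)
    port = parts.port or (443 if parts.scheme == "https" else 80)
    parser = RobotFileParser()
    parser.parse(ROBOTS_BY_ORIGIN[(parts.hostname, port)].splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs, one per port.
print(may_crawl("http://www.example.com/index.html"))
print(may_crawl("http://www.example.com:2317/tree?id=1"))
```

Under these assumptions the first URL is allowed and the second is disallowed, so a crawler that obeys robots.txt should never fetch anything on port 2317.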
So, if you have some content that should not appear in the Google results, it is not enough to create a correct robots.txt. You should probably also think about some cloaking on the pages that link to the disallowed pages (by JavaScript or so), so that Google does not see those links.
Sad, isn't it?
[edited by: goodroi at 1:29 pm (utc) on Mar. 26, 2007]
[edit reason] Please no specific links [/edit]