Googlebot possibly ignoring robots.txt


dstiles - 7:55 pm on May 3, 2012 (gmt 0)


Oh, that page on that domain is password protected. Do we have any reference to the password in our Gmail, GTB, Android, Googlebot or web preview scrapes? :)

> It's almost impossible to keep a web server secret by not publishing links to it.

And how is the URL discovered? Not usually legitimately (or at least, with legitimate aims), that's for sure. If I do not notify anyone of a web domain or subdomain it can only be found by scraping DNS. After that it's usually a case of an automatic home page name or trying the usual index/default with a choice of extensions such as html, asp, php etc.
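To make the guessing step concrete, here is a rough sketch of how a prober might build its list of candidate home page URLs once it has a hostname. It is only an illustration: the function name, the constants and the example hostname are made up for this post, and it builds the list without sending any requests.

```python
from itertools import product

# Assumption: the "usual" names and extensions mentioned above.
# Real probers may try a longer or different list.
BASE_NAMES = ("index", "default")
EXTENSIONS = ("html", "asp", "php")


def candidate_home_pages(host, scheme="http"):
    """Return the URLs a prober might try for a bare hostname.

    Hypothetical helper: it only builds strings and performs no network access.
    """
    root = f"{scheme}://{host}/"
    # The bare root comes first, since the server may pick a home page itself.
    urls = [root]
    # Then every combination of base name and extension.
    urls.extend(
        f"{root}{name}.{ext}" for name, ext in product(BASE_NAMES, EXTENSIONS)
    )
    return urls


if __name__ == "__main__":
    # "sub.example.com" is a placeholder hostname, not a real target.
    for url in candidate_home_pages("sub.example.com"):
        print(url)
```

The point is that the list is tiny: one root plus six name/extension pairs, so an unannounced host takes next to no effort to probe once its name is known.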

Remember that .com/.org/.net domains are known by G as soon as they are registered. This does not apply to many TLDs registered in countries outside the US (e.g. the UK, as far as I know).

Too much laxness in the US registry; too much power given to G; too much nosiness by G.


Thread source: http://www.webmasterworld.com/robots_txt/4446154.htm