Forum Moderators: goodroi

Disallow: /

How did googlebot index my web server dir structure?

9:51 am on Apr 21, 2007 (gmt 0)

New User

5+ Year Member

joined:Mar 15, 2007
votes: 0

I'm developing a website. It is on a publicly accessible URL but not linked to or indexed.

I have a robots.txt with the following content:

User-Agent: *
Disallow: /

I added the site to Webmaster Tools under my Google account because I wanted to submit the XML sitemap (which I am generating dynamically through my CMS) and test it there.

So while I am in my dashboard, I run a diagnostic on my robots.txt file. googlebot has found it and here is how it has interpreted it:

User-agent: *
Disallow: /dir1/
Disallow: /dir2/
Disallow: /dir3/

Where dirX above is the name of a directory on the web server.

Unfortunately, I don't have access to the access log for this client's hosting account, so I can't see exactly what googlebot has been up to. My question, though, is this: how did googlebot discover the directory structure on the web server? I am quite certain that not all of those directories can be discovered by a simple deep crawl of the site.

Anyway, this has freaked me out a little. Does someone have an explanation?

Also, I decided to change my robots.txt file to:

User-Agent: *
Disallow: *

Is there any practical difference achieved by doing so?
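One rough way to frame the question is to compare two matching styles. The sketch below is not any crawler's actual code: `prefix_blocks` and `wildcard_blocks` are hypothetical helpers written for illustration. The first treats the Disallow value as a literal path prefix, as in the original robots.txt convention; the second treats `*` as "any run of characters" and a trailing `$` as an end anchor, which is the style of extension Google documents. How a given crawler handles a bare `Disallow: *` is an assumption here and may vary.

```python
import re


def prefix_blocks(rule: str, path: str) -> bool:
    """Literal prefix matching: an empty rule blocks nothing."""
    return rule != "" and path.startswith(rule)


def wildcard_blocks(rule: str, path: str) -> bool:
    """Wildcard-style matching: '*' matches any characters, trailing '$' anchors the end."""
    if rule == "":
        return False
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape the literal pieces and join them with '.*' where the rule had '*'.
    pattern = ".*".join(re.escape(part) for part in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match only matches from the start of the path.
    return re.match(pattern, path) is not None


if __name__ == "__main__":
    path = "/dir1/page.html"
    for rule in ("/", "*"):
        print(repr(rule), prefix_blocks(rule, path), wildcard_blocks(rule, path))
```

Under these assumptions, `Disallow: /` blocks every path in both styles, while `Disallow: *` only blocks everything for a parser that understands wildcards. A strict prefix matcher finds no path beginning with a literal `*`, so `/` looks like the safer of the two forms.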

Thanks for your consideration. This is my first post to webmasterworld! :)


1:14 pm on Apr 25, 2007 (gmt 0)

Administrator from US 

goodroi (WebmasterWorld Administrator)

Top Contributor of All Time

10+ Year Member

joined:June 21, 2004
votes: 67

Welcome to WebmasterWorld thinkweb!

Google can find information about your site from many different sources. The Google toolbar can help Google discover pages. Other sites linking to you can reveal URLs to Google. You can also submit a sitemap to Google. The simplest and most common way is for Google to just crawl your site.

Has your robots.txt always been live? Maybe Google crawled your site before you uploaded a robots.txt? Do you have a line in your robots.txt for Google?
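To illustrate why a Google-specific line matters, here is a minimal sketch using Python's standard `urllib.robotparser`. The file contents are hypothetical (the `dir1` and `dir2` names just reuse the placeholders from this thread), and Python's parser is only a stand-in, not Googlebot's own logic. The point is that a crawler matching a named group generally follows that group rather than the `*` group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Googlebot-specific group plus a catch-all group.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /dir1/

User-agent: *
Disallow: /
"""


def allowed(agent: str, path: str) -> bool:
    """Return True if the given user agent may fetch the path under ROBOTS_TXT."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        for path in ("/dir1/page.html", "/dir2/page.html"):
            print(agent, path, allowed(agent, path))
```

In this sketch Googlebot is kept out of `/dir1/` only and may still fetch `/dir2/`, while any other bot falls through to the `Disallow: /` group. If the diagnostic shows rules you don't recognize, it may be worth checking whether an older or agent-specific version of the file was ever served.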


5:24 am on Apr 26, 2007 (gmt 0)

New User

5+ Year Member

joined:Mar 15, 2007
votes: 0

Thanks for the reply. Yes, I've added the site to Google Webmaster Tools to test the sitemap, so Google knows about it from my toolbar activity and Webmaster Tools. Anyway, the site is not in the indexed results yet, so now I'm just harassing the client to hurry up and finish their content so I can flick the switch to "live".