
Forum Moderators: goodroi


Disallow: /

How did googlebot index my web server dir structure?

     

thinkweb

9:51 am on Apr 21, 2007 (gmt 0)




I'm developing a website. It is on a publicly accessible URL but not linked to or indexed.

I have a robots.txt with the following content:

User-Agent: *
Disallow: /

I added the site to Webmaster Tools under my Google account because I wanted to submit the XML sitemap (which I generate dynamically through my CMS) and test it there.

So while I am in my dashboard, I run a diagnostic on my robots.txt file. googlebot has found it and here is how it has interpreted it:

User-agent: *
Disallow: /dir1/
Disallow: /dir2/
Disallow: /dir3/

Where dirX above is the name of a directory on the web server.

Unfortunately, I don't have access to the access log for this client's hosting account, so I can't see exactly what Googlebot has been up to. My question, though, is this: how did Googlebot discover the directory structure on the web server? I am quite certain that not all of those directories can be discovered by a simple deep crawl of the site.

Anyway, this has freaked me out a little. Does someone have an explanation?

Also, I decided to change my robots.txt file to:

User-Agent: *
Disallow: *

Is there any practical difference achieved by doing so?
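One way to explore this question locally is to feed both files to a robots.txt parser and compare the answers. The sketch below uses Python's standard `urllib.robotparser`; the helper `is_allowed`, the constants, and the `example.com` URL are hypothetical names chosen for illustration. It only shows how this one parser reads the two files. `Disallow: /` is the conventional block-all form, while a bare `*` is not a path starting with `/`, so different crawlers may interpret it differently, and this result should not be taken as a statement of how Googlebot behaves.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if this parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# The two variants from the post.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
WILDCARD = "User-agent: *\nDisallow: *\n"

# Hypothetical URL, used only for the comparison.
URL = "http://example.com/dir1/page.html"

print("Disallow: /  ->", is_allowed(BLOCK_ALL, URL))
# The answer for the wildcard form depends on the parser in use,
# so no expected value is stated here.
print("Disallow: *  ->", is_allowed(WILDCARD, URL))
```

With the conventional form, the first call reports that the URL is blocked. The second line is worth running against whichever parser or testing tool matters for the site in question.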

Thanks for your consideration. This is my first post to webmasterworld! :)

Mark

goodroi

1:14 pm on Apr 25, 2007 (gmt 0)

WebmasterWorld Administrator



Welcome to WebmasterWorld thinkweb!

Google can find information about your site from many different sources. The Google toolbar can help Google discover pages. Other sites linking to you can reveal URLs to Google. You can also submit a sitemap to Google. The simplest and most common way is for Google to just crawl your site.

Has your robots.txt always been live? Maybe Google crawled your site before you uploaded a robots.txt? Do you have a line in your robots.txt for Google?
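The last question matters because a crawler that finds a group addressed to it by name follows that group rather than the `*` group. A minimal sketch of that effect, again using Python's standard `urllib.robotparser`; the file contents, the `fetchable` helper, the `OtherBot` agent, and the `example.com` URLs are made up for illustration and are not taken from the site discussed here:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: a Googlebot-specific group plus a block-all default.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /dir1/

User-agent: *
Disallow: /
"""


def fetchable(agent: str, url: str) -> bool:
    """Return True if this parser would let `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


# Googlebot matches its own group, so only /dir1/ is off limits to it.
print(fetchable("Googlebot", "http://example.com/dir1/a.html"))
print(fetchable("Googlebot", "http://example.com/dir2/a.html"))
# Any other agent falls through to the block-all default group.
print(fetchable("OtherBot", "http://example.com/dir2/a.html"))
```

In this sketch the named group replaces the default for Googlebot instead of adding to it, which is why a stray Googlebot-specific section can leave a site more open than the `*` rules suggest.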

cheers,
greg

thinkweb

5:24 am on Apr 26, 2007 (gmt 0)




Thanks for the reply. Yes, I've added the site to Google Webmaster Tools to test the sitemap, so Google knows about it from my toolbar activity and Webmaster Tools. Anyway, the site is not in the indexed results yet, so now I'm just harassing the client to hurry up and finish their content so I can flick the switch to "live".
 
