Forum Moderators: phranque


Google knows about pages that are password protected


kflm

2:49 am on Oct 4, 2006 (gmt 0)

10+ Year Member



Ok, so I was recently looking at a site within Google Webmaster Tools (sitemaps), and going to the Web Crawl section I noticed that Google is alerting me to 80 URLs restricted by robots.txt.

Ok, so here's what is set up: on this server I have a directory called /webreports/

This directory (and everything beneath it) is password protected by an .htaccess file (I'm on Apache/1.3.37) within /webreports/. So basically you can't read anything without a username and password.
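For anyone following along, a typical Basic auth .htaccess for a setup like this would look something like the sketch below; the AuthName and the AuthUserFile path are placeholders, not the poster's actual values:

```apache
# Sketch of /webreports/.htaccess for Apache 1.3 Basic auth.
# AuthUserFile path is a placeholder; keep it outside the web root.
AuthType Basic
AuthName "Web Reports"
AuthUserFile /path/to/.htpasswd
Require valid-user
```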

So, stupidly, in my robots.txt file, I had listed:
User-agent: *
Disallow: /webreports/
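As a quick sanity check on what that rule actually does, here is a sketch using Python's standard-library robots.txt parser (urllib.robotparser); example.com is a placeholder host, not the poster's site:

```python
import urllib.robotparser

# Parse the same two-line robots.txt from the post.
rp = urllib.robotparser.RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /webreports/",
])

# Googlebot matches the wildcard rule, so anything under /webreports/ is blocked.
print(rp.can_fetch("Googlebot", "http://example.com/webreports/site1/"))  # False
print(rp.can_fetch("Googlebot", "http://example.com/index.html"))         # True
```

Note that a Disallow rule only stops polite crawlers from fetching the pages; it does nothing to stop a crawler from learning the URLs exist, which is what the question below is really about.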

Ok, fine, not the best way to hide a directory. Anyway, within that webreports directory are webalizer stats for about 5 sites. Examples are:
/webreports/site1/
/webreports/site2/
/webreports/site1/usage_200604.html

and so on and so on.

So the question is: if the /webreports/ directory is password protected, how does Google list 80 URLs within that directory as being restricted by a robots.txt file?

These pages aren't linked from anywhere. They are just standard webalizer reports, but the 80 URLs Google is listing make it look like it was able to read the directory and crawl it.

Hope this makes sense.

-k

kflm

2:56 am on Oct 4, 2006 (gmt 0)

10+ Year Member



Never mind... issue solved.

Somehow the sitemap file had those URLs in it. That's how Google knew to try to crawl the pages.
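For anyone who hits the same thing: you can catch this by scanning your sitemap for URLs under a disallowed path before submitting it. A rough sketch (the sitemap snippet, the blocked_urls helper, and example.com are all hypothetical, modeled on this thread):

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Sitemap files use this XML namespace on every element.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def blocked_urls(sitemap_xml, disallowed_prefixes):
    """Yield sitemap URLs whose path falls under a disallowed prefix."""
    root = ET.fromstring(sitemap_xml)
    for loc in root.iter(NS + "loc"):
        url = loc.text.strip()
        path = urlparse(url).path
        if any(path.startswith(p) for p in disallowed_prefixes):
            yield url

# Minimal made-up sitemap containing one URL under the disallowed directory.
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/index.html</loc></url>
  <url><loc>http://example.com/webreports/site1/usage_200604.html</loc></url>
</urlset>"""

for url in blocked_urls(sitemap, ["/webreports/"]):
    print(url)  # flags the /webreports/ URL
```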

-k