Forum Library, Charter, Moderators: Robert Charlton & aakk9999 & brotherhood of lan & goodroi

Google SEO News and Discussion Forum

This 45 message thread spans 2 pages: 1 [2]
Googlebot Indexing Anything it Finds
Google needs to stop!
atlrus




msg:758216
 1:01 pm on Jul 6, 2006 (gmt 0)

I did a site: search and what do I see? My FTP log is cached by Google...and there has never been a link to it, ever!

I have a few more pages like that cached by Google - image directory, articles directory - pages with no links to them, pages and files I did not know existed...

Obviously Google is just crawling everything in my website's directory - pages and files and all.
To me - this is a privacy invasion.

Anyone else see this with their sites?

 

Tapolyai




msg:3012595
 3:10 am on Jul 18, 2006 (gmt 0)

The web is public. Without restricting who can enter your site or what you post on your site, it is all available for anyone, human or robot, to download and process.

Hmmm... I would agree to all until
...to download and process.

My metaphor of a store is not so far off. You can walk in, any time, look around, examine things, even buy things, return things and so on. You can walk by my store front, and if it is good enough, indeed you will walk in further...

But... You may NOT take anything from my store/reproduce it. Certain things I will grant you to look at, maybe even take home to try out, or use a sample of - but you may not indiscriminately rummage through and take everything and reproduce it.

I remember well the "new mindset", and the "you are old fashioned", and "get with the times" comments around the dotcom era. Yet, those old ethics still stand. Lo where are those dotcoms?

I have yet to find an Internet instance where an existing law would not be sufficient. But then what would politicians do?

[edited by: Tapolyai at 3:16 am (utc) on July 18, 2006]

eeek




msg:3012708
 5:44 am on Jul 18, 2006 (gmt 0)

My FTP log is cached by Google...and there has never been a link to it, ever!

Any chance you are running sitemap?

mcavic




msg:3012775
 7:10 am on Jul 18, 2006 (gmt 0)

But... You may NOT take anything from my store/reproduce it.

This analogy is not without merit. But the entire philosophy of the Web is that if it's accessible, then it's public.

Google's philosophy, and to a lesser extent, the philosophy of all search engines, is that if it's public, then it wants to be found.

Maybe changing those philosophies would lead to a better Web. Maybe it wouldn't. But who's to make that decision? Trust me, we do *not* want lawmakers to do it.

atlrus




msg:3012990
 1:06 pm on Jul 18, 2006 (gmt 0)

I don't run a sitemap on this site.
It's weird how only this website is like that, the rest of them are a-ok.

I am all for a free Internet and all, I just think it's downright stupid and wrong for Google to visit pages and files to which NO ONE links, and at the same time half of my pages are supplemental :)

I have always believed that the robots.txt should have the command *Allow* rather than *Disallow* and this incident strengthens my belief.
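For reference, robots.txt as it stands is an opt-out mechanism: anything not matched by a Disallow rule is treated as crawlable. Here is a minimal sketch of how a well-behaved crawler reads such rules, using Python's standard-library parser. The robots.txt contents, the paths and the example.com URLs are made up for illustration and are not taken from any site in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /logs/
Disallow: /WS_FTP.LOG
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# A path under a Disallow rule is off limits to a compliant crawler.
blocked = parser.can_fetch("Googlebot", "http://www.example.com/logs/ftp.log")

# A path with no matching rule is allowed by default (the opt-out model).
allowed = parser.can_fetch("Googlebot", "http://www.example.com/articles/")

print(blocked, allowed)
```

Keep in mind that robots.txt is only honored voluntarily and is itself publicly readable, so it is not a substitute for access control on files that should stay private.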

whoisgregg




msg:3013051
 1:52 pm on Jul 18, 2006 (gmt 0)

But... You may NOT take anything from my store/reproduce it.

This analogy is not without merit. But the entire philosophy of the Web is that if it's accessible, then it's public.

It's the philosophy because it's the technological reality. You are incapable of "walking in and looking around" a web page without downloading an exact copy of the page to your computer.

It's where the metaphor fails. To force the metaphor back in line we'd have to say it's like letting people walk in the store and look around but erasing their memory of everything they saw and interacted with so they can't make any new ideas based on what was in your store.

whoisgregg




msg:3013055
 1:59 pm on Jul 18, 2006 (gmt 0)

Don't get me wrong, I think it should be simpler to manage how bots can use/abuse a site, but webmasters can change their own behavior a lot faster than bot programmers can decide on some new standard. :)

Tapolyai




msg:3013064
 2:13 pm on Jul 18, 2006 (gmt 0)

It's where the metaphor fails. To force the metaphor back in line we'd have to say it's like letting people walk in the store and look around but erasing their memory of everything they saw and interacted with so they can't make any new ideas based on what was in your store.

I think your point has merit, but I am still not giving up ;).

Yes, indeed when we visit a web site "we create a copy", maybe even cache it, and recall it later - for our own amusement... But isn't it the same as when we walk into a store, look around, "cache" what we see, and later remember "oh! That painting sort of looked good in that store", and go back and get it? I might even have a catalog at the exit for you to pick up, with pictures and prices since my "cache" isn't as good as a computer's. Let's not get bogged down on the definition of cache or some intermediary storage solution which is required to see/hear/experience a web site/store. I am confident these are different than when you create access to the same "cached" or stored content for others to peruse.

On that note, I have to agree that anything on the Web which is not explicitly restricted is public, except that just as a store has a physical restriction (you have to go to the store), a web site has to be visited. Yes, limited indexing and referencing is of benefit to all and permissible (yellow pages/white pages, directories, catalogs).

Not just that, just because something is "in public", it does not make it "public domain".

[edited by: Tapolyai at 2:16 pm (utc) on July 18, 2006]

mcavic




msg:3013147
 3:00 pm on Jul 18, 2006 (gmt 0)

I just think it's downright stupid and wrong for Google to visit pages and files to which NO ONE links

Atlrus, you did link to the FTP log. Your Web server produced a directory listing, and that listing had a link to the log. There's no other way Google could have found your WS FTP log.
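To make the mechanism concrete, here is a rough sketch of how any crawler could pick up a log file from an auto-generated directory listing. The listing HTML, the file names and the example.com base URL are invented for illustration; real listings vary by server, and this is not a description of how Googlebot itself is implemented.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical directory listing of the kind a web server generates
# when a folder has no index page. File names are illustrative only.
LISTING_HTML = """\
<html><head><title>Index of /images</title></head><body>
<h1>Index of /images</h1>
<a href="/">Parent Directory</a>
<a href="logo.gif">logo.gif</a>
<a href="WS_FTP.LOG">WS_FTP.LOG</a>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collect every href found in anchor tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def discover_links(base_url: str, html: str) -> list[str]:
    """Return the absolute URLs a crawler would find on the given page."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


links = discover_links("http://www.example.com/images/", LISTING_HTML)
for link in links:
    print(link)
```

Nobody wrote a link to the log by hand in this scenario, yet it appears in the list of discovered URLs because the server published one in the listing.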

theBear




msg:3013199
 3:50 pm on Jul 18, 2006 (gmt 0)

Atlrus,

You need to add a:

Options -Indexes

to the .htaccess file in the root directory (for an entire site).

This is a common site setup issue.

You have to make certain that your server gets told all it should be.

You should assume nothing when setting up a server or site. Test everything for both expected operation and for possible unexpected operation.
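One way to test for this particular problem is to request a directory URL and see whether an auto-generated listing comes back. The sketch below assumes the listing uses a title starting with "Index of /", which is what Apache's default listings look like; other servers or customized listings may differ, so treat it as a heuristic. The function names are hypothetical.

```python
import re
from urllib.error import HTTPError
from urllib.request import urlopen

# Heuristic: assumes the default "Index of /..." title on generated listings.
_INDEX_TITLE = re.compile(r"<title>\s*Index of /", re.IGNORECASE)


def looks_like_directory_listing(html: str) -> bool:
    """Return True if the page body resembles an auto-generated listing."""
    return bool(_INDEX_TITLE.search(html))


def directory_is_listed(url: str, timeout: float = 10.0) -> bool:
    """Fetch a directory URL and report whether a listing is exposed.

    An HTTP error such as 403 Forbidden (the usual result once listings
    are switched off) is treated as "not listed".
    """
    try:
        with urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except HTTPError:
        return False
    return looks_like_directory_listing(body)
```

Calling directory_is_listed() on each folder of your own site, before and after changing the server configuration, shows whether the change took effect.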

Incorrect site/server/DNS configuration is also the cause of all of the non-www/www, 301, duplicate content, and similar issues that appear every third or fourth day on WebmasterWorld.

Cheers,
theBear

PS: There are over 1 million copies of WS_FTP.log files in Google's index.

[edited by: theBear at 3:56 pm (utc) on July 18, 2006]

GrendelKhan TSU




msg:3013332
 5:07 pm on Jul 18, 2006 (gmt 0)

All Your Basesite Belong to Google.

Resistance is futile. :p

(sorry, I know that joke's been made 908029348023 times here, but I couldn't resist)

FiRe




msg:3013360
 5:34 pm on Jul 18, 2006 (gmt 0)

[google.com...]
Thought it might help :-)

wmuser




msg:3014983
 8:32 pm on Jul 19, 2006 (gmt 0)

Agreed, that's your fault. Either password protect it or add no index in your .htaccess.

atlrus




msg:3021351
 12:28 pm on Jul 25, 2006 (gmt 0)


System: The following message was spliced on to this thread from: http://www.webmasterworld.com/google/3021349.htm by jatar_k - 10:41 am on July 25, 2006 (pst -7)


Now, here is the real kicker - Google will cache pages through the toolbar!?!

This is what's happened:

I use feedvalidator(dot)org to check my feed for mistakes and typos. You type in the feed URL and it checks it. Nothing is saved or published. And today, I was checking the "contain the term" results for my site, and there it was -
feedvalidator.org/check.cgi?url=http%3A%2F%2Fwww.mywebsite.com
cached by Google...

WHAT'S NEXT - MY BANK ACCOUNT!

[edited by: jatar_k at 5:41 pm (utc) on July 25, 2006]

webdude




msg:3024701
 6:55 pm on Jul 27, 2006 (gmt 0)

the toolbar may be the culprit too...

hutcheson




msg:3024711
 7:03 pm on Jul 27, 2006 (gmt 0)

It doesn't really matter how Google found it, nor even how Google knew that it had permissions to access it.

What was the file doing in a public folder anyway?

That's not "in a store", that's not "in a house" or "on a back porch".

That's painted on the side of the house, where anyone can see it. You thought it was hidden because you didn't think anyone knew you had a northwest wall to your house. And you're whining because a guy in the next block flew a balloon over your house, and saw it. HE SHOULDN'T HAVE BEEN LOOKING IN THAT DIRECTION? HE SHOULD HAVE KNOWN THAT WALL WASN'T FOR PUBLIC DISPLAY?

Hogwash. What would you say about an author who put a very personal tidbit on page 347 of his new book, and then whined that it was secret because it wasn't mentioned in the index of the book?


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved