My posting there was a bit off topic, so he asked me to start a thread of its own on this. So here's my idea:
Concerning links: what about intranets, which complement the internet as part of what I think is called the "deep web"?
My warehouse management system runs on an internal WAMP server, and I constantly work on it with my toolbar enabled. To get access from home as well, I forwarded port 80 to the server, so at least in principle google spiders could reach the system by tracking the IP while I work from home.
Maybe I should modify the password-protection and place some links there? ;)
Besides: is my customer-data really safe with the toolbar enabled?
What do you think?
At one point a year or so ago one of my customers managed to get his secure CP URL into google somehow. I THINK it was through a key logger in a cafe or some such in that case but I certainly wasn't happy to hear he found his CP by typing the complete secure URL into google instead of into the browser's Location field.
I don't know if google would look at your intranet but if it has a .com domain associated with it it's quite likely. I know google picks up a new .com registration within days and goes hunting for its web site.
If you are accessing your intranet remotely using a domain name then that DNS registration is available to all and sundry. It WILL attract hackers. It was stupid of me to name a test domain on my local server, publicly accessible for remote testing, as paypal.example.com but it did prove the point. :(
Have you considered using a port other than the traditional 80 and 8080 for http?
Technically, here in Germany the connection is generally cut once every 24 hours and then assigned a different IP. There is additional password protection on my data, of course, so I actually feel quite safe about this.
The initial question was NOT how to make sure intranets are NOT found, but whether there is evidence that google and other SE spiders already target the deep web for indexing.
But I have not yet found the time to check my logfiles accordingly.
If neither of these applies then google probably won't find you - unless, perhaps, they really do read gmail or toolbars or whatever. You could always include a robots.txt forbidding access and a meta robots per page for extra safety, although if a hacker's bot finds you that wouldn't help. I assume the password protection is good quality.
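For what it's worth, here is a minimal sketch of the robots.txt idea, using Python's standard `urllib.robotparser` to confirm that a blanket-disallow file really turns away a compliant crawler. The path `/stock/list.php` is a made-up example, not anything from the poster's system. The per-page equivalent would be a `<meta name="robots" content="noindex, nofollow">` tag. As said above, none of this stops a bot that simply ignores the rules.

```python
from urllib.robotparser import RobotFileParser

# A blanket-disallow robots.txt: asks every compliant crawler to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical intranet path, used only for illustration.
print(is_allowed(ROBOTS_TXT, "Googlebot", "/stock/list.php"))
```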
Back to the toolbar - it would not surprise me if google stores all links accessed, although I obviously have no specific knowledge of that. I would never use a toolbar anyway, especially when accessing a private site or control panel.
For similar reasons, I am not really happy with it, either. But I like the quick access to the search-form-field, and the toolbar still gives me a short glimpse of the "importance of a website" (however outdated and sometimes weird the results may be).
I have often tried, but I can't get used to using two different browsers.
Not sure what you mean by "two different browsers". Do you mean two different types (IE and FF) or multiple windows open?
I use Firefox exclusively for browsing online. Currently I have 7 windows open with about 45 tabs between them. Of those, about eight tabs are different google searches. And that's only on one of my machines! :)
If you publish a host to the Domain Name System, visit a URL via a search engine toolbar, or if a reference to the URL appears somewhere in the public internet, then that URL has a realistic chance of being added to the content discovery queue at a major search engine.
The only reliable method I'm aware of to prevent the content at a published URL making it onto the public internet is to authenticate users - ideally via a username/password combination, or otherwise by some other characteristic of the client - IP, for instance.
If you have a reasonable authentication mechanism, then search engines cannot access the content of that URL. The best they can do is request the URL and get denied.
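To illustrate "request the URL and get denied", here is a rough sketch of the decision an HTTP Basic authentication check makes. It is not how Apache implements it, just the logic in miniature; the function name and the user/password values are assumptions made up for the example. A spider sends no credentials, so it falls into the first branch and only ever sees a 401.

```python
import base64
import binascii
import hmac


def status_for_request(auth_header, user, password):
    """Return the HTTP status a Basic-auth protected URL would answer with."""
    if not auth_header or not auth_header.startswith("Basic "):
        return 401  # no credentials at all: this is what a crawler gets
    try:
        raw = base64.b64decode(auth_header[len("Basic "):], validate=True)
        decoded = raw.decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return 401  # malformed header
    supplied_user, sep, supplied_password = decoded.partition(":")
    if not sep:
        return 401  # missing "user:password" separator
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(supplied_user.encode(), user.encode())
    pass_ok = hmac.compare_digest(supplied_password.encode(), password.encode())
    return 200 if (user_ok and pass_ok) else 401


# Hypothetical credentials, for illustration only.
header = "Basic " + base64.b64encode(b"alice:secret").decode("ascii")
print(status_for_request(None, "alice", "secret"))    # request without credentials
print(status_for_request(header, "alice", "secret"))  # request with valid credentials
```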
is my customer-data really safe with the toolbar enabled?
The last time I checked, the toolbar sent URLs back to Google, not content from URLs. You can install a packet sniffer and check whether the same happens on your own computer.
For me, the bottom line is that if you publish a URL to the public internet, then either you verify users or consider the contents available to anyone - search engines included. If a search engine can get the content, so can anyone with an internet connection.
[edited by: Receptional_Andy at 10:40 pm (utc) on April 22, 2009]
Yepp. It emerged from Brett's thread on seeking unconventional means to place URLs. I was wondering whether google - with its ever-growing hunger for data - is already targeting the deep web.
And maybe not just incidentally...
I was thinking of checking the logfiles of my Apache intranet, but I found I had not looked after them for two years. Due to either a misconfiguration or some breakdowns, this logfile is 0.5 GB in size ;) Currently, I see no way to have a decent look at it in an acceptable timespan. I'd have to split the file first, and I have other things to do.
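A file of that size does not necessarily need splitting: reading it line by line keeps memory use flat. Below is a small sketch that counts hits from a few well-known crawler names. It assumes the log records the User-Agent (as Apache's "combined" format does), and the list of names is only an example. Note that it matches anywhere in the line, so a requested URL containing one of those names would be counted too.

```python
import re
from collections import Counter

# Example crawler names; extend as needed. Matching is case-insensitive.
CRAWLER_PATTERN = re.compile(r"googlebot|msnbot|slurp", re.IGNORECASE)


def count_crawler_hits(lines):
    """Count lines mentioning a known crawler, keyed by lowercased name."""
    hits = Counter()
    for line in lines:
        match = CRAWLER_PATTERN.search(line)
        if match:
            hits[match.group(0).lower()] += 1
    return hits


def scan_log(path):
    """Stream a logfile from disk without loading it into memory."""
    # errors="replace" keeps the scan going past any corrupted bytes.
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        return count_crawler_hits(handle)
```

Calling `scan_log("access.log")` (the filename is a placeholder) returns a `Counter`; an empty one would suggest none of the listed spiders has come knocking.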