Links from intranets?

Oliver Henniges

2:16 pm on Apr 17, 2009 (gmt 0)

In this thread [webmasterworld.com] Brett was seeking "...ways sites can link, refer, or point urls to your pages other than direct hrefs."

My posting there was a bit off topic, so he asked me to start a thread of its own on this. So here's my idea:

Concerning links: what about intranets, which extend the internet into what I think is called the "deep web"?

My warehouse management system runs on an internal WAMP system, and I constantly work on it with my toolbar enabled. To get access from home as well, I forwarded port 80 to the server, so at least in principle google's spiders could get access to the system by tracking the IP while I work from home.

Maybe I should modify the password-protection and place some links there? ;)

Besides: is my customer data really safe with the toolbar enabled?

What do you think?

dstiles

11:04 pm on Apr 17, 2009 (gmt 0)

Last year I had to advise a customer to turn off google toolbar when using his control panel. I'm not saying google actually LOOKS at what is going on but I wouldn't bet they don't. I didn't find any evidence that they did in that instance but there was a suggestion here in a recent thread that google may take notice.

At one point, a year or so ago, one of my customers managed to get his secure control panel (CP) URL into google somehow. I THINK it was through a key logger in a cafe or some such, but I certainly wasn't happy to hear that he found his CP by typing the complete secure URL into google instead of into the browser's Location field.

I don't know if google would look at your intranet, but if it has a .com domain associated with it, it's quite likely. I know google picks up a new .com registration within days and goes hunting for its web site.

If you are accessing your intranet remotely using a domain name then that DNS registration is available to all and sundry. It WILL attract hackers. It was stupid of me to name a test domain on my local server, publicly accessible for remote testing, as paypal.example.com but it did prove the point. :(

Oliver Henniges

10:01 am on Apr 19, 2009 (gmt 0)

> If you are accessing your intranet remotely using a domain name .... It WILL attract hackers.

It's a dynamic address. I used the dyndns service for a while, but after some temporary outages a year ago, I programmed my own means of finding out the current IP.
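Oliver doesn't say how his home-grown replacement for dyndns works. As a purely hypothetical sketch, the core of such a tool is "look up the current public address, compare it with the last one seen, and report a change". The names `check_ip_change`, `lookup` and `last_ip.json` are invented for illustration, and the actual lookup (router status page, external echo service, ...) is left as a pluggable function:

```python
import ipaddress
import json
from pathlib import Path
from typing import Callable, Optional

# Hypothetical location of the file remembering the last address seen.
STATE_FILE = Path("last_ip.json")


def check_ip_change(lookup: Callable[[], str], state_file: Path = STATE_FILE) -> Optional[str]:
    """Return the current public IP if it changed since the last check, else None.

    `lookup` is any callable returning the current public address as text;
    how it obtains that address is deliberately left open here.
    """
    # Normalise and validate; ip_address() raises ValueError on garbage input.
    current = str(ipaddress.ip_address(lookup().strip()))
    previous = None
    if state_file.exists():
        previous = json.loads(state_file.read_text()).get("ip")
    if current == previous:
        return None
    # Remember the new address so the next run only reports further changes.
    state_file.write_text(json.dumps({"ip": current}))
    return current
```

Run periodically (cron, Windows Task Scheduler), a non-`None` result is the moment to notify yourself of the new address.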

dstiles

6:01 pm on Apr 19, 2009 (gmt 0)

As long as it doesn't require an entry in a public-facing DNS system. Dynamic IPs often stay attached to a single "broadband account" now for some considerable time, so it would be theoretically possible for a bot to find you. If there is an A or MX record in the DNS, whether it's a domain name or simply an IP may not matter.

Have you considered using a port other than the traditional 80 and 8080 for http?

Oliver Henniges

11:51 am on Apr 20, 2009 (gmt 0)

Thanks for your help, dstiles.

Technically, here in Germany the connection is generally cut once every 24 hours and then assigned a different IP. There is additional password protection on my data, of course, so I actually feel quite safe about this.

The initial question was NOT how to make sure intranets are NOT found, but whether there is evidence that google and other SE spiders are already targeting the deep web for indexing.

But I haven't yet found the time to check my logfiles accordingly.

dstiles

7:28 pm on Apr 20, 2009 (gmt 0)

My original point was that, if there is a permanent link to your server, I don't see why google would not try to follow it IF there is an external link somewhere OR you are using a .com/net/org domain that google may try to access.

If neither of these applies, then google probably won't find you, unless, perhaps, they really do read gmail or toolbars or whatever. You could always include a robots.txt forbidding access and a meta robots tag on each page for extra safety, although if a hacker's bot finds you that wouldn't help. I assume the password protection is good quality.
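To illustrate what dstiles' robots.txt suggestion means in practice, here is a minimal sketch using Python's standard `urllib.robotparser`, which evaluates rules the way a compliant crawler would. The intranet URL and the helper name `crawler_may_fetch` are hypothetical. As the post says, this is advisory only: a hostile bot simply ignores it.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt asking every well-behaved crawler to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# The per-page equivalent, placed inside each page's <head>.
META_ROBOTS = '<meta name="robots" content="noindex, nofollow">'


def crawler_may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Check a URL against robots.txt rules as a compliant crawler would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "SomeOtherBot"):
        # Both print False with the disallow-all rules above.
        print(agent, crawler_may_fetch(agent, "http://intranet.example/stock/list.php"))
```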

Back to the toolbar - it would not surprise me if google stores all links accessed, although I obviously have no specific knowledge of that. I would never use a toolbar anyway, especially when accessing a private site or control panel.

Oliver Henniges

2:36 pm on Apr 21, 2009 (gmt 0)

> I would never use a toolbar anyway, especially when accessing a private site or control panel.

For similar reasons, I am not really happy with it either. But I like the quick access to the search field, and the toolbar still gives me a quick glimpse of the "importance of a website" (however outdated and sometimes weird the results may be).

I have often tried, but I can't get used to using two different browsers.

dstiles

10:18 pm on Apr 21, 2009 (gmt 0)

Can't you use Firefox's google box instead of the toolbar search field, Oliver? Obviously, not using GTB I don't know if there is any comparison between the two search fields.

Not sure what you mean by "two different browsers". Do you mean two different types (IE and FF) or multiple windows open?

I use Firefox exclusively for browsing online. Currently I have 7 windows open with about 45 tabs between them. Of those, about eight tabs are different google searches. And that's only on one of my machines! :)

Oliver Henniges

9:42 pm on Apr 22, 2009 (gmt 0)

I meant two different types. I do have FF installed, but I tend to use IE, normally. Don't know why, to be honest. Maybe I'm getting old...

dstiles

10:23 pm on Apr 22, 2009 (gmt 0)

Aren't we all! :)

Receptional Andy

10:37 pm on Apr 22, 2009 (gmt 0)

There are a few issues floating around in this thread.

If you publish a host to the Domain Name System, visit a URL via a search engine toolbar, or if a reference to the URL appears somewhere in the public internet, then that URL has a realistic chance of being added to the content discovery queue at a major search engine.

The only reliable method I'm aware of to prevent the content at a published URL making it onto the public internet is to authenticate users - ideally via a username/password combination, or otherwise by some other characteristic of the client - IP, for instance.

If you have a reasonable authentication mechanism, then search engines cannot access the content of that URL. The best they can do is request the URL and get denied.
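As a sketch of what "authenticate users" can look like at the HTTP level, the function below validates an HTTP Basic `Authorization` header; anything missing or malformed is denied, which is all an unauthenticated crawler would ever see. The function name and credentials are hypothetical, and in a real WAMP setup this is normally handled by the web server's own authentication rather than hand-written code. Basic credentials are only base64-encoded, so they should travel over HTTPS.

```python
import base64
import hmac
from typing import Optional


def check_basic_auth(header: Optional[str], user: str, password: str) -> bool:
    """Validate an HTTP Basic 'Authorization' header value.

    Returns False for missing or malformed headers, so a client without
    credentials (a crawler included) only ever gets denied.
    """
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return False
    given_user, sep, given_pass = decoded.partition(":")
    if not sep:
        return False
    # compare_digest avoids leaking information through timing differences.
    user_ok = hmac.compare_digest(given_user.encode(), user.encode())
    pass_ok = hmac.compare_digest(given_pass.encode(), password.encode())
    return user_ok and pass_ok
```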

> is my customer-data really safe with the toolbar enabled?

The last time I checked, the toolbar sent URLs back to Google, not content from URLs. You can install a packet sniffer and check whether the same happens on your own computer.

For me, the bottom line is that if you publish a URL to the public internet, then either you verify users or you consider the contents available to anyone, search engines included. If a search engine can get the content, so can anyone with an internet connection.

[edited by: Receptional_Andy at 10:40 pm (utc) on April 22, 2009]

dstiles

5:28 pm on Apr 23, 2009 (gmt 0)

Or, as I do, block all IPs in the web server (IIS) except for the one or two I actually want to let through.
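IIS has its own IP-restriction settings for this; the sketch below is not IIS configuration, only a Python illustration of the same default-deny idea: reject every client unless its address falls inside an explicitly allowed network. The addresses in `ALLOWED` are hypothetical placeholders (a LAN range and one documentation address).

```python
import ipaddress

# Hypothetical allowlist: replace with the address(es) or ranges you really use.
ALLOWED = [
    ipaddress.ip_network("192.168.0.0/24"),   # e.g. the office LAN
    ipaddress.ip_network("203.0.113.42/32"),  # e.g. a single remote address
]


def is_allowed(client_ip: str, allowed=ALLOWED) -> bool:
    """Default-deny: accept a client only if its address is in an allowed network."""
    try:
        addr = ipaddress.ip_address(client_ip)
    except ValueError:
        return False  # unparseable addresses are rejected outright
    # Membership across IPv4/IPv6 versions is simply False, never an error.
    return any(addr in net for net in allowed)
```

With a dynamic IP at the remote end, the allowlist has to be updated whenever that address changes.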

Oliver Henniges

5:56 pm on Apr 25, 2009 (gmt 0)

> There are a few issues floating around in this thread.

Yepp. It emerged from Brett's thread on seeking unconventional means of placing URLs. I was wondering whether google, with its ever-growing hunger for data, is already targeting the deep web.

And maybe not just incidentally...

I was thinking of checking the logfiles of my Apache intranet, but I found I hadn't looked after them for two years. Due to either a misconfiguration or some breakdowns, this logfile is 0.5 GB in size ;) Currently, I see no way to take a decent look at it in an acceptable timespan. I'd have to split this file first, and I have other things to do.
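Splitting the file shouldn't actually be necessary: reading it line by line keeps memory use flat regardless of size. Below is a minimal sketch that counts requests whose User-Agent mentions a few crawler names. It assumes Apache's "combined" log format, where the User-Agent is the last quoted field; the "common" format does not record the User-Agent at all. The marker list is an illustrative assumption, and a User-Agent string is only a claim, so a hit is a hint rather than proof.

```python
import re
from collections import Counter

# Illustrative crawler markers to look for in the User-Agent field.
CRAWLER_MARKERS = ("Googlebot", "bingbot", "Slurp")

# In Apache's combined format each line ends with: "referer" "user-agent"
UA_AT_END = re.compile(r'"([^"]*)"\s*$')


def count_crawler_hits(lines, markers=CRAWLER_MARKERS) -> Counter:
    """Count log lines per crawler marker, processing one line at a time."""
    hits = Counter()
    for line in lines:
        m = UA_AT_END.search(line)
        if not m:
            continue  # skip lines without a trailing quoted User-Agent
        agent = m.group(1)
        for marker in markers:
            if marker in agent:
                hits[marker] += 1
    return hits


def scan_log(path: str) -> Counter:
    # Iterating over the file object streams it, so a 0.5 GB log is never
    # loaded into memory at once and needs no splitting beforehand.
    with open(path, encoding="utf-8", errors="replace") as f:
        return count_crawler_hits(f)
```

A call such as `scan_log("access.log")` (hypothetical filename) returns a `Counter` like `Counter({"Googlebot": ...})`.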

dstiles

9:53 pm on Apr 25, 2009 (gmt 0)

Is it unix/linux-based? If so use the tail command to get the last so many lines. Don't ask me for exact help - haven't used it for a couple of years. :)

For Windows there are unix-style tool kits that offer several of the commands.
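On unix, `tail -n 200 access.log` prints the last 200 lines. Where no such tool kit is installed, the same idea can be sketched in Python: seek to the end of the file and read backwards in blocks until enough newlines have been seen, so only the end of a huge log is ever read. The function name `tail` and its defaults are this sketch's own choices.

```python
import os


def tail(path: str, n: int = 100, block_size: int = 64 * 1024) -> list[str]:
    """Return the last n lines of a file without reading the whole file."""
    if n <= 0:
        return []
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        # n complete lines need n + 1 newlines in the buffer (one before the
        # first wanted line), unless we reach the start of the file first.
        while pos > 0 and data.count(b"\n") <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after a trailing newline
    # Strip Windows line endings and decode only the lines we return.
    return [ln.rstrip(b"\r").decode("utf-8", errors="replace") for ln in lines[-n:]]
```

For example, `print("\n".join(tail("access.log", 200)))` (hypothetical filename) shows the most recent 200 requests.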