
Google Indexing Secure Servers?

     
9:02 pm on Feb 28, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 26, 2000
posts:2176
votes: 0


From [google.com...]

google only indexes publicly available web pages. Secure sites (https: ) are not included in the google index.

Yet a search for allinurl: https [google.com] returns 3.8 million pages.

Have I been asleep on this one, or does google need to re-write some copy??

9:06 pm on Feb 28, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Sept 16, 2001
posts:2059
votes: 0


Well, I guess the software people at Google worked faster than the webmasters :) At least I know any member here would have updated that copy :)

But I think they realised that some sites use the https protocol even when the information doesn't need a secure connection.

11:44 am on Mar 1, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


Thanks, WebGuerrilla, I'm quite amazed. My first thought was to check that the servers didn't send those pages without ssl, or that they were listed without being indexed - but no, Google definitely indexed them.

https does not ensure privacy, just that it's hard to eavesdrop. Still, those of us who've considered putting our 'add to basket' URLs behind ssl to keep sessions out of 'bots need to be more careful.

I expect the help page to be fixed pretty soon...

Calum

6:51 pm on Mar 1, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 26, 2000
posts:2176
votes: 0



I'm just at a loss as to what the potential benefit of doing it is. When you browse through the pages, all you really find is shopping cart order forms. I didn't come across anything that looked like quality content that would show up as a search result.

google doesn't usually introduce new ideas unless there is some practical application down the road. So what would that application be?

8:41 pm on Mar 1, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 25, 2001
posts:43
votes: 0


Carding?
8:47 pm on Mar 1, 2002 (gmt 0)

Administrator from US 

WebmasterWorld Administrator brett_tabke is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 21, 1999
posts:38250
votes: 109


For Google to index https URLs is not necessarily a problem. The ones Google has indexed are not login oriented. If the bot starts filling out IDs and passwords, kicking out cookies - then we'll have a problem.

strip mine the net!

9:06 pm on Mar 1, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0


Looks like a couple of them are running some type of delivery system. I was able to pull non-secure versions of www.safeweb.com and control.business.mindspring.com with LWP.

Then there is www.redcross.org/donate/donation-form.asp which looks like it was never cached, but is just in the index from pure linkpop/PR.

I don't think they actually are spidering the secure pages, but you could be in the google index without the spider ever hitting the page.

9:24 pm on Mar 1, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0


Did more poking - many sites have non-secure parallels to their secure pages. Looks like Google will just ignore the 's' in https and try to hit the same page on port 80.
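
If that's right it's easy to test - take an https URL from the index, drop the 's', and see whether the same path answers on port 80. A rough LWP sketch (the URL is just an example from this thread):

#!/usr/bin/perl
# Guess at the crawler's behaviour: rewrite https:// to http:// and
# see whether the same page answers on port 80.
use strict;
use LWP::UserAgent;
use HTTP::Request;

my $secure = shift || 'https://www.safeweb.com/';
(my $plain = $secure) =~ s{^https://}{http://};

my $ua = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => $plain));
print "$plain -> ", $res->status_line, "\n";
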
10:49 pm on Mar 1, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 25, 2001
posts:43
votes: 0


Sure looks like a bunch of them are indexed to me - there are cached links all over the place, and there is content there when you view the cached result.

The biggest problem I see right away is that some of these pages are ranking well, and I don't believe many people have designed their sites to be entered via their https pages. In the earlier cases where Google indexed .pdf and other file types, those normally don't rank well, so it's not much of a problem for someone locating your company via a Word document - in this case, however, I imagine it's going to cause problems for some.

I wonder how many documents the crawler will eventually find after it crawls some more?

11:31 pm on Mar 1, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0


Non secure versions of:
business.mindspring.com [cgi-fun.hypermart.net]
www.safeweb.com [cgi-fun.hypermart.net]
rhn.redhat.com [cgi-fun.hypermart.net]
alerts.securityfocus.com [cgi-fun.hypermart.net] -> these guys are delivering a non-ssl version to Google based on User-Agent.
www.fortify.net [cgi-fun.hypermart.net]
5:20 pm on Mar 2, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


littleman, rhn.redhat.com gives a 302 to the https address when accessed unencrypted on port 80, and a 404 when accessed unencrypted on port 443 - so googlebot wouldn't have been able to cache the page unless it could use ssl.

Calum

6:03 pm on Mar 2, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0

6:14 pm on Mar 2, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


The proxy script follows 302 redirects. I telneted with googlebot's User-Agent and it still wouldn't give me the page.
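
(LWP follows 302s by default, which is why a script built on request() shows the destination page; simple_request() shows the raw redirect, same as telnet. A minimal sketch:)

#!/usr/bin/perl
# Show the raw response without following redirects - simple_request()
# behaves like telnet here, while request() would chase the 302.
use strict;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
$ua->agent('googlebot/2.1 (+http://www.googlebot.com/bot.html)');

my $res = $ua->simple_request(HTTP::Request->new(GET => 'http://rhn.redhat.com/'));
print $res->status_line, "\n";
print "Location: ", $res->header('Location'), "\n" if $res->header('Location');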

You had me worried for a moment. ;)

Calum

6:31 pm on Mar 2, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0


That is just a user-agent faking script I wrote using Perl's LWP; it cannot grab secure pages.
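
Roughly what it does (a from-memory sketch, not the exact script - and plain LWP has no SSL layer, which is why secure pages are out of reach):

#!/usr/bin/perl
# Fetch a URL pretending to be googlebot, or whatever UA is passed
# as the second argument. Needs Crypt::SSLeay before https will work.
use strict;
use LWP::UserAgent;
use HTTP::Request;

my ($url, $agent) = @ARGV;
$agent ||= 'googlebot/2.1 (+http://www.googlebot.com/bot.html)';

my $ua = LWP::UserAgent->new;
$ua->agent($agent);

my $res = $ua->request(HTTP::Request->new(GET => $url));
print $res->status_line, "\n\n";
print $res->content if $res->is_success;
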
6:35 pm on Mar 2, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0


Big url this time:

cgi-fun.hypermart.net [cgi-fun.hypermart.net] - this one is pulling the header information via the other script.
BTW, folks, turn off your JS if you want to avoid the popups.

7:26 pm on Mar 2, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member ciml is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 22, 2001
posts:3805
votes: 2


This is most odd, I just can't duplicate your results. When I use your script I get the page OK with fred as the UA:

http://cgi-fun.hypermart.net/lookup.pl...user%3Dfred [cgi-fun.hypermart.net]

I can get my secure [cgi-fun.hypermart.net] servers [cgi-fun.hypermart.net] with your script (which serve different pages without ssl).

The following gives me a 302:

telnet rhn.redhat.com 80
GET / HTTP/1.1
Host: rhn.redhat.com
User-Agent: googlebot/2.1 (+http://www.googlebot.com/bot.html)

While this gives me a 404:

telnet rhn.redhat.com 443
GET / HTTP/1.1
Host: rhn.redhat.com
User-Agent: googlebot/2.1 (+http://www.googlebot.com/bot.html)

(each GET is followed by a blank line to terminate the request headers)

Calum

9:38 pm on Mar 2, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member littleman is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:June 17, 2000
posts:2924
votes: 0


Ah, well, it looks like the script could handle secure pages after all. So perhaps Google does actually spider secure pages on occasion. I guess that is why GoogleGuy has been so quiet! Well done, Calum!
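
An easy way to check whether a given LWP install can fetch https at all - without an SSL layer (Crypt::SSLeay) it typically answers "501 Protocol scheme 'https' is not supported":

#!/usr/bin/perl
# Probe whether this LWP can speak https; the status line tells the story.
use strict;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => 'https://www.fortify.net/'));
print $res->status_line, "\n";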

For what it is worth, look at this:
[translate.google.com...]

Could be that google is adding pages that get in via the translation service.

Of course there is the remote chance that they are IP delivering to googlebot.

5:29 pm on Mar 6, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Feb 25, 2001
posts:43
votes: 0


Google's submission form also accepts "https" URLs. FYI, alltheweb.com doesn't allow https submission via the free form, only via paid inclusion.
8:19 pm on Mar 10, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member heini is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Jan 31, 2001
posts:4404
votes: 0


Dutch SE site Voelspriet [voelspriet.nl] received a response from Google's PR manager, Nathan Tyler. Short quote:

"google is currently testing a new crawling technology that will enable our users to view HTTPS pages within google search results... we discovered a bug in this new technology... The improved version of google's web crawler will recognize all robots.txt... will be deployed in the next 30 days."
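
In the meantime, anyone wanting to keep crawlers off a secure server can serve a robots.txt from the https host itself - the robots exclusion standard works per host and port, so the secure server needs its own file (assuming, as the quote implies, the new crawler requests robots.txt over https for https URLs). A minimal example that blocks everything on that host:

User-agent: *
Disallow: /
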
12:12 am on Sept 5, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Oct 25, 2001
posts:661
votes: 1


Hi Guys,

Just doing some work on https, and found this interesting thread. But now I'm really confused...

If you search at Google for allinurl: https

i.e. all indexed pages which have https in their URL - you get 2.7 million URLs.

But if you look at the first couple of results, the Google toolbar PageRank is greyed out. Once you get past 3 or 4, the pages have quite high PageRank?

I can see from the first ones that there are several redirections going on - which would explain high-ranking 'final destination' pages with no PageRank showing on the listed URL.
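
In case it helps anyone checking: this is roughly how I watched the redirect chains (LWP again - it needs Crypt::SSLeay for any https hops):

#!/usr/bin/perl
# Walk the chain of redirects LWP followed for a URL, oldest hop first.
use strict;
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
my $res = $ua->request(HTTP::Request->new(GET => shift));

my @chain;
for (my $r = $res; $r; $r = $r->previous) {
    unshift @chain, $r;
}
printf "%s %s\n", $_->code, $_->request->uri for @chain;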

Is that the answer? Can someone else please have a look at this for me?

Thanks in advance.

 
