Pages from site with a 301 redirect to another site can be indexed?

pages indexed show up as https: and not http:

         

justme

1:46 am on Sep 4, 2010 (gmt 0)

10+ Year Member



We have a website that has been 301 redirected (through IIS) to our preferred/main website for several years now. e.g. www.mysiteA.com (301 redirected) --> www.mysiteB.com (these are not the real domain names, just using an example.)

For kicks, I decided to check on "site:www.mysiteA.com". I expected to see no results since every page on the site is 301 redirected. However, many pages showed up in the results and all of them showed up as https:// instead of http:// (note the "s"). mysiteA does not have any ssl certificate requirements set in IIS. However, mysiteB does have pages that require an ssl connection and has a valid certificate for the domain.

What's strange is that only new pages added to mysiteB (after the redirect was set up) show up in the "site:www.mysiteA.com" results as [mysiteA.com...] So when the links are clicked, the browser displays an untrusted connection message with the complaint that the ssl certificate does not match the domain. None of the old pages show up in the "site:www.mysiteA.com" result list.

A couple of questions:
1) What is causing the search engines to index the pages as https: instead of http:?
2) Why did any page from mysiteA get indexed even though all pages were 301 redirected to mysiteB?

TIA for any words of wisdom you can provide.

Brett_Tabke

1:21 pm on Sep 4, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



What happens in the browser when you attempt to visit those pages via https? Can you run a header checker to see what is really coming in the headers of the redirects?

I am curious to know if there may be backlinks to the pages using https?
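A header check along these lines can also be scripted. The following is a minimal sketch, not a definitive tool: it sends one HEAD request per protocol without following redirects and reports what comes back. The function names `fetch_status` and `describe` are made up for illustration, and the domains are the placeholder names used in this thread.

```python
import http.client
import ssl
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send one HEAD request; return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Certificate checks are disabled on purpose: the certificate is expected
        # not to match the redirected domain, and only the raw response matters here.
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout, context=context)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def describe(status, location, target_host):
    """Classify a response as a correct permanent redirect or something to look into."""
    # urlsplit().hostname is always lowercase, so compare case-insensitively.
    if status == 301 and location and urlsplit(location).hostname == target_host.lower():
        return "ok: 301 to " + location
    if status == 200:
        return "problem: 200 OK, page is served instead of redirected"
    return "check: status %s, Location %s" % (status, location)


if __name__ == "__main__":
    # Placeholder domains from this thread; substitute the real ones.
    for scheme in ("http", "https"):
        url = scheme + "://www.mysiteA.com/"
        print(url, "->", describe(*fetch_status(url), target_host="www.mysiteB.com"))
```

Running it against both the http and the https variant of the same URL shows whether only one of the two protocols is actually being redirected.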

justme

2:58 pm on Sep 5, 2010 (gmt 0)

10+ Year Member



Brett,

Just an FYI: when I discovered that only the pages added after the redirect was set up were being indexed, I copied the new pages from mysiteB to mysiteA yesterday to see if this would go away at the next Google index. I confirmed that the redirect settings from mysiteA to mysiteB are in place and work properly when browsed with http (no s). Any idea why it was indexed in the first place even though the entire website was 301-redirected?

I ran web-sniffer on the URL. This is the response I got when I entered https://www.mysiteA.com/... (website IP/URL details hidden). Sniffer also displayed the correct HTML code for the web page, which prior to yesterday existed only on the www.mysiteB.com site. Is this the information you are looking for?


HTTP Request Header

Connect to [hidden] on port 443 ... ok

GET [hidden] HTTP/1.1[CRLF]
Host: [hidden][CRLF]
Connection: close[CRLF]
User-Agent: Web-sniffer/1.0.36 (+[web-sniffer])[CRLF]
Accept-Charset: ISO-8859-1,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no[CRLF]
Accept-Language: de,en;q=0.7,en-us;q=0.3[CRLF]
Referer: [web-sniffer][CRLF]
[CRLF]

HTTP Response Header
Status: HTTP/1.1 200 OK
Content-Length: 45434
Content-Type: text/html
Last-Modified: Mon, 19 Jul 2010 11:51:02 GMT
Accept-Ranges: bytes
ETag: "506772a93827cb1:24b0"
Server: Microsoft-IIS/6.0
MicrosoftOfficeWebServer: 5.0_Pub: 5.0_Pub
X-Powered-By: ASP.NET
Date: Sun, 05 Sep 2010 14:25:29 GMT
Connection: close

aakk9999

11:24 pm on Sep 5, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Does MysiteA.com point to the same webspace as MysiteB.com?

If it does, then even though MysiteA does not have a valid SSL certificate, a request for the https variant of a MysiteA URL will still technically work and return the requested URL.

In this case you should make sure that you are redirecting the https variants of URLs from MysiteA to MysiteB as well.

justme

6:02 pm on Sep 7, 2010 (gmt 0)

10+ Year Member



aakk9999,

Does MysiteA.com point to the same webspace as MysiteB.com?

To answer your question thoroughly, I need to give a little history on the websites. mysiteB was the original website and was mostly targeted for consumers. mysiteA was later created to target businesses. Over the years the products and information offered on both sites started to overlap. When the overlap became huge, we decided to maintain the original mysiteB only and 301 redirect mysiteA to mysiteB.

Back to your question. The websites were in separate directories with different domains, managed by IIS on one server. When we set up the 301 redirect, we made both websites identical. So I believe the answer to your question is "yes, mysiteA redirects to the same webspace in mysiteB".

We have since added many new pages and folders to mysiteB which now leads us to the problems I identified above.

If it does, then even though MysiteA does not have a valid SSL, the request for https variant of URLs for MysiteA will still technically work and return the requested URL.

In this case you should make sure that you are redirecting https variant of URLs from MysiteA to MysiteB as well.

I'm not sure I understand your suggestion above. Are you saying that I can 301 redirect "https" requests and this should eliminate the "untrusted connection" label? If so, I do not know how to do this with IIS. Can you point me to resources where I can learn how to do this? Or is this a simple checking of an IIS checkbox?

TIA

aakk9999

7:21 pm on Sep 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I do not know IIS admin, so I am not familiar with how to set up a redirect that way. Also, depending on whether all or just some of your pages on MySiteB use the https protocol, you may need additional logic, and you may or may not be able to do this within IIS (not sure). If you have an ISAPI rewrite module installed on your IIS, then you may be able to do more complex redirects using regex. Just search for 'ISAPI http https' and you will find plenty of resources.

On the site where we did something similar, the 301 redirect is managed within an aspx script. Basically, we also had a setup where siteA points to the same webspace as siteB and where some pages use the https protocol whilst other pages use http.

When a request arrives at the server, the aspx script first checks whether the URL is for siteA or siteB. If it is for siteA, it does the appropriate redirect to siteB, including changing https to http (or vice versa) if required. If it is for siteB, it checks whether the correct protocol is used; if not, a redirect to the correct protocol is served. Redirects must happen in one hit (no chained redirects).

Therefore, how you do your redirects depends on how your pages on MySiteB *should* be served. As you have SSL then each page on MySiteB can be reached via either http or https protocol. If all URLs on MySiteB use https protocol then firstly you should already be redirecting http version of MySiteB URL to https version of MySiteB URL and therefore the redirect from MySiteA should go to https version of MySiteB regardless of protocol used in MySiteA request.

However, if on MySiteB some pages should be served as http and others only https, then:

- you would need (or you already have) logic within MySiteB to redirect each page to either its http or its https version, depending on how that page should be served (if you do not do this, you will get duplicate content)

- the 301 redirect from MySiteA to MySiteB should redirect to the correct (canonical) version of the URL that MySiteB should be (is?) serving (http or https)
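The decision logic described above can be sketched as a single function. This is only an illustration in Python, not the aspx script itself: `CANONICAL_HOST`, `SECURE_PREFIXES` and `redirect_for` are hypothetical names, and the list of paths that require https is an assumption you would replace with your own rules.

```python
# Placeholder host name from this thread.
CANONICAL_HOST = "www.mysiteB.com"
# Hypothetical path prefixes that should be served over https on MySiteB.
SECURE_PREFIXES = ("/checkout/", "/account/")


def redirect_for(scheme, host, path):
    """Return the URL to 301 to, or None when the request is already canonical.

    The target always carries both the canonical host and the canonical
    protocol, so any request is fixed in one hit (no chained redirects).
    """
    wanted_scheme = "https" if path.startswith(SECURE_PREFIXES) else "http"
    if host.lower() == CANONICAL_HOST.lower() and scheme == wanted_scheme:
        return None  # right host, right protocol: serve the page
    return "%s://%s%s" % (wanted_scheme, CANONICAL_HOST, path)


# https request for MySiteA: host and protocol are corrected together.
print(redirect_for("https", "www.mysiteA.com", "/page.html"))
# http request for a MySiteB page that should be secure: only the protocol changes.
print(redirect_for("http", "www.mysiteB.com", "/account/login.html"))
```

Computing the final target first and comparing the request against it is what keeps the redirect to a single hop, whichever combination of host and protocol the request arrives with.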

justme

8:50 pm on Sep 7, 2010 (gmt 0)

10+ Year Member



Thanks. I will definitely seek out resources on "isapi https http rewrite module for IIS".

However, this problem only manifests when I do a site:www.mysiteA.com search in a browser. Will the ISAPI rewrite work when a URL from that list is clicked?

I'm also trying to understand why the search engines indexed those pages as https instead of http, never mind why the pages were indexed at all, since the entire site was 301 redirected with IIS. Furthermore, not all of the pages in mysiteA were indexed. The indexing of pages seems to be selective and covers a small fraction of the total number of html pages on the site. I did not check all of them, but the URLs in the first 5 pages of the "site" result list do not require an SSL connection.

This has been the baffling part of all this. If you can shed some light on this, that would be much appreciated.

aakk9999

10:41 pm on Sep 7, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



From my experience, it can take only one URL with the "https" protocol on MySiteA that returns 200 OK (since there is no redirect for the https protocol). Unless you are using absolute links when interlinking your pages, that single URL will cause all other links on that page to use the "https" protocol.
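To illustrate how relative links inherit the protocol and host of the page they sit on, here is a small sketch using Python's standard `urljoin`. The page and link names are invented for the example.

```python
from urllib.parse import urljoin

# A page that a crawler fetched over https on the old domain (hypothetical URL).
base = "https://www.mysiteA.com/products/index.html"

# Two relative links and one absolute link, as they might appear in the page.
links = ["widgets.html", "/contact.html", "http://www.mysiteB.com/about.html"]

# A crawler resolves each link against the URL of the page it was found on.
resolved = [urljoin(base, href) for href in links]
for href, url in zip(links, resolved):
    print(href, "->", url)
```

The two relative links resolve to https URLs on www.mysiteA.com, while the absolute link keeps its own protocol and host.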

That "https" link could be an external link from somewhere, an incorrect link on your MySiteB, or there could have been a small window in the past where Google managed to find such a URL (e.g. incorrect linking somewhere within your site to [MySiteA...], which may even have been corrected since). As such a request (to [MySiteA...]) returns 200 OK and is not redirected, Google may keep requesting it for a long time and can potentially pick up other links from it that would now be https on MySiteA.

Sometimes it is hard (or even impossible) to find where such a mistake originated, but correct redirection of both https and http from MySiteA to MySiteB will fix this problem eventually, as it will tell Google that the page has moved permanently to a new location.

You can try running Xenu Link Sleuth against your MySiteB to see if you can detect something, and correct it if you do. But as I said, even if the linking error is not there any more, if Google picked it up, it has this URL from MySiteA as a valid URL that returns 200, so it may keep re-requesting it.
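For a single page, the same kind of check can be approximated with a few lines of script. The sketch below is not a replacement for a link checker: it only parses one HTML string and lists the links that point at the old host. `LinkCollector` and `suspicious_links` are hypothetical names, and the sample markup is invented.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def suspicious_links(html, old_host="www.mysiteA.com"):
    """Return the links in `html` whose host is the old, redirected domain."""
    parser = LinkCollector()
    parser.feed(html)
    # urlsplit().hostname is lowercase (or None for relative links).
    return [h for h in parser.hrefs if urlsplit(h).hostname == old_host.lower()]


sample_page = (
    '<a href="widgets.html">Widgets</a>'
    '<a href="https://www.mysiteA.com/new.html">New</a>'
    '<a href="http://www.mysiteB.com/">Home</a>'
)
print(suspicious_links(sample_page))
```

Only the absolute link to the old domain is reported; relative links and links to MySiteB are ignored.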

If both sites point to the same webspace (which you say they do), then the same ISAPI rules will be executed for requests that arrive at either MySiteA or MySiteB, and it is up to you to ensure that you have the correct redirection rules set. Be careful, as this can easily be messed up unless you know exactly what you are doing.

Yes, the ISAPI redirect will be executed for all requests for [MySiteA...] regardless of whether the request comes from a browser or from a bot, so, if set up correctly, it will work when you click on a link shown by the "site" command. If it does not work from your "site" command results, then it just does not work, full stop.

And lastly, as to why just some of your https pages from MySiteA are showing in the "site" command: there could be many reasons. Google filtering pages is one; another may be that only some of them can be reached as "https" (this depends on your internal linking structure), etc.

bwnbwn

9:51 pm on Sep 8, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Quick question: are the pages static pages?

justme

11:55 pm on Sep 8, 2010 (gmt 0)

10+ Year Member



aakk9999,

Thank you for your time, suggestions and detailed explanation. They are much appreciated. I will take another close look at the IIS settings for mysiteA and at the html pages that are affected on mysiteB.

justme

11:57 pm on Sep 8, 2010 (gmt 0)

10+ Year Member



bwnbwn,

Quick question: are the pages static pages?

Yes, the pages affected are static html pages on mysiteB and not dynamically generated with a script. Prior to my copying the pages to mysiteA this weekend, the pages did not exist on mysiteA.