Bing Indexing HTTP Version of Page that is on HTTPS

301 redirects and canonical Links are in place

         

webcentric

11:29 am on Jun 30, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I recently found that one page (as far as I can tell so far) is ranking well in Bing for both its https version and its http version, with the http version just a few results behind the https one. The page in question has a canonical URL that specifies https. The site has been redirecting to https for over a year, and there's no way to type an http URL into any browser without being redirected to https. Another odd aspect of this situation is that, while the https version is ranked better, I can see from the description (which includes the current number of articles in the system) that it's an older version of the page than the http version. So a page that theoretically doesn't exist is showing more recent data than the actual page, and it's giving the real page a run for its money in the results.

In BWT (Bing Webmaster Tools), I have only ever set up the https version of the site. I guess my question here is "how else can I possibly tell Bing to exclude this page from the index?" And a secondary question is, why on earth is it even in the index to begin with? The only thing I can think of is that someone is linking to the non-https version. Still, Bing should be able to recognize a 301 redirect and a canonical URL when it encounters them. Call me miffed. Thanks for any insights or solutions you may be able to offer.
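
For anyone wanting to confirm what canonical URL a page actually declares, here is a minimal Python sketch using only the standard library. The `CanonicalFinder` class, the `find_canonicals` helper, and the `example.com` URL are all made up for illustration; the thread does not say how the page was checked.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> tag it sees."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        # rel may hold several space-separated tokens, so split before testing.
        rel_values = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def find_canonicals(html_text):
    """Return a list of canonical hrefs declared in the given HTML string."""
    finder = CanonicalFinder()
    finder.feed(html_text)
    return finder.canonicals


# Hypothetical page source; example.com stands in for the real site.
sample = (
    '<html><head>'
    '<link rel="canonical" href="https://example.com/articles">'
    '</head><body></body></html>'
)
print(find_canonicals(sample))
```

A page should normally yield exactly one entry here, and it should start with `https://`. An empty list or several conflicting entries would be worth investigating before blaming the search engine.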

dstiles

6:46 pm on Jun 30, 2015 (gmt 0)




Are you absolutely sure there is no way to get the http version? Specifically, by some kind of non-browser software such as wget or a specific type of bot?

If you have an http version of the site AT ALL it's possible to access it unless the 301 is absolutely guaranteed.

webcentric

7:45 pm on Jun 30, 2015 (gmt 0)




Well, these days, I'm absolutely certain of nothing! Having said that, this is an IIS site, but the concepts are the same as for a Linux site. Essentially, every request is tested to determine whether it's a secure request or not. In IIS this is done in Application_BeginRequest. All non-https requests are permanently (i.e. 301) redirected to the https version. I haven't tested this with anything other than the usual browsers, but I can't understand why the user agent would matter. This is application-level redirecting. I've done the same for www to non-www, which seems to be working perfectly.
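
The per-request test described above can be sketched in a few lines. This is a Python illustration of the decision logic only, not the poster's actual IIS code: the function name `canonical_redirect` is invented, and folding the https and non-www rules into a single 301 hop is an assumption on my part.

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_redirect(url):
    """Return the 301 target for url, or None if no redirect is needed.

    Assumptions (not from the thread): both rules are applied in one hop,
    default ports are used, and the path and query string are preserved.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Rule 1: www -> non-www.
    bare_host = host[4:] if host.startswith("www.") else host
    # Rule 2: http -> https. If both rules are already satisfied, serve the page.
    if parts.scheme == "https" and bare_host == host:
        return None
    return urlunsplit(("https", bare_host, parts.path or "/", parts.query, ""))


# example.com is a placeholder host.
for candidate in (
    "http://www.example.com/page?id=1",
    "http://example.com",
    "https://example.com/page",
):
    print(candidate, "->", canonical_redirect(candidate))
```

The point of the sketch is that nothing in the decision looks at the user agent, so a crawler should get the same 301 as a browser as long as every request really passes through this check.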

dstiles

7:35 pm on Jul 1, 2015 (gmt 0)




I manage an IIS server (Windows 2012) and it's a sod to get right.

Selection of http/https is "approved" by a higher level than IIS itself. When it comes to IIS, you have to make sure the bindings are correct. If you have both http AND https in the binding then technically you can receive both. If you remove http then that should fix any problems. The trouble then is: if someone comes in with http, it will not be redirected.

I do not know all the ins and outs of this: I read and tinker until I get what I need.

My point about wget (and curl and others) is that they do not necessarily redirect automatically as a browser would.
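
One way to see exactly what the server sends to a client that does not follow redirects is to make a single raw request and look at the status line and Location header. A minimal Python sketch, assuming the standard library's `http.client` (which never follows redirects on its own); the function names and the User-Agent string are made up for this example:

```python
import http.client


def fetch_without_following(host, path="/", port=80, timeout=10):
    """Send one plain-HTTP GET and return (status, Location header or None)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Hypothetical User-Agent; swap in any string you want to test with.
        conn.request("GET", path, headers={"User-Agent": "redirect-check/0.1"})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def is_permanent_https_redirect(status, location):
    """True if the response is a permanent redirect pointing at an https URL."""
    return status in (301, 308) and (location or "").lower().startswith("https://")


# Usage (placeholder host, not run here because it needs network access):
# status, location = fetch_without_following("example.com", "/some-page")
# print(status, location, is_permanent_https_redirect(status, location))
```

If this reports a 200 for the http URL, or a 302 instead of a 301, then some requests are reaching content without the permanent redirect.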

lucy24

8:16 pm on Jul 1, 2015 (gmt 0)




they do not necessarily redirect automatically as a browser would

They don't need to, do they? All that matters is that you, at the server end, sent out the 301 instead of the requested content. If they choose not to follow the redirect (major search engines generally do) that's their lookout.

:: detour to double-check something ::

Yup, there's a "fetch as bingbot" feature. What happens when you request some of those URLs they've been indexing? (I don't have https so I experimented by requesting the without-www form of a page on a site that's only listed as with-www. It came back as a 301, not as a "What the ### are you talking about? You don't own that site!")

webcentric

3:08 pm on Jul 2, 2015 (gmt 0)




Thanks Lucy. Fetch as Bingbot does return the 301 properly. I also think the http version has been removed, as I can't find it today. I also see that the page 1 results have changed (yesterday it showed 8 results but today it lists 14), so something seems to be in a state of flux over there.

I'm beginning to think that this issue is related to the fact that the site was up with just a simple home page for the first year of its existence and wasn't on https. I made the change to https late last year, I think, and started working on the site in earnest. I'm guessing that the original home page URL was just a piece of residual data in the index. Anyway, all appears well now, but I'll watch for a while to see if any other anomalies crop up.

webcentric

3:46 pm on Jul 6, 2015 (gmt 0)




And now the http version is back again. I'd like to make a removal request, but I'm worried they'll just remove both versions despite what I tell them. Redirection is working, so I don't know what else to do. Frustrating!