| 10:13 pm on May 18, 2009 (gmt 0)|
Hello j1mms, and welcome to the forums.
Sounds like a nasty experience. Have you looked for all the backlinks that point to the https versions and notified those sites about the change? Relying solely on 301 redirects is not always the best approach.
Also, do you have any other 301 redirects in place, for example for canonical purposes or expired pages? You want to avoid long chains of redirects, since they can take Google longer to verify as not being manipulative.
If you don't already have one, an XML sitemap could also help. Was your homepage indexed as https?
| 10:00 am on May 19, 2009 (gmt 0)|
Thanks - we added 301s (direct from https to http), new canonical tags, blocked the https URLs with meta robots, and added an XML sitemap.
We cannot find any backlinks pointing at the https versions, as we believe this was a short-term onsite error.
As far as long chains of redirects go, we don't have issues there either - just the one redirect, which WMT flagged against the sitemap.
Is this issue related to the SSL certificate expiring, or to duplicate content?
Now that the indexed https pages have dropped out of the index, does anyone know how long it might take Google to start listing the old http pages once again?
Is there anything further we can do? Xenu seems to look OK onsite.
Yes, the homepage was indexed as https, and it is still showing at p1 for company-name searches.
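[Editor's note: a direct https-to-http 301 of the kind described above is usually done at the server level. A minimal sketch for Apache, assuming mod_rewrite is enabled and using the hypothetical domain www.example.com - not necessarily how this poster implemented it:]

```apache
RewriteEngine On
# Match any request that arrived over SSL...
RewriteCond %{HTTPS} on
# ...and permanently redirect it to the same path on plain http
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]
```

[The R=301 flag makes the redirect permanent so search engines transfer the listing to the http URL rather than keeping both, and L stops further rule processing for that request.]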
| 2:34 pm on May 19, 2009 (gmt 0)|
|Thanks - we added 301s (direct from https to http), new canonical tags, blocked the https URLs with meta robots, and added an XML sitemap. |
If you have added 301s and canonical tags on the https pages, then you should not also block the https pages through robots.txt. You should let Google come through and sort itself out by encountering the canonicals and 301s on the https versions of the pages.
| 2:49 pm on May 19, 2009 (gmt 0)|
Hi - Interesting!
Wouldn't the 301s trigger first in any case? Also, I would have thought the canonical would have been detected at the same time the robots was.
| 6:46 pm on May 19, 2009 (gmt 0)|
The 301 will trigger if Googlebot requests the https URL - but once Googlebot has indexed the new robots.txt, it will not even request the URL, so it won't see the redirect.
Are you watching Googlebot's interactions with your server in your server logs? There might be some clues there.
| 9:32 am on May 20, 2009 (gmt 0)|
We set a meta robots tag on each https page rather than using the root-level robots.txt.
We have now removed this to see if it helps, and will keep an eye on Googlebot's spidering.
Has anyone else had such issues with a removed/expired SSL certificate?
Interestingly, most phrases sit neatly just outside the top 100, as though +100 has been applied to all our rankings.
| 3:52 pm on May 20, 2009 (gmt 0)|
I am not sure what the effect on your ranking will be once Google drops the https pages from its index; however, dropping the pages from Google's index will take some time.
It may be useful to know that in our case it took about three months for Google to drop roughly 3,000 URLs from its index (we removed sort parameters from the URLs and 301-redirected URLs with permuted sort parameters to the equivalent URLs without them). But it will happen eventually.
| 4:59 pm on May 20, 2009 (gmt 0)|
For reference, here's a thread about the same issue - with some technical ideas for both Windows and Apache servers:
Removing https pages from Google with robots.txt [webmasterworld.com]
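[Editor's note: the Apache-side technique usually discussed for that problem is to serve a separate, more restrictive robots.txt only on https requests, leaving the http robots.txt untouched. A sketch, assuming mod_rewrite and a hypothetical file named robots_ssl.txt:]

```apache
RewriteEngine On
# Only rewrite when the request came in over SSL
RewriteCond %{HTTPS} on
# Hand https crawlers a separate robots file that disallows everything
RewriteRule ^robots\.txt$ /robots_ssl.txt [L]
```

[Here robots_ssl.txt would contain a blanket "User-agent: *" / "Disallow: /". Note the caveat raised earlier in this thread: blocking https crawling this way also prevents Googlebot from ever seeing the 301s on those URLs.]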
| 8:02 pm on May 20, 2009 (gmt 0)|
I may be misinterpreting what you've done, but this comment jumped out at me....
|We set a meta robots on each https page rather than the root level robots.txt |
I'm not quite understanding how you separated the https pages from the http pages to do this... and particularly whether you put the meta robots tag just on the pages that should be secure (like login and checkout pages), or whether you're talking about product pages that inadvertently became indexed as https.
On the "900 https site product and category pages" you mention, the pages are the same. What happens with an https canonical issue is that the same pages are seen under multiple URLs: references to the http pages cause them to be indexed as https as well, combined with a server setup that hasn't canonicalized these pages as http.
The 301 redirect, if properly done, should stop the https indexing problem... but, if you've also put the meta robots tag on all of those 900 pages, you're likely to have another problem....
|Now that the indexed https pages have dropped out of the index, does anyone know how long it might take google to start listing the old http pages once again? |
If you've got meta robots noindex on all those pages, Google won't reindex them. It will drop them all, regardless of whether they're https or http.
Again, I may be misinterpreting what you've done, but the above is a possible scenario that occurred to me.
| 8:34 am on May 21, 2009 (gmt 0)|
Thanks - I've tried to clarify a bit below.
We set up detection for whether the https protocol was requested; if so, we set an on-page meta robots tag to block indexing for https requests.
There have never been any meta robots on any http page.
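[Editor's note: one way to express that kind of protocol-conditional noindex entirely in server config, without touching the page templates, is Google's X-Robots-Tag response header, which is treated like an on-page meta robots tag. A sketch assuming Apache with mod_rewrite and mod_headers - not necessarily what this poster did:]

```apache
RewriteEngine On
# Flag requests that arrived over SSL by setting an environment variable
RewriteCond %{HTTPS} on
RewriteRule .* - [E=IS_SSL:1]
# Send the noindex signal only on flagged (https) responses
Header set X-Robots-Tag "noindex, follow" env=IS_SSL
```

[Because the header is keyed to the protocol of each individual request, the same files can keep serving http requests without any noindex signal.]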
| 4:09 pm on May 21, 2009 (gmt 0)|
|We set up detection for whether the https protocol was requested; if so, we set an on-page meta robots tag to block indexing for https requests. There have never been any meta robots on any http page. |
If you've somehow managed to set up two sets of dedicated files, one serving https requests and another serving http requests, this would make sense.
But when one of your "http pages" has received an https request, adding the noindex robots meta tag to that page is going to block that page from being indexed under http as well, since there would be only one file serving both protocols.
Again, I may be misinterpreting your description, but you are indicating that your "old http pages" haven't gotten listed again... and I believe this is the reason why.