Forum Moderators: Robert Charlton & goodroi


Google Remove Tool & https

         

Ellio

7:10 pm on Mar 16, 2006 (gmt 0)

10+ Year Member



Can you use the Google removal tool to get rid of:
[widgets.com...]

without affecting

[widgets.com?...]

I suspect you can, as it uses full absolute URLs, but I am fearful that it may disregard the protocol (http vs. https) when logging the pages for exclusion over the next six months!

What's the opinion out there?

Ellio

10:23 am on Mar 17, 2006 (gmt 0)

10+ Year Member



Looks like nobody has any idea or opinion....

McClaw

2:46 pm on Mar 17, 2006 (gmt 0)

10+ Year Member



I think it would be safer to do a 301 redirect from the https version to the http version.

Ellio

4:09 pm on Mar 17, 2006 (gmt 0)

10+ Year Member



As far as I can tell, you cannot do a 301 from https to http on a Windows server using IIS.

lammert

4:27 pm on Mar 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The Google URL removal tool treats widgets.com and www.widgets.com the same, so I wouldn't try it on https vs. http. There is a large chance that your whole http:// site disappears for six months.

tedster

8:23 pm on Mar 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's one thing Google has to say:

...if you serve content via both http and https, you'll need a separate robots.txt file for each of these protocols. For example, to allow Googlebot to index all http pages but no https pages, you'd use the robots.txt files below.

For your http protocol (http://yourserver.com/robots.txt):
User-agent: *
Allow: /

For the https protocol (https://yourserver.com/robots.txt):
User-agent: *
Disallow: /

[google.com...]

To use the removal tool, you must also have a robots.txt in place. So if your https: removal request sends the bot to http: by some accident, the bot would NOT see the blocking robots.txt -- so theoretically the http: URLs would be safe.

Warning - I have not tested this in the real world.
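As a rough illustration of how a crawler would apply those two files (using the hypothetical yourserver.com from Google's example), Python's standard-library robots.txt parser can evaluate each protocol's rules separately:

```python
from urllib.robotparser import RobotFileParser

# robots.txt served on the http side: allow everything
http_rules = RobotFileParser()
http_rules.parse(["User-agent: *", "Allow: /"])

# robots.txt served on the https side: block everything
https_rules = RobotFileParser()
https_rules.parse(["User-agent: *", "Disallow: /"])

print(http_rules.can_fetch("*", "http://yourserver.com/page.html"))    # True
print(https_rules.can_fetch("*", "https://yourserver.com/page.html"))  # False
```

The point is that each protocol is judged only by the robots.txt served on that protocol -- which is why the setup needs the server to hand out a different file for http and https in the first place.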

Nick0r

9:13 pm on Mar 17, 2006 (gmt 0)

10+ Year Member



If I were you, I wouldn't touch the URL removal tool whatsoever -- I've had nothing but a horrible experience with it. Don't let the six months fool you either; it can last a lot longer than that.

tedster

9:27 pm on Mar 17, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I agree -- the dual robots.txt setup alone, without a removal request, is what I would try first.

Ellio

11:30 pm on Mar 17, 2006 (gmt 0)

10+ Year Member



You cannot have dual robots.txt files for http and https when using a Windows server via IIS.

Correct me if I am wrong, but I have asked everywhere and have yet to get an answer that is workable on a basic HTML site.

Also, you could use the Google removal tool without a robots.txt block, provided the https version of the page returns a 404 error -- say, by removing the SSL cert.
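If you go that route, it may be worth checking the status code of the exact URL before submitting anything to the removal tool. A minimal Python sketch (the helper names and the 404-only rule are my own assumptions based on this thread, not documented Google behaviour):

```python
import http.client
from urllib.parse import urlsplit

def safe_to_submit(status: int) -> bool:
    """Per the thread's working theory: only submit a URL to the
    removal tool if that exact URL returns a 404."""
    return status == 404

def fetch_status(url: str) -> int:
    """Fetch just the status code of a URL with a HEAD request."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        return conn.getresponse().status
    finally:
        conn.close()

# Pure-logic check, no network needed:
print(safe_to_submit(404))  # True  -> OK to request removal
print(safe_to_submit(200))  # False -> live page; removal could hurt
```

A live page, or even a redirect, would fail this check -- only a clean 404 on the exact URL passes.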

McClaw

6:45 am on Mar 20, 2006 (gmt 0)

10+ Year Member



Do you have a dynamic site? asp / aspx?

If so, you can redirect each page using code.

(Probably a lot of work, but worth it in the long run.)

All you need to do is create the https version of the site as a completely separate site, duplicate the pages, and add only the redirect code in the pages.

I'm assuming that you don't want to use the https version again.

idolw

6:53 am on Mar 20, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Ellio, did you find your https:// pages in the SERPs out of the blue?
We have noticed it on one of our sites. It is back to http:// now.
It is quite strange, as there is no link to the https:// page from anywhere.
This leads to a suspicion that someone could have entered the https:// page by mistake with the toolbar installed.

Ellio

9:44 am on Mar 20, 2006 (gmt 0)

10+ Year Member



Do you have a dynamic site? asp / aspx?
If so, you can redirect each page using code.

(Probably a lot of work, but worth it in the long run.)

All you need to do is create the https version of the site as a completely separate site, duplicate the pages, and add only the redirect code in the pages.

I'm assuming that you don't want to use the https version again.

We do not have a dynamic site.
Basic HTML plus JavaScript & a few ASP pages.

Ellio

9:52 am on Mar 20, 2006 (gmt 0)

10+ Year Member



Ellio, did you find your https:// pages in the SERPs out of the blue?
We have noticed it on one of our sites. It is back to http:// now.
It is quite strange, as there is no link to the https:// page from anywhere.
This leads to a suspicion that someone could have entered the https:// page by mistake with the toolbar installed.

I noticed it when doing a site: search, as the http index page (and others) had been replaced by the https version.

I have worked out how it happened and corrected the error. We had secure forms on the same domain, but they have now been transferred to a related domain; the SSL cert has been removed and a new SSL cert installed on the related domain.

The problem was with relative linking. There were no direct links to the https index page, but there was a single relative /page.html link on the form pages, and in turn that page had a relative /index.html link that the robot then sees as [mysite.co.uk...]

Good reason for ONLY using absolute linking. All our links have been changed to absolute, but we are still waiting for the https pages to go and the http pages to return.
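The mechanism described above is easy to reproduce: a root-relative link inherits the scheme of the page it sits on, while an absolute link does not. A quick demonstration with Python's urljoin, using a hypothetical www.example.com in place of the real domain:

```python
from urllib.parse import urljoin

# The same relative link resolves to a different protocol
# depending on which version of the page the crawler is on.
print(urljoin("http://www.example.com/form.html", "/index.html"))
# -> http://www.example.com/index.html
print(urljoin("https://www.example.com/form.html", "/index.html"))
# -> https://www.example.com/index.html

# An absolute link is immune to this:
print(urljoin("https://www.example.com/form.html",
              "http://www.example.com/index.html"))
# -> http://www.example.com/index.html
```

So one relative link reachable from an https form page is enough to give the crawler an https entry point into the whole site.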

Luckily the problem occurred in the default index and not Big Daddy.

Ellio

12:02 pm on Mar 20, 2006 (gmt 0)

10+ Year Member



Google have finally removed the https versions of the pages and restored the http index page. Ranking, however, has not returned yet.

About half the site is now URL-only, but that may change at the next crawl.

Ellio

11:14 am on Mar 21, 2006 (gmt 0)

10+ Year Member



Update for all those who were following this:

Google help emailed to say that they had removed the offending https pages as requested.

They confirmed that the http page versions would not be affected, as those pages did not return a 404 error.

Sounds like it's safe to use the removal tool provided ONLY the exact URL being removed returns a 404 error.

I would not rely on Google remove/robots.txt for this.