
Google is indexing some of my pages as "https"

We don't use SSL or https, where is this coming from?

         

maximillianos

5:31 am on May 11, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I noticed this evening that G was showing a result to one of our newer pages as "https://...".

We don't use https anywhere on our site or SSL for that matter. How would Google begin to index our site this way? Do they just start looking on both http and https to see what is there?

My problem is this. I have no idea how long this has been going on. I have no idea if I have any inbound links from big sites out there that might have accidentally used the "https" prefix. Is there any way I can run a search to find any inbound https links to my site?

I figured out how to both redirect https to http and disable https altogether, but I am not sure which I should do at this point. If I am not sure what is out there, is it better to 301 all https requests to http? The one problem with this is that browsers pop up a security warning/certificate error, since we don't have a valid SSL certificate.

Right now I have simply commented out the "listen 443" from the ssl.conf file and that seems to have disabled any https requests.

Any advice on how to handle this? Should I not worry about any old errant https inbound links and just turn it off to avoid dup content problems?

Thanks for any advice!

lammert

10:19 am on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Pointing links at your site with https:// is one of the methods competitors can use to try to generate duplicate content for your site and make your rankings tank.

Disabling the response on port 443 as you have done now should be sufficient. The search engines won't find your content anymore and these URLs will gradually be removed from their index.
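
For reference, the change described above might look like the following in ssl.conf. This is only a sketch: the file location and the surrounding directives vary by distribution, and the commented-out directive is the only part taken from this thread.

```apache
# ssl.conf (location varies by distribution)
# With this directive commented out, Apache no longer binds to port 443,
# so https:// requests are refused at the connection level.
#Listen 443
```

Apache needs to be restarted for the change to take effect.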

maximillianos

4:55 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I had a remote hunch it might be related to a competitor. We've had quite a few problems over the past year or two with a competitor trying to hurt our site.

I'm glad my first instinct, to disable port 443 and shut it down altogether, was the right one.

Interestingly we also have been seeing tons of indexed pages using our IP address instead of our domain name. I wonder if it is related? In the IP case we ended up doing a 301 redirect for all requests using the IP to move them to the domain name.

Do you think that is a sufficient solution?

Thanks for the response lammert!

bwnbwn

5:13 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



maximillianos, the IP should never resolve to the domain's site. This is an error in how the domain was set up on the server, and it is one that gets exploited as well.
This needs to be fixed at the server so that it cannot happen at all.

maximillianos

5:21 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks bwnbwn! I'll look into fixing it asap. I appreciate the advice!

So far the only solution I've found is to 301 the IP requests to the domain name. Do you know offhand how to tell Apache to ignore IP requests? Would it be done through a rewrite condition, or in the server config?

Thanks!

lammert

5:43 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The IP address of your site showing up in the search index is the same type of attack as the https:// version showing up in the index. The problem is that a 301 redirect from the IP address to the real domain name is not enough to stop all attacks. The only real solution is changing your server configuration in such a way that your site is only served for example.com or www.example.com, but not for anything else.

The easiest way to do this (assuming Apache) is to switch on name-based virtual hosting and create two <VirtualHost> containers. The first <VirtualHost> functions as a catch-all for everything which is not your domain name. This <VirtualHost> can for example return an empty page or a 403 Forbidden error for every request. The second <VirtualHost> contains the definitions of your site with a ServerName example.com and ServerAlias www.example.com line.

With this setup you will not only be immune to IP address attacks, but also to attacks where your competitor registers throw-away domains and points them all at your IP address.
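
A minimal sketch of the two-container layout described above, assuming Apache with mod_alias loaded. The catch-all ServerName and the DocumentRoot path are placeholders, not values from this thread.

```apache
# Needed on Apache 2.2; Apache 2.4 does name-based matching without it.
NameVirtualHost *:80

# Listed first, so it is the default for any request whose Host header
# matches no other container: the bare IP address, junk domains, etc.
<VirtualHost *:80>
    # Placeholder name; it only needs to differ from the real site's names.
    ServerName catchall.invalid
    # Answer every path with 403 Forbidden (mod_alias; no target URL is
    # given because the status is not a 3xx redirect).
    Redirect 403 /
</VirtualHost>

# The real site, served only for these two host names.
<VirtualHost *:80>
    ServerName example.com
    ServerAlias www.example.com
    # Hypothetical path; use the site's actual document root.
    DocumentRoot /var/www/example
</VirtualHost>
```

The order matters: when no ServerName or ServerAlias matches, Apache falls back to whichever container is listed first for that address and port.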

maximillianos

5:46 pm on May 12, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Great advice lammert. Thanks. Off to re-configure my Apache now.

maximillianos

4:41 am on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Finally got it figured out! I had to move all my default server config settings into a virtual host in order to be able to disable my default site from coming up when requested by IP.

I was already using virtual hosts for a few other sites, but my main site was configured up top in my conf file. After I moved all that down in a virtual host, it worked like a charm.

I might point a junk domain at it just to test and make sure that portion works.

Thanks again everyone!

maximillianos

1:45 pm on May 13, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Okay, my wife brought up an interesting point. If I missed these two holes where external sources could harm my rankings, are there any other domain/ip tactics that someone could exploit? What other glaring holes am I missing? =)

I still can't believe I didn't think of how someone could point junk domains at my IP. That one is so obvious, it just never occurred to me.

BradleyT

7:25 pm on May 13, 2010 (gmt 0)

10+ Year Member



I don't know if you actually "fixed" it. We've had the exact same thing happening over the past few weeks and it seems to have corrected itself. Google even chose an https page for a sitelink, even though they had been using the http version of that page for 2+ years. Just 1 out of the 8 was listed as https.

I also moved our site to a new IP address in early April and Google was showing our domain URL listings as #1-2 and our old IP URLs in results #5-#10.

For the most part everything has cleaned itself up this week.

lammert

2:03 am on May 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can fix this issue on the website side, not on the Google side. What maximillianos has done by only serving the site for legitimate requests on the domain name is the maximum possible. Google's algorithm may even then screw things up from time to time--especially during site moves between servers--but those issues should not have long lasting consequences.

g1smd

7:50 am on May 14, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



As well as the http/https issue and the www/no-www and the www/IP duplicate content issues, also be aware of:
- appended port number: www.example.com:80/
- appended period on host name: www.example.com./
- CapItaliSation issues: example.com/page vs. example.com/pAGE (not usually an issue with Apache)
- appended query strings: www.example.com?added-junk
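
Several of the host-name variants in this list can be caught with one mod_rewrite rule. The sketch below assumes www.example.com is the canonical host name and that mod_rewrite is enabled; it does nothing about junk query strings or path capitalisation, which need separate handling.

```apache
RewriteEngine On
# Fire when the Host header is non-empty and is anything other than exactly
# www.example.com: the bare domain, a trailing period, an appended :80, etc.
# The "?" lets an empty Host header through, so HTTP/1.0 clients that
# send no Host header are not redirected in a loop.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
# REQUEST_URI keeps the requested path; the query string is carried over
# automatically.
RewriteRule ^ http://www.example.com%{REQUEST_URI} [R=301,L]
```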

Robert Charlton

7:13 pm on May 15, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Chances are that if you have https issues, no matter who's causing it, you have other dupe content issues as well.

I suggest you take a look at the Hot Topics [webmasterworld.com] area in the Google SEO forum, pinned to the top of the forum home page, and look at the threads in the "Duplicate Content" section.

If you're running Apache, also check out the Apache forum for more specific code suggestions. This thread is probably the definitive guide on the subject....

A guide to fixing duplicate content & URL issues on Apache
How to canonicalize all of your URLs with a single redirect
http://www.webmasterworld.com/apache/3208525.htm

Sgt_Kickaxe

10:33 am on May 17, 2010 (gmt 0)



I've seen errors like this before. Google is great at sorting them out quickly, so I wouldn't worry about it just yet. My advice would be to review your code for a possible cause and then wait it out.

My favorite funny Google listing, which seems to happen to me a bunch, is when a site I launch has had a previous domain owner. My site's index page shows up under the www version and the previous owner's site shows up for the non-www version, DESPITE not having been online for 2 years or more. It only lasts a day or so, but it's amusing to see. It also tells me Google doesn't forget anything.

maximillianos

11:54 pm on May 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I actually switched back to doing 301 redirects for both my IP and https requests. The problem was (I discovered) Google had indexed so many pages via my IP and https that by blocking them, I was losing a lot of traffic. So I went back and re-directed them and my traffic went back up to normal levels.

Hopefully now that they are 301'd to my main domain, G will sort out the rest in the coming months.
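
A sketch of the redirect variant for plain-http requests, assuming www.example.com is the canonical host name and that this container is listed before the real site's container. It is a catch-all that redirects requests instead of blocking them. Redirecting https:// requests the same way would also need a container on port 443 with SSL configured, which is not shown here.

```apache
# Default container: any request whose Host header matches no other
# container (bare IP address, unknown domains) lands here.
<VirtualHost *:80>
    # Placeholder name; it only needs to differ from the real site's names.
    ServerName catchall.invalid
    # Send a 301 to the same path on the canonical host (mod_alias).
    Redirect 301 / http://www.example.com/
</VirtualHost>
```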