Redundant hostnames notification.

How do you track this down?

         

Broadway

3:29 pm on Mar 28, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I received a redundant hostnames notification via my Google Analytics account.

It says I'm getting traffic from both:
www.example.com
example.com

My preferred form is http://www.example.com

I checked, and that is what I have declared in my Webmaster Tools account. (That was done years and years ago.)

Also done years ago is this directive in my .htaccess file:
# To redirect all users to access the site WITH the 'www.' prefix,
# (http://example.com/... will be redirected to http://www.example.com/...)
# uncomment the following:
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule ^ http%{ENV:protossl}://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

I've checked, the above redirect works.

Any new pages on my site are created through Drupal. On those pages I link using only relative URLs:
href="page.html"
href="../page.html"
I never use full URLs, i.e. href="http://www.example.com/page.html"

How can I track this problem down?
Just finding out which pages are being accessed in the wrong form
http://example.com/page.html
would be a giant help.

lucy24

8:08 pm on Mar 28, 2015 (gmt 0)




Eeuw, what a strangely worded rule. It sounds as if you've only got one site. The ordinary form is
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

If different parts of the same site use either https or http, you may need to go a little fancier. But in general, try to stick with literal text in the target. The (blahblah)? part takes care of your first condition, so it's "exactly my preferred form or exactly nothing". Then if you get a wonky request with appended port number, that's taken care of too.
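To see what that condition does with different Host values, here is a rough sketch in Python, whose regex syntax agrees with Apache's for a pattern this simple. The function name needs_redirect is made up for illustration, and www.example.com is the same placeholder hostname used in the rule.

```python
import re

# Same pattern as in the RewriteCond above (placeholder hostname).
PREFERRED = re.compile(r"^(www\.example\.com)?$")

def needs_redirect(host_header: str) -> bool:
    """Mirror the negated RewriteCond: redirect unless the Host value is
    exactly the preferred form or exactly empty."""
    return PREFERRED.search(host_header) is None

# Hypothetical Host values, including one with an appended port number.
for host in ["www.example.com", "", "example.com", "www.example.com:80"]:
    print(repr(host), "redirect" if needs_redirect(host) else "leave alone")
```

Only the first two values are left alone; the bare domain and the request with a port number both get sent to the preferred form.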

If both forms of your hostname are on Webmaster Tools, you should be able to look up each form's links separately. Unless they've gone and changed something. You can also check your raw logs for this pattern (assuming ordinary Apache format):
^(\d+\.\d+\.\d+\.\d+) - - \S+ -0[78]00\] "GET (\S+) HTTP/1\.[01]" 30[12].+\n\1 - - \S+ -0[78]00\] "GET \2 HTTP/1\.[01]" 200

Translation: Consecutive requests from the same IP for the same file, where the first receives a redirect. If you've got a busy site this code will need to be tweaked to allow for intervening lines. Further tweaks let you constrain the code to requests for pages. (I did a quick check to make sure there were no typos, and was interested to find that the bingbot loves requesting robots.txt from the "wrong" form of my name.)
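If a text-editor regex is awkward on a busy log, the same idea can be sketched as a small Python script that tolerates intervening lines. This assumes the ordinary Apache common or combined log format; the function name and the window parameter are made up for illustration, and the pattern would need adjusting for a custom log format.

```python
import re
from collections import deque

# Assumed layout: IP, two fields, [timestamp], "GET path HTTP/1.x", status
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "GET (\S+) HTTP/1\.[01]" (\d{3})')

def redirected_then_served(lines, window=50):
    """Yield (ip, path) when a 301/302 is followed, within `window` log
    lines, by a 200 for the same IP and the same path."""
    pending = deque()  # (line_number, ip, path) for recent redirects
    for n, line in enumerate(lines):
        m = LINE.match(line)
        if not m:
            continue
        ip, path, status = m.group(1), m.group(2), m.group(3)
        # Forget redirects that are too far back to be the same visit.
        while pending and n - pending[0][0] > window:
            pending.popleft()
        if status in ("301", "302"):
            pending.append((n, ip, path))
        elif status == "200":
            hit = next((p for p in pending if p[1] == ip and p[2] == path), None)
            if hit is not None:
                pending.remove(hit)
                yield ip, path

if __name__ == "__main__":
    import sys
    # Usage: python script.py access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ip, path in redirected_then_served(log):
            print(ip, path)
```

Because the path has to be identical before and after the redirect, this mostly catches hostname redirects and skips things like a missing trailing slash, where the path itself changes.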

If you log headers, you can also look at the hostname that way. But then you have to plow through the unwanted robots; you're probably only interested in the requests that end up with a 200.

Broadway

3:21 am on Mar 29, 2015 (gmt 0)




Thanks Lucy.
The non-to-www code I'm using is the out-of-the-box code in the .htaccess file for new installs of Drupal 7.
Most of what's in .htaccess is Greek to me, so I've just used what they've provided.
My site doesn't use https.
I can say, however, that I've been using that code for what I assume is over two years, and I've never gotten a redundant hostnames notice from Google before.

[edited by: Broadway at 3:57 am (utc) on Mar 29, 2015]

Broadway

3:25 am on Mar 29, 2015 (gmt 0)




I've just double-checked. In Webmaster Tools, the Home page that lists all of your websites shows only the www.example.com forms. It doesn't show any non-www website information.

Broadway

4:21 am on Mar 29, 2015 (gmt 0)




I've spent some time in a log file (apache), really only to get confused.
Right now I'm having a hard time understanding what the Redundant Hostnames notice is telling me.
I think it's telling me that my server serves pages whether they are called via:
http://www.example.com
or
http://example.com
So, isn't that difficulty just handled and resolved by the directive in the .htaccess file?
If yes, then that's what I need to work on.
If no, what other situations could lead to this same server behavior?

lucy24

8:09 am on Mar 29, 2015 (gmt 0)




In webmaster tools you have to add each hostname separately. So that means two versions of each of your sites, with and without www. Paradoxically, you have to set up both versions in order to tell them you only want to use one of them. And yes, you have to do this even if all googlebot requests are already getting redirected.

If you have an existing redirect that sends everyone to your preferred form (you can easily check this by typing in your "other" name for a few randomly chosen URLs on your site), then I don't know what they're on about. Except...

"Redundant"? What an odd word. Maybe they mean something entirely different. Wouldn't be the first time. (Witness the revelation that, in googlespeak, "minify" means "compress".)

But start by adding all your unwanted duplicates to WMT.
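The spot check mentioned above (typing in the "other" name for a few URLs) can also be scripted. This is only a sketch: the function names are invented, example.com stands in for the real domain, and it assumes plain http since the site doesn't use https. The check() function needs network access; the judging logic is kept separate so it can be tried without one.

```python
import http.client
from urllib.parse import urlsplit

def is_canonical_redirect(status, location, preferred_host, path):
    """True if the response is a 301 to the same path on the preferred host."""
    if status != 301 or not location:
        return False
    target = urlsplit(location)
    return target.netloc == preferred_host and target.path == path

def check(other_host, preferred_host, path="/"):
    """Request `path` from `other_host` over plain HTTP. http.client does
    not follow redirects, so the 301 itself can be inspected."""
    conn = http.client.HTTPConnection(other_host, timeout=10)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return is_canonical_redirect(
            resp.status, resp.getheader("Location"), preferred_host, path
        )
    finally:
        conn.close()

# Example call (placeholder hostnames, requires network access):
# check("example.com", "www.example.com", "/page.html")
```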

RhinoFish

9:02 pm on Mar 30, 2015 (gmt 0)




Yep, what lucy24 said.

See it here too:
[productforums.google.com...]

And see the last paragraph here (not the note, the paragraph above it):
[support.google.com...]

G should make this, verifying alternatives, a part of a single WMT account. Then again, just think how complex that would get if you used both versions (www and non) for different purposes. :-)

Broadway

4:29 am on Apr 4, 2015 (gmt 0)




Ok thank you both. I've been working hard on getting this site converted to mobile friendly, so I haven't been back to this thread in a while. I'll do what you have mentioned and read the links.

It never crossed my mind to do this before, but here are the "details" Google Analytics gives me about this notification:

>>
Property http://www.example.com is receiving data from redundant hostnames. Some of the redundant hostnames are:

example.com
www.example.com

Redundant hostnames are counted as separate rows in reports, so hits that are going to the same page on your site from different hostnames will be split into multiple rows. With data split across multiple rows, traffic to specific pages will appear lower than it actually is.

To avoid this problem, consider setting up a 301 redirect from one of your redundant hostnames to the other, or create a search-and-replace filter that strips "www." from hostnames.

<<
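What the notice describes (one page's hits split across two hostname rows) and what the suggested "strip www." filter does about it can be illustrated with a small Python sketch. The function name, the row layout and the hit counts are all invented for illustration; this is not how Analytics itself stores or filters data.

```python
from collections import Counter

def merge_hostname_rows(rows):
    """rows: iterable of (hostname, page, hits). Returns total hits keyed
    by (hostname without a leading 'www.', page)."""
    totals = Counter()
    for hostname, page, hits in rows:
        host = hostname.lower()
        if host.startswith("www."):
            host = host[len("www."):]
        totals[(host, page)] += hits
    return totals

# Made-up report rows: the same page counted under two hostnames.
rows = [
    ("www.example.com", "/page.html", 900),
    ("example.com", "/page.html", 100),
]
print(merge_hostname_rows(rows)[("example.com", "/page.html")])  # prints 1000
```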

Broadway

2:58 am on Apr 7, 2015 (gmt 0)




I'm still way confused about this issue, but not overly alarmed.
The Google Analytics notification is now marked resolved.
(I'm not sure if I clicked the recheck button or they did it automatically.)

I have found out that Google Analytics has a HostNames report.

In that report I can see that the requests for non-www pages totalled 101, all on the same day.
And that type of event has only happened once (on that one day) in the past year.

I'm not sure what triggered that event but I'm assuming it's a one-off.

Broadway

3:13 am on Apr 7, 2015 (gmt 0)




And by the way, I added the non-www to WMT. It was verified without any problem. And when I went to the "site settings" to declare the www version as the preferred form, it already had that checked. Possibly because I already had a redirect non-to-www in place in .htaccess.

Broadway

3:22 am on Apr 7, 2015 (gmt 0)




In that Google Analytics HostNames report I did see a HostName that was unfamiliar and had no reason to be there.
I think if a domain is in that report it means that it is serving pages with your GA code in them.
The traffic involving that domain wasn't just one day (it was more like 4) but it did exactly overlap when I had the non-www traffic.
I Googled that domain and in the SERPs it said that the website might be hacked.
From the name, I'm assuming it was a mom-and-pop type website for a restaurant or something.
I Googled the domain's name with the term "malicious" and nothing really came up.
I am assuming the fact that they were hacked somehow affected me.
I'm assuming (hoping) that is past history now.

lucy24

6:08 am on Apr 7, 2015 (gmt 0)




Make sure your www redirect is optimally worded. Exact format will depend on your server, but conceptually it is always:
"If the requested hostname is anything other than this specific preferred form, then redirect to the right form".
As a bonus, this will also take care of rare horrors like evil sites temporarily pointing their DNS to your (physical) site.

RhinoFish

8:27 pm on Apr 7, 2015 (gmt 0)




If you're using htaccess / Apache to redirect, know that there are many super smart and helpful folks here at WebmasterWorld who can assist you, here:
[webmasterworld.com...]