Forum Moderators: Robert Charlton & goodroi


Google Search Console, Content Keywords listing the word "https"


Frank_Rizzo

10:08 am on Aug 21, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google WMT > Google Index > Content Keywords

1. Stats
2. Horse
.
.
.
6. https

Significance: 33%, Occurrences: 198,309

Why is the word 'https' being listed as the sixth most popular content keyword for the site? And is it an issue that needs fixing?

keyplyr

10:44 am on Aug 21, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This may or may not be related but if it is the site in your profile, I get an HTTPS Privacy Error (using Chrome) when trying to access the site:

NET::ERR_CERT_COMMON_NAME_INVALID

That would "need fixing."

keyplyr

12:11 pm on Aug 21, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Perhaps that error page is served enough times that HTTPS ranks #6.

Frank_Rizzo

2:46 pm on Aug 21, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No, that's not the site with the https issue. The link you clicked was to a profile page on another site; I had inadvertently set that link to https even though the site doesn't use https. I have now corrected that.

Back to the main issue on the Stats site. For some reason GWT takes the "https" in https://www.example.com and treats it as a keyword.

I can't find much on this by searching Google, but I did find a post stating that Google will be ditching that particular keywords feature in GWT. That will 'make the problem go away', but surely it would still be a problem somewhere, unless, of course, we sell https widgets.

aristotle

12:33 am on Aug 22, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've noticed that this list can include things from the source code of a page. For example, I just looked at a list that includes the words "blogspot" and "wordpress", evidently because the site has about a hundred outlinks, some of which point to pages on blogspot.com and wordpress.com. Those are in the source code but aren't anywhere in the visible text on any of the pages.
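A rough way to see this effect is to tokenize a page's markup two ways: visible text only, and attribute values such as href. The Python sketch below is a hypothetical illustration (the class name and the tokenize helper are made up for this example), not a description of how Google actually builds the Content Keywords report. It shows that an outlink domain like "blogspot" never appears in visible text but does appear once attributes are counted.

```python
import re
from collections import Counter
from html.parser import HTMLParser


def tokenize(text):
    """Naive tokenizer: lowercase runs of letters and digits."""
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]


class TokenCollector(HTMLParser):
    """Counts word tokens in visible text and, separately, in attribute values."""

    def __init__(self):
        super().__init__()
        self.visible = Counter()
        self.attributes = Counter()
        self._skip = 0  # depth inside <script>/<style>, whose contents are not visible

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        # Attribute values (href, src, alt, ...) are in the source but not the visible text
        for _name, value in attrs:
            if value:
                self.attributes.update(tokenize(value))

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.visible.update(tokenize(data))
```

Feeding it a page and comparing the two counters shows which "keywords" come only from the markup.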

keyplyr

12:56 am on Aug 22, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



"google will be ditching that particular keywords feature in GWT."
Doubtful. That feature is very useful for comparing the balance of on-page keywords against search terms. I use it a lot. But who knows with Google.

@aristotle - I've never seen this, so perhaps it is a bug, along with a couple of other bugs in GSC.

Nutterum

7:46 am on Aug 22, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



I had an issue with my contact form, where the code contained a list of all countries. Since many of those countries are islands, "Islands" was a top-5 keyword. So yeah, sometimes the bot just views your content and "ranks it" regardless of whether it is relevant.

Andy Langton

10:18 am on Aug 22, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What are the "top URLs" listed if you click on the keyword? Also, are you viewing stats for the http:// version in Search Console? If so, verify and check the HTTPS version instead.

Frank_Rizzo

10:10 am on Aug 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Interesting.

--------------------------
Google found the keyword https on these top pages:
Top URLs
sitemap.xml.gz
sitemap1.xml.gz
sitemap5.xml.gz
sitemap4.xml.gz
sitemap2.xml.gz
--------------------------

An example entry in the file:

<loc>https://www.example.com/blue-widgets</loc>
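If the sitemap files themselves were indexed and their text tokenized naively, every <loc> URL would contribute one "https" token. A minimal Python sketch, assuming the standard sitemaps.org namespace and a simple split on non-alphanumeric characters (not Google's actual tokenizer; the function name is hypothetical), makes that concrete:

```python
import gzip
import re
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def count_https_tokens(sitemap_bytes: bytes) -> tuple[int, int]:
    """Return (number of <loc> entries, number of "https" word tokens in them)."""
    # Handle .xml.gz files transparently by checking the gzip magic bytes
    if sitemap_bytes[:2] == b"\x1f\x8b":
        sitemap_bytes = gzip.decompress(sitemap_bytes)
    root = ET.fromstring(sitemap_bytes)
    locs = [el.text or "" for el in root.iter(SITEMAP_NS + "loc")]
    # Naive tokenizer: split each URL on anything that isn't a letter or digit
    tokens = [t.lower() for loc in locs for t in re.split(r"[^A-Za-z0-9]+", loc) if t]
    return len(locs), tokens.count("https")
```

With this tokenizer the "https" count equals the number of <loc> entries, so a sitemap set listing many URLs would produce a correspondingly large occurrence count.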

The sitemaps are autogenerated each night by an old Google Python script, which has this metadata:

name='sitemap_gen',
version='1.5',
description='Sitemap Generator',
license='BSD',
author='Google Inc.',
author_email='opensource at google.com',
url='http://code.google.com/p/sitemap-generators/',

The script works well and populates GWT, but clearly something is going awry, since GWT now thinks https is a keyword.

rainborick

4:12 pm on Aug 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I suspect Google has indexed your sitemap file. Block it in robots.txt and I bet you see those keywords disappear in a couple of weeks.

keyplyr

7:20 pm on Aug 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



It would be ill-advised to "block" the sitemap in robots.txt.

Andy Langton

7:23 pm on Aug 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You can noindex it via HTTP headers ;)
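To check that such a header is actually being sent, a quick Python sketch like the following could be used. The helper names are hypothetical, and the directive parsing is deliberately rough: it splits on commas, whitespace and colons so that values such as "googlebot: noindex" are also recognized, and it treats "none" as implying noindex.

```python
import re
import urllib.request


def has_noindex(header_values):
    """True if any X-Robots-Tag value carries a noindex (or none) directive."""
    for value in header_values:
        # Values look like "noindex", "noindex, nofollow" or "googlebot: noindex"
        tokens = re.split(r"[,\s:]+", value.lower())
        if "noindex" in tokens or "none" in tokens:
            return True
    return False


def fetch_robots_headers(url):
    """Send a HEAD request and return every X-Robots-Tag header value (hypothetical helper)."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get_all("X-Robots-Tag") or []
```

Usage, with the thread's placeholder domain: `has_noindex(fetch_robots_headers("https://www.example.com/sitemap.xml.gz"))`.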

rainborick

11:41 pm on Aug 25, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, Andy's idea about using the noindex HTTP header is much better.

Frank_Rizzo

9:52 am on Aug 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, I have added this:

<Files ~ "sitemap.*\.xml(\.gz)?$">
Header append X-Robots-Tag "noindex"
</Files>

and will see what happens.
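Since the <Files ~> block matches file names by regular expression, it can be worth checking which names the pattern actually catches. Python's re module handles this particular pattern the same way as Apache's regex engine, so a quick sketch (function name hypothetical) can test candidate file names against it:

```python
import re

# Same pattern as in the <Files ~ "..."> block; <Files> matches against the file name
SITEMAP_PATTERN = re.compile(r"sitemap.*\.xml(\.gz)?$")


def gets_noindex(filename: str) -> bool:
    """True if the <Files ~> pattern would match this file name."""
    return SITEMAP_PATTERN.search(filename) is not None
```

Note that the pattern is not anchored at the start, so a name such as old-sitemap.xml also matches; prefixing it with ^ would restrict it to names beginning with "sitemap".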

Andy Langton

10:23 am on Aug 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just to note a curiosity when specifically querying Google for gz files:

[google.co.uk...]

Google does seem to index sitemap GZ files, nonetheless:
[google.co.uk...]

trabis

4:25 pm on Aug 27, 2016 (gmt 0)

10+ Year Member Top Contributors Of The Month



I have a similar problem with keywords like "mail", "google", "yahoo", "reddit", "aol", "app" and many others that are not present in my page source. After searching, I was able to identify where these strings come from: an external JavaScript file loaded by a WordPress plugin called "AddToAny" (a social sharing plugin). I removed the "offending" plugin and am now waiting to see whether the keywords get dropped from Search Console.

If you search Google for "How share buttons could ruin your SEO – da Agency" you can read a complete description of this issue (if the site doesn't load, look at the Google cache version).

The site where this is/was happening is only 2 months old, so I will not be able to associate ranking changes with the removal of this plugin.

To consider:
- When I ask Google to show me the pages containing these keywords, it shows every page that loads this JavaScript file. Because the file is used on every page, every page on my site is listed as having these keywords.
- When I do a site search on google for a keyword like "reddit1234" it gives back 0 results as expected.
- When I do a site search on google for "reddit" or "aol" or any other weird keyword that I should not be ranking for, it gives back 1000 results which is about 20% of my indexed pages.
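A rough way to see which words an external script might be contributing is to pull out its quoted string literals and count the words inside them. This Python sketch is an assumption-laden diagnostic (hypothetical function name; a simple regex that ignores template literals and can be confused by quotes inside comments), not a JavaScript parser:

```python
import re
from collections import Counter

# A single- or double-quoted literal, allowing backslash escapes inside it
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1).)*)\1""")


def string_literal_tokens(js_source):
    """Count lowercase word tokens found inside quoted string literals."""
    counts = Counter()
    for match in STRING_LITERAL.finditer(js_source):
        counts.update(t.lower() for t in re.findall(r"[A-Za-z0-9]+", match.group(2)))
    return counts
```

Running it over a sharing script's source and looking at the most common tokens would give a shortlist to compare against the Content Keywords report.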

Frank_Rizzo

10:40 am on Oct 26, 2016 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



FYI: The

<Files ~ "sitemap.*\.xml(\.gz)?$">
Header append X-Robots-Tag "noindex"
</Files>


seems to have fixed it. https is no longer listed as a Content Keyword. It took a couple of months to go through but it has now gone.