
Forum Moderators: Robert Charlton & goodroi


Google Search Console, Content Keywords listing the word "https"

     
10:08 am on Aug 21, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1187
votes: 6


Google WMT > Google Index > Content Keywords

1. Stats
2. Horse
...
6. https

Significance: 33%, Occurrences: 198,309

Why is the word 'https' being listed as the sixth most popular content keyword for the site? And is it an issue that needs fixing?
10:44 am on Aug 21, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


This may or may not be related, but if it is the site in your profile, I get an HTTPS privacy error (using Chrome) when trying to access the site:

NET::ERR_CERT_COMMON_NAME_INVALID

That would "need fixing."
12:11 pm on Aug 21, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


Perhaps that error page is served often enough that HTTPS ranks #6.
2:46 pm on Aug 21, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1187
votes: 6


No, that's not the site with the https issue. The link you clicked went to a profile page on another site, where I had inadvertently set the link to https although the site is not served over https. I have now corrected that.

Back to the main issue on the Stats site. For some reason GWT thinks that https://www.example.com warrants a keyword entry for https.

I can't find much on this by searching Google. But I did find a post stating that Google will be ditching that particular keywords feature in GWT. That will 'make the problem go away', but surely it would still be a problem somewhere, unless of course we sell https widgets.
12:33 am on Aug 22, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member aristotle is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 4, 2008
posts:3485
votes: 310


I've noticed that this list can include things from the source code of a page. For example, I just looked at a list that includes the words "blogspot" and "wordpress", evidently because the site has about a hundred outlinks, some of which point to pages on blogspot.com and wordpress.com. Those are in the source code but aren't anywhere in the visible text on any of the pages.
12:56 am on Aug 22, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


"google will be ditching that particular keywords feature in GWT."
Doubtful. That feature is very useful for judging how on-page keyword frequency is balanced against search terms. I use it a lot. But who knows with Google.

@aristotle - I've never seen this so perhaps it is a bug, along with a couple other bugs in GSC.
7:46 am on Aug 22, 2016 (gmt 0)

Preferred Member from BG 

Top Contributors Of The Month

joined:Aug 11, 2014
posts:546
votes: 173


I had an issue with my contact form, where the code included a list of all countries. Since many of those countries are islands, "Islands" showed up as a top-5 keyword. So yeah, sometimes the bot just reads your content and "ranks" it regardless of whether it is relevant.
10:18 am on Aug 22, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


What are the "top URLs" listed if you click on the keyword? And are you viewing stats for the http:// version in Search Console? If so, verify the HTTPS version and check that instead.
10:10 am on Aug 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1187
votes: 6


Interesting.

--------------------------
Google found the keyword https on these top pages:
Top URLs
sitemap.xml.gz
sitemap1.xml.gz
sitemap5.xml.gz
sitemap4.xml.gz
sitemap2.xml.gz
--------------------------

An example entry in the file is this:

<loc>https://www.example.com/blue-widgets</loc>

The sitemaps are autogenerated each night by an old Google Python script, which carries this metadata:

name='sitemap_gen',
version='1.5',
description='Sitemap Generator',
license='BSD',
author='Google Inc.',
author_email='opensource at google.com',
url='http://code.google.com/p/sitemap-generators/',

The script works well and it populates GWT, but clearly something is going awry, in that GWT now thinks https is a keyword.
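
A rough sketch of why "https" could pile up if the sitemap files themselves are being read as content: every <loc> entry starts with the scheme, so a naive word count over the URLs puts it at or near the top. The sitemap fragment and the tokenising rule below are assumptions for illustration only; how Google actually extracts Content Keywords is not documented in this thread.

```python
import re
from collections import Counter

# Hypothetical sitemap fragment, modelled on the <loc> entry quoted above.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/blue-widgets</loc></url>
  <url><loc>https://www.example.com/red-widgets</loc></url>
  <url><loc>https://www.example.com/stats</loc></url>
</urlset>
"""


def loc_tokens(xml_text):
    """Return lowercase alphabetic tokens found inside <loc> elements."""
    tokens = []
    for url in re.findall(r"<loc>(.*?)</loc>", xml_text):
        # Assumed tokenisation: split the URL into runs of letters.
        tokens.extend(re.findall(r"[a-z]+", url.lower()))
    return tokens


counts = Counter(loc_tokens(SITEMAP_XML))
print(counts.most_common(5))
```

With five sitemap files full of such entries, a count like the 198,309 occurrences reported above would not be surprising.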
4:12 pm on Aug 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 15, 2003
posts:951
votes: 30


I suspect Google has indexed your sitemap file. Block it in robots.txt and I bet you see those keywords disappear in a couple of weeks.
7:20 pm on Aug 25, 2016 (gmt 0)

Moderator from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 891


It would be ill-advised to "block" the sitemap in robots.txt.
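
One likely reason, sketched with Python's standard urllib.robotparser: once the sitemap path is disallowed, a crawler that obeys robots.txt may not fetch the file at all, so it can no longer read the URLs listed in it. The robots.txt rules and URLs below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks the sitemap files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /sitemap
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A compliant crawler asks before fetching each URL.
sitemap_allowed = parser.can_fetch(
    "Googlebot", "https://www.example.com/sitemap1.xml.gz"
)
page_allowed = parser.can_fetch(
    "Googlebot", "https://www.example.com/blue-widgets"
)
print("sitemap fetch allowed:", sitemap_allowed)
print("page fetch allowed:", page_allowed)
```

The pages stay crawlable, but the sitemap that is supposed to advertise them does not.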
7:23 pm on Aug 25, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


You can noindex it via HTTP headers ;)
11:41 pm on Aug 25, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Apr 15, 2003
posts:951
votes: 30


Yeah, Andy's idea about using the noindex HTTP header is much better.
9:52 am on Aug 26, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1187
votes: 6


OK, I have added this:

<Files ~ "sitemap.*\.xml(\.gz)?$">
Header append X-Robots-Tag "noindex"
</Files>

and will see what happens.
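
A quick way to sanity-check a rule like this, as a sketch only: the first helper applies the same pattern to a few file names, and the second checks whether a set of response headers carries the noindex directive. Python's re module stands in for Apache's regex engine here, which should behave the same for this simple pattern, and both helper names are made up for the example.

```python
import re

# Same pattern as in the <Files ~ "..."> block above.
PATTERN = re.compile(r"sitemap.*\.xml(\.gz)?$")


def is_covered(filename):
    """True if the file name matches the pattern (unanchored at the start)."""
    return PATTERN.search(filename) is not None


def has_noindex(headers):
    """True if the X-Robots-Tag value includes the noindex directive.

    Assumes `headers` is a plain dict keyed exactly "X-Robots-Tag".
    """
    value = headers.get("X-Robots-Tag", "")
    directives = [d.strip().lower() for d in value.split(",")]
    return "noindex" in directives


for name in ["sitemap.xml.gz", "sitemap1.xml.gz", "sitemap.xml", "index.html"]:
    print(name, is_covered(name))

print(has_noindex({"X-Robots-Tag": "noindex"}))
```

On the live site, requesting a sitemap URL with curl -I and looking for X-Robots-Tag in the response shows whether the header is actually being sent.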
10:23 am on Aug 26, 2016 (gmt 0)

Senior Member from GB 

WebmasterWorld Senior Member andy_langton is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 27, 2003
posts:3332
votes: 140


Just to note a curiosity when specifically querying Google for gz files:

[google.co.uk...]

Google does seem to index sitemap GZ files, nonetheless:
[google.co.uk...]
4:25 pm on Aug 27, 2016 (gmt 0)

Junior Member

joined:July 29, 2014
posts:47
votes: 0


I have a similar problem with keywords like "mail", "google", "yahoo", "reddit", "aol", "app" and many others that are not present in the source code. After searching around, I was able to identify the source of these strings. They were in an external JavaScript file loaded by a WordPress plugin called "AddToAny" (a social sharing plugin). I removed the "offending" plugin and am now waiting to see if the keywords get dropped from Search Console.

If you search on google for "How share buttons could ruin your SEO da Agency" you may read a complete description on this issue (if the site is not loading then look at the google cache version).

The site where this is/was happening is only 2 months old, so I will not be able to associate ranking changes with the removal of this plugin.

To consider:
- When I ask Google to show me the pages containing these keywords, it shows me any page that loads this JavaScript file. Because the file is used on every page, every page on my site is listed as having these keywords.
- When I do a site search on google for a keyword like "reddit1234" it gives back 0 results as expected.
- When I do a site search on google for "reddit" or "aol" or any other weird keyword that I should not be ranking for, it gives back 1000 results which is about 20% of my indexed pages.
10:40 am on Oct 26, 2016 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 17, 2002
posts:1187
votes: 6


FYI: The

<Files ~ "sitemap.*\.xml(\.gz)?$">
Header append X-Robots-Tag "noindex"
</Files>


seems to have fixed it. https is no longer listed as a Content Keyword. It took a couple of months to go through but it has now gone.
 
