Forum Moderators: Robert Charlton & goodroi

Pesky old http addresses in a site command

     
10:07 am on Mar 2, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


Hi
When I run a site: command on my site I get scores of http addresses along with https.
These old URLs do redirect to the https equivalent, but I am sure they are cluttering up my SERP results.
They all return a 301.
I have used the URL removal tool but still they remain.

How can I get them removed from Google, given that URL removal does not seem to do the job?
Thanks
Arturo
12:03 am on Mar 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


They should drop off naturally as time passes.

It may in fact be a redundancy issue with the site:example.com query itself. Look at your old HTTP property in GSC to see if there is any activity. There should only be a handful of hits, from bots that do not (or will not) follow 301 redirects.

If you have a proper 301 redirect in place at your server, your site visitors & bots will all get to the new pages going forward.
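
For reference, a minimal sketch of what a site-wide HTTP to HTTPS 301 commonly looks like in .htaccess. It is not taken from the poster's file. It assumes mod_rewrite is enabled and that TLS terminates at Apache itself; behind a proxy or load balancer the %{HTTPS} test may not be reliable. It would sit above any CMS rules so that it runs first.

```apache
<IfModule mod_rewrite.c>
RewriteEngine On
# Assumption: Apache handles TLS directly, so %{HTTPS} reflects the real scheme.
# Behind a proxy, testing %{HTTP:X-Forwarded-Proto} may be needed instead.
RewriteCond %{HTTPS} !=on
# Redirect to the same host and path on https with a permanent (301) status.
RewriteRule ^ https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
</IfModule>
```

With something like this in place, a request such as `curl -I http://example.com/some-page/` should show a single 301 with a Location header on https.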
12:49 am on Mar 3, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


Can you tell me where in GSC I can see old http activity?
I don't see an option.
Thanks
1:32 am on Mar 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


At the top-right of GSC, there's a drop-down listing your web properties.

If you had a GSC account prior to your switch to HTTPS, the old HTTP property should be listed there as well.
3:33 am on Mar 3, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4399
votes: 314


Are these static html pages, Joomla, WordPress or some other CMS? Do the old URLs redirect to the new URLs in your .htaccess file with a "[R=301]" flag or something else? Is this domain hosted on Apache? There are a number of things that could be at play and I wouldn't venture to guess without some basic information.

If you had an entry at GSC prior to the https and 301 redirects, then you might get information there as keyplyr suggested. You can do a quick check of your robots.txt to be sure you are not listing any http sitemaps there.
3:58 am on Mar 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


Now that not2easy mentions sitemap.xml, make sure your sitemap.xml uses all HTTPS paths.
9:20 am on Mar 3, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


I appreciate all your suggestions and I am working through them.

Sitemaps only point to https (Yoast generated)
Pages are WordPress pages
Apache hosting
There was no GSC account before https was installed, but there may have been old http paths in Google

In the removal section of GSC, when I put in certain URLs, GSC generates
https://example.com/https://www.domain.com/15078/
i.e. the removal request shows both http and https in the URL to be removed:

This is the htaccess code that generates this:

RewriteRule proddet\.asp http://www.example.com/department/gifts/willow-tree/ [R=301,L]
# BEGIN W3TC Browser Cache
<IfModule mod_deflate.c>
AddOutputFilterByType DEFLATE text/css text/x-component application/x-javascript application/javascript text/javascript text/x-js text/html text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/bmp application/java application/msword application/vnd.ms-fontobject application/x-msdownload image/x-icon image/webp application/json application/vnd.ms-access application/vnd.ms-project application/x-font-otf application/vnd.ms-opentype application/vnd.oasis.opendocument.database application/vnd.oasis.opendocument.chart application/vnd.oasis.opendocument.formula application/vnd.oasis.opendocument.graphics application/vnd.oasis.opendocument.presentation application/vnd.oasis.opendocument.spreadsheet application/vnd.oasis.opendocument.text audio/ogg application/pdf application/vnd.ms-powerpoint image/svg+xml application/x-shockwave-flash image/tiff application/x-font-ttf application/vnd.ms-opentype audio/wav application/vnd.ms-write application/font-woff application/font-woff2 application/vnd.ms-excel
<IfModule mod_mime.c>
# DEFLATE by extension
AddOutputFilter DEFLATE js css htm html xml
</IfModule>
</IfModule>
<FilesMatch "\.(html|htm|rtf|rtx|svg|txt|xsd|xsl|xml|HTML|HTM|RTF|RTX|SVG|TXT|XSD|XSL|XML)$">
<IfModule mod_headers.c>
Header append Vary User-Agent env=!dont-vary
</IfModule>
</FilesMatch>
<FilesMatch "\.(bmp|class|doc|docx|eot|exe|ico|webp|json|mdb|mpp|otf|_otf|odb|odc|odf|odg|odp|ods|odt|ogg|pdf|pot|pps|ppt|pptx|svg|svgz|swf|tif|tiff|ttf|ttc|_ttf|wav|wri|woff|woff2|xla|xls|xlsx|xlt|xlw|BMP|CLASS|DOC|DOCX|EOT|EXE|ICO|WEBP|JSON|MDB|MPP|OTF|_OTF|ODB|ODC|ODF|ODG|ODP|ODS|ODT|OGG|PDF|POT|PPS|PPT|PPTX|SVG|SVGZ|SWF|TIF|TIFF|TTF|TTC|_TTF|WAV|WRI|WOFF|WOFF2|XLA|XLS|XLSX|XLT|XLW)$">
<IfModule mod_headers.c>
Header unset Last-Modified
</IfModule>
</FilesMatch>
# END W3TC Browser Cache
# BEGIN W3TC Page Cache core
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{HTTPS} =on
RewriteRule .* - [E=W3TC_SSL:_ssl]
RewriteCond %{SERVER_PORT} =443
RewriteRule .* - [E=W3TC_SSL:_ssl]
RewriteCond %{HTTP:X-Forwarded-Proto} =https [NC]
RewriteRule .* - [E=W3TC_SSL:_ssl]
RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteRule .* - [E=W3TC_ENC:_gzip]
RewriteCond %{HTTP_COOKIE} w3tc_preview [NC]
RewriteRule .* - [E=W3TC_PREVIEW:_preview]
RewriteCond %{REQUEST_METHOD} !=POST
RewriteCond %{QUERY_STRING} =""
RewriteCond %{REQUEST_URI} \/$
RewriteCond %{HTTP_COOKIE} !(comment_author|wp\-postpass|w3tc_logged_out|wordpress_logged_in|wptouch_switch_toggle) [NC]
RewriteCond "%{DOCUMENT_ROOT}/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_SSL}%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" -f
RewriteRule .* "/wp-content/cache/page_enhanced/%{HTTP_HOST}/%{REQUEST_URI}/_index%{ENV:W3TC_SSL}%{ENV:W3TC_PREVIEW}.html%{ENV:W3TC_ENC}" [L]
</IfModule>
# END W3TC Page Cache core
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress


[edited by: Robert_Charlton at 10:18 am (utc) on Mar 3, 2018]
[edit reason] Replaced domain.com with example.com, which does not trigger auto-linking. [/edit]

9:22 am on Mar 3, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


Correction: the removal URL generated in GSC is

..."https://domain.com/https://www.domain.com/15078/....
9:40 am on Mar 3, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


So if I enter a URL with www in it,
....."https://www.domain.com/ukgs_logo/

the URL generated for removal is one that does not exist:
...."https://domain.com/https://www.domain.com/ukgs_logo/
9:58 am on Mar 3, 2018 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:12913
votes: 893


I strongly suggest *not* using the removal tool.

Again, just let any old address fall off naturally.
1:37 pm on Mar 3, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11774
votes: 227


There was no GSC account before https was installed, but there may have been old http paths in Google

you should create and verify your http: property in GSC so you can track the issues with the http urls separately.

You must add http and https versions of your site as separate properties.

source: [support.google.com...]
2:59 pm on Mar 3, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4399
votes: 314


I do not think I have ever seen such a scary htaccess file. Because the separate elements are not separated with an empty line, it is difficult to see where one ends and another begins - but that isn't the scary part. The nested modules may be fine, (?) but they don't do anything that can't be done without a plugin. I somehow doubt that all those file extensions are in use on your site.

That first rule rewrites to "http://www.example.com/..." (no https in there) then adds a [R=301,L]. I do not see any HTTP to HTTPS rule anywhere, but if all pages/content are within WP then it will use whatever you have entered into your Settings. The example you quote above
"https://domain.com/https://www.domain.com/ukgs_logo/
is adding the non-www version of the domain as a prefix, then appending the request. I would definitely look at what you have in your Settings for starters. It looks like a plugin/settings conflict.
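
As a hedged sketch of the point about that first rule: assuming the bare https://example.com host is the one you want indexed (an assumption, check your WP Settings), the legacy redirect could point straight at the final address so nobody is sent to the http://www version first. Only the target differs from the rule you posted.

```apache
RewriteEngine On
# Legacy .asp product page: send it directly to the final HTTPS, non-www URL.
# The target host is an assumption; use whatever your canonical host really is.
RewriteRule proddet\.asp https://example.com/department/gifts/willow-tree/ [R=301,L]
```

Note that any query string on the old request is passed along to the target by default. On Apache 2.4 the QSD flag drops it, if that is what you want.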
9:04 pm on Mar 3, 2018 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11774
votes: 227


In the removal section of GSC, when I put in certain URLs, GSC generates
https://example.com/https://www.domain.com/15078/

this is probably a result of an error in a document crawled by googlebot.

but that isn't the scary part. The nested modules may be fine, (?) but they don't do anything that can't be done without a plugin. I somehow doubt that all those file extensions are in use on your site.

it looks like these sections are managed by the W3 Total Cache (W3TC) plugin.
8:58 am on Mar 4, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


phranque: yes, W3TC is being used

not2easy
"https://domain.com/https://www.domain.com/ukgs_logo/
is adding the non www version of the domain as a prefix, then appending the request.
I would definitely look at what you have in your Settings for starters. It looks like plugin/settings conflict.

Do you mean the WordPress settings, or something else?
I have 100 URLs I suspect need removing and they all have the form
..."https://www.domain.com/123
i.e. they contain www

thanks
2:57 pm on Mar 4, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4399
votes: 314


Yes, I meant the WP Admin Settings, where you tell WP the URL to use. Since there is no canonical rewrite in your htaccess file, you are relying on WP Settings for that function and it appears that WP is taking the URL you enter:
https://www.example.com/ukgs_logo/ 
and adding the non-www domain as a prefix to the URL you requested.



BTW - as mentioned above, you should not be requesting URL removal from Google for these types of URLs as they don't exist.
9:01 am on Mar 5, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


Checked my WordPress setting:
https://example.com

So if I add a canonical rewrite, then the site: command will no longer show https://www.example.com, only https://example.com?
9:05 am on Mar 5, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


Would that be adding this line to header.php in WordPress?

<link rel="canonical" href="https://www.example.com<?php echo $_SERVER['REQUEST_URI'];?>">
9:07 am on Mar 5, 2018 (gmt 0)

Junior Member

Top Contributors Of The Month

joined:Dec 20, 2017
posts: 98
votes: 0


But what I don't understand is that these https://www.example.com URLs seem to exist only in Google, not on my site.
2:25 pm on Mar 5, 2018 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4399
votes: 314


Checked my WordPress setting:
https://example.com

So if I add a canonical rewrite, then the site: command will no longer show https://www.example.com, only https://example.com?

That means that your site is set up to use the "no www" URLs. Rather than rewrite the request, something is causing the request to be prefixed with your settings. That first line rewrite (if it is involved in rewriting your URLs) might be a factor. That is why I suggest this part of your question be asked in the Apache forum.

Would that be adding this line to header.php in WordPress?

You should not edit theme template files such as header.php directly. You said you were using Yoast, so you should let that plugin manage your meta canonicals. I was referring to the rewrite normally used in htaccess to redirect www to non-www. The first line is redirecting to the www host and without https. That question is for the Apache [webmasterworld.com] forum.
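
For anyone following along, the www to non-www rewrite referred to here usually looks something like the sketch below. It assumes mod_rewrite, that the non-www HTTPS host is the canonical one, and that the lines sit at the top of .htaccess, above the W3TC and WordPress blocks. Whether it suits this particular setup is exactly the question for the Apache forum.

```apache
RewriteEngine On
# Any host starting with "www." is redirected to the same host without it.
# %1 is the part of the hostname captured after "www." in the condition.
RewriteCond %{HTTP_HOST} ^www\.(.+)$ [NC]
RewriteRule ^ https://%1%{REQUEST_URI} [R=301,L]
```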
 
