Forum Moderators: phranque
Any explanation why the rewrite rule doesn't work for Googlebot anymore?
(example.net can of course not be added as a property in GSC to enable research.)
That will obviously result in a duplicate content problem. (Also note: no https and no www.)
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
> But now the Google Search Console has started to include, in the Top linking pages report

Are they saying anything about “via this intermediate link”? If so, you can safely ignore the whole thing.
This is of course assuming you have no spurious conditions/rules before this that might treat Googlebot differently?
Is it possible that your redirection somehow "stopped working" for a period of time?
Do you have a separate HTTP to HTTPS redirect?
Are you using absolute URLs to "https://www.example.com" for your internal linking?
> (example.net can of course not be added as a property in GSC to enable research.)

Why not?
Was this previously an active domain?
I assume all variations (HTTP vs HTTPS, www vs non-www) of example.net all point here?
your redirect could be greatly simplified
Are they saying anything about “via this intermediate link”?
Have you checked the logs for example.net?
you do need to put everything into GSC
The previous rule, which should not be of relevance here, is:
RewriteCond %{REQUEST_URI} ^([^.]+\.html)
RewriteRule \.html. https://www.example.com/$1 [R=301,L]
hmm... I think I don't need that [HTTP to HTTPS] redirect anymore after penders' simplified code.
Very rarely. I use relative internal links.
> (example.net can of course not be added as a property in GSC to enable research.) Why not?
The site example.net doesn't exist; it's just a parked domain. But, admittedly, I haven't tried to add it as a property of mine.
> Was this previously an active domain?
Never has been. example.net has always been just parked.
it doesn't look like it's doing what it should be doing?
For Googlebot to see a backlink from http://example.net to https://www.example.com, it must have been able to crawl http://example.net (if the redirect failed for whatever reason) and to have seen such links there.
If you do a site:example.net search in Google, do you get any results? In the GSC linked pages report you should be able to drill down to see exactly where the page is being linked from. Do you have a rel="canonical" tag set on your pages?
Presumably you have no other directives in your .htaccess file?
Aside: What do you think that rule is doing?

As written, the effect of the rule is to strip away any and all garbage that might happen to occur after “html”. It's one of a surprising number of rules that most sites will never need, but might become necessary if you detect extraneous stuff entering your .html URLs. (This does not apply to spurious query strings, only to the URL-path itself.) If you never actually see requests in this unwanted form, the rule can safely be deleted, since it's just that extra nano-erg of work for the server.
> As written, the effect of the rule is to strip away any and all garbage that might happen to occur after “html”.

...but the wrong backreference has been used in the substitution.

Whoops! So it has.
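For what it's worth, a corrected sketch of that rule: the capture group lives in the RewriteCond, so the substitution has to reference it with %1, not $1 (and since %1 already carries the leading slash of REQUEST_URI, no extra "/" is needed in front of it):

```apache
# Strip anything trailing after ".html" in the URL-path.
# The capture is in the RewriteCond, so it is %1 here; $1 would
# refer to a (non-existent) group in the RewriteRule pattern
# and expand to nothing.
RewriteCond %{REQUEST_URI} ^([^.]+\.html)
RewriteRule \.html. https://www.example.com%1 [R=301,L]
```

This is a sketch against the rule as quoted above; as noted, most sites can simply delete it.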
> Unless path-info has been explicitly enabled, then the default handler for text/html files should already reject path-info.

In that case I guess the idea is to avoid Duplicate Content--same as stripping the query string from an html request--since the correct page will otherwise be served at multiple URLs. But, unlike standard variations such as
RewriteCond %{HTTP_HOST} ^(www\.)?example\.net [NC]
RewriteRule (.*) - [R=410,L]

(with 403, 404 or 410 as the status, whichever is preferred)
> Are the [NC] and [L] flags redundant?

The [NC] flag creates a tiny bit of extra work for the server, since it has to start by making the test string and the pattern both lower-case before comparing them against each other. So it should only be used when there is a genuine possibility that a variably cased request might come in. (One can argue about whether this applies to hostnames. Human browsers tend to flatten the casing, regardless of what the user typed or clicked; robots that request EXAMPLE.com are going to be blocked anyway.)
> What about putting the following rule above

If your hostname canonicalization redirect uses a negative condition (the one that goes !^www\.example\.com$) then you don't need an extra rule for example.net, because it has already been covered. And, once again, you do not need the ^(www\.)? part for a domain that doesn't exist. Just example\.net without anchors. Except that, again, you don't need this rule at all.
Seems like it's not possible to find out why and how Google has indexed pages from example.net.
On some pages I do have a footer in the style "The address of this page is example.com/foo" and it is hyperlinked, but I didn't consider that as internal absolute linking.
But is there something I can do to diminish the presumed damage?
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
RewriteCond %{HTTPS} off
RewriteRule (.*) https://%{HTTP_HOST}%{REQUEST_URI} [R=301,L]
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !=www.example.com
RewriteRule (.*) https://www.example.com%{REQUEST_URI} [R=301,L]

However, for Google to have "indexed" seemingly every page ... In the Google SERPs do you see a "description" for each search result?
Use the Fetch as Google / URL Inspection tool
RewriteCond %{HTTP_REFERER} !^https://(www\.)?google\.
RewriteRule \.(jpeg?|jpg|gif)$ https://www.example.com/foo.png [NC,R,L]
I would like to combine two aforenamed rules into one but feel I need affirmation.
...but the number has been slowly increasing.
...I will only get one result: the correct page on example.com.
I'm not willing to register example.net as my property in GSC. I would rather G not know anything about example.net.
Working hotlink protection has been in my .htaccess for many, many years without any trouble. (The HTTP and Bing versions are on separate rows.)
RewriteCond %{HTTP_REFERER} !^https://(www\.)?google\.
RewriteRule \.(jpeg?|jpg|gif)$ https://www.example.com/foo.png [NC,R,L]
[edited by: penders at 9:10 pm (utc) on Oct 22, 2019]
> I would like to combine two aforenamed rules into one but feel I need affirmation.

Yes, that looks right ... maybe. YMMV, but I've never been able to get my server to recognize conditions that use = followed by a literal string (no anchors, no escaping and so on). I'd stick with the RegEx version, in which the pattern is !^www\.example\.com$ (as mentioned earlier).
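Sticking with the RegEx form, the combined rule would look something like this (a sketch assembled from the conditions quoted earlier in the thread, with example.com standing in for the real canonical host):

```apache
# Combined canonicalization: anything that is not already on
# https://www.example.com gets a single 301 to the canonical URL.
RewriteCond %{HTTPS} off [OR]
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) https://www.example.com%{REQUEST_URI} [R=301,L]
```

The [OR] means either a non-HTTPS request or a non-canonical hostname triggers the redirect, so both cases are handled in one hop.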
I can only assume this has been taken out of context (with conditions missing)?
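For reference, hotlink protection normally carries extra conditions so that blank referers and your own pages are exempt; something along these lines (a sketch, with example.com as a placeholder, and with foo.png deliberately kept out of the matched extensions to avoid a redirect loop):

```apache
# Exempt empty referers, the site itself, and the search engines;
# redirect all other hotlinked image requests to a placeholder image.
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?example\.com/ [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?google\. [NC]
RewriteCond %{HTTP_REFERER} !^https?://(www\.)?bing\. [NC]
RewriteRule \.(jpe?g|gif)$ https://www.example.com/foo.png [NC,R,L]
```

All RewriteCond lines must match (they are ANDed by default) for the redirect to fire, which is presumably what the "missing conditions" above would have provided.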
hmm... that would seem to suggest that something is still not right?
It remains a mystery how Google found out about the parked example.net. Did it discover the domain via the registry and then somehow succeed in crawling "pages"?
And why did my previous .htaccess not take care of the mail subdomain? Maybe something is wrong with the first two condition lines in the code at the very top of this thread.
RewriteCond %{HTTP_HOST} ^((www\.)?(exampleA|exampleB|exampleC)|example)\.com [NC,OR]
RewriteCond %{HTTP_HOST} ^(www\.)?example\.net [NC]
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
RewriteCond %{HTTPS} !=on [OR]
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$ [NC]
RewriteRule (.*) https://www.example.com/$1 [R=301,L]
> most likely it is pointing to the same web server as example.com.

Well, if it isn't, there would be no point in having that set of canonicalization redirects in place :)