Why is Googlebot still crawling my http://amp.example.com?

I removed and redirected the AMP subdomain 4 years ago

         

guarriman3

8:24 am on Jan 4, 2024 (gmt 0)

10+ Year Member Top Contributors Of The Month



In 2020, I removed my AMP subdomain (amp.example.com) and 301-redirected all the HTTP+HTTPS traffic to example.com:
http://amp.example.com/product --> https://example.com/product
https://amp.example.com/product --> https://example.com/product


I still maintain 4 properties in Google Search Console:
1) HTTP+AMP (http://amp.example.com), with no sitemap
2) HTTPS+AMP (https://amp.example.com), with no sitemap
3) HTTP+NOAMP (http://example.com), with no sitemap
4) HTTPS+NOAMP (https://example.com), this is the main one, with one sitemap of 300k URLs

However, 4 years later, I still see 168k URLs in GSC for the HTTP+AMP property, listed under "Page with redirect". All of them show the same information:
  • Sitemaps: No referring sitemaps detected [that's fine with me]
  • Referring page: http://example.com/product [this is odd: why is GSC showing a URL from the HTTP+NOAMP property?]
  • Last crawl: Jan 2, 2024, 10:41:16 PM
  • User-declared canonical: https://example.com/product [that's fine with me]

    Some questions:
  • Why is Googlebot still crawling my HTTP+AMP property after 4 years?
  • Should I wait longer for Googlebot to stop crawling the URLs?
  • Can I be penalized for this situation?
  • Can I avoid this somehow?
  • Why is GSC showing the HTTP+NOAMP referring page?

    Thank you.
    not2easy

    12:02 pm on Jan 4, 2024 (gmt 0)

    WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



    What do you see in your access logs? Since everything is now on one domain and protocol and the old URLs are redirected to the new ones, Googlebot should not keep requesting the content at the old URLs, assuming you have set up 301 (permanent) redirects and not 302 (temporary) ones. If your logs show 302 responses, that is why it keeps looking for the old URLs. Are you also redirecting inbound links?
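A rough way to answer the 301-versus-302 question from the logs is to tally the status codes returned to Googlebot. The sketch below assumes Apache's "combined" log format and that the lines come from the log of the amp. virtual host; the function name, the regular expression, and the sample lines are illustrative and not taken from the poster's server.

```python
import re
from collections import Counter

# Pulls the request line and status code out of an Apache "combined" log entry.
# Assumption: the log uses the stock combined format.
LINE_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')


def count_statuses(lines, agent_substring="Googlebot"):
    """Count HTTP status codes for log lines whose text mentions the given agent."""
    counts = Counter()
    for line in lines:
        if agent_substring not in line:
            continue  # skip requests from other user agents
        match = LINE_RE.search(line)
        if match:
            counts[match.group("status")] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [02/Jan/2024:22:41:16 +0000] "GET /product HTTP/1.1" 301 245 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [02/Jan/2024:22:41:20 +0000] "GET /other HTTP/1.1" 301 245 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [02/Jan/2024:22:42:00 +0000] "GET /product HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/121.0"',
]

if __name__ == "__main__":
    print(count_statuses(sample))
```

Matching on the user-agent string alone does not prove a request really came from Google; it is only a first pass to see which status codes dominate.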

    guarriman3

    4:58 pm on Jan 4, 2024 (gmt 0)

    10+ Year Member Top Contributors Of The Month



    Thank you very much, @not2easy, for your nice answer.

    I've just checked the Apache logs, and found that there are no 302 responses, and tons of 301 responses.

    And I also checked the HTTP headers of the redirects, and found one surprise: to redirect from 'http://amp.example.com/product' to 'https://example.com/product', I'm implementing two 301 redirects:

    1st 301 redirect: http://amp.example.com/product --> https://amp.example.com/product
    2nd 301 redirect: https://amp.example.com/product --> https://example.com/product


    Could these chained 301 redirects affect my SEO ranking, my crawl budget, and the non-indexed pages in Search Console?
    (Obviously, I will try to fix this issue.)
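A redirect chain like the one described above can be made visible by requesting each URL without following redirects and recording every hop. This is a minimal sketch using only the Python standard library; `fetch_head` and `trace_redirects` are hypothetical helper names, and the sketch assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = (301, 302, 303, 307, 308)


def fetch_head(url):
    """Send one HEAD request without following redirects.

    Returns (status, Location header or None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def trace_redirects(url, fetch=fetch_head, max_hops=10):
    """Return a list of (url, status) pairs, one per hop, stopping at the first non-redirect."""
    hops = []
    for _ in range(max_hops):
        status, location = fetch(url)
        hops.append((url, status))
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return hops
```

Calling `trace_redirects("http://amp.example.com/product")` with the default `fetch` performs live requests; more than one 3xx entry in the result means the redirect takes more than one step. Passing a stub `fetch` makes it possible to try the function without touching the network.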

    not2easy

    5:26 pm on Jan 4, 2024 (gmt 0)

    WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



    Possibly. If so, it would be because the amp. subdomain still exists and the redirect is causing a loop because it isn't capturing the request. The first 301 should go straight to the new URL, capturing the requested path and appending it, so there is only one step.

    It appears you are using mod_alias (Redirect) and combining it with mod_rewrite, which can wreak havoc (can, not always).

    I'd suggest you post in the Apache forum for specifics. This isn't the place where the best help is found for this topic, and we're already OT for the Google forum.

    edited to add - Apache forum is here: [webmasterworld.com...]

    lucy24

    5:37 pm on Jan 4, 2024 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    1st 301 redirect: http://amp.example.com/product --> https://amp.example.com/product

    Remove this redirect. It will be covered by the universal redirect of all amp. URLs, regardless of initially requested protocol.

    ... and we'll see you in the Apache subforum, because an http > https redirect should not happen until all other redirects have been handled.
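The ordering advice above (handle the host redirect first and let it set the final protocol at the same time) can be modelled outside Apache to check that every old URL reaches its destination in a single hop. This is only a Python model of the rule order, not server configuration and not the poster's actual rules; `redirect_target` is a hypothetical name.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "amp.example.com"
NEW_HOST = "example.com"


def redirect_target(url):
    """Return the single 301 target for `url`, or None if no redirect applies."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == OLD_HOST:
        # Host rule comes first and already emits the final scheme,
        # so http:// and https:// amp. URLs both need exactly one hop.
        return urlunsplit(("https", NEW_HOST, parts.path, parts.query, ""))
    if host == NEW_HOST and parts.scheme == "http":
        # The protocol-only upgrade is the last rule to run.
        return urlunsplit(("https", NEW_HOST, parts.path, parts.query, ""))
    return None


if __name__ == "__main__":
    for candidate in (
        "http://amp.example.com/product",
        "https://amp.example.com/product",
        "http://example.com/product",
        "https://example.com/product",
    ):
        print(candidate, "->", redirect_target(candidate))
```

With the rules in this order, no target returned by the function triggers a second redirect, which is the one-step behaviour recommended in the thread.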