Forum Moderators: Robert Charlton & goodroi


How to analyze a surge of 'Crawled - currently not indexed'


guarriman3

2:25 pm on Jan 8, 2024 (gmt 0)

10+ Year Member Top Contributors Of The Month



In November, the number of 'Crawled - currently not indexed' URLs in Google Search Console jumped from 50k; today it stands at 150k.

If I analyze the affected URLs, I find a weird situation, because only 20% of the URLs are actually reported as 'Crawled - currently not indexed' by the 'URL Inspection' tool. The rest are:
  • URLs with 'noindex' (most of them)
  • URLs tagged as 'indexed'
  • URLs with redirection
  • images (I thought that images were not 'pages')

    Why am I seeing this increase in 'Crawled - currently not indexed' pages, when a big share of them are not actually reported as 'Crawled - currently not indexed'?

    The number of indexed pages dropped from 524k to 448k in the same period, with a corresponding decrease in visits from Google.

    I do not know how to start a sort of 'audit' of this situation, in order to identify and fix the right URLs.
    aristotle

    7:25 pm on Jan 8, 2024 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    URLs with 'noindex' (most of them)

    If most of them have a noindex tag, then that must be the reason why most of them have been crawled but are currently not indexed.

    So your question only pertains to the remaining minority. Of these, how many were previously indexed but no longer are?

    myrddraal

    5:55 am on Jan 25, 2024 (gmt 0)

    10+ Year Member



    Normally that's a sign your site is being de-ranked or otherwise being devalued in Google's eyes.

    The higher they think of you, the higher the number of pages you can have in their index. Did this coincide with a drop in search traffic?

    Nutterum

    8:50 am on Feb 2, 2024 (gmt 0)

    10+ Year Member Top Contributors Of The Month



    For those encountering this issue: it usually happens when a product/page is in your sitemap but there is no natural path to it. Say you have a category "/sweaters" whose listing displays products, but the sitemap has the products located under "/products". If "/products" is just a folder, with no full list of products or anything, it merely holds where the products are. This will cause Google to crawl the products from the sitemap, but only index those it deems fit from the listing page; the rest, being essentially island pages, will fall under 'Crawled - currently not indexed'.
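A minimal way to spot such island pages, sketched below: take the URL set from your sitemap and the set of URLs actually reachable by following internal links from the home page, then diff them. The mini-site data here is hypothetical; in practice you would feed in a real crawl (e.g. from a crawler export) and the parsed sitemap.

```python
from collections import deque

def reachable_urls(link_graph, start):
    """BFS over an internal-link graph: {url: [linked urls]}."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for url in link_graph.get(page, []):
            if url not in seen:
                seen.add(url)
                queue.append(url)
    return seen

# Hypothetical mini-site: the sitemap lists three products,
# but only one of them is linked from a category page.
link_graph = {
    "/": ["/sweaters/"],
    "/sweaters/": ["/products/sweater-1"],
}
sitemap_urls = {"/products/sweater-1", "/products/sweater-2", "/products/sweater-3"}

orphans = sitemap_urls - reachable_urls(link_graph, "/")
print(sorted(orphans))  # -> ['/products/sweater-2', '/products/sweater-3']
```

Any URL that shows up only in the sitemap set is a candidate island page of the kind described above.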

    Another common source of such pages is products that have been sold but never redirected or 410'ed. This builds an even bigger backlog of crawled products that will never see the light of day in the SERPs, because Google knows they are sold and, for the most part, of no value to the user (an exception is luxury products, where Google can decide to still show them if the product description is detailed enough).

    tangor

    8:09 am on Feb 5, 2024 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    URLs with 'noindex' (most of them)
    URLs tagged as 'indexed'
    URLs with redirection
    images (I thought that images were not 'pages')


    Noindex stuff can be ignored; it was your choice to set it up that way.
    URLs with redirection: again your choice, and they don't count except for the end result of a new "index".
    Images ... got me on that one ... unless g is merely saying that's something different.

    Which leaves "tagged as indexed" ... What do these pages have in common? Thin? Duplicative? Low Value? Spam?

    Also bear in mind that regardless of how mighty g might look from the outside, there are true limits to the magic indexing they are capable of doing. I suspect that in the future the indexing will be more precise and some of the "fluff" will begin to disappear.

    Make every URL count for unique and specific content.

    guarriman3

    9:14 am on Mar 11, 2024 (gmt 0)

    10+ Year Member Top Contributors Of The Month



    @aristotle
    how many were previously indexed but no longer are?

    Is there any batch method to check that? I can create a TXT file from the 1,000-URL sample that Google provides, but could I somehow write a Python script to loop over the list?
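One offline approach, sketched below: export the 1,000-URL sample from GSC, keep an older export of URLs that were indexed before the November drop, and diff the two sets to see which affected URLs were previously indexed. The URL data here is hypothetical; Google also offers a quota-limited URL Inspection API for live per-URL checks, which a script could call instead.

```python
def split_sample(not_indexed, previously_indexed):
    """Partition the GSC 'Crawled - currently not indexed' sample
    into URLs that lost indexing vs URLs that were never indexed."""
    lost = not_indexed & previously_indexed
    never = not_indexed - previously_indexed
    return lost, never

# Hypothetical data; in practice, load each set from a TXT export
# (one URL per line), e.g.:
#   urls = {line.strip() for line in open(path) if line.strip()}
not_indexed = {"/product/a", "/product/b", "/product/c"}
previously_indexed = {"/product/a", "/product/x"}

lost, never = split_sample(not_indexed, previously_indexed)
print(f"{len(lost)} lost indexing, {len(never)} never indexed")
```

The "lost" bucket is the one aristotle's question is after: pages that were in the index and have since dropped out.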

    @myrddraal
    Did this coincide with search traffic drop?

    Yes; the number of indexed pages dropped from 524k to 448k in the same period.

    @Nutterum
    usually this happens when you have a sitemap with the product/page but no natural path towards that product/page

    Very interesting thoughts, thank you for sharing. I think it's a very important issue: my website lists 350,000 products, with the following structure:
    Home (example.com) > 50 brand names (example.com/acme/) > 2,500 categories (example.com/acme/sweaters/) > 350,000 products (example.com/product/acme-sweater-33)


    The products are linked only from the categories, and some of these categories link to 3,000 products. Each product also links randomly to 10 other products in its category. I'm afraid Google is struggling to crawl the products, but I don't have any smart idea to improve it.
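For a sense of scale, here is a rough sketch of the worst-case click depth when a 3,000-product category is paginated with only next/previous links. The page size of 30 is an assumption; plug in whatever your template actually uses.

```python
import math

def worst_click_depth(products_in_category, products_per_page):
    """Clicks from the home page to the last product in a category,
    assuming pagination exposes only next/previous links:
    home(0) -> brand(1) -> category page 1(2) -> ... -> last page -> product."""
    pages = math.ceil(products_in_category / products_per_page)
    return 2 + (pages - 1) + 1

# Hypothetical: 3,000 products at 30 per page = 100 listing pages
print(worst_click_depth(3000, 30))  # -> 102 clicks from the home page
```

At that depth, crawlers may rarely reach the deepest products; numbered pagination jumps (1 ... 50 ... 100) or more cross-links between products collapse this depth dramatically.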

    Google knows they are sold and as such of no value to the user for the most part

    Not sure if this is the case. I list products with valuable information, but it's not an e-commerce site. I just serve AdSense ads within the webpages. But you may be right that some of the pages lack enough valuable information.

    @tangor
    What do these "tagged as indexed" pages have in common? Thin? Duplicative? Low Value? Spam?

    My website lists products whose information was obtained from a non-public database. Some products may lack enough information and, you are right, may be considered 'thin content'.

    Images ... got me on that one

    I've found out that all of these images have spammy webpages as their Referring Site. How can I avoid that?

    lucy24

    4:38 pm on Mar 11, 2024 (gmt 0)

    WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



    Are you taking G###'s unsupported word for it, or have you spot-checked (exact-text search) to verify that the pages in question really aren't indexed?

    I've found out that all of these images have, as Referring Site, spammy webpages.
    Do you mean that G sends the spammy site as referer* when crawling the image, or only that GSC says that that's how they found out about the page?

    * Googlebot-by-that-name sends a referer with image requests; Googlebot-Image doesn’t.