Sacrifice low-traffic 120k URLs to boost the rest of the site?

Considering noindexing 120k low-traffic URLs while keeping 150k high-traffic URLs indexed

         

guarriman3

5:05 pm on Nov 27, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Hi,

One of my websites displays 150k cooking recipes. From each of these 150k high-traffic URLs, I link to each of the ingredients (120k URLs in total).

HOMEPAGE --> categories --> http://example.com/recipe/steamed-rice --> http://example.com/ingredient/instant-rice

Google users reaching my website are looking for recipes (6,000 clicks/day), and not for the ingredients (100 clicks/day). Additionally, the AdSense RPM of the recipes is 6 times higher than the AdSense RPM of the ingredients.

Because the number of recipe URLs is very similar to the number of ingredient URLs (150k vs 120k), I suspect that the existence of the ingredient URLs is actually hurting the traffic and the relevance of my website:
- the information on the recipes is richer (more words, more photos) than the information on the ingredients (no photos, very few words, shorter pages), so the ingredient URLs may be causing thin-content and crawl-budget issues on my website.
- the link juice of the recipe URLs may be leaking into the ingredient URLs, and I want to boost the relevance of the recipes.

I'm considering 'noindexing' the ingredient URLs to keep Googlebot from indexing them, while still allowing users to browse them (first, noindex + nofollow; then, after two years, blocking them in robots.txt).
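
For reference, this is roughly what each step would look like (the /ingredient/ path just mirrors the example URL above, and the exact markup would depend on how the pages are generated):

Step 1, in the <head> of every ingredient page:
<meta name="robots" content="noindex, nofollow">

Step 2, two years later, in robots.txt:
User-agent: *
Disallow: /ingredient/

(Once the Disallow is live, Googlebot can no longer see the noindex tag, which is why the order of the two steps matters.)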

Any opinion about this decision is welcome. Thank you.

lucy24

6:11 pm on Nov 27, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm considering 'noindexing' the ingredient URLs to keep Googlebot from indexing them, while still allowing users to browse them (first, noindex + nofollow; then, after two years, blocking them in robots.txt).
As I read the post, “noindex” was my thought too. But I doubt it would take G two years to find all the noindex tags. Do some random spot-checking (in access logs, not GSC or any other utility) to see how often URLs of this type get crawled.

Unless you have some very unusual content with highly specific linking text, a robots.txt disallow is a de facto noindex. That is, a user could theoretically cause the URL to come up in a search result, with the “the site’s robots.txt prevents” blahblah et cetera, but you’d have to really try.

First take a look at GSC and confirm that all those 120,000 URLs are actually indexed. If it says “crawled but not indexed” you could proceed directly to a Disallow. And while you’re in GSC, see if your ingredients pages ever do come up in a search.
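
If it helps, here is a rough sketch of the kind of log spot-check mentioned above, assuming a standard combined-format Apache access log and the /ingredient/ URL pattern from the example earlier in the thread (adjust both for the real site):

# Rough spot check: how often does Googlebot request /ingredient/ URLs?
# Assumes an Apache combined log format; the filename is illustrative.
import re
from collections import Counter

LOG_FILE = "access.log"
PATTERN = re.compile(r'"GET (/ingredient/\S*) HTTP')

hits = Counter()
with open(LOG_FILE, encoding="utf-8", errors="replace") as log:
    for line in log:
        if "Googlebot" not in line:   # crude user-agent filter; verify IPs separately
            continue
        match = PATTERN.search(line)
        if match:
            hits[match.group(1)] += 1

print(sum(hits.values()), "Googlebot requests across", len(hits), "ingredient URLs")
for url, count in hits.most_common(10):
    print(count, url)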

NickMNS

6:40 pm on Nov 27, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Users looking for recipes have a different intent than those looking for information on ingredients. The pages must be designed around intent. Before nuking the ingredients pages, I would look at how to align them to serve the user's intent.

I'm basing the above on a recent video I saw about SEO, given at a VC conference in SF; I posted a link in this thread.
[webmasterworld.com...]

I think there is an underlying assumption that the only options are deleting or noindexing the content, but before taking such "extreme" steps I think it is probably worthwhile to see how the content can be improved or optimized.

aristotle

1:49 am on Nov 28, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



- the link juice of the recipe URLs may be leaking into the ingredient URLs, and I want to boost the relevance of the recipes.


I'm not 100% sure what you intend to do here, but I would just like to point out that nofollow tags act as "sink holes" for "link juice". So using them will not increase the amount of link juice that's available for the recipe pages to redistribute elsewhere.

guarriman3

1:43 pm on Nov 29, 2021 (gmt 0)

10+ Year Member Top Contributors Of The Month



Thank you @lucy24, @NickMNS, @aristotle, for your kind answers

I doubt it would take G two years to find all the noindex tags. Do some random spot-checking (in access logs, not GSC or any other utility) to see how often URLs of this type get crawled.

That's right, but I wouldn't like to experience the "Indexed, though blocked by robots.txt" issue (https://www.webmasterworld.com/google/5043269.htm). Anyway, I'll follow your advice and spot-check my Apache logs to see if Googlebot is crawling the noindexed URLs.

First take a look at GSC and confirm that all those 120,000 URLs are actually indexed. If it says “crawled but not indexed” you could proceed directly to a Disallow.

A big share of the ingredient URLs are indeed in "Crawled - currently not indexed", but not the majority, so I'd prefer to apply the noindex first and the Disallow later.


The pages must be designed around intent. Before nuking the ingredients pages, I would look at how to align them to serve the user's intent [...] before taking such "extreme" steps I think it is probably worthwhile to see how the content can be improved or optimized.

Very good point. I've been browsing the video and thinking about the goal of "creating content to fulfil the intent" (the 'SEO stack' is inspiring).

Nofollow tags act as "sink holes" for "link juice". So using them will not increase the amount of link juice that's available for the recipe pages to redistribute elsewhere.

And what if I 'noindex' the nofollow-linked URLs? My understanding was that the link juice of such noindexed URLs would be redistributed through the rest of the site.

martinibuster

3:09 pm on Nov 29, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Sounds like the ingredients pages can use improvement.

What I like about ingredients pages is that they are useful. As a home cook, I use the ingredients pages to familiarize myself with ingredients, their typical uses, best kinds, best brands, and substitutions. As a cook, I adore those pages when they are well done.

I click those links to ingredient pages and get lost in the research of ingredients. The user intent for recipe research aligns perfectly with learning about ingredients.

But don't get too abstract thinking in terms of user intent. Drop the intent and just focus on the user. Getting abstract about user intent is a distraction from the real thing that matters, which is what matters to ME.

People jump from recipe to recipe, aggregating different recipes. A well done ingredient page can lock a person to that site.

Most recipe sites are total crap because they all do the same thing, some more annoyingly than others, like the whole B.S. personal stories, ugh.

The best recipe sites, the ones where I sign up for the newsletter, offer me useful information.

This is especially important for ethnic ingredients, like achiote, fish sauce and sambal oelek.

Images of the best products and affiliate links to them are useful. For example, I buy many of my ethnic ingredients from Amazon because it's more convenient than driving 80 miles round trip to the nearest town that sells them.

So my advice to you is to take what you have and improve it so that it becomes bigger and more popular with site visitors.

Good luck!

Roger Montti
aka martinibuster :)

aristotle

5:30 pm on Nov 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



And what if I 'noindex' the nofollow-linked URLs?

I understood that the nofollow links would be on the recipe pages. If that's the case, then none of their link juice will be transferred to the ingredients pages. But the recipe pages will not keep it either. It will disappear from the site. You can picture it as disappearing into sink holes.

And what if I 'noindex' the nofollow-linked URLs? My understanding was that the link juice of such noindexed URLs would be redistributed through the rest of the site.

Well, they won't have any link juice (except the small amount that Google gifts to all pages) if all the links to them are nofollow.

So in summary, it appears to me that there would be no benefit at all to using nofollow tags on any links from the recipe pages to the ingredients pages, but there would be a drawback in that the site will lose some of its existing link juice.

londrum

7:39 pm on Nov 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Based on my experience of nuking thousands of pages to try to improve my own rankings, I would be very careful, because I reckon that's what ultimately wrecked the rankings of one of my sites.
(Granted, I nuked them, then brought them back again, then nuked them again, and maybe it was all that chopping and changing that Google didn't like.)

You're basically telling Google that half your site isn't worth indexing, and who knows how they will react to that.

If it were me, I would much rather try to improve the pages first rather than noindexing or removing them.

tangor

10:26 pm on Nov 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've always viewed "link juice" as what exists from point a to point b... i.e. one page to another, not one page to a site. I might be incorrect; then again, I never tested it one way or the other, since I consider that all pages *I* want to exist have value to MY SITE, else they get nuked or never exist in the first place.

I doubt you will see any benefit from dropping 120k pages from a site Google knows as 270k pages. It might not have the effect you desire!

There are other ways to deal with crawl budget/frequency and those might produce better results.

aristotle

11:18 pm on Nov 29, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've always viewed "link juice" as what exists from point a to point b... i.e. one page to another, not one page to a site.

In the old days, Google's algorithm awarded a visible pagerank (PR) to each page. Presumably this was a "measure" of how much total "link juice" the page could distribute to other pages by linking to them (with dofollow links). It was divided equally among these links.

My simple-minded understanding of the matter is this:
The greater the pagerank of a page, the greater the amount of "link juice" that page can distribute. But the page somehow retains its same pagerank while also increasing the pageranks of other pages.

NickMNS

1:43 am on Nov 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've always viewed "link juice" as what exists from point a to point b... i.e. one page to another, not one page to a site.

Page rank is calculated on a per-page basis; it has to be, since a link must point to a page. But it is an iterative process, and within a given website pages mostly point to each other, so every page benefits from an inbound link. So in that sense, page rank is practically website-based. Except when you start adding nofollow directives: then you could be cutting off entire sections of a website, with no benefit to the remaining parts.
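
To make the "iterative" part concrete, here is a toy version of the calculation (a simplified PageRank over a made-up three-page graph; the damping factor and the links are illustrative, not anyone's real site):

# Toy PageRank iteration: rank is computed per page, but each pass feeds
# every page's rank back into the pages it links to, so the whole site matters.
DAMPING = 0.85

links = {                          # page -> pages it links to (made up)
    "home": ["recipe", "ingredient"],
    "recipe": ["home", "ingredient"],
    "ingredient": ["home"],
}
pages = list(links)
rank = {page: 1.0 / len(pages) for page in pages}

for _ in range(50):                # iterate until the values settle
    new_rank = {}
    for page in pages:
        inbound = sum(rank[src] / len(links[src])
                      for src in pages if page in links[src])
        new_rank[page] = (1 - DAMPING) / len(pages) + DAMPING * inbound
    rank = new_rank

for page, value in sorted(rank.items(), key=lambda kv: -kv[1]):
    print(page, round(value, 3))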

My simple-minded understanding of the matter is this:
The greater the pagerank of a page, the greater the amount of "link juice" that page can distribute. But the page somehow retains its same pagerank while also increasing the pageranks of other pages.


If a page has a page rank of, say, 7, then its page rank is "7", no matter what links appear on that page. Now assume you have one link leading away from that page. The page rank transferred from the page with PR 7 is 7. But if, instead of one link, you have two links to different pages, each of those pages will receive 50% of 7, or 3.5.

It is my understanding (I don't know this for sure) that if you add a nofollow to one of the two links, then the link without the nofollow will remain at 50% of PR 7 while the other is ignored. This assumes that Google will heed the directive, which they say they use as a hint and not a rule.
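
As a quick back-of-envelope under that assumption (i.e. the share assigned to the nofollow link simply evaporates rather than being redistributed; again, this is my reading, not confirmed Google behaviour):

page_pr = 7
outlinks = [("ingredient-page", True),    # (target, nofollow?)
            ("other-recipe", False)]

share = page_pr / len(outlinks)           # PR is split across ALL outlinks
for target, nofollow in outlinks:
    passed = 0 if nofollow else share     # the nofollow share is simply dropped
    print(target, "receives", passed)     # other-recipe still gets only 3.5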

aristotle

2:53 pm on Nov 30, 2021 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If a page has a page rank of, say, 7, then its page rank is "7", no matter what links appear on that page. Now assume you have one link leading away from that page. The page rank transferred from the page with PR 7 is 7. But if, instead of one link, you have two links to different pages, each of those pages will receive 50% of 7, or 3.5.

Google's page rank scale is logarithmic. If I remember correctly, each numerical increase (such as PR2->PR3) is approximately a jump of 6 times. So in this example, a PR3 page would be able to distribute about 6 times as much link juice as a PR2 page.

So your PR 3.5 calculation is incorrect. I don't remember exactly how to do this type of logarithmic calculation, but I believe offhand that the true answer would be about PR 6.5.
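
Under that assumed 6x-per-step scale, the arithmetic comes out like this (the 6x factor is just the remembered ballpark above, not a published figure):

import math

BASE = 6                      # assumed ratio between adjacent toolbar PR values
raw = BASE ** 7               # raw "link juice" of a PR 7 page, in toy units
half = raw / 2                # one of two equal outlinks receives half of it
print(round(math.log(half, BASE), 2))   # 6.61, i.e. still in the mid-to-high PR 6 range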

But these types of calculations don't take account of other important factors, such as the anchor text of the link, the age of the link, how often the link is clicked, and the relevance of the content of the linking page, all of which can affect the value given to a link by Google's algorithm.

martinibuster

1:58 am on Dec 1, 2021 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



assumes that Google will heed the directive,


I pointed this out in another thread already, but I'll repeat it here: the nofollow link attribute is not a directive. It's a hint, like canonicals.

As for nofollows and PageRank: if there are ten outlinks on a page and one of them is nofollow, Google will still process the links as if there were ten outlinks.

Google apparently changed this behavior in response to attempts to "sculpt" PageRank, i.e. to reduce how much PageRank flows to certain nofollowed pages.

As for discussing what percentage of PR flows out of a page through a link, I don't think it's a good use of time trying to find a consensus because there are a lot of unknowns. Some links don't send PageRank, some links send a diminished amount of PR. It's not a set percentage. So there's no good answer.

The most we know is, on the macro level, that if there are X links on a page, then Google counts them as X links out, regardless of whether they're nofollow or not. The percentages are not known.

bananaseo

10:23 pm on Dec 14, 2021 (gmt 0)



Guys, you have to differentiate between PageRank/link juice and crawling. Google can and does crawl nofollow links. So if your website tends to have problems with crawl budget, nofollow will not solve them. You have to prevent the page from being indexed (noindex) or exclude it from crawling (robots.txt or GSC).
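
If you go the robots.txt route, you can sanity-check the rule with Python's standard library before deploying it (the Disallow pattern and the URLs below are just the /ingredient/ example used earlier in this thread):

# Check which URLs a robots.txt rule would block for Googlebot.
from urllib.robotparser import RobotFileParser

rules = ["User-agent: *",
         "Disallow: /ingredient/"]

parser = RobotFileParser()
parser.parse(rules)

for url in ("http://example.com/ingredient/instant-rice",
            "http://example.com/recipe/steamed-rice"):
    print(url, "crawlable for Googlebot:", parser.can_fetch("Googlebot", url))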

If a page has a page rank of, say, 7, then its page rank is "7", no matter what links appear on that page. Now assume you have one link leading away from that page. The page rank transferred from the page with PR 7 is 7. But if, instead of one link, you have two links to different pages, each of those pages will receive 50% of 7, or 3.5.

It is my understanding (I don't know this for sure) that if you add a nofollow to one of the two links, then the link without the nofollow will remain at 50% of PR 7 while the other is ignored. This assumes that Google will heed the directive, which they say they use as a hint and not a rule.


This is interesting, because in my understanding, by doing so, the link without the nofollow would get 100% of PR 7 while the other is ignored; it can still be crawled and even indexed, though (if you do not add noindex or explicitly exclude Google from crawling). Best practice, as far as I know, is to nofollow all links pointing to non-relevant pages, which here would be the ingredients, and keep the link juice flowing around all the other pages (similar recipes, menu links, ...).

Talking about 120k ingredient URLs out of maybe 300k URLs in total, and given that you plan to exclude them from indexing anyway, I think nofollow is the way to go, no matter how link juice is passed. But you should consider keeping them anyway, and maybe including them in a more user-beneficial way, as others mentioned.