Forum Moderators: Robert Charlton & goodroi

Large site audits – how to spot thin pages beyond manual spot checks

         

urmos

4:03 pm on Feb 5, 2026 (gmt 0)



On large websites (1k+ URLs), I’m finding that manual spot checks miss most thin content issues.

Most weak pages seem to come from:

– category templates
– auto-generated filters
– duplicated product blocks
– orphaned pages

Full sitemap crawling exposes patterns instantly, while sampling hides them.

For those managing big sites:

Do you crawl everything first, or still rely on manual review to guide fixes?

What signals have proven most reliable for identifying low-value content?

not2easy

5:52 pm on Feb 5, 2026 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hello urmos and welcome to WebmasterWorld [webmasterworld.com]

I have a few suggestions, but rather than guess at the structure, it would help to know whether this is a CMS like WP or a set of individual pages (like example.html) that you are evaluating. For individual pages you could compare file sizes (in KB) and check the smallest pages first.
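As a rough sketch of that size comparison for static files: the snippet below lists the smallest HTML files under a folder. The function name `smallest_pages`, the `*.html` pattern and the limit of 20 are just assumptions for illustration, and file size is only a first hint, not proof that a page is thin.

```python
from pathlib import Path


def smallest_pages(root, limit=20, pattern="*.html"):
    """Return up to `limit` (size_in_bytes, path) pairs, smallest files first.

    `root` is the local folder holding the static pages (an assumption:
    this only makes sense when pages exist as files on disk).
    """
    files = [
        (path.stat().st_size, path)
        for path in Path(root).rglob(pattern)
        if path.is_file()
    ]
    # Sort on size only, so the smallest candidates come first.
    return sorted(files, key=lambda item: item[0])[:limit]
```

Calling `smallest_pages("public_html")` (a hypothetical folder name) would give a short list of pages to review by hand first.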

Since you mention category templates, it seems that you might be using a CMS. That means the pages are served from a database and have no 'size' until you actually view the page. In that case, category pages (and archive pages) shouldn't be indexed, because they may repeat the same content found on pages/posts.

Have you created a sitemap? You don't need to submit a sitemap, but if you do have one, it should help you find old, outdated content and update or remove it.
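One way a sitemap can help with that, sketched under a couple of assumptions: it follows the standard sitemaps.org format and its `<lastmod>` values start with a full YYYY-MM-DD date. The function name `stale_urls` and the cutoff date are made up for the example.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace used by standard sitemaps.org sitemap files.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def stale_urls(sitemap_xml, cutoff):
    """Return (lastmod_date, url) pairs last modified before `cutoff`, oldest first."""
    root = ET.fromstring(sitemap_xml)
    stale = []
    for entry in root.findall("sm:url", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = entry.findtext("sm:lastmod", default="", namespaces=NS).strip()
        if not loc or not lastmod:
            continue  # no date to judge by
        try:
            # Keep only the date part; time and timezone are ignored.
            modified = date.fromisoformat(lastmod[:10])
        except ValueError:
            continue  # e.g. year-only values, skipped in this sketch
        if modified < cutoff:
            stale.append((modified, loc))
    return sorted(stale)
```

For example, `stale_urls(xml_text, date(2023, 1, 1))` would list everything the sitemap says has not changed since before 2023. Keep in mind that `<lastmod>` is only as reliable as whatever generates the sitemap.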

BTW - that welcome link above offers tips on using the forums' features and settings and gives you how-to tips to help in finding things here.

tangor

9:39 pm on Feb 7, 2026 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Do you crawl everything first, or still rely on manual review to guide fixes?


Rely on knowing what thin content is and AVOID posting it! Manual follow-up is still essential.

Ecommerce has more problems with "thin" content than other niches, generally in product descriptions or when going too granular, i.e. red widget, green widget, blue widget, violet widget when all are the SAME widget! One description and a color selector takes four thin pages down to one "thin".

Taran

10:20 am on Mar 6, 2026 (gmt 0)

10+ Year Member Top Contributors Of The Month



On large sites the fastest way is to crawl everything first and then sort by the signals that usually expose thin pages. Word count alone is useless, but when you combine very low internal link counts, near-identical titles or H1 patterns, very low impressions in Search Console, and high template repetition, you start seeing clusters fast.

Filters and parameter pages usually stand out because hundreds of URLs share the same structure and only one variable changes. Orphaned URLs show up when a crawl finds them in the sitemap but they have almost zero internal links pointing to them.
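A minimal sketch of two of those checks, assuming you already have the crawl results exported as a list of URLs plus a dict of internal inlink counts per URL. The names `parameter_clusters` and `find_orphans` and the thresholds are assumptions for the example, not something from a specific crawler.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlsplit


def parameter_clusters(urls, min_size=3):
    """Group URLs that share host, path and parameter names (values ignored).

    Large groups are the "same structure, one variable changes" pattern.
    """
    clusters = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        names = {name for name, _ in parse_qsl(parts.query, keep_blank_values=True)}
        if names:  # only URLs that carry query parameters
            key = (parts.netloc, parts.path, tuple(sorted(names)))
            clusters[key].append(url)
    return {key: group for key, group in clusters.items() if len(group) >= min_size}


def find_orphans(sitemap_urls, inlink_counts, max_inlinks=0):
    """Return sitemap URLs with at most `max_inlinks` internal links pointing to them.

    URLs missing from `inlink_counts` are treated as having zero inlinks.
    """
    return sorted(
        url for url in set(sitemap_urls)
        if inlink_counts.get(url, 0) <= max_inlinks
    )
```

Neither check decides anything on its own. They just shrink the list to clusters worth a manual look, and you would still want to cross-check against Search Console impressions before removing or noindexing anything.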