


Is Too Many Files Blocked From Indexing = SEO Death?

2:20 pm on Feb 20, 2017 (gmt 0)

Preferred Member

5+ Year Member

joined:Mar 22, 2011
posts: 449
votes: 7


Here is a "benefits for users vs. benefits for SEO success" question:

I have a new site about to launch and I have to make a critical decision:

The big draw of the site is a set of nicely put-together PDF files. To monetize the site, I serve the PDFs inline, with ads placed around each PDF.

I have 2 different options for serving the PDFs inline:

1. Create a separate HTML file that serves the PDF inline.

For example:
File one served as: HomePAGEURL/pdf-inline1.html
File two served as: HomePAGEURL/pdf-inline2.html

2. Serve all of the PDFs through a single script:

For example:
File one served as: HomePAGEURL/cgi-bin/showmypdf.cgi?pdf-inline1.html
File two served as: HomePAGEURL/cgi-bin/showmypdf.cgi?pdf-inline2.html
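
To make option 2 concrete, here is a minimal sketch of what the script could look like (Python CGI; the /pdfs/ directory, the file naming, and the validation rule are placeholders, not a finished implementation):

#!/usr/bin/env python3
# Sketch of option 2: one script renders the inline-viewer page for
# whichever PDF is named in the query string.
import os
import re
import sys

name = os.environ.get("QUERY_STRING", "")
# Accept only simple names like "pdf-inline1.html"; rejecting anything
# else keeps the script from being tricked into reading arbitrary files.
if not re.fullmatch(r"[\w-]+\.html", name):
    print("Status: 404 Not Found\nContent-Type: text/plain\n\nNot found")
    sys.exit(0)

pdf_path = "/pdfs/" + name[:-len(".html")] + ".pdf"  # placeholder directory
print("Content-Type: text/html")
print("X-Robots-Tag: noindex")  # keep the viewer page itself out of the index
print()
print('<html><body>')
print('<!-- ad units go here, around the embedded PDF -->')
print('<object data="%s" type="application/pdf" width="100%%" height="800">' % pdf_path)
print('  <a href="%s">Download the PDF</a>' % pdf_path)
print('</object>')
print('</body></html>')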


Option 1
Pros -
1. It is 15% faster for users.
2. Less CPU-intensive on the server.

Cons -

1. It creates a bunch of extra HTML files (one per PDF) that I have to block bots from reading. This doubles the number of robot-blocked files on my site.


Option 2
Pros -
1. It results in HALF the number of files I'd have to block from robots.

Cons -

1. Slower for users and uses more computing resources.

My concern is that an additional 20,000 files blocked from robots (approx. HALF of the files on the site) will look negative for my site's SEO profile.
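
(Either way, the blocking itself is only a line or two in robots.txt rather than 20,000 entries; the directory name here is a placeholder:)

User-agent: *
# option 1: every per-PDF viewer page lives in one directory
Disallow: /pdf-pages/
# option 2: the single viewer script
Disallow: /cgi-bin/showmypdf.cgi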

What do you guys think?
6:41 pm on Feb 20, 2017 (gmt 0)

Administrator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:4508
votes: 348


I am probably not understanding the setup you describe: are the pdfs shown within static html pages? If so, is the pdf the sole content of each page, other than a standard header/sidebar/footer? If so, I have a few suggestions. Add a canonical to the headers and do not block robots. The pdf files can be in one (or more) directories served with noindex headers if you do not want them indexed.
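
For example, assuming Apache with mod_headers enabled, something like this in the pdf directory's .htaccess would add that header:

<FilesMatch "\.pdf$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>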

Particularly if you plan to use AdSense for ads, their ads bot needs to crawl the content; otherwise it will see the page as devoid of content.
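
If any of that content does end up disallowed in robots.txt, the AdSense crawler can be allowed back in with its own rule, for example:

User-agent: Mediapartners-Google
Allow: /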

The downside to that is that while this would communicate to Google how you want the content seen/used, it does not mean anything to Bing and most other bots.
6:57 pm on Feb 20, 2017 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Feb 12, 2006
posts:2710
votes: 116


I had this problem with an events site. As the date of each event passed we kept the page on the site, but slapped a 'noindex' on it. So over time the number of noindexed pages has grown and grown and is now huge. But I don't think it makes any difference; I've never noticed the site getting a penalty.
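
(Just the standard tag in the head of each expired page:)

<meta name="robots" content="noindex">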
7:09 pm on Feb 20, 2017 (gmt 0)

Preferred Member

5+ Year Member

joined:Mar 22, 2011
posts: 449
votes: 7


@not2easy

Makes sense; they are both static (HTML pages and PDFs).


@londrum

Thanks for sharing, that's exactly what I was looking for.

The users really win if the number of noindexed pages doesn't matter.

Anyone else have a large number of noindexed resources with no SEO impact?
8:03 pm on Feb 20, 2017 (gmt 0)

Senior Member from CA 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Nov 25, 2003
posts:1339
votes: 438


Averaged across all my sites, currently ~45% of pages are noindexed; the maximum is ~60%. I started excluding pages about a decade ago and the share has grown over time. To date I've never seen a problem, although I do get contacted a couple of times a year 'suggesting' that I'd see unspecified benefits if I opened them to indexing. Not going to happen, but it's nice they 'care'. :)