How to display links to multiple (if not hundreds of) PDFs on a single page

Need suggestions on linking to tons of PDF files on a single web page

     
3:35 pm on Mar 6, 2015 (gmt 0)

New User

joined:Mar 6, 2015
posts: 3
votes: 0


So, I'm working with a government/public agency and their new website, and there are several pages where, by law, they have to give access to PDF files. A few of these pages have A LOT of PDF files. For example, this one ([thda.org ]) I'm working on now has 106 PDF files, all individually linked. Also important to know is that these PDFs are very long; several have over 150 pages.

How would you deal with this volume of PDFs and content? Should I be thinking about a different file format? Yes, people will need to occasionally print these files, which is why PDFs may work best. But I'm not sure, as I've never tackled this particular kind of project.

Any help/guidance would be much appreciated.

Thanks!
5:54 pm on Mar 6, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Sept 4, 2001
posts:2292
votes: 93


I do a few governmental websites and when it comes to .pdfs, I place the link on the page (obvious action) with a statement next to the link as to the file size and number of pages. This way the person knows beforehand what they are getting into. I also have the link open in a new window.

If possible, I also provide an html version of the page and give the person the choice which one to view.

Keep in mind that certain devices may not be .pdf friendly, especially if a person is paying for bandwidth usage. That is why a plain html version is nice to have.

Marshall
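
For anyone wanting to automate the "size and page count next to the link" idea, here is a minimal Python sketch. The function names are made up for illustration, and the page count is passed in as a plain number because the standard library cannot read it out of a PDF; the byte count could come from something like os.path.getsize.

```python
import html


def human_size(num_bytes: int) -> str:
    """Format a byte count as a short human-readable string."""
    size = float(num_bytes)
    for unit in ("bytes", "KB", "MB", "GB"):
        if size < 1024 or unit == "GB":
            break
        size /= 1024
    if unit == "bytes":
        return f"{int(size)} bytes"
    return f"{size:.1f} {unit}"


def pdf_link(href: str, title: str, num_bytes: int, pages: int) -> str:
    """Build one list item: a link that opens in a new window, then size and page count."""
    return (
        f'<li><a href="{html.escape(href, quote=True)}" target="_blank">'
        f"{html.escape(title)}</a> (PDF, {human_size(num_bytes)}, {pages} pages)</li>"
    )
```

Calling pdf_link once per document and joining the results gives the body of a list that can be pasted into the page or a CMS template.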
6:47 pm on Mar 6, 2015 (gmt 0)

New User

joined:Mar 6, 2015
posts: 3
votes: 0


Thanks, Marshall. I can certainly add file size and page count, but I'm looking for a much quicker and more efficient way to add a large number of PDFs at once to the page, rather than having to link every one of them individually. I am using Craft CMS (https://buildwithcraft.com/) for this project, although I'm not sure if a bulk upload & link function is available for this or any CMS on the market.

I'm really looking for a method to let me upload/add tons of PDFs and link them in one simple action. Manually linking hundreds of PDFs seems completely inefficient to me.
7:21 pm on Mar 6, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 25, 2002
posts:8637
votes: 283


I don't know Craft CMS, but are you able to integrate a library like plupload?

The other question is whether or not you are doing this once, or will be doing a huge number of uploads over and over.

If you're doing it once (and maybe if doing it over and over), I would write a script to import a spreadsheet, either CSV or straight Excel.

A couple of years ago a conference organizer wanted mini-sites for a series of conferences. Each had a similar format (about 5-6 pages per event) and the key element was the session schedule. They were using Excel spreadsheets.

Turns out there's a nice PHP Excel library, so I just had them maintain their info as they always had and set it so that they would FTP over the spreadsheet. On page load, it would check the timestamp on the Excel doc versus the timestamp on the cache and if the spreadsheet was newer, just parse it out and create a new session grid.

It worked pretty well for them and let them use their normal way of organizing sessions (no real change to their workflow) and yet allowed them to update the website in real time as they changed the spreadsheet.

If memory serves, we eventually needed to add a "Publish" column so sessions in progress and not finalized wouldn't get published.

Anyway, the point being that if you can get someone else (gov employee or even someone on Amazon Mechanical Turk) to fill in the spreadsheet for each doc (filename, timestamp, size, title, abstract if there is one), you would just have to parse the sheet as needed.

Alternatively, you could use Plupload. That's how I would do it in Drupal, for example.
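
The spreadsheet approach described above used PHP; what follows is only a rough Python sketch of the same idea, assuming a CSV export with hypothetical column names filename, title and publish. It rebuilds the cached HTML only when the spreadsheet is newer than the cache, and it skips rows that are not marked for publication.

```python
import csv
import html
import os


def render_rows(rows) -> str:
    """Turn spreadsheet rows into an HTML list, skipping rows not marked for publication."""
    items = []
    for row in rows:
        if (row.get("publish") or "").strip().lower() != "yes":
            continue  # the "Publish" column keeps unfinished entries off the page
        href = html.escape(row["filename"], quote=True)
        title = html.escape(row["title"])
        items.append(f'<li><a href="{href}">{title}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


def cache_is_stale(sheet_path: str, cache_path: str) -> bool:
    """True when the cache is missing or older than the spreadsheet."""
    if not os.path.exists(cache_path):
        return True
    return os.path.getmtime(sheet_path) > os.path.getmtime(cache_path)


def get_listing(sheet_path: str, cache_path: str) -> str:
    """Return the cached HTML, rebuilding it first if the spreadsheet changed."""
    if cache_is_stale(sheet_path, cache_path):
        with open(sheet_path, newline="", encoding="utf-8") as f:
            page = render_rows(csv.DictReader(f))
        with open(cache_path, "w", encoding="utf-8") as f:
            f.write(page)
        return page
    with open(cache_path, encoding="utf-8") as f:
        return f.read()
```

The point of the timestamp check is that staff can keep editing the sheet the way they always have, and the page only pays the parsing cost after a new upload.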
10:29 pm on Mar 6, 2015 (gmt 0)

Preferred Member from AU 

10+ Year Member Top Contributors Of The Month

joined:May 27, 2005
posts:457
votes: 16


The easy way to list documents where there are no access restrictions required is to simply allow the listing of the contents of the folder that they are stored in.

That way you simply upload to a designated folder and then provide a link to that folder. However, that folder needs to be set for read permissions and must not contain an index file. On Apache servers, not having an index file in the folder should produce a list of its contents, provided directory listings are enabled (Options +Indexes with mod_autoindex). On Windows servers there is an option in IIS settings to allow indexing of folders that do not contain a default index file.

Then users can simply right-click to save/open.

Or you can create a custom script that redesigns the look. For example with Windows you can use the FileSystemObject to retrieve file names and then run filters to make the list easier to read by removing underscores, etc. You can do similar things with PHP. In fact you can also sort by date, file name and file size, or even filter to only allow PDF files.

Anything like this can be done independently of a CMS.
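
As a rough illustration of the custom-script idea above, here is a small Python sketch (the post mentions FileSystemObject and PHP; Python is just a stand-in). It lists only the PDFs in a folder, turns underscores and hyphens into spaces for the label, and sorts by name, size or date. The function names and the tuple layout are invented for the example.

```python
import os
import re


def pretty_name(filename: str) -> str:
    """Drop the extension and replace underscores and hyphens with spaces."""
    stem, _ext = os.path.splitext(filename)
    return re.sub(r"[_-]+", " ", stem).strip()


def list_pdfs(folder: str, sort_by: str = "name"):
    """Return (filename, label, size_in_bytes, mtime) tuples for the PDFs in folder."""
    entries = []
    with os.scandir(folder) as it:
        for entry in it:
            # Filter: regular files with a .pdf extension only
            if entry.is_file() and entry.name.lower().endswith(".pdf"):
                info = entry.stat()
                entries.append(
                    (entry.name, pretty_name(entry.name), info.st_size, info.st_mtime)
                )
    key_index = {"name": 0, "size": 2, "date": 3}[sort_by]
    return sorted(entries, key=lambda e: e[key_index])
```

The returned tuples can then be rendered however the site design requires, which avoids showing users a raw server-generated index.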
12:57 am on Mar 7, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11715
votes: 211


welcome to WebmasterWorld, motifman22!


if you are on apache you could possibly use Apache Module mod_autoindex:
http://httpd.apache.org/docs/current/mod/mod_autoindex.html [httpd.apache.org]
not sure what the IIS equivalent is.

you might consider using a script as a default directory index document that organizes and lists the pdf content of the directory.
if you need more information than directory structure and file names to organize the content you will probably also need something like a spreadsheet as described by ergophobe or a database.
2:13 am on Mar 7, 2015 (gmt 0)

Moderator from US 

WebmasterWorld Administrator lifeinasia is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Dec 10, 2005
posts:5843
votes: 187


not sure what the IIS equivalent is

In IIS, you can allow Directory Browsing for that folder.
IIS7- select the folder, open Directory Browsing, click Enable, and also select which information you want to show (size, etc.)
IIS6- select the folder, right-click then Properties, in the Directory tab, check Directory Browsing
4:52 am on Mar 7, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15705
votes: 812


I'm really looking for a method to let me upload/add tons of PDFs and link them in one simple action. Manually linking hundreds of PDFs seems completely inefficient to me.

This sounds backward. The first question is what do you want your users to see? (Er... you did say this is a gummint site, right? You work for us, don't you?) Once you've established that, then you find some simple code to achieve the desired result.

Because I am who I am, my first thought involved a Find and Replace involving Regular Expressions. Same as you'd do in a book index with 19,000 references.

I think a raw directory index would make your average user very nervous. The people reading this forum are not average users.
1:47 pm on Mar 9, 2015 (gmt 0)

New User

joined:Mar 6, 2015
posts: 3
votes: 0


Really appreciate the responses.

I should mention that there are thousands of PDFs on this site and, unfortunately, it seems as if the developer paid little attention to this and likely assumed that a temp would sit and manually download each PDF from each page, upload it to Craft, and link it manually on the new page. I wouldn't even want to know how long this would take someone to finish. I have never dealt with even close to that many PDFs, which is why I feel such a strong need for a PDF interface or something similar.

The easy way to list documents where there are no access restrictions required is to simply allow the listing of the contents of the folder that they are stored in.


The company is using a developer that designed the site in Craft CMS (which I have never used until now). So, is this likely something I can ask the developer to enable? I don't see any way in this CMS (which seems basic) to initialize this.

This sounds backward. The first question is what do you want your users to see?


Totally understand this question - which is what brought me to ask you guys. The users/public needs to see links to PDFs. This company gets requests all the time for housing forms, etc. that the public needs to print, email, etc. So, PDF is the best format.

Do you all know of a program that could scan a page like this ([thda.org]) and download all of the linked PDFs? That would certainly help. The reason I need this is because each Department here has saved their PDFs in different folders, and I don't have full access to every folder, nor do I want to spend the time asking each staff member where they've saved their PDFs, find them, upload them, etc. Make sense?

If I at least had a program where I could input a URL and have it download all linked files, that would save me DAYS of work.
2:59 pm on Mar 9, 2015 (gmt 0)

Senior Member

WebmasterWorld Senior Member ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Apr 25, 2002
posts:8637
votes: 283


This can easily be done on any *nix with wget piped through grep or sed.

You might be able to do it with a little more hassle and a lot more GUI with something like XENU or Screaming Frog. I've never tried to use them that way, but maybe.

If I had a list of pages I needed to scrape in a text file, then I would definitely pipe that to wget, pipe the results to sed and output to a file. Or actually I might have wget output the results directly and then use Powergrep, my go to text processing tool on Windows, and have it generate the HTML.
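
For readers without wget and sed at hand, the link-harvesting step can also be sketched in Python using only the standard library. This is a stand-in for the pipeline described above, not the same tool chain; the class and function names are invented for the example, and it works on HTML you have already fetched rather than doing the download itself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect (absolute_url, link_text) pairs for links whose path ends in .pdf."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None  # set while we are inside a PDF link
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and urlparse(href).path.lower().endswith(".pdf"):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace in the visible link text
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def extract_pdf_links(page_html: str, base_url: str):
    """Return every PDF link found in page_html, resolved against base_url."""
    parser = PdfLinkParser(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.links
```

Writing the collected URLs to a text file, one per line, gives a list that a downloader can work through, for example wget with its -i (input file) option.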
3:25 pm on Mar 9, 2015 (gmt 0)

New User

joined:Mar 6, 2015
posts: 3
votes: 0


Thanks, ergo

How does wget name the files it downloads? Does it have to keep the original file name? Or can I force it to rename the files using the link text (as shown on the page). The latter would be helpful due to staff naming their files in various structures.
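
As far as I know, wget saves each download under the file name taken from the URL (or under a single name given with -O), so naming files after their link text would need a separate step. One possible way to do that step is sketched below in Python; the function names are hypothetical, and it assumes you already have the link text for each file.

```python
import re
import unicodedata


def filename_from_link_text(text: str, extension: str = ".pdf", max_length: int = 80) -> str:
    """Turn visible link text into a safe, lowercase, hyphen-separated file name."""
    # Strip accents and drop anything that is not plain ASCII
    ascii_text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_text.lower()).strip("-")
    slug = slug[:max_length].rstrip("-") or "untitled"
    return slug + extension


def unique_names(link_texts):
    """Map each link text to a file name, adding -2, -3, ... when names collide."""
    seen = {}
    names = []
    for text in link_texts:
        base = filename_from_link_text(text)
        count = seen.get(base, 0) + 1
        seen[base] = count
        if count == 1:
            names.append(base)
        else:
            stem = base[: -len(".pdf")]
            names.append(f"{stem}-{count}.pdf")
    return names
```

Keeping a record of original name versus new name alongside the renamed files would make it possible to trace a document back to the department that produced it.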
4:09 pm on Mar 9, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15705
votes: 812


Or actually I might have wget output the results directly and then use Powergrep, my go to text processing tool on Windows, and have it generate the HTML.

In the course of experimenting, I opened a window in Fetch to show a directory on my site, cmd-A, cmd-C (in That Other Platform I suppose it would be ctrl), opened a page in the text editor and said cmd-V. "Having reduced the problem to a previously solved one" * I didn't do the final stage, which is converting all those filenames into <a href blahblah. But anything that can be done manually can be done automatically with the appropriate program. (Er... it can, can't it?)

How often does the list of available pdfs change? Continuously and dynamically, or is the uploading done by a single designated person?

Psst! Next passing moderator! Could you tweak the title of this thread? I keep thinking it's about displaying the content of a whole slew of pdfs all at once, which is a pretty ghastly notion.


* Punchline to approximately half of all mathematician jokes, ever.
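
The final stage mentioned above, converting a pasted list of filenames into links, is the kind of thing a regular-expression replace handles well. Here is a minimal Python sketch, assuming one filename per line with no spaces in the names; the /docs/ base path and the function name are placeholders.

```python
import html
import re

# One file name per line, e.g. pasted from an FTP client's directory window.
PDF_LINE = re.compile(
    r"^[ \t]*(?P<name>[^\s/]+)(?P<ext>\.pdf)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def filenames_to_links(listing: str, base_path: str = "/docs/") -> str:
    """Replace each line that names a PDF with an anchor tag; leave other lines alone."""

    def to_anchor(match):
        stem = match.group("name")
        ext = match.group("ext")  # keep the original case of the extension
        label = stem.replace("_", " ").replace("-", " ")
        href = html.escape(base_path + stem + ext, quote=True)
        return f'<a href="{href}">{html.escape(label)}</a>'

    return PDF_LINE.sub(to_anchor, listing)
```

Lines that do not end in .pdf pass through unchanged, so the output is easy to eyeball for anything that slipped past the pattern.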
1:45 am on Mar 10, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11715
votes: 211


getting the list of pdfs linked from a list of pages is easy.
downloading that list of pdfs is easier.
maintaining and associating all the list content and structure for reuse is a harder problem.
1:47 am on Mar 10, 2015 (gmt 0)

Administrator

WebmasterWorld Administrator phranque is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Aug 10, 2004
posts:11715
votes: 211


ps - i tweaked the thread title.
3:12 am on Mar 10, 2015 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:15705
votes: 812


maintaining and associating all the list content and structure for reuse is a harder problem.

Generate it dynamically with preg-replace-blahblah in php-or-equivalent, cache the resulting file(s) for 24 hours or so to reduce server load, re-generate page(s) at some dead hour like 5AM local time?
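
A rough Python sketch of that cache-for-a-day idea, in case it helps (the post suggests PHP or equivalent; the names here are invented). The build_page argument stands for whatever function actually generates the listing HTML.

```python
import os
import time

CACHE_SECONDS = 24 * 60 * 60  # the "24 hours or so" suggested above


def cached_page(cache_path: str, build_page, max_age: float = CACHE_SECONDS, now=None) -> str:
    """Return cached HTML if it is fresh enough, otherwise rebuild and store it."""
    now = time.time() if now is None else now
    try:
        age = now - os.path.getmtime(cache_path)
    except OSError:
        age = None  # no cache file yet
    if age is not None and age < max_age:
        with open(cache_path, encoding="utf-8") as f:
            return f.read()
    page = build_page()
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as f:
        f.write(page)
    os.replace(tmp_path, cache_path)  # swap in one step so readers never see a partial file
    return page
```

To regenerate at a fixed dead hour instead of on the first visit after expiry, the same build step could be run from a scheduled job that writes the cache file directly.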