How to display links to multiple (if not hundreds) of PDFs on single page

Need suggestions on linking to tons of PDF files on single web page

         

motifman22

3:35 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



So, I'm working with a government/public agency and their new website, and there are several pages where, by law, they have to give access to PDF files. A few of these pages have A LOT of PDF files. For example, this one ([thda.org ]) I'm working on now has 106 PDF files, all individually linked. Also important to know is that these PDFs are very long; several have over 150 pages.

How would you deal with this volume of PDFs and content? Should I be thinking about a different file format? Yes, people will need to occasionally print these files, which is why PDFs may work best. But I'm not sure, as I've never tackled this particular kind of project.

Any help/guidance would be much appreciated.

Thanks!

Marshall

5:54 pm on Mar 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I do a few governmental websites, and when it comes to .pdfs, I place the link on the page (obvious action) with a statement next to the link giving the file size and number of pages. This way the person knows beforehand what they are getting into. I also have the link open in a new window.
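As a rough illustration of that labelling, here is a minimal Python sketch. The function names `human_size` and `pdf_link` are made up for this example, and the page count is assumed to be supplied by whoever maintains the list, since reading it out of the PDF would need a PDF library.

```python
import html


def human_size(num_bytes: int) -> str:
    """Format a byte count as a short, human-readable string."""
    if num_bytes < 1024:
        return f"{num_bytes} bytes"
    size = num_bytes / 1024
    for unit in ("KB", "MB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def pdf_link(href: str, title: str, num_bytes: int, pages: int) -> str:
    """Build a link that opens in a new window and states size and page count."""
    plural = "" if pages == 1 else "s"
    return (
        f'<a href="{html.escape(href, quote=True)}" target="_blank">'
        f"{html.escape(title)}</a> "
        f"(PDF, {human_size(num_bytes)}, {pages} page{plural})"
    )
```

Calling `pdf_link("/docs/plan.pdf", "Annual Plan", 3500000, 152)` would produce one labelled link that can be dropped into the page.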

If possible, I also provide an html version of the page and give the person the choice which one to view.

Keep in mind that certain devices may not be .pdf friendly, and large downloads are a problem if a person is paying for bandwidth usage. That is why a plain, html version is nice to have.

Marshall

motifman22

6:47 pm on Mar 6, 2015 (gmt 0)

10+ Year Member



Thanks, Marshall. I can certainly add file size and page count, but I'm looking for a much quicker and more efficient way to add a large number of PDFs to the page at once, rather than having to link every one of them individually. I am using Craft CMS (https://buildwithcraft.com/) for this project, although I'm not sure if a bulk upload & link function is available for this or any CMS on the market.

I'm really looking for a method to let me upload/add tons of PDFs and link them in one simple action. Manually linking hundreds of PDFs seems completely inefficient to me.

ergophobe

7:21 pm on Mar 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I don't know Craft CMS, but are you able to integrate a library like plupload?

The other question is whether or not you are doing this once, or will be doing a huge number of uploads over and over.

If you're doing it once (and maybe if doing it over and over), I would write a script to import a spreadsheet, either CSV or straight Excel.

A couple of years ago a conference organizer wanted mini-sites for a series of conferences. Each had a similar format (about 5-6 pages per event) and the key element was the session schedule. They were using Excel spreadsheets.

Turns out there's a nice PHP Excel library, so I just had them maintain their info as they always had and set it up so that they would FTP over the spreadsheet. On page load, it would check the timestamp on the Excel doc versus the timestamp on the cache and, if the spreadsheet was newer, just parse it out and create a new session grid.

It worked pretty well for them and let them use their normal way of organizing sessions (no real change to their workflow) and yet allowed them to update the website in real time as they changed the spreadsheet.

If memory serves, we eventually needed to add a "Publish" column so sessions in progress and not finalized wouldn't get published.

Anyway, the point being that if you can get someone else (gov employee or even someone on Amazon Mechanical Turk) to fill in the spreadsheet for each doc (filename, timestamp, size, title, abstract if there is one), you would just have to parse the sheet as needed.
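To make the spreadsheet idea concrete, here is a minimal sketch in Python rather than the PHP used on the conference project. It assumes the sheet is exported as CSV with columns named `filename`, `title`, `size` and `publish`; those column names and both function names are assumptions for illustration only.

```python
import csv
import html
from pathlib import Path


def build_list(csv_path: Path) -> str:
    """Turn the spreadsheet rows into an HTML list of PDF links."""
    items = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Skip rows not marked for publication (assumed "publish" column).
            if row.get("publish", "").strip().lower() != "yes":
                continue
            href = html.escape(row["filename"], quote=True)
            title = html.escape(row["title"])
            size = html.escape(row["size"])
            items.append(f'<li><a href="{href}">{title}</a> ({size})</li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>\n"


def cached_list(csv_path: Path, cache_path: Path) -> str:
    """Rebuild the cached HTML only when the spreadsheet is newer than the cache."""
    if (not cache_path.exists()
            or csv_path.stat().st_mtime > cache_path.stat().st_mtime):
        cache_path.write_text(build_list(csv_path), encoding="utf-8")
    return cache_path.read_text(encoding="utf-8")
```

The page template would call `cached_list(...)` on each load, so staff only ever touch the spreadsheet.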

Alternatively, you could use Plupload. That's how I would do it in Drupal, for example.

Kendo

10:29 pm on Mar 6, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The easy way to list documents where there are no access restrictions required is to simply allow the listing of the contents of the folder that they are stored in.

That way you simply upload to a designated folder and then provide a link to that folder. However that folder needs to be set for read permissions and without an index file. On Apache servers simply not having an index file in the folder should provide a list of its contents. On Windows servers there is an option in IIS settings to allow indexing of folders that do not contain a default index file.

Then users can simply right-click to save/open.

Or you can create a custom script that redesigns the look. For example with Windows you can use the FileSystemObject to retrieve file names and then run filters to make the list easier to read by removing underscores, etc. You can do similar things with PHP. In fact you can also sort by date, file name and file size, or even filter to only allow PDF files.
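A minimal sketch of such a listing script, written in Python instead of FileSystemObject or PHP. The names `readable_name` and `list_pdfs` and the `url_prefix` parameter are made up for this example; it keeps only PDF files, sorts by file name and strips underscores and hyphens from the visible label.

```python
import html
from pathlib import Path
from urllib.parse import quote


def readable_name(path: Path) -> str:
    """Turn 'annual_report-2014.pdf' into 'annual report 2014'."""
    return path.stem.replace("_", " ").replace("-", " ").strip()


def list_pdfs(folder: Path, url_prefix: str) -> str:
    """Return an HTML list of the PDF files in folder, sorted by file name."""
    pdfs = sorted(
        (p for p in folder.iterdir()
         if p.is_file() and p.suffix.lower() == ".pdf"),
        key=lambda p: p.name.lower(),
    )
    rows = []
    for pdf in pdfs:
        size_kb = pdf.stat().st_size / 1024
        href = url_prefix + quote(pdf.name)  # percent-encode spaces etc.
        label = html.escape(readable_name(pdf))
        rows.append(f'<li><a href="{href}">{label}</a> ({size_kb:.0f} KB)</li>')
    return "<ul>\n" + "\n".join(rows) + "\n</ul>"
```

Sorting by date or size instead only means changing the `key` function, for example to `lambda p: p.stat().st_mtime`.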

Anything like this can be done independently of a CMS.

phranque

12:57 am on Mar 7, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



welcome to WebmasterWorld, motifman22!


if you are on apache you could possibly use Apache Module mod_autoindex:
http://httpd.apache.org/docs/current/mod/mod_autoindex.html [httpd.apache.org]
not sure what the IIS equivalent is.

you might consider using a script as a default directory index document that organizes and lists the pdf content of the directory.
if you need more information than directory structure and file names to organize the content you will probably also need something like a spreadsheet as described by ergophobe or a database.

LifeinAsia

2:13 am on Mar 7, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



not sure what the IIS equivalent is

In IIS, you can allow Directory Browsing for that folder.
IIS7- select the folder, open Directory Browsing, click Enable, and also select which information you want to show (size, etc.)
IIS6- select the folder, right-click then Properties, in the Directory tab, check Directory Browsing

lucy24

4:52 am on Mar 7, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm really looking for a method to let me upload/add tons of PDFs and link them in one simple action. Manually linking hundreds of PDFs seems completely inefficient to me.

This sounds backward. The first question is what do you want your users to see? (Er... you did say this is a gummint site, right? You work for us, don't you?) Once you've established that, then you find some simple code to achieve the desired result.

Because I am who I am, my first thought involved a Find and Replace involving Regular Expressions. Same as you'd do in a book index with 19,000 references.
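For what it's worth, a find-and-replace of that kind might look like the following Python sketch. It assumes the starting point is a plain-text list with one file name per line; the function name `filenames_to_links` and the `/docs/` prefix are hypothetical.

```python
import re

# One file name per line, ending in .pdf (any letter case).
PDF_LINE = re.compile(r"^(?P<stem>[\w.-]+?)(?P<ext>\.pdf)$",
                      re.MULTILINE | re.IGNORECASE)


def filenames_to_links(text: str, url_prefix: str) -> str:
    """Replace each line that is a PDF file name with an HTML list item."""
    def to_link(match: re.Match) -> str:
        stem, ext = match["stem"], match["ext"]
        label = stem.replace("_", " ").replace("-", " ")
        return f'<li><a href="{url_prefix}{stem}{ext}">{label}</a></li>'

    return PDF_LINE.sub(to_link, text)
```

Lines that are not PDF file names pass through unchanged, so the same replace can be run over a mixed directory listing.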

I think a raw directory index would make your average user very nervous. The people reading this forum are not average users.

motifman22

1:47 pm on Mar 9, 2015 (gmt 0)

10+ Year Member



Really appreciate the responses.

I should mention that there are thousands of PDFs on this site and, unfortunately, it seems the developer paid little attention to this and likely assumed that a temp would sit and manually download each PDF from each page, upload it to Craft, and link it manually on the new page. I wouldn't even want to know how long this would take someone to finish. I've never dealt with anywhere close to that many PDFs, which is why there's such a strong need for a PDF interface or something similar.

The easy way to list documents where there are no access restrictions required is to simply allow the listing of the contents of the folder that they are stored in.


The company is using a developer that designed the site in Craft CMS (which I have never used until now). So, is this likely something I can ask the developer to enable? I don't see any way in this CMS (which seems basic) to initialize this.

This sounds backward. The first question is what do you want your users to see?


Totally understand this question - which is what brought me to ask you guys. The users/public needs to see links to PDFs. This company gets requests all the time for housing forms, etc. that the public needs to print, email, etc. So, PDF is the best format.

Do you all know of a program that could scan a page like this ([thda.org ]) and download all of the linked PDFs? That would certainly help. The reason I need this is because each Department here has saved their PDFs in different folders, and I don't have full access to every folder, nor do I want to spend the time asking each staff member where they've saved their PDFs, find them, upload them, etc. Make sense?

If I at least had a program where I could input a URL and have it download all linked files, that would save me DAYS of work.

ergophobe

2:59 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



This can easily be done on any *nix with wget piped through grep or sed.

You might be able to do it with a little more hassle and a lot more GUI with something like XENU or Screaming Frog. I've never tried to use them that way, but maybe.

If I had a list of pages I needed to scrape in a text file, then I would definitely pipe that to wget, pipe the results to sed and output to a file. Or actually I might have wget output the results directly and then use Powergrep, my go to text processing tool on Windows, and have it generate the HTML.
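For anyone without wget and sed to hand, the link-extraction step can be sketched with Python's standard library instead. This is only an illustration under stated assumptions: the class and function names are made up, the page HTML is assumed to be fetched already, and `safe_filename` shows one possible way to name a download after its link text.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class PdfLinkParser(HTMLParser):
    """Collect (absolute_url, link_text) pairs for links that point at PDFs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # URL of the PDF link currently being read
        self._text = []     # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href and urlparse(href).path.lower().endswith(".pdf"):
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_pdf_links(page_html, base_url):
    """Return every PDF link on the page, resolved against base_url."""
    parser = PdfLinkParser(base_url)
    parser.feed(page_html)
    parser.close()
    return parser.links


def safe_filename(link_text):
    """Derive a file name from link text, e.g. 'Annual Report' -> 'annual-report.pdf'."""
    slug = re.sub(r"[^A-Za-z0-9]+", "-", link_text).strip("-").lower()
    return (slug or "document") + ".pdf"
```

Each URL in the result could then be downloaded with `urllib.request.urlretrieve(url, safe_filename(text))`, or written to a text file and handed to wget.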

motifman22

3:25 pm on Mar 9, 2015 (gmt 0)

10+ Year Member



Thanks, ergo

How does wget name the files it downloads? Does it have to keep the original file name? Or can I force it to rename the files using the link text (as shown on the page). The latter would be helpful due to staff naming their files in various structures.

lucy24

4:09 pm on Mar 9, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Or actually I might have wget output the results directly and then use Powergrep, my go to text processing tool on Windows, and have it generate the HTML.

In the course of experimenting, I opened a window in Fetch to show a directory on my site, cmd-A, cmd-C (in That Other Platform I suppose it would be ctrl), opened a page in the text editor and said cmd-V. "Having reduced the problem to a previously solved one" * I didn't do the final stage, which is converting all those filenames into <a href blahblah. But anything that can be done manually can be done automatically with the appropriate program. (Er... it can, can't it?)

How often does the list of available pdfs change? Continuously and dynamically, or is the uploading done by a single designated person?

Psst! Next passing moderator! Could you tweak the title of this thread? I keep thinking it's about displaying the content of a whole slew of pdfs all at once, which is a pretty ghastly notion.


* Punchline to approximately half of all mathematician jokes, ever.

phranque

1:45 am on Mar 10, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



getting the list of pdfs linked from a list of pages is easy.
downloading that list of pdfs is easier.
maintaining and associating all the list content and structure for reuse is a harder problem.

phranque

1:47 am on Mar 10, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



ps - i tweaked the thread title.

lucy24

3:12 am on Mar 10, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



maintaining and associating all the list content and structure for reuse is a harder problem.

Generate it dynamically with preg-replace-blahblah in php-or-equivalent, cache the resulting file(s) for 24 hours or so to reduce server load, re-generate page(s) at some dead hour like 5AM local time?
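A minimal sketch of that kind of time-based cache, in Python rather than PHP. The function name `cached_page` is made up, and `generate` stands for whatever code builds the list of links; both are assumptions for illustration.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 24 * 60 * 60  # regenerate roughly once a day


def cached_page(cache_path: Path, generate, max_age=MAX_AGE_SECONDS, now=None):
    """Return the cached page, regenerating it once it is older than max_age."""
    now = time.time() if now is None else now
    if cache_path.exists() and now - cache_path.stat().st_mtime < max_age:
        return cache_path.read_text(encoding="utf-8")
    page = generate()  # the expensive step: build the list of links
    cache_path.write_text(page, encoding="utf-8")
    return page
```

To regenerate at a fixed dead hour instead of on the first request after expiry, a scheduled job could simply delete the cache file or call `cached_page` with `max_age=0`.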