The first challenge is splitting the pdf pages. Fortunately, there is a free executable you can deploy for this task. Meet cpdf.
cpdf is short for ‘Coherent PDF Command Line Tools’ and offers a fantastic set of free tools available through GitHub. For our purposes, splitting a pdf takes a single line of code.
Assume we have a pdf called “DanMoore-0004.pdf”, which contains multiple pages (in this case 2) for one of our Dan Moore vessel stations.
Example of a two-page pdf from which page 1 needs to be extracted.
Place this file into the folder where you have cpdf (or add cpdf to your PATH, in which case you can drop the ./ from the command). Open a terminal and navigate to this folder. Now enter the following command:
./cpdf DanMoore-0004.pdf 1 -o DanMoore-0004_page1.pdf
That’s it! The 1 is the page you want, and -o is followed by the name of the output file. You can easily turn this into a script to go through thousands of pdfs in seconds.
Using the above line, a single page from a pdf is readily extracted.
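As a rough sketch of what such a script could look like, the function below loops over every pdf in a folder and runs the same cpdf command on each one. The function name extract_first_pages, the CPDF variable, and the separate output folder are my own choices for illustration, not anything cpdf requires, so adjust them to your setup.

```shell
#!/usr/bin/env bash
# Sketch: pull page 1 out of every pdf in a folder with cpdf.
# CPDF (path to the executable) is an assumption; change it if cpdf lives elsewhere.
CPDF="${CPDF:-./cpdf}"

# extract_first_pages IN_DIR OUT_DIR  (hypothetical helper name)
extract_first_pages() {
  local in_dir="$1" out_dir="$2" f base
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.pdf; do
    [ -e "$f" ] || continue              # nothing matched the pattern
    base="$(basename "$f" .pdf)"         # e.g. DanMoore-0004
    # Same call as the one-liner: page range 1, -o names the output file
    "$CPDF" "$f" 1 -o "$out_dir/${base}_page1.pdf"
  done
}
```

After sourcing the file, something like `extract_first_pages . pages` should write DanMoore-0004_page1.pdf and friends into a new pages folder. Writing to a separate folder keeps the extracted pages from being picked up again if you rerun the loop.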
Converting to png is also super easy on a Mac and takes a single line of code. Move all your single-page pdfs into a new folder and navigate there in the terminal. Now enter:
for i in *.pdf; do sips -s format png "$i" --out "${i%.pdf}.png"; done
This loops over the files in your folder matched by the wildcard (*) and uses the macOS command line tool sips to convert each one to a png. This is incredibly useful when you have hundreds or thousands of pdfs.
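If you are converting thousands of files, it can help to be able to stop and restart without redoing work. Here is a minimal sketch of that idea: it writes the pngs to a separate folder and skips any pdf that already has a png. The function name pdfs_to_pngs and the SIPS variable are hypothetical conveniences of mine; the sips call itself is the same one used in the one-liner.

```shell
#!/usr/bin/env bash
# Sketch: convert every pdf in a folder to png with sips (macOS only),
# skipping files that were already converted on an earlier run.
SIPS="${SIPS:-sips}"   # assumption: sips is on your PATH, as it is on a stock Mac

# pdfs_to_pngs IN_DIR OUT_DIR  (hypothetical helper name)
pdfs_to_pngs() {
  local in_dir="$1" out_dir="$2" f out
  mkdir -p "$out_dir"
  for f in "$in_dir"/*.pdf; do
    [ -e "$f" ] || continue                    # nothing matched the pattern
    out="$out_dir/$(basename "$f" .pdf).png"
    [ -e "$out" ] && continue                  # png already exists, leave it alone
    "$SIPS" -s format png "$f" --out "$out" >/dev/null
  done
}
```

Calling `pdfs_to_pngs pages png` would then fill a png folder from the pages folder, and running it a second time should only touch pdfs that were added since.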
From here, simply upload to Zooniverse. I posted this for our lab and for anyone else who might find it useful. I’m sure there are other ways to accomplish the same thing, but regardless, this certainly beats spending days doing this manually!