The Adventure of PDF to Data Frame in R.

Justin Cocco
Published in The Startup
9 min read · Jul 17, 2020

Organizations love PDFs, especially governmental bodies. For most readers, they are easy to read, with clean formatting that is easy on the eyes. For the data scientist, they can be a nightmare to import. For example, take a look at this PDF:

Source: Pennsylvania Department of Health, Demographics of Nursing Home Residents.

What a 105-page nightmare that would be! R can import a PDF in a single line of code, but this PDF was clearly not designed with data scientists in mind.
