The trick is to use System.Reflection to expose hidden (private) properties of the PDFBox Page object. The program creates one image for each page of a PDF, computes word locations (if the PDF has been OCR'ed), then ...
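
As a rough illustration of the reflection technique, the sketch below reads a private member from an object by name. It is written in Java (PDFBox's native language) with java.lang.reflect, the counterpart of .NET's System.Reflection. The `Page` class, its `hiddenLabel` field, and the `readPrivateField` helper are hypothetical stand-ins for illustration, not PDFBox's actual members.

```java
import java.lang.reflect.Field;

public class PrivateFieldDemo {
    // Hypothetical stand-in for a library page object; NOT PDFBox's real class.
    static class Page {
        private final String hiddenLabel;

        Page(String hiddenLabel) {
            this.hiddenLabel = hiddenLabel;
        }
    }

    // Reads a private field declared on the target's own class, by name.
    static Object readPrivateField(Object target, String fieldName)
            throws ReflectiveOperationException {
        Field field = target.getClass().getDeclaredField(fieldName);
        field.setAccessible(true); // bypass the private access check
        return field.get(target);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Page page = new Page("page-1");
        System.out.println(readPrivateField(page, "hiddenLabel")); // prints "page-1"
    }
}
```

In .NET the same idea would presumably go through `Type.GetProperty` or `Type.GetField` with `BindingFlags.NonPublic | BindingFlags.Instance`. Either way, reflection on private members depends on library internals and may break when the library is upgraded.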