Abstract: This research work proposes an innovative method for measuring text similarity of unstructured PDF documents using a hybrid approach that combines Latent Dirichlet Allocation (LDA) and ...