
Term Recognition App for PDF Documents


Develop a solution that recognizes defined terms in PDF files


We developed a mechanism from scratch that recognizes font styles (italic, bold, underline) and term boundaries, separates each term from the sentence that follows it, and indicates the term's exact location in the document. The most complicated task was recognizing how text is distributed in tables, since standard algorithms cannot cope with this.
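The style-based part of this idea can be illustrated with a minimal Python sketch. It assumes the PDF text layer (or OCR output) has already been split into spans that carry font-style flags, a page number, and a bounding box. The `Span`, `Term`, and `find_terms` names are hypothetical, and this is not the project's actual implementation: it does not use machine learning and does not attempt the table-layout problem described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Span:
    """A run of text with a single font style (hypothetical input structure)."""
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1)
    bold: bool = False
    italic: bool = False
    underline: bool = False

    @property
    def styled(self) -> bool:
        return self.bold or self.italic or self.underline


@dataclass
class Term:
    text: str
    page: int
    bbox: tuple[float, float, float, float]


def _merge(a, b):
    """Smallest box that covers both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def find_terms(spans: list[Span]) -> list[Term]:
    """Join consecutive styled spans on one page into term candidates.

    The first unstyled span closes the current term, which separates the
    term from the sentence that follows it.
    """
    terms: list[Term] = []
    current: Term | None = None
    for span in spans:
        if span.styled and current is not None and span.page == current.page:
            # Continue the current term and grow its bounding box.
            current = Term(current.text + span.text, current.page,
                           _merge(current.bbox, span.bbox))
        elif span.styled:
            # Start a new term (closing one that ended on a previous page).
            if current is not None:
                terms.append(current)
            current = Term(span.text, span.page, span.bbox)
        elif current is not None:
            # Plain text: the term boundary has been reached.
            terms.append(current)
            current = None
    if current is not None:
        terms.append(current)
    # Drop surrounding whitespace, quotation marks, colons and commas.
    for term in terms:
        term.text = term.text.strip(' "\u201c\u201d:,')
    return [t for t in terms if t.text]


spans = [
    Span('"Effective', 1, (10, 10, 60, 20), bold=True),
    Span(' Date"', 1, (60, 10, 100, 20), bold=True),
    Span(' means the date of signing.', 1, (100, 10, 300, 20)),
    Span('Agreement', 2, (5, 50, 70, 60), italic=True),
    Span(' refers to this contract.', 2, (70, 50, 250, 60)),
]
terms = find_terms(spans)
for t in terms:
    print(t.text, t.page, t.bbox)
```

Each returned `Term` keeps the page and merged bounding box, which is one simple way to point back to the term's exact location in the document.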

Our team implemented the solution using artificial intelligence and machine learning methods.


The client received a useful tool that saves time when searching documents for the necessary data.

Tech Stack

Python, OCR, Tesseract 3 and 4, OpenCV, Pandas, PostgreSQL, Django, DRF, AWS


Team: 2 developers

Employee's feedback

Working on artificial intelligence algorithms is always interesting. You have to compute and analyze every variant of the data to steer toward a solution to the problem of finding the term. Our system continuously trains itself, and this is what makes it unique.
