Term Recognition App for PDF Documents
Challenge
Develop a solution that recognizes terms in PDF files.
Solution
We developed a mechanism from scratch that recognizes the font style (italic, bold, underline) and the boundaries of each term, separates the term from the sentence that follows it, and reports its exact location in the document. The most complicated task was recognizing how text is distributed in tables, since standard algorithms cannot cope with this.
Our team implemented the solution using Artificial Intelligence and Machine Learning methods.
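As a rough illustration of the boundary step only, the sketch below groups consecutive styled words into a term and reports the page and bounding box of each one. It is a plain heuristic, not the machine learning approach used in the project. The Word and Term structures, the styled flag, and the extract_terms function are hypothetical names chosen for this example, and the sketch assumes an earlier OCR stage has already produced word boxes and style flags.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Word:
    # Hypothetical output of an earlier OCR and style-detection stage.
    text: str
    page: int
    box: Box
    styled: bool  # True if the word is bold, italic, or underlined


@dataclass
class Term:
    text: str
    page: int
    box: Box


def merge_boxes(a: Box, b: Box) -> Box:
    """Return the smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def _close_run(run: List[Word]) -> Term:
    """Turn a run of styled words into a single term with one bounding box."""
    box = run[0].box
    for word in run[1:]:
        box = merge_boxes(box, word.box)
    # Drop punctuation that OCR usually attaches to the edges of a term.
    text = " ".join(word.text for word in run).strip(" .,:;\"'")
    return Term(text=text, page=run[0].page, box=box)


def extract_terms(words: List[Word]) -> List[Term]:
    """Group consecutive styled words on the same page into terms.

    The first unstyled word (or a page change) marks the term boundary,
    which separates the term from the sentence that follows it.
    """
    terms: List[Term] = []
    run: List[Word] = []
    for word in words:
        if word.styled and (not run or run[-1].page == word.page):
            run.append(word)
            continue
        if run:
            terms.append(_close_run(run))
            run = []
        if word.styled:
            # A styled word on a new page starts a new term.
            run = [word]
    if run:
        terms.append(_close_run(run))
    return terms
```

Given the words "Force" and "Majeure" flagged as styled and followed by the unstyled "means", extract_terms returns a single term "Force Majeure" with a box that spans both words. Text inside tables would need a separate layout step before this grouping, which the sketch does not attempt.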
Result
The client received a useful tool that saves time when searching for the necessary data in documents.
Tech Stack
Python, OCR, Tesseract 3 and 4, OpenCV, Pandas, PostgreSQL, Django, DRF, AWS
Team
2 members, both developers
Working on artificial intelligence algorithms is always interesting. We have to calculate and analyze every variant of the data in order to guide the system toward finding the term. Our system is constantly self-training, and this makes it unique.