Term Recognition App for PDF Documents
Develop a solution that recognizes defined terms in PDF files
We developed a mechanism from scratch that recognizes the font style (italic, bold, underline) and the boundaries of each term, separates the term from the sentence that follows it, and reports its exact location in the document. The most complicated task was recognizing how text is distributed in tables, since standard algorithms cannot cope with this.
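The boundary step described above can be illustrated with a minimal sketch. It is not the project's implementation: it assumes an earlier stage has already produced word boxes (for example from OCR) and has marked each word with a hypothetical `styled` flag meaning "bold, italic, or underlined". The names `Word`, `Term`, and `extract_terms` are invented for this illustration. The sketch groups consecutive styled words into one term, trims surrounding punctuation, and reports the term's bounding box on the page.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Word:
    """One recognized word with its box in page pixels (hypothetical structure)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    styled: bool  # assumption: True if the word is bold, italic, or underlined


@dataclass
class Term:
    text: str
    page: int
    box: Tuple[int, int, int, int]  # (left, top, right, bottom)


def _close_run(run: List[Word], page: int) -> Term:
    """Merge a run of styled words into a single term with a union bounding box."""
    # Trim punctuation that belongs to the sentence rather than to the term.
    text = " ".join(w.text for w in run).strip(" .,;:\"'()")
    left = min(w.left for w in run)
    top = min(w.top for w in run)
    right = max(w.left + w.width for w in run)
    bottom = max(w.top + w.height for w in run)
    return Term(text, page, (left, top, right, bottom))


def extract_terms(words: List[Word], page: int = 1) -> List[Term]:
    """Treat every maximal run of consecutive styled words as one term."""
    terms: List[Term] = []
    run: List[Word] = []
    for word in words:
        if word.styled:
            run.append(word)
        elif run:
            # The first unstyled word marks the end of the term.
            terms.append(_close_run(run, page))
            run = []
    if run:
        terms.append(_close_run(run, page))
    return terms


if __name__ == "__main__":
    sample = [
        Word("The", 10, 10, 30, 12, False),
        Word("Effective", 45, 10, 70, 12, True),
        Word("Date:", 120, 10, 40, 12, True),
        Word("means", 165, 10, 50, 12, False),
    ]
    for term in extract_terms(sample, page=3):
        print(term)
```

A rule like this only covers running text. Tables, which the project names as the hardest case, would need an additional step that assigns words to cells before runs are formed.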
Our team implemented the solution using artificial intelligence and machine learning methods.
The client received a useful tool that saves time when searching for the necessary data in documents.
Python, OCR, Tesseract 3 and 4, OpenCV, Pandas, PostgreSQL, Django, DRF, AWS
2 members: developers
Working on artificial intelligence algorithms is always interesting. It requires calculating and analyzing every variant of the data in order to steer the search toward the term. Our system continuously retrains itself, and this makes it unique.