Open
Labels
bug: Something isn't working
Initial Checks
- I confirm that I'm on the latest version
Description
Hi, I need some help with the above error, which I'm hitting while parsing my documents. A few PDFs cannot be parsed, and I can't work out why. The file in question is a medical invoice PDF, from which I am trying to extract the text contents along with their bounding box coordinates.
Example Code
import openparse
basic_doc_path = "/home/sanjayr/Workspace/30-claims/42969914.pdf"
parser = openparse.DocumentParser()
parsed_basic_doc = parser.parse(basic_doc_path)
Python, open-parse & OS Version
python_version: 3.8.20
operating_system: Linux
os_version: 5.15.0-1074-azure
open-parse version: 0.7.0
install path: /home/sanjayr/.conda/envs/be-env/lib/python3.8/site-packages/openparse
python version: 3.8.20 (default, Oct 3 2024, 15:24:27) [GCC 11.2.0]
platform: Linux-5.15.0-1074-azure-x86_64-with-glibc2.17
related packages: PyMuPDF-1.24.11 pydantic-2.10.4 tokenizers-0.20.3 transformers-4.46.3 torch-2.4.1 torchvision-0.19.1
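Since only a few PDFs fail, it may help to record exactly which files raise and what the full traceback is. The following is a minimal sketch under stated assumptions, not part of open-parse: `collect_parse_failures` is a hypothetical helper name, and `parse` stands for any callable that takes a file path, such as the `parser.parse` method from the example code above.

```python
import traceback
from typing import Callable, Dict, Iterable, List, Tuple


def collect_parse_failures(
    paths: Iterable[str],
    parse: Callable[[str], object],
) -> Tuple[List[str], Dict[str, str]]:
    """Run `parse` on each path.

    Returns (paths that parsed, {failed path: full traceback text}).
    `collect_parse_failures` is a hypothetical helper, not an open-parse API.
    """
    succeeded: List[str] = []
    failures: Dict[str, str] = {}
    for path in paths:
        try:
            parse(path)
        except Exception:
            # Keep the whole traceback so it can be pasted into a bug report.
            failures[path] = traceback.format_exc()
        else:
            succeeded.append(path)
    return succeeded, failures
```

Passing `openparse.DocumentParser().parse` as `parse`, together with the list of PDF paths in the claims folder, should yield the subset of files that fail and the traceback for each one, which can then be attached to this issue.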