PDF Extraction

Overview

The Gradient Accelerator Block for PDF extraction parses scanned and native PDFs into text, accurately capturing both the content and the formatting of each document. It is particularly good at preserving table structure.

PDF extraction fits into many business workflows. It converts unstructured PDFs into text and structured data, which downstream tasks such as document summarization and retrieval-augmented generation (RAG) can then use.

Create an account and workspace

If you haven't already, go to gradient.ai, click "Sign Up", and create your account. Once you have verified your email, log in. Click "Create New Workspace" and give it a name.

You can see your workspaces at any time by going to https://auth.gradient.ai/select-workspace.

Use the Accelerator Blocks playground

You can try out the Gradient Accelerator Block for PDF extraction in the playground UI:

1. Log into Gradient and select the workspace you want to use.
2. Select the “Accelerator Blocks” tab in the left sidebar.
3. Navigate to the “PDF Extraction” block.
4. Upload the PDF you want to parse.
5. Click “Submit”. The extracted PDF content appears below.
6. Scroll down and click “Show More” to view the JSON response, which includes detailed formatting information about the text and table blocks.

Use the SDK

The SDK is best suited for integrating PDF extraction into your application or business workflow.

Follow our SDK Quickstart to get set up. Then pass the path of the PDF you want to parse to gradient.extract_pdf:

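from gradientai import Gradient

# Uses the access token and workspace ID configured in the SDK Quickstart.
gradient = Gradient()

# Path to the PDF you want to parse (scanned or native).
filepath = "resources/Lorem_Ipsum.pdf"

# Send the PDF to the PDF extraction Accelerator Block and get back its parsed content.
result = gradient.extract_pdf(
    filepath=filepath
)

print(result)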
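If you need to parse a whole folder of PDFs, you can wrap the single-file call in a loop. The sketch below is a minimal illustration, not part of the Gradient SDK: extract_many and pdf_paths are hypothetical helper names, and the only assumption about the SDK is that gradient.extract_pdf accepts a filepath keyword argument, as in the example above. The extraction function is passed in as a parameter so the loop can be exercised without network access, and failures are collected instead of stopping the batch, since this page does not document which exceptions the SDK raises.

```python
from pathlib import Path
from typing import Any, Callable, Dict, Iterable, List, Tuple


def pdf_paths(folder: str) -> List[str]:
    """Return the .pdf files directly inside `folder`, sorted by path (hypothetical helper)."""
    return sorted(str(p) for p in Path(folder).glob("*.pdf"))


def extract_many(
    paths: Iterable[str],
    extract_fn: Callable[..., Any],
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Call extract_fn(filepath=...) on each path (hypothetical helper).

    Assumes extract_fn behaves like gradient.extract_pdf, i.e. it takes a
    `filepath` keyword argument. Returns (results, failures), both keyed by path.
    """
    results: Dict[str, Any] = {}
    failures: Dict[str, Exception] = {}
    for path in paths:
        try:
            results[path] = extract_fn(filepath=path)
        except Exception as exc:  # broad on purpose: SDK error types are not documented here
            failures[path] = exc
    return results, failures


if __name__ == "__main__":
    # Real usage (requires the SDK and credentials from the Quickstart):
    #   from gradientai import Gradient
    #   gradient = Gradient()
    #   results, failures = extract_many(pdf_paths("resources"), gradient.extract_pdf)
    def fake_extract(filepath: str) -> str:
        # Stand-in for gradient.extract_pdf so the sketch runs offline.
        return f"parsed {filepath}"

    results, failures = extract_many(["a.pdf", "b.pdf"], fake_extract)
    print(results, failures)
```

Keeping successes and failures in separate dictionaries lets you retry only the files that failed without re-sending the ones that were already parsed.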