PDF Extractor

Extract Data from PDF in C# with PDF Extractor. PDFs are widely used for storing documents because they preserve formatting across different devices. However, working with PDFs often requires extracting specific content, such as images, text, metadata, or structured data, for reuse, analysis, or editing. By mastering PDF extraction, you can save time, improve workflows, and gain deeper insights from the files you work with.

Key Features

🔹 Extract Images

PDFs frequently contain logos, charts, photos, or scanned images. Extracting these images allows you to reuse them without needing to copy entire pages. High-Resolution Image Extraction – Retrieve images exactly as they appear in your PDF for professional use.

🔹 Extract Text

Text extraction lets you convert the readable content of a PDF into editable text. This is especially helpful when you need to repurpose or analyze written content. Choose from three precision modes to suit your needs:

Pure Mode — Retains original formatting for structured output

Raw Mode — Extracts plain text without formatting

Flatten Mode — Removes special characters and formatting for clean, minimal text

🔹 Extract Properties (Metadata)

Properties extraction lets you retrieve information about a PDF document. The available properties are: FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, and Number of Pages.

🔹 Export data from AcroForms

PDF forms are widely used in applications, surveys, invoices, and contracts. They allow users to enter information directly into interactive fields. But once the forms are filled out, organizations often need to extract that data for storage, reporting, or analysis.

Getting Started

Download the assembly files from the download page or from NuGet.
Reference Documentize in your .NET project.
Add using Documentize;.
Optionally, set your license with License.Set("license.lic");.
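
The last two steps can be sketched as follows. This is a minimal sketch, assuming the Documentize assembly is already referenced by the project and that the license file is reachable at the relative path shown; the class name is hypothetical.

```csharp
using Documentize;

class GettingStartedExample
{
    static void Main()
    {
        // Optional step: apply the license before calling any extractor.
        // "license.lic" is the file name from the steps above; its location
        // relative to the executable is an assumption.
        License.Set("license.lic");
    }
}
```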

Why Choose PDF Extractor

Ideal for developers and businesses managing visual content in reports, presentations, and archives.
Fast, efficient extraction for easy content reuse.
Multiple extraction modes for maximum flexibility.
Seamless .NET integration for simplified workflows.
Supported operating systems include Windows 7-11, Windows Server 2003-2022, macOS (10.12+), and Linux.
Supported .NET versions range from 4.0 to 8.0.
Compatible with various Microsoft Visual Studio versions.
Detailed, high-quality documentation.

How to Extract Images Using PDF Extractor

Configure ExtractImagesOptions with the input file path and other necessary settings.
Call PdfExtractor.Extract with an instance of ExtractImagesOptions as the parameter.
Access the extracted images through ResultContainer.ResultCollection.
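
The steps above might look like the following in code. This is a sketch under stated assumptions, not a reference implementation: only ExtractImagesOptions, PdfExtractor.Extract, and ResultCollection are named in the text. The parameterless constructor, the AddInput and AddOutput methods, and the FileData and DirectoryData types are assumptions about the library's API, and both paths are placeholders.

```csharp
using Documentize;

class ExtractImagesExample
{
    static void Main()
    {
        // Assumption: the options object has a parameterless constructor.
        var options = new ExtractImagesOptions();

        // Assumption: AddInput/AddOutput with FileData/DirectoryData is how
        // paths are registered. Both paths are placeholders.
        options.AddInput(new FileData("input.pdf"));
        options.AddOutput(new DirectoryData("extracted-images"));

        // Run the extraction; the result is the ResultContainer named above.
        var results = PdfExtractor.Extract(options);

        // ResultCollection holds the extracted images.
        var images = results.ResultCollection;
    }
}
```

How each entry in ResultCollection exposes its file path or stream is not covered by the steps above, so the sketch stops at obtaining the collection.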

How to Extract Text from a PDF

Create an instance of ExtractTextOptions and set the input PDF.
Call PdfExtractor.Extract with the ExtractTextOptions instance as the parameter and access the extracted text.
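
A minimal sketch of these two steps follows. ExtractTextOptions and PdfExtractor.Extract come from the text above; passing the input path to the constructor is an assumption about the API, and the path itself is a placeholder.

```csharp
using System;
using Documentize;

class ExtractTextExample
{
    static void Main()
    {
        // Assumption: the constructor accepts the input PDF path directly.
        var options = new ExtractTextOptions("input.pdf");

        // Run the extraction and keep whatever result the call returns.
        var extracted = PdfExtractor.Extract(options);

        // Print the result. The exact return type is an assumption, so the
        // object overload of Console.WriteLine is relied on here.
        Console.WriteLine(extracted);
    }
}
```

The sketch leaves the formatting mode (Pure, Raw, or Flatten) at its default, since the text does not name the parameter that selects it.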

How to Export PDF Field Data

Create an instance of ExtractFormDataToDsvOptions to configure the export of form data to CSV.
Add the input and output files to the options.
Call the PdfExtractor.Extract method, passing the options as a parameter.
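
The three steps could be sketched as shown below. ExtractFormDataToDsvOptions and PdfExtractor.Extract are taken from the text. The constructor arguments (a delimiter character and a flag for a header row), the AddInput and AddOutput methods, and the FileData type are assumptions about the API, and the file names are placeholders.

```csharp
using Documentize;

class ExportFormDataExample
{
    static void Main()
    {
        // Assumption: the constructor takes the delimiter and a flag that
        // controls whether field names are written as a header row.
        var options = new ExtractFormDataToDsvOptions(',', true);

        // Assumption: input and output files are registered via
        // AddInput/AddOutput with FileData. Paths are placeholders.
        options.AddInput(new FileData("filled-form.pdf"));
        options.AddOutput(new FileData("form-data.csv"));

        // Export the AcroForm field values to the CSV file.
        PdfExtractor.Extract(options);
    }
}
```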

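Frequently Asked Questions

What is PDF Extractor?

PDF Extractor for .NET is a powerful tool designed to quickly and easily extract images, text, metadata, or form data from PDF documents. It integrates seamlessly into .NET applications and provides a user-friendly solution for accessing visual content from PDFs.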

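Can I use PDF Extractor for .NET for other PDF operations?

No, this component is dedicated to extracting content from PDFs. For other PDF-related tasks, consider the additional components provided in the Documentize library, or use its full feature set for document processing.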

Why would I need to extract text, images, metadata, or form data from a PDF?

Extracting this data helps with analyzing documents, creating reports, and making use of AI.

What output formats are supported?

Currently, this component extracts images in PNG format. Form data is exported in CSV format. If you need other formats such as JSON or XML, you may need to use additional tools or customize the output.

Can I extract text from a scanned PDF?

If the PDF is scanned or contains images of text, OCR (optical character recognition) processing may be required to convert the image-based text into an editable format.

PDF Extractor for C#/.NET

Extract images, text, metadata, and form data from PDF documents