PDF Extractor

Extract Data from PDF in C# with PDF Extractor. PDFs are widely used to store documents because they preserve formatting across devices. However, working with PDFs often requires extracting specific content (images, text, metadata, or structured data) to reuse, analyze, or modify it. Mastering PDF extraction saves time, improves your workflows, and yields deeper insight into the files you process.

Key Features

🔹 Extract Images

PDFs frequently contain logos, charts, photos, or scanned images. Extracting these images lets you reuse them without copying entire pages. High-resolution image extraction: retrieve images exactly as they appear in your PDF for professional use.

🔹 Extract Text

Text extraction converts the readable content of a PDF into editable text. It is particularly useful when you need to reuse or analyze the text. Choose from three modes depending on your needs:

Pure Mode: Preserves the original formatting for structured output

Raw Mode: Extracts the raw text without formatting

Flatten Mode: Removes special characters and formatting for clean, minimal text

🔹 Extract Properties (Metadata)

Property extraction provides information about the PDF document. Available properties that may be of interest: FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages.

🔹 Export data from AcroForms

PDF forms are widely used in applications, surveys, invoices, and contracts. They let users enter information directly into interactive fields. Once the forms are filled in, however, organizations often need to extract that data to store it, report on it, or analyze it.

Getting Started

Download the assembly files, or install the package from NuGet.
Reference Documentize in your .NET project.
Add using Documentize; to your source file.
Optionally, set your license: License.Set("license.lic");

Why Choose PDF Extractor

Ideal for developers and businesses managing visual content in reports, presentations, and archives.
Fast, efficient extraction for easy content reuse.
Multiple extraction modes for maximum flexibility.
Seamless .NET integration for simplified workflows.
Supported operating systems include Windows 7-11, Windows Server 2003-2022, macOS (10.12+), and Linux.
Supported .NET framework versions range from 4.0 to 8.0.
Compatible with various Microsoft Visual Studio versions.
Detailed, high-quality documentation.

How to Extract Images with PDF Extractor

Configure ExtractImagesOptions with the input file path and other necessary settings.
Call PdfExtractor.Extract with the ExtractImagesOptions instance as the parameter.
Access the extracted images through ResultContainer.ResultCollection.
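
The steps above could look roughly like the following sketch. Only PdfExtractor.Extract, ExtractImagesOptions, ResultContainer.ResultCollection, and License.Set come from this page. The AddInput and AddOutput methods, the FileData and DirectoryData types, the ToFile() call, and the file paths are assumptions used for illustration and should be checked against the Documentize API reference.

```csharp
using System;
using Documentize;

class ExtractImagesExample
{
    static void Main()
    {
        // Optional, per Getting Started: License.Set("license.lic");

        // Options type named in the steps above.
        var options = new ExtractImagesOptions();

        // Assumption: input and output locations are supplied through
        // AddInput/AddOutput with FileData and DirectoryData wrappers.
        options.AddInput(new FileData("input.pdf"));              // hypothetical path
        options.AddOutput(new DirectoryData("extracted-images")); // hypothetical folder

        // Run the extraction; the result is expected to be a ResultContainer.
        var results = PdfExtractor.Extract(options);

        // Assumption: each result exposes ToFile(), giving the saved image path.
        foreach (var result in results.ResultCollection)
        {
            Console.WriteLine(result.ToFile());
        }
    }
}
```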

How to Extract Text from PDF

Create an instance of ExtractTextOptions and set the input PDF.
Call PdfExtractor.Extract with the ExtractTextOptions instance as the parameter, then access the extracted text.
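
As a minimal sketch of these two steps: ExtractTextOptions and PdfExtractor.Extract are named on this page, while the constructor signature (input path plus mode), the TextFormattingMode enum with a Pure member, and the input path are assumptions. The exact return type of Extract for text is also not stated here, so the result is simply printed.

```csharp
using System;
using Documentize;

class ExtractTextExample
{
    static void Main()
    {
        // Assumption: the constructor takes the input path and a formatting mode,
        // and TextFormattingMode.Pure corresponds to the "Pure Mode" described above.
        var options = new ExtractTextOptions("input.pdf", TextFormattingMode.Pure);

        // Run the extraction and print whatever text result is returned.
        var text = PdfExtractor.Extract(options);
        Console.WriteLine(text);
    }
}
```

If the enum exists as assumed, switching to Raw or Flatten would only require changing the mode argument.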

How to Export PDF Form Field Data

Create an instance of ExtractFormDataToDsvOptions to configure the export of form data to CSV.
Add the input and output files to the options.
Call the PdfExtractor.Extract method, passing the options as the parameter.
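
A hedged sketch of the export steps follows. ExtractFormDataToDsvOptions and PdfExtractor.Extract are taken from this page. The constructor arguments (a delimiter character and a flag for writing field names), the AddInput and AddOutput methods, the FileData type, and the file names are assumptions for illustration.

```csharp
using Documentize;

class ExportFormDataExample
{
    static void Main()
    {
        // Assumption: the constructor accepts the delimiter and a flag that
        // controls whether field names are written as a header row.
        var options = new ExtractFormDataToDsvOptions(',', true);

        // Assumption: files are attached with AddInput/AddOutput and FileData.
        options.AddInput(new FileData("filled-form.pdf")); // hypothetical input
        options.AddOutput(new FileData("form-data.csv"));  // hypothetical output

        // Run the export; the CSV is expected at the output path above.
        PdfExtractor.Extract(options);
    }
}
```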

Frequently Asked Questions

What is PDF Extractor?

PDF Extractor for .NET is a tool for quickly extracting images, text, metadata, and form data from PDF documents. It integrates into your .NET application and provides a straightforward way to access the content of PDFs.

Can I use PDF Extractor for .NET for other PDF operations?

No, this component is specifically for extracting content from PDFs. For other PDF-related tasks, you can explore the additional components available in the Documentize library or use its full document-processing capabilities.

Why would I need to extract text/images/metadata/form data from a PDF?

Extracting this data is useful for analyzing documents, preparing reports, and feeding content to AI tools.

What types of output formats does it support?

Currently, this component extracts images in PNG format and exports form data in CSV format. If you need other formats such as JSON or XML, you may need to use additional tools or convert the output yourself.

Can I extract text from scanned PDFs?

If the PDF is scanned or contains images of text, an OCR (Optical Character Recognition) process may be required to convert the image-based text into an editable format.

PDF Extractor in C#/.NET

Extract images, text, metadata, and form data from PDF documents.