PDF Extractor

Extract data from PDFs in C# with PDF Extractor. PDFs are widely used to store documents because they preserve formatting across devices. However, working with PDFs often means extracting specific content, such as images, text, metadata, or structured data, for reuse, analysis, or editing. Mastering PDF extraction saves time, streamlines workflows, and yields deeper insight from the files you handle.

Key Features

🔹 Extract Images

PDFs often contain logos, diagrams, photos, or scanned images. Extracting these images lets you reuse them without copying the entire page. High-resolution image extraction retrieves images exactly as they appear in the PDF, for professional use.

🔹 Extract Text

Text extraction converts the readable content of a PDF into editable text. This is especially helpful when you need to reuse or analyze written content. Choose from three precision modes to suit your needs:

Pure Mode: preserves the original formatting for structured output

Raw Mode: extracts plain text without formatting

Flatten Mode: removes special characters and formatting for clean, minimal text

🔹 Extract Properties (Metadata)

Property extraction gives you information about the PDF document itself. Available properties include: FileName, Title, Author, Subject, Keywords, Created, Modified, Application, PDF Producer, Number of Pages.

🔹 Export data from AcroForms

PDF forms are widely used in applications, surveys, invoices, and contracts. They let users enter information directly into interactive fields. Once a form has been filled in, however, organizations often need to export that data for storage, reporting, or analysis.

Getting Started

Download the assembly files from Here or NuGet.
Reference Documentize in your .NET project.
Add using Documentize;.
Optionally, set your license: License.Set("license.lic");
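The last two steps can be sketched as follows. This is a minimal sketch that uses only the namespace and the License.Set call named in the steps above; "license.lic" is the placeholder path from those steps, and the top-level statement form assumes a C# 9 or later project.

```csharp
using Documentize;

// Optional step: apply a license before calling any extractor.
// "license.lic" is the placeholder path from the steps above;
// replace it with the real location of your license file.
License.Set("license.lic");
```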

Why Choose PDF Extractor

Ideal for developers and businesses managing visual content in reports, presentations, and archives.
Fast, efficient extraction for easy content reuse.
Multiple extraction modes for maximum flexibility.
Seamless .NET integration for simplified workflows.
Supported operating systems include Windows 7-11, Windows Server 2003-2022, macOS (10.12+), and Linux.
Supported .NET versions range from 4.0 to 8.0.
Compatible with various Microsoft Visual Studio versions.
Detailed, high-quality documentation.

How to Extract Images with PDF Extractor

Configure ExtractImagesOptions with the input file path and other necessary settings
Call PdfExtractor.Extract with an instance of ExtractImagesOptions as parameter
Access the extracted images through the ResultContainer.ResultCollection
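The three steps above might look like the sketch below. PdfExtractor.Extract, ExtractImagesOptions, and ResultContainer.ResultCollection come from this page. The AddInput and AddOutput methods, the FileData and DirectoryData types, and both paths are assumptions for illustration, so verify them against the Documentize API reference before relying on them.

```csharp
using System;
using Documentize;

// Step 1: configure the options.
// Assumption: AddInput/AddOutput, FileData and DirectoryData are not named
// on this page; check them against the Documentize API reference.
var options = new ExtractImagesOptions();
options.AddInput(new FileData("input.pdf"));      // hypothetical input path
options.AddOutput(new DirectoryData("images"));   // hypothetical output folder

// Step 2: run the extraction.
var results = PdfExtractor.Extract(options);

// Step 3: walk the results; each entry describes one extracted image.
foreach (var result in results.ResultCollection)
{
    Console.WriteLine(result);
}
```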

How to Extract Text from PDF

Create an instance of ExtractTextOptions and set the input PDF
Call PdfExtractor.Extract with an instance of ExtractTextOptions as parameter and access the extracted text
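A minimal sketch of these two steps follows. ExtractTextOptions and PdfExtractor.Extract come from this page. The AddInput method, the FileData type, the input path, and the assumption that the text is returned as the first entry of a ResultCollection (as in the image workflow) are not confirmed here, so treat them as placeholders to check against the API reference.

```csharp
using System;
using Documentize;

// Assumption: AddInput and FileData are not named on this page.
var options = new ExtractTextOptions();
options.AddInput(new FileData("input.pdf"));   // hypothetical input path

// Assumption: the result is a container whose first entry holds the text.
var results = PdfExtractor.Extract(options);
string text = results.ResultCollection[0].ToString();

Console.WriteLine(text);
```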

How to Export PDF Form Field Data

Create an instance of ExtractFormDataToDsvOptions to configure the process of exporting data to CSV
Add input and output files to the options
Call the PdfExtractor.Extract method, passing the options as a parameter
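The export steps could be sketched like this. ExtractFormDataToDsvOptions and PdfExtractor.Extract come from this page. The constructor arguments (a delimiter character and a flag for writing field names), the AddInput and AddOutput methods, the FileData type, and both file names are assumptions for illustration only.

```csharp
using Documentize;

// Assumption: the constructor takes a delimiter and a flag that controls
// whether field names are written as a header row.
var options = new ExtractFormDataToDsvOptions(',', true);

// Assumption: AddInput/AddOutput and FileData are not named on this page.
options.AddInput(new FileData("form.pdf"));        // hypothetical filled-in form
options.AddOutput(new FileData("form-data.csv"));  // hypothetical CSV target

// Run the export; the CSV is written to the output file.
PdfExtractor.Extract(options);
```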

Frequently Asked Questions

What is PDF Extractor?

PDF Extractor for .NET is a tool for extracting images, text, metadata, and form data from PDF documents quickly and easily. It integrates into your .NET application and offers a straightforward way to access the content of PDFs.

Can I use PDF Extractor for .NET for other PDF operations?

No, this plugin is specifically for extraction from PDFs. For other PDF-related tasks, you can explore the additional plugins available in the Documentize library or use its full document-processing capabilities.

Why would I need to extract text/images/metadata/form data from a PDF?

Extracting this data is useful for analyzing documents, preparing reports, and working with AI.

What types of output formats does it support?

Currently, this plugin extracts images in PNG format and exports form data in CSV format. If you need other formats such as JSON or XML, you may need to use additional tools or convert the output yourself.
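As one way to convert the output yourself, the sketch below turns an exported CSV into JSON using only the .NET standard library. It is not part of the plugin. It assumes a simple CSV whose first row holds unique field names and whose values contain no quoted commas, and the file names are hypothetical. System.Text.Json ships with .NET Core 3.0 and later; older frameworks need the NuGet package.

```csharp
using System.IO;
using System.Linq;
using System.Text.Json;

// Read the exported CSV (hypothetical file name).
string[] lines = File.ReadAllLines("form-data.csv");

// Assumption: the first row holds unique field names, no quoted commas.
string[] headers = lines[0].Split(',');

// Map every remaining non-empty row to a name -> value dictionary.
var records = lines.Skip(1)
    .Where(line => line.Length > 0)
    .Select(line => line.Split(','))
    .Select(cells => headers
        .Select((name, i) => (name, value: i < cells.Length ? cells[i] : ""))
        .ToDictionary(p => p.name, p => p.value))
    .ToList();

// Write the records as an indented JSON array.
var jsonOptions = new JsonSerializerOptions { WriteIndented = true };
File.WriteAllText("form-data.json", JsonSerializer.Serialize(records, jsonOptions));
```

For CSV files with quoted fields or embedded delimiters, a dedicated CSV parser is a safer choice than Split.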

Can I extract text from scanned PDFs?

If the PDF is scanned or contains images of text, an OCR (Optical Character Recognition) process may be required to convert the image-based text into an editable format.

PDF Extractor in C#/.NET

Extract images, text, metadata, and form data from PDF documents