How to Get or Display Content From PDF File in Node
In this tutorial, we explain step by step how to read data from PDF files in Node.js using the pdf-parse package.
PDF stands for Portable Document Format, a universally accepted file format created by Adobe.
PDF is used to share information independently of the operating system, hardware, and software used to view it.
pdf-parse is a popular JavaScript library that extracts text from PDF files. It is a cross-platform module and comes with handy options that make reading PDFs easy in Node.
The module is readily available on the npm registry, and it is easy to install and set up in a Node.js environment.
Node Js Read Row by Row Content from PDF File Example
- Step 1: Make Node Project
- Step 2: Add PDF Parse Package
- Step 3: Build Server File
- Step 4: Get PDF Content
- Step 5: Run Node Project
Make Node Project
The first step is straightforward: create a blank directory where the project-related code will live.
mkdir node-pubnub
Then, go inside the folder using the given command.
cd node-pubnub
Open your terminal, type the following command, and press Enter.
This command creates a package.json file that keeps the project-related information in one place.
npm init
Add PDF Parse Package
In this step, run the given command to install the pdf-parse dependency.
npm install pdf-parse
Build Server File
In this step, we have to accomplish the following tasks.
First, create the app.js file in the node project's root.
Second, define the command that runs the app.js script by adding the given code to the package.json file.
"scripts": {
"start": "node app.js"
},
Get PDF Content
Open the app.js file; without further ado, copy the following code and add it inside the main node file as suggested below.
const fs = require('fs')
const pdfParse = require('pdf-parse')

// Read the PDF into a buffer and print its text, page count, and metadata
const getPDF = async (file) => {
  try {
    const dataBuffer = fs.readFileSync(file)
    const pdfExtract = await pdfParse(dataBuffer)
    console.log('File content: ', pdfExtract.text)
    console.log('Total pages: ', pdfExtract.numpages)
    console.log('All content: ', pdfExtract.info)
  } catch (error) {
    // Report the failure instead of leaving an unhandled promise rejection
    console.error('Could not read PDF: ', error.message)
  }
}

const pdfRead = './demo.pdf'
getPDF(pdfRead)
We used the async and await keywords in the getPDF function.
The async/await approach is useful when a function call takes time to return its result.
Here, pdfParse() returns a promise, and the lines after await do not run until that promise settles. Once the promise is fulfilled, the parsed PDF data is available in pdfExtract.
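To see this behavior in isolation, here is a minimal sketch that does not use pdf-parse at all. The slowParse function is a hypothetical stand-in that resolves after a short delay, much as a real parser would. The recorded order shows that the line after await runs only once the promise settles, while the caller carries on in the meantime.

```javascript
// Hypothetical stand-in for an asynchronous parser: resolves after `ms` milliseconds.
const slowParse = (ms) =>
  new Promise((resolve) => setTimeout(() => resolve({ text: 'parsed text' }), ms))

// Records the order in which each step runs.
const runDemo = async () => {
  const order = []

  const task = (async () => {
    order.push('before await')
    const result = await slowParse(20) // pauses this function only, not the whole program
    order.push('after await: ' + result.text)
  })()

  order.push('caller continues') // runs while slowParse is still pending
  await task
  return order
}

runDemo().then((order) => console.log(order))
```

The same rule applies to getPDF: the console.log calls wait for pdfParse(), but any code placed after the getPDF(pdfRead) call would run before the PDF has been parsed.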
Make sure to keep the PDF file in your node folder; the pdfRead variable holds the path to demo.pdf.
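A relative path such as ./demo.pdf is resolved against the directory the command is run from, not the directory that contains the script. The sketch below shows one possible guard that resolves the path against a base directory and fails early with a readable message. The resolvePdfPath name is a hypothetical helper for illustration and is not part of pdf-parse.

```javascript
const fs = require('fs')
const path = require('path')

// Hypothetical helper: returns an absolute path to the PDF, or throws if it is missing.
// baseDir defaults to the folder that contains this script.
const resolvePdfPath = (fileName, baseDir = __dirname) => {
  const fullPath = path.resolve(baseDir, fileName)
  if (!fs.existsSync(fullPath)) {
    throw new Error(`PDF not found: ${fullPath}`)
  }
  return fullPath
}

try {
  console.log('Reading from:', resolvePdfPath('demo.pdf'))
} catch (error) {
  console.error(error.message)
}
```

The returned path could then be passed to getPDF in place of the hard-coded './demo.pdf'.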
Run Node Project
The only task left is to run the node app.
Head over to the console, type the following command, and press Enter.
node app.js
On successful execution, the following output will be printed in your terminal.
File content:
Welcome to Smallpdf
Digital Documents—All In One Place
Access Files Anytime, Anywhere
Enhance Documents in One Click
Collaborate With Others
With the new Smallpdf experience, you can
freely upload, organize, and share digital
documents. When you enable the ‘Storage’
option, we’ll also store all processed files here.
You can access files stored on Smallpdf from
your computer, phone, or tablet. We’ll also
sync files from the Smallpdf Mobile App to our
online portal
When you right-click on a file, we’ll present
you with an array of options to convert,
compress, or modify it.
Forget mundane administrative tasks. With
Smallpdf, you can request e-signatures, send
large files, or even enable the Smallpdf G Suite
App for your entire organization.
Ready to take document management to the next level?
Total pages: 1
All content: {
PDFFormatVersion: '1.7',
IsAcroFormPresent: false,
IsXFAPresent: false,
Creator: 'Adobe InDesign 15.1 (Macintosh)',
Producer: 'Adobe PDF Library 15.0',
CreationDate: "D:20201014170810+02'00'",
ModDate: "D:20201014170810+02'00'",
Trapped: { name: 'False' }
}
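The text field arrives as one string with newline separators, and the dates in the info object use the PDF date format (D:YYYYMMDDHHmmSS followed by a UTC offset). The sketch below shows one way to post-process both: splitting the text into non-empty rows, and converting a date string into a JavaScript Date. The toRows and parsePdfDate names are hypothetical helpers, and the date pattern assumes the full form seen in the output above, so shorter variants return null.

```javascript
// Split extracted text into trimmed, non-empty rows.
const toRows = (text) =>
  text
    .split(/\r?\n/)
    .map((row) => row.trim())
    .filter((row) => row.length > 0)

// Convert a PDF date such as "D:20201014170810+02'00'" into a Date.
// Assumes the full form with seconds; a missing offset or "Z" is treated as UTC.
const pdfDatePattern =
  /^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:([+-])(\d{2})'?(\d{2})?'?|Z)?$/

const parsePdfDate = (value) => {
  const match = pdfDatePattern.exec(value)
  if (!match) return null
  const [, year, month, day, hour, minute, second, sign, offHour, offMinute] = match
  const offset = sign ? `${sign}${offHour}:${offMinute || '00'}` : 'Z'
  return new Date(`${year}-${month}-${day}T${hour}:${minute}:${second}${offset}`)
}

// Sample input standing in for pdfExtract.text and pdfExtract.info.CreationDate
const sample = 'Welcome to Smallpdf\n\nDigital Documents\n'
toRows(sample).forEach((row, index) => console.log(`${index + 1}: ${row}`))
console.log(parsePdfDate("D:20201014170810+02'00'").toISOString())
```

Inside getPDF, the same helpers could be applied as toRows(pdfExtract.text) and parsePdfDate(pdfExtract.info.CreationDate).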
Conclusion
We have seen how to create a basic node app and set it up for reading a PDF file.
We have also seen how to use the third-party pdf-parse library in node to extract the content from a PDF file.
We hope this guide helps you learn how PDF reading works in node.