How to Get or Display Content From PDF File in Node

Last updated on: by Digamber

In this tutorial, we explain step by step how to read data from PDF files in Node.js using the pdf-parse package.

PDF stands for Portable Document Format, a widely accepted file format created by Adobe.

PDF is used to share information that looks the same regardless of operating system, hardware, etc.

pdf-parse is a popular pure JavaScript, cross-platform library that helps extract text from PDF files. It comes with handy options that make PDF reading easy in Node.

The module is available on the npm registry and is easy to install and set up in a Node.js environment.

Node Js Read Row by Row Content from PDF File Example

  • Step 1: Make Node Project
  • Step 2: Add PDF Parse Package
  • Step 3: Build Server File
  • Step 4: Get PDF Content
  • Step 5: Run Node Project

Make Node Project

The first step is straightforward: create a blank directory where the project-related code will reside.

mkdir node-pubnub

Then, go inside the folder using the given command.

cd node-pubnub

Open your terminal, type the following command, and press enter.

This command creates a package.json file that keeps the project-related information in one place.

npm init

Add PDF Parse Package

In this step, run the given command to install the pdf-parse dependency.

npm install pdf-parse

Build Server File

In this step, we have to accomplish the following tasks.

First, create the app.js file in the node project's root.

Secondly, register the command for executing the app.js script by adding the given code to the package.json file as shown below.


"scripts": {
   "start": "node app.js"
},

Get PDF Content

Open the app.js file and add the following code to it.

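const fs = require('fs')
const pdfParse = require('pdf-parse')

// Read a PDF from disk and print its text, page count, and metadata
const getPDF = async (file) => {
  try {
    // pdf-parse expects the raw file contents as a Buffer
    const dataBuffer = fs.readFileSync(file)
    const pdfExtract = await pdfParse(dataBuffer)
    console.log('File content: ', pdfExtract.text)
    console.log('Total pages: ', pdfExtract.numpages)
    // info holds the document metadata (creator, producer, dates)
    console.log('All content: ', pdfExtract.info)
  } catch (error) {
    // Report the failure instead of leaving an unhandled promise rejection
    console.error('Could not read PDF: ', error.message)
  }
}

const pdfRead = './demo.pdf'
getPDF(pdfRead)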

We used the async and await keywords in the getPDF function.

The async/await approach is useful when a function returns a promise and may take some time to deliver its result.

Here, pdfParse() returns a promise, and the lines after await do not run until that promise is fulfilled. Once it resolves, the extracted PDF data is available in pdfExtract.
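
To see the ordering that await enforces without needing a PDF at hand, here is a minimal sketch. The slowTask and runInOrder names are hypothetical; slowTask only simulates a slow promise-based call with a timer and is not part of pdf-parse.

```javascript
// Hypothetical stand-in for a slow promise-based call such as PDF parsing
const slowTask = (value, ms) =>
  new Promise((resolve) => setTimeout(() => resolve(value), ms))

// Records the order in which the steps around await are executed
const runInOrder = async () => {
  const events = []
  events.push('before await')
  // Execution of this function pauses here until the promise is fulfilled
  const result = await slowTask('parsed text', 50)
  events.push(`after await: ${result}`)
  return events
}

runInOrder().then((events) => console.log(events))
```

The second entry is only recorded once the simulated task has resolved, which mirrors how the console.log calls in getPDF wait for the parsing to finish.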

Make sure to keep the PDF file in your node folder; the pdfRead variable holds the demo.pdf file path.
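
The extracted text arrives as one string, so reading it row by row needs one extra step. One possible approach, assuming the rows are separated by line breaks, is to split the string and drop the empty entries. The toRows helper below is hypothetical and not part of pdf-parse, and the sample string only imitates the shape of an extracted text.

```javascript
// Hypothetical helper: turn extracted text into an array of non-empty rows
const toRows = (text) =>
  text
    .split(/\r?\n/) // handle both Unix and Windows line endings
    .map((row) => row.trim()) // remove stray spaces around each row
    .filter((row) => row.length > 0) // skip blank rows

// Sample text standing in for pdfExtract.text
const sample = '\n\nWelcome to Smallpdf\nAccess Files Anytime, Anywhere \nCollaborate With Others \n'

toRows(sample).forEach((row, index) => console.log(`${index + 1}: ${row}`))
```

Inside getPDF, the same helper could be called as toRows(pdfExtract.text) to loop over the rows of the real document.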

Run Node Project

The only task left is to run the node app.

Head over to the console, type the following command, and press enter.

node app.js

On successful execution, the output given below is printed on your terminal.

File content:  
Welcome to Smallpdf
Digital Documents—All In One Place
Access Files Anytime, Anywhere 
Enhance Documents in One Click 
Collaborate With Others 
With the new Smallpdf experience, you can 
freely upload, organize, and share digital 
documents. When you enable the ‘Storage’ 
option, we’ll also store all processed files here. 
You can access files stored on Smallpdf from 
your computer, phone, or tablet. We’ll also 
sync files from the Smallpdf Mobile App to our 
online portal
When you right-click on a file, we’ll present 
you with an array of options to convert, 
compress, or modify it. 
Forget mundane administrative tasks. With 
Smallpdf, you can request e-signatures, send 
large files, or even enable the Smallpdf G Suite 
App for your entire organization. 
Ready to take document management to the next level? 
Total pages:  1
All content:  {
  PDFFormatVersion: '1.7',
  IsAcroFormPresent: false,
  IsXFAPresent: false,
  Creator: 'Adobe InDesign 15.1 (Macintosh)',
  Producer: 'Adobe PDF Library 15.0',
  CreationDate: "D:20201014170810+02'00'",
  ModDate: "D:20201014170810+02'00'",
  Trapped: { name: 'False' }
}
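
The CreationDate and ModDate values in the output above use the PDF date notation (D:YYYYMMDDHHmmSS followed by a timezone offset) rather than a format that JavaScript parses directly. As a rough sketch, a small hypothetical helper, here called parsePdfDate and not provided by pdf-parse, can convert the full form of such a string into a Date. It assumes all date and time fields are present and returns null otherwise.

```javascript
// Hypothetical helper: convert a full PDF date string into a JavaScript Date
const parsePdfDate = (value) => {
  const pattern = /^D:(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})(?:([+-])(\d{2})'(\d{2})'?|Z)?$/
  const match = pattern.exec(value)
  // Shortened or malformed dates are not handled in this sketch
  if (!match) return null
  const [, year, month, day, hour, minute, second, sign, offsetHours, offsetMinutes] = match
  // Treat a missing offset as UTC
  const offset = sign ? `${sign}${offsetHours}:${offsetMinutes}` : 'Z'
  return new Date(`${year}-${month}-${day}T${hour}:${minute}:${second}${offset}`)
}

console.log(parsePdfDate("D:20201014170810+02'00'").toISOString())
// prints 2020-10-14T15:08:10.000Z
```

In the app itself, the helper could be applied to pdfExtract.info.CreationDate when a regular Date object is needed.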

Conclusion

We have seen how to create a basic node app step by step and prepare the setup for reading a PDF file in node.

Moreover, we have seen how to use the third-party pdf-parse library in node to extract the content from a PDF file.

We hope this guide helps you learn the PDF reading functionality in node.
