Node Js Create Web Scraping Script using Cheerio Tutorial


In this guide, you will learn how to create an asynchronous web scraping script in a Node.js application using the cheerio, pretty, and Axios libraries.

Web scraping is the process of extracting content and useful data from a website. It is commonly used by digital businesses that depend on data harvesting.

This post describes, step by step, how to build a small web scraping tool with Node.js, the Axios HTTP client, and the cheerio package.

After working through the instructions in this guide, you will be familiar with scraping web data in a Node environment.

Let us start creating the cheerio web scraper script and see how to use cheerio in Node.js to scrape data from a specific category of a book website.

How to Build Asynchronous Web Scraping Script in Node with Cheerio

  • Step 1: Create Node Project
  • Step 2: Add Cheerio and Pretty Modules
  • Step 3: Add Axios Package
  • Step 4: Create Server File
  • Step 5: Build Web Scrape Script
  • Step 6: Run Scraping Script

Create Node Project

In the first step, create an empty folder for the project.

You can generate a new folder with the following command.

mkdir node-lab

Get inside the project folder.

cd node-lab

Now, from inside the project folder, run the following command and provide the information that the command-line tool asks for.

npm init

Add Cheerio and Pretty Modules

Next, open the terminal again and run the following command to install the cheerio and pretty packages together.

npm install cheerio pretty

Add Axios Package

In this step, install the Axios HTTP client, which the scraper uses to make asynchronous HTTP requests.

npm install axios

Create Server File

Next, create a server.js file; this file will hold the code for the web scraping script.

Now, head over to package.json and add a start command that runs server.js to the scripts section.

...  
...  
"scripts": {
    "start": "node server.js"
  },
...  
...  

Build Web Scrape Script

To scrape data, we are using a dummy site that does not mind having its data scraped.

Copy the following code and paste it into the server.js file.

const fs = require('fs')
const cheerio = require('cheerio')
const axios = require('axios')

// URL of the category page to scrape
const API =
  'http://books.toscrape.com/catalogue/category/books/mystery_3/index.html'

const scrapperScript = async () => {
  try {
    // Download the page; axios exposes the response body as `data`
    const { data } = await axios.get(API)

    // Parse the HTML so it can be queried with CSS selectors
    const $ = cheerio.load(data)

    // Each book on the page is wrapped in an element with class "product_pod"
    const DataBooks = $('.row li .product_pod')

    const scrapedData = []

    // Collect the title and price of every book
    DataBooks.each((index, el) => {
      const scrapItem = { title: '', price: '' }

      scrapItem.title = $(el).children('h3').text()
      scrapItem.price = $(el)
        .children('.product_price')
        .children('p.price_color')
        .text()

      scrapedData.push(scrapItem)
    })

    console.dir(scrapedData)

    // Save the result as formatted JSON in the project root
    fs.writeFile(
      'scrapedBooks.json',
      JSON.stringify(scrapedData, null, 2),
      (e) => {
        if (e) {
          console.log(e)
          return
        }
        console.log('Scraping completed.')
      },
    )
  } catch (error) {
    // Request and parsing errors end up here
    console.error(error)
  }
}

scrapperScript()

Run Scraping Script

We are now ready to pull the data from the website. Go to the terminal, type the following command, and press enter.

node server.js

After you run the command, you will see extracted data on the console screen as well as in the newly generated scrapedBooks.json file in your node project’s root.
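The prices are saved as text, so a follow-up step might convert them to numbers before sorting or summing. The sketch below is one possible approach; parsePrice and withNumericPrices are hypothetical helper names, and the sample record is a made-up placeholder that assumes prices look like a currency symbol followed by a decimal number.

```javascript
// Hypothetical helper: strip everything except digits and the decimal point,
// then parse what is left as a floating-point number
const parsePrice = (text) => Number.parseFloat(text.replace(/[^0-9.]/g, ''))

// Hypothetical helper: return copies of the records with numeric prices
const withNumericPrices = (records) =>
  records.map((record) => ({ ...record, price: parsePrice(record.price) }))

// Placeholder record in the same shape as the scraper's output
const sample = [{ title: 'Example Book', price: '£10.00' }]

console.log(withNumericPrices(sample))
```

To apply this to the real output, you could load the file with JSON.parse(fs.readFileSync('scrapedBooks.json', 'utf8')) and pass the resulting array to withNumericPrices.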


Conclusion

In this guide, we looked at the process of making a web scraper script in Node.js.

We created a basic Node app whose main objective is to extract data from a website.

We pulled the data from another website using an asynchronous HTTP request and the cheerio package.

Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure.