Node.js Tutorial: Create a Web Scraping Script Using Cheerio
In this guide, you will learn how to create an asynchronous web scraping script in a Node.js application using the cheerio, pretty, and Axios libraries.
Web scraping is the process of extracting content and useful data from a website. It is commonly used by digital businesses that depend on data harvesting.
This post describes, step by step, how to build a mini web scraping tool with Node.js, the Axios HTTP client, and the cheerio package.
After following the instructions in this guide, you will be familiar with scraping web data in a Node environment.
Let us start creating the cheerio web scraper script and see how to use cheerio in Node.js to scrape data from a specific category of a book website.
How to Build an Asynchronous Web Scraping Script in Node with Cheerio
- Step 1: Create Node Project
- Step 2: Add Cheerio and Pretty Modules
- Step 3: Add Axios Package
- Step 4: Create Server File
- Step 5: Build Web Scrape Script
- Step 6: Run Scraping Script
Create Node Project
First, create an empty folder for the project.
You can create the folder with the following command.
mkdir node-lab
Get inside the project folder.
cd node-lab
Inside the project folder, run the following command to generate a package.json file. Provide the information that the command-line tool asks for.
npm init
Add Cheerio and Pretty Modules
Next, run the following command in the terminal to install the cheerio and pretty packages together.
npm install cheerio pretty
Add Axios Package
In this step, install the Axios HTTP client, which the scraper uses to make asynchronous HTTP requests.
npm install axios
Create Server File
Next, create a server.js file. This file will hold the code for the web scraping script.
Now, open package.json and add a start command that runs server.js to the scripts section.
...
...
"scripts": {
  "start": "node server.js"
},
...
...
Build Web Scrape Script
To scrape data, we use books.toscrape.com, a dummy site that is meant to be scraped.
Copy the following code and paste it into the server.js file.
const fs = require('fs')
const cheerio = require('cheerio')
const axios = require('axios')

// Category page to scrape
const API =
  'http://books.toscrape.com/catalogue/category/books/mystery_3/index.html'

const scrapperScript = async () => {
  try {
    // Fetch the page HTML and load it into cheerio
    const { data } = await axios.get(API)
    const $ = cheerio.load(data)

    // Each book on the page sits in a .product_pod element
    const DataBooks = $('.row li .product_pod')
    const scrapedData = []

    DataBooks.each((index, el) => {
      const scrapItem = { title: '', price: '' }
      scrapItem.title = $(el).children('h3').text()
      scrapItem.price = $(el)
        .children('.product_price')
        .children('p.price_color')
        .text()
      scrapedData.push(scrapItem)
    })

    console.dir(scrapedData)

    // Save the results as formatted JSON
    fs.writeFile(
      'scrapedBooks.json',
      JSON.stringify(scrapedData, null, 2),
      (e) => {
        if (e) {
          console.error(e)
          return
        }
        console.log('Scraping completed.')
      },
    )
  } catch (error) {
    console.error(error)
  }
}

scrapperScript()
Run Scraping Script
We are now ready to pull the data from the website. Open the terminal, type the following command, and press Enter.
node server.js
After you run the command, you will see the extracted data in the console as well as in the newly generated scrapedBooks.json file in your Node project’s root.
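To confirm that the output file is valid JSON, you can read it back and parse it. The sketch below is self-contained, so it first writes a small sample file to the system temp directory. The sample records and the file location are assumptions; in the project you would point the path at scrapedBooks.json instead.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Sample records in the same shape the scraper writes (values are made up).
const sample = [
  { title: 'Sample Book One', price: '£10.00' },
  { title: 'Sample Book Two', price: '£20.00' },
]

// Written to a temp folder here; in the project, use 'scrapedBooks.json'.
const filePath = path.join(os.tmpdir(), 'scrapedBooks.sample.json')
fs.writeFileSync(filePath, JSON.stringify(sample, null, 2))

// Read the file back and parse it into an array of objects.
const loadedBooks = JSON.parse(fs.readFileSync(filePath, 'utf8'))

console.log(`Loaded ${loadedBooks.length} books`)
loadedBooks.forEach((book) => console.log(`${book.title}: ${book.price}`))
```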
Conclusion
In this guide, we looked at the process of building a web scraper script in Node.js.
We created a basic Node app whose main objective is to extract data from a website.
We pulled the data from another website using an asynchronous HTTP request and the Cheerio library.
Cheerio parses markup and provides an API for traversing/manipulating the resulting data structure.