NestJS is a versatile framework that can go beyond server-based applications. In this guide, we’ll focus on building a NestJS standalone application, showcasing how to utilize its modular structure and dependency injection to create a CLI-based tool for web scraping. This application will use yargs for command-line input and cheerio to extract image URLs from a webpage.

Why a Standalone NestJS Application?

While NestJS is often associated with server-based applications, it also provides the ability to create standalone applications without the overhead of an HTTP server. This makes it ideal for tasks like CLI tools, batch processing, or utilities such as a web scraper.

As the NestJS docs state:

There are several ways of mounting a Nest application. You can create a web app, a microservice or just a bare Nest standalone application (without any network listeners). The Nest standalone application is a wrapper around the Nest IoC container, which holds all instantiated classes. We can obtain a reference to any existing instance from within any imported module directly using the standalone application object. Thus, you can take advantage of the Nest framework anywhere, including, for example, scripted CRON jobs. You can even build a CLI on top of it.

So we can take advantage of the great Nest features like dependency injection and modular architecture to better organize our scripts, without exposing an HTTP server. Let’s see how we can create a web scraper CLI program!

GitHub Repository

You can find all of the code used in this project in the GitHub repository here.

Prerequisites

Before we start, ensure you have:

  • Node.js and npm installed on your system.
  • Basic understanding of TypeScript and NestJS.

Setting Up the Project

Run the following commands to create a new NestJS project:

npm install -g @nestjs/cli 
nest new nestjs-standalone
cd nestjs-standalone

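Add the required packages for web scraping and CLI input handling. The commands in this guide use pnpm; if you picked npm or yarn when nest new prompted for a package manager, substitute the equivalent command: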

pnpm install --save yargs cheerio axios

Writing the Standalone Application

main.ts

The main.ts file is the entry point of the standalone application. This is where we initialize the NestJS application context using the createApplicationContext method. This approach is specifically designed for standalone applications, allowing us to run the application without starting an HTTP server.

The bootstrap function manages the application’s lifecycle, handling command-line input via parseArguments and orchestrating the invocation of the ScraperService. Delegating command-line argument parsing to a separate file keeps the main file clean and focused on its primary purpose.

    import { NestFactory } from '@nestjs/core';
    import { AppModule } from './app.module';
    import { ScraperService } from './scraper.service';
    import { parseArguments } from './args';
    
    async function bootstrap() {
      const { url, save } = await parseArguments();
      const app = await NestFactory.createApplicationContext(AppModule);
      const scraperService = app.get(ScraperService);
      await scraperService.scrape(url, save);
      await app.close();
    }
    
    bootstrap();
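
The bootstrap function above closes the context only if every step before app.close() succeeds. A minimal sketch of one way to guarantee cleanup is shown here. Note that withContext and the Closable interface are hypothetical helpers introduced for illustration, not part of NestJS, and the stand-in context below only mimics the close() method of a real application context.

```typescript
// Hypothetical interface: anything with an async close() method, such as the
// object returned by NestFactory.createApplicationContext().
interface Closable {
  close(): Promise<void>;
}

// Creates a context, runs the work, and always closes the context afterwards,
// even when the work throws.
async function withContext<C extends Closable, R>(
  create: () => Promise<C>,
  work: (context: C) => Promise<R>,
): Promise<R> {
  const context = await create();
  try {
    return await work(context);
  } finally {
    await context.close();
  }
}

// Stand-in context used only to demonstrate the behaviour without Nest.
const events: string[] = [];
const createFakeContext = async (): Promise<Closable> => ({
  close: async () => {
    events.push('closed');
  },
});

async function demo(): Promise<string> {
  try {
    await withContext(createFakeContext, async () => {
      events.push('working');
      throw new Error('scrape failed');
    });
  } catch (error) {
    events.push(error instanceof Error ? error.message : String(error));
  }
  return events.join(',');
}
```

In the real entry point, the first argument could be a function that calls NestFactory.createApplicationContext(AppModule), and the second a function that resolves ScraperService and calls scrape.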

args.ts

The args.ts file is responsible for setting up and parsing command-line arguments using the yargs library. It defines the structure of the expected inputs, including required options like --url and optional flags such as --save. By centralizing this logic, the file ensures that command-line validation is standardized and separated from the core application logic, making the application easier to maintain and extend.

    import yargs from 'yargs';
    
    export async function parseArguments() {
      return yargs(process.argv.slice(2))
        .usage('Usage: $0 --url <url> [--save]')
        .option('url', {
          alias: 'u',
          describe: 'The URL to scrape images from',
          type: 'string',
          demandOption: true,
        })
        .option('save', {
          alias: 's',
          describe: 'Save the scraped image URLs to a file',
          type: 'boolean',
          default: false,
        })
        .help().argv;
    }
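
The option definition above only requires that --url is present and is a string; it does not verify that the value is a usable web address. A small sketch of a validator that could run before scraping is shown here. isHttpUrl is a hypothetical helper, not part of yargs, and it relies only on the built-in URL class.

```typescript
// Hypothetical helper: returns true only for syntactically valid http(s) URLs.
function isHttpUrl(value: string): boolean {
  try {
    const parsed = new URL(value);
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch {
    // The URL constructor throws for strings it cannot parse.
    return false;
  }
}

// Sample inputs: one valid web address, one non-http scheme, one non-URL string.
const checks = [
  'https://michaelguay.dev/',
  'ftp://example.com/file.png',
  'not a url',
].map((candidate) => isHttpUrl(candidate));
```

A helper like this could be called from a yargs .check() callback or at the top of bootstrap, so that an invalid address is rejected before any network request is made.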
    

app.module.ts

The AppModule is the root module, which ties together the application components and services.

    import { Module } from '@nestjs/common';
    import { ScraperService } from './scraper.service';
    
    @Module({
      imports: [],
      controllers: [],
      providers: [ScraperService],
    })
    export class AppModule {}

scraper.service.ts

The ScraperService encapsulates all the web scraping logic, keeping the implementation modular and testable. It uses axios to fetch the HTML content of the target URL and cheerio to parse and extract image sources. If the save parameter is true, the service also writes the scraped image URLs to a file.

    import { Injectable } from '@nestjs/common';
    import axios from 'axios';
    import * as cheerio from 'cheerio';
    import { writeFileSync } from 'fs';
    
    @Injectable()
    export class ScraperService {
      async scrape(url: string, save: boolean): Promise<string[]> {
        try {
          console.log(`Fetching data from ${url}...`);
          const response = await axios.get(url);
    
          const $ = cheerio.load(response.data);
    
          const images = $('img')
            .map((i, el) => $(el).attr('src'))
            .get()
            .filter((src) => !!src);
    
          if (images.length > 0) {
            console.log('Scraped Images:');
            images.forEach((image, index) => {
              console.log(`${index + 1}: ${image}`);
            });
    
            if (save) {
              writeFileSync('images.txt', images.join('\n'), 'utf8');
              console.log('Image URLs saved to images.txt');
            }
          }
    
          return images;
        } catch (error) {
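          // `error` may be typed `unknown` under strict TypeScript settings, so narrow it before reading `.message`.
          const message = error instanceof Error ? error.message : String(error);
          console.error('Error occurred while scraping:', message);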
          return [];
        }
      }
    }
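
The src attributes collected by scrape are returned exactly as the page author wrote them, so some may be relative paths such as /img/logo.png. A minimal sketch of normalizing them against the page URL follows. toAbsoluteUrls is a hypothetical helper that is not in the original service, and the example.com addresses are placeholders; it uses only the built-in URL class.

```typescript
// Hypothetical helper: resolves each src against the page URL and removes duplicates.
function toAbsoluteUrls(sources: string[], pageUrl: string): string[] {
  const resolved = new Set<string>();
  for (const source of sources) {
    try {
      // Absolute inputs ignore the base; relative ones are resolved against it.
      resolved.add(new URL(source, pageUrl).href);
    } catch {
      // Skip values that cannot be resolved into a URL at all.
    }
  }
  return Array.from(resolved);
}

// Placeholder inputs: two relative paths, one absolute URL, and one duplicate.
const normalized = toAbsoluteUrls(
  ['/img/a.png', 'b.png', 'https://cdn.example.com/c.png', '/img/a.png'],
  'https://example.com/blog/post',
);
```

Under these assumptions, the service could pass its images array and the url argument through such a helper before printing or saving the results.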

Running the Standalone Application

Compile the TypeScript code:

    pnpm run build

Execute the application using the CLI:

    node dist/main.js --url=https://michaelguay.dev/

To save the results to a file:

    node dist/main.js --url=https://michaelguay.dev/ --save

Example Output:

    Fetching data from https://michaelguay.dev/...
    Scraped Images:
    1: https://michaelguay.dev/wp-content/uploads/2024/02/cropped-android-chrome-512x512-1.png
    2: https://michaelguay.dev/wp-content/uploads/2024/12/NestJS-Drizzle-ORM-2.png
    3: https://michaelguay.dev/wp-content/uploads/2024/11/0_bRLlXikiNJLW55Bv.png
    4: https://michaelguay.dev/wp-content/uploads/2024/10/1.png
    5: https://michaelguay.dev/wp-content/uploads/2024/10/NestJS-Drizzle-ORM-3.png
    6: https://michaelguay.dev/wp-content/uploads/2024/09/Untitled-design-8.png
    7: https://michaelguay.dev/wp-content/uploads/2024/08/24602613.png
    8: https://michaelguay.dev/wp-content/uploads/2024/08/1-1.png
    9: https://michaelguay.dev/wp-content/uploads/2024/07/1-2.png
    10: https://michaelguay.dev/wp-content/uploads/2024/07/1-1.png
    11: https://michaelguay.dev/wp-content/uploads/2024/06/1.png
    12: https://michaelguay.dev/wp-content/uploads/2024/07/images.png
    Image URLs saved to images.txt

Advantages of NestJS Standalone Applications

1. Modularity: Reuse services, modules, and providers across different parts of your project.
2. Dependency Injection: Simplifies managing dependencies in a clean and scalable way.
3. Flexibility: No need for an HTTP server, making it lightweight and efficient for tasks like CLI tools.

Conclusion

By leveraging the power of NestJS standalone applications, we built a modular and reusable web scraper that extracts image sources from web pages. This approach showcases the flexibility of NestJS beyond traditional server-based applications. With just a few enhancements, this scraper can be extended to handle more complex tasks like downloading images, handling pagination, or integrating with APIs.

No longer do we have to sacrifice organizational structure and robust architecture when creating custom scripts. Even the simplest of programs can take advantage of the great features NestJS offers us.

Try building your own NestJS standalone application today and explore the endless possibilities!
