Optimizing Docker Images with Multi-Stage Builds

In modern production environments, Docker image size has a direct impact on deployment speed, resource consumption, and security posture. Large images lead to slower pulls, increased storage costs, and a broader attack surface due to unnecessary tools and dependencies. This chapter tackles that problem head-on by introducing multi-stage Docker builds.

We’ll refactor a typical application Dockerfile to use multi-stage builds, reducing its final size. By the end of this chapter, you will have a smaller, more efficient, and more secure Docker image for your web application, ready for production deployment.

Project Overview: Leaner Containers for Production

This chapter focuses on a critical aspect of containerization: optimizing Docker image size. We will take a conventional Node.js web application Dockerfile and transform it into a multi-stage build. This transformation will demonstrate how to separate build-time dependencies from runtime essentials, resulting in a production image that is minimal, fast to deploy, and more secure.

Our goal is to achieve a noticeably smaller final image without compromising application functionality, a key requirement for any production deployment.

Tech Stack & Setup

For this chapter, our primary focus is on the Dockerfile itself and how Docker Engine processes it.

Docker Engine: We assume you have Docker Engine installed and running. As of 2026-05-22, Docker Engine is actively maintained, and multi-stage builds have been supported since Docker 17.05, so any current release works. Version numbers change frequently, so check the official release notes when installing; the core Dockerfile syntax used here is stable.
Node.js: Our example application is a simple Node.js Express server. We’ll use node:20-alpine as our base image, which refers to Node.js version 20 (an LTS release) on an Alpine Linux base, known for its small footprint.
Text Editor: Any code editor (e.g., VS Code) will suffice.
Command Line: Basic familiarity with docker build and docker run commands.

The Challenge of Image Bloat

When containerizing an application, especially one that requires compilation or extensive build steps (like a Node.js application with frontend assets, a Go binary, or a Java JAR), the build process often pulls in many development dependencies, compilers, and tools. If these are all included in the final image, it becomes unnecessarily large.

For example, a Node.js application might need npm or yarn and various build tools to compile TypeScript or bundle frontend assets. These tools are critical during the build phase but are entirely superfluous at runtime. Shipping them in your production image adds bloat, increasing:

Deployment Times: Larger images take longer to pull from registries to deployment targets.
Storage Costs: More disk space is consumed on registries and host machines.
Attack Surface: Every additional file, library, or tool in an image introduces potential vulnerabilities that need to be patched and monitored.

Core Concept: Multi-Stage Builds Explained

Multi-stage builds are a powerful Dockerfile feature designed to create smaller, more secure images. They achieve this by allowing you to define multiple FROM instructions within a single Dockerfile. Each FROM instruction starts a new build stage, and critically, you can selectively copy artifacts from previous stages into a later stage.

📌 Key Idea: The core principle is to use a “builder” stage with all necessary development tools to compile or prepare your application, and then copy only the final, compiled artifacts into a separate, much smaller “runtime” stage. This discards all the build tools and intermediate files, resulting in a lean production image.

How it Solves the Problem

Consider a Node.js application:

Builder Stage: Uses a Node.js image (e.g., node:20-alpine, or the full node:20 image when the build needs extra system tools) to install all dependencies and devDependencies, compile TypeScript, or bundle frontend assets.
Runtime Stage: Uses a minimal Node.js image (e.g., node:20-alpine) and copies only the compiled application code from the builder stage, alongside the strictly necessary production dependencies (installed with --omit=dev, which replaces the deprecated --only=production).

This separation ensures that your final image contains only what’s absolutely required for the application to run, not what’s needed to build it.
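As a rough illustration of this split, here is a minimal sketch for a project that does have a compile step. It assumes a hypothetical "build" script in package.json that writes compiled JavaScript to dist/ with an entry point at dist/server.js; neither exists in this chapter's sample app, so treat the names as placeholders.

```dockerfile
# Sketch only: assumes package.json defines a "build" script that writes
# compiled JavaScript to dist/, with dist/server.js as the entry point.

# Stage 1: build with all dependencies (dev + prod) available
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

# Stage 2: runtime with production dependencies only
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/package*.json ./
RUN npm ci --omit=dev
# Only the compiled output crosses the stage boundary
COPY --from=builder /app/dist ./dist
CMD ["node", "dist/server.js"]
```

The compiler, bundler, and other devDependencies exist only in the first stage. The final image is built from the second FROM onward, so it never contains them.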

Build Plan: Optimizing Our Dockerfile

Our strategy for optimizing the Docker image for our Node.js web application involves these distinct steps:

Baseline Setup: Create a simple Node.js application and an initial, unoptimized Dockerfile to establish a baseline image size.
Identify Build vs. Runtime Needs: Determine which dependencies and files are essential for building the application and which are only needed for its execution.
Implement Builder Stage: Modify the Dockerfile to include a “builder” stage that handles all dependency installation and any compilation steps.
Implement Runtime Stage: Add a “runtime” stage that starts from a clean, minimal base image and selectively copies only the necessary artifacts (compiled code, production dependencies) from the “builder” stage.
Verify Optimization: Build the new multi-stage image and compare its size to the baseline. Confirm the application still functions correctly.

Architecture Overview

The multi-stage build process can be visualized as a pipeline where intermediate artifacts are passed between stages, but not the entire environment.

flowchart TD
    A[Source Code] --> B{Dockerfile Process}
    subgraph Build_Stage["Build Stage"]
        B --> C[Install Dev and Prod Deps]
        C --> D[Compile or Bundle App]
    end
    subgraph Runtime_Stage["Runtime Stage"]
        D -->|Copy Compiled App| E[Install Prod Deps Only]
        E --> F[Configure Entrypoint]
    end
    F --> G[Final Optimized Image]

Step-by-Step Implementation

Let’s begin by setting up our sample Node.js application and then refactoring its Dockerfile.

1. Create a Sample Node.js Application

If you haven’t already, create a directory called my-web-app and navigate into it.

mkdir my-web-app
cd my-web-app

Inside my-web-app, create package.json. JSON does not allow comments, so the file must start with the opening brace:

{
  "name": "my-web-app",
  "version": "1.0.0",
  "description": "A simple Node.js web app",
  "main": "src/server.js",
  "scripts": {
    "start": "node src/server.js"
  },
  "dependencies": {
    "express": "^4.19.2"
  }
}

Next, create the src directory and the server.js file:

mkdir src

Then create my-web-app/src/server.js with the following content:

// my-web-app/src/server.js
const express = require('express');
const app = express();
const port = process.env.PORT || 3000;

app.get('/', (req, res) => {
  res.send('Hello from the optimized Docker container!');
});

app.listen(port, () => {
  console.log(`App listening at http://localhost:${port}`);
});

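Install the local dependencies. This also generates package-lock.json, which the npm ci commands used later in this chapter require: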

npm install

2. Initial, Unoptimized Dockerfile

Now, create a basic, single-stage Dockerfile that doesn’t use multi-stage builds. This will serve as our baseline.

Create my-web-app/Dockerfile.unoptimized:

# my-web-app/Dockerfile.unoptimized
FROM node:20-alpine

# Set the working directory inside the container
WORKDIR /app

# Copy package.json and package-lock.json to install dependencies
# This layer is cached if these files don't change
COPY package*.json ./

# Install all dependencies (dev and prod)
RUN npm install

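# Copy the rest of the application source code.
# Without a .dockerignore file, this also copies the host's node_modules and
# other build-context files into the image.
COPY . .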

# Expose the port the application listens on
EXPOSE 3000

# Define the command to run the application
CMD ["npm", "start"]

Build this image and tag it my-web-app:unoptimized:

docker build -t my-web-app:unoptimized -f Dockerfile.unoptimized .

3. Refactor with Multi-Stage Build

Now, let’s create our optimized Dockerfile using multi-stage builds. We’ll define two stages: builder and the final runtime stage.

Create my-web-app/Dockerfile:

# my-web-app/Dockerfile (Optimized with Multi-Stage Build)

# --- Stage 1: Builder ---
# This stage installs all dependencies (dev + prod) and prepares the application.
FROM node:20-alpine AS builder

WORKDIR /app

# Copy package.json and package-lock.json first to leverage Docker layer caching.
# If these files don't change, this layer won't be rebuilt.
COPY package*.json ./

# Install all dependencies, including development ones if they were present.
# 'npm ci' is preferred for CI/CD as it uses package-lock.json for exact versions,
# ensuring reproducible builds.
RUN npm ci

# Copy the rest of the application source code into the builder stage.
COPY . .

# If you had a build step (e.g., TypeScript compilation, Webpack bundling),
# it would typically go here:
# RUN npm run build
# Our simple Express app defines no 'build' script, so the next stage
# copies the raw source instead.

# --- Stage 2: Production Runtime ---
# This stage uses a minimal base image and copies only the necessary
# production artifacts from the builder stage.
FROM node:20-alpine

WORKDIR /app

# Copy only the package.json and package-lock.json from the builder stage.
# This allows us to install only production dependencies in this final stage.
COPY --from=builder /app/package*.json ./

# Install only production dependencies.
# This command is crucial for keeping the final image lean.
# --omit=dev is the current npm flag; it replaces the deprecated --only=production.
RUN npm ci --omit=dev

# Copy the application source code (or compiled output) from the builder stage.
# If 'npm run build' produced a 'dist' folder, you'd copy that:
# COPY --from=builder /app/dist ./dist
# For our simple app, we copy the source directly.
COPY --from=builder /app/src ./src

EXPOSE 3000

# Define the command to run the application in production
CMD ["npm", "start"]

Why these changes are production-minded:

FROM node:20-alpine AS builder: By naming the first stage builder, we clearly define its purpose and make it referenceable.
RUN npm ci (in builder): npm ci ensures a clean installation of exact dependency versions as specified in package-lock.json, which is vital for reproducible builds.
FROM node:20-alpine (second instance): This starts a completely new, clean stage. Nothing from the builder stage’s filesystem carries over except the files we explicitly copy with COPY --from.
COPY --from=builder /app/package*.json ./: This is the key to multi-stage builds. It copies specific files from the builder stage’s filesystem (/app/package*.json) into the current stage’s working directory (./). We copy these to correctly install production dependencies.
RUN npm ci --omit=dev (formerly --only=production, now deprecated): This command is critical. It installs only the packages listed under dependencies in package.json and skips devDependencies. In projects with many devDependencies, this dramatically reduces the node_modules size; this sample app has none, so the effect here is small.
COPY --from=builder /app/src ./src: We copy only the application’s source code (or compiled output if a build step was involved) from the builder stage. This prevents the final image from including unnecessary files from the build context.

Testing & Verification

Now, let’s build the optimized image and compare its size to our baseline.

Build the optimized image:
```
docker build -t my-web-app:optimized .
```
This command will use the Dockerfile we just created.
Compare image sizes:
```
docker images my-web-app
```
You should see two images listed: my-web-app:unoptimized and my-web-app:optimized. You will observe that the optimized version is smaller. For a very simple app like this, the difference might be minor (e.g., a few MB), but for real-world applications with many development dependencies or complex build steps (like bundling frontend assets or compiling a Go binary), the savings can be hundreds of megabytes or even gigabytes.
⚡ Real-world insight: A Go application compiled in a golang:latest builder stage and then copied to a scratch or alpine runtime stage can shrink from hundreds of MBs to just a few MBs for the final binary.
Run the optimized container:
Verify that the application still functions correctly with the optimized image.
```
docker run -p 3000:3000 my-web-app:optimized
```
Open your web browser and navigate to http://localhost:3000. You should see the message “Hello from the optimized Docker container!”. This confirms that all necessary files were successfully copied into the final stage.
Inspect container layers (optional but recommended):
To understand the layers that make up your image and confirm the absence of build-time artifacts, use docker history:
```
docker history my-web-app:optimized
```
Compare this output to docker history my-web-app:unoptimized. You’ll notice that the optimized image’s history is cleaner, reflecting only the steps taken in the final runtime stage, without the full npm install history of all dependencies.
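
To make the Go scenario from the real-world insight above concrete, here is a minimal sketch. It assumes a Go module (go.mod and go.sum present) with its main package at the repository root and no cgo dependencies. The pinned golang:1.22-alpine tag and the binary name "server" are illustrative choices, not part of this chapter's project.

```dockerfile
# Sketch only: assumes go.mod and go.sum exist and the main package
# lives at the repository root. The binary name "server" is hypothetical.

# Stage 1: compile with the full Go toolchain
FROM golang:1.22-alpine AS builder
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO_ENABLED=0 produces a statically linked binary that can run on scratch
RUN CGO_ENABLED=0 go build -o /out/server .

# Stage 2: empty base image containing only the binary
FROM scratch
COPY --from=builder /out/server /server
ENTRYPOINT ["/server"]
```

Because scratch contains no shell, CA certificates, or time zone data, a binary that makes outbound HTTPS calls would also need the certificate bundle copied in from the builder stage. In that case an alpine runtime stage may be the simpler choice.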

Production Considerations

Multi-stage builds are a fundamental best practice for production Docker images due to their critical benefits:

Enhanced Security: By strictly including only runtime necessities, the attack surface is significantly reduced. Fewer installed packages, libraries, and tools mean fewer potential vulnerabilities for attackers to exploit.
Faster Deployment and Scaling: Smaller images transfer more quickly across networks, leading to faster pushes and pulls, quicker deployments, and more responsive scaling operations in orchestrated environments like Kubernetes.
Reduced Resource Consumption: Smaller images consume less disk space on host machines and registries, which can lead to cost savings, especially at scale. Runtime memory use depends on what the process actually loads rather than on image size, so do not expect memory savings from a smaller image alone.
Improved Cache Utilization: Docker layers are cached. Because package*.json is copied before the source code in each stage, changes to your application code alone won’t invalidate the production dependency-installation layer in the runtime stage, speeding up rebuilds.
Clearer Separation of Concerns: The Dockerfile explicitly separates the “how to build” from the “how to run,” making the build process more transparent and easier to maintain.

🧠 Important: Always use minimal base images (e.g., alpine, slim) for your runtime stage. For compiled static binaries, scratch is the ultimate minimal base image.

Common Issues & Solutions

Application Fails to Start Due to Missing Files:
- Issue: The container starts, but the application immediately exits or logs “file not found” errors. This indicates that a critical file or directory required at runtime was not copied from the builder stage to the final runtime stage.
- Debugging: Carefully examine your COPY --from=builder commands in the final stage. Common culprits include missing config directories, static assets, or even the main application entry point script. Check the application logs (docker logs <container_id>) for specific file path errors.
- Solution: Ensure every file or folder necessary for the application’s runtime is explicitly copied. For example, if your application has a public folder for static files, ensure COPY --from=builder /app/public ./public is present.
Image Size Still Larger Than Expected:
- Issue: Despite using multi-stage builds, the final image size isn’t as small as anticipated.
- Debugging:
  - Base Image Choice: Are you using a minimal base image for your runtime stage (e.g., node:20-alpine vs. node:20)?
  - Over-copying: Are you copying too much from the builder stage? A common mistake is COPY --from=builder /app . instead of being selective (e.g., COPY --from=builder /app/dist ./dist, COPY --from=builder /app/node_modules ./node_modules).
- Solution:
  - Always default to alpine or slim variants for your runtime base image.
  - Be as granular as possible with COPY --from commands, only including truly essential runtime artifacts.
Dependency Installation Issues in Runtime Stage (--omit=dev, formerly --only=production):
- Issue: npm ci --omit=dev (or equivalent for other package managers) fails in the final stage, often because a production dependency with native bindings cannot be compiled there.
- Debugging: This typically happens when a production dependency requires compilation (e.g., node-sass, sqlite3) and the minimal alpine base image lacks necessary build tools like python3, gcc, g++, or make.
- Solution:
  - Pre-build in Builder: The best approach is to ensure any native dependencies are fully built and linked in the builder stage, and then copy only the resulting compiled modules (e.g., the node_modules directory) to the runtime stage. Remove devDependencies first (for example with npm prune --omit=dev), and keep both stages on the same base image family so the compiled binaries match the runtime’s C library.
  - Temporary Install: If pre-building is complex, you might temporarily install build tools in the runtime stage before npm ci --omit=dev, then immediately remove them (apk del ...) in the same RUN command to avoid increasing final image size.
  - Less Minimal Base Image: As a last resort, if native dependencies are unavoidable and complex, consider a slightly less minimal base image (e.g., node:20-slim). It is Debian-based and uses glibc, for which many native modules ship prebuilt binaries.
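
The pre-build approach can be sketched as follows. This is an illustration under assumptions, not a drop-in fix: it presumes the native module needs python3, make, and g++ (the exact Alpine packages vary by module) and reuses this chapter's src/ layout.

```dockerfile
# Sketch only: the apk package list is an assumption; check what your
# native dependency actually needs to compile.

# Stage 1: compile native modules where the build tools are available
FROM node:20-alpine AS builder
WORKDIR /app
RUN apk add --no-cache python3 make g++
COPY package*.json ./
RUN npm ci
# Copy only src so a host node_modules directory cannot overwrite the
# modules compiled above
COPY src ./src
# Drop devDependencies so only production modules are copied forward
RUN npm prune --omit=dev

# Stage 2: same base image family, no compilers installed
FROM node:20-alpine
WORKDIR /app
COPY --from=builder /app/package*.json ./
# Compiled native modules travel inside node_modules
COPY --from=builder /app/node_modules ./node_modules
COPY --from=builder /app/src ./src
EXPOSE 3000
CMD ["npm", "start"]
```

Both stages use node:20-alpine on purpose. A module compiled against Alpine's musl libc is not guaranteed to load on a glibc-based image such as node:20-slim, so changing one stage's base usually means changing the other.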

Summary & Next Step

You’ve successfully implemented multi-stage builds, a cornerstone of production-ready Docker image optimization. Your application’s Docker image is now smaller and more efficient, and it exposes less attack surface because it sheds unnecessary build-time components. For this small sample the size difference is modest; the same structure yields much larger savings as build tooling and devDependencies grow.

We’ve verified the size reduction and confirmed the application still functions correctly within its optimized container. This lean image is now better prepared for deployment to any environment.

Next, we’ll shift our focus to securing sensitive information. In the upcoming chapter, we’ll explore how to handle environment variables and secrets effectively, ensuring that credentials and confidential data are managed securely within your Dockerized application stack.


References

Docker Documentation: https://docs.docker.com/
Docker Multi-stage builds: https://docs.docker.com/build/building/multi-stage/
Node.js Docker Official Image: https://hub.docker.com/_/node
npm ci documentation: https://docs.npmjs.com/cli/commands/npm-ci