The Data Science Toolbox project set out to tame an inconsistent classroom + lab environment. Different teammates had clashing versions of Python, TensorFlow, and BI tools, causing "works on my machine" drift.
Objectives
- Standardize runtime versions for analytics libraries
- Offer a simple web UI landing page linking to internal services
- Ensure cold start reproducibility with one command
Architecture
- Reverse proxy (nginx) fronting services on clean paths
- Container set: JupyterLab, a TensorFlow base image with optional GPU support, a lightweight git UI, and a Tableau bridge
- Shared volume mounted for notebooks + datasets
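The clean-path routing in front of the services could look like the following nginx sketch; the upstream service names and ports are assumptions for illustration, not the project's actual config:

```nginx
# Illustrative nginx reverse proxy: map clean paths to internal services.
# Upstream hostnames and ports are assumptions.
server {
    listen 80;

    location /jupyter/ {
        proxy_pass http://jupyterlab:8888/;
        # JupyterLab needs websocket upgrades for kernels and terminals
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location /git/ {
        proxy_pass http://gitui:3000/;
    }

    location / {
        # Landing page with links and health status
        proxy_pass http://landing:8080/;
    }
}
```

Because the containers share one Docker network, nginx can address each service by its compose service name rather than an IP.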
Image Strategy
Each service has its own Dockerfile pinned to minor versions. A thin docker-compose.yml wires up the networks and volumes, and a build arg toggles between the CPU and GPU TensorFlow base images.
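A minimal sketch of that wiring might look like the fragment below; the service name, image tag, build-arg name, and mount path are assumptions for illustration:

```yaml
# Illustrative docker-compose.yml fragment.
# The matching Dockerfile would start with:
#   ARG BASE_IMAGE=tensorflow/tensorflow:2.15.0
#   FROM ${BASE_IMAGE}
services:
  jupyter:
    build:
      context: ./jupyter
      args:
        # Override TF_BASE (e.g. with a -gpu tag) to build the GPU variant
        BASE_IMAGE: ${TF_BASE:-tensorflow/tensorflow:2.15.0}
    volumes:
      - workspace:/home/jovyan/work   # shared notebooks + datasets
    networks: [internal]

networks:
  internal:

volumes:
  workspace:
```

Pinning the default tag in one place keeps every teammate's build reproducible while still allowing a one-variable switch to GPU images.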
Dev Experience
A single `docker compose up -d` brings the stack online, and the landing page lists live health checks for each service. New contributors skip 45+ minutes of manual environment setup.
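The health status the landing page reports could be driven by compose-level health checks like this sketch; the probed endpoint and intervals are assumptions:

```yaml
# Illustrative healthcheck fragment so the landing page
# can report per-service status.
services:
  jupyter:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8888/api"]
      interval: 30s
      timeout: 5s
      retries: 3
```

Docker then exposes each container's state as healthy/unhealthy, which a landing page can read via the Docker API instead of probing services itself.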
Security & Hardening
- Non‑root users in containers
- Configs baked read‑only into images; secrets injected via an env file excluded from the repo
- Resource limits to prevent runaway training jobs
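The three hardening points above can be expressed directly in compose; the UID, limits, and file names below are assumptions for the sketch:

```yaml
# Illustrative hardening fragment; values are assumptions.
services:
  trainer:
    user: "1000:1000"    # run as a non-root user inside the container
    read_only: true      # immutable filesystem; configs baked into the image
    tmpfs: [/tmp]        # writable scratch space despite read_only
    env_file: .env       # secrets stay out of the repo (.env is gitignored)
    deploy:
      resources:
        limits:
          cpus: "4.0"    # cap CPU so a runaway training job can't starve others
          memory: 8g
```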
Lessons
- Pin versions early; silent minor bumps caused reproducibility issues.
- A tiny landing UI reduces friction; teammates actually use the stack.
- Layer caching saves CI minutes—multi-stage builds trimmed roughly 300 MB from the final images.
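The multi-stage pattern behind that size win can be sketched as follows; the Python version and requirements file are assumptions, not the project's actual setup:

```dockerfile
# Illustrative multi-stage build: compile wheels in a full image,
# then copy only the installed packages into a slim runtime image.
FROM python:3.11 AS builder
COPY requirements.txt .
RUN pip wheel --wheel-dir /wheels -r requirements.txt

FROM python:3.11-slim
COPY --from=builder /wheels /wheels
RUN pip install --no-index --find-links=/wheels /wheels/* \
    && rm -rf /wheels
```

Build toolchains and pip caches stay in the builder stage, so the final image ships only the runtime and installed packages.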
Next Steps
- Add Prometheus sidecar for resource graphs
- Integrate VS Code server for remote editing
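As a rough sketch of the planned monitoring sidecar, a Prometheus service could slot into the existing compose file like this; the image tag and config path are assumptions:

```yaml
# Illustrative sketch of a planned Prometheus sidecar (not yet implemented).
services:
  prometheus:
    image: prom/prometheus:v2.52.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    networks: [internal]
```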
A disciplined container boundary turned an ad‑hoc tool sprawl into a predictable platform.