The Data Science Toolbox project set out to tame an inconsistent classroom + lab environment. Different teammates had clashing versions of Python, TensorFlow, and BI tools, causing "works on my machine" drift.

Objectives

  • Standardize runtime versions for analytics libraries
  • Offer a simple web UI landing page linking to internal services
  • Ensure cold start reproducibility with one command

Architecture

  • Reverse proxy (nginx) fronting services on clean paths
  • Container set: JupyterLab, a TensorFlow base image (GPU optional), a lightweight git UI, and a Tableau bridge
  • Shared volume mounted for notebooks + datasets
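As a sketch of the clean-path routing, the proxy config might look like the fragment below. Upstream service names and ports are illustrative assumptions, not the project's actual values:

```nginx
# nginx.conf fragment: route clean paths to internal containers.
# Upstream names/ports (jupyterlab:8888, gitui:3000) are assumptions.
location /jupyter/ {
    proxy_pass http://jupyterlab:8888/;
    proxy_set_header Host $host;
    # JupyterLab needs websocket upgrades for kernels and terminals
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
location /git/ {
    proxy_pass http://gitui:3000/;
}
```

Because the containers share a Docker network, nginx can address them by service name rather than IP.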

Image Strategy

Each service has its own Dockerfile pinned to minor versions. A thin docker-compose.yml wires up networks and volumes; build args toggle between CPU and GPU base images.
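A minimal sketch of the build-arg toggle, assuming illustrative service names and version pins:

```yaml
# docker-compose.yml sketch (service names and version pins are
# illustrative; the service's Dockerfile declares a matching
# "ARG BASE_IMAGE" before its FROM line).
services:
  tensorflow:
    build:
      context: ./tensorflow
      args:
        # Switch base images via an env var, e.g.
        # TF_BASE=tensorflow/tensorflow:2.15.0-gpu docker compose build
        BASE_IMAGE: ${TF_BASE:-tensorflow/tensorflow:2.15.0}
    volumes:
      - workspace:/workspace
    networks: [toolbox]
volumes:
  workspace:
networks:
  toolbox:
```

Keeping the toggle in a build arg means one Dockerfile serves both variants, so the CPU and GPU images cannot drift apart.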

Dev Experience

Running docker compose up -d brings the stack online, and the landing page lists live health checks. New contributors skip 45+ minutes of environment setup.
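For the landing page to report per-service status, each service can declare a Compose healthcheck. A sketch, with an assumed endpoint and timings:

```yaml
# Healthcheck sketch (endpoint and intervals are illustrative).
# "docker compose ps" then shows healthy/unhealthy per service.
services:
  jupyterlab:
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8888/api"]
      interval: 30s
      timeout: 5s
      retries: 3
```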

Security & Hardening

  • Non‑root users in containers
  • Configs baked in read-only; secrets injected via an env file excluded from the repo
  • Resource limits to prevent runaway training jobs
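The hardening bullets map onto a handful of Compose service options. A sketch under assumed values (UID, limits, and the .env convention are illustrative; resource limits via deploy.resources are honored by Compose v2):

```yaml
services:
  tensorflow:
    user: "1000:1000"      # run as a non-root user
    env_file: .env         # secrets kept out of the repo (.gitignore'd)
    read_only: true        # immutable root filesystem
    tmpfs: [/tmp]          # writable scratch space only where needed
    deploy:
      resources:
        limits:
          cpus: "4"        # cap runaway training jobs
          memory: 8g
```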

Lessons

  1. Pin versions early; silent minor bumps caused reproducibility issues.
  2. A tiny landing UI reduces friction; teammates actually use the stack.
  3. Layer caching saves CI minutes; multi-stage builds trimmed ~300 MB from the final images.
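The multi-stage pattern behind lesson 3 can be sketched as follows (Python version and paths are illustrative): build wheels in a full image, then copy only the results into a slim runtime so compilers and caches never reach the final layer.

```dockerfile
# Stage 1: build dependency wheels in a full-featured image
FROM python:3.11 AS builder
COPY requirements.txt .
RUN pip wheel --no-cache-dir -r requirements.txt -w /wheels

# Stage 2: slim runtime gets only the prebuilt wheels
FROM python:3.11-slim
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*.whl && rm -rf /wheels
```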

Next Steps

  • Add Prometheus sidecar for resource graphs
  • Integrate VS Code server for remote editing
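A possible shape for the Prometheus sidecar, as a hedged sketch (image tag and config path are assumptions):

```yaml
services:
  prometheus:
    image: prom/prometheus:v2.53.0
    volumes:
      # scrape config would list the toolbox services as targets
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
    networks: [toolbox]
```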

A disciplined container boundary turned an ad‑hoc tool sprawl into a predictable platform.