The Data Science Toolbox project set out to tame an inconsistent classroom + lab environment. Different teammates had clashing versions of Python, TensorFlow, and BI tools, causing "works on my machine" drift.
Objectives
- Standardize runtime versions for analytics libraries
- Offer a simple web UI landing page linking to internal services
- Ensure cold start reproducibility with one command
Architecture
- Reverse proxy (nginx) fronting services on clean paths
- Container set: JupyterLab, a TensorFlow base image with optional GPU support, a lightweight git UI, and a Tableau bridge
- Shared volume mounted for notebooks + datasets
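The clean-path routing in front of the services could look like the following nginx sketch; the upstream service names and ports are assumptions for illustration, not the project's actual config:

```nginx
# Illustrative nginx reverse proxy: map clean paths to internal services.
# Upstream hostnames and ports are assumptions.
server {
    listen 80;

    location /jupyter/ {
        proxy_pass http://jupyterlab:8888/;
        # JupyterLab needs websocket upgrades for kernels and terminals
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }

    location /git/ {
        proxy_pass http://gitui:3000/;
    }

    location / {
        # Landing page with links and health status
        proxy_pass http://landing:8080/;
    }
}
```

Because the containers share one Docker network, nginx can address each service by its compose service name rather than an IP.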
Image Strategy
Each service has its own Dockerfile pinned to minor versions. A thin docker-compose.yml wires up the networks and volumes, and a build arg toggles between the CPU and GPU TensorFlow base images.
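A minimal sketch of that wiring might look like the fragment below; the service name, image tag, build-arg name, and mount path are assumptions for illustration:

```yaml
# Illustrative docker-compose.yml fragment.
# The matching Dockerfile would start with:
#   ARG BASE_IMAGE=tensorflow/tensorflow:2.15.0
#   FROM ${BASE_IMAGE}
services:
  jupyter:
    build:
      context: ./jupyter
      args:
        # Override TF_BASE (e.g. with a -gpu tag) to build the GPU variant
        BASE_IMAGE: ${TF_BASE:-tensorflow/tensorflow:2.15.0}
    volumes:
      - workspace:/home/jovyan/work   # shared notebooks + datasets
    networks: [internal]

networks:
  internal:

volumes:
  workspace:
```

Pinning the default tag in one place keeps every teammate's build reproducible while still allowing a one-variable switch to GPU images.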
Dev Experience
A single `docker compose up -d` brings the stack online, and the landing page lists live health checks for each service. New contributors skip 45+ minutes of manual environment setup.
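The health status the landing page reports could be driven by compose-level health checks like this sketch; the probed endpoint and intervals are assumptions:

```yaml
# Illustrative healthcheck fragment so the landing page
# can report per-service status.
services:
  jupyter:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8888/api"]
      interval: 30s
      timeout: 5s
      retries: 3
```

Docker then exposes each container's state as healthy/unhealthy, which a landing page can read via the Docker API instead of probing services itself.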
Security & Hardening
- Non‑root users in containers
- Configs baked read‑only into images; secrets injected via an env file excluded from the repo
- Resource limits to prevent runaway training jobs
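The three hardening points above can be expressed directly in compose; the UID, limits, and file names below are assumptions for the sketch:

```yaml
# Illustrative hardening fragment; values are assumptions.
services:
  trainer:
    user: "1000:1000"    # run as a non-root user inside the container
    read_only: true      # immutable filesystem; configs baked into the image
    tmpfs: [/tmp]        # writable scratch space despite read_only
    env_file: .env       # secrets stay out of the repo (.env is gitignored)
    deploy:
      resources:
        limits:
          cpus: "4.0"    # cap CPU so a runaway training job can't starve others
          memory: 8g
```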
Lessons
- Pin versions early; silent minor bumps caused reproducibility issues.
- A tiny landing UI reduces friction; teammates actually use the stack.
- Layer caching saves CI minutes—multi-stage builds trimmed roughly 300 MB from the final images.
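The multi-stage pattern behind that size win can be sketched as follows; the Python version and requirements file are assumptions, not the project's actual setup:

```dockerfile
# Illustrative multi-stage build: compile wheels in a full image,
# then copy only the installed packages into a slim runtime image.
FROM python:3.11 AS builder
COPY requirements.txt .
RUN pip wheel --wheel-dir /wheels -r requirements.txt

FROM python:3.11-slim
COPY --from=builder /wheels /wheels
RUN pip install --no-index --find-links=/wheels /wheels/* \
    && rm -rf /wheels
```

Build toolchains and pip caches stay in the builder stage, so the final image ships only the runtime and installed packages.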
Next Steps
- Add Prometheus sidecar for resource graphs
- Integrate VS Code server for remote editing
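As a rough sketch of the planned monitoring sidecar, a Prometheus service could slot into the existing compose file like this; the image tag and config path are assumptions:

```yaml
# Illustrative sketch of a planned Prometheus sidecar (not yet implemented).
services:
  prometheus:
    image: prom/prometheus:v2.52.0
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro
    networks: [internal]
```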
A disciplined container boundary turned an ad‑hoc tool sprawl into a predictable platform.