Domino Data Lab Environment Setup & Docker Container Lessons
Topic: Setting up Python 3.11 + CUDA environment in Domino Data Lab
Table of Contents
- Initial Problem
- Understanding Domino Environments
- Environment Creation Process
- Technical Issues & Solutions
- Docker & Container Concepts
- Successful Solution
- Key Lessons Learned
Initial Problem
User’s Requirement:
- Need Python 3.10+ with CUDA support for machine learning work
- Working in Domino Data Lab platform
- Confused about available compute environments
- Initially thought Spark was needed (turned out to be unnecessary)
Initial Confusion:
- What is “Domino 6.0 Spark compute environment”?
- Which environment to choose for Python 3.10+ and CUDA?
- Difference between various compute environment types
Understanding Domino Environments
Environment Types in Domino
- Domino Standard Environment (DSE) - Complete set of libraries and packages
- Domino Minimal Environment (DME) - Lighter with fewer packages
- Custom Environments - User-built environments for specific needs
Available Environment Categories
- Standard Data Science environments
- GPU/CUDA environments
- Spark environments (for distributed computing)
- Custom Docker-based environments
Key Insight: Spark is NOT needed
- Spark = Distributed computing across multiple machines
- User’s need = Individual ML work with GPU acceleration
- Solution = Standard Python + CUDA environment
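As a quick confirmation that the requirement is single-machine GPU computing rather than a cluster job, one command from a workspace terminal is enough (a sketch; assumes a GPU-backed hardware tier is selected):

```bash
# Sketch: confirm a GPU is attached to this single workspace machine.
# No Spark cluster is involved in this kind of workload.
nvidia-smi
```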
Environment Creation Process
Initial Approach: Search for Existing
- Looked for environments with both Python 3.10+ and CUDA
- Problem: No pre-built environment matched exact requirements
- Discovery: Domino allows custom environment creation
Solution: Duplicate and Modify Existing Environment
Steps Taken:
- Found base environment: `domino-dse5.3-cuda11.8` (had CUDA 11.8 but Python 3.9)
- Duplicated the environment using Domino’s interface
- Renamed it to: `domino-dse5.3-cuda11.8-py3.11`
- Added a custom Dockerfile instruction to upgrade Python
Environment Configuration Interface
Domino Environment Editor:
- ✓ Environment Base (inherit from existing)
- ✓ Supported Cluster Settings (none/Spark/Ray/Dask/MPI)
- ✓ Dockerfile Instructions (custom modifications)
- ✓ Pluggable Workspace Tools
- ✓ Run Setup Scripts
- ✓ Environment Variables
- ✓ Advanced Settings
Technical Issues & Solutions
Issue 1: SSL Certificate Verification Error
Error Message: CondaSSLError: Encountered an SSL error. Most likely a certificate verification issue.
Exception: HTTPSConnectionPool (host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/main/linux-64/current_repodata.json (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (ssl.c:1129)')))
Root Cause:
- Corporate network with proxy servers/firewalls
- Self-signed certificates in certificate chain
- Conda refusing “unsafe” connections for security
Solution Applied:
RUN conda config --set ssl_verify false && conda install -c conda-forge python=3.11 -y
Why This Works:
- Disables SSL verification for conda
- Uses the `conda-forge` channel (community-maintained, often more reliable)
- Safe in a controlled Domino environment
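If disabling verification globally feels too broad, a narrower variant (a sketch, not part of the original build) is to turn it off only for the install step and switch it back on afterwards, e.g. chained inside a single Dockerfile RUN instruction:

```bash
# Sketch of a narrower SSL workaround (assumption: run during the image build):
# disable verification only for the install, then re-enable it so later conda
# commands verify certificates again.
conda config --set ssl_verify false
conda install -c conda-forge python=3.11 -y
conda config --set ssl_verify true
```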
Issue 2: Long Build Times
Observed Behavior:
- Build process took 30+ minutes
- Multiple dependency resolution attempts
- “failed with initial frozen solve. Retrying with flexible solve”
Explanation:
- Python 3.9 -> 3.11: Major version upgrade
- Complex dependencies in scientific computing stack
- CUDA compatibility checks required
- Network latency downloading packages
Normal Build Process:
- Collecting package metadata (current_repodata.json)
- Solving environment (frozen solve) ❌
- Solving environment (flexible solve) ☑️
- Collecting package metadata (repodata.json)
- Final dependency resolution… (in progress)
Issue 3: Jupyter Notebook Startup Failure
Error Message: ModuleNotFoundError: No module named 'notebook.notebookapp'
Root Cause:
- Python 3.11 upgrade changed Jupyter architecture
- Jupyter Notebook 7.0+ restructured the package and relocated the `notebook.notebookapp` module
- The default Domino workspace configuration expects the classic Notebook interface
Solutions Attempted:
- Updated Dockerfile - Added specific Jupyter packages:
RUN pip install jupyterlab notebook nbconvert ipykernel
- IDE Selection - Switched from “Jupyter” to “JupyterLab” in workspace creation
Resolution:
- JupyterLab works perfectly with Python 3.11
- Classic Jupyter Notebook fails due to module incompatibility
- Root issue: Jupyter ecosystem evolution and backward compatibility
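A quick way to see which front ends an environment actually provides (a diagnostic sketch run from a workspace terminal, assuming the Jupyter packages above are installed):

```bash
# Diagnostic sketch: print the installed versions and check whether the classic
# entry point still exists in this environment.
python -c "import jupyterlab, notebook; print('jupyterlab', jupyterlab.__version__, '| notebook', notebook.__version__)"
python -c "import notebook.notebookapp" \
  && echo "classic Notebook app available" \
  || echo "notebook.notebookapp missing (Notebook 7+) - use JupyterLab"
```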
Docker & Container Concepts
Key Conceptual Learning
Traditional Software Deployment Problems:
- “Works on my machine” syndrome
- Environment configuration complexity
- Dependency conflicts between applications
Docker’s Solution:
- Standardized containers = Software + Dependencies + Environment
- Write once, run anywhere philosophy
- Isolation without full virtualization overhead
Docker vs Python Virtual Environment
| Feature | Python Virtual Env | Docker Container |
| :--- | :--- | :--- |
| Scope | Python packages only | Entire OS + applications |
| Isolation | Python-level | System-level |
| Size | MB-level | GB-level |
| Includes | Python libs | OS + Python + CUDA + tools |
| Use Case | Development | Development + Production |
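As a minimal illustration of the difference (a sketch using generic public tooling, a venv and a stock Ubuntu image, rather than Domino-specific commands):

```bash
# Python-level isolation: packages install into .venv, but the OS, CUDA driver,
# and system Python stay whatever the host machine provides.
python3 -m venv .venv && . .venv/bin/activate && pip install numpy

# System-level isolation: the container ships its own OS userland and can ship
# its own Python, CUDA libraries, and tools, independent of the host.
docker run --rm ubuntu:22.04 cat /etc/os-release
```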
Docker Naming Etymology
- Docker = Dock worker (stevedore)
- Analogy: Shipping containers revolutionized cargo transport
- Software containers standardize application deployment
- Docker “workers” manage these “software containers”
Domino Environment = Enhanced Docker Container
Domino Environment Contains:
- ✓ Base Linux OS (Ubuntu/CentOS)
- ✓ Python Environment (specific version + packages)
- ✓ GPU Support (CUDA drivers + cuDNN)
- ✓ Development Tools (Jupyter, VS Code, Git)
- ✓ Pre-installed ML Libraries (NumPy, Pandas, PyTorch)
- ✓ Domino-specific integrations
- ✓ Workspace management tools
Successful Solution: Official Domino Method
Discovery of Official Documentation
After the conda approach failed, we discovered official Domino documentation for installing Python 3.11, which provided a much more efficient solution.
- Source: Installing Python 3.11 in a Domino Compute Environment (Sep 27, 2024)
Why the Official Method Works Better
| Aspect | Conda Approach (Failed) | Official Method (Success) |
| :--- | :--- | :--- |
| Strategy | Upgrade existing Python in conda env | Install new Python via system packages |
| Dependencies | Must resolve 200+ scientific packages | Bypasses conda dependency resolution |
| CUDA Handling | Tries to maintain compatibility | Preserves existing CUDA installation |
| Build Time | 1+ hours (failed) | ~3 minutes |
| Success Rate | Failed | ✓ Successful |
Final Working Dockerfile
USER root
# Clean problematic repositories and install Python 3.11
RUN rm -f /etc/apt/sources.list.d/pgdg.list && \
apt update && \
apt install software-properties-common -y && \
add-apt-repository ppa:deadsnakes/ppa && \
apt update && \
apt install python3.11 python3.11-distutils -y
# Set Python 3.11 as default python and python3 commands
RUN update-alternatives --install /opt/conda/bin/python python /usr/bin/python3.11 1
RUN update-alternatives --install /opt/conda/bin/python3 python3 /usr/bin/python3.11 1
# Install pip for Python 3.11
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
# Install PyTorch with CUDA 11.8 support and scientific packages
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 && \
pip install jupyterlab notebook nbconvert ipykernel && \
pip install numpy pandas matplotlib scikit-learn
USER ubuntu
Step-by-Step Command Explanation
User Permission Management
USER root
- Purpose: Switch to root user for system-level package installation.
- Why needed: Installing system packages requires administrator privileges.
Repository Cleanup and Python Installation
RUN rm -f /etc/apt/sources.list.d/pgdg.list && \
- Purpose: Remove problematic PostgreSQL repository.
apt update && \
- Purpose: Refresh package lists after cleaning repositories.
apt install software-properties-common -y && \
- Purpose: Install repository management tools.
add-apt-repository ppa:deadsnakes/ppa && \
- Purpose: Add the `deadsnakes` Personal Package Archive, which provides the latest Python versions for Ubuntu.
apt update && \
- Purpose: Refresh package lists to include new repository.
apt install python3.11 python3.11-distutils -y
- Result: Clean Python 3.11 installation alongside existing Python 3.9.
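A quick sanity check in the built image (a sketch, not part of the official instructions) confirms where the new interpreter came from and that it runs:

```bash
# Sanity-check sketch: python3.11 should be served from the deadsnakes PPA and
# should start alongside the pre-existing interpreter.
apt-cache policy python3.11   # source should list the deadsnakes PPA
python3.11 --version          # e.g. Python 3.11.x
```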
Python Command Redirection
RUN update-alternatives --install /opt/conda/bin/python python /usr/bin/python3.11 1
- Purpose: Make the `python` command invoke Python 3.11.
RUN update-alternatives --install /opt/conda/bin/python3 python3 /usr/bin/python3.11 1
- Purpose: Make the `python3` command also invoke Python 3.11.
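To confirm the redirection took effect (a verification sketch run inside the built image):

```bash
# Verification sketch: the alternatives should now point at /usr/bin/python3.11.
update-alternatives --display python
python --version    # Python 3.11.x
python3 --version   # Python 3.11.x
```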
Package Manager Installation
RUN curl -sS https://bootstrap.pypa.io/get-pip.py | python3.11
- Purpose: Download the official pip installer script and execute it with the new Python 3.11.
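A follow-up check (a sketch) confirms that pip is bound to the new interpreter rather than the conda-managed Python 3.9:

```bash
# Sanity-check sketch: pip should report Python 3.11's site-packages, not the
# original conda 3.9 environment.
python3.11 -m pip --version
```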
PyTorch and Scientific Packages
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 && \
- Purpose: The `--index-url` flag points at the PyTorch repository that hosts wheels compiled for CUDA 11.8.
pip install jupyterlab notebook nbconvert ipykernel && \
pip install numpy pandas matplotlib scikit-learn
- Purpose: Installs the essential scientific computing and Jupyter stack.
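Once the image is built, a one-liner confirms that the CUDA-enabled wheel was actually picked up (a verification sketch; run inside a GPU-backed workspace):

```bash
# Verification sketch: the version string should end in +cu118, torch.version.cuda
# should report 11.8, and CUDA should be available when a GPU is attached.
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
```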
Security Best Practice
USER ubuntu
- Purpose: Switch back to a non-privileged user.
- Security: Prevents applications from running with root privileges, which is a standard practice.
Build Results: Success!
- Total process time: ~13 minutes (about 3 minutes of coding + about 10 minutes of Docker build/push)
Successfully Installed:
- Python 3.11.13
- PyTorch 2.7.1+cu118
- All NVIDIA CUDA packages (cublas, cudnn, etc.)
- Complete scientific computing stack
- Jupyter notebook environment
Key Success Factors:
- System-level approach: Used `apt` instead of fighting conda dependencies.
- Repository management: Properly cleaned and added Python repositories.
- Preserved CUDA: Kept existing GPU drivers and toolkit intact.
- Official methodology: Followed Domino’s recommended approach.
- Efficient layering: Each Docker layer had a clear, focused purpose.
Final Status:
- ✔ Environment builds successfully
- ✔ JupyterLab starts without errors
- ✔ Python 3.11 + CUDA + PyTorch working perfectly
- ✔ GPU access confirmed (NVIDIA A10G with 23GB VRAM)
- ❌ Classic Jupyter Notebook fails (module compatibility issue)
Key Lessons Learned
Technical Lessons
- Environment Hierarchy: Choose the CUDA environment first, then upgrade Python. Installing CUDA later is difficult as it requires system-level privileges not available in a running container.
- Corporate Network Challenges: SSL certificate issues are common. `conda-forge` is often a more reliable channel, and disabling SSL verification can be safe in controlled environments.
- Dependency Management: Major Python version upgrades can take significant time due to complex interdependencies in the scientific computing stack.
- Container vs. Virtual Environment: Containers provide full system-level isolation, whereas virtual environments only isolate Python packages. Containers are essential for reproducible ML and GPU computing.
Practical Lessons
- Platform Understanding: Realize that Domino environments are sophisticated Docker containers and that custom environment creation is a powerful and necessary feature.
- Problem-Solving Approach: Start with existing solutions and modify them. Understand the root cause before applying a fix, especially in corporate environments.
The Ultimate Lesson:
When facing complex technical challenges, always check if the platform vendor has official guidance before attempting custom solutions. The official Domino method took ~3 minutes versus over an hour of failed attempts with conda.