Repairing user-managed notebooks on Google Cloud

In this note, I share a case study on debugging and fixing JupyterLab access issues on a user-managed notebook instance.

The diagnostic script can be run on a VM instance as shown below:

(base) maciej.skorski@shared-notebooks:~$ sudo /opt/deeplearning/bin/diagnostic_tool.sh

Vertex Workbench Diagnostic Tool


Running system diagnostics...

Checking Docker service status...               [OK]
Checking Proxy Agent status...                  [OK]
Checking Jupyter service status in container...         [ERROR] Jupyter service is not running
Checking internal Jupyter API status...         [ERROR] Jupyter API is not active
Checking boot disk (/dev/sda1) space...         [OK]
Checking data disk (/dev/sdb) space...          [OK]
Checking DNS notebooks.googleapis.com...        [OK]
Checking DNS notebooks.cloud.google.com...      [OK]

System's health status is degraded

Diagnostic tool will collect the following information: 

  System information
  System Log /var/log/
  Docker information
  Jupyter service status
  Network information
  Proxy configuration: /opt/deeplearning/proxy-agent-config.json
  Conda environment information
  pip environment information
  GCP instance information

Do you want to continue (y/n)? n

The Jupyter service runs in a container, but in this case it had somehow stopped 😳

(base) maciej.skorski@shared-notebooks:~$ docker container ls
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
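
Note that docker container ls lists only running containers. Before recreating anything, it may be worth checking whether the stopped payload container is still present and what its logs say. A quick sketch (the name payload-container matches the docker inspect output shown further below):

docker container ls --all                 # include stopped/exited containers
docker logs --tail 50 payload-container   # last log lines, if the container still exists
docker inspect --format '{{.State.Status}}: {{.State.Error}}' payload-container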

Not a problem! We can restart the container, carefully choosing the parameters so that it is exposed properly (ports, mounted folders, etc.). The appropriate docker run command can be reconstructed from a running container on a similar, healthy instance with docker inspect:

(base) maciej.skorski@kaggle-test-shared:~$ docker inspect \
>   --format "$(curl -s https://gist.githubusercontent.com/ictus4u/e28b47dc826644412629093d5c9185be/raw/run.tpl)" 3f5b6d709ccc

docker run \
  --name "/payload-container" \
  --runtime "runc" \
  --volume "/home/jupyter:/home/jupyter" \
  --mount type=bind,source=/opt/deeplearning/jupyter/jupyter_notebook_config.py,destination=/opt/jupyter/.jupyter/jupyter_notebook_config.py,readonly \
  --log-driver "json-file" \
  --restart "always" \
  --publish "127.0.0.1:8080:8080/tcp" \
  --network "bridge" \
  --hostname "3f5b6d709ccc" \
  --expose "8080/tcp" \
  --env "NOTEBOOK_DISABLE_ROOT=" \
  --env "TENSORBOARD_PROXY_URL=/proxy/%PORT%/" \
  --env "LIT_PROXY_URL=/proxy/%PORT%/" \
  --env "PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
  --env "GCSFUSE_METADATA_IMAGE_TYPE=DLC" \
  --env "LC_ALL=C.UTF-8" \
  --env "LANG=C.UTF-8" \
  --env "ANACONDA_PYTHON_VERSION=3.10" \
  --env "DL_ANACONDA_HOME=/opt/conda" \
  --env "SHELL=/bin/bash" \
  --env "LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64::/opt/conda/lib" \
  --env "CONTAINER_NAME=tf2-cpu/2-11" \
  --env "KMP_BLOCKTIME=0" \
  --env "KMP_AFFINITY=granularity=fine,verbose,compact,1,0" \
  --env "KMP_SETTINGS=false" \
  --env "NODE_OPTIONS=--max-old-space-size=4096" \
  --env "ENABLE_MULTI_ENV=false" \
  --env "LIBRARY_PATH=:/opt/conda/lib" \
  --env "TENSORFLOW_VERSION=2.11.0" \
  --env "KMP_WARNINGS=0" \
  --env "PROJ_LIB=/opt/conda/share/proj" \
  --env "TESSERACT_PATH=/usr/bin/tesseract" \
  --env "PYTHONPATH=:/opt/facets/facets_overview/python/" \
  --env "MKL_THREADING_LAYER=GNU" \
  --env "PYTHONUSERBASE=/root/.local" \
  --env "MPLBACKEND=agg" \
  --env "GIT_COMMIT=7e2b36e4a2ac3ef3df74db56b1fd132d56620e8a" \
  --env "BUILD_DATE=20230419-235653" \
  --label "build-date"="20230419-235653" \
  --label "com.google.environment"="Container: TensorFlow 2-11" \
  --label "git-commit"="7e2b36e4a2ac3ef3df74db56b1fd132d56620e8a" \
  --label "kaggle-lang"="python" \
  --label "org.opencontainers.image.ref.name"="ubuntu" \
  --label "org.opencontainers.image.version"="20.04" \
  --label "tensorflow-version"="2.11.0" \
  --detach \
  --tty \
  --entrypoint "/entrypoint.sh" \
  "gcr.io/kaggle-images/python:latest" \
  "/run_jupyter.sh" 

Now all checks pass 🙂

(base) maciej.skorski@shared-notebooks:~$ sudo /opt/deeplearning/bin/diagnostic_tool.sh

Vertex Workbench Diagnostic Tool


Running system diagnostics...

Checking Docker service status...               [OK]
Checking Proxy Agent status...                  [OK]
Checking Jupyter service status in container... [OK]
Checking internal Jupyter API status...         [OK]
Checking boot disk (/dev/sda1) space...         [OK]
Checking data disk (/dev/sdb) space...          [OK]
Checking DNS notebooks.googleapis.com...        [OK]
Checking DNS notebooks.cloud.google.com...      [OK]
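
With the container back, the JupyterLab link in the Cloud console becomes reachable again. If needed, the proxy URL can also be looked up from the command line (a sketch with gcloud; the instance name is taken from the prompt above and the zone is a placeholder):

gcloud notebooks instances describe shared-notebooks \
  --location=us-central1-a \
  --format='value(proxyUri)'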
