In this note, I share a case study on debugging and fixing JupyterLab
access issues on a Vertex AI Workbench instance.
The diagnostic script can be run on a VM instance as shown below:
(base) maciej.skorski@shared-notebooks:~$ sudo /opt/deeplearning/bin/diagnostic_tool.sh
Vertex Workbench Diagnostic Tool
Running system diagnostics...
Checking Docker service status... [OK]
Checking Proxy Agent status... [OK]
Checking Jupyter service status in container... [ERROR] Jupyter service is not running
Checking internal Jupyter API status... [ERROR] Jupyter API is not active
Checking boot disk (/dev/sda1) space... [OK]
Checking data disk (/dev/sdb) space... [OK]
Checking DNS notebooks.googleapis.com... [OK]
Checking DNS notebooks.cloud.google.com... [OK]
System's health status is degraded
Diagnostic tool will collect the following information:
System information
System Log /var/log/
Docker information
Jupyter service status
Network information
Proxy configuration: /opt/deeplearning/proxy-agent-config.json
Conda environment information
pip environment information
GCP instance information
Do you want to continue (y/n)? n
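When the report gets long, it can help to keep only the failing checks. A minimal sketch, assuming the tool keeps the literal `[ERROR]` marker seen above and reads the final y/n answer from standard input (not verified here); `failed_checks` is a hypothetical helper name:

```shell
# Hypothetical helper: keep only the lines that report a failed check.
# Assumes failures carry the literal "[ERROR]" marker, as in the output above.
failed_checks() {
  grep -F '[ERROR]'
}

# Usage on the instance (assumption: the tool reads the y/n answer from stdin):
#   echo n | sudo /opt/deeplearning/bin/diagnostic_tool.sh | failed_checks
```

In the run above, this would leave just the two Jupyter-related lines.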
The Jupyter service runs in a Docker container, which had somehow stopped in this case 😳
(base) maciej.skorski@shared-notebooks:~$ docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
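Note that `docker container ls` lists only running containers, so an empty table does not tell whether the container was stopped or removed. A small sketch to tell the two apart, assuming the container is called `payload-container` on this instance too (the name comes from the healthy instance inspected later in this note); `container_state` is a hypothetical helper:

```shell
# Hypothetical helper: print the status of a container by name, or "missing".
# `--all` also lists stopped containers; the name filter matches substrings.
container_state() {
  name="$1"
  status=$(docker container ls --all --filter "name=${name}" --format '{{.Status}}')
  if [ -n "$status" ]; then
    echo "$status"
  else
    echo "missing"
  fi
}

# Usage (assumption: the container name matches the healthy instance):
#   container_state payload-container
```

If this prints an `Exited ...` status, a plain `docker start payload-container` may already be enough. If it prints `missing`, the container has to be recreated.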
Not a problem! We can restart the container, but we need to choose its parameters carefully so that it is exposed properly (ports, mounted folders, etc.). The appropriate docker command can be recovered from a running container on a similar healthy instance with docker inspect:
(base) maciej.skorski@kaggle-test-shared:~$ docker inspect \
> --format "$(curl -s https://gist.githubusercontent.com/ictus4u/e28b47dc826644412629093d5c9185be/raw/run.tpl)" 3f5b6d709ccc
docker run \
--name "/payload-container" \
--runtime "runc" \
--volume "/home/jupyter:/home/jupyter" \
--mount type=bind,source=/opt/deeplearning/jupyter/jupyter_notebook_config.py,destination=/opt/jupyter/.jupyter/jupyter_notebook_config.py,readonly \
--log-driver "json-file" \
--restart "always" \
--publish "127.0.0.1:8080:8080/tcp" \
--network "bridge" \
--hostname "3f5b6d709ccc" \
--expose "8080/tcp" \
--env "NOTEBOOK_DISABLE_ROOT=" \
--env "TENSORBOARD_PROXY_URL=/proxy/%PORT%/" \
--env "LIT_PROXY_URL=/proxy/%PORT%/" \
--env "PATH=/opt/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin" \
--env "GCSFUSE_METADATA_IMAGE_TYPE=DLC" \
--env "LC_ALL=C.UTF-8" \
--env "LANG=C.UTF-8" \
--env "ANACONDA_PYTHON_VERSION=3.10" \
--env "DL_ANACONDA_HOME=/opt/conda" \
--env "SHELL=/bin/bash" \
--env "LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/local/lib/x86_64-linux-gnu:/usr/local/nvidia/lib:/usr/local/nvidia/lib64::/opt/conda/lib" \
--env "CONTAINER_NAME=tf2-cpu/2-11" \
--env "KMP_BLOCKTIME=0" \
--env "KMP_AFFINITY=granularity=fine,verbose,compact,1,0" \
--env "KMP_SETTINGS=false" \
--env "NODE_OPTIONS=--max-old-space-size=4096" \
--env "ENABLE_MULTI_ENV=false" \
--env "LIBRARY_PATH=:/opt/conda/lib" \
--env "TENSORFLOW_VERSION=2.11.0" \
--env "KMP_WARNINGS=0" \
--env "PROJ_LIB=/opt/conda/share/proj" \
--env "TESSERACT_PATH=/usr/bin/tesseract" \
--env "PYTHONPATH=:/opt/facets/facets_overview/python/" \
--env "MKL_THREADING_LAYER=GNU" \
--env "PYTHONUSERBASE=/root/.local" \
--env "MPLBACKEND=agg" \
--env "GIT_COMMIT=7e2b36e4a2ac3ef3df74db56b1fd132d56620e8a" \
--env "BUILD_DATE=20230419-235653" \
--label "build-date"="20230419-235653" \
--label "com.google.environment"="Container: TensorFlow 2-11" \
--label "git-commit"="7e2b36e4a2ac3ef3df74db56b1fd132d56620e8a" \
--label "kaggle-lang"="python" \
--label "org.opencontainers.image.ref.name"="ubuntu" \
--label "org.opencontainers.image.version"="20.04" \
--label "tensorflow-version"="2.11.0" \
--detach \
--tty \
--entrypoint "/entrypoint.sh" \
"gcr.io/kaggle-images/python:latest" \
"/run_jupyter.sh"
After running this command on the broken instance, the diagnostic check passes 🙂
(base) maciej.skorski@shared-notebooks:~$ sudo /opt/deeplearning/bin/diagnostic_tool.sh
Vertex Workbench Diagnostic Tool
Running system diagnostics...
Checking Docker service status... [OK]
Checking Proxy Agent status... [OK]
Checking Jupyter service status in container... [OK]
Checking internal Jupyter API status... [OK]
Checking boot disk (/dev/sda1) space... [OK]
Checking data disk (/dev/sdb) space... [OK]
Checking DNS notebooks.googleapis.com... [OK]
Checking DNS notebooks.cloud.google.com... [OK]
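As an extra check independent of the diagnostic tool, the Jupyter server can be queried on the port published by the container. A sketch, assuming the server answers on `127.0.0.1:8080` (taken from the `--publish` flag above) and serves the standard Jupyter `/api` endpoint without a base URL prefix, which is not confirmed for this image; `jupyter_api_ok` is a hypothetical helper:

```shell
# Hypothetical helper: succeed if the local Jupyter API answers with HTTP 200.
# The default URL is an assumption based on the published port 8080.
jupyter_api_ok() {
  url="${1:-http://127.0.0.1:8080/api}"
  # -w '%{http_code}' prints only the HTTP status; the body is discarded.
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url")
  [ "$code" = "200" ]
}

# Usage:
#   jupyter_api_ok && echo "Jupyter API is up" || echo "Jupyter API is down"
```

A failure here while the container is running would point at the service inside the container rather than at Docker itself.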