2.3.x release versions

| Component | 2.3.7-debian12/-ubuntu22/-ml-ubuntu22/-rocky9 (2025/07/25) | 2.3.6-debian12/-ubuntu22/-ml-ubuntu22/-rocky9 (2025/07/15) | 2.3.5-debian12/-ubuntu22/-ml-ubuntu22/-rocky9 (2025/07/04) | 2.3.4-debian12/-ubuntu22/-ml-ubuntu22/-rocky9 (2025/06/20) | 2.3.3-debian12/-ubuntu22/-ml-ubuntu22/-rocky9 (2025/06/09) |
|---|---|---|---|---|---|
| Apache Atlas (initialization action) | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 | 2.2.0 |
| Apache Flink (optional component) | 1.17.0 | 1.17.0 | 1.17.0 | 1.17.0 | 1.17.0 |
| Apache Hadoop (installed) | 3.3.6 | 3.3.6 | 3.3.6 | 3.3.6 | 3.3.6 |
| Apache Hive (installed) | 3.1.3 | 3.1.3 | 3.1.3 | 3.1.3 | 3.1.3 |
| Apache Hive WebHCat (optional component) | 3.1.3 | 3.1.3 | 3.1.3 | 3.1.3 | 3.1.3 |
| Apache Hudi (optional component) | 0.15.0 | 0.15.0 | 0.15.0 | 0.15.0 | 0.15.0 |
| Apache Iceberg (optional component) | 1.6.1 | 1.6.1 | 1.6.1 | 1.6.1 | 1.6.1 |
| Apache Kafka (initialization action) | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 |
| Apache Pig (optional component) | 0.18.0-SNAPSHOT | 0.18.0-SNAPSHOT | 0.18.0-SNAPSHOT | 0.18.0-SNAPSHOT | 0.18.0-SNAPSHOT |
| Apache Spark (installed) | 3.5.3 | 3.5.3 | 3.5.3 | 3.5.3 | 3.5.3 |
| Apache Sqoop (initialization action) | 1.5.0-SNAPSHOT | 1.5.0-SNAPSHOT | 1.5.0-SNAPSHOT | 1.5.0-SNAPSHOT | 1.5.0-SNAPSHOT |
| Apache Tez (installed) | 0.10.2 | 0.10.2 | 0.10.2 | 0.10.2 | 0.10.2 |
| BigQuery Connector (installed) | 0.42.3 | 0.42.3 | 0.42.3 | 0.42.3 | 0.42.3 |
| Cloud Storage Connector (installed) | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 | 3.1.0 |
| Conscrypt (installed) | 2.5.2 | 2.5.2 | 2.5.2 | 2.5.2 | 2.5.2 |
| Delta Lake (optional component) | 3.2.0 | 3.2.0 | 3.2.0 | 3.2.0 | 3.2.0 |
| Docker (optional component) | 28.1 | 28.1 | 28.1 | 28.1 | 28.1 |
| Hue (initialization action) | 4.11.0 | 4.11.0 | 4.11.0 | 4.11.0 | 4.11.0 |
| Java (installed) | 11 | 11 | 11 | 11 | 11 |
| JupyterLab Notebook (optional component) | 3.6 | 3.6 | 3.6 | 3.6 | 3.6 |
| Oozie (initialization action) | 5.2.1 | 5.2.1 | 5.2.1 | 5.2.1 | 5.2.1 |
| Python (installed) | micromamba 2.0.5 with Python 3.11 | micromamba 2.0.5 with Python 3.11 | micromamba 2.0.5 with Python 3.11 | micromamba 2.0.5 with Python 3.11 | micromamba 2.0.5 with Python 3.11 |
| R (installed) | R 4.3 | R 4.3 | R 4.3 | R 4.3 | R 4.3 |
| Ranger (optional component) | 2.4.0 | 2.4.0 | 2.4.0 | 2.4.0 | 2.4.0 |
| Scala (installed) | 2.12.18 | 2.12.18 | 2.12.18 | 2.12.18 | 2.12.18 |
| Solr (optional component) | 9.4.1 | 9.4.1 | 9.4.1 | 9.4.1 | 9.4.1 |
| Trino (optional component) | 432 | 432 | 432 | 432 | 432 |
| Zeppelin Notebook (optional component) | 0.10.1 | 0.10.1 | 0.10.1 | 0.10.1 | 0.10.1 |
| Zookeeper (optional component) | 3.9.3 | 3.9.3 | 3.9.3 | 3.9.3 | 3.9.3 |
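
The version headings in the table above use a shorthand such as 2.3.7-debian12/-ubuntu22/-ml-ubuntu22/-rocky9, where each entry after the first gives only its OS suffix. The following minimal sketch expands that shorthand into full image version names. The function name expand_image_versions is hypothetical and is not part of any Dataproc tooling.

```python
def expand_image_versions(shorthand: str) -> list[str]:
    """Expand '2.3.7-debian12/-ubuntu22/...' into full image version names."""
    first, *rest = shorthand.split("/")
    # The sub-minor version is everything before the first hyphen, e.g. "2.3.7".
    sub_minor, _, _ = first.partition("-")
    # Later entries carry only their "-<os>" suffix, so re-attach the version.
    return [first] + [sub_minor + suffix for suffix in rest]


if __name__ == "__main__":
    shorthand = "2.3.7-debian12/-ubuntu22/-ml-ubuntu22/-rocky9"
    for name in expand_image_versions(shorthand):
        print(name)
```

Each resulting name, for example 2.3.7-ml-ubuntu22, identifies one sub-minor image version for one operating system.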

Important changes in 2.3:

  • Version 2.3 is a lightweight image that contains only core components, which reduces exposure to Common Vulnerabilities and Exposures (CVEs). To meet higher security compliance requirements, use image version 2.3 or later when creating a Dataproc cluster.

  • If you choose to install optional components when creating a Dataproc cluster with a 2.3 image, they are downloaded and installed during cluster creation, which might increase cluster startup time. To avoid this delay, create a custom image with the optional components pre-installed by running generate_custom_image.py with the --optional-components flag.
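
As a rough illustration of the custom image workflow described above, the following sketch assembles a generate_custom_image.py command line. Only the script name and the --optional-components flag come from this page. The other flags (--image-name, --dataproc-version, --customization-script, --zone, --gcs-bucket), the comma-separated value format, the component spellings, and every sample value are assumptions; check them against the script's --help output before use.

```python
import shlex


def build_custom_image_command(image_name, dataproc_version, components,
                               zone, gcs_bucket, customization_script):
    """Return an argv list for generate_custom_image.py (flags are assumptions)."""
    return [
        "python3", "generate_custom_image.py",
        "--image-name", image_name,
        "--dataproc-version", dataproc_version,
        "--customization-script", customization_script,
        "--zone", zone,
        "--gcs-bucket", gcs_bucket,
        # Assumed format: a single comma-separated list of component names.
        "--optional-components", ",".join(components),
    ]


if __name__ == "__main__":
    # All values below are placeholders, not real resources.
    cmd = build_custom_image_command(
        image_name="my-dataproc-2-3-image",
        dataproc_version="2.3.7-debian12",
        components=["DOCKER", "ZEPPELIN"],
        zone="us-central1-a",
        gcs_bucket="gs://my-example-bucket",
        customization_script="customize.sh",
    )
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

Printing the command rather than executing it keeps the sketch safe to run anywhere and makes it easy to review the flags first.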

Notes:

  • The following are the optional components in 2.3 images:

    • Apache Flink
    • Apache Hive WebHCat
    • Apache Hudi
    • Apache Iceberg
    • Apache Pig
    • Delta Lake
    • Docker
    • JupyterLab Notebook
    • Ranger
    • Solr
    • Zeppelin Notebook
    • Zookeeper
  • yarn.nodemanager.recovery.enabled and HDFS Audit Logging are enabled by default in 2.3 images.

  • micromamba is installed as part of the Python installation. It replaces conda, which was installed in previous image versions.

  • Docker and Zeppelin installation issues:

    • Installation fails if the cluster has no public internet access. As a workaround, create a cluster that uses a custom image with optional components pre-installed. You can do this by running generate_custom_image.py with the --optional-components flag.
    • Installation can fail if the cluster is pinned to an older sub-minor image version. Packages are installed on demand from public OSS repositories, and a required package might no longer be available upstream. As a workaround, create a cluster that uses a custom image with optional components pre-installed. To do this, run generate_custom_image.py with the --optional-components flag.
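
The Docker and Zeppelin notes above amount to a small decision rule, which the following sketch encodes. The function and constant names are hypothetical, and the rule is only a reading of the two notes, not an official check.

```python
# Components that the notes above single out for installation issues.
AT_RISK_COMPONENTS = frozenset({"Docker", "Zeppelin Notebook"})


def custom_image_recommended(requested_components, has_public_internet,
                             pinned_to_older_sub_minor):
    """Return True when the notes suggest pre-installing components in a custom image."""
    at_risk = AT_RISK_COMPONENTS & set(requested_components)
    if not at_risk:
        # Neither Docker nor Zeppelin was requested, so the notes do not apply.
        return False
    # Installation fails without public internet access, and can fail when the
    # cluster is pinned to an older sub-minor image version.
    return (not has_public_internet) or pinned_to_older_sub_minor


if __name__ == "__main__":
    print(custom_image_recommended(["Docker", "Solr"], has_public_internet=False,
                                   pinned_to_older_sub_minor=False))
```

A True result means one of the two failure conditions applies to the requested Docker or Zeppelin installation.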

Image version 2.3 machine learning (ML) components

The Dataproc 2.3-ml-ubuntu image extends the 2.3 base image with ML-specific software. It supports 2.3 image optional components and other 2.3 features, and adds the component versions listed in the following sections.

GPU-specific libraries

For Dataproc jobs that use GPU VMs, the following NVIDIA driver and libraries are available in the 2.3-ml-ubuntu image. You can use them to accomplish the following tasks:

  • Accelerate Spark batch workloads with the NVIDIA Spark Rapids library
  • Train machine learning workloads
  • Run distributed batch inference using Spark

| Package Name | Version |
|---|---|
| Spark Rapids | 25.04.0 |
| NVIDIA Driver | Ubuntu 22.04 LTS Accelerated with NVIDIA driver version 570 |
| CUDA | 12.6.3 |
| cublas | 12.6.4 |
| cusolver | 11.7.1 |
| cupti | 12.6.80 |
| cusparse | 12.5.4 |
| cuDNN | 9.10.1 |
| NCCL | 2.27.5 |
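
To show how the Spark Rapids library is typically switched on for a job, the following sketch builds a set of Spark properties. The property names are standard Spark and RAPIDS Accelerator settings, but this page does not say whether the 2.3-ml-ubuntu image already sets any of them by default. The helper name and the default values are assumptions for illustration.

```python
def rapids_spark_conf(gpus_per_executor=1, tasks_per_gpu=4):
    """Return Spark properties that enable the RAPIDS Accelerator (values are examples)."""
    return {
        # Load the RAPIDS Accelerator SQL plugin.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # GPU scheduling: whole GPUs per executor, a fraction of a GPU per task.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(gpus_per_executor / tasks_per_gpu),
    }


if __name__ == "__main__":
    # Render the properties as --conf arguments for a job submission command.
    for key, value in rapids_spark_conf().items():
        print(f"--conf {key}={value}")
```

The fractional task amount lets several Spark tasks share one GPU; tune it to the workload rather than treating 4 tasks per GPU as a recommendation.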

XGBoost libraries

The following Maven package versions are available in the 2.3-ml-ubuntu image to let you use XGBoost with Spark in Java or Scala.

| Group ID | Package Name | Version |
|---|---|---|
| ml.dmlc | xgboost4j-gpu_2.12 | 2.1.1 |
| ml.dmlc | xgboost4j-spark-gpu_2.12 | 2.1.1 |
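
If you build a Java or Scala job locally against the same XGBoost versions, it can help to write the packages above as Maven coordinates and confirm that the _2.12 suffix matches the Scala version in the image (2.12.18). The following is a minimal sketch; the helper names are hypothetical.

```python
# (group ID, package name, version) rows from the XGBoost table above.
XGBOOST_PACKAGES = [
    ("ml.dmlc", "xgboost4j-gpu_2.12", "2.1.1"),
    ("ml.dmlc", "xgboost4j-spark-gpu_2.12", "2.1.1"),
]
IMAGE_SCALA_VERSION = "2.12.18"  # Scala version listed for 2.3 images


def maven_coordinate(group_id, artifact_id, version):
    """Format a package as a group:artifact:version Maven coordinate."""
    return f"{group_id}:{artifact_id}:{version}"


def scala_binary_version(artifact_id):
    """Return the Scala binary version suffix, e.g. '2.12' from '..._2.12'."""
    return artifact_id.rsplit("_", 1)[1]


if __name__ == "__main__":
    for group_id, artifact_id, version in XGBOOST_PACKAGES:
        compatible = IMAGE_SCALA_VERSION.startswith(scala_binary_version(artifact_id) + ".")
        print(maven_coordinate(group_id, artifact_id, version), "matches image Scala:", compatible)
```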

Python libraries

The 2.3-ml-ubuntu image contains the following libraries, which support different stages in the ML lifecycle.

`2.3-ml-ubuntu` image Python libraries

| Package | Version |
|---|---|
| accelerate | 1.8.1 |
| conda | 23.11.0 |
| cookiecutter | 2.5.0 |
| curl | 8.12.1 |
| cython | 3.0.12 |
| dask | 2023.12.1 |
| datasets | 3.6.0 |
| deepspeed | 0.17.2 |
| delta-spark | 3.2.0 |
| evaluate | 0.4.5 |
| fastavro | 1.9.7 |
| fastparquet | 2023.10.1 |
| fiona | 1.10.0 |
| gateway-provisioners[yarn] | 0.4.0 |
| gcsfs | 2023.12.2.post1 |
| google-auth-oauthlib | 1.2.2 |
| google-cloud-aiplatform | 1.88.0 |
| google-cloud-bigquery[pandas] | 3.31.0 |
| google-cloud-bigquery-storage | 2.30.0 |
| google-cloud-bigtable | 2.30.1 |
| google-cloud-container | 2.56.1 |
| google-cloud-datacatalog | 3.26.1 |
| google-cloud-dataproc | 5.18.1 |
| google-cloud-datastore | 2.21.0 |
| google-cloud-language | 2.17.2 |
| google-cloud-logging | 3.11.4 |
| google-cloud-monitoring | 2.27.2 |
| google-cloud-pubsub | 2.29.1 |
| google-cloud-redis | 2.18.1 |
| google-cloud-spanner | 3.53.0 |
| google-cloud-speech | 2.32.0 |
| google-cloud-storage | 2.19.0 |
| google-cloud-texttospeech | 2.25.1 |
| google-cloud-translate | 3.20.3 |
| google-cloud-vision | 3.10.2 |
| huggingface_hub | 0.33.1 |
| httplib2 | 0.22.0 |
| ipyparallel | 8.6.1 |
| ipython-sql | 0.3.9 |
| ipywidgets | 8.1.7 |
| jupyter_contrib_nbextensions | 0.7.0 |
| jupyter_http_over_ws | 0.0.8 |
| jupyter_kernel_gateway | 2.5.2 |
| jupyter_server | 1.24.0 |
| jupyterhub | 4.1.6 |
| jupyterlab | 3.6.8 |
| jupyterlab-git | 0.44.0 |
| jupyterlab_widgets | 3.0.15 |
| koalas | 0.22.0 |
| langchain | 0.3.26 |
| lightgbm | 4.6.0 |
| markdown | 3.5.2 |
| matplotlib | 3.8.4 |
| mlflow | 3.1.1 |
| nbconvert | 7.14.2 |
| nbdime | 3.2.1 |
| nltk | 3.9.1 |
| notebook | 6.5.7 |
| numba | 0.58.1 |
| numpy | 1.26.4 |
| oauth2client | 4.1.3 |
| onnx | 1.17.0 |
| openblas | 0.3.25 |
| opencv | 4.11.0 |
| orc | 2.1.1 |
| pandas | 2.1.4 |
| pandas-profiling | 3.0.0 |
| papermill | 2.4.0 |
| pyarrow | 16.1.0 |
| pydot | 2.0.0 |
| pyhive | 0.7.0 |
| pynvml | 12.0.0 |
| pysal | 23.7 |
| pytables | 3.9.2 |
| python | 3.11 |
| regex | 2023.12.25 |
| requests | 2.32.2 |
| requests-kerberos | 0.12.0 |
| rtree | 1.1.0 |
| scikit-image | 0.22.0 |
| scikit-learn | 1.5.2 |
| scipy | 1.11.4 |
| seaborn | 0.13.2 |
| sentence-transformers | 5.0.0 |
| setuptools | 79.0.1 |
| shap | 0.48.0 |
| shapely | 2.1.1 |
| spacy | 3.8.7 |
| spark-tensorflow-distributor | 1.0.0 |
| spyder | 5.5.6 |
| sqlalchemy | 2.0.41 |
| sympy | 1.13.3 |
| tensorflow | 2.18.0 |
| tokenizers | 0.21.4.dev0 |
| toree | 0.5.0 |
| torch | 2.6.0 |
| torch-model-archiver | 0.11.1 |
| torcheval | 0.0.7 |
| tornado | 6.4.2 |
| torchvision | 0.21.0 |
| traitlets | 5.14.3 |
| transformers | 4.53.1 |
| uritemplate | 4.1.1 |
| virtualenv | 20.26.6 |
| wordcloud | 1.9.4 |
| xgboost | 2.1.4 |
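
To confirm that a cluster node actually has the versions listed above, you can compare them with the installed distributions by using the standard library's importlib.metadata. The following is a minimal sketch with a hypothetical helper name, and the EXPECTED dictionary contains only a small sample of the table. Entries that are not Python distributions (for example curl, openblas, orc, or python itself) are likely to be reported as missing by this method.

```python
from importlib import metadata

# A small sample of package versions copied from the table above.
EXPECTED = {
    "numpy": "1.26.4",
    "pandas": "2.1.4",
    "scikit-learn": "1.5.2",
    "torch": "2.6.0",
}


def find_mismatches(expected):
    """Return {name: (expected, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # Not installed as a Python distribution.
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

An empty result means every sampled package matches the documented version.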

R libraries

The following R library versions are included in the 2.3-ml-ubuntu image.

`2.3-ml-ubuntu` image R libraries

| Package Name | Version |
|---|---|
| r-ggplot2 | 3.4.4 |
| r-irkernel | 1.3.2 |
| r-rcurl | 1.98-1.16 |
| r-recommended | 4.3 |