Dataset Downloads

This page covers two download paths:

  • Raw upstream data: original GLORYS, OSTIA, sea-level, SSS, EN4/ARGO, and land mask inputs used to build DepthDif training stores.
  • Packaged DepthDif datasets: prebuilt folders or archives that can be downloaded without rebuilding every intermediate from the upstream sources.

Use Data Sources for native product details and
Data Export for conversion into trainable stores.

All examples assume the project environment is available at
/work/envs/depth. Raw NetCDF data and aligned intermediate products are normally stored under:

/data1/datasets/depth_v2

The model-ready packaged dataset used by the GeoTIFF dataloader is normally stored in a local folder named after its Hugging Face repo:

/work/data/OceanVariableReconstruction

Raw Upstream Data

Raw download scripts write one CSV log in each output directory. Set
DRY_RUN_ONLY=1 to query availability without downloading files.
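
As a rough illustration of how the environment variables on this page might be assembled before calling one of the download scripts from Python, the sketch below builds the environment for a dry run. The helper name build_download_env is hypothetical and not part of the project; only the variable names START_DATE, END_DATE, and DRY_RUN_ONLY come from this page.

```python
import os


def build_download_env(start_date, end_date, dry_run_only=False, base_env=None):
    """Return a copy of the environment with the download variables set.

    Hypothetical helper for illustration; the project scripts only read
    the environment variables, they do not ship this function.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["START_DATE"] = start_date
    env["END_DATE"] = end_date
    if dry_run_only:
        # Query availability only, without downloading files.
        env["DRY_RUN_ONLY"] = "1"
    else:
        # Make sure a stale value from the parent shell is not inherited.
        env.pop("DRY_RUN_ONLY", None)
    return env


env = build_download_env("2010-01-01", "2010-01-31", dry_run_only=True)
# The environment could then be passed to a download script, for example:
# subprocess.run([script_path, output_dir], env=env, check=True)
```

Copying the parent environment rather than building a fresh dictionary keeps PATH and credentials available to the script.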

Copernicus Marine Inputs

GLORYS, OSTIA, sea-level, and SSS files are downloaded with the copernicusmarine get CLI. The scripts first run a dry query for each date and
then download only matching NetCDF files.

GLORYS

Role: weekly 3D reanalysis target and salinity/source fields.

Provider: Copernicus Marine Service.

Product family: Global Ocean Physics Reanalysis / GLORYS12V1. The weekly helper
downloads one daily file every 7 days because the catalogue exposes daily and
monthly streams, not a dedicated weekly stream.

Dataset candidates used by the script:

  • cmems_mod_glo_phy_my_0.083deg_P1D-m
  • cmems_mod_glo_phy_my_0.083deg_P1D-m_202311
  • global-reanalysis-phy-001-030-daily

START_DATE=2010-01-01 END_DATE=2024-07-31 STEP_DAYS=7 \
  src/depth_recon/data/dataset_creation/data_download_raw/get_glorys/download_glorys_weekly.sh \
  /data1/datasets/depth_v2/glorys_weekly
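
The weekly sampling described above (one daily file every STEP_DAYS days) can be sketched in Python as follows. This is a minimal illustration, not the script's actual logic: the function name stepped_dates is hypothetical, and treating END_DATE as inclusive is an assumption.

```python
from datetime import date, timedelta


def stepped_dates(start, end, step_days=7):
    """Yield dates from start to end, step_days apart.

    Assumption: the end date is inclusive. The real script may differ.
    """
    if step_days < 1:
        raise ValueError("step_days must be at least 1")
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)


# Same range as the command above: 2010-01-01 to 2024-07-31, every 7 days.
dates = list(stepped_dates(date(2010, 1, 1), date(2024, 7, 31)))
print(dates[0], dates[1], len(dates))
```

Under these assumptions each yielded date corresponds to one daily GLORYS file requested from the catalogue.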

Daily and monthly helpers also exist for diagnostics or alternate exports:

  • src/depth_recon/data/dataset_creation/data_download_raw/get_glorys/download_glorys_daily.sh
  • src/depth_recon/data/dataset_creation/data_download_raw/get_glorys/download_glorys_monthly.sh

OSTIA

Role: daily sea-surface temperature input.

Provider: Copernicus Marine Service / UKMO OSTIA stream.

Product ID: SST_GLO_SST_L4_REP_OBSERVATIONS_010_011.

Dataset candidates used by the script:

  • METOFFICE-GLO-SST-L4-REP-OBS-SST
  • METOFFICE-GLO-SST-L4-REP-OBS-SST-V2
  • SST_GLO_SST_L4_REP_OBSERVATIONS_010_011

START_DATE=2010-01-01 END_DATE=2024-07-31 \
  src/depth_recon/data/dataset_creation/data_download_raw/get_ostia/download_ostia.sh \
  /data1/datasets/depth_v2/ostia
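
One plausible way to use a list of dataset candidates is to try them in order and keep the first one whose dry query returns files. The sketch below illustrates that pattern under this assumption; the page does not state how the script chooses between candidates. The names first_available and fake_probe are hypothetical, and the probe is a stand-in for a real availability query.

```python
from typing import Callable, Iterable, Optional

OSTIA_CANDIDATES = [
    "METOFFICE-GLO-SST-L4-REP-OBS-SST",
    "METOFFICE-GLO-SST-L4-REP-OBS-SST-V2",
    "SST_GLO_SST_L4_REP_OBSERVATIONS_010_011",
]


def first_available(
    candidates: Iterable[str], has_files: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate for which has_files is true, else None."""
    for dataset_id in candidates:
        if has_files(dataset_id):
            return dataset_id
    return None


def fake_probe(dataset_id: str) -> bool:
    # Stand-in for a dry query; pretends only the "-V2" dataset has files.
    return dataset_id.endswith("-V2")


chosen = first_available(OSTIA_CANDIDATES, fake_probe)
print(chosen)
```

Passing the probe as a callable keeps the selection logic testable without network access.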

Sea Level

Role: daily auxiliary sea-surface height field, currently exported as adt.

Provider: Copernicus Marine Service.

Product ID: SEALEVEL_GLO_PHY_L4_MY_008_047.

Dataset ID:
cmems_obs-sl_glo_phy-ssh_my_allsat-l4-duacs-0.125deg_P1D.

START_DATE=2010-01-01 END_DATE=2024-07-31 \
  src/depth_recon/data/dataset_creation/data_download_raw/get_sealevel/download_sealevel_daily.sh \
  /data1/datasets/depth_v2/sealevel_daily

Sea-Surface Salinity

Role: daily auxiliary sea-surface salinity, density, and ice fields.

Provider: Copernicus Marine Service / CNR.

Product ID: MULTIOBS_GLO_PHY_S_SURFACE_MYNRT_015_013.

Default dataset ID: cmems_obs-mob_glo_phy-sss_my_multi_P1D.

START_DATE=2010-01-01 END_DATE=2024-07-31 \
  src/depth_recon/data/dataset_creation/data_download_raw/get_sss/download_sss_daily.sh \
  /data1/datasets/depth_v2/sss_daily

EN4 / ARGO Profiles

Role: sparse in-situ temperature and salinity profiles.

Provider: UK Met Office Hadley Centre EN4.

The helper downloads yearly EN4 profile ZIP archives from:

https://www.metoffice.gov.uk/hadobs/en4/data/en4-2-1

Expected archive names follow:

EN.4.2.2.profiles.g10.YYYY.zip

START_YEAR=2010 END_YEAR=2025 \
  src/depth_recon/data/dataset_creation/data_download_raw/get_argo/download_en4_profiles.sh \
  /data1/datasets/depth_v2/en4_profiles
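
To check a download directory against the expected yearly archives, the naming pattern above can be expanded per year. This is a small sketch; the function name en4_archive_names is hypothetical, and treating END_YEAR as inclusive is an assumption.

```python
ARCHIVE_TEMPLATE = "EN.4.2.2.profiles.g10.{year}.zip"


def en4_archive_names(start_year, end_year):
    """List expected EN4 archive names for start_year..end_year.

    Assumption: end_year is inclusive.
    """
    if end_year < start_year:
        raise ValueError("end_year must not be earlier than start_year")
    return [ARCHIVE_TEMPLATE.format(year=year) for year in range(start_year, end_year + 1)]


names = en4_archive_names(2010, 2025)
print(names[0], names[-1], len(names))
```

Comparing this list with the files actually present in the output directory is a quick way to spot missing years.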

Land Mask

The patch grid and GeoTIFF raster export use a global 0.1 degree land mask:

src/depth_recon/data/dataset_creation/data_download_raw/get_world/world_land_mask_glorys_0p1.tif

It is derived by downloading a world GeoJSON file and rasterizing it to the
GLORYS-style global grid where 1=land and 0=water:

/work/envs/depth/bin/python \
  src/depth_recon/data/dataset_creation/data_download_raw/get_world/download_manipulate_world_file.py \
  --overwrite

The default GeoJSON source URL is stored in the script and can be overridden
with --source-url.

Packaged DepthDif Datasets

Packaged dataset downloaders live under
src/depth_recon/data/dataset_creation/data_download_packaged/. Hosted dataset
URLs are configured in
src/depth_recon/data/dataset_creation/data_download_packaged/dataset_links.yaml.

Use packaged downloads when you want the prepared DepthDif artifacts directly instead of reconstructing them from the raw upstream products. The aligned ARGO downloader fetches Hugging Face package files into --output-dir and reuses files that already exist unless --force-download is passed. Pass --overwrite when existing package files should be replaced. Legacy zip links are still supported and are extracted into --output-dir.

Aligned ARGO Zarr

Role: prealigned sparse ARGO/EN4 profile store, packaged as a Hugging Face dataset folder with data/argo_glors_ostia_ssh.zarr and Parquet indices. This package includes GLORYS, OSTIA, sea-level, and SSS profile-context variables.

Configured link key: argo_aligned.

/work/envs/depth/bin/python -m depth_recon.data.dataset_creation.data_download_packaged.download_aligned_argo_zarr \
  --output-dir /data1/datasets/depth_v2/aligned_argo/hf_argo_glors_ostia_ssh

The zarr path produced by this download is:

/data1/datasets/depth_v2/aligned_argo/hf_argo_glors_ostia_ssh/data/argo_glors_ostia_ssh.zarr

Pass that path to --enriched-argo-zarr when exporting the GeoTIFF training dataset from the packaged copy.
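
Before starting an export, it can help to derive the zarr path from the download directory and confirm it exists. The sketch below assumes the package layout described above (a data/ folder containing argo_glors_ostia_ssh.zarr); the function names are hypothetical and not part of the project.

```python
from pathlib import Path

# Relative location of the store inside the downloaded package (from this page).
ZARR_RELATIVE_PATH = Path("data") / "argo_glors_ostia_ssh.zarr"


def aligned_argo_zarr_path(output_dir):
    """Return the expected zarr path inside a downloaded package folder."""
    return Path(output_dir) / ZARR_RELATIVE_PATH


def require_aligned_argo_zarr(output_dir):
    """Return the zarr path, or raise if the download looks incomplete."""
    path = aligned_argo_zarr_path(output_dir)
    if not path.exists():
        raise FileNotFoundError(f"Aligned ARGO zarr not found at {path}")
    return path


zarr_path = aligned_argo_zarr_path(
    "/data1/datasets/depth_v2/aligned_argo/hf_argo_glors_ostia_ssh"
)
print(zarr_path)
```

The resulting path is the value that would be passed to --enriched-argo-zarr.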

Exported GeoTIFF Training Dataset

Role: exported DepthDif training dataset with aligned dense rasters and gridded
ARGO profile data, packaged as a zip archive.

Configured link key: depthdif_training.

/work/envs/depth/bin/python -m depth_recon.data.dataset_creation.data_download_packaged.download_exported_geotiff_dataset \
  --output-dir /work/data/depthdif/geotiff_export