Skip to content

Dataset Statistics

These statistics were measured from the local packaged dataset root at /work/data/OceanVariableReconstruction. The page is split into three parts:

  1. the exported enriched ARGO Zarr profile dataset
  2. the Hugging Face package layout for that same ARGO Zarr
  3. the saved GeoTIFF ML dataset used by the patch dataloader

Note: the exporter now also writes SSS fields. The measured counts below predate a full local re-export with those fields unless the corresponding Zarr or manifest timestamp has been regenerated after the SSS change. Schema tables list the current expected outputs.

1. Exported ARGO Zarr Dataset

Path:

/data1/datasets/depth_v2/aligned_argo/enriched_argo_profiles.zarr

This Zarr is the profile-level export. Each row is one EN4/ARGO profile that passed the date and coordinate filters. ARGO values are projected onto the GLORYS depth axis, and GLORYS, OSTIA, sea-level, and SSS fields are sampled at the same profile location and date.

The export was created at 2026-05-10T17:54:01+00:00 for the requested date range 20100101 to 20240731.

Summary

Item Value
Profile rows 6,637,381
GLORYS depth levels 50
Profile dates 20100101 to 20240731
Unique profile dates 5,142
Source EN4/ARGO files 169
Profiles with valid temperature 6,609,159
Profiles with valid potential temperature 6,609,159
Profiles with valid salinity 5,394,516
Valid temperature depth points 156,319,955
Valid potential-temperature depth points 156,319,955
Valid salinity depth points 133,248,365

Source Products

Source Files First file/date Last file/date
EN4/ARGO profiles 169 EN.4.2.2.f.profiles.g10.201001.nc EN.4.2.2.f.profiles.g10.202407.nc
GLORYS 843 20100101 20260220
OSTIA 5,326 20100101 20240731
Sea level 5,326 20100101 20240731
SSS 5,326 expected after full re-export 20100101 20240731

Dimensions

Dimension Meaning Count
profile One ARGO profile location/date. 6,637,381
glorys_depth Native GLORYS depth axis used for vertical alignment. 50

Variables

Variables with shape profile x glorys_depth store one value per ARGO profile and GLORYS depth level. Variables with shape profile store one value per profile.

Variable Explanation Shape Type
latitude Profile latitude in degrees north. profile float64
longitude Profile longitude in degrees east. profile float64
profile_date Profile observation date as YYYYMMDD. profile int64
profile_juld Original EN4/ARGO Julian day timestamp. profile float64
profile_idx Profile index in the exported Zarr. profile int64
profile_source_file Source EN4/ARGO file name. profile <U33
valid_observed_depth_count Number of valid observed depth levels in the source profile. profile int64
argo_temp_on_glorys_depth ARGO in-situ temperature interpolated to GLORYS depth levels. profile x glorys_depth float32
argo_temp_valid_on_glorys_depth True where interpolated ARGO temperature is valid. profile x glorys_depth bool
argo_temp_qc_on_glorys_depth ARGO temperature QC code on GLORYS depth levels. profile x glorys_depth int8
argo_potm_on_glorys_depth ARGO potential temperature interpolated to GLORYS depth levels. profile x glorys_depth float32
argo_potm_valid_on_glorys_depth True where interpolated ARGO potential temperature is valid. profile x glorys_depth bool
argo_potm_qc_on_glorys_depth ARGO potential-temperature QC code on GLORYS depth levels. profile x glorys_depth int8
argo_psal_on_glorys_depth ARGO practical salinity interpolated to GLORYS depth levels. profile x glorys_depth float32
argo_psal_valid_on_glorys_depth True where interpolated ARGO salinity is valid. profile x glorys_depth bool
argo_psal_qc_on_glorys_depth ARGO salinity QC code on GLORYS depth levels. profile x glorys_depth int8
argo_depth_qc_on_glorys_depth ARGO depth QC code projected to GLORYS depth levels. profile x glorys_depth int8
argo_juld_qc Whole-profile date/time QC code. profile int8
argo_position_qc Whole-profile position QC code. profile int8
argo_profile_depth_qc Whole-profile depth QC code. profile int8
argo_profile_potm_qc Whole-profile potential-temperature QC code. profile int8
argo_profile_psal_qc Whole-profile salinity QC code. profile int8
glorys_thetao GLORYS sea-water potential temperature sampled at the profile point. profile x glorys_depth float32
glorys_so GLORYS salinity sampled at the profile point. profile x glorys_depth float32
glorys_uo GLORYS eastward sea-water velocity sampled at the profile point. profile x glorys_depth float32
glorys_vo GLORYS northward sea-water velocity sampled at the profile point. profile x glorys_depth float32
glorys_zos GLORYS sea-surface height sampled at the profile point. profile float32
glorys_mlotst GLORYS mixed-layer thickness sampled at the profile point. profile float32
glorys_bottomT GLORYS sea-floor potential temperature sampled at the profile point. profile float32
glorys_sithick GLORYS sea-ice thickness sampled at the profile point. profile float32
glorys_siconc GLORYS sea-ice area fraction sampled at the profile point. profile float32
glorys_usi GLORYS eastward sea-ice velocity sampled at the profile point. profile float32
glorys_vsi GLORYS northward sea-ice velocity sampled at the profile point. profile float32
glorys_temporal_status Status of GLORYS temporal matching. profile int8
ostia_analysed_sst OSTIA analysed sea-surface temperature sampled at the profile point. profile float32
ostia_analysis_error OSTIA SST analysis error sampled at the profile point. profile float32
ostia_sea_ice_fraction OSTIA sea-ice fraction sampled at the profile point. profile float32
ostia_mask OSTIA categorical mask sampled at the profile point. profile float32
ostia_temporal_status Status of OSTIA temporal matching. profile int8
sealevel_sla Sea-level anomaly sampled at the profile point. profile float32
sealevel_err_sla Sea-level anomaly formal mapping error. profile float32
sealevel_adt Absolute dynamic topography sampled at the profile point. profile float32
sealevel_ugosa Eastward geostrophic velocity anomaly. profile float32
sealevel_err_ugosa Formal mapping error for eastward velocity anomaly. profile float32
sealevel_vgosa Northward geostrophic velocity anomaly. profile float32
sealevel_err_vgosa Formal mapping error for northward velocity anomaly. profile float32
sealevel_ugos Absolute eastward geostrophic velocity. profile float32
sealevel_vgos Absolute northward geostrophic velocity. profile float32
sealevel_flag_ice Sea-level product ice flag. profile float32
sealevel_tpa_correction TOPEX-A instrumental drift correction field. profile float32
sealevel_temporal_status Status of sea-level temporal matching. profile int8
sss_sos SSS analysed sea-surface salinity sampled at the profile point. profile float32
sss_dos SSS analysed sea-surface density sampled at the profile point. profile float32
sss_sea_ice_fraction SSS sea-ice fraction sampled at the profile point. profile float32
sss_temporal_status Status of SSS temporal matching. profile int8

The raw SSS error variables sos_error and dos_error are intentionally not exported to the enriched ARGO Zarr or the Hugging Face package. The SSS component in the export product is sss_sos, sss_dos, sss_sea_ice_fraction, and sss_temporal_status. Temporal status fields describe how the exporter chose the source file for each profile date; they do not describe spatial missingness, land masks, ice masks, or source QC flags.

Code Meaning Interpretation
0 nearest_or_exact The exporter found an exact source date, or the target date was inside the source time range and the nearest available file was used. This is the normal status.
1 nearest_edge The target date was outside the available source time range, so the exporter used the first or last available source file. Treat this as an edge extrapolation warning.
2 missing No usable source file existed for that modality, so the sampled values for that source were written as missing values.

2. Hugging Face Aligned ARGO Package

Path:

/data1/datasets/depth_v2/aligned_argo/hf_argo_glors_ostia_ssh

The Hugging Face package is a documentation and indexing layer around the same enriched ARGO zarr schema listed above. Its main data store is:

data/argo_glors_ostia_ssh.zarr

Package files:

Path Role
README.md Dataset card and quick-start notes.
LICENSE Dataset packaging license notice.
data/argo_glors_ostia_ssh.zarr/ Unchanged enriched ARGO zarr, including GLORYS, OSTIA, sea-level, and SSS profile-context variables.
indices/profiles.parquet One row per profile with scalar metadata, coordinates, dates, and valid-depth counts.
indices/variables.parquet Variable-level metadata generated from the zarr.
examples/ Minimal xarray and Parquet loading examples.
metadata/ Dataset description, citation, and STAC item metadata.

The GeoTIFF exporter can use the packaged zarr directly as --enriched-argo-zarr /data1/datasets/depth_v2/aligned_argo/hf_argo_glors_ostia_ssh/data/argo_glors_ostia_ssh.zarr.

3. Saved GeoTIFF ML Dataset

Path:

/work/data/OceanVariableReconstruction

This is the model-ready dataset. It stores dense fields as aligned GeoTIFF rasters and uses a compact, grid-indexed ARGO Zarr store for the patch dataloader. The active config is src/depth_recon/configs/px_space/training_super_config.yaml.

The GeoTIFF export manifest was created at 2026-05-12T11:28:47+00:00 and covers weekly target dates from 20100101 to 20240726.

Summary

Item Value
Target dates (weeks) 761
Grid size 3600 x 1800 pixels
Grid resolution 0.1 degrees
CRS EPSG:4326
GLORYS depth levels 50
Compact ARGO profiles 6,608,517
Compact ARGO profiles with valid temperature 6,608,321
Compact ARGO profiles with valid salinity 5,393,686
Valid compact ARGO temperature depth points 156,290,112
Valid compact ARGO salinity depth points 133,218,721

Compact ARGO Profile Store

Path:

/work/data/OceanVariableReconstruction/argo/argo_profiles_on_grid.zarr

This store is smaller than the enriched ARGO export because it keeps only the profile fields needed by the GeoTIFF patch loader.

Dimension Meaning Count
profile ARGO profile assigned to the GeoTIFF grid. 6,608,517
glorys_depth GLORYS depth level. 50
Variable Explanation Shape Type
argo_temp_kelvin_uint8 ARGO temperature values, quantized in Kelvin. profile x glorys_depth uint8
argo_temp_valid True where the temperature value is valid. profile x glorys_depth bool
argo_psal_uint8 ARGO practical salinity values, quantized. profile x glorys_depth uint8
argo_psal_valid True where the salinity value is valid. profile x glorys_depth bool
profile_date Date of the original ARGO profile observation. profile int32
target_date Weekly GLORYS target date assigned to the profile. profile int32
latitude Profile latitude in degrees north. profile float32
longitude Profile longitude in degrees east. profile float32
grid_row Row index of the nearest GeoTIFF grid cell. profile int32
grid_col Column index of the nearest GeoTIFF grid cell. profile int32
profile_source_file Source EN4/ARGO file name. profile <U33
source_profile_idx Profile index inside the source file. profile int32
Field First Last Unique target dates
ARGO profile dates 20100101 20240729 -
Assigned target dates 20100101 20240726 736

GeoTIFF Rasters

All raster files share the same grid: 3600 x 1800 pixels, EPSG:4326, stored as uint8.

Modality Variable Files Bands per file First date Last date
GLORYS thetao 761 50 20100101 20240726
GLORYS so 761 50 20100101 20240726
OSTIA analysed_sst 761 1 20100101 20240726
Sea level adt 761 1 20100101 20240726
SSS sos 761 expected after full re-export 1 20100101 20240726
SSS dos 761 expected after full re-export 1 20100101 20240726

Patch Dataset

The active GeoTIFF loader uses 128 x 128 pixel patches with a 32-pixel stride. At 0.1 degrees per pixel, each patch is 12.8 x 12.8 degrees and neighboring patch starts are 3.2 degrees apart.

Split Rows Unique spatial patches Dates Rows with ARGO
all 2,699,267 3,547 761 2,384,535
train 2,214,599 3,544 684 2,214,599
val 169,936 3,391 52 169,936

The configured split uses 2018 as validation year. Training and validation rows require at least one valid ARGO temperature profile; the all split keeps patch/date rows even when no ARGO profile is present.

ARGO profile support per patch/date row

Each value is the number of ARGO profiles inside one spatial patch for one weekly date. all has a minimum of 0 because it keeps patch/date rows without ARGO support; train and val require at least one valid ARGO temperature profile, so their minimum is 1.

Split Min Median Mean Max
all 0 13 25.32 8,064
train 1 15 28.24 8,064
val 1 16 34.19 4,580

Land fraction per selected patch/date row:

Split Min Median Mean Max
all 0.0000 0.0001 0.0469 0.5994
train 0.0000 0.0001 0.0420 0.5994
val 0.0000 0.0001 0.0430 0.5994

Overlap

Coverage multiplicity is the number of selected spatial patches covering a grid pixel before the date dimension is applied.

Statistic Value
Patch size 128 px / 12.8 degrees
Patch stride 32 px / 3.2 degrees
Nominal overlap per axis 96 px / 9.6 degrees
Nominal overlap fraction per axis 75%
Selected spatial patches 3,547
Covered grid pixels 4,883,456
Coverage multiplicity min 1
Coverage multiplicity median 15
Coverage multiplicity mean 11.90
Coverage multiplicity max 20