# Tests
This page documents the repository's test coverage of diffusion math, dataset behavior, config wiring, and one-batch model smoke runs.
## How To Run

Run the full local test suite with the repository Python environment:

```shell
/work/envs/depth/bin/python -m unittest discover -s tests -p 'test_*.py' -v
```

Run one test file:

```shell
/work/envs/depth/bin/python -m unittest tests.test_diffusion_math -v
```

Run one specific test:

```shell
/work/envs/depth/bin/python -m unittest tests.test_model_dry_runs.TestModelDryRuns.test_pixel_diffusion_completes_one_training_batch -v
```
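Any new test file is picked up automatically as long as it matches the `tests/test_*.py` discovery pattern used above and defines `unittest.TestCase` subclasses. A minimal, hypothetical example (not a file from this repository), run here programmatically the way `python -m unittest` would run it:

```python
import unittest

# Hypothetical minimal test module; any tests/test_*.py file with
# unittest.TestCase subclasses is found by the discover command above.
class TestExample(unittest.TestCase):
    def test_small_integer_addition(self):
        self.assertEqual(1 + 1, 2)

# Equivalent of what `python -m unittest` does for this one class:
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestExample)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```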
## Test Files

### `tests/test_diffusion_math.py`
This file checks the core math and masking behavior in the diffusion stack.
- `test_beta_schedule_variants_match_their_implementations`: Confirms that `linear`, `quadratic`, `sigmoid`, and `cosine` schedule selection returns the expected beta tensors. Why it matters: schedule mistakes silently change the entire forward/reverse diffusion process and can invalidate training comparisons.
- `test_beta_schedule_rejects_invalid_ranges_and_unknown_variants`: Verifies that invalid beta ranges and unknown schedule names fail loudly. Why it matters: bad config values should fail early instead of producing unstable training.
- `test_build_valid_mask_aligns_channels_and_inverts_missing_mode`: Checks mask broadcasting and the `observed` vs `missing` inversion logic. Why it matters: supervision masks are used in multiple places, and shape or polarity errors would corrupt both training and inference behavior.
- `test_build_ambient_further_valid_mask_keeps_subset_and_respects_minimum_pixels`: Verifies that ambient further-corruption only removes observed pixels and still keeps a minimum amount of supervision. Why it matters: the ambient objective becomes degenerate if the further mask can remove everything.
- `test_build_task_supervision_mask_switches_between_standard_and_ambient_targets`: Confirms that standard mode supervises `y_valid_mask`, while ambient mode supervises `x_valid_mask ∩ y_valid_mask`. Why it matters: this is the central difference between the normal and ambient objectives.
- `test_p_loss_averages_only_over_supervised_ocean_pixels`: Checks the masked MSE calculation, including the land/ocean gate. Why it matters: this verifies that the loss is reduced only over the intended pixels.
- `test_p_loss_returns_zero_when_mask_selects_nothing`: Verifies the guarded zero-loss path when the supervision mask is empty. Why it matters: batches with no valid supervised support should not produce NaNs or crash.
- `test_p_loss_applies_ambient_further_mask_to_the_noisy_branch`: Confirms that the optional ambient noisy-branch corruption is applied to the denoiser input. Why it matters: this is an objective-defining behavior, not just a logging detail.
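The supervision-mask and masked-loss behaviors above can be sketched in a few lines. This is an illustrative pure-Python reconstruction, not the repository's actual `build_task_supervision_mask` / `p_loss` code; the real implementation operates on tensors and its signatures differ:

```python
def build_task_supervision_mask(mode, x_valid, y_valid):
    """Standard mode supervises y_valid; ambient mode supervises the
    intersection x_valid AND y_valid. Flat 0/1 lists stand in for masks."""
    if mode == "ambient":
        return [xv * yv for xv, yv in zip(x_valid, y_valid)]
    return list(y_valid)

def p_loss(pred, target, supervise_mask, land_mask):
    """Masked MSE averaged only over supervised ocean pixels, with the
    guarded zero-loss path for an empty supervision mask."""
    weights = [s * l for s, l in zip(supervise_mask, land_mask)]
    denom = sum(weights)
    if denom == 0:
        return 0.0  # no supervised support: return zero instead of NaN
    return sum(w * (p - t) ** 2
               for w, p, t in zip(weights, pred, target)) / denom
```

Note how the land mask gates the average: a pixel contributes only when it is both supervised and ocean, which is exactly what the `p_loss` tests above pin down.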
### `tests/test_datasets_and_wiring.py`
This file covers fake-data dataset loading, datamodule behavior, and config plumbing.
- `test_surface_temp_4band_dataset_builds_masks_coords_and_dates`: Loads a minimal fake `.npy` sample and checks `x`, `y`, `x_valid_mask`, `y_valid_mask`, `land_mask`, `coords`, and the parsed date. Why it matters: it validates the base light-dataset contract that training depends on.
- `test_surface_temp_4band_dataset_can_hide_every_x_pixel`: Uses maximal corruption and checks that all `x` support can be hidden while `y` remains valid. Why it matters: sparse-input training should still behave correctly at extreme corruption levels.
- `test_surface_temp_ostia_dataset_resamples_eo_and_zeroes_land_pixels`: Verifies OSTIA EO resizing and land masking. Why it matters: EO conditioning must match the target resolution and must not leak values onto invalid land regions.
- `test_ostia_argo_tiff_dataset_synthetic_mode_rebuilds_sparse_x`: Builds fake OSTIA/Argo/GLORYS GeoTIFFs and checks synthetic sparse-input generation from GLORYS support. Why it matters: this is the fake-data path that allows testing and training smoke runs on machines without the real local dataset.
- `test_datamodule_split_is_deterministic_and_loader_settings_are_applied`: Checks deterministic train/val splitting and dataloader options such as batch size and shuffle behavior. Why it matters: reproducibility and config wiring are both easy to break here.
- `test_config_override_helpers_and_dataset_builder_use_nested_settings`: Verifies CLI-style nested config overrides and that `build_dataset()` applies nested dataset settings correctly. Why it matters: this protects the training entrypoint from silently ignoring user configuration.
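The override mechanics that the last test protects can be sketched as a dotted-key walk over a nested config dict. This is a generic illustration under assumed semantics; the repository's actual helper names, config keys, and parsing rules may differ:

```python
def apply_override(cfg: dict, dotted_key: str, value):
    """Apply a CLI-style nested override such as dataset.loader.batch_size=8,
    creating intermediate dicts when a level is missing."""
    *parents, leaf = dotted_key.split(".")
    node = cfg
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value
    return cfg

# Hypothetical config keys, for illustration only.
cfg = {"dataset": {"loader": {"batch_size": 4, "shuffle": True}}}
apply_override(cfg, "dataset.loader.batch_size", 8)
apply_override(cfg, "model.sampler", "ddim")  # creates the missing level
```

The point the test guards is the round trip: an override must land in the same nested location that the dataset/model builders later read from, otherwise it is silently ignored.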
### `tests/test_model_dry_runs.py`
This file exercises one-batch model execution and config-based instantiation.
- `test_pixel_diffusion_from_config_wires_nested_settings`: Instantiates the pixel diffusion model from YAML files and checks scheduler, ambient, sampler, blur, and noise settings. Why it matters: config wiring bugs can otherwise look like model-performance issues.
- `test_pixel_training_step_uses_standard_target_and_passes_land_mask`: Confirms that the standard training path supervises the correct target and forwards `land_mask` into `p_loss`. Why it matters: this protects the masked-loss implementation and the standard objective.
- `test_pixel_validation_step_uses_ambient_target_and_intersection_mask`: Confirms that ambient validation uses the `x` target, the correct supervision mask, and the further-valid mask. Why it matters: ambient mode has different semantics and needs separate protection.
- `test_pixel_diffusion_completes_one_training_batch`: Runs a real Lightning `fit()` call for one batch with synthetic data. Why it matters: this is the practical smoke test that catches integration failures across the datamodule, model, optimizer, and validation hooks.
- `test_latent_diffusion_completes_one_training_batch`: Runs one synthetic latent-diffusion training batch. Why it matters: latent mode has different mask/channel behavior and must remain runnable end to end.
- `test_autoencoder_lightning_completes_one_training_batch`: Runs one autoencoder training batch. Why it matters: latent diffusion depends on this component behaving correctly.
- `test_autoencoder_from_configs_wires_loss_and_scheduler_settings`: Checks that AE loss weights and scheduler options are loaded from config files. Why it matters: AE training should be reproducible from configs, just like the diffusion model.
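All the one-batch smoke tests follow the same shape: build tiny synthetic data, run a single optimization step end to end, and assert the loss is finite and moves. The toy loop below illustrates that pattern without torch or Lightning (the real tests call Lightning's `fit()` on the actual models); everything here is a stand-in:

```python
import math
import random

def one_batch_smoke_run():
    """Toy stand-in for a one-batch training smoke test: fit a linear model
    to synthetic data for exactly one manual SGD step and check the loss."""
    random.seed(0)
    w, b = 0.0, 0.0
    xs = [random.random() for _ in range(8)]      # tiny synthetic batch
    ys = [2.0 * x + 1.0 for x in xs]              # known target function

    # forward pass and initial MSE
    preds = [w * x + b for x in xs]
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

    # one manual SGD step (gradients of the MSE w.r.t. w and b)
    gw = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    gb = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / len(xs)
    lr = 0.1
    w, b = w - lr * gw, b - lr * gb

    # second forward pass: the smoke-test assertion is "finite and improved"
    preds = [w * x + b for x in xs]
    new_loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    assert math.isfinite(new_loss), "loss went NaN/inf"
    return loss, new_loss
```

The value of this test style is integration coverage, not accuracy: if any piece of the data-to-optimizer pipeline is miswired, the single step fails loudly.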
### `tests/test_dataset_ostia_argo_save_to_disk.py`
This file covers the older OSTIA/Argo export path.
- Alignment tests check interpolation onto GLORYS depth levels, duplicate-depth collapse, and the shallow-depth floor behavior. Why it matters: depth alignment is fundamental to producing consistent full-stack targets.
- `test_getitem_returns_glorys_aligned_channel_shapes`: Verifies that exported samples use full GLORYS-aligned band layouts and expose the expected info metadata. Why it matters: downstream disk export assumes strict shape alignment.
- Save/export tests cover skipping existing exports, detecting partial exports, overwriting exports, and GeoTIFF metadata writing. Why it matters: the disk export path is operational infrastructure, and metadata regressions would break later dataset consumption.
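The skip/partial/overwrite decision those save tests exercise can be sketched as a small predicate over the expected output files. The function and parameter names here are hypothetical, not the repository's API:

```python
from pathlib import Path

def should_export(out_dir: Path, expected_files, overwrite: bool) -> bool:
    """Decide whether to (re)write an export directory: skip only when every
    expected file already exists on disk and overwrite is off."""
    present = sum((out_dir / name).exists() for name in expected_files)
    if overwrite:
        return True               # explicit overwrite always rewrites
    if present == len(expected_files):
        return False              # complete export on disk: skip
    return True                   # missing or partial export: (re)write
```

The partial-export case is the one worth testing hardest: treating a half-written directory as complete silently corrupts downstream dataset consumption.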
## What These Tests Protect
At a high level, the suite is intended to catch four classes of regressions:
- Diffusion math regressions: bad schedules, wrong target selection, incorrect ambient corruption, or masked-loss mistakes.
- Dataset contract regressions: wrong shapes, invalid-mask semantics, broken synthetic sparse-input generation, or bad date/coord parsing.
- Config wiring regressions: settings defined in YAML but not actually applied at runtime.
- End-to-end integration regressions: models that instantiate but fail to complete even one real batch.
## Notes
- The tests use `unittest`, not `pytest`, because the pinned repository environment does not include `pytest`.
- The dry-run tests intentionally use fake data and tiny models so they can run without the production dataset.
- The Lightning smoke tests may print warnings about disabled loggers or low dataloader worker counts. Those warnings are expected in these minimal test runs.