Ensemble read and plot updates by ukmo-huw-lewis · Pull Request #2056 · MetOffice/CSET

ukmo-huw-lewis · 2026-04-15T11:04:49Z

Contribution checklist

Aim to have all relevant checks ticked off before merging. See the developer's guide for more detail.

Documentation has been updated to reflect change.
New code has tests, and affected old tests have been updated.
All tests and CI checks pass.
Ensured the pull request title is descriptive.
Ensure rose-suite.conf.example has been updated if new diagnostic added.
Conda lock files have been updated if dependencies have changed.
Attributed any Generative AI, such as GitHub Copilot, used in this PR.
Marked the PR as ready to review.

ukmo-huw-lewis · 2026-04-15T11:26:44Z

Addresses #1972.

PART 1:
New function _merge_cubes_check_ensemble(cubes) attempts to merge input cubes.
Identifying ensemble inputs takes advantage of current issue that ensemble inputs do not merge, now that we set realization coordinate to 0 in instances (most inputs) where realization is not first set in input file. In this case, we now increment realization coordinate and then attempt to successfully merge cube.

This avoids the previous error iris.exceptions.DuplicateDataError: failed to merge into a single cube. as now realization coord is varying across an otherwise common cubelist.

New tests have been added to demonstrate expected behaviour.

PART 2:
Have also used this issue/PR to introduce some additional generalisations to ensemble plotting. This is anticipating example use-case in #2032 (i.e. where input stamp_coordinate may not always be "realization", and where number of ensembles to plot is not always a square grid).

Example outputs:

surface_spatial_plot with ensemble inputs (note here titles pick up "Realization" coord name rather than "Member" default)

cset --verbose bake -r ../src/CSET/recipes/surface_fields/generic_surface_spatial_plot_sequence.yaml --input-dir "${input_dir}" --output-dir ${output_dir} --VARNAME=${VARNAME} --MODEL_NAME="${model_name}" --METHOD="MEAN" --SUBAREA_TYPE=${SUBAREA_TYPE} --SUBAREA_EXTENT=${SUBAREA_EXTENT}

spatial_difference with 2 sets of ensemble inputs. Note some precision needed on how to read in 2 sets of ensembles using wildcards, to read in 2 sets of * inputs rather than risk seeing data as Nensemble x Nmodel path set.

input_dir=<path_to_data>/LFRic/20240121T0000Z*em*diagnostics_000_006.nc
input_dir2=<path_to_data>/UM/20240121T0000Z*em*pdiaga000.pp
cset --verbose bake -r ../src/CSET/recipes/surface_fields/surface_spatial_difference.yaml --input-dir "${input_dir}" "${input_dir2}" --output-dir ${output_dir} --VARNAME=${VARNAME} --BASE_MODEL="${model_name}" --OTHER_MODEL="${model_name2}" --METHOD="" --SUBAREA_TYPE=${SUBAREA_TYPE} --SUBAREA_EXTENT=${SUBAREA_EXTENT}

time_series

cset --verbose bake -r ../src/CSET/recipes/surface_fields/generic_surface_domain_mean_time_series.yaml --input-dir "${input_dir}" "${input_dir2}" --output-dir ${output_dir} --VARNAME=${VARNAME} --MODEL_NAME="['${model_name}', '${model_name2}']" --METHOD="MEAN"

histogram

cset --verbose bake -r ../src/CSET/recipes/surface_fields/generic_surface_histogram_series.yaml --input-dir "${input_dir}" --output-dir ${output_dir} --VARNAME=${VARNAME} --MODEL_NAME="${model_name}" --METHOD="MEAN" --SUBAREA_TYPE=${SUBAREA_TYPE} --SUBAREA_EXTENT=${SUBAREA_EXTENT} --SEQUENCE realization

``` cset --verbose bake -r ../src/CSET/recipes/surface_fields/generic_surface_histogram_series.yaml --input-dir "${input_dir}" --output-dir ${output_dir} --VARNAME=${VARNAME} --MODEL_NAME="${model_name}" --METHOD="MEAN" --SUBAREA_TYPE=${SUBAREA_TYPE} --SUBAREA_EXTENT=${SUBAREA_EXTENT} --SEQUENCE time ```

ukmo-huw-lewis · 2026-04-15T11:47:53Z

Marking as ready for review.

github-actions · 2026-04-15T12:13:58Z

Stephen Gallagher (SGallagherMet) · 2026-04-15T13:05:31Z

+        cubes = cubes.merge()
+    except iris.exceptions.MergeError:
+        for ir, cube in enumerate(cubes):
+            cube.coord("realization").points = ir + 1


This is a very blunt fallback mechanism. It renumbers ensembles without any warning which could make tracing results back to a specific model run confusing.
I don't think we should be arbitrarily renumbering ensemble members at all, but if it's unavoidable we should at least log a warning, I'd also be tempted to say that if CSET is running inside a workflow we should even generate a cylc message so it shows up in the gui.

Thanks.

Agree on room for additional output messaging here and will action.

Also agree this is a blunt fallback, but it does offer a bug fix. Note in previous iterations when ensemble support was working, we relied on em?? identifiers in filename (I think) to help determine ensemble numbers, so arguably just as blunt.

For instances where input files do already have a realization coord, note there is no impact of this change (see testing), so this is aiming to catch instances where ensemble outputs do not have a realization coord defined in inputs (turns out to be quite a lot of our use-cases though!).

Not sure if this helps you, but would currently argue this seems like a practical response to preserve functionality when members not identifiable by other ways.

I think I'm still a little confused about what case this is handling. Can you provide a concrete example of how multiple cubes with the same ensemble member have been generated?
The only way I know of is with time-lagging where you end up with multiple control members, in which case I would argue that only the problem cubes should be modified.

See original issue log #1972 - currently ensemble loading not working, and the cube.merge() call this replaces returns error
iris.exceptions.DuplicateDataError: failed to merge into a single cube.

Tests highlight variety of cases:
a) test_read_cubes_ensemble_with_realization_coord(): input example file contains all ensemble members in output file, so iris.load already gives cube with e.g. dims [realization, time, lat, lon]. This change has no impact.
b) test_read_cubes_ensemble_separate_files: reads ensemble members from separate files, but these have different realization coords within the file, so iris.load returns a Cubelist with different realization scalar coords, which can then merge nicely (adds N-dimension realization coord to the merged cube). This change has no impact.
c) test_read_cubes_ensemble_without_realization_coord: here I took copies of the test ensemble files and removed realization information from the file. This is equivalent to the examples which led to the #1972 errors being flagged. Here there is no information on ensemble member numbers, so this fallback enables the ensemble inputs to be processed.

Tests test_read_cubes_merge_cubes_* run through some equivalent testing but at level of direct calls to this function rather than higher-level call to read.read_cubes.

Thanks for time on this.

Thanks, that helps. in that case your test for c) has actually identified a different issue. The current behaviour of assigning member 0 to any cubes that don't have realization information is insufficient. Personally I think that means extracting that information from the file name should be reconsidered.

For this PR, I would recommend a stricter check. Perhaps only apply this renumbering if all of the realization numbers are 0?

My experience of this in testing was that the assigning member 0 is effectively an initialisation callback, which creates a realization coord where one does not exist. This is helpful/necessary given much of the subsequent code assumes a valid realization code.

Proposing we update doc strings for that callback to indicate this is about creating and initialising the coord where not existing rather than specifically a "deterministic" thing.

We can't depend on that callback to do 'clever' things with ensemble numbering (to best of my bug-fixing anyhow) as the callback is applied to different instances from input files separately (i.e. different cube elements of the CubeList), so there is no awareness of extent of an ensemble input within that callback. For this reason we rely on two-stage a) callback on loading cube to ensure coord exists, b) checks in merge to address all zero once we come to merging loaded cubes/CubeList.

Have introduced slightly stricter check to only update where realization==0 (i.e. still at initialised value).
Suggest can look at filename-related numbering 'sometime' (i.e. in separate issue/PR), but equally think we need to support a flexibility of potential filenaming ecosystems with CSET, so makes sense to use the merge pass/fail to discriminate type of inputs.

Stephen Gallagher (SGallagherMet) · 2026-04-15T13:13:59Z

+    if len(stamp_coords) == 1:
+        stamp_coordinate = stamp_coords[0]
+    else:
+        stamp_coordinate = "realization"


Log a warning here.

Also, this seems like a bit of an odd fallback, what if the available coordinates are "member" and "pseudo_level"? I feel like this function should either raise an error if it can't find a unique stamp coordinate, or it should return a tuple of available stamp coordinates so routines that call it can decide whether they know how to handle multiple stamp coordinates or not.

Thanks for time on this, appreciated.

Happy to add log message.

For context note the current approach here is that we have stamp_coordinate="realization" being set in calls to functions as a default. In principle, users can set stamp_coordinate as a recipe-level input, but none of the generic recipes currently have this as set variable, and to me feels like more 'flexibility' than practically useful. So this change effectively introduces the ability to support something other than realization with existing recipe approach.

In case where len(stamp_coords) > 1, then we stick to effectively current default that stamp_coordinate=="realization" (line 362), preserving the current default behaviour I think.

Logging statements now added.

ensemble only if still initialised to zero.

David Flack (daflack)

Overall, from my "science review" perspective, it looks good and makes sense, and I have yet to find an ensemble that doesn't work. It's been working nicely with data downloaded from MASS that was failing before (3 members including the control).

The only things to bring up are:

(might be a different PR) the missing labels on the axes for the lat/lon lines
Member vs. Realization in plot titles: I appreciate that realization is the cf compliant name, but thinking of publication plots (when included) it is usually member as the title. It would be great to be cf compliant - but I think we should probably stick with member as that has somewhat become the norm.

ukmo-huw-lewis added 2 commits April 15, 2026 10:43

Update read.py to interpret ensemble inputs

ffda573

Updates for ensemble plotting and update tests

acab17d

ukmo-huw-lewis linked an issue Apr 15, 2026 that may be closed by this pull request

Ensemble loading not recognising member 0 #1972

Open

ukmo-huw-lewis added 2 commits April 15, 2026 12:33

Update function definition

56a6eb8

Fix missing grid_size, grid_rows instance for power spectra

38241bc

ukmo-huw-lewis requested review from Stephen Gallagher (SGallagherMet), David Flack (daflack) and James Frost (jfrost-mo) April 15, 2026 11:48

Attempt to fix test

6f637f1

ukmo-huw-lewis added bug Something isn't working enhancement New feature or request labels Apr 15, 2026

Stephen Gallagher (SGallagherMet) reviewed Apr 15, 2026

View reviewed changes

ukmo-huw-lewis and others added 4 commits April 15, 2026 14:19

Remove print typo

f58c777

Update documentation for intitialising realization cube and update

cd18115

ensemble only if still initialised to zero.

Additional logging info

1e6eac9

Merge branch 'main' into 1972-ensemble-loading-not-recognising-member-0

3f5c5e7

David Flack (daflack) reviewed Apr 16, 2026

View reviewed changes

Conversation

ukmo-huw-lewis commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Contribution checklist

Uh oh!

ukmo-huw-lewis commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ukmo-huw-lewis commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ukmo-huw-lewis Apr 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

David Flack (daflack) left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ukmo-huw-lewis commented Apr 15, 2026 •

edited

Loading

ukmo-huw-lewis commented Apr 15, 2026 •

edited

Loading

ukmo-huw-lewis commented Apr 15, 2026 •

edited

Loading

github-actions bot commented Apr 15, 2026 •

edited

Loading

ukmo-huw-lewis Apr 15, 2026 •

edited

Loading