feat: Copy/archive input XML files into the output directory #4003

Draft
kdrienCG wants to merge 12 commits into GEOS-DEV:develop from kdrienCG:feature/kdrienCG/archiveInputDeck

@kdrienCG kdrienCG commented Mar 20, 2026

This PR proposes to automatically archive the XML input deck to the output directory, so that every set of results is accompanied by the exact input that produced it.

When GEOS is run with an output directory specified (-o <dir>), all XML input files (including every file pulled in via the <Included> XML tag) are copied into a timestamped folder inside the output directory.


Given a run like:

geosx -i foo.xml -o OUTPUT/

The following is created:

.
└─ OUTPUT/   
   └─ archive_inputFiles/
      └─ 20260320_103034/
         ├─ foo.xml
         │
         ├─ include1.xml
         │
         └─ some_subdir/
            └─ include2.xml
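
The timestamped folder name in the tree above appears to follow a YYYYMMDD_HHMMSS pattern. As a minimal sketch of how such a name could be produced (the helper names are hypothetical, and the use of local time rather than UTC is an assumption, not something stated in this PR):

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: formats a broken-down time as YYYYMMDD_HHMMSS,
// matching the shape of the "20260320_103034" folder name.
std::string formatArchiveTimestamp( std::tm const & t )
{
  char buffer[ 16 ]; // 15 characters plus the terminating null
  std::size_t const n = std::strftime( buffer, sizeof( buffer ), "%Y%m%d_%H%M%S", &t );
  return std::string( buffer, n );
}

// Hypothetical helper: timestamp for the current time.
// Assumption: local time is used; std::localtime is not thread-safe.
std::string currentArchiveTimestamp()
{
  std::time_t const now = std::time( nullptr );
  std::tm const * const local = std::localtime( &now );
  return local ? formatArchiveTimestamp( *local ) : std::string();
}
```

Because the name sorts lexicographically in chronological order, successive runs into the same output directory would list their archives oldest to newest.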

When included files are "behind" the input tree of the main XML (that is, in a parent directory of foo.xml), the following structure is proposed:

.
└─ OUTPUT/   
   └─ archive_inputFiles/
      └─ 20260320_103034/
         ├─ foo.xml
         │
         ├─ __one_level_behind_include.xml
         │
         ├─ __one_level_behind_folder/
         │  └─ baz.xml
         │
         └─ ____two_level_behind_folder/
            └─ buzz.xml

Files outside the input tree (reached via ../) are prefixed with one __ for each ../ needed to reach them from foo.xml.
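
To make the prefixing rule concrete, here is a minimal sketch of the mapping from an include path (relative to the directory of foo.xml) to its location inside the archive. The function name toArchivePath is hypothetical and this is not the PR's prefixBackwardPath(); it only reproduces the layout shown in the trees above.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: maps an include path, given relative to the main XML's
// directory, to its location inside the archive folder. Each leading ".."
// component becomes a "__" prefix on the first remaining path component.
fs::path toArchivePath( fs::path const & relativeInclude )
{
  std::string prefix;
  fs::path result;
  bool leading = true;
  for( fs::path const & part : relativeInclude.lexically_normal() )
  {
    if( leading && part.string() == ".." )
    {
      prefix += "__"; // one "__" per level above the main XML
      continue;
    }
    if( leading )
    {
      result = prefix + part.string(); // first real component carries the prefix
      leading = false;
    }
    else
    {
      result /= part;
    }
  }
  return result;
}
```

With this sketch, ../../two_level_behind_folder/buzz.xml maps to ____two_level_behind_folder/buzz.xml, while some_subdir/include2.xml is left unchanged.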

Rename the archive output directory to "archive_inputFiles" instead of "inputFiles".
This prevents the archived XML files from unintentionally overwriting the standard "inputFiles" directory in GEOS/ when running with `-o .` from the GEOS/ location.
@jafranc added the labels type: feature (New feature or request), type: new (A new issue has been created and requires attention), and flag: no rebaseline (Does not require rebaseline) on Mar 24, 2026
@MelReyCG removed the flag: no rebaseline label on Mar 24, 2026
@rrsettgast requested a review from Copilot on March 24, 2026 13:19
Copilot AI left a comment

Pull request overview

This PR adds an automatic “input deck archiving” step so that runs writing to an output directory also copy the main XML input and all <Included> XML files into a timestamped archive folder under the output directory.

Changes:

  • Add archiveInputDeck::archiveInputDeck() to copy input XML(s) + recursively included XMLs into archive_inputFiles/<timestamp>/....
  • Hook archiving into ProblemManager::parseCommandLineInput() so it runs at startup.
  • Add new xmlWrapper::{collectIncluded, collectIncludedRecursive} helpers to enumerate included XML files.
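
As an illustration of what recursive include collection involves, the sketch below walks <File name="..."/> entries over an in-memory map of file contents. Everything here is hypothetical: FileMap and collectIncludes are illustrative names, the regex is a simplification that does not check that <File> sits inside <Included>, and the actual xmlWrapper helpers in this PR may work quite differently.

```cpp
#include <filesystem>
#include <map>
#include <regex>
#include <set>
#include <string>

namespace fs = std::filesystem;

// Hypothetical stand-in for the file system: normalized path -> XML text.
using FileMap = std::map< std::string, std::string >;

// Hypothetical sketch: gathers every known file reachable through
// <File name="..."/> entries, resolving each name relative to the including file.
void collectIncludes( FileMap const & files, std::string const & current, std::set< std::string > & visited )
{
  auto const it = files.find( current );
  if( it == files.end() || !visited.insert( current ).second )
  {
    return; // unknown file, or already visited (guards against circular includes)
  }
  static std::regex const fileTag( R"rx(<File\s+name="([^"]+)")rx" );
  std::string const & text = it->second;
  for( std::sregex_iterator m( text.begin(), text.end(), fileTag ), end; m != end; ++m )
  {
    // Resolve the include relative to the directory of the including file.
    fs::path const resolved = ( fs::path( current ).parent_path() / ( *m )[ 1 ].str() ).lexically_normal();
    collectIncludes( files, resolved.generic_string(), visited );
  }
}
```

The visited set doubles as the result and as the cycle guard, so a deck whose includes refer back to the main file still terminates.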

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/coreComponents/mainInterface/ProblemManager.cpp | Calls the new archiving routine during command-line parsing. |
| src/coreComponents/fileIO/Outputs/ArchiveInputDeck.hpp | Declares the new input deck archiving API. |
| src/coreComponents/fileIO/Outputs/ArchiveInputDeck.cpp | Implements timestamped archiving and path prefixing for ../ includes. |
| src/coreComponents/fileIO/CMakeLists.txt | Adds the new source/header to the fileIO target. |
| src/coreComponents/dataRepository/xmlWrapper.hpp | Declares new include-collection helper APIs. |
| src/coreComponents/dataRepository/xmlWrapper.cpp | Implements include-collection helpers used by archiving. |

@rrsettgast
Member

Do you want to save the exact deck, with the includes and their directory structure? Is that required, or perhaps it would be better to just have a single file that duplicates the run?

There is a feature in pugixml to just save the XML tree to a new file. This will avoid any included XML files and just save a single file. This is the way I would do it.

Also, I didn't see any included table files for properties or functions being copied, but maybe I missed it?

@kdrienCG
Author

> Do you want to save the exact deck, with the includes and their directory structure? Is that required, or perhaps it would be better to just have a single file that duplicates the run?
>
> There is a feature in pugixml to just save the XML tree to a new file. This will avoid any included XML files and just save a single file. This is the way I would do it.
>
> Also, I didn't see any included table files for properties or functions being copied, but maybe I missed it?

I think your solution is simpler. It has the clear advantage of being self-sufficient if someone wants to re-run a simulation from the archived XML file.

However, I see one downside to this solution.
Apart from losing the directory structure (which may not be really important), the XML file will lose all formatting, and potentially all comments if it is parsed without pugi::parse_comments.

For example, the 10x10x10Hex_LaplaceFEM_smoke.xml here:

<?xml version="1.0" ?>

<Problem>

  <Included>
    <File name="./Laplace_base.xml"/>
  </Included>

  <Mesh>
    <InternalMesh
      name="mesh"
      elementTypes="{ C3D8 }"
      xCoords="{ 0, 1 }"
      yCoords="{ 0, 1 }"
      zCoords="{ 0, 1 }"
      nx="{ 10 }"
      ny="{ 10 }"
      nz="{ 10 }"
      cellBlockNames="{ cb1 }"/>
  </Mesh>

  <ElementRegions>
    <CellElementRegion
      name="Domain"
      cellBlocks="{ * }"
      materialList="{ nullModel }"/>
  </ElementRegions>

  <NumericalMethods>
    <FiniteElements>
      <FiniteElementSpace
        name="FE1"
        order="1"/>
    </FiniteElements>
  </NumericalMethods>

</Problem>

<!-- and Laplace_base.xml that has comments -->

will look like this if we use that feature:

<?xml version="1.0"?>
<Problem>
	<Mesh>
		<InternalMesh name="mesh" elementTypes="{ C3D8 }" xCoords="{ 0, 1 }" yCoords="{ 0, 1 }" zCoords="{ 0, 1 }" nx="{ 10 }" ny="{ 10 }" nz="{ 10 }" cellBlockNames="{ cb1 }" />
	</Mesh>
	<ElementRegions>
		<CellElementRegion name="Domain" cellBlocks="{ * }" materialList="{ nullModel }" />
	</ElementRegions>
	<NumericalMethods>
		<FiniteElements>
			<FiniteElementSpace name="FE1" order="1" />
		</FiniteElements>
	</NumericalMethods>
	<Solvers>
		<LaplaceFEM name="laplace" discretization="FE1" timeIntegrationOption="SteadyState" fieldName="Temperature" targetRegions="{ Domain }">
			<LinearSolverParameters directParallel="0" />
		</LaplaceFEM>
	</Solvers>
	<Events maxTime="2.0">
		<PeriodicEvent name="solverApplications" forceDt="1.0" target="/Solvers/laplace" />
		<PeriodicEvent name="outputs" timeFrequency="1.0" target="/Outputs/vtkOutput" />
		<PeriodicEvent name="restarts" timeFrequency="1.0" targetExactTimestep="0" target="/Outputs/restartOutput" />
	</Events>
	<Constitutive>
		<NullModel name="nullModel" />
	</Constitutive>
	<FieldSpecifications>
		<FieldSpecification name="sourceTerm" fieldName="Temperature" objectPath="nodeManager" functionName="DirichletTimeFunction" scale="1.0" setNames="{ source }" />
		<FieldSpecification name="sinkTerm" fieldName="Temperature" objectPath="nodeManager" scale="0.0" setNames="{ sink }" />
	</FieldSpecifications>
	<Outputs>
		<VTK name="vtkOutput" />
		<Restart name="restartOutput" />
	</Outputs>
	<Geometry>
		<Box name="source" xMin="{ -0.01, -0.01, -0.01 }" xMax="{ +0.01, +1.01, +1.01 }" />
		<Box name="sink" xMin="{ +0.99, -0.01, -0.01 }" xMax="{ +1.01, +1.01, +1.01 }" />
	</Geometry>
	<Functions>
		<TableFunction name="DirichletTimeFunction" inputVarNames="{ time }" coordinates="{ 0.0, 1.0, 2.0 }" values="{ 0.0, 3.e2, 4.e3 }" />
	</Functions>
</Problem>

The same downside would apply if I rewrote the XML completely to make my solution runnable.

To avoid this, I think I could make direct, "manual" modifications only to the lines of the archived XML files that contain a path, so that they match the archive structure.
Basically, this means replacing every .. with __ in paths (I'll need to slightly modify the logic of prefixBackwardPath() for this).
For example, the archived XML:

<File name="../foo.xml"/>

Will be modified afterwards to:

<File name="__/foo.xml"/>  <!-- instead of __foo.xml -->

This way I could preserve the directory structure, keep any formatting and comments, and still be able to use an archive for a run.
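
A minimal sketch of that line-level rewrite, assuming the paths of interest are the name attributes of <File> entries (the function name rewriteIncludePaths is hypothetical, and the regex-based matching is a simplification rather than real XML parsing):

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites the name attribute of <File .../> entries so that
// every ".." path component becomes "__", leaving all other text (formatting,
// comments, other attributes) untouched.
std::string rewriteIncludePaths( std::string const & xmlText )
{
  static std::regex const fileTag( R"rx(<File\s+name="([^"]*)")rx" );
  static std::regex const parentDir( R"rx((^|/)\.\.(?=/|$))rx" );
  std::string out;
  auto pos = xmlText.cbegin();
  for( std::sregex_iterator m( xmlText.cbegin(), xmlText.cend(), fileTag ), end; m != end; ++m )
  {
    out.append( pos, ( *m )[ 1 ].first );                              // copy text before the path verbatim
    out += std::regex_replace( ( *m )[ 1 ].str(), parentDir, "$1__" ); // rewrite only ".." components
    pos = ( *m )[ 1 ].second;
  }
  out.append( pos, xmlText.cend() ); // copy the remainder verbatim
  return out;
}
```

Only whole ".." components are replaced, so a file name such as x..y.xml is left alone, and text outside the matched attributes is copied through byte for byte.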

Duplicating the run in a single XML file would be a good fallback if this approach turns out not to be appropriate.

As for the table files, I haven't added them. They will be added.

Labels

type: feature New feature or request type: new A new issue has been created and requires attention

5 participants