Add support for environment context by ebyhr · Pull Request #3441 · apache/iceberg-python

ebyhr · 2026-05-30T00:53:56Z

Rationale for this change

Adds EnvironmentContext utility class to track engine metadata (name and version) and automatically includes it in snapshot summaries. This provides visibility into which engine version created/modified each snapshot.
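For readers without the diff at hand, here is a minimal sketch of what such a utility could look like. It is inferred from the `get`, `put`, and `remove` calls and the `_PROPERTIES` mapping quoted in the review below, so the actual pyiceberg implementation may differ. The helpers `_installed_version` and `add_environment_context` are hypothetical names introduced only for illustration.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def _installed_version(package: str) -> str:
    """Hypothetical helper: installed version of `package`, or "unknown"."""
    try:
        return version(package)
    except PackageNotFoundError:
        return "unknown"


class EnvironmentContext:
    """Process-wide registry of engine metadata (illustrative sketch only)."""

    _PROPERTIES: dict[str, str] = {
        "engine-name": "pyiceberg",
        "engine-version": _installed_version("pyiceberg"),
    }

    @classmethod
    def get(cls) -> dict[str, str]:
        # Return a copy so callers cannot mutate the shared state.
        return dict(cls._PROPERTIES)

    @classmethod
    def put(cls, key: str, value: str) -> None:
        cls._PROPERTIES[key] = value

    @classmethod
    def remove(cls, key: str) -> str | None:
        # Return the removed value, or None if the key was absent.
        return cls._PROPERTIES.pop(key, None)


def add_environment_context(summary: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: merge the context into a snapshot summary."""
    return {**summary, **EnvironmentContext.get()}
```

An engine embedding the library could then call `EnvironmentContext.put("engine-name", ...)` once at startup, and every summary built through the helper would carry that value.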

Are these changes tested?

Yes

Are there any user-facing changes?

No, it's an internal change.

abnobdoss

This is great! This looks very helpful for debugging.

# Rationale for this change

Allow users to create a view with fewer parameters. We can set an environment context (`engine-name` and `engine-version`) in the `summary` field by default once #3441 is merged.

## Are these changes tested?

Yes

## Are there any user-facing changes?

No

kevinjqliu

Thanks for the PR! This will be very useful to users. I have a couple of nits.

kevinjqliu · 2026-06-21T01:04:06Z

    timestamp_ms: int = Field(alias="timestamp-ms", default_factory=lambda: int(time.time() * 1000))
    """Timestamp when the version was created (ms from epoch)"""
-    summary: dict[str, str] = Field(default_factory=dict)
+    summary: dict[str, str] = Field(default_factory=lambda: EnvironmentContext.get())


This will add EnvironmentContext to every ViewVersion, which might not be the desired behavior.

Similar to Table Metadata, can we only add EnvironmentContext in the write path?
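One way to read this suggestion, as a rough sketch rather than the change that was eventually made: keep the field default empty and merge the context only where a new version is written. The example below uses a plain dataclass instead of the Pydantic model from the diff, and `ViewVersionSketch`, `environment_context`, and `new_view_version_for_write` are hypothetical names with placeholder values.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field


def environment_context() -> dict[str, str]:
    # Stand-in for EnvironmentContext.get(); the version is a placeholder.
    return {"engine-name": "pyiceberg", "engine-version": "0.0.0"}


@dataclass
class ViewVersionSketch:
    # Simplified stand-in for ViewVersion: the default summary stays empty,
    # so versions parsed from existing metadata are not silently altered.
    timestamp_ms: int = field(default_factory=lambda: int(time.time() * 1000))
    summary: dict[str, str] = field(default_factory=dict)


def new_view_version_for_write(summary: dict[str, str] | None = None) -> ViewVersionSketch:
    # Write path only: merge the environment context into the summary here.
    merged = {**(summary or {}), **environment_context()}
    return ViewVersionSketch(summary=merged)
```

With this split, constructing `ViewVersionSketch()` directly yields an empty summary, while versions created through the write-path helper carry the engine metadata.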

kevinjqliu · 2026-06-21T01:09:49Z

        "total-files-size": str(file_size),
        "total-position-deletes": "0",
        "total-records": "3",
+        "engine-name": "pyiceberg",


Could we find a way to avoid repeating env context everywhere? Perhaps a small test helper like this would keep the exact assertions while localizing this new default behavior:

def with_environment_context(summary: dict[str, str]) -> dict[str, str]:
    return {**summary, **EnvironmentContext.get()}

assert summaries[0] == with_environment_context({
    "added-data-files": "3",
    "added-records": "5",
    ...
})

kevinjqliu · 2026-06-21T01:13:12Z

+def test_default_value() -> None:
+    actual = EnvironmentContext.get()
+    assert len(actual) == 2
+    assert actual["engine-name"] == "pyiceberg"
+    assert re.match(r"^\d+\.\d+\.\d+", actual["engine-version"])


Suggested change

-def test_default_value() -> None:
-    actual = EnvironmentContext.get()
-    assert len(actual) == 2
-    assert actual["engine-name"] == "pyiceberg"
-    assert re.match(r"^\d+\.\d+\.\d+", actual["engine-version"])
+def test_default_value() -> None:
+    assert EnvironmentContext.get() == {
+        "engine-name": "pyiceberg",
+        "engine-version": __version__,
+    }
+
+
+def test_get_returns_copy() -> None:
+    actual = EnvironmentContext.get()
+    actual["test-key"] = "test-value"
+    assert "test-key" not in EnvironmentContext.get()

EnvironmentContext might have been mutated; this is a better way to test that the returned value is a copy.

+1 on this. The regex is really brittle and won't work for RC builds.

Ugh, I was missing the RC build.

I intentionally avoided using the same logic in environment_context.py and in this test. I generally don't believe it's a good idea to write tests that way.

kevinjqliu · 2026-06-21T01:13:41Z

+def test_put_and_remove() -> None:
+    EnvironmentContext.put("test-key", "test-value")
+    assert EnvironmentContext.get()["test-key"] == "test-value"
+
+    EnvironmentContext.remove("test-key")
+    assert "test-key" not in EnvironmentContext.get()


Suggested change

-def test_put_and_remove() -> None:
-    EnvironmentContext.put("test-key", "test-value")
-    assert EnvironmentContext.get()["test-key"] == "test-value"
-
-    EnvironmentContext.remove("test-key")
-    assert "test-key" not in EnvironmentContext.get()
+def test_put_and_remove() -> None:
+    try:
+        EnvironmentContext.put("test-key", "test-value")
+        assert EnvironmentContext.get()["test-key"] == "test-value"
+
+        assert EnvironmentContext.remove("test-key") == "test-value"
+        assert "test-key" not in EnvironmentContext.get()
+    finally:
+        EnvironmentContext.remove("test-key")

Use try/finally so the test does not leave EnvironmentContext mutated if an assertion fails.

kevinjqliu · 2026-06-21T01:17:00Z

+    for key, value in EnvironmentContext.get().items():
+        summary.__setitem__(key, value)


Suggested change

-    for key, value in EnvironmentContext.get().items():
-        summary.__setitem__(key, value)
+    for key, value in EnvironmentContext.get().items():
+        summary[key] = value

nit

kevinjqliu · 2026-06-21T01:18:55Z

+class EnvironmentContext:
+    _PROPERTIES: dict[str, str] = {
+        "engine-name": "pyiceberg",
+        "engine-version": version("pyiceberg"),


Suggested change

-        "engine-version": version("pyiceberg"),
+        "engine-version": __version__,

nit

rambleraptor · 2026-06-21T01:51:27Z

This looks great! My biggest note is to remove the version regexes, since they won't work on RCs.

ebyhr force-pushed the ebi/environment-context branch 5 times, most recently from 838ef11 to 07587a3 Compare May 30, 2026 04:22

ebyhr marked this pull request as ready for review June 1, 2026 03:20

abnobdoss approved these changes Jun 2, 2026

ebyhr mentioned this pull request Jun 5, 2026

Add defaults to ViewVersion fields #3458

Merged

Add support for environment context

afe8bec

ebyhr force-pushed the ebi/environment-context branch from 07587a3 to afe8bec Compare June 9, 2026 22:40

ebyhr mentioned this pull request Jun 9, 2026

Add more examples for pyiceberg view #3414

Merged

kevinjqliu reviewed Jun 21, 2026

ebyhr wants to merge 1 commit into apache:main from ebyhr:ebi/environment-context

4 participants
