Skip to content

Add support for environment context#3441

Open
ebyhr wants to merge 1 commit into
apache:mainfrom
ebyhr:ebi/environment-context
Open

Add support for environment context#3441
ebyhr wants to merge 1 commit into
apache:mainfrom
ebyhr:ebi/environment-context

Conversation

@ebyhr

@ebyhr ebyhr commented May 30, 2026

Copy link
Copy Markdown
Member

Rationale for this change

Adds EnvironmentContext utility class to track engine metadata (name and version) and automatically includes it in snapshot summaries. This provides visibility into which engine version created/modified each snapshot.

Are these changes tested?

Yes

Are there any user-facing changes?

No - it's internal change

@ebyhr ebyhr force-pushed the ebi/environment-context branch 5 times, most recently from 838ef11 to 07587a3 Compare May 30, 2026 04:22
@ebyhr ebyhr marked this pull request as ready for review June 1, 2026 03:20

@abnobdoss abnobdoss left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! This looks very helpful for debugging.

Fokko pushed a commit that referenced this pull request Jun 9, 2026
# Rationale for this change

Allow users to create a view with fewer parameters.

We can set an environment context (`engine-name` and `engine-version`)
in `summary` field by default
once #3441 is merged.

## Are these changes tested?

Yes

## Are there any user-facing changes?

No

<!-- In the case of user-facing changes, please add the changelog label.
-->
@ebyhr ebyhr force-pushed the ebi/environment-context branch from 07587a3 to afe8bec Compare June 9, 2026 22:40

@kevinjqliu kevinjqliu left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! This will be very useful to users. I got a couple of nits

timestamp_ms: int = Field(alias="timestamp-ms", default_factory=lambda: int(time.time() * 1000))
"""Timestamp when the version was created (ms from epoch)"""
summary: dict[str, str] = Field(default_factory=dict)
summary: dict[str, str] = Field(default_factory=lambda: EnvironmentContext.get())

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will add EnvironmentContext to all ViewVersion which might not be the desired behavior.

Similar to Table Metadata, can we only add EnvironmentContext in the write path?

"total-files-size": str(file_size),
"total-position-deletes": "0",
"total-records": "3",
"engine-name": "pyiceberg",

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we find a way to avoid repeating env context everywhere? Perhaps a small test helper like this would keep the exact assertions while localizing this new default behavior:

def with_environment_context(summary: dict[str, str]) -> dict[str, str]:
    return {**summary, **EnvironmentContext.get()}

assert summaries[0] == with_environment_context({
    "added-data-files": "3",
    "added-records": "5",
    ...
})

Comment on lines +22 to +26
def test_default_value() -> None:
actual = EnvironmentContext.get()
assert len(actual) == 2
assert actual["engine-name"] == "pyiceberg"
assert re.match(r"^\d+\.\d+\.\d+", actual["engine-version"])

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_default_value() -> None:
actual = EnvironmentContext.get()
assert len(actual) == 2
assert actual["engine-name"] == "pyiceberg"
assert re.match(r"^\d+\.\d+\.\d+", actual["engine-version"])
def test_default_value() -> None:
assert EnvironmentContext.get() == {
"engine-name": "pyiceberg",
"engine-version": __version__,
}
def test_get_returns_copy() -> None:
actual = EnvironmentContext.get()
actual["test-key"] = "test-value"
assert "test-key" not in EnvironmentContext.get()

EnvironmentContext might have been mutated, this is a better way to test that the returned value is a copy

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on this. The regex is really brittle and won't work for RC builds.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, I was missing the RC build.

I intentionally avoided using the same logic between environment_context.py and this test. I generally don't believe it's a good idea to write tests.

Comment on lines +29 to +34
def test_put_and_remove() -> None:
EnvironmentContext.put("test-key", "test-value")
assert EnvironmentContext.get()["test-key"] == "test-value"

EnvironmentContext.remove("test-key")
assert "test-key" not in EnvironmentContext.get()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
def test_put_and_remove() -> None:
EnvironmentContext.put("test-key", "test-value")
assert EnvironmentContext.get()["test-key"] == "test-value"
EnvironmentContext.remove("test-key")
assert "test-key" not in EnvironmentContext.get()
def test_put_and_remove() -> None:
try:
EnvironmentContext.put("test-key", "test-value")
assert EnvironmentContext.get()["test-key"] == "test-value"
assert EnvironmentContext.remove("test-key") == "test-value"
assert "test-key" not in EnvironmentContext.get()
finally:
EnvironmentContext.remove("test-key")

use try/finally to avoid mutating EnvironmentContext

Comment on lines +406 to +407
for key, value in EnvironmentContext.get().items():
summary.__setitem__(key, value)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for key, value in EnvironmentContext.get().items():
summary.__setitem__(key, value)
for key, value in EnvironmentContext.get().items():
summary[key] = value

nit

class EnvironmentContext:
_PROPERTIES: dict[str, str] = {
"engine-name": "pyiceberg",
"engine-version": version("pyiceberg"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"engine-version": version("pyiceberg"),
"engine-version": __version__,

nit

@rambleraptor

Copy link
Copy Markdown
Collaborator

This looks great! My biggest note is to remove the version regexes, since they won't work on RCs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants