::JSON.pretty_generate should sort hash keys #976

@jcpunk

Summary
JSON.pretty_generate currently emits object keys in the hash's insertion order. When construction order varies, semantically identical hashes serialize to different text, which makes diffs noisy and reproducibility harder. This issue proposes an opt-in option to sort hash keys during generation so that the JSON output is stable and predictable.

Motivation / Problem

  • In many workflows (config generation, test fixtures, CI artifacts), stable serialization is critical.

  • Hash insertion order may differ across code paths, Ruby versions, or data sources, causing semantically identical objects to produce different JSON.

  • This complicates:

    • Git diffs and code reviews
    • Caching and content hashing
    • Snapshot testing
    • Reproducible builds

Proposed Solution
Introduce an option to JSON.pretty_generate (and possibly JSON.generate) to sort object keys lexicographically.

API Options (one of):

  1. Keyword argument:

    JSON.pretty_generate(obj, sort_keys: true)
  2. Extend JSON::State:

    state = JSON::State.new(sort_keys: true)
    JSON.pretty_generate(obj, state)

Behavior

  • When sort_keys: true, every hash, including nested ones, is serialized with its keys sorted by string comparison.
  • Default remains false to preserve backward compatibility and performance characteristics.

Example

require "json"

obj = { b: 1, a: 2 }

JSON.pretty_generate(obj)
# => {
#      "b": 1,
#      "a": 2
#    }

JSON.pretty_generate(obj, sort_keys: true)
# => {
#      "a": 2,
#      "b": 1
#    }

Alternatives Considered

  • Pre-sorting hashes before serialization:

    • Requires deep traversal and duplication of data structures
    • Error-prone and inefficient for large nested objects
  • Relying on insertion order discipline:

    • Not robust across boundaries or contributors
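
For reference, the pre-sorting workaround described above might look roughly like the sketch below. `deep_sort_keys` is a hypothetical helper name, not part of the json gem, and ordering keys by `to_s` is an assumption about the desired ordering. It shows the full traversal and copy of the structure that a built-in option would make unnecessary.

```ruby
require "json"

# Hypothetical helper (not part of the json gem): returns a copy of `value`
# in which every hash, at any depth, has its keys ordered by their string form.
def deep_sort_keys(value)
  case value
  when Hash
    value.sort_by { |key, _| key.to_s }
         .to_h { |key, child| [key, deep_sort_keys(child)] }
  when Array
    # Array order is significant in JSON, so only the elements are recursed into.
    value.map { |child| deep_sort_keys(child) }
  else
    value
  end
end

config = { b: 1, a: { z: [{ y: 1, x: 2 }], c: 3 } }
puts JSON.pretty_generate(deep_sort_keys(config))
```

Every caller has to remember to wrap its data this way, and the whole structure is rebuilt before serialization even starts.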

Impact

  • Improves determinism and reproducibility across tooling and environments
  • Reduces diff noise and improves developer experience
  • Aligns with behavior available in other ecosystems (e.g., Python’s json.dumps(sort_keys=True))

Performance Considerations

  • Sorting adds roughly O(n log n) overhead per object, where n is that object's key count
  • Acceptable when opt-in; no impact on default behavior

Backward Compatibility

  • Fully backward compatible if default remains unsorted

Test Plan

  • Unit tests verifying:

    • Sorted vs unsorted output for flat and deeply nested hashes
    • Stability across multiple invocations
    • Mixed key types (symbols/strings) normalized to strings before sort
  • Benchmark comparison with and without sorting
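
The unit tests listed above could be sketched along these lines. Since `sort_keys:` does not exist yet, the sketch uses a hypothetical stand-in, `stable_json`, which pre-sorts keys by their string form. Real tests would call `JSON.pretty_generate(obj, sort_keys: true)` instead.

```ruby
require "json"
require "minitest/autorun"

# Hypothetical stand-in for the proposed JSON.pretty_generate(obj, sort_keys: true):
# recursively orders keys by their string form, then pretty-prints.
def stable_json(value)
  sorter = lambda do |node|
    case node
    when Hash  then node.sort_by { |k, _| k.to_s }.to_h { |k, v| [k, sorter.call(v)] }
    when Array then node.map { |v| sorter.call(v) }
    else node
    end
  end
  JSON.pretty_generate(sorter.call(value))
end

class SortedKeysTest < Minitest::Test
  # Same data, different insertion order at both nesting levels.
  def test_insertion_order_does_not_affect_output
    first  = { b: 1, a: { d: 2, c: 3 } }
    second = { a: { c: 3, d: 2 }, b: 1 }
    assert_equal stable_json(first), stable_json(second)
  end

  # Symbol and string keys are compared by their string form.
  def test_symbol_and_string_keys_sort_together
    assert_equal %({\n  "a": 2,\n  "b": 1\n}), stable_json({ "b" => 1, a: 2 })
  end

  def test_repeated_calls_are_identical
    obj = { y: [{ b: 1, a: 2 }], x: nil }
    assert_equal stable_json(obj), stable_json(obj)
  end
end
```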

Open Questions

  • Should sorting be strictly lexicographic on stringified keys?
  • Should there be a global default toggle via JSON::State configuration?
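
To make the first question concrete, here is a small sketch of what sorting on stringified keys implies for mixed key types. The `to_s`-based ordering is an assumption about how the option might behave, not a description of existing json gem behavior.

```ruby
require "json"

# Keys of three different classes; JSON.generate converts each to a string.
mixed = { "b" => 1, :a => 2, 10 => 3, 9 => 4 }

# Comparing the raw keys fails because String, Symbol and Integer
# are not mutually comparable.
begin
  mixed.keys.sort
rescue ArgumentError => e
  puts "raw sort failed: #{e.message}"
end

# Sorting on the stringified form always works, but it is lexicographic,
# so the integer key 10 is ordered before 9.
by_string = mixed.sort_by { |key, _| key.to_s }.to_h
puts JSON.generate(by_string)
# => {"10":3,"9":4,"a":2,"b":1}
```

Lexicographic ordering is simple and total, at the cost of numeric-looking keys not appearing in numeric order.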

Additional Context
This feature would support reproducible outputs in CI pipelines and long-lived systems where deterministic artifacts are a requirement.
