Activate the microunit tax-unit delegation: thread CPS columns through entity construction #115

@MaxGhenis

Follow-up to #113 and #114.

#114 added the delegation seam: USMicroplexPipeline._build_policyengine_tax_units_via_microunit calls microunit.construct_tax_units only when the person frame carries microunit's raw CPS input columns, otherwise returns None and falls back to the legacy role-flag reconstruction. On today's data it always returns None — the delegation is inert. This issue is the activation.

Where the delegation actually runs

The microunit path is in PolicyEngine materialization: build_policyengine_entity_tables(population) → _build_policyengine_tax_units(persons) (src/microplex_us/pipelines/us.py, ~4095 / ~4134). There, persons is the synthesized population carrying microplex's normalized columns (household_id, age, marital_status, spouse_person_number, relationship_to_head, …), not raw CPS names: cps.py renames the raw columns very early (PH_SEQ→household_id, A_SPOUSE→spouse_person_number).

So activation is not "un-drop raw columns in a reader." It is: build microunit's CPS-like input contract from the normalized columns microplex already carries at materialization — directly analogous to the ACS→CPS mapping microunit documents as the consumer's responsibility.

Available vs. must-be-derived

microunit needs PH_SEQ, A_LINENO, A_AGE, A_MARITL, A_SPOUSE, PEPAR1, PEPAR2, A_EXPRRP:

| microunit col | source at materialization |
|---|---|
| PH_SEQ | household_id (direct) |
| A_AGE | age (direct) |
| A_MARITL | marital_status → CPS marital codes (map) |
| A_LINENO | assign per-household line numbers (1..n) |
| A_SPOUSE | spouse_person_number → spouse's line number (after A_LINENO is assigned) |
| A_EXPRRP | relationship_to_head (0/1/2/3) → CPS recode (lossy/approximate) |
| PEPAR1/PEPAR2 | heuristic parent-line inference from relationship + ages (none carried) |
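
To make the ordering constraint concrete (A_SPOUSE can only be resolved once A_LINENO exists), here is a minimal sketch of the two direct mappings plus line-number and spouse-line assignment. It uses plain dicts to stay dependency-free. The `person_number` key, the function name, the order-of-appearance line numbering, and the use of 0 as the "no spouse" value are all assumptions for illustration, not confirmed microplex or microunit behavior. The marital and relationship code maps and the parent inference are deliberately left out.

```python
from collections import defaultdict


def assign_lines_and_spouses(persons):
    """Add PH_SEQ, A_AGE, A_LINENO and A_SPOUSE to normalized person records.

    `persons` is a list of dicts with household_id, age, person_number
    (hypothetical key) and spouse_person_number (None when no spouse).
    Records are returned grouped by household, in first-appearance order.
    """
    by_household = defaultdict(list)
    for person in persons:
        by_household[person["household_id"]].append(person)

    out = []
    for household_id, members in by_household.items():
        # Step 1: line numbers 1..n, in the order members appear (assumed).
        line_of = {
            m["person_number"]: line
            for line, m in enumerate(members, start=1)
        }
        # Step 2: spouse pointers can only be resolved after step 1.
        for m in members:
            spouse = m.get("spouse_person_number")
            out.append({
                **m,
                "PH_SEQ": household_id,                  # direct
                "A_AGE": m["age"],                       # direct
                "A_LINENO": line_of[m["person_number"]],
                # 0 when no spouse resolves in-household (assumed sentinel).
                "A_SPOUSE": line_of.get(spouse, 0),
            })
    return out


example = assign_lines_and_spouses([
    {"household_id": 1, "person_number": 10, "spouse_person_number": 12, "age": 40},
    {"household_id": 1, "person_number": 11, "spouse_person_number": None, "age": 8},
    {"household_id": 1, "person_number": 12, "spouse_person_number": 10, "age": 38},
    {"household_id": 2, "person_number": 10, "spouse_person_number": None, "age": 70},
])
for row in example:
    print(row["PH_SEQ"], row["A_LINENO"], row["A_SPOUSE"])
```

Building the person-number → line-number lookup per household keeps spouse pointers from leaking across households when person numbers repeat.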

The work

  1. Add a normalized person frame → microunit CPS contract adapter at materialization (line-number assignment, marital + relationship code maps, spouse-line resolution, heuristic parent inference).
  2. Gate it behind a default-OFF config flag so default behavior is unchanged until validated.
  3. Have _build_policyengine_tax_units_via_microunit use the adapter when raw cols are absent but the normalized cols are present.
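
The three steps above imply a three-way decision at materialization. A minimal sketch of that decision follows. The flag name `use_microunit_adapter`, the `TaxUnitConfig` class, the `choose_tax_unit_path` helper, and the exact set of normalized columns the adapter would require are hypothetical; only the raw column list comes from this issue.

```python
from dataclasses import dataclass
from typing import Iterable

# Raw CPS inputs microunit needs, as listed in this issue.
RAW_CPS_COLUMNS = frozenset({
    "PH_SEQ", "A_LINENO", "A_AGE", "A_MARITL",
    "A_SPOUSE", "PEPAR1", "PEPAR2", "A_EXPRRP",
})

# Normalized columns the adapter would read (assumed minimum set).
NORMALIZED_COLUMNS = frozenset({
    "household_id", "age", "marital_status",
    "spouse_person_number", "relationship_to_head",
})


@dataclass(frozen=True)
class TaxUnitConfig:
    # Hypothetical flag name; default OFF keeps today's behavior.
    use_microunit_adapter: bool = False


def choose_tax_unit_path(columns: Iterable[str], config: TaxUnitConfig) -> str:
    """Pick which tax-unit construction path applies to a person frame."""
    present = set(columns)
    if RAW_CPS_COLUMNS <= present:
        # Raw inputs are present: delegate directly (the #114 behavior).
        return "microunit_raw"
    if config.use_microunit_adapter and NORMALIZED_COLUMNS <= present:
        # Flag on and normalized inputs present: adapt, then delegate.
        return "microunit_adapter"
    # Otherwise fall back to the legacy role-flag reconstruction.
    return "legacy"


print(choose_tax_unit_path(NORMALIZED_COLUMNS, TaxUnitConfig()))
print(choose_tax_unit_path(NORMALIZED_COLUMNS, TaxUnitConfig(use_microunit_adapter=True)))
```

With the flag at its default, a normalized frame still resolves to the legacy path, which is the "default behavior unchanged" property the flag is meant to guarantee.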

⚠️ This is the entity-convergence switch (see #113)

Activating this makes microplex's tax units converge toward eCPS's (microunit is eCPS's engine). Per #113, do not report any "microplex beats eCPS" result off this change. Measure it as an entity-convergence effect with a matched-N, symmetric-refit, holdout before/after comparison on the same target surface. The expected direct effect is on the tax-unit/SPM under-splitting, to which #113 traces ~78% of the loss gap (MP ~1.16 tax units/hh vs. eCPS ~1.34).
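
For reference, the tax-units-per-household figure quoted above can be computed from per-person assignments as below. This is an unweighted sketch with a hypothetical function name; whether #113's numbers are weighted is not stated here, so treat the weighting as an open assumption.

```python
def tax_units_per_household(assignments):
    """Unweighted mean number of tax units per household.

    `assignments` is an iterable of (household_id, tax_unit_id) pairs,
    one per person. Tax-unit ids only need to be unique within a household.
    """
    pairs = set(assignments)            # distinct (household, tax unit) pairs
    households = {hh for hh, _ in pairs}
    if not households:
        raise ValueError("no households in assignments")
    return len(pairs) / len(households)


# Three households, four tax units in total.
ratio = tax_units_per_household([
    (1, "a"), (1, "a"), (1, "b"),
    (2, "a"),
    (3, "a"), (3, "a"),
])
print(round(ratio, 3))
```

Running this before and after the switch on the same population gives the direct entity-level quantity to report alongside the loss comparison.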

Acceptance

  • The adapter builds a microunit-valid frame from the normalized columns; the delegation fires on real synthesized data.
  • The fidelity of the lossy maps (A_EXPRRP from relationship_to_head, heuristic PEPAR1/PEPAR2) is validated — e.g. agreement with the legacy role-flag reconstruction on households where both can run, before trusting it.
  • Existing entity/pipeline tests stay green; default behavior unchanged with the flag off.
  • The before/after isolation comparison is run and reported as an entity-convergence effect, not a quality claim.
  • Done on its own branch: the change is behavior-changing and must not ride along with the inert seam from #114 (Adopt microunit for tax-unit reconstruction; scoped, behavior-preserving; part of #113).
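
One way to run the agreement check named in the second acceptance bullet is to compare, household by household, how each method partitions members into tax units while ignoring the unit labels themselves. The sketch below assumes per-person (household_id, person_id, tax_unit_id) triples from each method; the function names are hypothetical and this is one possible agreement metric, not the project's chosen one.

```python
from collections import defaultdict


def household_partitions(assignments):
    """Map household_id -> set of member groups (one group per tax unit)."""
    groups = defaultdict(lambda: defaultdict(set))
    for household_id, person_id, tax_unit_id in assignments:
        groups[household_id][tax_unit_id].add(person_id)
    return {
        hh: frozenset(frozenset(members) for members in units.values())
        for hh, units in groups.items()
    }


def partition_agreement(legacy, candidate):
    """Share of households (present in both) with identical groupings."""
    a = household_partitions(legacy)
    b = household_partitions(candidate)
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("no households in common")
    matches = sum(a[hh] == b[hh] for hh in shared)
    return matches / len(shared)


legacy = [(1, 1, "x"), (1, 2, "x"), (1, 3, "y"), (2, 1, "x")]
same_grouping = [(1, 1, "u1"), (1, 2, "u1"), (1, 3, "u2"), (2, 1, "u9")]
more_splits = [(1, 1, "a"), (1, 2, "b"), (1, 3, "c"), (2, 1, "a")]
print(partition_agreement(legacy, same_grouping))
print(partition_agreement(legacy, more_splits))
```

Comparing partitions rather than ids avoids penalizing the adapter path for using a different tax-unit numbering than the legacy reconstruction.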

A first prototype of the adapter (flag-gated, default off) is in progress — see the linked PR.
