feat: implement PHIX validation for schools and daycares #152
eswarchandravidyasagar wants to merge 23 commits into main from
eswarchandravidyasagar commented on Jan 14, 2026
- Added a PHIX validation module to validate school/daycare names against the official PHIX reference list.
- Integrated validation into the preprocessing step in orchestrator.py.
- Added configurable options to parameters.yaml for enabling validation and handling unmatched facilities.
- Created unit tests for the validation module covering various scenarios.
- Added documentation for the validation plan and updated the plans directory.
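A minimal sketch of what "handling unmatched facilities" could look like. It assumes a hypothetical `on_unmatched` option with `warn`, `skip`, and `error` values; the real option names in parameters.yaml are not shown in this description, and `apply_unmatched_policy` is an illustrative helper, not the module's API. Matching here is plain exact membership in a set of reference names.

```python
from __future__ import annotations

import logging
from typing import Iterable

logger = logging.getLogger("phix_validation")

# Hypothetical policy values; the actual option lives in parameters.yaml.
VALID_POLICIES = {"warn", "skip", "error"}


class UnmatchedFacilityError(ValueError):
    """Raised when facilities are missing from the reference list and policy is 'error'."""


def apply_unmatched_policy(
    facilities: Iterable[str],
    reference_names: set[str],
    on_unmatched: str = "warn",
) -> list[str]:
    """Return the facilities to keep after applying the unmatched-facility policy."""
    if on_unmatched not in VALID_POLICIES:
        raise ValueError(f"on_unmatched must be one of {sorted(VALID_POLICIES)}")

    kept: list[str] = []
    unmatched: list[str] = []
    for name in facilities:
        if name in reference_names:  # exact, case-sensitive comparison
            kept.append(name)
            continue
        unmatched.append(name)
        if on_unmatched == "warn":
            # Keep the record but flag it for manual review.
            logger.warning("Facility not found in PHIX reference list: %r", name)
            kept.append(name)
        # 'skip' drops the record; 'error' is raised once all names are checked.

    if unmatched and on_unmatched == "error":
        raise UnmatchedFacilityError(f"{len(unmatched)} unmatched facility name(s): {unmatched}")
    return kept
```

Collecting every unmatched name before raising means a single run reports all problems at once rather than stopping at the first one.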
We don't have redistribution permission for the PHIX reference list file, so it will need to be removed and the commits squashed. It would also inflate the size of this repository and its history. Users will have to bring their own (BYO) PHIX reference list.
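If users bring their own reference list, the pipeline should fail early with an actionable message when the file is missing. A sketch under assumptions: `resolve_reference_file` is a hypothetical helper fed from the `reference_file` setting, which the config comments describe as relative to the project root.

```python
from __future__ import annotations

from pathlib import Path


def resolve_reference_file(reference_file: str | None, project_root: Path) -> Path:
    """Resolve the user-supplied PHIX reference list, failing with a clear message."""
    if not reference_file:
        raise FileNotFoundError(
            "No PHIX reference list configured. The file is not redistributed with "
            "this repository; obtain it separately and set reference_file in parameters.yaml."
        )
    path = Path(reference_file)
    if not path.is_absolute():
        # Relative paths are resolved against the project root.
        path = project_root / path
    if not path.is_file():
        raise FileNotFoundError(f"PHIX reference list not found at {path}")
    return path
```

Keeping the file out of version control (and out of history) is then enforced by the check itself: nothing runs until a local copy is configured.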
```yaml
# Path to PHIX reference Excel file (relative to project root)
reference_file: PHIX Reference Lists v5.2 - 2025Jun30.xlsx
# Minimum fuzzy match score (0-100) to consider a match
match_threshold: 85
```
Is this required? Shouldn't it be exact? Fuzzy matching could bypass the very issues we'd like to protect against, such as a similarly named school being accidentally selected when a Panorama user creates a forecast query.
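To illustrate the concern: with a threshold of 85, a fuzzy matcher readily accepts a different school whose name differs by a single character. This sketch uses the standard library's `difflib.SequenceMatcher` as a stand-in scorer (the module's actual scorer isn't shown here, and other libraries score differently); the school names are fictional.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Similarity on a 0-100 scale; difflib stands in for the module's fuzzy scorer."""
    return SequenceMatcher(None, a, b).ratio() * 100


MATCH_THRESHOLD = 85  # the match_threshold value under review

reference_name = "St. Mary Catholic Elementary School"
query_name = "St. Mark Catholic Elementary School"  # a different (fictional) school

score = similarity(reference_name, query_name)
print(f"fuzzy score {score:.1f}, accepted: {score >= MATCH_THRESHOLD}")  # fuzzy score 97.1, accepted: True
print(f"exact match: {query_name == reference_name}")  # exact match: False
```

The two names share 34 of 35 characters in order, so the ratio is 2 × 34 / 70 ≈ 0.971, comfortably above the threshold, while an exact comparison correctly rejects the pair.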
We likely need a mapping file that converts the PHU names in the PHIX reference document to standardized PHU acronyms (which should be enforced for template folders, etc.). We may also need to allow this map to be many-to-one, for PHUs that have merged since the reference list was last updated.
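A sketch of the many-to-one mapping, assuming the map is a plain dictionary from PHU names as written in the PHIX reference list to standardized acronyms (it could equally be loaded from a YAML file). All PHU names and acronyms below are fictional placeholders, and `to_acronym` / `names_for_acronym` are hypothetical helpers.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical alias table: PHU name as written in the PHIX reference list -> acronym.
# Two source names share one acronym to model a merger (fictional example).
PHU_ALIASES: dict[str, str] = {
    "Alpha District Health Unit": "ADHU",
    "Beta Public Health": "BGPH",
    "Gamma Public Health": "BGPH",
}


def to_acronym(phix_phu_name: str, aliases: dict[str, str]) -> str:
    """Map a PHIX PHU name to its standardized acronym, failing loudly if unmapped."""
    try:
        return aliases[phix_phu_name]
    except KeyError:
        raise KeyError(f"No acronym mapping for PHU {phix_phu_name!r}") from None


def names_for_acronym(aliases: dict[str, str]) -> dict[str, list[str]]:
    """Invert the many-to-one map: acronym -> every PHIX PHU name that maps to it."""
    inverted: defaultdict[str, list[str]] = defaultdict(list)
    for name, acronym in aliases.items():
        inverted[acronym].append(name)
    return dict(inverted)
```

The inverted view is what a scoped validator would need: given a target acronym such as `BGPH`, it yields every PHIX PHU name whose facilities should count as in scope.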
I know it's important to run this early in the pipeline, before other processing, but could we also emit something in the per-PDF validation log confirming that a valid facility is being used for the target PHU?
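One possible shape for such a per-PDF log entry, sketched under assumptions: the record fields, the `check_facility_for_pdf` helper, and the facility-to-PHU lookup are all hypothetical, since the existing per-PDF validation log format isn't shown here. PHU values reuse fictional acronyms.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass


@dataclass
class FacilityCheck:
    """Hypothetical per-PDF log record for the facility check."""

    pdf: str
    facility: str
    target_phu: str
    facility_phu: str | None  # None when the facility is absent from the reference list
    valid: bool


def check_facility_for_pdf(
    pdf: str,
    facility: str,
    target_phu: str,
    facility_to_phu: dict[str, str],
) -> FacilityCheck:
    """Record whether the PDF's facility belongs to the PHU the run targets."""
    facility_phu = facility_to_phu.get(facility)  # exact lookup, no fuzzy fallback
    return FacilityCheck(pdf, facility, target_phu, facility_phu, facility_phu == target_phu)


lookup = {"Maple Public School": "ADHU", "Sunrise Daycare": "BGPH"}
entry = check_facility_for_pdf("notice_0001.pdf", "Sunrise Daycare", "ADHU", lookup)
print(json.dumps(asdict(entry)))
```

A plain name-keyed dict assumes facility names are unique across PHUs; if they are not, keying the lookup on the PHIX ID would be the safer choice.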
- Updated `validate_phix.py` to remove fuzzy matching and implement strict exact matching for facility names against the PHIX reference list.
- Introduced PHU alias mapping to restrict validation to specific Public Health Units (PHUs) using a YAML configuration file.
- Enhanced the `validate_facilities` function to support PHU scoping and improved error handling for unmatched facilities.
- Updated tests to reflect changes in matching strategy and added new tests for PHU alias mapping and validation behavior.
- Modified documentation to clarify the new validation process and configuration options.
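The PHU-scoped exact matching described above can be sketched as follows, assuming the reference rows have already been reduced to `(facility_name, phu_acronym)` pairs after alias mapping. `index_reference` and `validate_facilities_scoped` are hypothetical stand-ins, not the signature of the PR's `validate_facilities`; facility names and acronyms are fictional.

```python
from __future__ import annotations

from typing import Iterable


def index_reference(rows: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Group reference facility names by PHU acronym."""
    by_phu: dict[str, set[str]] = {}
    for facility, phu in rows:
        by_phu.setdefault(phu, set()).add(facility)
    return by_phu


def validate_facilities_scoped(
    facilities: Iterable[str],
    by_phu: dict[str, set[str]],
    target_phu: str,
) -> tuple[list[str], list[str]]:
    """Split facilities into (matched, unmatched) by exact, case-sensitive
    comparison against the target PHU's facilities only."""
    allowed = by_phu.get(target_phu, set())
    matched: list[str] = []
    unmatched: list[str] = []
    for name in facilities:
        (matched if name in allowed else unmatched).append(name)
    return matched, unmatched


reference_rows = [
    ("Maple Public School", "ADHU"),
    ("Maple Public School Annex", "BGPH"),
    ("Sunrise Daycare", "ADHU"),
]
by_phu = index_reference(reference_rows)
print(validate_facilities_scoped(["Maple Public School", "Maple Public School Annex"], by_phu, "ADHU"))
```

A facility that exists in the reference list but under a different PHU is reported as unmatched, which is the protection that scoping adds on top of exact matching.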
Codecov Report ❌ Patch coverage is
… column prefix, support multiple facility columns
…ch when PHIX ID is verified, otherwise inexact match
…e version to match release.
uv lock --upgrade
Bumps the minor-and-patch group with 4 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [babel](https://github.com/python-babel/babel), [ty](https://github.com/astral-sh/ty) and [git-changelog](https://github.com/pawamoy/git-changelog).

- Updates `pypdf` from 6.6.0 to 6.6.2: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.6.0...6.6.2)
- Updates `babel` from 2.17.0 to 2.18.0: [Release notes](https://github.com/python-babel/babel/releases), [Changelog](https://github.com/python-babel/babel/blob/master/CHANGES.rst), [Commits](python-babel/babel@v2.17.0...v2.18.0)
- Updates `ty` from 0.0.12 to 0.0.14: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.12...0.0.14)
- Updates `git-changelog` from 2.7.0 to 2.7.1: [Release notes](https://github.com/pawamoy/git-changelog/releases), [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md), [Commits](pawamoy/git-changelog@2.7.0...2.7.1)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.6.2, dependency-type: direct:production, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: babel, dependency-version: 2.18.0, dependency-type: direct:production, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.14, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: git-changelog, dependency-version: 2.7.1, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 3 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [pillow](https://github.com/python-pillow/Pillow) and [ty](https://github.com/astral-sh/ty).

- Updates `pypdf` from 6.6.2 to 6.7.0: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.6.2...6.7.0)
- Updates `pillow` from 12.1.0 to 12.1.1: [Release notes](https://github.com/python-pillow/Pillow/releases), [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst), [Commits](python-pillow/Pillow@12.1.0...12.1.1)
- Updates `ty` from 0.0.14 to 0.0.17: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.14...0.0.17)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.7.0, dependency-type: direct:production, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: pillow, dependency-version: 12.1.1, dependency-type: direct:production, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.17, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 2 updates: [pypdf](https://github.com/py-pdf/pypdf) and [ty](https://github.com/astral-sh/ty).

- Updates `pypdf` from 6.7.0 to 6.7.2: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.7.0...6.7.2)
- Updates `ty` from 0.0.17 to 0.0.18: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.17...0.0.18)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.7.2, dependency-type: direct:production, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.18, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 4 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [ty](https://github.com/astral-sh/ty), [git-changelog](https://github.com/pawamoy/git-changelog) and [pypandoc](https://github.com/JessicaTegner/pypandoc).

- Updates `pypdf` from 6.7.2 to 6.9.0: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.7.2...6.9.0)
- Updates `ty` from 0.0.18 to 0.0.23: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.18...0.0.23)
- Updates `git-changelog` from 2.7.1 to 2.9.0: [Release notes](https://github.com/pawamoy/git-changelog/releases), [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md), [Commits](pawamoy/git-changelog@2.7.1...2.9.0)
- Updates `pypandoc` from 1.16.2 to 1.17: [Release notes](https://github.com/JessicaTegner/pypandoc/releases), [Changelog](https://github.com/JessicaTegner/pypandoc/blob/master/release.md), [Commits](JessicaTegner/pypandoc@v1.16.2...v1.17)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.9.0, dependency-type: direct:production, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.23, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: git-changelog, dependency-version: 2.9.0, dependency-type: direct:development, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: pypandoc, dependency-version: '1.17', dependency-type: direct:development, update-type: version-update:semver-minor, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 4 updates: [pypdf](https://github.com/py-pdf/pypdf), [pytest-cov](https://github.com/pytest-dev/pytest-cov), [ty](https://github.com/astral-sh/ty) and [git-changelog](https://github.com/pawamoy/git-changelog).

- Updates `pypdf` from 6.9.0 to 6.9.1: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.9.0...6.9.1)
- Updates `pytest-cov` from 7.0.0 to 7.1.0: [Changelog](https://github.com/pytest-dev/pytest-cov/blob/master/CHANGELOG.rst), [Commits](pytest-dev/pytest-cov@v7.0.0...v7.1.0)
- Updates `ty` from 0.0.23 to 0.0.24: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.23...0.0.24)
- Updates `git-changelog` from 2.9.0 to 2.9.2: [Release notes](https://github.com/pawamoy/git-changelog/releases), [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md), [Commits](pawamoy/git-changelog@2.9.0...2.9.2)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.9.1, dependency-type: direct:production, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: pytest-cov, dependency-version: 7.1.0, dependency-type: direct:development, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.24, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: git-changelog, dependency-version: 2.9.2, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 3 updates: [pypdf](https://github.com/py-pdf/pypdf), [ty](https://github.com/astral-sh/ty) and [git-changelog](https://github.com/pawamoy/git-changelog).

- Updates `pypdf` from 6.9.1 to 6.9.2: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.9.1...6.9.2)
- Updates `ty` from 0.0.24 to 0.0.26: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.24...0.0.26)
- Updates `git-changelog` from 2.9.2 to 2.9.3: [Release notes](https://github.com/pawamoy/git-changelog/releases), [Changelog](https://github.com/pawamoy/git-changelog/blob/main/CHANGELOG.md), [Commits](pawamoy/git-changelog@2.9.2...2.9.3)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.9.2, dependency-type: direct:production, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.26, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: git-changelog, dependency-version: 2.9.3, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps the minor-and-patch group with 5 updates in the / directory:

| Package | From | To |
| --- | --- | --- |
| [pypdf](https://github.com/py-pdf/pypdf) | `6.9.2` | `6.10.0` |
| [pillow](https://github.com/python-pillow/Pillow) | `12.1.1` | `12.2.0` |
| [rapidfuzz](https://github.com/rapidfuzz/RapidFuzz) | `3.14.3` | `3.14.5` |
| [pytest](https://github.com/pytest-dev/pytest) | `9.0.2` | `9.0.3` |
| [ty](https://github.com/astral-sh/ty) | `0.0.26` | `0.0.29` |

- Updates `pypdf` from 6.9.2 to 6.10.0: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.9.2...6.10.0)
- Updates `pillow` from 12.1.1 to 12.2.0: [Release notes](https://github.com/python-pillow/Pillow/releases), [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst), [Commits](python-pillow/Pillow@12.1.1...12.2.0)
- Updates `rapidfuzz` from 3.14.3 to 3.14.5: [Release notes](https://github.com/rapidfuzz/RapidFuzz/releases), [Changelog](https://github.com/rapidfuzz/RapidFuzz/blob/main/CHANGELOG.rst), [Commits](rapidfuzz/RapidFuzz@v3.14.3...v3.14.5)
- Updates `pytest` from 9.0.2 to 9.0.3: [Release notes](https://github.com/pytest-dev/pytest/releases), [Changelog](https://github.com/pytest-dev/pytest/blob/main/CHANGELOG.rst), [Commits](pytest-dev/pytest@9.0.2...9.0.3)
- Updates `ty` from 0.0.26 to 0.0.29: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.26...0.0.29)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.10.0, dependency-type: direct:production, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: pillow, dependency-version: 12.2.0, dependency-type: direct:production, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: rapidfuzz, dependency-version: 3.14.5, dependency-type: direct:production, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: pytest, dependency-version: 9.0.3, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.29, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 5 to 6.

- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/main/CHANGELOG.md)
- [Commits](codecov/codecov-action@v5...v6)

updated-dependencies:
- dependency-name: codecov/codecov-action, dependency-version: '6', dependency-type: direct:production, update-type: version-update:semver-major

Signed-off-by: dependabot[bot] <support@github.com>
Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version.

- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@v45.0.0...v82.0.1)

updated-dependencies:
- dependency-name: setuptools, dependency-version: 82.0.1, dependency-type: direct:development

Signed-off-by: dependabot[bot] <support@github.com>
Bumps the minor-and-patch group with 3 updates in the / directory: [pypdf](https://github.com/py-pdf/pypdf), [pre-commit](https://github.com/pre-commit/pre-commit) and [ty](https://github.com/astral-sh/ty).

- Updates `pypdf` from 6.10.0 to 6.10.2: [Release notes](https://github.com/py-pdf/pypdf/releases), [Changelog](https://github.com/py-pdf/pypdf/blob/main/CHANGELOG.md), [Commits](py-pdf/pypdf@6.10.0...6.10.2)
- Updates `pre-commit` from 4.5.1 to 4.6.0: [Release notes](https://github.com/pre-commit/pre-commit/releases), [Changelog](https://github.com/pre-commit/pre-commit/blob/main/CHANGELOG.md), [Commits](pre-commit/pre-commit@v4.5.1...v4.6.0)
- Updates `ty` from 0.0.29 to 0.0.32: [Release notes](https://github.com/astral-sh/ty/releases), [Changelog](https://github.com/astral-sh/ty/blob/main/CHANGELOG.md), [Commits](astral-sh/ty@0.0.29...0.0.32)

updated-dependencies:
- dependency-name: pypdf, dependency-version: 6.10.2, dependency-type: direct:production, update-type: version-update:semver-patch, dependency-group: minor-and-patch
- dependency-name: pre-commit, dependency-version: 4.6.0, dependency-type: direct:development, update-type: version-update:semver-minor, dependency-group: minor-and-patch
- dependency-name: ty, dependency-version: 0.0.32, dependency-type: direct:development, update-type: version-update:semver-patch, dependency-group: minor-and-patch

Signed-off-by: dependabot[bot] <support@github.com>