Java to Python Test Suite

Verification-first test infrastructure for secure, dependency-aware Java to Python translation services.

Executive Summary

This project uses a Python + pytest stack because it combines expressive tests, async API coverage, and security-focused validation in one framework. The suite is built around comparison and traceability: each major requirement maps to marker groups, assertion patterns, dependency-ordering tests, and visual models.

Why This Stack, How It Is Used, and Benefits

Technology	Why Used	How Used in This Suite	Benefit Over Alternatives
Python 3.11+	Fast iteration and excellent testing ecosystem	Executes all test layers and fixture logic	Lower friction than Java/JUnit for mixed async + security test authoring
pytest	Marker-based structure and fixture system	Separates unit/integration/correctness/negative/adversarial pipelines	Better parametrization and fixture ergonomics than unittest
pytest-asyncio	Native async compatibility	Runs async endpoint tests without custom event-loop wrappers	Cleaner than ad-hoc loop management
httpx + ASGITransport	In-process API contract testing	Calls API endpoints with dependency overrides and mock backends	Faster and more deterministic than external server + requests
cryptography + PyJWT	Realistic auth-path verification	Generates RSA keys and signs test JWTs at runtime	Stronger coverage than static token-only tests
javalang	Java structure awareness in validation workflows	Supports parser-oriented assertions in unit tests	More reliable than regex-only Java parsing checks

Important

The suite verifies not only correctness, but also translation safety and dependency order requirements, including base-class-before-subclass guarantees through topological sorting tests.

Table of Contents

Executive Summary
Overview
Requirements to Validation Mapping
Requirements Verification and Validation
Architecture
Object Model
Dependency Graph and Topological Sort
Why Kahn's Algorithm Matters Here
Visualization as a Verification Tool
Technology Stack Decision Matrix
Test Suite Breakdown
Scientific and Computer Science Algorithm Catalog
Tool Compliance for Top Secret SCI/SCIF Regulated Environments
Setup and Installation
Usage
Roadmap
Contributing
License

Overview

This repository is a dedicated test harness for a Java-to-Python translation service. It validates parser behavior, method/type fidelity, API contract integrity, authorization controls, guardrail enforcement, and adversarial resilience. It is designed for teams that need reproducible quality and security checks before releasing translation features.

Important

The suite assumes that the external orchestrator source path and the environment variables configured in `conftest.py` are available.

Core value for this project:

Confirms required behavior with explicit assertions (not heuristic checks only).
Compares expected ordering and output properties against actual responses.
Detects failures in dependency ordering and cycle handling early.
Verifies that reusable base components are translated before the classes that depend on them.

Top-level repository layout:

.github
api
core
fixtures
guardrails
tests
tools
.gitignore
CODE_OF_CONDUCT.md
CONTRIBUTING.md
LICENSE
README.md
SUPPORT.md
conftest.py
main.py
pytest.ini
requirements.txt

Requirements to Validation Mapping

Requirement	Implementation Focus	Evidence in Test Suite	Outcome Verified
Parse Java artifacts safely	Parser and class-info extraction paths	`tests/unit/test_java_parsing.py`	AST/data extraction is stable for normal and malformed inputs
Build dependency graph correctly	Intra-project edge construction	`tests/unit/test_dependency_graph.py`	No self-loops, no edges to JDK classes, valid class map
Sort translation order by dependency	Topological ordering logic	`tests/unit/test_topological_sort.py`	Dependencies appear before dependent classes
Translate base classes before subclasses	Ordering invariant in project translation plan	`tests/unit/test_topological_sort.py` and `tests/integration/test_project_translate_api.py`	Base abstractions precede concrete subclasses/services
Detect cycles without dropping files	Cycle fallback behavior	`tests/adversarial/test_circular_dependencies.py` and unit cycle tests	`had_cycle` is true and all files remain represented
Block unsafe or manipulative input	Input guardrails	`tests/adversarial/test_prompt_injection.py` and `tests/unit/test_guardrails.py`	Injection/secret patterns rejected before reaching the model
Enforce RBAC and policy boundaries	JWT + permission checks	`tests/negative/test_rbac_enforcement.py`	Unauthorized roles/actions are denied

Requirements Verification and Validation

V&V Strategy Matrix

Requirement Area	Verification Method	Validation Method	Pass Criteria	Primary Evidence
Dependency graph correctness	Unit assertions on graph edges and node invariants	Integration checks of API dependency output	No self-loops, no missing files, dependency-first order	`tests/unit/test_dependency_graph.py`, `tests/integration/test_project_translate_api.py`
Topological ordering (base before subclass)	Unit invariant checks for order index relationships	Project-level translate API response order checks	For every edge where A depends on B, index(B) < index(A)	`tests/unit/test_topological_sort.py`, `tests/integration/test_project_translate_api.py`
Cycle detection robustness	Unit and adversarial cycle test scenarios	End-to-end circular project request handling	`had_cycle` true on cyclic input, all files retained in output	`tests/adversarial/test_circular_dependencies.py`, `tests/unit/test_topological_sort.py`
Security guardrails	Unit and adversarial pattern blocking tests	API-level blocked request behavior checks	Injection and credential patterns rejected before unsafe processing	`tests/unit/test_guardrails.py`, `tests/adversarial/test_prompt_injection.py`
RBAC and auth correctness	Negative role/permission tests	Unauthorized API paths return denied responses	Role permissions enforced with no privilege escalation	`tests/negative/test_rbac_enforcement.py`, integration auth tests
Output structure fidelity	Correctness tests over syntax/import/signatures	Workflow-level usage consistency checks	Outputs remain parseable and structurally aligned to expectations	`tests/correctness/*.py`
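
The dependency-order pass criterion can be checked mechanically. A minimal sketch, assuming the ordering under test is available as a list of class names and each dependency edge as a `(dependent, dependency)` pair; `order_violations` is a hypothetical helper, not a function from this repository.

```python
from collections.abc import Iterable, Sequence


def order_violations(
    order: Sequence[str],
    edges: Iterable[tuple[str, str]],
) -> list[tuple[str, str]]:
    """Return every (dependent, dependency) edge that breaks the pass criterion.

    An edge (a, b) reads "a depends on b", so it passes only when
    index(b) < index(a). Self-loops always fail (equal indices), and edges
    naming a class missing from `order` fail because no file may be dropped.
    """
    position = {name: i for i, name in enumerate(order)}
    violations = []
    for dependent, dependency in edges:
        if dependent not in position or dependency not in position:
            violations.append((dependent, dependency))
        elif position[dependency] >= position[dependent]:
            violations.append((dependent, dependency))
    return violations
```

An empty result means the order satisfies the criterion; a non-empty result names exactly which edges to investigate.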

Validation Acceptance Gates

Gate	Scope	Command Pattern	Minimum Acceptance
Gate 1	Core logic verification	`pytest -m unit -q`	All dependency/order/parser tests pass
Gate 2	API contract verification	`pytest -m integration -q`	Endpoint contract fields and ordering checks pass
Gate 3	Security validation	`pytest -m negative -q && pytest -m adversarial -q`	RBAC, injection, and egress/model policy checks pass
Gate 4	Output quality validation	`pytest -m correctness -q`	Output syntax/structure/import quality checks pass
Gate 5	Full-system confidence	`pytest -q`	No regressions across all marker groups

Ordering Check	Why It Matters	Test Evidence
`Order` before `OrderService`	Service methods require model definitions first	Unit and integration ordering assertions
`AbstractProcessor` before `PaymentProcessor`	Subclass translation needs base contract context	Unit topological ordering assertions
`IRepository` before `OrderRepository`	Interface constraints should be available before implementation	Unit topological ordering assertions
Cycle path still returns all files	Production robustness under imperfect source graphs	Circular dependency adversarial/unit tests
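
The ordering checks above can be exercised against a compact version of Kahn's algorithm. This is a sketch, not the code in `tools/project_translator.py`: it assumes the dependency graph arrives as a mapping from each class to the classes it depends on, and the cycle fallback (appending unplaced classes in sorted order) is an assumed strategy chosen to satisfy the "all files retained" criterion.

```python
from collections import deque


def kahn_order(deps: dict[str, set[str]]) -> tuple[list[str], bool]:
    """Order classes so every dependency precedes its dependents.

    `deps` maps each class to the classes it depends on. Returns
    (order, had_cycle). On a cycle, the classes Kahn's algorithm could not
    place are appended in sorted order so no file is dropped.
    """
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    # in_degree[n] = number of dependencies of n not yet placed in the order
    in_degree = {n: 0 for n in nodes}
    dependents: dict[str, list[str]] = {n: [] for n in nodes}
    for node, ds in deps.items():
        for d in ds:
            in_degree[node] += 1
            dependents[d].append(node)

    # Sorted seeding and sorted expansion keep tie-breaking deterministic.
    queue = deque(sorted(n for n in nodes if in_degree[n] == 0))
    order: list[str] = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in sorted(dependents[node]):
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)

    # Nodes left over but never reaching in-degree 0 are on or behind a cycle.
    had_cycle = len(order) < len(nodes)
    if had_cycle:
        order.extend(sorted(nodes - set(order)))
    return order, had_cycle
```

For the three example pairs in the table, `kahn_order({"OrderService": {"Order"}, "PaymentProcessor": {"AbstractProcessor"}, "OrderRepository": {"IRepository", "Order"}})` places `Order`, `AbstractProcessor`, and `IRepository` ahead of the classes that depend on them and reports `had_cycle` as `False`.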

Visualization as a Verification Tool

Visualization	Confirms	Comparison Benefit
Architecture flowchart	End-to-end validation pipeline	Quickly spots missing validation layers
Object model diagram	Data structures and relationships	Confirms required fields exist for assertions
Dependency graph diagram	Expected dependency direction	Makes ordering mistakes obvious during review
Kahn sequence diagram	Algorithm steps and outputs	Aligns function behavior with requirement statements

Technology Stack Decision Matrix

Stack Part	Chosen Option	Alternative	Why Chosen for This Project	Practical Benefit
Test framework	pytest	unittest	Marker groups and fixture composition scale better for layered suites	Faster targeted runs and cleaner test organization
Async testing	pytest-asyncio	custom loop management	Native async test support without boilerplate	Lower maintenance and fewer flaky async tests
API client	httpx + ASGITransport	requests + live server	In-process execution keeps integration tests deterministic	Better speed and less CI networking variability
Auth validation	cryptography + PyJWT	static token strings	Runtime key/signature generation tests real verification paths	Higher confidence in RBAC behavior
Java structure parsing	javalang	regex parsing	Structural parsing avoids brittle text matching	More robust dependency and class extraction checks

Test Concern	Main Technology	Role
Parser and graph correctness	pytest + javalang	Validates class extraction and dependency edges
Endpoint behavior	pytest-asyncio + httpx	Exercises translate endpoints and payload contracts
RBAC and token handling	cryptography + PyJWT	Generates realistic signed JWTs for role checks
Guardrails and adversarial handling	pytest markers + fixtures	Enforces injection/secret blocking expectations

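Testing & Quality Assurance Tool Integration Matrix

Static Code Analysis Tools (Top-to-Bottom Requirements Verification)

Tool	Purpose	Integration Point	Validates	Python Support	Cost Model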
Klocwork (Perforce)	SAST - Security, quality, reliability	Pre-commit hooks, CI/CD pipeline	Security vulnerabilities, code defects, reliability issues	✅ Yes	Enterprise/Commercial
SonarQube	Code quality & maintainability	Post-test analysis, quality gates	Code quality, technical debt, duplication, test coverage	✅ Yes	Open-source/Commercial
Checkmarx (SAST)	Enterprise security scanning	Pipeline integration, compliance	Deep vulnerability analysis, compliance standards, OWASP	✅ Yes	Enterprise/Commercial
Coverity (Synopsys)	Deep static analysis	Build integration, incremental analysis	Memory/security issues, race conditions	✅ Yes	Enterprise/Commercial
Bandit	Python security scanning	Pre-commit, CI integration	Python security issues, hardcoded secrets	✅ Yes (Python-specific)	Open-source
ESLint/Pylint	Linting & style	Git hooks, pre-flight checks	Code style, suspicious patterns, imports	✅ Yes (Pylint; ESLint targets JavaScript)	Open-source

Test Execution & Measurement Tools

Tool	Purpose	Integration Point	Metrics Collected	Use Case	Cost
pytest (current)	Unit/integration test framework	Direct test runner	Pass/fail, execution time	Core test execution	Open-source
pytest-cov	Code coverage measurement	Coverage plugin, post-test	Line/branch coverage %	Verify guardrails touch all code paths	Open-source
Codecov	Coverage tracking & trending	CI upload, GitHub integration	Coverage trends, PR diffs	Long-term quality visibility	Free/Pro
Datadog	Continuous testing & monitoring	API instrumentation	Test performance, flakiness	Detect regression patterns	Commercial
LoadRunner	Performance and load testing	Scheduled pipeline stage, release gate	Response times, throughput, error rate, SLA compliance	Validate API under expected translation volume	Commercial

Mutation Testing (Test Quality Verification)

Tool	Purpose	How It Works	Value for This Suite	Python Support
Stryker	Mutation testing framework	Modifies code, reruns tests	Verifies tests catch real bugs	❌ No (JavaScript/TypeScript, C#, Scala; mutmut covers Python)
PIT	Bytecode mutation (Java/JVM)	Mutates compiled bytecode	Validates the JUnit tests around the legacy Java code	❌ No (JVM only)

Dependency & Supply Chain Security

Tool	Purpose	Scans	Integration	Python Support
Snyk	Dependency vulnerability scanning	requirements.txt, package manifests	Pre-commit, PR checks, CI	✅ Yes
OWASP Dependency-Check	Known vulnerability database	Dependencies, transitive	CLI, Maven/Gradle, CI	⚠️ Experimental (Python analyzers)
Black Duck (Synopsys)	License/composition analysis	Codebases, dependencies	CI pipeline, compliance	✅ Yes
pip-audit	Python package auditing	pip requirements	GitHub Actions, pre-commit	✅ Yes (Python-specific)

Requirements Verification & Traceability Tools

Tool	Function	Integration	Traceability	Compliance
Azure DevOps Test Plans	Requirements↔Tests mapping	Work items, test suites	Bi-directional links	CMMI/ISO ready
Jira Xray	Test management within Jira	Issues, test runs, coverage	Requirement→Test→Result	Regulatory (FDA, etc.)
TestRail	Standalone test management	API, CI integration	Test case traceability	SOC 2, HIPAA compatible
ReqIF Editor	Requirements interchange format	File-based traceability	Spec→Design→Test	Automotive (ASIL) standard

DevOps & CI/CD Integration Points

Pipeline Stage	Tool Category	Recommended Tool	What It Checks
Pre-commit	Linting + Security	Bandit, Pylint, Pre-commit hooks	Fast rejection of obvious issues
Build	Static Analysis	Klocwork, SonarQube scanner	Deep security & quality analysis
Test	Execution + Coverage	pytest + pytest-cov	Functional correctness, coverage %
Mutation	Test Quality	mutmut (Python) or PIT (Java)	Are tests strong enough?
Dependency Scan	Supply Chain	Snyk + pip-audit	Known vulnerabilities in deps
Compliance	Reporting	SonarQube/Checkmarx dashboards	Meet quality gates, audit trail

LoadRunner Performance Integration

Endpoint	Suggested LoadRunner Transaction	Default SLA	Primary Assertion	Current Project Hook
`/api/v1/translate`	`translate`	250 ms	Median and p95 stay within SLA	Audit log writes `loadrunner` transaction summary
`/api/v1/translate-project`	`translate_project`	500 ms	Multi-file requests stay below release threshold	Audit log writes per-request performance budget status
`/api/v1/translate-requirements`	`translate_requirements`	250 ms	Requirements scaffolding stays responsive	Audit log writes Six Sigma-style CTQ metrics
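
The median/p95 assertion can be reproduced offline from exported latency samples. This sketch assumes the default budgets above, applies the median-and-p95 rule to all three transactions (an assumption; the table words the project threshold differently), and uses the nearest-rank percentile method. LoadRunner reports its own percentiles, which may interpolate differently, and `check_sla` is a hypothetical helper.

```python
import math
import statistics

# Default SLA budgets from the table above, in milliseconds.
SLA_MS = {"translate": 250, "translate_project": 500, "translate_requirements": 250}


def p95(samples_ms: list[float]) -> float:
    """95th percentile by the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(len(ordered) * 95 / 100)  # 1-based rank
    return ordered[rank - 1]


def check_sla(transaction: str, samples_ms: list[float]) -> dict[str, object]:
    """Apply the 'median and p95 within budget' assertion to one transaction."""
    budget = SLA_MS[transaction]
    median = statistics.median(samples_ms)
    tail = p95(samples_ms)
    return {
        "transaction": transaction,
        "median_ms": median,
        "p95_ms": tail,
        "budget_ms": budget,
        "passed": median <= budget and tail <= budget,
    }
```

Checking both statistics matters: a run whose median is comfortably low can still fail because a few slow requests push the p95 past the budget.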

Release Dashboard Endpoint

Dashboard Section	Aggregates	Why It Matters For Release Decisions
`summary`	Total requests, ok requests, blocked requests, unique actions	Quick go/no-go snapshot
`actions`	Per-endpoint request count, average latency, p95 latency, LoadRunner pass rate	Shows which endpoint is drifting
`performance`	Global average latency, p95 latency, performance status counts	Highlights SLA breaches and warning trends
`quality`	CTQ pass rates, average DPMO, sigma-band counts, control-state counts	Converts raw audit events into process-quality signals
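
A sketch of how the `summary` and `actions` sections could be aggregated from audit records. The record fields (`action`, `status`, `latency_ms`, `loadrunner_pass`) are assumptions, not the project's documented audit schema, and the `performance` and `quality` sections are omitted.

```python
import math
from collections import defaultdict


def _p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(len(ordered) * 95 / 100) - 1]


def build_dashboard(records: list[dict]) -> dict:
    """Aggregate audit records into `summary` and per-endpoint `actions`."""
    by_action: dict[str, list[dict]] = defaultdict(list)
    for rec in records:
        by_action[rec["action"]].append(rec)

    summary = {
        "total_requests": len(records),
        "ok_requests": sum(r["status"] == "ok" for r in records),
        "blocked_requests": sum(r["status"] == "blocked" for r in records),
        "unique_actions": len(by_action),
    }
    actions = {}
    for action, recs in sorted(by_action.items()):
        latencies = [r["latency_ms"] for r in recs]
        actions[action] = {
            "count": len(recs),
            "avg_latency_ms": sum(latencies) / len(latencies),
            "p95_latency_ms": _p95(latencies),
            "loadrunner_pass_rate": sum(r["loadrunner_pass"] for r in recs) / len(recs),
        }
    return {"summary": summary, "actions": actions}
```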

Testing Algorithm Matrix

Algorithm / Technique	What It Does	Where It Appears In This Project	Why It Improves Confidence
Topological sorting (Kahn)	Orders dependent nodes safely	`tools/project_translator.py`, `tests/unit/test_topological_sort.py`	Prevents subclass-before-base translation defects
Boundary value analysis	Hits min/max and edge inputs	`tests/adversarial/test_boundary_conditions.py`	Finds off-by-one and empty-input failures quickly
Equivalence partitioning	Tests one representative per input class	Guardrail and malformed-input tests	Keeps coverage broad without exploding test count
Decision-table testing	Covers combinations of conditions and outcomes	RBAC and forbidden-pattern tests	Ensures policy combinations do not create gaps
State-transition testing	Verifies behavior across state changes	Audit trail blocked/allowed request scenarios	Confirms system reacts correctly as request status changes
Cycle detection	Detects unsortable dependency graphs	`tests/adversarial/test_circular_dependencies.py`	Verifies graceful degradation on invalid project graphs
Mutation testing	Injects fake bugs to measure test strength	Documented via the `mutmut` integration path	Confirms tests fail when logic is wrong
Load testing	Measures latency and throughput under concurrency	LoadRunner integration and audit metrics	Protects release readiness under realistic traffic
Risk-based prioritization	Focuses effort on highest-risk paths	Negative, adversarial, and auth tests	Keeps security-critical paths heavily defended
Pairwise / combinatorial sampling	Reduces huge input combinations to meaningful pairs	Recommended next step for API option matrices	Expands coverage efficiently for future input flags
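
Decision-table testing for RBAC enumerates every combination of conditions and compares the outcome against an independently written table of expected results. The roles, actions, and `is_allowed` stand-in below are hypothetical; in the real suite the call would go through the API with a signed JWT.

```python
from itertools import product

# Hypothetical role-to-permission map standing in for the service policy.
PERMISSIONS = {
    "admin": {"translate", "translate_project", "view_audit"},
    "translator": {"translate", "translate_project"},
    "viewer": {"view_audit"},
}
ROLES = ["admin", "translator", "viewer", "unknown"]  # "unknown" checks default deny
ACTIONS = ["translate", "translate_project", "view_audit"]


def is_allowed(role: str, action: str, token_valid: bool) -> bool:
    """Stand-in for the system under test."""
    return token_valid and action in PERMISSIONS.get(role, set())


# Independent oracle: the (role, action) rows that should be allowed
# when the token is valid. Every other combination must be denied.
EXPECTED_ALLOW = {
    ("admin", "translate"), ("admin", "translate_project"), ("admin", "view_audit"),
    ("translator", "translate"), ("translator", "translate_project"),
    ("viewer", "view_audit"),
}


def decision_table_failures() -> list[tuple[str, str, bool]]:
    """Return every condition combination whose outcome disagrees with the table."""
    failures = []
    for role, action, token_valid in product(ROLES, ACTIONS, [True, False]):
        expected = token_valid and (role, action) in EXPECTED_ALLOW
        if is_allowed(role, action, token_valid) != expected:
            failures.append((role, action, token_valid))
    return failures
```

Writing the expected table separately from the policy is the design point: a policy edit that silently grants an extra permission shows up as a failing row instead of being copied into the oracle.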

Six Sigma and Process Quality Matrix

Six Sigma Idea	Meaning In Plain Terms	Project Implementation	Evidence / Metric
CTQ (Critical to Quality)	The small set of outcomes that must go right	Audit records track latency, reliability, safety, traceability	`ctq_metrics` in audit log
DMAIC	Define, Measure, Analyze, Improve, Control loop	README traceability + tests + audit metrics + quality gates	Requirements tables, tests, and audit trail
DPMO	Defects per million opportunities	Quality snapshot computes DPMO per request	`six_sigma.dpmo` in audit log
Control state	Is the process stable or drifting?	Requests classified as `in_control`, `watch`, or `out_of_control`	`six_sigma.control_state`
Performance control limits	Expected latency window before escalation	Per-endpoint SLA budgets in env and audit metrics	`performance_budget_ms`, `performance_status`
FMEA mindset	Rank likely failures before release	Negative/adversarial suites focus on auth, injection, model lock, egress	Security-focused test groups
Voice of customer / CTQ translation	Convert user needs into measurable gates	README requirement tables map behavior to tests and tooling	Traceability matrices
Continuous improvement	Use data from each run to tighten the process	Audit + coverage + static analysis + performance gates	CI pipeline and audit summaries
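
DPMO follows the standard definition, defects × 1,000,000 / (units × opportunities per unit), and the conventional short-term sigma level adds a 1.5σ shift to the normal quantile of the yield. A sketch using only the standard library; the `control_state` cut-offs (roughly 4σ and 3σ) are assumptions, since the project's thresholds for `in_control`, `watch`, and `out_of_control` are not stated here.

```python
import math
from statistics import NormalDist


def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities."""
    return defects * 1_000_000 / (units * opportunities_per_unit)


def sigma_level(dpmo_value: float, shift: float = 1.5) -> float:
    """Short-term sigma level: normal quantile of the yield plus the conventional shift."""
    if dpmo_value <= 0:
        return math.inf  # zero observed defects: no finite quantile exists
    if dpmo_value >= 1_000_000:
        return -math.inf  # every opportunity defective
    return NormalDist().inv_cdf(1 - dpmo_value / 1_000_000) + shift


def control_state(dpmo_value: float) -> str:
    """Hypothetical cut-offs; replace with the project's configured limits."""
    if dpmo_value <= 6_210:  # about 4 sigma or better
        return "in_control"
    if dpmo_value <= 66_807:  # about 3 sigma
        return "watch"
    return "out_of_control"
```

With this convention, 3.4 DPMO corresponds to the familiar "six sigma" level.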

Scientific and Computer Science Algorithm Catalog

Graph Theory and Dependency Analysis Algorithms

Algorithm	What It Is	Most Common Use	Why It Should Be Used	How It Helps This Project	When Not To Use
Kahn's (implemented)	In-degree based topological sort for DAGs	Build order resolution, dependency scheduling	Deterministic ordering when ties are broken consistently, and clear cycle detection when nodes remain but none has zero in-degree	Already used to order Java classes before translation so base classes are processed before dependents	Not for weighted path problems or graphs that are not DAG-like
Tarjan's SCC	One-pass DFS algorithm that finds all strongly connected components	Cycle grouping in directed graphs, compilers, package analyzers	Linear-time cycle group discovery and reverse-topological SCC output	Can report all dependency cycles at once with grouped diagnostics for project translation failures	Not needed for tiny graphs where simple cycle-exists checks are enough
Kosaraju's SCC	Two-pass DFS SCC algorithm over graph and reversed graph	SCC extraction when implementation simplicity is preferred	Easy to reason about and verify for correctness	Alternate SCC implementation for cross-validating cycle group results from Tarjan	Less ideal when memory access to reverse graph is costly or graph is streaming
DFS/BFS	Fundamental graph traversals for depth or level exploration	Reachability, component discovery, shortest unweighted paths (BFS)	Foundational and fast, useful in almost every graph pipeline	DFS supports dependency walk and cycle heuristics; BFS can identify translation batches by level	Not enough alone when you need weighted optimization, SCC grouping, or formal ordering guarantees
Dijkstra	Shortest-path algorithm for non-negative weighted graphs	Routing, minimum-cost paths	Finds best path under weighted constraints efficiently	Can prioritize translation sequence by cost/risk weights (complexity, blast radius, module criticality)	Not for negative edge weights, where Bellman-Ford style methods are required
Floyd-Warshall	Dynamic programming for all-pairs shortest paths	Dense graph all-pairs analysis, transitive reachability	Gives full matrix visibility into every pair relationship	Useful for full dependency impact maps and change blast-radius analysis	Avoid on large sparse graphs due to cubic cost
Union-Find	Disjoint-set structure with union/find operations	Connectivity checks, incremental grouping, Kruskal-like workflows	Near-constant amortized time for unions and same-set checks	Can speed incremental dependency ingestion and fast connectivity sanity checks before deeper analysis	Not suitable for directed SCC semantics or ordered traversal outputs
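
Tarjan's algorithm can report every dependency cycle at once. A recursive sketch, assuming the graph is a mapping from each class to the classes it depends on; for very large projects an iterative version would avoid Python's recursion limit. `tarjan_scc` and `dependency_cycles` are hypothetical names.

```python
def tarjan_scc(graph: dict[str, set[str]]) -> list[list[str]]:
    """Return strongly connected components in Tarjan's reverse-topological order."""
    index_of: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    on_stack: set[str] = set()
    stack: list[str] = []
    components: list[list[str]] = []
    counter = 0
    nodes = set(graph) | {m for targets in graph.values() for m in targets}

    def visit(node: str) -> None:
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for nxt in sorted(graph.get(node, ())):  # sorted for repeatable output
            if nxt not in index_of:
                visit(nxt)
                lowlink[node] = min(lowlink[node], lowlink[nxt])
            elif nxt in on_stack:
                lowlink[node] = min(lowlink[node], index_of[nxt])
        if lowlink[node] == index_of[node]:  # node is the root of an SCC
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(sorted(component))

    for node in sorted(nodes):
        if node not in index_of:
            visit(node)
    return components


def dependency_cycles(graph: dict[str, set[str]]) -> list[list[str]]:
    """Keep only SCCs that are real cycles: several members, or a self-loop."""
    return [c for c in tarjan_scc(graph) if len(c) > 1 or c[0] in graph.get(c[0], ())]
```

Grouping cycles this way lets a failed project translation report `["A", "B", "C"]` as one cycle rather than as three unrelated ordering errors.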

Code Analysis and Transformation

Algorithm	What It Is	Most Common Use	Why It Should Be Used	How It Helps This Project	When Not To Use
AST Traversal (implemented)	Tree walk over parsed syntax nodes	Compilers, linters, refactoring, static analyzers	Preserves structural meaning better than regex parsing	Already powers Java structure extraction for classes/imports/method signatures	Not for runtime behavior reasoning without control/data flow context
Tree Edit Distance (Zhang-Shasha)	Minimum edit cost between two trees	AST diffing, clone analysis, migration similarity checks	Captures structural differences not visible in plain text diff	Can score Java vs translated Python AST fidelity for stronger parity evidence	Avoid for very large trees in hot paths due to higher compute cost
CFG	Graph model of possible execution paths in a function/method	Dead code detection, path analysis, coverage planning	Exposes branch structure and reachability explicitly	Can verify translated Python keeps equivalent branch reachability vs Java	Not needed for simple straight-line code with no branching
Data-Flow Analysis	Tracks definitions, uses, and propagation of values/types	Compiler optimization, bug finding, security checks	Detects misuse and propagation mistakes early	Can validate Java type/variable semantics survive mapping into Python	Avoid when analysis precision cost exceeds value for trivial modules
Program Slicing	Extracts statements relevant to a variable/output criterion	Debugging, comprehension, targeted verification	Reduces analysis scope and noise	Isolates only code affecting a translated output to speed parity root-cause analysis	Not ideal when holistic system interactions are the real issue
Taint Analysis (implemented conceptually)	Marks untrusted input and tracks flow to sensitive sinks	Security validation, injection prevention	Directly maps to security risk pathways	Supports guardrail hardening by tracing untrusted request data through translation pipeline	Not useful when all inputs are already trusted and isolated
Hindley-Milner Type Inference	Unification-based static type inference	Functional languages, inferred typing systems	Improves correctness with less manual annotation	Could auto-suggest Python type hints from Java source semantics	Not a fit where dynamic/runtime types dominate behavior
Abstract Interpretation	Sound approximation of program states over abstract domains	Static verification and bug class elimination	Can prove classes of errors without executing code	Can add formal assurance on translated output safety properties	Avoid where exact concrete behavior is mandatory and approximation is too coarse

Pattern Matching and Security

Algorithm	What It Is	Most Common Use	Why It Should Be Used	How It Helps This Project	When Not To Use
Aho-Corasick	Trie + failure-link automaton for multi-pattern search	IDS signatures, malware scanning, keyword dictionaries	Finds all patterns in one pass efficiently	Can replace sequential guardrail regex checks with one multi-pattern scanner for injection/secrets	Not ideal for complex contextual patterns better handled by full parsers or regex engines
Rabin-Karp	Rolling-hash string matching approach	Plagiarism/clone detection, multiple substring checks	Fast average matching and convenient window hashing	Can detect repeated risky snippets or clone patterns across translated outputs	Avoid when hash collision handling overhead or exact single-pattern speed is critical
Boyer-Moore	Heuristic skip-based exact pattern matcher	Fast exact search in large text	Often sublinear average performance for single pattern	Useful for fast scanning of one high-priority forbidden token/signature	Not for many patterns at once; Aho-Corasick is better there
Bloom Filter	Probabilistic membership structure with false positives only	Caching, prefiltering, dedupe prechecks	Very memory-efficient and fast precheck stage	Can fast-pass payloads whose tokens are definitely absent from the blocklist, reserving expensive deep scans for possible matches	Not for workflows requiring zero false positives and exact membership
Levenshtein Distance	Edit-distance metric between strings	Fuzzy matching, near-duplicate detection, typo tolerance	Quantifies similarity robustly	Can score translation drift and flag suspiciously divergent output from expected behavior/text	Avoid for strict semantic equivalence judgments without structural context
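
A compact Aho-Corasick automaton shows how several guardrail patterns can be matched in a single pass over the input. The patterns in the usage example are illustrative only and are not the project's actual guardrail list; lower-casing the input is an assumed normalization step.

```python
from collections import deque


class AhoCorasick:
    """Multi-pattern matcher: finds every occurrence of every pattern in one pass."""

    def __init__(self, patterns: list[str]) -> None:
        self.goto: list[dict[str, int]] = [{}]  # trie transitions per state
        self.fail: list[int] = [0]  # failure link per state
        self.output: list[list[str]] = [[]]  # patterns recognized at each state
        for pattern in patterns:
            state = 0
            for ch in pattern:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.output.append([])
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.output[state].append(pattern)
        # Breadth-first pass: each failure link points at the longest proper
        # suffix of the current prefix that is also a prefix in the trie.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.output[nxt] += self.output[self.fail[nxt]]

    def search(self, text: str) -> list[tuple[int, str]]:
        """Return (start_index, pattern) for every match in `text`."""
        matches = []
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.output[state]:
                matches.append((i - len(pattern) + 1, pattern))
        return matches
```

For example, `AhoCorasick(["ignore previous instructions", "begin rsa private key"]).search(payload.lower())` scans the payload once regardless of how many patterns are registered.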

Formal Verification and Correctness

Algorithm	What It Is	Most Common Use	Why It Should Be Used	How It Helps This Project	When Not To Use
Model Checking	Exhaustive state-space verification against temporal properties	Protocol verification, safety-critical policy checks	Finds counterexamples rigorously	Can prove RBAC and policy-lock invariants over request state transitions	Avoid for very large unconstrained state spaces without abstraction
Symbolic Execution	Executes paths with symbolic values and constraints	Path discovery, bug finding, test generation	Reaches edge paths hard to hit with manual tests	Can generate adversarial API vectors to stress translation and guardrails	Not ideal when path explosion makes runtime impractical
Concolic Testing	Concrete execution guided by symbolic constraints	Automated test input generation	Practical compromise between full symbolic and random testing	Can expand coverage for translation endpoints with targeted boundary/path inputs	Avoid when harness constraints are too expensive to maintain
Hoare Logic	Pre/postcondition proof framework for program correctness	Formal specs and proof-oriented correctness	Sharp contractual reasoning around invariants	Can specify and verify required behavior for dependency ordering and policy checks	Not needed where lightweight testing already provides enough assurance
Property-Based Testing	Randomized input generation checked against invariants	Invariant testing and edge-case exploration	Finds surprising cases that example-based tests miss	Can stress graph ordering and parity invariants over large random input spaces	Avoid when properties are weakly defined or nondeterministic outputs are expected
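
Property-based testing of the ordering invariant can be sketched with the standard library alone (Hypothesis is the usual Python choice, but it is not assumed here). Random DAGs are generated so that a valid order always exists, and `graphlib.TopologicalSorter` stands in for the project's sorter; passing the real sorting function as `sort_fn` is the assumed integration point.

```python
import random
from graphlib import TopologicalSorter


def random_dag(rng: random.Random, n: int, edge_prob: float) -> dict[str, set[str]]:
    """Random DAG: a class may only depend on classes later in a shuffled list."""
    names = [f"C{i}" for i in range(n)]
    rng.shuffle(names)
    return {
        names[i]: {names[j] for j in range(i + 1, n) if rng.random() < edge_prob}
        for i in range(n)
    }


def check_order_property(sort_fn, trials: int = 200, seed: int = 0) -> None:
    """Assert that sort_fn places every dependency before its dependent."""
    rng = random.Random(seed)  # fixed seed keeps any failure reproducible
    for _ in range(trials):
        deps = random_dag(rng, rng.randint(0, 12), rng.random())
        order = sort_fn(deps)
        assert sorted(order) == sorted(deps), "every class must appear exactly once"
        position = {name: i for i, name in enumerate(order)}
        for node, ds in deps.items():
            for d in ds:
                assert position[d] < position[node], (deps, order)


def graphlib_order(deps: dict[str, set[str]]) -> list[str]:
    """Stand-in sorter: static_order() yields predecessors before dependents."""
    return list(TopologicalSorter(deps).static_order())
```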

Software Metrics

Algorithm	What It Is	Most Common Use	Why It Should Be Used	How It Helps This Project	When Not To Use
McCabe Cyclomatic Complexity	Branch/path complexity metric from control flow	Test planning and maintainability risk scoring	Correlates complexity with defect and testing effort	Can drive risk-based test intensity on translated functions/classes	Not as a sole quality signal without context
Halstead Metrics	Operator/operand based software volume and effort metrics	Productivity and maintainability analysis	Gives a language-agnostic complexity lens	Can compare source vs translated code inflation and detect complexity bloat	Avoid as hard pass/fail gates in isolation
Maintainability Index	Composite maintainability score from complexity/volume/LOC	Portfolio-level code health tracking	Easy high-level signal for triage	Can prioritize translated files for manual review when score degrades	Not reliable for very small files or generated code alone
Fan-In/Fan-Out	Counts inbound and outbound dependency edges	Architecture coupling analysis	Highlights hotspots and blast-radius risk	Can prioritize high fan-in classes for stricter parity and regression checks	Not needed for tiny low-coupling modules

Audit and Statistical Process Control

Algorithm	What It Is	Most Common Use	Why It Should Be Used	How It Helps This Project	When Not To Use
Shewhart Control Charts (implemented baseline)	Control limits over time-series process metrics	Manufacturing and ops stability monitoring	Fast detection of obvious out-of-control behavior	Already aligns to audit control-state tracking for latency/quality drift	Less sensitive to small gradual drifts
CUSUM	Cumulative drift detector versus target mean	Early shift detection in process monitoring	Detects subtle persistent changes earlier than Shewhart	Can alert on slow latency degradation before SLA breach	Not for highly non-stationary streams without segmentation
EWMA	Exponentially weighted moving average trend estimator	Smoothed monitoring and anomaly trend tracking	Balances noise reduction with responsiveness	Can provide cleaner quality/latency trendlines in audit dashboards	Avoid if abrupt shifts are the only concern and lag is unacceptable
Z-Score Anomaly Detection	Standard deviation based outlier scoring	Basic anomaly and quality outlier flags	Simple, interpretable, low implementation cost	Can flag suspicious request records for investigation in near real-time	Not for heavy-tailed or non-Gaussian distributions without robust variants
Isolation Forest	Tree-ensemble unsupervised anomaly detector	Fraud, operations anomalies, multivariate outliers	Captures nonlinear multivariate anomalies well	Can detect odd combinations of role, latency, block-rate, and payload characteristics	Avoid for tiny datasets where model instability is high
Bayesian Inference	Posterior probability updating with evidence	Risk forecasting, decision support under uncertainty	Integrates prior knowledge and new evidence rigorously	Can estimate release risk from test outcomes plus historical defects	Not needed when deterministic thresholds are sufficient
Fisher's Exact Test	Exact significance test for contingency tables	Small sample proportion comparisons	Reliable p-values for low-count events	Can test whether blocked-request spikes are statistically significant	Avoid for large-sample cases where simpler approximations are fine



Test Coverage and Combinatorial

Algorithm	What It Is	Most Common Use	Why It Should Be Used	How It Helps This Project	When Not To Use
IPOG	Covering-array generator for t-way combinations	Combinatorial API/config test design	Large coverage gains with far fewer cases than full Cartesian products	Can systematically cover role x endpoint x payload combinations with manageable test counts	Not necessary for very small parameter spaces
MC/DC Coverage	Criterion requiring each condition independently affect outcome	Safety-critical software verification	Strong decision-logic assurance with efficient test sets	Can harden guardrail and RBAC condition logic validation	Avoid as universal requirement for low-risk modules due to overhead
Coverage-Guided Fuzzing	Mutation fuzzing guided by code coverage feedback	Security hardening and crash discovery	Efficiently discovers deep parser/validation edge cases	Can stress translation endpoints with malformed/adversarial Java inputs	Not ideal where deterministic reproducibility and strict runtime budgets dominate
N-version/Differential Testing	Compare outputs across independent implementations	Compiler/runtime verification and migration confidence	Effective at finding semantic mismatches	Can continuously compare legacy Java oracle outputs against translated Python outputs	Not useful if all compared implementations share the same defect source
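
The IPOG row can be made concrete with a short sketch. The following is a simplified, IPOG-style (in-parameter-order) pairwise generator written for illustration only; it is not the suite's implementation or a tuned tool such as NIST ACTS. `ipog_pairwise` and `covers_all_pairs` are hypothetical names, and parameter values are assumed to be hashable and never `None` (the sketch uses `None` internally as a don't-care marker).

```python
from itertools import combinations, product


def ipog_pairwise(params: list[list]) -> list[tuple]:
    """Build a pairwise (t = 2) covering array, one parameter at a time."""
    if not params:
        return []
    if len(params) == 1:
        return [(value,) for value in params[0]]

    # Seed: every combination of the first two parameters.
    tests = [list(pair) for pair in product(params[0], params[1])]

    for i in range(2, len(params)):
        # Pairs (earlier parameter j, its value a, new value b) not yet covered.
        uncovered = {(j, a, b) for j in range(i) for a in params[j] for b in params[i]}

        # Horizontal growth: extend each existing test with the value of
        # parameter i that covers the most still-uncovered pairs.
        for test in tests:
            best = max(
                params[i],
                key=lambda v: sum((j, test[j], v) in uncovered for j in range(i)),
            )
            test.append(best)
            uncovered -= {(j, test[j], best) for j in range(i)}

        # Vertical growth: add tests for pairs horizontal growth missed,
        # reusing a new test whenever its slot j is still a don't-care.
        extra: list[list] = []
        for j in range(i):
            for a in params[j]:
                for b in params[i]:
                    if (j, a, b) not in uncovered:
                        continue
                    for row in extra:
                        if row[i] == b and row[j] is None:
                            row[j] = a
                            break
                    else:
                        row = [None] * (i + 1)
                        row[j], row[i] = a, b
                        extra.append(row)
                    uncovered.discard((j, a, b))

        # Any slot left as don't-care gets the parameter's first value.
        for row in extra:
            for k, value in enumerate(row):
                if value is None:
                    row[k] = params[k][0]
        tests.extend(extra)

    return [tuple(test) for test in tests]


def covers_all_pairs(params: list[list], tests: list[tuple]) -> bool:
    """Check that every value pair of every two parameters appears in some test."""
    index_pairs = list(combinations(range(len(params)), 2))
    seen = {(i, t[i], j, t[j]) for t in tests for i, j in index_pairs}
    return all(
        (i, a, j, b) in seen
        for i, j in index_pairs
        for a in params[i]
        for b in params[j]
    )
```

With three roles, two endpoints, and three payload classes, the sketch returns fewer tests than the 18-row Cartesian product while still covering every role × endpoint, role × payload, and endpoint × payload pair.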

Proof Test	What It Verifies	Location
Java fixture expected-value test	Legacy Java behavior is stable and explicit	`tests/correctness/test_legacy_java_python_equivalence.py`
Python fixture expected-value test	Translated Python behavior matches intended outputs	`tests/correctness/test_legacy_java_python_equivalence.py`
Cross-language equivalence test	Java output == Python output for the same inputs	`tests/correctness/test_legacy_java_python_equivalence.py`
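
The cross-language equivalence row reduces to a differential comparison. A minimal sketch, assuming the Java oracle outputs have already been captured (for example by the Java vector runner) into a mapping keyed by vector id; `compare_outputs`, `ParityMismatch`, and the two divide functions are hypothetical illustrations, not code from `tests/correctness/test_legacy_java_python_equivalence.py`. Integer division is used because Java's `/` on `int` truncates toward zero while Python's `//` floors, a classic source of silent translation drift.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Sequence


@dataclass(frozen=True)
class ParityMismatch:
    vector_id: str
    inputs: Any
    java_output: Any
    python_output: Any


def compare_outputs(
    vectors: Sequence[Mapping[str, Any]],
    java_outputs: Mapping[str, Any],
    python_impl: Callable[..., Any],
) -> list[ParityMismatch]:
    """Differential check: run the Python translation on each vector and
    compare the result with the captured Java (oracle) output for that id."""
    mismatches = []
    for vector in vectors:
        vector_id = vector["id"]
        if vector_id not in java_outputs:
            raise KeyError(f"no Java oracle output captured for {vector_id!r}")
        actual = python_impl(**vector["input"])
        if actual != java_outputs[vector_id]:
            mismatches.append(
                ParityMismatch(vector_id, vector["input"], java_outputs[vector_id], actual)
            )
    return mismatches


# Hypothetical translations of Java's `a / b` for ints.
def naive_divide(a: int, b: int) -> int:
    return a // b  # floors: -7 // 2 == -4, but Java gives -3


def faithful_divide(a: int, b: int) -> int:
    quotient = abs(a) // abs(b)
    # Truncate toward zero like Java: negate when the operand signs differ.
    return quotient if (a >= 0) == (b > 0) else -quotient
```

Run against vectors with negative operands, `naive_divide` is flagged while `faithful_divide` produces an empty mismatch list.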

Asset	Runtime Consumer	Purpose	Status
`legacy_calculator_vectors.json`	Python parity tests + Java vector runner	Canonical vector source (id, input, expected)	Implemented
`legacy_calculator_vectors.csv`	Optional import/export interoperability	Spreadsheet-friendly mirror for manual review	Implemented
`LegacyCalculatorVectorRunner.java`	Java runtime	Reads shared JSON vectors and evaluates legacy function	Implemented
`test_legacy_java_python_equivalence.py`	pytest	Parameterized cross-runtime parity assertions	Implemented
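
A shared-vector loader might look like the following sketch. It assumes the JSON file is an array of objects with `id`, `input`, and `expected` keys and that the CSV mirror uses the same three columns with JSON-encoded `input` and `expected` cells; that CSV layout and the function names are hypothetical. Treating JSON as the single source of truth, `mirror_drift` reports any id whose CSV row is missing, changed, or extra.

```python
import csv
import io
import json
from typing import Any

REQUIRED_KEYS = ("id", "input", "expected")


def _core(vector: dict[str, Any]) -> dict[str, Any]:
    """Keep only the fields both formats are assumed to share."""
    return {key: vector[key] for key in REQUIRED_KEYS}


def load_json_vectors(text: str) -> list[dict[str, Any]]:
    """Parse the canonical JSON vectors and reject malformed or duplicate entries."""
    vectors = json.loads(text)
    if not isinstance(vectors, list):
        raise ValueError("vector file must contain a JSON array")
    seen: set[str] = set()
    for index, vector in enumerate(vectors):
        if not isinstance(vector, dict):
            raise ValueError(f"vector #{index} is not an object")
        missing = [key for key in REQUIRED_KEYS if key not in vector]
        if missing:
            raise ValueError(f"vector #{index} is missing {missing}")
        if vector["id"] in seen:
            raise ValueError(f"duplicate vector id {vector['id']!r}")
        seen.add(vector["id"])
    return vectors


def load_csv_mirror(text: str) -> list[dict[str, Any]]:
    """Parse the CSV mirror; `input` and `expected` cells are assumed to hold JSON."""
    return [
        {"id": row["id"], "input": json.loads(row["input"]), "expected": json.loads(row["expected"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def mirror_drift(json_vectors: list[dict[str, Any]], csv_vectors: list[dict[str, Any]]) -> list[str]:
    """Return ids that are missing from, changed in, or extra in the CSV mirror."""
    csv_by_id = {v["id"]: _core(v) for v in csv_vectors}
    json_ids = {v["id"] for v in json_vectors}
    drift = [v["id"] for v in json_vectors if csv_by_id.get(v["id"]) != _core(v)]
    drift += sorted(set(csv_by_id) - json_ids)
    return drift
```

Failing fast on duplicate ids matters because parameterized test ids are derived from vector ids, so a duplicate would otherwise hide one of the two cases.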

Tool	How It Helps With Java-to-Python Parity	Typical Use
`pytest` parameterized tests	Reuse the same vectors for both runtimes	Core parity assertions (implemented)
JUnit 5 parameterized tests	Capture legacy Java oracle outputs	Legacy baseline generation (recommended next)
ApprovalTests	Golden-master snapshot comparisons	Regression lock for legacy outputs (recommended next)
JSON/CSV test vectors	Runtime-agnostic shared inputs/outputs	Single source of truth for parity data (implemented)
Testcontainers	Reproducible Java runtime execution	Stable local runtime parity in isolated containers (recommended next)
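
ApprovalTests implements golden-master testing with its own API; the sketch below does not use it. It only illustrates the underlying idea with the standard library: serialize outputs deterministically, compare them with an approved snapshot, and write a `.received` file for human review on mismatch. `verify_golden_master` and the `*.approved.json` / `*.received.json` naming are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Any


def verify_golden_master(name: str, outputs: Any, directory: Path) -> bool:
    """Compare `outputs` with `<name>.approved.json` in `directory`.

    On a mismatch, or when no approved file exists yet, write
    `<name>.received.json` so a reviewer can diff it and, if it is correct,
    rename it to the approved file.
    """
    # sort_keys + fixed indent make the snapshot byte-for-byte reproducible.
    received = json.dumps(outputs, indent=2, sort_keys=True) + "\n"
    approved_path = directory / f"{name}.approved.json"
    received_path = directory / f"{name}.received.json"
    if approved_path.exists() and approved_path.read_text(encoding="utf-8") == received:
        received_path.unlink(missing_ok=True)  # clean up output from earlier failures
        return True
    received_path.write_text(received, encoding="utf-8")
    return False
```

Approving is a deliberate human step (renaming the received file), which is what turns the snapshot into a regression lock for legacy outputs.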

Zero-Trust Control	What It Means	Project Implementation	Evidence
Verify identity on every request	No implicit trust by network location	JWT verification + RBAC dependency checks in API routes	`tests/negative/test_rbac_enforcement.py`
Explicit policy decision per request	Each request receives an explicit allow/deny evaluation	Input guardrails, model lock, egress policy lock, blocked audit path	`tests/negative/test_model_blocking.py`, `tests/negative/test_egress_blocking.py`, `tests/adversarial/test_prompt_injection.py`
Least privilege access	Users only get required capabilities	Role-permission mapping with permission-scoped endpoints	`core/auth.py`, `tests/negative/test_rbac_enforcement.py`
Continuous verification	Runtime signals prove controls remain active	Audit report includes zero-trust rates, quality attestations, deny rate	`/api/v1/audit-report` zero-trust section
Assume breach + contain blast radius	Treat unsafe inputs as hostile by default	Block injection/secret payloads and sanitize audit records	`guardrails/input_guard.py`, `guardrails/output_guard.py`, `tests/integration/test_audit_trail.py`

Requirement (README)	Test (pytest)	Security Check	Quality Gate	Coverage
"Guarantee base-before-subclass order"	test_topological_sort.py (16 tests)	Klocwork scan	SonarQube: no high issues	95%+ on project_translator.py
"Detect circular dependencies"	test_circular_dependencies.py (4 tests)	Bandit: no unsafe loops	No tech debt on Kahn logic	100% cycle path
"Block injection patterns"	test_prompt_injection.py (5 tests)	Klocwork CWE-89, CWE-95	SonarQube security hotspots	100% on input_guard patterns
"Redact secrets from output"	test_forbidden_patterns.py (4 tests)	Bandit hardcoding check	No credential leak in logs	100% on output_guard.redact()
"Enforce RBAC via JWT"	test_rbac_enforcement.py (4 tests)	Checkmarx token validation	Crypto best practices	100% on auth.py verify_token
"Policy lock for models/egress"	test_model_blocking.py (3 tests)	Klocwork: whitelist bypass	No bypass paths	100% on provider_lock.py
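
The first two requirements rest on Kahn's algorithm: repeatedly emit classes that have no unprocessed base, and treat any classes left over as evidence of a cycle. A minimal sketch follows; `base_first_order` and `CircularDependencyError` are hypothetical names rather than the API of `project_translator.py`, and the input is assumed to be a map from each class to the classes it extends or implements.

```python
from collections import deque


class CircularDependencyError(ValueError):
    """Raised when inheritance edges form a cycle."""


def base_first_order(parents: dict[str, set[str]]) -> list[str]:
    """Kahn's algorithm: order classes so every base precedes its subclasses."""
    # Include classes that only ever appear as a base.
    nodes = set(parents) | {p for ps in parents.values() for p in ps}
    indegree = {n: 0 for n in nodes}
    children: dict[str, list[str]] = {n: [] for n in nodes}
    for child, ps in parents.items():
        for parent in ps:
            children[parent].append(child)
            indegree[child] += 1

    # Sorting keeps the output deterministic, which keeps test assertions stable.
    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in sorted(children[node]):
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != len(nodes):
        # Remaining nodes sit on a cycle or depend on one.
        stuck = sorted(n for n in nodes if indegree[n] > 0)
        raise CircularDependencyError(f"unresolved classes (cycle or downstream of one): {stuck}")
    return order
```

For `{"Dog": {"Animal"}, "Puppy": {"Dog"}, "Cat": {"Animal", "Pet"}}` the sketch returns `['Animal', 'Pet', 'Dog', 'Cat', 'Puppy']`, and `{"A": {"B"}, "B": {"A"}}` raises `CircularDependencyError`.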

Marker Group	Purpose	Key Benefit
unit	Algorithmic correctness for parsing/graph/order	Fast feedback on core logic
integration	API request/response and contract validation	Catches wiring and schema regressions
correctness	Python output structure and signature quality	Protects translation fidelity
negative	Policy and access-control enforcement	Prevents unsafe execution paths
adversarial	Injection and malformed input hardening	Reduces attack-surface risk
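
One way to make these marker groups enforceable is to register them in `conftest.py` so that `pytest --strict-markers` fails on misspelled markers. The sketch assumes the markers are not already declared in `pytest.ini` or `pyproject.toml`; `config.addinivalue_line` is the documented pytest API for this, while the description strings are paraphrased from the table.

```python
# conftest.py (sketch): register the suite's marker groups so that
# `pytest --strict-markers` rejects typos such as @pytest.mark.integraton.
MARKER_GROUPS = {
    "unit": "algorithmic correctness for parsing, graph, and ordering logic",
    "integration": "API request/response and contract validation",
    "correctness": "Python output structure and signature quality",
    "negative": "policy and access-control enforcement",
    "adversarial": "injection and malformed-input hardening",
}


def pytest_configure(config):
    """pytest hook: append each marker to the 'markers' ini value."""
    for name, description in MARKER_GROUPS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

Groups can then be selected with expressions such as `pytest -m unit` for fast feedback or `pytest -m "negative or adversarial"` for a security-focused run.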

Tool	Version	Purpose	SCI/SCIF Status	Restrictions/Notes
Python	3.11+	Runtime interpreter	🟢 APPROVED	Open-source, widely used in government. Requires system-level deployment controls.
pytest	8.0+	Test framework	🟢 APPROVED	Open-source, MIT license. Standard in Python security testing. No external data transmission.
pytest-asyncio	0.23+	Async test support	🟢 APPROVED	Open-source, Apache 2.0 license. Minimal attack surface.
httpx	0.27+	HTTP client for API testing	🟢 APPROVED	Open-source, BSD license. Used for in-process API testing only (no external calls).
FastAPI	0.136+	Web framework	🟡 CONDITIONAL	Open-source, MIT license. Requires hardened deployment configuration for SCI. Ensure all dependencies are audited. On-prem deployment only.
cryptography	42.0+	Cryptographic library	🟢 APPROVED	Open-source, dual Apache/BSD license. NIST-standard algorithms. Actively maintained.
PyJWT	2.8+	JWT signing/verification	🟢 APPROVED	Open-source, MIT license. Minimal, focused functionality.
Pydantic	2.9+	Data validation	🟢 APPROVED	Open-source, MIT license. No external validation calls. Widely adopted in security projects.
javalang	0.13+	Java parser	🟢 APPROVED	Open-source, BSD license. Local parsing only, no network access.
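
The version floors in this table can be checked at the start of a CI job. The sketch below uses `importlib.metadata` from the standard library; `check_floors` and `release_tuple` are hypothetical helpers, distribution names are assumed to match PyPI, and the simplified comparison ignores pre-release tags (`packaging.version.Version` handles full PEP 440 ordering).

```python
import re
import sys
from importlib import metadata
from typing import Callable

# Floors from the runtime table; keys are assumed PyPI distribution names.
MINIMUMS = {
    "pytest": "8.0",
    "pytest-asyncio": "0.23",
    "httpx": "0.27",
    "fastapi": "0.136",
    "cryptography": "42.0",
    "PyJWT": "2.8",
    "pydantic": "2.9",
    "javalang": "0.13",
}


def release_tuple(version: str) -> tuple[int, ...]:
    """Leading numeric release segment: '2.9.2' -> (2, 9, 2), '8.0.0rc1' -> (8,)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    parts = [int(p) for p in match.group(0).split(".")] if match else []
    while parts and parts[-1] == 0:  # so "8", "8.0" and "8.0.0" compare equal
        parts.pop()
    return tuple(parts)


def check_floors(
    minimums: dict[str, str] = MINIMUMS,
    lookup: Callable[[str], str] = metadata.version,
    python_version: tuple[int, int] = sys.version_info[:2],
) -> list[str]:
    """Return human-readable problems; an empty list means every floor is met."""
    problems = []
    if python_version < (3, 11):
        problems.append(f"Python {python_version[0]}.{python_version[1]} < 3.11")
    for dist, floor in minimums.items():
        try:
            installed = lookup(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} not installed (need >= {floor})")
            continue
        if release_tuple(installed) < release_tuple(floor):
            problems.append(f"{dist} {installed} < {floor}")
    return problems
```

The injectable `lookup` keeps the check testable without touching the real environment.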

Tool	Purpose	SCI/SCIF Status	Restrictions/Notes	Recommended?
Klocwork (Perforce)	SAST - vulnerabilities, code quality	🟢 APPROVED	Enterprise tool widely used in aerospace/defense. ISO 27001 certified. TÜV-SÜD certified. Commercial license required.	✅ YES - Preferred for classified environments
SonarQube	Code quality & maintainability	🟡 CONDITIONAL	On-prem deployment: APPROVED. Cloud (SonarCloud): NOT APPROVED. Requires air-gapped or internal-only instance.	⚠️ On-prem only
Checkmarx (SAST)	Enterprise vulnerability scanning	🟢 APPROVED	Explicitly targets government/defense. Supports on-prem. Commercial license required.	✅ YES - Enterprise-grade SAST
Coverity (Synopsys)	Deep static analysis	🟢 APPROVED	Defense/aerospace standard tool. Commercial license required. Supports on-prem deployment.	✅ YES - Advanced static analysis
Bandit	Python-specific security scanning	🟢 APPROVED	Open-source, Apache 2.0 license. Lightweight, local execution only.	✅ YES - Lightweight pre-commit check
Pylint	Python linting & style	🟢 APPROVED	Open-source, GPL license. No external calls. Standard in Python ecosystem.	✅ YES - Pre-commit linting
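pip-audit	Python dependency vulnerability scanning	🟢 APPROVED	Open-source, Apache 2.0 license. Queries an online vulnerability service (PyPI or OSV) by default, so air-gapped use requires an approved internal source.	✅ YES - Lightweight dependency audit where an approved vulnerability source is reachable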
OWASP Dependency-Check	Dependency vulnerability scanner	🟢 APPROVED	Open-source, Apache 2.0 license. Can run air-gapped with offline DB.	✅ YES - Comprehensive SCA
Black Duck (Synopsys)	License & composition analysis	🟡 CONDITIONAL	Commercial tool with on-prem option. Requires licensing agreement for classified use.	⚠️ On-prem with variance
Snyk	Dependency scanning SaaS	🔴 NOT APPROVED	Cloud-based SaaS. Transmitting data to an external service is prohibited for SCI, so it is not approved for classified use.	❌ NO - Do not use
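
For air-gapped runs, scanner invocations can be kept explicit and reviewable. This sketch only builds and runs commands for Bandit and OWASP Dependency-Check, two open-source tools from the table that can run fully offline. It assumes both are installed locally (the Dependency-Check launcher name varies by installation; `dependency-check.sh` is assumed here), that `dc_data_dir` holds a vulnerability database synced through an approved channel, and that `java-to-python` is a placeholder project name. pip-audit is left out because by default it queries an online vulnerability service.

```python
import subprocess
from pathlib import Path


def offline_scan_commands(src: Path, report_dir: Path, dc_data_dir: Path) -> list[list[str]]:
    """Build scanner invocations that need no outbound network access."""
    return [
        # Bandit: recursive scan, JSON report for the compliance archive.
        ["bandit", "-r", str(src), "-f", "json", "-o", str(report_dir / "bandit.json")],
        [
            "dependency-check.sh",
            "--project", "java-to-python",  # placeholder project name
            "--scan", str(src),
            "--format", "JSON",
            "--out", str(report_dir),
            "--data", str(dc_data_dir),  # locally synced vulnerability database
            "--noupdate",  # never try to download updates during the scan
        ],
    ]


def run_scans(commands: list[list[str]]) -> dict[str, int]:
    """Run each scanner and collect exit codes (non-zero usually means findings or errors)."""
    results = {}
    for cmd in commands:
        completed = subprocess.run(cmd, check=False)
        results[Path(cmd[0]).name] = completed.returncode
    return results
```

Keeping the command lists as data makes them easy to review, log as audit evidence, and assert on in tests without executing the scanners.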

Tool	Purpose	SCI/SCIF Status	Restrictions/Notes	Recommended?
TestRail	Test management & traceability	🟡 CONDITIONAL	Self-hosted/on-prem: APPROVED with proper security controls. Cloud version: NOT APPROVED. Proprietary, commercial license.	⚠️ On-prem with security review
Jira Xray	Test management within Jira	🟡 CONDITIONAL	On-prem Jira: APPROVED. Cloud Jira: NOT APPROVED for SCI data. Proprietary plugin, commercial license.	⚠️ On-prem only
Azure DevOps Test Plans	Requirements & test traceability	🟡 CONDITIONAL	On-prem: APPROVED (requires Azure DevOps Server). Cloud (Azure DevOps Services, dev.azure.com): NOT APPROVED for SCI.	⚠️ On-prem only
ReqIF Editor	Requirements interchange format	🟢 APPROVED	Open-source, EPL license. Local file-based tool, no external connections.	✅ YES - For requirements management