Verification-first test infrastructure for secure, dependency-aware Java to Python translation services.
This project uses a Python + pytest stack because it maximizes test expressiveness, async API coverage, and security-focused validation in one cohesive framework. The suite is intentionally built around comparison and traceability: each major requirement is represented in marker groups, assertion patterns, dependency ordering tests, and visual models.
| Technology | Why Used | How Used in This Suite | Benefit Over Alternatives |
|---|---|---|---|
| Python 3.11+ | Fast iteration and excellent testing ecosystem | Executes all test layers and fixture logic | Lower friction than Java/JUnit for mixed async + security test authoring |
| pytest | Marker-based structure and fixture system | Separates unit/integration/correctness/negative/adversarial pipelines | Better parametrization and fixture ergonomics than unittest |
| pytest-asyncio | Native async compatibility | Runs async endpoint tests without custom event-loop wrappers | Cleaner than ad-hoc loop management |
| httpx + ASGITransport | In-process API contract testing | Calls API endpoints with dependency overrides and mock backends | Faster and more deterministic than external server + requests |
| cryptography + PyJWT | Realistic auth-path verification | Generates RSA keys and signs test JWTs at runtime | Stronger coverage than static token-only tests |
| javalang | Java structure awareness in validation workflows | Supports parser-oriented assertions in unit tests | More reliable than regex-only Java parsing checks |
Important: The suite verifies not only correctness, but also translation safety and dependency order requirements, including base-class-before-subclass guarantees through topological sorting tests.
- Executive Summary
- Overview
- Requirements to Validation Mapping
- Requirements Verification and Validation
- Architecture
- Object Model
- Dependency Graph and Topological Sort
- Why Kahn's Algorithm Matters Here
- Visualization as a Verification Tool
- Technology Stack Decision Matrix
- Test Suite Breakdown
- Scientific and Computer Science Algorithm Catalog
- Tool Compliance for Top Secret SCI/SCIF Regulated Environments
- Setup and Installation
- Usage
- Roadmap
- Contributing
- License
This repository is a dedicated test harness for a Java-to-Python translation service. It validates parser behavior, method/type fidelity, API contract integrity, authorization controls, guardrail enforcement, and adversarial resilience. It is designed for teams that need reproducible quality and security checks before releasing translation features.
Important: The suite assumes an external orchestrator source path and environment variables are available as configured in conftest.py.
Core value for this project:
- Confirms required behavior with explicit assertions (not heuristic checks only).
- Compares expected ordering and output properties against actual responses.
- Detects failures in dependency ordering and cycle handling early.
- Verifies that translation order favors reusable base components before dependents.
| Requirement | Implementation Focus | Evidence in Test Suite | Outcome Verified |
|---|---|---|---|
| Parse Java artifacts safely | Parser and class-info extraction paths | tests/unit/test_java_parsing.py | AST/data extraction is stable for normal and malformed inputs |
| Build dependency graph correctly | Intra-project edge construction | tests/unit/test_dependency_graph.py | No self-loops, no JDK noise, valid class map |
| Sort translation order by dependency | Topological ordering logic | tests/unit/test_topological_sort.py | Dependencies appear before dependent classes |
| Translate base classes before subclasses | Ordering invariant in project translation plan | tests/unit/test_topological_sort.py and tests/integration/test_project_translate_api.py | Base abstractions precede concrete subclasses/services |
| Detect cycles without dropping files | Cycle fallback behavior | tests/adversarial/test_circular_dependencies.py and unit cycle tests | had_cycle is true and all files remain represented |
| Block unsafe or manipulative input | Input guardrails | tests/adversarial/test_prompt_injection.py and tests/unit/test_guardrails.py | Injection/secret patterns rejected before model path |
| Enforce RBAC and policy boundaries | JWT + permission checks | tests/negative/test_rbac_enforcement.py | Unauthorized roles/actions are denied |
This suite applies both verification and validation:
- Verification asks: are we building the system right against explicit requirements?
- Validation asks: are we building the right behavior for secure translation operations?
| Requirement Area | Verification Method | Validation Method | Pass Criteria | Primary Evidence |
|---|---|---|---|---|
| Dependency graph correctness | Unit assertions on graph edges and node invariants | Integration checks of API dependency output | No self-loops, no missing files, dependency-first order | tests/unit/test_dependency_graph.py, tests/integration/test_project_translate_api.py |
| Topological ordering (base before subclass) | Unit invariant checks for order index relationships | Project-level translate API response order checks | For every edge A depends on B, index(B) < index(A) | tests/unit/test_topological_sort.py, tests/integration/test_project_translate_api.py |
| Cycle detection robustness | Unit and adversarial cycle test scenarios | End-to-end circular project request handling | had_cycle true on cyclic input, all files retained in output | tests/adversarial/test_circular_dependencies.py, tests/unit/test_topological_sort.py |
| Security guardrails | Unit and adversarial pattern blocking tests | API-level blocked request behavior checks | Injection and credential patterns rejected before unsafe processing | tests/unit/test_guardrails.py, tests/adversarial/test_prompt_injection.py |
| RBAC and auth correctness | Negative role/permission tests | Unauthorized API paths return denied responses | Role permissions enforced with no privilege escalation | tests/negative/test_rbac_enforcement.py, integration auth tests |
| Output structure fidelity | Correctness tests over syntax/import/signatures | Workflow-level usage consistency checks | Outputs remain parseable and structurally aligned to expectations | tests/correctness/*.py |
flowchart TD
A[Requirements] --> B[Unit Verification]
B --> C[Integration Verification]
C --> D[Negative and Adversarial Verification]
D --> E[Validation Against Runtime Behaviors]
E --> F[Release Confidence Decision]
| Gate | Scope | Command Pattern | Minimum Acceptance |
|---|---|---|---|
| Gate 1 | Core logic verification | pytest -m unit -q | All dependency/order/parser tests pass |
| Gate 2 | API contract verification | pytest -m integration -q | Endpoint contract fields and ordering checks pass |
| Gate 3 | Security validation | pytest -m negative -q && pytest -m adversarial -q | RBAC, injection, and egress/model policy checks pass |
| Gate 4 | Output quality validation | pytest -m correctness -q | Output syntax/structure/import quality checks pass |
| Gate 5 | Full-system confidence | pytest -q | No regressions across all marker groups |
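The gate sequence above could be driven by a small script. The following is a minimal sketch, not part of the suite: the gate names, the `run_gates` function, and the injectable `runner` parameter are illustrative assumptions, while the pytest commands are the ones listed in the table.

```python
import subprocess
from typing import Callable, Optional, Sequence

# Gate commands taken from the table above; Gate 3 is split into its two pytest runs.
GATES = [
    ("Gate 1: core logic", [["pytest", "-m", "unit", "-q"]]),
    ("Gate 2: API contract", [["pytest", "-m", "integration", "-q"]]),
    ("Gate 3: security", [["pytest", "-m", "negative", "-q"],
                          ["pytest", "-m", "adversarial", "-q"]]),
    ("Gate 4: output quality", [["pytest", "-m", "correctness", "-q"]]),
    ("Gate 5: full system", [["pytest", "-q"]]),
]


def _run(cmd: Sequence[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def run_gates(runner: Callable[[Sequence[str]], int] = _run) -> Optional[str]:
    """Run gates in order; return the name of the first failing gate, or None."""
    for name, commands in GATES:
        for cmd in commands:
            if runner(cmd) != 0:
                return name  # stop early so triage starts at the failing gate
    return None
```

Calling `run_gates()` with the default runner executes the real pytest commands; passing a fake runner makes the sequencing logic itself testable without running the suite.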
- Requirement-to-test traceability is explicit through marker groups and targeted modules.
- Visualization-to-requirement traceability is captured by architecture, object model, and dependency diagrams.
- Algorithm-to-requirement traceability is captured by Kahn ordering assertions that enforce base-before-subclass translation.
Note: V&V is strongest when failures are triaged by marker group first, then by requirement area, so remediation stays requirement-focused rather than only test-focused.
flowchart LR
A[Fixture Corpus: Java Inputs] --> B[Pytest Marker Groups]
B --> C[Unit Validations]
B --> D[Integration Endpoint Contracts]
B --> E[Negative Security and RBAC]
B --> F[Adversarial Guardrail Tests]
C --> G[Dependency Graph + Topological Sort Validation]
D --> H[Translate and Translate-Project API Behavior]
E --> H
F --> H
G --> I[Confidence in Ordering and Requirements]
H --> I
Architecture intent:
- Marker groups isolate concerns so each risk area is testable independently.
- Unit tests validate deterministic algorithmic behavior (graph and order).
- Integration tests confirm API contract fields like dependency_order and had_cycle.
- Security suites ensure unsafe requests fail fast and auditable paths stay intact.
classDiagram
class FileEntry {
+string filename
+string source
+class_info
+set dependencies
+int order
}
class ProjectTranslationPlan {
+list ordered_files
+map class_map
+bool had_cycle
}
class JavaClassInfo {
+string name
+bool is_interface
+bool is_abstract
+set imports
+set methods
}
ProjectTranslationPlan "1" --> "many" FileEntry : contains
FileEntry --> JavaClassInfo : parsed_from
How this model helps:
- Makes ordering state explicit (order, dependencies, had_cycle).
- Supports comparison between parsed structure and output expectations.
- Enables requirement-level assertions that are easy to reason about in tests.
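For readers who prefer code to diagrams, the object model can be mirrored with dataclasses. This is a sketch under assumptions: the field names come from the class diagram, but the concrete types (for example, `class_map` mapping a class name to its `FileEntry`) and the default values are guesses, and the real definitions in the service may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class JavaClassInfo:
    name: str
    is_interface: bool = False
    is_abstract: bool = False
    imports: Set[str] = field(default_factory=set)
    methods: Set[str] = field(default_factory=set)


@dataclass
class FileEntry:
    filename: str
    source: str
    class_info: Optional[JavaClassInfo] = None
    dependencies: Set[str] = field(default_factory=set)
    order: int = -1  # assumption: -1 means "not yet ordered"


@dataclass
class ProjectTranslationPlan:
    ordered_files: List[FileEntry] = field(default_factory=list)
    class_map: Dict[str, FileEntry] = field(default_factory=dict)  # assumed: class name -> entry
    had_cycle: bool = False


# Example plan: Order is translated before the service that depends on it.
order_entry = FileEntry("Order.java", "public class Order {}", JavaClassInfo("Order"), order=0)
service_entry = FileEntry("OrderService.java", "public class OrderService {}",
                          JavaClassInfo("OrderService"), {"Order"}, 1)
plan = ProjectTranslationPlan([order_entry, service_entry],
                              {"Order": order_entry, "OrderService": service_entry})

# Requirement-level assertion: every dependency has a lower order than its dependent.
assert all(plan.class_map[dep].order < entry.order
           for entry in plan.ordered_files for dep in entry.dependencies)
```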
The translation planner builds a directed dependency graph where each node is a class/file and edges represent prerequisite relationships (for example, subclass depends on base class).
flowchart TD
A[AbstractProcessor] --> B[PaymentProcessor]
C[Order] --> D[OrderService]
E[IRepository] --> F[OrderRepository]
C --> F
The expected translation order is dependency-first:
- Base abstractions and interfaces.
- Core domain models.
- Concrete implementations and services.
This is why tests verify examples such as Order before OrderService and AbstractProcessor before PaymentProcessor.
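The ordering expectations for the example graph can be cross-checked with Python's standard-library `graphlib`. This is an independent reference check, not the suite's own sorter; the `depends_on` mapping below is transcribed from the diagram above.

```python
from graphlib import TopologicalSorter

# Each key depends on the classes in its value set (mirrors the diagram above).
depends_on = {
    "PaymentProcessor": {"AbstractProcessor"},
    "OrderService": {"Order"},
    "OrderRepository": {"IRepository", "Order"},
}

# static_order() yields every node only after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
position = {name: i for i, name in enumerate(order)}

assert position["Order"] < position["OrderService"]
assert position["AbstractProcessor"] < position["PaymentProcessor"]
assert position["IRepository"] < position["OrderRepository"]
assert position["Order"] < position["OrderRepository"]
```

Several valid orders exist for this graph, so the assertions compare relative positions rather than one exact list.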
Dependency ordering checkpoints used by the suite
| Ordering Check | Why It Matters | Test Evidence |
|---|---|---|
| Order before OrderService | Service methods require model definitions first | Unit and integration ordering assertions |
| AbstractProcessor before PaymentProcessor | Subclass translation needs base contract context | Unit topological ordering assertions |
| IRepository before OrderRepository | Interface constraints should be available before implementation | Unit topological ordering assertions |
| Cycle path still returns all files | Production robustness under imperfect source graphs | Circular dependency adversarial/unit tests |
Imagine you have a to-do list with dependencies:
- Task A: "Learn Python" (must do first)
- Task B: "Build a web app" (depends on Task A - you need Python knowledge)
- Task C: "Deploy the app" (depends on Task B - you need a working app to deploy)
You can't do Task B until Task A is done. You can't do Task C until Task B is done. Kahn's algorithm automatically figures out the correct order to do tasks when there are many interdependencies.
In our case, we have Java classes instead of tasks:
- Order.java (no dependencies - do first)
- OrderService.java (depends on Order)
- OrderRepository.java (in this simplified walkthrough, depends on both Order and OrderService)
Kahn's algorithm ensures Order.java is translated to Python before OrderService.java, which is translated before OrderRepository.java.
Step 1: Count prerequisites (in-degree). For each class, count how many other classes it needs:
- Order: 0 dependencies (no prerequisites)
- OrderService: 1 dependency (depends on Order)
- OrderRepository: 2 dependencies (depends on Order and OrderService)

Step 2: Find classes with zero prerequisites. Start with classes that don't depend on anything:
- Queue = [Order] (has 0 dependencies)

Step 3: Process one class at a time.
- Take Order from the queue.
- Tell all classes that depend on Order: "Order is done!"
- OrderService: 1 - 1 = 0 dependencies left → add to queue.
- OrderRepository: 2 - 1 = 1 dependency left (still waiting on OrderService).
- Processed = [Order]
- Queue = [OrderService]

Step 4: Repeat.
- Take OrderService from the queue.
- Tell OrderRepository: "OrderService is done!"
- OrderRepository: 1 - 1 = 0 dependencies left → add to queue.
- Processed = [Order, OrderService]
- Queue = [OrderRepository]
- Take OrderRepository from the queue.
- No classes depend on it.
- Processed = [Order, OrderService, OrderRepository]
- Queue = [] (empty, so we're done)
Step 5: Detect Cycles If some classes remain with unmet dependencies after processing everything, there's a circular dependency (cycle):
- A depends on B
- B depends on C
- C depends on A (creates a circle!)
These classes can't be properly ordered, but the algorithm includes them anyway so you're aware of the problem.
    function KahnSort(graph):
        // Count how many dependencies each node has
        for each node in graph:
            in_degree[node] = count of nodes it depends on
        // Find who depends on whom (reverse lookup)
        for each edge (A depends on B):
            dependents[B].add(A)
        // Start with nodes that have no dependencies
        queue = [all nodes where in_degree = 0]
        result = []
        // Process nodes in order
        while queue is not empty:
            current = queue.pop()
            result.add(current)
            // For each node that depends on current:
            for each dependent in dependents[current]:
                in_degree[dependent] -= 1
                if in_degree[dependent] = 0:
                    queue.add(dependent)
        // Check for cycles
        had_cycle = FALSE
        if result.size < graph.size:
            had_cycle = TRUE
            // Add remaining nodes (they're in a cycle)
            result.add(remaining nodes)
        return (result, had_cycle)
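The pseudocode above translates fairly directly to Python. The following is a minimal sketch rather than the project's actual implementation: the function name `kahn_sort`, the input shape (a mapping of node to the set of nodes it depends on), and the choice to ignore self-loops and dependencies outside the graph are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def kahn_sort(depends_on: Dict[str, Set[str]]) -> Tuple[List[str], bool]:
    """Return (order, had_cycle) for a mapping of node -> nodes it depends on."""
    # Assumption: ignore self-loops and dependencies that are not graph nodes.
    deps = {node: {d for d in ds if d in depends_on and d != node}
            for node, ds in depends_on.items()}
    in_degree = {node: len(ds) for node, ds in deps.items()}

    # Reverse lookup: for each node, which nodes depend on it.
    dependents: Dict[str, List[str]] = {node: [] for node in deps}
    for node, ds in deps.items():
        for d in sorted(ds):
            dependents[d].append(node)

    # Start with nodes that have no unmet dependencies (sorted for determinism).
    queue = deque(sorted(n for n, deg in in_degree.items() if deg == 0))
    result: List[str] = []
    while queue:
        current = queue.popleft()
        result.append(current)
        for dependent in dependents[current]:
            in_degree[dependent] -= 1
            if in_degree[dependent] == 0:
                queue.append(dependent)

    # Nodes never reaching in-degree 0 are part of (or blocked by) a cycle.
    had_cycle = len(result) < len(deps)
    if had_cycle:
        done = set(result)
        result.extend(sorted(n for n in deps if n not in done))
    return result, had_cycle


walkthrough = {
    "Order": set(),
    "OrderService": {"Order"},
    "OrderRepository": {"Order", "OrderService"},
}
print(kahn_sort(walkthrough))
# (['Order', 'OrderService', 'OrderRepository'], False)
```

On a cyclic input such as A → B → C → A, the function still returns every node and sets `had_cycle` to `True`, which matches the "detect cycles without dropping files" requirement.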
When we have Java files:
    // Order.java
    public class Order { ... }

    // OrderService.java
    public class OrderService {
        private Order order; // depends on Order!
        ...
    }

    // OrderRepository.java
    public interface OrderRepository {
        Order findById(String id); // depends on Order!
    }

Kahn's algorithm outputs: [Order, OrderService, OrderRepository]
This guarantees:
- Order is translated first
- OrderService can reference Order class (exists in Python)
- OrderRepository can reference Order class (exists in Python)
If we translated OrderService before Order:
    class OrderService:
        def __init__(self):
            self.order = Order()  # NameError when called: Order is not defined yet!

This fails because Order doesn't exist yet. Kahn's algorithm prevents this.
- Compute in-degree for each node.
- Start with nodes that have in-degree 0 (no unmet dependencies).
- Remove processed nodes and decrement neighbors' in-degree.
- Continue until all nodes are processed.
- If nodes remain with non-zero in-degree, a cycle exists.
In this test suite, that behavior directly supports translation correctness:
- Guarantees dependency-first ordering for base classes and shared contracts.
- Prevents subclass-first generation that can create invalid imports/signatures.
- Detects cycles early while still preserving a complete output list for diagnostics.
sequenceDiagram
participant Graph as Dependency Graph
participant Kahn as Kahn Sort (In-Degree)
participant Planner as Translation Planner
Graph->>Kahn: Nodes + dependency edges
Kahn->>Kahn: 1. Compute in-degree (# dependencies per node)
Kahn->>Kahn: 2. Find nodes with in-degree = 0
Kahn->>Kahn: 3. Process each in order, decrement neighbors
Kahn->>Kahn: 4. Continue until queue empty
Kahn->>Kahn: 5. Check if nodes remain (cycle detection)
Kahn-->>Planner: dependency_order list
Kahn-->>Planner: had_cycle flag
Planner-->>Planner: translate base classes before subclasses
Tip: Kahn's approach is deterministic and testable: each assertion can verify that every dependency index is lower than its dependent index. The algorithm guarantees: if class B must be translated before class A (A depends on B), then index(B) < index(A) in the output list.
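The index invariant can be checked with a small helper. This is a sketch: `ordering_violations` is a hypothetical name, and the suite's real assertions may be structured differently.

```python
from typing import Dict, List, Set


def ordering_violations(order: List[str], depends_on: Dict[str, Set[str]]) -> List[str]:
    """List every edge 'A depends on B' where B does not come before A."""
    index = {name: i for i, name in enumerate(order)}
    problems: List[str] = []
    for a, deps in depends_on.items():
        for b in sorted(deps):
            if a not in index or b not in index:
                problems.append(f"{a} -> {b}: missing from order")
            elif index[b] >= index[a]:
                problems.append(f"{b} must precede {a}")
    return problems


edges = {"OrderService": {"Order"}, "PaymentProcessor": {"AbstractProcessor"}}
good = ["AbstractProcessor", "Order", "OrderService", "PaymentProcessor"]
bad = ["OrderService", "Order", "AbstractProcessor", "PaymentProcessor"]

assert ordering_violations(good, edges) == []
assert ordering_violations(bad, edges) == ["Order must precede OrderService"]
```

Returning the list of violations instead of a boolean keeps failure messages specific, which helps when triaging by requirement area.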
Visualizations in this README are not decorative. They reduce ambiguity when comparing implemented function behavior against requirements.
| Visualization | Confirms | Comparison Benefit |
|---|---|---|
| Architecture flowchart | End-to-end validation pipeline | Quickly spots missing validation layers |
| Object model diagram | Data structures and relationships | Confirms required fields exist for assertions |
| Dependency graph diagram | Expected dependency direction | Makes ordering mistakes obvious during review |
| Kahn sequence diagram | Algorithm steps and outputs | Aligns function behavior with requirement statements |
How this helps requirement comparison:
- Requirement text says dependency-first translation.
- Graph + sequence diagrams show exactly how dependency-first behavior is enforced.
- Unit tests then compare actual order indices to required invariants.
- Integration tests compare API dependency_order to expected file precedence.
| Stack Part | Chosen Option | Alternative | Why Chosen for This Project | Practical Benefit |
|---|---|---|---|---|
| Test framework | pytest | unittest | Marker groups and fixture composition scale better for layered suites | Faster targeted runs and cleaner test organization |
| Async testing | pytest-asyncio | custom loop management | Native async test support without boilerplate | Lower maintenance and fewer flaky async tests |
| API client | httpx + ASGITransport | requests + live server | In-process execution keeps integration tests deterministic | Better speed and less CI networking variability |
| Auth validation | cryptography + PyJWT | static token strings | Runtime key/signature generation tests real verification paths | Higher confidence in RBAC behavior |
| Java structure parsing | javalang | regex parsing | Structural parsing avoids brittle text matching | More robust dependency and class extraction checks |
Technology usage map by test concern
| Test Concern | Main Technology | Role |
|---|---|---|
| Parser and graph correctness | pytest + javalang | Validates class extraction and dependency edges |
| Endpoint behavior | pytest-asyncio + httpx | Exercises translate endpoints and payload contracts |
| RBAC and token handling | cryptography + PyJWT | Generates realistic signed JWTs for role checks |
| Guardrails and adversarial handling | pytest markers + fixtures | Enforces injection/secret blocking expectations |
This test suite can be enhanced through integration with specialized testing, analysis, and verification tools. Below are recommended integrations organized by capability:
| Tool | Purpose | Integration Point | Validates | Python Support | Cost Model |
|---|---|---|---|---|---|
| Klocwork (Perforce) | SAST - Security, quality, reliability | Pre-commit hooks, CI/CD pipeline | Security vulnerabilities, code defects, reliability issues | Yes | Enterprise/Commercial |
| SonarQube | Code quality & maintainability | Post-test analysis, quality gates | Code quality, technical debt, duplication, test coverage | Yes | Open-source/Commercial |
| Checkmarx (SAST) | Enterprise security scanning | Pipeline integration, compliance | Deep vulnerability analysis, compliance standards, OWASP | Yes | Enterprise/Commercial |
| Coverity (Synopsys) | Deep static analysis | Build integration, incremental analysis | Memory/security issues, race conditions | Yes | Enterprise/Commercial |
| Bandit | Python security scanning | Pre-commit, CI integration | Python security issues, hardcoded secrets | Yes (Python-specific) | Open-source |
| ESLint/Pylint | Linting & style | Git hooks, pre-flight checks | Code style, suspicious patterns, imports | Yes (Pylint) | Open-source |
Why multiple tools? Each excels in different domains:
- Klocwork for security-first orgs needing compliance-grade SAST
- SonarQube for quality gates and technical debt tracking
- Checkmarx when regulatory/enterprise security is primary
- Bandit/Pylint for lightweight pre-commit gating
| Tool | Purpose | Integration Point | Metrics Collected | Use Case | Cost |
|---|---|---|---|---|---|
| pytest (current) | Unit/integration test framework | Direct test runner | Pass/fail, execution time | Core test execution | Open-source |
| pytest-cov | Code coverage measurement | Coverage plugin, post-test | Line/branch coverage % | Verify guardrails touch all code paths | Open-source |
| Codecov | Coverage tracking & trending | CI upload, GitHub integration | Coverage trends, PR diffs | Long-term quality visibility | Free/Pro |
| Datadog | Continuous testing & monitoring | API instrumentation | Test performance, flakiness | Detect regression patterns | Commercial |
| LoadRunner | Performance and load testing | Scheduled pipeline stage, release gate | Response times, throughput, error rate, SLA compliance | Validate API under expected translation volume | Commercial |
Recommended first addition: pytest-cov to verify that guardrail code paths (input_guard, output_guard, provider_lock) are fully exercised.
| Tool | Purpose | How It Works | Value for This Suite | Python Support |
|---|---|---|---|---|
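| Stryker | Mutation testing framework (JavaScript/TypeScript, C#, Scala) | Modifies code, reruns tests | Reference point for mutation workflows; it does not mutate Python code | No |
| PIT | Bytecode mutation (Java/JVM) | Mutates compiled bytecode | Applies only to JVM bytecode, not to this Python harness | No (JVM only) |
| mutmut | Python mutation testing | Mutates Python source, reruns tests | Verifies tests catch real bugs in guardrail code | Yes (Python-specific) |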
Application to this suite: Run mutation tests on guardrails code (input_guard, output_guard, provider_lock) to ensure rejection logic is properly tested.
| Tool | Purpose | Scans | Integration | Python Support |
|---|---|---|---|---|
| Snyk | Dependency vulnerability scanning | requirements.txt, package manifests | Pre-commit, PR checks, CI | Yes |
| OWASP Dependency-Check | Known vulnerability database | Dependencies, transitive | CLI, Maven/Gradle, CI | Partial (Python analyzers are experimental) |
| Black Duck (Synopsys) | License/composition analysis | Codebases, dependencies | CI pipeline, compliance | Yes |
| pip-audit | Python package auditing | pip requirements | GitHub Actions, pre-commit | Yes (Python-specific) |
Why this matters: fastapi, pytest, javalang, and cryptography dependencies must remain secure. Snyk + pip-audit provide light/fast scanning; Black Duck for enterprise compliance.
| Tool | Function | Integration | Traceability | Compliance |
|---|---|---|---|---|
| Azure DevOps Test Plans | Requirements→Tests mapping | Work items, test suites | Bi-directional links | CMMI/ISO ready |
| Jira Xray | Test management within Jira | Issues, test runs, coverage | Requirement→Test→Result | Regulatory (FDA, etc.) |
| TestRail | Standalone test management | API, CI integration | Test case traceability | SOC 2, HIPAA compatible |
| ReqIF Editor | Requirements interchange format | File-based traceability | Spec→Design→Test | Automotive (ASIL) standard |
Current project: README.md serves as the living requirements document. For regulated environments, migrate to one of the tools above to create a formal traceability matrix.
| Pipeline Stage | Tool Category | Recommended Tool | What It Checks |
|---|---|---|---|
| Pre-commit | Linting + Security | Bandit, Pylint, Pre-commit hooks | Fast rejection of obvious issues |
| Build | Static Analysis | Klocwork, SonarQube scanner | Deep security & quality analysis |
| Test | Execution + Coverage | pytest + pytest-cov | Functional correctness, coverage % |
| Mutation | Test Quality | mutmut (Python); Stryker or PIT for non-Python code | Are tests strong enough? |
| Dependency Scan | Supply Chain | Snyk + pip-audit | Known vulnerabilities in deps |
| Compliance | Reporting | SonarQube/Checkmarx dashboards | Meet quality gates, audit trail |
graph TD
A[Requirements<br/>README.md] -->|Defined as test markers| B[Test Suite<br/>387 tests]
B -->|Run on every commit| C[pytest<br/>Unit/Int/Correctness]
C -->|Coverage tracked| D[pytest-cov<br/>Code coverage %]
D -->|Trending| E[Codecov<br/>Historical view]
C -->|Mutation test| F[mutmut<br/>Test quality]
F -->|Validates| G{Tests strong<br/>enough?}
G -->|Yes| H[SonarQube<br/>Quality gates]
C -->|Security scan| I[Klocwork/Checkmarx<br/>Vulnerability detection]
I -->|Verify| J[Zero high-risk<br/>findings]
K[requirements.txt] -->|Supply chain scan| L[Snyk/pip-audit<br/>Dependency check]
H -->|Release gate| M[Deploy<br/>with confidence]
J -->|Security approval| M
L -->|No vulns found| M
style A fill:#e1f5ff
style M fill:#c8e6c9
This flow ensures:
- Requirements are explicit (README)
- Tests verify requirements (pytest suite)
- Tests are strong (mutation testing)
- Code is secure (static analysis + SAST)
- Dependencies are safe (supply chain scanning)
- Quality gates passed (SonarQube)
Add coverage measurement to verify all guardrail code is exercised:
    # Run tests with coverage
    pytest --cov=guardrails --cov=core --cov-report=html --cov-report=term
    # Verify minimum coverage threshold
    pytest --cov=guardrails --cov-fail-under=90

In CI/CD (GitHub Actions example):

    - name: Run tests with coverage
      run: pytest --cov=guardrails --cov=core --cov-report=xml
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        files: ./coverage.xml

Why this matters: Guardrails (input_guard.py, output_guard.py) must have zero uncovered branches to ensure all security checks are tested. Measuring branches, rather than lines only, requires adding --cov-branch to the commands above.
Add Python security scanning before commit:
    # Install Bandit
    pip install bandit
    # Scan project
    bandit -r guardrails/ core/ api/ tools/ -f json -o bandit-report.json
    # Fail on medium+ severity
    bandit -r . -ll  # -ll = medium level and above

Pre-commit hook (.pre-commit-config.yaml):

    - repo: https://github.com/PyCQA/bandit
      rev: 1.7.5
      hooks:
        - id: bandit
          args: ['-ll']  # Medium severity minimum
          exclude: tests/

Focus areas: Detect hardcoded secrets, SQL injection patterns, and insecure random usage in guardrails and auth modules.
Quick setup with pip-audit (Python-specific):
    # Install pip-audit
    pip install pip-audit
    # Check dependencies
    pip-audit --desc  # Show vulnerability descriptions
    # In CI: pip-audit exits non-zero when it finds any known vulnerability
    pip-audit

GitHub Actions integration:

    - name: Check dependencies for vulnerabilities
      run: pip-audit

Critical dependencies to monitor:
- fastapi (API framework)
- cryptography (JWT/RBAC)
- javalang (Java parsing)
- pydantic (data validation)
For organizations with a SonarQube instance:

    # Install the SonarScanner CLI first (the sonar-scanner command comes from
    # SonarSource's standalone CLI distribution)
    # Run analysis (requires sonar.projectKey, sonar.host.url, sonar.login)
    sonar-scanner \
      -Dsonar.projectKey=java-to-python \
      -Dsonar.host.url=https://sonarqube.company.com \
      -Dsonar.login=$SONAR_TOKEN

Quality gate conditions:
- Coverage > 80%
- Duplicated lines < 5%
- Code smells < 10
- No critical issues
Verify that tests catch real bugs by mutating code:
    # Install mutmut (mutation testing for Python)
    pip install mutmut
    # Run mutation tests on guardrails
    mutmut run --paths-to-mutate=guardrails
    # Generate HTML report
    mutmut html

Example: Test that input_guard.py rejection logic is properly tested:

    mutmut run --paths-to-mutate=guardrails/input_guard.py \
      --tests-dir=tests/adversarial

Success criteria: mutation score above 80% (tests kill more than 80% of mutants).
For organizations requiring formal verification:
Current state (README-based):
    README.md
    ├── Requirements section
    ├── Test suite breakdown
    ├── Unit/Integration/Correctness/Negative/Adversarial breakdown
    └── Maps to test files
Migrate to (TestRail example):
- Create test plan in TestRail
- Link each test case to requirement ID
- Run tests via API
- Auto-generate compliance report
    # Example: Link test to requirement
    # TestRail API: add a result for a test case with requirement traceability
    # (status_id 1 = passed)
    POST /api/v2/add_result_for_case/1/123
    {
      "status_id": 1,
      "comment": "Verifies Req-002: Dependency ordering",
      "custom_requirement_id": "REQ-002"
    }

LoadRunner fits this project as the dedicated non-functional gate for the FastAPI endpoints:
| Endpoint | Suggested LoadRunner Transaction | Default SLA | Primary Assertion | Current Project Hook |
|---|---|---|---|---|
| /api/v1/translate | translate | 250 ms | Median and p95 stay within SLA | Audit log writes loadrunner transaction summary |
| /api/v1/translate-project | translate_project | 500 ms | Multi-file requests stay below release threshold | Audit log writes per-request performance budget status |
| /api/v1/translate-requirements | translate_requirements | 250 ms | Requirements scaffolding stays responsive | Audit log writes Six Sigma-style CTQ metrics |
This repository now exposes LoadRunner-friendly transaction metadata in audit records:
{
"action": "translate",
"latency_ms": 83.2,
"performance_budget_ms": 250,
"performance_status": "within_control",
"loadrunner": {
"transaction": "translate",
"response_time_ms": 83.2,
"sla_ms": 250,
"passed": true
}
}

That makes it straightforward to compare internal audit data with external LoadRunner runs and to use the same transaction names in performance dashboards.
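A record with this shape could be assembled as follows. This is a sketch under stated assumptions: `build_audit_record` and its `warning_ratio` parameter are hypothetical, and the thresholds that separate the status values are invented for illustration, since the real classification rules are not documented here.

```python
import json


def build_audit_record(action: str, latency_ms: float, budget_ms: float,
                       warning_ratio: float = 0.8) -> dict:
    """Build an audit record shaped like the example above.

    Assumed thresholds: up to warning_ratio * budget is "within_control",
    up to the budget is "warning", and anything over the budget is "breach".
    """
    if latency_ms <= warning_ratio * budget_ms:
        status = "within_control"
    elif latency_ms <= budget_ms:
        status = "warning"
    else:
        status = "breach"
    return {
        "action": action,
        "latency_ms": latency_ms,
        "performance_budget_ms": budget_ms,
        "performance_status": status,
        "loadrunner": {
            "transaction": action,
            "response_time_ms": latency_ms,
            "sla_ms": budget_ms,
            "passed": latency_ms <= budget_ms,
        },
    }


record = build_audit_record("translate", 83.2, 250)
print(json.dumps(record))  # one JSON object per line suits a JSONL audit log
```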
The service now includes a small read-only release dashboard endpoint at /api/v1/audit-report.
It aggregates the JSONL audit log into a single release-oriented summary:
| Dashboard Section | Aggregates | Why It Matters For Release Decisions |
|---|---|---|
| summary | Total requests, ok requests, blocked requests, unique actions | Quick go/no-go snapshot |
| actions | Per-endpoint request count, average latency, p95 latency, LoadRunner pass rate | Shows which endpoint is drifting |
| performance | Global average latency, p95 latency, performance status counts | Highlights SLA breaches and warning trends |
| quality | CTQ pass rates, average DPMO, sigma-band counts, control-state counts | Converts raw audit events into process-quality signals |
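The summary and per-action aggregation can be sketched in a few lines. This is not the endpoint's real implementation: the `blocked` field on each record is a hypothetical name, and the nearest-rank p95 is only one of several common percentile definitions.

```python
import json
import math
from collections import defaultdict
from typing import Dict, Iterable, List


def p95(values: List[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list."""
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]


def summarize(jsonl_lines: Iterable[str]) -> dict:
    """Aggregate JSONL audit records into summary and per-action sections."""
    records = [json.loads(line) for line in jsonl_lines if line.strip()]
    latencies: Dict[str, List[float]] = defaultdict(list)
    for rec in records:
        latencies[rec["action"]].append(rec["latency_ms"])
    # Assumption: a boolean "blocked" field marks guardrail-rejected requests.
    blocked = sum(1 for rec in records if rec.get("blocked", False))
    return {
        "summary": {
            "total_requests": len(records),
            "ok_requests": len(records) - blocked,
            "blocked_requests": blocked,
            "unique_actions": len(latencies),
        },
        "actions": {
            action: {
                "requests": len(values),
                "avg_latency_ms": round(sum(values) / len(values), 1),
                "p95_latency_ms": p95(values),
            }
            for action, values in latencies.items()
        },
    }


sample_log = [
    '{"action": "translate", "latency_ms": 80.0}',
    '{"action": "translate", "latency_ms": 100.0, "blocked": true}',
    '{"action": "translate_project", "latency_ms": 300.0}',
]
report = summarize(sample_log)
```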
Example usage:
    curl -H "Authorization: Bearer <token>" http://localhost:8000/api/v1/audit-report

Example response shape:
{
"summary": {
"total_requests": 24,
"ok_requests": 21,
"blocked_requests": 3,
"unique_actions": 3
},
"actions": {
"translate": {
"requests": 12,
"avg_latency_ms": 85.4,
"p95_latency_ms": 140.2,
"loadrunner_pass_rate": 1.0
}
},
"performance": {
"avg_latency_ms": 91.7,
"p95_latency_ms": 151.6,
"performance_status_counts": {
"within_control": 22,
"warning": 1,
"breach": 1
},
"loadrunner_pass_rate": 0.958
},
"quality": {
"ctq_metrics": {
"reliability": {
"pass_count": 23,
"total": 24,
"pass_rate": 0.958
}
},
"avg_dpmo": 13888.889,
"sigma_band_counts": {
"good": 20,
"watch": 4
},
"control_state_counts": {
"in_control": 21,
"watch": 2,
"out_of_control": 1
}
}
}

Recommended GitHub Actions workflow:
    name: End-to-End Quality & Security
    on: [push, pull_request]
    jobs:
      quality:
        runs-on: ubuntu-latest
        steps:
          - uses: actions/checkout@v3
          # Linting & style
          - name: Lint with Pylint
            run: |
              pip install pylint
              pylint guardrails/ core/ api/ tools/ --fail-under=9.0
          # Security scanning
          - name: Bandit security scan
            run: |
              pip install bandit
              bandit -r . -ll --exclude tests/
          # Dependency audit (pip-audit exits non-zero on any known vulnerability)
          - name: Check dependencies
            run: |
              pip install pip-audit
              pip-audit
          # Test execution
          - name: Run tests
            run: pytest --cov=guardrails --cov=core --cov-report=xml
          # Performance regression gate
          - name: Run LoadRunner suite
            if: env.LOADRUNNER_SCENARIO_ID != ''
            run: |
              echo "Trigger LoadRunner scenario $LOADRUNNER_SCENARIO_ID against /api/v1 endpoints"
          # Coverage upload
          - name: Upload coverage
            uses: codecov/codecov-action@v3
          # Mutation testing (optional, slower)
          - name: Mutation test guardrails
            run: |
              pip install mutmut
              mutmut run --paths-to-mutate=guardrails --tests-dir=tests
          # Quality gate (SonarQube); assumes the SonarScanner CLI is available on the runner
          - name: SonarQube analysis
            if: env.SONAR_HOST_URL != ''
            run: |
              sonar-scanner -Dsonar.host.url=${{ secrets.SONAR_HOST_URL }} \
                -Dsonar.login=${{ secrets.SONAR_TOKEN }}

| Algorithm / Technique | What It Does | Where It Appears In This Project | Why It Improves Confidence |
|---|---|---|---|
| Topological sorting (Kahn) | Orders dependent nodes safely | tools/project_translator.py, tests/unit/test_topological_sort.py |
Prevents subclass-before-base translation defects |
| Boundary value analysis | Hits min/max and edge inputs | tests/adversarial/test_boundary_conditions.py |
Finds off-by-one and empty-input failures quickly |
| Equivalence partitioning | Tests one representative per input class | Guardrail and malformed-input tests | Keeps coverage broad without exploding test count |
| Decision-table testing | Covers combinations of conditions and outcomes | RBAC and forbidden-pattern tests | Ensures policy combinations do not create gaps |
| State-transition testing | Verifies behavior across state changes | Audit trail blocked/allowed request scenarios | Confirms system reacts correctly as request status changes |
| Cycle detection | Detects unsortable dependency graphs | tests/adversarial/test_circular_dependencies.py | Verifies graceful degradation on invalid project graphs |
| Mutation testing | Injects fake bugs to measure test strength | Documented via mutmut / Stryker integration path | Confirms tests fail when logic is wrong |
| Load testing | Measures latency and throughput under concurrency | LoadRunner integration and audit metrics | Protects release readiness under realistic traffic |
| Risk-based prioritization | Focuses effort on highest-risk paths | Negative, adversarial, and auth tests | Keeps security-critical paths heavily defended |
| Pairwise / combinatorial sampling | Reduces huge input combinations to meaningful pairs | Recommended next step for API option matrices | Expands coverage efficiently for future input flags |
| Six Sigma Idea | Meaning In Plain Terms | Project Implementation | Evidence / Metric |
|---|---|---|---|
| CTQ (Critical to Quality) | The small set of outcomes that must go right | Audit records now track latency, reliability, safety, traceability | ctq_metrics in audit log |
| DMAIC | Define, Measure, Analyze, Improve, Control loop | README traceability + tests + audit metrics + quality gates | Requirements tables, tests, and audit trail |
| DPMO | Defects per million opportunities | Quality snapshot computes DPMO per request | six_sigma.dpmo in audit log |
| Control state | Is the process stable or drifting? | Requests classified as in_control, watch, or out_of_control | six_sigma.control_state |
| Performance control limits | Expected latency window before escalation | Per-endpoint SLA budgets in env and audit metrics | performance_budget_ms, performance_status |
| FMEA mindset | Rank likely failures before release | Negative/adversarial suites focus on auth, injection, model lock, egress | Security-focused test groups |
| Voice of customer / CTQ translation | Convert user needs into measurable gates | README requirement tables map behavior to tests and tooling | Traceability matrices |
| Continuous improvement | Use data from each run to tighten the process | Audit + coverage + static analysis + performance gates | CI pipeline and audit summaries |
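
The DPMO and control-state rows above can be illustrated with a minimal sketch. The function names and the two threshold defaults are assumptions for illustration only (the limits are the conventional 4-sigma and 3-sigma DPMO values); the project's actual limits are not stated here.

```python
def dpmo(defects: int, units: int, opportunities_per_unit: int) -> float:
    """Defects per million opportunities for a batch of requests."""
    total_opportunities = units * opportunities_per_unit
    if total_opportunities <= 0:
        raise ValueError("units and opportunities_per_unit must be positive")
    return defects * 1_000_000 / total_opportunities


def control_state(
    dpmo_value: float,
    watch_limit: float = 6_210.0,            # assumed default, roughly 4 sigma
    out_of_control_limit: float = 66_807.0,  # assumed default, roughly 3 sigma
) -> str:
    """Map a DPMO value onto the three states named in the table."""
    if dpmo_value >= out_of_control_limit:
        return "out_of_control"
    if dpmo_value >= watch_limit:
        return "watch"
    return "in_control"


if __name__ == "__main__":
    value = dpmo(defects=3, units=1000, opportunities_per_unit=4)
    print(value, control_state(value))
```

Keeping the limits as parameters lets each endpoint carry its own budget without changing the classification logic.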
This section catalogs established computer science and mathematical algorithms that apply directly to the Java-to-Python translation pipeline, audit trail, guardrails, and quality metrics implemented in this project. Each algorithm is linked to the project area it improves.
| Algorithm | What It Is | Most Common Use | Why It Should Be Used | How It Helps This Project | When Not To Use |
|---|---|---|---|---|---|
| Kahn's (implemented) | In-degree based topological sort for DAGs | Build order resolution, dependency scheduling | Deterministic ordering and clear cycle detection when no zero in-degree node remains | Already used to order Java classes before translation so base classes are processed before dependents | Not for weighted path problems or graphs that are not DAG-like |
| Tarjan's SCC | One-pass DFS algorithm that finds all strongly connected components | Cycle grouping in directed graphs, compilers, package analyzers | Linear-time cycle group discovery and reverse-topological SCC output | Can report all dependency cycles at once with grouped diagnostics for project translation failures | Not needed for tiny graphs where simple cycle-exists checks are enough |
| Kosaraju's SCC | Two-pass DFS SCC algorithm over graph and reversed graph | SCC extraction when implementation simplicity is preferred | Easy to reason about and verify for correctness | Alternate SCC implementation for cross-validating cycle group results from Tarjan | Less ideal when memory access to reverse graph is costly or graph is streaming |
| DFS/BFS | Fundamental graph traversals for depth or level exploration | Reachability, component discovery, shortest unweighted paths (BFS) | Foundational and fast, useful in almost every graph pipeline | DFS supports dependency walk and cycle heuristics; BFS can identify translation batches by level | Not enough alone when you need weighted optimization, SCC grouping, or formal ordering guarantees |
| Dijkstra | Shortest-path algorithm for non-negative weighted graphs | Routing, minimum cost path, critical path scoring | Finds best path under weighted constraints efficiently | Can prioritize translation sequence by cost/risk weights (complexity, blast radius, module criticality) | Not for negative edge weights, where Bellman-Ford style methods are required |
| Floyd-Warshall | Dynamic programming for all-pairs shortest paths | Dense graph all-pairs analysis, transitive reachability | Gives full matrix visibility into every pair relationship | Useful for full dependency impact maps and change blast-radius analysis | Avoid on large sparse graphs due to cubic cost |
| Union-Find | Disjoint-set structure with union/find operations | Connectivity checks, incremental grouping, Kruskal-like workflows | Very fast near constant-time merges and membership checks | Can speed incremental dependency ingestion and fast connectivity sanity checks before deeper analysis | Not suitable for directed SCC semantics or ordered traversal outputs |
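
As a hedged illustration of the Tarjan row above, the sketch below groups a dependency graph into strongly connected components and reports only the groups that form cycles. The graph shape (class name mapped to the classes it depends on) and both function names are assumptions, not the interface of tools/project_translator.py.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm: one DFS pass, components emitted in reverse topological order."""
    index_of, lowlink = {}, {}
    stack, on_stack = [], set()
    components = []
    counter = 0

    def visit(node):
        nonlocal counter
        index_of[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for neighbor in graph.get(node, ()):
            if neighbor not in index_of:
                visit(neighbor)
                lowlink[node] = min(lowlink[node], lowlink[neighbor])
            elif neighbor in on_stack:
                lowlink[node] = min(lowlink[node], index_of[neighbor])
        if lowlink[node] == index_of[node]:
            # node is the root of a component: pop the stack down to it
            component = []
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.append(member)
                if member == node:
                    break
            components.append(component)

    for node in graph:
        if node not in index_of:
            visit(node)
    return components


def dependency_cycles(graph):
    """Return every cycle group: multi-node components plus self-dependencies."""
    return [
        sorted(component)
        for component in strongly_connected_components(graph)
        if len(component) > 1 or component[0] in graph.get(component[0], ())
    ]


if __name__ == "__main__":
    classes = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"], "E": []}
    print(dependency_cycles(classes))
```

The recursive form is kept for readability; a very deep dependency chain would need an iterative variant to stay under Python's recursion limit.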
| Algorithm | What It Is | Most Common Use | Why It Should Be Used | How It Helps This Project | When Not To Use |
|---|---|---|---|---|---|
| AST Traversal (implemented) | Tree walk over parsed syntax nodes | Compilers, linters, refactoring, static analyzers | Preserves structural meaning better than regex parsing | Already powers Java structure extraction for classes/imports/method signatures | Not for runtime behavior reasoning without control/data flow context |
| Tree Edit Distance (Zhang-Shasha) | Minimum edit cost between two trees | AST diffing, clone analysis, migration similarity checks | Captures structural differences not visible in plain text diff | Can score Java vs translated Python AST fidelity for stronger parity evidence | Avoid for very large trees in hot paths due to higher compute cost |
| CFG | Graph model of possible execution paths in a function/method | Dead code detection, path analysis, coverage planning | Exposes branch structure and reachability explicitly | Can verify translated Python keeps equivalent branch reachability vs Java | Not needed for simple straight-line code with no branching |
| Data-Flow Analysis | Tracks definitions, uses, and propagation of values/types | Compiler optimization, bug finding, security checks | Detects misuse and propagation mistakes early | Can validate Java type/variable semantics survive mapping into Python | Avoid when analysis precision cost exceeds value for trivial modules |
| Program Slicing | Extracts statements relevant to a variable/output criterion | Debugging, comprehension, targeted verification | Reduces analysis scope and noise | Isolates only code affecting a translated output to speed parity root-cause analysis | Not ideal when holistic system interactions are the real issue |
| Taint Analysis (implemented conceptually) | Marks untrusted input and tracks flow to sensitive sinks | Security validation, injection prevention | Directly maps to security risk pathways | Supports guardrail hardening by tracing untrusted request data through translation pipeline | Not useful when all inputs are already trusted and isolated |
| Hindley-Milner Type Inference | Unification-based static type inference | Functional languages, inferred typing systems | Improves correctness with less manual annotation | Could auto-suggest Python type hints from Java source semantics | Not a fit where dynamic/runtime types dominate behavior |
| Abstract Interpretation | Sound approximation of program states over abstract domains | Static verification and bug class elimination | Can prove classes of errors without executing code | Can add formal assurance on translated output safety properties | Avoid where exact concrete behavior is mandatory and approximation is too coarse |
| Algorithm | What It Is | Most Common Use | Why It Should Be Used | How It Helps This Project | When Not To Use |
|---|---|---|---|---|---|
| Aho-Corasick | Trie + failure-link automaton for multi-pattern search | IDS signatures, malware scanning, keyword dictionaries | Finds all patterns in one pass efficiently | Can replace sequential guardrail regex checks with one multi-pattern scanner for injection/secrets | Not ideal for complex contextual patterns better handled by full parsers or regex engines |
| Rabin-Karp | Rolling-hash string matching approach | Plagiarism/clone detection, multiple substring checks | Fast average matching and convenient window hashing | Can detect repeated risky snippets or clone patterns across translated outputs | Avoid when hash collision handling overhead or exact single-pattern speed is critical |
| Boyer-Moore | Heuristic skip-based exact pattern matcher | Fast exact search in large text | Often sublinear average performance for single pattern | Useful for fast scanning of one high-priority forbidden token/signature | Not for many patterns at once; Aho-Corasick is better there |
| Bloom Filter | Probabilistic membership structure with false positives only | Caching, prefiltering, dedupe prechecks | Very memory-efficient and fast precheck stage | Can fast-reject obviously safe payloads before expensive deep scans | Not for workflows requiring zero false positives and exact membership |
| Levenshtein Distance | Edit-distance metric between strings | Fuzzy matching, near-duplicate detection, typo tolerance | Quantifies similarity robustly | Can score translation drift and flag suspiciously divergent output from expected behavior/text | Avoid for strict semantic equivalence judgments without structural context |
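
The Levenshtein row above can be made concrete with a short dynamic-programming sketch. The function name is illustrative, and using it to score translation drift is only the suggestion from the table, not an existing project feature.

```python
def levenshtein(left: str, right: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(left) < len(right):
        left, right = right, left  # keep the shorter string in the inner loop
    previous = list(range(len(right) + 1))
    for row, left_char in enumerate(left, start=1):
        current = [row]
        for col, right_char in enumerate(right, start=1):
            substitution = previous[col - 1] + (left_char != right_char)
            insertion = current[col - 1] + 1
            deletion = previous[col] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
```

Only two rows of the table are held in memory, so the cost is proportional to the product of the lengths in time but only to the shorter string in space.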
| Algorithm | What It Is | Most Common Use | Why It Should Be Used | How It Helps This Project | When Not To Use |
|---|---|---|---|---|---|
| Model Checking | Exhaustive state-space verification against temporal properties | Protocol verification, safety-critical policy checks | Finds counterexamples rigorously | Can prove RBAC and policy-lock invariants over request state transitions | Avoid for very large unconstrained state spaces without abstraction |
| Symbolic Execution | Executes paths with symbolic values and constraints | Path discovery, bug finding, test generation | Reaches edge paths hard to hit with manual tests | Can generate adversarial API vectors to stress translation and guardrails | Not ideal when path explosion makes runtime impractical |
| Concolic Testing | Concrete execution guided by symbolic constraints | Automated test input generation | Practical compromise between full symbolic and random testing | Can expand coverage for translation endpoints with targeted boundary/path inputs | Avoid when harness constraints are too expensive to maintain |
| Hoare Logic | Pre/postcondition proof framework for program correctness | Formal specs and proof-oriented correctness | Sharp contractual reasoning around invariants | Can specify and verify required behavior for dependency ordering and policy checks | Not needed where lightweight testing already provides enough assurance |
| Property-Based Testing | Randomized input generation checked against invariants | Invariant testing and edge-case exploration | Finds surprising cases that example-based tests miss | Can stress graph ordering and parity invariants over large random input spaces | Avoid when properties are weakly defined or nondeterministic outputs are expected |
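
To illustrate the property-based testing row above, here is a standard-library sketch that generates random acyclic dependency graphs and checks the base-before-dependent invariant on each. It uses a small stand-in Kahn implementation; `kahn_order`, `random_dag`, and `check_ordering_property` are hypothetical names, and a dedicated library such as Hypothesis would normally replace the hand-rolled generator.

```python
import random
from collections import deque


def kahn_order(dependencies):
    """Order nodes so that every dependency precedes its dependents."""
    nodes = set(dependencies) | {d for deps in dependencies.values() for d in deps}
    in_degree = {node: 0 for node in nodes}
    dependents = {node: [] for node in nodes}
    for node, deps in dependencies.items():
        for dep in deps:
            in_degree[node] += 1
            dependents[dep].append(node)
    ready = deque(sorted(node for node in nodes if in_degree[node] == 0))
    order = []
    while ready:
        current = ready.popleft()
        order.append(current)
        for child in sorted(dependents[current]):
            in_degree[child] -= 1
            if in_degree[child] == 0:
                ready.append(child)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: no zero in-degree node remains")
    return order


def random_dag(rng, size):
    """Node i may depend only on lower-numbered nodes, so the graph is acyclic."""
    return {node: {dep for dep in range(node) if rng.random() < 0.3} for node in range(size)}


def check_ordering_property(trials=200, seed=0):
    """Invariant: every dependency appears before the node that needs it."""
    rng = random.Random(seed)
    for _ in range(trials):
        graph = random_dag(rng, rng.randint(1, 12))
        order = kahn_order(graph)
        position = {node: i for i, node in enumerate(order)}
        assert len(order) == len(graph)
        for node, deps in graph.items():
            assert all(position[dep] < position[node] for dep in deps)
    return trials


if __name__ == "__main__":
    print(check_ordering_property())
```

A fixed seed keeps the run reproducible, which matters when a failing graph has to be replayed in a bug report.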
| Algorithm | What It Is | Most Common Use | Why It Should Be Used | How It Helps This Project | When Not To Use |
|---|---|---|---|---|---|
| McCabe Cyclomatic Complexity | Branch/path complexity metric from control flow | Test planning and maintainability risk scoring | Correlates complexity with defect and testing effort | Can drive risk-based test intensity on translated functions/classes | Not as a sole quality signal without context |
| Halstead Metrics | Operator/operand based software volume and effort metrics | Productivity and maintainability analysis | Gives a language-agnostic complexity lens | Can compare source vs translated code inflation and detect complexity bloat | Avoid as hard pass/fail gates in isolation |
| Maintainability Index | Composite maintainability score from complexity/volume/LOC | Portfolio-level code health tracking | Easy high-level signal for triage | Can prioritize translated files for manual review when score degrades | Not reliable for very small files or generated code alone |
| Fan-In/Fan-Out | Counts inbound and outbound dependency edges | Architecture coupling analysis | Highlights hotspots and blast-radius risk | Can prioritize high fan-in classes for stricter parity and regression checks | Not needed for tiny low-coupling modules |
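
As a rough illustration of the McCabe row above, the sketch below approximates cyclomatic complexity for a translated Python function by counting decision points in its syntax tree. This is a simplified counting rule chosen for the example, not the output of any specific metrics tool, and the helper names are hypothetical.

```python
import ast


def _decision_points(node):
    """Number of extra paths this syntax node introduces."""
    if isinstance(node, (ast.If, ast.For, ast.AsyncFor, ast.While, ast.ExceptHandler, ast.IfExp)):
        return 1
    if isinstance(node, ast.BoolOp):
        return len(node.values) - 1  # each extra and/or operand adds a branch
    if isinstance(node, ast.comprehension):
        return 1 + len(node.ifs)
    return 0


def cyclomatic_complexity(source: str, function_name: str) -> int:
    """Return 1 + decision points for the named function (nested functions included)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) and node.name == function_name:
            return 1 + sum(_decision_points(child) for child in ast.walk(node))
    raise ValueError(f"function {function_name!r} not found")


SAMPLE = '''
def price(base, premium, coupons):
    if premium and base > 0:
        base *= 2
    for coupon in coupons:
        if coupon:
            base -= 1
    return base
'''

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE, "price"))
```

A score like this is best used to rank translated functions for review, in line with the table's warning against treating it as a sole quality signal.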
| Algorithm | What It Is | Most Common Use | Why It Should Be Used | How It Helps This Project | When Not To Use |
|---|---|---|---|---|---|
| Shewhart Control Charts (implemented baseline) | Control limits over time-series process metrics | Manufacturing and ops stability monitoring | Fast detection of obvious out-of-control behavior | Already aligns to audit control-state tracking for latency/quality drift | Less sensitive to small gradual drifts |
| CUSUM | Cumulative drift detector versus target mean | Early shift detection in process monitoring | Detects subtle persistent changes earlier than Shewhart | Can alert on slow latency degradation before SLA breach | Not for highly non-stationary streams without segmentation |
| EWMA | Exponentially weighted moving average trend estimator | Smoothed monitoring and anomaly trend tracking | Balances noise reduction with responsiveness | Can provide cleaner quality/latency trendlines in audit dashboards | Avoid if abrupt shifts are the only concern and lag is unacceptable |
| Z-Score Anomaly Detection | Standard deviation based outlier scoring | Basic anomaly and quality outlier flags | Simple, interpretable, low implementation cost | Can flag suspicious request records for investigation in near real-time | Not for heavy-tailed or non-Gaussian distributions without robust variants |
| Isolation Forest | Tree-ensemble unsupervised anomaly detector | Fraud, operations anomalies, multivariate outliers | Captures nonlinear multivariate anomalies well | Can detect odd combinations of role, latency, block-rate, and payload characteristics | Avoid for tiny datasets where model instability is high |
| Bayesian Inference | Posterior probability updating with evidence | Risk forecasting, decision support under uncertainty | Integrates prior knowledge and new evidence rigorously | Can estimate release risk from test outcomes plus historical defects | Not needed when deterministic thresholds are sufficient |
| Fisher's Exact Test | Exact significance test for contingency tables | Small sample proportion comparisons | Reliable p-values for low-count events | Can test whether blocked-request spikes are statistically significant | Avoid for large-sample cases where simpler approximations are fine |
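
The CUSUM row above describes catching slow latency drift before a hard limit is crossed. A minimal one-sided sketch follows; the function name and the target, slack, and threshold values are illustrative assumptions, not the project's configured SLA budgets.

```python
def cusum_first_alarm(samples, target, slack, threshold):
    """Return the index of the first upward-drift alarm, or None if none fires.

    The statistic accumulates how far each sample exceeds target + slack and
    resets to zero whenever the running sum would go negative.
    """
    cumulative = 0.0
    for index, value in enumerate(samples):
        cumulative = max(0.0, cumulative + (value - target - slack))
        if cumulative > threshold:
            return index
    return None


if __name__ == "__main__":
    latencies_ms = [100, 101, 99, 100, 106, 107, 108, 109]
    print(cusum_first_alarm(latencies_ms, target=100, slack=2, threshold=10))
```

In this sample no single reading is far from the target, yet the accumulated excess triggers an alarm, which is the behavior that distinguishes CUSUM from a plain per-sample limit.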
| Algorithm | What It Is | Most Common Use | Why It Should Be Used | How It Helps This Project | When Not To Use |
|---|---|---|---|---|---|
| IPOG | Covering-array generator for t-way combinations | Combinatorial API/config test design | Large coverage gains with far fewer cases than full Cartesian products | Can systematically cover role x endpoint x payload combinations with manageable test counts | Not necessary for very small parameter spaces |
| MC/DC Coverage | Criterion requiring each condition independently affect outcome | Safety-critical software verification | Strong decision-logic assurance with efficient test sets | Can harden guardrail and RBAC condition logic validation | Avoid as universal requirement for low-risk modules due to overhead |
| Coverage-Guided Fuzzing | Mutation fuzzing guided by code coverage feedback | Security hardening and crash discovery | Efficiently discovers deep parser/validation edge cases | Can stress translation endpoints with malformed/adversarial Java inputs | Not ideal where deterministic reproducibility and strict runtime budgets dominate |
| N-version/Differential Testing | Compare outputs across independent implementations | Compiler/runtime verification and migration confidence | Great at finding semantic mismatches | Can compare legacy Java oracle against translated Python outputs continuously | Not useful if all compared implementations share same defect source |
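
The role x endpoint x payload example from the IPOG row can be sketched with a simple greedy all-pairs selection. This is deliberately not IPOG itself: it enumerates the full Cartesian product and repeatedly keeps the case covering the most uncovered value pairs, which is fine for small matrices. The parameter values shown are hypothetical.

```python
from itertools import combinations, product


def pairwise_cases(parameters):
    """Greedy 2-way covering set: every value pair appears in at least one case."""
    names = list(parameters)
    candidates = [dict(zip(names, values)) for values in product(*parameters.values())]

    def pairs_of(case):
        return {((a, case[a]), (b, case[b])) for a, b in combinations(names, 2)}

    uncovered = set().union(*(pairs_of(case) for case in candidates))
    selected = []
    while uncovered:
        # keep the candidate that covers the most still-uncovered pairs
        best = max(candidates, key=lambda case: len(pairs_of(case) & uncovered))
        selected.append(best)
        uncovered -= pairs_of(best)
    return selected


if __name__ == "__main__":
    matrix = {
        "role": ["admin", "viewer", "guest"],                # hypothetical roles
        "endpoint": ["/translate", "/report", "/health"],    # hypothetical endpoints
        "payload": ["small", "large"],
    }
    cases = pairwise_cases(matrix)
    print(len(cases), "cases instead of", 3 * 3 * 2)
```

The resulting list can be fed straight into a parameterized test; a true IPOG generator becomes worthwhile once the full product is too large to enumerate.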
The suite now includes proof-style parity tests that run the same function behavior in both legacy Java and translated Python and assert identical outputs for shared input vectors.
In this repository, a vector means one structured test case: input values plus the expected output.
Example vector concept:
- Input: base=5, multiplier=10, premium=true
- Expected output: 75
That single row is one vector. A vector file is a list of many such rows (normal, edge, and negative scenarios).
Vectoring is the testing approach where both runtimes (legacy Java and translated Python) are driven from that same shared vector dataset instead of hardcoded test values in multiple places.
Why vectoring is useful:
- Single source of truth for migration parity expectations
- Less duplicated test data across languages
- Easier reviews and audits of behavioral requirements
- Faster updates when business rules change
Vector Runner in this project:
- LegacyCalculatorVectorRunner.java reads the shared JSON vectors
- Executes the legacy Java function for each vector
- Emits per-case output (id, actual, expected) for parity checks
This is how we prove output equivalence:
- Define vectors in shared JSON/CSV fixture files
- Run legacy Java against those vectors
- Run translated Python against those same vectors
- Assert Java output equals Python output for each vector id
This pattern gives an explicit migration proof: same inputs, same outputs, across runtimes.
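
The four steps above can be sketched as a small comparison routine. Everything here is illustrative: the inline JSON stands in for the shared vector file, the second vector and the pricing formula in `translated_calculate` are assumptions chosen only to reproduce the example row, and `java_actuals` stands in for the per-case output emitted by the Java vector runner.

```python
import json

# Stand-in for the shared vector file; the first row mirrors the example above.
VECTORS_JSON = """
[
  {"id": "premium-basic", "input": {"base": 5, "multiplier": 10, "premium": true}, "expected": 75},
  {"id": "standard-basic", "input": {"base": 5, "multiplier": 10, "premium": false}, "expected": 50}
]
"""


def translated_calculate(base, multiplier, premium):
    """Hypothetical translated function; the 50% premium uplift is an assumption."""
    total = base * multiplier
    return total + total // 2 if premium else total


def find_parity_mismatches(vectors, java_actuals, python_fn):
    """Return vector ids where Java output, Python output, and expected value disagree."""
    mismatches = []
    for vector in vectors:
        python_actual = python_fn(**vector["input"])
        java_actual = java_actuals[vector["id"]]
        if not (python_actual == java_actual == vector["expected"]):
            mismatches.append(vector["id"])
    return mismatches


if __name__ == "__main__":
    vectors = json.loads(VECTORS_JSON)
    java_actuals = {"premium-basic": 75, "standard-basic": 50}  # simulated runner output
    print(find_parity_mismatches(vectors, java_actuals, translated_calculate))
```

Reporting ids rather than a single pass/fail flag keeps each parity failure traceable to one row of the vector file.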
| Proof Test | What It Verifies | Location |
|---|---|---|
| Java fixture expected-value test | Legacy Java behavior is stable and explicit | tests/correctness/test_legacy_java_python_equivalence.py |
| Python fixture expected-value test | Translated Python behavior matches intended outputs | tests/correctness/test_legacy_java_python_equivalence.py |
| Cross-language equivalence test | Java output == Python output for the same inputs | tests/correctness/test_legacy_java_python_equivalence.py |
Fixture sources:
- fixtures/java/simple/LegacyCalculator.java
- fixtures/java/simple/LegacyCalculatorVectorRunner.java
- fixtures/expected_python/legacy_calculator.py
- fixtures/vectors/legacy_calculator_vectors.json
- fixtures/vectors/legacy_calculator_vectors.csv
| Asset | Runtime Consumer | Purpose | Status |
|---|---|---|---|
| legacy_calculator_vectors.json | Python parity tests + Java vector runner | Canonical vector source (id, input, expected) | Implemented |
| legacy_calculator_vectors.csv | Optional import/export interoperability | Spreadsheet-friendly mirror for manual review | Implemented |
| LegacyCalculatorVectorRunner.java | Java runtime | Reads shared JSON vectors and evaluates legacy function | Implemented |
| test_legacy_java_python_equivalence.py | pytest | Parameterized cross-runtime parity assertions | Implemented |
| Tool | How It Helps With Java-to-Python Parity | Typical Use |
|---|---|---|
| pytest parameterized tests | Reuse the same vectors for both runtimes | Core parity assertions (implemented) |
| JUnit 5 parameterized tests | Capture legacy Java oracle outputs | Legacy baseline generation (recommended next) |
| ApprovalTests | Golden-master snapshot comparisons | Regression lock for legacy outputs (recommended next) |
| JSON/CSV test vectors | Runtime-agnostic shared inputs/outputs | Single source of truth for parity data (implemented) |
| Testcontainers | Reproducible Java runtime execution | Stable local runtime parity in isolated containers (recommended next) |
Practical recommendation: keep a shared vector file and run both Java and Python against it, treating Java output as the initial oracle during migration.
| Zero-Trust Control | What It Means | Project Implementation | Evidence |
|---|---|---|---|
| Verify identity on every request | No implicit trust by network location | JWT verification + RBAC dependency checks in API routes | tests/negative/test_rbac_enforcement.py |
| Explicit policy decision per request | Each request must be allow/deny evaluated | Input guardrails, model lock, egress policy lock, blocked audit path | tests/negative/test_model_blocking.py, tests/negative/test_egress_blocking.py, tests/adversarial/test_prompt_injection.py |
| Least privilege access | Users only get required capabilities | Role-permission mapping with permission-scoped endpoints | core/auth.py, tests/negative/test_rbac_enforcement.py |
| Continuous verification | Runtime signals prove controls remain active | Audit report includes zero-trust rates, quality attestations, deny rate | /api/v1/audit-report zero-trust section |
| Assume breach + contain blast radius | Treat unsafe inputs as hostile by default | Block injection/secret payloads and sanitize audit records | guardrails/input_guard.py, guardrails/output_guard.py, tests/integration/test_audit_trail.py |
The release dashboard now includes a dedicated zero_trust section with:
- posture
- identity_verification_rate
- policy_decision_rate
- continuous_verification_rate
- policy_deny_rate
This makes zero-trust status measurable release-over-release instead of purely descriptive.
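
As a hedged sketch of how such rates could be derived from audit records: the record keys `identity_verified` and `policy_decision`, the choice of total requests as the denominator, and the function name are all assumptions for illustration. The `posture` and `continuous_verification_rate` fields are omitted because their definitions are not given here.

```python
def zero_trust_summary(records):
    """Compute three of the dashboard rates from a list of audit record dicts."""
    total = len(records)
    if total == 0:
        return {
            "identity_verification_rate": 0.0,
            "policy_decision_rate": 0.0,
            "policy_deny_rate": 0.0,
        }
    verified = sum(1 for record in records if record.get("identity_verified"))
    decided = sum(1 for record in records if record.get("policy_decision") in ("allow", "deny"))
    denied = sum(1 for record in records if record.get("policy_decision") == "deny")
    return {
        "identity_verification_rate": verified / total,
        "policy_decision_rate": decided / total,
        "policy_deny_rate": denied / total,
    }


if __name__ == "__main__":
    audit_records = [
        {"identity_verified": True, "policy_decision": "allow"},
        {"identity_verified": True, "policy_decision": "deny"},
        {"identity_verified": True, "policy_decision": "allow"},
        {"identity_verified": False, "policy_decision": None},
    ]
    print(zero_trust_summary(audit_records))
```

A record with no policy decision lowers `policy_decision_rate`, which is the signal that a request slipped past explicit allow/deny evaluation.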
| Requirement (README) | Test (pytest) | Security Check | Quality Gate | Coverage |
|---|---|---|---|---|
| "Guarantee base-before-subclass order" | test_topological_sort.py (16 tests) | Klocwork scan | SonarQube: no high issues | 95%+ on project_translator.py |
| "Detect circular dependencies" | test_circular_dependencies.py (4 tests) | Bandit: no unsafe loops | No tech debt on Kahn logic | 100% cycle path |
| "Block injection patterns" | test_prompt_injection.py (5 tests) | Klocwork CWE-89, CWE-95 | SonarQube security hotspots | 100% on input_guard patterns |
| "Redact secrets from output" | test_forbidden_patterns.py (4 tests) | Bandit hardcoding check | No credential leak in logs | 100% on output_guard.redact() |
| "Enforce RBAC via JWT" | test_rbac_enforcement.py (4 tests) | Checkmarx token validation | Crypto best practices | 100% on auth.py verify_token |
| "Policy lock for models/egress" | test_model_blocking.py (3 tests) | Klocwork: whitelist bypass | No bypass paths | 100% on provider_lock.py |
This table is the top-to-bottom traceability matrix: each requirement has a test, security validation, and quality gate.
pie title Test File Distribution by Marker Group
"unit" : 5
"integration" : 4
"correctness" : 4
"negative" : 4
"adversarial" : 4
| Marker Group | Purpose | Key Benefit |
|---|---|---|
| unit | Algorithmic correctness for parsing/graph/order | Fast feedback on core logic |
| integration | API request/response and contract validation | Catches wiring and schema regressions |
| correctness | Python output structure and signature quality | Protects translation fidelity |
| negative | Policy and access-control enforcement | Prevents unsafe execution paths |
| adversarial | Injection and malformed input hardening | Reduces attack-surface risk |
Classification Criteria:
- 🟢 APPROVED: Tool is explicitly approved for classified/SCI work, has required security certifications (ISO 27001, FedRAMP, etc.), commonly used in defense/government sectors, or is open-source with minimal attack surface.
- 🟡 CONDITIONAL: Tool can be used with specific restrictions (on-prem deployment only, special licensing, restricted data flow, etc.).
- 🔴 NOT APPROVED: Tool lacks required certifications, uses unapproved cloud storage, transmits classified data externally, or has known security concerns for SCI environments.
Caution
All tools flagged as NOT APPROVED or CONDITIONAL must be reviewed by your security/compliance officer before use. Do not deploy tools flagged as NOT APPROVED in SCI/SCIF environments. CONDITIONAL tools require explicit variance/waiver documentation.
| Tool | Version | Purpose | SCI/SCIF Status | Restrictions/Notes |
|---|---|---|---|---|
| Python | 3.11+ | Runtime interpreter | 🟢 APPROVED | Open-source, widely used in government. Requires system-level deployment controls. |
| pytest | 8.0+ | Test framework | 🟢 APPROVED | Open-source, MIT license. Standard in Python security testing. No external data transmission. |
| pytest-asyncio | 0.23+ | Async test support | 🟢 APPROVED | Open-source, Apache 2.0 license. Minimal attack surface. |
| httpx | 0.27+ | HTTP client for API testing | 🟢 APPROVED | Open-source, BSD license. Used for in-process API testing only (no external calls). |
| FastAPI | 0.136+ | Web framework | 🟡 CONDITIONAL | Open-source, MIT license. Requires hardened deployment configuration for SCI. Ensure all dependencies are audited. On-prem deployment only. |
| cryptography | 42.0+ | Cryptographic library | 🟢 APPROVED | Open-source, dual Apache/BSD license. NIST-standard algorithms. Actively maintained. |
| PyJWT | 2.8+ | JWT signing/verification | 🟢 APPROVED | Open-source, MIT license. Minimal, focused functionality. |
| Pydantic | 2.9+ | Data validation | 🟢 APPROVED | Open-source, MIT license. No external validation calls. Widely adopted in security projects. |
| javalang | 0.13+ | Java parser | 🟢 APPROVED | Open-source, MIT license. Local parsing only, no network access. |
| Tool | Purpose | SCI/SCIF Status | Restrictions/Notes | Recommended? |
|---|---|---|---|---|
| Klocwork (Perforce) | SAST - vulnerabilities, code quality | 🟢 APPROVED | Enterprise tool explicitly used by aerospace/defense. ISO 27001 certified. TÜV-SÜD certified. Commercial license required. | ✅ YES - Preferred for classified environments |
| SonarQube | Code quality & maintainability | 🟡 CONDITIONAL | On-prem deployment: APPROVED. Cloud (SonarCloud): NOT APPROVED. Requires air-gapped or internal-only instance. | |
| Checkmarx (SAST) | Enterprise vulnerability scanning | 🟢 APPROVED | Explicitly targets government/defense. Supports on-prem. Commercial license required. | ✅ YES - Enterprise-grade SAST |
| Coverity (Synopsys) | Deep static analysis | 🟢 APPROVED | Defense/aerospace standard tool. Commercial license required. Supports on-prem deployment. | ✅ YES - Advanced static analysis |
| Bandit | Python-specific security scanning | 🟢 APPROVED | Open-source, Apache 2.0 license. Lightweight, local execution only. | ✅ YES - Lightweight pre-commit check |
| Pylint | Python linting & style | 🟢 APPROVED | Open-source, GPL license. No external calls. Standard in Python ecosystem. | ✅ YES - Pre-commit linting |
| pip-audit | Python dependency vulnerability scanning | 🟢 APPROVED | Open-source, Apache 2.0 license. Queries a vulnerability service (PyPI by default) over the network, so confirm egress rules before use in air-gapped environments. | ✅ YES - Lightweight dependency audit |
| OWASP Dependency-Check | Dependency vulnerability scanner | 🟢 APPROVED | Open-source, Apache 2.0 license. Can run air-gapped with offline DB. | ✅ YES - Comprehensive SCA |
| Black Duck (Synopsys) | License & composition analysis | 🟡 CONDITIONAL | Commercial tool with on-prem option. Requires licensing agreement for classified use. | |
| Snyk | Dependency scanning SaaS | 🔴 NOT APPROVED | Cloud-based SaaS. Data transmission to external service prohibited for SCI. Unapproved for classified use. | ❌ NO - Do not use |
| Tool | Purpose | SCI/SCIF Status | Restrictions/Notes | Recommended? |
|---|---|---|---|---|
| pytest-cov | Code coverage measurement | 🟢 APPROVED | Open-source, MIT license. Local execution only. Generates coverage reports. | ✅ YES - Essential for V&V |
| Codecov | Coverage tracking SaaS | 🔴 NOT APPROVED | Cloud-based service. Transmits coverage data to external servers. Not approved for SCI environments. | ❌ NO - Do not use |
| Datadog | APM & monitoring SaaS | 🔴 NOT APPROVED | Cloud SaaS. Continuous data transmission to external servers. Classified data cannot be sent to Datadog. | ❌ NO - Do not use |
| LoadRunner (Micro Focus/OpenText) | Performance & load testing | 🟡 CONDITIONAL | On-prem/self-hosted: APPROVED with proper security hardening. Cloud version: NOT APPROVED. Commercial license required. | |
| Stryker | Mutation testing (JavaScript/TypeScript, C#, Scala) | 🟢 APPROVED | Open-source, Apache 2.0 license. Runs locally, no external calls. Does not target Python or Java; use mutmut for Python and PIT for Java. | ✅ YES - Test quality verification |
| PIT | Mutation testing for Java bytecode | 🟢 APPROVED | Open-source, Apache 2.0 license. Local execution only. | ✅ YES - For Java parity testing |
| Tool | Purpose | SCI/SCIF Status | Restrictions/Notes | Recommended? |
|---|---|---|---|---|
| TestRail | Test management & traceability | 🟡 CONDITIONAL | Self-hosted/on-prem: APPROVED with proper security controls. Cloud version: NOT APPROVED. Proprietary, commercial license. | |
| Jira Xray | Test management within Jira | 🟡 CONDITIONAL | On-prem Jira: APPROVED. Cloud Jira: NOT APPROVED for SCI data. Proprietary plugin, commercial license. | |
| Azure DevOps Test Plans | Requirements & test traceability | 🟡 CONDITIONAL | On-prem: APPROVED (requires Azure DevOps Server). Cloud (azure.com): NOT APPROVED for SCI. | |
| ReqIF Editor | Requirements interchange format | 🟢 APPROVED | Open-source, EPL license. Local file-based tool, no external connections. | ✅ YES - For requirements management |
| Tool | Purpose | SCI/SCIF Status | Restrictions/Notes | Recommended? |
|---|---|---|---|---|
| GitHub Actions | Cloud-hosted CI/CD | 🔴 NOT APPROVED | Cloud-hosted service. Builds and artifacts transmitted to GitHub servers. Not approved for SCI code/data. | ❌ NO - Use on-prem CI/CD |
| Jenkins | On-prem CI/CD automation | 🟢 APPROVED | Open-source, MIT license. Can be air-gapped or on-prem only. Widely used in government. | ✅ YES - Preferred CI/CD for SCI |
| GitLab CI (Cloud) | Cloud-hosted CI/CD | 🔴 NOT APPROVED | Cloud-hosted. Not approved for SCI code transmission. | ❌ NO - Use on-prem option |
| GitLab CI (Self-Hosted) | Self-hosted CI/CD | 🟡 CONDITIONAL | On-prem deployment: APPROVED with proper air-gapping. Proprietary core, open-source options available. | |
| Category | Approved Count | Conditional Count | Not Approved Count | Recommendation |
|---|---|---|---|---|
| Core Dependencies | 8/9 | 1 | 0 | Use all core deps. Harden FastAPI deployment. |
| Static Analysis (SAST/SCA) | 7/10 | 2 | 1 | Use Klocwork, Checkmarx, Coverity as primary SAST. Avoid Snyk cloud. |
| Testing & Performance | 3/6 | 1 | 2 | Use pytest-cov and mutation testing. Avoid Codecov/Datadog cloud. |
| Requirements & Test Mgmt | 1/4 | 3 | 0 | Use ReqIF or on-prem TestRail/Jira. Avoid cloud services. |
| CI/CD | 1/4 | 1 | 2 | Use Jenkins on-prem. Avoid GitHub Actions and cloud CI. |
| TOTAL | 20/33 | 8/33 | 5/33 | Buildable with APPROVED tools. CONDITIONAL tools need variance. |
For APPROVED Tools:
- No additional review needed.
- Deploy using standard security hardening practices.
- Ensure all infrastructure is on-prem and air-gapped from external networks.
For CONDITIONAL Tools:
- Requires security/compliance officer review and variance documentation.
- Must be deployed on-prem (not cloud).
- Ensure all data remains within security boundary.
- Document any external dependencies or data transmission.
For NOT APPROVED Tools:
- DO NOT DEPLOY in SCI/SCIF environments.
- Seek alternative APPROVED tools.
- Escalate to program security office if no alternative exists.
If you are currently using NOT APPROVED tools:
| Current Tool | Reason Not Approved | APPROVED Alternative |
|---|---|---|
| Codecov | Cloud SaaS, external data transmission | Use local pytest-cov + local artifact storage |
| Datadog | Cloud SaaS, continuous monitoring | Use on-prem ELK, Grafana, or Prometheus stack |
| Snyk | Cloud SaaS, external scanning | Use OWASP Dependency-Check (on-prem) + Bandit |
| GitHub Actions | Cloud CI/CD | Use Jenkins on-prem or GitLab self-hosted |
| SonarCloud | Cloud SaaS | Use SonarQube on-prem instance |
Prerequisites:
- Python 3.11+
- Access to the orchestrator source path expected by conftest.py
Install:
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Optional local env file:
cp .env.example .env
Quick validation:
pytest --collect-only -q
Run full suite:
pytest -q
Run by concern:
pytest -m unit -q
pytest -m integration -q
pytest -m correctness -q
pytest -m negative -q
pytest -m adversarial -q
Focused debugging flow:
- Install dependencies.
- Run the relevant marker group.
- Use -k to isolate failing behavior.
- Re-run the same slice to confirm regression closure.
pytest -m integration -k dependency_order -q
Tip
For long local runs, use Ctrl+C to stop gracefully and keep the latest failure summary.
gantt
title Verification Roadmap
dateFormat YYYY-MM-DD
section Core
Parser and ordering guarantees :done, r1, 2026-01-01, 2026-02-20
section Security
RBAC and guardrail hardening :active, r2, 2026-02-21, 2026-05-30
section Expansion
Coverage growth and mutation checks :r3, 2026-06-01, 2026-09-01
| Phase | Goals | Target | Status |
|---|---|---|---|
| Core | Preserve dependency and translation order correctness | Q1 2026 | Complete |
| Security | Broaden adversarial and RBAC scenarios | Q2 2026 | In progress |
| Expansion | Add mutation testing and richer fixture corpora | Q3 2026 | Planned |
See CONTRIBUTING.md for workflow and test expectations.
Quality checklist for pull requests
- Add or update tests for each behavior change.
- Preserve dependency-order invariants in project translation paths.
- Keep fixtures deterministic and security-safe.
- Run targeted marker groups plus a full suite pass before opening a PR.
This project is licensed under the MIT License. See LICENSE for details.