JinBa1/java-query-engine

Java Query Engine

An in-memory relational query engine built on the Volcano/iterator model. Parses SQL via JSqlParser, builds an operator tree, and executes queries tuple-by-tuple against CSV data.

Architecture

SQL → JSqlParser → QueryPlanner → QueryPlanOptimizer → Operator Tree → Results

Core components:

Component                Role
QueryPlanner             Parses SQL and builds the operator pipeline
QueryPlanOptimizer       Selection pushdown, trivial operator removal
DBCatalog                Schema and table metadata (singleton)
ExpressionEvaluator      Evaluates WHERE/HAVING conditions per tuple
ExpressionPreprocessor   Resolves column references to indices
ConditionSplitter        Separates join predicates from selection predicates
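
As a rough illustration of the ConditionSplitter's role, the sketch below classifies AND-ed conjuncts by how many tables they reference. It is not the engine's code: the real component works on JSqlParser expressions, whereas the `Conjunct` and `SplitResult` records, the `split` method, and the `Enrolled.A` column are hypothetical stand-ins chosen to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for one parsed comparison and the tables it mentions.
record Conjunct(String text, Set<String> tables) {}

// Selections touch at most one table; joins span two or more.
record SplitResult(List<Conjunct> selections, List<Conjunct> joins) {}

class ConditionSplitterSketch {
    // A conjunct over a single table can be applied right above that table's scan;
    // one over several tables has to wait until those tables are joined.
    static SplitResult split(List<Conjunct> conjuncts) {
        List<Conjunct> selections = new ArrayList<>();
        List<Conjunct> joins = new ArrayList<>();
        for (Conjunct c : conjuncts) {
            if (c.tables().size() <= 1) {
                selections.add(c);
            } else {
                joins.add(c);
            }
        }
        return new SplitResult(selections, joins);
    }

    public static void main(String[] args) {
        SplitResult result = split(List.of(
            new Conjunct("Student.A < 3", Set.of("Student")),
            // Enrolled.A is an invented column, used only for illustration.
            new Conjunct("Student.A = Enrolled.A", Set.of("Student", "Enrolled"))));
        System.out.println("selections: " + result.selections().size()
            + ", joins: " + result.joins().size());
    }
}
```

Keeping the two kinds of predicate apart is what lets a planner attach single-table conditions close to the scans and reserve multi-table conditions for the join operators.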

Operator hierarchy (all extend Operator):

ScanOperator, SelectOperator, ProjectOperator, JoinOperator, SortOperator, SumOperator, DuplicateEliminationOperator
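
To make the Volcano/iterator idea concrete, here is a minimal sketch of a scan feeding a selection. The class names mirror the hierarchy above, but the method names (`getNextTuple`, `reset`), the `int[]` tuple representation, and the in-memory list used in place of a CSV file are assumptions and may differ from the actual engine.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Assumed base contract: each call returns one tuple, or null once the input is exhausted.
abstract class Operator {
    abstract int[] getNextTuple();
    abstract void reset();
}

// Leaf operator; this sketch scans an in-memory list rather than a CSV file.
class ScanOperator extends Operator {
    private final List<int[]> rows;
    private Iterator<int[]> it;

    ScanOperator(List<int[]> rows) {
        this.rows = rows;
        this.it = rows.iterator();
    }

    @Override int[] getNextTuple() { return it.hasNext() ? it.next() : null; }
    @Override void reset() { it = rows.iterator(); }
}

// Pulls tuples from its child until one satisfies the condition.
class SelectOperator extends Operator {
    private final Operator child;
    private final Predicate<int[]> condition;

    SelectOperator(Operator child, Predicate<int[]> condition) {
        this.child = child;
        this.condition = condition;
    }

    @Override int[] getNextTuple() {
        int[] tuple;
        while ((tuple = child.getNextTuple()) != null) {
            if (condition.test(tuple)) {
                return tuple;
            }
        }
        return null;
    }

    @Override void reset() { child.reset(); }
}

class VolcanoSketch {
    public static void main(String[] args) {
        List<int[]> student = List.of(
            new int[]{1, 200, 50, 33},
            new int[]{2, 200, 200, 44},
            new int[]{3, 100, 105, 44});
        // Equivalent in spirit to: SELECT * FROM Student WHERE Student.A < 3
        Operator root = new SelectOperator(new ScanOperator(student), row -> row[0] < 3);
        int[] tuple;
        while ((tuple = root.getNextTuple()) != null) {
            System.out.println(Arrays.toString(tuple));
        }
    }
}
```

The caller only ever talks to the root operator; each operator pulls from its child on demand, so no intermediate result set is materialised for a scan-and-filter pipeline like this one.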

Feature Matrix

Feature                                     Status
SELECT * / projection                       ✅ Supported
WHERE predicates                            ✅ Supported
Inner joins (nested-loop)                   ✅ Supported
ORDER BY                                    ✅ Supported
GROUP BY + SUM                              ✅ Supported
DISTINCT                                    ✅ Supported
Nested arithmetic/comparison expressions    ✅ Supported
Query optimisation (selection pushdown)     ✅ Supported
Indexes                                     ❌ Not supported
Transactions                                ❌ Not supported
INSERT / UPDATE / DELETE                    ❌ Not supported
Concurrency                                 ❌ Not supported
Persistence                                 ❌ Not supported
Full SQL dialect                            ❌ Not supported
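
The effect of selection pushdown on a nested-loop join can be sketched as follows. This is an illustration of the general technique on materialised lists, not the engine's JoinOperator or QueryPlanOptimizer; the class name, the `comparisons` counter, and the sample rows are all made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.function.Predicate;

class PushdownSketch {
    // Counts join-predicate evaluations; exists only to show the cost difference.
    static int comparisons = 0;

    // Plain nested-loop join: test every left tuple against every right tuple.
    static List<int[]> nestedLoopJoin(List<int[]> left, List<int[]> right,
                                      BiPredicate<int[], int[]> on) {
        List<int[]> out = new ArrayList<>();
        for (int[] l : left) {
            for (int[] r : right) {
                comparisons++;
                if (on.test(l, r)) {
                    int[] joined = new int[l.length + r.length];
                    System.arraycopy(l, 0, joined, 0, l.length);
                    System.arraycopy(r, 0, joined, l.length, r.length);
                    out.add(joined);
                }
            }
        }
        return out;
    }

    static List<int[]> select(List<int[]> rows, Predicate<int[]> condition) {
        return rows.stream().filter(condition).toList();
    }

    public static void main(String[] args) {
        List<int[]> left = List.of(new int[]{1, 10}, new int[]{2, 20},
                                   new int[]{3, 30}, new int[]{4, 40});
        List<int[]> right = List.of(new int[]{1, 7}, new int[]{2, 8}, new int[]{4, 9});
        BiPredicate<int[], int[]> sameKey = (l, r) -> l[0] == r[0];
        Predicate<int[]> keyBelowThree = row -> row[0] < 3;

        // Without pushdown: join everything, then filter the joined tuples.
        comparisons = 0;
        List<int[]> late = select(nestedLoopJoin(left, right, sameKey), keyBelowThree);
        int lateCost = comparisons;

        // With pushdown: filter the left input first, then join the survivors.
        comparisons = 0;
        List<int[]> early = nestedLoopJoin(select(left, keyBelowThree), right, sameKey);
        int earlyCost = comparisons;

        System.out.println(late.size() + " rows, " + lateCost + " comparisons");   // 2 rows, 12 comparisons
        System.out.println(early.size() + " rows, " + earlyCost + " comparisons"); // 2 rows, 6 comparisons
    }
}
```

Both plans return the same rows; the pushed-down plan simply feeds fewer tuples into the quadratic join loop.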

Scope

This engine supports SQL-over-CSV query execution: read-only queries against tables stored as CSV files. It does not support transactions, indexes, data modification (INSERT/UPDATE/DELETE), concurrency, persistence, or a full SQL dialect. All values are integers.
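
Because every value is an integer, turning a CSV line into a tuple is straightforward. The sketch below shows one way to do it, assuming comma-separated fields with optional surrounding spaces as in the sample data; the class and method names are hypothetical and not taken from the engine.

```java
import java.util.Arrays;

class TupleParsingSketch {
    // Splits on commas, trims whitespace around each field, and parses it as an int.
    // A non-integer field makes Integer.parseInt throw NumberFormatException.
    static int[] parseLine(String line) {
        return Arrays.stream(line.split(","))
                     .map(String::trim)
                     .mapToInt(Integer::parseInt)
                     .toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parseLine("1, 200, 50, 33"))); // [1, 200, 50, 33]
    }
}
```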

The focus is on demonstrating query planning, optimisation, and the Volcano iterator execution model.

Quick Start

Prerequisites: Java 17, Maven (or use the included Maven Wrapper).

# Clone
git clone https://github.com/JinBa1/java-query-engine.git
cd java-query-engine

# Build fat JAR
./mvnw clean compile assembly:single

Run a query:

java -cp target/java-query-engine-1.0.0-jar-with-dependencies.jar \
  com.github.jinba1.blazedb.BlazeDB \
  samples/db samples/input/query1.sql output.csv

Demo

Input table (samples/db/data/Student.csv):

1, 200, 50, 33
2, 200, 200, 44
3, 100, 105, 44
4, 100, 50, 11
5, 100, 500, 22
6, 300, 400, 11

Query (samples/input/query4.sql):

SELECT * FROM Student WHERE Student.A < 3;

Command:

java -cp target/java-query-engine-1.0.0-jar-with-dependencies.jar \
  com.github.jinba1.blazedb.BlazeDB \
  samples/db samples/input/query4.sql output.csv

Output (output.csv):

1, 200, 50, 33
2, 200, 200, 44
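
The demo result can be traced by hand: resolve `Student.A` to a column position, then keep the tuples whose value at that position is below 3. The sketch below does this under an assumed schema; schema.txt is not reproduced here, so the column names A to D are a guess that merely agrees with the output above, and the class and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DemoTraceSketch {
    // Assumed schema: columns A, B, C, D in file order (not confirmed by schema.txt).
    static final Map<String, List<String>> SCHEMA =
        Map.of("Student", List.of("A", "B", "C", "D"));

    // Resolves a qualified reference such as "Student.A" to a tuple index.
    static int resolve(String columnRef) {
        String[] parts = columnRef.split("\\.");
        List<String> columns = parts.length == 2 ? SCHEMA.get(parts[0]) : null;
        int index = columns == null ? -1 : columns.indexOf(parts[1]);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown column: " + columnRef);
        }
        return index;
    }

    // Renders a tuple in the same "1, 200, 50, 33" style as the sample files.
    static String format(int[] tuple) {
        return Arrays.stream(tuple)
                     .mapToObj(String::valueOf)
                     .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<int[]> student = List.of(
            new int[]{1, 200, 50, 33}, new int[]{2, 200, 200, 44},
            new int[]{3, 100, 105, 44}, new int[]{4, 100, 50, 11},
            new int[]{5, 100, 500, 22}, new int[]{6, 300, 400, 11});
        int a = resolve("Student.A");
        for (int[] tuple : student) {
            if (tuple[a] < 3) {
                System.out.println(format(tuple));
            }
        }
    }
}
```

Resolving the column name once, before iterating, means the per-tuple check is a plain array lookup rather than a string comparison.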

Running Examples

The samples/ directory ships with 12 queries and a small dataset (Student, Course, Enrolled tables). Expected output lives in samples/expected_output/.

# Run all sample queries and diff against expected output
for i in $(seq 1 12); do
  java -cp target/java-query-engine-1.0.0-jar-with-dependencies.jar \
    com.github.jinba1.blazedb.BlazeDB \
    samples/db "samples/input/query${i}.sql" "/tmp/out${i}.csv"
  diff "samples/expected_output/query${i}.csv" "/tmp/out${i}.csv" && echo "query${i}: OK"
done

Testing

./mvnw test

The test suite covers individual operators, the query planner, the optimiser, expression evaluation, and end-to-end integration scenarios (190 tests).

Project Structure

├── src/main/java/com/github/jinba1/blazedb/   # Core engine (21 files)
│   └── operator/                                # Volcano operators (8 files)
├── src/test/java/com/github/jinba1/blazedb/    # JUnit 5 tests
├── samples/
│   ├── db/schema.txt                            # Table schemas
│   ├── db/data/                                 # CSV data files
│   ├── input/query[1-12].sql                    # Sample queries
│   └── expected_output/query[1-12].csv          # Expected results
├── pom.xml                                      # Maven config (Java 17, JSqlParser 4.7)
├── mvnw / mvnw.cmd                              # Maven Wrapper
└── LICENSE

Background

This project was originally built for the Advanced Database Systems course at the University of Edinburgh and was subsequently extended with additional query optimisation and expanded test coverage.

License

This project is released under the MIT License. See LICENSE for details.
