AI-driven data pipeline for processing bulk transcripts from people with aphasia or other speech disorders. It runs as a CLI tool and can take its input from a JSON config file. The data is processed through TTS, realtime audio, and transcription AI models to produce data for research purposes. Data is run through the pipeline iteratively to develop prompts for AI models whose output most accurately matches the natural language features of real transcripts from real people with aphasia.
This project was originally developed with Node.js, but I switched to Python as the amount of data processing and organization grew over the course of the research. I ported the code because Python scripting is simpler and has extensive library support for working with audio files, large datasets, and OpenAI's API.
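A minimal sketch of how a CLI like this might combine a JSON config file with command-line flags. The option names (`--config`, `--input-dir`, `--output-dir`, `--iterations`), the defaults, and the function names are all hypothetical, chosen for illustration rather than taken from the actual project.

```python
import argparse
import json
from pathlib import Path

# Hypothetical defaults; the real pipeline's settings are not described in the text.
DEFAULTS = {"input_dir": "transcripts", "output_dir": "output", "iterations": 1}


def build_parser() -> argparse.ArgumentParser:
    """Define the (assumed) command-line flags."""
    parser = argparse.ArgumentParser(description="Transcript pipeline (illustrative sketch)")
    parser.add_argument("--config", type=Path, help="path to a JSON config file")
    parser.add_argument("--input-dir", help="directory of transcripts to process")
    parser.add_argument("--output-dir", help="directory for generated data")
    parser.add_argument("--iterations", type=int, help="number of pipeline passes")
    return parser


def load_config(path: Path) -> dict:
    """Read a JSON object from disk and reject keys the sketch does not know."""
    with path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise ValueError("config file must contain a JSON object")
    unknown = set(config) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return config


def resolve_settings(argv: list[str]) -> dict:
    """Merge settings in priority order: defaults < config file < explicit flags."""
    args = build_parser().parse_args(argv)
    settings = dict(DEFAULTS)
    if args.config is not None:
        settings.update(load_config(args.config))
    for key in DEFAULTS:
        value = getattr(args, key)  # argparse maps --input-dir to input_dir
        if value is not None:
            settings[key] = value
    return settings
```

An entry point could call `resolve_settings(sys.argv[1:])` and hand the resulting dictionary to the pipeline stages. Letting flags override the config file keeps a bulk run reproducible from one file while still allowing quick one-off changes between iterations.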
It is almost a required rite of passage for all techy college kids to build a blog site at some point. After way too much contemplation about how to build mine, I decided a content-first approach using Jekyll and GitHub Pages was the way to go. I am not looking for a frontend development job; I am a budding cybersecurity professional. With that in mind, I went with the Chirpy theme, which is common among my peers for good reason: it is simple, effective, and easy to work with. I am also using a custom domain, and setting it up was a good refresher on DNS. My site will include CTF writeups, takeaways from CPTC, and maybe some notes about my homelab experiments with AI agents and tooling.

