Over the past few months, I have been studying Java in my Computer Programming II class. I enjoy it more than I thought I would, and I was eventually able to get NetBeans and IntelliJ set up so that I could test out JavaFX. After some troubleshooting and plenty of research (googling), I got a simple screen toggle working. Eventually, I was able to get a simple form connected to an SQL database.

All of this was fun and interesting, but I quickly realized I was putting in a lot of programming work without a goal. In the end, the value is the experience gained, and honestly, learning Java and applying it in different ways has been the best way for me to keep learning. Since I had been experimenting with text-to-speech (TTS), I decided to try implementing an AI chatbot. However, after browsing through GitHub, I realized that Python was more common for this kind of project and would be the better route. With that loose concept in mind, I created an audio recorder module and a transcription module using OpenAI's Whisper model. My ultimate goal is for this chatbot to work offline, without making any external API calls. Using LM Studio, I set up a headless server on another PC so I could take advantage of its GeForce 1060. An old as heck GPU at this point, but it has 8GB of VRAM, which is good enough for this project.
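The recorder and transcription pieces boil down to something like the sketch below. It's a minimal version, not my exact modules: it assumes the `sounddevice` and `openai-whisper` packages, and the clip length and "base" model size are placeholder choices.

```python
# Minimal sketch: record a short clip and transcribe it with Whisper.
import numpy as np
import sounddevice as sd
import whisper

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz mono audio

def record(seconds: float) -> np.ndarray:
    """Capture audio from the default microphone as float32 mono."""
    audio = sd.rec(int(seconds * SAMPLE_RATE), samplerate=SAMPLE_RATE,
                   channels=1, dtype="float32")
    sd.wait()  # block until the recording finishes
    return audio.flatten()

model = whisper.load_model("base")  # small enough for CPU or an older GPU
clip = record(5)
result = model.transcribe(clip, fp16=False)  # fp16=False avoids a warning on CPU
print(result["text"])
```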

Putting it all together, I wrote a script that sends my recorded message, transcribed with Whisper, to LM Studio. Eventually, I'd like to have the bot's reply converted into audio using a TTS model, and I know it's possible. I've run some tests using dia and parler-tts, and both are promising, but unfortunately, I currently have an issue with torch while running them: CUDA isn't being made available. Until I figure this out, I'll continue experimenting. There is the option of using ElevenLabs, however, this would defeat my goal of keeping external APIs out of the codebase. Even still, it is satisfying to see these projects coming together so quickly.
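The "send to LM Studio" part is simpler than it sounds, since LM Studio's local server speaks an OpenAI-compatible API. Here's a rough sketch of that round trip; the host address and the model name are assumptions (LM Studio defaults to port 1234, and the server answers with whatever model is currently loaded):

```python
# Minimal sketch: post a transcript to LM Studio's OpenAI-compatible endpoint.
import requests

# Placeholder address for the headless PC running LM Studio.
LM_STUDIO_URL = "http://192.168.1.50:1234/v1/chat/completions"

def ask_bot(transcript: str) -> str:
    payload = {
        "model": "local-model",  # LM Studio uses whichever model is loaded
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": transcript},
        ],
        "temperature": 0.7,
    }
    response = requests.post(LM_STUDIO_URL, json=payload, timeout=120)
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]

# In the full script, the argument is the Whisper transcript from the recorder.
print(ask_bot("Tell me something interesting about the Whisper model."))
```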
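For the torch problem, the first thing I'm checking is whether the installed build was even compiled with CUDA support, since a CPU-only wheel is a common culprit:

```python
# Quick diagnostic: a CPU-only torch build reports torch.version.cuda as None.
import torch

print("torch version:", torch.__version__)
print("built with CUDA:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("device:", torch.cuda.get_device_name(0))
```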