In short
Software that gives feedback on fitness videos, the way a coach or personal trainer
would. The throughline: an experiment with ChatGPT as a personal trainer failed, while
a custom AI solution succeeded. Custom software can analyze complex movements in video
and give technical, personalized coaching on them.
Problem
I wanted to find out how far the multimodal capabilities of today's LLMs reach: models
that understand text, sound, images and video. As test material I used videos I had
previously sent to my own personal trainer. After uploading them to ChatGPT I got only
generic, unspecific feedback. None of the models available in ChatGPT could analyze the
movements accurately, and when asked for visual feedback it generated irrelevant images.
Approach
So I built a custom solution: an AI-powered virtual Olympic coach that adopts the
persona of the world-famous weightlifting coach Bob Takano.
- Prompt engineering based on the methodology of a top weightlifting coach
- Google Gemini 2.5 Pro for frame-by-frame movement analysis
- A Python tool for slowing down the video and a visual feedback overlay
- Result: technically accurate, personalized coaching
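The slow-down and overlay steps above can be sketched roughly as follows. This is a minimal illustration, not the actual tool: it assumes the slow-down is delegated to the `ffmpeg` command-line program via its `setpts` video filter, and the names `Cue`, `slow_motion_command` and `remap_cues` are hypothetical. The sketch only builds the command and moves feedback timestamps onto the slowed-down timeline; drawing the overlay itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cue:
    """One piece of coaching feedback tied to a moment in the original video.

    Hypothetical structure for illustration; the real tool may store cues differently.
    """
    start_s: float  # when the cue appears, in original-video seconds
    end_s: float    # when the cue disappears, in original-video seconds
    text: str       # short feedback line to draw on screen


def slow_motion_command(src: str, dst: str, factor: float) -> list[str]:
    """Build an ffmpeg command that plays `src` `factor` times slower.

    Assumption: ffmpeg is installed. The setpts filter multiplies every
    frame's presentation timestamp, so factor=4.0 stretches a 2-second
    lift to 8 seconds. Audio is dropped (-an) because it is not needed
    for technique review.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-filter:v", f"setpts={factor}*PTS",
        "-an",
        dst,
    ]


def remap_cues(cues: list[Cue], factor: float) -> list[Cue]:
    """Move cue times from the original timeline onto the slowed-down one."""
    return [Cue(c.start_s * factor, c.end_s * factor, c.text) for c in cues]


if __name__ == "__main__":
    # Made-up example cue; real cues would come from the model's analysis.
    cues = [Cue(1.0, 1.5, "Keep the bar close during the second pull")]
    print(" ".join(slow_motion_command("snatch.mp4", "snatch_slow.mp4", 4.0)))
    for cue in remap_cues(cues, 4.0):
        print(f"{cue.start_s:.1f}s to {cue.end_s:.1f}s: {cue.text}")
```

The returned list can be passed to `subprocess.run(..., check=True)` to produce the slowed video. Keeping cue times in original-video seconds and remapping them in one place means the same feedback can be reused at different slow-motion factors.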
ChatGPT vs. custom
| ChatGPT: fails at video analysis of sports movements | Custom: AI-powered virtual Olympic coach |
| --- | --- |
| No available model can analyze movements accurately | Prompt engineering based on a top weightlifting coach's methodology |
| Feedback is generic and not specific to the technique shown | Google Gemini 2.5 Pro for frame-by-frame movement analysis |
| When asked for visual feedback it generates irrelevant images | Python tool for slowing down the video and a visual feedback overlay |
| Movement recognition is missing entirely | Technically accurate, personalized coaching |
The two attempts (attempt 1 with ChatGPT, attempt 2 with the custom coach) and two
technique analyses are shown as playable videos at the bottom of this case.
Tech & stack
- 💬 ChatGPT — macOS app (first, failed attempt)
- 🤖 Google AI Studio — Gemini 2.5 Pro (multimodal video analysis)
- 🧑‍💻 GitHub Copilot — coding agent
- 💻 VSCode — IDE
- 🐍 Python — tool for slowing down video and the feedback overlay
Status
Experiment — my own R&D, shared via LinkedIn with demo videos. It shows that generic
multimodal models fall short for movement analysis, while a custom approach with Gemini
2.5 Pro + a Python pipeline does work.