A couple of years ago, I looked at predicting human poses from an image and/or video. It turned out to be more effort than I was willing to put in at the time (especially for water skiing, since there is a lot of noise and occlusion via ropes and spray).
Anyway, it led me to follow a few people's work in the space, specifically Michael Black's:
https://github.com/mkocabas/VIBE
I ran his latest project on a clip from the 'Terry Winter at Quarter Speed' video from Horton's channel. Out of the box, it performed amazingly well, even though the model has no specific training on this type of movement. Very promising in my opinion.
I also tested it against some random clips from YouTube, but I won't subject randoms to this discussion. Since this clip is at quarter speed and very well zoomed in, I assumed the result would be much better than on those clips. Honestly, it's better in some areas, but much more jittery for some reason.
Here is the result:
You can find a ton of work in various sports doing this sort of thing. From the tracked joint data you can do a lot of automated analysis or comparisons between runs. It's also interesting to see the 3D pose, since we rarely get that view in this sport.
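To give a rough idea of what "automated analysis" on tracked joints could look like, here is a minimal sketch that computes a knee angle per frame and smooths it over time to tame some of the jitter. Everything about the data layout is my own assumption for illustration: the `frames` list, the `hip`/`knee`/`ankle` keys, and the coordinates are made up and are not the tracker's actual output format. The moving average is just one simple smoothing choice, not what the model does internally.

```python
import math

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c.

    Each point is an (x, y, z) tuple of 3D joint coordinates.
    """
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    n1 = math.sqrt(sum(p * p for p in v1))
    n2 = math.sqrt(sum(q * q for q in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("degenerate joint: two points coincide")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_theta))

def moving_average(values, window=5):
    """Centered moving average; the window shrinks near both ends."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed

def knee_angles(frames):
    """Knee angle per frame, given hypothetical 'hip'/'knee'/'ankle' keys."""
    return [joint_angle(f["hip"], f["knee"], f["ankle"]) for f in frames]

# Made-up frames (hypothetical layout, not real tracker output).
frames = [
    # Straight leg: hip, knee, and ankle on one line.
    {"hip": (0.0, 1.0, 0.0), "knee": (0.0, 0.5, 0.0), "ankle": (0.0, 0.0, 0.0)},
    # Bent leg: shin perpendicular to the thigh.
    {"hip": (0.0, 1.0, 0.0), "knee": (0.0, 0.5, 0.0), "ankle": (0.5, 0.5, 0.0)},
]

angles = knee_angles(frames)
smoothed = moving_average(angles, window=3)
print(angles, smoothed)
```

Run the same thing on two different passes and you have per-frame knee angle curves you can overlay or diff, which is the kind of run-to-run comparison I had in mind.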
Thoughts?