Jammal Omotoyosi Adeyemi*
Received: January 10, 2024; Published: January 23, 2024
*Corresponding author: Jammal Omotoyosi Adeyemi, MSc AI Graduate University of Bradford Bradford, United Kingdom
DOI: 10.26717/BJSTR.2024.54.008579
Background: This research presents a pioneering and comprehensive approach aimed at delivering personalized exercise and yoga support to the senior population. The primary objective of this research is to enhance physical well-being, boost engagement, foster adherence to exercise and yoga routines, and refine users’ posture and balance.
Methods: A pivotal phase in this project is keypoint extraction, performed using MoveNet, a pose estimation model. Deep learning models, namely Convolutional Neural Networks, Dense Neural Networks, and Multi-Layer Perceptrons, are then used to classify yoga and exercise positions from the extracted keypoints.
Results: The culmination of this endeavor manifests in the form of an interactive web application. The robustness and efficacy of the system are underscored by extensive user testing, which has also assessed its usability and potential to significantly enhance the physical well-being of the elderly. In summation, this research represents a substantial advancement in the realm of targeted exercise and yoga support, expressly designed to cater to the distinctive needs of the senior demographic.
Conclusion: This research demonstrates the potential of technology to promote physical well-being and enhance the lives of the elderly through tailored exercise and yoga support systems.
Trial Registration: The user testing phase was retrospectively registered.
Keywords: Exercise; Yoga; MoveNet; Deep Learning; Tailored Support; Tiago Robot; Personalized Assistance
Exercise is structured physical activity that aims to develop and maintain mental health, well-being, cardiovascular endurance, muscular strength, and physical fitness [1]. Exercising releases endorphins, often known as "feel-good" hormones, which help to boost happiness and well-being [2]. Exercise is critical for weight management and maintenance [3]. Physical activity includes any bodily action that requires energy expenditure. Brisk walking, jogging, swimming, and cycling all help to strengthen the heart muscle, improve blood circulation, and lower blood pressure [3,4]. Posture refers to the position of a person's body while sitting or standing [5]. It may be static, as when a person is sitting, standing, or sleeping [6], or dynamic, as when a person is walking, jogging, or bending. Good posture means supporting the body in a position that puts the least strain on the muscles and ligaments. This can help to avoid accidents and improve general physical health.
According to multiple studies [7-10], regular exercise provides many health advantages, including a lower risk of cardiovascular conditions such as high cholesterol, stroke, and heart disease. Regular physical exercise can support the maintenance of a healthy body weight when accompanied by a balanced and nourishing diet. Regular exercise can improve endurance, enabling people to do daily chores more quickly and effectively. According to several studies [7,11,12], regular exercise can lower the chance of developing chronic diseases such as type 2 diabetes, certain malignancies, osteoporosis, and arthritis. It is also crucial for managing pre-existing medical disorders, including diabetes, hypertension, and chronic pain [13].
Given the significance of exercise in boosting health and well-being, it is essential that people exercise safely and effectively. Doing so increases the benefits of exercise while lowering the risk of injury. Technology can reliably detect exercise postures and offer real-time feedback, thereby improving patient outcomes and fostering a safe and effective approach to physical activity [14]. By analyzing the user's movements and comparing them to correct postures, a system can identify incorrect technique immediately and so lower the chance of harm. This real-time feedback lets patients adjust their form as needed, minimizing risk and getting the most out of their training regimens [15,16]. In addition, technology offers a data-driven, objective approach to exercise supervision. An exercise pose detection system can precisely count the exercises completed, helping to ensure that patients follow their treatment plan [17,18]. Besides helping patients track their progress, this function enables healthcare professionals to monitor adherence and make informed decisions about the patient's exercise regimen. One of the major advantages of exercise injury prevention is the preservation and enhancement of strength, which helps people maintain their independence as they get older [19]. By exercising their muscles and joints, individuals can increase their strength and endurance, which helps them carry out daily tasks more confidently and easily [20]. For older persons in particular, exercise injury prevention is essential for improving balance and lowering the risk of falls and fall-related injuries. Yoga and tai chi are two examples of balance-focused activities that can help people improve their body control, stability, and coordination [21].
Individuals can greatly lower their risk of falls, fractures, and other ailments linked to balance impairment by routinely performing these exercises. Including injury prevention exercises in an exercise program can also help to manage and avoid a number of disorders. Regular physical exercise is known to reduce the likelihood of numerous illnesses, such as arthritis, heart disease, stroke, type 2 diabetes, osteoporosis, and some forms of cancer, including breast and colon cancer [22-24]. Workouts that specifically target these problem areas can improve a person's general health and decrease their risk of developing certain diseases. Technological devices serve as a virtual coach, directing patients to maintain appropriate form and reducing the chance of accidents by spotting incorrect postures and suggesting corrective actions [25]. Those who have existing medical issues or are recovering from injuries may benefit most from this proactive approach to injury prevention. With the system's real-time feedback and instruction, patients can exercise safely and confidently, which improves their general well-being and rehabilitation results. Technology also makes telemedicine and remote monitoring possible. Healthcare professionals can use exercise pose detection technology to evaluate patients' exercise regimens remotely and check compliance and progress [18,26,27]. This remote monitoring option is particularly useful for patients who have trouble visiting healthcare institutions because of geographic restrictions or who need continual supervision. Healthcare professionals can use machine learning and deep learning algorithms to examine the resulting data and learn more about the patient's strengths, shortcomings, and potential areas for improvement [28]. They can then modify exercise plans and therapies to suit the requirements and objectives of each patient, enhancing both the exercise experience and the health results.
Pose estimation is a computer vision technique that detects and tracks the position and orientation of an object, typically a human body, in an image or video [29]. This technology has several uses, including animation, gaming, motion recognition, and exercise pose detection. Pose estimation can be used to classify postures while also giving users immediate feedback and recommendations for corrective action so that they maintain optimum posture [14,30,31]. This reduces the risk of injury while increasing the benefits of exercise. To detect and classify diverse postures, 2D pose estimation methods such as OpenPose, YOLO, and PoseNet employ 2D skeleton lines [32,33]. These techniques may be useful for general posture identification, but they may not be the best option for all users, especially those with visual impairments that make it difficult to recognize 2D images. With the advent of deep learning-based techniques that can precisely identify and classify complex postures, pose estimation technology has advanced considerably in recent years [34]. This study is significant because it has the potential to address the particular requirements of older adults in maintaining fitness through exercise and yoga. Yoga and exercise are crucial for promoting physical well-being and general health, especially in older people.
Exercise and Yoga
Exercise is necessary for optimal physical health, muscular strengthening, cardiovascular function improvement, and flexibility enhancement [35,36]. Yoga, in turn, plays a significant role in improving physical and mental health, while also supporting spiritual growth [37,38]. Preserving function through the transitions of aging is one of the most important motivations for exercise and yoga [39].
Human-Computer Interaction (HCI)
Human-Computer Interaction (HCI) design can reshape the factors that drive health improvement by making the experience gratifying for the user [40,41].
Technology Integration in Exercise and Yoga Programs
Technology integration is becoming more common in fitness and yoga programs, opening up new avenues for involvement, monitoring, and personalization [42,43]. A novel method to engage in physical activity and mindfulness practices is through interactive exercise and yoga programs. These systems employ technology to produce immersive and interactive experiences that have the potential to be more productive, motivating, and pleasant than traditional techniques [44]. Individuals may construct their own exercise experiences, taking customization a step further [45].
Pose Detection, Estimation and Correction Technology (in 2D)
Pose detection and correction technology has transformed how people practice yoga and fitness. This technology evaluates body motions and provides real-time feedback on form, alignment, and posture using computer vision algorithms [46]. By precisely capturing and evaluating postures, it helps users strengthen their technique, minimize their risk of injury, and optimize the benefits of their practice [47,48]. In 2D pose estimation, the location and orientation of objects or human body joints in a 2D picture are inferred. The idea is to represent the pose with a collection of key points or joints.
Deep learning-based algorithms are one commonly used approach to pose estimation. Convolutional Neural Networks (CNNs) have proven to be extremely effective in detecting and localizing body joints and landmarks. By training on large datasets of annotated photos, these algorithms learn the spatial correlations and patterns associated with various body positions [32,50,51]. Heatmap-based approaches in 2D pose estimation employ CNNs to generate heatmaps representing joint locations, whereas regression-based methods predict joint coordinates directly [49]. Model-based approaches, by contrast, use optimization algorithms and 3D human body models to fit the model to the observed photo or video data [51]. The model is commonly defined by a collection of joint angles and anthropometric measurements [52].
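The heatmap-based idea can be illustrated with a minimal sketch: a network emits one 2D score map per joint, and the joint location is read off as the peak of that map. The function names and the toy heatmap below are hypothetical, and real systems usually refine the peak with sub-pixel offsets, which this sketch omits.

```python
def decode_heatmap(heatmap):
    """Return (row, col, score) of the highest-scoring cell in a 2D heatmap.

    `heatmap` is a non-empty list of equal-length rows of floats.
    """
    best = (0, 0, heatmap[0][0])
    for r, row in enumerate(heatmap):
        for c, value in enumerate(row):
            if value > best[2]:
                best = (r, c, value)
    return best


def to_normalized_xy(row, col, height, width):
    """Map a heatmap cell to normalized (x, y) in [0, 1], using the cell center."""
    return ((col + 0.5) / width, (row + 0.5) / height)


# Toy 3x4 heatmap for a single joint (values are made up for illustration).
toy_heatmap = [
    [0.01, 0.02, 0.05, 0.01],
    [0.03, 0.20, 0.90, 0.10],
    [0.02, 0.04, 0.08, 0.01],
]
peak_row, peak_col, peak_score = decode_heatmap(toy_heatmap)
joint_xy = to_normalized_xy(peak_row, peak_col, height=3, width=4)
```

A regression-based method would skip the heatmap entirely and output the (x, y) pair directly from the network.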
Tiago: A Unique and Customizable Fitness Solution
Tiago is a personal fitness assistant that blends modern technology with custom features to make exercise and yoga more accessible and enjoyable [53]. Using artificial intelligence, Tiago provides a personalized fitness programme that adapts to individual needs and preferences (Figure 1). Tiago uses the information it is given to develop a customized training programme that changes as each client progresses. For users pursuing various fitness objectives, such as weight loss, enhanced strength, improved flexibility, and tension release, Tiago offers comprehensive options [53].
Overview of Data Sources
Part of the primary aim of the research was to collect as much information about yoga and exercise poses as possible from diverse sources. One set of data comes from the data platform Kaggle, while the other was scraped from the internet. The Kaggle data is divided into two parts. The first part contains images of yoga poses that show how each pose should be performed. These images help us to classify the poses into several groups. The second part of the Kaggle data contains descriptions of the postures in English.
Data Collection
Yoga Pose Dataset from Kaggle: In this project, we're focusing on a specific set of yoga poses. These poses were carefully selected from credible sources with the elderly in mind. The chosen yoga poses - chair, cobra, dog, shoulder stand, tree, and warrior - have been meticulously handpicked for their simplicity, accessibility, and potential benefits for individuals seeking gentle yet effective yoga practices (Figure 2) [54]. Warrior II Pose (Virabhadrasana II): This pose embodies strength and determination. From a wide-legged stance, turn one foot out and bend the knee, aligning it with the ankle. Stretch the arms out parallel to the ground, gazing over the front fingertips. Warrior II strengthens the legs and enhances endurance [55].
Exercise Dataset from the Internet: This project is built on a carefully selected set of fitness poses. These postures were chosen to give extensive strength and balance training. The chosen exercise poses, namely Stepups (with knee raise), Sit-to-Stand, Modified Push Ups (Knee Pushups), Mini Squat (holding a chair), and Bent Over Row Dumbbell, were selected based on their alignment with reputable online sources, and together they offer a broad spectrum of physical benefits and promote well-rounded fitness. Stepups (with Knee Raise): Stepups are a functional lower body exercise that enhances leg strength and stability [56]. Begin by stepping onto a platform or elevated surface, elevating one leg at a time while lifting the opposite knee. This movement not only strengthens the lower body muscles but also engages the core and challenges balance [57] (Figure 3).
Image Processing & MoveNet Thunder Implementation: We followed a strict set of guidelines to ensure that the MoveNet Thunder model predicts posture reliably and accurately. The exercise dataset, comprising five categories of exercise videos, was obtained from YouTube [58]. The exercises - Bent-over Row, Sit-to-Stand, Sit-ups, Push-ups, and Chair Squats - were captured in approximately 1-minute videos, yielding around 1800 frames per video [59]. To get started, we obtained the pre-trained Thunder TFLite model and loaded it through the TensorFlow Lite Interpreter API. This allowed us to run the model in Python without a full TensorFlow setup. We first configured the interpreter, including its input and output tensors, the tensor sizes, and the number of threads to use. We made sure the input data had the size of 224 x 224 pixels that the model calls for. To process videos, we used OpenCV's VideoCapture, which let us extract frames from the videos one by one. We made certain that each frame had the red, green, and blue (RGB) channels that MoveNet requires. We also took considerable care over data quality.
We discarded any images that were corrupt or otherwise unusable. Using a mix of TensorFlow I/O and OpenCV, we retrieved the frames from the videos, which was critical for the later stages of our work. With the correctly colored images in hand, we began pose estimation with MoveNet Thunder. To prepare the input for the model, we added a batch dimension to the image with np.expand_dims(), so that the input matched the shape the model expects. After inference, we used np.squeeze() to remove the singleton dimensions from the output. The resulting 17x3 array was then split into three arrays: one for x-coordinates, one for y-coordinates, and one for confidence scores. This made the data easier to work with. Each CSV row contained the file name followed by the x-values, y-values, and confidence scores of the keypoints. We did this for each image or frame in the dataset, producing one CSV file per class. For keypoints with scores below the threshold of 0.1, the coordinates were replaced with 0. This step acts as a noise filter and improves the accuracy and relevance of the data. Finally, the per-class CSVs were merged into unified datasets, one for yoga and one for exercise. This simplified the subsequent modeling and made dataset management more organized and efficient.
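The post-processing described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function and column names are hypothetical, and it assumes the standard MoveNet single-pose output, a tensor of shape [1, 1, 17, 3] in which each keypoint is stored as (y, x, score) in normalized coordinates.

```python
import numpy as np

# Keypoint order of the MoveNet single-pose output (COCO-style, 17 points).
KEYPOINT_NAMES = [
    "nose", "left_eye", "right_eye", "left_ear", "right_ear",
    "left_shoulder", "right_shoulder", "left_elbow", "right_elbow",
    "left_wrist", "right_wrist", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_ankle", "right_ankle",
]

SCORE_THRESHOLD = 0.1  # threshold stated in the text


def csv_header():
    """Column names for one row: file name, then x, y, score per keypoint."""
    columns = ["file_name"]
    for name in KEYPOINT_NAMES:
        columns.extend([f"{name}_x", f"{name}_y", f"{name}_score"])
    return columns


def keypoints_to_row(file_name, model_output, threshold=SCORE_THRESHOLD):
    """Flatten one raw MoveNet output into a CSV row.

    Coordinates of keypoints scoring below `threshold` are replaced with 0.
    """
    # Drop the singleton batch and person dimensions: [1, 1, 17, 3] -> [17, 3].
    keypoints = np.squeeze(np.asarray(model_output, dtype=np.float32))
    if keypoints.shape != (17, 3):
        raise ValueError(f"expected 17x3 keypoints, got {keypoints.shape}")

    # Assumed layout per keypoint: (y, x, score).
    ys = keypoints[:, 0].copy()
    xs = keypoints[:, 1].copy()
    scores = keypoints[:, 2]

    low_confidence = scores < threshold
    xs[low_confidence] = 0.0
    ys[low_confidence] = 0.0

    row = [file_name]
    for x, y, score in zip(xs, ys, scores):
        row.extend([float(x), float(y), float(score)])
    return row
```

Rows produced this way can be written with the standard `csv` module, one file per class, and concatenated afterwards to form the unified yoga and exercise datasets.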
System Design and Approach
The flowchart in Figure 4 outlines the steps involved in creating a robot companion and pose estimation system that supports tailored exercise and yoga practice, primarily for the elderly.
Pose Detection & Data Preprocessing with MoveNet
We use MoveNet, a convolutional neural network, for accurate pose detection. The integration of MoveNet within our system provides the foundation for subsequent pose classification, an essential process for our web application.
To harness the full potential of MoveNet, meticulous data pre-processing is essential. Raw pose data, comprising landmark coordinates and corresponding scores, undergoes a systematic transformation process:
Landmark Extraction: From the raw dataset, precise landmark coordinates and their associated scores are extracted, laying the groundwork for subsequent analysis.
Reshaping: The extracted landmark data is reshaped to match the input format required by the MoveNet-based pipeline.
Pose Normalization: Ensuring uniformity across diverse samples, the pose landmarks undergo normalization. The translation of landmark coordinates to center at the origin (0, 0) and scaling to a standardized pose size are essential steps to achieve this consistency.
Enhancing Pose Data for Classification using Landmarks-to-Embedding Transformation
The essence of this enhancement is encapsulated within the landmarks_to_embedding function. This pivotal transformation encompasses the following stages:
Normalization Refinement: Building upon normalized 2D landmark coordinates, an additional layer of normalization aligns the landmarks with previously established normalized pose landmarks. This ensures a consistent and coherent representation across data samples.
Flattening and Embedding: The refined landmarks are strategically flattened into a vector, resulting in an embedding that encapsulates the intricate nuances of the pose while minimizing redundant information.
(Where i represents the index of the landmark point)
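The normalization and embedding steps can be sketched in NumPy as below. This is an illustrative sketch only. It assumes the function follows the pattern of the public TensorFlow pose-classification tutorial: the pose is centered on the midpoint of the hips and scaled by the larger of the torso size times a multiplier and the maximum landmark distance from the center. The hip-centered origin, the multiplier of 2.5, and the helper name `normalize_pose_landmarks` are assumptions, not details given in the text.

```python
import numpy as np

# Indices in the 17-keypoint MoveNet ordering.
LEFT_SHOULDER, RIGHT_SHOULDER = 5, 6
LEFT_HIP, RIGHT_HIP = 11, 12


def normalize_pose_landmarks(landmarks, torso_multiplier=2.5):
    """Center 2D landmarks at the hip midpoint and scale to a standard size.

    `landmarks` has shape (17, 2). The multiplier is an assumed default.
    """
    landmarks = np.asarray(landmarks, dtype=np.float64)
    hips_center = 0.5 * (landmarks[LEFT_HIP] + landmarks[RIGHT_HIP])
    shoulders_center = 0.5 * (landmarks[LEFT_SHOULDER] + landmarks[RIGHT_SHOULDER])

    # Translate so that the hip midpoint sits at the origin (0, 0).
    centered = landmarks - hips_center

    # Pose size: the larger of the scaled torso length and the farthest landmark.
    torso_size = np.linalg.norm(shoulders_center - hips_center)
    max_distance = np.max(np.linalg.norm(centered, axis=1))
    pose_size = max(torso_size * torso_multiplier, max_distance)
    if pose_size == 0:
        raise ValueError("degenerate pose: all landmarks coincide")
    return centered / pose_size


def landmarks_to_embedding(landmarks_and_scores):
    """Map 17 (x, y, score) triples to a 34-dimensional embedding.

    Scores are dropped; only the normalized coordinates are kept.
    """
    triples = np.asarray(landmarks_and_scores, dtype=np.float64).reshape(17, 3)
    normalized = normalize_pose_landmarks(triples[:, :2])
    return normalized.reshape(-1)
```

Because the pose is centered and rescaled, the embedding does not change when the person moves within the frame or stands nearer to or farther from the camera, which is the consistency across samples that the normalization step aims for.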
Integration of Detection, Classification & Model Utilisation
MoveNet outputs 17 landmarks for each detected pose. These are anatomical points: the nose tip, left and right eyes, left and right ears, left and right shoulders, left and right elbows, left and right wrists, left and right hips, left and right knees, and left and right ankles. This section describes the deep learning pose classification architectures, namely Convolutional Neural Networks (CNNs), Dense Neural Networks (DNNs), and Multi-Layer Perceptrons (MLPs), that operate on the transformed pose embeddings generated from MoveNet's outputs.
Convolutional Neural Network Model & its Implementation
According to Chauhan, et al. [60,61], the convolutional neural network (CNN) is a deep neural network that excels at interpreting images and spatial data. The CNN-based pose classification model is structured as follows: The initial convolutional layers are responsible for identifying local patterns within the input data. These layers are accompanied by max-pooling layers that downsample the spatial dimensions of the data, aiding in feature extraction.
Convolutional Layer 1:
Max-Pooling Layer 1:
Subsequently, another set of convolutional and max-pooling layers further refines the extracted features by capturing higher-level spatial patterns. These features are then flattened into a one-dimensional vector to prepare them for the fully connected layers.
Convolutional Layer 2:
Max-Pooling Layer 2:
Flatten
The architecture concludes with a SoftMax activation layer that produces probability distributions over the different pose classes.
Fully Connected Layer:
Output Layer:
Let:
• N be the number of data samples
• L be the length of the input sequence (34 in this case)
• C(1), C(2) be the number of filters in the first and second convolutional layers respectively.
• K(1), K(2) be the kernel size in the first and second convolutional layers respectively.
• P(1), P(2) be the max-pooling size in the first and second max-pooling layers respectively.
• H be the number of neurons in the fully connected hidden layer
• M be the number of classes (pose categories).
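To make the roles of these symbols concrete, the sketch below traces the per-sample tensor shape through the layer sequence described above. It is illustrative only: it assumes 1D convolutions over the length-L embedding with a single input channel, "valid" padding, stride 1, and pooling stride equal to the pool size. The hyperparameter values in the usage example are hypothetical and are not taken from the paper.

```python
def cnn_layer_shapes(L, C1, K1, P1, C2, K2, P2, H, M):
    """Return (layer name, per-sample output shape) for each layer in order.

    Assumes valid padding and stride 1 for convolutions, and a pooling
    stride equal to the pool size.
    """
    shapes = [("input", (L, 1))]

    length = L - K1 + 1              # convolution 1 shortens the sequence
    shapes.append(("conv1", (length, C1)))
    length //= P1                    # max-pooling 1 downsamples it
    shapes.append(("pool1", (length, C1)))

    length = length - K2 + 1         # convolution 2
    shapes.append(("conv2", (length, C2)))
    length //= P2                    # max-pooling 2
    shapes.append(("pool2", (length, C2)))

    if length <= 0:
        raise ValueError("kernel or pool sizes are too large for this input length")

    shapes.append(("flatten", (length * C2,)))
    shapes.append(("dense", (H,)))
    shapes.append(("softmax", (M,)))
    return shapes


# Hypothetical hyperparameters; M = 6 matches the six yoga poses in the dataset.
example = cnn_layer_shapes(L=34, C1=32, K1=3, P1=2, C2=64, K2=3, P2=2, H=128, M=6)
for layer_name, shape in example:
    print(f"{layer_name:8s} {shape}")
```

For a batch of N samples, each shape gains a leading dimension of N, and the final softmax output is an N x M matrix of class probabilities.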
MLP (Multilayer Perceptron)
The MLP achieved high accuracy on both the training and test datasets, making it a strong competitor. In validation accuracy, however, it lags behind the CNN (97.57% versus 99.05%).
DNN (Dense Neural Network)
The DNN performed very well, its particular strength being its high test accuracy. Despite having somewhat lower training and validation accuracy than the CNN, the DNN generalized well to unseen test data. The DNN's performance was likely aided by its ability to detect complex correlations in the data, as well as by the regularizing effect of its dropout layers.
Training & Model Evaluation (Figure 5)
The Process is as Follows
Loss and Accuracy Evaluation: The ‘evaluate’ function computes the model's loss and accuracy on the test dataset. The calculated loss value quantifies the disparity between predicted classifications and actual labels, while the accuracy metric represents the proportion of accurate predictions.
Precision: Quantifies the model's ability to correctly identify positive samples among the samples it predicted as positive. In the context of pose classification, precision measures how well the model correctly identifies a specific pose class.
Recall: Also known as sensitivity or true positive rate, measures the model's capability to correctly identify positive samples out of all actual positive samples. It highlights the model's effectiveness in capturing positive instances.
F1-Score: It is a harmonic mean of precision and recall. It provides a balanced assessment of the model's precision and recall capabilities and is particularly useful when classes are imbalanced.
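The three metrics above can be computed directly from true and predicted labels, as in the minimal sketch below. The function names and the toy labels are hypothetical, and macro-averaging over classes is an assumption, since the text does not state which averaging scheme was used.

```python
def per_class_metrics(y_true, y_pred, labels):
    """Return {label: (precision, recall, f1)} using one-vs-rest counts."""
    metrics = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)

        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        # F1 is the harmonic mean of precision and recall.
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        metrics[label] = (precision, recall, f1)
    return metrics


def macro_average(metrics):
    """Unweighted mean of precision, recall, and F1 over all classes."""
    n = len(metrics)
    return tuple(sum(values[i] for values in metrics.values()) / n for i in range(3))


# Toy example with three pose classes (labels are made up for illustration).
y_true = ["tree", "tree", "chair", "chair", "cobra", "cobra"]
y_pred = ["tree", "chair", "chair", "chair", "cobra", "tree"]
scores = per_class_metrics(y_true, y_pred, labels=["tree", "chair", "cobra"])
macro_precision, macro_recall, macro_f1 = macro_average(scores)
```

Note that with macro-averaging the reported F1-score is the mean of the per-class F1 values, so it need not equal the harmonic mean of the averaged precision and recall.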
Quantitative Performance Metrics
CNN Algorithm
The Convolutional Neural Network (CNN) algorithm demonstrated exceptional performance across various evaluation metrics:
1. Training Accuracy: 99.48%
2. Validation Accuracy: 99.05%
3. Test Accuracy: 99.62%
4. Precision: 0.984
5. Recall: 0.978
6. F1-Score: 0.979
Figures 6a & 6b depict the CNN Model's training and validation accuracy curves, as well as the training and validation loss plot.
MLP Algorithm
The Multi-Layer Perceptron (MLP) algorithm showcased its robustness in pose classification:
1. Training Accuracy: 97.82%
2. Validation Accuracy: 97.57%
3. Test Accuracy: 98.47%
4. Precision: 0.984
5. Recall: 0.978
6. F1-Score: 0.979
Figures 7a & 7b illustrate the training and validation accuracy curves, as well as the training and validation loss plot, respectively, enhancing our understanding of the algorithm's performance.
DNN Algorithm
The Dense Neural Network (DNN) algorithm demonstrated remarkable proficiency:
1. Training Accuracy: 97.87%
2. Validation Accuracy: 97.04%
3. Test Accuracy: 99.39%
4. Precision: 0.978
5. Recall: 0.979
6. F1-Score: 0.978
Figures 8a & 8b visually represent the training and validation accuracy curves, as well as the training and validation loss plot, respectively.
Use of MoveNet Model
The MoveNet model, run in the browser through TensorFlow.js, is responsible for the program's ability to recognize and track people. MoveNet's lightweight design makes it well suited to demanding pose estimation applications. It recognizes and tracks key points distributed across the user's body, allowing physical joints to be localized precisely. A notable property of MoveNet is that it can detect poses in real time. Because of its emphasis on speed, it can be used on a wide range of devices, including those with limited computing capability. This responsiveness is critical when interacting with individuals in real time through a webcam.
Key Features and Components
The application has a variety of innovative features to provide a comfortable and seamless user experience, such as:
Webcam Feed: Using the "react-webcam" component, the application actively records live video through the user's camera. This dynamic feature enables real-time interaction, allowing users to receive immediate feedback on their yoga and fitness positions (Figure 9a).
User Interface Components
The application's user interface is deliberately simple. It consists of several components designed to enhance user interaction and experience, such as dropdown menus for selecting postures and detailed instructions:
Drop Down: Users can select specific Exercise or Yoga poses from a dropdown menu (Figure 9b).
Canvas: The Canvas component overlays the detected keypoints and skeleton visualization onto the live webcam feed, providing users with visual guidance.
Instructions: This component provides users with textual instructions for correctly executing the chosen Exercise/Yoga pose (Figure 9c).
Performance Metrics: The application tracks the duration for which a pose is accurately held or done. It displays real-time updates on the current time or counter and the user's best performance metrics.
Skeleton Visualization: The application offers a visual representation of the skeleton overlaid on the canvas. The color of the skeleton changes to convey pose correctness. A green skeleton indicates a correctly detected pose, while a white skeleton implies an incorrect pose (Figure 9d).
Real-Time Guidance: The application's primary competitive edge is its real-time guidance capabilities. The combined characteristics of the MoveNet model and CNN classifier provide users with rapid feedback on the correctness of their postures. Because of this dynamic feedback loop, users may modify their alignment and posture in real-time, ensuring perfect pose execution. By seeing their live camera feed in conjunction with the overlay of recognized key points and visual feedback, users can rapidly repair faults and increase the accuracy of their postures.
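The feedback loop formed by the Performance Metrics, Skeleton Visualization, and Real-Time Guidance components can be sketched as a small state machine. The deployed application runs in the browser; the logic is sketched here in Python purely for illustration. The class name, the confidence threshold of 0.9, and the rule that the timer resets on an incorrect frame are all assumptions that the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Optional

GREEN = "green"  # skeleton color for a correctly detected pose
WHITE = "white"  # skeleton color for an incorrect pose


@dataclass
class PoseHoldTracker:
    """Tracks how long the selected pose has been held correctly."""

    target_pose: str
    min_confidence: float = 0.9  # hypothetical threshold
    current_hold: float = 0.0    # seconds in the current correct streak
    best_hold: float = 0.0       # longest streak seen so far
    _started_at: Optional[float] = field(default=None, repr=False)

    def update(self, predicted_pose: str, confidence: float, now: float) -> str:
        """Process one classified frame and return the skeleton color.

        `now` is a timestamp in seconds, for example from time.monotonic().
        """
        correct = (
            predicted_pose == self.target_pose
            and confidence >= self.min_confidence
        )
        if not correct:
            # Assumed behavior: an incorrect frame ends the current streak.
            self._started_at = None
            self.current_hold = 0.0
            return WHITE

        if self._started_at is None:
            self._started_at = now
        self.current_hold = now - self._started_at
        self.best_hold = max(self.best_hold, self.current_hold)
        return GREEN
```

In use, each webcam frame would be passed through MoveNet and the classifier, and the predicted class with its probability would be fed to `update`, whose return value selects the overlay color.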
Steps for Application Deployment on Tiago
A rigorous process is necessary to ensure the Real-Time Yoga Pose Estimation Web Application is effectively incorporated into the Tiago robot's operational framework. There are several critical stages in this process:
1. Transferring the Application through FTP.
2. Node Package Installation.
3. Starting the Application.
4. Setting up the server.js.
These steps form the basis for deploying the Real-Time Yoga Pose Estimation Web Application on the Tiago robot. By following each step carefully, users can run the application within the robot's capabilities to build a complete solution.
Based on the comparative evaluation, it is evident that the CNN algorithm outperforms the others in terms of accuracy and classification metrics. Its robustness in handling pose variations and its consistent high performance position it as the optimal choice for our web application's pose detection and classification requirements. The selection of the CNN algorithm aligns with our objective of achieving accurate and reliable pose identification while providing a foundation for a robust user experience.
Our research journey culminated in user testing sessions with elderly individuals, aimed at evaluating the usability and satisfaction of our system. The feedback obtained during these sessions will be instrumental in making necessary adjustments to enhance performance, usability, and user satisfaction. It opens avenues for tailored exercise solutions and an overall improved quality of life for seniors, deftly merging cutting-edge technology with a user-centric design approach. This innovative approach presents a promising route to promote physical health and well-being within an aging population. In conclusion, our research stands as a testament to the potential of technology in promoting physical well-being and enhancing the lives of the elderly through tailored exercise and yoga support systems. It bridges the realms of computer vision, deep learning, and human-robot interaction to empower individuals in their pursuit of a healthier lifestyle. The journey has been fruitful, but it is only the beginning of what promises to be an exciting avenue of research and development.
The completion of this research project marks a significant step towards enhancing the use of pose estimation technology for tailored exercise and yoga support systems, particularly for the elderly population. The successful integration of data collection, pose detection, and classification models into a user-friendly web application on the Tiago robot has provided valuable insights and outcomes. While this research has yielded promising results, several avenues for future work and enhancement emerge:
1. Enhancements to Data Collection and Augmentation.
2. Individualized Feedback and Direction.
3. Cross-Modal Integration for Enhanced Interaction.
4. Applications in Clinical Settings and Longitudinal User Research.
In summary, our research sets the stage for a technology-driven paradigm in elderly care, focusing on physical well-being and active aging. As we progress into the future, we are committed to advancing this system and its impact on the lives of the elderly. Through ongoing innovation, user-centered design, and ethical considerations, we aim to pave the way for a healthier and more active aging population.