Article
Open access
Published: 24 September 2025

Machine learning and data-driven inverse modeling of metabolomics unveil key processes of active aging

npj Systems Biology and Applications volume 11, Article number: 103 (2025)

Abstract

Physical inactivity and low fitness have become global health concerns. Metabolomics, as an integrative approach, may link fitness to molecular changes. In this study, we analyzed blood metabolomes from elderly individuals under different treatments. By defining two fitness groups and their corresponding metabolite profiles, we applied several machine learning classifiers to identify key metabolite biomarkers. Aspartate consistently emerged as the dominant fitness marker. We further defined a body activity index (BAI) and analyzed two cohorts with high and low BAI using COVRECON, a novel method for metabolic network interaction analysis that identifies causal molecular dynamics in multiomics data. Aspartate aminotransferase (AST) was among the dominant processes distinguishing the groups, and routine blood tests confirmed significant differences in AST and alanine aminotransferase (ALT). Aspartate is also a known biomarker in dementia, which is itself related to physical fitness. In summary, we combine machine learning and COVRECON to identify metabolic biomarkers and molecular dynamics that support active aging.

Introduction

Physical inactivity is a worldwide health problem and ranks as the fourth leading behavioral risk factor for global mortality¹. The need to maintain body activity, both physically and metabolically, is growing. The concept of active aging, inspired by Robert Havighurst’s activity theory², holds that maintaining an active lifestyle is crucial for the well-being of older individuals. The concept emerged and began to develop in the 1990s, placing strong emphasis on the link between activity and health³. This focus became particularly pertinent because of the worldwide aging population, which raised concerns about inactivity across various social domains⁴. With the transition into the 2020s, there has been an escalating emphasis on harnessing technology to foster healthy aging^5,6,7. Beyond longevity, active aging encompasses regular physical activity, better management of chronic diseases, and improved quality of life⁸.

Conventionally, studies have examined various physical aspects of active aging, such as sleep, sedentary time, muscle-strengthening activities, and movement behaviors^9,10. In this study, we focus on one specific aspect of body activity, physical performance (e.g., muscle endurance and strength), and explore its potential association with individuals’ metabolomic profiles. As an emerging diagnostic technology, metabolomics involves the comprehensive analysis of small-molecule metabolites (<10 kDa) present in a biological sample, including metabolic intermediates, hormones, signaling molecules, and secondary metabolites^11,12. As the culmination of biological processes in the body, metabolites play a pivotal role in energy generation and signal transmission, and they carry essential information about the body’s status and ongoing functions. Consequently, key metabolites can serve as aging biomarkers or as integral components of a metabolic signature that mirrors the active state of the organism as it ages¹³. Metabolomics thus allows health issues to be examined at the molecular level¹⁴. Notably, during the COVID-19 pandemic, metabolomics demonstrated its potential as a diagnostic, prognostic, and drug intervention tool¹⁵: COVID-19 has been extensively investigated with metabolomics methodologies, contributing to biomarker studies^16,17 and evaluations of drug impacts^18,19. Beyond specific disease diagnosis, metabolomics can also improve our understanding of bodily activity in the context of active aging. Recent studies have applied metabolic profiling to aging^13,20, providing overarching insights into metabolic changes during the aging process. Physical performance (e.g., walking test, muscle strength) is a critical aspect of active aging.
In this work, we specifically focus on the metabolomic profiling of older adults and aim to detect key biomarkers and important metabolic interactions linked to physical performance and active aging.

The emerging large-scale datasets from omics measurements (metabolomics, proteomics, transcriptomics, and genomics) allow biological questions to be examined from a systemic perspective^21,22. In systems biology, a central goal is to identify biomarkers and infer biochemical regulation from large-scale metabolomics data²³. Statistical methods, especially when combined with machine learning techniques, have proven powerful for constructing accurate classifiers that distinguish between sample groups and reveal underlying biomarkers^24,25,26,27. However, statistical methods offer limited insight into how information is transferred within a biochemical network, which regulatory steps are critical, and how regulatory mechanisms change under different conditions^11,21,23,28. Several studies have emphasized the need to analyze the dynamic behavior of metabolism to understand how stable metabolic homeostasis evolves and is maintained under varying environmental conditions^29,30,31,32,33.

Mathematically, kinetic models can provide systemic insights into metabolic networks, but constructing these models and estimating their parameters is challenging, particularly for large-scale models³⁴. In recent years, several studies have investigated the steady-state Jacobian from metabolomics data^35,36,37,38,39 in combination with fluxomics or time-series measurements. In addition, for datasets consisting only of metabolomics measurements with many samples, recent studies have developed inverse differential Jacobian algorithms, which provide a convenient way to infer differences in the dynamics of metabolic networks between conditions^21,30,31,32,33,40,41,42,43,44. Most recently, a method and workflow termed COVRECON was developed for analyzing key biochemical regulations by solving a differential Jacobian problem^21,31,32,33. COVRECON integrates the covariance matrix of metabolomics data with automatic metabolic network modeling based on genome-scale metabolic reconstructions and biochemical reaction databases³².
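The inverse Jacobian idea can be illustrated with the fluctuation-dissipation (Lyapunov) relation commonly used in this line of work, $J C + C J^\top = -2D$, where $C$ is the metabolite covariance matrix, $J$ the Jacobian, and $D$ a fluctuation matrix. The Python sketch below is a toy illustration under stated assumptions, not the COVRECON implementation: it assumes $D$ is known, assumes a known sparsity pattern for $J$ (the role played by the network structure), and recovers $J$ by ordinary least squares. COVRECON's regression loss-based algorithm and its two-condition (differential) formulation are not reproduced, and the function names are hypothetical.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def covariance_from_jacobian(J, D):
    """Forward model: steady-state covariance C solving J C + C J^T = -2 D."""
    # scipy solves A X + X A^H = Q, so Q = -2 D gives the relation above
    return solve_continuous_lyapunov(J, -2.0 * D)


def jacobian_from_covariance(C, D, mask):
    """Toy inverse problem (hypothetical helper): estimate the entries of J
    allowed by `mask` from C and D by linear least squares."""
    n = C.shape[0]
    positions = np.argwhere(mask)          # (i, j) of unknown Jacobian entries
    rows = np.triu_indices(n)              # the relation is symmetric: keep upper triangle
    A = np.empty((len(rows[0]), len(positions)))
    for k, (i, j) in enumerate(positions):
        E = np.zeros((n, n))
        E[i, j] = 1.0
        # contribution of a unit entry J[i, j] to J C + C J^T
        A[:, k] = (E @ C + C @ E.T)[rows]
    b = (-2.0 * D)[rows]
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    J = np.zeros((n, n))
    J[positions[:, 0], positions[:, 1]] = x
    return J


# Hypothetical 4-metabolite chain; values are illustrative, not from this study
J_true = np.array([[-1.0, 0.0, 0.0, 0.0],
                   [0.5, -1.0, 0.0, 0.0],
                   [0.0, 0.3, -1.0, 0.0],
                   [0.0, 0.0, 0.4, -1.0]])
D = np.eye(4)                               # assumed known fluctuation matrix
C = covariance_from_jacobian(J_true, D)
J_hat = jacobian_from_covariance(C, D, J_true != 0)
print(np.allclose(J_hat, J_true))           # True
```

Without the structural mask, the $n^2$ unknown entries of $J$ outnumber the $n(n+1)/2$ independent equations, which is one reason network structure information is needed for the inverse problem.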

Figure 1 illustrates the workflow of this study, which consists of three main steps. In step 1, we cluster the original samples into groups based on physical and functional measurements, where each group represents a body activity condition calculated from the individuals’ physical performance measurements. Step 2 builds machine learning-based classifiers that distinguish these groups using metabolomics data, enabling the identification of key metabolites as biomarkers. Finally, in step 3, we employ inverse Jacobian analysis within the COVRECON workflow to uncover the most important biochemical regulations associated with the identified body activity conditions. With this approach, we aim to contribute to the understanding of active aging from a metabolomics perspective and to shed light on the key biochemical regulations underlying different fitness conditions.

**Fig. 1: Work scheme of the proposed approach.**

Results

This study was performed in 5 retirement homes in Vienna managed by the Curatorship of Viennese Retirement Homes, with the main aim of assessing the impact of resistance training and protein-vitamin supplementation, or of cognitive training, on physical performance⁴⁵. The subjects were randomly assigned to three groups (see Figure S1 in Supplementary Data 1): resistance training (RT), resistance training and supplements (RTS), and cognitive training, acting as a control group (CT). Blood samples were collected at baseline (T1), after three months (T2), and after six months (T3). Oesen et al. have shown that resistance training and supplementation can improve the physical performance of older adults. In this study, we analyze all samples from the three groups together, focusing on the relationship between physical performance and the metabolomic profiles of the older adults.

In this secondary analysis, we performed plasma metabolomics measurements. Plasma plays a crucial role in maintaining health by transporting nutrients, hormones, and waste products throughout the body. Changes in plasma composition can serve as indicators of various diseases, making plasma a valuable medium for health monitoring and diagnosis. Numerous studies have demonstrated associations between plasma metabolomic profiles and different health conditions^46,47,48. Here, we focus on plasma metabolomic changes and on identifying potential biomarkers and biochemical processes related to physical performance in older adults. Treatment-specific improvements in physical strength are therefore not central to our objective; instead, we analyze all samples collectively to investigate the relationship between active aging and metabolomic patterns. The cohort of older adults, whose average age was above life expectancy, consisted of 117 participants at baseline, and altogether we measured 263 plasma metabolomics samples.

To establish the relationship between physical performance and metabolomic profiles, we first examined the physical measurements, which are of two types: physical performance and body shape. The physical performance measurements can be further divided into resistance exercise and endurance exercise types. Table 1 shows the differences in physical measurements across the three groups. As expected, compared with the cognitive training (CT) group, the resistance training groups (RT and RTS) exhibited better resistance measurements. However, the treatments had no influence on endurance measurements (e.g., walking distance), which is consistent with the experimental design, in which participants were randomly assigned to the three groups regardless of their fitness. Notably, endurance exercise has been reported to be more closely related to aging than resistance measures^49,50.

Table 1 Differences in the physical fitness of the three treatment groups

Canonical correlation analysis (CCA)-based clustering to assess physical fitness in a cohort of older institutionalized adults

Since our aim was to investigate the relationship between metabolomics and physical performance, we first employed canonical correlation analysis (CCA) to generate a body activity index (BAI) from the physical performance measurements. We then clustered the older adults and their samples into two, four, and six groups based on this index.

As shown in Fig. 2a, the body activity index is highly correlated with the metabolomics index (Pearson's r = 0.8471, $p = 1.5 \times 10^{-19}$); the CCA loadings of the body activity index are shown alongside. Among all physical performance measures, walking distance contributed most to the body activity index. This observation is biologically plausible, since walking distance directly reflects an individual’s endurance, which is closely related to the aging process^49,50.

**Fig. 2: Body activity index and metabolomics index from canonical correlation analysis (CCA) and cluster analysis.**

Considering a potential non-linear relationship between the body activity index and the metabolomics index, we constructed an automated machine learning classifier using the XGBoosting algorithm, as described in the Methods. The automated classifier was trained with a maximum of 50 models over 30 random dataset splits; the AUCs averaged over the hold-out test sets of the repeated double CV were 91.50%, 82.36%, and 62.17% for the two-, four-, and six-group clusters, respectively (Fig. 2c). This indicates a strong association between the CCA-generated body activity index and the metabolomics index. For the inverse Jacobian analysis, we divided all older adults into two groups at the mean body activity index, as shown in Fig. 2b.

For comparison, we also performed CCA between the metabolomics data and body shape features, such as gender, height, and age. The biplots of this CCA are presented in Fig. S2 in Supplementary Data 1. The highest Pearson correlation coefficient obtained was only 0.4963 ($p = 1.5 \times 10^{-19}$), for the age index. Moreover, this correlation may partly originate from physical performance differences captured by the body shape index (e.g., age and BMI plausibly influence an individual’s physical performance). We also performed a CCA combining the metabolomics data with both the physical performance and the body shape data. However, the Pearson correlation coefficient increased only marginally, from 0.8471 to 0.8574. This indicates that the metabolomics data are primarily associated with physical strength and functionality, which supports both the body activity index and the metabolomics index that we developed.

The results of the CCA-based cluster analysis highlight the strong relationship between our derived body activity index and the metabolomics index. The dominance of walking distance indicates its significance as a reflection of an individual’s health condition and metabolic activity. In the following analysis, we focus on the two groups of older adults clustered by the body activity index, labeled the active group and the less-active group.

Machine learning-based classifiers and variable importance reveal a strong association between metabolites and fitness

In this section, we developed several machine learning-based classifiers to predict the active/less-active groups from the metabolomics dataset. This approach can provide insight into nonlinear relationships between the metabolomics index and the body activity index. As described in the Methods section, we evaluated the predictive performance of five machine learning algorithms: XGBoosting (XGB), Distributed Random Forest (DRF), Generalized Linear Model (GLM), Gradient Boosting Machine (GBM), and DeepLearning (DL). For each algorithm, the optimal model was selected using an automated machine learning (AutoML) framework.

Classifier construction followed a repeated double cross-validation scheme based on the area under the receiver operating characteristic curve (AUC). The dataset was divided into four parts. In each iteration, one part was used as the test set, while the remaining three parts were used for model selection via five-fold cross-validation. The selected model was then evaluated on the held-out test set. The final classifier performance was reported as the average of the evaluation metrics across the four folds. This entire double cross-validation procedure was repeated 30 times with different random splits of the dataset.
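The repeated double cross-validation described above can be sketched with scikit-learn, assuming a grid search stands in for the AutoML model selection and `GradientBoostingClassifier` stands in for the boosting algorithms; the parameter grid, the synthetic data, and the helper name `repeated_double_cv` are hypothetical. Each repeat reshuffles the data into four outer folds; within each outer training part, a five-fold inner CV selects the model, which is then scored once on the held-out quarter.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold


def repeated_double_cv(X, y, n_repeats=30, outer_folds=4, inner_folds=5, seed=0):
    """Return the mean outer-fold test AUC of each repeat (hypothetical helper)."""
    param_grid = {"max_depth": [2, 3]}  # hypothetical search space
    repeat_aucs = []
    for r in range(n_repeats):
        outer = StratifiedKFold(n_splits=outer_folds, shuffle=True, random_state=seed + r)
        fold_aucs = []
        for train_idx, test_idx in outer.split(X, y):
            # Inner loop: model selection by CV on the outer training part only
            inner = StratifiedKFold(n_splits=inner_folds, shuffle=True, random_state=seed + r)
            search = GridSearchCV(
                GradientBoostingClassifier(n_estimators=50, random_state=0),
                param_grid, scoring="roc_auc", cv=inner,
            )
            search.fit(X[train_idx], y[train_idx])
            # Outer loop: score the refitted best model on the held-out fold
            proba = search.predict_proba(X[test_idx])[:, 1]
            fold_aucs.append(roc_auc_score(y[test_idx], proba))
        repeat_aucs.append(np.mean(fold_aucs))
    return np.array(repeat_aucs)


# Synthetic stand-in for the metabolite matrix and active/less-active labels
X, y = make_classification(n_samples=120, n_features=20, n_informative=5, random_state=1)
aucs = repeated_double_cv(X, y, n_repeats=2)  # the text uses 30 repeats
print(aucs.round(3), aucs.mean().round(3))
```

Keeping model selection inside the outer training part is what makes the outer AUC an honest hold-out estimate; selecting the model on all data first would leak test information into the choice.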

As shown in Fig. 3, we compared the average performance of the classifiers trained on the full dataset and on three-fourths of the data across the five algorithms. To assess whether the classification performance could arise by chance, we conducted a permutation test with 1500 repeats of randomly permuted group labels. All five machine learning models achieved statistically significant results: for each of the five algorithms, none of the permutation repeats reached the performance obtained with the true labels, resulting in permutation test p values < 1/1500 ≈ 0.00067. Details of the permutation test are presented in the “Methods” section.
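A label-permutation test of this kind can be sketched with scikit-learn's `permutation_test_score`, used here as a stand-in for the procedure in the Methods; logistic regression, the synthetic data, and 200 permutations (instead of 1500) are assumptions chosen to keep the example fast. scikit-learn computes the p value as (C + 1)/(n_permutations + 1), where C counts permuted scores at least as good as the true-label score, so when no permutation reaches the observed AUC the p value sits at its floor of roughly 1/n_permutations.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, permutation_test_score

# Synthetic stand-in for metabolite data and group labels
X, y = make_classification(n_samples=120, n_features=20, n_informative=5, random_state=1)
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

score, perm_scores, p_value = permutation_test_score(
    LogisticRegression(max_iter=1000), X, y,
    scoring="roc_auc", cv=cv, n_permutations=200, random_state=0,
)
n_better = int(np.sum(perm_scores >= score))  # permutations matching the true-label AUC
print(f"AUC = {score:.3f}, permutations >= AUC: {n_better}, p = {p_value:.4f}")
```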

**Fig. 3: Different machine learning classifiers results.**

Figure 3a shows the AUC values calculated on the hold-out test datasets through the repeated double CV approach, averaged across 30 random data splits for each algorithm. The detailed AUC results are provided in Supplementary Data 2. Among the five methods, the two boosting methods, XGBoosting (XGB) and Gradient Boosting Machine (GBM), achieved the highest predictive performance, with average AUC values of 0.9150 and 0.913, respectively. As shown in Fig. 3c, these results are significantly better (paired t test, P < 0.01) than those of the DeepLearning algorithm, which itself outperformed both the Generalized Linear Model (GLM) and Distributed Random Forest (DRF) algorithms (paired t test, P < 0.01). Additional performance metrics comparing these algorithms are presented in Fig. 3d, with detailed descriptions in the “Methods” section and the corresponding statistical values in Supplementary Data 2. The superior performance of the boosting-based methods suggests nonlinear patterns in the data, potentially arising from complex interactions within the metabolic network. To evaluate the influence of sample size on classifier performance, we randomly removed one quarter of the training data and reassessed all five algorithms. As expected, the AUC values of each method declined slightly but remained high overall. This indicates that classifier performance had not yet plateaued and may improve further with larger datasets.
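Because all algorithms are evaluated on the same 30 random splits, their AUCs are paired by split, and a paired t test compares the per-split differences rather than the two raw distributions. A minimal sketch with `scipy.stats.ttest_rel`; the AUC values are synthetic, not the study's results, and are constructed with a shared split-to-split component to show why pairing matters.

```python
import numpy as np
from scipy.stats import ttest_ind, ttest_rel

rng = np.random.default_rng(0)
n_splits = 30
# Synthetic per-split AUCs: a shared "split difficulty" term affects both algorithms
split_effect = rng.normal(0.0, 0.02, size=n_splits)
auc_boosting = 0.91 + split_effect + rng.normal(0.0, 0.005, size=n_splits)
auc_other = 0.88 + split_effect + rng.normal(0.0, 0.005, size=n_splits)

t_paired, p_paired = ttest_rel(auc_boosting, auc_other)      # uses per-split differences
t_unpaired, p_unpaired = ttest_ind(auc_boosting, auc_other)  # ignores the pairing
print(f"paired p = {p_paired:.2e}, unpaired p = {p_unpaired:.2e}")
```

The shared split term cancels in the paired differences, so the paired test is considerably more sensitive here than an unpaired comparison of the same numbers.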

To assess the importance of metabolites directly related to the two body activity groups, we ranked the metabolites by their variable importance in each of the five algorithms on the test dataset. We identified the top 10 metabolites for each algorithm by calculating the average variable importance across the 25 repeats. The algorithm-metabolite bipartite graph is shown in Fig. 4a, where Aspartate, Proline, Fructose, Pyruvate, and Malic Acid were consistently identified as top metabolites across almost all classifiers. The detailed metabolite importance values of each algorithm are presented in Supplementary Data 2.
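The ranking step can be sketched in plain Python and NumPy, assuming each algorithm yields one importance vector per repeat: importances are averaged over repeats, the top-k metabolites are kept per algorithm, and the (algorithm, metabolite) pairs form the edges of the bipartite graph. The helper names and the toy numbers are hypothetical.

```python
import numpy as np


def top_k_per_algorithm(importances, metabolites, k=10):
    """importances: dict algorithm -> array of shape (n_repeats, n_metabolites)."""
    top = {}
    for algo, imp in importances.items():
        mean_imp = np.asarray(imp).mean(axis=0)        # average over repeats
        order = np.argsort(mean_imp)[::-1][:k]         # indices of the k largest means
        top[algo] = [metabolites[i] for i in order]
    return top


def bipartite_edges(top):
    """Edges of the algorithm-metabolite graph and per-metabolite consensus counts."""
    edges = [(algo, m) for algo, names in top.items() for m in names]
    counts = {}
    for _, m in edges:
        counts[m] = counts.get(m, 0) + 1
    return edges, counts


# Toy example with hypothetical importance values for two algorithms
metabolites = ["Aspartate", "Proline", "Glycine"]
importances = {
    "XGB": [[0.9, 0.5, 0.1], [0.8, 0.4, 0.2], [0.7, 0.6, 0.0]],
    "GLM": [[0.6, 0.1, 0.7], [0.8, 0.1, 0.5]],
}
top = top_k_per_algorithm(importances, metabolites, k=2)
edges, counts = bipartite_edges(top)
print(top)     # {'XGB': ['Aspartate', 'Proline'], 'GLM': ['Aspartate', 'Glycine']}
print(counts)  # {'Aspartate': 2, 'Proline': 1, 'Glycine': 1}
```

A metabolite with a high count appears in the top list of many algorithms, which is the sense in which Aspartate and the others are "consistently identified" across classifiers.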

**Fig. 4: Variable importance results of the machine learning based classifiers.**

To better understand the variable importance results, we applied a multi-algorithm automated machine learning approach that included all five algorithms with a maximum of 100 models, using the ‘automl’ functionality of the H2O Python package (h2o). XGB achieved the best performance, as shown in Table S1 in Supplementary Data 1. The Pareto front plot in Figure S3 in Supplementary Data 1 determined the optimal subset of classifiers, which included XGBoosting and GBM classifiers, highlighting the superiority of boosting methods for this task. Figure 4b, c present the variable importance and the SHAP summary plot for the leading XGBoosting classifier on the test set. Aspartate was the most important metabolite, accounting for over 90% of the importance, which indicates a strong link between the metabolome and the body activity index. The Pearson correlation heatmap in Fig. 5 further supports this observation: Aspartate exhibits the most significant correlation with the body strength data. Although other metabolites, such as Proline, Malic Acid, and Pyruvate, had lower importance values, they consistently appeared among the top 10 metabolites across different classifiers. In Fig. 5, we also performed t tests for all metabolites between the two groups and plotted the significant differences. Interestingly, these did not fully match the classifier results; for example, Pyruvate was identified as a key metabolite by all classifiers but did not differ significantly between the groups. This may suggest that the effect of Pyruvate is non-linear between the two groups. In addition, as shown in Fig. 4c, the SHAP values of the top classifier metabolites still show good separation between the two groups, albeit less pronounced than for Aspartate. This further indicates that these metabolites reflect non-linear metabolic effects on the body activity index.
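The point about Pyruvate, a metabolite that is informative for a classifier without a significant mean difference, can be illustrated with synthetic data. Below, a hypothetical metabolite is mirrored around zero so that both groups have the same mean (t test p close to 1), yet group membership depends on deviations in either direction, which a shallow decision tree captures. This is a toy example of a non-linear effect, not a model of the measured Pyruvate data.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = np.concatenate([z, -z])            # symmetric values: group means coincide at zero
y = (np.abs(x) > 0.8).astype(int)      # hypothetical non-linear rule: large deviations = group 1

t_stat, p_value = ttest_ind(x[y == 1], x[y == 0])  # linear mean comparison
proba = cross_val_predict(
    DecisionTreeClassifier(max_depth=2, random_state=0),
    x.reshape(-1, 1), y, cv=5, method="predict_proba",
)[:, 1]
auc = roc_auc_score(y, proba)
print(f"t test p = {p_value:.3f}, tree AUC = {auc:.3f}")
```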

**Fig. 5: Statistic results between the two groups.**

We chose the eight most important metabolites (Aspartate, Proline, Fructose, Malic Acid, Pyruvate, Valine, Citrate, and Ornithine) and mapped them to KEGG pathways, as shown in Fig. S4 in Supplementary Data 1. Apart from a few large, comprehensive pathways, the top metabolites identified by the classifiers are mostly related to Central carbon metabolism in cancer and 2-Oxocarboxylic acid metabolism. However, this mapping reveals only a surface-level connection between active aging and these pathways and falls short of a comprehensive understanding of the biochemical regulation underlying active aging dynamics.

Predictive inverse metabolic interaction modeling using the COVRECON platform

While the machine learning and classifier results provide insight into the importance of the measured metabolites for the body activity index, they do not explain the mechanistic changes between the two groups. Since metabolomics analysis was performed three times for each older adult (at baseline, after 3 months, and after 6 months), we plotted the correlation heatmap of the changes in all body features and metabolomics measurements within the two time intervals in Fig. 6. The correlation patterns among the metabolite changes show high similarity, reflecting the internal dynamics of the metabolic network. However, many highly correlated metabolite pairs are not connected by any biochemical reaction in the available databases. For example, in Fig. 6, Threonine, Tyrosine, and Valine show high correlations, yet no direct biochemical reactions occur among them. Such correlations originate from the dynamics of the network as a whole rather than from direct reactions. Thus, identifying the causal interactions among metabolites is crucial.
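Why metabolites without a direct reaction can be highly correlated is easy to see in a toy linear-noise model, assuming the Lyapunov relation $J C + C J^\top = -2D$ between Jacobian $J$, covariance $C$, and fluctuation matrix $D$ that underlies inverse Jacobian approaches. Below, a hypothetical metabolite x0 feeds two others (x1, x2) that share no reaction (J[1, 2] = J[2, 1] = 0), yet the common upstream fluctuations make them strongly correlated. All values are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

# Hypothetical network: x0 drives x1 and x2; no reaction links x1 and x2
J = np.array([[-1.0, 0.0, 0.0],
              [1.0, -1.0, 0.0],
              [1.0, 0.0, -1.0]])
D = np.diag([1.0, 0.1, 0.1])                 # assumed fluctuation strengths
C = solve_continuous_lyapunov(J, -2.0 * D)   # solves J C + C J^T = -2 D
sd = np.sqrt(np.diag(C))
corr = C / np.outer(sd, sd)
print(round(corr[1, 2], 3))                  # 0.833, despite J[1, 2] == J[2, 1] == 0
```

Reading interactions directly off such a correlation matrix would wrongly connect x1 and x2; recovering the sparse J instead is what the inverse Jacobian step aims for.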

**Fig. 6: The correlations of metabolomics changes.**

In recent years, inverse differential Jacobian algorithms have been developed that provide a convenient way to infer the causal dynamics of metabolic networks from metabolomics data^21,30,31,32,40. In addition to the metabolomics measurements, a metabolic reconstruction is used as complementary information to build a topological model of the metabolic interaction network. On this basis, we developed the COVRECON toolbox (available at: https://bitbucket.org/mosys-univie/covrecon/src/main/)³². As described in the Methods, we applied the COVRECON workflow to the datasets of the two groups. The workflow consists of two steps: building the metabolic interaction network and calculating the inverse Jacobian.

As described in COVRECON³², we used the default settings of the Sim-Network part to generate a metabolic super-pathway network of the measured metabolites. Each edge in the network represents a feasible pathway between two nodes (metabolites) and corresponds to a non-zero entry in the system Jacobian matrix. The default setting assigns a fixed weight of one to each reaction, and the weight of each reverse reaction is based on the logarithm of its Gibbs free energy change. Additionally, the pathway length is limited to 4 steps. Detailed information about the reactions, enzymes, and genes of the resulting metabolic interaction network can be found in Supplementary Data 3. By integrating the covariance of the metabolomics data from both groups with the Jacobian structure matrix, we performed the inverse Jacobian analysis in the second part of the COVRECON toolbox. The COVRECON workflow and toolbox address the ill-conditioned matrix problem associated with the inverse Jacobian approach through a regression loss-based algorithm, significantly improving its stability and feasibility^32,33,42. However, because the inverse Jacobian approach depends on the Jacobian structure and is more reliable for smaller models, we selected a tailored core part of the whole model containing 10–20 metabolites based on the classifier variable importance results, as described in the Methods. The same network reduction strategy as in Sim-Network was employed, with additional indirect connections added to the reduced model. For example, an additional connection from Proline to Aspartate was added to account for indirect effects through the connections from Proline to Asparagine and from Asparagine to Aspartate (Fig. 7). In Supplementary Data 1, Figure S8 presents 12 typical results from the repeated calculations. All repeated results are available in Supplementary Data 5.
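The idea of a Jacobian structure derived from pathways of limited length, with indirect connections in a reduced model, can be sketched with a breadth-first search. This is a hypothetical simplification, not COVRECON's Sim-Network code: reaction weights and Gibbs free energy terms are ignored, the toy edges are illustrative rather than taken from a database, and paths are contracted only through metabolites that are not kept in the reduced model. A directed path from metabolite j to metabolite i then marks a candidate non-zero entry J[i, j].

```python
from collections import deque

import numpy as np


def reduced_structure(edges, keep, max_steps=4):
    """Directed (source, target) pairs among `keep` joined by a path of at most
    `max_steps` reactions whose intermediates all lie outside `keep`."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    keep = set(keep)
    pairs = set()
    for source in keep:
        queue, seen = deque([(source, 0)]), {source}
        while queue:
            node, depth = queue.popleft()
            if depth == max_steps:
                continue                      # path-length limit reached
            for nxt in graph.get(node, []):
                if nxt in seen:
                    continue
                seen.add(nxt)
                if nxt in keep:
                    pairs.add((source, nxt))  # direct or contracted connection
                else:
                    queue.append((nxt, depth + 1))  # walk through a removed metabolite
    return pairs


def jacobian_mask(pairs, order):
    """Boolean structure matrix: mask[i, j] is True if metabolite j can affect i."""
    index = {m: k for k, m in enumerate(order)}
    mask = np.eye(len(order), dtype=bool)     # assume self-regulation on the diagonal
    for source, target in pairs:
        mask[index[target], index[source]] = True
    return mask


# Toy reaction edges (illustrative only); Asparagine and Oxaloacetate are not kept
edges = [("Proline", "Asparagine"), ("Asparagine", "Aspartate"),
         ("Citrate", "Aspartate"), ("Aspartate", "Oxaloacetate"),
         ("Oxaloacetate", "Malate")]
keep = ["Proline", "Aspartate", "Citrate", "Malate"]
pairs = reduced_structure(edges, keep, max_steps=4)
print(sorted(pairs))
# [('Aspartate', 'Malate'), ('Citrate', 'Aspartate'), ('Proline', 'Aspartate')]
mask = jacobian_mask(pairs, keep)
```

With `max_steps=1` only the direct Citrate to Aspartate edge survives, so the step limit directly controls how densely the reduced Jacobian structure is populated.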
Although the local results differ because of the influence of the Jacobian structure information, the inverse Jacobian approach is stable for several highlighted metabolic interactions. For example, the interactions Proline->Aspartate, Ornithine->Aspartate, Citrate->Aspartate, and Glutamate->2-oxoglutaric acid have high values in the resulting differential metabolic interaction networks of many repeats. To present the overall importance of each metabolic interaction, we integrated all 200 local results into the full differential Jacobian (DJ) by averaging the value of each metabolic interaction across the repeats. The final R* matrix and the differential interaction network are presented in Fig. 7a, b, respectively. In Fig. 7b, we plot only the highlighted metabolic interactions with a calculated value (scaled to 0–1) above 0.5. We note that the result was robust, with similar overall R* obtained using 100, 200, and 500 repeats; all further results use 200 repeats.
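Integrating the repeats as described amounts to an element-wise average followed by scaling and thresholding. A minimal NumPy sketch, assuming each local result is a full-size matrix with zeros for interactions absent from that repeat's reduced model, and assuming the 0–1 scaling divides absolute values by their maximum (the exact scaling behind Fig. 7b may differ); the toy matrices are hypothetical.

```python
import numpy as np


def integrate_repeats(local_results, threshold=0.5):
    """Average local differential Jacobians and flag highlighted interactions."""
    R = np.mean(np.stack(local_results), axis=0)    # element-wise mean over repeats
    magnitude = np.abs(R)
    scaled = magnitude / magnitude.max()            # assumed 0-1 scaling by the maximum
    highlighted = np.argwhere(scaled > threshold)   # (row, column) pairs above threshold
    return R, scaled, highlighted


# Two hypothetical 2x2 local results
local = [np.array([[0.0, 2.0], [1.0, 0.0]]),
         np.array([[0.0, 4.0], [-1.0, 0.0]])]
R, scaled, highlighted = integrate_repeats(local)
print(highlighted.tolist())  # [[0, 1]]
```

Entry (1, 0) cancels to zero because the two repeats disagree in sign, which is how averaging suppresses interactions that are not stable across repeats.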

**Fig. 7: The overall inverse Jacobian results integrating all 200 local calculation repeats.**

Using the COVRECON approach, we identified several important perturbed metabolic interactions between the two groups clustered by the body activity index. The highlighted interactions and the detailed reaction, enzyme, and gene information are presented in Supplementary Data 4. These findings provide insight into the regulatory interactions and dynamics of the metabolic network related to Aspartate, further supporting its role as the dominant biomarker in the classifier results. As shown in Fig. 7c, several reactions recur across the highlighted metabolic interactions. Among these, the enzyme aspartate transaminase (AST, EC number 2.6.1.1) is involved in 11 of the 15 highlighted interactions, including all of the highest-valued interactions: Proline->Aspartate, Valine->Aspartate, Citrate->Aspartate, and Glutamate->2-oxoglutaric acid. The enzyme glutamic-pyruvic transaminase (ALT, EC number 2.6.1.2) is also highlighted. Notably, both AST and ALT are important enzymes in amino acid metabolism, and recent studies indicate their involvement in health-related issues of older adults^51,52,53. Furthermore, the enzyme asparagine synthetase B (EC number 6.3.5.4) was identified in 8 of the 15 highlighted interactions. This enzyme has been less studied in relation to the health of older people; however, asparagine synthetase (ASNS) deficiency was recently described as a metabolic disorder of non-essential amino acids⁵⁴. Moreover, most enzymes identified in Fig. 7c belong to the enzyme class of transaminases (EC 2.6.1.-). Transaminases are important in the production of various amino acids, and measuring the concentrations of various transaminases in the blood is important for diagnosing and monitoring many diseases⁵⁵.

For further analysis of these enzymes, we examined routine blood test measurements of the older adults across the three time points. Four metabolic enzymes were measured: AST, ALT, gamma-glutamyltransferase (GGT), and creatine kinase (CK). The data are presented in Supplementary Data 2. As shown in Fig. 8 and Figure S6 in Supplementary Data 1, we compared the enzyme measurements between the two groups (active/less active). AST and ALT differed significantly between the groups, whereas GGT and CK did not. This observation supports the inverse Jacobian results in Fig. 7. We further compared the AST and ALT changes within the two 3-month intervals. As shown in Fig. 8, both AST and ALT changed significantly in the “active group”, whereas the changes were not significant in the “less active group” during either 3-month interval. Notably, the changes also differed significantly between the two groups. Specifically, in the “active group”, AST and ALT showed a significantly larger decrease during the first 3 months, followed by a significantly larger increase in the subsequent 3-month interval. This suggests a greater plasticity of the enzymatic liver and muscle systems in individuals with a high level of body activity. Interestingly, a few studies have reported similar observations of enzyme variation. In a long-term study of 29 routine laboratory measurements in 30 athletes, AST and ALT exhibited significantly larger variations over an 11-month period than those reported for the general population^56,57. Moreover, various studies have documented fluctuations of these enzymes in the blood of healthy individuals in response to physical activity and exercise^10,58,59,60,61,62.
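The within-individual comparison of enzyme changes can be sketched as follows, assuming one enzyme value per individual and time point: changes per interval are differences of consecutive time points, within-group changes are tested against zero (equivalent to a paired test between the two time points), and groups are compared with Welch's t test. The statistical tests behind Fig. 8 are not specified in this section, so these test choices, the group sizes, and the synthetic AST values are assumptions.

```python
import numpy as np
from scipy.stats import ttest_1samp, ttest_ind

rng = np.random.default_rng(1)


def simulate(n, step1, step2):
    """Synthetic AST values (U/L) at T1, T2, T3 with mean changes step1, step2."""
    t1 = rng.normal(25.0, 5.0, size=n)
    t2 = t1 + rng.normal(step1, 2.0, size=n)
    t3 = t2 + rng.normal(step2, 2.0, size=n)
    return np.column_stack([t1, t2, t3])


active = simulate(40, -5.0, 5.0)      # hypothetical: decrease, then increase
less_active = simulate(40, 0.0, 0.0)  # hypothetical: no systematic change

d_active = np.diff(active, axis=1)    # columns: T2 - T1, T3 - T2
d_less = np.diff(less_active, axis=1)

for k, label in enumerate(["T1->T2", "T2->T3"]):
    p_within_active = ttest_1samp(d_active[:, k], 0.0).pvalue
    p_within_less = ttest_1samp(d_less[:, k], 0.0).pvalue
    p_between = ttest_ind(d_active[:, k], d_less[:, k], equal_var=False).pvalue
    print(f"{label}: active p={p_within_active:.1e}, "
          f"less active p={p_within_less:.2f}, between p={p_between:.1e}")
```

Working with per-individual differences removes the large between-person spread in baseline enzyme levels, so the tests focus on how each person's values move over time.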

**Fig. 8: The enzyme measurements for the two enzymes identified in inverse Jacobian results.**

As a comparison to the COVRECON results, we performed disease and pathway enrichment analyses using MetaboAnalyst. The results are presented in Fig. 9. While the pathway enrichment analysis yielded limited informative results, the disease enrichment analysis interestingly highlighted cirrhosis—a condition commonly observed in older adults. Cirrhosis refers to advanced liver scarring resulting from various causes, including hepatitis and alcohol use disorder. Although our dataset does not include clinical information indicating whether participants were diagnosed with cirrhosis, the enrichment of cirrhosis-related metabolites suggests a degree of heterogeneity among participants in terms of liver health. This observation aligns with the findings from our COVRECON network analysis, though MetaboAnalyst provides less detailed insight into specific network-level changes.
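Over-representation analysis of the kind offered by MetaboAnalyst is commonly based on a hypergeometric test. The sketch below shows that test with `scipy.stats.hypergeom` as a generic illustration, not MetaboAnalyst's implementation; the helper name and the counts are hypothetical.

```python
from scipy.stats import hypergeom


def ora_pvalue(n_background, n_in_set, n_selected, n_overlap):
    """P(X >= n_overlap) for X ~ Hypergeometric: drawing n_selected metabolites
    from n_background, of which n_in_set belong to the metabolite set."""
    # scipy's parameterization: hypergeom(M=total, n=successes, N=draws)
    return hypergeom.sf(n_overlap - 1, n_background, n_in_set, n_selected)


# Hypothetical counts: 100 background metabolites, a 10-metabolite set,
# 8 selected metabolites, 4 of which fall in the set
print(f"{ora_pvalue(100, 10, 8, 4):.4f}")
```

Because the p value depends on the chosen background and set sizes, small metabolite panels such as the one here can yield only coarse enrichment results, consistent with the limited pathway-level information noted above.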

**Fig. 9: The metabolites enrichment analysis results generated by MetaboAnalyst.**

We also conducted a metabolite network analysis using MetaboAnalyst, with the results shown in Figure S9 in Supplementary Data 1. This network was constructed based on a precision matrix, which does not incorporate structural information from biochemical databases such as KEGG. In contrast, COVRECON explicitly focuses on differential networks between conditions and leverages prior biochemical knowledge to highlight detailed and biologically meaningful connections.

Discussion

In this article, we measured 263 plasma metabolomics samples to study active aging and fitness in a cohort of very old adults close to or above the average life expectancy. Using a CCA approach, we clustered all older adults and samples into two groups based on a body activity index. We then identified several key biomarkers distinguishing these two groups through machine learning and deep learning analysis. The identified metabolites are Aspartate, Proline, Fructose, Malic Acid, Pyruvate, Valine, Citrate, and Ornithine, with Aspartate showing dominant effects, and XGBoosting showed the best performance. In a further analysis, we applied the COVRECON (Li et al., 2023) approach to the two groups of older adults and identified several key metabolic interaction changes between the active and the less active group. Many of these interactions involve aspartate, consistent with the machine learning results. By examining the enzyme information of the highlighted metabolic interactions, we identified several important enzyme regulations; AST was involved in most of the highlighted interactions in the COVRECON analysis. Blood measurements of all individuals across the three time points supported these results. Existing studies have also shown that AST and ALT are closely related to health issues and dementia in older adults⁶³.

Since the study was conducted at three time points with different treatments, we also analyzed the metabolic context of resistance training. As shown in Table S2 in Supplementary Data 1, group-difference t-tests on the metabolomics measurements showed that alpha-tocopherol differed significantly between the nutritional supplement intake group (E) and the other two groups, as it is a component of the supplement FortiFit. Linoleic acid, methionine, palmitic acid, succinate and tyrosine differed significantly between the control group (K) and the resistance training groups (T & E). Interestingly, these metabolites differ from those identified by the body activity classifiers, suggesting distinct metabolic mechanisms for resistance exercise and endurance exercise. This mechanistic difference between endurance and resistance exercise has been explored previously⁶⁴, where the metabolite changes induced by endurance and by resistance exercise were identified as two distinct modes.

Moreover, several studies have reported that endurance exercise, but not resistance exercise, is highly relevant to aging. Cao Dinh et al. (2019) reported that, among 100 older women (aged over 65 years), strength endurance training significantly reduced senescence-prone T cells, which are widely recognized as age-related⁶⁵, whereas intensive training showed no significant effect. In another study of 124 healthy, previously inactive individuals, Werner et al. (2019) concluded that endurance training, but not resistance training, has anti-aging effects⁴⁹. These studies provide additional support for our body activity index and metabolic network analysis.

Furthermore, we generated scatter plots and calculated correlations between changes in the CCA-derived metabolic index and changes in the body activity index for each individual across two intervals: from baseline to 3 months, and from 3 to 6 months. The results are presented in Fig. 10. Within each group, changes in the metabolic index are highly correlated with changes in the body activity index, and most large changes occur in the same direction for both indices. This further highlights the correspondence between the metabolic index and the body activity index.

**Fig. 10: Comparison of metabolic index changes and body activity index changes.**

As shown in the Results section, aspartate is a dominant blood biomarker for body activity. It is one of the 22 proteinogenic amino acids and is involved in the malate-aspartate shuttle, which transfers electrons and energy between the cytoplasm and mitochondria and thereby contributes to ATP production and the efficient functioning of cellular energy metabolism⁶⁶. It is therefore particularly important in tissues with high energy demands, such as muscle, liver and heart. This may account for the greater aspartate metabolism in the “active group”. Consistent with this, several groups have reported aspartate to be an effective supplement for attenuating exercise-induced hyperammonemia and increasing exercise endurance^67,68. In addition, aspartate is involved in removing ammonia from the body through the urea cycle⁶⁹. Exercise produces ammonia as a byproduct of energy metabolism, and aspartate may be used to help detoxify it, potentially altering aspartate levels.

Another result of the COVRECON approach is that old adults with a better body activity index show greater plasticity of the enzymatic liver and muscle system. AST and ALT are routine blood-test enzymes closely related to liver health, but also to muscle and heart health⁷⁰; levels above a specified threshold may indicate medical conditions such as hepatitis, liver disease or myonecrosis. The AST/ALT ratio is an important indicator of liver disease. We plotted the changes in the AST/ALT ratio over the three time points for the two groups in Figure S7 in Supplementary Data 1 and found no significant changes across time points or groups. This suggests that the AST and ALT variations likely originate from factors unrelated to disease. Furthermore, studies have shown that physical exercise and improved fitness can also transiently elevate these enzyme levels within a healthy range in individuals without underlying liver problems^58,59,60. This exercise-induced transaminase elevation is a well-documented phenomenon, commonly observed in response to vigorous physical activity, and such increases in AST and ALT are typically temporary, returning to baseline shortly after physical exertion. This is consistent with the larger AST and ALT variations observed for individuals with better body functionality/activity (Fig. 8). This view is also supported by a long-term study of 29 routine laboratory measurements in 30 athletes, in which AST and ALT showed significantly larger variations over an 11-month period than those reported for the general population^56,57.

Furthermore, because natural dementia is a hallmark of aging, we explored its associations with our primary findings, even though physical performance is the central focus of this study. Emerging evidence indicates that in Alzheimer’s disease, brain levels of D-aspartate are dysregulated and neuronal N-acetyl-L-aspartate (NAA) is reduced, reflecting impaired neurotransmission and neuronal integrity⁷¹; aspartate acts as an agonist at N-methyl-D-aspartate (NMDA) receptors, and its metabolic imbalance contributes to synaptic dysfunction and cognitive decline⁷². Peripherally, lower mid-life alanine aminotransferase (ALT) and aspartate aminotransferase (AST) levels are associated with increased long-term dementia risk⁷³, while an elevated AST/ALT ratio correlates with poorer cognitive performance and hippocampal atrophy in older adults⁷⁴. Collectively, these findings underscore regular physical activity as a potent, modifiable factor in reducing natural dementia risk⁷⁵. Together, central aspartate metabolism and peripheral liver enzyme alterations point to a liver–brain axis in dementia pathogenesis, suggesting novel biomarkers and therapeutic targets.

Regular physical activity has also been robustly linked to a lower risk of natural (age-related) dementia, with meta-analyses demonstrating that higher activity levels reduce all-cause dementia incidence by approximately 28% (HR 0.72, 95% CI 0.65–0.80)^75,76. Even modest activity, such as walking more than 6000 steps per day, has been reported to prevent incident dementia in older adults⁷⁷. Mechanistic studies indicate that endurance exercise enhances neurogenesis and brain-derived neurotrophic factor (BDNF), protecting against neurodegeneration⁷⁸. Physical activity programs in patients with dementia have also been shown to slow cognitive decline and improve walking quality⁷⁹.

Taken together, this suggests a tightly linked triad of physical activity, metabolomic alterations, and natural dementia. While our work concentrates on the nexus between physical activity and metabolomics, the relationships among the other two pairs are well studied elsewhere. We acknowledge that, without direct investigation in older adult or patient cohorts, our conclusions regarding dementia remain hypothetical and propose that this compelling topic warrants dedicated future research.

In conclusion, this study integrates machine learning-based statistical analysis with COVRECON inverse Jacobian analysis. In the metabolomics analysis, machine learning-based statistical methods helped us identify the key metabolites. For the dynamic analysis, as an alternative to kinetic modeling, which requires fitting many parameters, we demonstrated predictive metabolic interaction modeling using the inverse differential Jacobian approach. This approach provides a powerful tool for finding important dynamic, causal molecular regulations between two conditions. By integrating the machine learning results, we demonstrated a robust approach to the inverse differential Jacobian calculation. By robustly identifying aspartate as a biomarker for active aging with the combined ML/COVRECON approach, we provide novel insights into fitness parameters such as the body activity index and their linkage to metabolic processes, as well as a potential link to dementia. In the future, we propose integrating metabolomics studies and COVRECON into a unified study linking physical activity, metabolomic alterations, and cognitive decline, to explore this triadic relationship with natural dementia in more detail. Further developments of COVRECON will include weighting of enzyme-level regulation or reaction kinetics, factors that can make the inferred network structure more precise³³. Addressing these gaps will be a key objective of our next phase of research.

Methods

Experimental design

This study was performed in 5 retirement homes in Vienna managed by Curatorship of Viennese Retirement Homes. The aim of this study was to assess the impact of strength training, strength training and protein-vitamin supplement or cognitive training on very old, institutionalized adults. This study was conducted in a randomized, controlled, observer-blind design. The subjects were randomly assigned to three groups: resistance training (RT), resistance training and supplements (RTS) and cognitive training, acting as a control group (CT). The details are presented in Supplementary Data 1. Blood samples were collected at the baseline (T1), after three months (T2) and after six months (T3).

One hundred and seventeen subjects were recruited from five senior residences (Fig. S1 in Supplementary Data 1). Inclusion criteria comprised a minimum level of physical fitness (Short Physical Performance Battery > 4) and mental performance (Mini Mental State Examination ≥ 23). Moreover, participants were free of severe diseases such as diabetic retinopathy and cardiovascular diseases (CVDs), and did not regularly use cortisone-containing drugs. Before the intervention started, health and nutritional status were assessed by specialists in internal medicine and gerontology⁴⁵. All subjects signed informed consent before inclusion in accordance with the Declaration of Helsinki. The study was approved by the ethics committee of the City of Vienna (EK-11-151-0811) and registered at ClinicalTrials.gov, NCT01775111⁴⁵.

Subject characteristics

The sex distribution (87.6% women; 12.4% men) among participants was representative of the population living in nursing homes. The mean age of the study population was 82.9 ± 6.0 years for women and 84.9 ± 6.7 years for men. The participants had a mean BMI of 29.27 ± 5.00 kg/m² ⁴⁵.

Treatment

Resistance training (RT)

The participants performed resistance training twice a week, supervised by a sport scientist, using elastic bands, chairs and their own body weight. Each session consisted of a 10 min warm-up, 30–40 min of strength training comprising ten exercises for the main muscle groups (shoulders, arms, legs, back, abdomen and chest), and a 10 min cool-down (Oesen et al., 2015). The exertion was adjusted to each participant’s fitness level by adapting the resistance of the elastic band. Fifteen repetitions were performed, and as soon as a subject could perform an exercise easily, the resistance was increased to make the exercise more difficult and thus obtain a higher training effect (Oesen et al., 2015).

Resistance training and supplementation (RTS)

The participants performed the same resistance training as the resistance training group. In addition, they received the supplement every day and after each training session, and its intake was controlled. The nutritional supplement FortiFit, produced by NUTRICIA GmbH, Vienna, Austria, contained 20.7 g protein (56 energy (En)%; 19.7 g whey protein, 3.0 g leucine, >10 g essential amino acids), 9.3 g carbohydrates (25 En%, 0.8 BE), 3.0 g fat (18 En%), 1.2 g dietary fiber (2 En%), 800 IU (20 μg) vitamin D, 250 mg calcium, vitamins C, E, B6 and B12, folic acid and magnesium (Oesen et al., 2015).

Cognitive training (CT)

The participants performed memory training and finger dexterity exercises in a seated position twice a week. This was intended to minimize the bias of being alone and not taking part in group activities (socialization factor) (Oesen et al., 2015). Participants of all groups were instructed to maintain their regular food intake.

Blood plasma metabolite extraction and analysis

Several studies have addressed the choice of blood sample type, showing that heparin plasma causes less interference in the chromatogram^80,81. Consistent with these findings, heparin was used as the anticoagulant; blood plasma was separated from fresh blood samples and stored at –80 °C for further clinical analysis. Metabolite profiles of the human plasma samples were measured using a gas chromatograph coupled to a mass spectrometer^82,83. The samples were thawed on ice for 45 min and vigorously vortexed for 10 s. The extraction consisted of two steps. First, 100 µl plasma was transferred into 1.5 ml Eppendorf tubes, 600 µl ice-cold MeOH was added, and the mixture was immediately vortexed for 10 s and incubated on ice for 15 min. To remove proteins, the samples were centrifuged at 14,000 × g for 4 min at 4 °C. The supernatant was transferred into new tubes and dried in a SpeedVac. Afterwards, the dried pellets were stored at –20 °C.

The second step consisted of extraction with CHCl3. 300 µl of CHCl3 were added to pellets. The further procedure was a repetition of the first step. The supernatant was transferred into new Eppendorf tubes and dried down in SpeedVac. Metabolite extractions were performed in batches of 30 samples of randomly selected subjects.

Quality control-mix

A quality control (QC) and calibration mixture consisted of specific metabolites, including organic acids, amino acids, mono- and disaccharides and substrates of the TCA cycle. The metabolites in the QC mix are listed in Supplementary Data 2. A calibration curve was prepared using QC-mix volumes of 2, 5, 10, 20, 40, 80 and 100 µl.

Internal standards (10 µl pinitol and 10 µl sorbitol) were added to each sample and to each QC on the day before GC-MS analysis. Afterwards, the samples were dried in a SpeedVac.

Derivatization

First, 20 µl of methoxyamine hydrochloride (MeOX) dissolved in pyridine (40 mg mL⁻¹) was added to each sample. To dissolve MeOX in pyridine properly, the solution was vigorously vortexed several times and the tube was placed in hot water. The samples were then vortexed until the pellets were completely dissolved, followed by agitation at 30 °C for 90 min at 750 rpm in a thermoshaker.

Vials containing 1 ml of N-methyl-N-(trimethylsilyl)trifluoroacetamide (MSTFA) were spiked with 30 µl of a retention index marker solution of C10–C40 alkanes in hexane. After addition of 80 µl of the prepared MSTFA, samples were incubated at 37 °C for 30 min at 750 rpm, followed by centrifugation at 14,000 × g for 2 min at room temperature (24 °C). Immediately afterwards, 70 µl of the supernatant was transferred to GC vials with micro-inserts and closed with crimp caps.

GC-MS analysis

Finally, samples were analysed using GC-MS (LECO Pegasus® 4D GCxGC-TOF-MS, Mönchengladbach, Germany) according to Weckwerth et al. 2004 and Leitner et al., 2017^82,83. Immediately after derivatization, 1 µl of sample was injected with a split ratio of 1:5. The split/splitless injector was kept at a constant temperature of 230 °C and equipped with a single-tapered liner with deactivated wool. The GC-MS system used an Agilent 6890 gas chromatograph (Agilent Technologies, Glostrup, Denmark) with helium as the carrier gas at a flow rate of 1 mL min⁻¹. Chromatographic separation was performed on an HP-5MS column (30 m × 0.25 mm × 0.25 µm, Agilent Technologies).

The GC oven was initially held isothermally at 70 °C for 1 min, followed by a heating ramp of 9 °C min⁻¹ to 330 °C, which was held for 7 min.

The transfer line temperature was 250 °C, and the ion source temperature was set to 200 °C. The MS detector was switched off during the first 260 s. Mass spectra were acquired at 20 spectra s⁻¹ over the range 40–600 m/z, using a detector voltage of 1550 V and electron impact ionization at 70 eV. The metabolite assessment required exchanging the liner every 70 injections, thus every 2 batches in a row.

The whole data acquisition was performed in 14 batches, each measured in the same chronological order. At the beginning and at the end of each batch, an alkane mix containing C10–C40 alkanes and different concentrations of the QC mix were measured to allow external calibration and to check instrument performance. To estimate the carry-over effect and to keep the instrument clean of the most abundant metabolites, blank samples containing only dried extraction reagents and derivatization solvents were injected every 5 or 7 samples. Each batch consisted of plasma samples from 20–30 subjects and was analyzed within 24–32 h. One pooled sample was measured in each batch to assess instrument stability. At the end of every batch, the same QC was measured again to monitor instrument performance over time. To minimize systematic bias induced by preparation order, samples were randomly distributed into the 14 batches. Additionally, the liner was exchanged after each batch (around 60–70 total injections). Each batch contained a representative cross-section of all samples and was comparable to the total experimental population. To assess batch effects, we performed principal component analysis (PCA)⁸⁴. No pronounced batch effect was observed on any principal component; biplots of the top 8 principal components are presented in Fig. S10 in Supplementary Data 1.
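Fig. S10 relies on visual inspection of PCA biplots. A complementary, quantitative check is to ask how much of each principal component's variance is explained by batch membership. The following minimal NumPy sketch illustrates that idea; it is not the authors' procedure, `pca_scores` and `batch_r2` are hypothetical helper names, and the data are synthetic (14 batches of 10 samples, 35 metabolites).

```python
import numpy as np

def pca_scores(X, n_components):
    """PCA scores of a samples x metabolites matrix via SVD of the column-centered data."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]

def batch_r2(scores, batches):
    """Share of each component's variance explained by batch labels (one-way ANOVA R^2)."""
    batches = np.asarray(batches)
    r2 = []
    for pc in scores.T:
        centered = pc - pc.mean()
        # between-batch sum of squares: batch size times squared deviation of the batch mean
        between = sum(
            np.sum(batches == b) * centered[batches == b].mean() ** 2
            for b in np.unique(batches)
        )
        r2.append(between / np.sum(centered ** 2))
    return np.array(r2)

# Synthetic example: 14 batches x 10 samples, 35 metabolites, no batch effect
rng = np.random.default_rng(0)
batches = np.repeat(np.arange(14), 10)
X = rng.normal(size=(140, 35))
print(batch_r2(pca_scores(X, 8), batches).round(2))  # small values for data without batch structure
```

Values close to 1 for a leading component would indicate that the component mainly separates batches rather than biology.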

Metabolite identification, peak integration and alignment

After GC-MS analysis, the raw data, consisting of ion peaks, were preprocessed using LECO ChromaTOF. The ion fragmentation spectra were matched against fragmentation spectra in the NIST library and in-house libraries and scored with a match probability; only metabolites with a similarity score of at least 700 were considered. Analytes were identified by comparing ion fragments with a reference library of chemical standards and by calibration based on calibration curves generated with the QC mixtures (see Supplementary Data 2; the raw data have been uploaded to MetaboLights). Alkanes measured at the beginning of each batch provided retention indices, which were assigned to all ion peaks. Peak integration and alignment were performed with the LECO ChromaTOF software. The data are presented in Supplementary Data 2.

Blood test and enzyme determination

Venous blood samples were collected from participants following an overnight fast and processed within two hours. Serum levels of aspartate aminotransferase (AST), alanine aminotransferase (ALT), gamma-glutamyltransferase (GGT), and creatine kinase (CK) were measured using standard automated enzymatic colorimetric methods on a clinical chemistry analyzer (e.g., Roche Cobas 8000 or equivalent). Quality control procedures were performed daily according to manufacturer protocols. All assays were conducted in the hospital’s certified clinical laboratory, and results were expressed in units per liter (U/L). These enzymes serve as routine biomarkers of hepatic and muscular function.

Data processing

The data processing comprised several steps. First, missing values in the metabolomics measurements were imputed using the K-nearest neighbors (KNN) method. Next, to reduce heteroscedasticity and adjust for the offset between high- and low-intensity features, each metabolite was log2-transformed, centered on its mean and scaled by its standard deviation:

$$\hat{x}_{ij}=\frac{\log_2(x_{ij})-\overline{\log_2(x_i)}}{s_i}$$

where $x_{ij}$ is the value of metabolite $i$ in sample $j$, $\overline{\log_2(x_i)}$ is the mean of the log2-transformed values of metabolite $i$, and $s_i$ is their standard deviation.
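The two preprocessing steps can be sketched in NumPy as follows. This is a simplified KNN imputation written for illustration: distances use only metabolites observed in both samples, and `k` is an assumption because the study does not state it. In practice a library implementation such as scikit-learn's `KNNImputer` would typically be used. `knn_impute` and `log2_autoscale` are hypothetical names; rows are samples and columns are metabolites, and the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

def knn_impute(X, k=3):
    """Fill NaNs with the mean of the k nearest samples that observed that metabolite.

    Distances are RMS differences over metabolites observed in both samples
    (a simplified KNN variant; the study's exact settings are not stated).
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    missing = np.isnan(X)
    for i in np.where(missing.any(axis=1))[0]:
        both = ~missing[i] & ~missing                 # co-observed metabolites per sample
        diff = np.where(both, X - X[i], 0.0)
        n_shared = both.sum(axis=1)
        with np.errstate(invalid="ignore", divide="ignore"):
            dist = np.sqrt((diff ** 2).sum(axis=1) / n_shared)
        dist[i] = np.inf                              # never use the sample itself
        dist[n_shared == 0] = np.inf                  # no overlap, no meaningful distance
        for j in np.where(missing[i])[0]:
            donors = np.where(~missing[:, j] & np.isfinite(dist))[0]
            nearest = donors[np.argsort(dist[donors])[:k]]
            out[i, j] = X[nearest, j].mean()
    return out

def log2_autoscale(X):
    """Log2-transform, then center each metabolite (column) and scale it to unit SD."""
    L = np.log2(X)
    return (L - L.mean(axis=0)) / L.std(axis=0, ddof=1)

X = np.array([[1.0, 2.0, 4.0],
              [2.0, np.nan, 8.0],
              [1.5, 3.0, 6.0],
              [8.0, 16.0, 32.0]])
X_imp = knn_impute(X, k=2)   # X_imp[1, 1] == 2.5, the mean of samples 2 and 0
Z = log2_autoscale(X_imp)
```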

Data clustering

To identify biomarkers and perform the inverse Jacobian analysis, the samples were first clustered into distinct groups as follows. Based on the information provided in Supplementary Data 2, the physical measurements could be divided into two types: “body-shape” data (e.g., gender and height) and “body-functional” data (e.g., walking distance and left standing time). To generate a body activity index that reflects body functionality while minimizing the influence of body-shape differences, Canonical Correlation Analysis (CCA)⁸⁵ was applied. The loadings of this body activity index are presented in Fig. 2a, where walking distance exhibits the strongest effect. The metabolomics-related body activity index generated through CCA was then used to cluster the samples with the k-means method.
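A minimal NumPy sketch of these two steps, under stated assumptions: CCA is computed between a metabolite matrix and a body-functional matrix, the body activity index is taken as the first canonical variate of the body-functional block, and k-means uses two clusters on that one-dimensional index. The body-shape adjustment described above is not modeled, and `first_cca_pair` and `kmeans_1d` are hypothetical helpers applied to synthetic data.

```python
import numpy as np

def inv_sqrt(S, reg=1e-6):
    """Inverse matrix square root of a symmetric covariance matrix (small ridge for stability)."""
    w, V = np.linalg.eigh(S + reg * np.eye(len(S)))
    return V @ np.diag(w ** -0.5) @ V.T

def first_cca_pair(X, Y, reg=1e-6):
    """First canonical variates (u, v) of X and Y and their canonical correlation."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = len(X)
    Sxx, Syy, Sxy = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1), Xc.T @ Yc / (n - 1)
    Wx, Wy = inv_sqrt(Sxx, reg), inv_sqrt(Syy, reg)
    U, s, Vt = np.linalg.svd(Wx @ Sxy @ Wy)       # SVD of the whitened cross-covariance
    return Xc @ (Wx @ U[:, 0]), Yc @ (Wy @ Vt[0]), s[0]

def kmeans_1d(x, n_iter=100):
    """Two-cluster k-means on a 1-D index, initialized at the minimum and maximum."""
    centers = np.array([x.min(), x.max()], dtype=float)
    for _ in range(n_iter):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([x[labels == c].mean() for c in range(2)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

# Synthetic data: a latent "activity" z drives one metabolite and one functional test
rng = np.random.default_rng(0)
z = rng.normal(size=200)
X = rng.normal(size=(200, 5)); X[:, 0] += 2 * z    # metabolites
Y = rng.normal(size=(200, 3)); Y[:, 0] += 2 * z    # body-functional measurements
u, v, r = first_cca_pair(X, Y)
labels, centers = kmeans_1d(v)                    # v stands in for the body activity index
```

The sign of canonical variates is arbitrary, so cluster labels should be interpreted through the cluster centers rather than by label number.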

Machine learning based classifiers

While the CCA-based clustering approach models the relationship between the body activity index and the metabolic index linearly, it may not fully capture the dynamics of metabolism, which is predominantly non-linear. To capture these non-linear effects, achieve higher accuracy and identify important variables, several machine learning classifiers were employed within an automated machine learning framework implemented with the H2O package in Python. The classifiers were built to predict body activity groups from the metabolomics data. The features were all metabolite measurements, giving a feature dimension of 35. The methods used are as follows:

1, Generalized Linear Models (GLM): GLM implements regularized linear models with stochastic gradient descent (SGD) learning. The model is updated iteratively using a decreasing strength schedule, estimating the loss gradient for each sample at a time. This method offers a baseline for the linear effects.

2, Random Forest Classifier (DRF): A random forest is an ensemble meta-estimator that fits multiple decision tree classifiers on different sub-samples of the dataset, utilizing averaging to improve predictive accuracy and mitigate overfitting.

3–4, Boosting Methods: Boosting is an ensemble meta-algorithm that reduces bias and variance in supervised learning. It integrates a family of machine learning algorithms that convert weak learners to strong ones⁸⁶. The main variation between many boosting algorithms is their method of weighting training data points and hypotheses. We employed two common boosting methods, LGBMClassifier and XGBClassifier. LGBMClassifier (GBM) is a distributed gradient-boosting framework based on decision tree algorithms, originally developed by Microsoft⁸⁷, while XGBClassifier (XGB) is an open-source library for regularizing gradient boosting⁸⁸.

5, Autoencoder + deep learning: Deep learning (DL), based on deep neural networks, is a powerful machine learning method extensively used in pattern recognition, image processing, and bioinformatics⁸⁹. Before supervised training, we pre-trained the network as an autoencoder on the entire unlabeled dataset; this replaces random weight initialization and can improve model performance.

In our approach, each of these machine learning methods was integrated into an automated framework that includes hyper-parameter optimization, i.e., the selection of the parameter values that govern the learning process so as to improve model performance⁹⁰. Figure S5 in Supplementary Data 1 gives an overview of the hyper-parameter search space for each machine learning method.

Repeated double cross validation

To optimize hyperparameters and more comprehensively evaluate model performance, we employed a repeated double cross-validation (rdCV) strategy^91,92. This advanced validation technique is particularly well-suited for small datasets, as it helps to optimize model complexity while yielding robust and realistic estimates of predictive performance.

The rdCV procedure comprises three main components:

1, Outer Loop (Test Set Evaluation): The dataset is randomly divided into four segments. In each iteration, one segment is held out as the test set, while the remaining three serve as the calibration set. This outer loop is used to evaluate the model’s predictive performance on unseen data, providing an unbiased estimate of generalization ability.

2, Inner Loop (Model Optimization): Within each calibration set from the outer loop, an additional cross-validation is performed to optimize hyperparameters. Model selection is based on the area under the ROC curve (AUC), ensuring a balance between model complexity and predictive accuracy, and reducing the risk of overfitting.

3, Repetition Loop: The entire double cross-validation process is repeated 30 times with different random data partitions. This repetition improves the stability and reliability of the performance metrics and offers insights into the variability of model complexity and selection across different data splits.
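The index bookkeeping of rdCV can be sketched as a generator that yields, for each repetition and outer test fold, the inner training/validation splits of the calibration set. This NumPy sketch assumes four inner folds (the number of inner folds is not stated above); model fitting and AUC-based selection are left out, and `rdcv_splits` is a hypothetical name.

```python
import numpy as np

def kfold(idx, k, rng):
    """Shuffle an index array and split it into k nearly equal folds."""
    return np.array_split(rng.permutation(idx), k)

def rdcv_splits(n_samples, n_repeats=30, n_outer=4, n_inner=4, seed=0):
    """Yield (repeat, test_idx, inner_splits) for repeated double cross-validation.

    inner_splits is a list of (train_idx, val_idx) pairs drawn from the calibration
    set only, so the outer test fold never influences hyperparameter selection.
    """
    rng = np.random.default_rng(seed)
    all_idx = np.arange(n_samples)
    for rep in range(n_repeats):
        for test in kfold(all_idx, n_outer, rng):
            calib = np.setdiff1d(all_idx, test)           # calibration set of this outer fold
            inner = [(np.setdiff1d(calib, val), val) for val in kfold(calib, n_inner, rng)]
            yield rep, test, inner

splits = list(rdcv_splits(263))   # 263 samples, 30 repeats x 4 outer folds
print(len(splits))                # 120
```

For each yielded tuple, hyperparameters would be chosen by the mean inner-validation AUC, the model refit on the full calibration set, and evaluated once on `test`; the 120 test-set scores then form the performance distribution.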

Model performance and permutation test

To assess the performance of each classifier, we employed a comprehensive set of evaluation metrics:

1, AUC (Area Under the ROC Curve): AUC measures a model’s ability to distinguish between positive and negative classes across all possible classification thresholds. An AUC of 1.0 indicates perfect classification, whereas an AUC of 0.5 reflects performance equivalent to random guessing.

2, AUCPR (Area Under the Precision-Recall Curve): AUCPR evaluates the trade-off between precision and recall across various thresholds. It is particularly informative when dealing with imbalanced datasets. AUCPR is computed as the weighted average of precision over recall, where the weights correspond to the probability distribution over thresholds.

3, Gini Index: The Gini index is derived from the AUC and quantifies the discriminatory power of a classifier: $\mathrm{Gini}=2\times\mathrm{AUC}-1$. A Gini index of 1 indicates perfect discrimination, while a value of 0 indicates no discriminative ability.

4, F1 Score: The F1 score is the harmonic mean of precision and recall, providing a single metric to evaluate classification effectiveness on the positive class. An F1 score of 1 signifies that both precision and recall are perfect; a lower F1 score indicates a trade-off between false positives and false negatives.

5, Logarithmic Loss (LogLoss): LogLoss measures the accuracy of predicted probabilities rather than hard classification labels. It penalizes both overconfident incorrect predictions and underconfident correct predictions. Lower LogLoss values indicate better probabilistic predictions.

6, Mean Squared Error (MSE): MSE quantifies the average squared difference between predicted probabilities and actual class labels. Although more common in regression, MSE is also informative for probabilistic classifiers.

7, Root Mean Squared Error (RMSE): RMSE is the square root of MSE and provides an interpretable scale for average prediction error, in the same units as the predicted values.

8, Mean Per-Class Error: This metric computes the average misclassification rate across all classes. For binary classification, it reflects the average of the error rates in each class, offering a balanced perspective on performance, especially for imbalanced datasets.
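Most of these metrics can be computed directly from predicted probabilities and binary labels. The sketch below is a plain NumPy illustration, not H2O's implementation: AUC uses the Mann–Whitney rank formulation, F1 and mean per-class error use a fixed 0.5 threshold (an assumption; H2O may select a different threshold), and AUCPR is omitted. The function names are hypothetical.

```python
import numpy as np

def auc_rank(y, p):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    pos, neg = p[y == 1], p[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def binary_metrics(y, p, threshold=0.5):
    """AUC, Gini, F1, LogLoss, MSE, RMSE and mean per-class error for binary labels."""
    y = np.asarray(y)
    p = np.asarray(p, dtype=float)
    pred = (p >= threshold).astype(int)
    tp = np.sum((pred == 1) & (y == 1)); fp = np.sum((pred == 1) & (y == 0))
    fn = np.sum((pred == 0) & (y == 1)); tn = np.sum((pred == 0) & (y == 0))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    auc = auc_rank(y, p)
    pc = np.clip(p, 1e-15, 1 - 1e-15)            # avoid log(0) in LogLoss
    mse = np.mean((p - y) ** 2)
    return {
        "AUC": auc,
        "Gini": 2 * auc - 1,
        "F1": 2 * precision * recall / (precision + recall) if precision + recall else 0.0,
        "LogLoss": -np.mean(y * np.log(pc) + (1 - y) * np.log(1 - pc)),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "MeanPerClassError": 0.5 * (fn / (tp + fn) + fp / (fp + tn)),
    }

m = binary_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
# m["AUC"] == 0.75, m["Gini"] == 0.5, m["MeanPerClassError"] == 0.25
```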

To further validate each classifier, we conducted permutation tests to assess the statistical significance of the model performance^92,93. Although high evaluation metrics (e.g., AUC) may suggest strong discriminative power, such values can occasionally be achieved by chance due to favorable random splits of training, validation, and test sets. Therefore, it is important to determine whether the observed performance truly reflects meaningful classification ability.

In a permutation test, the class labels of the samples are randomly shuffled, and the same classifier is rebuilt using the permuted labels. This process is repeated many times (in our case, 1500 iterations) to generate a null distribution of the performance metric under the hypothesis of no association between predictors and labels. If classifiers trained on permuted labels frequently match or exceed the performance of the original model, the observed classification performance is likely due to chance, and the original model is unreliable. For the AUC metric, the upper threshold $P$ for the p-value of each algorithm is calculated as $P=\frac{1+\#(\mathrm{AUC}_{\mathrm{perm}}\ge \mathrm{AUC}_{\mathrm{obs}})}{N}$, where $\mathrm{AUC}_{\mathrm{perm}}$ denotes the AUC values obtained from the permuted datasets, $\mathrm{AUC}_{\mathrm{obs}}$ is the observed AUC from the actual data, and $N=1500$ is the number of permutations. The same formulation is applied to the other performance metrics, with the inequality reversed for metrics where lower values are better. If none of the permuted models reaches the performance of the original (i.e., zero of the 1500 permutations match or exceed the observed value), the p-value is bounded by $p\le\frac{1}{1500}\approx 0.00067$.
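The p-value threshold can be computed as in the sketch below. For brevity, the demonstration shuffles labels against a fixed set of synthetic scores instead of rebuilding a classifier on each permutation as described above, so it only illustrates the counting rule; `permutation_p_value` is a hypothetical helper.

```python
import numpy as np

def permutation_p_value(observed, permuted, higher_is_better=True):
    """Upper threshold P = (1 + #permuted scores at least as good as observed) / N."""
    permuted = np.asarray(permuted, dtype=float)
    if higher_is_better:          # e.g. AUC, AUCPR, F1
        n_as_good = np.sum(permuted >= observed)
    else:                         # e.g. LogLoss, MSE, mean per-class error
        n_as_good = np.sum(permuted <= observed)
    return (1 + n_as_good) / len(permuted)

def auc_rank(y, s):
    """ROC AUC via the Mann-Whitney statistic (ties count one half)."""
    pos, neg = s[y == 1], s[y == 0]
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))

# Illustration only: synthetic, informative scores; labels are shuffled 1500 times
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 50)
scores = y + rng.normal(scale=0.5, size=100)
auc_obs = auc_rank(y, scores)
auc_perm = [auc_rank(rng.permutation(y), scores) for _ in range(1500)]
p = permutation_p_value(auc_obs, auc_perm)
```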

The machine learning classifiers and permutation test code is available in Supplementary Data 6 in Supplementary Data 1.

Feature importance

Feature importance was estimated using a model-based approach, in which a feature is considered important if it contributes substantially to the model’s performance. The ‘varimp’ function of the H2O Python package was used to rank the important metabolites for each classifier. Importance values were averaged over the 25 training-test splits, and the top 10 metabolites were selected for each machine learning method.
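Averaging importances across splits and keeping the top-ranked metabolites is simple bookkeeping. In this hypothetical sketch, each split contributes a dictionary mapping metabolite names to importance values (for example, taken from each trained model's variable-importance table); metabolites absent from a split are counted as zero there, which is an assumption. Names and values are made up.

```python
from collections import defaultdict

def top_features(importance_runs, k=10):
    """Average importances over splits and return the k highest (name, mean) pairs.

    importance_runs: list of dicts {metabolite: importance}, one per training-test split.
    A metabolite missing from a split contributes 0 for that split (assumption).
    """
    totals = defaultdict(float)
    for run in importance_runs:
        for name, value in run.items():
            totals[name] += value
    means = {name: total / len(importance_runs) for name, total in totals.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)[:k]

# Made-up importances from three hypothetical splits
runs = [
    {"met_A": 1.0, "met_B": 0.4, "met_C": 0.2},
    {"met_A": 0.9, "met_B": 0.5},
    {"met_A": 0.8, "met_C": 0.6, "met_D": 0.3},
]
print([name for name, _ in top_features(runs, k=2)])  # ['met_A', 'met_B']
```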

Predictive metabolic modeling using an inverse Jacobian approach

Statistical and machine learning methods face inherent limitations in elucidating biochemical network dynamics, identifying critical regulatory steps, and capturing condition-specific regulatory changes²⁴. To address this, inverse differential Jacobian algorithms have recently been developed as a powerful approach to infer dynamic regulation of metabolic networks from metabolomics data^21,30,31,32,33,40,41,42,94.

In previous studies, we introduced the COVRECON workflow and Matlab toolbox as the standard inverse Jacobian workflow^31,32. COVRECON combines the covariance matrix of metabolomics data with automated network modeling based on genome-scale metabolic reconstructions and biochemical reaction databases.

Consider a metabolic network with n metabolites $\{{X}_{i}{\}}_{i=1\ldots n}$, modeled by a system of ODEs:

$$\frac{d{\boldsymbol{M}}}{{dt}}={\boldsymbol{F}}\left({\boldsymbol{M}}\right),\,{\boldsymbol{M}}=\left\{{M}_{i}\right\}=\left\{\left|{X}_{i}\right|\right\}$$

(1)

where ${\boldsymbol{M}}$ are the concentrations of the n metabolites, and ${\boldsymbol{F}}=\left\{{f}_{i}\left({\boldsymbol{M}}\right)\right\}$ denotes their reaction rates (e.g., mass action or Michaelis–Menten kinetics).

The steady-state Jacobian matrix has elements $\boldsymbol{J}_{ij}=\left.\frac{\partial f_i}{\partial M_j}\right|_{\mathrm{steady}}$:

$$\boldsymbol{J}=\left.\frac{\partial\boldsymbol{F}}{\partial\boldsymbol{M}}\right|_{\mathrm{steady}}=\left[\begin{array}{cccc}\frac{\partial f_1}{\partial M_1} & \frac{\partial f_1}{\partial M_2} & \cdots & \frac{\partial f_1}{\partial M_n}\\ \frac{\partial f_2}{\partial M_1} & \frac{\partial f_2}{\partial M_2} & \cdots & \frac{\partial f_2}{\partial M_n}\\ \vdots & \vdots & \ddots & \vdots\\ \frac{\partial f_n}{\partial M_1} & \frac{\partial f_n}{\partial M_2} & \cdots & \frac{\partial f_n}{\partial M_n}\end{array}\right]_{\mathrm{steady}}$$

(2)

Because $\boldsymbol{J}$ describes the linearized (first-order) dynamics around the steady state, it encodes the dynamic regulatory relationships among metabolites. Steuer et al.⁹⁴ derived the following Lyapunov equation linking the covariance matrix $C$ of metabolite concentrations to the Jacobian $\boldsymbol{J}$:

$$\boldsymbol{J}C + C\boldsymbol{J}^{T} = -2\boldsymbol{D} \tag{3}$$

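where the fluctuation matrix $\boldsymbol{D}$ represents the covariance of the noise sources acting on the system. The relation describes small stochastic fluctuations around a stable steady state.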
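The forward relation behind Eqs. (2) and (3) can be checked numerically. The sketch below repeats a hypothetical two-metabolite chain (-> X1 -> X2 ->) so that it is self-contained, approximates the Jacobian at the analytic steady state by central finite differences, and then solves the Lyapunov equation for C by vectorization, (I ⊗ J + J ⊗ I) vec(C) = -2 vec(D), with column-stacked vec(·). The choice D = I, the rate constants, and the small Gaussian-elimination helper are assumptions for illustration; COVRECON itself is a Matlab toolbox and is not reproduced here.

```python
from typing import Callable, List

Matrix = List[List[float]]

# Hypothetical toy chain  -> X1 -> X2 ->  with mass-action kinetics.
K0, K1, K2 = 1.0, 2.0, 0.5


def rates(m: List[float]) -> List[float]:
    """F(M) for the toy chain."""
    m1, m2 = m
    return [K0 - K1 * m1, K1 * m1 - K2 * m2]


def numerical_jacobian(f: Callable[[List[float]], List[float]],
                       m: List[float], h: float = 1e-6) -> Matrix:
    """Central-difference estimate of J_ij = df_i/dM_j at the point m (Eq. 2)."""
    n = len(m)
    jac = [[0.0] * n for _ in range(n)]
    for j in range(n):
        up, dn = list(m), list(m)
        up[j] += h
        dn[j] -= h
        fu, fd = f(up), f(dn)
        for i in range(n):
            jac[i][j] = (fu[i] - fd[i]) / (2.0 * h)
    return jac


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def solve_lyapunov(jac: Matrix, d: Matrix) -> Matrix:
    """Solve J C + C J^T = -2 D (Eq. 3) for C via column-stacked vec(C)."""
    n = len(jac)
    size = n * n
    a = [[0.0] * size for _ in range(size)]
    b = [0.0] * size
    for i in range(n):
        for j in range(n):
            row = j * n + i                      # position of C_ij in vec(C)
            b[row] = -2.0 * d[i][j]
            for k in range(n):
                a[row][j * n + k] += jac[i][k]   # (J C)_ij   = sum_k J_ik C_kj
                a[row][k * n + i] += jac[j][k]   # (C J^T)_ij = sum_k C_ik J_jk
    x = solve(a, b)
    return [[x[j * n + i] for j in range(n)] for i in range(n)]


steady = [K0 / K1, K0 / K2]                # analytic steady state of the toy chain
J = numerical_jacobian(rates, steady)      # approximately [[-2, 0], [2, -0.5]]
D = [[1.0, 0.0], [0.0, 1.0]]               # assumed fluctuation matrix
C = solve_lyapunov(J, D)
print(C)                                   # approximately [[0.5, 0.4], [0.4, 3.6]]
```

Because both eigenvalues of this J are negative, the Kronecker-sum matrix is nonsingular and C is unique; the inverse problem discussed next runs in the opposite direction, from C to J.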

Differences between two conditions are quantified by the differential Jacobian $D\boldsymbol{J}$, defined elementwise from the Jacobians $\boldsymbol{J}_h$ and $\boldsymbol{J}_d$ of the two groups:

$$D\boldsymbol{J}_{ij} = \begin{cases} \max\left(\left|\dfrac{(\boldsymbol{J}_d)_{ij}}{(\boldsymbol{J}_h)_{ij}}\right|, \left|\dfrac{(\boldsymbol{J}_h)_{ij}}{(\boldsymbol{J}_d)_{ij}}\right|\right), & \text{if } (\boldsymbol{J}_h)_{ij} \neq 0,\\ 1, & \text{if } (\boldsymbol{J}_h)_{ij} = 0. \end{cases} \tag{4}$$

$D\boldsymbol{J}$ thus highlights condition-specific regulatory changes. Solving the inverse problem, i.e., inferring $D\boldsymbol{J}$ from metabolomics data, requires both the structural information of $\boldsymbol{J}$ and an optimization strategy.
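Eq. (4) translates directly into an elementwise computation. The sketch below, in plain Python with hypothetical 2 × 2 Jacobians, follows the two cases of Eq. (4). For an entry with (J_h)_ij ≠ 0 but (J_d)_ij = 0, which Eq. (4) leaves undefined, it returns infinity; that convention is an assumption of this sketch.

```python
import math
from typing import List

Matrix = List[List[float]]


def differential_jacobian(j_h: Matrix, j_d: Matrix) -> Matrix:
    """Elementwise fold change between two Jacobians, following Eq. (4).

    Entries with (J_h)_ij == 0 are set to 1. If (J_h)_ij != 0 but (J_d)_ij == 0,
    |J_h / J_d| is unbounded; math.inf is returned (assumed convention).
    """
    n = len(j_h)
    dj = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h, d = j_h[i][j], j_d[i][j]
            if h == 0:
                continue                      # second case of Eq. (4)
            if d == 0:
                dj[i][j] = math.inf           # not covered by Eq. (4)
            else:
                dj[i][j] = max(abs(d / h), abs(h / d))
    return dj


J_h = [[-1.0, 2.0], [0.0, -1.0]]   # hypothetical Jacobian of group h
J_d = [[-2.0, 1.0], [0.0, -1.0]]   # hypothetical Jacobian of group d
print(differential_jacobian(J_h, J_d))  # [[2.0, 2.0], [1.0, 1.0]]
```

A value of 1 means the interaction strength is unchanged; larger values indicate a larger fold change in either direction, since the maximum of the two ratios is symmetric with respect to which group increases.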

COVRECON addresses this by integrating automated metabolic network reconstruction with a regression-loss-based inverse differential Jacobian algorithm, implemented in the COVRECON workflow and its Matlab toolbox³². The method first assembles a metabolic interaction network that encodes the Jacobian structure. It then reformulates the Lyapunov equation as a regression problem; because the regression loss is much less sensitive to perturbations than the regression solution, a regression loss matrix $R^{*}$ is constructed to approximate the relative importance of the elements of $D\boldsymbol{J}$. Larger $R^{*}$ values indicate stronger regulatory differences between the two conditions. For robustness, $R^{*}$ is computed across multiple random realizations of the fluctuation matrix $\boldsymbol{D}$, and the final score is normalized to [0, 1]³². The results are visualized in Matlab figures, in which interaction pathways can be examined interactively.
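How a reaction network fixes the Jacobian structure can be illustrated under a simple mass-action assumption that ignores allosteric regulation: a reaction's rate depends on its substrates (and on its products if it is reversible), and that rate enters the balance of every metabolite the reaction consumes or produces, so J_ij can be nonzero only for such pairs. The reaction tuples and the function name below are hypothetical; COVRECON derives its networks from genome-scale reconstructions and reaction databases, which this sketch does not reproduce. The example uses the transamination catalyzed by aspartate aminotransferase (AST).

```python
from typing import List, Set, Tuple

# A reaction is (substrates, products, reversible); names are illustrative.
Reaction = Tuple[List[str], List[str], bool]


def jacobian_structure(reactions: List[Reaction]) -> Set[Tuple[str, str]]:
    """Pairs (i, j) for which J_ij = df_i/dM_j can be nonzero under mass action."""
    pattern: Set[Tuple[str, str]] = set()
    for substrates, products, reversible in reactions:
        # Metabolites whose concentrations appear in the rate law.
        rate_depends_on = substrates + (products if reversible else [])
        # Metabolites whose balance equation contains this rate.
        affected = substrates + products
        for i in affected:
            for j in rate_depends_on:
                pattern.add((i, j))
    return pattern


# L-aspartate + 2-oxoglutarate <-> oxaloacetate + L-glutamate (AST), reversible.
ast = (["aspartate", "2-oxoglutarate"], ["oxaloacetate", "glutamate"], True)
# A hypothetical irreversible reaction A + B -> C, for contrast.
toy = (["A", "B"], ["C"], False)

print(len(jacobian_structure([ast])))   # 16: every pair of the four metabolites
print(sorted(jacobian_structure([toy])))
# [('A', 'A'), ('A', 'B'), ('B', 'A'), ('B', 'B'), ('C', 'A'), ('C', 'B')]
```

For the irreversible reaction, the product C does not feed back on any rate, so its column in the structure stays empty.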

In this approach, the Lyapunov equation (Eq. (3)) of each group is reformulated as a set of linear equations:

$$\begin{aligned} A_h\,\mathrm{q}_h &= \mathrm{b}_h\\ A_d\,\mathrm{q}_d &= \mathrm{b}_d \end{aligned} \tag{5}$$

where $A$, $\mathrm{q}$, and $\mathrm{b}$ are generated from the corresponding $C$, $\boldsymbol{J}$, and $\boldsymbol{D}$, respectively: $A$ is assembled from the covariance matrix, $\mathrm{q}$ collects the unknown Jacobian entries, and $\mathrm{b}$ is obtained from the fluctuation matrix. Li et al.³² verified that, under numerical variations in $\mathrm{b}$, the regression solution $\mathrm{q}$ varies much more than the regression loss $r$. Based on this property, we construct a “regression loss matrix” $\boldsymbol{R}^{*}$ that captures the relative importance of Jacobian elements rather than estimating their absolute values. Specifically, for each element $J_{ij}$,

$$q_s^{*} = \left(A_c^{T}A_c\right)^{-1}A_c^{T}\,b_s, \qquad R_{ij}^{*} = \min_{b_s}\left\lVert b_s - A_c\,q_s^{*}\right\rVert \tag{6}$$

where $A_c$ is obtained by combining $A_h$ and $A_d$ from Eq. (5) under the additional constraint that only the single element $J_{ij}$ takes the same value in both Jacobians, $b_c = [b_h; b_d]$, and $b_s$ denotes a realization of $b_c$ for one sampled fluctuation matrix. If this equality assumption does not hold, i.e., $J_{ij}$ differs between the conditions, the constrained regression incurs an additional loss, reflected in a larger $R_{ij}^{*}$³².
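To make Eqs. (5) and (6) concrete, the sketch below builds, for each group, the linear system A q = b from J C + C J^T = -2D, where q collects the structurally nonzero Jacobian entries. Because both sides of the Lyapunov equation are symmetric, only the n(n+1)/2 equations with i ≤ j are kept. The two systems are stacked into A_c with a single column shared for the element J_ij, q* is obtained from the normal equations, and the loss is the norm of the residual. It assumes a known Jacobian structure and one fixed D (no sampling), uses plain Python instead of the COVRECON Matlab toolbox, and the function names, toy Jacobians, and covariances are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Element = Tuple[int, int]
Matrix = List[List[float]]


def solve(a: Matrix, b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a square system a x = b."""
    n = len(b)
    aug = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def lyapunov_system(c: Matrix, d: Matrix,
                    structure: Sequence[Element]) -> Tuple[Matrix, List[float]]:
    """Rows of A q = b (Eq. 5) from J C + C J^T = -2 D, one row per pair i <= j."""
    n = len(c)
    rows: Matrix = []
    rhs: List[float] = []
    for i in range(n):
        for j in range(i, n):
            # Coefficient of J_kq in (J C + C J^T)_ij is [k == i] C_qj + [k == j] C_iq.
            rows.append([(c[q][j] if k == i else 0.0) + (c[i][q] if k == j else 0.0)
                         for (k, q) in structure])
            rhs.append(-2.0 * d[i][j])
    return rows, rhs


def regression_loss(c_h: Matrix, c_d: Matrix, d_h: Matrix, d_d: Matrix,
                    structure: Sequence[Element], element: Element) -> float:
    """Least-squares residual when only `element` is shared by both Jacobians (Eq. 6)."""
    a_h, b_h = lyapunov_system(c_h, d_h, structure)
    a_d, b_d = lyapunov_system(c_d, d_d, structure)
    s = list(structure).index(element)
    others = [p for p in range(len(structure)) if p != s]
    pad = [0.0] * len(others)
    # Columns of A_c: shared J_ij, then group-h-only entries, then group-d-only entries.
    a_c = ([[r[s]] + [r[p] for p in others] + pad for r in a_h]
           + [[r[s]] + pad + [r[p] for p in others] for r in a_d])
    b_c = b_h + b_d
    # Normal equations: q* = (A_c^T A_c)^{-1} A_c^T b_c.
    m = len(a_c[0])
    ata = [[sum(row[u] * row[v] for row in a_c) for v in range(m)] for u in range(m)]
    atb = [sum(row[u] * bv for row, bv in zip(a_c, b_c)) for u in range(m)]
    q_star = solve(ata, atb)
    resid = [bv - sum(x * y for x, y in zip(row, q_star)) for row, bv in zip(a_c, b_c)]
    return math.sqrt(sum(e * e for e in resid))


# Hypothetical example: J_h = [[-1, 0.5], [0, -2]] and J_d = [[-1, 1.5], [0, -2]]
# differ only in J_01; D = I in both groups. C_H and C_D solve J C + C J^T = -2 D
# exactly for these Jacobians.
STRUCTURE = [(0, 0), (0, 1), (1, 1)]   # assumed nonzero pattern of J
C_H = [[25 / 24, 1 / 12], [1 / 12, 0.5]]
C_D = [[1.375, 0.25], [0.25, 0.5]]
D_I = [[1.0, 0.0], [0.0, 1.0]]

losses = {e: regression_loss(C_H, C_D, D_I, D_I, STRUCTURE, e) for e in STRUCTURE}
print(losses)  # ~0 for (0, 0) and (1, 1); clearly > 0 for the changed entry (0, 1)
```

For the unchanged entries, forcing a common value is consistent with both groups and the residual is numerically zero; for J_01 no single value fits both groups, so the constrained regression leaves a positive loss.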

Because only the structure (not the values) of $\boldsymbol{D}_h$ and $\boldsymbol{D}_d$ is known, multiple realizations of $\boldsymbol{D}$ are sampled according to its nonzero structure. In practice, 1000 samples are used, and the final $R_{ij}^{*}$ is taken as the minimum loss across all samples. Larger $R_{ij}^{*}$ values indicate stronger regulatory differences between conditions. For interpretability, $R^{*}$ is normalized to [0, 1]³².
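The sampling and aggregation steps can be sketched separately from the regression itself. The sketch below assumes a diagonal nonzero structure for D (independent noise sources) and draws its diagonal entries uniformly from [0.1, 1.0]; it then takes, for each element, the minimum loss over all samples and rescales the scores to [0, 1] by dividing by the largest one. The sampling range and the divide-by-maximum normalization are assumptions, and the loss values in the demonstration are placeholders, not results of this study.

```python
import random
from typing import Dict, List, Sequence, Tuple

Element = Tuple[int, int]
Matrix = List[List[float]]


def sample_diagonal_d(n: int, rng: random.Random) -> Matrix:
    """One realization of D under an assumed diagonal nonzero structure."""
    return [[rng.uniform(0.1, 1.0) if i == j else 0.0 for j in range(n)]
            for i in range(n)]


def min_over_samples(per_sample: Sequence[Dict[Element, float]]) -> Dict[Element, float]:
    """R*_ij as the minimum regression loss across all sampled realizations of D."""
    return {e: min(sample[e] for sample in per_sample) for e in per_sample[0]}


def normalize_scores(scores: Dict[Element, float]) -> Dict[Element, float]:
    """Rescale non-negative scores to [0, 1] by dividing by the largest (assumed scheme)."""
    top = max(scores.values())
    return {e: (v / top if top > 0 else 0.0) for e, v in scores.items()}


rng = random.Random(0)
d_samples = [sample_diagonal_d(3, rng) for _ in range(1000)]   # 1000 realizations of D
print(d_samples[0])   # off-diagonal entries stay zero, following the assumed structure

# Placeholder losses (not results): in a full pipeline, each dict would hold the
# Eq. (6) residual of every element computed with one sampled D.
per_sample = [
    {(0, 1): 0.40, (1, 2): 0.05, (2, 2): 0.10},
    {(0, 1): 0.30, (1, 2): 0.08, (2, 2): 0.02},
    {(0, 1): 0.35, (1, 2): 0.06, (2, 2): 0.04},
]
scores = normalize_scores(min_over_samples(per_sample))
print(scores)   # (0, 1) -> 1.0; the other elements are scaled relative to it
```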

Unlike correlation-based methods, which often include indirect or spurious associations due to the absence of biochemical priors, COVRECON incorporates curated reaction databases to reconstruct direct interactions. This enables the identification of potentially causal mechanisms underlying metabolic regulation. Through this framework, we aim to reveal key components and regulatory interactions embedded in the differential Jacobian.

Integrating classifier biomarkers with group differential Jacobian analysis

Because the samples were clustered into two groups in the data clustering step, the inverse Jacobian analysis can now be applied to these two groups. As discussed in Supplementary Data 1, and in line with the general approach of most kinetic models, we assume that the dynamics within each group are described by a single group model, so that the steady-state dynamics can be represented by a group Jacobian. Consequently, the inverse Jacobian algorithm can provide information on how the regulated dynamics differ between the two groups.

The results of the inverse Jacobian analysis depend closely on the Jacobian structure obtained from the automatically generated super-pathway metabolic interaction networks. We therefore incorporate the importance of the classifier variables into the inverse Jacobian analysis: we retain the key biomarkers and add a controlled mix of randomly chosen additional metabolites. The resulting augmented networks, each comprising 10–20 metabolites, are then analyzed with the COVRECON workflow. Large values in the COVRECON results indicate differences in dynamics between the two groups, and by inspecting the detailed information behind these values we can identify the important reactions or enzymes involved in the active aging context³².
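The network augmentation step can be sketched under the assumption that a "controlled mix" means keeping every classifier biomarker and filling up to a randomly drawn network size between 10 and 20 with metabolites sampled without replacement from the remaining measured ones. The function name, the uniform size draw, and the placeholder metabolite names are hypothetical; only aspartate is taken from this study.

```python
import random
from typing import List, Sequence


def augment_network(biomarkers: Sequence[str], candidates: Sequence[str],
                    min_size: int = 10, max_size: int = 20,
                    seed: int = 0) -> List[str]:
    """Keep every biomarker and pad with randomly chosen extra metabolites.

    The target network size is drawn uniformly from [min_size, max_size],
    never below the number of biomarkers and capped by the available pool.
    """
    rng = random.Random(seed)
    keep = list(dict.fromkeys(biomarkers))          # de-duplicate, keep order
    keep_set = set(keep)
    pool = [m for m in dict.fromkeys(candidates) if m not in keep_set]
    lower = max(min_size, len(keep))
    upper = max(max_size, lower)
    target = rng.randint(lower, upper)
    n_extra = min(target - len(keep), len(pool))
    return keep + rng.sample(pool, n_extra)          # sampling without replacement


biomarkers = ["aspartate"]                                    # dominant marker in this study
candidates = [f"metabolite_{k:02d}" for k in range(1, 41)]    # hypothetical placeholder names
networks = [augment_network(biomarkers, candidates, seed=s) for s in range(5)]
for net in networks:
    print(len(net), net[:3])
```

Varying the seed yields several augmented networks around the same biomarkers, each of which would then be passed to the inverse Jacobian analysis.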

Data availability

The data underlying this article are available in the online Supplementary Data. The raw metabolomics data have been deposited in MetaboLights under ID REQ20250608211079. For further requests, please contact the corresponding author, Wolfram Weckwerth.

Code availability

The Matlab code for COVRECON is available at https://bitbucket.org/mosys-univie/covrecon/. The code for the machine learning classifiers and permutation tests is available in Supplementary Data 6.

References

1. Kohl, H. W. et al. The pandemic of physical inactivity: global action for public health. Lancet 380, 294–305 (2012).
2. Havighurst, R. J. Successful aging. Process. Aging: Soc. Psychol. Perspect. 1, 299–320 (1963).
3. WHO. Active ageing: a policy framework (World Health Organization, 2002).
4. Boudiny, K. & Mortelmans, D. A critical perspective: towards a broader understanding of 'active ageing'. E-journal. Appl. Psychol. 7, 8–14 (2011).
5. Offerman, J. et al. Attitudes related to technology for active and healthy aging in a national multigenerational survey. Nat. Aging 3, 617–625 (2023).
6. Wongsala, M., Anbäcken, E.-M. & Rosendahl, S. Active ageing–perspectives on health, participation, and security among older adults in northeastern Thailand–a qualitative study. BMC Geriatrics 21, 1–10 (2021).
7. Malkowski, O. S., Kanabar, R. & Western, M. J. Socio-economic status and trajectories of a novel multidimensional metric of Active and Healthy Ageing: the English Longitudinal Study of Ageing. Sci. Rep. 13, 6107 (2023).
8. Fernández-Ballesteros, R. et al. Active aging: a global goal (Hindawi, 2013).
9. Caprara, M. et al. Active aging promotion: results from the Vital Aging Program. Curr. Gerontol. Geriatrics Res. 2013, https://doi.org/10.1155/2013/817813 (2013).
10. Taylor, A. W. Physiology of Exercise and Healthy Aging (Human Kinetics, 2022).
11. Weckwerth, W. Metabolomics: an integral technique in systems biology. Bioanalysis 2, 829–836 (2010).
12. Patti, G. J., Yanes, O. & Siuzdak, G. Metabolomics: the apogee of the omics trilogy. Nat. Rev. Mol. Cell Biol. 13, 263–269 (2012).
13. Balashova, E. E. et al. Metabolome profiling in aging studies. Biology 11, 1570 (2022).
14. Gonzalez-Covarrubias, V., Martínez-Martínez, E. & del Bosque-Plata, L. The potential of metabolomics in biomedical applications. Metabolites 12, 194 (2022).
15. Bruzzone, C. et al. Metabolomics as a powerful tool for diagnostic, pronostic and drug intervention analysis in COVID-19. Front. Mol. Biosci. 10, 1111482 (2023).
16. Su, Y. et al. Multi-omics resolves a sharp disease-state shift between mild and moderate COVID-19. Cell 183, 1479–1495.e20 (2020).
17. Sindelar, M. et al. Longitudinal metabolomics of human plasma reveals prognostic markers of COVID-19 disease severity. Cell Rep. Med. 2, https://doi.org/10.1016/j.xcrm.2021.100369 (2021).
18. Meoni, G. et al. Metabolomic/lipidomic profiling of COVID-19 and individual response to tocilizumab. PLoS Pathog. 17, e1009243 (2021).
19. Ghini, V. et al. Serum NMR profiling reveals differential alterations in the lipoproteome induced by Pfizer-BioNTech vaccine in COVID-19 recovered subjects and naïve subjects. Front. Mol. Biosci. 9, 839809 (2022).
20. Panyard, D. J., Yu, B. & Snyder, M. P. The metabolomics of human aging: advances, challenges, and opportunities. Sci. Adv. 8, eadd6155 (2022).
21. Weckwerth, W. Toward a unification of system-theoretical principles in biology and ecology–the stochastic Lyapunov matrix equation and its inverse application. Front. Appl. Math. Stat. 5, 29 (2019).
22. Weckwerth, W. Green systems biology–from single genomes, proteomes and metabolomes to ecosystems research and biotechnology. J. Proteom. 75, 284–305 (2011).
23. Weckwerth, W. Unpredictability of metabolism–the key role of metabolomics science in combination with next-generation genome sequencing. Anal. Bioanal. Chem. 400, 1967–1978 (2011).
24. Sidak, D. et al. Interpretable machine learning methods for predictions in systems biology from omics data. Front. Mol. Biosci. 9, 926623 (2022).
25. Liebal, U. W. et al. Machine learning applications for mass spectrometry-based metabolomics. Metabolites 10, 243 (2020).
26. Pomyen, Y. et al. Deep metabolome: applications of deep learning in metabolomics. Comput. Struct. Biotechnol. J. 18, 2818–2825 (2020).
27. Alakwaa, F. M., Chaudhary, K. & Garmire, L. X. Deep learning accurately predicts estrogen receptor status in breast cancer metabolomics data. J. Proteome Res. 17, 337–347 (2018).
28. Weckwerth, W. Metabolomics in systems biology. Annu. Rev. Plant Biol. 54, 669–689 (2003).
29. Wienkoop, S. et al. Integration of metabolomic and proteomic phenotypes: analysis of data covariance dissects starch and RFO metabolism from low and high temperature compensation response in Arabidopsis thaliana. Mol. Cell. Proteom. 7, 1725–1736 (2008).
30. Nägele, T. et al. Solving the differential biochemical Jacobian from metabolomics covariance data. PLoS ONE 9, e92299 (2014).
31. Wilson, J. L. et al. Inverse data-driven modeling and multiomics analysis reveals phgdh as a metabolic checkpoint of macrophage polarization and proliferation. Cell Rep. 30, 1542–1552.e7 (2020).
32. Li, J., Waldherr, S. & Weckwerth, W. COVRECON: automated integration of genome- and metabolome-scale network reconstruction and data-driven inverse modeling of metabolic interaction networks. Bioinformatics 39, btad397 (2023).
33. Li, J., Weckwerth, W. & Waldherr, S. Network structure and fluctuation data improve inference of metabolic interaction strengths with the inverse Jacobian. npj Syst. Biol. Appl. 137 (2024).
34. King, Z. A. et al. BiGG Models: a platform for integrating, standardizing and sharing genome-scale models. Nucleic Acids Res. 44, D515–D522 (2016).
35. Steuer, R. et al. Structural kinetic modeling of metabolic networks. Proc. Natl. Acad. Sci. 103, 11868–11873 (2006).
36. Jamshidi, N. & Palsson, B. Ø. Mass action stoichiometric simulation models: incorporating kinetics and regulation into stoichiometric models. Biophys. J. 98, 175–185 (2010).
37. Haiman, Z. B. et al. MASSpy: building, simulating, and visualizing dynamic biological models in Python using mass action kinetics. PLoS Comput. Biol. 17, e1008208 (2021).
38. Akbari, A., Haiman, Z. B. & Palsson, B. O. A data-driven approach for timescale decomposition of biochemical reaction networks. mSystems 9, e01001-23 (2024).
39. Nägele, T. Metabolic regulation of subcellular sucrose cleavage inferred from quantitative analysis of metabolic functions. Quant. Plant Biol. 3, e10 (2022).
40. Sun, X. & Weckwerth, W. COVAIN: a toolbox for uni- and multivariate statistics, time-series and correlation network analysis and inverse estimation of the differential Jacobian from metabolomics covariance data. Metabolomics 8, 81–93 (2012).
41. Kügler, P. & Yang, W. Identification of alterations in the Jacobian of biochemical reaction networks from steady state covariance data at two conditions. J. Math. Biol. 68, 1757–1783 (2014).
42. Sun, X., Länger, B. & Weckwerth, W. Challenges of inversely estimating Jacobian from metabolomics data. Front. Bioeng. Biotechnol. 3, 188 (2015).
43. Weiszmann, J. et al. Metabolome plasticity in 241 Arabidopsis thaliana accessions reveals evolutionary cold adaptation processes. Plant Physiol. kiad298 (2023).
44. Chaturvedi, P. et al. Natural variation in the chickpea metabolome under drought stress. Plant Biotechnol. J. https://doi.org/10.1111/pbi.14447 (2024).
45. Oesen, S. et al. Effects of elastic band resistance training and nutritional supplementation on physical performance of institutionalised elderly–a randomized controlled trial. Exp. Gerontol. 72, 99–108 (2015).
46. Lv, J. et al. Plasma metabolomics reveals the shared and distinct metabolic disturbances associated with cardiovascular events in coronary artery disease. Nat. Commun. 15, 5729 (2024).
47. Tessier, A.-J. et al. Plasma metabolites of a healthy lifestyle in relation to mortality and longevity: four prospective US cohort studies. Med 5, 224–238.e5 (2024).
48. Wang, F. et al. Plasma metabolomic profiles associated with mortality and longevity in a prospective analysis of 13,512 individuals. Nat. Commun. 14, 5744 (2023).
49. Werner, C. M. et al. Differential effects of endurance, interval, and resistance training on telomerase activity and telomere length in a randomized, controlled study. Eur. Heart J. 40, 34–46 (2019).
50. Cao Dinh, H. et al. Strength endurance training but not intensive strength training reduces senescence-prone T cells in peripheral blood in community-dwelling elderly women. J. Gerontology: Ser. A 74, 1870–1878 (2019).
51. Le Couteur, D. G. et al. The association of alanine transaminase with aging, frailty, and mortality. J. Gerontol. Ser. A: Biomed. Sci. Med. Sci. 65, 712–717 (2010).
52. Goh, G. B.-B. et al. Age impacts ability of aspartate–alanine aminotransferase ratio to predict advanced fibrosis in nonalcoholic fatty liver disease. Dig. Dis. Sci. 60, 1825–1831 (2015).
53. Nakajima, K. et al. High aspartate aminotransferase/alanine aminotransferase ratio may be associated with all-cause mortality in the elderly: a retrospective cohort study using artificial intelligence and conventional analysis. In Healthcare (MDPI, 2022).
54. Yamamoto, T. et al. The first report of Japanese patients with asparagine synthetase deficiency. Brain Dev. 39, 236–242 (2017).
55. Oh, R. C. et al. Mildly elevated liver transaminase levels: causes and evaluation. Am. Fam. Physician 96, 709–715 (2017).
56. Diaz-Garzon, J. et al. Long-term within- and between-subject biological variation of 29 routine laboratory measurands in athletes. Clin. Chem. Lab. Med. (CCLM) 60, 618–628 (2022).
57. Diaz-Garzon, J. et al. Long-term within- and between-subject biological variation data of hematological parameters in recreational endurance athletes. Clin. Chem. 69, 500–509 (2023).
58. Pavletic, A. J. & Wright, M. E. Exercise-induced elevation of liver enzymes in a healthy female research volunteer. Psychosomatics 56, 604 (2015).
59. Pettersson, J. et al. Muscular exercise can cause highly pathological liver function tests in healthy men. Br. J. Clin. Pharmacol. 65, 253–259 (2008).
60. Tiller, N. B. & Stringer, W. W. Exercise-induced increases in “liver function tests” in a healthy adult male: is there a knowledge gap in primary care? J. Fam. Med. Prim. Care 12, 177 (2023).
61. Nunez, D. J. et al. Factors influencing longitudinal changes of circulating liver enzyme concentrations in subjects randomized to placebo in four clinical trials. Am. J. Physiol. Gastrointest. Liver Physiol. 316, G372–G386 (2019).
62. Ruiz, J. R. et al. Physical activity, sedentary time, and liver enzymes in adolescents: the HELENA study. Pediatr. Res. 75, 798–802 (2014).
63. Andy, S. Y. & Keeffe, E. B. Elevated AST or ALT to nonalcoholic fatty liver disease: accurate predictor of disease prevalence? 955–956 (LWW, 2003).
64. Morville, T. et al. Plasma metabolome profiling of resistance exercise and endurance exercise in humans. Cell Rep. 33 (2020).
65. Childs, B. G. et al. Cellular senescence in aging and age-related disease: from mechanisms to therapy. Nat. Med. 21, 1424–1435 (2015).
66. Borst, P. The malate–aspartate shuttle (Borst cycle): how it started and developed into a major metabolic pathway. IUBMB Life 72, 2241–2259 (2020).
67. Marquezi, M. L. et al. Effect of aspartate and asparagine supplementation on fatigue determinants in intense exercise. Int. J. Sport Nutr. Exerc. Metab. 13, 65–75 (2003).
68. Trudeau, F. Aspartate as an ergogenic supplement. Sports Med. 38, 9–16 (2008).
69. Fibriansah, G. et al. Structural basis for the catalytic mechanism of aspartate ammonia lyase. Biochemistry 50, 6053–6062 (2011).
70. Lala, V., Zubair, M. & Minter, D. A. Liver function tests. In StatPearls [Internet] (StatPearls Publishing, 2022).
71. Piubelli, L. et al. The role of D-amino acids in Alzheimer’s disease. J. Alzheimer’s Dis. 80, 475–492 (2021).
72. Lin, C.-H. & Lane, H.-Y. The role of N-methyl-D-aspartate receptor neurotransmission and precision medicine in behavioral and psychological symptoms of dementia. Front. Pharmacol. 10, 540 (2019).
73. Lu, Y. et al. Low liver enzymes and risk of dementia: the atherosclerosis risk in communities (ARIC) study. J. Alzheimer’s Dis. 79, 1775–1784 (2021).
74. Li, W. et al. An increased aspartate to alanine aminotransferase ratio is associated with a higher risk of cognitive impairment. Front. Med. 9, 780174 (2022).
75. Iso-Markku, P. et al. Physical activity as a protective factor for dementia and Alzheimer’s disease: systematic review, meta-analysis and quality assessment of cohort and case–control studies. Br. J. Sports Med. 56, 701–709 (2022).
76. Zhang, X. et al. Effect of physical activity on risk of Alzheimer’s disease: a systematic review and meta-analysis of twenty-nine prospective cohort studies. Ageing Res. Rev. 92, 102127 (2023).
77. Chung, Y.-H. et al. Minimal amount of exercise prevents incident dementia in cognitively normal older adults with osteoarthritis: a retrospective longitudinal follow-up study. Sci. Rep. 13, 16568 (2023).
78. Tari, A. R. et al. Neuroprotective mechanisms of exercise and the importance of fitness for healthy brain ageing. Lancet 405, 1093–1118 (2025).
79. Bherer, L., Erickson, K. I. & Liu-Ambrose, T. A review of the effects of physical activity and exercise on cognitive and brain functions in older adults. J. Aging Res. 2013, 1657508 (2013).
80. Teahan, O. et al. Impact of analytical bias in metabonomic studies of human blood serum and plasma. Anal. Chem. 78, 4307–4318 (2006).
81. Dunn, W. B. et al. Procedures for large-scale metabolic profiling of serum and plasma using gas chromatography and liquid chromatography coupled to mass spectrometry. Nat. Protoc. 6, 1060–1083 (2011).
82. Weckwerth, W., Wenzel, K. & Fiehn, O. Process for the integrated extraction, identification and quantification of metabolites, proteins and RNA to reveal their co-regulation in biochemical networks. Proteomics 4, 78–83 (2004).
83. Leitner, M. et al. Combined metabolomic analysis of plasma and urine reveals AHBA, tryptophan and serotonin metabolism as potential risk factors in gestational diabetes mellitus (GDM). Front. Mol. Biosci. 4, 84 (2017).
84. González-Domínguez, Á. et al. QC Omics: recommendations and guidelines for robust, easily implementable and reportable quality control of metabolomics data. Anal. Chem. 96, 1064–1072 (2024).
85. Hardoon, D. R., Szedmak, S. & Shawe-Taylor, J. Canonical correlation analysis: an overview with application to learning methods. Neural Comput. 16, 2639–2664 (2004).
86. Zhou, Z.-H. Ensemble Methods: Foundations and Algorithms (CRC Press, 2012).
87. Ke, G. et al. LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30 (2017).
88. Chen, T. & Guestrin, C. XGBoost: a scalable tree boosting system. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016).
89. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).
90. Feurer, M. & Hutter, F. Hyperparameter optimization. In Automated Machine Learning 3–33 (Springer, Cham, 2019).
91. Filzmoser, P., Liebmann, B. & Varmuza, K. Repeated double cross validation. J. Chemometrics: A J. Chemometrics Soc. 23, 160–171 (2009).
92. Szymańska, E. et al. Double-check: validation of diagnostic statistics for PLS-DA models in metabolomics studies. Metabolomics 8, 3–16 (2012).
93. Westerhuis, J. A. et al. Assessment of PLSDA cross validation. Metabolomics 4, 81–89 (2008).
94. Steuer, R. et al. Observing and interpreting correlations in metabolomic networks. Bioinformatics 19, 1019–1026 (2003).

Acknowledgements

This work is supported by the department funding of the Molecular Systems Biology Lab (MOSYS). Open access funding provided by University of Vienna. J.L. is supported by a Ph.D. scholarship provided by the China Scholarship Council (CSC) [grant number: 201806010428 to J.L.], the Tianjin Municipal Science and Technology Bureau [grant number: 24JCQNJC01860 to J.L.] and the National Natural Science Foundation of China [grant numbers: 12501679 and 12426303 to J.L.].

Author information

Authors and Affiliations

Molecular Systems Biology Lab (MOSYS), Department of Functional and Evolutionary Ecology, University of Vienna, Vienna, Austria
Jiahang Li, Martin Brenner, Iro Pierides, Steffen Waldherr & Wolfram Weckwerth
School of Mathematical Sciences, Nankai University, Tianjin, China
Jiahang Li
Department of Nutritional Sciences, University of Vienna, Vienna, Austria
Barbara Wessner, Bernhard Franzke, Eva-Maria Strasser & Karl-Heinz Wagner
Research Platform Active Ageing, University of Vienna, Vienna, Austria
Barbara Wessner, Bernhard Franzke, Eva-Maria Strasser & Karl-Heinz Wagner
Research Center Health Sciences, University of Applied Sciences Hochschule Campus Wien, Vienna, Austria
Bernhard Franzke
Vienna Molecular Metabolomics Center (VIME), University of Vienna, Vienna, Austria
Wolfram Weckwerth
Health in Society Research Hub, University of Vienna, Vienna, Austria
Wolfram Weckwerth

Authors

Jiahang Li, Martin Brenner, Iro Pierides, Barbara Wessner, Bernhard Franzke, Eva-Maria Strasser, Steffen Waldherr, Karl-Heinz Wagner & Wolfram Weckwerth

Contributions

W.W., K.H.W., and J.L. conceived the study. J.L. and W.W. developed the method. M.B., B.W., B.F., and E.M.S. implemented and performed the experiments, and J.L., S.W., and I.P. interpreted the results. J.L., W.W., and S.W. wrote the first version of the manuscript. W.W., K.H.W., J.L., and S.W. revised the manuscript. All authors reviewed and approved the final version of the manuscript.

Corresponding authors

Correspondence to Karl-Heinz Wagner or Wolfram Weckwerth.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Data 1

Supplementary Data 2

Supplementary Data 3-5

Supplementary Data 6

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, J., Brenner, M., Pierides, I. et al. Machine learning and data-driven inverse modeling of metabolomics unveil key processes of active aging. npj Syst Biol Appl 11, 103 (2025). https://doi.org/10.1038/s41540-025-00580-4

Received: 02 November 2024
Accepted: 20 August 2025
Published: 24 September 2025
DOI: https://doi.org/10.1038/s41540-025-00580-4