Research Question 1

Overview

Research Question 1: How do liver enzyme levels (ALP, ALT, AST) change with age in people with hepatitis C compared to healthy individuals?

This research investigates how liver enzymes, which are important for diagnosing liver issues, vary as people get older, especially comparing those with hepatitis C to healthy people.

Data Preparation and Analysis

Data Preparation:

  • We used data from the hcvdat.csv data-set focusing on age and liver enzymes (ALP, ALT, AST).
  • We removed missing values and grouped data into age categories (‘Young’, ‘Middle-aged’, ‘Senior’) to analyze trends based on age.

Model Training and Validation:

  • A Random Forest model was chosen for its robustness in handling datasets with multiple predictors. It was trained to predict log-transformed Albumin levels.

  • 10-fold cross-validation was used to validate the model, ensuring it performs well across different data subsets.

Model Evaluation:

  • The model’s accuracy was assessed with Root Mean Square Error (RMSE) and R-squared values, where it scored an R-squared of 0.937. This means about 93.7% of the variability in Albumin levels was explained by the model.

  • The RMSE was 0.035, showing the model’s precision in predictions.

Visualizations and Insights

Enzyme Level Analysis by Age Group:

We made a bar plot to show average liver enzyme levels by age group, which helps us see changes in enzyme levels across ages, especially noting increases in older groups.

Model Fit Visualization:

Residuals vs. Fitted Values Plot: Checked to ensure our model fits well. The plot showed residuals spread randomly, suggesting the model was appropriate without any clear misfit patterns.

Approaches Considered and Rejected

We initially considered several methods:

  • Non-linear Models: These were tested for capturing complex relationships but not used due to the risk of making the model too complex.

  • Polynomial Regression: Was also considered to deal with curves in the data but was not used to keep the model simple and easy to interpret.

Real-World Application

The findings are useful for:

  • Early Diagnosis: Better understanding of how liver enzymes change with age helps in spotting liver issues early.

  • Personalizing Treatment: Insights from the data can guide tailored treatments based on how enzyme levels change with age.

Conclusion

This study effectively used a Random Forest model to clarify how age impacts liver enzyme levels in hepatitis C patients compared to healthy individuals. It provided useful insights that could improve how liver conditions are managed.

Future Directions

Further research could improve these findings by:

  • Including more factors like broader demographic details and other health indicators to enhance predictions.

  • Conducting longitudinal studies to observe changes in enzyme levels over time, giving more insight into how liver diseases progress.

This webpage offers a detailed look at the research conducted, the methods used, the insights gained, and their practical implications, aiming to improve the management of liver health across different age groups.

References

  1. UCI Machine Learning Repository: Hepatitis C Virus (HCV) Dataset. Available at: https://archive.ics.uci.edu/dataset/571/hcv+data. This link provides access to the dataset used in your analysis, essential for anyone looking to replicate or extend your research findings.

  2. Professor’s Notes on ANOVA:Unit 3 and Unit 4 Notes: These documents are direct inputs from educational materials provided by your professor, covering detailed theoretical and practical applications of ANOVA, which underpin your analytical methodologies.

  3. Box, G. E. P., Hunter, J. S., & Hunter, W. G. (2005). Statistics for Experimenters: Design, Innovation, and Discovery (2nd Edition). Wiley. A foundational text on the principles of experimental design and analysis crucial for interpreting ANOVA results.

  4. Montgomery, D. C. (2017). Design and Analysis of Experiments (9th Edition). Wiley. Provides a comprehensive resource on experimental design and analysis, supporting the methodologies used in your analysis.

  5. Neter, J., Wasserman, W., & Kutner, M. H. (1996). Applied Linear Statistical Models (4th Edition). McGraw-Hill. Covers regression and analysis of variance in detail, supporting the statistical approaches and tests employed in your research.