Portfolio Details

Abstract

This portfolio highlights three projects in predictive analytics and data visualization, addressing challenges in finance, healthcare, and data communication. The projects apply statistical techniques and machine learning models, with an emphasis on actionable insights and practical applications.

  • Built predictive models in R for heart failure survival and loan default, achieving 88.9% accuracy in healthcare survival prediction and a 0.9758 AUC in financial risk prediction, and improved data clarity through redesigned visualizations.
  • Applied machine learning algorithms, including logistic regression, random forest, decision trees, and K-Nearest Neighbors, to identify significant predictors, derive actionable insights, and enhance data interpretation.
  • Transformed misleading graphs into clear, interactive visualizations, effectively highlighting patterns in gender wage gaps, crime rates, and startup funding distributions.

Key Highlights

Booking Cancellations Prediction

  • Special Requests and Lead Time: A negative correlation exists; as lead time extends, the number of special requests declines, potentially impacting cancellations.
  • Room Price and Number of Children: A positive correlation exists; bookings with more children tend to have higher room prices, which may influence family booking decisions.
  • Room Type Analysis: The "Executive Suite" has a high cancellation ratio of 0.96, indicating a higher likelihood of cancellations.
  • Seasonal Impact on Cancellations: Low season has lower cancellation rates, while the peak season sees a significant increase.
  • Room Price and Lead Days Relationship: Reservations made 0-100 days in advance show lower cancellation rates, while higher room prices and longer lead times are associated with more cancellations.
  • Tools: R Programming, Logistic Regression, Random Forest.
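
The correlation and cancellation-ratio findings above rest on two simple calculations, sketched here for illustration. The project itself was done in R; this is a standard-library Python sketch, not the project's code. The field names (`room_type`, `lead_time`, `special_requests`, `canceled`) and the sample rows are hypothetical, and the ratio is assumed to mean canceled bookings divided by total bookings in a group, which the summary does not state.

```python
from collections import defaultdict
from math import sqrt


def cancellation_ratio_by_group(bookings, group_key):
    """Return {group: canceled / total} for each value of group_key.

    Assumption: "cancellation ratio" means canceled bookings over all bookings.
    """
    totals = defaultdict(int)
    canceled = defaultdict(int)
    for booking in bookings:
        group = booking[group_key]
        totals[group] += 1
        if booking["canceled"]:
            canceled[group] += 1
    return {group: canceled[group] / totals[group] for group in totals}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rows for illustration only; these are not the project's data.
bookings = [
    {"room_type": "Standard", "lead_time": 10, "special_requests": 3, "canceled": False},
    {"room_type": "Standard", "lead_time": 40, "special_requests": 2, "canceled": False},
    {"room_type": "Standard", "lead_time": 90, "special_requests": 1, "canceled": True},
    {"room_type": "Suite", "lead_time": 150, "special_requests": 1, "canceled": True},
    {"room_type": "Suite", "lead_time": 200, "special_requests": 0, "canceled": True},
]

ratios = cancellation_ratio_by_group(bookings, "room_type")
r = pearson(
    [b["lead_time"] for b in bookings],
    [b["special_requests"] for b in bookings],
)
print(ratios)
print(round(r, 3))
```

A negative `r` corresponds to the pattern described above, where special requests decline as lead time grows. A correlation coefficient alone does not show that either variable causes cancellations.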

Heart Failure Prediction

  • Predicted survival outcomes for heart failure patients using clinical data.
  • Significant predictors included age, ejection fraction, serum creatinine, and follow-up time.
  • Data simulation enhanced model performance, achieving 93.82% accuracy with Random Forest.
  • Tools: R Programming, Logistic Regression, Random Forest, KNN, Decision Trees.
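
For readers less familiar with the metrics quoted on this page, the sketch below shows how accuracy and AUC are commonly computed for a binary classifier. It is written in standard-library Python for illustration, whereas the project used R. The labels, scores, and the 0.5 decision threshold are made-up assumptions and do not reproduce the reported results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case.
    Tied scores count as half a win.
    """
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up labels (1 = event, 0 = no event) and model scores, for illustration.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1, 0.8, 0.3]

# Hypothetical threshold: classify as 1 when the score is at least 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

acc = accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(acc, auc)
```

Accuracy depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, so the two figures are not directly comparable.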

Data Visualization Redesign

  • Redesigned misleading graphs to improve clarity and insight extraction.
  • Developed effective visualizations for gender wage gaps, crime rates, and startup funding.
  • Interactive charts (ggplotly) enhanced user engagement and data exploration.
  • Tools: R Programming, ggplot2, plotly.

Project Information

  • Category: Predictive Analysis and Visualization
  • More Details: GitHub Link