title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
  - actuarial
  - clustering
  - model-points
  - insurance
  - gradio
  - data-science
  - present-values
  - policy-attributes
  - cashflows
  - machine-learning
short_description: Cluster insurance policies into representative model points.

🧮 Model Point Clustering Dashboard

An interactive dashboard for calibrating and evaluating model points using K-Means clustering. Designed for actuaries and data scientists working with large insurance portfolios.

Open in Spaces


📌 Overview

This application performs cluster-based model point selection by grouping similar policies to represent large portfolios more efficiently.

You can choose from three clustering calibration methods:

  • Annual Cashflows
  • Policy Attributes
  • Present Values

It compares how well each clustering method replicates actual values across base, lapse, and mortality stress scenarios.
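
The grouping step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's actual code: the function name `select_model_points`, the synthetic feature matrix, and the choice of the policy nearest each centroid as that cluster's representative are all hypothetical. Only the use of scikit-learn K-Means comes from this README.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


def select_model_points(features, n_clusters, seed=0):
    """Cluster policies and pick one representative policy per cluster.

    Returns (rep_idx, weights): the row index of the policy closest to each
    centroid, and the number of policies assigned to that centroid's cluster.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    # For each centroid, find the nearest actual policy (a row of `features`).
    rep_idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, features)
    # Weight = cluster size, so weighted representatives approximate portfolio totals.
    weights = np.bincount(km.labels_, minlength=n_clusters)
    return rep_idx, weights


# Synthetic stand-in for a calibration matrix (rows = policies, columns = variables).
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 5))
rep_idx, weights = select_model_points(features, n_clusters=20)

# Portfolio totals estimated from 20 weighted representatives vs. the true totals.
estimated_total = (features[rep_idx] * weights[:, None]).sum(axis=0)
actual_total = features.sum(axis=0)
```

Using a real policy as the representative, rather than the centroid itself, keeps every model point a valid policy that the projection model can run.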


🔍 Use Cases

  • Model point reduction for valuation and projections
  • Policy summarization for faster simulations
  • Stress testing comparison across representative points
  • Actuarial model validation and calibration studies

📈 Features

Calibration Methods

  • Cashflows: Captures policy behavior over time.
  • Attributes: Uses demographic/product characteristics.
  • Present Values: Focuses on total liability or cashflow values.
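
One practical point when clustering on policy attributes: K-Means uses Euclidean distance, so a column in large units (such as sum assured) can dominate columns in small units (such as age). The sketch below shows one common remedy, min-max scaling. Whether the app scales its attributes this way is not stated in this README; the helper `min_max_scale` and the sample columns (age, term, sum assured) are assumptions for illustration.

```python
import numpy as np


def min_max_scale(x):
    """Rescale each column to [0, 1] so no single attribute dominates distances."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    # Constant columns would divide by zero; give them a span of 1 instead.
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span


# Hypothetical attributes: age, policy term, sum assured.
attrs = np.array([
    [30, 10, 100_000],
    [45, 20, 500_000],
    [60, 15, 250_000],
])
scaled = min_max_scale(attrs)
```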

Interactive Tabs

  • Summary: Bar chart of absolute PV Net Cashflow errors.
  • Cashflow Calibration: Visual and tabular comparisons based on cashflows.
  • Policy Attribute Calibration: Analysis using static policy data.
  • Present Value Calibration: PV-based clustering with stress testing.

Scenario Support

  • Base Scenario
  • Lapse Stress (+50%)
  • Mortality Stress (+15%)

📁 Required Inputs

Upload 7 .xlsx files, or use the example files by clicking Load Example Data.

| File | Description |
|---|---|
| cashflows_seriatim_10K.xlsx | Base cashflows per policy |
| cashflows_seriatim_10K_lapse50.xlsx | Cashflows under lapse stress |
| cashflows_seriatim_10K_mort15.xlsx | Cashflows under mortality stress |
| model_point_table.xlsx | Policy attributes (age, term, etc.) |
| pv_seriatim_10K.xlsx | Present values for base |
| pv_seriatim_10K_lapse50.xlsx | PVs under lapse stress |
| pv_seriatim_10K_mort15.xlsx | PVs under mortality stress |

Example directory structure:

├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
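
Before running an analysis locally, it can help to confirm that all seven inputs are present. The helper below is a hypothetical convenience, not part of the app. It uses only the file names listed above and the standard library.

```python
from pathlib import Path

# File names taken from the Required Inputs list in this README.
REQUIRED_FILES = [
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
]


def missing_inputs(data_dir):
    """Return the required file names that are absent from `data_dir`."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


# Example: check the bundled example directory (an empty list means all present).
missing = missing_inputs("eg_data")
```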

⚙️ How to Use

  1. Launch the App
    Click the "Open in Spaces" button or run app.py.

  2. Upload or Load Files

    • Upload all 7 required .xlsx files.
    • Or click "Load Example Data".

  3. Run Analysis
    Click "Analyze Dataset" to generate cluster representatives, plots, and comparisons.

  4. Explore Tabs

    • 📊 Summary: Calibration errors across scenarios.
    • 💸 Cashflow Calibration: Clustered vs actual based on cashflows.
    • 👤 Policy Attribute Calibration: Calibrated via policy data.
    • 💰 Present Value Calibration: Uses PVs directly.

🧠 Behind the Scenes

Core Engine: Clusters Class

Encapsulates K-Means logic for:

  • Clustering using selected variables
  • Selecting representative policies
  • Aggregating actual vs estimated outputs
  • Plotting cashflows, PVs, and scatter comparisons

Key Libraries

  • gradio – UI and file interface
  • pandas, numpy – Data manipulation
  • scikit-learn – K-Means clustering
  • matplotlib, PIL – Visualization

📊 Output Summary

The application generates:

  • 📈 Cluster vs Actual Comparisons
  • 🖼️ Cashflow Time Series Plots
  • ⚖️ Per-Cluster Scatter Plots
  • 📋 Summary Tables
  • 📉 Mean Absolute Error Bar Charts

All results compare cluster-aggregated estimates directly against the corresponding metrics computed from the full original dataset.
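
As a worked illustration of that comparison, the sketch below scales each representative policy by its cluster size, sums the results, and measures the deviation from the full-portfolio total. The function names, the toy numbers, and the use of relative error are assumptions. This README does not specify the exact error metric the app reports.

```python
import numpy as np


def aggregate_estimate(values, rep_idx, weights):
    """Estimate portfolio totals from weighted representative policies."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (values[rep_idx] * weights[:, None]).sum(axis=0)


def relative_error(estimate, actual):
    """Signed relative deviation of the estimate from the actual total."""
    return estimate / actual - 1.0


# Toy portfolio: 4 policies, 2 metrics (e.g. PV of premiums, PV of claims).
values = np.array([
    [100.0, 10.0],
    [110.0, 12.0],
    [300.0, 30.0],
    [290.0, 28.0],
])
rep_idx = np.array([1, 2])   # hypothetical representatives of two clusters
weights = np.array([2, 2])   # each representative stands for two policies

estimate = aggregate_estimate(values, rep_idx, weights)  # [820., 84.]
actual = values.sum(axis=0)                              # [800., 80.]
abs_error = np.abs(relative_error(estimate, actual))     # about [0.025, 0.05]
```

Averaging `abs_error` across metrics or scenarios gives a single number per calibration method, which is one plausible way to build a mean absolute error chart.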


📚 Attribution & References

Inspired by the Lifelib open-source project:

lifelib Developers. (2025). Model Point Clustering. In lifelib: Life actuarial models in Python.
https://github.com/lifelib-dev/lifelib

Notebook reference:
Cluster Model Points – Lifelib Notebook


🛠️ Local Setup

To run locally:

# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering

# Install dependencies
pip install -r requirements.txt

# Launch app
python app.py

📜 License
This project is open source under the MIT License.