---
title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
- actuarial
- clustering
- model-points
- insurance
- gradio
- data-science
- present-values
- policy-attributes
- cashflows
- machine-learning
short_description: Cluster insurance policies into representative model points.
---
# 🧮 Model Point Clustering Dashboard
An interactive dashboard for calibrating and evaluating **model points** using K-Means clustering. Designed for actuaries and data scientists working with large insurance portfolios.
[Open in Spaces](https://huggingface.co/spaces/alidenewade/model-point-clustering)
---
## Overview
This application performs **cluster-based model point selection** by grouping similar policies to represent large portfolios more efficiently.
You can choose from three clustering calibration methods:
- **Annual Cashflows**
- **Policy Attributes**
- **Present Values**
It compares how well each clustering method replicates actual values across base, lapse, and mortality stress scenarios.
---
## Use Cases
- Model point reduction for valuation and projections
- Policy summarization for faster simulations
- Stress testing comparison across representative points
- Actuarial model validation and calibration studies
---
## Features
### Calibration Methods
- **Cashflows**: Captures policy behavior over time.
- **Attributes**: Uses demographic/product characteristics.
- **Present Values**: Focuses on total liability or cashflow values.
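
Whichever method is chosen, the calibration variables can sit on very different scales (for example, issue age versus sum assured), and K-Means distances are dominated by the largest columns. The sketch below shows one common remedy, min-max scaling each column to the range 0 to 1. It is an illustration under assumptions: the function name `min_max_scale` and the sample attribute values are hypothetical, and the app may scale its variables differently or not at all.

```python
import numpy as np


def min_max_scale(variables):
    """Rescale each column to [0, 1]; constant columns map to 0."""
    variables = np.asarray(variables, dtype=float)
    col_min = variables.min(axis=0)
    col_range = variables.max(axis=0) - col_min
    # Avoid division by zero when every policy shares the same value.
    safe_range = np.where(col_range == 0, 1.0, col_range)
    return (variables - col_min) / safe_range


# Hypothetical policy attributes: issue age, policy term, sum assured.
attributes = np.array([
    [30, 10, 100_000],
    [45, 20, 250_000],
    [60, 15, 500_000],
])
scaled = min_max_scale(attributes)
print(scaled)
```

After scaling, each attribute contributes on a comparable footing to the Euclidean distance that K-Means minimizes.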
### Interactive Tabs
- **Summary**: Bar chart of absolute PV Net Cashflow errors.
- **Cashflow Calibration**: Visual and tabular comparisons based on cashflows.
- **Policy Attribute Calibration**: Analysis using static policy data.
- **Present Value Calibration**: PV-based clustering with stress testing.
### Scenario Support
- Base Scenario
- Lapse Stress (+50%)
- Mortality Stress (+15%)
---
## Required Inputs
Upload **7 `.xlsx` files**, or use the example files by clicking **Load Example Data**.
| File Name | Description |
|----------|-------------|
| `cashflows_seriatim_10K.xlsx` | Base cashflows per policy |
| `cashflows_seriatim_10K_lapse50.xlsx` | Cashflows under lapse stress |
| `cashflows_seriatim_10K_mort15.xlsx` | Cashflows under mortality stress |
| `model_point_table.xlsx` | Policy attributes (age, term, etc.) |
| `pv_seriatim_10K.xlsx` | Present values for base |
| `pv_seriatim_10K_lapse50.xlsx` | PVs under lapse stress |
| `pv_seriatim_10K_mort15.xlsx` | PVs under mortality stress |
Example directory structure:
```
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
```
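
Before launching an analysis locally, it can help to confirm that all seven files are present. The following is a minimal sketch that only checks file names in a directory; `REQUIRED_FILES` and `find_missing_files` are hypothetical names introduced here and are not part of `app.py`.

```python
from pathlib import Path

# The seven file names listed in the table above.
REQUIRED_FILES = [
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
]


def find_missing_files(data_dir):
    """Return the required file names that are absent from data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


missing = find_missing_files("eg_data")
if missing:
    print("Missing files:", ", ".join(missing))
else:
    print("All required files found.")
```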
---
## How to Use
1. **Launch the App**
Click the "Open in Spaces" button or run `app.py`.
2. **Upload or Load Files**
- Upload all 7 required `.xlsx` files.
- Or click **"Load Example Data"**.
3. **Run Analysis**
Click **"Analyze Dataset"** to generate cluster representatives, plots, and comparisons.
4. **Explore Tabs**
- **Summary**: Calibration errors across scenarios.
- **Cashflow Calibration**: Clustered vs actual based on cashflows.
- **Policy Attribute Calibration**: Calibrated via policy data.
- **Present Value Calibration**: Uses PVs directly.
---
## Behind the Scenes
### Core Engine: `Clusters` Class
Encapsulates K-Means logic for:
- Clustering using selected variables
- Selecting representative policies
- Aggregating actual vs estimated outputs
- Plotting cashflows, PVs, and scatter comparisons
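
The first two responsibilities can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the app's actual `Clusters` code: the function name `select_representatives`, the variable `loc_vars`, and the synthetic data are hypothetical. Each representative is the real policy closest to a cluster centroid, and the cluster's policy count serves as its weight.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


def select_representatives(loc_vars, n_clusters, random_state=0):
    """Cluster policies and pick the policy nearest each centroid."""
    loc_vars = np.ascontiguousarray(loc_vars, dtype=float)
    kmeans = KMeans(
        n_clusters=n_clusters, random_state=random_state, n_init=10
    ).fit(loc_vars)
    # Row index of the actual policy closest to each cluster centre.
    rep_idx, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, loc_vars)
    # Number of policies assigned to each cluster (the representative's weight).
    counts = np.bincount(kmeans.labels_, minlength=n_clusters)
    return rep_idx, counts, kmeans.labels_


# Synthetic data: three well-separated groups of 50 "policies", two variables each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
policies = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])

rep_idx, counts, labels = select_representatives(policies, n_clusters=3)
print("Representative rows:", rep_idx)
print("Policies per cluster:", counts)
```

Choosing an existing policy rather than the centroid itself keeps every model point a valid policy record that can be fed back into a projection model.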
### Key Libraries
- `gradio`: UI and file interface
- `pandas`, `numpy`: Data manipulation
- `scikit-learn`: K-Means clustering
- `matplotlib`, `PIL`: Visualization
---
## Output Summary
The application generates:
- **Cluster vs Actual Comparisons**
- **Cashflow Time Series Plots**
- **Per-Cluster Scatter Plots**
- **Summary Tables**
- **Mean Absolute Error Bar Charts**
All results are based on direct comparison of cluster-aggregated estimates vs original full dataset metrics.
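
As a rough illustration of that comparison, the sketch below scales each representative's value by its cluster's policy count, sums the result, and measures the relative error against the seriatim total. The function names and the six sample present values are hypothetical, and the exact error metric used by the app is an assumption here.

```python
import numpy as np


def estimate_total(values, rep_idx, counts):
    """Scale each representative's value by its cluster size and sum."""
    values = np.asarray(values, dtype=float)
    return float(np.sum(values[rep_idx] * counts))


def relative_error(estimate, actual):
    """Signed relative error of the clustered estimate."""
    return estimate / actual - 1.0


# Hypothetical present values for six policies split into two clusters.
pv = np.array([100.0, 110.0, 96.0, 400.0, 420.0, 380.0])
rep_idx = np.array([0, 3])   # representative policy of each cluster
counts = np.array([3, 3])    # policies per cluster

actual = float(pv.sum())
estimate = estimate_total(pv, rep_idx, counts)
error = relative_error(estimate, actual)
print(f"actual={actual:.1f} estimate={estimate:.1f} error={error:.4%}")
```

Repeating the same calculation on the lapse and mortality stress files shows whether representatives calibrated on one scenario still reproduce totals under the others.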
---
## Attribution & References
Inspired by the [Lifelib](https://lifelib.io) open-source project:
> lifelib Developers. (2025). *Model Point Clustering*. In **lifelib: Life actuarial models in Python**.
> [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)
Notebook reference:
[Cluster Model Points - Lifelib Notebook](https://colab.research.google.com/github/lifelib-dev/lifelib/blob/current/lifelib/libraries/cluster/cluster_model_points.ipynb)
---
## Local Setup
To run locally:
```bash
# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering
# Install dependencies
pip install -r requirements.txt
# Launch app
python app.py
```
---
## License
This project is open source under the MIT License.