---
title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
- actuarial
- clustering
- model-points
- insurance
- gradio
- data-science
- present-values
- policy-attributes
- cashflows
- machine-learning
short_description: Cluster insurance policies into representative model points.
---

# 🧮 Model Point Clustering Dashboard

An interactive dashboard for calibrating and evaluating **model points** using K-Means clustering. Designed for actuaries and data scientists working with large insurance portfolios.

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/alidenewade/model-point-clustering)

---

## 📌 Overview

This application performs **cluster-based model point selection** by grouping similar policies to represent large portfolios more efficiently.

You can choose from three clustering calibration methods:

- **Annual Cashflows**
- **Policy Attributes**
- **Present Values**

The app then compares how closely each method's model points reproduce the full-portfolio results under the base, lapse-stress, and mortality-stress scenarios.

---

## 🔍 Use Cases

- Model point reduction for valuation and projections
- Policy summarization for faster simulations
- Stress testing comparison across representative points
- Actuarial model validation and calibration studies

---

## 📈 Features

### Calibration Methods
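- **Cashflows**: Clusters on each policy's projected annual cashflows, capturing its behavior over time.
- **Attributes**: Clusters on static demographic and product characteristics (e.g. age and policy term).
- **Present Values**: Clusters on the present values of each policy's cashflows, focusing on total liability values.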
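
K-Means groups policies by Euclidean distance, so attributes on very different scales (sum assured versus age, for instance) can dominate the clustering unless they are rescaled. The sketch below prepares an attribute-based feature matrix with min-max scaling. It is a minimal sketch under assumptions: min-max scaling is one common choice rather than a documented step of this app, and the column names are hypothetical stand-ins for whatever `model_point_table.xlsx` actually contains.

```python
import pandas as pd


def scale_attributes(table: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Min-max scale the selected policy attributes to the range [0, 1].

    Assumption: illustrates one common preprocessing step for
    attribute-based clustering; not necessarily what app.py does.
    """
    features = table[columns].astype(float)
    col_min = features.min()
    col_range = features.max() - col_min
    # Constant columns would divide by zero; give them a range of 1 so they map to 0.0.
    col_range = col_range.replace(0.0, 1.0)
    return (features - col_min) / col_range


# Hypothetical column names and values, for illustration only.
policies = pd.DataFrame(
    {
        "age_at_entry": [25, 40, 55],
        "policy_term": [10, 20, 15],
        "sum_assured": [100_000, 500_000, 250_000],
    }
)
scaled = scale_attributes(policies, ["age_at_entry", "policy_term", "sum_assured"])
print(scaled)
```

After scaling, a 15-year age gap and a 200,000 difference in sum assured contribute on comparable terms to the distance K-Means minimizes.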

### Interactive Tabs
- **Summary**: Bar chart of absolute errors in the PV of net cashflows.
- **Cashflow Calibration**: Visual and tabular comparisons based on cashflows.
- **Policy Attribute Calibration**: Analysis using static policy data.
- **Present Value Calibration**: PV-based clustering with stress testing.

### Scenario Support
- Base Scenario
- Lapse Stress (+50%)
- Mortality Stress (+15%)
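
The stress scenarios test whether model points chosen once still track the full portfolio when assumptions change. A minimal sketch of that check, assuming each scenario's PVs are a DataFrame indexed by policy ID and that each representative is scaled by a weight such as its cluster size; `scenario_error`, `rep_ids`, `weights`, and the `pv_net_cf` column are hypothetical names, not taken from `app.py`.

```python
import numpy as np
import pandas as pd


def scenario_error(pv: pd.DataFrame, rep_ids: list, weights: list, column: str) -> float:
    """Relative error of the model-point estimate for one PV column.

    Assumption: rep_ids and weights were fixed once (e.g. on the base
    scenario) and are reused unchanged for every stressed scenario.
    """
    actual = pv[column].sum()
    # Each representative's value in this scenario, scaled by its weight.
    estimate = pv.loc[rep_ids, column].to_numpy() @ np.asarray(weights, dtype=float)
    return float(estimate / actual - 1)


# Hypothetical PV of net cashflow for four policies in two scenarios.
index = [1, 2, 3, 4]
scenarios = {
    "base": pd.DataFrame({"pv_net_cf": [100.0, 110.0, 300.0, 310.0]}, index=index),
    "lapse50": pd.DataFrame({"pv_net_cf": [90.0, 100.0, 270.0, 290.0]}, index=index),
}
# Policies 1 and 3 each stand in for a cluster of two policies.
rep_ids, weights = [1, 3], [2, 2]
errors = {name: scenario_error(pv, rep_ids, weights, "pv_net_cf") for name, pv in scenarios.items()}
print(errors)
```

Keeping the representatives fixed is what makes the stress results meaningful: re-clustering on each stressed dataset would hide how well the original model points generalize.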

---

## 📁 Required Inputs

Upload **7 `.xlsx` files**, or use the example files by clicking **Load Example Data**.

| File Type | Description |
|----------|-------------|
| `cashflows_seriatim_10K.xlsx` | Base cashflows per policy |
| `cashflows_seriatim_10K_lapse50.xlsx` | Cashflows under lapse stress |
| `cashflows_seriatim_10K_mort15.xlsx` | Cashflows under mortality stress |
| `model_point_table.xlsx` | Policy attributes (age, term, etc.) |
| `pv_seriatim_10K.xlsx` | Present values under the base scenario |
| `pv_seriatim_10K_lapse50.xlsx` | PVs under lapse stress |
| `pv_seriatim_10K_mort15.xlsx` | PVs under mortality stress |

Example directory structure:

```
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
```
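
Because the analysis needs all seven workbooks, it can help to check for them before loading anything. A minimal sketch, assuming the files sit together in one folder such as `eg_data/`; `missing_inputs` is a hypothetical helper, not a function from `app.py`.

```python
from __future__ import annotations

from pathlib import Path

# The seven required file names from the table above.
REQUIRED_FILES = (
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
)


def missing_inputs(directory: str | Path) -> list[str]:
    """Return the required file names that are not present in ``directory``."""
    folder = Path(directory)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]
```

For example, `missing_inputs("eg_data")` returns an empty list when the example folder shown above is complete.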

---

## ⚙️ How to Use

1. **Launch the App**  
   Click the "Open in Spaces" button or run `app.py`.

2. **Upload or Load Files**  
   - Upload all 7 required `.xlsx` files.
   - Or click **"Load Example Data"**.

3. **Run Analysis**  
   Click **"Analyze Dataset"** to run the clustering, select representative policies, and generate the plots and comparison tables.

4. **Explore Tabs**  
   - 📊 **Summary**: Calibration errors across scenarios.
   - 💸 **Cashflow Calibration**: Clustered vs actual based on cashflows.
   - 👤 **Policy Attribute Calibration**: Calibrated via policy data.
   - 💰 **Present Value Calibration**: Uses PVs directly.

---

## 🧠 Behind the Scenes

### Core Engine: `Clusters` Class
Encapsulates K-Means logic for:
- Clustering using selected variables
- Selecting representative policies
- Aggregating actual vs estimated outputs
- Plotting cashflows, PVs, and scatter comparisons
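
The clustering and representative-selection steps can be sketched with scikit-learn's `KMeans`. This is a minimal sketch under assumptions, not the actual `Clusters` class: it takes, for each cluster, the member policy closest to that cluster's centroid as the representative and weights it by the number of policies in the cluster. `select_model_points` and its return values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def select_model_points(features: pd.DataFrame, n_clusters: int, seed: int = 0):
    """Cluster policies and pick one representative policy per cluster.

    Returns (labels, rep_ids, weights): rep_ids[k] is the index label of the
    member of cluster k nearest its centroid; weights[k] is that cluster's size.
    """
    x = features.to_numpy(dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = km.fit_predict(x)
    # Distance from every policy to every centroid: shape (n_policies, n_clusters).
    distances = km.transform(x)
    rep_ids, weights = [], []
    for k in range(n_clusters):
        members = np.flatnonzero(labels == k)
        if members.size == 0:  # defensive: skip a cluster that ended up empty
            continue
        # Only consider the cluster's own members, so the representative belongs to it.
        nearest = members[distances[members, k].argmin()]
        rep_ids.append(features.index[nearest])
        weights.append(int(members.size))
    return labels, rep_ids, weights


# Hypothetical example: six policies forming two obvious groups.
features = pd.DataFrame(
    {"x": [0.0, 0.1, 0.2, 10.0, 10.1, 10.2]},
    index=[101, 102, 103, 201, 202, 203],
)
labels, rep_ids, weights = select_model_points(features, n_clusters=2)
pv = pd.Series([1.0, 2.0, 3.0, 10.0, 20.0, 30.0], index=features.index)
# Estimate: each representative's value scaled by its cluster size.
estimate = float(pv.loc[rep_ids].to_numpy() @ np.asarray(weights, dtype=float))
print(rep_ids, weights, estimate, pv.sum())
```

Picking an actual policy rather than the centroid itself keeps every model point a valid input record that a projection model can run as-is.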

### Key Libraries
- `gradio`: UI and file interface
- `pandas`, `numpy`: Data manipulation
- `scikit-learn`: K-Means clustering
- `matplotlib`, `PIL`: Visualization

---

## 📊 Output Summary

The application generates:

- 📈 **Cluster vs Actual Comparisons**
- 🖼️ **Cashflow Time Series Plots**
- ⚖️ **Per-Cluster Scatter Plots**
- 📋 **Summary Tables**
- 📉 **Mean Absolute Error Bar Charts**

All results compare estimates built from the representative policies against the same metrics computed on the full seriatim dataset.
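
A comparison of this kind can be sketched as a small table of actual totals, model-point estimates, and relative errors per output column, with a mean absolute error as one possible summary figure. This is a sketch under assumptions: the inputs are per-column totals, the column names are hypothetical, and the app's own error definitions may differ.

```python
import pandas as pd


def comparison_table(actual: pd.Series, estimate: pd.Series) -> pd.DataFrame:
    """Tabulate actual vs estimated totals and the relative error per column."""
    table = pd.DataFrame({"actual": actual, "estimate": estimate})
    table["error"] = table["estimate"] / table["actual"] - 1
    return table


# Hypothetical totals for two PV columns.
actual = pd.Series({"pv_premiums": 1000.0, "pv_claims": 800.0})
estimate = pd.Series({"pv_premiums": 1010.0, "pv_claims": 780.0})
summary = comparison_table(actual, estimate)
# One way to condense the table into a single bar-chart value per method.
mae = summary["error"].abs().mean()
print(summary)
print(f"Mean absolute error: {mae:.4f}")
```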

---

## 📚 Attribution & References

Inspired by the [Lifelib](https://lifelib.io) open-source project:

> lifelib Developers. (2025). *Model Point Clustering*. In **lifelib: Life actuarial models in Python**.  
> [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)

Notebook reference:  
[Cluster Model Points: Lifelib Notebook](https://colab.research.google.com/github/lifelib-dev/lifelib/blob/current/lifelib/libraries/cluster/cluster_model_points.ipynb)

---

## 🛠️ Local Setup

To run locally:

```bash
# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering

# Install dependencies
pip install -r requirements.txt

# Launch app
python app.py
```

---

## 📜 License

This project is open source under the MIT License.