---
title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
- actuarial
- clustering
- model-points
- insurance
- gradio
- data-science
- present-values
- policy-attributes
- cashflows
- machine-learning
short_description: Cluster insurance policies into representative model points.
---

# 🧮 Model Point Clustering Dashboard

An interactive dashboard for calibrating and evaluating **model points** using K-Means clustering. Designed for actuaries and data scientists working with large insurance portfolios.

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/alidenewade/model-point-clustering)

---

## 📌 Overview

This application performs **cluster-based model point selection** by grouping similar policies to represent large portfolios more efficiently.

You can choose from three clustering calibration methods:

- **Annual Cashflows**
- **Policy Attributes**
- **Present Values**

The dashboard compares how well each calibration method reproduces the actual full-portfolio values under the base, lapse stress, and mortality stress scenarios.

---

## 🔍 Use Cases

- Model point reduction for valuation and projections
- Policy summarization for faster simulations
- Stress testing comparison across representative points
- Actuarial model validation and calibration studies

---

## 📈 Features

### Calibration Methods
- **Cashflows**: Captures policy behavior over time.
- **Attributes**: Uses demographic/product characteristics.
- **Present Values**: Focuses on total liability or cashflow values.
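Policy attributes are measured in very different units (years, currency amounts), so a distance-based method such as K-Means is usually run on rescaled values. The sketch below shows min-max scaling as one possible preprocessing step. It is an assumption for illustration, not a description of what `app.py` does, and the column names are hypothetical.

```python
import pandas as pd


def min_max_scale(attrs: pd.DataFrame) -> pd.DataFrame:
    """Rescale every column to the [0, 1] range."""
    span = attrs.max() - attrs.min()
    # A constant column has zero span; use 1 so it maps to 0 instead of NaN.
    span = span.replace(0, 1)
    return (attrs - attrs.min()) / span


# Hypothetical attribute columns for three policies.
attrs = pd.DataFrame(
    {
        "age_at_entry": [20, 40, 60],
        "policy_term": [10, 15, 20],
        "sum_assured": [100_000, 500_000, 900_000],
    }
)
scaled = min_max_scale(attrs)
print(scaled)
```

Without rescaling, the column with the largest numeric range (here the sum assured) would dominate the Euclidean distances that K-Means minimizes.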

### Interactive Tabs
- **Summary**: Bar chart of absolute PV Net Cashflow errors.
- **Cashflow Calibration**: Visual and tabular comparisons based on cashflows.
- **Policy Attribute Calibration**: Analysis using static policy data.
- **Present Value Calibration**: PV-based clustering with stress testing.

### Scenario Support
- Base Scenario
- Lapse Stress (+50%)
- Mortality Stress (+15%)

---

## 📁 Required Inputs

Upload **7 `.xlsx` files**, or use the example files by clicking **Load Example Data**.

| File Type | Description |
|----------|-------------|
| `cashflows_seriatim_10K.xlsx` | Base cashflows per policy |
| `cashflows_seriatim_10K_lapse50.xlsx` | Cashflows under lapse stress |
| `cashflows_seriatim_10K_mort15.xlsx` | Cashflows under mortality stress |
| `model_point_table.xlsx` | Policy attributes (age, term, etc.) |
| `pv_seriatim_10K.xlsx` | Present values for base |
| `pv_seriatim_10K_lapse50.xlsx` | PVs under lapse stress |
| `pv_seriatim_10K_mort15.xlsx` | PVs under mortality stress |

Example directory structure:

```
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
```
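Before launching an analysis it can help to confirm that all seven files are present. The following is a minimal sketch of such a check using only the standard library. The helper name `missing_inputs` is hypothetical and is not part of `app.py`; the file names are the ones listed in the table above.

```python
from pathlib import Path

# File names taken from the Required Inputs table.
REQUIRED_FILES = [
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
]


def missing_inputs(data_dir: str) -> list[str]:
    """Return the required file names that are not found in data_dir."""
    folder = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


# An empty list means every required file was found.
print(missing_inputs("eg_data"))
```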

---

## ⚙️ How to Use

1. **Launch the App**  
   Click the "Open in Spaces" button or run `app.py`.

2. **Upload or Load Files**  
   - Upload all 7 required `.xlsx` files.
   - Or click **"Load Example Data"**.

3. **Run Analysis**  
   Click **"Analyze Dataset"** to generate cluster representatives, plots, and comparisons.

4. **Explore Tabs**  
   - 📊 **Summary**: Calibration errors across scenarios.
   - 💸 **Cashflow Calibration**: Clustered vs actual based on cashflows.
   - 👤 **Policy Attribute Calibration**: Calibrated via policy data.
   - 💰 **Present Value Calibration**: Uses PVs directly.

---

## 🧠 Behind the Scenes

### Core Engine: `Clusters` Class
Encapsulates K-Means logic for:
- Clustering using selected variables
- Selecting representative policies
- Aggregating actual vs estimated outputs
- Plotting cashflows, PVs, and scatter comparisons
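The first two steps can be sketched with scikit-learn as follows. This is an illustrative outline under stated assumptions, not the actual `Clusters` implementation: the function name `pick_representatives`, the toy variables `var_1` and `var_2`, and the choice of "policy nearest to each centroid" as the representative are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


def pick_representatives(loc_vars: pd.DataFrame, n_clusters: int, seed: int = 0) -> pd.Series:
    """Cluster the rows of loc_vars and return the policy count of each
    cluster, indexed by the id of that cluster's representative policy."""
    data = loc_vars.to_numpy(dtype=float)
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(data)
    # For each centroid, find the position of the nearest actual policy.
    closest, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, data)
    # Number of policies assigned to each cluster (cluster i -> sizes[i]).
    sizes = np.bincount(kmeans.labels_, minlength=n_clusters)
    return pd.Series(sizes, index=loc_vars.index[closest], name="policy_count")


# Toy data: three well-separated groups of 20 hypothetical policies each.
rng = np.random.default_rng(0)
centres = np.repeat([[0.0, 0.0], [100.0, 100.0], [200.0, 0.0]], 20, axis=0)
loc_vars = pd.DataFrame(
    centres + rng.normal(scale=1.0, size=centres.shape),
    columns=["var_1", "var_2"],
    index=pd.RangeIndex(1, 61, name="policy_id"),
)
rep_counts = pick_representatives(loc_vars, n_clusters=3)
print(rep_counts)
```

Using a real policy as the representative, rather than the centroid itself, keeps every model point a valid policy that a projection model can run.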

### Key Libraries
- `gradio` – UI and file interface
- `pandas`, `numpy` – Data manipulation
- `scikit-learn` – K-Means clustering
- `matplotlib`, `PIL` – Visualization

---

## 📊 Output Summary

The application generates:

- 📈 **Cluster vs Actual Comparisons**
- 🖼️ **Cashflow Time Series Plots**
- ⚖️ **Per-Cluster Scatter Plots**
- 📋 **Summary Tables**
- 📉 **Mean Absolute Error Bar Charts**

All results come from directly comparing cluster-aggregated estimates against the corresponding totals computed from the full, policy-by-policy dataset.
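One way to form such a comparison is to scale each representative policy's value by the number of policies in its cluster, sum the results, and measure the relative difference from the full-portfolio total. The sketch below assumes that approach; the function name `relative_error`, the column `pv_net_cf`, and the toy figures are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import pandas as pd


def relative_error(values: pd.DataFrame, rep_counts: pd.Series) -> pd.Series:
    """Relative error of the cluster-based estimate for each column.

    values     : per-policy values, indexed by policy id
    rep_counts : cluster sizes, indexed by representative policy id
    """
    actual = values.sum()
    # Each representative stands in for every policy in its cluster.
    estimate = values.loc[rep_counts.index].mul(rep_counts, axis=0).sum()
    return estimate / actual - 1


# Five hypothetical policies in two clusters: {1, 2, 3} and {4, 5}.
values = pd.DataFrame({"pv_net_cf": [100.0, 110.0, 90.0, 200.0, 210.0]},
                      index=[1, 2, 3, 4, 5])
rep_counts = pd.Series({1: 3, 4: 2})  # policies 1 and 4 are the representatives

# Estimate is 100*3 + 200*2 = 700 against an actual total of 710.
print(relative_error(values, rep_counts))
```

Taking the absolute value of this error gives a per-scenario measure that can be compared across calibration methods.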

---

## 📚 Attribution & References

Inspired by the [Lifelib](https://lifelib.io) open-source project:

> lifelib Developers. (2025). *Model Point Clustering*. In **lifelib: Life actuarial models in Python**.  
> [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)

Notebook reference:  
[Cluster Model Points – Lifelib Notebook](https://colab.research.google.com/github/lifelib-dev/lifelib/blob/current/lifelib/libraries/cluster/cluster_model_points.ipynb)

---

## 🛠️ Local Setup

To run locally:

```bash
# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering

# Install dependencies
pip install -r requirements.txt

# Launch app
python app.py
```

---

## 📜 License

This project is open source under the MIT License.