---
title: Model Point Clustering
emoji: 🧮
colorFrom: yellow
colorTo: green
sdk: gradio
sdk_version: 5.31.0
app_file: app.py
pinned: false
license: mit
tags:
- actuarial
- clustering
- model-points
- insurance
- gradio
- data-science
- present-values
- policy-attributes
- cashflows
- machine-learning
short_description: Cluster insurance policies into representative model points.
---

# 🧮 Model Point Clustering Dashboard

An interactive dashboard for calibrating and evaluating **model points** using K-Means clustering. Designed for actuaries and data scientists working with large insurance portfolios.

[![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/raw/main/open-in-hf-spaces-sm.svg)](https://huggingface.co/spaces/alidenewade/model-point-clustering)

---

## 📌 Overview

This application performs **cluster-based model point selection** by grouping similar policies to represent large portfolios more efficiently.

You can choose from three clustering calibration methods:

- **Annual Cashflows**
- **Policy Attributes**
- **Present Values**

It compares how well each clustering method replicates actual values across base, lapse, and mortality stress scenarios.

---

## 🔍 Use Cases

- Model point reduction for valuation and projections
- Policy summarization for faster simulations
- Stress testing comparison across representative points
- Actuarial model validation and calibration studies

---

## 📈 Features

### Calibration Methods

- **Cashflows**: Captures policy behavior over time.
- **Attributes**: Uses demographic/product characteristics.
- **Present Values**: Focuses on total liability or cashflow values.

### Interactive Tabs

- **Summary**: Bar chart of absolute PV Net Cashflow errors.
- **Cashflow Calibration**: Visual and tabular comparisons based on cashflows.
- **Policy Attribute Calibration**: Analysis using static policy data.
- **Present Value Calibration**: PV-based clustering with stress testing.

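
The error shown in the **Summary** tab can be illustrated with a small sketch. It assumes the error is the absolute relative difference between the cluster-based estimate and the seriatim total, `|estimate / actual - 1|`; the README does not state the exact formula, so treat this as an assumption. The function name, the scenario labels, and all numbers are hypothetical and are not results from the app.

```python
def abs_relative_error(estimate, actual):
    """Absolute relative error of a cluster-based estimate vs. the seriatim total.

    Assumed definition: |estimate / actual - 1|.
    """
    if actual == 0:
        raise ValueError("actual total must be non-zero")
    return abs(estimate / actual - 1.0)


# Made-up PV Net Cashflow totals, purely for illustration.
actual_pv = {"base": 1000.0, "lapse50": 800.0, "mort15": 900.0}
estimate_pv = {"base": 990.0, "lapse50": 820.0, "mort15": 900.0}

# One error per scenario; a bar chart would plot these values.
errors = {s: abs_relative_error(estimate_pv[s], actual_pv[s]) for s in actual_pv}
for scenario, err in errors.items():
    print(f"{scenario}: {err:.2%}")
```

Computing one such error per calibration method and scenario gives the grid of values that a summary bar chart would compare.
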
### Scenario Support

- Base Scenario
- Lapse Stress (+50%)
- Mortality Stress (+15%)

---

## 📁 Required Inputs

Upload **7 `.xlsx` files**, or use the example files by clicking **Load Example Data**.

| File Type | Description |
|----------|-------------|
| `cashflows_seriatim_10K.xlsx` | Base cashflows per policy |
| `cashflows_seriatim_10K_lapse50.xlsx` | Cashflows under lapse stress |
| `cashflows_seriatim_10K_mort15.xlsx` | Cashflows under mortality stress |
| `model_point_table.xlsx` | Policy attributes (age, term, etc.) |
| `pv_seriatim_10K.xlsx` | Present values for base |
| `pv_seriatim_10K_lapse50.xlsx` | PVs under lapse stress |
| `pv_seriatim_10K_mort15.xlsx` | PVs under mortality stress |

Example directory structure:

```
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
```

---

## ⚙️ How to Use

1. **Launch the App**
   Click the "Open in Spaces" button or run `app.py`.

2. **Upload or Load Files**
   - Upload all 7 required `.xlsx` files.
   - Or click **"Load Example Data"**.

3. **Run Analysis**
   Click **"Analyze Dataset"** to generate cluster reps, plots, and comparisons.

4. **Explore Tabs**
   - 📊 **Summary**: Calibration errors across scenarios.
   - 💸 **Cashflow Calibration**: Clustered vs actual based on cashflows.
   - 👤 **Policy Attribute Calibration**: Calibrated via policy data.
   - 💰 **Present Value Calibration**: Uses PVs directly.

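
What the analysis step does conceptually can be sketched as follows. This is a minimal illustration, not the code in `app.py`: the function name `select_model_points`, the synthetic cashflow array, and the cluster count are all assumptions. It fits K-Means with scikit-learn, picks the real policy closest to each centroid as the representative, weights it by the number of policies in its cluster, and compares the scaled-up estimate with the seriatim total.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min


def select_model_points(features, n_clusters, seed=0):
    """Return (row indices of representative policies, policies per cluster).

    Hypothetical helper for illustration only.
    """
    X = np.ascontiguousarray(features, dtype=float)
    kmeans = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit(X)
    # For each centroid, find the index of the actual policy nearest to it.
    rep_idx, _ = pairwise_distances_argmin_min(kmeans.cluster_centers_, X)
    # Count how many policies were assigned to each cluster.
    counts = np.bincount(kmeans.labels_, minlength=n_clusters)
    return rep_idx, counts


# Synthetic stand-in for a seriatim cashflow table (rows = policies,
# columns = projection periods); not the example data shipped with the app.
rng = np.random.default_rng(0)
cashflows = rng.normal(loc=100.0, scale=20.0, size=(1000, 12))

rep_idx, counts = select_model_points(cashflows, n_clusters=50)

# Scale each representative's cashflows by its cluster size, then compare
# the aggregated estimate with the total over all policies.
estimate = (cashflows[rep_idx] * counts[:, None]).sum(axis=0)
actual = cashflows.sum(axis=0)
rel_error = estimate / actual - 1
print(f"Largest absolute relative error: {np.abs(rel_error).max():.2%}")
```
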
---

## 🧠 Behind the Scenes

### Core Engine: `Clusters` Class

Encapsulates K-Means logic for:

- Clustering using selected variables
- Selecting representative policies
- Aggregating actual vs estimated outputs
- Plotting cashflows, PVs, and scatter comparisons

### Key Libraries

- `gradio` – UI and file interface
- `pandas`, `numpy` – Data manipulation
- `scikit-learn` – K-Means clustering
- `matplotlib`, `PIL` – Visualization

---

## 📊 Output Summary

The application generates:

- 📈 **Cluster vs Actual Comparisons**
- 🖼️ **Cashflow Time Series Plots**
- ⚖️ **Per-Cluster Scatter Plots**
- 📋 **Summary Tables**
- 📉 **Mean Absolute Error Bar Charts**

All results are based on direct comparison of cluster-aggregated estimates vs original full dataset metrics.

---

## 📚 Attribution & References

Inspired by the [Lifelib](https://lifelib.io) open-source project:

> lifelib Developers. (2025). *Model Point Clustering*. In **lifelib: Life actuarial models in Python**.
> [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)

Notebook reference: [Cluster Model Points – Lifelib Notebook](https://colab.research.google.com/github/lifelib-dev/lifelib/blob/current/lifelib/libraries/cluster/cluster_model_points.ipynb)

---

## 🛠️ Local Setup

To run locally:

```bash
# Clone the repo
git clone https://github.com/alidenewade/model-point-clustering.git
cd model-point-clustering

# Install dependencies
pip install -r requirements.txt

# Launch app
python app.py
```

---

## 📜 License

This project is open source under the MIT License.