license: mit
short_description: Cluster-based model point selection for actuarial analysis.
---

# Cluster Model Points Analysis

---

This application helps actuaries and financial professionals analyze and select representative **model points** for large insurance portfolios using **K-Means clustering**. By calibrating the clusters on different variable sets, such as **cashflows**, **policy attributes**, or **present values**, you can assess how accurately the selected model points reproduce the portfolio across financial metrics and stress scenarios.

**Note:** This Gradio application reproduces the core logic and analysis of one of the example notebooks from the [Lifelib](https://lifelib.io/) project, specifically the one on model point clustering and calibration. Lifelib is an open-source Python library of actuarial models.

## Key Features

* **Flexible Calibration Methods**: Choose to calibrate your clusters using:
  * **Annual Cashflows**: Ideal for capturing the dynamic financial behavior of policies.
  * **Policy Attributes**: Useful for segmenting based on static characteristics like age, term, and sum assured.
  * **Present Values**: Focus on accurately replicating the overall value of the portfolio.
* **Scenario Analysis**: Evaluate model point accuracy under base, lapse stress, and mortality stress scenarios.
* **Interactive Visualizations**: Gain insights through:
  * **Time-series plots** comparing actual vs. estimated cashflows for different scenarios.
  * **Scatter plots** showing per-cluster actual vs. estimated values.
  * **Summary bar chart** comparing calibration errors across methods and scenarios.
* **Data Upload and Example Data**: Easily upload your own `.xlsx` files or use the provided example dataset to get started immediately.

---

## Getting Started

### Running the Application

This application is designed to run as a Gradio Space. To run it locally, install Gradio and make sure the required files are in place, then launch it with `python app.py`.

### Preparing Your Data

The application expects seven `.xlsx` files. In the cashflow and present value files, `policy_id` must be the index; the policy data file must contain the columns listed below.

**Required Files:**

* `cashflows_seriatim_10K.xlsx`: Base scenario cashflows.
* `cashflows_seriatim_10K_lapse50.xlsx`: Cashflows under a lapse stress (+50%).
* `cashflows_seriatim_10K_mort15.xlsx`: Cashflows under a mortality stress (+15%).
* `model_point_table.xlsx`: Policy data including `age_at_entry`, `policy_term`, `sum_assured`, and `duration_mth`.
* `pv_seriatim_10K.xlsx`: Present values for the base scenario.
* `pv_seriatim_10K_lapse50.xlsx`: Present values under the lapse stress (+50%).
* `pv_seriatim_10K_mort15.xlsx`: Present values under the mortality stress (+15%).
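
You can check the structure described above before uploading. The sketch below is a minimal pre-flight check, not code from the app. It assumes pandas is installed, with an Excel engine such as `openpyxl` for `.xlsx` files. It also assumes `policy_id` is the first column of each cashflow and present value workbook, and that all six of those files cover the same policies. `load_table` and `check_inputs` are hypothetical helper names.

```python
import pandas as pd

# Policy columns the README lists for model_point_table.xlsx.
POLICY_COLUMNS = ["age_at_entry", "policy_term", "sum_assured", "duration_mth"]


def load_table(path):
    """Read one workbook, using its first column (assumed to be policy_id) as the index.

    pandas needs an Excel engine such as openpyxl to read .xlsx files.
    """
    return pd.read_excel(path, index_col=0)


def check_inputs(policy_data, indexed_tables):
    """Return a list of problems found; an empty list means the inputs look consistent.

    policy_data:    DataFrame loaded from model_point_table.xlsx
    indexed_tables: {file name: DataFrame} for the six cashflow and PV workbooks
    """
    problems = []
    missing = [c for c in POLICY_COLUMNS if c not in policy_data.columns]
    if missing:
        problems.append(f"model_point_table.xlsx is missing columns: {missing}")

    reference = None  # sorted policy IDs of the first table, compared against the rest
    for name, df in indexed_tables.items():
        if df.index.name != "policy_id":
            problems.append(f"{name}: index is {df.index.name!r}, expected 'policy_id'")
        if not df.index.is_unique:
            problems.append(f"{name}: duplicate policy_id values")
        if reference is None:
            reference = df.index.sort_values()
        elif not df.index.sort_values().equals(reference):
            problems.append(f"{name}: policy IDs differ from the other files")
    return problems
```

In practice, load the six cashflow and PV files with `load_table` into a dict keyed by file name. Load `model_point_table.xlsx` the same way, then inspect whatever `check_inputs` returns.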

**Example Data:**

For quick testing, place the example `.xlsx` files in an `eg_data` directory next to `app.py`, then use the "Load Example Data" button in the interface.

The expected layout is:

<pre><code>
├── app.py
└── eg_data/
    ├── cashflows_seriatim_10K.xlsx
    ├── cashflows_seriatim_10K_lapse50.xlsx
    ├── cashflows_seriatim_10K_mort15.xlsx
    ├── model_point_table.xlsx
    ├── pv_seriatim_10K.xlsx
    ├── pv_seriatim_10K_lapse50.xlsx
    └── pv_seriatim_10K_mort15.xlsx
</code></pre>

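Before clicking "Load Example Data", you may want to confirm that all seven workbooks are in the `eg_data` folder. The following standard-library sketch does that. `missing_example_files` is a hypothetical helper, not part of the app. It only checks that the files exist, not what they contain, and it assumes you run it from the folder that contains `app.py`.

```python
from pathlib import Path

# The seven workbooks listed under "Required Files".
REQUIRED_FILES = [
    "cashflows_seriatim_10K.xlsx",
    "cashflows_seriatim_10K_lapse50.xlsx",
    "cashflows_seriatim_10K_mort15.xlsx",
    "model_point_table.xlsx",
    "pv_seriatim_10K.xlsx",
    "pv_seriatim_10K_lapse50.xlsx",
    "pv_seriatim_10K_mort15.xlsx",
]


def missing_example_files(data_dir):
    """Return the required file names that are not present in data_dir."""
    data_dir = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (data_dir / name).is_file()]


if __name__ == "__main__":
    # Relative to the current working directory (assumed to hold app.py).
    eg_data = Path("eg_data")
    missing = missing_example_files(eg_data)
    if missing:
        print(f"Missing from {eg_data}: {', '.join(missing)}")
    else:
        print(f"All example files found in {eg_data}")
```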
---

## Usage

1. **Launch the Application**: Run the `app.py` script.
2. **Upload or Load Data**:
   * **Upload Your Own**: Click the "Upload Files" buttons and select your corresponding `.xlsx` files.
   * **Load Example**: Click the "Load Example Data" button. This will pre-fill the file paths with the example data (assuming they are in the `eg_data` directory).
3. **Analyze Dataset**: Once all files are loaded (either uploaded or from examples), click the "Analyze Dataset" button.
4. **View Results**: Navigate through the tabs:
   * **Summary**: See an overall comparison of calibration methods based on error in total PV Net Cashflow.
   * **Cashflow Calibration**: View detailed comparisons and plots when clusters are calibrated using cashflows.
   * **Policy Attribute Calibration**: See results when policy attributes are used for calibration.
   * **Present Value Calibration**: Explore outcomes when present values are the basis for clustering.

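The Summary tab compares calibration methods by their error in total PV Net Cashflow, but this README does not define how that error is computed. The sketch below assumes a relative deviation, `estimate / actual - 1`, of the model-point total from the seriatim total. It computes that error for each calibration method and scenario. The function names, the dictionary layout, the scenario labels (`base`, `lapse50`, `mort15`, taken from the file names) and the numbers in the demo are all hypothetical.

```python
def relative_error(actual, estimate):
    """Relative deviation of an estimated total from the seriatim (actual) total."""
    if actual == 0:
        raise ValueError("actual total is zero; relative error is undefined")
    return estimate / actual - 1.0


def error_table(actual_totals, estimated_totals):
    """Build {method: {scenario: relative error}}.

    actual_totals:    {scenario: seriatim total PV of net cashflow}
    estimated_totals: {method: {scenario: total estimated from the model points}}
    """
    return {
        method: {s: relative_error(actual_totals[s], est[s]) for s in actual_totals}
        for method, est in estimated_totals.items()
    }


if __name__ == "__main__":
    # Made-up totals, only to show the shape of the result.
    actual = {"base": 1000.0, "lapse50": 900.0, "mort15": 950.0}
    estimated = {
        "cashflows": {"base": 1002.0, "lapse50": 905.0, "mort15": 948.0},
        "policy attributes": {"base": 1030.0, "lapse50": 940.0, "mort15": 970.0},
    }
    for method, errors in error_table(actual, estimated).items():
        print(method, {s: f"{e:+.2%}" for s, e in errors.items()})
```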
---

The core of this application is the `Clusters` class, which encapsulates the K-Means clustering logic. For each cluster it selects a representative policy. It also provides methods to aggregate values by cluster and to compare actual portfolio values with estimates derived from the representatives.
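
The `Clusters` class itself is not reproduced in this README. The following is a simplified, hypothetical stand-in that illustrates the idea, and it works in four steps:

- It clusters policies on a chosen set of calibration variables.
- It takes the member closest to each cluster centre as that cluster's representative model point.
- It weights each representative by its cluster's policy count.
- It sums the weighted values to estimate portfolio totals.

The weighting by policy count is an assumption of this sketch. To keep the example dependency-light, it runs plain Lloyd's k-means in NumPy with deterministic farthest-first seeding, instead of the `sklearn.cluster.KMeans` the app uses. `ClusterSketch`, `lloyd_kmeans` and their methods are illustrative names.

```python
import numpy as np


def _sq_dists(X, centers):
    """Squared Euclidean distance from every row of X to every center, shape (n, k)."""
    return ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)


def lloyd_kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with farthest-first seeding (stand-in for sklearn's KMeans)."""
    # Seed with row 0, then repeatedly add the row farthest from all chosen seeds.
    seeds = [0]
    d2 = ((X - X[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(d2.argmax())
        seeds.append(nxt)
        d2 = np.minimum(d2, ((X - X[nxt]) ** 2).sum(axis=1))
    centers = X[seeds].astype(float)

    for _ in range(n_iter):
        labels = _sq_dists(X, centers).argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    labels = _sq_dists(X, centers).argmin(axis=1)
    return labels, centers


class ClusterSketch:
    """Hypothetical, simplified stand-in for the app's Clusters class."""

    def __init__(self, calib_vars, n_clusters, n_iter=100):
        # Rows are policies, columns are the calibration variables
        # (e.g. annual cashflows, policy attributes, or present values).
        X = np.asarray(calib_vars, dtype=float)
        self.labels, self.centers = lloyd_kmeans(X, n_clusters, n_iter)
        d2 = _sq_dists(X, self.centers)
        # Only a cluster's own members may represent it (others are masked with inf).
        members = self.labels[:, None] == np.arange(n_clusters)[None, :]
        self.rep_index = np.where(members, d2, np.inf).argmin(axis=0)
        # Number of policies each representative stands for.
        self.cluster_size = np.bincount(self.labels, minlength=n_clusters)

    def estimate(self, values):
        """Weight each representative's values by its cluster size and sum over clusters.

        values: shape (n_policies,), e.g. PV of net cashflow, or
                shape (n_policies, n_periods), e.g. annual cashflows.
        """
        values = np.asarray(values, dtype=float)
        return self.cluster_size @ values[self.rep_index]

    def compare_total(self, values):
        """Return (actual, estimated) portfolio totals."""
        values = np.asarray(values, dtype=float)
        return values.sum(axis=0), self.estimate(values)


if __name__ == "__main__":
    # Toy data: two obvious groups of policies described by two calibration variables.
    policies = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
    pv = np.array([10.0, 11.0, 9.0, 100.0, 98.0, 102.0])
    clusters = ClusterSketch(policies, n_clusters=2)
    actual, estimated = clusters.compare_total(pv)
    print("representatives:", clusters.rep_index, "sizes:", clusters.cluster_size)
    print(f"actual {actual:.1f}, estimated {estimated:.1f}")
```

The representatives are fixed once the clusters are calibrated. One way to use such an object for scenario analysis is to pass it stressed cashflows or present values, such as the lapse or mortality scenarios. This shows how well base-calibrated model points hold up.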

The application leverages:

* **`gradio`**: For building the interactive web interface.
* **`numpy` and `pandas`**: For efficient data manipulation and numerical operations.
* **`sklearn.cluster.KMeans`**: For performing the clustering algorithm.
* **`matplotlib` and `PIL`**: For generating and displaying plots.

---

## Citation

This application's methodology for model point clustering and analysis is a direct adaptation of a notebook from the [Lifelib](https://lifelib.io/) open-source actuarial library. If you use this application or its underlying logic in academic or professional work, please cite the Lifelib project.

Lifelib does not specify a formal citation style, so a suitable citation for the project itself is:

> lifelib Developers. (Current Year). *lifelib: Life actuarial models in Python*. Retrieved from [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)

(Replace "Current Year" with the year you accessed or used the library, for example "2018-2025" as shown on the Lifelib GitHub page, or simply the current year.)

If you are referencing the specific notebook this application is based on, you can extend the citation as follows, substituting the notebook's title and URL from the Lifelib GitHub repository or documentation:

> lifelib Developers. (Current Year). *[Title of Specific Lifelib Notebook, e.g., "Model Point Clustering Example"]*. In *lifelib: Life actuarial models in Python*. Retrieved from [URL of the specific notebook]

---

## Need help or have suggestions?

---

## 🚀 Key Features

- **Flexible Calibration Methods**:
  - **Annual Cashflows**: Ideal for capturing dynamic financial behavior of policies.
  - **Policy Attributes**: Useful for segmenting based on static characteristics like age, term, and sum assured.
  - **Present Values**: Focus on accurately replicating the overall value of the portfolio.
- **Scenario Analysis**: Evaluate model point accuracy under base, lapse stress, and mortality stress scenarios.
- **Interactive Visualizations**:
  - Time-series plots comparing actual vs. estimated cashflows.
  - Scatter plots for per-cluster actual vs. estimated values.
  - Summary bar charts comparing calibration errors.
- **Data Upload and Example Data**: Upload your own `.xlsx` files or use the included sample data.

---

## 📦 Getting Started

### Running the Application

This is a [Gradio Space](https://huggingface.co/spaces) application. If you're running locally, make sure you have Gradio installed and all required files in place. Launch it using:

```bash
python app.py