alidenewade committed · Commit 87611c6 · verified · 1 Parent(s): 05ec79b

Update README.md

Files changed (1)
  1. README.md +18 -97
README.md CHANGED
@@ -10,116 +10,37 @@ pinned: false
10
  license: mit
11
  short_description: Cluster-based model point selection for actuarial analysis.
12
  ---
 
13
  # Cluster Model Points Analysis
14
 
15
  ---
16
 
17
  This application provides a powerful tool for actuaries and financial professionals to analyze and select representative **model points** for large insurance portfolios using **K-Means clustering**. By calibrating clusters based on different variable sets—such as **cashflows**, **policy attributes**, or **present values**—you can assess the accuracy of your model point selection across various financial metrics and stress scenarios.
18
 
19
- **Note:** This Gradio application is inspired by and duplicates the core logic and analysis presented in one of the [Lifelib](https://lifelib.io/) project's example notebooks, specifically related to model point clustering and calibration. Lifelib is an open-source Python library of actuarial models.
20
-
21
- ## Key Features
22
-
23
- * **Flexible Calibration Methods**: Choose to calibrate your clusters using:
24
- * **Annual Cashflows**: Ideal for capturing the dynamic financial behavior of policies.
25
- * **Policy Attributes**: Useful for segmenting based on static characteristics like age, term, and sum assured.
26
- * **Present Values**: Focus on accurately replicating the overall value of the portfolio.
27
- * **Scenario Analysis**: Evaluate model point accuracy under base, lapse stress, and mortality stress scenarios.
28
- * **Interactive Visualizations**: Gain insights through:
29
- * **Time-series plots** comparing actual vs. estimated cashflows for different scenarios.
30
- * **Scatter plots** showing per-cluster actual vs. estimated values.
31
- * **Summary bar chart** comparing calibration errors across methods and scenarios.
32
- * **Data Upload and Example Data**: Easily upload your own `.xlsx` files or use the provided example dataset to get started immediately.
33
 
34
  ---
35
 
36
- ## Getting Started
37
-
38
- ### Running the Application
39
-
40
- This application is designed to run as a Gradio Space. You can launch it directly if you have Gradio installed and the required files in place.
41
-
42
- ### Preparing Your Data
43
-
44
- The application expects seven `.xlsx` files. Ensure your data is structured correctly with `policy_id` as the index for cashflow and present value files, and specific columns for policy data.
45
-
46
- **Required Files:**
47
-
48
- * `cashflows_seriatim_10K.xlsx`: Base scenario cashflows.
49
- * `cashflows_seriatim_10K_lapse50.xlsx`: Cashflows under a lapse stress (+50%).
50
- * `cashflows_seriatim_10K_mort15.xlsx`: Cashflows under a mortality stress (+15%).
51
- * `model_point_table.xlsx`: Policy data including `age_at_entry`, `policy_term`, `sum_assured`, and `duration_mth`.
52
- * `pv_seriatim_10K.xlsx`: Present values for the base scenario.
53
- * `pv_seriatim_10K_lapse50.xlsx`: Present values under a lapse stress.
54
- * `pv_seriatim_10K_mort15.xlsx`: Present values under a mortality stress.
55
-
56
- **Example Data:**
57
-
58
- For quick testing, place the example `.xlsx` files within an `eg_data` directory in the same location as your application script. You can then use the "Load Example Data" button within the interface.
59
-
60
- The expected structure for example files is:
61
-
62
- <pre><code>
63
- ├── app.py
64
- └── eg_data/
65
- ├── cashflows_seriatim_10K.xlsx
66
- ├── cashflows_seriatim_10K_lapse50.xlsx
67
- ├── cashflows_seriatim_10K_mort15.xlsx
68
- ├── model_point_table.xlsx
69
- ├── pv_seriatim_10K.xlsx
70
- ├── pv_seriatim_10K_lapse50.xlsx
71
- └── pv_seriatim_10K_mort15.xlsx
72
- </code></pre>
73
 
74
 
75
  ---
76
 
77
- ## How to Use
78
-
79
- 1. **Launch the Application**: Run the `app.py` script.
80
- 2. **Upload or Load Data**:
81
- * **Upload Your Own**: Click the "Upload Files" buttons and select your corresponding `.xlsx` files.
82
- * **Load Example**: Click the "Load Example Data" button. This will pre-fill the file paths with the example data (assuming they are in the `eg_data` directory).
83
- 3. **Analyze Dataset**: Once all files are loaded (either uploaded or from examples), click the "Analyze Dataset" button.
84
- 4. **View Results**: Navigate through the tabs:
85
- * **Summary**: See an overall comparison of calibration methods based on error in total PV Net Cashflow.
86
- * **Cashflow Calibration**: View detailed comparisons and plots when clusters are calibrated using cashflows.
87
- * **Policy Attribute Calibration**: See results when policy attributes are used for calibration.
88
- * **Present Value Calibration**: Explore outcomes when present values are the basis for clustering.
89
-
90
- ---
91
 
92
- ## Technical Details
93
-
94
- The core of this application is the `Clusters` class, which encapsulates the K-Means clustering logic. It identifies representative policies for each cluster and provides methods to aggregate and compare actual portfolio values against estimates derived from the clustered representatives.
95
-
96
- The application leverages:
97
-
98
- * **`gradio`**: For building the interactive web interface.
99
- * **`numpy` and `pandas`**: For efficient data manipulation and numerical operations.
100
- * **`sklearn.cluster.KMeans`**: For performing the clustering algorithm.
101
- * **`matplotlib` and `PIL`**: For generating and displaying plots.
102
-
103
- ---
104
-
105
- ## Citation
106
-
107
- This application's methodology for model point clustering and analysis is a direct adaptation of a notebook found in the [Lifelib](https://lifelib.io/) open-source actuarial library. If you use this application or its underlying logic in academic or professional contexts, please cite the Lifelib project.
108
-
109
- Based on the information from their GitHub and `lifelib.io`, Lifelib is an open-source project that encourages contributions and aims to be transparent and versatile. While they don't specify a formal citation style, a suitable citation for the Lifelib project itself would be:
110
-
111
- > lifelib Developers. (Current Year). *lifelib: Life actuarial models in Python*. Retrieved from [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)
112
-
113
- (Replace "Current Year" with the year you accessed or used the library. For example, "2018-2025" as seen on their GitHub page, or just the current year.)
114
-
115
- If you are referencing a specific notebook that this application is based on, you could extend the citation as follows (you would need to identify the exact notebook on the Lifelib GitHub or documentation):
116
-
117
- > lifelib Developers. (Current Year). *[Title of Specific Lifelib Notebook, e.g., "Model Point Clustering Example"]*. In *lifelib: Life actuarial models in Python*. Retrieved from https://github.com/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb
118
-
119
- ---
120
-
121
- ## Need help or have suggestions?
122
 
123
- Feel free to open an issue or suggest improvements if you encounter any problems or have ideas for new features.
124
 
125
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
10
  license: mit
11
  short_description: Cluster-based model point selection for actuarial analysis.
12
  ---
13
+
14
  # Cluster Model Points Analysis
15
 
16
  ---
17
 
18
  This application provides a powerful tool for actuaries and financial professionals to analyze and select representative **model points** for large insurance portfolios using **K-Means clustering**. By calibrating clusters based on different variable sets—such as **cashflows**, **policy attributes**, or **present values**—you can assess the accuracy of your model point selection across various financial metrics and stress scenarios.
19
 
20
+ > **Note:** This Gradio application is inspired by and duplicates the core logic and analysis presented in one of the [Lifelib](https://lifelib.io/) project's example notebooks, specifically related to model point clustering and calibration. Lifelib is an open-source Python library of actuarial models.
21
 
22
  ---
23
 
24
+ ## 🚀 Key Features
25
 
26
+ - **Flexible Calibration Methods**:
27
+   - **Annual Cashflows**: Ideal for capturing the dynamic financial behavior of policies.
28
+   - **Policy Attributes**: Useful for segmenting based on static characteristics like age, term, and sum assured.
29
+   - **Present Values**: Focus on accurately replicating the overall value of the portfolio.
30
+ - **Scenario Analysis**: Evaluate model point accuracy under base, lapse stress, and mortality stress scenarios.
31
+ - **Interactive Visualizations**:
32
+   - Time-series plots comparing actual vs. estimated cashflows.
33
+   - Scatter plots for per-cluster actual vs. estimated values.
34
+   - Summary bar charts comparing calibration errors.
35
+ - **Data Upload and Example Data**: Upload your own `.xlsx` files or use the included sample data.
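The general idea behind cluster-based model point selection can be sketched as follows. This is a minimal illustration, not the application's actual code: the function names `select_model_points` and `estimate_total` are hypothetical, and the choice of the policy nearest each cluster centre as the representative is an assumption about how such a selection is typically done. Each representative is weighted by the number of policies in its cluster, and a portfolio total is then estimated from the representatives alone.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_model_points(features, n_clusters, seed=0):
    """Pick one representative policy per K-Means cluster (hypothetical helper).

    features: 2-D array, one row per policy (e.g. cashflows, attributes, or PVs).
    Returns (rep_idx, weights): row index of each representative and its cluster size.
    """
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    # Distance from every policy to every cluster centre, shape (n_policies, n_clusters).
    dist = km.transform(features)
    rep_idx = np.empty(n_clusters, dtype=int)
    for k in range(n_clusters):
        members = np.flatnonzero(km.labels_ == k)
        # Representative = the member policy closest to the centre of cluster k.
        rep_idx[k] = members[np.argmin(dist[members, k])]
    # Weight = number of policies the representative stands in for.
    weights = np.bincount(km.labels_, minlength=n_clusters)
    return rep_idx, weights


def estimate_total(values, rep_idx, weights):
    """Estimate a portfolio total from the weighted representatives only."""
    return float(np.sum(values[rep_idx] * weights))
```

Comparing `estimate_total(values, rep_idx, weights)` against `values.sum()` for a given metric gives the kind of calibration error that the scenario comparisons above refer to. The sketch assumes that `n_clusters` does not exceed the number of distinct policies, so that no cluster is empty.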
36
 
37
  ---
38
 
39
+ ## 📦 Getting Started
40
 
41
+ ### Running the Application
42
 
43
+ This is a [Gradio Space](https://huggingface.co/spaces) application. If you're running locally, make sure you have Gradio installed and all required files in place. Launch it using:
44
 
45
+ ```bash
46
+ python app.py