Spaces: alidenewade/model-point-clustering

alidenewade committed on May 23

Commit 87611c6 (verified) · 1 parent: 05ec79b

Update README.md

Files changed (1)

README.md +18 -97

README.md CHANGED

@@ -10,116 +10,37 @@ pinned: false
 license: mit
 short_description: Cluster-based model point selection for actuarial analysis.
 ---
 # Cluster Model Points Analysis
 ---
 This application provides a powerful tool for actuaries and financial professionals to analyze and select representative **model points** for large insurance portfolios using **K-Means clustering**. By calibrating clusters based on different variable sets—such as **cashflows**, **policy attributes**, or **present values**—you can assess the accuracy of your model point selection across various financial metrics and stress scenarios.
-**Note:** This Gradio application is inspired by and duplicates the core logic and analysis presented in one of the [Lifelib](https://lifelib.io/) project's example notebooks, specifically related to model point clustering and calibration. Lifelib is an open-source Python library of actuarial models.
-## Key Features
-* **Flexible Calibration Methods**: Choose to calibrate your clusters using:
-    * **Annual Cashflows**: Ideal for capturing the dynamic financial behavior of policies.
-    * **Policy Attributes**: Useful for segmenting based on static characteristics like age, term, and sum assured.
-    * **Present Values**: Focus on accurately replicating the overall value of the portfolio.
-* **Scenario Analysis**: Evaluate model point accuracy under base, lapse stress, and mortality stress scenarios.
-* **Interactive Visualizations**: Gain insights through:
-    * **Time-series plots** comparing actual vs. estimated cashflows for different scenarios.
-    * **Scatter plots** showing per-cluster actual vs. estimated values.
-    * **Summary bar chart** comparing calibration errors across methods and scenarios.
-* **Data Upload and Example Data**: Easily upload your own `.xlsx` files or use the provided example dataset to get started immediately.
 ---
-## Getting Started
-### Running the Application
-This application is designed to run as a Gradio Space. You can launch it directly if you have Gradio installed and the required files in place.
-### Preparing Your Data
-The application expects seven `.xlsx` files. Ensure your data is structured correctly with `policy_id` as the index for cashflow and present value files, and specific columns for policy data.
-**Required Files:**
-* `cashflows_seriatim_10K.xlsx`: Base scenario cashflows.
-* `cashflows_seriatim_10K_lapse50.xlsx`: Cashflows under a lapse stress (+50%).
-* `cashflows_seriatim_10K_mort15.xlsx`: Cashflows under a mortality stress (+15%).
-* `model_point_table.xlsx`: Policy data including `age_at_entry`, `policy_term`, `sum_assured`, and `duration_mth`.
-* `pv_seriatim_10K.xlsx`: Present values for the base scenario.
-* `pv_seriatim_10K_lapse50.xlsx`: Present values under a lapse stress.
-* `pv_seriatim_10K_mort15.xlsx`: Present values under a mortality stress.
-**Example Data:**
-For quick testing, place the example `.xlsx` files within an `eg_data` directory in the same location as your application script. You can then use the "Load Example Data" button within the interface.
-The expected structure for example files is:
-<pre><code>
-├── app.py
-└── eg_data/
-    ├── cashflows_seriatim_10K.xlsx
-    ├── cashflows_seriatim_10K_lapse50.xlsx
-    ├── cashflows_seriatim_10K_mort15.xlsx
-    ├── model_point_table.xlsx
-    ├── pv_seriatim_10K.xlsx
-    ├── pv_seriatim_10K_lapse50.xlsx
-    └── pv_seriatim_10K_mort15.xlsx
-</code></pre>
 ---
-## How to Use
-1.  **Launch the Application**: Run the `app.py` script.
-2.  **Upload or Load Data**:
-    * **Upload Your Own**: Click the "Upload Files" buttons and select your corresponding `.xlsx` files.
-    * **Load Example**: Click the "Load Example Data" button. This will pre-fill the file paths with the example data (assuming they are in the `eg_data` directory).
-3.  **Analyze Dataset**: Once all files are loaded (either uploaded or from examples), click the "Analyze Dataset" button.
-4.  **View Results**: Navigate through the tabs:
-    * **Summary**: See an overall comparison of calibration methods based on error in total PV Net Cashflow.
-    * **Cashflow Calibration**: View detailed comparisons and plots when clusters are calibrated using cashflows.
-    * **Policy Attribute Calibration**: See results when policy attributes are used for calibration.
-    * **Present Value Calibration**: Explore outcomes when present values are the basis for clustering.
----
-## Technical Details
-The core of this application is the `Clusters` class, which encapsulates the K-Means clustering logic. It identifies representative policies for each cluster and provides methods to aggregate and compare actual portfolio values against estimates derived from the clustered representatives.
-The application leverages:
-* **`gradio`**: For building the interactive web interface.
-* **`numpy` and `pandas`**: For efficient data manipulation and numerical operations.
-* **`sklearn.cluster.KMeans`**: For performing the clustering algorithm.
-* **`matplotlib` and `PIL`**: For generating and displaying plots.
----
-## Citation
-This application's methodology for model point clustering and analysis is a direct adaptation of a notebook found in the [Lifelib](https://lifelib.io/) open-source actuarial library. If you use this application or its underlying logic in academic or professional contexts, please cite the Lifelib project.
-Based on the information from their GitHub and `lifelib.io`, Lifelib is an open-source project that encourages contributions and aims to be transparent and versatile. While they don't specify a formal citation style, a suitable citation for the Lifelib project itself would be:
-> lifelib Developers. (Current Year). *lifelib: Life actuarial models in Python*. Retrieved from [https://github.com/lifelib-dev/lifelib](https://github.com/lifelib-dev/lifelib)
-(Replace "Current Year" with the year you accessed or used the library. For example, "2018-2025" as seen on their GitHub page, or just the current year.)
-If you are referencing a specific notebook that this application is based on, you could extend the citation as follows (you would need to identify the exact notebook on the Lifelib GitHub or documentation):
-> lifelib Developers. (Current Year). *[Title of Specific Lifelib Notebook, e.g., "Model Point Clustering Example"]*. In *lifelib: Life actuarial models in Python*. Retrieved from https://github.com/jupyter/notebook/blob/master/docs/source/examples/Notebook/Notebook%20Basics.ipynb
----
-## Need help or have suggestions?
-Feel free to open an issue or suggest improvements if you encounter any problems or have ideas for new features.
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
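The Technical Details section in the previous version of the README describes a `Clusters` class that runs K-Means, identifies a representative policy for each cluster, and compares actual portfolio values with estimates built from those representatives. The snippet below is a minimal sketch of that workflow using NumPy and scikit-learn. It is not the app's actual `Clusters` code: the function names `select_model_points` and `compare_total` are hypothetical. It also makes three assumptions. Each representative is the member policy nearest its cluster centroid. Each representative is scaled by its cluster's policy count. The error is reported as estimate / actual - 1.

```python
import numpy as np
from sklearn.cluster import KMeans


def select_model_points(features, n_clusters, seed=0):
    """Cluster policies with K-Means and pick one representative per cluster.

    features: (n_policies, n_features) array of calibration variables, one row
    per policy (e.g. annual cashflows, policy attributes, or present values).
    Returns (labels, rep_index, weights):
      labels[i]    - cluster assigned to policy i
      rep_index[k] - row of the member policy closest to centroid k
      weights[k]   - number of policies in cluster k
    Assumes no cluster ends up empty.
    """
    features = np.asarray(features, dtype=float)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(features)
    labels = km.labels_
    rep_index = np.empty(n_clusters, dtype=int)
    for k, centre in enumerate(km.cluster_centers_):
        members = np.flatnonzero(labels == k)
        # Search only within cluster k so each representative is one of its own members
        dist = np.linalg.norm(features[members] - centre, axis=1)
        rep_index[k] = members[np.argmin(dist)]
    weights = np.bincount(labels, minlength=n_clusters)
    return labels, rep_index, weights


def compare_total(values, rep_index, weights):
    """Compare actual portfolio totals with the model-point estimate.

    values: (n_policies,) array (e.g. PV net cashflow per policy) or
    (n_policies, n_periods) array (e.g. annual cashflows per policy).
    The estimate scales each representative's values by its cluster size.
    Returns (actual, estimate, relative_error).
    """
    values = np.asarray(values, dtype=float)
    actual = values.sum(axis=0)
    estimate = weights @ values[rep_index]  # sum over clusters of count * rep value
    return actual, estimate, estimate / actual - 1


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attrs = rng.normal(size=(1_000, 3))         # hypothetical policy attributes
    pv_net_cf = rng.normal(100.0, 20.0, 1_000)  # hypothetical PV net cashflow per policy
    labels, reps, w = select_model_points(attrs, n_clusters=50)
    actual, est, err = compare_total(pv_net_cf, reps, w)
    print(f"actual={actual:.1f} estimate={est:.1f} error={err:.2%}")
```

The same `compare_total` call accepts a 2-D array of annual cashflows (one column per year) and returns per-year totals, which is the shape a time-series comparison needs. Policy attributes measured on very different scales, such as age and sum assured, would usually be rescaled before clustering. This sketch leaves that step out.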

 license: mit
 short_description: Cluster-based model point selection for actuarial analysis.
 ---
 # Cluster Model Points Analysis
 ---
 This application provides a powerful tool for actuaries and financial professionals to analyze and select representative **model points** for large insurance portfolios using **K-Means clustering**. By calibrating clusters based on different variable sets—such as **cashflows**, **policy attributes**, or **present values**—you can assess the accuracy of your model point selection across various financial metrics and stress scenarios.
+> **Note:** This Gradio application is inspired by and duplicates the core logic and analysis presented in one of the [Lifelib](https://lifelib.io/) project's example notebooks, specifically related to model point clustering and calibration. Lifelib is an open-source Python library of actuarial models.
 ---
+## 🚀 Key Features
+- **Flexible Calibration Methods**:
+  - **Annual Cashflows**: Ideal for capturing dynamic financial behavior of policies.
+  - **Policy Attributes**: Useful for segmenting based on static characteristics like age, term, and sum assured.
+  - **Present Values**: Focus on accurately replicating the overall value of the portfolio.
+- **Scenario Analysis**: Evaluate model point accuracy under base, lapse stress, and mortality stress scenarios.
+- **Interactive Visualizations**:
+  - Time-series plots comparing actual vs. estimated cashflows.
+  - Scatter plots for per-cluster actual vs. estimated values.
+  - Summary bar charts comparing calibration errors.
+- **Data Upload and Example Data**: Upload your own `.xlsx` files or use the included sample data.
 ---
+## 📦 Getting Started
+### Running the Application
+This is a [Gradio Space](https://huggingface.co/spaces) application. If you're running locally, make sure you have Gradio installed and all required files in place. Launch it using:
+```bash
+python app.py