Create Model_Inputs+Outputs.md
Model_Inputs+Outputs.md
ADDED
@@ -0,0 +1,170 @@
Here we define the inputs and outputs of the "black box" Transformer-based forecasting model (`Enhanced_Business_Model_for_Collaborative_Predictive_Supply_Chain_model.py`) within this collaborative supply chain context.
We categorize them for clarity and describe their format and expected characteristics.
This breakdown specifies the model's data requirements and expected results, and it sets the stage for specifying data preprocessing steps, model architecture, and evaluation metrics.

**I. Inputs**

The inputs are all the data fed into the Transformer model to generate the forecasts. Because the system is meant to be comprehensive and dynamic, the inputs are diverse and fall into several categories:

**A. Historical Sales Data:**

* **Description:** Time-series data of past sales, at the most granular level possible (ideally SKU-store-day).
* **Format:**
    * **Structure:** Typically a tabular format (e.g., CSV, Parquet, database table). Could also be a tensor if pre-processed for the Transformer.
    * **Columns:**
        * `timestamp`: Date and time of the sale (e.g., `YYYY-MM-DD HH:MM:SS` or a Unix timestamp).
        * `sku`: Stock Keeping Unit (unique product identifier).
        * `store_id`: Identifier for the store location.
        * `quantity`: Number of units sold.
        * `price`: Unit price at the time of sale.
        * `discount`: Any discount applied (amount or percentage).
* **Characteristics:**
    * High frequency (daily or even hourly).
    * Potentially millions or billions of rows.
    * May exhibit seasonality, trends, and noise.

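To make the sales schema concrete, here is a minimal sketch of how raw sale rows could be rolled up to the SKU-store-day level mentioned in the description. The `SaleRecord` class and `daily_demand` function are hypothetical names chosen for illustration, and treating `discount` as an absolute amount is an assumption; the source allows either an amount or a percentage.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SaleRecord:
    """One row of the historical sales table (hypothetical container)."""
    timestamp: datetime
    sku: str
    store_id: str
    quantity: int
    price: float
    discount: float  # assumption: absolute amount per unit, not a percentage


def daily_demand(records) -> dict[tuple[str, str, date], int]:
    """Sum units sold per (sku, store_id, calendar day)."""
    totals = defaultdict(int)
    for r in records:
        totals[(r.sku, r.store_id, r.timestamp.date())] += r.quantity
    return dict(totals)


# Illustrative rows only; not real data.
sales = [
    SaleRecord(datetime(2024, 7, 4, 9, 30), "123", "A", 2, 4.99, 0.0),
    SaleRecord(datetime(2024, 7, 4, 17, 5), "123", "A", 3, 4.99, 0.5),
    SaleRecord(datetime(2024, 7, 5, 11, 0), "123", "A", 1, 4.99, 0.0),
]
demand = daily_demand(sales)
```

At the data volumes described above, a columnar tool would likely replace this in-memory loop; the sketch only illustrates the grouping key.
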
**B. Promotional Data:**

* **Description:** Information about past, current, and *planned* promotional activities.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `promotion_id`: Unique identifier for the promotion.
        * `sku`: SKU(s) included in the promotion.
        * `store_id`: Store(s) where the promotion is active.
        * `start_date`: Start date of the promotion.
        * `end_date`: End date of the promotion.
        * `promotion_type`: Type of promotion (e.g., "BOGO," "percentage discount," "fixed price discount," "coupon").
        * `discount_value`: Value of the discount (e.g., 0.2 for a 20% discount, 5.00 for a $5 discount).
        * `marketing_spend`: (Optional) Amount spent on advertising for the promotion.
* **Characteristics:**
    * Less frequent than sales data.
    * Should include *future* planned promotions, which are crucial for forecasting.

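Because planned promotions are known ahead of time, they can be expanded into one flag per day that covers both history and the forecast window. The sketch below shows one possible way to do that. `Promotion` and `promo_flags` are hypothetical names, each promotion row is assumed to refer to a single SKU and store, and `end_date` is assumed to be inclusive.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Promotion:
    """One row of the promotions table (hypothetical container)."""
    promotion_id: str
    sku: str
    store_id: str
    start_date: date
    end_date: date  # assumption: the promotion is still active on this day
    promotion_type: str
    discount_value: float


def promo_flags(promos, sku, store_id, first_day, n_days):
    """Return one 0/1 flag per day for n_days consecutive days from first_day."""
    flags = []
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        active = any(
            p.sku == sku
            and p.store_id == store_id
            and p.start_date <= day <= p.end_date
            for p in promos
        )
        flags.append(int(active))
    return flags


# Illustrative promotion only; not real data.
promos = [
    Promotion("P1", "123", "A", date(2024, 7, 3), date(2024, 7, 5),
              "percentage discount", 0.2),
]
flags = promo_flags(promos, "123", "A", date(2024, 7, 1), 7)
```

The same function works for dates in the future, which is how planned promotions would reach the model as known-in-advance features.
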
**C. Inventory Data:**

* **Description:** Information about current and historical inventory levels.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `timestamp`: Date and time of the inventory snapshot.
        * `sku`: Stock Keeping Unit.
        * `store_id`: Store location (or warehouse ID for wholesalers).
        * `quantity_on_hand`: Number of units currently in stock.
        * `quantity_on_order`: Number of units ordered but not yet received.
        * `reorder_point`: (Optional) The inventory level at which a new order should be placed.
        * `safety_stock`: (Optional) Minimum stock level to keep on hand.
* **Characteristics:**
    * Frequency can vary (daily, weekly).

**D. External Factors:**

* **Description:** Data that is not directly related to sales or inventory but can influence demand.
* **Format:**
    * **Structure:** Can be tabular or time-series data from various sources.
    * **Examples:**
        * **Economic Indicators:** GDP growth, unemployment rate, consumer confidence index, inflation rate. (Typically time-series data from government sources or financial data providers.)
        * **Weather Data:** Temperature, precipitation, forecasts. (Time-series data from weather APIs.)
        * **Holiday/Event Indicators:** Binary indicators (0 or 1) for holidays, major events, school breaks. (Typically a pre-defined calendar.)
        * **Social Media Sentiment:** Aggregated sentiment scores related to the product or brand. (Requires text processing and sentiment analysis.)
        * **Web Traffic Data:** Website visits, product page views, search queries. (Data from web analytics platforms.)
        * **Competitor Data:** Pricing and promotional activity of competitors (if available, often through web scraping or third-party data providers).
* **Characteristics:**
    * Varying frequencies and formats depending on the source.

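As an illustration of the holiday/event indicators listed above, the following sketch turns a pre-defined calendar into a 0/1 series aligned to consecutive days. The `HOLIDAYS` set and the `holiday_indicators` function are hypothetical; a real calendar would come from whatever source the business maintains.

```python
from datetime import date, timedelta

# Hypothetical pre-defined calendar; the dates are examples only.
HOLIDAYS = {date(2024, 7, 4), date(2024, 12, 25)}


def holiday_indicators(first_day, n_days, holidays=HOLIDAYS):
    """Return 1 for each day that appears in the calendar, else 0."""
    return [
        int((first_day + timedelta(days=i)) in holidays)
        for i in range(n_days)
    ]


indicators = holiday_indicators(date(2024, 7, 3), 3)
```
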
**E. Product Metadata:**

* **Description:** Static information about the products.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `sku`: Stock Keeping Unit.
        * `product_category`: Category the product belongs to.
        * `product_subcategory`: Subcategory.
        * `brand`: Brand name.
        * `product_description`: Textual description (may be used for embeddings).
        * `price_tier`: (Optional) Categorization based on price (e.g., "economy," "mid-range," "premium").
* **Characteristics:**
    * Relatively static; changes infrequently.

**F. Store Metadata:**

* **Description:** Static information about each store.
* **Format:**
    * **Structure:** Tabular format.
    * **Columns:**
        * `store_id`: Unique store identifier.
        * `location`: City and state.
        * `store_type`: Physical, online, or mixed.

**II. Outputs**

The outputs are the forecasts generated by the Transformer model.

**A. Probabilistic Forecasts:**

* **Description:** Instead of a single point forecast (e.g., "we will sell 100 units"), the model provides a *probability distribution* of future demand. This quantifies the uncertainty in the forecast.
* **Format:**
    * **Structure:** Typically a set of quantiles (or percentiles) for each SKU-store-future time period.
    * **Example:** For SKU 123, store A, on 2024-07-04, the model might output:
        * `p10`: 80 units (10th percentile - there's a 10% chance demand will be 80 units or less)
        * `p50`: 105 units (50th percentile - median forecast)
        * `p90`: 130 units (90th percentile - there's a 90% chance demand will be 130 units or less)
        * ...and other quantiles as needed (e.g., p25, p75, p95, p99).
* **Characteristics:**
    * Provides a range of possible outcomes, allowing for risk-aware decision-making.
    * Allows for calculation of prediction intervals (e.g., the range from `p10` to `p90` is an 80% interval).

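The quantile output above can be read and checked with a few lines of code. This is a sketch under stated assumptions, not the model's actual post-processing: the function names are hypothetical, and the pinball (quantile) loss is shown only as one common way to score a quantile forecast, since the source does not say which loss or metric the model uses.

```python
def quantiles_are_ordered(forecast):
    """Check that quantile values do not decrease as the level increases."""
    levels = sorted(forecast, key=lambda name: int(name[1:]))
    values = [forecast[name] for name in levels]
    return all(a <= b for a, b in zip(values, values[1:]))


def pinball_loss(actual, predicted, q):
    """Quantile (pinball) loss for one observation at quantile level q."""
    diff = actual - predicted
    return max(q * diff, (q - 1) * diff)


# Values taken from the SKU 123 / store A example above.
forecast = {"p10": 80, "p50": 105, "p90": 130}

# p10 to p90 covers the central 80% of the predicted distribution.
interval_80 = (forecast["p10"], forecast["p90"])
```

With `q = 0.9`, demand above the `p90` forecast is penalized nine times more heavily than demand an equal distance below it, which is what pushes a high quantile upward.
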
**B. Forecast Horizon:**

* **Description:** The length of time into the future for which the model generates forecasts.
* **Format:**
    * Defined by the model configuration and the needs of the business. Could be days, weeks, or months.
    * Typically specified as a number of time steps (e.g., 28 days, 12 weeks).
* **Characteristics:**
    * Longer horizons generally have greater uncertainty.

**C. Forecast Granularity:**

* **Description:** The level of detail at which the forecasts are generated (SKU-store-day, SKU-region-week, etc.).
* **Format:**
    * Determined by the model and the available data.
    * Should align with the business needs (e.g., retailers need store-level forecasts, while wholesalers might need regional forecasts).

**D. Forecast Timestamps:**

* **Description:** The specific dates and times for which the forecasts are generated.
* **Format:**
    * A list or sequence of timestamps corresponding to the forecast horizon and granularity.
    * Example: `[2024-07-04, 2024-07-05, 2024-07-06, ...]`

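Horizon, granularity, and timestamps are linked: given a start date, a number of time steps, and a step size, the timestamp sequence follows directly. A minimal sketch, assuming day-based steps and a hypothetical `forecast_timestamps` helper:

```python
from datetime import date, timedelta


def forecast_timestamps(start, horizon, step_days=1):
    """Return `horizon` dates beginning at `start`, spaced `step_days` apart."""
    return [start + timedelta(days=i * step_days) for i in range(horizon)]


# 28 daily steps and 12 weekly steps, matching the horizon examples above.
daily = forecast_timestamps(date(2024, 7, 4), 28)
weekly = forecast_timestamps(date(2024, 7, 4), 12, step_days=7)
```

Monthly granularity would need calendar-aware stepping rather than a fixed number of days, so it is left out of this sketch.
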
**E. (Optional) Explainability Outputs:**

* **Description:** Outputs that help explain *why* the model made a particular forecast. This is especially important for building trust and understanding.
* **Format:**
    * **Attention Weights:** For Transformer models, the attention weights can be visualized to show which parts of the input sequence the model attended to most when making a prediction.
    * **Feature Importance Scores:** Estimates of the relative importance of different input features.
    * **SHAP Values:** Per-prediction attributions, based on Shapley values, that estimate how much each input feature contributed to an individual forecast.
* **Characteristics:**
    * Can be complex to interpret, but provide valuable insights.

**Summary Table:**

| **Inputs** | Description | Format | Characteristics |
| --- | --- | --- | --- |
| Historical Sales | Past sales data (SKU-store-day level) | Tabular (timestamp, sku, store_id, quantity, price, discount) | High frequency, potentially large, may exhibit seasonality/trends/noise. |
| Promotional Data | Past, current, and *planned* promotions | Tabular (promotion_id, sku, store_id, start/end dates, type, value, spend) | Less frequent than sales data, includes future promotions. |
| Inventory Data | Current and historical inventory levels | Tabular (timestamp, sku, store_id/warehouse_id, quantity_on_hand, quantity_on_order, reorder_point, safety_stock) | Frequency varies (daily, weekly). |
| External Factors | Economic indicators, weather, holidays, social media, web traffic, competitors | Tabular or time-series (various) | Varying frequencies and formats. |
| Product Metadata | Static information about products | Tabular (sku, category, subcategory, brand, description, price_tier) | Relatively static. |
| Store Metadata | Static information about stores | Tabular (store_id, location, store_type) | Relatively static. |

| **Outputs** | Description | Format | Characteristics |
| --- | --- | --- | --- |
| Probabilistic Forecasts | Probability distribution of future demand | Set of quantiles (p10, p50, p90, etc.) for each SKU-store-future time period | Provides a range of outcomes, quantifies uncertainty. |
| Forecast Horizon | Length of time into the future | Number of time steps (days, weeks, months) | Longer horizons have greater uncertainty. |
| Forecast Granularity | Level of detail (SKU-store-day, SKU-region-week, etc.) | Determined by model and business needs | Aligns with business requirements. |
| Forecast Timestamps | Dates/times for which forecasts are generated | List/sequence of timestamps | Corresponds to horizon and granularity. |
| Explainability (Optional) | Outputs that explain model predictions | Attention weights, feature importance scores, SHAP values | Complex to interpret, but provide valuable insights. |