MartialTerran committed on
Commit
8949a60
·
verified ·
1 Parent(s): c333f60

Create Model_Inputs+Outputs.md

  1. Model_Inputs+Outputs.md +170 -0
Model_Inputs+Outputs.md ADDED
@@ -0,0 +1,170 @@
1
+ Here we define the inputs and outputs of the "black box" Transformer-based forecasting model (Enhanced_Business_Model_for_Collaborative_Predictive_Supply_Chain_model.py) within this collaborative supply chain context.
2
+ We categorize them for clarity and provide details on their format and expected characteristics.
3
+ This breakdown specifies the data the forecasting model requires and the results it is expected to produce within the collaborative supply chain framework. It also sets the stage for specifying data preprocessing steps, model architecture, and evaluation metrics.
4
+
5
+ **I. Inputs**
6
+
7
+ The inputs are all the data fed into the Transformer model to generate the forecasts. Since we're aiming for a comprehensive and dynamic system, the inputs are diverse and can be grouped into several categories:
8
+
9
+ **A. Historical Sales Data:**
10
+
11
+ * **Description:** Time-series data of past sales, at the most granular level possible (ideally SKU-store-day).
12
+ * **Format:**
13
+ * **Structure:** Typically a tabular format (e.g., CSV, Parquet, database table). Could also be a tensor if pre-processed for the Transformer.
14
+ * **Columns:**
15
+ * `timestamp`: Date and time of the sale (e.g., `YYYY-MM-DD HH:MM:SS` or a Unix timestamp).
16
+ * `sku`: Stock Keeping Unit (unique product identifier).
17
+ * `store_id`: Identifier for the store location.
18
+ * `quantity`: Number of units sold.
19
+ * `price`: Unit price at the time of sale.
20
+ * `discount`: Any discount applied (amount or percentage).
21
+ * **Characteristics:**
22
+ * High frequency (daily or even hourly).
23
+ * Potentially millions or billions of rows.
24
+ * May exhibit seasonality, trends, and noise.
25
+
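To make the sales schema concrete, here is a minimal sketch of one record type and a roll-up to the SKU-store-day level. The `SalesRecord` class and the `daily_units` helper are hypothetical names used for illustration only, and treating `discount` as an absolute amount is an assumption; none of this is taken from the model script.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class SalesRecord:
    """One row of the historical sales table (columns as listed above)."""
    timestamp: datetime
    sku: str
    store_id: str
    quantity: int
    price: float
    discount: float  # assumption: an absolute amount, not a percentage


def daily_units(records: list[SalesRecord]) -> dict[tuple[str, str, date], int]:
    """Sum units sold per (sku, store_id, calendar day)."""
    totals: dict[tuple[str, str, date], int] = defaultdict(int)
    for r in records:
        totals[(r.sku, r.store_id, r.timestamp.date())] += r.quantity
    return dict(totals)


records = [
    SalesRecord(datetime(2024, 7, 4, 9, 30), "123", "A", 2, 9.99, 0.0),
    SalesRecord(datetime(2024, 7, 4, 17, 5), "123", "A", 3, 9.99, 1.0),
    SalesRecord(datetime(2024, 7, 5, 11, 0), "123", "A", 1, 9.99, 0.0),
]
print(daily_units(records))
```

At the scale mentioned above (millions or billions of rows), the same aggregation would more likely run in a database or a columnar engine; the sketch only shows the shape of the result.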
26
+ **B. Promotional Data:**
27
+
28
+ * **Description:** Information about past, current, and *planned* promotional activities.
29
+ * **Format:**
30
+ * **Structure:** Tabular format.
31
+ * **Columns:**
32
+ * `promotion_id`: Unique identifier for the promotion.
33
+ * `sku`: SKU(s) included in the promotion.
34
+ * `store_id`: Store(s) where the promotion is active.
35
+ * `start_date`: Start date of the promotion.
36
+ * `end_date`: End date of the promotion.
37
+ * `promotion_type`: Type of promotion (e.g., "BOGO," "percentage discount," "fixed price discount," "coupon").
38
+ * `discount_value`: Value of the discount, interpreted according to `promotion_type` (e.g., 0.2 for a 20% discount, 5.00 for a $5 discount).
39
+ * `marketing_spend`: (Optional) Amount spent on advertising for the promotion.
40
+ * **Characteristics:**
41
+ * Less frequent than sales data.
42
+ * Should include *future* planned promotions, which are crucial for forecasting.
43
+
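Because planned promotions are known ahead of time, they can be expanded into one flag per future day. The sketch below shows one way to do that; the `promo_flags` helper is hypothetical, and treating `end_date` as inclusive is an assumption.

```python
from datetime import date, timedelta


def promo_flags(promotions, sku, store_id, start, horizon_days):
    """Return one 0/1 flag per day: 1 if any promotion covers that SKU, store, and day."""
    flags = []
    for offset in range(horizon_days):
        day = start + timedelta(days=offset)
        active = any(
            p["sku"] == sku
            and p["store_id"] == store_id
            and p["start_date"] <= day <= p["end_date"]  # assumption: end date is inclusive
            for p in promotions
        )
        flags.append(int(active))
    return flags


promotions = [
    {"promotion_id": "P1", "sku": "123", "store_id": "A",
     "start_date": date(2024, 7, 4), "end_date": date(2024, 7, 5),
     "promotion_type": "percentage discount", "discount_value": 0.2},
]
flags = promo_flags(promotions, "123", "A", date(2024, 7, 3), 5)
print(flags)  # [0, 1, 1, 0, 0]
```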
44
+ **C. Inventory Data:**
45
+
46
+ * **Description:** Information about current and historical inventory levels.
47
+ * **Format:**
48
+ * **Structure:** Tabular format.
49
+ * **Columns:**
50
+ * `timestamp`: Date and time of the inventory snapshot.
51
+ * `sku`: Stock Keeping Unit.
52
+ * `store_id`: Store location (or warehouse ID for wholesalers).
53
+ * `quantity_on_hand`: Number of units currently in stock.
54
+ * `quantity_on_order`: Number of units ordered but not yet received.
55
+ * `reorder_point`: (Optional) The inventory level at which a new order should be placed.
56
+ * `safety_stock`: (Optional) Minimum stock level to keep on hand as a buffer.
57
+ * **Characteristics:**
58
+ * Frequency can vary (daily, weekly).
59
+
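As an illustration of how these columns relate, the sketch below derives an inventory position and a reorder check from one snapshot row. Defining the position as on-hand plus on-order is a common convention assumed here (it ignores backorders), and both helper names are hypothetical.

```python
def inventory_position(row):
    """Units on hand plus units already ordered (assumed convention; ignores backorders)."""
    return row["quantity_on_hand"] + row["quantity_on_order"]


def needs_reorder(row):
    """True/False when a reorder point is present, None when the optional column is missing."""
    reorder_point = row.get("reorder_point")
    if reorder_point is None:
        return None
    return inventory_position(row) <= reorder_point


snapshot = {
    "timestamp": "2024-07-04 00:00:00",
    "sku": "123",
    "store_id": "A",
    "quantity_on_hand": 40,
    "quantity_on_order": 25,
    "reorder_point": 70,
}
print(inventory_position(snapshot), needs_reorder(snapshot))  # 65 True
```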
60
+ **D. External Factors:**
61
+
62
+ * **Description:** Data that is not directly related to sales or inventory but can influence demand.
63
+ * **Format:**
64
+ * **Structure:** Can be tabular or time-series data from various sources.
65
+ * **Examples:**
66
+ * **Economic Indicators:** GDP growth, unemployment rate, consumer confidence index, inflation rate. (Typically time-series data from government sources or financial data providers.)
67
+ * **Weather Data:** Temperature, precipitation, forecasts. (Time-series data from weather APIs.)
68
+ * **Holiday/Event Indicators:** Binary indicators (0 or 1) for holidays, major events, school breaks. (Typically a pre-defined calendar.)
69
+ * **Social Media Sentiment:** Aggregated sentiment scores related to the product or brand. (Requires text processing and sentiment analysis.)
70
+ * **Web Traffic Data:** Website visits, product page views, search queries. (Data from web analytics platforms.)
71
+ * **Competitor Data:** Pricing and promotional activity of competitors (if available, often through web scraping or third-party data providers).
72
+ * **Characteristics:**
73
+ * Varying frequencies and formats depending on the source.
74
+
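Since external series arrive at different frequencies, they usually have to be aligned to the sales calendar before they can be joined. The sketch below forward-fills a sparse indicator onto a daily grid and adds a binary holiday flag. The function name, the sample values, and the choice of forward-fill are illustrative assumptions, not a prescribed method.

```python
from datetime import date, timedelta


def align_daily(observations, holidays, start, end):
    """Build one row per day from start to end (inclusive).

    observations: sparse {date: value} series, e.g. a monthly indicator.
    holidays: set of dates flagged as holidays.
    The last known value is carried forward; it is None before the first observation.
    """
    rows = []
    last_value = None
    day = start
    while day <= end:
        if day in observations:
            last_value = observations[day]
        rows.append({"date": day, "indicator": last_value, "is_holiday": int(day in holidays)})
        day += timedelta(days=1)
    return rows


unemployment = {date(2024, 7, 1): 3.9, date(2024, 7, 3): 4.1}  # made-up sample values
holidays = {date(2024, 7, 4)}
rows = align_daily(unemployment, holidays, date(2024, 7, 1), date(2024, 7, 4))
for row in rows:
    print(row)
```

Forward-filling only uses values observed on or before each day, which avoids leaking future information into the features.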
75
+ **E. Product Metadata:**
76
+
77
+ * **Description:** Static information about the products.
78
+ * **Format:**
79
+ * **Structure:** Tabular format.
80
+ * **Columns:**
81
+ * `sku`: Stock Keeping Unit.
82
+ * `product_category`: Category the product belongs to.
83
+ * `product_subcategory`: Subcategory.
84
+ * `brand`: Brand name.
85
+ * `product_description`: Textual description (may be used for embeddings).
86
+ * `price_tier`: (Optional) Categorization based on price (e.g., "economy," "mid-range," "premium").
87
+ * **Characteristics:**
88
+ * Relatively static; changes infrequently.
89
+
90
+ **F. Store Metadata:**
91
+
92
+ * **Description:** Static information about the stores.
93
+ * **Format:**
94
+ * **Structure:** Tabular format.
95
+ * **Columns:**
96
+ * `store_id`: Unique store identifier.
97
+ * `location`: City and state.
98
+ * `store_type`: Physical, online, mixed.
99
+
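Static categorical columns such as `store_type`, `brand`, or `product_category` are often mapped to integer ids before being passed to an embedding layer. The sketch below shows one such mapping. The helper names, the sample stores, and the convention of reserving id 0 for unseen values are assumptions made for illustration.

```python
def build_vocab(values):
    """Map each distinct value to an integer id starting at 1; 0 is reserved for unseen values."""
    return {value: index for index, value in enumerate(sorted(set(values)), start=1)}


def encode(value, vocab):
    """Look up the id for a value, falling back to 0 when it was not in the vocabulary."""
    return vocab.get(value, 0)


stores = [
    {"store_id": "A", "location": "Austin, TX", "store_type": "physical"},
    {"store_id": "B", "location": "Online", "store_type": "online"},
    {"store_id": "C", "location": "Reno, NV", "store_type": "mixed"},
]
store_type_vocab = build_vocab(s["store_type"] for s in stores)
print(store_type_vocab)  # {'mixed': 1, 'online': 2, 'physical': 3}
print(encode("kiosk", store_type_vocab))  # 0
```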
100
+ **II. Outputs**
101
+
102
+ The outputs are the forecasts generated by the Transformer model.
103
+
104
+ **A. Probabilistic Forecasts:**
105
+
106
+ * **Description:** Instead of a single point forecast (e.g., "we will sell 100 units"), the model provides a *probability distribution* of future demand. This quantifies the uncertainty in the forecast.
107
+ * **Format:**
108
+ * **Structure:** Typically a set of quantiles (or percentiles) for each SKU-store-future time period.
109
+ * **Example:** For SKU 123, store A, on 2024-07-04, the model might output:
110
+ * `p10`: 80 units (10th percentile - there's a 10% chance demand will be 80 units or less)
111
+ * `p50`: 105 units (50th percentile - median forecast)
112
+ * `p90`: 130 units (90th percentile - there's a 90% chance demand will be 130 units or less)
113
+ * ...and other quantiles as needed (e.g., p25, p75, p95, p99).
114
+ * **Characteristics:**
115
+ * Provides a range of possible outcomes, allowing for risk-aware decision-making.
116
+ * Allows for calculation of prediction intervals (e.g., p10 to p90 spans an 80% interval).
117
+
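Quantile forecasts like the example above are commonly scored with the quantile (pinball) loss, and pairs of quantiles give prediction intervals directly. The sketch below reuses the p10/p50/p90 values from the example; the observed demand of 120 is made up, and the source does not state which loss the model actually uses.

```python
def pinball_loss(actual, predicted, q):
    """Quantile (pinball) loss for one observation at quantile level q in (0, 1)."""
    diff = actual - predicted
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return q * diff if diff >= 0 else (q - 1) * diff


forecast = {"p10": 80, "p50": 105, "p90": 130}
levels = {"p10": 0.1, "p50": 0.5, "p90": 0.9}
actual = 120  # hypothetical observed demand

losses = {name: pinball_loss(actual, forecast[name], levels[name]) for name in forecast}
interval_80 = (forecast["p10"], forecast["p90"])  # p10 to p90 spans an 80% interval
covered = interval_80[0] <= actual <= interval_80[1]
print(losses, interval_80, covered)
```

Averaging this loss over many SKU-store-day observations and quantile levels gives a single score, and the share of observations falling inside the interval can be compared with its nominal 80% coverage.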
118
+ **B. Forecast Horizon:**
119
+
120
+ * **Description:** The length of time into the future for which the model generates forecasts.
121
+ * **Format:**
122
+ * Defined by the model configuration and the needs of the business. Could be days, weeks, or months.
123
+ * Typically specified as a number of time steps (e.g., 28 days, 12 weeks).
124
+ * **Characteristics:**
125
+ * Longer horizons generally have greater uncertainty.
126
+
127
+ **C. Forecast Granularity:**
128
+
129
+ * **Description:** The level of detail at which the forecasts are generated (SKU-store-day, SKU-region-week, etc.).
130
+ * **Format:**
131
+ * Determined by the model and the available data.
132
+ * Should align with the business needs (e.g., retailers need store-level forecasts, while wholesalers might need regional forecasts).
133
+
134
+ **D. Forecast Timestamps:**
135
+
136
+ * **Description:** The specific dates and times for which the forecasts are generated.
137
+ * **Format:**
138
+ * A list or sequence of timestamps corresponding to the forecast horizon and granularity.
139
+ * Example: `[2024-07-04, 2024-07-05, 2024-07-06, ...]`
140
+
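The timestamps follow mechanically from a start date, a horizon, and a step size. A minimal sketch, assuming fixed-length steps measured in days (calendar months vary in length and would need separate handling); `forecast_timestamps` is a hypothetical helper name.

```python
from datetime import date, timedelta


def forecast_timestamps(first_day, horizon_steps, step_days=1):
    """Return one date per forecast step, spaced step_days apart."""
    return [first_day + timedelta(days=i * step_days) for i in range(horizon_steps)]


daily = forecast_timestamps(date(2024, 7, 4), 28)                # 28-day horizon, daily steps
weekly = forecast_timestamps(date(2024, 7, 4), 12, step_days=7)  # 12-week horizon, weekly steps
print(daily[0], daily[-1])    # 2024-07-04 2024-07-31
print(weekly[0], weekly[-1])  # 2024-07-04 2024-09-19
```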
141
+ **E. (Optional) Explainability Outputs:**
142
+
143
+ * **Description:** Outputs that help explain *why* the model made a particular forecast. This is especially important for building trust and understanding.
144
+ * **Format:**
145
+ * **Attention Weights:** For Transformer models, the attention weights can be visualized to show which parts of the input sequence were most important for the prediction.
146
+ * **Feature Importance Scores:** Estimates of the relative importance of different input features.
147
+ * **SHAP Values:** Per-feature additive attributions, based on Shapley values, that explain individual predictions.
148
+ * **Characteristics:**
149
+ * Can be complex to interpret, but provide valuable insights.
150
+
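For intuition about what the attention weights are, the sketch below computes scaled dot-product attention weights for a single query over a few keys. It is a toy, single-head example with made-up vectors, not an extraction of weights from the actual model.

```python
import math


def attention_weights(query, keys):
    """Softmax of scaled dot products between one query vector and each key vector."""
    dim = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys]
    max_score = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(score - max_score) for score in scores]
    total = sum(exps)
    return [e / total for e in exps]


query = [1.0, 0.0]                           # toy vector for the step being forecast
keys = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]  # toy vectors for three past time steps
weights = attention_weights(query, keys)
print(weights)
```

The weights are non-negative and sum to 1, so they can be plotted as a heat map over input time steps. A large weight shows where the model attended, which is suggestive but not proof of causal importance.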
151
+ **Summary Table:**
152
+
153
+ | Category | Description | Format | Characteristics |
154
+ | ---------------- | -------------------------------------------------------------------------------- | ------------------------------------------ | ------------------------------------------------------------------------------- |
155
+ | **Inputs** | | | |
156
+ | Historical Sales | Past sales data (SKU-store-day level) | Tabular (timestamp, sku, store_id, quantity, price, discount) | High frequency, potentially large, may exhibit seasonality/trends/noise. |
157
+ | Promotional Data | Past, current, and *planned* promotions | Tabular (promotion_id, sku, store_id, start/end dates, type, value, spend) | Less frequent than sales data, includes future promotions. |
158
+ | Inventory Data | Current and historical inventory levels | Tabular (timestamp, sku, store_id/warehouse_id, quantity_on_hand, quantity_on_order, reorder_point, safety_stock) | Frequency varies (daily, weekly). |
159
+ | External Factors | Economic indicators, weather, holidays, social media, web traffic, competitors | Tabular or time-series (various) | Varying frequencies and formats. |
160
+ | Product Metadata | Static information about products | Tabular (sku, category, subcategory, brand, description, price_tier) | Relatively static. |
161
+ | Store Metadata | Static information about stores | Tabular (store_id, location, store_type) | Relatively static. |
162
+
163
+ | **Outputs** | Description | Format | Characteristics |
164
+ | ------------------ | ------------------------------------------------------ | -------------------------------------------------------------------------- | -------------------------------------------------------------------- |
165
+ | Probabilistic Forecasts | Probability distribution of future demand | Set of quantiles (p10, p50, p90, etc.) for each SKU-store-future time period | Provides a range of outcomes, quantifies uncertainty. |
166
+ | Forecast Horizon | Length of time into the future | Number of time steps (days, weeks, months) | Longer horizons have greater uncertainty. |
167
+ | Forecast Granularity | Level of detail (SKU-store-day, SKU-region-week, etc.) | Determined by model and business needs | Aligns with business requirements. |
168
+ | Forecast Timestamps | Dates/times for which forecasts are generated | List/sequence of timestamps | Corresponds to horizon and granularity. |
169
+ | Explainability (Optional) | Outputs that explain model predictions | Attention weights, feature importance scores, SHAP values | Complex to interpret, but provide valuable insights. |
170
+