
This document defines the inputs and outputs of the "black box" Transformer-based forecasting model (Enhanced_Business_Model_for_Collaborative_Predictive_Supply_Chain_model.py) within the collaborative supply chain context. Inputs and outputs are grouped by category, with the format and expected characteristics of each. The breakdown states the model's data requirements and expected results, and it is the basis for specifying data preprocessing steps, model architecture, and evaluation metrics.

I. Inputs

The inputs are all the data fed into the Transformer model to generate the forecasts. Since we're aiming for a comprehensive and dynamic system, the inputs are diverse and can be grouped into several categories:

A. Historical Sales Data:

  • Description: Time-series data of past sales, at the most granular level possible (ideally SKU-store-day).
  • Format:
    • Structure: Typically a tabular format (e.g., CSV, Parquet, database table). Could also be a tensor if pre-processed for the Transformer.
    • Columns:
      • timestamp: Date and time of the sale (e.g., YYYY-MM-DD HH:MM:SS or a Unix timestamp).
      • sku: Stock Keeping Unit (unique product identifier).
      • store_id: Identifier for the store location.
      • quantity: Number of units sold.
      • price: Unit price at the time of sale.
      • discount: Any discount applied (amount or percentage).
  • Characteristics:
    • High frequency (daily or even hourly).
    • Potentially millions or billions of rows.
    • May exhibit seasonality, trends, and noise.
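
Raw transaction rows usually need to be rolled up to the SKU-store-day level before they are fed to the model. The following is a minimal sketch of that roll-up using only the Python standard library. The `SaleRecord` class and `aggregate_daily` function are hypothetical names introduced for illustration, and treating `discount` as a fraction of the price is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SaleRecord:
    """One row of the historical sales table (hypothetical container)."""
    timestamp: datetime
    sku: str
    store_id: str
    quantity: int
    price: float
    discount: float  # assumption: fraction of price, e.g. 0.2 for 20% off


def aggregate_daily(records):
    """Sum units sold per (sku, store_id, calendar day)."""
    totals = defaultdict(int)
    for r in records:
        totals[(r.sku, r.store_id, r.timestamp.date())] += r.quantity
    return dict(totals)


# Illustrative rows only, not real data.
sales = [
    SaleRecord(datetime(2024, 7, 1, 9, 15), "123", "A", 2, 4.99, 0.0),
    SaleRecord(datetime(2024, 7, 1, 17, 40), "123", "A", 3, 4.99, 0.2),
    SaleRecord(datetime(2024, 7, 2, 11, 5), "123", "A", 1, 4.99, 0.0),
]
daily = aggregate_daily(sales)
```

At the data volumes described above, a dataframe or SQL engine would likely replace the Python loop, but the grouping key stays the same.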

B. Promotional Data:

  • Description: Information about past, current, and planned promotional activities.
  • Format:
    • Structure: Tabular format.
    • Columns:
      • promotion_id: Unique identifier for the promotion.
      • sku: SKU(s) included in the promotion.
      • store_id: Store(s) where the promotion is active.
      • start_date: Start date of the promotion.
      • end_date: End date of the promotion.
      • promotion_type: Type of promotion (e.g., "BOGO," "percentage discount," "fixed price discount," "coupon").
      • discount_value: Value of the discount, interpreted according to promotion_type (e.g., 0.2 for a 20% discount, 5.00 for a $5 discount).
      • marketing_spend: (Optional) Amount spent on advertising for the promotion.
  • Characteristics:
    • Less frequent than sales data.
    • Should include future planned promotions, which are crucial for forecasting.
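
Because planned promotions are known ahead of time, they can be expanded into one feature row per day that covers both the history and the forecast window. The sketch below shows one way to do this. The `Promotion` class and `promo_flags` function are hypothetical, `end_date` is assumed to be inclusive, and the overlap rule (keep the largest `discount_value`) only makes sense if all overlapping promotions express the discount in the same unit.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Promotion:
    """One row of the promotional table (hypothetical container)."""
    promotion_id: str
    sku: str
    store_id: str
    start_date: date
    end_date: date  # assumption: inclusive
    promotion_type: str
    discount_value: float


def promo_flags(promos, sku, store_id, first_day, n_days):
    """Return one (day, active_flag, discount_value) tuple per day."""
    rows = []
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        active = [
            p for p in promos
            if p.sku == sku and p.store_id == store_id
            and p.start_date <= day <= p.end_date
        ]
        # Simplification: when promotions overlap, keep the largest value.
        value = max((p.discount_value for p in active), default=0.0)
        rows.append((day, int(bool(active)), value))
    return rows


# Illustrative promotion only.
promos = [
    Promotion("P1", "123", "A", date(2024, 7, 3), date(2024, 7, 4),
              "percentage discount", 0.2),
]
flags = promo_flags(promos, "123", "A", date(2024, 7, 2), 4)
```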

C. Inventory Data:

  • Description: Information about current and historical inventory levels.
  • Format:
    • Structure: Tabular format.
    • Columns:
      • timestamp: Date and time of the inventory snapshot.
      • sku: Stock Keeping Unit.
      • store_id: Store location (or warehouse ID for wholesalers).
      • quantity_on_hand: Number of units currently in stock.
      • quantity_on_order: Number of units ordered but not yet received.
      • reorder_point: (Optional) The inventory level at which a new order should be placed.
      • safety_stock: (Optional) Minimum stock level to keep on hand as a buffer.
  • Characteristics:
    • Frequency can vary (daily, weekly).
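
As an illustration of how these columns relate, the sketch below models a snapshot row and applies a common reorder rule: compare on-hand plus on-order stock with the reorder point. The `InventorySnapshot` class and `needs_reorder` function are hypothetical, and the rule itself is an assumption, not something this specification prescribes.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class InventorySnapshot:
    """One row of the inventory table (hypothetical container)."""
    timestamp: datetime
    sku: str
    store_id: str  # or a warehouse ID for wholesalers
    quantity_on_hand: int
    quantity_on_order: int
    reorder_point: Optional[int] = None  # optional column
    safety_stock: Optional[int] = None   # optional column


def needs_reorder(snapshot):
    """True when on-hand plus on-order stock is at or below the reorder point.

    Returns False when no reorder point is recorded for the row.
    """
    if snapshot.reorder_point is None:
        return False
    position = snapshot.quantity_on_hand + snapshot.quantity_on_order
    return position <= snapshot.reorder_point


# Illustrative snapshot only.
snapshot = InventorySnapshot(datetime(2024, 7, 1), "123", "A", 10, 5,
                             reorder_point=20)
reorder_now = needs_reorder(snapshot)
```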

D. External Factors:

  • Description: Data that is not directly related to sales or inventory but can influence demand.
  • Format:
    • Structure: Can be tabular or time-series data from various sources.
    • Examples:
      • Economic Indicators: GDP growth, unemployment rate, consumer confidence index, inflation rate. (Typically time-series data from government sources or financial data providers.)
      • Weather Data: Temperature, precipitation, forecasts. (Time-series data from weather APIs.)
      • Holiday/Event Indicators: Binary indicators (0 or 1) for holidays, major events, school breaks. (Typically a pre-defined calendar.)
      • Social Media Sentiment: Aggregated sentiment scores related to the product or brand. (Requires text processing and sentiment analysis.)
      • Web Traffic Data: Website visits, product page views, search queries. (Data from web analytics platforms.)
      • Competitor Data: Pricing and promotional activity of competitors (if available, often through web scraping or third-party data providers).
  • Characteristics:
    • Varying frequencies and formats depending on the source.
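
Since these sources arrive at different frequencies, they typically have to be aligned to the model's time step. The sketch below forward-fills a sparse series (for example a monthly indicator) to daily values and derives a binary holiday flag. The function name and all values are illustrative assumptions. In practice the observation date should be the publication date, so that no value is used before it was actually available.

```python
from datetime import date, timedelta


def forward_fill_daily(observations, first_day, n_days):
    """Expand sparse {date: value} observations to one value per day.

    Each day takes the most recent observation on or before it. Days with
    no earlier observation get None.
    """
    prior = [d for d in observations if d < first_day]
    last = observations[max(prior)] if prior else None
    out = []
    for offset in range(n_days):
        day = first_day + timedelta(days=offset)
        if day in observations:
            last = observations[day]
        out.append((day, last))
    return out


# Illustrative values only, not real statistics.
consumer_confidence = {date(2024, 6, 1): 100.4, date(2024, 7, 1): 101.3}
holidays = {date(2024, 7, 4)}

aligned = forward_fill_daily(consumer_confidence, date(2024, 6, 30), 5)
holiday_flags = [int(day in holidays) for day, _ in aligned]
```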

E. Product Metadata:

  • Description: Static information about the products.
  • Format:
    • Structure: Tabular format.
    • Columns:
      • sku: Stock Keeping Unit.
      • product_category: Category the product belongs to.
      • product_subcategory: Subcategory.
      • brand: Brand name.
      • product_description: Textual description (may be used for embeddings).
      • price_tier: (Optional) Categorization based on price (e.g., "economy," "mid-range," "premium").
  • Characteristics:
    • Relatively static; changes infrequently.

F. Store Metadata:

  • Description: Static information about each store.
  • Format:
    • Structure: Tabular format.
    • Columns:
      • store_id: Unique store identifier.
      • location: City and state.
      • store_type: Physical, online, or mixed.
  • Characteristics:
    • Relatively static; changes infrequently.
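
Categorical metadata fields such as product_category, brand, or store_type are usually mapped to integer ids before they reach an embedding layer. The following is a minimal sketch of such a mapping. The helper names are hypothetical, and reserving id 0 for values not seen when the vocabulary was built is an assumption.

```python
def build_vocab(values):
    """Map each distinct value to an integer id starting at 1.

    Id 0 is reserved for values that were not seen when the vocabulary
    was built (assumption made for this sketch).
    """
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


def encode(value, vocab):
    """Return the id for a value, or 0 if it is unknown."""
    return vocab.get(value, 0)


store_types = ["physical", "online", "mixed", "online"]
vocab = build_vocab(store_types)
encoded = [encode(s, vocab) for s in store_types]
unknown = encode("kiosk", vocab)
```

Sorting the distinct values keeps the mapping stable across runs, which matters when a trained model is reloaded.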

II. Outputs

The outputs are the forecasts generated by the Transformer model.

A. Probabilistic Forecasts:

  • Description: Instead of a single point forecast (e.g., "we will sell 100 units"), the model provides a probability distribution of future demand. This quantifies the uncertainty in the forecast.
  • Format:
    • Structure: Typically a set of quantiles (or percentiles) for each SKU-store-future time period.
    • Example: For SKU 123, store A, on 2024-07-04, the model might output:
      • p10: 80 units (10th percentile - there's a 10% chance demand will be 80 units or less)
      • p50: 105 units (50th percentile - median forecast)
      • p90: 130 units (90th percentile - there's a 90% chance demand will be 130 units or less)
      • ...and other quantiles as needed (e.g., p25, p75, p95, p99).
  • Characteristics:
    • Provides a range of possible outcomes, allowing for risk-aware decision-making.
    • Allows prediction intervals to be derived (e.g., the range from p10 to p90 is an 80% interval).
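
To make the quantile output concrete, the sketch below takes the example values above, derives the 80% interval from p10 and p90, checks that the quantiles do not cross, and scores them with the pinball (quantile) loss. The pinball loss is a standard metric for quantile forecasts, but its use here is an assumption: this document does not specify the training or evaluation loss. The function names and the observed demand of 120 units are illustrative.

```python
def pinball_loss(actual, predicted, q):
    """Pinball (quantile) loss for one observation at level q in (0, 1)."""
    diff = actual - predicted
    return q * diff if diff >= 0 else (q - 1) * diff


def is_monotonic(quantiles):
    """Check that quantile values do not decrease as the level increases."""
    levels = sorted(quantiles, key=lambda name: int(name[1:]))
    values = [quantiles[name] for name in levels]
    return all(a <= b for a, b in zip(values, values[1:]))


# Example values from the text: SKU 123, store A, 2024-07-04.
forecast = {"p10": 80.0, "p50": 105.0, "p90": 130.0}

interval_80 = (forecast["p10"], forecast["p90"])
valid = is_monotonic(forecast)

# Hypothetical observed demand, used only to show the loss computation.
actual = 120.0
losses = {
    name: pinball_loss(actual, value, int(name[1:]) / 100)
    for name, value in forecast.items()
}
```

The loss penalizes under-forecasting more heavily at high quantile levels and over-forecasting more heavily at low ones, which is what pushes each output toward its intended percentile.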

B. Forecast Horizon:

  • Description: The length of time into the future for which the model generates forecasts.
  • Format:
    • Defined by the model configuration and the needs of the business. Could be days, weeks, or months.
    • Typically specified as a number of time steps (e.g., 28 days, 12 weeks).
  • Characteristics:
    • Longer horizons generally have greater uncertainty.

C. Forecast Granularity:

  • Description: The level of detail at which the forecasts are generated (SKU-store-day, SKU-region-week, etc.).
  • Format:
    • Determined by the model and the available data.
    • Should align with the business needs (e.g., retailers need store-level forecasts, while wholesalers might need regional forecasts).

D. Forecast Timestamps:

  • Description: The specific dates and times for which the forecasts are generated.
  • Format:
    • A list or sequence of timestamps corresponding to the forecast horizon and granularity.
    • Example: [2024-07-04, 2024-07-05, 2024-07-06, ...]
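
The timestamp sequence follows directly from the first forecast date, the horizon, and the granularity. The following is a minimal sketch. The `forecast_timestamps` function and its parameters are hypothetical, and it covers only steps that are a fixed number of days (monthly steps would need calendar-aware arithmetic).

```python
from datetime import date, timedelta


def forecast_timestamps(first_day, horizon, step_days=1):
    """Return `horizon` dates starting at `first_day`, `step_days` apart."""
    return [first_day + timedelta(days=i * step_days) for i in range(horizon)]


# 28 daily steps and 12 weekly steps, matching the horizon examples above.
daily_steps = forecast_timestamps(date(2024, 7, 4), 28)
weekly_steps = forecast_timestamps(date(2024, 7, 4), 12, step_days=7)
```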

E. (Optional) Explainability Outputs:

  • Description: Outputs that help explain why the model made a particular forecast. This is especially important for building trust and understanding.
  • Format:
    • Attention Weights: For Transformer models, attention weights can be visualized to indicate which parts of the input sequence the model attended to most when making a prediction.
    • Feature Importance Scores: Estimates of the relative importance of different input features.
    • SHAP Values: Per-prediction attributions, based on Shapley values, that split an individual forecast into additive contributions from each input feature.
  • Characteristics:
    • Can be complex to interpret, but provide valuable insights.
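
One simple way to obtain feature importance scores is permutation importance: shuffle one feature column and measure how much the error increases. The sketch below illustrates the idea with a toy stand-in for the model. The `toy_model` function, the two features, and the data are all hypothetical, and this is only one of several possible importance methods, not necessarily the one used with the forecasting model.

```python
import random


def permutation_importance(predict, rows, targets, feature_index, seed=0):
    """Increase in mean absolute error after shuffling one feature column."""
    def mae(data):
        errors = [abs(predict(r) - t) for r, t in zip(data, targets)]
        return sum(errors) / len(errors)

    baseline = mae(rows)
    column = [r[feature_index] for r in rows]
    random.Random(seed).shuffle(column)
    shuffled = [
        r[:feature_index] + (v,) + r[feature_index + 1:]
        for r, v in zip(rows, column)
    ]
    return mae(shuffled) - baseline


def toy_model(row):
    """Stand-in predictor: uses the promotion flag, ignores temperature."""
    promo_flag, temperature = row
    return 100 + 30 * promo_flag


# Illustrative rows: (promo_flag, temperature).
rows = [(0, 18.0), (1, 25.0), (0, 30.0), (1, 12.0)]
targets = [toy_model(r) for r in rows]

promo_importance = permutation_importance(toy_model, rows, targets, 0)
temperature_importance = permutation_importance(toy_model, rows, targets, 1)
```

Because the toy model ignores temperature, shuffling that column leaves the error unchanged and its importance is zero. The importance of the promotion flag depends on how the shuffle happens to rearrange the column.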

Summary Table:

Inputs

| Category | Description | Format | Characteristics |
| --- | --- | --- | --- |
| Historical Sales | Past sales data (SKU-store-day level) | Tabular (timestamp, sku, store_id, quantity, price, discount) | High frequency, potentially large, may exhibit seasonality/trends/noise. |
| Promotional Data | Past, current, and planned promotions | Tabular (promotion_id, sku, store_id, start_date, end_date, promotion_type, discount_value, marketing_spend) | Less frequent than sales data, includes future promotions. |
| Inventory Data | Current and historical inventory levels | Tabular (timestamp, sku, store_id/warehouse ID, quantity_on_hand, quantity_on_order, reorder_point, safety_stock) | Frequency varies (daily, weekly). |
| External Factors | Economic indicators, weather, holidays, social media, web traffic, competitors | Tabular or time-series (various) | Varying frequencies and formats. |
| Product Metadata | Static information about products | Tabular (sku, product_category, product_subcategory, brand, product_description, price_tier) | Relatively static. |
| Store Metadata | Static information about stores | Tabular (store_id, location, store_type) | Relatively static. |

Outputs

| Category | Description | Format | Characteristics |
| --- | --- | --- | --- |
| Probabilistic Forecasts | Probability distribution of future demand | Set of quantiles (p10, p50, p90, etc.) for each SKU-store-future time period | Provides a range of outcomes, quantifies uncertainty. |
| Forecast Horizon | Length of time into the future | Number of time steps (days, weeks, months) | Longer horizons have greater uncertainty. |
| Forecast Granularity | Level of detail (SKU-store-day, SKU-region-week, etc.) | Determined by model and business needs | Aligns with business requirements. |
| Forecast Timestamps | Dates/times for which forecasts are generated | List/sequence of timestamps | Corresponds to horizon and granularity. |
| Explainability (Optional) | Outputs that explain model predictions | Attention weights, feature importance scores, SHAP values | Complex to interpret, but provide valuable insights. |