CNN + BiLSTM Product Classification Model

This model uses a hybrid CNN + BiLSTM architecture to classify Turkish product titles into Google Product Taxonomy (GPT) category IDs. It was trained on a custom Turkish-language dataset of 100,000 e-commerce product titles, mapped to the official Google Product Taxonomy through a combination of manual and semi-automatic labeling. The implementation is inspired by the methodology described in the following work:

Shogo D. Suzuki, Yohei Iseki, Hiroaki Shiino, Hongwei Zhang, Aya Iwamoto, and Fumihiko Takahashi. 2018. Convolutional Neural Network and Bidirectional LSTM Based Taxonomy Classification Using External Dataset at SIGIR eCom Data Challenge. In Proceedings of the SIGIR 2018 Workshop on eCommerce (SIGIR eCom 2018), Ann Arbor, Michigan, USA, July 12, 2018. CEUR Workshop Proceedings, Vol. 2319.

Model Details

  • Architecture: Multi-kernel CNN + Bidirectional LSTM with attention
  • Embedding Dimension: 512
  • Vocabulary Size: 33,961
  • Number of Classes: 2,227
  • Best Validation Accuracy: 0.7105

Features

  • Multi-kernel CNN with kernel sizes: [2, 3, 4, 5]
  • Bidirectional LSTM with soft attention mechanism
  • Ad-hoc features (title length, character statistics, etc.)
  • Word2Vec embeddings trained on product titles
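The soft attention step listed above can be illustrated with a minimal, dependency-free sketch. It assumes dot-product scoring against a single context vector; the scoring function this model actually uses is not documented here, and the names `soft_attention_pool`, `hidden_states`, and `context` are hypothetical. In a trained model the context vector would be learned, and the hidden states would come from the BiLSTM.

```python
import math
from typing import List, Sequence, Tuple


def soft_attention_pool(
    hidden_states: Sequence[Sequence[float]],
    context: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return (attention weights, attention-weighted sum of hidden states).

    Assumption: dot-product scoring. This is an illustration of soft
    attention in general, not this model's confirmed implementation.
    """
    # Score each time step by its dot product with the context vector.
    scores = [sum(h_i * c_i for h_i, c_i in zip(h, context)) for h in hidden_states]

    # Softmax over time steps, subtracting the max for numerical stability.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The weighted sum collapses a variable-length title into one fixed-size vector.
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, pooled


# Three toy per-token vectors standing in for BiLSTM outputs.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weights, pooled = soft_attention_pool(states, context=[1.0, 1.0])
```

The weights always sum to 1, so the pooled vector is a convex combination of the per-token states; tokens that align more strongly with the context vector contribute more to the title representation.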

Usage

# pip install product-classifier

# 1. Import the package
from product_classifier import CNNBiLSTMInference

# 2. Load a pre-trained model from HuggingFace Hub  
model = CNNBiLSTMInference.from_pretrained("FinisYazilim/product-classifier-v1")

# 3. Make predictions
product_title = "Yataş Bedding BAMBU Yorgan (%20 Bambu) 300 Gr."
prediction = model.predict(product_title, top_k=3)

print(f"Product: {product_title}")
for label, confidence in prediction:
    print(f"  → {label}: {confidence:.4f}")

# Output:
# Product: Yataş Bedding BAMBU Yorgan (%20 Bambu) 300 Gr.
#   → 505287.0: 0.9981
#   → 569.0: 0.0008
#   → 2541.0: 0.0003
#
# Each label is a Google Product Taxonomy category ID. Here, 505287 maps to
# Home & Garden > Linens & Bedding > Bedding > Quilts & Comforters.
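Because the labels come back in a float-like form such as `505287.0`, it can help to normalize them to integer IDs and look up the category path. The sketch below is one possible way to do that; `parse_taxonomy` and `label_to_id` are hypothetical helpers, not part of `product_classifier`, and the parser assumes the taxonomy text uses one `<id> - <category path>` entry per line with `#` comment lines. The sample contains only the single mapping stated above.

```python
from typing import Dict, Union


def parse_taxonomy(text: str) -> Dict[int, str]:
    """Parse lines of the assumed form '<id> - <category path>' into a dict."""
    mapping: Dict[int, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        category_id, sep, path = line.partition(" - ")
        if sep and category_id.isdigit():
            mapping[int(category_id)] = path
    return mapping


def label_to_id(label: Union[str, float, int]) -> int:
    """Convert a label such as '505287.0' or 505287.0 to the integer 505287."""
    return int(float(label))


# Hypothetical excerpt; a real lookup would load the full taxonomy file.
SAMPLE_TAXONOMY = """# sample excerpt
505287 - Home & Garden > Linens & Bedding > Bedding > Quilts & Comforters
"""

taxonomy = parse_taxonomy(SAMPLE_TAXONOMY)
category_path = taxonomy.get(label_to_id("505287.0"), "unknown category")
print(category_path)
```

Using `dict.get` with a fallback keeps the lookup from raising if a predicted ID is missing from the loaded taxonomy.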

Training Details

  • Epochs: 20
  • Batch Size: 128
  • Learning Rate: 0.001

Citation

@inproceedings{suzuki2018cnn,
  title={Convolutional Neural Network and Bidirectional LSTM Based Taxonomy Classification Using External Dataset at SIGIR eCom Data Challenge},
  author={Suzuki, Shogo D. and Iseki, Yohei and Shiino, Hiroaki and Zhang, Hongwei and Iwamoto, Aya and Takahashi, Fumihiko},
  booktitle={Proceedings of the SIGIR 2018 Workshop on eCommerce (SIGIR eCom)},
  year={2018},
  url={https://ceur-ws.org/Vol-2319/ecom18DC_paper_1.pdf},
  note={CEUR Workshop Proceedings, Vol. 2319}
}
