ByteCNN-10K: Ultra-Lightweight Toxicity Detection

Ultra-compact single-layer CNN for toxicity detection, optimized for maximum speed on edge deployment.

Model Details

  • Architecture: Single-layer ByteCNN (Embedding β†’ Conv1D + BatchNorm β†’ Dense β†’ Dense)
  • Parameters: 10,009 (~40KB)
  • Input: Raw byte sequences (max 512 bytes)
  • Output: Toxicity probability (0-1)
  • Optimization: 72% parameter reduction from original 36K model

Performance

  • Validation Accuracy: 78.97%
  • Training Dataset: Full balanced dataset (222,628 samples)
  • Efficiency: 7.89% accuracy per 1K parameters (best efficiency in sweep)
  • Inference Speed: <1ms on Cloudflare Workers
  • CPU Limits: Guaranteed to stay under edge compute constraints

Architecture Configuration

  • Embedding: 256 vocab β†’ 12 dimensions
  • Conv Layer: 12 β†’ 40 filters, kernel=3
  • Dense Layer: 40 β†’ 128 hidden units
  • Output: 128 β†’ 1 (sigmoid activation)

Training Details

  • Trained on balanced dataset (50/50 toxic/safe ratio)
  • 222,628 total samples from multiple sources
  • AdamW optimizer with weight decay 0.01
  • Learning rate: 0.001 with ReduceLROnPlateau
  • 10 epochs, batch size 128

Parameter Sweep Results

Comparison across different model sizes:

Model Parameters Accuracy Efficiency (Acc/1K params) Trade-off
Original 36,257 81.20% 2.24% Baseline
10K 10,009 78.97% 7.89% 72% fewer params
15K 14,985 80.98% 5.40% 59% fewer params
20K 20,009 78.76% 3.94% 45% fewer params

The 10K model offers the best parameter efficiency with minimal accuracy loss.

Contextual Understanding

Despite its compact size, the model demonstrates sophisticated toxicity detection:

  • "fuck you" β†’ 87.28% toxic (direct personal attack)
  • "get fucked!" β†’ 17.73% safe (potentially playful/dismissive)
  • "Hello everyone!" β†’ 6.65% safe (clearly benign)

Usage

Input text is converted to UTF-8 bytes, truncated/padded to 512 bytes, then processed through the CNN layers.

Deployment

Optimized for edge deployment with:

  • BatchNorm folding for inference
  • Static weight embedding (211KB)
  • Sub-1ms inference time
  • Zero CPU timeout issues on Cloudflare Workers

Live Demo

Limitations

  • English text optimized
  • 512 byte context window
  • Binary classification only (toxic/safe)
  • 2.23% accuracy trade-off vs original for 72% size reduction

Model Card

This model represents the optimal balance of speed and quality for production edge deployment.

Downloads last month
10
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support