talharauf committed · Commit 3801ee5 · verified · 1 Parent(s): 6d7eafc

---
title: Hockey Mind AI Chatbot
emoji: 🏒
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 4.8.0
app_file: app_gradio.py
pinned: false
license: mit
---

# 🏒 Hockey Mind AI Chatbot

An intelligent field hockey chatbot that provides personalized coaching advice and video recommendations using semantic search and AI.

## Features

- **Personalized Responses**: Tailored advice for coaches, players, parents, and fans
- **Team-Specific Guidance**: Customized for your team/skill level
- **Multilingual Support**: Works in English and Dutch
- **Video Recommendations**: Semantic search through hockey video database
- **Expert Knowledge**: Specialized in field hockey techniques, drills, and strategies
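
As a rough illustration of the Dutch support, the sketch below maps a few Dutch hockey terms to English before a query is processed further. It is a minimal sketch, not the app's exact pipeline: the `GLOSSARY` entries and the `normalize_terms` helper are hypothetical names chosen for this example.

```python
import re

# Hypothetical glossary subset; the entries are illustrative only.
GLOSSARY = {
    "strafcorner": "penalty corner",
    "oefeningen": "drills",
    "doelman": "goalkeeper",
}

def normalize_terms(prompt: str, glossary: dict[str, str] = GLOSSARY) -> str:
    """Lowercase the prompt and replace whole-word glossary terms."""
    text = prompt.lower().strip()
    for dutch, english in glossary.items():
        # \b keeps the match to whole words, so "strafcorners" is left alone.
        text = re.sub(rf"\b{re.escape(dutch)}\b", english, text)
    return text

print(normalize_terms("Geef me oefeningen voor de strafcorner"))
```

Matching on word boundaries keeps the substitution conservative: inflected or compound forms pass through unchanged and can be left to a general-purpose translator.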

## How to Use

1. Select your role (Coach, Player, Parent, Fan)
2. Enter your team name or skill level
3. Ask any hockey-related question
4. Get AI-powered advice plus relevant video recommendations!

## Example Queries

- "What are the best backhand shooting drills for young players?"
- "How can I improve my penalty corner technique?"
- "Geef me oefeningen voor backhandschoten" (Dutch)
- "What equipment does my child need to start playing hockey?"

## Technology

- **AI Model**: OpenRouter GPT for intelligent responses
- **Embeddings**: Sentence Transformers for semantic search
- **Vector Search**: FAISS for fast similarity matching
- **Database**: SQLite with hockey video content
- **Interface**: Gradio for easy web interaction
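
To make the semantic-search idea concrete, here is a small brute-force sketch of cosine-similarity ranking in plain Python. It stands in for what an inner-product index over L2-normalized vectors computes; the function names, the toy 2-D vectors, and the `k` and `threshold` defaults are assumptions for illustration, not the app's actual code.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=5, threshold=0.3):
    """Return up to k (index, score) pairs above threshold, best first."""
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(corpus)]
    scored = [(i, s) for i, s in scored if s > threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy "video embeddings": index 0 points the same way as the query.
corpus = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hits = top_k([2.0, 0.0], corpus, k=2)
print(hits)
```

A real deployment would use sentence embeddings with hundreds of dimensions and a vector index instead of a Python loop, but the ranking rule is the same.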

## Configuration

The app requires an `OPENROUTER_API_KEY` environment variable. On Hugging Face Spaces, this should be set as a Space secret.
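
A minimal sketch of how the key might be read and checked at startup is shown below. The helper names `load_api_key` and `mask` are hypothetical; only the `OPENROUTER_API_KEY` variable name comes from this README.

```python
import os

def load_api_key(env=os.environ) -> str:
    """Read the key from the environment and fail fast if it is missing."""
    key = env.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError("OPENROUTER_API_KEY is not set")
    return key

def mask(key: str) -> str:
    """Shorten a key for log output so the full secret is never printed."""
    if len(key) <= 10:
        return "***"
    return f"{key[:6]}...{key[-4:]}"

# Example with a made-up key passed in explicitly.
print(mask(load_api_key({"OPENROUTER_API_KEY": "sk-or-v1-abcdef1234"})))
```

Failing at startup rather than on the first request makes a missing secret obvious in the Space logs.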

---

*Built for the field hockey community with ❤️*

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ faiss_index.index filter=lfs diff=lfs merge=lfs -text
+ HockeyFood.db filter=lfs diff=lfs merge=lfs -text
HockeyFood.db ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:89c53610f3d490086bd39e2783e67f48caa374061e287f634811e9282e35d293
+ size 15642624
OpenAPI_DB.py ADDED
@@ -0,0 +1,530 @@
+ import logging
+ import os
+ import re
+ import faiss
+ import numpy as np
+ from dotenv import load_dotenv
+ import httpx
+ from langdetect import detect
+ from deep_translator import GoogleTranslator
+ import sqlite3
+ import pickle
+ import json
+ from sentence_transformers import SentenceTransformer, util
+ from tenacity import retry, stop_after_attempt, wait_exponential
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(levelname)s - %(message)s")
+
+ # Load environment variables
+ load_dotenv()
+ OPENROUTER_API_KEY = os.getenv("OPENROUTER_API_KEY")
+ OPENROUTER_API_URL = "https://openrouter.ai/api/v1/chat/completions"
+ DATABASE_PATH = os.getenv("DATABASE_PATH", "HockeyFood.db")
+ EMBEDDINGS_PATH = "video_embeddings.npy"
+ METADATA_PATH = "video_metadata.json"
+ INDEX_PATH = "faiss_index.index"
+
+ if not OPENROUTER_API_KEY:
+     logging.error("OPENROUTER_API_KEY not set in .env file.")
+     raise RuntimeError("OPENROUTER_API_KEY not set in .env file.")
+ else:
+     masked_key = OPENROUTER_API_KEY[:6] + "..." + OPENROUTER_API_KEY[-4:]
+     logging.info(f"Loaded OpenRouter API key: {masked_key}")
+
+ if not os.path.exists(DATABASE_PATH):
+     logging.error(f"Database file not found at {DATABASE_PATH}.")
+     raise FileNotFoundError(f"Database file not found at {DATABASE_PATH}.")
+
+ # In-memory conversation history
+ conversation_histories = {}
+
+ # Lazy-loaded SentenceTransformer and FAISS index
+ sentence_model = None
+ faiss_index = None
+ embeddings_np = None
+ metadata = []
+
+ def load_resources():
+     global sentence_model, faiss_index, embeddings_np, metadata
+     if sentence_model is None:
+         try:
+             sentence_model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2") # Multilingual model
+             logging.info("Loaded SentenceTransformer model.")
+         except ImportError as e:
+             logging.error(f"Failed to load SentenceTransformer: {e}. Ensure PyTorch and transformers are installed correctly.")
+             raise
+         except Exception as e:
+             logging.error(f"Unexpected error loading SentenceTransformer: {e}")
+             raise
+     if faiss_index is None or embeddings_np is None or not metadata:
+         if not (os.path.exists(EMBEDDINGS_PATH) and os.path.exists(METADATA_PATH) and os.path.exists(INDEX_PATH)):
+             logging.info("Generating embeddings, metadata, and FAISS index from database...")
+             embeddings = []
+             metadata = []
+             conn = sqlite3.connect(DATABASE_PATH)
+             cursor = conn.cursor()
+             cursor.execute("SELECT title, url, embedding FROM YouTube_Urls")
+             for title, url, embedding_blob in cursor.fetchall():
+                 if title and url and embedding_blob:
+                     try:
+                         embedding = pickle.loads(embedding_blob)
+                         if isinstance(embedding, np.ndarray):
+                             embeddings.append(embedding)
+                             metadata.append({"title": title[:100], "url": url})
+                     except Exception as e:
+                         logging.debug(f"Skipping invalid embedding: {e}")
+             conn.close()
+
+             if embeddings:
+                 embeddings_np = np.array(embeddings, dtype=np.float32)
+                 dimension = embeddings_np.shape[1]
+                 faiss_index = faiss.IndexFlatIP(dimension) # Using IP for cosine similarity
+                 faiss.normalize_L2(embeddings_np)
+                 faiss_index.add(embeddings_np)
+                 np.save(EMBEDDINGS_PATH, embeddings_np)
+                 with open(METADATA_PATH, "w") as f:
+                     json.dump(metadata, f)
+                 faiss.write_index(faiss_index, INDEX_PATH)
+                 logging.info(f"Saved {len(embeddings)} embeddings to {EMBEDDINGS_PATH}, metadata to {METADATA_PATH}, and FAISS index to {INDEX_PATH}")
+             else:
+                 logging.error("No valid embeddings found in database.")
+                 raise RuntimeError("No valid embeddings found in database.")
+         else:
+             embeddings_np = np.load(EMBEDDINGS_PATH)
+             with open(METADATA_PATH, "r") as f:
+                 metadata = json.load(f)
+             try:
+                 faiss_index = faiss.read_index(INDEX_PATH)
+             except Exception as e:
+                 logging.warning(f"Failed to load FAISS index from {INDEX_PATH}: {e}. Regenerating index...")
+                 dimension = embeddings_np.shape[1]
+                 faiss_index = faiss.IndexFlatIP(dimension)
+                 faiss.normalize_L2(embeddings_np)
+                 faiss_index.add(embeddings_np)
+                 faiss.write_index(faiss_index, INDEX_PATH)
+             logging.info(f"Loaded {embeddings_np.shape[0]} embeddings of dimension {embeddings_np.shape[1]} and FAISS index")
+
+ load_resources() # Initial load
+
+ # Hockey-specific translation dictionary
+ hockey_translation_dict = {
+     "schiettips": "shooting tips",
+     "schieten": "shooting",
+     "backhand": "backhand",
+     "backhandschoten": "backhand shooting",
+     "achterhand": "backhand",
+     "veldhockey": "field hockey",
+     "strafcorner": "penalty corner",
+     "sleepflick": "drag flick",
+     "doelman": "goalkeeper",
+     "aanvaller": "forward",
+     "verdediger": "defender",
+     "middenvelder": "midfielder",
+     "stickbeheersing": "stick handling",
+     "balbeheersing": "ball control",
+     "hockeyoefeningen": "hockey drills",
+     "oefeningen": "drills",
+     "kinderen": "kids",
+     "verbeteren": "improve"
+ }
+
+ # Expanded hockey keywords for domain detection
+ hockey_keywords = [
+     "hockey", "field hockey", "veldhockey", "match", "wedstrijd", "game", "spel", "goal", "doelpunt",
+     "score", "scoren", "ball", "bal", "stick", "hockeystick", "field", "veld", "turf", "kunstgras",
+     "pitch", "speelveld", "corner", "short corner", "long corner", "korte hoek", "lange hoek",
+     "penalty", "strafbal", "shootout", "strookschot", "penalty stroke", "strafslag",
+     "coach", "trainer", "goalkeeper", "doelman", "keeper", "goalie", "defender", "verdediger",
+     "midfielder", "middenvelder", "forward", "aanvaller", "striker", "spits", "captain", "aanvoerder",
+     "player", "speler", "team", "ploeg",
+     "shooting", "schieten", "schiet", "backhand shooting", "backhandschoten", "passing", "passen",
+     "backhand", "achterhand", "forehand", "voorhand", "drag flick", "sleeppush", "push pass",
+     "pushpass", "hit pass", "slagpass", "aerial pass", "luchtpass", "dribbling", "dribbelen",
+     "stick work", "stickwerk", "deflection", "afbuiging", "scoop", "scheppen", "tackle", "tackelen",
+     "block tackle", "blok tackle", "jab tackle", "steektackle", "reverse stick", "omgekeerde stick",
+     "indian dribble", "indiase dribbel", "3d skills", "3d vaardigheden", "goalkeeping", "doelverdediging",
+     "save", "redding", "clearance", "uitverdediging", "flick", "slepen", "lift", "optillen",
+     "chip", "chippen", "sweep hit", "veegslag", "tomahawk", "backstick", "reverse hit", "omgekeerde slag",
+     "drag", "slepen", "dummy", "schijnbeweging", "feint", "fint", "spin", "draaien",
+     "training", "oefening", "exercise", "oefenen", "drill", "oefensessie", "practice", "praktijk",
+     "warm-up", "opwarming", "cool-down", "afkoeling", "conditioning", "conditietraining",
+     "fitness", "fitheid", "agility", "wendbaarheid", "speed", "snelheid", "endurance", "uithoudingsvermogen",
+     "strength", "kracht", "core strength", "kernkracht", "stick handling", "stickbeheersing",
+     "ball control", "balbeheersing", "footwork", "voetwerk", "positioning", "positionering",
+     "marking", "dekken", "zone defense", "zonedekking", "man-to-man", "man-op-man",
+     "attack drill", "aanvalsoefening", "defense drill", "verdedigingsoefening",
+     "passing drill", "passoefening", "shooting drill", "schietoefening", "goalkeeper drill",
+     "doelmanoefening", "skill development", "vaardigheidsontwikkeling", "technique", "techniek",
+     "strategy", "strategie", "tactic", "tactiek", "game plan", "spelplan", "formation", "opstelling",
+     "press", "druk zetten", "counterattack", "tegenaanval", "breakaway", "uitbraak",
+     "offensive play", "aanvallend spel", "defensive play", "verdedigend spel", "set piece",
+     "standaardsituatie", "free hit", "vrije slag", "penalty corner", "strafcorner",
+     "tutorial", "handleiding", "tips", "advies", "coaching", "coachen", "learn", "leren",
+     "education", "opleiding", "skills training", "vaardigheidstraining", "workshop", "werkplaats",
+     "session", "sessie", "clinic", "kliniek", "instruction", "instructie", "guide", "gids",
+     "shin guard", "scheenbeschermer", "mouthguard", "mondbeschermer", "gloves", "handschoenen",
+     "grips", "grepen", "turf shoes", "kunstschoenen", "hockey shoes", "hockeyschoenen",
+     "goalpost", "doelpaal", "net", "netwerk", "training cone", "trainingskegel",
+     "rebound board", "reboundbord", "practice net", "oefennet",
+     "warmup", "opwarmen", "stretching", "rekken", "injury prevention", "blessurepreventie",
+     "teamwork", "samenwerking", "communication", "communicatie", "leadership", "leiderschap",
+     "motivation", "motivatie", "mental preparation", "mentale voorbereiding", "focus", "concentratie",
+     "hockey camp", "hockeykamp", "tournament", "toernooi", "league", "liga", "championship",
+     "kampioenschap"
+ ]
+
+ # Out-of-domain keywords
+ out_of_domain_keywords = [
+     "politics", "politiek", "government", "regering", "election", "verkiezing", "policy", "beleid",
+     "football", "voetbal", "soccer", "basketball", "basketbal", "tennis", "cricket", "rugby",
+     "volleyball", "volleybal", "baseball", "honkbal", "golf", "swimming", "zwemmen",
+     "athletics", "atletiek", "cycling", "wielrennen", "boxing", "boksen", "martial arts",
+     "vechtsport", "gymnastics", "gymnastiek", "weather", "weer", "temperature", "temperatuur",
+     "forecast", "voorspelling", "rain", "regen", "snow", "sneeuw", "storm", "wind", "sun",
+     "zon", "cloud", "wolk", "humidity", "vochtigheid", "climate", "klimaat", "pollution",
+     "vervuiling", "movie", "film", "television", "televisie", "music", "muziek", "concert",
+     "celebrity", "beroemdheid", "news", "nieuws", "gossip", "roddel", "streaming", "streamen",
+     "video game", "videospel", "gaming", "gamen", "cooking", "koken", "recipe", "recept",
+     "fashion", "mode", "shopping", "winkelen", "travel", "reizen", "vacation", "vakantie",
+     "car", "auto", "finance", "financiën", "stock market", "aandelenmarkt", "business", "zaken",
+     "job", "baan", "education", "onderwijs",
+     "ice hockey", "ijshockey", "slap shot", "wrist shot"
+ ]
+
+ # Greetings for detection
+ greetings = [
+     "hey", "hello", "hi", "hiya", "yo", "what's up", "sup", "good morning", "good afternoon",
+     "good evening", "good night", "howdy", "greetings", "morning", "evening", "hallo", "hoi",
+     "goedemorgen", "goedemiddag", "goedenavond", "goedennacht", "hé", "joe", "moi", "dag",
+     "goedendag", "aloha", "ciao", "salut", "hola", "heej"
+ ]
+
+ # Common Dutch question starters (not greetings)
+ dutch_question_starters = [
+     "geef me", "kun je", "kunt u", "hoe kan", "wat is", "waarom", "welke", "hoe moet", "wat zijn"
+ ]
+
+ # Refusal detection keywords
+ refusal_keywords = [
+     "i can't help", "cannot assist", "not available", "cannot provide", "inappropriate",
+     "refuse", "not allowed", "no access", "ai cannot respond", "ask something else",
+     "outside my domain", "beyond my capabilities", "not permitted", "sorry, i can't",
+     "unable to answer", "restricted from", "not within my scope", "as an ai language model",
+     "i am not able to", "prohibited", "off-topic", "irrelevant", "not my expertise",
+     "try a different question", "change the topic", "out of bounds", "not supported",
+     "i don't have that information", "no data available", "not equipped to handle"
+ ]
+
+ # Semantic detection setup with preloaded embeddings
+ refusal_embedding = sentence_model.encode(
+     "Sorry, I can only assist with questions about field hockey, such as training, drills, strategies, rules, and tutorials. Please ask a field hockey-related question!",
+     convert_to_tensor=True
+ )
+ hockey_reference_embedding = sentence_model.encode(
+     "Questions about field hockey training, drills, strategies, rules, techniques, or tutorials, including shooting, passing, dribbling, and goalkeeping.",
+     convert_to_tensor=True
+ )
+ hockey_technique_embedding = sentence_model.encode(
+     "Field hockey skills such as backhand shooting, forehand passing, drag flick, push pass, aerial pass, dribbling, tackling, and goalkeeping techniques.",
+     convert_to_tensor=True
+ )
+ hockey_context_embedding = sentence_model.encode(
+     "Field hockey gameplay, team strategies, player positions, penalty corners, free hits, and match preparation.",
+     convert_to_tensor=True
+ )
+ out_of_domain_embedding = sentence_model.encode(
+     "Questions about politics, other sports like football, ice hockey, tennis, weather, movies, music, cooking, or unrelated general topics.",
+     convert_to_tensor=True
+ )
+
+ def is_refusal(text: str) -> bool:
+     if not text or not isinstance(text, str):
+         logging.debug("Empty or invalid text for refusal check.")
+         return False
+     text_lower = text.lower()
+     return any(kw in text_lower for kw in refusal_keywords)
+
+ def is_semantic_refusal(text: str) -> bool:
+     if not text or not isinstance(text, str):
+         logging.debug("Empty or invalid text for semantic refusal check.")
+         return False
+     embedding = sentence_model.encode(text, convert_to_tensor=True)
+     similarity = util.cos_sim(embedding, refusal_embedding).item()
+     logging.debug(f"Semantic refusal similarity: {similarity:.3f}")
+     return similarity > 0.7
+
+ def preprocess_prompt(prompt: str, user_lang: str) -> tuple[str, str]:
+     """
+     Preprocess prompt and return both translated (English) and original prompt.
+     """
+     if not prompt or not isinstance(prompt, str):
+         return prompt, prompt
+     prompt_lower = prompt.lower().strip()
+     if user_lang == "nl":
+         # Apply hockey-specific translations
+         for dutch_term, english_term in hockey_translation_dict.items():
+             prompt_lower = re.sub(rf'\b{re.escape(dutch_term)}\b', english_term, prompt_lower)
+         try:
+             translated = GoogleTranslator(source="nl", target="en").translate(prompt_lower)
+             logging.debug(f"Translated Dutch prompt '{prompt_lower}' to English: '{translated}'")
+             return translated if translated else prompt_lower, prompt
+         except Exception as e:
+             logging.error(f"Translation error for prompt '{prompt_lower}': {str(e)}")
+             return prompt_lower, prompt
+     return prompt_lower, prompt
+
+ def is_in_domain(prompt: str) -> bool:
+     if not prompt or not isinstance(prompt, str):
+         logging.debug("Prompt is empty or not a string.")
+         return False
+     prompt_lower = prompt.lower().strip()
+
+     has_hockey_keywords = any(
+         re.search(rf'\b{re.escape(word)}\b|\b{re.escape(word[:-1])}\w*\b', prompt_lower)
+         for word in hockey_keywords
+     )
+     has_out_of_domain_keywords = any(word in prompt_lower for word in out_of_domain_keywords)
+
+     prompt_embedding = sentence_model.encode(prompt_lower, convert_to_tensor=True)
+     hockey_primary_similarity = util.cos_sim(prompt_embedding, hockey_reference_embedding).item()
+     hockey_technique_similarity = util.cos_sim(prompt_embedding, hockey_technique_embedding).item()
+     hockey_context_similarity = util.cos_sim(prompt_embedding, hockey_context_embedding).item()
+
+     logging.debug(f"Domain check: has_hockey_keywords={has_hockey_keywords}, "
+                   f"has_out_of_domain_keywords={has_out_of_domain_keywords}, "
+                   f"primary_sim={hockey_primary_similarity:.3f}, "
+                   f"technique_sim={hockey_technique_similarity:.3f}, "
+                   f"context_sim={hockey_context_similarity:.3f}")
+
+     if has_out_of_domain_keywords:
+         logging.info("Prompt contains out-of-domain keywords, marked as out of domain.")
+         return False
+
+     return (has_hockey_keywords or
+             hockey_primary_similarity > 0.3 or
+             hockey_technique_similarity > 0.3 or
+             hockey_context_similarity > 0.3)
+
+ def is_greeting_or_vague(prompt: str, user_lang: str) -> bool:
+     if not prompt or not isinstance(prompt, str):
+         logging.debug("Prompt is empty or not a string.")
+         return True
+     prompt_lower = prompt.lower().strip()
+     is_greeting = any(greeting in prompt_lower for greeting in greetings)
+     is_question_starter = any(starter in prompt_lower for starter in dutch_question_starters) if user_lang == "nl" else False
+     has_hockey_keywords = any(
+         re.search(rf'\b{re.escape(word)}\b|\b{re.escape(word[:-1])}\w*\b', prompt_lower)
+         for word in hockey_keywords
+     )
+
+     logging.debug(f"Vague check (lang={user_lang}): is_greeting={is_greeting}, "
+                   f"is_question_starter={is_question_starter}, has_hockey_keywords={has_hockey_keywords}")
+
+     return is_greeting and not (is_question_starter or has_hockey_keywords)
+
+ @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
+ def search_youtube_urls_db(english_query: str, dutch_query: str) -> list:
+     if not is_in_domain(english_query):
+         logging.info("Query is out of domain, skipping database search.")
+         return []
+
+     try:
+         # Encode both English and Dutch queries
+         english_embedding = sentence_model.encode(english_query, convert_to_tensor=False)
+         english_embedding = np.array(english_embedding).astype("float32").reshape(1, -1)
+         faiss.normalize_L2(english_embedding)
+
+         dutch_embedding = sentence_model.encode(dutch_query, convert_to_tensor=False) if dutch_query else english_embedding
+         dutch_embedding = np.array(dutch_embedding).astype("float32").reshape(1, -1)
+         faiss.normalize_L2(dutch_embedding)
+
+         # Search with both embeddings, limited to top 5 each
+         distances_en, indices_en = faiss_index.search(english_embedding, 5)
+         distances_nl, indices_nl = faiss_index.search(dutch_embedding, 5) if dutch_query else (distances_en, indices_en)
+
+         results = []
+         seen_urls = set()
+         field_hockey_terms = ["field hockey", "veldhockey"]
+         ice_hockey_terms = ["ice hockey", "ijshockey", "slap shot", "wrist shot"]
+
+         # Combine results from both searches
+         for indices, distances in [(indices_en, distances_en), (indices_nl, distances_nl)]:
+             for idx, sim in zip(indices[0], distances[0]):
+                 # FAISS pads missing hits with index -1, so require a valid index and a score above the threshold
+                 if 0 <= idx < len(metadata) and sim > 0.3:
+                     title = metadata[idx]["title"].lower()
+                     url = metadata[idx]["url"]
+                     logging.debug(f"FAISS match: title='{metadata[idx]['title']}', similarity={sim:.3f}")
+                     if (any(term in title for term in field_hockey_terms) or
+                             not any(term in title for term in ice_hockey_terms)) and url not in seen_urls:
+                         results.append({
+                             "title": metadata[idx]["title"], # Keep original title
+                             "url": url,
+                             "similarity": float(sim)
+                         })
+                         seen_urls.add(url)
+
+         # The two searches can yield up to 10 unique hits; keep the 5 most similar
+         results.sort(key=lambda item: item["similarity"], reverse=True)
+         results = results[:5]
+         logging.info(f"FAISS search completed with {len(results)} results.")
+         return results
+     except Exception as e:
+         logging.error(f"FAISS search error: {e}")
+         return []
+
+ def get_conversation_history(user_role: str, user_team: str) -> str:
+     session_key = f"{user_role}|{user_team}"
+     history = conversation_histories.get(session_key, [])
+     formatted_history = "\n".join([f"Gebruiker: {q}\nCoach: {a}" for q, a in history[-3:]])
+     logging.debug(f"Conversation history for {session_key}: {formatted_history}")
+     return formatted_history
+
+ def update_conversation_history(user_role: str, user_team: str, question: str, answer: str):
+     session_key = f"{user_role}|{user_team}"
+     history = conversation_histories.get(session_key, [])
+     history.append((question, answer))
+     conversation_histories[session_key] = history[-3:]
+     logging.debug(f"Updated conversation history for {session_key} with question: {question}")
+
+ def get_relevant_context(question: str) -> str:
+     sample_context = [
+         {"question": "What are good drills for improving stick handling?",
+          "answer": "Try cone dribbling and figure-eight patterns to enhance stick control."},
+         {"question": "Hoe train je voor strafcorners?",
+          "answer": "Oefen sleepflicks en ingestudeerde spelsituaties met focus op timing en precisie."},
+         {"question": "What are good drills for improving backhand shooting?",
+          "answer": "Use cone shooting drills and practice wrist flicks for power and accuracy."},
+         {"question": "Geef me oefeningen voor backhandschoten voor kinderen",
+          "answer": "Gebruik kegeloefeningen en laat kinderen polsbewegingen oefenen voor kracht en precisie."}
+     ]
+     question_lower = question.lower() if isinstance(question, str) else ""
+     relevant = [
+         f"Vraag: {entry['question']}\nAntwoord: {entry['answer']}"
+         for entry in sample_context
+         if any(kw in question_lower for kw in hockey_keywords) and
+         any(kw in entry['question'].lower() for kw in hockey_keywords)
+     ]
+     context = "\n\n".join(relevant[:2])
+     logging.debug(f"Relevant context for question '{question}': {context}")
+     return context
+
+ def translate_text(text: str, source_lang: str, target_lang: str) -> str:
+     if not text or not isinstance(text, str):
+         logging.debug("Empty or invalid text for translation, returning empty string.")
+         return ""
+     if source_lang == target_lang:
+         return text
+     try:
+         translated = GoogleTranslator(source=source_lang, target=target_lang).translate(text)
+         logging.debug(f"Translated text from {source_lang} to {target_lang}: {translated}")
+         return translated
+     except Exception as e:
+         logging.error(f"Translation error: {str(e)}")
+         return text
+
+ @retry(stop=stop_after_attempt(3), wait=wait_exponential(multiplier=1, min=4, max=10))
+ async def agentic_hockey_chat(user_active_role: str, user_team: str, user_prompt: str) -> dict:
+     logging.info(f"Processing question: {user_prompt}, role: {user_active_role}, team: {user_team}")
+
+     # Sanitize user prompt
+     if not user_prompt or not isinstance(user_prompt, str):
+         logging.error("Invalid or empty user_prompt.")
+         return {"ai_response": "Vraag mag niet leeg zijn.", "recommended_content_details": []}
+     user_prompt = re.sub(r'\s+', ' ', user_prompt.strip())
+
+     try:
+         user_lang = detect(user_prompt)
+         if user_lang not in ["en", "nl"]:
+             logging.info(f"Detected language {user_lang} not supported, defaulting to English.")
+             user_lang = "en"
+     except Exception:
+         user_lang = "en"
+         logging.debug("Language detection failed, defaulting to English.")
+
+     # Get both translated and original prompts
+     processing_prompt, original_prompt = preprocess_prompt(user_prompt, user_lang)
+     logging.info(f"Processing prompt after translation: {processing_prompt}")
+
+     if is_greeting_or_vague(user_prompt, user_lang):
+         answer = "Hallo! Waarmee kan ik je helpen met betrekking tot hockey, training of andere onderwerpen?" if user_lang == "nl" else "Hello! How can I assist you with hockey, training, or other topics?"
+         update_conversation_history(user_active_role, user_team, user_prompt, answer)
+         return {"ai_response": answer, "recommended_content_details": []}
+
+     if not is_in_domain(processing_prompt):
+         answer = "Sorry, ik kan alleen helpen met vragen over hockey, zoals training, oefeningen, strategieën, regels en tutorials. Stel me een hockeygerelateerde vraag!" if user_lang == "nl" else "Sorry, I can only assist with questions about hockey, such as training, drills, strategies, rules, and tutorials. Please ask a hockey-related question!"
+         update_conversation_history(user_active_role, user_team, user_prompt, answer)
+         return {"ai_response": answer, "recommended_content_details": []}
+
+     history = get_conversation_history(user_active_role, user_team)
+     context = get_relevant_context(processing_prompt)
+
+     system_prompt = (
+         "You are an AI Assistant Bot specialized in all things field hockey, including training, drills, strategies, rules, and more. "
+         "You communicate with a {user_active_role} from the team {user_team}. "
+         "Provide concise, practical, and specific answers tailored to the user's role and team, especially for youth teams like U8C. "
+         "Focus on field hockey-related topics such as training, drills, strategies, rules, and tutorials. "
+         "Ensure the response is semantically accurate and relevant to the question.\n\n"
+         "Recent conversation:\n{history}\n\n"
+         "Relevant previous conversations:\n{context}\n\n"
+         "Answer the following question in English based on the provided context and your expertise:\n{user_prompt}"
+     )
+
+     hockey_prompt_template = system_prompt.format(
+         user_active_role=user_active_role,
+         user_team=user_team,
+         history=history or "No previous conversations.",
+         context=context or "No relevant context available.",
+         user_prompt=processing_prompt
+     )
+
+     payload = {
+         "model": "openai/gpt-4o",
+         "messages": [
+             {"role": "system", "content": hockey_prompt_template}
+         ],
+         "max_tokens": 200,
+         "temperature": 0.3,
+         "top_p": 0.9
+     }
+
+     headers = {
+         "Authorization": f"Bearer {OPENROUTER_API_KEY}",
+         "Content-Type": "application/json"
+     }
+
+     try:
+         logging.info("Making OpenRouter API call...")
+         async with httpx.AsyncClient(timeout=30) as client:
+             response = await client.post(OPENROUTER_API_URL, json=payload, headers=headers)
+             response.raise_for_status()
+             data = response.json()
+             logging.debug(f"Raw API response: {data}")
+
+         answer = data.get("choices", [{}])[0].get("message", {}).get("content", "").strip()
+
+         if not answer:
+             logging.error("No answer received from OpenRouter API.")
+             return {"ai_response": "No answer received from the API.", "recommended_content_details": []}
+
+         answer = re.sub(r'https?://\S+', '', answer).strip()
+         answer = translate_text(answer, "en", user_lang)
+
+         logging.info("Performing FAISS search...")
+         recommended_content = search_youtube_urls_db(processing_prompt, original_prompt if user_lang == "nl" else "")
+         logging.info(f"FAISS search completed with {len(recommended_content)} results.")
+
+         if is_refusal(answer) or is_semantic_refusal(answer):
+             logging.warning(f"Response flagged as refusal: {answer}")
+             answer = "Sorry, ik kan alleen helpen met vragen over hockey, zoals training, oefeningen, strategieën, regels en tutorials. Stel me een hockeygerelateerde vraag!" if user_lang == "nl" else "Sorry, I can only assist with questions about hockey, such as training, drills, strategies, rules, and tutorials. Please ask a hockey-related question!"
+             recommended_content = []
+
+         filtered_recommended_content = [{"title": item["title"], "url": item["url"]} for item in recommended_content]
+
+         update_conversation_history(user_active_role, user_team, user_prompt, answer)
+         return {"ai_response": answer, "recommended_content_details": filtered_recommended_content}
+
+     except httpx.HTTPStatusError as e:
+         logging.error(f"OpenRouter API error: Status {e.response.status_code}, Response: {e.response.text}")
+         return {"ai_response": f"API error: {e.response.text}", "recommended_content_details": []}
+     except Exception as e:
+         logging.error(f"Internal error: {str(e)}")
+         return {"ai_response": f"Internal error: {str(e)}", "recommended_content_details": []}
app_gradio.py ADDED
@@ -0,0 +1,168 @@
+ #!/usr/bin/env python3
+ """
+ Hockey Mind AI Chatbot - Gradio Interface for Hugging Face Spaces
+ """
+ import gradio as gr
+ import asyncio
+ import os
+ from dotenv import load_dotenv
+ from OpenAPI_DB import agentic_hockey_chat
+
+ # Load environment variables
+ load_dotenv()
+
+ # Global variable to track if resources are loaded
+ resources_loaded = False
+
+ async def chat_interface(user_role, user_team, user_prompt):
+     """Interface function for Gradio"""
+     global resources_loaded
+
+     try:
+         # Load resources on first use to save memory
+         if not resources_loaded:
+             from OpenAPI_DB import load_resources
+             load_resources()
+             resources_loaded = True
+
+         # Call the main chat function
+         result = await agentic_hockey_chat(user_role, user_team, user_prompt)
+
+         # Format response for Gradio
+         ai_response = result.get('ai_response', 'Sorry, no response generated.')
+         recommendations = result.get('recommended_content_details', [])
+
+         # Format recommendations as HTML
+         rec_html = ""
+         if recommendations:
+             rec_html = "<h3>🏒 Recommended Videos:</h3><ul>"
+             for i, rec in enumerate(recommendations[:5], 1):
+                 title = rec.get('title', 'No title')
+                 url = rec.get('url', '#')
+                 similarity = rec.get('similarity', 0)
+                 rec_html += f"<li><a href='{url}' target='_blank'>{title}</a> (Similarity: {similarity:.3f})</li>"
+             rec_html += "</ul>"
+
+         return ai_response, rec_html
+
+     except Exception as e:
+         return f"Error: {str(e)}", "No recommendations available due to error."
+
+ def sync_chat_interface(user_role, user_team, user_prompt):
+     """Synchronous wrapper for Gradio"""
+     return asyncio.run(chat_interface(user_role, user_team, user_prompt))
+
+ # Gradio Interface
+ with gr.Blocks(
+     title="🏒 Hockey Mind AI Chatbot",
+     theme=gr.themes.Soft(),
+     css="""
+     .gradio-container {max-width: 800px !important; margin: auto !important;}
+     .main-header {text-align: center; margin-bottom: 2rem;}
+     """
+ ) as demo:
+
+     gr.HTML("""
+     <div class="main-header">
+         <h1>🏒 Hockey Mind AI Chatbot</h1>
+         <p>Get personalized hockey advice and video recommendations!</p>
+         <p><i>Optimized for field hockey coaching, training, and player development</i></p>
+     </div>
+     """)
+
+     with gr.Row():
+         with gr.Column():
+             user_role = gr.Dropdown(
+                 choices=["Player", "Coach", "Parent", "Fan", "le Coach", "Speler", "Ouder"],
+                 label="Your Role 👤",
+                 value="Coach",
+                 info="Select your role in hockey"
+             )
+
+             user_team = gr.Textbox(
+                 label="Team/Level 🏒",
+                 placeholder="e.g., U8C, Toronto Maple Leafs, Beginner",
+                 value="U10",
+                 info="Your team name or skill level"
+             )
+
+             user_prompt = gr.Textbox(
+                 label="Your Question ❓",
+                 placeholder="Ask about drills, techniques, strategies, rules...",
+                 lines=3,
+                 info="Ask in English or Dutch!"
+             )
+
+             submit_btn = gr.Button("Get Hockey Advice 🚀", variant="primary", size="lg")
+
+     with gr.Row():
+         ai_response = gr.Textbox(
+             label="🤖 AI Response",
+             lines=8,
+             interactive=False,
+             info="Personalized hockey advice based on your role and team"
+         )
+
+     with gr.Row():
+         # gr.HTML has no `info` parameter in Gradio 4.x, so the description
+         # ("Relevant hockey videos from our database") is kept here as a comment.
+         recommendations = gr.HTML(
+             label="📺 Video Recommendations"
+         )
+
+     # Examples section
+     gr.HTML("<br><h3>💡 Example Questions:</h3>")
+
+     examples = gr.Examples(
+         examples=[
+             ["Coach", "U8C", "What are the best backhand shooting drills for young players?"],
+             ["Player", "Intermediate", "How can I improve my penalty corner technique?"],
+             ["le Coach", "U10", "Geef me oefeningen voor backhandschoten"],
+             ["Parent", "Beginner", "What equipment does my child need to start playing hockey?"],
+             ["Coach", "Advanced", "What are effective small-sided games for skill development?"],
+         ],
+         inputs=[user_role, user_team, user_prompt],
+         outputs=[ai_response, recommendations],
+         fn=sync_chat_interface,
+     )
+
+     # Event handler
+     submit_btn.click(
+         fn=sync_chat_interface,
+         inputs=[user_role, user_team, user_prompt],
+         outputs=[ai_response, recommendations],
+         api_name="chat"
+     )
+
+     user_prompt.submit(
+         fn=sync_chat_interface,
+         inputs=[user_role, user_team, user_prompt],
+         outputs=[ai_response, recommendations]
+     )
+
+     # Footer
+     gr.HTML("""
+     <br>
+     <div style="text-align: center; color: #666; font-size: 0.9em;">
+         <p>🏒 Hockey Mind AI - Powered by OpenRouter & Sentence Transformers</p>
+         <p>Supports English & Dutch | Built for field hockey community</p>
+     </div>
+     """)
+
+ # Launch configuration for Hugging Face Spaces
+ if __name__ == "__main__":
+     # Check if running on Hugging Face Spaces
+     if os.getenv("SPACE_ID"):
+         # Production mode on HF Spaces
+         demo.launch(
+             server_name="0.0.0.0",
+             server_port=7860,
+             share=False,
+             show_error=True,
+             quiet=False
+         )
+     else:
+         # Local development mode
+         demo.launch(
+             share=True,
+             show_error=True
+         )
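Review note: `chat_interface` interpolates video titles and URLs straight into an HTML string. If a title in the database ever contains `<`, `&`, or a quote, the markup can break or render unintended tags. A minimal sketch of an escaped variant follows, using only the standard library. The helper name `render_recommendations` and its `limit` parameter are hypothetical and not part of the commit.

```python
from html import escape
from typing import Dict, List


def render_recommendations(recommendations: List[Dict], limit: int = 5) -> str:
    """Return an HTML list of video links with titles and URLs escaped."""
    if not recommendations:
        return ""
    items = []
    for rec in recommendations[:limit]:
        title = escape(rec.get("title", "No title"))
        # quote=True also escapes quote characters so the value cannot leave the attribute
        url = escape(rec.get("url", "#"), quote=True)
        items.append(f'<li><a href="{url}" target="_blank" rel="noopener">{title}</a></li>')
    return "<h3>Recommended Videos:</h3><ul>" + "".join(items) + "</ul>"


print(render_recommendations([{"title": "Drag flick <basics>", "url": "https://example.com/watch?v=1&t=2"}]))
```

Escaping only protects the markup. It does not validate the URL scheme, so a stricter version might also reject anything that does not start with `https://`.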
faiss_index.index ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:79f0d624450e9f0727c2d37920afe2f70f5e39e964525d04e272b56823dd4fc1
+ size 10618413
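Review note: the three added lines are a Git LFS pointer, not the index itself. The repository stores this small text file and the binary is fetched from LFS storage by its hash. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper written for illustration, not a Git LFS API.

```python
from typing import Dict


def parse_lfs_pointer(text: str) -> Dict[str, object]:
    """Parse the space-separated key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>"
    algo, _, digest = fields.get("oid", "").partition(":")
    return {
        "version": fields.get("version", ""),
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields.get("size", "0")),
    }


pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:79f0d624450e9f0727c2d37920afe2f70f5e39e964525d04e272b56823dd4fc1
size 10618413
"""
print(parse_lfs_pointer(pointer))
```

A check like this can help at startup: if the file on disk is only a few hundred bytes and parses as a pointer, the real index was never pulled from LFS.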
requirements-hf.txt ADDED
@@ -0,0 +1,15 @@
+ fastapi==0.104.1
+ uvicorn==0.24.0
+ python-dotenv==1.0.0
+ pydantic==2.5.0
+ requests==2.31.0
+ httpx==0.25.2
+ langdetect==1.0.9
+ deep-translator==1.11.4
+ beautifulsoup4==4.12.2
+ sentence-transformers==2.2.2
+ faiss-cpu==1.7.4
+ numpy==1.24.3
+ tenacity==8.2.3
+ psutil==5.9.0
+ gradio==4.8.0
video_embeddings.npy ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:70d0cf70da0c130f54304ed8a998f5c1d32a2cca43b40133834946708fb8ba3c
+ size 10618496
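Review note: the two LFS sizes (10618496 bytes for `video_embeddings.npy`, 10618413 bytes for `faiss_index.index`) can be sanity-checked against each other. The sketch below is back-of-the-envelope arithmetic under three assumptions that the diff does not confirm: the embeddings are float32, they have 384 dimensions (common for small sentence-transformers models, but the model is not named here), and the `.npy` header occupies 128 bytes.

```python
# All three constants are assumptions, not facts taken from the commit.
NPY_HEADER_BYTES = 128   # typical header size of a .npy file holding a 2-D array
FLOAT32_BYTES = 4        # assumed dtype of the stored embeddings
EMBEDDING_DIM = 384      # assumed embedding width

NPY_SIZE = 10618496      # from the video_embeddings.npy pointer
INDEX_SIZE = 10618413    # from the faiss_index.index pointer


def rows_in_npy(file_size: int, dim: int = EMBEDDING_DIM) -> float:
    """Number of vectors implied by the file size under the assumptions above."""
    return (file_size - NPY_HEADER_BYTES) / (FLOAT32_BYTES * dim)


rows = rows_in_npy(NPY_SIZE)
payload = int(rows) * EMBEDDING_DIM * FLOAT32_BYTES
print(f"implied vectors: {rows}")
print(f"index bytes beyond raw vectors: {INDEX_SIZE - payload}")
```

Under these assumptions the row count comes out as a whole number and the index is only a few dozen bytes larger than the raw vectors, which would be in line with an uncompressed flat index over the same embeddings. If the real dimension or dtype differs, the numbers will not divide evenly and this reading does not apply.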
video_metadata.json ADDED
The diff for this file is too large to render. See raw diff