vindruid committed on
Commit 5f8a43b · unverified · 1 Parent(s): f2acbb8
Files changed (4)
  1. README.md +97 -6
  2. app.py +762 -0
  3. assets/langgraph_flow.png +0 -0
  4. requirements.txt +18 -0
README.md CHANGED
@@ -1,14 +1,105 @@
1
  ---
2
- title: AI Chat To Visual
3
- emoji: ๐ŸŒ
4
  colorFrom: yellow
5
- colorTo: indigo
6
  sdk: gradio
7
- sdk_version: 5.33.1
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
- short_description: Ai chart generator by chatting with your data
 
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
1
  ---
2
+ title: Pvt Terloka
3
+ emoji: 💬
4
  colorFrom: yellow
5
+ colorTo: purple
6
  sdk: gradio
7
+ sdk_version: 5.0.1
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
+ tags:
12
+ - agent-demo-track
13
  ---
14
 
15
+ # 💬 Terloka Data Insight Tool
16
+
17
+ **Terloka Data Insight Tool** is an interactive, AI-powered data exploration and visualization tool for analytics.
18
+ Built with **Gradio**, **LangGraph**, **Gemini (Google Generative AI)**, and **Altair**, it enables users to:
19
+
20
+ - 📁 Upload datasets
21
+ - 🧠 Converse with an intelligent LLM assistant
22
+ - 📊 Automatically generate meaningful charts and visual insights
23
+ - 💬 Get explanations without writing code
24
+
25
+ ---
26
+
27
+ ## 🎯 Project Goals
28
+
29
+ - Empower business users, analysts, and domain experts to explore data **using natural language**, not code.
30
+ - Lower the barrier to insight generation by integrating **LLM-driven interfaces** with **automated visualization tools**.
31
+ - Create a flexible foundation for conversational analytics across verticals (e.g., travel, e-commerce, finance).
32
+
33
+ ---
34
+
35
+ ## ⚙️ Capabilities
36
+
37
+ 1. **📁 File Upload**
38
+ Supports `.csv`, `.xls`, and `.xlsx` formats via the Gradio UI.
39
+
40
+ 2. **🤖 Conversational Chatbot**
41
+ Interact with a Gemini-powered LLM to analyze and visualize your data through natural language.
42
+
43
+ 3. **📈 Auto Visualization**
44
+ Automatically generates Altair plots based on your questions or commands.
45
+
46
+ 4. **🧾 Schema & Summary View**
47
+ View data schema, column types, null value breakdowns, and duplicates.
48
+
49
+ 5. **📊 Insight Generation**
50
+ Each chart comes with a smart LLM-generated textual analysis based on the data.
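+
+ For illustration, a request like "show total sales by region" typically produces Altair code along these lines (a minimal sketch of what the plotting tool generates; the sample `region` and `sales` columns mirror the example embedded in the plotting prompt, not your actual data):
+
+ ```python
+ import altair as alt
+ import pandas as pd
+
+ # Stand-in for the uploaded DataFrame `df`
+ df = pd.DataFrame({"region": ["West", "East", "South"], "sales": [120, 90, 60]})
+
+ df_agg = df.groupby("region")["sales"].sum().reset_index()
+ chart = alt.Chart(df_agg).mark_bar().encode(
+     x="region:N",
+     y="sales:Q",
+     tooltip=["region", "sales"],
+ ).properties(title="Total Sales per Region")
+ ```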
51
+
52
+ ---
53
+
54
+ ## 📦 Project Scope
55
+
56
+ ### ✅ In-Scope
57
+ - Text-based interaction with the LLM.
58
+ - Plot generation using Altair.
59
+ - Data upload via the UI.
60
+ - Simple exploratory analysis: aggregations, groupings, comparisons.
61
+ - Multi-turn conversations with short-term memory.
62
+
63
+ ### 🚫 Out-of-Scope (currently)
64
+ - Multi-file joins or SQL querying.
65
+ - Persistent storage or dashboarding.
66
+ - Real-time data processing or streaming.
67
+ - Access control or authentication mechanisms.
68
+
69
+ ---
70
+
71
+ ## 🧱 Technical Requirements
72
+
73
+ | Requirement | Description |
74
+ |--------------------|-------------|
75
+ | **Python** | Recommended 3.10+ |
76
+ | **Libraries** | `gradio`, `pandas`, `altair`, `langchain`, `langgraph`, `google-generativeai` |
77
+ | **Visualization** | Altair (for fast and declarative charting) |
78
+ | **LLM API** | Google Gemini (Flash models, via `langchain_google_genai`) |
79
+ | **Workflow Engine**| LangGraph (manages multi-step LLM workflows) |
80
+
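+ To try the same LLM setup used in `app.py` locally, a sketch like the one below works, assuming a valid Google API key is available in the `GOOGLE_API_KEY` environment variable (e.g. set as a Space secret):
+
+ ```python
+ import os
+ from langchain_google_genai import ChatGoogleGenerativeAI
+
+ assert os.environ.get("GOOGLE_API_KEY"), "Set GOOGLE_API_KEY before starting the app"
+
+ # Same parameters as the chat model configured in app.py
+ llm = ChatGoogleGenerativeAI(
+     model="gemini-1.5-flash",
+     temperature=0.5,
+     max_retries=2,
+ )
+ print(llm.invoke("Suggest one chart for a sales-by-region table.").content)
+ ```
+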
81
+ ## 🔃 Logic Flowchart
82
+ ![Flowchart of the system](assets/langgraph_flow.png)
83
+
84
+ ---
85
+
86
+ ## ✨ Known Limitations
87
+ 1. Only works with single flat tables (no joins).
88
+ 2. Memory is ephemeral; uploaded data is not persisted across sessions.
89
+ 3. The chart library is Altair only, which offers limited interactivity compared to Plotly.
90
+
91
+ ---
92
+
93
+ ## 🎉 Future Improvements
94
+ 1. Add multi-file support and relational reasoning.
95
+ 2. Enable drag-and-drop dashboard building.
96
+ 3. Switch between Altair and Plotly visual modes.
97
+ 4. Implement authentication and user-level file storage.
98
+ 5. Integrate OpenAI Assistants or Claude for broader model compatibility.
99
+
100
+ ---
101
+
102
+ ## 🙌 Credits
103
+ Developed by Terloka Bros, building intelligent tools to empower data storytelling.
104
+
105
+ Adapted from the Hugging Face example chatbot template, which uses [Gradio](https://gradio.app), [`huggingface_hub`](https://huggingface.co/docs/huggingface_hub/v0.22.2/en/index), and the [Hugging Face Inference API](https://huggingface.co/docs/api-inference/index).
app.py ADDED
@@ -0,0 +1,762 @@
1
+ import os
2
+ import gradio as gr
3
+ import time
4
+ import uuid
5
+ from typing import List, TypedDict, Annotated, Optional
6
+ from gradio.themes.base import Base
7
+ import pandas as pd
8
+ import altair as alt
9
+
10
+ from langchain_google_genai import ChatGoogleGenerativeAI
11
+ from langchain_core.messages import AnyMessage, SystemMessage, HumanMessage, AIMessage, ToolMessage
12
+ from langchain_core.tools import tool
13
+ from langgraph.checkpoint.memory import InMemorySaver
14
+
15
+ from langgraph.graph.message import add_messages
16
+ from langgraph.graph import START, END, StateGraph
17
+
18
+ # Global df for sharing between functions
19
+ df = pd.DataFrame()
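+ # NOTE: `df` starts empty and is replaced at runtime by `handle_upload()` or
+ # `load_example_dataset()` below; the tools and plot helpers all read this module-level object.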
20
+
21
+ # --- Tool ---
22
+ @tool
23
+ def describe_schema() -> str:
24
+ """
25
+ Describe the DataFrame schema so you have context on how to process it.
26
+ Do this before generating any plot if you are not sure about the columns;
27
+ you can skip it if you already know the columns and data types.
28
+ By knowing the schema, you can better understand how to instruct the plot creation.
29
+ """
30
+ return str(df.dtypes)
31
+
32
+ @tool
33
+ def generate_plot_code(plot_instruction: str) -> dict:
34
+ """
35
+ Given a plot_instruction (a natural-language description, not direct Python code),
36
+ generate Python code that:
37
+ 1. Performs aggregation/transformation on `df` (store in `df_agg`)
38
+ 2. Generates an Altair plot from `df_agg` (stored in `chart`)
39
+
40
+ Args:
41
+ plot_instruction (str): A description of the plot to generate, e.g. "Bar chart of total revenue by region".
42
+
43
+ Returns:
44
+ dict: A dictionary containing:
45
+ - `plot_instruction`: The original plot instruction.
46
+ - `code`: The generated Python code as a string.
47
+ - `interpretation`: An LLM-generated textual analysis of the aggregated data.
49
+ """
50
+
51
+ promt_generate_plot_code = """
52
+ You are a Python assistant. A pandas DataFrame `df` is available.
53
+
54
+ Your task:
55
+ 1. Perform any necessary data processing or aggregation based on this request: "{plot_instruction}"
56
+ - Store the final df_agg in a variable called `df_agg`.
57
+ - When grouping data, always use `.reset_index()` after aggregation so the group keys remain columns in the df_agg.
58
+ 2. Create an Altair plot from `df_agg`
59
+ - Only use the Altair library.
60
+ - Assign the chart to a variable named `chart`.
61
+ - Do NOT include explanations, comments, or markdown (like ```python).
62
+ - Use the existing DataFrame `df` directly.
63
+ - Just return executable Python code.
64
+
65
+ Rules:
66
+ - Do NOT create fake/sample data.
67
+ - Use only the real `df`.
68
+ - must create variable `df_agg` for the aggregated DataFrame.
69
+ - must create variable `chart` for the Altair chart.
70
+ - always show title and tooltip in the chart.
71
+ - No print statements or explanation; just code.
72
+ - Be flexible interpreting column names:
73
+ - If the plot_instruction uses a partial or common term (e.g. "customer"), find the best matching column(s) in schema (like "customer_name").
74
+ - Normalize and expand synonyms or abbreviations to match columns.
75
+ - If multiple columns match, pick the most relevant one.
76
+
77
+ Example result:
78
+ import altair as alt
79
+ df_agg = df.groupby('region')['sales'].sum().reset_index().sort_values('sales', ascending=False)
80
+ chart = alt.Chart(df_agg).mark_bar().encode(
81
+ x='region:N',
82
+ y='sales:Q',
83
+ color=alt.Color('region:N', scale=alt.Scale(scheme='tableau10')),
84
+ tooltip=['region', 'sales']
85
+ ).properties(
86
+ title='Top Sales per Region'
87
+ ).transform_calculate(
88
+ text='datum.sales'
89
+ ).mark_bar(
90
+ cornerRadiusTopLeft=3, cornerRadiusTopRight=3
91
+ )
92
+ """
93
+
94
+ promt_generate_plot_code = promt_generate_plot_code.format(plot_instruction=plot_instruction)
95
+
96
+ try:
97
+ response = llm_plot.invoke([HumanMessage(content=promt_generate_plot_code)])
98
+ code = response.content.strip()
99
+
100
+ # Remove markdown fences if present
101
+ if code.startswith("```"):
102
+ lines = code.split("\n")
103
+ if lines[0].startswith("```"):
104
+ lines = lines[1:]
105
+ if lines[-1].startswith("```"):
106
+ lines = lines[:-1]
107
+ code = "\n".join(lines).strip()
108
+
109
+ interpretation = assistant_analysis(code,plot_instruction)
110
+ return {
111
+ "plot_instruction": plot_instruction,
112
+ "code": code,
113
+ "interpretation" : interpretation,
114
+ }
115
+ except Exception as e:
116
+ raise RuntimeError(f"Failed to generate plot: {e}")
117
+
118
+ @tool
119
+ def enhance_plot_code(previous_code: str, plot_instruction: str) -> dict:
120
+ """
121
+ Given the previous code and a plot_instruction (a natural-language description, not direct Python code),
122
+ enhance the plot-generation Python code so that it:
123
+ 1. Performs aggregation/transformation on `df` (store in `df_agg`)
124
+ 2. Generates an Altair plot from `df_agg` (stored in `chart`)
125
+ 3. Enhances the previous code based on the new plot_instruction
126
+
127
+ Args:
128
+ previous_code (str): The previously generated plotting code to enhance.
+ plot_instruction (str): A description of the plot to generate, e.g. "Bar chart of total revenue by region".
129
+
130
+ Returns:
131
+ dict: A dictionary containing:
132
+ - `plot_instruction`: The original plot instruction.
133
+ - `code`: The generated Python code as a string.
134
+
135
+ By running this tool, the plot is assumed to already be shown to the user, so do not say you cannot display the plot.
136
+
137
+ """
138
+
139
+ prompt_enhance_plot_code = """
140
+ You are a Python assistant. A pandas DataFrame `df` is available.
141
+
142
+ You know the previous code that already generated a plot,
143
+ "{previous_code}"
144
+
145
+ Your task:
146
+ Enhance previous code based on this request:
147
+ "{plot_instruction}"
148
+
149
+ Rules:
150
+ - Do NOT create fake/sample data.
151
+ - Use only the real `df`.
152
+ - must create variable `df_agg` for the aggregated DataFrame.
153
+ - must create variable `chart` for the Altair chart.
154
+ - always show title and tooltip in the chart.
155
+ - No print statements or explanation; just code.
156
+ - Be flexible interpreting column names:
157
+ - If the plot_instruction uses a partial or common term (e.g. "customer"), find the best matching column(s) in schema (like "customer_name").
158
+ - Normalize and expand synonyms or abbreviations to match columns.
159
+ - If multiple columns match, pick the most relevant one.
160
+
161
+ Example result:
162
+ import altair as alt
163
+ df_agg = df.groupby('region')['sales'].sum().reset_index().sort_values('sales', ascending=False)
164
+ chart = alt.Chart(df_agg).mark_bar().encode(
165
+ x='region:N',
166
+ y='sales:Q',
167
+ color=alt.Color('region:N', scale=alt.Scale(scheme='tableau10')),
168
+ tooltip=['region', 'sales']
169
+ ).properties(
170
+ title='Top Sales per Region'
171
+ ).transform_calculate(
172
+ text='datum.sales'
173
+ ).mark_bar(
174
+ cornerRadiusTopLeft=3, cornerRadiusTopRight=3
175
+ )
176
+ """
177
+
178
+
179
+ prompt_enhance_plot_code = prompt_enhance_plot_code.format(previous_code = previous_code, plot_instruction=plot_instruction)
180
+
181
+ try:
182
+ response = llm_plot.invoke([HumanMessage(content=prompt_enhance_plot_code)])
183
+ code = response.content.strip()
184
+
185
+ # Remove markdown fences if present
186
+ if code.startswith("```"):
187
+ lines = code.split("\n")
188
+ if lines[0].startswith("```"):
189
+ lines = lines[1:]
190
+ if lines[-1].startswith("```"):
191
+ lines = lines[:-1]
192
+ code = "\n".join(lines).strip()
193
+
194
+ return {
195
+ "plot_instruction": plot_instruction,
196
+ "code": code,
197
+ "interpretation":" "
198
+ }
199
+ except Exception as e:
200
+ raise RuntimeError(f"Failed to generate plot: {e}")
201
+
202
+ def generate_plot_from_code(code: str):
203
+ local_scope = {"df": df, "alt": alt}
204
+ exec(code, {}, local_scope)
205
+
206
+ if "chart" not in local_scope:
207
+ raise ValueError("No valid `chart` was generated.")
208
+ return local_scope["chart"]
209
+
210
+ def generate_df_agg_from_code(code: str):
211
+ local_scope = {"df": df, "alt": alt}
212
+ exec(code, {}, local_scope)
213
+
214
+ if "df_agg" not in local_scope:
215
+ raise ValueError("No valid `df_agg` was generated.")
216
+ return local_scope["df_agg"]
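+
+ # Both helpers above re-exec() the LLM-generated code, which runs with full
+ # interpreter privileges. A single guarded runner (sketch only, not wired in)
+ # could execute it once and hand back both objects:
+ #
+ # def run_generated_code(code: str):
+ #     scope = {"df": df, "alt": alt, "pd": pd}
+ #     exec(code, {}, scope)  # caution: executes model-generated code
+ #     missing = [name for name in ("chart", "df_agg") if name not in scope]
+ #     if missing:
+ #         raise ValueError(f"Generated code did not define: {missing}")
+ #     return scope["chart"], scope["df_agg"]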
217
+
218
+ tools = [
219
+ describe_schema,
220
+ generate_plot_code,
221
+ enhance_plot_code,
222
+ ]
223
+
224
+ # --- LLM Setup ---
225
+ llm = ChatGoogleGenerativeAI(
226
+ model="gemini-1.5-flash",
227
+ temperature=0.5,
228
+ max_tokens=None,
229
+ timeout=None,
230
+ max_retries=2,
231
+ )
232
+ llm = llm.bind_tools(tools)
233
+
234
+ llm_analysis = ChatGoogleGenerativeAI(
235
+ model="gemini-1.5-flash",
236
+ temperature=0.5,
237
+ max_tokens=None,
238
+ timeout=None,
239
+ max_retries=2,
240
+ )
241
+
242
+ llm_plot = ChatGoogleGenerativeAI(
243
+ model="gemini-2.0-flash",
244
+ temperature=0.5,
245
+ max_tokens=None,
246
+ timeout=None,
247
+ max_retries=2,
248
+ )
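+
+ # All three models rely on langchain_google_genai picking up the Google API key
+ # from the GOOGLE_API_KEY environment variable (configure it as a Space secret).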
249
+
250
+ # --- LangGraph State Setup ---
251
+ class AgentState(TypedDict):
252
+ messages: Annotated[list[AnyMessage], add_messages]
253
+ assigned_tools: Optional[List[str]] # List of tools assigned to the agent
254
+ table_schema: Optional[str] # Schema of the DataFrame, assume only one table
255
+ plots: List[dict] # List of generated plots
256
+
257
+ sys_msg = SystemMessage(content="""
258
+ You are a helpful assistant named Terloka Bro whose job is to create plots.
259
+ You can run tools such as `describe_schema` to understand the DataFrame schema,
260
+ and `generate_plot_code` to generate Python code that creates a plot using the Altair library.
261
+ Please run `describe_schema` first, then `generate_plot_code` to create a plot; do not call those two functions at the same time.
262
+ There is no need to say the chart cannot be displayed, because that is already handled in the application.
263
+ You already have access to a DataFrame called `df`
264
+ """)
265
+
266
+
267
+ def assistant(state: AgentState) -> AgentState:
268
+
269
+ schema_output = describe_schema.invoke(df)
270
+ res = llm.invoke([sys_msg] + [HumanMessage(content="show your schema")] + [AIMessage(content=schema_output)] + [ToolMessage(content=schema_output, name="describe_schema", id=str(uuid.uuid4()), tool_call_id=str(uuid.uuid4()))] + state["messages"])
271
+
272
+ state["messages"].append(res)
273
+ assigned_tools = []
274
+ if isinstance(res, AIMessage):
275
+ if res.tool_calls:
276
+ for tool_call in res.tool_calls:
277
+ assigned_tools.append(tool_call)
278
+ return {
279
+ "messages": state["messages"],
280
+ "assigned_tools": assigned_tools,
281
+ "table_schema": state.get("table_schema", []),
282
+ "plots": state.get("plots", [])
283
+ }
284
+
285
+
286
+ sys_msg_analysis = SystemMessage(content="""
287
+ You are given an aggregated `df_agg` DataFrame and an `instruction`. You are required to analyze the findings based on the given data.
288
+ """)
289
+ def assistant_analysis(plot_code,plot_instruction):
290
+
291
+ df_agg_temp = generate_df_agg_from_code(plot_code)
292
+ df_agg_result = df_agg_temp.to_dict(orient='list')
293
+
294
+ prompt_analysis = f"""
295
+ You are given the following aggregated data result:
296
+ ```
297
+ {df_agg_result}
298
+ ```
299
+ Given this analysis requirement:
300
+ ```
301
+ {plot_instruction}
302
+ ```
303
+ The expected output:
304
+ - Only provide insights and findings based on the instruction and the result
305
+ - Do NOT suggest plot code
306
+ - Do NOT explain the technical details of the chart
307
+ """
308
+
309
+ res = llm_analysis.invoke([sys_msg_analysis] + [HumanMessage(content=prompt_analysis)])
310
+ analysis_str = res.content
311
+
312
+ return analysis_str
313
+
314
+ def clean_runned_tools(state: AgentState, tool_name: str) -> AgentState:
315
+ """Remove already-run tool calls from the state."""
316
+ if state["assigned_tools"]:
317
+ removed_list = state["assigned_tools"].copy()
318
+ for tool_call in state["assigned_tools"]:
319
+ if tool_call.get('name') == tool_name:
320
+ removed_list.remove(tool_call)
321
+ break
322
+ state["assigned_tools"] = removed_list
323
+ return state
324
+
325
+ def do_describe_chema(state: AgentState) -> AgentState:
326
+ """Perform the describe schema using the assigned tool"""
327
+ if state["assigned_tools"]:
328
+ for tool_call in state["assigned_tools"]:
329
+ if tool_call.get('name') == "describe_schema":
330
+ tool_res = describe_schema.invoke(tool_call['args']) # Call the tool with the arguments
331
+ state["table_schema"] = tool_res
332
+ tool_message = ToolMessage(
333
+ content=str(tool_res), # Convert the result to string
334
+ id =str(uuid.uuid4()), # Generate a unique ID for the tool message
335
+ name=tool_call['name'], # Use the tool name from the tool call
336
+ tool_call_id=tool_call['id'] # Use the tool call ID for tracking
337
+ )
338
+ state["messages"].append(tool_message)
339
+ break
340
+ # Remove the tool call that has just been run from the state
341
+ state = clean_runned_tools(state, "describe_schema")
342
+ return state
343
+
344
+ def do_generate_plot_code(state: AgentState) -> AgentState:
345
+ """Perform the plot generation using the assigned tool"""
346
+ if state["assigned_tools"]:
347
+ for tool_call in state["assigned_tools"]:
348
+ if tool_call.get('name') == "generate_plot_code":
349
+ tool_res = generate_plot_code.invoke(tool_call['args']) # Call the tool with the arguments
350
+ if "plots" not in state:
351
+ state["plots"] = []
352
+ state["plots"].append(tool_res)
353
+
354
+ tool_message = ToolMessage(
355
+ content=str(tool_res['code']), # Convert the result to string, but only the chart
356
+ id =str(uuid.uuid4()), # Generate a unique ID for the tool message
357
+ name=tool_call['name'], # Use the tool name from the tool call
358
+ tool_call_id=tool_call['id'] # Use the tool call ID for tracking
359
+ )
360
+ state["messages"].append(tool_message)
361
+ break
362
+ # Remove the tool call that has just been run from the state
363
+ state = clean_runned_tools(state, "generate_plot_code")
364
+ return state
365
+
366
+ def do_enhance_plot_code(state: AgentState) -> AgentState:
367
+ """Perform the plot enhancement using the assigned tool"""
368
+ if state["assigned_tools"]:
369
+ for tool_call in state["assigned_tools"]:
370
+ if tool_call.get('name') == "enhance_plot_code":
371
+ tool_res = enhance_plot_code.invoke(tool_call['args']) # Call the tool with the arguments
372
+ if "plots" not in state:
373
+ state["plots"] = []
374
+ state["plots"].append(tool_res)
375
+
376
+ tool_message = ToolMessage(
377
+ content=str(tool_res['code']), # Convert the result to string, but only the chart
378
+ id =str(uuid.uuid4()), # Generate a unique ID for the tool message
379
+ name=tool_call['name'], # Use the tool name from the tool call
380
+ tool_call_id=tool_call['id'] # Use the tool call ID for tracking
381
+ )
382
+ state["messages"].append(tool_message)
383
+ break
384
+ # Remove the tool call that has just been run from the state
385
+ state = clean_runned_tools(state, "enhance_plot_code")
386
+ return state
387
+
388
+ def route_to_tool(state: AgentState) -> str:
389
+ """Determine the next step based on assigned tools"""
390
+ if state["assigned_tools"]:
391
+ for tool_call in state["assigned_tools"]:
392
+ if tool_call.get('name') == "describe_schema":
393
+ return "describe_schema"
394
+ elif tool_call.get('name') == "generate_plot_code":
395
+ return "generate_plot_code"
396
+ elif tool_call.get('name') == "enhance_plot_code":
397
+ return "enhance_plot_code"
398
+ return "no_tool_required"
399
+
400
+ def route_from_tool(state: AgentState) -> str:
401
+ """Determine the next step based on assigned tools"""
402
+ if state["assigned_tools"]:
403
+ for tool_call in state["assigned_tools"]:
404
+ if tool_call.get('name') == "generate_plot_code":
405
+ return "generate_plot_code"
406
+ return "assistant"
407
+
408
+
409
+ def build_graph():
410
+ builder = StateGraph(AgentState)
411
+ builder.add_node("Assistant", assistant)
412
+ builder.add_node("Describe Schema", do_describe_chema)
413
+ builder.add_node("Generate Plot", do_generate_plot_code)
414
+ builder.add_node("Enhance Plot", do_enhance_plot_code)
415
+
416
+ edges_to_tool = {
417
+ "describe_schema": "Describe Schema",
418
+ "generate_plot_code": "Generate Plot",
419
+ "enhance_plot_code": "Enhance Plot",
420
+ "no_tool_required": END,
421
+ }
422
+
423
+ edges_from_tool = {
424
+ "generate_plot_code": "Generate Plot",
425
+ "assistant": "Assistant",
426
+ }
427
+
428
+ builder.add_edge(START, "Assistant")
429
+ builder.add_conditional_edges("Assistant", route_to_tool, edges_to_tool)
430
+ builder.add_conditional_edges("Describe Schema", route_from_tool, edges_from_tool)
431
+ builder.add_conditional_edges("Generate Plot", route_from_tool, edges_from_tool)
432
+ builder.add_conditional_edges("Enhance Plot", route_from_tool, edges_from_tool)
433
+ builder.add_edge("Assistant", END)
434
+
435
+ memory = InMemorySaver()
436
+ return builder.compile(checkpointer=memory)
437
+
438
+ react_graph = build_graph()
439
+ config = {"configurable": {"thread_id": 123, "session": 100}}
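+ # The InMemorySaver checkpointer keys conversation memory by this thread_id, so
+ # multi-turn context persists across respond() calls until refresh_graph() or
+ # handle_upload() rebuilds the graph with a fresh thread_id.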
440
+
441
+ # --- Gradio UI ---
442
+ def respond(message, chat_history):
443
+ chat_history = []
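+ # The incoming chat_history is discarded: the full conversation is rebuilt below
+ # from the checkpointed message list returned by the graph.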
444
+ res = react_graph.invoke(
445
+ {"messages": [HumanMessage(content=message)]}
446
+ , config=config)
447
+
448
+ for msg in res["messages"]:
449
+ msg.pretty_print()
450
+ if isinstance(msg, HumanMessage):
451
+ chat_history.append({"role": "user", "content": msg.content})
452
+
453
+ if isinstance(msg, AIMessage):
454
+ ai_response = msg.content
455
+ chat_history.append({"role": "assistant", "content": ai_response})
456
+
457
+ if isinstance(msg, ToolMessage):
458
+ if msg.name == "generate_plot_code":
459
+ plot_result = generate_plot_from_code(msg.content)
460
+ chat_history.append({"role": "assistant", "content": gr.Plot(plot_result)})
461
+ chat_history.append({"role": "assistant", "content": res["plots"][-1].get("interpretation", " ")})
462
+
463
+ if msg.name == "enhance_plot_code":
464
+ plot_result = generate_plot_from_code(msg.content)
465
+ chat_history.append({"role": "assistant", "content": gr.Plot(plot_result)})
466
+
467
+ time.sleep(1)
468
+ return "", chat_history
469
+
470
+
471
+ my_theme = gr.Theme.from_hub("NoCrypt/miku")
472
+
473
+ def to_snake_case(name):
474
+ return name.lower().replace(' ', '_').replace('-', '_')
475
+
476
+ def get_info_df(df):
477
+ info_df = pd.DataFrame({
478
+ "column": df.columns,
479
+ "non_null_count": df.notnull().sum().values,
480
+ "dtype": df.dtypes.astype(str).values
481
+ })
482
+ return info_df
483
+
484
+ def summarize_nulls(df):
485
+ null_summary = df.isnull().sum().reset_index()
486
+ null_summary.columns = ['column', 'null_count']
487
+ null_summary['percent'] = (null_summary['null_count'] / len(df)) * 100
488
+ return null_summary[null_summary['null_count'] > 0]
489
+
490
+ def summarize_duplicates(df):
491
+ return pd.DataFrame({
492
+ "duplicated_rows": [df.duplicated().sum()],
493
+ "total_rows": [len(df)],
494
+ "percent_duplicated": [100 * df.duplicated().sum() / len(df)]
495
+ })
496
+
497
+ def load_example_dataset(name):
498
+ global df
499
+ try:
500
+ if name == "iris":
501
+ df = pd.read_csv("https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv")
502
+ elif name == "titanic":
503
+ df = pd.read_csv("https://raw.githubusercontent.com/datasciencedojo/datasets/refs/heads/master/titanic.csv")
504
+ elif name == "superstore":
505
+ df = pd.read_excel("https://public.tableau.com/app/sample-data/sample_-_superstore.xls")
506
+ else:
507
+ raise ValueError("Unknown dataset name.")
508
+
509
+ df.columns = [col.lower().replace(" ", "_") for col in df.columns]
510
+ null_summary = summarize_nulls(df)
511
+ dup_summary = summarize_duplicates(df)
512
+
513
+ return (
514
+ gr.update(visible=True), # Show main tabs
515
+ gr.update(visible=False), # Hide warning
516
+ gr.update(visible=False), # Hide iris button
517
+ gr.update(visible=False), # Hide titanic button
518
+ gr.update(visible=False), # Hide superstore button
519
+ gr.update(visible=False), # Hide upload button
520
+ df.describe().reset_index(),
521
+ get_info_df(df),
522
+ df.head(),
523
+ null_summary,
524
+ dup_summary
525
+ )
526
+ except Exception as e:
527
+ raise gr.Error(f"Failed to load dataset: {e}")
528
+
529
+ def handle_upload(file):
530
+ global df
531
+ if file is None or file.name == "":
532
+ return (
533
+ gr.update(visible=False), # Hide main tabs
534
+ gr.update(visible=True), # Show warning
535
+ pd.DataFrame(), "", pd.DataFrame(), pd.DataFrame(), pd.DataFrame()
536
+ )
537
+
538
+ try:
539
+ df = pd.read_csv(file) if file.name.endswith(".csv") else pd.read_excel(file)
540
+ except Exception as e:
541
+ raise gr.Error(f"Failed to read the file: {e}")
542
+
543
+ df.columns = [to_snake_case(col) for col in df.columns]
544
+ df = df
545
+
546
+ null_summary = summarize_nulls(df)
547
+ dup_summary = summarize_duplicates(df)
548
+
549
+ # Rebuild the graph and reset the config
550
+ global react_graph
551
+ react_graph = build_graph() # Rebuild graph to reset state
552
+ global config
553
+ config = {"configurable": {"thread_id": str(uuid.uuid4()), "session": str(uuid.uuid4())}}
554
+
555
+ return (
556
+ gr.update(visible=True), # Show main tabs
557
+ gr.update(visible=False), # Hide warning
558
+ gr.update(visible=False), # Hide iris button
559
+ gr.update(visible=False), # Hide titanic button
560
+ gr.update(visible=False), # Hide superstore button
561
+ gr.update(visible=True), # Keep upload button visible
562
+ df.describe().reset_index(),
563
+ get_info_df(df),
564
+ df.head(),
565
+ null_summary,
566
+ dup_summary
567
+ )
568
+
569
+ def refresh_graph():
570
+ global react_graph
571
+ react_graph = build_graph() # Rebuild graph to reset state
572
+ global config
573
+ config = {"configurable": {"thread_id": str(uuid.uuid4()), "session": str(uuid.uuid4())}}
574
+
575
+ # Layout
576
+ with gr.Blocks(theme=my_theme) as demo:
577
+ demo.load(refresh_graph, inputs=None, outputs=None)
578
+
579
+ gr.HTML("""
580
+ <style>
581
+ body, .container, h1, h2, h3, p, span {
582
+ font-family: "IBM Plex Sans";
583
+ }
584
+
585
+ #instruction blockquote {
586
+ margin: 12px auto 0 auto;
587
+ padding: 12px 16px;
588
+
589
+ border-radius: 6px;
590
+
591
+ font-size: 14px;
592
+ max-width: 7000px;
593
+ }
594
+
595
+ #chatbot_hint {
596
+ margin: 12px auto 0 auto;
597
+ padding: 12px 16px;
598
+
599
+ border-radius: 6px;
600
+
601
+ font-size: 14px;
602
+ max-width: 7000px;
603
+ }
604
+
605
+ @keyframes fadeInTitle {
606
+ 0% {
607
+ opacity: 0;
608
+ transform: translateY(-10px);
609
+ }
610
+ 100% {
611
+ opacity: 1;
612
+ transform: translateY(0);
613
+ }
614
+ }
615
+
616
+ .container {
617
+
618
+ padding: 24px;
619
+ border-radius: 16px;
620
+ box-shadow: 0 2px 30px rgba(42, 86, 198, 0.12);
621
+ text-align: center;
622
+ transition: box-shadow 0.3s ease;
623
+ margin-bottom: 12px;
624
+ }
625
+
626
+ .subtitle {
627
+ font-size: 16px;
628
+ margin-top: -6px;
629
+ }
630
+ </style>
631
+
632
+ <div class="container">
633
+ <h1>
634
+ <span style="font-size: 30px;">🎯</span>
635
+ <span class="title-gradient">Terloka Data Insight Tool</span>
636
+ </h1>
637
+ <p class="subtitle">Your gateway to smarter decisions through travel data.</p>
638
+ </div>
639
+
640
+ """)
641
+
642
+ gr.Markdown(
643
+ "> Upload a file to get started. Supported formats: `.csv`, `.xls`, `.xlsx`",
644
+ elem_id="instruction"
645
+ )
646
+ warning_box = gr.Markdown("⚠️ **You can't proceed without uploading your files first**", visible=True)
647
+ upload_btn = gr.File(file_types=[".csv", ".xls", ".xlsx"], label="📁 Upload File")
648
+ gr.Markdown("### Or use an example dataset:")
649
+ with gr.Row():
650
+ iris_btn = gr.Button("🌸 Load Iris")
651
+ titanic_btn = gr.Button("🚢 Load Titanic")
652
+ superstore_btn = gr.Button("🏪 Load Superstore")
653
+
654
+
655
+ with gr.Tabs(visible=False) as main_tabs:
656
+ with gr.Tab("🤖 ChatBot for Viz"):
657
+ gr.Markdown(
658
+ "👉 Want to understand your data? Go to the Data Exploration tab first!",
659
+ elem_id="chatbot_hint"
660
+ )
661
+
662
+ chatbot = gr.Chatbot(type="messages", label="Data Chatbot", elem_id="chatbot")
663
+
664
+ chat_input = gr.MultimodalTextbox(
665
+ interactive=True,
666
+ file_count="multiple",
667
+ placeholder="Ask about your data or upload files...",
668
+ show_label=False,
669
+ sources=[],
670
+ elem_id="chat_input"
671
+ )
672
+
673
+ def print_like_dislike(x: gr.LikeData):
674
+ print("User liked message:", x.liked, "at index:", x.index)
675
+
676
+ def add_message(history, message):
677
+ # Add uploaded files to history as user messages
678
+ for f in message.get("files", []):
679
+ history.append({"role": "user", "content": {"path": f}})
680
+ # Add text message if any
681
+ if message.get("text"):
682
+ history.append({"role": "user", "content": message["text"]})
683
+ # Clear input box after submit
684
+ return history, gr.MultimodalTextbox(value=None, interactive=True)
685
+
686
+ def bot(history: list):
687
+ last_user_msg = history[-1]["content"]
688
+ if isinstance(last_user_msg, dict): # If user uploaded files, skip LLM
689
+ return history
690
+
691
+ _, updated_history = respond(last_user_msg, history[:-1])
692
+ return updated_history
693
+
694
+
695
+ chat_msg = chat_input.submit(
696
+ add_message, inputs=[chatbot, chat_input], outputs=[chatbot, chat_input]
697
+ )
698
+ bot_msg = chat_msg.then(bot, chatbot, chatbot, api_name="bot_response")
699
+ bot_msg.then(lambda: gr.MultimodalTextbox(interactive=True), None, [chat_input])
700
+
701
+ chatbot.like(print_like_dislike, None, None, like_user_message=True)
702
+
703
+ with gr.Tab("📊 Data Exploration"):
704
+ with gr.Column():
705
+ with gr.Accordion("🧮 Data Description", open=True):
706
+ describe_output = gr.DataFrame()
707
+ with gr.Accordion("📋 Data Info", open=True):
708
+ info_output = gr.DataFrame()
709
+ with gr.Accordion("👁️ Preview Data", open=False):
710
+ head_output = gr.DataFrame()
711
+ with gr.Accordion("🧼 Null Detection", open=False):
712
+ null_output = gr.DataFrame()
713
+ with gr.Accordion("📎 Duplicate Check", open=False):
714
+ dup_output = gr.DataFrame()
715
+ # Removed Histogram section here
716
+
717
+
718
+
719
+ gr.Markdown("---")
720
+ gr.Markdown("🛠️ Built with ❤️ by **Terloka Bros**", elem_id="footer")
721
+
722
+ upload_btn.change(
723
+ fn=handle_upload,
724
+ inputs=upload_btn,
725
+ outputs=[
726
+ main_tabs, warning_box, iris_btn, titanic_btn, superstore_btn,upload_btn,
727
+ describe_output, info_output,
728
+ head_output, null_output,
729
+ dup_output
730
+ ]
731
+ )
732
+ iris_btn.click(
733
+ fn=lambda: load_example_dataset("iris"),
734
+ outputs=[
735
+ main_tabs, warning_box, iris_btn, titanic_btn, superstore_btn,upload_btn,
736
+ describe_output, info_output,
737
+ head_output, null_output,
738
+ dup_output
739
+ ]
740
+ )
741
+
742
+ titanic_btn.click(
743
+ fn=lambda: load_example_dataset("titanic"),
744
+ outputs=[
745
+ main_tabs, warning_box, iris_btn, titanic_btn, superstore_btn,upload_btn,
746
+ describe_output, info_output,
747
+ head_output, null_output,
748
+ dup_output
749
+ ]
750
+ )
751
+
752
+ superstore_btn.click(
753
+ fn=lambda: load_example_dataset("superstore"),
754
+ outputs=[
755
+ main_tabs, warning_box, iris_btn, titanic_btn, superstore_btn,upload_btn,
756
+ describe_output, info_output,
757
+ head_output, null_output,
758
+ dup_output
759
+ ]
760
+ )
761
+
762
+ demo.launch()
assets/langgraph_flow.png ADDED
requirements.txt ADDED
@@ -0,0 +1,18 @@
1
+ huggingface_hub==0.28.1
2
+ xlrd==2.0.1
3
+ langchain==0.3.25
4
+ langchain-chroma==0.2.4
5
+ langchain-community==0.3.24
6
+ langchain-core==0.3.63
7
+ langchain-google-genai==2.1.5
8
+ langchain-groq==0.3.2
9
+ langchain-huggingface==0.1.2
10
+ langchain-tavily==0.2.0
11
+ langchain-text-splitters==0.3.8
12
+ langgraph==0.4.8
13
+ langgraph-checkpoint==2.0.26
14
+ langgraph-prebuilt==0.2.2
15
+ langgraph-sdk==0.1.70
16
+ gradio==5.32.0
17
+ altair==5.5.0
18
+ pandas==2.2.3