# TraceMind-AI - Screenshots & Visual Guide

This document provides annotated screenshots of all screens in TraceMind-AI to help you understand the interface at a glance.

> **Live Demo**: https://huggingface.co/spaces/MCP-1st-Birthday/TraceMind

---
## Screen 1: Leaderboard

**Purpose**: Browse all agent evaluation runs with AI-powered insights

The Leaderboard is the central hub for comparing agent performance across different models, configurations, and benchmarks.

### Leaderboard Table


The main leaderboard table displays all evaluation runs with sortable columns including model name, agent type, success rate, token usage, duration, cost, and CO2 emissions. Click any row to drill down into detailed run results. Use the search and filter options to find specific runs.

### Summary Cards


Quick-glance summary cards show key metrics: total evaluations, average success rate, total tokens processed, and estimated costs. These cards update dynamically based on your current filter selection.

### AI Insights Tab


AI-powered analysis of the leaderboard data, generated using the TraceMind MCP Server's `analyze_leaderboard` tool. Provides intelligent summaries of trends, top performers, and recommendations based on your evaluation history.

### Analytics - Cost Efficiency


Interactive chart comparing cost efficiency across different models. Visualizes the relationship between accuracy achieved and cost per evaluation, helping you identify the most cost-effective models for your use case.

### Analytics - Performance Heatmap


Heatmap visualization showing performance patterns across different test categories and models. Darker colors indicate higher success rates, making it easy to spot strengths and weaknesses of each model.

### Analytics - Speed vs Accuracy


Scatter plot comparing execution speed against accuracy for all runs. Helps identify models that offer the best balance of speed and quality for time-sensitive applications.

### Trends Tab


Historical trends showing how model performance has evolved over time. Track improvements in accuracy, cost reduction, and speed optimization across your evaluation history.

---
## Screen 2: Agent Chat

**Purpose**: Interactive autonomous agent powered by MCP tools

The Agent Chat provides a conversational interface to interact with the TraceMind MCP Server. Ask questions about your evaluations, request analysis, or generate insights using natural language.

### Chat Interface (Part 1)


The chat interface header and input area. Type natural language queries like "What was my best performing model last week?" or "Compare GPT-4 vs Claude on code generation tasks." The agent autonomously selects and executes appropriate MCP tools.

### Chat Interface (Part 2)


Example conversation showing the agent executing MCP tools to answer questions. Notice how the agent shows its reasoning process and which tools it's using, providing full transparency into the analysis workflow.

### Chat Interface (Part 3)


Extended conversation demonstrating multi-turn interactions. The agent maintains context across messages, allowing for follow-up questions and iterative exploration of your evaluation data.

---
## Screen 3: New Evaluation

**Purpose**: Submit evaluation jobs to HuggingFace Jobs or Modal

Configure and submit new agent evaluation jobs directly from the UI. Supports both API-based models (via LiteLLM) and local models (via Transformers).

### Configuration Form (Part 1)


Model selection and basic configuration. Choose from supported models (OpenAI, Anthropic, Llama, etc.), select agent type (ToolCallingAgent or CodeAgent), and configure the evaluation benchmark dataset.

### Configuration Form (Part 2)


Advanced options including hardware selection (CPU, A10 GPU, H200 GPU), number of test cases, timeout settings, and OpenTelemetry instrumentation options for detailed tracing.

### Cost Estimation


Real-time cost estimation before submitting your job. Shows estimated compute costs, API costs (for LiteLLM models), total cost, and estimated duration. Uses the TraceMind MCP Server's `estimate_cost` tool for accurate predictions.
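For intuition, such an estimate boils down to simple arithmetic over token counts and compute time. The sketch below is illustrative only and is not the `estimate_cost` tool's actual implementation; the per-token prices and hourly hardware rates are placeholder assumptions.

```python
# Illustrative cost arithmetic; NOT the estimate_cost tool's real logic.
# All prices below are placeholder assumptions.
PRICE_PER_1K_TOKENS = {"input": 0.0025, "output": 0.01}          # assumed API pricing (USD)
HARDWARE_HOURLY_RATE = {"cpu": 0.05, "a10": 1.00, "h200": 4.50}  # assumed compute pricing (USD)

def estimate_run_cost(num_cases: int, avg_input_tokens: int, avg_output_tokens: int,
                      avg_seconds_per_case: float, hardware: str = "cpu") -> dict:
    """Rough cost and duration estimate for an evaluation run."""
    api_cost = num_cases * (
        avg_input_tokens / 1000 * PRICE_PER_1K_TOKENS["input"]
        + avg_output_tokens / 1000 * PRICE_PER_1K_TOKENS["output"]
    )
    duration_hours = num_cases * avg_seconds_per_case / 3600
    compute_cost = duration_hours * HARDWARE_HOURLY_RATE[hardware]
    return {
        "api_cost_usd": round(api_cost, 4),
        "compute_cost_usd": round(compute_cost, 4),
        "total_cost_usd": round(api_cost + compute_cost, 4),
        "estimated_duration_min": round(duration_hours * 60, 1),
    }

print(estimate_run_cost(num_cases=50, avg_input_tokens=1200,
                        avg_output_tokens=400, avg_seconds_per_case=20, hardware="a10"))
```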
### Submit Evaluation


Final review and submission screen. Confirm your configuration, review the estimated costs, and submit the job to either HuggingFace Jobs or Modal for execution.

---
## Screen 4: Job Monitoring

**Purpose**: Track the status of submitted evaluation jobs

Monitor the progress and status of all your submitted evaluation jobs in real time.

### Recent Jobs


List of recently submitted jobs with status indicators (Pending, Running, Completed, Failed). Shows job ID, model, submission time, and current progress. Click any job to view detailed logs.

### Inspect Jobs


Detailed job inspection view showing real-time logs, resource utilization, and progress metrics. Useful for debugging failed jobs or monitoring long-running evaluations.

---
## Screen 5: Run Details

**Purpose**: View detailed results for a specific evaluation run

Deep dive into the results of a completed evaluation run with comprehensive metrics and visualizations.

### Overview Tab


High-level summary of the evaluation run including success rate, total tokens, cost breakdown, and execution time. Quick-access buttons to view traces, compare with other runs, or download results.

### Test Cases Tab


Detailed breakdown of individual test cases. Shows each test's prompt, expected output, actual response, success/failure status, and execution metrics. Click any test case to view its full trace.

### Performance Tab


Performance charts showing token distribution, latency breakdown, and cost analysis per test case. Identify bottlenecks and outliers in your evaluation run.

### AI Insights Tab


AI-generated analysis of this specific run, powered by the TraceMind MCP Server's `analyze_results` tool. Provides a detailed breakdown of failure patterns, success factors, and recommendations for improvement.

### GPU Metrics Tab


GPU utilization metrics for runs executed on GPU hardware (A10 or H200). Shows memory usage, compute utilization, temperature, and power consumption over time. Only available for GPU-accelerated jobs.

---
## Screen 6: Trace Visualization

**Purpose**: Deep-dive into agent execution traces with OpenTelemetry data

Explore the complete execution flow of individual agent runs using OTEL trace data.

### Waterfall View


Timeline visualization of the agent's execution flow. Shows the sequence of LLM calls, tool invocations, and reasoning steps with precise timing information. Hover over any span for detailed attributes.

### Span Details Tab


Detailed view of individual spans in the trace. Shows span name, parent-child relationships, duration, and all OpenTelemetry attributes including token counts, model parameters, and tool inputs/outputs.

### Thought Graph Tab


Visual graph representation of the agent's reasoning process. Shows how thoughts, tool calls, and observations connect to form the agent's decision-making flow. Great for understanding complex multi-step reasoning.

### Raw Data Tab


Full JSON export of the OTEL trace data for advanced analysis or integration with external observability tools. Copy or download the complete trace for offline analysis.
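For orientation, one exported span generally has the shape sketched below. The attribute names follow the OpenTelemetry GenAI semantic conventions and are shown here as an assumption, not TraceMind's exact export schema.

```python
# Illustrative shape of a single exported span; attribute names are assumptions
# based on OpenTelemetry GenAI semantic conventions, not the exact TraceMind schema.
example_span = {
    "name": "llm.chat",
    "trace_id": "c3a1f0...",          # hex trace identifier (truncated)
    "span_id": "8b42...",             # hex span identifier (truncated)
    "parent_span_id": "51d9...",
    "start_time_unix_nano": 1731500000000000000,
    "end_time_unix_nano": 1731500002400000000,
    "attributes": {
        "gen_ai.request.model": "gpt-4o",
        "gen_ai.usage.input_tokens": 1200,
        "gen_ai.usage.output_tokens": 350,
    },
    "status": {"code": "OK"},
}
```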
### About This Trace


Summary information about the trace including trace ID, associated run, test case, total duration, and span count. Provides context for understanding what this trace represents.

---
## Screen 7: Compare Runs

**Purpose**: Side-by-side comparison of evaluation runs

Compare multiple evaluation runs to understand performance differences between models, configurations, or time periods.

### Side-by-Side Comparison


Direct comparison table showing metrics from two selected runs side by side. Highlights differences in success rate, cost, speed, and token usage with clear visual indicators.

### Report Card


Generated comparison report with winner/loser indicators for each metric category. Provides a quick summary of which run performs better and by how much.

### Radar Comparison


Radar chart visualization comparing runs across multiple dimensions: accuracy, speed, cost efficiency, token efficiency, and consistency. Quickly identify trade-offs between different configurations.

### AI Insights


AI-powered comparison analysis using the TraceMind MCP Server's `compare_runs` tool. Provides an intelligent narrative explaining the key differences, likely causes, and recommendations for choosing between models.
## Screen 8: Synthetic Data Generator

**Purpose**: Generate custom test datasets with AI

Create custom evaluation datasets tailored to your specific use case using AI-powered generation.

### Generator Interface


Configure synthetic data generation by specifying the domain, complexity level, number of test cases, and any specific requirements. Uses the TraceMind MCP Server's `generate_test_cases` tool to create diverse, realistic test scenarios. Preview generated cases before saving to your evaluation dataset.
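The exact schema of a generated case depends on the target benchmark format, but a record might look roughly like the following; every field name here is an illustrative assumption rather than the tool's actual output schema.

```python
# Hypothetical shape of a generated test case; field names are assumptions,
# not the actual output schema of generate_test_cases.
generated_case = {
    "id": "case-001",
    "domain": "customer-support",
    "difficulty": "medium",
    "prompt": "A customer reports that their order arrived damaged. "
              "Draft a reply and open a replacement ticket.",
    "expected_tools": ["create_ticket"],
    "expected_output_contains": ["apologize", "replacement"],
}
```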
---
## Screen 9: Settings

**Purpose**: Configure API keys and preferences

Manage your TraceMind configuration including API keys, default settings, and integration options.

### Settings (Part 1)


API key configuration for various providers: OpenAI, Anthropic, HuggingFace, Google Gemini. Keys are securely stored and masked after entry. Test connection buttons verify your keys are working.

### Settings (Part 2)


Additional settings including default model selection, MCP server connection settings, notification preferences, and data export options. Configure TraceMind to match your workflow preferences.

---
## Screen 10: Documentation

**Purpose**: In-app documentation and guides

Comprehensive documentation accessible directly within TraceMind, covering all features and integrations.

### About Tab


Overview of TraceMind-AI including its purpose, key features, and the ecosystem it's part of. Includes quick links to demo videos, GitHub repositories, and community resources.

### TraceVerde Tab


Documentation for TraceVerde (genai_otel_instrument), the OpenTelemetry instrumentation library that powers TraceMind's tracing capabilities. Shows installation, usage examples, and supported frameworks.
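TraceVerde builds on the standard OpenTelemetry Python SDK, so the underlying mechanism is ordinary OTEL tracing. The snippet below is a plain-SDK sketch for background, not TraceVerde's own API; see the tab itself for the library's actual entry points.

```python
# Vanilla OpenTelemetry SDK setup that GenAI instrumentation layers on top of.
# This is standard OTEL, not TraceVerde's own API.
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import SimpleSpanProcessor, ConsoleSpanExporter

provider = TracerProvider()
provider.add_span_processor(SimpleSpanProcessor(ConsoleSpanExporter()))
trace.set_tracer_provider(provider)

tracer = trace.get_tracer("demo")
with tracer.start_as_current_span("llm.chat") as span:
    # GenAI semantic-convention attribute, recorded manually here for illustration
    span.set_attribute("gen_ai.request.model", "gpt-4o")
    # ... call the model here ...
```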
### SMOLTRACE Tab


Documentation for SMOLTRACE, the evaluation engine backend. Covers configuration options, benchmark datasets, and integration with HuggingFace datasets for storing evaluation results.

### TraceMind MCP Server Tab


Complete documentation for the TraceMind MCP Server including all 11 tools, 3 resources, and 3 prompts. Shows how to connect the MCP server to Claude Desktop, Cursor, or other MCP clients.
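As a rough guide, MCP clients such as Claude Desktop read a JSON configuration that maps a server name to a launch command; one common pattern for remote servers goes through the `mcp-remote` bridge. The server URL below is a placeholder, not TraceMind's actual endpoint; use the connection details shown in this tab.

```python
# Illustrative MCP client configuration; the URL is a placeholder, not the real endpoint.
import json

config = {
    "mcpServers": {
        "tracemind": {
            "command": "npx",
            "args": ["mcp-remote", "https://<your-mcp-server>/gradio_api/mcp/sse"],
        }
    }
}

# Claude Desktop reads this structure from claude_desktop_config.json.
print(json.dumps(config, indent=2))
```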
### Job Submission Tab


Guide to submitting evaluation jobs via HuggingFace Jobs or Modal. Covers hardware options (CPU, A10, H200), cost considerations, and best practices for running large-scale evaluations.
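For a sense of what a Modal submission involves, the sketch below runs a placeholder evaluation function on an A10 GPU. The `smoltrace` package name and the function body are assumptions; in practice the New Evaluation form generates and submits the real job for you.

```python
# modal_eval.py - minimal Modal job sketch; package name and eval body are placeholders.
import modal

app = modal.App("tracemind-eval-sketch")
image = modal.Image.debian_slim().pip_install("smoltrace")  # assumed package name

@app.function(image=image, gpu="A10G", timeout=3600)
def run_eval(model_id: str, num_cases: int) -> dict:
    # Placeholder: a real job would load the benchmark and run the agent here.
    return {"model": model_id, "cases": num_cases, "status": "done"}

@app.local_entrypoint()
def main():
    print(run_eval.remote("meta-llama/Llama-3.1-8B-Instruct", 25))
```

Launched locally with `modal run modal_eval.py`.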
---
## Dashboard

### Dashboard Overview


The main dashboard provides a unified view of your TraceMind activity: recent evaluations, quick stats, trending models, and shortcut buttons to common actions. This is your starting point for navigating the full TraceMind-AI experience.

---
## Screenshot Summary

| Screen | Screenshots | Key Features |
|--------|-------------|--------------|
| Leaderboard | 7 | Sortable table, summary cards, AI insights, analytics charts, trends |
| Agent Chat | 3 | Natural language queries, MCP tool execution, multi-turn conversations |
| New Evaluation | 4 | Model selection, hardware config, cost estimation, job submission |
| Job Monitoring | 2 | Job status tracking, real-time logs, progress monitoring |
| Run Details | 5 | Overview metrics, test cases, performance charts, AI analysis, GPU metrics |
| Trace Visualization | 5 | Waterfall timeline, span details, thought graph, raw OTEL data |
| Compare Runs | 4 | Side-by-side metrics, report card, radar chart, AI comparison |
| Synthetic Data | 1 | AI-powered test case generation, domain configuration |
| Settings | 2 | API key management, default preferences, MCP connection |
| Documentation | 5 | About, TraceVerde, SMOLTRACE, MCP Server, Job Submission guides |
| Dashboard | 1 | Activity overview, quick stats, navigation shortcuts |
| **Total** | **39** | Complete UI coverage with explanatory descriptions |

---
## Related Documentation

- [README.md](README.md) - Quick start guide
- [USER_GUIDE.md](USER_GUIDE.md) - Complete walkthrough
- [MCP_INTEGRATION.md](MCP_INTEGRATION.md) - Technical MCP details
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture

---
**Status**: Screenshots complete - 39 annotated images organized and deployed

**Last Updated**: November 2025