dylanebert committed
Commit 642907a · 1 Parent(s): 18386c7

initial server

Files changed (4)
  1. README.md +87 -5
  2. __pycache__/app.cpython-311.pyc +0 -0
  3. app.py +367 -0
  4. requirements.txt +2 -0
README.md CHANGED
@@ -1,12 +1,94 @@
  ---
- title: Research Tracker Mcp
- emoji: 🏢
- colorFrom: red
- colorTo: yellow
+ title: Research Tracker MCP
+ emoji: 🔬
+ colorFrom: blue
+ colorTo: purple
  sdk: gradio
  sdk_version: 5.38.2
  app_file: app.py
  pinned: false
+ license: mit
  ---
  
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Research Tracker MCP Server
+ 
+ A Gradio-based MCP (Model Context Protocol) server that provides research inference utilities for AI assistants and tools. It offers public-facing APIs to extract and infer research metadata from sources such as papers, repositories, and project pages.
+ 
+ ## Features
+ 
+ ### MCP Tools Available
+ 
+ - **`infer_authors`**: Extract author names from research papers, repositories, or project URLs
+ - **`infer_paper_url`**: Find associated research papers from GitHub repos, project pages, or partial information
+ - **`infer_code_repository`**: Locate code repositories from paper URLs or project information
+ - **`infer_research_name`**: Extract formal paper/project titles from various inputs
+ - **`classify_research_url`**: Classify URLs as Paper, Code, Model, Dataset, Space, or Project
+ 
+ ### Supported Input Types
+ 
+ - **arXiv papers**: `https://arxiv.org/abs/2010.11929`
+ - **GitHub repositories**: `https://github.com/google-research/vision_transformer`
+ - **Hugging Face resources**: Models, Datasets, Spaces, Papers
+ - **Project pages**: GitHub Pages, personal websites
+ - **Research titles**: Natural language paper titles
+ ## Usage
+ 
+ ### As MCP Server
+ 
+ This Space can be used as an MCP server by AI assistants that support MCP. Configure your MCP client with:
+ 
+ ```json
+ {
+   "mcpServers": {
+     "research-tracker": {
+       "url": "https://YOUR_SPACE_NAME.hf.space/gradio_api/mcp/sse"
+     }
+   }
+ }
+ ```
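+ 
+ For a quick smoke test without an MCP client, you can also call the Space's underlying Gradio API. This is a minimal sketch, assuming the `gradio_client` package and Gradio's default endpoint naming (`/infer_authors`, after the function name); the Space ID here is a placeholder:
+ 
+ ```python
+ from gradio_client import Client
+ 
+ client = Client("YOUR_USERNAME/research-tracker-mcp")  # hypothetical Space ID
+ authors = client.predict(
+     "https://arxiv.org/abs/2010.11929",
+     api_name="/infer_authors",  # assumed default endpoint name
+ )
+ print(authors)
+ ```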
+ 
+ ### Web Interface
+ 
+ The Space also provides a web interface for testing the inference functions directly in your browser.
+ 
+ ## Architecture
+ 
+ This MCP server delegates all inference logic to the [Research Tracker Backend](https://huggingface.co/spaces/dylanebert/research-tracker-backend) to ensure consistency and avoid code duplication. It serves as a public-facing interface for research inference utilities without requiring database access.
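+ 
+ Concretely, each tool builds a row-shaped JSON payload and POSTs it to a backend endpoint. A minimal sketch of that delegation, using the endpoint names and payload keys from `app.py` below:
+ 
+ ```python
+ import requests
+ 
+ BACKEND_URL = "https://dylanebert-research-tracker-backend.hf.space"
+ 
+ # Ask the backend which paper a repository belongs to
+ row = {"Code": "https://github.com/google-research/vision_transformer"}
+ response = requests.post(f"{BACKEND_URL}/infer-paper", json=row, timeout=30)
+ print(response.json().get("paper"))
+ ```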
+ 
+ ## Examples
+ 
+ ### Infer Authors from arXiv Paper
+ ```python
+ infer_authors("https://arxiv.org/abs/2010.11929")
+ # Returns: ["Alexey Dosovitskiy", "Lucas Beyer", "Alexander Kolesnikov", ...]
+ ```
+ 
+ ### Find Paper from GitHub Repository
+ ```python
+ infer_paper_url("https://github.com/google-research/vision_transformer")
+ # Returns: "https://arxiv.org/abs/2010.11929"
+ ```
+ 
+ ### Classify URL Type
+ ```python
+ classify_research_url("https://huggingface.co/google/vit-base-patch16-224")
+ # Returns: "Model"
+ ```
+ 
+ ## Requirements
+ 
+ - Python 3.11+
+ - Gradio with MCP support
+ - Internet connection for backend API calls
+ 
+ ## Development
+ 
+ The server is built with:
+ - **Gradio**: Web interface and MCP protocol support
+ - **Requests**: HTTP client for backend communication
+ - **Backend Integration**: Calls to the research-tracker-backend API
+ 
+ ## License
+ 
+ MIT License. Feel free to use and modify for your research needs.
__pycache__/app.cpython-311.pyc ADDED
Binary file (17 kB).
 
app.py ADDED
@@ -0,0 +1,367 @@
+ """
+ Research Tracker MCP Server
+ 
+ A Gradio-based MCP server that provides research inference utilities.
+ Delegates inference logic to the research-tracker-backend for consistency.
+ """
+ 
+ import logging
+ import os
+ from typing import Any, Dict, List
+ 
+ import gradio as gr
+ import requests
+ 
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+ 
+ # Configuration
+ BACKEND_URL = "https://dylanebert-research-tracker-backend.hf.space"
+ HF_TOKEN = os.environ.get("HF_TOKEN")
+ REQUEST_TIMEOUT = 30
+ 
+ if not HF_TOKEN:
+     logger.warning("HF_TOKEN not found in environment variables")
+ 
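+ # HF_TOKEN is read from the environment (a Space secret in production).
+ # For a local run you might do, for example (placeholder token):
+ #   HF_TOKEN=hf_xxx python app.py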
+ 
+ def make_backend_request(endpoint: str, data: Dict[str, Any]) -> Dict[str, Any]:
+     """
+     Make a request to the research-tracker-backend.
+ 
+     Args:
+         endpoint: The backend endpoint to call (e.g., 'infer-authors')
+         data: The data to send in the request body
+ 
+     Returns:
+         The response data from the backend
+ 
+     Raises:
+         Exception: If the request fails or returns an error
+     """
+     url = f"{BACKEND_URL}/{endpoint}"
+     headers = {"Content-Type": "application/json"}
+     # Only attach an Authorization header when a token is configured,
+     # rather than sending an empty bearer token.
+     if HF_TOKEN:
+         headers["Authorization"] = f"Bearer {HF_TOKEN}"
+ 
+     try:
+         response = requests.post(url, json=data, headers=headers, timeout=REQUEST_TIMEOUT)
+         response.raise_for_status()
+         return response.json()
+     except requests.exceptions.Timeout:
+         raise Exception(f"Request to {endpoint} timed out")
+     except requests.exceptions.RequestException as e:
+         raise Exception(f"Request to {endpoint} failed: {str(e)}")
+ 
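+ # Example (illustrative): classify_research_url below delegates like this;
+ # the response shape is assumed from the code that consumes it:
+ #   make_backend_request("infer-field", {"value": "https://arxiv.org/abs/2010.11929"})
+ #   -> {"field": "Paper"}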
+ 
+ def infer_authors(input_data: str) -> List[str]:
+     """
+     Infer authors from research paper or project information.
+ 
+     This function attempts to extract author names from various inputs like
+     paper URLs (arXiv, Hugging Face papers), project pages, or repository links.
+     It uses the research-tracker-backend inference engine.
+ 
+     Args:
+         input_data: A URL, paper title, or other research-related input
+ 
+     Returns:
+         A list of author names, or an empty list if no authors are found
+ 
+     Examples:
+         >>> infer_authors("https://arxiv.org/abs/2010.11929")
+         ["Alexey Dosovitskiy", "Lucas Beyer", "Alexander Kolesnikov", ...]
+ 
+         >>> infer_authors("https://github.com/google-research/vision_transformer")
+         ["Alexey Dosovitskiy", "Lucas Beyer", ...]
+     """
+     if not input_data or not input_data.strip():
+         return []
+ 
+     try:
+         # Create a minimal row data structure for the backend
+         row_data = {
+             "Name": None,
+             "Authors": [],
+             "Paper": input_data if "arxiv" in input_data or "huggingface.co/papers" in input_data else None,
+             "Code": input_data if "github.com" in input_data else None,
+             "Project": input_data if "github.io" in input_data else None,
+             "Space": input_data if "huggingface.co/spaces" in input_data else None,
+             "Model": input_data if "huggingface.co/models" in input_data else None,
+             "Dataset": input_data if "huggingface.co/datasets" in input_data else None,
+         }
+ 
+         # If we can't classify the input, try it as a paper
+         if not any(row_data.values()):
+             row_data["Paper"] = input_data
+ 
+         # Call the backend
+         result = make_backend_request("infer-authors", row_data)
+ 
+         # Extract authors from response
+         authors = result.get("authors", [])
+         if isinstance(authors, str):
+             # Handle comma-separated string format
+             authors = [author.strip() for author in authors.split(",") if author.strip()]
+         elif not isinstance(authors, list):
+             authors = []
+ 
+         return authors
+ 
+     except Exception as e:
+         logger.error(f"Error inferring authors: {e}")
+         return []
+ 
+ 
+ def infer_paper_url(input_data: str) -> str:
+     """
+     Infer the paper URL from various research-related inputs.
+ 
+     This function attempts to find the associated research paper from
+     inputs like GitHub repositories, project pages, or partial URLs.
+ 
+     Args:
+         input_data: A URL, repository link, or other research-related input
+ 
+     Returns:
+         The paper URL (typically arXiv or Hugging Face papers), or empty string if not found
+ 
+     Examples:
+         >>> infer_paper_url("https://github.com/google-research/vision_transformer")
+         "https://arxiv.org/abs/2010.11929"
+ 
+         >>> infer_paper_url("Vision Transformer")
+         "https://arxiv.org/abs/2010.11929"
+     """
+     if not input_data or not input_data.strip():
+         return ""
+ 
+     try:
+         # Create row data structure
+         row_data = {
+             "Name": input_data if not input_data.startswith("http") else None,
+             "Authors": [],
+             "Paper": input_data if "arxiv" in input_data or "huggingface.co/papers" in input_data else None,
+             "Code": input_data if "github.com" in input_data else None,
+             "Project": input_data if "github.io" in input_data else None,
+             "Space": input_data if "huggingface.co/spaces" in input_data else None,
+             "Model": input_data if "huggingface.co/models" in input_data else None,
+             "Dataset": input_data if "huggingface.co/datasets" in input_data else None,
+         }
+ 
+         # Call the backend
+         result = make_backend_request("infer-paper", row_data)
+ 
+         # Extract paper URL from response
+         paper_url = result.get("paper", "")
+         return paper_url if paper_url else ""
+ 
+     except Exception as e:
+         logger.error(f"Error inferring paper: {e}")
+         return ""
+ 
+ 
+ def infer_code_repository(input_data: str) -> str:
+     """
+     Infer the code repository URL from research-related inputs.
+ 
+     This function attempts to find the associated code repository from
+     inputs like paper URLs, project pages, or partial information.
+ 
+     Args:
+         input_data: A URL, paper link, or other research-related input
+ 
+     Returns:
+         The code repository URL (typically GitHub), or empty string if not found
+ 
+     Examples:
+         >>> infer_code_repository("https://arxiv.org/abs/2010.11929")
+         "https://github.com/google-research/vision_transformer"
+ 
+         >>> infer_code_repository("Vision Transformer")
+         "https://github.com/google-research/vision_transformer"
+     """
+     if not input_data or not input_data.strip():
+         return ""
+ 
+     try:
+         # Create row data structure
+         row_data = {
+             "Name": input_data if not input_data.startswith("http") else None,
+             "Authors": [],
+             "Paper": input_data if "arxiv" in input_data or "huggingface.co/papers" in input_data else None,
+             "Code": input_data if "github.com" in input_data else None,
+             "Project": input_data if "github.io" in input_data else None,
+             "Space": input_data if "huggingface.co/spaces" in input_data else None,
+             "Model": input_data if "huggingface.co/models" in input_data else None,
+             "Dataset": input_data if "huggingface.co/datasets" in input_data else None,
+         }
+ 
+         # Call the backend
+         result = make_backend_request("infer-code", row_data)
+ 
+         # Extract code URL from response
+         code_url = result.get("code", "")
+         return code_url if code_url else ""
+ 
+     except Exception as e:
+         logger.error(f"Error inferring code: {e}")
+         return ""
+ 
+ 
+ def infer_research_name(input_data: str) -> str:
+     """
+     Infer the research paper or project name from various inputs.
+ 
+     This function attempts to extract the formal name/title of a research
+     paper or project from URLs, repositories, or partial information.
+ 
+     Args:
+         input_data: A URL, repository link, or other research-related input
+ 
+     Returns:
+         The research name/title, or empty string if not found
+ 
+     Examples:
+         >>> infer_research_name("https://arxiv.org/abs/2010.11929")
+         "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"
+ 
+         >>> infer_research_name("https://github.com/google-research/vision_transformer")
+         "Vision Transformer"
+     """
+     if not input_data or not input_data.strip():
+         return ""
+ 
+     try:
+         # Create row data structure
+         row_data = {
+             "Name": None,
+             "Authors": [],
+             "Paper": input_data if "arxiv" in input_data or "huggingface.co/papers" in input_data else None,
+             "Code": input_data if "github.com" in input_data else None,
+             "Project": input_data if "github.io" in input_data else None,
+             "Space": input_data if "huggingface.co/spaces" in input_data else None,
+             "Model": input_data if "huggingface.co/models" in input_data else None,
+             "Dataset": input_data if "huggingface.co/datasets" in input_data else None,
+         }
+ 
+         # Call the backend
+         result = make_backend_request("infer-name", row_data)
+ 
+         # Extract name from response
+         name = result.get("name", "")
+         return name if name else ""
+ 
+     except Exception as e:
+         logger.error(f"Error inferring name: {e}")
+         return ""
+ 
+ 
+ def classify_research_url(url: str) -> str:
+     """
+     Classify the type of research-related URL or input.
+ 
+     This function determines what type of research resource a given URL
+     or input represents (paper, code, model, dataset, etc.).
+ 
+     Args:
+         url: The URL or input to classify
+ 
+     Returns:
+         The field type: "Paper", "Code", "Space", "Model", "Dataset", "Project", or "Unknown"
+ 
+     Examples:
+         >>> classify_research_url("https://arxiv.org/abs/2010.11929")
+         "Paper"
+ 
+         >>> classify_research_url("https://github.com/google-research/vision_transformer")
+         "Code"
+ 
+         >>> classify_research_url("https://huggingface.co/google/vit-base-patch16-224")
+         "Model"
+     """
+     if not url or not url.strip():
+         return "Unknown"
+ 
+     try:
+         # Call the backend
+         result = make_backend_request("infer-field", {"value": url})
+ 
+         # Extract field from response
+         field = result.get("field", "Unknown")
+         return field if field else "Unknown"
+ 
+     except Exception as e:
+         logger.error(f"Error classifying URL: {e}")
+         return "Unknown"
+ 
+ 
+ # Create Gradio interface
+ def create_demo():
+     """Create the Gradio demo interface for testing."""
+ 
+     with gr.Blocks(title="Research Tracker MCP Server") as demo:
+         gr.Markdown("# Research Tracker MCP Server")
+         gr.Markdown("Test the research inference utilities that are available through MCP.")
+ 
+         with gr.Tab("Authors"):
+             with gr.Row():
+                 author_input = gr.Textbox(
+                     label="Input (URL, paper title, etc.)",
+                     placeholder="https://arxiv.org/abs/2010.11929",
+                     lines=1,
+                 )
+                 author_output = gr.JSON(label="Authors")
+             author_btn = gr.Button("Infer Authors")
+             author_btn.click(infer_authors, inputs=author_input, outputs=author_output)
+ 
+         with gr.Tab("Paper"):
+             with gr.Row():
+                 paper_input = gr.Textbox(
+                     label="Input (GitHub repo, project name, etc.)",
+                     placeholder="https://github.com/google-research/vision_transformer",
+                     lines=1,
+                 )
+                 paper_output = gr.Textbox(label="Paper URL")
+             paper_btn = gr.Button("Infer Paper")
+             paper_btn.click(infer_paper_url, inputs=paper_input, outputs=paper_output)
+ 
+         with gr.Tab("Code"):
+             with gr.Row():
+                 code_input = gr.Textbox(
+                     label="Input (paper URL, project name, etc.)",
+                     placeholder="https://arxiv.org/abs/2010.11929",
+                     lines=1,
+                 )
+                 code_output = gr.Textbox(label="Code Repository URL")
+             code_btn = gr.Button("Infer Code")
+             code_btn.click(infer_code_repository, inputs=code_input, outputs=code_output)
+ 
+         with gr.Tab("Name"):
+             with gr.Row():
+                 name_input = gr.Textbox(
+                     label="Input (URL, repo, etc.)",
+                     placeholder="https://github.com/google-research/vision_transformer",
+                     lines=1,
+                 )
+                 name_output = gr.Textbox(label="Research Name/Title")
+             name_btn = gr.Button("Infer Name")
+             name_btn.click(infer_research_name, inputs=name_input, outputs=name_output)
+ 
+         with gr.Tab("Classify"):
+             with gr.Row():
+                 classify_input = gr.Textbox(
+                     label="URL to classify",
+                     placeholder="https://huggingface.co/google/vit-base-patch16-224",
+                     lines=1,
+                 )
+                 classify_output = gr.Textbox(label="URL Type")
+             classify_btn = gr.Button("Classify URL")
+             classify_btn.click(classify_research_url, inputs=classify_input, outputs=classify_output)
+ 
+     return demo
+ 
+ 
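+ # mcp_server=True exposes the functions above as MCP tools; it is provided
+ # by the gradio[mcp] extra pinned in requirements.txt.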
+ if __name__ == "__main__":
+     demo = create_demo()
+     demo.launch(mcp_server=True, share=False)
requirements.txt ADDED
@@ -0,0 +1,2 @@
+ gradio[mcp]==5.38.2
+ requests==2.32.4