| title: 🎹🥁🎸DeepResearchEvaluator | |
| emoji: 🎹🥁🎸 | |
| colorFrom: red | |
| colorTo: purple | |
| sdk: streamlit | |
| sdk_version: 1.41.1 | |
| app_file: app.py | |
| pinned: true | |
| license: mit | |
| short_description: Deep Research Evaluator for Long Horizon Learning Tasks | |
| # '🎵', '🎶', '🎸', '🎹', '🎺', '🎷', '🥁', '🎻' | |
| A Deep Research Evaluator is a conceptual AI system designed to analyze and synthesize information from extensive research literature, such as arXiv papers, to learn about specific topics and generate code applicable to long-horizon tasks in AI. This involves understanding complex subjects, identifying relevant methodologies, and implementing solutions that require planning and execution over extended sequences. | |
| Key Topics and Related Papers: | |
| Long-Horizon Task Planning in Robotics: | |
| "MLDT: Multi-Level Decomposition for Complex Long-Horizon Robotic Task Planning with Open-Source Large Language Model" | |
| Authors: Yike Wu, Jiatao Zhang, Nan Hu, LanLing Tang, Guilin Qi, Jun Shao, Jie Ren, Wei Song | |
| This paper introduces a method that decomposes complex tasks at multiple levels to enhance planning capabilities using open-source large language models. | |
| Source: arXiv | |
| "ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning" | |
| Authors: Zhehua Zhou, Jiayang Song, Kunpeng Yao, Zhan Shu, Lei Ma | |
| The study presents a framework that improves LLM-based planning through an iterative self-refinement process, enhancing feasibility and correctness in task plans. | |
| Source: arXiv | |
| Skill-Based Reinforcement Learning: | |
| "Skill Reinforcement Learning and Planning for Open-World Long-Horizon Tasks" | |
| Authors: Haoqi Yuan, Chi Zhang, Hongcheng Wang, Feiyang Xie, Penglin Cai, Hao Dong, Zongqing Lu | |
| This research focuses on building multi-task agents in open-world environments by learning basic skills and planning over them to accomplish long-horizon tasks efficiently. | |
| Source: arXiv | |
| "SkillTree: Explainable Skill-Based Deep Reinforcement Learning for Long-Horizon Control Tasks" | |
| Authors: Yongyan Wen, Siyuan Li, Rongchang Zuo, Lei Yuan, Hangyu Mao, Peng Liu | |
| The paper proposes a framework that integrates a differentiable decision tree within the high-level policy to generate skill embeddings, enhancing explainability in decision-making for complex tasks. | |
| Source: arXiv | |
| Neuro-Symbolic Approaches: | |
| "Learning for Long-Horizon Planning via Neuro-Symbolic Abductive Imitation" | |
| Authors: Jie-Jing Shao, Hao-Ran Hao, Xiao-Wen Yang, Yu-Feng Li | |
| This work introduces a framework that combines data-driven learning and symbolic-based reasoning to enable long-horizon planning through abductive imitation learning. | |
| Source: arXiv | |
| "CaStL: Constraints as Specifications through LLM Translation for Long-Horizon Task and Motion Planning" | |
| Authors: [Authors not specified] | |
| The study presents a method that utilizes large language models to translate constraints into formal specifications, facilitating long-horizon task and motion planning. | |
| Source: arXiv | |
| Evaluation Frameworks for AI Models: | |
| "ASI: Accuracy-Stability Index for Evaluating Deep Learning Models" | |
| Authors: Wei Dai, Daniel Berleant | |
| The paper introduces the Accuracy-Stability Index (ASI), a quantitative measure that incorporates both accuracy and stability for assessing deep learning models. | |
| Source: arXiv | |
| "Benchmarks for Deep Off-Policy Evaluation" | |
| Authors: Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R. Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Tom Le Paine | |
| This research provides a collection of policies that, in conjunction with existing offline datasets, can be used to benchmark off-policy evaluation methods for reinforcement learning. | |
| Source: arXiv | |
| These topics and papers contribute to the development of AI systems capable of understanding research literature and applying the acquired knowledge to complex, long-horizon tasks, thereby advancing the field of artificial intelligence. | |
| --- | |
| Features: | |
| Core Configuration & Setup | |
| Configures Streamlit page with title "BikeAI Claude/GPT Research" | |
| API Setup & Clients | |
| Initializes OpenAI, Anthropic, and HuggingFace API clients with environment variables | |
| Session State Management | |
| Manages conversation history, transcripts, file editing states, and model selections | |
| get_high_info_terms() | |
| Extracts meaningful keywords from text while filtering common stop words | |
| clean_text_for_filename() | |
| Sanitizes text to create valid filenames by removing special characters | |
| generate_filename() | |
| Creates intelligent filenames based on content and timestamps | |
| create_file() | |
| Saves prompt and response content to files with smart naming | |
| get_download_link() | |
| Generates base64-encoded download links for files | |
| clean_for_speech() | |
| Prepares text for speech synthesis by removing special characters | |
| speech_synthesis_html() | |
| Creates HTML for browser-based speech synthesis | |
| edge_tts_generate_audio() | |
| Generates MP3 audio files using Edge TTS | |
| speak_with_edge_tts() | |
| Wrapper for Edge TTS audio generation | |
| play_and_download_audio() | |
| Creates audio player interface with download option | |
| process_image() | |
| Analyzes images using GPT-4V | |
| process_audio() | |
| Transcribes audio using Whisper | |
| process_video() | |
| Extracts frames from video files | |
| process_video_with_gpt() | |
| Analyzes video frames using GPT-4V | |
| parse_arxiv_refs() | |
| Parses research paper references into structured format | |
| perform_ai_lookup() | |
| Searches and processes arXiv papers with audio summaries | |
| create_zip_of_files() | |
| Bundles multiple files into a zip with smart naming | |
| load_files_for_sidebar() | |
| Organizes files by timestamp for sidebar display | |
| extract_keywords_from_md() | |
| Pulls keywords from markdown files for organization | |
| display_file_manager_sidebar() | |
| Creates interactive sidebar for file management | |
| main() | |
| Orchestrates overall application flow and UI components | |
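
The text and file helpers listed above could be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function names come from the feature list, but every signature, the stop-word set, the timestamp format, and the `top_n`, `ext`, `now`, and `mime` parameters are assumptions made for the example.

```python
import base64
import re
from collections import Counter
from datetime import datetime
from typing import List, Optional

# Assumed stop-word set; the list used by the real app is not shown in this README.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "for",
    "on", "with", "is", "are", "this", "that",
}


def get_high_info_terms(text: str, top_n: int = 5) -> List[str]:
    """Return up to top_n frequent terms, skipping stop words and very short words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return [term for term, _ in counts.most_common(top_n)]


def clean_text_for_filename(text: str) -> str:
    """Lowercase the text and collapse every run of non-alphanumerics into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def generate_filename(prompt: str, response: str, ext: str = "md",
                      now: Optional[datetime] = None) -> str:
    """Build '<timestamp>_<keywords>.<ext>' from the top terms of the content."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    terms = get_high_info_terms(f"{prompt} {response}", top_n=3)
    slug = clean_text_for_filename(" ".join(terms)) or "untitled"
    return f"{stamp}_{slug}.{ext}"


def get_download_link(filename: str, data: bytes, mime: str = "text/markdown") -> str:
    """Return an HTML anchor that embeds the file content as a base64 data URI."""
    b64 = base64.b64encode(data).decode("ascii")
    return f'<a href="data:{mime};base64,{b64}" download="{filename}">Download {filename}</a>'


if __name__ == "__main__":
    name = generate_filename("skill planning", "skill trees")
    print(name)
    print(get_download_link(name, b"# notes"))
```

In a Streamlit app, an anchor string like the one returned by this sketch is typically rendered with `st.markdown(link, unsafe_allow_html=True)`. Passing `now` explicitly keeps the filename deterministic, which is convenient for testing.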