
SciResearch API Setup with the Research Team

🔑 Configure the OpenAI API Key

For the AI features to work correctly, you need to configure your OpenAI API key:

For local use:

  1. Create a .env file in the project root
  2. Add your OpenAI API key:
OPENAI_API_KEY=your_api_key_here

For Hugging Face Spaces:

  1. Go to your Space: https://huggingface.co/spaces/sccastillo/sciresearch
  2. Click "Settings"
  3. Go to the "Variables and secrets" section
  4. Add a new secret (secrets stay private, whereas variables are publicly visible):
    • Name: OPENAI_API_KEY
    • Value: your OpenAI API key (sk-...)
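Hugging Face Spaces exposes secrets to the app as environment variables, while the local setup relies on a .env file. The stdlib-only sketch below shows one plausible lookup order (environment first, then .env). The function name `load_openai_key` is hypothetical; the actual app may use a library such as python-dotenv instead, which is an assumption not confirmed here.

```python
import os
from pathlib import Path


def load_openai_key(env_file: str = ".env") -> str:
    """Return OPENAI_API_KEY from the environment, falling back to a .env file.

    Hypothetical helper: on Spaces the secret arrives as an environment
    variable; locally it is read from a simple KEY=VALUE .env file.
    """
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            # Skip blank lines and comments; only parse KEY=VALUE pairs.
            if not line or line.startswith("#") or "=" not in line:
                continue
            name, _, value = line.partition("=")
            if name.strip() == "OPENAI_API_KEY":
                # Tolerate optional surrounding quotes around the value.
                return value.strip().strip('"').strip("'")
    raise RuntimeError("OPENAI_API_KEY is not set (environment or .env)")
```

Because the environment is checked first, a Space secret always wins over a stray .env file committed by mistake.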

🧬 Research Team - Core Functionality

The new Research Team feature implements a multi-agent system for Claims Anchoring and Reference Formatting, following Johnson & Johnson's specifications:

🎯 Research Team Features:

Claims Anchoring Workflow:

  • Analyzer Agent: Extracts claims and classifies them into a hierarchy (core, supporting, contextual)
  • SearchAssistant: Parallel search across Google Scholar, PubMed, and arXiv
  • Researcher Agent: Anchors claims to references and validates the supporting evidence

Reference Formatting Workflow:

  • Editor Agent: Formats references according to J&J guidelines
  • Validation: Checks that references are intact and complete
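The claim hierarchy can be pictured as a small data structure. The sketch below mirrors the field names of the `all_claims` entries in the sample output further down; the `Claim` class and `group_by_type` helper are hypothetical illustrations, not the project's actual types.

```python
from dataclasses import dataclass
from typing import Dict, List, Literal

ClaimType = Literal["core", "supporting", "contextual"]


@dataclass
class Claim:
    # Field names mirror the "all_claims" entries in the sample response.
    id: str
    text: str
    type: ClaimType
    importance_score: int
    position: int
    context: str = ""


def group_by_type(claims: List[Claim]) -> Dict[str, List[Claim]]:
    """Bucket claims into the core/supporting/contextual hierarchy,
    keeping document order inside each bucket.

    Raises KeyError for a type outside the three known levels.
    """
    groups: Dict[str, List[Claim]] = {"core": [], "supporting": [], "contextual": []}
    for claim in sorted(claims, key=lambda c: c.position):
        groups[claim.type].append(claim)
    return groups
```

Applied to the three Daratumumab claims in the sample, this yields `claim_1` and `claim_2` under `core` and `claim_3` under `supporting`.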

🔧 Technical Architecture:

  • LangGraph: Orchestrates the multi-agent workflow
  • Parallel Processing: Processes multiple claims simultaneously
  • Mock Tools: Simulated tools for development and testing
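A mock tool is useful mainly when it is deterministic, so that tests are repeatable and no external API is called. The sketch below is a hypothetical `mock_search` function; the `gs_`/`pm_` reference-id prefixes are inferred from the sample ids `gs_claim_1_1` and `pm_claim_2_1`, and the `ax` prefix for arXiv is a guess. The returned records are placeholders, not real literature.

```python
import hashlib
from typing import Dict, List

# "gs" and "pm" are inferred from the sample output; "ax" is an assumption.
SOURCE_PREFIXES = {"google_scholar": "gs", "pubmed": "pm", "arxiv": "ax"}


def mock_search(source: str, claim_id: str, claim_text: str, k: int = 1) -> List[Dict]:
    """Deterministic stand-in for a literature search.

    No network calls: the same input always yields the same fake results,
    which keeps tests repeatable and avoids real rate limits.
    """
    prefix = SOURCE_PREFIXES[source]
    # Short stable fingerprint of the claim, used only to label the fake titles.
    digest = hashlib.sha256(claim_text.encode("utf-8")).hexdigest()[:8]
    return [
        {
            "reference_id": f"{prefix}_{claim_id}_{i}",
            "title": f"[MOCK] {source} result {i} for {digest}",
            "snippet": claim_text,
        }
        for i in range(1, k + 1)
    ]
```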

🚀 Features

  • Interactive web interface: Ask questions directly on the home page
  • Research Team Interface: Processes documents for claims analysis
  • REST API: Endpoints for integration
  • Intelligent answers: Uses OpenAI to answer questions
  • Automatic documentation: Available at /docs

📝 Available Endpoints:

Basic endpoints:

  • GET / - Home page with an interactive interface
  • POST /api/generate - Generate AI answers
  • GET /api/health - Application health status
  • GET /docs - Swagger UI documentation

Research Team endpoints:

  • POST /api/research/process - Process a document with the Research Team
  • GET /api/research/status - Research Team status

🧪 Example usage with curl:

Basic AI answer:

curl -X POST "https://sccastillo-sciresearch.hf.space/api/generate" \
  -H "Content-Type: application/json" \
  -d '{"question": "What is artificial intelligence?"}'

Processing a document with the Research Team:

curl -X POST "https://sccastillo-sciresearch.hf.space/api/research/process" \
  -H "Content-Type: application/json" \
  -d '{
    "document_content": "Daratumumab is a human monoclonal antibody that targets CD38. Clinical studies have demonstrated significant efficacy in treating multiple myeloma patients. The POLLUX study demonstrated that daratumumab in combination with lenalidomide and dexamethasone significantly improved progression-free survival."
  }'
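The same two requests can be sent from Python with only the standard library. This is a sketch: `build_post` and `post_json` are hypothetical helper names, the base URL is the one used in the curl examples, and the 120-second timeout is an arbitrary choice for potentially slow multi-agent runs. The response is returned as parsed JSON without assuming a particular shape.

```python
import json
import urllib.request

BASE_URL = "https://sccastillo-sciresearch.hf.space"


def build_post(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request the curl examples send."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, timeout: float = 120.0) -> dict:
    # Sends the request and decodes the JSON body.
    # urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(build_post(path, payload), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `post_json("/api/research/process", {"document_content": "..."})` returns the Research Team's structured response as a Python dict.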

🧪 Testing with a Sample Document

The test_document.md file contains a sample document with:

  • Structured medical claims
  • Formatted references
  • Product metadata (Daratumumab, LATAM countries)
  • Contact information

You can use this content to test the Research Team functionality.
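Pasting a multi-line Markdown file into a curl `-d '...'` argument is error-prone because newlines and quotes must be JSON-escaped. A small sketch (the `write_payload` name and `payload.json` output path are hypothetical) wraps the file in the `{"document_content": ...}` body used by `/api/research/process`:

```python
import json
from pathlib import Path


def write_payload(doc_path: str = "test_document.md", out_path: str = "payload.json") -> dict:
    """Wrap a Markdown document in the {"document_content": ...} body
    expected by POST /api/research/process and save it as JSON."""
    content = Path(doc_path).read_text(encoding="utf-8")
    if not content.strip():
        raise ValueError(f"{doc_path} is empty")
    payload = {"document_content": content}
    # json.dumps escapes newlines and quotes, so the file is valid JSON as-is.
    Path(out_path).write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
    return payload
```

The resulting file can then be sent with curl's `--data-binary @payload.json` option in place of the inline `-d` body; `--data-binary` sends the file byte-for-byte.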

📊 Sample Research Team Output

When you process a medical document such as the Daratumumab example, the Research Team returns a structured response with a detailed analysis:

{
  "detailed_analysis": {
    "claims_extracted": {
      "all_claims": [
        {
          "id": "claim_1",
          "text": "Daratumumab is a human monoclonal antibody that targets CD38",
          "type": "core",
          "importance_score": 9,
          "position": 1,
          "context": "Opening statement defining the drug mechanism"
        },
        {
          "id": "claim_2", 
          "text": "Clinical studies have demonstrated significant efficacy in treating multiple myeloma patients",
          "type": "core",
          "importance_score": 8,
          "position": 2,
          "context": "Clinical efficacy statement"
        },
        {
          "id": "claim_3",
          "text": "The POLLUX study demonstrated that daratumumab in combination with lenalidomide and dexamethasone significantly improved progression-free survival",
          "type": "supporting",
          "importance_score": 7,
          "position": 3,
          "context": "Specific study evidence"
        }
      ],
      "core_claims": [
        {
          "id": "claim_1",
          "text": "Daratumumab is a human monoclonal antibody that targets CD38",
          "type": "core",
          "importance_score": 9,
          "position": 1,
          "context": "Opening statement defining the drug mechanism"
        },
        {
          "id": "claim_2",
          "text": "Clinical studies have demonstrated significant efficacy in treating multiple myeloma patients", 
          "type": "core",
          "importance_score": 8,
          "position": 2,
          "context": "Clinical efficacy statement"
        }
      ],
      "total_claims_found": 3,
      "core_claims_count": 2
    },
    "anchoring_results": {
      "detailed_anchoring": [
        {
          "claim_id": "claim_1",
          "claim_text": "Daratumumab is a human monoclonal antibody that targets CD38",
          "validation_status": "validated",
          "supporting_evidence": [
            "CD38 is highly expressed on multiple myeloma cells",
            "Daratumumab demonstrates potent anti-tumor activity through multiple mechanisms"
          ],
          "anchored_references": [
            {
              "reference_id": "gs_claim_1_1",
              "supporting_text": "Daratumumab (DARZALEX) is a human IgG1κ monoclonal antibody that binds specifically to CD38",
              "relevance_score": 0.95,
              "section": "Background"
            }
          ],
          "quality_assessment": "High quality evidence from peer-reviewed sources"
        }
      ],
      "claims_with_evidence": [
        {
          "claim_id": "claim_1",
          "claim_text": "Daratumumab is a human monoclonal antibody that targets CD38",
          "validation_status": "validated",
          "supporting_evidence": [
            "CD38 is highly expressed on multiple myeloma cells",
            "Daratumumab demonstrates potent anti-tumor activity through multiple mechanisms"
          ],
          "anchored_references": [
            {
              "reference_id": "gs_claim_1_1",
              "supporting_text": "Daratumumab (DARZALEX) is a human IgG1κ monoclonal antibody that binds specifically to CD38",
              "relevance_score": 0.95,
              "section": "Background"
            }
          ],
          "quality_assessment": "High quality evidence from peer-reviewed sources"
        },
        {
          "claim_id": "claim_2",
          "claim_text": "Clinical studies have demonstrated significant efficacy in treating multiple myeloma patients",
          "validation_status": "validated",
          "supporting_evidence": [
            "Phase III trials showed improved overall response rates",
            "Significant progression-free survival benefit demonstrated"
          ],
          "anchored_references": [
            {
              "reference_id": "pm_claim_2_1",
              "supporting_text": "Daratumumab significantly improved outcomes in relapsed/refractory multiple myeloma",
              "relevance_score": 0.88,
              "section": "Results"
            }
          ],
          "quality_assessment": "Strong clinical evidence from randomized controlled trials"
        }
      ]
    },
    "formatted_references": {
      "references": [
        {
          "id": "gs_claim_1_1",
          "original": "Research findings about daratumumab mechanism",
          "formatted": "Multiple Authors et al. Daratumumab mechanism of action in multiple myeloma. Hematol Oncol 2024; 42(3): 145-158.",
          "changes_applied": "Applied J&J journal article format, added proper author citation",
          "source_type": "journal",
          "completion_status": "complete"
        },
        {
          "id": "pm_claim_2_1", 
          "original": "Clinical efficacy research findings",
          "formatted": "Clinical Research Team et al. Efficacy of daratumumab in multiple myeloma treatment. Blood Cancer J [Internet]. 2024 Mar 15 [cited 2024 Nov 15]; 14(1): 25. Available from: https://doi.org/10.1038/example",
          "changes_applied": "Applied J&J journal epub format with DOI",
          "source_type": "journal",
          "completion_status": "complete"
        }
      ],
      "reference_details": [
        {
          "reference_id": "gs_claim_1_1",
          "formatted_citation": "Multiple Authors et al. Daratumumab mechanism of action in multiple myeloma. Hematol Oncol 2024; 42(3): 145-158.",
          "source_type": "journal",
          "completion_status": "complete"
        },
        {
          "reference_id": "pm_claim_2_1",
          "formatted_citation": "Clinical Research Team et al. Efficacy of daratumumab in multiple myeloma treatment. Blood Cancer J [Internet]. 2024 Mar 15 [cited 2024 Nov 15]; 14(1): 25. Available from: https://doi.org/10.1038/example",
          "source_type": "journal", 
          "completion_status": "complete"
        }
      ]
    }
  },
  "summary_statistics": {
    "document_metadata": {
      "product": "daratumumab",
      "countries": ["mexico", "brazil", "argentina"],
      "language": "english"
    },
    "claims_analysis": {
      "total_claims": 3,
      "core_claims_count": 2
    },
    "claims_anchoring": {
      "summary": {
        "total_claims_processed": 2,
        "successfully_validated": 2,
        "validation_rate": 1.0,
        "claims_summary": [
          {
            "claim_id": "claim_1",
            "status": "validated",
            "references_found": 1
          },
          {
            "claim_id": "claim_2", 
            "status": "validated",
            "references_found": 1
          }
        ]
      }
    },
    "reference_formatting": {
      "total_references": 2
    },
    "processing_status": {
      "analyzer": "completed",
      "claim_claim_1": "completed",
      "claim_claim_2": "completed"
    }
  }
}

🔍 Response Structure:

detailed_analysis (main content):

  • claims_extracted: Claims identified and classified by importance
  • anchoring_results: Evidence validation for each claim, with references
  • formatted_references: References formatted according to J&J guidelines

summary_statistics (summary information):

  • document_metadata: Detected product, countries, and language
  • claims_analysis: Claim counts and metrics
  • claims_anchoring: Validation rate and summary
  • reference_formatting: Total number of references processed
  • processing_status: Processing status per agent (analyzer) and per claim (claim_<id>)
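To consume the response, a client typically joins each anchored claim with its formatted citations. The sketch below uses only keys visible in the sample output; it reads `claims_with_evidence` because that list covers both core claims in the sample. The `claim_report` name and the `<unformatted ...>` placeholder are hypothetical.

```python
from typing import Dict, List


def claim_report(response: dict) -> List[Dict]:
    """Join anchoring results with formatted citations, one row per claim."""
    detailed = response["detailed_analysis"]
    # Map reference_id -> formatted citation string.
    citations = {
        ref["reference_id"]: ref["formatted_citation"]
        for ref in detailed["formatted_references"]["reference_details"]
    }
    rows = []
    for item in detailed["anchoring_results"]["claims_with_evidence"]:
        ref_ids = [r["reference_id"] for r in item["anchored_references"]]
        rows.append({
            "claim_id": item["claim_id"],
            "status": item["validation_status"],
            # Fall back to a visible placeholder if a reference was never formatted.
            "citations": [citations.get(rid, f"<unformatted {rid}>") for rid in ref_ids],
        })
    return rows
```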

1. Document Analysis (Analyzer Agent)

  • Extracts claims and classifies them by importance
  • Identifies the product, countries, and language
  • Builds a hierarchical structure of claims
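The Analyzer Agent presumably relies on the LLM to decide what counts as a claim; that is an assumption. The sketch below only shows the shape of the step: it splits text into sentences with a naive regex (which will mis-split abbreviations such as "et al.") and delegates classification to a pluggable, hypothetical `classify` callback. The output keys mirror `claims_extracted` in the sample response.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical classifier signature: sentence -> (claim type, importance 1-10).
Classifier = Callable[[str], Tuple[str, int]]


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_claims(text: str, classify: Classifier) -> Dict:
    claims = []
    for pos, sentence in enumerate(split_sentences(text), start=1):
        claim_type, score = classify(sentence)
        claims.append({
            "id": f"claim_{pos}",
            "text": sentence.rstrip("."),  # the sample claims omit the final period
            "type": claim_type,
            "importance_score": score,
            "position": pos,
        })
    core = [c for c in claims if c["type"] == "core"]
    return {
        "all_claims": claims,
        "core_claims": core,
        "total_claims_found": len(claims),
        "core_claims_count": len(core),
    }
```

In production the `classify` callback would wrap an LLM call; for tests, a toy rule (for example, treating sentences that cite a named study as supporting) is enough.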

2. Parallel Search (SearchAssistant)

  • Processes only core claims (high priority)
  • Searches multiple sources simultaneously
  • Optimizes resource usage and applies rate limiting
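The document does not say how rate limiting is implemented. One common approach is to fan out one search per (core claim, source) pair while an `asyncio.Semaphore` caps how many run at once. This sketch uses plain asyncio as a stand-in for LangGraph's own parallelism; `search_core_claims`, the `search` callable, and `max_concurrency=4` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List

SearchFn = Callable[[str, Dict], Awaitable[List[Dict]]]
SOURCES = ["google_scholar", "pubmed", "arxiv"]


async def search_core_claims(claims: List[Dict], search: SearchFn,
                             max_concurrency: int = 4) -> Dict[str, List[Dict]]:
    """Fan out one search per (core claim, source) pair, with at most
    max_concurrency searches in flight at any moment."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def limited(source: str, claim: Dict) -> List[Dict]:
        async with semaphore:
            return await search(source, claim)

    # Only core claims are searched; supporting/contextual ones are skipped.
    core = [c for c in claims if c["type"] == "core"]
    tasks = [limited(src, c) for c in core for src in SOURCES]
    results = await asyncio.gather(*tasks)
    # gather preserves task order, so result i belongs to core[i // len(SOURCES)].
    by_claim: Dict[str, List[Dict]] = {c["id"]: [] for c in core}
    for i, hits in enumerate(results):
        by_claim[core[i // len(SOURCES)]["id"]].extend(hits)
    return by_claim
```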

3. Claims Anchoring (Researcher Agent)

  • Validates the supporting evidence for each claim
  • Extracts relevant passages from references
  • Produces relevance and quality scores
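How the Researcher Agent scores relevance is not documented; it likely uses the LLM. As a transparent stand-in, the sketch below scores each candidate passage by Jaccard overlap of lowercase word tokens and anchors a claim to every passage above a threshold. The `anchor_claim` name, the 0.2 threshold, and the `"unvalidated"` status value are hypothetical (the sample only shows `"validated"`); the output keys follow `anchoring_results` in the sample.

```python
import re
from typing import Dict, List

TOKEN = re.compile(r"[a-z0-9]+")


def tokens(text: str) -> set:
    return set(TOKEN.findall(text.lower()))


def relevance(claim: str, passage: str) -> float:
    """Jaccard overlap of lowercase word tokens, in [0, 1]."""
    a, b = tokens(claim), tokens(passage)
    return len(a & b) / len(a | b) if a | b else 0.0


def anchor_claim(claim_id: str, claim: str, passages: List[Dict],
                 threshold: float = 0.2) -> Dict:
    # Score every passage, highest relevance first.
    scored = sorted(
        ({"reference_id": p["reference_id"], "supporting_text": p["text"],
          "relevance_score": round(relevance(claim, p["text"]), 2)} for p in passages),
        key=lambda r: r["relevance_score"], reverse=True,
    )
    anchored = [r for r in scored if r["relevance_score"] >= threshold]
    return {
        "claim_id": claim_id,
        "claim_text": claim,
        "validation_status": "validated" if anchored else "unvalidated",
        "anchored_references": anchored,
    }
```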

4. Reference Formatting (Editor Agent)

  • Applies J&J formatting guidelines
  • Fills in missing information
  • Standardizes citations by source type
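The J&J guidelines themselves are not reproduced here, but the two formatted citations in the sample output follow an NLM/Vancouver-like layout for print and electronic journal articles. The sketch below reproduces exactly those two layouts; it is inferred from the sample, not taken from the guidelines. `format_journal` and the field names are hypothetical. It only flags missing fields; filling them in, as the Editor Agent does, is out of scope.

```python
from typing import Dict

JOURNAL_FIELDS = ("authors", "title", "journal", "year", "volume", "pages")
EPUB_FIELDS = ("authors", "title", "journal", "volume", "pages", "pub_date", "cited", "url")


def _sentence(text: str) -> str:
    # Exactly one trailing period, so "et al." does not become "et al..".
    return str(text).rstrip(". ") + "."


def format_journal(ref: Dict) -> str:
    """Format a journal reference; a 'url' key selects the electronic (epub) layout."""
    epub = bool(ref.get("url"))
    missing = [k for k in (EPUB_FIELDS if epub else JOURNAL_FIELDS) if not ref.get(k)]
    if missing:
        # Flag incomplete references instead of guessing missing fields.
        raise ValueError(f"incomplete reference, missing: {missing}")
    issue = f"({ref['issue']})" if ref.get("issue") else ""
    head = f"{_sentence(ref['authors'])} {_sentence(ref['title'])} {ref['journal']}"
    if epub:
        return (f"{head} [Internet]. {ref['pub_date']} [cited {ref['cited']}]; "
                f"{ref['volume']}{issue}: {ref['pages']}. Available from: {ref['url']}")
    return f"{head} {ref['year']}; {ref['volume']}{issue}: {ref['pages']}."
```

Fed the fields of the two sample references, this returns the same `formatted` strings shown in the sample output.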

5. Final Assembly

  • Combines the results from all agents
  • Generates a complete report with metrics
  • Provides the reconstructed document
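The metrics in `summary_statistics.claims_anchoring.summary` can be derived directly from the per-claim anchoring results. This sketch (hypothetical `summarize_anchoring` helper) reproduces that block's keys; for the two validated claims in the sample it yields a `validation_rate` of 1.0.

```python
from typing import Dict, List


def summarize_anchoring(anchoring: List[Dict]) -> Dict:
    """Build the claims_anchoring.summary block from per-claim anchoring results."""
    validated = [a for a in anchoring if a["validation_status"] == "validated"]
    total = len(anchoring)
    return {
        "total_claims_processed": total,
        "successfully_validated": len(validated),
        # Guard against division by zero when no claims were processed.
        "validation_rate": len(validated) / total if total else 0.0,
        "claims_summary": [
            {"claim_id": a["claim_id"], "status": a["validation_status"],
             "references_found": len(a["anchored_references"])}
            for a in anchoring
        ],
    }
```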

Performance Optimizations

  • Parallel Processing: Multiple claims are processed simultaneously
  • Mock Tools: Avoid rate limits during development
  • State Management: LangGraph manages the workflow state shared between agents
  • Error Handling: Tolerates individual failures
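Tolerating individual failures means one failing claim should not abort the whole run. A minimal sketch with plain asyncio (standing in for however the LangGraph nodes actually handle errors) uses `asyncio.gather(..., return_exceptions=True)` and records each outcome under a `claim_<id>` key, which produces the `claim_claim_1` style seen in the sample `processing_status`. The `process_claims` name and the `"failed: <ExceptionName>"` status value are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Tuple

ProcessFn = Callable[[Dict], Awaitable[Dict]]


async def process_claims(claims: List[Dict], process: ProcessFn) -> Tuple[List[Dict], Dict[str, str]]:
    """Run every claim concurrently; a failing claim is recorded, not fatal."""
    # return_exceptions=True turns raised exceptions into returned values.
    outcomes = await asyncio.gather(*(process(c) for c in claims), return_exceptions=True)
    results, status = [], {}
    for claim, outcome in zip(claims, outcomes):
        key = f"claim_{claim['id']}"  # "claim_1" -> "claim_claim_1", as in the sample
        if isinstance(outcome, BaseException):
            status[key] = f"failed: {type(outcome).__name__}"
        else:
            status[key] = "completed"
            results.append(outcome)
    return results, status
```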