Spaces: salvinjose/HNTAI

dev-3 committed on Jun 10

Commit e91f155

1 Parent(s): ff95253

docker changes

Files changed (3)

.env ADDED

+HF_HOME=/tmp/huggingface
+TRANSFORMERS_CACHE=/tmp/huggingface
+XDG_CACHE_HOME=/tmp
+TORCH_HOME=/tmp/torch
+WHISPER_CACHE=/tmp/whisper
+UPLOAD_DIR=/tmp/uploads
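
The .env file above points every cache and the upload directory at /tmp. As a minimal sketch of how an application could honor those settings at startup, the snippet below creates each directory before any model is loaded. The variable names and default paths come from the .env file; the helper `ensure_cache_dirs` is hypothetical, and whether the application itself reads `WHISPER_CACHE` and `UPLOAD_DIR` is not shown in this commit.

```python
import os
from pathlib import Path

# Variable names and default paths are taken from the .env file in this commit.
CACHE_DEFAULTS = {
    "HF_HOME": "/tmp/huggingface",
    "TRANSFORMERS_CACHE": "/tmp/huggingface",
    "XDG_CACHE_HOME": "/tmp",
    "TORCH_HOME": "/tmp/torch",
    "WHISPER_CACHE": "/tmp/whisper",
    "UPLOAD_DIR": "/tmp/uploads",
}


def ensure_cache_dirs(env=None):
    """Hypothetical helper: create each configured directory, return the resolved paths."""
    env = os.environ if env is None else env
    resolved = {}
    for name, default in CACHE_DEFAULTS.items():
        # Fall back to the .env default when the variable is not set.
        path = Path(env.get(name, default))
        path.mkdir(parents=True, exist_ok=True)
        resolved[name] = path
    return resolved
```

Calling `ensure_cache_dirs()` once before importing any model code would guarantee the directories exist. Passing a plain dict instead of `os.environ` makes the helper easy to exercise outside the container.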

Dockerfile CHANGED

@@ -1,20 +1,14 @@
 FROM python:3.10-slim
-RUN apt-get update && apt-get install -y ffmpeg
-# Install system dependencies and build tools
+# Install only required system dependencies
 RUN apt-get update && apt-get install -y \
-    build-essential \
-    pkg-config \
-    libsystemd-dev \
-    libcairo2-dev \
+    ffmpeg \
     tesseract-ocr \
     libglib2.0-0 \
     libsm6 \
     libxrender1 \
     libxext6 \
     poppler-utils \
-    gettext \
     libgl1 \
     && rm -rf /var/lib/apt/lists/*
@@ -28,12 +22,7 @@ COPY requirements.txt .
 # Install pip and dependencies
 RUN pip install --upgrade pip \
- && pip install -r requirements.txt --no-cache-dir \
- # Remove build tools and clean up to reduce image size
- && apt-get remove -y build-essential pkg-config libsystemd-dev libcairo2-dev \
- && apt-get autoremove -y \
- && apt-get clean \
- && rm -rf /var/lib/apt/lists/*
+ && pip install -r requirements.txt --no-cache-dir
 # Copy the rest of your code
 COPY . .
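
The Dockerfile installs ffmpeg, tesseract-ocr and poppler-utils as system packages, so a trimmed image that accidentally drops one of them would only fail at request time. A small startup check can surface that earlier. This is a sketch, not part of the commit: the helper `missing_binaries` is hypothetical, and the executable names (`ffmpeg`, `tesseract`, `pdftoppm`) are the ones those packages normally provide.

```python
import shutil

# Executables provided by the apt packages listed in the Dockerfile:
# ffmpeg -> ffmpeg, tesseract-ocr -> tesseract, poppler-utils -> pdftoppm.
REQUIRED_BINARIES = ("ffmpeg", "tesseract", "pdftoppm")


def missing_binaries(names=REQUIRED_BINARIES, which=shutil.which):
    """Hypothetical helper: return the names that cannot be found on PATH."""
    return [name for name in names if which(name) is None]
```

At application startup, a non-empty result from `missing_binaries()` could be logged or turned into an early exit. The `which` parameter exists only so the lookup can be replaced in tests.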

README.md CHANGED

@@ -1,10 +1,39 @@
 ---
-title: HNTAI
+title: HNTAI - Medical Data Extraction API
 emoji: 📉
 colorFrom: blue
 colorTo: green
 sdk: docker
+app_port: 7860
 pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# HNTAI - Medical Data Extraction API
+This is a Flask-based API for extracting and processing medical data from various document formats.
+## Features
+- Document text extraction (PDF, DOCX, Images)
+- Audio transcription
+- Medical data extraction
+- PHI (Protected Health Information) scrubbing
+- Text summarization
+## Deployment on Hugging Face Spaces
+- Uses Docker for deployment
+- All models and data are cached in /tmp
+- Optimized for memory usage
+- Auto-retries for model loading
+- Proper error handling
+## Environment Variables
+All necessary environment variables are pre-configured for Hugging Face Spaces deployment.
+## API Endpoints
+- POST /upload - Upload and process medical documents
+- POST /transcribe - Transcribe audio files
+- POST /extract_medical_data - Extract structured medical data
+- POST /api/generate_summary - Generate text summaries
+- POST /api/extract_medical_data_from_audio - Process audio recordings
+For more details, check the API documentation.
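
The README lists the endpoints but not their request formats. As a rough sketch of what a client call to one of them might look like, the snippet below builds (without sending) a POST request for /api/generate_summary using only the standard library. The base URL assumes a local container listening on the `app_port` from the front matter, and the JSON body shape `{"text": ...}` is an assumption; the commit does not document the schema.

```python
import json
import urllib.request

# Assumption: the app is reachable locally on the app_port from the README front matter.
BASE_URL = "http://localhost:7860"


def build_summary_request(text, base_url=BASE_URL):
    """Build, but do not send, a POST request for /api/generate_summary.

    The JSON body shape {"text": ...} is an assumption; the commit does not
    document the request schema.
    """
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/api/generate_summary",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Once the container is running, the request could be sent with `urllib.request.urlopen(build_summary_request("..."))`. The field name and response format should be checked against the actual Flask routes first.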