MykolaMelnyk committed on Dec 16, 2024

Commit 55da6dd (verified) · 1 Parent(s): 0726016

Update README.md


README.md +68 -1

@@ -7,4 +7,71 @@ sdk: static
 pinned: false
 ---
-Edit this `README.md` markdown file to author your organization card.
+# Hi there 👋
+StabRise - Document Processing Solutions
+# Our projects
+## PDF DataSource for Apache Spark
+<a href="https://stabrise.com/spark-pdf/"><img alt="Spark Pdf" src="https://stabrise.com/media/filer_public_thumbnails/filer_public/16/d6/16d6a0d6-f162-42ad-a5a3-7dc20361ad24/sparkpdf.png__1000x300_subsampling-2.webp" height="120"></a>
+---
+**Source Code**: [https://github.com/StabRise/spark-pdf](https://github.com/StabRise/spark-pdf)
+**Home page**: [https://stabrise.com/spark-pdf/](https://stabrise.com/spark-pdf/)
+**Quick Start Jupyter Notebook**: [https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb](https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb)
+---
+The project provides a custom data source for Apache Spark that lets you read PDF files into a Spark DataFrame.
+### Key features:
+- Reads PDF documents into a Spark DataFrame
+- Supports lazy, per-page reading of PDF files
+- Supports large files, up to 10k pages
+- Supports scanned PDF files (by calling OCR)
+- No need to install Tesseract OCR; it's included in the package
+## ScaleDP
+<a href="https://stabrise.com/scaledp/"><img alt="ScaleDP" src="https://stabrise.com/media/filer_public_thumbnails/filer_public/4a/7d/4a7d97c2-50d7-4b7a-9902-af2df9b574da/scaledplogo.png__1000x300_subsampling-2.webp" height="120" /></a>
+---
+**Source Code**: [https://github.com/StabRise/scaledp](https://github.com/StabRise/scaledp)
+**Home page**: [https://stabrise.com/scaledp/](https://stabrise.com/scaledp/)
+**Quick Start Jupyter Notebook**: [https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb](https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb)
+---
+ScaleDP is an open-source library for processing documents using Apache Spark.
+### Key features:
+- Load PDF documents/Images
+- Extract text from PDF documents/Images
+- Extract images from PDF documents
+- OCR Images/PDF documents
+- Run NER on text extracted from PDF documents/Images
+- Visualize NER results
+## De-Identify
+<a href="https://deidentify.online"><img alt="De-Identify" src="https://stabrise.com/media/filer_public_thumbnails/filer_public/fb/fe/fbfe4b0c-dadb-4878-88ad-1c0ece0dc053/deidentifylogo.png__1000x300_subsampling-2.webp" height="120" /></a>
+De-Identify is a tool for de-identifying/anonymizing data.
+### Supported formats
+ - text
+ - images
+ - PDF documents
+ - DICOM files