Spaces: ccapo/portfolio

Christopher Capobianco committed on Oct 24, 2024

Commit a2d3475

1 Parent(s): e71d901

Add warning message about doc classifier

Files changed (2)

Home.py CHANGED

@@ -20,6 +20,7 @@ with st.container():
     text_column, image_column = st.columns((3,1))
     with text_column:
         st.subheader("Document Classifier", divider="green")
+        st.warning("Work in Progress")
         st.markdown("""
             - Used OCR text and a Random Forest classification model to predict a document's classification
             - Trained on Real World Documents Collection at Kaggle

projects/01_Document_Classifier.py CHANGED

@@ -2,7 +2,6 @@ import streamlit as st
 import easyocr
 import pickle
 import spacy
-# import en_core_web_sm
 import re
 import os
 import subprocess
@@ -75,6 +74,8 @@ def autoclassifier(images):
 st.header('Document Classifier', divider='green')
+st.warning("Work in Progress")
 st.markdown("#### What is OCR?")
 st.markdown("OCR stands for Optical Character Recognition, and the technology for it has been around for over 30 years.")
 st.markdown("In this project, we leverage the extraction of the text from an image to classify the document. I am using EasyOCR as the OCR Engine, and I do some pre-processing of the raw OCR text to improve the quality of the words used to classify the documents.")
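
Note on the pre-processing step mentioned in the page text above: the diff does not show how the raw OCR text is cleaned before classification, so the following is only a minimal sketch of what such a step could look like. The function name `clean_ocr_text` and its rules (lowercasing, keeping letters only, dropping tokens shorter than three characters) are assumptions for illustration and are not taken from this repository, which imports spaCy for its own processing.

```python
import re


def clean_ocr_text(raw_text: str) -> list[str]:
    """Hypothetical cleanup of raw OCR output (not the project's actual code)."""
    lowered = raw_text.lower()
    # Replace anything that is not a lowercase letter or whitespace with a space.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    # Split on whitespace and drop very short tokens, which are often OCR noise.
    return [token for token in letters_only.split() if len(token) >= 3]


if __name__ == "__main__":
    sample = "INVOICE No. 12345\nTotal due: $99.00"
    print(clean_ocr_text(sample))
```

The resulting token list could then be joined back into a string and passed to whatever vectorizer and classifier the project has trained; that hand-off is outside what this commit shows.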