Can the dataset or data aggregation methodology be made public?

#1
by bruh444 - opened

Hi, can you please provide some details on what dataset was used to train the EfficientNet-B0 model? Was it a custom dataset?

I'm looking to add custom labels to it, such as person. The logo and signature performance could also be improved.

It would be of immense help if at least the dataset aggregation methodology were provided, so that the performance can be improved.

Thanks for the work! Looking forward to your input.

Docling org

@bruh444 Unfortunately, we can't make an HF dataset yet, due to potential republishing issues. However, we can collaborate and retrain, potentially adding new classes!

Hey! I'd love to collaborate on this. I can help prepare a high-quality dataset for signatures and logos. I intend to primarily use financial documents filed at exchanges to retrieve them, and to create a labelled dataset with an LLM or something similar.

For person classification, I intend to classify images like the following. I'm not sure how I can aggregate this data without using synthetic sources.

person.png

Let me know how you'd like to connect!
