training dataset

#2
by syedroshanzameer - opened

Hello Sarfraz,
Do you have the original dataset that was used to train the classifier?
I want to fine-tune this model to add more document types, e.g. different types of W2 forms.
Looks like the model fails to classify some types of W2 forms.

Hi Syed,

For W2 forms I used the original forms from the IRS website with some augmentations. I did not use synthetic data generation, which may be why accuracy is low on some W2 variants. Adding synthetic data should improve the model.

Example file names from the training set are given below:

fw2_page_2_300_z2_bw.png
fw2_page_2_300_z2_bw_cropped_bottom.png
fw2_page_2_300_z2_bw_cropped_top_bottom.png
fw2_page_2_300_z2_cropped_bottom.png
fw2_page_2_300_z2_cropped_top_bottom.png
fw2_page_2_300_z2.png
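
Variants like these could be generated along the following lines. This is only a sketch: it treats a page as a list of rows of grayscale values (0-255), reads the `_bw` suffix as a black-and-white conversion, and uses hypothetical crop fractions, threshold, and function names rather than the settings actually used for this model. Loading and saving the PNG files is left to whatever image library is at hand.

```python
# Illustrative sketch only: the crop fractions, threshold, and function names
# are assumptions, not the pipeline actually used for this model.

def crop_rows(page, top_frac=0.0, bottom_frac=0.0):
    """Drop a fraction of the rows from the top and/or bottom of the page."""
    height = len(page)
    top = int(height * top_frac)
    bottom = height - int(height * bottom_frac)
    return [row[:] for row in page[top:bottom]]


def to_bw(page, threshold=128):
    """Binarize grayscale pixels: 0 (black) below the threshold, else 255."""
    return [[0 if px < threshold else 255 for px in row] for row in page]


def make_variants(page, stem):
    """Return {file_name: pixels} for the six variants named in the list above."""
    cropped_b = crop_rows(page, bottom_frac=0.25)
    cropped_tb = crop_rows(page, top_frac=0.25, bottom_frac=0.25)
    return {
        f"{stem}.png": [row[:] for row in page],
        f"{stem}_cropped_bottom.png": cropped_b,
        f"{stem}_cropped_top_bottom.png": cropped_tb,
        f"{stem}_bw.png": to_bw(page),
        f"{stem}_bw_cropped_bottom.png": to_bw(cropped_b),
        f"{stem}_bw_cropped_top_bottom.png": to_bw(cropped_tb),
    }
```

Calling `make_variants(page, "fw2_page_2_300_z2")` yields one entry per file name listed above, so a single source page contributes six training samples.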

Regards

Got it! Thanks for the response.
May I know how many samples per class you used for the classification task?
Is it possible to save this model in ONNX format for inference?

303 samples for the W2 form.
840 samples for the 1040 form.

Regards
