KevinHuSh committed on
Commit e319829 · 1 Parent(s): 2d09c38

fix exception in pdf parser (#584)


### What problem does this PR solve?
#451

### Type of change

- [x] Bug Fix (non-breaking change which fixes an issue)

Files changed (1)
  1. deepdoc/parser/pdf_parser.py +2 -1
deepdoc/parser/pdf_parser.py CHANGED
@@ -470,7 +470,8 @@ class RAGFlowPdfParser:
                     continue
 
                 if re.match(r"[0-9]{2,3}/[0-9]{3}$", up["text"]) \
-                        or re.match(r"[0-9]{2,3}/[0-9]{3}$", down["text"]):
+                        or re.match(r"[0-9]{2,3}/[0-9]{3}$", down["text"]) \
+                        or not down["text"].strip():
                     i += 1
                     continue
 
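The patch adds one more reason to skip a candidate `down` box: its text is empty or whitespace-only. Below is a minimal, self-contained sketch of the patched condition. It assumes text boxes are plain dicts with a `"text"` key, as the diff's `up["text"]` / `down["text"]` suggest. `skip_pair` and `first_char` are hypothetical helpers, not `RAGFlowPdfParser` methods. `first_char` stands in for later merge logic that indexes into the text. That is only one plausible way an empty box could raise the exception reported in #451, not a confirmed account of it.

```python
import re

# Pattern copied from the diff: 2-3 digits, a slash, 3 digits, then end of string
# (re.match anchors at the start). Reading it as a page counter such as "12/345"
# is an assumption.
PAGE_COUNTER = re.compile(r"[0-9]{2,3}/[0-9]{3}$")


def skip_pair(up: dict, down: dict) -> bool:
    """Hypothetical helper mirroring the patched condition.

    Returns True when the parser should move on (i += 1; continue)
    instead of trying to merge `down` onto `up`.
    """
    return bool(
        PAGE_COUNTER.match(up["text"])
        or PAGE_COUNTER.match(down["text"])
        or not down["text"].strip()  # new guard: empty or whitespace-only box
    )


def first_char(box: dict) -> str:
    # Hypothetical stand-in for later code that indexes into the text.
    # On an empty string this raises IndexError.
    return box["text"][0]


if __name__ == "__main__":
    up = {"text": "Introduction"}
    candidates = [{"text": "12/345"}, {"text": "   "}, {"text": ""}, {"text": "continues here"}]
    for down in candidates:
        if skip_pair(up, down):
            print(f"skip {down['text']!r}")
            continue
        # Only non-empty boxes reach this point, so indexing is safe.
        print(f"merge candidate {down['text']!r} starts with {first_char(down)!r}")
```

Run as a script, this prints a `skip` line for the page counter, the whitespace-only box, and the empty box. It prints a `merge candidate` line only for `"continues here"`. Checking emptiness inside the same skip condition keeps every later access to `down["text"]` behind a single guard.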