Tips on working with table padding #341
Unanswered
TastyMoocow
asked this question in
Q&A
I'm going through the Table Transformer tutorial in deepdoctection using a sample PDF. With the default padding of 60, only one of the two tables is detected. Adjusting the padding gets both tables detected, but the table and cell outlines no longer align with the page content. The box dimensions also change unexpectedly: for example, the left edge moved when I changed the top padding, even though the left padding was still 60.
Any suggestions are greatly appreciated. Thank you in advance.
```python
from PIL import Image
from pathlib import Path
from matplotlib import pyplot as plt
from IPython.core.display import HTML
import deepdoctection as dd
import os

os.environ["USE_DD_PILLOW"] = "True"
os.environ["USE_DD_OPENCV"] = "False"

analyzer = dd.get_dd_analyzer(config_overwrite=[
    "PT.LAYOUT.WEIGHTS=microsoft/table-transformer-detection/pytorch_model.bin",  # TATR table detection model
    "PT.ITEM.WEIGHTS=microsoft/table-transformer-structure-recognition/pytorch_model.bin",  # TATR table segmentation model
    "PT.ITEM.FILTER=['table', 'column_header', 'projected_row_header', 'spanning']",
    "SEGMENTATION.REMOVE_IOU_THRESHOLD_ROWS=0.5",  # works for detecting rows but not spanning cells
    "SEGMENTATION.REMOVE_IOU_THRESHOLD_COLS=0.1",
    "TEXT_ORDERING.INCLUDE_RESIDUAL_TEXT_CONTAINER=False",
    # "TEXT_ORDERING.PARAGRAPH_BREAK=0.01",
    "OCR.USE_DOCTR=True",
    "OCR.USE_TESSERACT=False",
    "PT.LAYOUT.PAD.TOP=200",  # only the top padding is changed; the rest stay at the default 60
])
# analyzer.pipe_component_list[0].predictor.config.threshold = 0.01  # this breaks table detection, too high

df = analyzer.analyze(path="path to pdf")
df.reset_state()
doc = iter(df)
page = next(doc)

# Number of tables detected on the first page
print(len(page.tables))

image = page.viz()
plt.figure(figsize=(25, 17))
plt.axis('off')
plt.imshow(image)
plt.show()
```
Here are the screenshots and different settings:
Default 60 padding, first table correctly detected

Top Pad 200 with rest at 60, second table detected but first table's cells are no longer aligned and left edge moved
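
For reference, this is the coordinate behavior I would expect from padding, written as a standalone sketch. It is not deepdoctection's actual implementation: `Padding`, `to_padded` and `to_original` are hypothetical names made up for illustration, and the sketch assumes that padding simply adds a border around the page image before detection and that detected boxes are shifted back by the left and top padding afterwards.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in pixels


@dataclass(frozen=True)
class Padding:
    """Hypothetical container for the four padding values (default 60 each)."""
    top: int = 60
    right: int = 60
    bottom: int = 60
    left: int = 60


def to_padded(box: Box, pad: Padding) -> Box:
    """Shift a box from original-image coordinates into padded-image coordinates."""
    x0, y0, x1, y1 = box
    return (x0 + pad.left, y0 + pad.top, x1 + pad.left, y1 + pad.top)


def to_original(box: Box, pad: Padding, width: int, height: int) -> Box:
    """Shift a box from padded-image coordinates back and clip it to the original image."""
    def clip(value: float, upper: float) -> float:
        return min(max(value, 0), upper)

    x0, y0, x1, y1 = box
    return (
        clip(x0 - pad.left, width),
        clip(y0 - pad.top, height),
        clip(x1 - pad.left, width),
        clip(y1 - pad.top, height),
    )


# Made-up table box on a made-up 1000 x 1400 page, only to show the arithmetic.
table = (100, 300, 500, 600)
default_pad = Padding()       # 60 on every side
tall_pad = Padding(top=200)   # top raised to 200, the rest left at 60

for pad in (default_pad, tall_pad):
    padded = to_padded(table, pad)
    restored = to_original(padded, pad, width=1000, height=1400)
    print(pad, padded, restored)
```

Under these assumptions, raising only the top padding changes the y offsets in the padded image but leaves the x coordinates untouched, and the box maps back to the same place on the original page. That is why the moving left edge in the second screenshot surprised me.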
