Return results in JSON format
Pull request overview
This PR refactors WiredTableRecognition in wired_table_rec/main.py to return a JSON-friendly tuple result instead of the previous WiredTableOutput dataclass. It also replaces the TSRCycleCenterNet/TSRUnet model backends with TableLineRecognition/TableLineRecognitionPlus, adds a new process_ocr_result method for trimming empty first rows/columns, and expands the fill_blank_rec (renamed to re_rec) method to optionally perform actual OCR re-recognition on blank cells.
Changes:
- WiredTableInput and WiredTableOutput dataclasses and ModelType enum are removed; WiredTableRecognition.__init__ now accepts table_model_path and version directly
- __call__ now returns a 6-element tuple (table_str, elapse, sorted_polygons, sorted_logi_points, sorted_ocr_boxes_res, adjust_dict) instead of a WiredTableOutput dataclass
- New process_ocr_result method and (unused) adjust_table_cells method added; re_rec extended with an optional re-recognition path
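Since the point of the PR is JSON-friendly output, here is a minimal sketch of turning the 6-element tuple into a JSON string. The helper names and the field keys are hypothetical (chosen to match the tuple order listed above), and it assumes that any array-like elements, such as NumPy arrays, expose a .tolist() method.

```python
import json
from typing import Any

# Hypothetical keys, in the order __call__ returns its elements on success
RESULT_KEYS = (
    "table_str",
    "elapse",
    "sorted_polygons",
    "sorted_logi_points",
    "sorted_ocr_boxes_res",
    "adjust_dict",
)


def _to_jsonable(obj: Any) -> Any:
    # json.dumps calls this only for objects it cannot serialize natively;
    # array-likes such as NumPy arrays are converted through .tolist()
    if hasattr(obj, "tolist"):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def result_to_json(result: tuple) -> str:
    # Refuse tuples of the wrong length instead of silently mislabeling fields
    if len(result) != len(RESULT_KEYS):
        raise ValueError(f"expected {len(RESULT_KEYS)} elements, got {len(result)}")
    # ensure_ascii=False keeps non-ASCII cell text readable in the output
    return json.dumps(dict(zip(RESULT_KEYS, result)), default=_to_jsonable, ensure_ascii=False)
```

The length check makes a mismatched return path fail loudly rather than producing JSON with shifted keys.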
    table_results = table_rec(args.img_path, ocr_result)
    print(table_results.pred_html)
    print(f"cost: {table_results.elapse:.5f}")
    table_str, elapse = table_rec(args.img_path, ocr_result)
In main(), line 343 unpacks the return value of table_rec(args.img_path, ocr_result) into only 2 variables (table_str, elapse), but __call__ returns a 6-element tuple on success: (table_str, table_elapse, sorted_polygons, sorted_logi_points, sorted_ocr_boxes_res, adjust_dict). This will raise a ValueError: too many values to unpack at runtime whenever the call succeeds.
Suggested change:
    - table_str, elapse = table_rec(args.img_path, ocr_result)
    + table_str, elapse, _, _, _, _ = table_rec(args.img_path, ocr_result)
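The failure is ordinary Python tuple unpacking. A minimal sketch, using a hypothetical fake_table_rec stand-in (not the project's API) that returns the same 6-element shape as the success path, shows the error and two safe ways to unpack:

```python
from typing import Any, Tuple


def fake_table_rec(img_path: str, ocr_result: Any) -> Tuple[str, float, Any, Any, Any, Any]:
    # Hypothetical stand-in with the same 6-element shape as __call__'s success path
    return "<table></table>", 0.01, [], [], [], []


result = fake_table_rec("table.jpg", None)

try:
    table_str, elapse = result  # two targets, six values: raises ValueError
except ValueError as exc:
    print(f"ValueError: {exc}")

# Either name every element (as the suggestion does) or collect the rest with *_
table_str, elapse, _, _, _, _ = result
table_str, elapse, *_ = result
print(table_str, f"cost: {elapse:.5f}")
```

The starred form keeps working if the tuple grows again, at the cost of silently ignoring the extra fields.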
    pad_img = cv2.copyMakeBorder(
        crop_img, 5, 5, 100, 100, cv2.BORDER_CONSTANT, value=(255, 255, 255)
    )
    rec_res, _ = self.ocr(pad_img, use_det=False, use_cls=True, use_rec=True)
In re_rec, when rec_again=True, the code calls self.ocr(...) on line 291. However, self.ocr can be None if rapidocr_onnxruntime is not installed (set to None in __init__). While the guard at the top of __call__ raises an error if self.ocr is None and ocr_result is None, it does NOT prevent re_rec with rec_again=True from being called when ocr_result is supplied externally but self.ocr is None. This will result in a TypeError: 'NoneType' object is not callable at line 291.
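One way to close this gap is to check for the OCR engine before the re-recognition path runs. The sketch below is a hypothetical ReRecGuardSketch class reduced to that guard; the cell list and the callable OCR stand-in are assumptions and do not reflect the real re_rec signature or the rapidocr_onnxruntime call.

```python
from typing import Callable, List, Optional, Tuple


class ReRecGuardSketch:
    """Hypothetical stand-in for WiredTableRecognition, reduced to the re_rec guard."""

    def __init__(self, ocr: Optional[Callable[[str], str]] = None):
        # Mirrors __init__ setting self.ocr to None when rapidocr_onnxruntime is missing
        self.ocr = ocr

    def re_rec(self, blank_cells: List[str], rec_again: bool = False) -> List[Tuple[str, str]]:
        if rec_again and self.ocr is None:
            # Fail early with an actionable message instead of
            # "TypeError: 'NoneType' object is not callable" inside the loop
            raise RuntimeError(
                "rec_again=True needs an OCR engine: install rapidocr_onnxruntime "
                "or call with rec_again=False"
            )
        if not rec_again:
            # Without re-recognition, blank cells simply stay blank
            return [(cell, "") for cell in blank_cells]
        # Stand-in for cropping, padding and re-running recognition on each cell
        return [(cell, self.ocr(cell)) for cell in blank_cells]
```

Raising is the stricter choice; skipping re-recognition with a warning when self.ocr is None would be an equally valid design if callers should never fail here.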
    t_rec_ocr_list = self.transform_res(cell_box_det_map, polygons, logi_points)
    t_rec_ocr_list_dict = self.transform_res(cell_box_det_map, polygons, logi_points)
    # Adjustment for when the first row or first column is empty
    #adjust_dict = self.adjust_table_cells(t_rec_ocr_list_dict)
The commented-out line 104 (#adjust_dict = self.adjust_table_cells(t_rec_ocr_list_dict)) should be cleaned up. Leaving commented-out code in production code is a maintainability concern, especially when the active alternative (process_ocr_result) is right below it.
Suggested change:
    - #adjust_dict = self.adjust_table_cells(t_rec_ocr_list_dict)
    t_rec_ocr_list_dict = self.transform_res(cell_box_det_map, polygons, logi_points)
    # Adjustment for when the first row or first column is empty
    #adjust_dict = self.adjust_table_cells(t_rec_ocr_list_dict)
    adjust_dict = self.process_ocr_result(t_rec_ocr_list_dict)
    # Sort the OCR results in each cell and merge those on the same line, so the output HTML fully preserves the text's line breaks
    t_rec_ocr_list = self.sort_and_gather_ocr_res(t_rec_ocr_list)
    t_rec_ocr_list = self.sort_and_gather_ocr_res(t_rec_ocr_list_dict)
The process_ocr_result method mutates the t_logic_box lists of the entries in the ocr_result list passed to it (lines 177–178 and 190–192: entry['t_logic_box'][0] -= 1 etc.). The filtered lists it builds still hold the same dict objects as t_rec_ocr_list_dict, and t_rec_ocr_list_dict is then passed to sort_and_gather_ocr_res on line 107. So even though the return value of process_ocr_result is stored separately in adjust_dict, its in-place modifications are visible to sort_and_gather_ocr_res. This unintended mutation could lead to double-adjustment of the logic boxes.
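The aliasing is easy to reproduce in isolation. In this minimal sketch, trim_first_row_shared and trim_first_row_copied are hypothetical, simplified helpers (they skip the emptiness check and only shift rows); copy.deepcopy is one possible fix, not necessarily the one the PR should adopt.

```python
import copy
from typing import Any, Dict, List

Cell = Dict[str, Any]


def trim_first_row_shared(cells: List[Cell]) -> List[Cell]:
    # New outer list, but each dict (and its t_logic_box list) is shared with `cells`
    kept = [c for c in cells if c["t_logic_box"][0] != 0]
    for c in kept:
        c["t_logic_box"][0] -= 1
        c["t_logic_box"][1] -= 1
    return kept


def trim_first_row_copied(cells: List[Cell]) -> List[Cell]:
    # Deep-copy the kept entries so the caller's data is never modified
    kept = [copy.deepcopy(c) for c in cells if c["t_logic_box"][0] != 0]
    for c in kept:
        c["t_logic_box"][0] -= 1
        c["t_logic_box"][1] -= 1
    return kept


def make_cells() -> List[Cell]:
    # Two cells: an empty one in row 0 and a filled one in row 1
    return [
        {"t_logic_box": [0, 0, 0, 0], "t_ocr_res": [[None, ""]]},
        {"t_logic_box": [1, 1, 0, 0], "t_ocr_res": [[None, "a"]]},
    ]


original = make_cells()
trim_first_row_shared(original)
print(original[1]["t_logic_box"])  # [0, 0, 0, 0]: the caller's entry was decremented too

original = make_cells()
adjusted = trim_first_row_copied(original)
print(original[1]["t_logic_box"])  # [1, 1, 0, 0]: unchanged
print(adjusted[0]["t_logic_box"])  # [0, 0, 0, 0]
```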
        ocr_result = [entry for entry in ocr_result if entry['t_logic_box'][0] != 0 or entry['t_logic_box'][1] != 0]
        # Adjust the row numbers of the remaining dicts
        for entry in ocr_result:
            entry['t_logic_box'][0] -= 1
            entry['t_logic_box'][1] -= 1

    # Remove the first-column dicts and adjust the column numbers of the remaining dicts
    first_col_empty = [entry for entry in ocr_result if
                       entry['t_logic_box'][2] == 0 and entry['t_logic_box'][3] == 0 and entry['t_ocr_res'][0][
                           1] == '']

    if len(first_col_empty) == len(
            [entry for entry in ocr_result if entry['t_logic_box'][2] == 0 and entry['t_logic_box'][3] == 0]):
        # If every cell in the first column is empty, remove the first column
        ocr_result = [entry for entry in ocr_result if entry['t_logic_box'][2] != 0 or entry['t_logic_box'][3] != 0]
        # Adjust the column numbers of the remaining dicts
        for entry in ocr_result:
            entry['t_logic_box'][2] -= 1
            entry['t_logic_box'][3] -= 1
In process_ocr_result, line 174 keeps only the entries where t_logic_box[0] != 0 or t_logic_box[1] != 0, so it removes only the entries where both are 0. This or condition is incorrect: a first-row cell that spans several rows (t_logic_box[0] == 0 but t_logic_box[1] != 0) survives the filter, and the decrement that follows gives it a row start of -1. The correct filter for removing first-row entries is entry['t_logic_box'][0] != 0 (keep only entries whose row start is not 0). Similarly, line 188 should use entry['t_logic_box'][2] != 0 instead of entry['t_logic_box'][2] != 0 or entry['t_logic_box'][3] != 0.
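A tiny hypothetical table makes the difference concrete: cell B is a merged cell that starts in row 0 and ends in row 1. Only the t_logic_box layout [row_start, row_end, col_start, col_end] is taken from the PR; the cell names are illustrative.

```python
cells = [
    {"name": "A", "t_logic_box": [0, 0, 0, 0]},  # single-row cell in row 0
    {"name": "B", "t_logic_box": [0, 1, 1, 1]},  # merged cell spanning rows 0-1
    {"name": "C", "t_logic_box": [1, 1, 0, 0]},  # ordinary cell in row 1
]

# Filter as written on line 174: drops a cell only if row_start AND row_end are both 0
kept_or = [c["name"] for c in cells if c["t_logic_box"][0] != 0 or c["t_logic_box"][1] != 0]

# Corrected filter: drops every cell whose row_start is 0
kept_start = [c["name"] for c in cells if c["t_logic_box"][0] != 0]

# Row starts after the PR's decrement is applied to the cells kept by the `or` filter
new_row_starts = [c["t_logic_box"][0] - 1 for c in cells if c["name"] in kept_or]

print(kept_or)         # ['B', 'C']
print(kept_start)      # ['C']
print(new_row_starts)  # [-1, 0]: the merged first-row cell ends up at row -1
```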
    sorted_polygons,
    logi_points[idx_list],
    time.perf_counter() - s,
    [],
The early returns in __call__ are inconsistent in the number of elements they return compared to the successful path:
- Line 74 returns a 5-element tuple: ("", 0.0, None, None, None)
- Lines 89-95 (when need_ocr=False) return a 5-element tuple: ("", elapsed, sorted_polygons, logi_points[idx_list], [])
- Line 125 returns a 5-element tuple: ("", 0.0, None, None, None)
- Lines 126-134 (success path) return a 6-element tuple including adjust_dict
Callers must handle varying-length tuples, making unpacking error-prone. All return paths should consistently return the same number of elements.
Suggested change:
    - [],
    + [],
    + None,
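Padding each early return with None works. An alternative, sketched here under the assumption that a named result type is acceptable for the project, is a typing.NamedTuple whose defaults fill the fields an early exit cannot compute. TableRecResult and empty_result are hypothetical names. Because a NamedTuple is still a tuple, it keeps the unpackable, JSON-friendly shape this PR is after, and ._asdict() yields a plain dict for json.dumps.

```python
from typing import Any, List, NamedTuple, Optional


class TableRecResult(NamedTuple):
    # Hypothetical result type: every return path yields the same six fields
    table_str: str
    elapse: float
    sorted_polygons: Optional[Any] = None
    sorted_logi_points: Optional[Any] = None
    sorted_ocr_boxes_res: Optional[List[Any]] = None
    adjust_dict: Optional[List[Any]] = None


def empty_result(elapse: float = 0.0) -> TableRecResult:
    # Early-exit paths fill the missing fields with defaults instead of dropping them
    return TableRecResult("", elapse)


r = empty_result()
table_str, elapse, *_ = r  # still unpacks like a plain tuple
print(len(r), r.adjust_dict)
```

With one result type, the return annotation also becomes a single name instead of a hand-counted Tuple[...].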
        ocr_result: Optional[List[Union[List[List[float]], str, str]]] = None,
        **kwargs,
    ) -> WiredTableOutput:
    ) -> Tuple[str, float, Any, Any, Any]:
The __call__ method return type annotation is Tuple[str, float, Any, Any, Any] (5 elements), but the actual successful return statement on lines 126-134 returns a tuple with 6 elements: table_str, table_elapse, sorted_polygons, sorted_logi_points, sorted_ocr_boxes_res, and adjust_dict. The annotation should be Tuple[str, float, Any, Any, Any, Any] (6 elements) to match the actual return.
Suggested change:
    - ) -> Tuple[str, float, Any, Any, Any]:
    + ) -> Tuple[str, float, Any, Any, Any, Any]:
    first_row_empty = [entry for entry in ocr_result if
                       entry['t_logic_box'][0] == 0 and entry['t_logic_box'][1] == 0 and entry['t_ocr_res'][0][
                           1] == '']

    if len(first_row_empty) == len(
            [entry for entry in ocr_result if entry['t_logic_box'][0] == 0 and entry['t_logic_box'][1] == 0]):
        # If every cell in the first row is empty, remove the first row
        ocr_result = [entry for entry in ocr_result if entry['t_logic_box'][0] != 0 or entry['t_logic_box'][1] != 0]
        # Adjust the row numbers of the remaining dicts
        for entry in ocr_result:
            entry['t_logic_box'][0] -= 1
            entry['t_logic_box'][1] -= 1

    # Remove the first-column dicts and adjust the column numbers of the remaining dicts
    first_col_empty = [entry for entry in ocr_result if
                       entry['t_logic_box'][2] == 0 and entry['t_logic_box'][3] == 0 and entry['t_ocr_res'][0][
                           1] == '']

    if len(first_col_empty) == len(
            [entry for entry in ocr_result if entry['t_logic_box'][2] == 0 and entry['t_logic_box'][3] == 0]):
        # If every cell in the first column is empty, remove the first column
        ocr_result = [entry for entry in ocr_result if entry['t_logic_box'][2] != 0 or entry['t_logic_box'][3] != 0]
The process_ocr_result method checks for first-row emptiness using entry['t_logic_box'][0] == 0 and entry['t_logic_box'][1] == 0. However, t_logic_box is [row_start, row_end, col_start, col_end], so t_logic_box[1] is row_end. Cells in the first row that span multiple rows (e.g. a merged cell spanning rows 0–1) have row_start=0 but row_end=1, so they would be excluded from first_row_empty. This means the condition len(first_row_empty) == len([...]) would fail for merged-cell first rows, causing the first row to never be removed. The check should likely use only entry['t_logic_box'][0] == 0 to identify all cells whose row starts at row 0. The same issue applies to the first-column check on line 182 (t_logic_box[2] == 0 and t_logic_box[3] == 0).
Suggested change (replaces the first-row and first-column checks above):

    first_row_empty = [
        entry
        for entry in ocr_result
        if entry['t_logic_box'][0] == 0 and entry['t_ocr_res'][0][1] == ''
    ]
    if len(first_row_empty) == len(
        [entry for entry in ocr_result if entry['t_logic_box'][0] == 0]
    ):
        # If every cell in the first row is empty, remove the first row
        ocr_result = [
            entry for entry in ocr_result if entry['t_logic_box'][0] != 0
        ]
        # Adjust the row numbers of the remaining dicts
        for entry in ocr_result:
            entry['t_logic_box'][0] -= 1
            entry['t_logic_box'][1] -= 1
    # Remove the first-column dicts and adjust the column numbers of the remaining dicts
    first_col_empty = [
        entry
        for entry in ocr_result
        if entry['t_logic_box'][2] == 0 and entry['t_ocr_res'][0][1] == ''
    ]
    if len(first_col_empty) == len(
        [entry for entry in ocr_result if entry['t_logic_box'][2] == 0]
    ):
        # If every cell in the first column is empty, remove the first column
        ocr_result = [
            entry for entry in ocr_result if entry['t_logic_box'][2] != 0
        ]
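Combining this start-index check with the in-place mutation concern raised earlier in the review, a standalone, non-mutating variant could look like the minimal sketch below. trim_empty_first_row_col and _is_empty are hypothetical names, and treating a cell as empty when none of its OCR lines has text is an assumption (the PR checks only t_ocr_res[0][1]).

```python
import copy
from typing import Any, Dict, List

Cell = Dict[str, Any]


def _is_empty(cell: Cell) -> bool:
    # Assumption: a cell is empty when none of its [box, text] OCR lines has text
    return not any(text for _, text in cell["t_ocr_res"])


def trim_empty_first_row_col(cells: List[Cell]) -> List[Cell]:
    """Drop an all-empty first row and/or first column without touching the input.

    t_logic_box is [row_start, row_end, col_start, col_end]. Cells are selected by
    their start index only, so merged cells that begin in row/column 0 are included.
    """
    result = copy.deepcopy(cells)
    for start, end in ((0, 1), (2, 3)):  # rows first, then columns
        first = [c for c in result if c["t_logic_box"][start] == 0]
        # The `first and` guard avoids shifting everything when no cell starts at 0
        if first and all(_is_empty(c) for c in first):
            result = [c for c in result if c["t_logic_box"][start] != 0]
            for c in result:
                c["t_logic_box"][start] -= 1
                c["t_logic_box"][end] -= 1
    return result
```

Returning a fresh list also removes the double-adjustment risk, because sort_and_gather_ocr_res would keep seeing the untouched input.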
    def adjust_table_cells(self, t_rec_ocr_list_dict):
        """
        Adjust the table cells: remove the cells in the first row and/or first column,
        and update the row/column start and end positions of the remaining cells.

        Args:
            t_rec_ocr_list_dict (list): raw table-cell recognition results, in the format
                [
                    {
                        "t_box": [xmin, ymin, xmax, ymax],
                        "t_logic_box": [row_start, row_end, col_start, col_end],
                        "t_ocr_res": [[box, text], ...]
                    },
                    ...
                ]

        Returns:
            list: the adjusted table-cell recognition results, in the same format as the input.
        """
        # New result list
        adjusted_result = []

        # Record whether the first-row and first-column cells are removed
        remove_first_row = False
        remove_first_col = False

        # Check whether to remove the first row
        if all(cell and not cell[1] for cell in t_rec_ocr_list_dict[0].get("t_ocr_res", [])):
            remove_first_row = True

        # Check whether to remove the first column
        if all(row.get("t_ocr_res") and not row["t_ocr_res"][0][1] for row in t_rec_ocr_list_dict):
            remove_first_col = True

        # Walk the original results and adjust them
        for i, row in enumerate(t_rec_ocr_list_dict):
            adjusted_row = []

            # Skip this row if it is the first row and should be removed
            if remove_first_row and i == 0:
                continue

            for j, cell in enumerate(row.get("t_ocr_res", [])):
                # Skip this column if it is the first column and should be removed
                if remove_first_col and j == 0:
                    continue

                # Update the current cell's logical position
                adjusted_cell = {
                    "t_box": row.get("t_box"),
                    "t_logic_box": [
                        row["t_logic_box"][0] - 1 if i > 0 else row["t_logic_box"][0],
                        row["t_logic_box"][1] - 1 if i > 0 else row["t_logic_box"][1],
                        row["t_logic_box"][2] - 1 if j > 0 else row["t_logic_box"][2],
                        row["t_logic_box"][3] - 1 if j > 0 else row["t_logic_box"][3]
                    ],
                    "t_ocr_res": cell
                }
                adjusted_row.append(adjusted_cell)

            if adjusted_row:
                adjusted_result.append(adjusted_row)

        return adjusted_result
The adjust_table_cells method is defined but never called (the only call to it, on line 104, is commented out in favor of process_ocr_result). This is dead code that adds unnecessary complexity to the file. It should either be removed or its purpose compared with process_ocr_result to decide which one to keep.
Suggested change: delete the entire adjust_table_cells method shown above.