feat: implement table cell boxes extraction with SLANeXt
Phase 1-3 implementation of extract-table-cell-boxes proposal:
- Add enable_table_cell_boxes_extraction config option
- Implement lazy-loaded SLANeXt model caching in PPStructureEnhanced
- Add _extract_cell_boxes_with_slanet() method for direct model invocation
- Supplement PPStructureV3 table processing with SLANeXt cell boxes
- Add _compute_table_grid_from_cell_boxes() for column width calculation
- Modify draw_table_region() to use cell_boxes for accurate layout

Key features:
- Auto-detect table type (wired/wireless) using PP-LCNet classifier
- Convert 8-point polygon bbox to 4-point rectangle
- Graceful fallback to equal distribution when cell_boxes unavailable
- Proper coordinate transformation with scaling support

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
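The polygon-to-rectangle conversion named in the commit message could look roughly like the sketch below. It is not the code from this commit: the helper name `polygon_to_rect` is hypothetical, and it assumes that "8-point" means a flat list of eight numbers `[x1, y1, ..., x4, y4]` and that "4-point" means an axis-aligned `[x_min, y_min, x_max, y_max]` box.

```python
from typing import List, Sequence


def polygon_to_rect(poly: Sequence[float]) -> List[float]:
    """Collapse a flat 8-value polygon into [x_min, y_min, x_max, y_max].

    Hypothetical helper. The input layout [x1, y1, ..., x4, y4] is an
    assumption, not something confirmed by the commit.
    """
    if len(poly) == 4:
        # Already a rectangle: only normalise the corner order.
        x1, y1, x2, y2 = poly
        return [min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2)]
    if len(poly) != 8:
        raise ValueError(f"expected 4 or 8 values, got {len(poly)}")
    xs = poly[0::2]  # every even index is an x coordinate
    ys = poly[1::2]  # every odd index is a y coordinate
    return [min(xs), min(ys), max(xs), max(ys)]
```

Taking the min and max of the corner coordinates yields the tightest axis-aligned box around a slightly skewed cell, which is usually enough for laying out a table in a PDF.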
@@ -161,6 +161,14 @@ class Settings(BaseSettings):
         description="Cell detection model for borderless tables. RT-DETR-L provides best accuracy."
     )
 
+    # Table Cell Boxes Extraction - supplement PPStructureV3 with direct SLANeXt calls
+    # When enabled, directly invokes SLANeXt models to extract cell bounding boxes
+    # which are not exposed by the PPStructureV3 high-level API
+    enable_table_cell_boxes_extraction: bool = Field(
+        default=True,
+        description="Enable direct SLANeXt model calls to extract table cell bounding boxes for accurate PDF layout."
+    )
+
     # Formula Recognition Model Configuration (Stage 4)
     # Available models:
     # - "PP-FormulaNet_plus-L": Best for Chinese formulas (90.64% Chinese, 92.22% English BLEU)
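To illustrate how a flag like `enable_table_cell_boxes_extraction` might gate the column-width calculation and fall back to equal distribution, here is a minimal sketch. The function `column_widths`, its parameters, and the edge-clustering `tolerance` are all hypothetical; the actual `_compute_table_grid_from_cell_boxes()` is not shown in this diff and may work differently.

```python
from typing import List, Optional, Sequence

Rect = Sequence[float]  # assumed layout: [x_min, y_min, x_max, y_max]


def column_widths(
    table_width: float,
    num_cols: int,
    cell_boxes: Optional[List[Rect]] = None,
    enabled: bool = True,
    tolerance: float = 5.0,
) -> List[float]:
    """Derive column widths from cell boxes, or split the width equally."""
    if num_cols < 1:
        raise ValueError("num_cols must be at least 1")
    equal = [table_width / num_cols] * num_cols
    if not enabled or not cell_boxes:
        return equal  # feature disabled or no boxes: equal distribution

    # Collect every left/right edge and merge edges that lie close together.
    edges = sorted(x for box in cell_boxes for x in (box[0], box[2]))
    bounds = [edges[0]]
    for x in edges[1:]:
        if x - bounds[-1] > tolerance:
            bounds.append(x)

    # If the detected grid disagrees with the expected column count,
    # the boxes are not trusted and the equal split is used instead.
    if len(bounds) != num_cols + 1:
        return equal

    span = bounds[-1] - bounds[0]
    # Scale the detected column spans to the target table width.
    return [(b - a) * table_width / span for a, b in zip(bounds, bounds[1:])]
```

In an application, `enabled` would presumably be read from `settings.enable_table_cell_boxes_extraction`, so turning the option off restores the equal-width layout without touching the drawing code.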