PP-StructureV3 Model Cache Cleanup Guide

Overview

After upgrading PP-StructureV3 models, older unused models may remain in the cache directory. This guide explains how to safely remove them to free disk space.

Model Cache Location

PaddleX/PaddleOCR 3.x stores downloaded models in:

~/.paddlex/official_models/
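
The Notes section says this location can change when the PADDLEX_HOME environment variable is set. A minimal sketch of resolving the directory before running any command in this guide follows. `resolve_cache_dir` is a hypothetical helper, and the assumption that PADDLEX_HOME replaces `~/.paddlex` as the cache root (with `official_models` beneath it) should be checked against your installed PaddleX version.

```shell
#!/bin/bash
# resolve_cache_dir: hypothetical helper, not part of PaddleX.
# Assumption: PADDLEX_HOME, when set, replaces ~/.paddlex as the cache root
# and models still live in an official_models subdirectory.
resolve_cache_dir() {
    echo "${PADDLEX_HOME:-$HOME/.paddlex}/official_models"
}

CACHE_DIR="$(resolve_cache_dir)"
echo "Model cache directory: $CACHE_DIR"
```

If the printed path does not exist, list the parent directory first instead of assuming the cache is empty.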

Models After Upgrade

Current Active Models (DO NOT DELETE)

| Model | Purpose | Approx. Size |
|-------|---------|--------------|
| PP-DocLayout_plus-L | Layout detection for Chinese documents | ~350MB |
| SLANeXt_wired | Table structure recognition (bordered tables) | ~351MB |
| SLANeXt_wireless | Table structure recognition (borderless tables) | ~351MB |
| PP-FormulaNet_plus-L | Formula recognition (Chinese + English) | ~800MB |
| PP-OCRv5_* | Text detection and recognition | ~150MB |
| picodet_lcnet_x1_0_fgd_layout_cdla | CDLA layout model option | ~10MB |

Deprecated Models (Safe to Delete)

| Model | Reason | Approx. Size |
|-------|--------|--------------|
| PP-DocLayout-S | Replaced by PP-DocLayout_plus-L | ~50MB |
| SLANet | Replaced by SLANeXt_wired/wireless | ~7MB |
| SLANet_plus | Replaced by SLANeXt_wired/wireless | ~7MB |
| PP-FormulaNet-S | Replaced by PP-FormulaNet_plus-L | ~200MB |
| PP-FormulaNet-L | Replaced by PP-FormulaNet_plus-L | ~400MB |

Cleanup Commands

List Current Cache

# List all cached models
ls -la ~/.paddlex/official_models/

# Show disk usage per model
du -sh ~/.paddlex/official_models/*

Delete Deprecated Models

# Remove deprecated layout model
rm -rf ~/.paddlex/official_models/PP-DocLayout-S

# Remove deprecated table models
rm -rf ~/.paddlex/official_models/SLANet
rm -rf ~/.paddlex/official_models/SLANet_plus

# Remove deprecated formula models (if present)
rm -rf ~/.paddlex/official_models/PP-FormulaNet-S
rm -rf ~/.paddlex/official_models/PP-FormulaNet-L

Cleanup Script

#!/bin/bash
# cleanup_old_models.sh - Remove deprecated PP-StructureV3 models

CACHE_DIR="$HOME/.paddlex/official_models"

echo "PP-StructureV3 Model Cleanup"
echo "============================"
echo ""

# Check if cache directory exists
if [ ! -d "$CACHE_DIR" ]; then
    echo "Cache directory not found: $CACHE_DIR"
    exit 0
fi

# List deprecated models
DEPRECATED_MODELS=(
    "PP-DocLayout-S"
    "SLANet"
    "SLANet_plus"
    "PP-FormulaNet-S"
    "PP-FormulaNet-L"
)

echo "Checking for deprecated models..."
echo ""

FOUND_COUNT=0
for model in "${DEPRECATED_MODELS[@]}"; do
    MODEL_PATH="$CACHE_DIR/$model"
    if [ -d "$MODEL_PATH" ]; then
        SIZE=$(du -sh "$MODEL_PATH" 2>/dev/null | cut -f1)
        echo "Found: $model ($SIZE)"
        # Tally of matching directories, not a byte total
        FOUND_COUNT=$((FOUND_COUNT + 1))
    fi
done

if [ "$FOUND_COUNT" -eq 0 ]; then
    echo "No deprecated models found. Cache is clean."
    exit 0
fi

echo ""
read -r -p "Delete these models? [y/N]: " confirm

if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
    for model in "${DEPRECATED_MODELS[@]}"; do
        MODEL_PATH="$CACHE_DIR/$model"
        if [ -d "$MODEL_PATH" ]; then
            rm -rf "$MODEL_PATH"
            echo "Deleted: $model"
        fi
    done
    echo ""
    echo "Cleanup complete."
else
    echo "Cleanup cancelled."
fi
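
Because `rm -rf` cannot be undone, it can help to preview the deletions first. The sketch below is one way to do that. `remove_models` and its `yes`/`no` dry-run argument are hypothetical names introduced here for illustration, not part of PaddleX or of the script above.

```shell
#!/bin/bash
# remove_models: hypothetical helper, not part of PaddleX.
# Usage: remove_models <cache_dir> <dry_run: yes|no> <model>...
remove_models() {
    local cache_dir="$1" dry_run="$2" model model_path
    shift 2
    # Refuse an empty cache_dir so paths never resolve to "/<model>"
    if [ -z "$cache_dir" ]; then
        echo "cache_dir is empty" >&2
        return 1
    fi
    for model in "$@"; do
        model_path="$cache_dir/$model"
        # Skip models that are not in the cache
        [ -d "$model_path" ] || continue
        if [ "$dry_run" = "yes" ]; then
            echo "Would delete: $model_path"
        else
            rm -rf -- "$model_path"
            echo "Deleted: $model_path"
        fi
    done
}

# Preview only; change "yes" to "no" to actually delete
remove_models "$HOME/.paddlex/official_models" yes \
    PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L
```

Running the preview and reading its output before switching to `no` gives the same safety as the interactive prompt, and it also works in non-interactive shells.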

Space Savings Estimate

After cleanup, you can expect to free approximately:

  • ~50MB from the deprecated layout model
  • ~14MB from the deprecated table models
  • ~600MB from the deprecated formula models (if present)

Total potential savings: ~664MB
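
These figures are estimates, and the real number depends on which models were actually downloaded. A minimal sketch for measuring the reclaimable space on a given machine follows. `reclaimable_kb` is a hypothetical helper, and the cache path is the default one from this guide.

```shell
#!/bin/bash
# reclaimable_kb: hypothetical helper, not part of PaddleX.
# Prints the combined size, in KiB, of the listed model directories
# that exist under the given cache directory.
reclaimable_kb() {
    local cache_dir="$1" total=0 size model
    shift
    for model in "$@"; do
        if [ -d "$cache_dir/$model" ]; then
            # du -sk prints "<size-in-KiB><TAB><path>"; keep the first field
            size=$(du -sk "$cache_dir/$model" | cut -f1)
            total=$((total + size))
        fi
    done
    echo "$total"
}

kb=$(reclaimable_kb "$HOME/.paddlex/official_models" \
    PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L)
echo "Reclaimable: $((kb / 1024)) MiB"
```

Models that are not present contribute nothing, so the result is zero on a cache that is already clean.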

Notes

  1. Models are downloaded on first use. Deleting an active model triggers a re-download the next time it is needed.
  2. The cache directory may differ if the PADDLEX_HOME environment variable is set.
  3. Always verify which models your configuration uses before deleting.
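
One way to carry out the check in note 3 is to search the configuration for each model name before deleting it. The sketch below assumes the configuration is a plain-text file (for example YAML) that names models literally. `models_referenced_in` is a hypothetical helper, and the config path in the usage comment is a placeholder.

```shell
#!/bin/bash
# models_referenced_in: hypothetical helper, not part of PaddleX.
# Prints each given model name that appears in the config file.
# -F matches the name literally; -w requires whole-word matches, so
# "SLANet" does not match inside "SLANet_plus".
models_referenced_in() {
    local config_file="$1" model
    shift
    for model in "$@"; do
        if grep -qwF -- "$model" "$config_file"; then
            echo "$model"
        fi
    done
}

# Usage (placeholder path):
#   models_referenced_in path/to/your_config.yaml \
#       PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L
```

Any name this prints is still referenced and should be left in the cache. An empty result only shows that the file does not mention the names, so models selected through code or library defaults still need a manual check.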