OCR/openspec/changes/upgrade-ppstructure-models/MODEL_CLEANUP.md
egg 6235280c45 feat: upgrade PP-StructureV3 models to latest versions
- Layout: PP-DocLayout-S → PP-DocLayout_plus-L (83.2% mAP)
- Table: Single model → Dual SLANeXt (wired/wireless)
- Formula: PP-FormulaNet_plus-L for enhanced recognition
- Add preprocessing flags support (orientation, unwarping)
- Update frontend i18n descriptions

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-11-27 14:22:06 +08:00


# PP-StructureV3 Model Cache Cleanup Guide
## Overview
After upgrading PP-StructureV3 models, older unused models may remain in the cache directory. This guide explains how to safely remove them to free disk space.
## Model Cache Location
PaddleX/PaddleOCR 3.x stores downloaded models in:
```
~/.paddlex/official_models/
```
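If the cache has been relocated, the path above will not match. The following is a minimal sketch for resolving the directory before running any cleanup. It assumes that `PADDLEX_HOME` (the variable named in the Notes section) points at the directory that contains `official_models/`; that variable name is not verified against PaddleX here, and `resolve_cache_dir` is a hypothetical helper, not part of PaddleX.

```bash
#!/bin/bash
# resolve_cache_dir is a hypothetical helper, not a PaddleX command.
# Assumption: PADDLEX_HOME, when set, is the directory that contains
# official_models/. Otherwise fall back to the default ~/.paddlex.
resolve_cache_dir() {
    local base="${PADDLEX_HOME:-$HOME/.paddlex}"
    echo "$base/official_models"
}

CACHE_DIR="$(resolve_cache_dir)"
if [ -d "$CACHE_DIR" ]; then
    echo "Model cache: $CACHE_DIR"
else
    echo "No model cache at: $CACHE_DIR"
fi
```

If your installation relocates the cache through a different variable, substitute its name in the parameter expansion.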
## Models After Upgrade
### Current Active Models (DO NOT DELETE)
| Model | Purpose | Approx. Size |
|-------|---------|--------------|
| `PP-DocLayout_plus-L` | Layout detection for Chinese documents | ~350MB |
| `SLANeXt_wired` | Table structure recognition (bordered tables) | ~351MB |
| `SLANeXt_wireless` | Table structure recognition (borderless tables) | ~351MB |
| `PP-FormulaNet_plus-L` | Formula recognition (Chinese + English) | ~800MB |
| `PP-OCRv5_*` | Text detection and recognition | ~150MB |
| `picodet_lcnet_x1_0_fgd_layout_cdla` | CDLA layout model option | ~10MB |
### Deprecated Models (Safe to Delete)
| Model | Reason | Approx. Size |
|-------|--------|--------------|
| `PP-DocLayout-S` | Replaced by PP-DocLayout_plus-L | ~50MB |
| `SLANet` | Replaced by SLANeXt_wired/wireless | ~7MB |
| `SLANet_plus` | Replaced by SLANeXt_wired/wireless | ~7MB |
| `PP-FormulaNet-S` | Replaced by PP-FormulaNet_plus-L | ~200MB |
| `PP-FormulaNet-L` | Replaced by PP-FormulaNet_plus-L | ~400MB |
## Cleanup Commands
### List Current Cache
```bash
# List all cached models
ls -la ~/.paddlex/official_models/
# Show disk usage per model
du -sh ~/.paddlex/official_models/*
```
### Delete Deprecated Models
```bash
# Remove deprecated layout model
rm -rf ~/.paddlex/official_models/PP-DocLayout-S
# Remove deprecated table models
rm -rf ~/.paddlex/official_models/SLANet
rm -rf ~/.paddlex/official_models/SLANet_plus
# Remove deprecated formula models (if present)
rm -rf ~/.paddlex/official_models/PP-FormulaNet-S
rm -rf ~/.paddlex/official_models/PP-FormulaNet-L
```
### Cleanup Script
```bash
#!/bin/bash
# cleanup_old_models.sh - Remove deprecated PP-StructureV3 models

CACHE_DIR="$HOME/.paddlex/official_models"

echo "PP-StructureV3 Model Cleanup"
echo "============================"
echo ""

# Nothing to do if the cache directory does not exist
if [ ! -d "$CACHE_DIR" ]; then
    echo "Cache directory not found: $CACHE_DIR"
    exit 0
fi

# Models replaced by the upgrade
DEPRECATED_MODELS=(
    "PP-DocLayout-S"
    "SLANet"
    "SLANet_plus"
    "PP-FormulaNet-S"
    "PP-FormulaNet-L"
)

echo "Checking for deprecated models..."
echo ""

# Count how many deprecated models are present (a count, not a byte total)
FOUND_COUNT=0
for model in "${DEPRECATED_MODELS[@]}"; do
    MODEL_PATH="$CACHE_DIR/$model"
    if [ -d "$MODEL_PATH" ]; then
        SIZE=$(du -sh "$MODEL_PATH" 2>/dev/null | cut -f1)
        echo "Found: $model ($SIZE)"
        FOUND_COUNT=$((FOUND_COUNT + 1))
    fi
done

if [ "$FOUND_COUNT" -eq 0 ]; then
    echo "No deprecated models found. Cache is clean."
    exit 0
fi

echo ""
# Ask for confirmation; anything other than y/Y cancels
read -r -p "Delete these models? [y/N]: " confirm
if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
    for model in "${DEPRECATED_MODELS[@]}"; do
        MODEL_PATH="$CACHE_DIR/$model"
        if [ -d "$MODEL_PATH" ]; then
            rm -rf "$MODEL_PATH"
            echo "Deleted: $model"
        fi
    done
    echo ""
    echo "Cleanup complete."
else
    echo "Cleanup cancelled."
fi
```
## Space Savings Estimate
After cleanup, you can expect to free approximately the following, based on the sizes listed in the Deprecated Models table:
- **~50MB** from the deprecated layout model
- **~14MB** from the deprecated table models
- **~600MB** from the deprecated formula models (if present)

Total potential savings: **~665MB**
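
These figures are estimates; the real amount depends on which deprecated models were ever downloaded to your machine. A minimal sketch for measuring it directly, assuming the default cache location and the deprecated model names listed earlier (`reclaimable_kb` is a hypothetical helper name):

```bash
#!/bin/bash
# reclaimable_kb is a hypothetical helper: it sums the on-disk size (in KiB)
# of the deprecated model directories that actually exist under a cache dir.
reclaimable_kb() {
    local cache_dir="$1"
    local total=0 size model
    for model in PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L; do
        if [ -d "$cache_dir/$model" ]; then
            # du -sk prints "<KiB><TAB><path>"; keep only the number
            size=$(du -sk "$cache_dir/$model" | cut -f1)
            total=$((total + size))
        fi
    done
    echo "$total"
}

kb=$(reclaimable_kb "$HOME/.paddlex/official_models")
echo "Reclaimable: $((kb / 1024)) MiB"
```

Running this before the cleanup shows how much space the deletion would actually return.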
## Notes
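1. Models are downloaded on first use. Deleting an active model triggers a re-download the next time it is needed.
2. The cache directory may differ if the `PADDLEX_HOME` environment variable is set; the commands and script in this guide assume the default location.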
3. Always verify which models your configuration uses before deleting.
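
One way to carry out the check in note 3 is to search your project for the model names before deleting anything. This is a sketch only: where your configuration lives is an assumption you need to fill in, and `models_in_use` is a hypothetical helper name. It uses whole-word, fixed-string matching so that `SLANet` does not match `SLANet_plus`.

```bash
#!/bin/bash
# models_in_use is a hypothetical helper: it prints every model name from the
# argument list that appears in any file under the given project directory.
# grep flags: -r recursive, -q quiet, -w whole word, -F fixed string.
models_in_use() {
    local project_dir="$1"; shift
    local model
    for model in "$@"; do
        if grep -rqwF -- "$model" "$project_dir"; then
            echo "$model"
        fi
    done
}

# Usage: ./check_models.sh path/to/project
if [ "$#" -gt 0 ]; then
    models_in_use "$1" PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L
fi
```

Any name this prints is still referenced somewhere in the project, so keep that model until the reference is updated.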