# PP-StructureV3 Model Cache Cleanup Guide

## Overview

After upgrading PP-StructureV3 models, older unused models may remain in the cache directory. This guide explains how to safely remove them to free disk space.

## Model Cache Location

PaddleX/PaddleOCR 3.x stores downloaded models in:

```
~/.paddlex/official_models/
```

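Because the cache location may differ when `PADDLEX_HOME` is set, it can help to resolve the directory once and reuse it in later commands. The sketch below is a minimal example under one loud assumption: that `PADDLEX_HOME` replaces `~/.paddlex` as the base directory and that models still live in an `official_models` subdirectory beneath it. The function name `resolve_cache_dir` is hypothetical; check the result against your own installation before relying on it.

```bash
#!/bin/sh
# Resolve the model cache directory.
# ASSUMPTION: when PADDLEX_HOME is set, it replaces ~/.paddlex as the base
# directory and models are stored under "$PADDLEX_HOME/official_models".
resolve_cache_dir() {
    base="${PADDLEX_HOME:-$HOME/.paddlex}"
    printf '%s\n' "$base/official_models"
}

CACHE_DIR="$(resolve_cache_dir)"
echo "Model cache directory: $CACHE_DIR"
```

If the printed path does not exist or is empty, the models are probably cached elsewhere and the commands in this guide should be pointed at that location instead.
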
## Models After Upgrade

### Current Active Models (DO NOT DELETE)

| Model | Purpose | Approx. Size |
|-------|---------|--------------|
| `PP-DocLayout_plus-L` | Layout detection for Chinese documents | ~350MB |
| `SLANeXt_wired` | Table structure recognition (bordered tables) | ~351MB |
| `SLANeXt_wireless` | Table structure recognition (borderless tables) | ~351MB |
| `PP-FormulaNet_plus-L` | Formula recognition (Chinese + English) | ~800MB |
| `PP-OCRv5_*` | Text detection and recognition | ~150MB |
| `picodet_lcnet_x1_0_fgd_layout_cdla` | CDLA layout model option | ~10MB |

### Deprecated Models (Safe to Delete)

| Model | Reason | Approx. Size |
|-------|--------|--------------|
| `PP-DocLayout-S` | Replaced by PP-DocLayout_plus-L | ~50MB |
| `SLANet` | Replaced by SLANeXt_wired/wireless | ~7MB |
| `SLANet_plus` | Replaced by SLANeXt_wired/wireless | ~7MB |
| `PP-FormulaNet-S` | Replaced by PP-FormulaNet_plus-L | ~200MB |
| `PP-FormulaNet-L` | Replaced by PP-FormulaNet_plus-L | ~400MB |

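
The two tables can be turned into a quick check of what is actually sitting in the cache. The following is a sketch only: the function name `classify_model` is hypothetical, the model names are copied from the tables above, and anything not listed there is reported as `unknown` rather than guessed at.

```bash
#!/bin/sh
# Classify a cached model directory name using the two tables in this guide.
classify_model() {
    case "$1" in
        PP-DocLayout_plus-L|SLANeXt_wired|SLANeXt_wireless|PP-FormulaNet_plus-L|PP-OCRv5_*|picodet_lcnet_x1_0_fgd_layout_cdla)
            echo "active" ;;
        PP-DocLayout-S|SLANet|SLANet_plus|PP-FormulaNet-S|PP-FormulaNet-L)
            echo "deprecated" ;;
        *)
            echo "unknown" ;;
    esac
}

CACHE_DIR="$HOME/.paddlex/official_models"

# Print one line per cached model directory with its classification.
for dir in "$CACHE_DIR"/*/; do
    [ -d "$dir" ] || continue   # skip the unexpanded glob when the cache is empty or missing
    name=$(basename "$dir")
    printf '%-45s %s\n' "$name" "$(classify_model "$name")"
done
```

Entries reported as `unknown` are not covered by this guide, so it is safer to leave them in place until you have confirmed what uses them.
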
## Cleanup Commands

### List Current Cache

```bash
# List all cached models
ls -la ~/.paddlex/official_models/

# Show disk usage per model
du -sh ~/.paddlex/official_models/*
```

### Delete Deprecated Models

```bash
# Remove deprecated layout model
rm -rf ~/.paddlex/official_models/PP-DocLayout-S

# Remove deprecated table models
rm -rf ~/.paddlex/official_models/SLANet
rm -rf ~/.paddlex/official_models/SLANet_plus

# Remove deprecated formula models (if present)
rm -rf ~/.paddlex/official_models/PP-FormulaNet-S
rm -rf ~/.paddlex/official_models/PP-FormulaNet-L
```

### Cleanup Script

```bash
#!/bin/bash
# cleanup_old_models.sh - Remove deprecated PP-StructureV3 models

CACHE_DIR="$HOME/.paddlex/official_models"

echo "PP-StructureV3 Model Cleanup"
echo "============================"
echo ""

# Check if cache directory exists
if [ ! -d "$CACHE_DIR" ]; then
    echo "Cache directory not found: $CACHE_DIR"
    exit 0
fi

# List deprecated models
DEPRECATED_MODELS=(
    "PP-DocLayout-S"
    "SLANet"
    "SLANet_plus"
    "PP-FormulaNet-S"
    "PP-FormulaNet-L"
)

echo "Checking for deprecated models..."
echo ""

# Count how many deprecated models are present (a count, not a size total)
FOUND_COUNT=0
for model in "${DEPRECATED_MODELS[@]}"; do
    MODEL_PATH="$CACHE_DIR/$model"
    if [ -d "$MODEL_PATH" ]; then
        SIZE=$(du -sh "$MODEL_PATH" 2>/dev/null | cut -f1)
        echo "Found: $model ($SIZE)"
        FOUND_COUNT=$((FOUND_COUNT + 1))
    fi
done

if [ "$FOUND_COUNT" -eq 0 ]; then
    echo "No deprecated models found. Cache is clean."
    exit 0
fi

echo ""
# -r keeps backslashes in the answer literal
read -r -p "Delete these models? [y/N]: " confirm

if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
    for model in "${DEPRECATED_MODELS[@]}"; do
        MODEL_PATH="$CACHE_DIR/$model"
        if [ -d "$MODEL_PATH" ]; then
            rm -rf -- "$MODEL_PATH"
            echo "Deleted: $model"
        fi
    done
    echo ""
    echo "Cleanup complete."
else
    echo "Cleanup cancelled."
fi
```

## Space Savings Estimate

Based on the approximate sizes in the deprecated models table, cleanup can be expected to free roughly:

- **~50MB** from the deprecated layout model
- **~14MB** from the deprecated table models
- **~600MB** from the deprecated formula models (if present)

Total potential savings: **~665MB**

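
The figures above are approximations, so it may be worth measuring what is actually reclaimable on your machine. The sketch below sums the on-disk size of whichever deprecated model directories are present; the function name `deprecated_size_kb` is hypothetical, and the default cache path is assumed.

```bash
#!/bin/sh
# Sum the on-disk size (in KiB) of the deprecated model directories
# found under the cache directory given as the first argument.
deprecated_size_kb() {
    cache_dir="$1"
    total=0
    for model in PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L; do
        if [ -d "$cache_dir/$model" ]; then
            # du -sk prints "<size-in-KiB><TAB><path>"; keep only the number
            size=$(du -sk "$cache_dir/$model" | cut -f1)
            total=$((total + size))
        fi
    done
    echo "$total"
}

# ASSUMPTION: default cache location; adjust if your cache lives elsewhere.
CACHE_DIR="$HOME/.paddlex/official_models"
echo "Reclaimable: $(deprecated_size_kb "$CACHE_DIR") KiB"
```

Directories that are absent contribute nothing, so a result of `0` simply means no deprecated models were found at that path.
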
## Notes

1. Models are downloaded on first use. Deleting an active model will trigger a re-download the next time it is needed.
2. The cache directory may differ if the `PADDLEX_HOME` environment variable is set.
3. Always verify which models your configuration uses before deleting anything.

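
One way to carry out the check in note 3 is to search your configuration for the deprecated model names before deleting anything. This is a minimal sketch: the function name `models_still_referenced` and the `CONFIG_FILE` variable are hypothetical, and it assumes model names appear literally in a single text configuration file, which may not match how your project selects models.

```bash
#!/bin/sh
# Print every deprecated model name that still appears in a configuration file.
models_still_referenced() {
    config_file="$1"
    for model in PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L; do
        # -F matches the name literally; -w requires whole-word matches so that
        # "SLANet" is not reported just because "SLANet_plus" is present.
        if grep -qwF -- "$model" "$config_file"; then
            echo "$model"
        fi
    done
}

# ASSUMPTION: CONFIG_FILE is a placeholder; set it to your own config path.
if [ -f "${CONFIG_FILE:-}" ]; then
    models_still_referenced "$CONFIG_FILE"
fi
```

Any name this prints is still referenced, so that model should be kept (or the configuration updated) before running the cleanup.
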