chore: archive upgrade-ppstructure-models proposal

Archived as 2025-11-27-upgrade-ppstructure-models
Spec updated: ocr-processing (added PP-StructureV3 Configuration)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
@@ -0,0 +1,141 @@
# PP-StructureV3 Model Cache Cleanup Guide

## Overview

After upgrading PP-StructureV3 models, older unused models may remain in the cache directory. This guide explains how to safely remove them to free disk space.

## Model Cache Location

PaddleX/PaddleOCR 3.x stores downloaded models in:

```
~/.paddlex/official_models/
```
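
Before touching anything, it can help to confirm where the cache actually lives on a given machine. The following is a minimal sketch, not part of PaddleX: `resolve_cache_dir` is a hypothetical helper that takes an optional cache root as its first argument and otherwise falls back to the default location shown above.

```bash
#!/bin/bash
# resolve_cache_dir is a hypothetical helper, not a PaddleX command.
# It prints the model cache directory: "<root>/official_models", where
# <root> is the first argument if given, otherwise the default ~/.paddlex.
resolve_cache_dir() {
    local root="${1:-$HOME/.paddlex}"
    printf '%s\n' "$root/official_models"
}

cache_dir="$(resolve_cache_dir "$@")"
if [ -d "$cache_dir" ]; then
    echo "Model cache found: $cache_dir"
else
    echo "No model cache at: $cache_dir"
fi
```

Passing the root explicitly keeps the check usable when the cache has been relocated, without assuming how the relocation was configured.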

## Models After Upgrade

### Current Active Models (DO NOT DELETE)

| Model | Purpose | Approx. Size |
|-------|---------|--------------|
| `PP-DocLayout_plus-L` | Layout detection for Chinese documents | ~350MB |
| `SLANeXt_wired` | Table structure recognition (bordered tables) | ~351MB |
| `SLANeXt_wireless` | Table structure recognition (borderless tables) | ~351MB |
| `PP-FormulaNet_plus-L` | Formula recognition (Chinese + English) | ~800MB |
| `PP-OCRv5_*` | Text detection and recognition | ~150MB |
| `picodet_lcnet_x1_0_fgd_layout_cdla` | CDLA layout model option | ~10MB |

### Deprecated Models (Safe to Delete)

| Model | Reason | Approx. Size |
|-------|--------|--------------|
| `PP-DocLayout-S` | Replaced by PP-DocLayout_plus-L | ~50MB |
| `SLANet` | Replaced by SLANeXt_wired/wireless | ~7MB |
| `SLANet_plus` | Replaced by SLANeXt_wired/wireless | ~7MB |
| `PP-FormulaNet-S` | Replaced by PP-FormulaNet_plus-L | ~200MB |
| `PP-FormulaNet-L` | Replaced by PP-FormulaNet_plus-L | ~400MB |
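
The two tables can be turned into a quick, read-only audit of the cache. The sketch below is an illustration only: `classify_model` is a hypothetical helper that encodes the model names from the tables above, and anything it does not recognize is reported as `unknown` rather than treated as safe to delete.

```bash
#!/bin/bash
# classify_model is a hypothetical helper: it maps a cache directory name
# to "active", "deprecated", or "unknown" based on the two tables above.
classify_model() {
    case "$1" in
        PP-DocLayout_plus-L|PP-FormulaNet_plus-L) echo "active" ;;
        SLANeXt_wired|SLANeXt_wireless) echo "active" ;;
        PP-OCRv5_*|picodet_lcnet_x1_0_fgd_layout_cdla) echo "active" ;;
        PP-DocLayout-S|SLANet|SLANet_plus) echo "deprecated" ;;
        PP-FormulaNet-S|PP-FormulaNet-L) echo "deprecated" ;;
        *) echo "unknown" ;;
    esac
}

# Optional first argument overrides the default cache location.
CACHE_DIR="${1:-$HOME/.paddlex/official_models}"
for path in "$CACHE_DIR"/*/; do
    # Skip the unexpanded glob when the cache is empty or missing.
    [ -d "$path" ] || continue
    name="$(basename "$path")"
    printf '%-45s %s\n' "$name" "$(classify_model "$name")"
done
```

The script only prints; it never deletes. Entries reported as `unknown` are worth checking by hand, since other pipelines may have downloaded them.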

## Cleanup Commands

### List Current Cache

```bash
# List all cached models
ls -la ~/.paddlex/official_models/

# Show disk usage per model
du -sh ~/.paddlex/official_models/*
```

### Delete Deprecated Models

```bash
# Remove deprecated layout model
rm -rf ~/.paddlex/official_models/PP-DocLayout-S

# Remove deprecated table models
rm -rf ~/.paddlex/official_models/SLANet
rm -rf ~/.paddlex/official_models/SLANet_plus

# Remove deprecated formula models (if present)
rm -rf ~/.paddlex/official_models/PP-FormulaNet-S
rm -rf ~/.paddlex/official_models/PP-FormulaNet-L
```

### Cleanup Script

```bash
#!/bin/bash
# cleanup_old_models.sh - Remove deprecated PP-StructureV3 models

CACHE_DIR="$HOME/.paddlex/official_models"

echo "PP-StructureV3 Model Cleanup"
echo "============================"
echo ""

# Check if cache directory exists
if [ ! -d "$CACHE_DIR" ]; then
    echo "Cache directory not found: $CACHE_DIR"
    exit 0
fi

# List deprecated models
DEPRECATED_MODELS=(
    "PP-DocLayout-S"
    "SLANet"
    "SLANet_plus"
    "PP-FormulaNet-S"
    "PP-FormulaNet-L"
)

echo "Checking for deprecated models..."
echo ""

# Count the deprecated models that are present (a count, not a size total)
FOUND_COUNT=0
for model in "${DEPRECATED_MODELS[@]}"; do
    MODEL_PATH="$CACHE_DIR/$model"
    if [ -d "$MODEL_PATH" ]; then
        SIZE=$(du -sh "$MODEL_PATH" 2>/dev/null | cut -f1)
        echo "Found: $model ($SIZE)"
        FOUND_COUNT=$((FOUND_COUNT + 1))
    fi
done

if [ "$FOUND_COUNT" -eq 0 ]; then
    echo "No deprecated models found. Cache is clean."
    exit 0
fi

echo ""
# -r keeps backslashes in the answer from being interpreted
read -r -p "Delete these models? [y/N]: " confirm

if [ "$confirm" = "y" ] || [ "$confirm" = "Y" ]; then
    for model in "${DEPRECATED_MODELS[@]}"; do
        MODEL_PATH="$CACHE_DIR/$model"
        if [ -d "$MODEL_PATH" ]; then
            rm -rf "$MODEL_PATH"
            echo "Deleted: $model"
        fi
    done
    echo ""
    echo "Cleanup complete."
else
    echo "Cleanup cancelled."
fi
```

## Space Savings Estimate

Based on the approximate sizes listed under Deprecated Models, cleanup frees roughly:

- **~50MB** from the deprecated layout model
- **~14MB** from the deprecated table models
- **~600MB** from the deprecated formula models (if present)

Total potential savings: **~665MB**
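
These figures are estimates, and the real number depends on which deprecated models were ever downloaded. As a rough sketch, the reclaimable space can be measured directly instead. `reclaimable_kb` is a hypothetical helper name; it relies only on `du -sk`, which reports usage in 1024-byte units.

```bash
#!/bin/bash
# reclaimable_kb is a hypothetical helper: it sums the on-disk size, in KiB,
# of whichever deprecated model directories exist under the given cache dir.
reclaimable_kb() {
    local cache_dir="$1" total=0 size model
    for model in PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L; do
        if [ -d "$cache_dir/$model" ]; then
            # du -sk prints "<KiB><TAB><path>"; keep only the number.
            size=$(du -sk "$cache_dir/$model" | cut -f1)
            total=$((total + size))
        fi
    done
    echo "$total"
}

# Optional first argument overrides the default cache location.
kb=$(reclaimable_kb "${1:-$HOME/.paddlex/official_models}")
echo "Reclaimable: $((kb / 1024)) MiB (${kb} KiB)"
```

Summing in KiB avoids parsing the human-readable output of `du -sh`, whose units vary from one directory to the next.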

## Notes

1. Models are downloaded on first use. Deleting an active model triggers a re-download the next time it is needed.
2. The cache directory may vary if the `PADDLEX_HOME` environment variable is set.
3. Always verify which models your configuration uses before deleting.
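
One way to act on note 3 is a plain text search of your own configuration for the deprecated model names. This is only a sketch under stated assumptions: `referenced_deprecated` is a hypothetical helper, the configuration file path is whatever your project uses, and a text match is a hint rather than proof that a model is loaded at runtime.

```bash
#!/bin/bash
# referenced_deprecated is a hypothetical helper: it prints every deprecated
# model name that appears as a whole word in the given text file.
referenced_deprecated() {
    local config_file="$1" model
    for model in PP-DocLayout-S SLANet SLANet_plus PP-FormulaNet-S PP-FormulaNet-L; do
        # -F: literal string; -w: whole word, so SLANet does not match SLANet_plus.
        if grep -qwF -- "$model" "$config_file"; then
            echo "$model"
        fi
    done
}

# Usage: pass the path to your own configuration file as the first argument.
if [ -n "${1:-}" ]; then
    referenced_deprecated "$1"
fi
```

If the script prints any name, that model is still mentioned in the configuration and should not be deleted until the reference is updated.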