Initial commit with Llama API client and docs

Add Python scripts for Llama API chat clients, endpoint testing, and quick tests. Include documentation (README, CONTRIBUTING, 操作指南), license, and .gitignore. Supports multiple endpoints and models for OpenAI-compatible Llama API usage.
2025-09-19 21:44:02 +08:00
parent 4e28c131d2
commit 8a929936ad
18 changed files with 2073 additions and 0 deletions


@@ -0,0 +1,15 @@
{
"permissions": {
"allow": [
"Bash(pip install:*)",
"Bash(python:*)",
"Bash(ping:*)",
"Bash(curl:*)",
"Bash(dir)",
"Bash(git init:*)",
"Bash(git add:*)",
"Bash(git commit:*)"
],
"defaultMode": "acceptEdits"
}
}

.gitignore vendored Normal file

@@ -0,0 +1,102 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class
# C extensions
*.so
# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST
# PyInstaller
*.manifest
*.spec
# Installer logs
pip-log.txt
pip-delete-this-directory.txt
# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/
# Virtual environments
venv/
ENV/
env/
.venv/
.env
# IDE
.vscode/
.idea/
*.swp
*.swo
*~
# Project specific
*.log
*.tmp
temp/
tmp/
logs/
output/
# API keys and secrets (if stored in separate config)
config.ini
secrets.json
.env.local
.env.production
# Test outputs
test_results/
*.test.txt
# Backup files
*.bak
*.backup
*.old
# Windows
Thumbs.db
ehthumbs.db
Desktop.ini
# macOS
.DS_Store
.AppleDouble
.LSOverride
# Linux
.directory
.Trash-*

CONTRIBUTING.md Normal file

@@ -0,0 +1,196 @@
# Contributing to Llama API Client
Thank you for your interest in contributing to Llama API Client! This document provides guidelines for contributing to the project.
## How to Contribute
### Reporting Bugs
Before creating bug reports, please check existing issues to avoid duplicates. When creating a bug report, include:
- A clear and descriptive title
- Steps to reproduce the issue
- Expected behavior
- Actual behavior
- System information (OS, Python version, etc.)
- Error messages or logs
### Suggesting Enhancements
Enhancement suggestions are welcome! Please provide:
- A clear and descriptive title
- Detailed description of the proposed feature
- Use cases and benefits
- Possible implementation approach
### Pull Requests
1. **Fork the repository** and create your branch from `main`
2. **Follow the coding style** used in the project
3. **Write clear commit messages**
4. **Add tests** if applicable
5. **Update documentation** if needed
6. **Test your changes** thoroughly
## Development Setup
```bash
# Clone your fork
git clone https://github.com/yourusername/llama-api-client.git
cd llama-api-client
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run tests
python quick_test.py
```
## Coding Standards
### Python Style Guide
- Follow PEP 8
- Use meaningful variable names
- Add docstrings to functions and classes
- Keep functions focused and small
- Handle exceptions appropriately
### Example Code Style
```python
import re

def clean_response(text: str) -> str:
    """
    Clean AI response by removing special markers.

    Args:
        text: Raw response text from AI

    Returns:
        Cleaned text without special markers
    """
    cleaned_text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    cleaned_text = cleaned_text.replace("<|end|>", "").replace("<|start|>", "")
    return cleaned_text.strip()
```
### Commit Message Format
Use clear and descriptive commit messages:
- `feat:` New feature
- `fix:` Bug fix
- `docs:` Documentation changes
- `style:` Code style changes
- `refactor:` Code refactoring
- `test:` Test additions or changes
- `chore:` Maintenance tasks
Examples:
```
feat: Add support for new model endpoint
fix: Handle encoding errors in Windows terminals
docs: Update README with troubleshooting section
```
## Testing
### Running Tests
```bash
# Quick connection test
python quick_test.py
# Test all models
python test_all_models.py
# Test specific endpoint
python local_api_test.py
```
### Writing Tests
When adding new features, include appropriate tests:
```python
def test_endpoint_connection():
"""Test if endpoint is reachable"""
assert test_endpoint({"url": "...", "models": ["..."]})
```
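For a concrete starting point, here is a minimal runnable variant that exercises `test_endpoint` from `llama_full_api.py` against the first internal endpoint listed in the README (adjust the URL to your own network):
```python
from llama_full_api import test_endpoint

def test_endpoint_connection():
    """Test if the first internal endpoint is reachable"""
    assert test_endpoint({
        "url": "http://192.168.0.6:21180/v1",
        "models": ["gpt-oss-120b"],
    })

if __name__ == "__main__":
    test_endpoint_connection()
    print("Endpoint reachable")
```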
## Documentation
- Update README.md for user-facing changes
- Update 操作指南.md (the operation guide) for corresponding changes
- Add docstrings to all public functions
- Include usage examples for new features
## Code Review Process
1. All submissions require review before merging
2. Reviews focus on:
- Code quality and style
- Test coverage
- Documentation completeness
- Performance implications
- Security considerations
## Areas for Contribution
### Current Needs
- [ ] Add retry logic for failed connections
- [ ] Implement connection pooling
- [ ] Add streaming response support
- [ ] Create GUI interface
- [ ] Add conversation export/import
- [ ] Implement rate limiting
- [ ] Add proxy support
- [ ] Create Docker container
- [ ] Add more language examples
- [ ] Improve error messages
### Future Features
- Web interface
- Mobile app support
- Voice input/output
- Multi-user support
- Analytics dashboard
- Plugin system
## Community
### Communication Channels
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: General questions and discussions
- Pull Requests: Code contributions
### Code of Conduct
- Be respectful and inclusive
- Welcome newcomers
- Provide constructive feedback
- Focus on what is best for the community
- Show empathy towards others
## Questions?
If you have questions about contributing, feel free to:
1. Open an issue with the `question` label
2. Check existing documentation
3. Review closed issues for similar questions
## License
By contributing, you agree that your contributions will be licensed under the MIT License.
---
Thank you for contributing to Llama API Client! 🚀

LICENSE Normal file

@@ -0,0 +1,21 @@
MIT License
Copyright (c) 2025 Llama API Client Contributors
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md Normal file

@@ -0,0 +1,201 @@
# Llama API Client
A Python client for connecting to Llama AI models through OpenAI-compatible API endpoints.
## Features
- 🌐 Support for both internal network and external API endpoints
- 🤖 Multiple model support (GPT-OSS-120B, DeepSeek-R1-671B, Qwen3-Embedding-8B)
- 💬 Interactive chat interface with conversation history
- 🔄 Automatic endpoint testing and failover
- 🧹 Automatic response cleaning (removes thinking tags and special markers)
- 📝 Full conversation context management
## Quick Start
### Installation
```bash
# Clone the repository
git clone https://github.com/yourusername/llama-api-client.git
cd llama-api-client
# Install dependencies
pip install -r requirements.txt
```
### Basic Usage
```python
from openai import OpenAI
# Configure API
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "http://192.168.0.6:21180/v1"
# Create client
client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
# Send request
response = client.chat.completions.create(
model="gpt-oss-120b",
messages=[{"role": "user", "content": "Hello!"}],
temperature=0.7,
max_tokens=200
)
print(response.choices[0].message.content)
```
### Run Interactive Chat
```bash
# Full-featured chat with all endpoints
python llama_full_api.py
# Internal network only
python llama_chat.py
# Quick test
python quick_test.py
```
## Available Endpoints
### Internal Network (Tested & Working ✅)
| Endpoint | URL | Status |
|----------|-----|--------|
| Internal 1 | `http://192.168.0.6:21180/v1` | ✅ Working |
| Internal 2 | `http://192.168.0.6:21181/v1` | ✅ Working |
| Internal 3 | `http://192.168.0.6:21182/v1` | ✅ Working |
| Internal 4 | `http://192.168.0.6:21183/v1` | ❌ Error 500 |
### External Network
| Endpoint | URL | Status |
|----------|-----|--------|
| GPT-OSS | `https://llama.theaken.com/v1/gpt-oss-120b` | 🔄 Pending |
| DeepSeek | `https://llama.theaken.com/v1/deepseek-r1-671b` | 🔄 Pending |
| General | `https://llama.theaken.com/v1` | 🔄 Pending |
## Project Structure
```
llama-api-client/
├── README.md # This file
├── requirements.txt # Python dependencies
├── 操作指南.md          # Operation guide
├── llama_full_api.py # Full-featured chat client
├── llama_chat.py # Internal network chat client
├── local_api_test.py # Endpoint testing tool
├── quick_test.py # Quick connection test
├── test_all_models.py # Model testing script
└── demo_chat.py # Demo chat with fallback
```
## Chat Commands
During chat sessions, you can use these commands:
- `exit` or `quit` - End the conversation
- `clear` - Clear conversation history
- `model` - Switch between available models
## Configuration
### API Key
```python
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
```
### Available Models
- `gpt-oss-120b` - GPT Open Source 120B parameters
- `deepseek-r1-671b` - DeepSeek R1 671B parameters
- `qwen3-embedding-8b` - Qwen3 Embedding 8B parameters
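Note that `qwen3-embedding-8b` is an embedding model rather than a chat model. If the server also exposes the OpenAI-compatible `/embeddings` route (not verified in this repository), usage would look roughly like this:
```python
from openai import OpenAI

API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "http://192.168.0.6:21180/v1"

client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

# Hypothetical: assumes the endpoint implements the OpenAI-compatible
# /embeddings route for qwen3-embedding-8b (not verified in this repo).
emb = client.embeddings.create(
    model="qwen3-embedding-8b",
    input="Hello, world",
)
print(len(emb.data[0].embedding))  # vector dimension
```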
## Troubleshooting
### Issue: 502 Bad Gateway
**Cause**: External API server is offline
**Solution**: Use internal network endpoints
### Issue: Connection Error
**Cause**: Not on internal network or incorrect IP
**Solution**:
1. Verify network connectivity: `ping 192.168.0.6`
2. Check firewall settings
3. Ensure you're on the same network
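Beyond `ping`, you can also check whether the API service itself answers. This is the same `/models` probe that `test_all_models.py` performs; a 200 (or a 405 on routes that reject GET) shows the server is reachable:
```python
import requests

API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "http://192.168.0.6:21180/v1"

resp = requests.get(
    f"{BASE_URL}/models",
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
print(resp.status_code)
```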
### Issue: Encoding Error
**Cause**: Windows terminal encoding issues
**Solution**: Use English for conversations or modify terminal encoding
### Issue: Response Contains Special Markers
**Description**: Responses may contain `<think>`, `<|channel|>` tags
**Solution**: The client automatically removes these markers
## Response Cleaning
The client automatically removes these special markers from AI responses:
- `<think>...</think>` - Thinking process
- `<|channel|>...<|message|>` - Channel markers
- `<|end|>`, `<|start|>` - End/start markers
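The stripping logic lives in `clean_response()` in `llama_chat.py` and `llama_full_api.py`; a condensed sketch of the same approach:
```python
import re

def clean_response(text: str) -> str:
    # Drop <think>...</think> blocks (they may span multiple lines)
    text = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL)
    # Keep only the content after the last <|message|> marker
    if "<|channel|>" in text:
        text = text.split("<|message|>")[-1]
    # Strip the end/start markers and surrounding whitespace
    return text.replace("<|end|>", "").replace("<|start|>", "").strip()
```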
## Requirements
- Python 3.7+
- openai>=1.0.0
- requests (optional, for direct API calls)
## Development
### Testing Connection
```bash
python -c "from openai import OpenAI; client = OpenAI(api_key='YOUR_KEY', base_url='YOUR_URL'); print(client.chat.completions.create(model='gpt-oss-120b', messages=[{'role': 'user', 'content': 'test'}], max_tokens=5).choices[0].message.content)"
```
### Adding New Endpoints
Edit `ENDPOINTS` dictionary in `llama_full_api.py`:
```python
ENDPOINTS = {
"internal": [
{
"name": "New Endpoint",
"url": "http://new-endpoint/v1",
"models": ["gpt-oss-120b"]
}
]
}
```
## License
MIT License - See LICENSE file for details
## Contributing
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request
## Support
For issues or questions:
1. Check the [操作指南.md](操作指南.md) operation guide for a detailed walkthrough
2. Open an issue on GitHub
3. Contact the API administrator for server-related issues
## Acknowledgments
- Built with OpenAI Python SDK
- Compatible with OpenAI API format
- Supports multiple Llama model variants
---
**Last Updated**: 2025-09-19
**Version**: 1.0.0
**Status**: Internal endpoints working, external endpoints pending

demo_chat.py Normal file

@@ -0,0 +1,124 @@
"""
Llama API 對話程式 (示範版本)
當 API 伺服器恢復後,可以使用此程式進行對話
"""
from openai import OpenAI
import time
# API 設定
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "https://llama.theaken.com/v1"
def simulate_chat():
"""模擬對話功能(用於展示)"""
print("\n" + "="*50)
print("Llama AI 對話系統 - 示範模式")
print("="*50)
print("\n[注意] API 伺服器目前離線,以下為模擬對話")
print("當伺服器恢復後,將自動連接真實 API\n")
# 模擬回應
demo_responses = [
"你好!我是 Llama AI 助手,很高興為你服務。",
"這是一個示範回應。當 API 伺服器恢復後,你將收到真實的 AI 回應。",
"我可以回答問題、協助編程、翻譯文字等多種任務。",
"請問有什麼我可以幫助你的嗎?"
]
response_index = 0
print("輸入 'exit' 結束對話\n")
while True:
user_input = input("你: ").strip()
if user_input.lower() in ['exit', 'quit']:
print("\n再見!")
break
if not user_input:
continue
# 模擬思考時間
print("\nAI 思考中", end="")
for _ in range(3):
time.sleep(0.3)
print(".", end="", flush=True)
print()
# 顯示模擬回應
print(f"\nAI: {demo_responses[response_index % len(demo_responses)]}")
response_index += 1
def real_chat():
"""實際對話功能(當 API 可用時)"""
client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
print("\n" + "="*50)
print("Llama AI 對話系統")
print("="*50)
print("\n已連接到 Llama API")
print("輸入 'exit' 結束對話\n")
messages = []
while True:
user_input = input("你: ").strip()
if user_input.lower() in ['exit', 'quit']:
print("\n再見!")
break
if not user_input:
continue
messages.append({"role": "user", "content": user_input})
try:
print("\nAI 思考中...")
response = client.chat.completions.create(
model="gpt-oss-120b",
messages=messages,
temperature=0.7,
max_tokens=1000
)
ai_response = response.choices[0].message.content
print(f"\nAI: {ai_response}")
messages.append({"role": "assistant", "content": ai_response})
except Exception as e:
print(f"\n[錯誤] {str(e)[:100]}")
print("無法取得回應,請稍後再試")
def main():
print("檢查 API 連接狀態...")
# 嘗試連接 API
try:
client = OpenAI(api_key=API_KEY, base_url=BASE_URL)
# 快速測試
response = client.chat.completions.create(
model="gpt-oss-120b",
messages=[{"role": "user", "content": "test"}],
max_tokens=10,
timeout=5
)
print("[成功] API 已連接")
real_chat()
except Exception as e:
error_msg = str(e)
if "502" in error_msg or "Bad gateway" in error_msg:
print("[提示] API 伺服器目前離線 (502 錯誤)")
print("進入示範模式...")
simulate_chat()
else:
print(f"[錯誤] 無法連接: {error_msg[:100]}")
print("\n是否要進入示範模式? (y/n): ", end="")
if input().lower() == 'y':
simulate_chat()
if __name__ == "__main__":
main()

llama_chat.py Normal file

@@ -0,0 +1,196 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Llama 內網 API 對話程式
支援多個端點和模型選擇
"""
from openai import OpenAI
import sys
import re
# API 配置
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
# 可用端點 (前 3 個已測試可用)
ENDPOINTS = [
"http://192.168.0.6:21180/v1",
"http://192.168.0.6:21181/v1",
"http://192.168.0.6:21182/v1",
"http://192.168.0.6:21183/v1"
]
# 模型列表
MODELS = [
"gpt-oss-120b",
"deepseek-r1-671b",
"qwen3-embedding-8b"
]
def clean_response(text):
"""清理 AI 回應中的特殊標記"""
# 移除思考標記
if "<think>" in text:
text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
# 移除 channel 標記
if "<|channel|>" in text:
parts = text.split("<|message|>")
if len(parts) > 1:
text = parts[-1]
# 移除結束標記
text = text.replace("<|end|>", "").replace("<|start|>", "")
# 清理多餘空白
text = text.strip()
return text
def test_endpoint(endpoint):
"""測試端點是否可用"""
try:
client = OpenAI(api_key=API_KEY, base_url=endpoint)
response = client.chat.completions.create(
model="gpt-oss-120b",
messages=[{"role": "user", "content": "Hi"}],
max_tokens=10,
timeout=5
)
return True
except:
return False
def chat_session(endpoint, model):
"""對話主程式"""
print("\n" + "="*60)
print("Llama AI 對話系統")
print("="*60)
print(f"端點: {endpoint}")
print(f"模型: {model}")
print("\n指令:")
print(" exit/quit - 結束對話")
print(" clear - 清空對話歷史")
print(" model - 切換模型")
print("-"*60)
client = OpenAI(api_key=API_KEY, base_url=endpoint)
messages = []
while True:
try:
user_input = input("\n你: ").strip()
if not user_input:
continue
if user_input.lower() in ['exit', 'quit']:
print("再見!")
break
if user_input.lower() == 'clear':
messages = []
print("[系統] 對話歷史已清空")
continue
if user_input.lower() == 'model':
print("\n可用模型:")
for i, m in enumerate(MODELS, 1):
print(f" {i}. {m}")
choice = input("選擇 (1-3): ").strip()
if choice in ['1', '2', '3']:
model = MODELS[int(choice)-1]
print(f"[系統] 已切換到 {model}")
continue
messages.append({"role": "user", "content": user_input})
print("\nAI 思考中...", end="", flush=True)
try:
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=0.7,
max_tokens=1000
)
ai_response = response.choices[0].message.content
ai_response = clean_response(ai_response)
print("\r" + " "*20 + "\r", end="") # 清除 "思考中..."
print(f"AI: {ai_response}")
messages.append({"role": "assistant", "content": ai_response})
except UnicodeEncodeError:
print("\r[錯誤] 編碼問題,請使用英文對話")
messages.pop() # 移除最後的用戶訊息
except Exception as e:
print(f"\r[錯誤] {str(e)[:100]}")
messages.pop() # 移除最後的用戶訊息
except KeyboardInterrupt:
print("\n\n[中斷] 使用 exit 命令正常退出")
continue
except EOFError:
print("\n再見!")
break
def main():
print("="*60)
print("Llama 內網 API 對話程式")
print("="*60)
# 測試端點
print("\n正在檢查可用端點...")
available = []
for i, endpoint in enumerate(ENDPOINTS[:3], 1): # 只測試前3個
print(f" 測試 {endpoint}...", end="", flush=True)
if test_endpoint(endpoint):
print(" [OK]")
available.append(endpoint)
else:
print(" [失敗]")
if not available:
print("\n[錯誤] 沒有可用的端點")
sys.exit(1)
# 選擇端點
if len(available) == 1:
selected_endpoint = available[0]
print(f"\n使用端點: {selected_endpoint}")
else:
print(f"\n找到 {len(available)} 個可用端點:")
for i, ep in enumerate(available, 1):
print(f" {i}. {ep}")
print("\n選擇端點 (預設: 1): ", end="")
choice = input().strip()
if choice and choice.isdigit() and 1 <= int(choice) <= len(available):
selected_endpoint = available[int(choice)-1]
else:
selected_endpoint = available[0]
# 選擇模型
print("\n可用模型:")
for i, model in enumerate(MODELS, 1):
print(f" {i}. {model}")
print("\n選擇模型 (預設: 1): ", end="")
choice = input().strip()
if choice in ['1', '2', '3']:
selected_model = MODELS[int(choice)-1]
else:
selected_model = MODELS[0]
# 開始對話
chat_session(selected_endpoint, selected_model)
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("\n\n程式已退出")
except Exception as e:
print(f"\n[錯誤] {e}")
sys.exit(1)

llama_full_api.py Normal file

@@ -0,0 +1,293 @@
#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""
Llama API 完整對話程式
支援內網和外網端點
"""
from openai import OpenAI
import requests
import sys
import re
from datetime import datetime
# API 金鑰
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
# API 端點配置
ENDPOINTS = {
"內網": [
{
"name": "內網端點 1 (21180)",
"url": "http://192.168.0.6:21180/v1",
"models": ["gpt-oss-120b", "deepseek-r1-671b", "qwen3-embedding-8b"]
},
{
"name": "內網端點 2 (21181)",
"url": "http://192.168.0.6:21181/v1",
"models": ["gpt-oss-120b", "deepseek-r1-671b", "qwen3-embedding-8b"]
},
{
"name": "內網端點 3 (21182)",
"url": "http://192.168.0.6:21182/v1",
"models": ["gpt-oss-120b", "deepseek-r1-671b", "qwen3-embedding-8b"]
}
],
"外網": [
{
"name": "外網 GPT-OSS-120B",
"url": "https://llama.theaken.com/v1/gpt-oss-120b",
"models": ["gpt-oss-120b"]
},
{
"name": "外網 DeepSeek-R1-671B",
"url": "https://llama.theaken.com/v1/deepseek-r1-671b",
"models": ["deepseek-r1-671b"]
},
{
"name": "外網通用端點",
"url": "https://llama.theaken.com/v1",
"models": ["gpt-oss-120b", "deepseek-r1-671b", "qwen3-embedding-8b"]
}
]
}
def clean_response(text):
"""清理 AI 回應中的特殊標記"""
# 移除思考標記
if "<think>" in text:
text = re.sub(r'<think>.*?</think>', '', text, flags=re.DOTALL)
# 移除 channel 標記
if "<|channel|>" in text:
parts = text.split("<|message|>")
if len(parts) > 1:
text = parts[-1]
# 移除結束標記
text = text.replace("<|end|>", "").replace("<|start|>", "")
# 清理多餘空白
text = text.strip()
return text
def test_endpoint(endpoint_info):
"""測試端點是否可用"""
url = endpoint_info["url"]
model = endpoint_info["models"][0] # 使用第一個模型測試
try:
# 對於特定模型的 URL需要特殊處理
if "/gpt-oss-120b" in url or "/deepseek-r1-671b" in url:
# 這些可能是特定模型的端點
base_url = url.rsplit("/", 1)[0] # 移除模型名稱部分
else:
base_url = url
client = OpenAI(api_key=API_KEY, base_url=base_url)
response = client.chat.completions.create(
model=model,
messages=[{"role": "user", "content": "test"}],
max_tokens=5,
timeout=8
)
return True
except Exception as e:
# 也嘗試使用 requests 直接測試
try:
headers = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json"
}
test_url = f"{url}/chat/completions" if not url.endswith("/chat/completions") else url
data = {
"model": model,
"messages": [{"role": "user", "content": "test"}],
"max_tokens": 5
}
response = requests.post(test_url, headers=headers, json=data, timeout=8)
return response.status_code == 200
except:
return False
def test_all_endpoints():
"""測試所有端點"""
print("\n" + "="*60)
print("測試 API 端點連接")
print("="*60)
available_endpoints = []
# 測試內網端點
print("\n[內網端點測試]")
for endpoint in ENDPOINTS["內網"]:
print(f" 測試 {endpoint['name']}...", end="", flush=True)
if test_endpoint(endpoint):
print(" [OK]")
available_endpoints.append(("內網", endpoint))
else:
print(" [FAIL]")
# 測試外網端點
print("\n[外網端點測試]")
for endpoint in ENDPOINTS["外網"]:
print(f" 測試 {endpoint['name']}...", end="", flush=True)
if test_endpoint(endpoint):
print(" [OK]")
available_endpoints.append(("外網", endpoint))
else:
print(" [FAIL]")
return available_endpoints
def chat_session(endpoint_info):
"""對話主程式"""
print("\n" + "="*60)
print("Llama AI 對話系統")
print("="*60)
print(f"端點: {endpoint_info['name']}")
print(f"URL: {endpoint_info['url']}")
print(f"可用模型: {', '.join(endpoint_info['models'])}")
print("\n指令:")
print(" exit/quit - 結束對話")
print(" clear - 清空對話歷史")
print(" model - 切換模型")
print("-"*60)
# 處理 URL
url = endpoint_info["url"]
if "/gpt-oss-120b" in url or "/deepseek-r1-671b" in url:
base_url = url.rsplit("/", 1)[0]
else:
base_url = url
client = OpenAI(api_key=API_KEY, base_url=base_url)
# 選擇初始模型
if len(endpoint_info['models']) == 1:
current_model = endpoint_info['models'][0]
else:
print("\n選擇模型:")
for i, model in enumerate(endpoint_info['models'], 1):
print(f" {i}. {model}")
choice = input("選擇 (預設: 1): ").strip()
if choice.isdigit() and 1 <= int(choice) <= len(endpoint_info['models']):
current_model = endpoint_info['models'][int(choice)-1]
else:
current_model = endpoint_info['models'][0]
print(f"\n使用模型: {current_model}")
messages = []
while True:
try:
user_input = input("\n你: ").strip()
if not user_input:
continue
if user_input.lower() in ['exit', 'quit']:
print("再見!")
break
if user_input.lower() == 'clear':
messages = []
print("[系統] 對話歷史已清空")
continue
if user_input.lower() == 'model':
if len(endpoint_info['models']) == 1:
print(f"[系統] 此端點只支援 {endpoint_info['models'][0]}")
else:
print("\n可用模型:")
for i, m in enumerate(endpoint_info['models'], 1):
print(f" {i}. {m}")
choice = input("選擇: ").strip()
if choice.isdigit() and 1 <= int(choice) <= len(endpoint_info['models']):
current_model = endpoint_info['models'][int(choice)-1]
print(f"[系統] 已切換到 {current_model}")
continue
messages.append({"role": "user", "content": user_input})
print("\nAI 思考中...", end="", flush=True)
try:
response = client.chat.completions.create(
model=current_model,
messages=messages,
temperature=0.7,
max_tokens=1000
)
ai_response = response.choices[0].message.content
ai_response = clean_response(ai_response)
print("\r" + " "*20 + "\r", end="")
print(f"AI: {ai_response}")
messages.append({"role": "assistant", "content": ai_response})
except Exception as e:
print(f"\r[錯誤] {str(e)[:100]}")
messages.pop()
except KeyboardInterrupt:
print("\n\n[中斷] 使用 exit 命令正常退出")
continue
except EOFError:
print("\n再見!")
break
def main():
print("="*60)
print("Llama API 完整對話程式")
print(f"時間: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*60)
# 測試所有端點
available = test_all_endpoints()
if not available:
print("\n[錯誤] 沒有可用的端點")
print("\n可能的原因:")
print("1. 網路連接問題")
print("2. API 服務離線")
print("3. 防火牆阻擋")
sys.exit(1)
# 顯示可用端點
print("\n" + "="*60)
print(f"找到 {len(available)} 個可用端點:")
print("="*60)
for i, (network_type, endpoint) in enumerate(available, 1):
print(f"{i}. [{network_type}] {endpoint['name']}")
print(f" URL: {endpoint['url']}")
print(f" 模型: {', '.join(endpoint['models'])}")
# 選擇端點
print("\n選擇端點 (預設: 1): ", end="")
choice = input().strip()
if choice.isdigit() and 1 <= int(choice) <= len(available):
selected = available[int(choice)-1][1]
else:
selected = available[0][1]
# 開始對話
chat_session(selected)
if __name__ == "__main__":
try:
main()
except KeyboardInterrupt:
print("\n\n程式已退出")
except Exception as e:
print(f"\n[錯誤] {e}")
import traceback
traceback.print_exc()
sys.exit(1)

llama_test.py Normal file

@@ -0,0 +1,99 @@
from openai import OpenAI
import sys
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "https://llama.theaken.com/v1"
AVAILABLE_MODELS = [
"gpt-oss-120b",
"deepseek-r1-671b",
"qwen3-embedding-8b"
]
def chat_with_llama(model_name="gpt-oss-120b"):
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL
    )
    print(f"\nUsing model: {model_name}")
    print("-" * 50)
    print("Type 'exit' or 'quit' to end the conversation")
    print("-" * 50)
    messages = []
    while True:
        user_input = input("\nYou: ").strip()
        if user_input.lower() in ['exit', 'quit']:
            print("Conversation ended")
            break
        if not user_input:
            continue
        messages.append({"role": "user", "content": user_input})
        try:
            response = client.chat.completions.create(
                model=model_name,
                messages=messages,
                temperature=0.7,
                max_tokens=2000
            )
            assistant_reply = response.choices[0].message.content
            print(f"\nAI: {assistant_reply}")
            messages.append({"role": "assistant", "content": assistant_reply})
        except Exception as e:
            print(f"\nError: {str(e)}")
            print("Please check your network connection and API settings")

def test_connection():
    print("Testing connection to the Llama API...")
    client = OpenAI(
        api_key=API_KEY,
        base_url=BASE_URL
    )
    try:
        response = client.chat.completions.create(
            model="gpt-oss-120b",
            messages=[{"role": "user", "content": "Hello, this is a test message."}],
            max_tokens=50
        )
        print("[OK] Connected successfully!")
        print(f"Test response: {response.choices[0].message.content}")
        return True
    except Exception as e:
        print(f"[ERROR] Connection failed: {str(e)[:200]}")
        return False

def main():
    print("=" * 50)
    print("Llama Model Chat Test Program")
    print("=" * 50)
    print("\nAvailable models:")
    for i, model in enumerate(AVAILABLE_MODELS, 1):
        print(f"  {i}. {model}")
    if test_connection():
        print("\nSelect a model to use (enter 1-3, default: 1):")
        choice = input().strip()
        if choice == "2":
            model = AVAILABLE_MODELS[1]
        elif choice == "3":
            model = AVAILABLE_MODELS[2]
        else:
            model = AVAILABLE_MODELS[0]
        chat_with_llama(model)

if __name__ == "__main__":
    main()

local_api_test.py Normal file

@@ -0,0 +1,243 @@
"""
內網 Llama API 測試程式
使用 OpenAI 相容格式連接到本地 API 端點
"""
from openai import OpenAI
import requests
import json
from datetime import datetime
# API 配置
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
# 內網端點列表
LOCAL_ENDPOINTS = [
"http://192.168.0.6:21180/v1",
"http://192.168.0.6:21181/v1",
"http://192.168.0.6:21182/v1",
"http://192.168.0.6:21183/v1"
]
# 可用模型
MODELS = [
"gpt-oss-120b",
"deepseek-r1-671b",
"qwen3-embedding-8b"
]
def test_endpoint_with_requests(endpoint, model="gpt-oss-120b"):
"""使用 requests 測試端點"""
print(f"\n[使用 requests 測試]")
print(f"端點: {endpoint}")
print(f"模型: {model}")
headers = {
"Authorization": f"Bearer {API_KEY}",
"Content-Type": "application/json"
}
data = {
"model": model,
"messages": [
{"role": "user", "content": "Say 'Hello, I am working!' if you can see this."}
],
"temperature": 0.7,
"max_tokens": 50
}
try:
response = requests.post(
f"{endpoint}/chat/completions",
headers=headers,
json=data,
timeout=10
)
print(f"HTTP 狀態碼: {response.status_code}")
if response.status_code == 200:
result = response.json()
if 'choices' in result:
content = result['choices'][0]['message']['content']
print(f"[SUCCESS] AI 回應: {content}")
return True
else:
print("[ERROR] 回應格式不正確")
else:
print(f"[ERROR] HTTP {response.status_code}")
if response.status_code != 502: # 避免顯示 HTML 錯誤頁
print(f"詳情: {response.text[:200]}")
except requests.exceptions.ConnectTimeout:
print("[TIMEOUT] 連接超時")
except requests.exceptions.ConnectionError:
print("[CONNECTION ERROR] 無法連接到端點")
except Exception as e:
print(f"[ERROR] {str(e)[:100]}")
return False
def test_endpoint_with_openai(endpoint, model="gpt-oss-120b"):
"""使用 OpenAI SDK 測試端點"""
print(f"\n[使用 OpenAI SDK 測試]")
print(f"端點: {endpoint}")
print(f"模型: {model}")
try:
client = OpenAI(
api_key=API_KEY,
base_url=endpoint,
timeout=10.0
)
response = client.chat.completions.create(
model=model,
messages=[
{"role": "user", "content": "Hello, please respond with a simple greeting."}
],
temperature=0.7,
max_tokens=50
)
content = response.choices[0].message.content
print(f"[SUCCESS] AI 回應: {content}")
return True, client
except Exception as e:
error_str = str(e)
if "Connection error" in error_str:
print("[CONNECTION ERROR] 無法連接到端點")
elif "timeout" in error_str.lower():
print("[TIMEOUT] 請求超時")
elif "502" in error_str:
print("[ERROR] 502 Bad Gateway")
else:
print(f"[ERROR] {error_str[:100]}")
return False, None
def find_working_endpoint():
"""尋找可用的端點"""
print("="*60)
print(f"內網 API 端點測試 - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("="*60)
working_endpoints = []
for endpoint in LOCAL_ENDPOINTS:
print(f"\n測試端點: {endpoint}")
print("-"*40)
# 先用 requests 快速測試
if test_endpoint_with_requests(endpoint):
working_endpoints.append(endpoint)
print(f"[OK] 端點 {endpoint} 可用!")
else:
# 再用 OpenAI SDK 測試
success, _ = test_endpoint_with_openai(endpoint)
if success:
working_endpoints.append(endpoint)
print(f"[OK] 端點 {endpoint} 可用!")
return working_endpoints
def interactive_chat(endpoint, model="gpt-oss-120b"):
"""互動式對話"""
print(f"\n連接到: {endpoint}")
print(f"使用模型: {model}")
print("="*60)
print("開始對話 (輸入 'exit' 結束)")
print("="*60)
client = OpenAI(
api_key=API_KEY,
base_url=endpoint
)
messages = []
while True:
user_input = input("\n你: ").strip()
if user_input.lower() in ['exit', 'quit']:
print("對話結束")
break
if not user_input:
continue
messages.append({"role": "user", "content": user_input})
try:
print("\nAI 思考中...")
response = client.chat.completions.create(
model=model,
messages=messages,
temperature=0.7,
max_tokens=1000
)
ai_response = response.choices[0].message.content
print(f"\nAI: {ai_response}")
messages.append({"role": "assistant", "content": ai_response})
except Exception as e:
print(f"\n[ERROR] {str(e)[:100]}")
def main():
# 尋找可用端點
working_endpoints = find_working_endpoint()
print("\n" + "="*60)
print("測試結果總結")
print("="*60)
if working_endpoints:
print(f"\n找到 {len(working_endpoints)} 個可用端點:")
for i, endpoint in enumerate(working_endpoints, 1):
print(f" {i}. {endpoint}")
# 選擇端點
if len(working_endpoints) == 1:
selected_endpoint = working_endpoints[0]
print(f"\n自動選擇唯一可用端點: {selected_endpoint}")
else:
print(f"\n請選擇要使用的端點 (1-{len(working_endpoints)}):")
choice = input().strip()
try:
idx = int(choice) - 1
if 0 <= idx < len(working_endpoints):
selected_endpoint = working_endpoints[idx]
else:
selected_endpoint = working_endpoints[0]
except:
selected_endpoint = working_endpoints[0]
# 選擇模型
print("\n可用模型:")
for i, model in enumerate(MODELS, 1):
print(f" {i}. {model}")
print("\n請選擇模型 (1-3, 預設: 1):")
choice = input().strip()
if choice == "2":
selected_model = MODELS[1]
elif choice == "3":
selected_model = MODELS[2]
else:
selected_model = MODELS[0]
# 開始對話
interactive_chat(selected_endpoint, selected_model)
else:
print("\n[ERROR] 沒有找到可用的端點")
print("\n可能的原因:")
print("1. 內網 API 服務未啟動")
print("2. 防火牆阻擋了連接")
print("3. IP 地址或端口設定錯誤")
print("4. 不在同一個網路環境")
if __name__ == "__main__":
main()

quick_test.py Normal file

@@ -0,0 +1,54 @@
"""
快速測試內網 Llama API
"""
from openai import OpenAI
# API 設定
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "http://192.168.0.6:21180/v1" # 使用第一個可用端點
def quick_test():
print("連接到內網 API...")
print(f"端點: {BASE_URL}")
print("-" * 50)
client = OpenAI(
api_key=API_KEY,
base_url=BASE_URL
)
# 測試對話
test_messages = [
"你好,請自我介紹",
"1 + 1 等於多少?",
"今天天氣如何?"
]
for msg in test_messages:
print(f"\n問: {msg}")
try:
response = client.chat.completions.create(
model="gpt-oss-120b",
messages=[
{"role": "user", "content": msg}
],
temperature=0.7,
max_tokens=200
)
answer = response.choices[0].message.content
# 清理可能的思考標記
if "<think>" in answer:
answer = answer.split("</think>")[-1].strip()
if "<|channel|>" in answer:
answer = answer.split("<|message|>")[-1].strip()
print(f"答: {answer}")
except Exception as e:
print(f"錯誤: {str(e)[:100]}")
if __name__ == "__main__":
quick_test()

requirements.txt Normal file

@@ -0,0 +1 @@
openai>=1.0.0

simple_llama_test.py Normal file

@@ -0,0 +1,46 @@
import requests
import json
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "https://llama.theaken.com/v1/chat/completions"
def test_api():
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    data = {
        "model": "gpt-oss-120b",
        "messages": [
            {"role": "user", "content": "Hello, can you respond?"}
        ],
        "temperature": 0.7,
        "max_tokens": 100
    }
    print("Testing the API connection...")
    print(f"URL: {BASE_URL}")
    print("Model: gpt-oss-120b")
    print("-" * 50)
    try:
        response = requests.post(BASE_URL, headers=headers, json=data, timeout=30)
        if response.status_code == 200:
            result = response.json()
            print("[Success] API response:")
            print(result['choices'][0]['message']['content'])
        else:
            print(f"[Error] HTTP {response.status_code}")
            print(f"Response body: {response.text[:500]}")
    except requests.exceptions.Timeout:
        print("[Error] Request timed out")
    except requests.exceptions.ConnectionError:
        print("[Error] Could not connect to the server")
    except Exception as e:
        print(f"[Error] {str(e)}")

if __name__ == "__main__":
    test_api()

test_all_models.py Normal file

@@ -0,0 +1,143 @@
import requests
import json
import time
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "https://llama.theaken.com/v1"
MODELS = [
"gpt-oss-120b",
"deepseek-r1-671b",
"qwen3-embedding-8b"
]
def test_model(model_name):
    """Test a single model"""
    print(f"\n[Testing model: {model_name}]")
    print("-" * 40)
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    # Test the chat completions endpoint
    chat_url = f"{BASE_URL}/chat/completions"
    data = {
        "model": model_name,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Say 'Hello, I am working!' if you can see this message."}
        ],
        "temperature": 0.5,
        "max_tokens": 50
    }
    try:
        print(f"Connecting to: {chat_url}")
        response = requests.post(chat_url, headers=headers, json=data, timeout=30)
        print(f"HTTP status code: {response.status_code}")
        if response.status_code == 200:
            result = response.json()
            if 'choices' in result and len(result['choices']) > 0:
                content = result['choices'][0]['message']['content']
                print(f"[SUCCESS] AI response: {content}")
                return True
            else:
                print("[ERROR] Unexpected response format")
                print(f"Response body: {json.dumps(result, indent=2)}")
        else:
            print("[ERROR] Error response")
            # Check whether this is an HTML error page
            if response.text.startswith('<!DOCTYPE'):
                print("Received an HTML error page (probably a 502 Bad Gateway)")
            else:
                print(f"Response body: {response.text[:300]}")
    except requests.exceptions.Timeout:
        print("[TIMEOUT] Request timed out (30 seconds)")
    except requests.exceptions.ConnectionError as e:
        print(f"[CONNECTION ERROR]: {str(e)[:100]}")
    except Exception as e:
        print(f"[UNEXPECTED ERROR]: {str(e)[:100]}")
    return False

def test_api_endpoints():
    """Test the availability of different API endpoints"""
    print("\n[Testing API endpoint availability]")
    print("=" * 50)
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    # Try various likely endpoints
    endpoints = [
        f"{BASE_URL}/models",
        f"{BASE_URL}/chat/completions",
        BASE_URL
    ]
    for endpoint in endpoints:
        try:
            print(f"\nTesting endpoint: {endpoint}")
            response = requests.get(endpoint, headers=headers, timeout=10)
            print(f"  Status code: {response.status_code}")
            if response.status_code == 200:
                print("  [OK] Endpoint reachable")
                # If the response is JSON, show some details
                try:
                    data = response.json()
                    print("  Response type: JSON")
                    if 'data' in data:
                        print(f"  Contains {len(data['data'])} items")
                except:
                    print(f"  Response type: {response.headers.get('content-type', 'unknown')}")
            elif response.status_code == 405:
                print("  [OK] Endpoint exists (but does not allow GET)")
            elif response.status_code == 502:
                print("  [ERROR] 502 Bad Gateway - server temporarily unavailable")
            else:
                print("  [ERROR] Not reachable")
        except Exception as e:
            print(f"  [ERROR]: {str(e)[:50]}")

def main():
    print("=" * 50)
    print("Llama API Full Test Program")
    print("=" * 50)
    print(f"API base URL: {BASE_URL}")
    print(f"API key: {API_KEY[:10]}...{API_KEY[-5:]}")
    # First test endpoint availability
    test_api_endpoints()
    print("\n" + "=" * 50)
    print("Testing each model")
    print("=" * 50)
    success_count = 0
    for model in MODELS:
        if test_model(model):
            success_count += 1
        time.sleep(1)  # avoid sending requests too fast
    print("\n" + "=" * 50)
    print(f"Test result: {success_count}/{len(MODELS)} models connected successfully")
    if success_count == 0:
        print("\nPossible problems:")
        print("1. The API server is temporarily offline (502 error)")
        print("2. The API key may be wrong")
        print("3. Network connectivity problems")
        print("4. Firewall or proxy settings")
        print("\nTry again later, or contact the API provider to confirm service status.")

if __name__ == "__main__":
    main()

test_with_timeout.py Normal file

@@ -0,0 +1,111 @@
import requests
import json
from datetime import datetime
# API configuration
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "https://llama.theaken.com/v1"

def test_endpoints():
    """Test different API endpoints and models"""
    print("="*60)
    print(f"Llama API test - {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
    print("="*60)
    headers = {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json"
    }
    # Test configuration
    tests = [
        {
            "name": "GPT-OSS-120B",
            "model": "gpt-oss-120b",
            "prompt": "Say hello in one word"
        },
        {
            "name": "DeepSeek-R1-671B",
            "model": "deepseek-r1-671b",
            "prompt": "Say hello in one word"
        },
        {
            "name": "Qwen3-Embedding-8B",
            "model": "qwen3-embedding-8b",
            "prompt": "Say hello in one word"
        }
    ]
    success_count = 0
    for test in tests:
        print(f"\n[Testing {test['name']}]")
        print("-"*40)
        data = {
            "model": test["model"],
            "messages": [
                {"role": "user", "content": test["prompt"]}
            ],
            "temperature": 0.5,
            "max_tokens": 20
        }
        try:
            # Use a shorter timeout
            response = requests.post(
                f"{BASE_URL}/chat/completions",
                headers=headers,
                json=data,
                timeout=15
            )
            print(f"HTTP status: {response.status_code}")
            if response.status_code == 200:
                result = response.json()
                if 'choices' in result:
                    content = result['choices'][0]['message']['content']
                    print(f"[SUCCESS] Got a response: {content}")
                    success_count += 1
                else:
                    print("[ERROR] Unexpected response format")
            elif response.status_code == 502:
                print("[ERROR] 502 Bad Gateway - the server cannot respond")
            elif response.status_code == 401:
                print("[ERROR] 401 Unauthorized - the API key may be wrong")
            elif response.status_code == 404:
                print("[ERROR] 404 Not Found - the model or endpoint does not exist")
            else:
                print(f"[ERROR] Error {response.status_code}")
                if not response.text.startswith('<!DOCTYPE'):
                    print(f"Details: {response.text[:200]}")
        except requests.exceptions.Timeout:
            print("[TIMEOUT] Request timed out (15 seconds)")
        except requests.exceptions.ConnectionError as e:
            print("[CONNECTION ERROR] Could not connect to the server")
        except Exception as e:
            print(f"[UNKNOWN ERROR]: {str(e)[:100]}")
    # Summary
    print("\n" + "="*60)
    print(f"Test result: {success_count}/{len(tests)} succeeded")
    if success_count == 0:
        print("\nDiagnostics:")
        print("- Network connectivity: OK (ping succeeds)")
        print("- API endpoint: https://llama.theaken.com/v1")
        print("- Error type: 502 Bad Gateway")
        print("- Likely cause: the backend API service is temporarily offline")
        print("\nSuggested actions:")
        print("1. Try again later (10-30 minutes is suggested)")
        print("2. Contact the API administrator to confirm service status")
        print("3. Check for maintenance announcements")
    else:
        print("\n[OK] The API service is up and running!")
        print(f"[OK] Number of usable models: {success_count}")

if __name__ == "__main__":
    test_endpoints()

使用說明.txt Normal file

@@ -0,0 +1,33 @@
===========================================
Llama Model Chat Test Program - User Guide
===========================================
Installation:
---------
1. Make sure Python 3.7 or later is installed
2. Install the dependencies:
   pip install -r requirements.txt
Running the program:
---------
python llama_test.py
Features:
---------
1. On startup, the program automatically tests the API connection
2. Select the model to use (1-3)
3. Start chatting with the AI
4. Type 'exit' or 'quit' to end the conversation
Available models:
---------
1. gpt-oss-120b (default)
2. deepseek-r1-671b
3. qwen3-embedding-8b
Notes:
---------
- Make sure your network connection is working
- The API key is built into the program
- If you hit errors, check your network connection or contact the administrator

操作指南.md Normal file

@@ -0,0 +1,181 @@
# Llama API Connection Guide
## 1. API Connection Details
### API Key
```
paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo=
```
### Available Endpoints
#### Internal Endpoints (tested and working)
| Endpoint | URL | Status | Supported Models |
|---------|-----|------|---------|
| Internal 1 | http://192.168.0.6:21180/v1 | ✅ Working | gpt-oss-120b, deepseek-r1-671b, qwen3-embedding-8b |
| Internal 2 | http://192.168.0.6:21181/v1 | ✅ Working | gpt-oss-120b, deepseek-r1-671b, qwen3-embedding-8b |
| Internal 3 | http://192.168.0.6:21182/v1 | ✅ Working | gpt-oss-120b, deepseek-r1-671b, qwen3-embedding-8b |
| Internal 4 | http://192.168.0.6:21183/v1 | ❌ Error | 500 Internal Server Error |
#### External Endpoints (pending testing)
| Endpoint | URL | Status | Supported Models |
|---------|-----|------|---------|
| GPT-OSS dedicated | https://llama.theaken.com/v1/gpt-oss-120b | Pending | gpt-oss-120b |
| DeepSeek dedicated | https://llama.theaken.com/v1/deepseek-r1-671b | Pending | deepseek-r1-671b |
| General | https://llama.theaken.com/v1 | Pending | all models |
## 2. Quick Start
### 1. Install dependencies
```bash
pip install openai
```
### 2. Test the connection (Python)
#### Internal-network example
```python
from openai import OpenAI
# API settings
API_KEY = "paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo="
BASE_URL = "http://192.168.0.6:21180/v1"  # internal endpoint 1
# Create the client
client = OpenAI(
    api_key=API_KEY,
    base_url=BASE_URL
)
# Send a request
response = client.chat.completions.create(
    model="gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Hello, please introduce yourself"}
    ],
    temperature=0.7,
    max_tokens=200
)
# Show the response
print(response.choices[0].message.content)
```
## 3. Using the Bundled Programs
### Program list
1. **llama_full_api.py** - full chat client (internal and external endpoints)
2. **llama_chat.py** - internal-network-only chat client
3. **local_api_test.py** - endpoint testing tool
4. **quick_test.py** - quick test script
### Running the chat clients
```bash
# Run the full version (tests all endpoints automatically)
python llama_full_api.py
# Run the internal-network version
python llama_chat.py
# Quick test
python quick_test.py
```
## 4. Using the Chat Clients
### Basic workflow
1. On startup, the program automatically tests the available endpoints
2. Select the endpoint to use (enter a number)
3. Select the model to use
4. Start chatting
### In-chat commands
- `exit` or `quit` - end the conversation
- `clear` - clear the conversation history
- `model` - switch models
## 5. Troubleshooting
### Issue 1: 502 Bad Gateway
**Cause**: The external API server is offline
**Fix**: Use an internal endpoint
### Issue 2: Connection Error
**Cause**: Not on the internal network, or wrong IP
**Fix**:
1. Confirm you are on the same network
2. Check firewall settings
3. Test connectivity with `ping 192.168.0.6`
### Issue 3: Encoding errors
**Cause**: Windows terminal encoding problems
**Fix**: Chat in English or change the terminal encoding
### Issue 4: Responses contain special markers
**Details**: e.g. `<think>`, `<|channel|>`
**Handling**: The programs filter these markers automatically
## 6. Response Cleanup
Some model responses may include thinking-process markers, which the programs clean automatically:
- `<think>...</think>` - thinking process
- `<|channel|>...<|message|>` - channel markers
- `<|end|>`, `<|start|>` - end/start markers
## 7. Test Summary
### Verified
✅ Internal endpoints 1-3 all work
✅ Standard OpenAI SDK format supported
✅ Chat works normally
### To confirm
- External endpoints are waiting for the server to come back online
- The DeepSeek and Qwen models need further testing
## 8. Technical Details
### Using the OpenAI SDK
```python
from openai import OpenAI
client = OpenAI(
    api_key="your-key",
    base_url="API-endpoint-URL"
)
```
### Using the requests library
```python
import requests
headers = {
    "Authorization": "Bearer your-key",
    "Content-Type": "application/json"
}
data = {
    "model": "gpt-oss-120b",
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.7,
    "max_tokens": 200
}
response = requests.post(
    "API-endpoint-URL/chat/completions",
    headers=headers,
    json=data
)
```
## 9. Recommended Usage
1. **Development and testing**: use the internal endpoints (fast and stable)
2. **Production**: configure multiple endpoints with automatic failover
3. **Chat applications**: use llama_full_api.py
4. **API integration**: see quick_test.py for a reference implementation
---
Last updated: 2025-09-19
Test environment: Windows / Python 3.13

連線參數.txt Normal file

@@ -0,0 +1,14 @@
These are the connection details for chatting with the Llama models.
External connections:
https://llama.theaken.com/v1/gpt-oss-120b/
https://llama.theaken.com/v1/deepseek-r1-671b/
https://llama.theaken.com/v1/qwen3-embedding-8b/
External model paths:
1. /gpt-oss-120b/
2. /deepseek-r1-671b/
3. /qwen3-embedding-8b/
Key: paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo=
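A minimal connection sketch (Python, assuming the OpenAI-compatible
format used elsewhere in this repo; see README.md):

    from openai import OpenAI
    client = OpenAI(
        api_key="paVrIT+XU1NhwCAOb0X4aYi75QKogK5YNMGvQF1dCyo=",
        base_url="https://llama.theaken.com/v1",
    )
    reply = client.chat.completions.create(
        model="gpt-oss-120b",
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=50,
    )
    print(reply.choices[0].message.content)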