# Server & Client Usage Guide

## ✅ Server is Working!

The WebSocket server is running on port **8766** with GPU acceleration.

## Quick Start

### 1. Start the Server

```bash
./run.sh server/ws_server.py
```

Server will start on: `ws://localhost:8766`

### 2. Test with Simple Client

```bash
./run.sh test_client.py test.wav
```

### 3. Use Microphone Client

```bash
# List audio devices first
./run.sh client/mic_stream.py --list-devices

# Start streaming from microphone
./run.sh client/mic_stream.py

# Or specify device
./run.sh client/mic_stream.py --device 0
```

## Available Clients

### 1. **test_client.py** - Simple File Testing
```bash
./run.sh test_client.py your_audio.wav
```
- Sends audio file to server
- Shows real-time transcription
- Good for testing

### 2. **client/mic_stream.py** - Live Microphone
```bash
./run.sh client/mic_stream.py
```
- Captures from microphone
- Streams to server
- Real-time transcription display

### 3. **Custom Client** - Your Own Script

```python
import asyncio
import json

import numpy as np
import websockets

# Placeholder input: one second of silence.
# Replace with your own samples (16 kHz, mono, values in the int16 range).
your_audio_data = np.zeros(16000)

async def connect():
    async with websockets.connect("ws://localhost:8766") as ws:
        # Send audio as int16 PCM bytes
        audio_bytes = your_audio_data.astype('int16').tobytes()
        await ws.send(audio_bytes)

        # Receive messages until a transcription arrives
        # (other message types, such as "info", are skipped)
        async for response in ws:
            result = json.loads(response)
            if result.get('type') == 'transcript':
                print(result['text'])
                break

asyncio.run(connect())
```

## Server Options

```bash
# Custom host/port
./run.sh server/ws_server.py --host 0.0.0.0 --port 9000

# Enable VAD (for long audio)
./run.sh server/ws_server.py --use-vad

# Different model
./run.sh server/ws_server.py --model nemo-parakeet-tdt-0.6b-v3

# Change sample rate
./run.sh server/ws_server.py --sample-rate 16000
```

## Client Options

### Microphone Client
```bash
# List devices
./run.sh client/mic_stream.py --list-devices

# Use specific device
./run.sh client/mic_stream.py --device 2

# Custom server URL
./run.sh client/mic_stream.py --url ws://192.168.1.100:8766

# Adjust chunk duration (lower = lower latency)
./run.sh client/mic_stream.py --chunk-duration 0.05
```
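
To get a feel for what `--chunk-duration` means on the wire, the sketch below converts a duration into a sample count and a byte count. It assumes the 16000 Hz, mono, int16 format used throughout this guide. The helper name `chunk_size` is hypothetical and is not taken from `mic_stream.py`.

```python
SAMPLE_RATE = 16000   # Hz, the rate this guide uses
BYTES_PER_SAMPLE = 2  # int16


def chunk_size(duration_s: float, sample_rate: int = SAMPLE_RATE) -> tuple[int, int]:
    """Return (samples, bytes) for one mono int16 chunk of the given duration."""
    samples = round(duration_s * sample_rate)
    return samples, samples * BYTES_PER_SAMPLE


for duration in (0.05, 0.1, 0.5):
    samples, size = chunk_size(duration)
    print(f"{duration:.2f} s -> {samples} samples, {size} bytes")
```

Smaller chunks reach the server sooner but mean more messages per second: at 0.05 s the client sends 20 messages per second.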

## Protocol

The server uses a simple JSON-based protocol:

### Server → Client Messages

```json
{
  "type": "info",
  "message": "Connected to ASR server",
  "sample_rate": 16000
}
```

```json
{
  "type": "transcript",
  "text": "transcribed text here",
  "is_final": false
}
```

```json
{
  "type": "error",
  "message": "error description"
}
```
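
A client usually branches on the `type` field. The following is a minimal sketch of such a dispatcher, assuming only the three message shapes shown above. The function name `handle_server_message` and the output format are illustrative and not part of the server.

```python
import json


def handle_server_message(raw: str) -> str:
    """Turn one JSON message from the server into a line of display text."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "info":
        return f"[info] {msg.get('message', '')} (sample rate: {msg.get('sample_rate')})"
    if kind == "transcript":
        # is_final distinguishes a finished result from an interim one
        marker = "final" if msg.get("is_final") else "partial"
        return f"[{marker}] {msg.get('text', '')}"
    if kind == "error":
        return f"[error] {msg.get('message', '')}"
    # Anything else is passed through so unexpected messages stay visible
    return f"[unknown] {raw}"


print(handle_server_message('{"type": "transcript", "text": "hello", "is_final": false}'))
```

Using `.get()` instead of indexing keeps the client from crashing if a message lacks a field.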

### Client → Server Messages

**Send audio:**
- Binary data (int16 PCM, little-endian)
- Sample rate: 16000 Hz
- Mono channel

**Send commands:**
```json
{"type": "final"}
{"type": "reset"}
```
- `final`: process the remaining buffer
- `reset`: reset the audio buffer
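
As a rough sketch of how the two kinds of client messages fit together, the code below splits a PCM buffer into binary chunks and ends with a `final` command. The names `command` and `outgoing_messages`, and the 3200-byte chunk size (0.1 s at 16 kHz int16), are assumptions made for illustration. It is also an assumption that the server expects commands as text frames. With the `websockets` library, passing `bytes` to `ws.send()` sends a binary frame and passing `str` sends a text frame.

```python
import json


def command(kind: str) -> str:
    """Serialize a control command such as "final" or "reset"."""
    if kind not in ("final", "reset"):
        raise ValueError(f"unknown command: {kind!r}")
    return json.dumps({"type": kind})


def outgoing_messages(pcm: bytes, chunk_bytes: int = 3200):
    """Yield binary audio chunks followed by a "final" command."""
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]
    yield command("final")


# 8000 bytes of silence stand in for real audio
messages = list(outgoing_messages(bytes(8000)))
print([len(m) if isinstance(m, bytes) else m for m in messages])
```

Each yielded item can be passed to `ws.send()` in order.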

## Audio Format Requirements

- **Format**: int16 PCM (bytes)
- **Sample Rate**: 16000 Hz
- **Channels**: Mono (1)
- **Byte Order**: Little-endian
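
For testing without a microphone or a file, a buffer in exactly this format can be generated with the standard library alone. The sketch below builds a sine tone and makes the byte order explicit. The function names `sine_pcm` and `duration_seconds` are hypothetical helpers, not part of this project.

```python
import math
import struct

SAMPLE_RATE = 16000  # Hz


def sine_pcm(freq_hz: float, duration_s: float, amplitude: float = 0.5) -> bytes:
    """Return a mono sine tone as little-endian int16 PCM bytes."""
    n = int(SAMPLE_RATE * duration_s)
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    # "<" forces little-endian regardless of the host CPU; "h" is a signed 16-bit integer
    return struct.pack(f"<{n}h", *samples)


def duration_seconds(pcm: bytes) -> float:
    """Duration implied by a buffer of mono int16 samples at SAMPLE_RATE."""
    return len(pcm) / 2 / SAMPLE_RATE


tone = sine_pcm(440.0, 1.0)
print(len(tone), duration_seconds(tone))
```

One second of audio in this format is always 32000 bytes (16000 samples × 2 bytes), which is a quick way to sanity-check a buffer before sending it.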

### Convert Audio in Python

```python
import numpy as np
import soundfile as sf

# Load audio as float32 samples in the range [-1.0, 1.0]
audio, sr = sf.read("file.wav", dtype='float32')

# Convert to mono (keeps the first channel)
if audio.ndim > 1:
    audio = audio[:, 0]

# Resample if needed (install resampy)
if sr != 16000:
    import resampy
    audio = resampy.resample(audio, sr, 16000)

# Convert to int16 for sending (clip first so out-of-range samples do not wrap around)
audio_int16 = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
audio_bytes = audio_int16.tobytes()
```

## Examples

### Browser Client (JavaScript)

```javascript
const ws = new WebSocket('ws://localhost:8766');

ws.onopen = () => {
  console.log('Connected!');

  // Capture from microphone
  navigator.mediaDevices.getUserMedia({ audio: true })
    .then(stream => {
      const audioContext = new AudioContext({ sampleRate: 16000 });
      const source = audioContext.createMediaStreamSource(stream);
      // ScriptProcessorNode is deprecated in favor of AudioWorklet, but it keeps this example short
      const processor = audioContext.createScriptProcessor(4096, 1, 1);

      processor.onaudioprocess = (e) => {
        const audioData = e.inputBuffer.getChannelData(0);
        // Convert float32 to int16
        const int16Data = new Int16Array(audioData.length);
        for (let i = 0; i < audioData.length; i++) {
          int16Data[i] = Math.max(-32768, Math.min(32767, audioData[i] * 32768));
        }
        ws.send(int16Data.buffer);
      };

      source.connect(processor);
      processor.connect(audioContext.destination);
    });
};

ws.onmessage = (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'transcript') {
    console.log('Transcription:', data.text);
  }
};
```

### Python Script Client

```python
#!/usr/bin/env python3
import asyncio
import json

import numpy as np
import sounddevice as sd
import websockets


async def stream_microphone():
    uri = "ws://localhost:8766"

    async with websockets.connect(uri) as ws:
        print("Connected!")
        loop = asyncio.get_running_loop()

        def audio_callback(indata, frames, time, status):
            # This runs on the audio thread, where asyncio.create_task() has no
            # running event loop, so hand the send over to the main loop instead.
            audio = (np.clip(indata[:, 0], -1.0, 1.0) * 32767).astype(np.int16)
            asyncio.run_coroutine_threadsafe(ws.send(audio.tobytes()), loop)

        # Start recording
        with sd.InputStream(callback=audio_callback,
                            channels=1,
                            samplerate=16000,
                            blocksize=1600):  # 0.1 second chunks

            while True:
                response = await ws.recv()
                data = json.loads(response)
                if data.get('type') == 'transcript':
                    print(f"→ {data['text']}")

asyncio.run(stream_microphone())
```

## Performance

With GPU (GTX 1660):
- **Latency**: <100ms per chunk
- **Throughput**: ~50-100x realtime
- **GPU Memory**: ~1.3GB
- **Languages**: 25+ (auto-detected)
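
To put the throughput figure in concrete terms, the short calculation below converts a realtime factor into processing time. It is plain arithmetic on the numbers above, not a benchmark, and the function name `processing_time` is made up for this example.

```python
def processing_time(audio_seconds: float, realtime_factor: float) -> float:
    """Seconds of compute needed for audio_seconds of audio at a given speed-up."""
    return audio_seconds / realtime_factor


for factor in (50, 100):
    seconds = processing_time(60, factor)
    print(f"{factor}x realtime: 60 s of audio in about {seconds:.1f} s")
```

At 50-100x realtime, one minute of audio takes roughly 0.6 to 1.2 seconds to transcribe.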

## Troubleshooting

### Server won't start
```bash
# Check if port is in use
lsof -i:8766

# Kill existing server
pkill -f ws_server.py

# Restart
./run.sh server/ws_server.py
```

### Client can't connect
```bash
# Check server is running
ps aux | grep ws_server

# Check firewall
sudo ufw allow 8766
```
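
If the shell tools are not available, for example on a remote client machine, a plain TCP check from Python shows whether anything is listening on the port. This is a minimal sketch using only the standard library. The function name `port_is_open` is hypothetical, and a successful connection only shows that the port is reachable, not that the process behind it is the ASR server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts
        return False


if __name__ == "__main__":
    print("reachable" if port_is_open("localhost", 8766) else "not reachable")
```

Replace `localhost` with the server's address when checking from another machine.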

### No transcription output
- Check audio format (must be int16 PCM, 16kHz, mono)
- Check chunk size (not too small)
- Check server logs for errors
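
The format check can be automated for WAV files with Python's built-in `wave` module. The sketch below reports how a file differs from 16 kHz, mono, 16-bit PCM. The function name `wav_format_problems` is hypothetical, and the demo writes a deliberately wrong file to a temporary directory so the example runs on its own. Note that `wave` reads only uncompressed PCM WAV files.

```python
import os
import tempfile
import wave


def wav_format_problems(path: str) -> list[str]:
    """List the ways a WAV file differs from 16 kHz, mono, 16-bit PCM."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 16000:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 16000")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1")
        if wav.getsampwidth() != 2:
            problems.append(f"{8 * wav.getsampwidth()}-bit samples, expected 16")
    return problems


# Demo: write 10 ms of 44.1 kHz stereo silence, then check it
demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
with wave.open(demo_path, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(44100)
    wav.writeframes(bytes(4 * 441))

print(wav_format_problems(demo_path))
```

An empty list means the file already matches the required format and can be sent without conversion.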

### GPU not working
- Server will fall back to CPU automatically
- Check `nvidia-smi` for GPU status
- Verify CUDA libraries are loaded (should be automatic with `./run.sh`)

## Next Steps

1. **Test the server**: `./run.sh test_client.py test.wav`
2. **Try microphone**: `./run.sh client/mic_stream.py`
3. **Build your own client** using the examples above

Happy transcribing! 🎤