Real-time transcription for Meta Ray-Ban smart glasses. Captures audio from your glasses (or phone mic) and transcribes it live using Deepgram's streaming API, with speaker diarization and partial results.
Supported platforms: iOS (iPhone) and Android (Pixel, Samsung, etc.)
Start streaming from your glasses, tap the Scribe button, and get a live transcript:
- Real-time speech-to-text via Deepgram Nova-3 (streaming WebSocket)
- Speaker diarization -- labels who is speaking (S1, S2, ...)
- Partial results update in real-time as words are spoken
- Final results with punctuation and smart formatting
- Works with glasses mic or phone mic (iPhone/Phone mode)
Also includes:
- Gemini Live -- real-time voice + vision AI assistant (optional)
- WebRTC streaming -- share your glasses POV live to a browser viewer
- Phone mode -- test without glasses using your phone camera + mic
```
Meta Ray-Ban Glasses (or phone mic)
        |
        | PCM audio (16kHz mono)
        v
iOS / Android App (this project)
        |
        | PCM Int16 stream (100ms chunks)
        v
Deepgram Nova-3 (WebSocket)
        |
        |-- Partial transcripts --> App --> Live UI
        |-- Final transcripts ----> App --> Scrolling panel
        |-- Speaker labels -------> App --> S1, S2, S3...
        v
Real-time transcript display
```
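The Deepgram leg of this pipeline is an ordinary WebSocket session: open a connection with the audio format and feature flags in the query string, stream raw PCM bytes, and read JSON results back. A minimal language-neutral sketch in Python (the query parameters and response shape follow Deepgram's public streaming API; the app's actual Swift client may set additional options):

```python
import json
from urllib.parse import urlencode

DG_BASE = "wss://api.deepgram.com/v1/listen"

def deepgram_url(extra=None):
    """Build the streaming URL with the options this pipeline relies on:
    linear16 PCM at 16 kHz mono, interim (partial) results, diarization,
    and smart formatting for the final transcripts."""
    params = {
        "model": "nova-3",
        "encoding": "linear16",     # raw PCM Int16
        "sample_rate": 16000,
        "channels": 1,
        "interim_results": "true",  # partial transcripts as words are spoken
        "diarize": "true",          # per-word speaker indices
        "smart_format": "true",     # punctuation, numerals, etc.
    }
    if extra:
        params.update(extra)
    return f"{DG_BASE}?{urlencode(params)}"

def parse_result(message: str):
    """Extract (transcript, is_final, speaker_labels) from a Deepgram
    results message. Diarized words carry an integer speaker index,
    which maps to the S1, S2, ... labels shown in the UI."""
    msg = json.loads(message)
    alt = msg["channel"]["alternatives"][0]
    speakers = sorted({w["speaker"] for w in alt.get("words", []) if "speaker" in w})
    return alt["transcript"], msg.get("is_final", False), [f"S{s + 1}" for s in speakers]
```

Partial results arrive with `is_final: false` and are replaced in place; final results (`is_final: true`) are appended to the scrolling panel.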
```
git clone https://github.com/sseanliu/GlassFlow.git
cd GlassFlow/samples/CameraAccess
open CameraAccess.xcodeproj
```

```
cp CameraAccess/Secrets.swift.example CameraAccess/Secrets.swift
```

Edit Secrets.swift with your API keys:
- Deepgram API key (required for transcription)
- Gemini API key (optional, for AI assistant)
Select your iPhone as the target device and hit Run (Cmd+R).
Without glasses (iPhone mode):
- Tap "Start on iPhone"
- Tap the Scribe button to start transcribing
- Speak -- the transcript appears in real-time
With Meta Ray-Ban glasses:
First, enable Developer Mode in the Meta AI app:
- Open the Meta AI app on your iPhone
- Go to Settings (gear icon, bottom left)
- Tap App Info
- Tap the App version number 5 times -- this unlocks Developer Mode
- Go back to Settings -- you'll now see a Developer Mode toggle. Turn it on.
Then in GlassFlow:
- Tap "Start Streaming" in the app
- Tap the Scribe button for live transcription
```
git clone https://github.com/sseanliu/GlassFlow.git
```

Open `samples/CameraAccessAndroid/` in Android Studio.
The Meta DAT Android SDK is distributed via GitHub Packages. You need a GitHub Personal Access Token with read:packages scope.
- Go to GitHub > Settings > Developer Settings > Personal Access Tokens and create a classic token with `read:packages` scope
- In `samples/CameraAccessAndroid/local.properties`, add:

```
github_token=YOUR_GITHUB_TOKEN
```

Then copy the secrets template:

```
cd samples/CameraAccessAndroid/app/src/main/java/com/meta/wearable/dat/externalsampleapps/cameraaccess/
cp Secrets.kt.example Secrets.kt
```

Edit Secrets.kt with your Deepgram API key (required) and optional Gemini API key.
- Let Gradle sync in Android Studio
- Select your Android phone as the target device
- Click Run (Shift+F10)
Without glasses (Phone mode):
- Tap "Start on Phone"
- Tap the Scribe button to start transcribing
- Speak -- the transcript appears in real-time
With Meta Ray-Ban glasses:
Enable Developer Mode in the Meta AI app (same steps as iOS above), then:
- Tap "Start Streaming" in the app
- Tap the Scribe button for live transcription
All source code is in `samples/CameraAccess/CameraAccess/`:

| File | Purpose |
|---|---|
| `Transcription/DeepgramService.swift` | WebSocket streaming client for Deepgram Nova-3 |
| `Transcription/TranscriptionViewModel.swift` | Session lifecycle, partial/final segment management |
| `Transcription/TranscriptionView.swift` | Live scrolling transcript panel with speaker labels |
| `Gemini/AudioManager.swift` | Mic capture (PCM 16kHz) + audio playback (PCM 24kHz) |
| `Gemini/GeminiConfig.swift` | API keys, model config |
| `Gemini/GeminiLiveService.swift` | WebSocket client for Gemini Live API |
| `Gemini/GeminiSessionViewModel.swift` | Gemini session lifecycle |
| `iPhone/IPhoneCameraManager.swift` | AVCaptureSession wrapper for iPhone camera mode |
| `WebRTC/WebRTCClient.swift` | WebRTC peer connection + SDP negotiation |
- Input: Glasses mic or phone mic -> AudioManager (PCM Int16, 16kHz mono, 100ms chunks)
- Streaming: PCM chunks sent over WebSocket to Deepgram in real-time
- Output: Partial transcripts update live, final transcripts with punctuation and speaker labels
- Diarization: Deepgram identifies speakers (S1, S2, ...) automatically
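Rendering diarized output comes down to grouping consecutive words by their speaker index into labeled runs. A small sketch of that grouping logic in Python (illustrative only, not the app's Swift implementation):

```python
def label_segments(words):
    """Group (speaker_index, word) pairs -- as found in a diarized
    final result -- into per-speaker runs. Consecutive words from the
    same speaker merge into one line like "S1: hello there"."""
    lines = []
    for speaker, word in words:
        label = f"S{speaker + 1}"  # Deepgram indices are 0-based
        if lines and lines[-1][0] == label:
            lines[-1][1].append(word)   # same speaker keeps talking
        else:
            lines.append([label, [word]])  # speaker change: new line
    return [f"{label}: {' '.join(ws)}" for label, ws in lines]
```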
- iPhone mode: `.voiceChat` audio session for echo cancellation (mic + speaker co-located)
- Glasses mode: `.videoChat` audio session with Bluetooth HFP (mic is on glasses, speaker is on phone)
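The PCM format above (16 kHz mono Int16, 100 ms chunks) fixes the exact buffer sizes on the wire. A quick sanity check in Python (the constants mirror the pipeline description; the helper names are illustrative):

```python
SAMPLE_RATE = 16000   # Hz, mono
BYTES_PER_SAMPLE = 2  # Int16 PCM
CHUNK_MS = 100

def chunk_bytes(ms=CHUNK_MS, rate=SAMPLE_RATE):
    """Size in bytes of one PCM chunk sent over the WebSocket:
    16000 Hz * 0.1 s * 2 bytes = 3200 bytes."""
    return rate * ms // 1000 * BYTES_PER_SAMPLE

def split_chunks(pcm: bytes, ms=CHUNK_MS):
    """Split a raw PCM buffer into chunk-sized pieces for streaming;
    a trailing partial chunk is kept so no audio is dropped."""
    n = chunk_bytes(ms)
    return [pcm[i:i + n] for i in range(0, len(pcm), n)]
```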
- iOS 17.0+
- Xcode 15.0+
- Deepgram API key (get one free)
- Meta Ray-Ban glasses (optional -- use iPhone mode for testing)
- Gemini API key (optional, for AI assistant)
- Android 14+ (API 34+)
- Android Studio Ladybug or newer
- GitHub account with a `read:packages` token (for the DAT SDK)
- Deepgram API key (get one free)
- Meta Ray-Ban glasses (optional -- use Phone mode for testing)
- Gemini API key (optional, for AI assistant)
Transcript not appearing -- Check that your Deepgram API key is configured in Settings or Secrets.swift. Make sure microphone permission is granted.
"Deepgram API key not configured" -- Add your key in the in-app Settings screen (Deepgram section) or in Secrets.swift.
Echo/feedback in iPhone mode -- The app uses .voiceChat audio session for echo cancellation. Try turning down the volume.
Gradle sync fails with 401 Unauthorized (Android) -- Your GitHub token is missing or doesn't have read:packages scope. Check local.properties. Generate a new token at github.com/settings/tokens.
For DAT SDK issues, see the developer documentation or the discussions forum.
This source code is licensed under the license found in the LICENSE file in the root directory of this source tree.
