AVSpeechSynthesizer is broken on iOS 17 in Xcode 15

When you initialize AVSpeechSynthesizer as a View property in a SwiftUI project in Xcode 15 with an iOS 17 simulator, you get these messages in the console:

Failed to get sandbox extensions
Query for com.apple.MobileAsset.VoiceServicesVocalizerVoice failed: 2
#FactoryInstall Unable to query results, error: 5
Unable to list voice folder
Query for com.apple.MobileAsset.VoiceServices.GryphonVoice failed: 2
Unable to list voice folder
Unable to list voice folder
Query for com.apple.MobileAsset.VoiceServices.GryphonVoice failed: 2
Unable to list voice folder

When you try to speak an utterance from a Button with synthesizer.speak(AVSpeechUtterance(string: "iOS 17 broke TextToSpeech")), you get an endless stream of warnings that repeats over and over in the console, like this:

AddInstanceForFactory: No factory registered for id <CFUUID 0x60000024f200> F8BB1C28-BAE8-11D6-9C31-00039315CD46
Cannot find executable for CFBundle 0x600003b2cd20 </Library/Developer/CoreSimulator/Volumes/iOS_21A328/Library/Developer/CoreSimulator/Profiles/Runtimes/iOS 17.0.simruntime/Contents/Resources/RuntimeRoot/System/Library/PrivateFrameworks/TextToSpeechMauiSupport.framework> (not loaded)
Failed to load first party audio unit from TextToSpeechMauiSupport.framework
Could not instantiate audio unit. Error=Error Domain=NSOSStatusErrorDomain Code=-3000 "(null)"
Could not instantiate audio unit. Error=Error Domain=NSOSStatusErrorDomain Code=-3000 "(null)"
Could not instantiate audio unit. Error=Error Domain=NSOSStatusErrorDomain Code=-3000 "(null)"
Could not instantiate audio unit. Error=Error Domain=NSOSStatusErrorDomain Code=-3000 "(null)"
Could not instantiate audio unit. Error=Error Domain=NSOSStatusErrorDomain Code=-3000 "(null)"
Couldn't find audio unit for request SSML Length: 40, Voice: [AVSpeechSynthesisProviderVoice 0x600002127e30] Name: Samantha, Identifier: com.apple.voice.compact.en-US.Samantha, Supported Languages (
    "en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null)
VoiceProvider: Could not start synthesis for request SSML Length: 40, Voice: [AVSpeechSynthesisProviderVoice 0x600002127e30] Name: Samantha, Identifier: com.apple.voice.compact.en-US.Samantha, Supported Languages (
    "en-US"
), Age: 0, Gender: 0, Size: 0, Version: (null), converted from tts request [TTSSpeechRequest 0x600003709680] iOS 17 broke TextToSpeech language: en-US footprint: compact rate: 0.500000 pitch: 1.000000 volume: 1.000000
Failed to speak request with error: Error Domain=TTSErrorDomain Code=-4010 "(null)". Attempting to speak again with fallback identifier: com.apple.voice.compact.en-US.Samantha

The CPU is under pressure (more than 100% usage), and AVSpeechSynthesizer doesn't speak. Everything works fine on iOS 16.

The code of the View:

import SwiftUI
import AVFoundation

struct ContentView: View {
    let synthesizer = AVSpeechSynthesizer()
    
    var body: some View {
        VStack {
            Button {
                synthesizer.speak(AVSpeechUtterance(string: "iOS 17 broke TextToSpeech"))
            } label: {
                Text("speak")
            }
            .buttonStyle(.borderedProminent)

        }
        .padding()
    }
}

#Preview {
    ContentView()
}

On a real device, nothing happens at all.

The same happens in my production app. I have so many crashes related to TextToSpeech and iOS 17. What's going on?

I wrote this code to test the 158 available voices on Xcode 15 and iOS 17; so far, the 10 voices I tested crash the app. Fred (en-US) works. Can we get more hands to test the whole list and report which voices work?

https://gist.github.com/Koze/d1de49c24fc28375a9e314c72f7fdae4

Thanks.

import SwiftUI
import AVFoundation

// Define a struct to hold voice information
struct VoiceInfo: Comparable {
    var name: String
    var language: String
    var identifier: String

    static func < (lhs: VoiceInfo, rhs: VoiceInfo) -> Bool {
        if lhs.language == rhs.language {
            return lhs.name < rhs.name
        }
        return lhs.language < rhs.language
    }
}

struct ContentView: View {
    @State private var selectedVoice = 0
    @State private var textToSpeak = "Hello, World!"
    let synthesizer = AVSpeechSynthesizer()

    // Create and sort the array of voice information
    let voices: [VoiceInfo] = [
        VoiceInfo(name: "Majed", language: "ar-001", identifier: "com.apple.voice.compact.ar-001.Maged"),
        // ... remaining voices omitted ...
    ].sorted()

    var body: some View {
        VStack {
            Picker("Select a Voice", selection: $selectedVoice) {
                ForEach(0 ..< voices.count, id: \.self) { index in
                    Text("\(voices[index].name) (\(voices[index].language))")
                        .tag(index)
                }
            }
            .pickerStyle(WheelPickerStyle())
            .padding()

            TextField("Enter text to speak", text: $textToSpeak)
                .textFieldStyle(RoundedBorderTextFieldStyle())
                .padding()

            Button("Speak") {
                speakText()
            }
            .padding()
        }
    }

    func speakText() {
        let utterance = AVSpeechUtterance(string: textToSpeak)
        utterance.voice = AVSpeechSynthesisVoice(identifier: voices[selectedVoice].identifier)
        synthesizer.speak(utterance)
    }
}

struct ContentView_Previews: PreviewProvider {
    static var previews: some View {
        ContentView()
    }
}

#Preview {
    ContentView()
}

Certain voices just stop at random points in speech strings in my app testing. Also, the -willSpeakRangeOfSpeechString: delegate callback sometimes jumps all over the place.

Doesn't seem to be any rhyme or reason to it.

Does not appear to be fixed in the simulator.
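
For reference, this is roughly how I'm watching the ranges (just a logging sketch of my own so the jumps show up in the console; not anything official):

import AVFoundation

// Minimal delegate that logs each range the synthesizer says it is about to speak,
// so out-of-order jumps become visible in the console.
final class RangeLogger: NSObject, AVSpeechSynthesizerDelegate {
    func speechSynthesizer(_ synthesizer: AVSpeechSynthesizer,
                           willSpeakRangeOfSpeechString characterRange: NSRange,
                           utterance: AVSpeechUtterance) {
        let text = utterance.speechString as NSString
        print("willSpeak [\(characterRange.location), \(characterRange.length)]: \(text.substring(with: characterRange))")
    }
}

// Usage (keep strong references to both; the delegate is not retained):
// let synthesizer = AVSpeechSynthesizer()
// let logger = RangeLogger()
// synthesizer.delegate = logger
// synthesizer.speak(AVSpeechUtterance(string: "A longer sentence to watch the ranges."))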

The Address Sanitizer crash as soon as you call -speakUtterance: still occurs on iOS 17.1. It's kind of surprising that this seems to be treated as a low-priority issue. I'm pretty sure WebKit uses AVSpeechSynthesizer to implement the Web Speech API, so beyond use in native apps, every WKWebView and maybe even Safari could be impacted.

It would be great if there was a little more communication from Apple on this to let devs know if a fix for this issue is coming anytime soon (and by soon I mean not in iOS 18) or if we should investigate other ways to implement this functionality.

The text-to-speech feature will be the most important part of my app, so it would be really, really nice to have this fixed soon.

Same issue here with German and English voices.

Does anyone have an update on this issue? Neither the iOS 17.2 beta 2 nor the Xcode 15.1 beta 2 release notes mention it.

@s43 that works, thanks!! Still weird that it happens, though; we're almost at the midlife of iOS 17, and it should have been fixed by now...

And to be clear, this is not just a simulator issue? Does "... nothing at all happened ..." on a real device mean there are no problems on the device (so it works)? Or no sound on the device (it's broken there too)?

Other devs / other posts report that the voices that don't work also seem to have an empty .audioFileSettings property ([:]).

I ran

func testVoices() {
    let voiceList = AVSpeechSynthesisVoice.speechVoices()
    for voice in voiceList {
        print(voice.name + ": \(voice.audioFileSettings)")
    }
}

The output shows that a handful of voices (including Fred) have data in their .audioFileSettings property (unfortunately, most have [:]).
Voices that have settings seem to work in the simulator, although it's spotty on an actual iOS 17 device. Fred, for example, works only about 1 time in 30 when tested on my iPhone, but is fine in the simulator.

My voice settings output:

Majed: [:] Daria: [:] Montse: [:] Zuzana: [:] Sara: [:] Anna: [:] Melina: [:] Karen: [:] Daniel: [:] Moira: [:] Rishi: [:]

Trinoids: ["AVFormatIDKey": 1819304813, "AVLinearPCMIsBigEndianKey": 0, "AVSampleRateKey": 22050, "AVLinearPCMBitDepthKey": 32, "AVLinearPCMIsFloatKey": 1, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsNonInterleaved": 1]

Albert: ["AVLinearPCMBitDepthKey": 32, "AVSampleRateKey": 22050, "AVLinearPCMIsBigEndianKey": 0, "AVFormatIDKey": 1819304813, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsNonInterleaved": 1]

ester: ["AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVNumberOfChannelsKey": 1, "AVFormatIDKey": 1819304813, "AVLinearPCMIsBigEndianKey": 0, "AVSampleRateKey": 22050, "AVLinearPCMBitDepthKey": 32]

Samantha: [:]

Whisper: ["AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVNumberOfChannelsKey": 1, "AVFormatIDKey": 1819304813, "AVLinearPCMIsBigEndianKey": 0, "AVSampleRateKey": 22050, "AVLinearPCMBitDepthKey": 32]

Superstar: ["AVNumberOfChannelsKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVLinearPCMIsFloatKey": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813, "AVLinearPCMBitDepthKey": 32, "AVLinearPCMIsBigEndianKey": 0]

Bells: ["AVLinearPCMBitDepthKey": 32, "AVNumberOfChannelsKey": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsNonInterleaved": 1, "AVLinearPCMIsFloatKey": 1]

Organ: ["AVLinearPCMBitDepthKey": 32, "AVNumberOfChannelsKey": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsNonInterleaved": 1, "AVLinearPCMIsFloatKey": 1]

Bad News: ["AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVNumberOfChannelsKey": 1, "AVFormatIDKey": 1819304813, "AVLinearPCMIsBigEndianKey": 0, "AVSampleRateKey": 22050, "AVLinearPCMBitDepthKey": 32]

Bubbles: ["AVFormatIDKey": 1819304813, "AVLinearPCMIsNonInterleaved": 1, "AVLinearPCMBitDepthKey": 32, "AVSampleRateKey": 22050, "AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsBigEndianKey": 0, "AVNumberOfChannelsKey": 1]

Junior: ["AVLinearPCMBitDepthKey": 32, "AVFormatIDKey": 1819304813, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVSampleRateKey": 22050]

Bahh: ["AVLinearPCMBitDepthKey": 32, "AVFormatIDKey": 1819304813, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVSampleRateKey": 22050]

Wobble: ["AVLinearPCMBitDepthKey": 32, "AVFormatIDKey": 1819304813, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsFloatKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVSampleRateKey": 22050]

Boing: ["AVNumberOfChannelsKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVLinearPCMBitDepthKey": 32, "AVSampleRateKey": 22050, "AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMIsFloatKey": 1, "AVFormatIDKey": 1819304813]

Good News: ["AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMBitDepthKey": 32, "AVLinearPCMIsFloatKey": 1, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813]

Zarvox: ["AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMBitDepthKey": 32, "AVLinearPCMIsFloatKey": 1, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813]

Ralph: ["AVLinearPCMIsBigEndianKey": 0, "AVLinearPCMBitDepthKey": 32, "AVLinearPCMIsFloatKey": 1, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsNonInterleaved": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813]

Cellos: ["AVLinearPCMIsBigEndianKey": 0, "AVFormatIDKey": 1819304813, "AVLinearPCMIsNonInterleaved": 1, "AVSampleRateKey": 22050, "AVNumberOfChannelsKey": 1, "AVLinearPCMIsFloatKey": 1, "AVLinearPCMBitDepthKey": 32]

Kathy: ["AVNumberOfChannelsKey": 1, "AVLinearPCMIsFloatKey": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813, "AVLinearPCMBitDepthKey": 32, "AVLinearPCMIsNonInterleaved": 1, "AVLinearPCMIsBigEndianKey": 0]

Fred: ["AVNumberOfChannelsKey": 1, "AVLinearPCMIsFloatKey": 1, "AVSampleRateKey": 22050, "AVFormatIDKey": 1819304813, "AVLinearPCMBitDepthKey": 32, "AVLinearPCMIsNonInterleaved": 1, "AVLinearPCMIsBigEndianKey": 0]

Tessa: [:] Mónica: [:] Paulina: [:] Satu: [:] Amélie: [:] Thomas: [:] Carmit: [:] Lekha: [:] Lana: [:] Tünde: [:] Damayanti: [:] Alice: [:] Kyoko: [:] Yuna: [:] Amira: [:] Nora: [:] Ellen: [:] Xander: [:] Zosia: [:] Luciana: [:] Joana: [:] Ioana: [:] Milena: [:] Laura: [:] Alva: [:] Kanya: [:] Yelda: [:] Lesya: [:] Linh: [:] Tingting: [:] Sinji: [:] Meijia: [:]
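
If you want to reproduce this quickly, here's a minimal helper (my own naming, nothing official) that lists only the voices whose .audioFileSettings dictionary is non-empty, which per the output above seems to correlate with the voices that actually speak:

import AVFoundation

// Hypothetical helper: filter for voices with a non-empty audioFileSettings dictionary.
func voicesWithAudioFileSettings() -> [AVSpeechSynthesisVoice] {
    AVSpeechSynthesisVoice.speechVoices().filter { !$0.audioFileSettings.isEmpty }
}

// Usage: print the apparently "working" voices.
for voice in voicesWithAudioFileSettings() {
    print("\(voice.name) (\(voice.language)): \(voice.identifier)")
}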

OK. After some experimenting, I have a solution. This works consistently on both the simulator and my iOS 17 iPhone.

Manually downloaded voices simply work.

Settings > Accessibility > Live Speech (English US) on both the sim and my iPhone. I downloaded Alex, Ava (Premium), and Evan (Enhanced).

Then I used this block to obtain the exact identifiers for the voices:

let englishVoices = AVSpeechSynthesisVoice.speechVoices().filter { $0.language.starts(with: "en") }
for voice in englishVoices {
    let quality = voice.quality == .enhanced ? "Enhanced" : "Default"
    print("\(voice.identifier): \(quality)")
}

TTS as usual:

func speak(text: String) {
    let utterance = AVSpeechUtterance(string: text)

    if let voice = AVSpeechSynthesisVoice(identifier: "com.apple.voice.enhanced.en-US.Evan") {
        utterance.voice = voice
    }

    // Other configs
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    utterance.pitchMultiplier = 1.0
    utterance.volume = 1.0

    speechSynthesizer.speak(utterance)
}

It works for Alex, Evan, and Ava, which are all I've tested so far. I'm planning to pop up an alert in my app asking the user to download one of these voices (or another, once I test it) if they initiate TTS and don't already have one available. No, it's not ideal, but it's a functional workaround in the meantime. Otherwise, I can fall back on one of the very dated old voices (Trinoids, anyone?) that work by default, as listed in my reply above.
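
In case it helps anyone, here's roughly the fallback logic I have in mind. Treat it as a sketch, not a guaranteed fix: the first two identifiers are just the ones I happened to download (the Ava one is my assumption; verify it on your device), and the last resort relies on the audioFileSettings observation above.

import AVFoundation

// Prefer manually downloaded voices; otherwise fall back to any bundled voice
// whose audioFileSettings isn't empty (Fred, Trinoids, ...).
func pickWorkingVoice() -> AVSpeechSynthesisVoice? {
    let preferredIdentifiers = [
        "com.apple.voice.enhanced.en-US.Evan",   // downloaded via Settings > Accessibility
        "com.apple.voice.premium.en-US.Ava",     // assumed identifier for Ava (Premium)
        AVSpeechSynthesisVoiceIdentifierAlex     // Alex, if downloaded
    ]
    for identifier in preferredIdentifiers {
        if let voice = AVSpeechSynthesisVoice(identifier: identifier) {
            return voice
        }
    }
    return AVSpeechSynthesisVoice.speechVoices().first { !$0.audioFileSettings.isEmpty }
}

The speak(text:) function above can then set utterance.voice = pickWorkingVoice() instead of a hard-coded identifier.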

Also, I was incorrect about Fred and similar voices that have data in their .audioFileSettings dictionary not working on my device. I was, in fact, failing to release the audio session after speech recognition. My app uses both STT and TTS, although STT doesn't work reliably in the sim (this is a known issue), so I wasn't initiating it there. On my test iPhone, failing to return the AVAudioSession to category .ambient or .playback after STT caused the issue. Fred, Tessa, Kathy, Ralph, Zarvox, etc. all work on both the simulator and my test iPhone now. Of course, a) no one wants to use those voices, and b) the rest definitely still don't work.
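
For anyone hitting the same thing, this is roughly the audio-session reset that fixed it for me (a minimal sketch; it assumes your app owns the session and nothing else needs recording at that point):

import AVFoundation

// After speech recognition (STT) finishes, hand the session back before doing TTS.
func resetAudioSessionAfterRecognition() {
    let session = AVAudioSession.sharedInstance()
    do {
        try session.setActive(false, options: .notifyOthersOnDeactivation)
        try session.setCategory(.playback, mode: .spokenAudio)
        try session.setActive(true)
    } catch {
        print("Audio session reset failed: \(error)")
    }
}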

I tried setting values in the .audioFileSettings dictionary for other voices like Sara, Ellen, etc., thinking I might have some success. But of course, that property is read-only.

I think at this point I'm comfortable concluding that the issue is the empty .audioFileSettings dictionaries on the voices bundled with the iOS 17 SDK. Specifically, the problem is in this class: https://developer.apple.com/documentation/avfaudio/avspeechsynthesisvoice

New, user-downloaded voices have the necessary data in their audioFileSettings dictionaries, as do the very old (and somewhat useless) Apple voices bundled in the SDK.

This should be an easy, quick fix for an AV Framework engineer at Apple.

For whatever reason I'm not getting e-mail notifications when people reply in this thread even though I'm "following it." Weird.

I'm a little confused by all the manually-downloading-voices stuff. I don't particularly care that much that speech synthesis isn't working in the simulator (although it would be nice if it did... like it used to). I've noticed, on device, that speech synthesis just randomly seems to stop working since the release of iOS 17. Incorrect behavior includes:

- Jumping around to random ranges of the speech string at random points.
- Speech synthesis simply stops speaking at certain points in some speech strings.

When synthesizing short amounts of text (a couple of words) it usually "works", but the inconsistent behavior matches the idea that there are possibly memory-management issues going on under the covers. I haven't checked iOS 17.2 to see if it's fixed; nothing in the release notes says anything about it. The API is basically unusable. My TSI got closed with a "no workaround available" message, and my bug report has not received a reply.

So... they seem to be in no rush to fix it. I was going to release my app in just a few days before they broke it and put me on hold. I'm investigating alternatives. It feels like I got burned. I imagine all the devs who already have a released app using AVSpeechSynthesizer must feel really burned.

In iOS 17.2, the Address Sanitizer crash seems to be fixed, and in the field I don't see TTS-related crashes anymore. Has anyone seen functional problems (like speech stopping on long texts) on iOS 17.2?

@And0Austria Thanks for sharing that info. I'll have to give AVSpeechSynthesizer another try on iOS 17.2 and see. It would be nice (if they did fix it) if they included that information in the release notes and/or updated some of our bug reports, but if it's fixed, that'll be good enough...

This is a frustrating bug that has been around in various incarnations since iOS 15. I hold out no hope that Apple will ever fix it. I've resorted to third-party TTS services, as I can't rely on this for mission-critical application features. Sadly, they cost money, but I don't see any real alternative. On the plus side, the voices are about a million times better than the broken/garbage ones that are supposedly part of the AVFoundation framework.

Godspeed.

I got the Xcode 15.3 beta with iOS 17.2, and it fixes the problem.
