-
Create a great ShazamKit experience
Discover how your app can offer a great audio matching experience with the latest updates to ShazamKit. We'll take you through matching features, updates to audio recognition, and interactions with the Shazam library. Learn tips and best practices for using ShazamKit in your audio apps. For more on ShazamKit, check out "Create custom catalogs at scale with ShazamKit" from WWDC22 as well as "Explore ShazamKit" and "Create custom audio experiences with ShazamKit" from WWDC21.
-
♪ ♪ David: Hi, I'm David Ilenwabor, an Engineer on the ShazamKit team.
ShazamKit is a framework that allows you to bring audio recognition to your apps. You can match audio against Shazam's vast music catalog, or against your own prerecorded audio using custom catalogs. 2022 brought some great updates to ShazamKit that improved working with custom catalogs at scale: the Shazam CLI to handle the heavy workflows of using custom catalogs, time-restricted media items for better syncing, and frequency skewing to differentiate between two similar-sounding pieces of audio. If you're not already familiar with how these work, check out the "Create custom catalogs at scale with ShazamKit" video.
To give a quick overview, ShazamKit performs a match by converting audio into a special format called a signature. You pass a stream of audio buffers or signature data into a ShazamKit session, and the session uses the signature to find a match in the Shazam catalog or a custom catalog. If there's a match, the session returns a match object with media items that represent the metadata of the match, which you can then display in your app. ShazamKit can perform a match by generating a signature from a stream of audio buffers or by using a signature file stored on disk. Signatures are irreversible, which means it is not possible to reconstruct the original recording from a signature; this protects the privacy of our customers. A catalog is a group of signatures with their associated media items, and a match occurs when a query signature sufficiently matches part of a reference signature in a catalog. Matches can occur even when the query signature is noisy, such as music playing in a restaurant.
Now that I've covered that, I'll move on to the exciting new updates in ShazamKit this year. In this session, I'll go through new changes for recognizing audio with ShazamKit, then I'll talk about the Shazam Library API, which has been redefined with exciting new functionality. Finally, I'll take you through some best practices for creating better app experiences with ShazamKit. Before I get started, I suggest you download the attached sample code project from the developer portal; I'll be making use of it throughout this video. There's a lot to cover, so I'll get started.
First off, audio recognition. The process of using ShazamKit to recognize audio from the microphone can be summarized in the following steps. First, ask for microphone permission from the user. Then, start recording after permission has been granted. Next, pass the recorded audio buffers to ShazamKit, and finally, handle the result. To demonstrate this, I've built a demo app which you can find in the sample project. I love dancing, and to keep up with the latest trends, I built an app to help me discover trending dance moves for a song. The app listens to audio using the microphone and then finds a matching dance video. So for example, I can ask Siri to help me find a song. Hey, Siri, play "Push It" by Dukes.
Siri: Now playing "Push It" by Dukes. David: Then, I can tap the Learn The Dance button to start recording. ♪ ♪ ShazamKit recognizes the song and the app searches for an appropriate dance video to go with it. Seems like I got one. Hmm! Looks like my twin Dancing Dave is showing me some moves. This looks exciting. So how was this implemented? Let me take you through the code. Here I have the sample project opened in Xcode. I have added the microphone usage description to my Info.plist file, which is used to request microphone access. I also have a host of SwiftUI views for the home screen and the dance video screen. However, this Matcher class is where all the magic of audio recognition happens.
On initialization, I have a method to configure and set up the audio engine. In this method, I install a tap to receive PCM buffers and prepare the audio engine. I also have a match method which is called when I tap on the Learn The Dance button. I request recording permission, and if it is granted, I call start on the audio engine to begin recording. Next, I tell the UI that matching has started, then I call session.results and wait for an async sequence of match results. After receiving a result, I set the match object if it was a match, and I handle the no match and error cases. This class also has a stopRecording function in which I stop the audio engine.
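For reference, the pre-SHManagedSession flow looks roughly like this. This is a minimal sketch rather than the sample project's exact code: the Matcher name mirrors the project, while the buffer size, permission call, and print statements are illustrative.
import AVFAudio
import ShazamKit

// A minimal sketch of the AVAudioEngine + SHSession flow described above.
// The Matcher name mirrors the sample project; details are illustrative.
final class Matcher {
    private let audioEngine = AVAudioEngine()
    private let session = SHSession()

    // Install a tap to receive PCM buffers and prepare the audio engine.
    func configureAudioEngine() {
        let inputNode = audioEngine.inputNode
        let format = inputNode.outputFormat(forBus: 0)
        inputNode.installTap(onBus: 0, bufferSize: 8192, format: format) { [weak self] buffer, time in
            // Feed each recorded buffer to ShazamKit for matching.
            self?.session.matchStreamingBuffer(buffer, at: time)
        }
        audioEngine.prepare()
    }

    func match() async throws {
        // Ask for microphone permission, then start recording.
        guard await AVAudioApplication.requestRecordPermission() else { return }
        try audioEngine.start()

        // Handle results from the session's async sequence.
        for await result in session.results {
            switch result {
            case .match(let match):
                print("Matched: \(match.mediaItems.first?.title ?? "Unknown")")
            case .noMatch(_):
                print("No match")
            case .error(let error, _):
                print("Error: \(error.localizedDescription)")
            }
        }
    }

    func stopRecording() {
        audioEngine.stop()
    }
}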
This works great, but notice how I have a lot of setup code to configure the audio engine before I can receive audio buffers. This can be challenging to get right, especially if you aren't familiar with audio programming. And so, to make recording and matching easier, we've introduced a new API called SHManagedSession. Managed Session automatically takes care of starting the recording for you without the hassle of setting up audio buffers. This makes it very easy to set up and use.
Microphone permission is required to use Managed Session. Without this permission, the session cannot start recording, so it is important that you add the microphone usage description entry to the Info.plist file of your app. Managed Session will use this description when asking the user for microphone access. So how can I use this API in code? First, I'll create an instance of SHManagedSession, then I can wait on a result by calling the result method. This method returns an enum with three cases: a match, a no match, or an error. Next, I can switch over the result, using the returned media items in the case of a match, and handling the no match and error cases. And what if I want a longer recording session that can return many results over time? Well, I can do this by using the async sequence results property on managedSession. I can use each result that's received from the sequence just as before. This ensures I can keep recording audio for long periods. Finally, I can stop matching by calling cancel on managedSession, which cancels any currently running match attempt and stops recording. And that's it: with Managed Session, it's just a few lines of code to start recording and receive a result after matching. Going back to my app, I'm going to update the Matcher implementation to use managedSession. I can replace all instances of SHSession with SHManagedSession.
Then, I can delete the configure audio engine method and its usage.
And in the match method, I can delete the calls to request recording permission and to start the audio engine.
Finally, in the stopRecording method, I can replace the existing code to stop the audio engine with just a call to managedSession's cancel method.
Now, I'll run the app to make sure everything is still working as expected. Hey, Siri, play "Push It" by Dukes.
Siri: Here's "Push It" by Dukes. ♪ ♪ Exciting! Everything is still working fine, but this time, the code is even better and cleaner with Managed Session.
That's not all. There's even more to Managed Session to talk about. Depending on your use case, you may want managedSession to prepare for a match attempt ahead of time. Preparing a Managed Session makes the session more responsive when matching: it preallocates the resources needed for a match and starts prerecording in anticipation of a match attempt.
To give you an idea of the benefits of using prepare, here's a timeline representing the behavior of the session without calling prepare. When you ask for a result, the session allocates the resources for the match attempt, then starts recording, and finally returns a match. However, when you call prepare, the session immediately preallocates the resources and starts prerecording. Then, when you ask for a result, the session returns a match faster than before. To do this in code, I can simply call the prepare method before I ask for a result. Calling this method is entirely optional; ShazamKit will call it on your behalf if necessary.
Now, you might be wondering, "How do I track the current behavior of the session? For example, in a long-running session, how do I know it's recording or matching or doing something else?" To help with this, Managed Session has a property called state which represents the current state of the session. The three states are idle, prerecording, and matching.
In the idle state, the session is neither recording nor making a match attempt. This is the case if the session has just completed a single match attempt, you call cancel, or the session terminates the async sequence of results when carrying out multiple matches. Prerecording represents the state after the session has been prepared: all the necessary resources for matching are ready and the session is prerecording for a match attempt. You can then proceed with matching or cancel prerecording. Matching is the third possible state, which indicates the session is making at least one match attempt. Calling prepare in this state will be ignored by the session.
Here's an example of how the managedSession state could be used in SwiftUI to drive view behavior. Here, I have the sample implementation of a subview from the demo app. I have implemented different behaviors for this view depending on whether the state is idle or matching. Currently, the state of the session is idle and the text view is set to Hear Music. I also have a conditional that checks whether the state is matching; if it is, I display a progress view, and if it's not, I display the Learn the Dance button. Since the state is currently idle, the Learn the Dance button is displayed. When I tap on the button, the state changes to matching and my UI automatically refreshes: this time the text is set to Matching and the progress view replaces the button since matching has commenced.
Whenever the state of the session changes, SwiftUI will automatically refresh your views to respond to those changes without any extra work. This is because managedSession conforms to Observable, a new Swift type that makes objects automatically communicate their changes to observers, so SwiftUI can easily respond to any state changes of managedSession. To learn more about Observable, check out the "Discover Observation in SwiftUI" video.
Now that I've covered audio recognition, I'll talk about the Shazam library.
In 2021, ShazamKit provided an API to allow developers to write a match result to the Shazam Library, provided it has a valid Shazam ID. This means that it corresponds to a song in the Shazam Catalog. The added item is visible in the Control Center Music Recognition module and the Shazam app if installed. It is also synced across devices. There is no special permission required to write to the Shazam library, but I recommend you avoid storing content in it without making customers aware, as all songs saved in the library will be attributed to the app that added them. Here, the second song in the list is attributed to the ShazamKit Dance Finder app.
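For context, writing a matched item with the original SHMediaLibrary API looks roughly like this; the media item values here are illustrative, not taken from the sample project.
import ShazamKit

// A minimal sketch of the original SHMediaLibrary write API.
// The Shazam ID and title below are placeholder values.
let mediaItem = SHMediaItem(properties: [
    .shazamID: "1450856499",   // must correspond to a song in the Shazam catalog
    .title: "Push It"
])

SHMediaLibrary.default.add([mediaItem]) { error in
    if let error {
        print("Failed to add item to the Shazam library: \(error.localizedDescription)")
    }
}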
Over the years, usage of this API revealed new use cases and some drawbacks. For example, what if you wanted to view the items you've added in your own app? The go-to solution would be managing your own local storage, which can be tedious to handle and prone to bugs. Because of these drawbacks, a new class has been introduced called SHLibrary. I recommend adopting SHLibrary, as it offers more extensive features than the previous SHMediaLibrary class. Some of the core features of SHLibrary include adding media items to the Shazam library, which works the same way as the corresponding method in SHMediaLibrary; reading media items; and deleting media items from the library. Note that your app can only read and delete what it has added to the library. Items returned when you read are specific to your app and do not represent the entire library, and attempting to delete a media item your app hasn't added will throw an error. Next, I'm going to explain how to use SHLibrary.
Adding with SHLibrary is as simple as calling the addItems method of the default library object, passing in an array of media items to be added. Reading from the library is equally simple. As an example, here's how you can read items from the library and populate a List view in SwiftUI: you simply pass the items property of the library object into the List initializer. SHLibrary also conforms to the new Swift Observable type, so your SwiftUI views will automatically reload when there's a change. You can also read from the library in a non-UI context. For example, if I want to retrieve a user's most popular genre from their synced Shazams, I can ask for the current items of the library. Once I have these, I can filter through the array of items to get all the returned genres and count the genre with the highest frequency. Finally, I can remove items from the library by calling removeItems on the library object, passing in the array of media items to be removed. Going back to my app, since I've added recognized songs to my library, I can use the new SHLibrary to read these songs. In the RecentDancesView, I have a List which contains an empty array of mediaItems in the initializer. I'll replace the empty array with items from SHLibrary to automatically read my library items.
I'll run the app with these changes.
As soon as the app loads, I receive a list of songs which the app has added to the Shazam Library. With SHLibrary, I get this functionality for free, and I don't need to maintain a database of matched songs. Next up, I'll add a Swipe to delete action on each row, so I can delete a song from the library.
I can add a swipeAction on the row view.
Then when the swiped button is tapped, I can call the removeItems method of SHLibrary, passing in the media item that is to be deleted.
Now that that's done, I'll run the app with these changes. I've got the app open on my iPad as well. I can swipe on an item on my iPhone and tap the delete button. The changes are synced, and the deleted item is also removed from the list on the iPad. This is amazing. Now that you've learned how to use the new library APIs and how to use Managed Session to handle recording, I'll take you through some best practices and tips for using the new features introduced this year. SHManagedSession and SHSession are closely related; they can achieve almost the same thing, albeit in different ways. Use managedSession when you want to let ShazamKit handle the recording for you. Use SHSession when you are generating the audio buffers and passing them into the framework. Use managedSession to recognize audio coming from the microphone or an AirPod. Use SHSession when you want to only recognize audio streaming from the microphone. Matching arbitrary signatures with managedSession is not supported, so if you have a signature file or loaded signature data in memory, use SHSession to match it. Finally, managedSession automatically handles audio formats for matching, while SHSession allows matching with multiple PCM audio formats.
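To illustrate that point about signatures, here's a minimal sketch of matching a stored signature with SHSession. It assumes a signature file bundled with the app; the file name and the delegate-based result handling are illustrative.
import ShazamKit

// A minimal sketch of matching a stored signature with SHSession.
// The "reference.shazamsignature" resource name is a placeholder.
final class SignatureMatcher: NSObject, SHSessionDelegate {
    private let session = SHSession()

    override init() {
        super.init()
        session.delegate = self
    }

    func matchStoredSignature() throws {
        guard let url = Bundle.main.url(forResource: "reference", withExtension: "shazamsignature") else { return }
        let signature = try SHSignature(dataRepresentation: try Data(contentsOf: url))
        session.match(signature)
    }

    // SHSessionDelegate callbacks deliver the outcome of the match attempt.
    func session(_ session: SHSession, didFind match: SHMatch) {
        print("Matched \(match.mediaItems.count) media item(s)")
    }

    func session(_ session: SHSession, didNotFindMatchFor signature: SHSignature, error: Error?) {
        print("No match: \(error?.localizedDescription ?? "none")")
    }
}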
On the subject of audio formats in SHSession: previously, the matchStreamingBuffer method only allowed matching PCM audio buffers with specific format settings at specific sample rates. Audio buffers with unsupported settings resulted in a no match. With this release, SHSession supports PCM buffers with most format settings, sampled at a range of rates. You can pass in these buffers and SHSession will handle the format conversion for you. Finally, if you have two or more pieces of audio in a custom catalog that sound similar, ShazamKit can now return all the matches from the custom catalog when you pass in a query signature that matches multiple reference signatures. The matches are returned sorted by best match quality, and you can filter for the match result you want.
As a tip, properly annotate the reference signatures that sound similar in their respective metadata, so you can distinguish between which result you want.
Here's an example of how you can achieve this. Say I have a television show where every episode has the same intro sound. I can generate a televisionShowCatalog with reference signatures representing each episode. I can create a session using this catalog, and when matching the intro section, ShazamKit will return a match result with mediaItems of each episode. I can then filter through the mediaItems and only return mediaItems for a particular episode, say Episode 2, for example. This is how proper annotation helps.
Now that I've gone through all the exciting updates this year, I'll conclude by switching back to my wonderful app and attempt to learn one more dance. I'll switch to my AirPods and play a song. Since I am using Managed Session in the app, it can listen to the audio playing in the AirPod and find a dance video for me. I'm going to press on the touch control of my AirPods to play a song and wait for the app to detect the audio.
Sweet! Looks like Dancing Dave is showing off a couple of Afrobeat moves, which I'll try my best to learn after this talk. I hope you're as excited as we are with these new updates. Thank you for joining, and have a great WWDC. ♪ ♪
-
6:46 - Single match with SHManagedSession
let managedSession = SHManagedSession()
let result = await managedSession.result()

switch result {
case .match(let match):
    print("Match found. MediaItemsCount: \(match.mediaItems.count)")
case .noMatch(_):
    print("No match found")
case .error(_, _):
    print("An error occurred")
}
-
7:16 - Multiple matches with SHManagedSession
let managedSession = SHManagedSession()

// Continuously match
for await result in managedSession.results {
    switch result {
    case .match(let match):
        print("Match found. MediaItemsCount: \(match.mediaItems.count)")
    case .noMatch(_):
        print("No match found")
    case .error(_, _):
        print("An error occurred")
    }
}
-
7:37 - Stop SHManagedSession
let managedSession = SHManagedSession()

// Cancel the session
managedSession.cancel()
-
8:02 - ShazamKit Matcher with SHManagedSession
import Foundation
import ShazamKit

struct MatchResult: Identifiable, Equatable {
    let id = UUID()
    let match: SHMatch?
}

@MainActor final class Matcher: ObservableObject {
    @Published var isMatching = false
    @Published var currentMatchResult: MatchResult?

    var currentMediaItem: SHMatchedMediaItem? {
        currentMatchResult?.match?.mediaItems.first
    }

    private let session: SHManagedSession

    init() {
        if let catalog = try? ResourcesProvider.catalog() {
            session = SHManagedSession(catalog: catalog)
        } else {
            session = SHManagedSession()
        }
    }

    func match() async {
        isMatching = true

        for await result in session.results {
            switch result {
            case .match(let match):
                Task { @MainActor in
                    self.currentMatchResult = MatchResult(match: match)
                }
            case .noMatch(_):
                print("No match")
                endSession()
            case .error(let error, _):
                print("Error \(error.localizedDescription)")
                endSession()
            }
            stopRecording()
        }
    }

    func stopRecording() {
        session.cancel()
    }

    func endSession() {
        // Reset result of any previous match.
        isMatching = false
        currentMatchResult = MatchResult(match: nil)
    }
}
-
10:07 - Preparing SHManagedSession
let managedSession = SHManagedSession()
await managedSession.prepare()
let result = await managedSession.result()
-
11:39 - SHManagedSession Idle State in SwiftUI
struct MatchView: View {
    let session: SHManagedSession

    var body: some View {
        VStack {
            Text(session.state == .idle ? "Hear Music?" : "Matching")

            if session.state == .matching {
                ProgressView()
            } else {
                Button {
                    // start match
                } label: {
                    Text("Learn the Dance")
                }
            }
        }
    }
}
-
12:25 - SHManagedSession Matching State in SwiftUI
struct MatchView: View {
    let session: SHManagedSession

    var body: some View {
        VStack {
            Text(session.state == .idle ? "Hear Music?" : "Matching")

            if session.state == .matching {
                ProgressView()
            } else {
                Button {
                    // start match
                } label: {
                    Text("Learn the Dance")
                }
            }
        }
    }
}
-
15:23 - Adding with SHLibrary
func add(mediaItems: [SHMediaItem]) async throws {
    try await SHLibrary.default.addItems(mediaItems)
}
-
15:34 - Reading with SHLibrary
struct LibraryView: View {
    var body: some View {
        List(SHLibrary.default.items) { item in
            MediaItemView(item: item)
        }
    }
}
-
16:00 - Reading with SHLibrary in a non-UI context
// Determine a user's most popular genre
let currentItems = await SHLibrary.default.items
let genres = currentItems.flatMap { $0.genres }

// Count the frequency of each genre and get the highest
let genreCounts = Dictionary(grouping: genres, by: { $0 }).mapValues(\.count)
let mostPopularGenre = genreCounts.max { $0.value < $1.value }?.key
-
16:25 - SHLibrary Remove
func remove(mediaItems: [SHMediaItem]) async throws {
    try await SHLibrary.default.removeItems(mediaItems)
}
-
16:42 - RecentDancesView with SHLibrary read and delete implementation
import SwiftUI
import ShazamKit

enum NavigationPath: Hashable {
    case nowPlayingView(videoURL: URL)
    case danceCompletionView
}

struct RecentDancesView: View {
    private enum ViewConstants {
        static let emptyStateImageName: String = "EmptyStateIcon"
        static let emptyStateTextTitle: String = "No Dances Yet?"
        static let emptyStateTextSubtitle: String = "Find some music to start learning"
        static let deleteSwipeViewOpacity: Double = 0.5
        static let matchingStateTextTopPadding: CGFloat = 24
        static let matchingStateTextBottomPadding: CGFloat = 16
        static let progressViewScaleEffect: CGFloat = 1.1
        static let progressViewBottomPadding: CGFloat = 12.0
        static let learnDanceButtonWidth: CGFloat = 250
        static let curvedTopSideRectangleHeight: CGFloat = 200
        static let listRowBottomInset: CGFloat = 30.0
        static let matchingStateText: String = "Get Ready..."
        static let notMatchingStateText: String = "Hear Music?"
        static let noMatchText: String = "No dance video for audio"
        static let navigationTitleText: String = "Recent Dances"
        static let learnDanceButtonText: String = "Learn the Dance"
        static let retryButtonText: String = "Try Again"
        static let cancelButtonText: String = "Cancel"
    }

    // MARK: Properties
    private var isListEmpty: Bool {
        SHLibrary.default.items.isEmpty
    }
    @State private var matchingState: String = ViewConstants.notMatchingStateText
    @State private var matchButtonText: String = ViewConstants.learnDanceButtonText
    @State private var canRetryMatchAttempt = false
    @State private var navigationPath: [NavigationPath] = []

    // MARK: Environment
    @EnvironmentObject private var matcher: Matcher
    @Environment(\.openURL) var openURL

    var body: some View {
        NavigationStack(path: $navigationPath) {
            ZStack(alignment: .bottom) {
                List(SHLibrary.default.items, id: \.self) { mediaItem in
                    RecentDanceRowView(mediaItem: mediaItem)
                        .onTapGesture(perform: {
                            guard let appleMusicURL = mediaItem.appleMusicURL else {
                                return
                            }
                            openURL(appleMusicURL)
                        })
                        .swipeActions {
                            Button {
                                Task {
                                    try? await SHLibrary.default.removeItems([mediaItem])
                                }
                            } label: {
                                Image(systemName: "trash")
                                    .symbolRenderingMode(.hierarchical)
                            }
                            .tint(.appPrimary.opacity(0.5))
                        }
                }
                .listStyle(.plain)
                .overlay {
                    if isListEmpty {
                        ContentUnavailableView {
                            Label(ViewConstants.emptyStateTextTitle, image: ImageResource(name: ViewConstants.emptyStateImageName, bundle: Bundle.main))
                                .font(.title)
                                .foregroundStyle(Color.white)
                        } description: {
                            Text(ViewConstants.emptyStateTextSubtitle)
                                .foregroundStyle(Color.white)
                        }
                    }
                }
                .safeAreaInset(edge: .bottom, spacing: ViewConstants.listRowBottomInset) {
                    ZStack(alignment: .top) {
                        CurvedTopSideRectangle()
                        VStack {
                            Text(matchingState)
                                .font(.body)
                                .foregroundStyle(.white)
                                .padding(.top, ViewConstants.matchingStateTextTopPadding)
                                .padding(.bottom, ViewConstants.matchingStateTextBottomPadding)
                            if matcher.isMatching {
                                ProgressView()
                                    .progressViewStyle(.circular)
                                    .tint(.appPrimary)
                                    .scaleEffect(x: ViewConstants.progressViewScaleEffect, y: ViewConstants.progressViewScaleEffect)
                                    .padding(.bottom, ViewConstants.progressViewBottomPadding)
                                Button(ViewConstants.cancelButtonText) {
                                    canRetryMatchAttempt = false
                                    matcher.stopRecording()
                                    matcher.endSession()
                                }
                                .foregroundStyle(Color.appPrimary)
                                .font(.subheadline)
                                .fontWeight(.semibold)
                            } else {
                                Button {
                                    Task {
                                        await matcher.match()
                                    }
                                    matchingState = ViewConstants.matchingStateText
                                    canRetryMatchAttempt = true
                                } label: {
                                    Text(matchButtonText)
                                        .foregroundStyle(.black)
                                        .font(.title3)
                                        .fontWeight(.heavy)
                                        .frame(maxWidth: .infinity)
                                }
                                .frame(width: ViewConstants.learnDanceButtonWidth)
                                .padding()
                                .background(Color.appPrimary)
                                .clipShape(Capsule())
                            }
                        }
                    }
                    .edgesIgnoringSafeArea(.bottom)
                    .frame(height: ViewConstants.curvedTopSideRectangleHeight)
                }
            }
            .background(Color.appSecondary)
            .navigationTitle(isListEmpty ? "" : ViewConstants.navigationTitleText)
            .preferredColorScheme(.dark)
            .toolbarColorScheme(.dark, for: .navigationBar)
            .navigationBarTitleDisplayMode(.large)
            .toolbarBackground(Color.appSecondary, for: .navigationBar)
            .frame(maxHeight: .infinity)
            .onChange(of: matcher.currentMatchResult, { _, result in
                guard navigationPath.isEmpty else {
                    print("Dance video already displayed")
                    return
                }
                guard let match = result?.match,
                      let url = ResourcesProvider.videoURL(forFilename: match.mediaItems.first?.videoTitle ?? "") else {
                    matchingState = canRetryMatchAttempt ? ViewConstants.noMatchText : ViewConstants.notMatchingStateText
                    matchButtonText = canRetryMatchAttempt ? ViewConstants.retryButtonText : ViewConstants.learnDanceButtonText
                    return
                }
                canRetryMatchAttempt = false
                // Add the video playing view to the navigation stack.
                navigationPath.append(.nowPlayingView(videoURL: url))
            })
            .navigationDestination(for: NavigationPath.self, destination: { newNavigationPath in
                switch newNavigationPath {
                case .nowPlayingView(let videoURL):
                    NowPlayingView(navigationPath: $navigationPath, nowPlayingViewModel: NowPlayingViewModel(player: AVPlayer(url: videoURL)))
                case .danceCompletionView:
                    DanceCompletionView(navigationPath: $navigationPath)
                }
            })
            .onAppear {
                if AVAudioSession.sharedInstance().category != .ambient {
                    Task.detached {
                        try? AVAudioSession.sharedInstance().setCategory(.ambient)
                    }
                }
                matchingState = ViewConstants.notMatchingStateText
                matchButtonText = ViewConstants.learnDanceButtonText
            }
        }
    }
}
-
20:23 - Filtering for specific media items
func match(from televisionShowCatalog: SHCustomCatalog) async -> [SHMatchedMediaItem] {
    let managedSession = SHManagedSession(catalog: televisionShowCatalog)
    let result = await managedSession.result()

    if case .match(let match) = result {
        // Filter for only media items related to a particular episode
        let filteredMediaItems = match.mediaItems.filter { $0.title == "Episode 2" }
        return filteredMediaItems
    }

    return []
}
-