Streaming is available in most browsers,
and in the Developer app.
-
Create custom catalogs at scale with ShazamKit
Learn how ShazamKit can help you build custom catalogs and support exact matching of any audio source within your app — all on-device. Find out how you can easily generate audio signatures and build catalogs at scale through the new ShazamKit CLI. We'll also show you how you can quickly update your app to sync with large amounts of audio content like multiple seasons of a TV show or multiple episodes of a podcast, and we'll share updates to the ShazamKit API and SHMediaItems to help your apps respond precisely to key moments in audio sources using time ranges. For more on ShazamKit, we recommend watching "Explore ShazamKit" and "Create custom audio experiences with ShazamKit" from WWDC21.
Resources
Related Videos
WWDC23
WWDC21
-
Download
♪ instrumental hip hop music ♪ Hello, I'm Neil Foley, an engineer on the ShazamKit team. In 2021, we introduced ShazamKit, allowing you to match audio against Shazam's vast catalog of recorded music. We also introduced custom catalog matching, giving developers the ability to match their own audio and provide synced experiences. Now we have some important updates that streamline working with custom catalogs at scale. In this session, I'm going to use some of the existing ShazamKit concepts such as signatures, catalogs, and media items. If you're not already familiar with those, check out the "Explore ShazamKit” and "Create custom audio experiences with ShazamKit" talks from WWDC21. But as quick overview, ShazamKit lets you convert audio into a special format that can be matched. We call these signatures. Signatures can be combined with media items containing metadata to form a reference signature. And reference signatures can be stored together in a file that we call a custom catalog. Now that we are all caught up, I'll take you through building custom catalogs at scale, and then I'll talk about some tips and tricks to make great catalogs. In today's custom catalog workflow, if you have a small amount of content you want to be matched, working with custom catalogs can be a simple task. You just need to follow these steps. Record your audio in a format that ShazamKit accepts.
Use the signature generator to transform it into a signature.
Annotate it with your metadata, and then store it in a custom catalog. And that's it, you can provide a Shazam experience. But some of those steps can be daunting, especially if you're not familiar with audio programming. Dealing with sample rates and buffers can be tricky even for the most experienced developer. And what happens when you have a vast amount of content you'd like to make Shazamable, like 10 seasons of a TV show? This workflow can become painful.
And if you have large amounts of content, it can quickly become unmanageable. If you're thinking of improving this workflow for yourself, you'll probably need to write code to transform audio into signatures, more code to load and associate media items, and each time you change your content, you'll have to repeat the work. This is a big investment when you just want to match some audio. And then if you want to sync content with ShazamKit, you need complicated logic to figure out what should be shown and when. I'll introduce some great enhancements to ShazamKit that streamline this workflow. But first a quick demo. Here I have the FoodMath app that Alex demonstrated in 2021 that syncs a maths quiz with an on screen lesson. I've updated it with the latest ShazamKit features, and I'm going play back the FoodMath video to see how it syncs.
Skip to 26 seconds.
2, 3 green apples. How many apples do I have in total? Your timer starts…now. Okay, time's up. Let's see how you did. Skip to 56 seconds. Today, to spice it up a bit, when I went to the shop, started with 2 red apples... and I bought 2 green apples. How many apples did I have in total this time? Your timer starts…now.
Okay, time's up. Seems to be working great.
There's rich content synced with the video and when I said "now," the menu appeared at exactly the right time. Also, when content was no longer relevant, it disappeared right on cue. But how does it work? Let's have a look at the code. There's just a simple loop. It uses an AsyncSequence on the session instead of the delegate callbacks that we used before. The sequence returns an enum representing match, no match, or error. I'm only interested in matches, so I've restricted the loop to just that case. And to build the result for display, I reduce the media items to the content that I need.
There's actually not much more to see in the app, just SwiftUI views that are driven by the matchResult that we create. There's no complicated logic or timing code and it syncs perfectly. So the question remains, how does it sync so well? FoodMaths' secret is the rich custom catalog that drives the experience. I created the catalog with a simple tool that we've built to complement ShazamKit, and you can use it too to create rich experiences in your own apps. The Shazam CLI ships as part of macOS 13 and provides an easy way to sync content. It can help to automate some of the repetitive tasks associated with creating custom catalogs. Let's update the custom catalog that I just showed you. Time for another demo.
Here's a folder containing the FoodMath video file, and here's my terminal in the same folder. I'll use the CLI to convert the video into a signature using the signature command.
I just pass the video file as input and specify our signature output.
Okay. There's our signature.
Now I want combine this signature with media items to make a custom catalog. The CLI accepts a simple comma separated file for describing media items that I'll copy here.
It describes everything that I need to sync my content.
Here's where I've specified my titles, and here's a custom JSON field I've defined for the equation. The headers map to media item properties. For details on the mapping, run the custom catalog create command with the help flag.
It describes the relationship between the csv headers and media item properties. Now I want to combine them together into a custom catalog. So I'll run the create command.
I pass in the signature file and the csv file and it outputs a catalog.
Okay, now we have our catalog. Excitingly, I have early access to the latest FoodMath episode, so I want to add that to our catalog file. Let me copy the files here.
Here's our media items for our new episode.
I'll run the update command passing in the video, the new media, and the catalog to update.
Okay, we've updated our catalog. That's a quick overview of how to create catalogs, but if you're like me, you'll really you'll want to script this.
The FoodMath app actually has quite a few new episodes, and I want to add them all to this catalog. I've written a really simple script that loops through all the episode folders and combines them into a custom catalog. I'll run it now.
There we go. We now have one catalog representing every FoodMath episode and the script used the display command to detail what's inside the catalog. I think we have everything. The foodmath project is already referencing our new catalog. So let's build and run so that we can enjoy doing some maths.
Skip to 30 seconds. How many apples do I have in total? Your timer starts…now. Okay, time's up. Let's see how you did. I like that guy. That's a great episode. What about a new episode? Let's try that.
Skip to 15 seconds. Over the years, I explored what makes a guacamole truly delicious, and I wrote down my favorite guacamole recipe. It calls for 4 avocados. Tomorrow my friend is visiting. So for the two of us, I only need to make half of the portion. How many avocados do I need? The timer starts…now. That's correct. You need two avocados. Let's make this guacamole together. Let's give this a try.
Mmm. That turned out to be great. I hope you had some fun and see you next time.
Oh! They have a new host. Interesting. Anyway, I've created a rich synced experience in no time at all. The Shazam CLI supports a rich set of commands. Let's go over them.
You can create a signature from any media file that has an audio track. You can create custom catalogs by combining signatures and media items. You can display a catalog's content. Add, remove, and export both signatures and media items. Next, on to how the CLI created the signatures from the FoodMath videos.
SHSignatureGenerator now has a method signatureFromAsset that's available on all platforms. With this method, there's no more manually pulling audio buffers from media. Simply pass an AVAsset with an audio track to turn it into a signature. If you have multiple tracks in your asset, they'll be mixed together ensuring the signature captures everything. Okay, now that I have a signature that represents the media, how did I accurately sync content? I used the Timed MediaItem API. Attaching a time range to the media item makes it easy to specify when it starts and when it ends. Media items can also have multiple time ranges to target more than one portion of a signature. Imagine that you have a media item that targets the chorus of a song. You can add a time range for each place it's sung.
Specifying the time ranges is only useful if you're notified when they start and when they end. ShazamKit will deliver a match callback synced with the time range, one when it starts and one when it ends. Signatures can contain many media items, so this callback will contain only the media items that are in range at that specific point in time. There's a few simple rules for which media items will be returned in a callback and their order, so let's go over them.
Media items outside of their time range will not be returned. Media items within their time range will be returned, with the most recent events coming first.
And finally, media items with no time ranges will always be returned last, but they will be unordered. Media items that have no time range can be a great place to store global information that applies to the whole reference signature. In my FoodMath example, I used it to store the name of the episode. It appears when no other media items are in range.
One final point, if all your media items have time ranges and none of them are in scope, ShazamKit will always return a media item with basic match information. This way, you will always get important properties, such as the predictedCurrentMatch offset and the frequencySkew.
And in code, it's easy too. Timed media items are created by specifying the timeRanges media item property. It's an array of Swift ranges. It can also be read back using the timeRanges property. And for Objective-C programmers, there's a new SHRange class as a drop in replacement. Now that you've seen how to build them, let's explore some tips and tricks to make great custom catalogs. Avoid creating many small signatures for one piece of media. A signature is a one to one mapping to the media that it represents, so for each piece of audio you have, be it from a song or video, create one signature for the entire duration.
A longer signature provides more opportunities for ShazamKit to match audio peaks, resulting in better accuracy. It also avoids issues with query signatures overlapping multiple reference signatures.
Using the new Timed MediaItem API, you can target synced content to individual areas. There's no need to divide a piece of audio into multiple signatures. I showed an example where we had one piece of media, but with multiple media items. But what should we do if we have a huge amount of content that we want to make Shazamable? How should we split it up? There's a trade-off you need to make when splitting your content across custom catalogs. If you create individual catalogs for each media asset, you'll need to know which piece of audio is being played so that you can load the correct catalog. And if you put them all together in one catalog, you'll have a larger download and use more memory, but you can match many more pieces of audio. Our advice is to keep the catalog files you create tightly focused. For example, a catalog per music track or the whole album, but not the artist's whole discography. Keeping things separate means that you can decide what to load at runtime. You can do that with the custom catalog add API.
Try it out and see if helps with your use case. If you have multiple audio assets that sound the same, maybe a show that always starts with the same intro music, and you want to provide a custom experience for each episode, or a song that's sampled in another track, maybe consider using frequency skew as a differentiator. Skewing audio is raising or lowing the frequencies in the recording. When you do this, you affect how the audio sounds, but if you do it by a small enough amount, it can be noticed by ShazamKit but not by the average human ear.
So if we take an audio recording, make a custom catalog from it, and then play it back with the frequencies slightly shifted: ShazamKit will still match the audio, and it will also report the skew amount through the frequencySkew property. Here's how to do that in code.
There are limits to how much you can skew audio without the change becoming either noticeable to the human ear or unrecognizable to ShazamKit.
Keeping the skew to less than 5 percent should be safe and provide enough room to differentiate multiple skewed recordings. To really take advantage of this, use the frequencySkew ranges. Media items will only be returned if they fall inside the specified skew ranges.
The range specifies as a percentage how much the audio differs from the original. A value of 0 indicates the audio is unskewed and a value of .01 indicates a 1 percent skew. You can access the property on media items using the frequencySkewRanges property.
I'll go over the steps to get this working in your app: First create a reference signature of your original audio recording. Then take a media item and restrict it by frequency skew to 3 to 4 percent. Place this inside your custom catalog.
Now play back the audio skewed by 3 to 4 percent, and your media item will be returned. Playing back the audio unskewed or skewed outside of the range will not return your media item. That's frequency skewing.
Now that you've seen the exciting updates to ShazamKit this year, you're ready to make some amazing synced experiences. So remember these best practices: First, create one signature per media asset. You'll get better accuracy from ShazamKit and simpler creation pipeline. Create your signatures with SHSignatureGenerators signatureFromAsset. It accepts a wide variety of media, meaning you no longer have to deal with low level audio details.
Target synced content to areas of interest with the new Timed MediaItem API. It combines a simple API with excellent accuracy. And finally let the Shazam CLI streamline the way you create custom catalogs. It's been designed take away the hassle of dealing with vast amounts of media and let you focus on the great experiences you want to make instead. I hope you enjoyed the latest updates to ShazamKit, and I'm excited to see you make everything Shazamable. All of the information we discussed and links to documentation are attached to this session. Thanks for joining. Enjoy the rest of WWDC22. ♪ ♪
-
-
4:26 - Food Math Matcher
/* See LICENSE folder for this sample’s licensing information. Abstract: The model that is responsible for matching against the catalog and update the SwiftUI Views. */ import ShazamKit import AVFAudio struct MatchResult { var title: String? var equation: Equation? var episode: Episode? var answerRange: ClosedRange<Int>? var hasContent: Bool { equation != nil || title != nil || answerRange != nil } } class Matcher: NSObject, ObservableObject, SHSessionDelegate { @Published var matchResult: MatchResult? private var session: SHSession! private let audioEngine = AVAudioEngine() private var matchingTask: Task<Void, Never>? = nil func match(catalog: SHCustomCatalog) throws { session = SHSession(catalog: catalog) session.delegate = self let audioFormat = AVAudioFormat(standardFormatWithSampleRate: audioEngine.inputNode.outputFormat(forBus: 0).sampleRate, channels: 1) audioEngine.inputNode.installTap(onBus: 0, bufferSize: 2048, format: audioFormat) { [weak session] buffer, audioTime in session?.matchStreamingBuffer(buffer, at: audioTime) } try AVAudioSession.sharedInstance().setCategory(.record) AVAudioSession.sharedInstance().requestRecordPermission { [weak self] success in guard success, let self = self else { return } Task.detached { try? self.audioEngine.start() } } Task { @MainActor in for await case .match(let match) in session.results { self.matchResult = match.matchResult } } } } extension SHMatch { var matchResult: MatchResult { mediaItems.reduce(into: MatchResult()) { result, mediaItem in result.title = result.title ?? mediaItem.title result.episode = result.episode ?? mediaItem.episode result.equation = result.equation ?? mediaItem.equation result.answerRange = result.answerRange ?? mediaItem.answerRange } } }
-
13:51 - Timed Media Items
// Restrict this media item to only describe the first 5 seconds let mediaItem = SHMediaItem(properties: [ .title: "Title", .timeRanges:[0.0..<5.0] ]) let timeRanges: [Range<TimeInterval>] = mediaItem.timeRanges
-
16:02 - Combine Catalogs
let parentCatalog = SHCustomCatalog() parentCatalog.add(from: URL(fileURLWithPath: "/path/to/Episode1.shazamcatalog")) parentCatalog.add(from: URL(fileURLWithPath: "/path/to/Episode2.shazamcatalog")) parentCatalog.add(from: URL(fileURLWithPath: "/path/to/Episode3.shazamcatalog"))
-
16:58 - Frequency Skew
func within(range: Range<Float>, for matchedMediaItem: SHMatchedMediaItem) -> Bool { range.contains(matchedMediaItem.frequencySkew) }
-
17:21 - Frequency Skew Ranges
// Restrict this media item to only describe the first 5 seconds let mediaItem = SHMediaItem(properties: [ .title: “Frequency Skewed Audio”, .frequencySkewRanges:[0.01..<0.02] ]) let frequencySkewRanges: [Range<Float>] = mediaItem.frequencySkewRanges
-
-
Looking for something specific? Enter a topic above and jump straight to the good stuff.
An error occurred when submitting your query. Please check your Internet connection and try again.