VNRecognizedText returns wrong bounding box

Created Dec ’23

I am trying to parse text from an image, split it into words, and store the words in a String array. Additionally, I want to store the bounding box of each recognized word.

My code works, but for some reason the bounding boxes of words that are separated by an apostrophe rather than a space come out wrong.

Here is the complete code of my VNRecognizeTextRequest completion handler:

let request = VNRecognizeTextRequest { request, error in
    guard let observations = request.results as? [VNRecognizedTextObservation] else {
        return
    }
    // split recognized text into words and store each word with corresponding observation
    let wordObservations = observations.flatMap { observation in
        observation.topCandidates(1).first?.string.unicodeScalars
            .split(whereSeparator: { CharacterSet.letters.inverted.contains($0) })
            .map { (observation, $0) } ?? []
    }
    // store recognized words as strings
    recognizedWords = wordObservations.map { _, word in String(word) }
    // calculate bounding box for each word
    recognizedWordRects = wordObservations.map { observation, word in
        guard let candidate = observation.topCandidates(1).first else { return .zero }
        // the word's indices are indices into candidate.string
        let stringRange = word.startIndex..<word.endIndex
        guard let rect = try? candidate.boundingBox(for: stringRange)?.boundingBox else { return .zero }
        let bottomLeftOriginRect = VNImageRectForNormalizedRect(rect, Int(captureRect.width), Int(captureRect.height))
        // adjust coordinate system to start in top left corner
        let topLeftOriginRect = CGRect(
            origin: CGPoint(x: bottomLeftOriginRect.minX,
                            y: captureRect.height - bottomLeftOriginRect.height - bottomLeftOriginRect.minY),
            size: bottomLeftOriginRect.size)
        print("BoundingBox for word '\(String(word))': \(topLeftOriginRect)")
        return topLeftOriginRect
    }
}
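
To rule out the coordinate conversion as the cause, that step can be checked on its own. This is a minimal sketch using Foundation only. `topLeftPixelRect(fromNormalized:imageSize:)` is a hypothetical helper, not a Vision API. It is meant to mirror the scaling done by `VNImageRectForNormalizedRect` and the y-flip from the handler above, and the input values are made up for illustration.

```swift
import Foundation

// Hypothetical helper (not part of Vision): converts a normalized rect with a
// bottom-left origin into a pixel rect with a top-left origin.
func topLeftPixelRect(fromNormalized rect: CGRect, imageSize: CGSize) -> CGRect {
    // scale the normalized rect up to pixel coordinates
    let width = rect.width * imageSize.width
    let height = rect.height * imageSize.height
    let x = rect.minX * imageSize.width
    let yFromBottom = rect.minY * imageSize.height
    // flip the y axis so the origin is the top-left corner
    return CGRect(x: x,
                  y: imageSize.height - height - yFromBottom,
                  width: width,
                  height: height)
}

// Made-up input: a box covering the middle half of the width,
// starting halfway up a 400 x 200 image.
let flipped = topLeftPixelRect(
    fromNormalized: CGRect(x: 0.25, y: 0.5, width: 0.5, height: 0.25),
    imageSize: CGSize(width: 400, height: 200))
// x: 100, y: 50, width: 200, height: 50
print(flipped)
```

The conversion is applied to each rect independently, so it cannot make two different input rects equal. If two words end up with the same output rect, they already had the same normalized rect before this step.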

Here's an example of what's happening. When I process the following image:

the code above produces the following output:

BoundingBox for word 'In': (23.00069557577264, 5.718113962610181, 45.89460636656961, 32.78087073878238)
BoundingBox for word 'un': (71.19064286904202, 6.289275587192936, 189.16024359557852, 34.392966621800475)
BoundingBox for word 'intervista': (71.19064286904202, 6.289275587192936, 189.16024359557852, 34.392966621800475)
BoundingBox for word 'del': (262.64622870703477, 8.558512219726875, 54.733978711037985, 32.79967358237818)

Notice how the bounding boxes of the words 'un' and 'intervista' are exactly the same. This happens consistently for words that are separated by an apostrophe. Why is that?
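
To check whether the splitting step is at fault, it can be run without Vision. This is a minimal sketch using Foundation only. It assumes the recognized line reads "In un'intervista del", which is a guess based on the output above and not the actual candidate string. It prints the scalar offsets of each word produced by the same `split` call.

```swift
import Foundation

// Assumed text of the recognized line (guessed from the printed output).
let line = "In un'intervista del"
let scalars = line.unicodeScalars

// Same split as in the handler: every non-letter scalar is a separator.
let words = scalars.split(whereSeparator: { CharacterSet.letters.inverted.contains($0) })

// Record each word with its start and end offset within the line.
let wordOffsets: [(word: String, start: Int, end: Int)] = words.map { word in
    (word: String(word),
     start: scalars.distance(from: scalars.startIndex, to: word.startIndex),
     end: scalars.distance(from: scalars.startIndex, to: word.endIndex))
}

for entry in wordOffsets {
    print(entry.word, entry.start, entry.end)
}
// 'un' covers offsets 3..<5 and 'intervista' covers 6..<16
```

Under this assumption, 'un' and 'intervista' get different, non-overlapping ranges. The identical rectangles therefore seem to come from what `boundingBox(for:)` returns for those ranges, not from the split.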

Thank you for any help

Elias
