The TabularData framework lets you import, organize, and export a table of data. It’s great when you’re training a machine learning model, but it’s a handy tool in many other scenarios as well.
General:
DevForums tag: TabularData
TabularData framework documentation
Explore and manipulate data in Swift with TabularData tech talk
For a ‘hello world’ style example, see this DevForums post
Share and Enjoy
—
Quinn “The Eskimo!” @ Developer Technical Support @ Apple
let myEmail = "eskimo" + "1" + "@" + "apple.com"
TabularData
Import, organize, and prepare a table of data to train a machine learning model.
Posts under TabularData tag
6 Posts
Hi,
In Xcode 14 I was able to train linear regression models with Create ML using large CSV files (I tested on about 30,000 items and 5 features).
However, in Xcode 15 (I tested on 15.0.1 and 15.1), the training stays in the "Processing" state indefinitely.
When using a dataset with 900 items, everything works fine.
I filed a feedback for this issue: FB13516799.
Does anybody else have this issue / can reproduce it?
I'm building up a data frame for the sole purpose of using that lovely textual grid output. I'm getting output without any issue, but I'm trying to sort out how I might apply a formatter to a specific column so that print(dataframeInstance) "just works" nicely. In my use case, I'm running a function, collecting its output - appending that into a frame, and then using TabularData to get a nice output in a unit test, so I can see the patterns within the output.
I found https://developer.apple.com/documentation/tabulardata/column/description(options:), but wasn't able to find any way to "pre-bind" that to a dataframe Column when I was creating it. (I have some double values that get a bit "excessive" in length due to the joys of floating point rounding)
Is there a way of setting a formatter on a column at creation time, or after (using a property) that could basically use the same pattern as that description method above?
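One possible workaround, sketched here rather than offered as a confirmed answer: instead of binding a formatter to the column, make a copy of the frame for display and convert the noisy Double column to pre-formatted strings with transformColumn before printing. The column names ("input", "output") and the three-decimal format are assumptions chosen for illustration.

```swift
import Foundation
import TabularData

// Hypothetical sample data; 0.1 + 0.2 shows the usual floating-point noise.
var frame = DataFrame()
frame.append(column: Column(name: "input", contents: [1, 2, 3]))
frame.append(column: Column(name: "output", contents: [0.1 + 0.2, 1.0 / 3.0, 2.0 / 3.0]))

// Work on a copy so the original Double values stay available for calculations.
var display = frame

// Replace the Double column with a String column rounded to three decimals.
display.transformColumn("output") { (value: Double) in
    String(format: "%.3f", value)
}

print(display)
```

This changes the column type in the display copy only, so it suits a unit test that exists to eyeball the output; it does not attach a formatter to the column itself.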
Can anyone show me some sample codes of the following function?
func aggregated<Element, Result>(
on columnNames: [String],
naming: (String) -> String,
transform: (DiscontiguousColumnSlice<Element>) throws -> Result?
) rethrows -> DataFrame
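A minimal sketch of how this function might be called, assuming it is invoked on the result of grouped(by:) and that the closure receives a DiscontiguousColumnSlice<Element> of optional values for each group. The frame contents, the column names, and the " total" suffix are made up for illustration.

```swift
import TabularData

// Hypothetical data: three cars from two makes.
var cars = DataFrame()
cars.append(column: Column(name: "make", contents: ["Ford", "Chevy", "Chevy"]))
cars.append(column: Column(name: "price", contents: [3000.0, 4900.0, 5000.0]))

// Group the rows by make, then sum the "price" column within each group.
let grouped = cars.grouped(by: "make")
let totals = grouped.aggregated(
    on: ["price"],
    naming: { columnName in "\(columnName) total" }
) { (slice: DiscontiguousColumnSlice<Double>) -> Double? in
    // The slice yields optional values; skip the missing ones before summing.
    slice.compactMap { $0 }.reduce(0, +)
}

print(totals)
```

Annotating the closure parameter with DiscontiguousColumnSlice<Double> is what lets the compiler infer both Element and Result.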
I am working with data in Swift using the TabularData framework. I load data from a CSV file into a DataFrame, then copy the data into a second DataFrame, and finally remove a row from the second DataFrame.
The problem arises when I try to remove a row from the second DataFrame, at which point I receive an EXC_BAD_ACCESS error. However, if I modify the "timings" column (the final column) before removing the row (even to an identical value), the code runs without errors.
Interestingly, this issue only occurs when a row in the column of the CSV file contains more than 15 characters.
This is the code I'm using:
func loadCSV() {
    let documentsDirectory = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
    let url = documentsDirectory.appendingPathComponent("example.csv")
    var dataframe: DataFrame
    do {
        dataframe = try .init(
            contentsOfCSVFile: url,
            columns: ["user", "filename", "syllable count", "timings"],
            types: ["user": .string, "filename": .string, "syllable count": .integer, "timings": .string]
        )
    } catch {
        fatalError("Failed to load csv data")
    }
    print("First data frame",dataframe, separator: "\n") /// This works
    var secondFrame = DataFrame()
    secondFrame.append(column: Column<String>(name: "user", capacity: 1000))
    secondFrame.append(column: Column<String>(name: "filename", capacity: 1000))
    secondFrame.append(column: Column<Int>(name: "syllable count", capacity: 1000))
    secondFrame.append(column: Column<String>(name: "timings", capacity: 1000))
    for row in 0..<dataframe.rows.count {
        secondFrame.appendEmptyRow()
        for col in 0..<4 {
            secondFrame.rows[row][col] = dataframe.rows[row][col]
        }
    }
    // secondFrame.rows[row][3, String.self] = String("0123456789ABCDEF") /* If we include this line, it will not crash, even though the content is the same */
    print("Second data frame before removing row",dataframe, separator: "\n") // Before removal
    secondFrame.removeRow(at: 0)
    print("Second data frame after removing row",dataframe, separator: "\n") // After removal - we will get Thread 1: EXC_BAD_ACCESS here. The line will still print, however
}
and the csv (minimal example):
user,filename,syllable count,timings
john,john-001,12,0123456789ABCDEF
jane,jane-001,10,0123456789ABCDE
I've been able to replicate this bug on macOS and iOS using minimal projects. I'm unsure why this error is occurring and why modifying the "timings" column prevents it.
It should be noted that this same error occurs with a single data frame loaded from a CSV file, which means that I basically cannot load from CSV if I want to modify the DataFrame afterwards.
I'm fairly new to Swift programming so I might be overlooking something, but I'm puzzled why the following code doesn't properly insert a row in a DataFrame. The goal is to move a row at a given index to a new index. I would normally:
Copy the row that I want to move
Remove the row from the original dataset
Insert the copy to the new position
The CSV I'm using is from Wikipedia:
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL! air, moon roof, loaded",4799.00
My code (Swift playground):
import Foundation
import TabularData
let fileUrl = Bundle.main.url(forResource: "data", withExtension: "csv")
let options = CSVReadingOptions(hasHeaderRow: true, delimiter: ",")
var dataFrame = try! DataFrame(contentsOfCSVFile: fileUrl!, options: options)
print("Original data")
print(dataFrame)
let rowToMove: Int = 2
let row = dataFrame.rows[rowToMove]
print("Row to move")
print(row)
dataFrame.removeRow(at: rowToMove)
print("After removing")
print(dataFrame)
dataFrame.insert(row: row, at: 0)
print("After inserting")
print(dataFrame)
This results in the following:
Original data
┏━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ ┃ Year ┃ Make ┃ Model ┃ Description ┃ Price ┃
┃ ┃ <Int> ┃ <String> ┃ <String> ┃ <String> ┃ <Double> ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ 0 │ 1,997 │ Ford │ E350 │ ac, abs, moon │ 3,000.0 │
│ 1 │ 1,999 │ Chevy │ Venture "Extended Edition" │ │ 4,900.0 │
│ 2 │ 1,999 │ Chevy │ Venture "Extended Edition, Very Large" │ │ 5,000.0 │
│ 3 │ 1,996 │ Jeep │ Grand Cherokee │ MUST SELL! air, moon roof, loaded │ 4,799.0 │
└───┴───────┴──────────┴────────────────────────────────────────┴───────────────────────────────────┴──────────┘
4 rows, 5 columns
Row to move
┏━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ ┃ Year ┃ Make ┃ Model ┃ Description ┃ Price ┃
┃ ┃ <Int> ┃ <String> ┃ <String> ┃ <String> ┃ <Double> ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ 2 │ 1,999 │ Chevy │ Venture "Extended Edition, Very Large" │ │ 5,000.0 │
└───┴───────┴──────────┴────────────────────────────────────────┴─────────────┴──────────┘
1 row, 5 columns
After removing
┏━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ ┃ Year ┃ Make ┃ Model ┃ Description ┃ Price ┃
┃ ┃ <Int> ┃ <String> ┃ <String> ┃ <String> ┃ <Double> ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ 0 │ 1,997 │ Ford │ E350 │ ac, abs, moon │ 3,000.0 │
│ 1 │ 1,999 │ Chevy │ Venture "Extended Edition" │ │ 4,900.0 │
│ 2 │ 1,996 │ Jeep │ Grand Cherokee │ MUST SELL! air, moon roof, loaded │ 4,799.0 │
└───┴───────┴──────────┴────────────────────────────┴───────────────────────────────────┴──────────┘
3 rows, 5 columns
After inserting
┏━━━┳━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ ┃ Year ┃ Make ┃ Model ┃ Description ┃ Price ┃
┃ ┃ <Int> ┃ <String> ┃ <String> ┃ <String> ┃ <Double> ┃
┡━━━╇━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ 0 │ 1,999 │ Chevy │ │ │ 5,000.0 │
│ 1 │ 1,997 │ Ford │ E350 │ ac, abs, moon │ 3,000.0 │
│ 2 │ 1,996 │ Jeep │ Grand Cherokee │ MUST SELL! air, moon roof, loaded │ 4,799.0 │
│ 3 │ nil │ nil │ nil │ nil │ nil │
└───┴───────┴──────────┴────────────────────────────────────────┴───────────────────────────────────┴──────────┘
4 rows, 5 columns
Everything is fine up until inserting. I spot a few issues:
A row gets deleted (original data row 1)
A row filled with nils is added (at index 3)
The row I want to insert isn't properly inserted (notice how the 'Model' text has gone).
I assume I'm missing something - does it have to do with the row copy keeping its index (2)? How can I fix this?
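One way to sidestep the question of what the saved row still points to after the removal is to avoid remove-then-insert altogether. The sketch below computes the new row order first and then fills a copy of the frame from the untouched original, cell by cell. The helper name movingRow(in:from:to:) and the sample data are hypothetical, and this is one possible approach rather than the framework's recommended one.

```swift
import TabularData

// Hypothetical helper: returns a copy of `frame` with one row moved.
func movingRow(in frame: DataFrame, from source: Int, to destination: Int) -> DataFrame {
    // Work out the new order of the original row indices.
    var order = Array(0..<frame.rows.count)
    let moved = order.remove(at: source)
    order.insert(moved, at: destination)

    // Start from a copy and overwrite only the rows whose position changed,
    // always reading from the unmodified original frame.
    var result = frame
    for (target, original) in order.enumerated() where target != original {
        for column in 0..<frame.columns.count {
            result.rows[target][column] = frame.rows[original][column]
        }
    }
    return result
}

// Usage with made-up data: move the row at index 2 to the front.
var sample = DataFrame()
sample.append(column: Column(name: "id", contents: [0, 1, 2, 3]))
sample.append(column: Column(name: "make", contents: ["Ford", "Chevy", "Chevy", "Jeep"]))

let reordered = movingRow(in: sample, from: 2, to: 0)
print(reordered)
```

Because the original frame is only read, no row value has to stay valid across a mutation.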
I have tried multiple playgrounds and consistently get the same error in every playground I create. There is a tabular data playground that does work, but I can't see what it does that I am not doing.
Here is the code that fails with
Error: cannot find 'MLDataTable' in scope
/* code start */
import CoreML
import Foundation
import TabularData
let jsonFile = Bundle.main.url(forResource: "sentiment_analysis", withExtension: "json")!
let tempTable = try DataTable
let dataTable = try MLDataTable(contentsOf: jsonFile)
print(dataTable)
/* code end */
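A likely cause is that MLDataTable is declared in the Create ML framework, not in Core ML or TabularData, so none of the imports above bring it into scope. If the goal is only to load the JSON into a table, TabularData's own DataFrame can do that without MLDataTable. The sketch below uses inline JSON with made-up "text" and "label" fields instead of the sentiment_analysis.json file, whose contents are not shown in the post.

```swift
import Foundation
import TabularData

// Hypothetical stand-in for the contents of sentiment_analysis.json.
let json = """
[
  {"text": "I loved it", "label": "positive"},
  {"text": "Not for me", "label": "negative"}
]
"""

// Parse the JSON array of objects into a DataFrame, one row per object.
let reviews = try DataFrame(jsonData: Data(json.utf8))
print(reviews)
```

For a file in the playground's resources, DataFrame(contentsOfJSONFile:) takes the file URL in the same way.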