Dataset
Learn how to upload datasets using the Python SDK with pandas DataFrames, CSV files, or Excel sheets.
The Dataset class in the Python SDK provides powerful methods to upload data from various sources. Choose the upload method that best matches your data format and workflow.
- Upload pandas DataFrames directly without saving to disk
- Upload CSV files from your local file system
- Upload specific sheets from Excel workbooks
SDK Upload Methods
upload_dataframe(dataframe: DataFrame, name: str) -> Dataset | None
Upload a pandas DataFrame directly using the SDK client.
Parameters:
- dataframe: pandas.DataFrame to upload
- name: User-friendly dataset name
Returns:
- Dataset object on success (with id populated)
- None on failure
Example:
import pandas as pd
from fount import Fount
# Initialize SDK client
client = Fount(api_key="your-api-key")
# Create or load your DataFrame
df = pd.read_csv('local_data.csv')
# Upload using SDK
dataset = client.upload_dataframe(df, name="Sales Data")
if dataset:
    print(f"Upload successful! Dataset ID: {dataset.id}")
Use Cases:
- When you've already processed data in pandas
- For dynamic data generation in your Python scripts
- When working with in-memory transformations
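The use cases above share one pattern: the data exists only in memory and goes straight to upload_dataframe. A minimal sketch of that pattern follows, assuming the raw rows arrive as a list of dicts. The helper names build_region_summary and upload_summary and the region/revenue columns are hypothetical, chosen only for illustration; client stands for an initialized Fount client as in the example above.

```python
import pandas as pd


def build_region_summary(rows):
    """Aggregate raw sales rows into one row per region, entirely in memory."""
    df = pd.DataFrame(rows)
    # as_index=False keeps "region" as a regular column in the result.
    return df.groupby("region", as_index=False)["revenue"].sum()


def upload_summary(client, rows, name):
    """Build the summary and hand it to upload_dataframe without writing a file."""
    summary = build_region_summary(rows)
    # Same call as in the example above; this page says it returns None on failure.
    return client.upload_dataframe(summary, name=name)
```

Passing the client in as an argument keeps the transformation step testable without network access.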
SDK Implementation Notes
Error Handling
All SDK upload methods return None on failure. Always check the return value before proceeding:
# Good practice: Check return value
pathname = "/absolute/path/to/data.csv"  # placeholder: path to your CSV file
dataset = client.upload_csv(pathname, name="My Data")
if dataset is None:
    print("Upload failed - check your file path and permissions")
    # Handle error appropriately
else:
    print(f"Success! Dataset ID: {dataset.id}")
    # Continue with dataset operations
Dataset Object
The SDK returns a Dataset object containing:
- id: Unique identifier for the uploaded dataset
- name: The friendly name you provided
- Additional metadata about the upload
You'll use the id field for subsequent SDK operations like querying or updating the dataset:
# Use dataset.id for further operations
results = client.query_dataset(dataset.id)
SDK Best Practices
- Client Initialization: Always initialize the SDK client with proper authentication
- Naming: Use descriptive names that clearly identify your data
- File Paths: Use absolute paths or ensure relative paths are correct
- Memory Management: For very large files, prefer upload_csv() over upload_dataframe() to avoid memory issues
- Excel Sheets: Verify sheet names exactly match (case-sensitive)
- Error Handling: Implement proper error handling for production code
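For production code, the File Paths and Error Handling practices can be combined into two small guards. This is a sketch rather than part of the SDK: checked_path, require_dataset, and UploadError are hypothetical names, and the only SDK behavior assumed is the one stated on this page, that upload methods return None on failure.

```python
from pathlib import Path


class UploadError(RuntimeError):
    """Raised when an upload call returns None instead of a Dataset."""


def checked_path(path):
    """Resolve path to an absolute path and fail early if it is not a file."""
    resolved = Path(path).expanduser().resolve()
    if not resolved.is_file():
        raise FileNotFoundError(f"No such file: {resolved}")
    return resolved


def require_dataset(dataset, what):
    """Return the dataset unchanged, or raise if the upload returned None."""
    if dataset is None:
        raise UploadError(f"Upload failed for {what}: the SDK returned None")
    return dataset


# Intended use with an initialized client (not executed here):
# path = checked_path("my_data.csv")
# dataset = require_dataset(client.upload_csv(str(path), name="My Data"), path)
```

Raising instead of printing means a failed upload stops the script at the point of failure, before any downstream call touches dataset.id.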
Troubleshooting
upload_csv
Upload failed - check your file path and permissions / dataset is None
The CSV upload did not create a Dataset object. The method returns None on failure.
Common causes: wrong local file path, file not readable, or using a relative path from the wrong working directory.
Fix:
from pathlib import Path
print(Path("my_data.csv").resolve()) # check absolute path
dataset = client.upload_csv("my_data.csv", name="My Data")
if dataset is None:
    print("Upload failed - check path and file permissions")
FileNotFoundError: [Errno 2] No such file or directory
The SDK cannot find the file at the given path.
Fix: Use an absolute path or verify the relative path from os.getcwd():
import os
print(os.getcwd()) # confirm working directory
dataset = client.upload_csv("/absolute/path/to/data.csv", name="My Data")
ParserError: Error tokenizing data / Expected X fields in line Y, saw Z
The CSV file is malformed or uses a delimiter that pandas cannot parse.
Fix: Verify the file opens cleanly in pandas before uploading:
import pandas as pd
df = pd.read_csv("my_data.csv") # must succeed before upload
print(df.head())
Re-save the file as a clean UTF-8 CSV if needed.
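One way to re-save a file as a clean UTF-8 CSV is sketched below, using only the Python standard library. It assumes the file fits in memory and uses one of a few common delimiters; resave_as_utf8_csv is a hypothetical helper, not an SDK function. csv.Sniffer guesses the delimiter from a sample and raises csv.Error when it cannot decide.

```python
import csv
import io
from pathlib import Path


def resave_as_utf8_csv(src, dst, src_encoding="utf-8-sig"):
    """Rewrite src as a comma-delimited UTF-8 CSV at dst; return the row count."""
    # "utf-8-sig" also strips a leading byte-order mark if one is present.
    text = Path(src).read_text(encoding=src_encoding)
    # Guess the delimiter from the first few KB, limited to common candidates.
    dialect = csv.Sniffer().sniff(text[:4096], delimiters=",;\t|")
    rows = list(csv.reader(io.StringIO(text), dialect))
    with open(dst, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    return len(rows)
```

If the source file uses another encoding, pass it explicitly, for example src_encoding="latin-1", and then upload the rewritten file.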
upload_dataframe
AttributeError: object has no attribute 'columns' / dataframe must be a pandas DataFrame
The object passed as dataframe is not a valid pandas DataFrame.
Fix: Load the data into pandas first:
import pandas as pd
df = pd.read_csv("my_data.csv")
assert isinstance(df, pd.DataFrame)
dataset = client.upload_dataframe(df, name="My Data")
Object of type ... is not JSON serializable / Upload failed
The DataFrame contains complex Python objects (lists, dicts, arrays, or mixed nested types) that cannot be serialized.
Fix: Check df.dtypes and convert object columns to strings, numerics, or datetimes before uploading:
print(df.dtypes)
# Convert problematic columns
df["my_col"] = df["my_col"].astype(str)
MemoryError / Kernel died / Upload failed for large dataframe
The in-memory DataFrame is too large for the upload path to handle.
Fix: Use upload_csv() for large files instead:
# Check memory usage first
df.info(memory_usage="deep")
# Use file-based upload for large data
dataset = client.upload_csv("/path/to/large_file.csv", name="Large Data")
upload_excel
Worksheet named '...' not found
The sheet name passed does not exist in the workbook (sheet names are case-sensitive).
Fix: List available sheet names and copy the exact value:
import pandas as pd
print(pd.ExcelFile("my_workbook.xlsx").sheet_names)
# Then use the exact name:
dataset = client.upload_excel("my_workbook.xlsx", sheet_name="Sheet1", name="My Data")
Unsupported format / Excel file format cannot be determined / BadZipFile
The file is not a valid Excel workbook (e.g., it is a CSV renamed to .xlsx, corrupted, or password-protected).
Fix: Open the file locally to confirm it is a real .xlsx workbook. If it is a CSV, use upload_csv() instead:
dataset = client.upload_csv("my_data.csv", name="My Data")
All Upload Methods
EmptyDataError / No columns to parse from file / Uploaded dataset has 0 rows
The dataset is empty or could not be parsed into rows and columns.
Fix: Verify the data before uploading:
print(df.shape) # must be (rows > 0, cols > 0)
print(df.head())
print(df.columns.tolist())
ValidationError: name field required / name must be a valid string
The name parameter is missing, None, blank, or not a string.
Fix: Always pass a descriptive non-empty string:
dataset = client.upload_dataframe(df, name="q4_sales_weekly")
Dataset not found / 404 / NoneType object has no attribute 'id'
The dataset reference used in a downstream call (train(), tune(), inference()) is invalid.
Fix: Print dataset.id immediately after upload to confirm the object is valid before proceeding:
dataset = client.upload_dataframe(df, name="My Data")
if dataset is None:
    raise RuntimeError("Upload failed: upload_dataframe returned None")
print(dataset.id)  # must not be None
Duplicate column names / columns overlap / ambiguous column selection
The uploaded dataset has repeated column headers, which makes column selection in training/tuning ambiguous.
Fix: Rename columns so every column is unique before uploading:
print(df.columns[df.columns.duplicated()].tolist()) # find duplicates
df.columns = [f"{c}_{i}" if df.columns.tolist().count(c) > 1 else c
              for i, c in enumerate(df.columns)]
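The snippet above renames every copy of a duplicated column, including the first, using its position. An alternative sketch that keeps the first occurrence unchanged and numbers only the repeats is shown below; make_unique is a hypothetical helper that works on a plain list of names, so it can be applied as df.columns = make_unique(list(df.columns)).

```python
def make_unique(names):
    """Append _2, _3, ... to repeated names so that every name is distinct."""
    used = set()
    result = []
    for name in names:
        candidate, suffix = name, 1
        # Keep bumping the suffix until the candidate collides with nothing so far.
        while candidate in used:
            suffix += 1
            candidate = f"{name}_{suffix}"
        used.add(candidate)
        result.append(candidate)
    return result


print(make_unique(["id", "price", "id", "id"]))  # ['id', 'price', 'id_2', 'id_3']
```

Because each candidate is checked against the names already emitted, a column that is literally called id_2 will not collide with a generated suffix.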