EDA References
Tips, patterns, cheat sheets, and a full copy-paste script for Fount EDA.
Tips and FAQs
Use everything via fount_eda.*
Call every function directly through your fount_eda instance, for example fount_eda.read_csv, fount_eda.DataFrame, fount_eda.array, and fount_eda.train_test_split. There is no need to call fount_eda.pd.read_csv or reach into submodules manually.
Finding functions quickly
Autocomplete in your IDE will surface available functions after initializing fount_eda. You can also search for a function by pattern:
```python
fount_eda.list("pattern")
```
Priority and naming collisions
Fount EDA resolves naming conflicts by preferring pandas, then NumPy, then scikit-learn by default. If you want NumPy names to take priority, set this explicitly:
```python
fount_eda.set_priority(["numpy", "pandas", "sklearn"])
```
Common error messages
- "Tool not installed": install the missing dependency (pandas, numpy, or scikit-learn) and retry.
- "Function not found": check for typos; the error message may suggest the closest valid function name.
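The lookup behaviour described in this section (pattern search, priority-based collision handling, and closest-name suggestions) can be illustrated with a small plain-Python sketch. It is not Fount EDA's implementation: the helpers search and resolve are hypothetical, substring matching for the pattern is an assumption, and difflib.get_close_matches stands in for whatever suggestion logic the library really uses. pandas.array and numpy.array provide a convenient real name collision to test against.

```python
import difflib
import warnings

import numpy as np
import pandas as pd


def _safe_getattr(module, name):
    """getattr that tolerates lazy submodules which warn or fail on import."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        try:
            return getattr(module, name)
        except Exception:
            return None


def public_callables(module):
    """Public callable names exported by a module."""
    return {n for n in dir(module)
            if not n.startswith("_") and callable(_safe_getattr(module, n))}


def search(pattern, modules):
    """Hypothetical pattern search: case-insensitive substring match."""
    pattern = pattern.lower()
    names = set().union(*(public_callables(m) for m in modules))
    return sorted(n for n in names if pattern in n.lower())


def resolve(name, modules):
    """Return `name` from the first module (in priority order) that defines it."""
    for module in modules:
        obj = _safe_getattr(module, name)
        if callable(obj):
            return obj
    # Nothing matched: suggest the closest valid names, as the error message might
    candidates = set().union(*(public_callables(m) for m in modules))
    hints = difflib.get_close_matches(name, candidates, n=3)
    raise AttributeError(f"Function not found: {name!r}. Did you mean: {', '.join(hints)}?")


default_order = [pd, np]  # pandas first, mirroring the default priority described above
numpy_first = [np, pd]

print(search("read_", [pd])[:5])
print(resolve("array", default_order) is pd.array)  # True
print(resolve("array", numpy_first) is np.array)    # True
try:
    resolve("read_cvs", default_order)
except AttributeError as err:
    print(err)
```

Reordering the module list is all it takes to change which library wins a collision, which is the effect the set_priority call above is described as having.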
Keeping things simple
Use small, obvious steps: read → inspect → clean lightly → EDA → split → baseline model. Only add complexity if the EDA results highlight a specific need, such as heavy class imbalance or a high proportion of rare categories.
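Whether that extra complexity is needed can be checked with two quick pandas measurements: the share of the most common value (class imbalance) and the share of rows that fall into rare categories. This is a sketch; the function names, the 5% rarity threshold, and the colors column are illustrative assumptions, not Fount EDA settings.

```python
import pandas as pd


def majority_share(s: pd.Series) -> float:
    """Fraction of non-missing rows that belong to the most frequent value."""
    return float(s.value_counts(normalize=True).iloc[0])


def rare_category_share(s: pd.Series, min_freq: float = 0.05) -> float:
    """Fraction of rows whose category occurs in fewer than min_freq of rows."""
    freqs = s.value_counts(normalize=True)
    rare = freqs[freqs < min_freq].index
    return float(s.isin(rare).mean())


# Hypothetical column: one dominant category plus a long tail
colors = pd.Series(["red"] * 95 + ["blue"] * 2 + ["teal", "plum", "sand"])
print(f"majority share: {majority_share(colors):.2f}")     # majority share: 0.95
print(f"rare share: {rare_category_share(colors):.2f}")    # rare share: 0.05
```

A very high majority share points toward resampling or class weights; a large rare share points toward grouping the tail into an "other" bucket. Both are only worth doing when numbers like these make the case.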
Cheat sheets
DataFrames
| Operation | Code |
|---|---|
| Read CSV | fount_eda.read_csv("file.csv") |
| Read TSV | fount_eda.read_csv("file.tsv", sep="\t") |
| Read Excel | fount_eda.read_excel("file.xlsx") |
| Read JSON | fount_eda.read_json("file.json", orient="records") |
| Create DataFrame | fount_eda.DataFrame({...}) |
| Select columns | df[["col1", "col2"]] |
| Filter rows | df[df["col"] > value] |
| Drop column | df.drop(columns=["col"], errors="ignore") |
| Sort | df.sort_values("col") |
| Merge | fount_eda.merge(left, right, how="left", on="key") |
| Concat rows | fount_eda.concat([df1, df2], axis=0) |
| Concat cols | fount_eda.concat([df1, df2], axis=1) |
| Describe | df.describe() |
| Value counts | df["col"].value_counts() |
| Parse dates | fount_eda.to_datetime(df["date"], errors="coerce") |
| Save CSV | df.to_csv("out.csv", index=False) |
| Save Excel | df.to_excel("out.xlsx", index=False) |
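The original text mentions fount_eda.pd, which suggests these DataFrame calls are forwarded to pandas. Assuming that holds, the sketch below chains several rows of the table (create, parse dates, filter, merge, sort, value counts) against pandas directly, so it runs without Fount EDA installed. The sales and stores tables are made-up example data.

```python
import pandas as pd

# Made-up example data
sales = pd.DataFrame({
    "store": ["north", "south", "north", "east"],
    "date": ["2024-01-05", "2024-01-06", "not a date", "2024-01-08"],
    "Units": [12, 7, 3, 9],
})
stores = pd.DataFrame({"store": ["north", "south"], "region": ["N", "S"]})

sales["date"] = pd.to_datetime(sales["date"], errors="coerce")   # "not a date" becomes NaT
busy = sales[sales["Units"] > 5]                                 # filter rows
merged = pd.merge(busy, stores, how="left", on="store")          # "east" has no match, so region is NaN
summary = merged.sort_values("Units", ascending=False)[["store", "region", "Units"]]

print(summary)
print(sales["store"].value_counts())
```

The left merge keeps every filtered sales row even when the lookup table has no match, which is why the missing region shows up as NaN instead of the row disappearing.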
Arrays
| Operation | Code |
|---|---|
| Create | fount_eda.array([...]) |
| Range | fount_eda.arange(start, stop, step) |
| Linspace | fount_eda.linspace(start, stop, n) |
| Logspace | fount_eda.logspace(start, stop, n) |
| Zeros / Ones | fount_eda.zeros(shape) / fount_eda.ones(shape) |
| Eye / Diag | fount_eda.eye(n) / fount_eda.diag([...]) |
| Mean / Median | fount_eda.mean(x) / fount_eda.median(x) |
| Std / Var | fount_eda.std(x) / fount_eda.var(x) |
| Percentile | fount_eda.percentile(x, 90) |
| Quantile | fount_eda.quantile(x, 0.9) |
| Clip | fount_eda.clip(x, 0, 1) |
| Concatenate | fount_eda.concatenate([x, y]) |
| Vstack / Hstack | fount_eda.vstack([a, b]) / fount_eda.hstack([a, b]) |
| Random floats | fount_eda.random.random(n) |
| Random ints | fount_eda.random.integers(low, high, size=n) |
| Norm | fount_eda.linalg.norm(v) |
| FFT | fount_eda.fft.fft(signal) |
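Assuming the array rows forward to NumPy, the sketch below calls NumPy directly to show a few behaviours worth knowing during EDA: percentile and quantile give the same value on different scales, NumPy's std defaults to ddof=0 while pandas' Series.std defaults to ddof=1, and in NumPy itself integers is a method of a Generator (from numpy.random.default_rng) rather than a module-level function. How fount_eda.random maps onto NumPy is not stated in the source.

```python
import numpy as np
import pandas as pd

x = np.linspace(0.0, 10.0, 11)           # 11 evenly spaced points: 0.0, 1.0, ..., 10.0

# percentile takes q on a 0-100 scale, quantile on a 0-1 scale
p90 = np.percentile(x, 90)
q90 = np.quantile(x, 0.9)
print(p90, q90)

# Clip to a range, then stack the original and clipped rows into a 2 x 11 array
clipped = np.clip(x, 2.0, 8.0)
stacked = np.vstack([x, clipped])
print(stacked.shape)                      # (2, 11)

# Euclidean norm of a 3-4-5 vector
print(np.linalg.norm([3.0, 4.0]))         # 5.0

# Population (ddof=0) vs sample (ddof=1) standard deviation
v = [1.0, 2.0, 3.0, 4.0]
print(np.std(v), pd.Series(v).std(), np.std(v, ddof=1))

# Reproducible random numbers from a seeded Generator
rng = np.random.default_rng(42)
ints = rng.integers(0, 10, size=5)        # 5 ints in [0, 10)
floats = rng.random(5)                    # 5 floats in [0.0, 1.0)
```

The ddof difference matters when comparing summary statistics produced by the two libraries on the same column.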
Machine learning
| Operation | Code |
|---|---|
| Train/test split | fount_eda.train_test_split(X, y, test_size=0.2, random_state=42) |
| One-hot encoder | fount_eda.OneHotEncoder(handle_unknown="ignore", sparse_output=False) |
| Column transformer | fount_eda.ColumnTransformer([...]) |
| Pipeline | fount_eda.Pipeline([...]) |
| Random Forest | fount_eda.ensemble.RandomForestRegressor(n_estimators=300, random_state=0) |
| Evaluate (R²) | reg.score(X_te, y_te) |
EDA suite
| Method | Use case |
|---|---|
| fount_eda.eda(df, target=..., datetime_col=...) | Programmatic result object |
| fount_eda.eda_md(df, target=..., datetime_col=...) | Markdown string |
| fount_eda.eda_print(df, target=..., datetime_col=...) | Console print |
| fount_eda.eda_report_and_score(df) | Full report and score |
| rep.suitability_score | 0–100 model-readiness score (rep is the object returned by fount_eda.eda) |
| rep.recommendations | Suggested improvements |
End-to-end script
Copy this into example_fount_eda.py and run it against your own data.csv. The script assumes a numeric target column "Units" and an optional "date" column.
```python
# example_fount_eda.py
from fount_eda import EDA

fount_eda = EDA()

# 1. Load data
df = fount_eda.read_csv("data.csv")

# 2. Light cleanup
df = df.drop(columns=["unused"], errors="ignore")
df = df[df["Units"].notna()]
if "date" in df.columns:
    # Parse dates once; unparseable values become NaT
    df["date"] = fount_eda.to_datetime(df["date"], errors="coerce")

# 3. EDA: choose one or run all three for comparison
rep = fount_eda.eda(df, target="Units", datetime_col="date")
print("Suitability:", rep.suitability_score)

md = fount_eda.eda_md(df, target="Units", datetime_col="date")
with open("eda_report.md", "w", encoding="utf-8") as f:
    f.write(md)

print("\n--- Printed EDA ---")
fount_eda.eda_print(df, target="Units", datetime_col="date")

# 4. Baseline model: the random forest needs numeric, non-missing features,
#    so keep numeric columns only and fill gaps with each column's median
X = df.drop(columns=["Units"], errors="ignore").select_dtypes(include="number")
X = X.fillna(X.median())
y = df["Units"]
X_tr, X_te, y_tr, y_te = fount_eda.train_test_split(X, y, test_size=0.2, random_state=42)
reg = fount_eda.ensemble.RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("Test R²:", reg.score(X_te, y_te))

# 5. Save cleaned data
df.to_csv("cleaned.csv", index=False)
```
Practical notes
- Use clear, descriptive column names; they make EDA output and model results easier to interpret.
- Keep your target numeric; if it's categorical, consider a classification approach (outside the scope of Fount EDA).
- Parse date columns once with fount_eda.to_datetime and keep them consistent throughout your script.
- When comparing dataset variants, re-run EDA on each and compare suitability scores.
- Start with a simple baseline model; optimize only after the EDA results give you a clear direction.
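The date and target notes can be sketched with pandas directly, assuming fount_eda.to_datetime forwards to pandas.to_datetime. The raw table below is made-up data: one date is unparseable, one is missing, and the target was read in as strings.

```python
import pandas as pd

# Made-up raw data: messy dates and a target stored as text
raw = pd.DataFrame({
    "date": ["2024-03-01", "2024-03-15", "not a date", None],
    "Units": ["4", "7", "5", "6"],
})
df = raw.copy()

# Parse dates once, right after loading; anything unparseable becomes NaT
df["date"] = pd.to_datetime(df["date"], errors="coerce")
print("unparsed dates:", int(df["date"].isna().sum()))   # unparsed dates: 2

# Make sure the target is numeric before running EDA or fitting a regressor
df["Units"] = pd.to_numeric(df["Units"], errors="coerce")
print(pd.api.types.is_numeric_dtype(df["Units"]))       # True

# Derive date features from the parsed column, never from the raw strings
df["month"] = df["date"].dt.month
```

Counting the NaT values right after parsing shows how much date information was lost before any EDA or modelling depends on it.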