EDA References

Tips, patterns, cheat sheets, and a full copy-paste script for Fount EDA.

Tips and FAQs

Use everything via `fount_eda.*`

Call all functions directly through your fount_eda instance, for example, fount_eda.read_csv, fount_eda.DataFrame, fount_eda.array, fount_eda.train_test_split. There's no need to use fount_eda.pd.read_csv or reach into submodules manually.

Finding functions quickly

Autocomplete in your IDE will surface available functions after initializing fount_eda. You can also search for a function by pattern:

fount_eda.list("pattern")

Priority and naming collisions

By default, Fount EDA resolves naming conflicts by preferring pandas, then NumPy, then scikit-learn. To give NumPy names priority instead, set the order explicitly:

fount_eda.set_priority(["numpy", "pandas", "sklearn"])
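
The effect of a priority order can be sketched as an ordered name lookup. This is a minimal illustration, not Fount EDA's actual implementation: the `resolve` helper is hypothetical, and the standard-library modules `math` and `cmath` stand in for pandas and NumPy because both define `sqrt`.

```python
import cmath
import math


def resolve(name, namespaces):
    """Return the first attribute called `name`, searching namespaces in order."""
    for ns in namespaces:
        if hasattr(ns, name):
            return getattr(ns, name)
    raise AttributeError(f"Function not found: {name}")


# Both modules define sqrt, so the search order decides which one wins.
real_first = resolve("sqrt", [math, cmath])
complex_first = resolve("sqrt", [cmath, math])

# A name that exists in only one namespace resolves regardless of order.
only_in_cmath = resolve("phase", [math, cmath])

print(real_first(4))     # float result from math.sqrt
print(complex_first(4))  # complex result from cmath.sqrt
```

Changing the order only matters for names that more than one library defines; every other name resolves to the same function either way.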

Common error messages

"Tool not installed": install the missing dependency (pandas, numpy, or scikit-learn) and retry
"Function not found": check for typos; the error message may suggest the closest valid function name

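A "closest valid name" suggestion of the kind described above can be approximated with the standard library. The sketch below is only an assumption about how such a hint might be produced; the `suggest` helper and the `KNOWN` list are hypothetical and do not reflect Fount EDA's internals.

```python
import difflib

# Hypothetical list of known function names, for illustration only.
KNOWN = ["read_csv", "read_excel", "read_json", "train_test_split", "to_datetime"]


def suggest(name, known=KNOWN):
    """Return the closest known name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, known, n=1, cutoff=0.6)
    return matches[0] if matches else None


print(suggest("read_cvs"))  # read_csv
print(suggest("zzz"))       # None
```
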
Keeping things simple

Use small, obvious steps: read → inspect → clean lightly → EDA → split → baseline model. Only add complexity if the EDA results highlight a specific need, such as heavy class imbalance or a high proportion of rare categories.

Cheat sheets

DataFrames

Operation	Code
Read CSV	`fount_eda.read_csv("file.csv")`
Read TSV	`fount_eda.read_csv("file.tsv", sep="\t")`
Read Excel	`fount_eda.read_excel("file.xlsx")`
Read JSON	`fount_eda.read_json("file.json", orient="records")`
Create DataFrame	`fount_eda.DataFrame({...})`
Select columns	`df[["col1", "col2"]]`
Filter rows	`df[df["col"] > value]`
Drop column	`df.drop(columns=["col"], errors="ignore")`
Sort	`df.sort_values("col")`
Merge	`fount_eda.merge(left, right, how="left", on="key")`
Concat rows	`fount_eda.concat([df1, df2], axis=0)`
Concat cols	`fount_eda.concat([df1, df2], axis=1)`
Describe	`df.describe()`
Value counts	`df["col"].value_counts()`
Parse dates	`fount_eda.to_datetime(df["date"], errors="coerce")`
Save CSV	`df.to_csv("out.csv", index=False)`
Save Excel	`df.to_excel("out.xlsx", index=False)`
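
The merge and concat rows are the ones most often confused, so here is a small sketch of how they behave. It calls pandas directly and assumes that `fount_eda.merge` and `fount_eda.concat` forward to the pandas functions of the same name; the toy frames are invented for illustration.

```python
import pandas as pd

# Toy frames, invented for illustration.
orders = pd.DataFrame({"key": [1, 2, 3], "units": [10, 20, 30]})
regions = pd.DataFrame({"key": [1, 2], "region": ["north", "south"]})

# A left merge keeps every row of `orders`; keys missing from `regions` get NaN.
merged = pd.merge(orders, regions, how="left", on="key")

# axis=0 stacks rows; axis=1 places frames side by side, aligned on the index.
stacked = pd.concat([orders, orders], axis=0, ignore_index=True)
side_by_side = pd.concat([orders, regions], axis=1)

print(merged)
print(stacked.shape, side_by_side.shape)
```

Note that `axis=1` aligns on the index rather than on a key column, so it is not a substitute for a merge.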

Arrays

Operation	Code
Create	`fount_eda.array([...])`
Range	`fount_eda.arange(start, stop, step)`
Linspace	`fount_eda.linspace(start, stop, n)`
Logspace	`fount_eda.logspace(start, stop, n)`
Zeros / Ones	`fount_eda.zeros(shape)` / `fount_eda.ones(shape)`
Eye / Diag	`fount_eda.eye(n)` / `fount_eda.diag([...])`
Mean / Median	`fount_eda.mean(x)` / `fount_eda.median(x)`
Std / Var	`fount_eda.std(x)` / `fount_eda.var(x)`
Percentile	`fount_eda.percentile(x, 90)`
Quantile	`fount_eda.quantile(x, 0.9)`
Clip	`fount_eda.clip(x, 0, 1)`
Concatenate	`fount_eda.concatenate([x, y])`
Vstack / Hstack	`fount_eda.vstack([a, b])` / `fount_eda.hstack([a, b])`
Random floats	`fount_eda.random.random(n)`
Random ints	`fount_eda.random.integers(low, high, size=n)`
Norm	`fount_eda.linalg.norm(v)`
FFT	`fount_eda.fft.fft(signal)`
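
Two rows in this table are easy to trip over: percentile versus quantile, and random integers. The sketch below uses NumPy directly and assumes the `fount_eda` calls forward to it. In plain NumPy, `integers` is a method of a `Generator` created with `default_rng`, not a function on `numpy.random`; how `fount_eda.random` exposes it is not stated here.

```python
import numpy as np

x = np.arange(1, 11)  # 1, 2, ..., 10

# percentile takes a value in 0-100, quantile takes 0-1; both give the same cut point.
p90 = np.percentile(x, 90)
q90 = np.quantile(x, 0.9)

# Seeded generator for reproducible draws.
rng = np.random.default_rng(seed=0)
draws = rng.integers(0, 5, size=8)  # `high` is exclusive by default
floats = rng.random(3)              # uniform floats in [0, 1)

print(p90, q90)
print(draws, floats)
```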

Machine learning

Operation	Code
Train/test split	`fount_eda.train_test_split(X, y, test_size=0.2, random_state=42)`
One-hot encoder	`fount_eda.OneHotEncoder(handle_unknown="ignore", sparse_output=False)`
Column transformer	`fount_eda.ColumnTransformer([...])`
Pipeline	`fount_eda.Pipeline([...])`
Random Forest	`fount_eda.ensemble.RandomForestRegressor(n_estimators=300, random_state=0)`
Evaluate (R²)	`reg.score(X_te, y_te)`

EDA suite

Method	Use case
`fount_eda.eda(df, target=..., datetime_col=...)`	Programmatic result object
`fount_eda.eda_md(df, target=..., datetime_col=...)`	Markdown string
`fount_eda.eda_print(df, target=..., datetime_col=...)`	Console print
`fount_eda.eda_report_and_score(df)`	Full report and score
`rep.suitability_score`	0–100 model-readiness score
`rep.recommendations`	Suggested improvements

End-to-end script

Copy this into example_fount_eda.py and run it against your own data.csv. The script assumes a numeric target column "Units" and an optional "date" column.

# example_fount_eda.py
from fount_eda import EDA

fount_eda = EDA()

# 1. Load data
df = fount_eda.read_csv("data.csv")

# 2. Light cleanup
df = df.drop(columns=["unused"], errors="ignore")
df = df[df["Units"].notna()]

# 3. EDA — choose one or run all three for comparison
rep = fount_eda.eda(df, target="Units", datetime_col="date")
print("Suitability:", rep.suitability_score)

md = fount_eda.eda_md(df, target="Units", datetime_col="date")
with open("eda_report.md", "w", encoding="utf-8") as f:
    f.write(md)

print("\n--- Printed EDA ---")
fount_eda.eda_print(df, target="Units", datetime_col="date")

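# 4. Baseline model (numeric features only: the forest cannot fit raw strings or unparsed dates)
X = df.drop(columns=["Units"], errors="ignore").select_dtypes(include="number")
X = X.fillna(X.median())  # simple median imputation for missing feature values
y = df["Units"]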

X_tr, X_te, y_tr, y_te = fount_eda.train_test_split(X, y, test_size=0.2, random_state=42)
reg = fount_eda.ensemble.RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
print("Test R²:", reg.score(X_te, y_te))

# 5. Save cleaned data
df.to_csv("cleaned.csv", index=False)

Practical notes

Use clear, descriptive column names; they make EDA output and model results easier to interpret
Keep your target numeric; if it's categorical, consider a classification approach (outside the scope of Fount EDA)
Parse date columns once with fount_eda.to_datetime and keep them consistent throughout your script
When comparing dataset variants, re-run EDA on each and compare suitability scores
Start with a simple baseline model, and optimize only after the EDA results give you a clear direction
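
The note on date parsing can be sketched as follows. The example uses pandas directly and assumes `fount_eda.to_datetime` forwards to `pandas.to_datetime`; the column values are invented for illustration.

```python
import pandas as pd

# Toy column with one unparseable entry, invented for illustration.
df = pd.DataFrame({"date": ["2024-01-05", "2024-02-10", "not a date"]})

# errors="coerce" turns unparseable values into NaT instead of raising.
df["date"] = pd.to_datetime(df["date"], errors="coerce")

# Derive features from the parsed column rather than re-parsing strings later.
df["month"] = df["date"].dt.month

print(df)
print("Unparseable dates:", int(df["date"].isna().sum()))
```

Counting the NaT values right after parsing is a cheap way to spot a malformed date column before it affects the EDA output.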