# API Comparison Table - Quick Function Lookup

## Overview

This guide helps you quickly find the Real Simple Stats function you need, with comparisons to similar functions in other popular libraries.

---

## Descriptive Statistics

| Task | Real Simple Stats | NumPy | SciPy | Pandas | statsmodels |
|------|-------------------|-------|-------|--------|-------------|
| **Mean** | `mean(data)` | `np.mean(data)` | - | `df.mean()` | - |
| **Median** | `median(data)` | `np.median(data)` | - | `df.median()` | - |
| **Mode** | `mode(data)` | - | `stats.mode(data)` | `df.mode()` | - |
| **Std Dev** | `sample_std_dev(data)` | `np.std(data, ddof=1)` | - | `df.std()` | - |
| **Variance** | `sample_variance(data)` | `np.var(data, ddof=1)` | - | `df.var()` | - |
| **Range** | `data_range(data)` | `np.ptp(data)` | - | `df.max() - df.min()` | - |
| **IQR** | `interquartile_range(data)` | `np.percentile(data, 75) - np.percentile(data, 25)` | - | `df.quantile(0.75) - df.quantile(0.25)` | - |
| **5-Number Summary** | `five_number_summary(data)` | - | - | `df.describe()` | - |
| **CV** | `coefficient_of_variation(data)` | - | `stats.variation(data)` | - | - |

**Example:**

```python
import real_simple_stats as rss

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(rss.mean(data))  # 5.5
print(rss.five_number_summary(data))  # {'min': 1, 'q1': 3.25, ...}
```

---

## Probability Distributions

### Normal Distribution

| Task | Real Simple Stats | SciPy | NumPy |
|------|-------------------|-------|-------|
| **PDF** | `normal_pdf(x, mu, sigma)` | `stats.norm.pdf(x, mu, sigma)` | - |
| **CDF** | `normal_cdf(x, mu, sigma)` | `stats.norm.cdf(x, mu, sigma)` | - |
| **Inverse CDF** | `normal_ppf(p, mu, sigma)` | `stats.norm.ppf(p, mu, sigma)` | - |
| **Z-score** | `z_score(x, mu, sigma)` | `(x - mu) / sigma` | - |
| **Random samples** | - | `stats.norm.rvs(mu, sigma, size=n)` | `np.random.normal(mu, sigma, n)` |

### Binomial Distribution

| Task | Real Simple Stats | SciPy | NumPy |
|------|-------------------|-------|-------|
| **PMF** | `binomial_probability(n, k, p)` | `stats.binom.pmf(k, n, p)` | - |
| **CDF** | `binomial_cdf(k, n, p)` | `stats.binom.cdf(k, n, p)` | - |
| **Mean** | `binomial_mean(n, p)` | `n * p` | - |
| **Variance** | `binomial_variance(n, p)` | `n * p * (1-p)` | - |

### Other Distributions

| Distribution | Real Simple Stats | SciPy Equivalent |
|--------------|-------------------|------------------|
| **Poisson** | `poisson_pmf(k, lam)` | `stats.poisson.pmf(k, lam)` |
| **Geometric** | `geometric_pmf(k, p)` | `stats.geom.pmf(k, p)` |
| **Exponential** | `exponential_pdf(x, lam)` | `stats.expon.pdf(x, scale=1/lam)` |
| **Negative Binomial** | `negative_binomial_pmf(k, r, p)` | `stats.nbinom.pmf(k, r, p)` |

**Example:**

```python
import real_simple_stats as rss

# Normal distribution
prob = rss.normal_cdf(1.96, 0, 1)  # P(Z ≤ 1.96) ≈ 0.975

# Binomial distribution
prob = rss.binomial_probability(10, 7, 0.5)  # P(X=7) when n=10, p=0.5
```

---

## 🧪 Hypothesis Testing

| Test | Real Simple Stats | SciPy | statsmodels |
|------|-------------------|-------|-------------|
| **One-sample t-test** | `one_sample_t_test(data, mu0)` | `stats.ttest_1samp(data, mu0)` | `sm.stats.DescrStatsW(data).ttest_mean(mu0)` |
| **Two-sample t-test** | `two_sample_t_test(data1, data2)` | `stats.ttest_ind(data1, data2)` | `sm.stats.ttest_ind(data1, data2)` |
| **Paired t-test** | `paired_t_test(data1, data2)` | `stats.ttest_rel(data1, data2)` | - |
| **Z-test** | `one_sample_z_test(data, mu0, sigma)` | - | `sm.stats.ztest(data, value=mu0)` |
| **Chi-square test** | `chi_square_statistic(obs, exp)` | `stats.chisquare(obs, exp)` | - |
| **ANOVA** | `one_way_anova(groups)` | `stats.f_oneway(*groups)` | `sm.stats.anova_lm()` |

**Example:**

```python
import real_simple_stats as rss

# One-sample t-test
data = [23, 25, 28, 30, 32]
t_stat, p_value = rss.one_sample_t_test(data, mu0=30)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Two-sample t-test
group1 = [1, 2, 3, 4, 5]
group2 = [3, 4, 5, 6, 7]
t_stat, p_value = rss.two_sample_t_test(group1, group2)
```

---

## 📉 Regression & Correlation

| Task | Real Simple Stats | SciPy | scikit-learn | statsmodels |
|------|-------------------|-------|--------------|-------------|
| **Pearson correlation** | `pearson_correlation(x, y)` | `stats.pearsonr(x, y)` | - | - |
| **Simple linear regression** | `linear_regression(x, y)` | `stats.linregress(x, y)` | `LinearRegression()` | `sm.OLS(y, sm.add_constant(x))` |
| **R-squared** | `coefficient_of_determination(x, y)` | `linregress(x, y).rvalue**2` | `model.score(X, y)` | `results.rsquared` |
| **Multiple regression** | `multiple_regression(X, y)` | - | `LinearRegression()` | `sm.OLS(y, sm.add_constant(X))` |
| **Prediction** | `regression_equation(x, slope, intercept)` | `slope * x + intercept` | `model.predict(X)` | `results.predict(X)` |

**Example:**

```python
import real_simple_stats as rss

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Correlation
r = rss.pearson_correlation(x, y)
print(f"Correlation: {r:.3f}")

# Regression
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)
print(f"y = {slope:.2f}x + {intercept:.2f}")

# Prediction
y_pred = rss.regression_equation(6, slope, intercept)
```

---

## 🔄 Time Series Analysis

| Task | Real Simple Stats | pandas | statsmodels |
|------|-------------------|--------|-------------|
| **Moving average** | `moving_average(data, window, 'simple')` | `df.rolling(window).mean()` | - |
| **Exponential MA** | `moving_average(data, window, 'exponential')` | `df.ewm(span=window).mean()` | - |
| **Autocorrelation** | `autocorrelation(data, max_lag)` | `pd.Series(data).autocorr(lag)` | `sm.tsa.acf(data)` |
| **Partial ACF** | `partial_autocorrelation(data, max_lag)` | - | `sm.tsa.pacf(data)` |
| **Linear trend** | `linear_trend(data)` | - | `statsmodels.tsa.deterministic.TimeTrend()` |
| **Detrend** | `detrend(data, 'linear')` | - | `sm.tsa.detrend(data)` |
| **Seasonal decompose** | `seasonal_decompose(data, period)` | - | `sm.tsa.seasonal_decompose()` |
| **Differencing** | `difference(data, lag, order)` | `df.diff(lag)` | - |

**Example:**

```python
import real_simple_stats as rss

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Moving average
ma = rss.moving_average(data, window_size=3, method='simple')

# Autocorrelation
acf = rss.autocorrelation(data, max_lag=5)

# Trend analysis
slope, intercept, r2 = rss.linear_trend(data)
```

---

## Resampling Methods

| Method | Real Simple Stats | scikit-learn | SciPy |
|--------|-------------------|--------------|-------|
| **Bootstrap** | `bootstrap(data, statistic, n_iterations)` | - | `stats.bootstrap()` |
| **Bootstrap CI** | Returns `confidence_interval` | - | `stats.bootstrap(...).confidence_interval` |
| **Permutation test** | `permutation_test(data1, data2, statistic)` | - | `stats.permutation_test()` |
| **Jackknife** | `jackknife(data, statistic)` | - | - |
| **Cross-validation** | `cross_validate(X, y, model_fn, k_folds)` | `cross_val_score()` | - |
| **Stratified split** | `stratified_split(X, y, test_size)` | `train_test_split(stratify=y)` | - |

**Example:**

```python
import real_simple_stats as rss
import numpy as np

data = [1, 2, 3, 4, 5]

# Bootstrap confidence interval
result = rss.bootstrap(data, np.mean, n_iterations=1000)
print(f"95% CI: {result['confidence_interval']}")

# Permutation test on the difference in group means
group1 = [1, 2, 3, 4, 5]
group2 = [3, 4, 5, 6, 7]
result = rss.permutation_test(group1, group2, lambda d1, d2: np.mean(d1) - np.mean(d2))
print(f"p-value: {result['p_value']:.3f}")
```

---

## Effect Sizes

| Effect Size | Real Simple Stats | Other Libraries |
|-------------|-------------------|-----------------|
| **Cohen's d** | `cohens_d(group1, group2)` | `pg.compute_effsize()` (pingouin) |
| **Hedges' g** | `hedges_g(group1, group2)` | `pg.compute_effsize(eftype='hedges')` (pingouin) |
| **Glass's Δ** | `glass_delta(group1, group2)` | - |
| **Eta-squared** | `eta_squared(groups)` | `pg.anova(effsize='n2')['n2']` (pingouin) |
| **Partial η²** | `partial_eta_squared(groups)` | `pg.anova()['np2']` (pingouin) |
| **Omega-squared** | `omega_squared(groups)` | - |
| **Cramér's V** | `cramers_v(contingency_table)` | `scipy.stats.contingency.association()` |
| **Phi coefficient** | `phi_coefficient(table)` | - |
| **Odds ratio** | `odds_ratio(table)` | `sm.stats.Table2x2(table).oddsratio` (statsmodels) |
| **Relative risk** | `relative_risk(table)` | `sm.stats.Table2x2(table).riskratio` (statsmodels) |
| **Cohen's h** | `cohens_h(p1, p2)` | - |
| **Interpretation** | `interpret_effect_size(es, measure)` | - |

**Example:**

```python
import real_simple_stats as rss

group1 = [1, 2, 3, 4, 5]
group2 = [3, 4, 5, 6, 7]

# Cohen's d
d = rss.cohens_d(group1, group2)
interpretation = rss.interpret_effect_size(d, 'd')
print(f"Cohen's d = {d:.3f} ({interpretation})")

# Cramér's V for categorical data
table = [[10, 20], [30, 40]]
v = rss.cramers_v(table)
print(f"Cramér's V = {v:.3f}")
```

---

## 🔬 Power Analysis

| Analysis | Real Simple Stats | statsmodels | G*Power |
|----------|-------------------|-------------|---------|
| **t-test power** | `power_t_test(delta, power, sig_level)` | `sm.stats.TTestIndPower()` | Manual |
| **Proportion test** | `power_proportion_test(p1, p2, power)` | `sm.stats.NormalIndPower()` with `sm.stats.proportion_effectsize()` | Manual |
| **ANOVA power** | `power_anova(effect_size, k, power)` | `sm.stats.FTestAnovaPower()` | Manual |
| **Correlation power** | `power_correlation(r, power, sig_level)` | - | Manual |
| **Min detectable effect** | `minimum_detectable_effect(n, power)` | - | Manual |
| **Sample size summary** | `sample_size_summary(test_type, params)` | - | - |

**Example:**

```python
import real_simple_stats as rss

# Calculate required sample size for t-test
result = rss.power_t_test(delta=0.5, power=0.8, sig_level=0.05)
print(f"Required n per group: {result['n']}")

# Calculate power for given sample size
result = rss.power_t_test(delta=0.5, n=64, sig_level=0.05)
print(f"Statistical power: {result['power']:.3f}")
```

---

## Bayesian Statistics

| Method | Real Simple Stats | PyMC | Stan |
|--------|-------------------|------|------|
| **Beta-Binomial update** | `beta_binomial_update(α, β, k, n)` | Manual | Manual |
| **Normal-Normal update** | `normal_normal_update(μ₀, σ₀², data, σ²)` | Manual | Manual |
| **Gamma-Poisson update** | `gamma_poisson_update(α, β, data)` | Manual | Manual |
| **Credible interval** | `credible_interval(dist, params, cred)` | `pm.hdi()` | Manual |
| **HDI** | `highest_density_interval(samples, cred)` | `pm.hdi()` | Manual |
| **Bayes factor** | `bayes_factor(L_H1, L_H0, prior_odds)` | Manual | Manual |
| **Posterior predictive** | `posterior_predictive(dist, params, n)` | `pm.sample_posterior_predictive()` | Manual |

**Example:**

```python
import real_simple_stats as rss

# Update Beta prior with binomial data
prior_alpha, prior_beta = 1, 1  # Uniform prior
successes, trials = 7, 10
post_alpha, post_beta = rss.beta_binomial_update(prior_alpha, prior_beta, successes, trials)

# Calculate credible interval
lower, upper = rss.credible_interval('beta', {'alpha': post_alpha, 'beta': post_beta}, 0.95)
print(f"95% Credible Interval: [{lower:.3f}, {upper:.3f}]")
```

---

## 📐 Multivariate Analysis

| Method | Real Simple Stats | scikit-learn | statsmodels |
|--------|-------------------|--------------|-------------|
| **Multiple regression** | `multiple_regression(X, y)` | `LinearRegression()` | `sm.OLS()` |
| **PCA** | `pca(X, n_components)` | `PCA()` | - |
| **Factor analysis** | `factor_analysis(X, n_factors)` | `FactorAnalysis()` | - |
| **Canonical correlation** | `canonical_correlation(X, Y)` | `CCA()` | - |
| **Mahalanobis distance** | `mahalanobis_distance(X, point)` | - | - |

**Example:**

```python
import real_simple_stats as rss

X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [2, 4, 5, 4, 5]

# Multiple regression
result = rss.multiple_regression(X, y)
print(f"R² = {result['r_squared']:.3f}")
print(f"Coefficients: {result['coefficients']}")

# PCA
result = rss.pca(X, n_components=2)
print(f"Explained variance: {result['explained_variance']}")
```

---

## 🔍 Quick Lookup by Use Case

### "I want to..."

**...compare two groups**
- `two_sample_t_test(group1, group2)` - Test for mean difference
- `cohens_d(group1, group2)` - Calculate effect size
- `permutation_test(group1, group2, statistic)` - Non-parametric test

**...analyze relationships**
- `pearson_correlation(x, y)` - Linear correlation
- `linear_regression(x, y)` - Fit regression line
- `coefficient_of_determination(x, y)` - R² value

**...work with time series**
- `moving_average(data, window)` - Smooth data
- `autocorrelation(data, max_lag)` - Find patterns
- `seasonal_decompose(data, period)` - Decompose components

**...calculate confidence**
- `confidence_interval_known_std(mean, std, n, conf)` - Known σ
- `confidence_interval_unknown_std(mean, std, n, conf)` - Unknown σ
- `bootstrap(data, statistic, n_iterations)` - Bootstrap CI

**...plan a study**
- `power_t_test(delta, power, sig_level)` - Sample size for t-test
- `required_sample_size(confidence, width, std)` - CI-based planning
- `slovins_formula(N, e)` - Survey sample size

**...do Bayesian analysis**
- `beta_binomial_update(α, β, k, n)` - Update beliefs
- `credible_interval(dist, params, cred)` - Bayesian CI
- `bayes_factor(L_H1, L_H0)` - Compare hypotheses

---

## 📚 Function Categories

### By Statistical Domain

**Descriptive Statistics**: `mean`, `median`, `mode`, `sample_std_dev`, `sample_variance`, `five_number_summary`, `interquartile_range`, `coefficient_of_variation`

**Probability**: `normal_pdf`, `normal_cdf`, `binomial_probability`, `poisson_pmf`, `geometric_pmf`, `exponential_pdf`

**Inference**: `one_sample_t_test`, `two_sample_t_test`, `paired_t_test`, `one_sample_z_test`, `chi_square_statistic`, `one_way_anova`

**Regression**: `linear_regression`, `multiple_regression`, `pearson_correlation`, `coefficient_of_determination`

**Time Series**: `moving_average`, `autocorrelation`, `seasonal_decompose`, `detrend`, `difference`

**Resampling**: `bootstrap`, `permutation_test`, `jackknife`, `cross_validate`

**Effect Sizes**: `cohens_d`, `eta_squared`, `cramers_v`, `odds_ratio`

**Power Analysis**: `power_t_test`, `power_anova`, `minimum_detectable_effect`

**Bayesian**: `beta_binomial_update`, `credible_interval`, `bayes_factor`

**Multivariate**: `pca`, `factor_analysis`, `canonical_correlation`

---

## Learning Path

**Beginner → Intermediate → Advanced**

1. **Start here**: `mean`, `median`, `sample_std_dev`, `normal_cdf`
2. **Then learn**: `one_sample_t_test`, `linear_regression`, `confidence_interval_unknown_std`
3. **Next**: `bootstrap`, effect sizes (`cohens_d`), power analysis (`power_t_test`)
4. **Advanced**: time series (`moving_average`), Bayesian statistics (`beta_binomial_update`), multivariate analysis (`pca`)

---

## Tips

- **All functions return simple Python types** (floats, lists, tuples, dicts) - no custom objects
- **Type hints included** for better IDE support
- **Comprehensive docstrings** with examples in every function
- **Consistent naming** - functions do what their names say
- **Educational focus** - designed for learning and teaching

---

**See also:**
- [Mathematical Formulas](MATHEMATICAL_FORMULAS.md) - LaTeX notation for all functions
- [FAQ](FAQ.md) - Common questions
- [Migration Guide](MIGRATION_GUIDE.md) - Switching from other libraries
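
---

**Cross-checking the quoted values:** The tables in this guide pair `sample_std_dev` and `sample_variance` with `ddof=1` in NumPy, and the probability example quotes `P(Z ≤ 1.96) ≈ 0.975`. As a minimal sketch, those numbers can be reproduced with the Python standard library alone, which gives a neutral baseline when comparing outputs across libraries. Nothing below calls Real Simple Stats, and the helper name `binomial_pmf` is hypothetical, introduced only for this illustration.

```python
import math
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Sample statistics use an n - 1 denominator, the same convention as
# np.std(data, ddof=1) and np.var(data, ddof=1) in the comparison table.
sample_sd = statistics.stdev(data)
sample_var = statistics.variance(data)

# The population version divides by n, which is NumPy's default (ddof=0).
population_sd = statistics.pstdev(data)

# Standard normal CDF evaluated at 1.96.
p_z = statistics.NormalDist(mu=0, sigma=1).cdf(1.96)


def binomial_pmf(n: int, k: int, p: float) -> float:
    """Hypothetical helper: P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


p_x7 = binomial_pmf(10, 7, 0.5)

print(f"mean          = {statistics.mean(data)}")  # 5.5
print(f"sample sd     = {sample_sd:.3f}")          # 3.028
print(f"population sd = {population_sd:.3f}")      # 2.872
print(f"P(Z <= 1.96)  = {p_z:.3f}")                # 0.975
print(f"P(X = 7)      = {p_x7:.4f}")               # 0.1172
```

If a library's standard deviation matches the population value (about 2.872 here) rather than the sample value (about 3.028), it is most likely using `ddof=0`, which is the usual source of small mismatches between libraries.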