# Migration Guide - Switching to Real Simple Stats

Complete guide for migrating from other statistical libraries to Real Simple Stats.

---

## 📚 Overview

This guide helps you transition from:

- **R** - Statistical programming language
- **SciPy** - Python scientific computing
- **statsmodels** - Python statistical models
- **SPSS** - Commercial statistical software
- **Excel** - Spreadsheet analysis

---

## 🔄 From R to Real Simple Stats

### Philosophy Differences

| Aspect | R | Real Simple Stats |
|--------|---|-------------------|
| **Syntax** | `function(data, param=value)` | `function(data, param=value)` |
| **Data structures** | data.frames, vectors | Lists, NumPy arrays |
| **Output** | Complex objects | Simple dicts/tuples |
| **Installation** | `install.packages()` | `pip install` |

---

### Common Function Translations

#### Descriptive Statistics

| R | Real Simple Stats |
|---|-------------------|
| `mean(x)` | `rss.mean(x)` |
| `median(x)` | `rss.median(x)` |
| `sd(x)` | `rss.sample_std_dev(x)` |
| `var(x)` | `rss.sample_variance(x)` |
| `quantile(x, c(0.25, 0.75))` | `rss.five_number_summary(x)` |
| `IQR(x)` | `rss.interquartile_range(x)` |
| `summary(x)` | `rss.five_number_summary(x)` |

**Example Migration:**

```r
# R code
data <- c(1, 2, 3, 4, 5)
mean_val <- mean(data)
sd_val <- sd(data)
```

```python
# Python equivalent
import real_simple_stats as rss

data = [1, 2, 3, 4, 5]
mean_val = rss.mean(data)
sd_val = rss.sample_std_dev(data)
```

---

#### Hypothesis Tests

| R | Real Simple Stats |
|---|-------------------|
| `t.test(x, mu=0)` | `rss.one_sample_t_test(x, mu0=0)` |
| `t.test(x, y)` | `rss.two_sample_t_test(x, y)` |
| `t.test(x, y, paired=TRUE)` | `rss.paired_t_test(x, y)` |
| `chisq.test(obs, p=exp)` | `rss.chi_square_statistic(obs, exp)` |
| `aov(y ~ group)` | `rss.one_way_anova(groups)` |

**Example Migration:**

```r
# R code
group1 <- c(23, 25, 28, 30, 32)
group2 <- c(28, 30, 35, 38, 40)
result <- t.test(group1, group2)
print(result$p.value)
```
```python
# Python equivalent
import real_simple_stats as rss

group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]
t_stat, p_value = rss.two_sample_t_test(group1, group2)
print(p_value)
```

---

#### Regression

| R | Real Simple Stats |
|---|-------------------|
| `cor(x, y)` | `rss.pearson_correlation(x, y)` |
| `lm(y ~ x)` | `rss.linear_regression(x, y)` |
| `summary(lm(y ~ x))$r.squared` | `rss.coefficient_of_determination(x, y)` |
| `predict(model, newdata)` | `rss.regression_equation(x_new, slope, intercept)` |

**Example Migration:**

```r
# R code
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
summary(model)
```

```python
# Python equivalent
import real_simple_stats as rss

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)
r_squared = r_value ** 2

print(f"Slope: {slope:.3f}")
print(f"Intercept: {intercept:.3f}")
print(f"R²: {r_squared:.3f}")
print(f"p-value: {p_value:.4f}")
```

---

#### Distributions

| R | Real Simple Stats |
|---|-------------------|
| `dnorm(x, mean, sd)` | `rss.normal_pdf(x, mu, sigma)` |
| `pnorm(x, mean, sd)` | `rss.normal_cdf(x, mu, sigma)` |
| `qnorm(p, mean, sd)` | `rss.normal_ppf(p, mu, sigma)` |
| `dbinom(k, n, p)` | `rss.binomial_probability(n, k, p)` |
| `pbinom(k, n, p)` | `rss.binomial_cdf(k, n, p)` |
| `dpois(k, lambda)` | `rss.poisson_pmf(k, lam)` |

---

### Key Differences

1. **Return Values:**

   ```r
   # R returns a complex object
   result <- t.test(x, y)
   result$statistic
   result$p.value
   result$conf.int
   ```

   ```python
   # Python returns a tuple
   t_stat, p_value = rss.two_sample_t_test(x, y)
   # Simpler, but less information
   ```

2. **Data Frames:**

   ```r
   # R uses data frames natively
   df <- data.frame(x=c(1,2,3), y=c(4,5,6))
   cor(df$x, df$y)
   ```

   ```python
   # Python uses lists or pandas
   import pandas as pd

   df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
   rss.pearson_correlation(df['x'].tolist(), df['y'].tolist())
   ```

3. **Missing Values:**

   ```r
   # R skips NA values only when na.rm=TRUE is passed
   mean(c(1, 2, NA, 4), na.rm=TRUE)
   ```

   ```python
   # Python requires manual handling
   data = [1, 2, None, 4]
   clean_data = [x for x in data if x is not None]
   rss.mean(clean_data)
   ```

---

## From SciPy to Real Simple Stats

### Function Translations

#### Descriptive Statistics

| SciPy/NumPy | Real Simple Stats |
|-------------|-------------------|
| `np.mean(x)` | `rss.mean(x)` |
| `np.median(x)` | `rss.median(x)` |
| `np.std(x, ddof=1)` | `rss.sample_std_dev(x)` |
| `np.var(x, ddof=1)` | `rss.sample_variance(x)` |
| `stats.mode(x)` | `rss.mode(x)` |

---

#### Hypothesis Tests

| SciPy | Real Simple Stats |
|-------|-------------------|
| `stats.ttest_1samp(x, popmean)` | `rss.one_sample_t_test(x, mu0)` |
| `stats.ttest_ind(x, y)` | `rss.two_sample_t_test(x, y)` |
| `stats.ttest_rel(x, y)` | `rss.paired_t_test(x, y)` |
| `stats.chisquare(obs, exp)` | `rss.chi_square_statistic(obs, exp)` |
| `stats.f_oneway(*groups)` | `rss.one_way_anova(groups)` |

**Example Migration:**

```python
# SciPy code
from scipy import stats
import numpy as np

data = [23, 25, 28, 30, 32]
t_stat, p_value = stats.ttest_1samp(data, 30)
```

```python
# Real Simple Stats equivalent
import real_simple_stats as rss

data = [23, 25, 28, 30, 32]
t_stat, p_value = rss.one_sample_t_test(data, mu0=30)
```

---

#### Distributions

| SciPy | Real Simple Stats |
|-------|-------------------|
| `stats.norm.pdf(x, loc, scale)` | `rss.normal_pdf(x, mu, sigma)` |
| `stats.norm.cdf(x, loc, scale)` | `rss.normal_cdf(x, mu, sigma)` |
| `stats.norm.ppf(p, loc, scale)` | `rss.normal_ppf(p, mu, sigma)` |
| `stats.binom.pmf(k, n, p)` | `rss.binomial_probability(n, k, p)` |
| `stats.poisson.pmf(k, mu)` | `rss.poisson_pmf(k, lam)` |

---

#### Regression

| SciPy | Real Simple Stats |
|-------|-------------------|
| `stats.pearsonr(x, y)` | `rss.pearson_correlation(x, y)` |
| `stats.linregress(x, y)` | `rss.linear_regression(x, y)` |

**Example Migration:**

```python
# SciPy code
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
```

```python
# Real Simple Stats equivalent (identical!)
import real_simple_stats as rss

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)
```

---

### Key Advantages of Real Simple Stats

1. **Simpler imports:**

   ```python
   # SciPy
   from scipy import stats
   from scipy.stats import norm, binom
   import numpy as np

   # Real Simple Stats
   import real_simple_stats as rss
   ```

2. **Clearer function names:**

   ```python
   # SciPy
   stats.ttest_ind(group1, group2)

   # Real Simple Stats (more descriptive)
   rss.two_sample_t_test(group1, group2)
   ```

3. **Educational focus:**

   ```python
   # Real Simple Stats docstrings are written for learners
   help(rss.two_sample_t_test)
   # Includes: explanation, formula, interpretation
   ```

---

## From statsmodels to Real Simple Stats

### Function Translations

| statsmodels | Real Simple Stats |
|-------------|-------------------|
| `sm.OLS(y, X).fit()` | `rss.multiple_regression(X, y)` |
| `sm.stats.ztest(x, value=mu0)` | `rss.one_sample_z_test(x, mu0, sigma)` |
| `sm.stats.ttest_ind(x, y)` | `rss.two_sample_t_test(x, y)` |
| `sm.stats.anova_lm()` | `rss.one_way_anova(groups)` |

**Example Migration:**

```python
# statsmodels code
import statsmodels.api as sm
import numpy as np

X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [2, 4, 5, 4, 5]
X_with_const = sm.add_constant(X)
model = sm.OLS(y, X_with_const).fit()
print(model.summary())
```

```python
# Real Simple Stats equivalent
import real_simple_stats as rss

X = [[1, 2], [2, 3], [3, 4], [4, 5], [5, 6]]
y = [2, 4, 5, 4, 5]
result = rss.multiple_regression(X, y, include_intercept=True)

print(f"R² = {result['r_squared']:.3f}")
print(f"Coefficients: {result['coefficients']}")
print(f"Intercept: {result['intercept']}")
```

---

### When to Use Each

**Use statsmodels when you:**

- Need detailed regression diagnostics
- Require time series models (ARIMA, VAR)
- Need generalized linear models (GLM)
- Want comprehensive statistical tests

**Use Real Simple Stats for:**

- Learning statistics
- Quick exploratory analysis
- Teaching or presentations
- Simple regression/correlation

---

## 💼 From SPSS to Real Simple Stats

### Menu-Driven to Code-Based

| SPSS Menu | Real Simple Stats Code |
|-----------|------------------------|
| Analyze → Descriptive Statistics → Descriptives | `rss.five_number_summary(data)` |
| Analyze → Compare Means → One-Sample T Test | `rss.one_sample_t_test(data, mu0)` |
| Analyze → Compare Means → Independent-Samples T Test | `rss.two_sample_t_test(group1, group2)` |
| Analyze → Compare Means → Paired-Samples T Test | `rss.paired_t_test(before, after)` |
| Analyze → Correlate → Bivariate | `rss.pearson_correlation(x, y)` |
| Analyze → Regression → Linear | `rss.linear_regression(x, y)` |

---

### Common SPSS Tasks

#### Task 1: Descriptive Statistics

**SPSS:**

```
DESCRIPTIVES VARIABLES=score
  /STATISTICS=MEAN STDDEV MIN MAX.
```

**Real Simple Stats:**

```python
import real_simple_stats as rss

score = [85, 90, 78, 92, 88]

print(f"Mean: {rss.mean(score)}")
print(f"Std Dev: {rss.sample_std_dev(score)}")
print(f"Min: {min(score)}")
print(f"Max: {max(score)}")
```

---

#### Task 2: Independent t-test

**SPSS:**

```
T-TEST GROUPS=group(1 2)
  /VARIABLES=score.
```

**Real Simple Stats:**

```python
import real_simple_stats as rss

group1 = [85, 90, 78, 92, 88]
group2 = [75, 80, 72, 82, 78]

t_stat, p_value = rss.two_sample_t_test(group1, group2)
d = rss.cohens_d(group1, group2)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print(f"Cohen's d = {d:.3f}")
```

---

#### Task 3: Correlation

**SPSS:**

```
CORRELATIONS
  /VARIABLES=height weight.
```

**Real Simple Stats:**

```python
import real_simple_stats as rss

height = [65, 70, 68, 72, 66]
weight = [150, 180, 165, 190, 155]

r = rss.pearson_correlation(height, weight)
print(f"r = {r:.3f}")
```

---

#### Task 4: Linear Regression

**SPSS:**

```
REGRESSION
  /DEPENDENT score
  /METHOD=ENTER hours_studied.
```

**Real Simple Stats:**

```python
import real_simple_stats as rss

hours_studied = [1, 2, 3, 4, 5]
score = [55, 65, 70, 80, 85]

slope, intercept, r_value, p_value, std_err = rss.linear_regression(
    hours_studied, score
)

print(f"Equation: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"R² = {r_value**2:.3f}")
print(f"p = {p_value:.4f}")
```

---

### Advantages of Real Simple Stats over SPSS

1. **Free and open-source** (SPSS requires a paid license)
2. **Reproducible** (code vs. clicking)
3. **Automatable** (scripts vs. manual)
4. **Portable** (runs anywhere Python runs)
5. **Integrates with Python ecosystem** (pandas, matplotlib, etc.)

---

## From Excel to Real Simple Stats

### Common Excel Functions

| Excel | Real Simple Stats |
|-------|-------------------|
| `=AVERAGE(A1:A10)` | `rss.mean(data)` |
| `=MEDIAN(A1:A10)` | `rss.median(data)` |
| `=STDEV.S(A1:A10)` | `rss.sample_std_dev(data)` |
| `=CORREL(A1:A10, B1:B10)` | `rss.pearson_correlation(x, y)` |
| `=T.TEST(A1:A10, B1:B10, 2, 2)` | `rss.two_sample_t_test(x, y)` |
| `=SLOPE(Y1:Y10, X1:X10)` | `rss.linear_regression(x, y)[0]` |
| `=INTERCEPT(Y1:Y10, X1:X10)` | `rss.linear_regression(x, y)[1]` |

---

### Example Migration: Data Analysis

**Excel Workflow:**

1. Enter data in columns A and B
2. Click Data → Data Analysis → t-Test
3. Select ranges
4. Click OK
5. View output

**Real Simple Stats Workflow:**

```python
import real_simple_stats as rss
import pandas as pd

# Read Excel file
df = pd.read_excel('data.xlsx')

# Perform t-test
t_stat, p_value = rss.two_sample_t_test(
    df['Group1'].tolist(),
    df['Group2'].tolist()
)

# Calculate effect size
d = rss.cohens_d(
    df['Group1'].tolist(),
    df['Group2'].tolist()
)

# Report results
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print(f"Cohen's d: {d:.3f}")
```

---

### Advantages over Excel

1. **Reproducibility**: Code can be re-run
2. **Scalability**: Handle large datasets
3. **Automation**: Process multiple files
4. **Version control**: Track changes with Git
5. **Advanced statistics**: More methods available

---

## 🔄 Complete Migration Example

### Scenario: Comparing Two Groups

**R Code:**

```r
# Load data
group1 <- c(23, 25, 28, 30, 32)
group2 <- c(28, 30, 35, 38, 40)

# Descriptive statistics
mean1 <- mean(group1)
mean2 <- mean(group2)
sd1 <- sd(group1)
sd2 <- sd(group2)

# t-test
result <- t.test(group1, group2)

# Effect size (requires the effsize package)
library(effsize)
d <- cohen.d(group1, group2)

# Report
cat(sprintf("Group 1: M=%.2f, SD=%.2f\n", mean1, sd1))
cat(sprintf("Group 2: M=%.2f, SD=%.2f\n", mean2, sd2))
cat(sprintf("t(%.0f)=%.2f, p=%.3f\n",
            result$parameter, result$statistic, result$p.value))
cat(sprintf("Cohen's d=%.2f\n", d$estimate))
```

**Real Simple Stats Code:**

```python
import real_simple_stats as rss

# Load data
group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]

# Descriptive statistics
mean1 = rss.mean(group1)
mean2 = rss.mean(group2)
sd1 = rss.sample_std_dev(group1)
sd2 = rss.sample_std_dev(group2)

# t-test
t_stat, p_value = rss.two_sample_t_test(group1, group2)

# Effect size
d = rss.cohens_d(group1, group2)
interpretation = rss.interpret_effect_size(d, 'd')

# Report
print(f"Group 1: M={mean1:.2f}, SD={sd1:.2f}")
print(f"Group 2: M={mean2:.2f}, SD={sd2:.2f}")
# df = n1 + n2 - 2 assumes a pooled-variance test; R's t.test() defaults
# to Welch's test, so the df it reports may differ.
print(f"t({len(group1)+len(group2)-2})={t_stat:.2f}, p={p_value:.3f}")
print(f"Cohen's d={d:.2f} ({interpretation})")
```

---

## 📋 Migration Checklist

### Before Migration

- [ ] Identify which functions you use most
- [ ] Check if Real Simple Stats supports them
- [ ] Review [API Comparison](API_COMPARISON.md)
- [ ] Test with sample data

### During Migration

- [ ] Install Real Simple Stats: `pip install real-simple-stats`
- [ ] Convert data structures (data.frames → lists/arrays)
- [ ] Translate function calls
- [ ] Verify results match original
- [ ] Update documentation/comments

### After Migration

- [ ] Run tests to ensure correctness
- [ ] Update analysis scripts
- [ ] Train team members
- [ ] Document any limitations

---

## Quick Reference Card

### Most Common Translations

```text
# Descriptive Statistics
mean(x)                   → rss.mean(x)
sd(x) / np.std(x, ddof=1) → rss.sample_std_dev(x)
median(x)                 → rss.median(x)

# Hypothesis Tests
t.test(x, y)              → rss.two_sample_t_test(x, y)
cor.test(x, y)            → rss.pearson_correlation(x, y)
chisq.test(obs, p=exp)    → rss.chi_square_statistic(obs, exp)

# Regression
lm(y ~ x)                 → rss.linear_regression(x, y)
predict(model, newdata)   → rss.regression_equation(x, slope, intercept)

# Distributions
pnorm(x, mean, sd)        → rss.normal_cdf(x, mu, sigma)
qnorm(p, mean, sd)        → rss.normal_ppf(p, mu, sigma)
```

---

## Tips for Successful Migration

1. **Start small**: Migrate one analysis at a time
2. **Verify results**: Compare outputs with original software
3. **Use version control**: Track changes with Git
4. **Document differences**: Note any discrepancies
5. **Leverage Python ecosystem**: Combine with pandas, matplotlib
6. **Ask for help**: Use [GitHub issues](https://github.com/kylejones200/real_simple_stats/issues)

---

## Additional Resources

- **API Comparison**: [Detailed function mapping](API_COMPARISON.md)
- **Examples**: [Interactive tutorials](INTERACTIVE_EXAMPLES.md)
- **FAQ**: [Common questions](FAQ.md)
- **Troubleshooting**: [Error solutions](TROUBLESHOOTING.md)

---

**Need help migrating?** [Open an issue](https://github.com/kylejones200/real_simple_stats/issues) with your use case!

**Last Updated**: 2025

**Version**: 0.3.0