Migration Guide - Switching to Real Simple Stats

A complete guide to migrating from other statistical tools to Real Simple Stats.

📚 Overview

This guide helps you transition from:

R - Statistical programming language
SciPy - Python scientific computing
statsmodels - Python statistical models
SPSS - Commercial statistical software
Excel - Spreadsheet analysis

🔄 From R to Real Simple Stats

Philosophy Differences

Aspect	R	Real Simple Stats
Syntax	`function(data, param=value)`	`function(data, param=value)`
Data structures	data.frames, vectors	Lists, NumPy arrays
Output	Complex objects	Simple dicts/tuples
Installation	`install.packages()`	`pip install`

Common Function Translations

Descriptive Statistics

R	Real Simple Stats
`mean(x)`	`rss.mean(x)`
`median(x)`	`rss.median(x)`
`sd(x)`	`rss.sample_std_dev(x)`
`var(x)`	`rss.sample_variance(x)`
`quantile(x, c(0.25, 0.75))`	`rss.five_number_summary(x)`
`IQR(x)`	`rss.interquartile_range(x)`
`summary(x)`	`rss.five_number_summary(x)`

Example Migration:

# R code
data <- c(1, 2, 3, 4, 5)
mean_val <- mean(data)
sd_val <- sd(data)

# Python equivalent
import real_simple_stats as rss

data = [1, 2, 3, 4, 5]
mean_val = rss.mean(data)
sd_val = rss.sample_std_dev(data)
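
When checking a migrated value, an independent reference can be useful. The sketch below uses only Python's standard-library `statistics` module (not Real Simple Stats) to show the n - 1 denominator that R's `sd()` uses, next to the population version. It assumes `rss.sample_std_dev` follows the same n - 1 convention, as its name suggests.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Sample standard deviation: divides by n - 1, like R's sd()
sample_sd = statistics.stdev(data)

# Population standard deviation: divides by n, like np.std(x) with its default ddof=0
population_sd = statistics.pstdev(data)

print(f"mean      = {statistics.mean(data)}")
print(f"sample sd = {sample_sd:.4f}")      # sqrt(10 / 4)
print(f"pop. sd   = {population_sd:.4f}")  # sqrt(10 / 5)
```

If a migrated standard deviation comes out slightly smaller than the R value, a population (divide by n) formula is the first thing to rule out.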

Hypothesis Tests

R	Real Simple Stats
`t.test(x, mu=0)`	`rss.one_sample_t_test(x, mu0=0)`
`t.test(x, y)`	`rss.two_sample_t_test(x, y)`
`t.test(x, y, paired=TRUE)`	`rss.paired_t_test(x, y)`
`chisq.test(obs, p=exp)`	`rss.chi_square_statistic(obs, exp)`
`aov(y ~ group)`	`rss.one_way_anova(groups)`

Example Migration:

# R code
group1 <- c(23, 25, 28, 30, 32)
group2 <- c(28, 30, 35, 38, 40)
result <- t.test(group1, group2)  # Welch's t-test by default; var.equal=TRUE gives the pooled test
print(result$p.value)

# Python equivalent
import real_simple_stats as rss

group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]
t_stat, p_value = rss.two_sample_t_test(group1, group2)
print(p_value)
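
R's `t.test()` runs Welch's test unless `var.equal=TRUE` is passed, while SciPy's `ttest_ind` defaults to the pooled-variance test. This guide does not state which variant `rss.two_sample_t_test` computes, so the sketch below works out both by hand with the standard library. `two_sample_t` is a hypothetical helper written for this illustration, not part of any library.

```python
import math
import statistics


def two_sample_t(a, b, pooled):
    """Return (t statistic, degrees of freedom) for two independent samples."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # n - 1 denominators
    diff = statistics.mean(a) - statistics.mean(b)
    if pooled:
        # Student's t-test: one pooled variance, df = n1 + n2 - 2
        sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        # Welch's t-test: separate variances, Welch-Satterthwaite df
        se = math.sqrt(v1 / n1 + v2 / n2)
        df = (v1 / n1 + v2 / n2) ** 2 / (
            (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
        )
    return diff / se, df


group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]

t_pooled, df_pooled = two_sample_t(group1, group2, pooled=True)
t_welch, df_welch = two_sample_t(group1, group2, pooled=False)
print(f"pooled: t = {t_pooled:.3f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.3f}, df = {df_welch:.2f}")
```

Because the two groups have equal sizes, both variants give the same t statistic here. Only the degrees of freedom differ (8 versus about 7.23), and with them the p-value, so small p-value discrepancies after migration often trace back to this default.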

Regression

R	Real Simple Stats
`cor(x, y)`	`rss.pearson_correlation(x, y)`
`lm(y ~ x)`	`rss.linear_regression(x, y)`
`summary(lm(y ~ x))$r.squared`	`rss.coefficient_of_determination(x, y)`
`predict(model, newdata)`	`rss.regression_equation(x_new, slope, intercept)`

Example Migration:

# R code
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
summary(model)

# Python equivalent
import real_simple_stats as rss

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)
r_squared = r_value ** 2

print(f"Slope: {slope:.3f}")
print(f"Intercept: {intercept:.3f}")
print(f"R²: {r_squared:.3f}")
print(f"p-value: {p_value:.4f}")
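
To verify the migrated regression against a known answer, the least-squares formulas can be worked through by hand for this small data set. The sketch below uses only the standard library; the variable names are chosen for this illustration.

```python
import math
import statistics

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = statistics.mean(x), statistics.mean(y)

# Sums of squares and cross-products around the means
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

slope = s_xy / s_xx                 # 6 / 10
intercept = y_bar - slope * x_bar   # 4 - 0.6 * 3
r = s_xy / math.sqrt(s_xx * s_yy)   # 6 / sqrt(60)

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R² = {r ** 2:.3f}")
```

The slope, intercept, and R² printed by the R and Python versions above should agree with these values up to rounding.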

Distributions

R	Real Simple Stats
`dnorm(x, mean, sd)`	`rss.normal_pdf(x, mu, sigma)`
`pnorm(x, mean, sd)`	`rss.normal_cdf(x, mu, sigma)`
`qnorm(p, mean, sd)`	`rss.normal_ppf(p, mu, sigma)`
`dbinom(k, n, p)`	`rss.binomial_probability(n, k, p)`
`pbinom(k, n, p)`	`rss.binomial_cdf(k, n, p)`
`dpois(k, lambda)`	`rss.poisson_pmf(k, lam)`
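
Distribution functions are easy to spot-check because a few values are well known. The sketch below uses the standard library's `statistics.NormalDist` and `math.comb` as an independent reference, not the Real Simple Stats functions themselves. `binomial_pmf` is a hypothetical helper that keeps R's `dbinom` argument order.

```python
import math
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)

density = z.pdf(0.0)         # R: dnorm(0, 0, 1)
lower_tail = z.cdf(1.96)     # R: pnorm(1.96, 0, 1)
quantile = z.inv_cdf(0.975)  # R: qnorm(0.975, 0, 1)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), with arguments in R's dbinom order."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# R: dbinom(3, 10, 0.5). The table above lists rss.binomial_probability(n, k, p),
# so n and k swap places when translating the call.
prob = binomial_pmf(3, 10, 0.5)

print(f"{density:.4f} {lower_tail:.4f} {quantile:.4f} {prob:.4f}")
```

Swapped `n` and `k` arguments are a common source of silent errors, since both are integers and the call may still return a number.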

Key Differences

Return Values:

# R returns an htest object with named components
result <- t.test(x, y)
result$statistic
result$p.value
result$conf.int

# Python returns a plain tuple
t_stat, p_value = rss.two_sample_t_test(x, y)
# Simpler, but extras such as the confidence interval are not returned

Data Frames:

# R uses data frames natively
df <- data.frame(x=c(1,2,3), y=c(4,5,6))
cor(df$x, df$y)

# Python uses lists or pandas
import pandas as pd
df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
rss.pearson_correlation(df['x'].tolist(), df['y'].tolist())

Missing Values:

# R returns NA for data containing NA unless na.rm=TRUE drops the missing values
mean(c(1, 2, NA, 4), na.rm=TRUE)

# Python requires manual handling
data = [1, 2, None, 4]
clean_data = [x for x in data if x is not None]
rss.mean(clean_data)
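
Data that passes through pandas or NumPy usually marks missing values as NaN rather than `None`, and the `is not None` filter alone lets NaN through. A minimal sketch of a filter that handles both follows; `drop_missing` is a hypothetical helper, not a Real Simple Stats function.

```python
import math


def drop_missing(values):
    """Return values with None and float NaN entries removed."""
    return [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]


data = [1, 2, None, 4, float("nan")]
clean_data = drop_missing(data)
print(clean_data)  # [1, 2, 4]
```

The cleaned list can then be passed to `rss.mean(clean_data)` as in the example above.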

From SciPy to Real Simple Stats

Function Translations

Descriptive Statistics

SciPy/NumPy	Real Simple Stats
`np.mean(x)`	`rss.mean(x)`
`np.median(x)`	`rss.median(x)`
`np.std(x, ddof=1)`	`rss.sample_std_dev(x)`
`np.var(x, ddof=1)`	`rss.sample_variance(x)`
`stats.mode(x)`	`rss.mode(x)`

Hypothesis Tests

SciPy	Real Simple Stats
`stats.ttest_1samp(x, popmean)`	`rss.one_sample_t_test(x, mu0)`
`stats.ttest_ind(x, y)`	`rss.two_sample_t_test(x, y)`
`stats.ttest_rel(x, y)`	`rss.paired_t_test(x, y)`
`stats.chisquare(obs, exp)`	`rss.chi_square_statistic(obs, exp)`
`stats.f_oneway(*groups)`	`rss.one_way_anova(groups)`

Example Migration:

# SciPy code
from scipy import stats

data = [23, 25, 28, 30, 32]
t_stat, p_value = stats.ttest_1samp(data, 30)

# Real Simple Stats equivalent
import real_simple_stats as rss

data = [23, 25, 28, 30, 32]
t_stat, p_value = rss.one_sample_t_test(data, mu0=30)

Distributions

SciPy	Real Simple Stats
`stats.norm.pdf(x, loc, scale)`	`rss.normal_pdf(x, mu, sigma)`
`stats.norm.cdf(x, loc, scale)`	`rss.normal_cdf(x, mu, sigma)`
`stats.norm.ppf(p, loc, scale)`	`rss.normal_ppf(p, mu, sigma)`
`stats.binom.pmf(k, n, p)`	`rss.binomial_probability(n, k, p)`
`stats.poisson.pmf(k, mu)`	`rss.poisson_pmf(k, lam)`

Regression

SciPy	Real Simple Stats
`stats.pearsonr(x, y)`	`rss.pearson_correlation(x, y)`
`stats.linregress(x, y)`	`rss.linear_regression(x, y)`

Example Migration:

# SciPy code
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# Real Simple Stats equivalent (same five return values, in the same order)
import real_simple_stats as rss

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)

Key Advantages of Real Simple Stats

Simpler imports:

# SciPy
from scipy import stats
from scipy.stats import norm, binom
import numpy as np

# Real Simple Stats
import real_simple_stats as rss

Clearer function names:

# SciPy
stats.ttest_ind(group1, group2)

# Real Simple Stats (more descriptive)
rss.two_sample_t_test(group1, group2)

Educational focus:

# Real Simple Stats docstrings are written for learners
help(rss.two_sample_t_test)
# Each includes an explanation, the formula, and interpretation guidance

From statsmodels to Real Simple Stats

Function Translations

statsmodels	Real Simple Stats
`sm.OLS(y, X).fit()`	`rss.multiple_regression(X, y)`
`sm.stats.ztest(x, value=mu0)`	`rss.one_sample_z_test(x, mu0, sigma)`
`sm.stats.ttest_ind(x, y)`	`rss.two_sample_t_test(x, y)`
`sm.stats.anova_lm()`	`rss.one_way_anova(groups)`

Example Migration:

# statsmodels code
import statsmodels.api as sm

# Predictor columns must not be exact linear combinations of each other
# (or of the intercept); otherwise the coefficients are not uniquely determined.
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [2, 4, 5, 4, 5]
X_with_const = sm.add_constant(X)
model = sm.OLS(y, X_with_const).fit()
print(model.summary())

# Real Simple Stats equivalent
import real_simple_stats as rss

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [2, 4, 5, 4, 5]
result = rss.multiple_regression(X, y, include_intercept=True)

print(f"R² = {result['r_squared']:.3f}")
print(f"Coefficients: {result['coefficients']}")
print(f"Intercept: {result['intercept']}")

When to Use Each

Use statsmodels when:

Need detailed regression diagnostics
Require time series models (ARIMA, VAR)
Need generalized linear models (GLM)
Want comprehensive statistical tests

Use Real Simple Stats when:

Learning statistics
Quick exploratory analysis
Teaching or presentations
Simple regression/correlation

💼 From SPSS to Real Simple Stats

Menu-Driven to Code-Based

SPSS Menu	Real Simple Stats Code
Analyze → Descriptive Statistics → Descriptives	`rss.mean(data)`, `rss.sample_std_dev(data)`, `min(data)`, `max(data)`
Analyze → Compare Means → One-Sample T Test	`rss.one_sample_t_test(data, mu0)`
Analyze → Compare Means → Independent-Samples T Test	`rss.two_sample_t_test(group1, group2)`
Analyze → Compare Means → Paired-Samples T Test	`rss.paired_t_test(before, after)`
Analyze → Correlate → Bivariate	`rss.pearson_correlation(x, y)`
Analyze → Regression → Linear	`rss.linear_regression(x, y)`

Common SPSS Tasks

Task 1: Descriptive Statistics

SPSS:

DESCRIPTIVES VARIABLES=score
  /STATISTICS=MEAN STDDEV MIN MAX.

Real Simple Stats:

import real_simple_stats as rss

score = [85, 90, 78, 92, 88]
print(f"Mean: {rss.mean(score)}")
print(f"Std Dev: {rss.sample_std_dev(score)}")
print(f"Min: {min(score)}")
print(f"Max: {max(score)}")

Task 2: Independent t-test

SPSS:

T-TEST GROUPS=group(1 2)
  /VARIABLES=score.

Real Simple Stats:

import real_simple_stats as rss

group1 = [85, 90, 78, 92, 88]
group2 = [75, 80, 72, 82, 78]

t_stat, p_value = rss.two_sample_t_test(group1, group2)
d = rss.cohens_d(group1, group2)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print(f"Cohen's d = {d:.3f}")
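
SPSS does not always report Cohen's d alongside the t-test, so a hand calculation gives a reference value for the line above. The sketch below assumes the common pooled-standard-deviation definition of d. Whether `rss.cohens_d` uses exactly this definition is an assumption, and `pooled_cohens_d` is a hypothetical helper.

```python
import math
import statistics


def pooled_cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


group1 = [85, 90, 78, 92, 88]
group2 = [75, 80, 72, 82, 78]

d_check = pooled_cohens_d(group1, group2)
print(f"Cohen's d (pooled SD) = {d_check:.3f}")
```

Here the group means are 86.6 and 77.4 and the pooled variance is 22.8, so d is 9.2 / sqrt(22.8).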

Task 3: Correlation

SPSS:

CORRELATIONS
  /VARIABLES=height weight.

Real Simple Stats:

import real_simple_stats as rss

height = [65, 70, 68, 72, 66]
weight = [150, 180, 165, 190, 155]

r = rss.pearson_correlation(height, weight)
print(f"r = {r:.3f}")

Task 4: Linear Regression

SPSS:

REGRESSION
  /DEPENDENT score
  /METHOD=ENTER hours_studied.

Real Simple Stats:

import real_simple_stats as rss

hours_studied = [1, 2, 3, 4, 5]
score = [55, 65, 70, 80, 85]

slope, intercept, r_value, p_value, std_err = rss.linear_regression(
    hours_studied, score
)

print(f"Equation: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"R² = {r_value**2:.3f}")
print(f"p = {p_value:.4f}")

Advantages of Real Simple Stats over SPSS

Free and open-source (SPSS requires a paid license)
Reproducible (code vs. clicking)
Automatable (scripts vs. manual)
Portable (runs anywhere Python runs)
Integrates with Python ecosystem (pandas, matplotlib, etc.)

From Excel to Real Simple Stats

Common Excel Functions

Excel	Real Simple Stats
`=AVERAGE(A1:A10)`	`rss.mean(data)`
`=MEDIAN(A1:A10)`	`rss.median(data)`
`=STDEV.S(A1:A10)`	`rss.sample_std_dev(data)`
`=CORREL(A1:A10, B1:B10)`	`rss.pearson_correlation(x, y)`
`=T.TEST(A1:A10, B1:B10, 2, 2)`	`rss.two_sample_t_test(x, y)`
`=SLOPE(Y1:Y10, X1:X10)`	`rss.linear_regression(x, y)[0]`
`=INTERCEPT(Y1:Y10, X1:X10)`	`rss.linear_regression(x, y)[1]`

Example Migration: Data Analysis

Excel Workflow:

Enter data in columns A and B
Click Data → Data Analysis → t-Test
Select ranges
Click OK
View output

Real Simple Stats Workflow:

import real_simple_stats as rss
import pandas as pd

# Read Excel file
df = pd.read_excel('data.xlsx')

# Perform t-test
t_stat, p_value = rss.two_sample_t_test(
    df['Group1'].tolist(),
    df['Group2'].tolist()
)

# Calculate effect size
d = rss.cohens_d(
    df['Group1'].tolist(),
    df['Group2'].tolist()
)

# Report results
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print(f"Cohen's d: {d:.3f}")

Advantages over Excel

Reproducibility: Code can be re-run
Scalability: Handle large datasets
Automation: Process multiple files
Version control: Track changes with Git
Advanced statistics: Methods beyond Excel's built-in functions, such as effect sizes (`rss.cohens_d`)
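
The automation point can be illustrated with a short script that summarizes the same column across several exported files. This is a minimal sketch using only the standard library (`csv`, `statistics`, `pathlib`), with CSV exports standing in for workbooks. The file names, the `score` column, and both helper functions are hypothetical.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def summarize_column(csv_path, column):
    """Return (n, mean, sample SD) for one numeric column of a CSV file."""
    with open(csv_path, newline="") as handle:
        values = [
            float(row[column])
            for row in csv.DictReader(handle)
            if (row.get(column) or "").strip()  # skip empty or missing cells
        ]
    return len(values), statistics.mean(values), statistics.stdev(values)


def summarize_folder(folder, column):
    """Apply summarize_column to every .csv file in a folder, sorted by name."""
    return {
        path.name: summarize_column(path, column)
        for path in sorted(Path(folder).glob("*.csv"))
    }


# Demo with two small made-up files written to a temporary folder
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "class_a.csv").write_text("score\n85\n90\n78\n")
    Path(folder, "class_b.csv").write_text("score\n75\n80\n72\n")
    summary = summarize_folder(folder, "score")

for name, (n, mean, sd) in summary.items():
    print(f"{name}: n={n}, M={mean:.2f}, SD={sd:.2f}")
```

The same loop could call `rss.mean` and `rss.sample_std_dev` in place of the `statistics` functions once the library is installed.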

🔄 Complete Migration Example

Scenario: Comparing Two Groups

R Code:

# Load data
group1 <- c(23, 25, 28, 30, 32)
group2 <- c(28, 30, 35, 38, 40)

# Descriptive statistics
mean1 <- mean(group1)
mean2 <- mean(group2)
sd1 <- sd(group1)
sd2 <- sd(group2)

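# t-test (var.equal=TRUE requests the pooled-variance test, df = n1 + n2 - 2,
# matching the degrees of freedom reported by the Python version;
# R's default is Welch's test)
result <- t.test(group1, group2, var.equal=TRUE)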

# Effect size (requires package)
library(effsize)
d <- cohen.d(group1, group2)

# Report
cat(sprintf("Group 1: M=%.2f, SD=%.2f\n", mean1, sd1))
cat(sprintf("Group 2: M=%.2f, SD=%.2f\n", mean2, sd2))
cat(sprintf("t(%.0f)=%.2f, p=%.3f\n",
            result$parameter, result$statistic, result$p.value))
cat(sprintf("Cohen's d=%.2f\n", d$estimate))

Real Simple Stats Code:

import real_simple_stats as rss

# Load data
group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]

# Descriptive statistics
mean1 = rss.mean(group1)
mean2 = rss.mean(group2)
sd1 = rss.sample_std_dev(group1)
sd2 = rss.sample_std_dev(group2)

# t-test
t_stat, p_value = rss.two_sample_t_test(group1, group2)

# Effect size
d = rss.cohens_d(group1, group2)
interpretation = rss.interpret_effect_size(d, 'd')

# Report
print(f"Group 1: M={mean1:.2f}, SD={sd1:.2f}")
print(f"Group 2: M={mean2:.2f}, SD={sd2:.2f}")
print(f"t({len(group1)+len(group2)-2})={t_stat:.2f}, p={p_value:.3f}")
print(f"Cohen's d={d:.2f} ({interpretation})")
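
The verbal label attached to an effect size usually follows Cohen's conventional cut-offs of 0.2, 0.5, and 0.8. The sketch below shows that rule. The exact labels and thresholds returned by `rss.interpret_effect_size` are not documented here, so treat `describe_cohens_d` as a hypothetical stand-in rather than the library's behavior.

```python
def describe_cohens_d(d):
    """Label |d| using Cohen's conventional cut-offs of 0.2, 0.5, and 0.8."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# For the two groups above, d is about -1.49 (mean difference -6.6, pooled SD about 4.44)
print(describe_cohens_d(-1.49))  # large
```

The sign of d only reflects which group is listed first, so the label is based on its absolute value.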

📋 Migration Checklist

Before Migration

[ ] Identify which functions you use most
[ ] Check if Real Simple Stats supports them
[ ] Review API Comparison
[ ] Test with sample data

During Migration

[ ] Install Real Simple Stats: pip install real-simple-stats
[ ] Convert data structures (data.frames → lists/arrays)
[ ] Translate function calls
[ ] Verify results match original
[ ] Update documentation/comments
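
The "verify results" step can be scripted so it is repeatable. The sketch below compares values copied from the original tool with values from the migrated script, using a tolerance because the original output is often rounded. `compare_results` and the example numbers' dictionary keys are hypothetical names for this illustration.

```python
import math


def compare_results(reference, migrated, rel_tol=1e-6, abs_tol=1e-9):
    """Return the names whose migrated value differs from the reference value."""
    mismatches = []
    for name, expected in reference.items():
        actual = migrated.get(name)
        if actual is None or not math.isclose(
            expected, actual, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches.append(name)
    return mismatches


# Values copied from the original tool's output (here rounded to 4 decimals)
reference = {"mean": 3.0, "sd": 1.5811}
# Values produced by the migrated script (hard-coded here for illustration)
migrated = {"mean": 3.0, "sd": 1.5811388300841898}

print(compare_results(reference, migrated, rel_tol=1e-4))  # []
print(compare_results(reference, migrated))                # ['sd']
```

The second call shows why the tolerance should match the precision of the copied reference values: with the default tolerance, the rounded standard deviation is flagged as a mismatch.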

After Migration

[ ] Run tests to ensure correctness
[ ] Update analysis scripts
[ ] Train team members
[ ] Document any limitations

Quick Reference Card

Most Common Translations

# Descriptive Statistics
mean(x)                    → rss.mean(x)
sd(x) / np.std(x, ddof=1)  → rss.sample_std_dev(x)
median(x)                  → rss.median(x)

# Hypothesis Tests
t.test(x, y)               → rss.two_sample_t_test(x, y)
chisq.test(obs, p=exp)     → rss.chi_square_statistic(obs, exp)

# Correlation and Regression
cor(x, y)                  → rss.pearson_correlation(x, y)
lm(y ~ x)                  → rss.linear_regression(x, y)
predict(model, newdata)    → rss.regression_equation(x_new, slope, intercept)

# Distributions
pnorm(x, mean, sd)         → rss.normal_cdf(x, mu, sigma)
qnorm(p, mean, sd)         → rss.normal_ppf(p, mu, sigma)

Tips for Successful Migration

Start small: Migrate one analysis at a time
Verify results: Compare outputs with original software
Use version control: Track changes with Git
Document differences: Note any discrepancies
Leverage Python ecosystem: Combine with pandas, matplotlib
Ask for help: Use GitHub issues

Additional Resources

API Comparison: Detailed function mapping
Examples: Interactive tutorials
FAQ: Common questions
Troubleshooting: Error solutions

Need help migrating? Open an issue with your use case!

Last Updated: 2025
Version: 0.3.0