Migration Guide - Switching to Real Simple Stats

A complete guide to migrating from other statistical tools to Real Simple Stats.


📚 Overview

This guide helps you transition from:

  • R - Statistical programming language

  • SciPy - Python scientific computing

  • statsmodels - Python statistical models

  • SPSS - Commercial statistical software

  • Excel - Spreadsheet analysis


🔄 From R to Real Simple Stats

Philosophy Differences

Aspect            R                             Real Simple Stats
Syntax            function(data, param=value)   function(data, param=value)
Data structures   data.frames, vectors          Lists, NumPy arrays
Output            Complex objects               Simple dicts/tuples
Installation      install.packages()            pip install


Common Function Translations

Descriptive Statistics

R                            Real Simple Stats
mean(x)                      rss.mean(x)
median(x)                    rss.median(x)
sd(x)                        rss.sample_std_dev(x)
var(x)                       rss.sample_variance(x)
quantile(x, c(0.25, 0.75))   rss.five_number_summary(x)
IQR(x)                       rss.interquartile_range(x)
summary(x)                   rss.five_number_summary(x)
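Quartiles are a common source of small mismatches during migration, because tools interpolate between data points differently. The sketch below uses only Python's standard `statistics` module (not Real Simple Stats) to show two conventions. R's default `quantile()` (type 7) corresponds to the `inclusive` method. Which convention `rss.five_number_summary` follows is not stated in this guide, so it is worth comparing its output against both.

```python
import statistics

x = [1, 2, 3, 4, 5]

# "inclusive" interpolation corresponds to R's default quantile() (type 7)
q_inclusive = statistics.quantiles(x, n=4, method="inclusive")

# "exclusive" interpolation is the statistics module's own default
q_exclusive = statistics.quantiles(x, n=4, method="exclusive")

print(q_inclusive)  # [2.0, 3.0, 4.0]
print(q_exclusive)  # [1.5, 3.0, 4.5]
```

If migrated quartiles differ from R only slightly and only on small samples, the interpolation rule is a likely cause.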

Example Migration:

# R code
data <- c(1, 2, 3, 4, 5)
mean_val <- mean(data)
sd_val <- sd(data)

# Python equivalent
import real_simple_stats as rss

data = [1, 2, 3, 4, 5]
mean_val = rss.mean(data)
sd_val = rss.sample_std_dev(data)
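As an independent cross-check of the numbers above, the same quantities can be computed with Python's standard `statistics` module. This is a sketch that does not call Real Simple Stats. It also shows why the sample versus population distinction matters when comparing against R's `sd()`, which divides by n - 1.

```python
import statistics

data = [1, 2, 3, 4, 5]

mean_val = statistics.mean(data)
sample_sd = statistics.stdev(data)       # divides by n - 1, like R's sd()
population_sd = statistics.pstdev(data)  # divides by n

print(f"mean = {mean_val}")
print(f"sample SD = {sample_sd:.4f}")          # 1.5811
print(f"population SD = {population_sd:.4f}")  # 1.4142
```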

Hypothesis Tests

R                           Real Simple Stats
t.test(x, mu=0)             rss.one_sample_t_test(x, mu0=0)
t.test(x, y)                rss.two_sample_t_test(x, y)
t.test(x, y, paired=TRUE)   rss.paired_t_test(x, y)
chisq.test(obs, p=exp)      rss.chi_square_statistic(obs, exp)
aov(y ~ group)              rss.one_way_anova(groups)

Example Migration:

# R code
group1 <- c(23, 25, 28, 30, 32)
group2 <- c(28, 30, 35, 38, 40)
result <- t.test(group1, group2)
print(result$p.value)

# Python equivalent
import real_simple_stats as rss

group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]
t_stat, p_value = rss.two_sample_t_test(group1, group2)
print(p_value)
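When comparing p-values, keep in mind that R's `t.test(x, y)` runs Welch's unequal-variance test by default, while many tools default to the pooled (equal-variance) test. This guide does not state which one `rss.two_sample_t_test` uses. The sketch below computes both variants by hand with the standard library, so you can see which degrees of freedom your migrated output matches.

```python
import math
import statistics

group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]

n1, n2 = len(group1), len(group2)
m1, m2 = statistics.mean(group1), statistics.mean(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled (equal-variance) test
pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
t_pooled = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Welch (unequal-variance) test, the default for R's t.test(x, y)
sq_se1, sq_se2 = v1 / n1, v2 / n2  # squared standard errors of each mean
t_welch = (m1 - m2) / math.sqrt(sq_se1 + sq_se2)
df_welch = (sq_se1 + sq_se2) ** 2 / (
    sq_se1 ** 2 / (n1 - 1) + sq_se2 ** 2 / (n2 - 1)
)

print(f"pooled: t = {t_pooled:.4f}, df = {df_pooled}")
print(f"Welch:  t = {t_welch:.4f}, df = {df_welch:.2f}")
```

With equal group sizes the two t statistics coincide, but the degrees of freedom (8 versus roughly 7.23 here) and therefore the p-values differ.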

Regression

R                              Real Simple Stats
cor(x, y)                      rss.pearson_correlation(x, y)
lm(y ~ x)                      rss.linear_regression(x, y)
summary(lm(y ~ x))$r.squared   rss.coefficient_of_determination(x, y)
predict(model, newdata)        rss.regression_equation(x_new, slope, intercept)

Example Migration:

# R code
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
model <- lm(y ~ x)
summary(model)

# Python equivalent
import real_simple_stats as rss

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)
r_squared = r_value ** 2

print(f"Slope: {slope:.3f}")
print(f"Intercept: {intercept:.3f}")
print(f"R²: {r_squared:.3f}")
print(f"p-value: {p_value:.4f}")
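To sanity-check the slope, intercept, and R² that either tool reports, the least-squares formulas can be evaluated directly. The following is a hand-rolled sketch using only built-ins, not the library's implementation.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

# Sums of squares and cross-products around the means
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx
intercept = mean_y - slope * mean_x
r_value = sxy / math.sqrt(sxx * syy)
r_squared = r_value ** 2

print(f"Slope: {slope:.3f}")          # 0.600
print(f"Intercept: {intercept:.3f}")  # 2.200
print(f"R²: {r_squared:.3f}")         # 0.600
```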

Distributions

R                    Real Simple Stats
dnorm(x, mean, sd)   rss.normal_pdf(x, mu, sigma)
pnorm(x, mean, sd)   rss.normal_cdf(x, mu, sigma)
qnorm(p, mean, sd)   rss.normal_ppf(p, mu, sigma)
dbinom(k, n, p)      rss.binomial_probability(n, k, p)
pbinom(k, n, p)      rss.binomial_cdf(k, n, p)
dpois(k, lambda)     rss.poisson_pmf(k, lam)
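Note that the table lists `rss.binomial_probability(n, k, p)` with `n` first, whereas `dbinom(k, n, p)` takes `k` first. A cheap guard against argument-order mistakes is to compare against reference values. The sketch below produces such values with the standard library only; the helper names `binomial_pmf` and `poisson_pmf` are defined here for illustration and are not part of Real Simple Stats.

```python
import math
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

pdf_at_zero = standard_normal.pdf(0)           # like dnorm(0, 0, 1)
cdf_at_196 = standard_normal.cdf(1.96)         # like pnorm(1.96, 0, 1)
quantile_975 = standard_normal.inv_cdf(0.975)  # like qnorm(0.975, 0, 1)


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); same argument order as dbinom."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam); same argument order as dpois."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


print(f"{pdf_at_zero:.4f}")              # 0.3989
print(f"{cdf_at_196:.4f}")               # 0.9750
print(f"{quantile_975:.4f}")             # 1.9600
print(f"{binomial_pmf(3, 10, 0.5):.4f}")  # 0.1172
print(f"{poisson_pmf(2, 3):.4f}")         # 0.2240
```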


Key Differences

  1. Return Values:

    # R returns complex object
    result <- t.test(x, y)
    result$statistic
    result$p.value
    result$conf.int
    
    # Python returns tuple
    t_stat, p_value = rss.two_sample_t_test(x, y)
    # Simpler, but less information
    
  2. Data Frames:

    # R uses data frames natively
    df <- data.frame(x=c(1,2,3), y=c(4,5,6))
    cor(df$x, df$y)
    
    # Python uses lists or pandas
    import pandas as pd
    df = pd.DataFrame({'x': [1,2,3], 'y': [4,5,6]})
    rss.pearson_correlation(df['x'].tolist(), df['y'].tolist())
    
  3. Missing Values:

    # R skips NA values when na.rm=TRUE is passed
    mean(c(1, 2, NA, 4), na.rm=TRUE)
    
    # Python requires manual handling
    data = [1, 2, None, 4]
    clean_data = [x for x in data if x is not None]
    rss.mean(clean_data)
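Data loaded from pandas or CSV files often marks missing values as NaN rather than `None`, and the list comprehension above would let those through. A small helper that drops both might look like the sketch below; `drop_missing` is a hypothetical name, not a Real Simple Stats function.

```python
import math


def drop_missing(values):
    """Return values with None and NaN entries removed."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue
        cleaned.append(value)
    return cleaned


data = [1, 2, None, float("nan"), 4]
clean_data = drop_missing(data)
print(clean_data)  # [1, 2, 4]
```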
    

From SciPy to Real Simple Stats

Function Translations

Descriptive Statistics

SciPy/NumPy         Real Simple Stats
np.mean(x)          rss.mean(x)
np.median(x)        rss.median(x)
np.std(x, ddof=1)   rss.sample_std_dev(x)
np.var(x, ddof=1)   rss.sample_variance(x)
stats.mode(x)       rss.mode(x)


Hypothesis Tests

SciPy                           Real Simple Stats
stats.ttest_1samp(x, popmean)   rss.one_sample_t_test(x, mu0)
stats.ttest_ind(x, y)           rss.two_sample_t_test(x, y)
stats.ttest_rel(x, y)           rss.paired_t_test(x, y)
stats.chisquare(obs, exp)       rss.chi_square_statistic(obs, exp)
stats.f_oneway(*groups)         rss.one_way_anova(groups)

Example Migration:

# SciPy code
from scipy import stats

data = [23, 25, 28, 30, 32]
t_stat, p_value = stats.ttest_1samp(data, 30)

# Real Simple Stats equivalent
import real_simple_stats as rss

data = [23, 25, 28, 30, 32]
t_stat, p_value = rss.one_sample_t_test(data, mu0=30)
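Both calls should report the same t statistic, which is simply the distance between the sample mean and `mu0` measured in standard errors. A standard-library sketch of that formula, useful as a reference value when checking the migration:

```python
import math
import statistics

data = [23, 25, 28, 30, 32]
mu0 = 30

n = len(data)
standard_error = statistics.stdev(data) / math.sqrt(n)
t_stat = (statistics.mean(data) - mu0) / standard_error
df = n - 1

print(f"t({df}) = {t_stat:.4f}")  # t(4) = -1.4715
```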

Distributions

SciPy                           Real Simple Stats
stats.norm.pdf(x, loc, scale)   rss.normal_pdf(x, mu, sigma)
stats.norm.cdf(x, loc, scale)   rss.normal_cdf(x, mu, sigma)
stats.norm.ppf(p, loc, scale)   rss.normal_ppf(p, mu, sigma)
stats.binom.pmf(k, n, p)        rss.binomial_probability(n, k, p)
stats.poisson.pmf(k, mu)        rss.poisson_pmf(k, lam)


Regression

SciPy                    Real Simple Stats
stats.pearsonr(x, y)     rss.pearson_correlation(x, y)
stats.linregress(x, y)   rss.linear_regression(x, y)

Example Migration:

# SciPy code
from scipy import stats

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

# Real Simple Stats equivalent (identical!)
import real_simple_stats as rss

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)

Key Advantages of Real Simple Stats

  1. Simpler imports:

    # SciPy
    from scipy import stats
    from scipy.stats import norm, binom
    import numpy as np
    
    # Real Simple Stats
    import real_simple_stats as rss
    
  2. Clearer function names:

    # SciPy
    stats.ttest_ind(group1, group2)
    
    # Real Simple Stats (more descriptive)
    rss.two_sample_t_test(group1, group2)
    
  3. Educational focus:

    # Real Simple Stats docstrings are written for learners
    help(rss.two_sample_t_test)
    # Includes: explanation, formula, interpretation
    

From statsmodels to Real Simple Stats

Function Translations

statsmodels                    Real Simple Stats
sm.OLS(y, X).fit()             rss.multiple_regression(X, y)
sm.stats.ztest(x, value=mu0)   rss.one_sample_z_test(x, mu0, sigma)
sm.stats.ttest_ind(x, y)       rss.two_sample_t_test(x, y)
sm.stats.anova_lm()            rss.one_way_anova(groups)

Example Migration:

# statsmodels code
import statsmodels.api as sm

# The second predictor is not a linear function of the first,
# so the coefficients are uniquely determined once an intercept is added
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [2, 4, 5, 4, 5]
X_with_const = sm.add_constant(X)
model = sm.OLS(y, X_with_const).fit()
print(model.summary())

# Real Simple Stats equivalent
import real_simple_stats as rss

X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]]
y = [2, 4, 5, 4, 5]
result = rss.multiple_regression(X, y, include_intercept=True)

print(f"R² = {result['r_squared']:.3f}")
print(f"Coefficients: {result['coefficients']}")
print(f"Intercept: {result['intercept']}")

When to Use Each

Use statsmodels when:

  • Need detailed regression diagnostics

  • Require time series models (ARIMA, VAR)

  • Need generalized linear models (GLM)

  • Want comprehensive statistical tests

Use Real Simple Stats when:

  • Learning statistics

  • Quick exploratory analysis

  • Teaching or presentations

  • Simple regression/correlation


💼 From SPSS to Real Simple Stats


Common SPSS Tasks

Task 1: Descriptive Statistics

SPSS:

DESCRIPTIVES VARIABLES=score
  /STATISTICS=MEAN STDDEV MIN MAX.

Real Simple Stats:

import real_simple_stats as rss

score = [85, 90, 78, 92, 88]
print(f"Mean: {rss.mean(score)}")
print(f"Std Dev: {rss.sample_std_dev(score)}")
print(f"Min: {min(score)}")
print(f"Max: {max(score)}")

Task 2: Independent t-test

SPSS:

T-TEST GROUPS=group(1 2)
  /VARIABLES=score.

Real Simple Stats:

import real_simple_stats as rss

group1 = [85, 90, 78, 92, 88]
group2 = [75, 80, 72, 82, 78]

t_stat, p_value = rss.two_sample_t_test(group1, group2)
d = rss.cohens_d(group1, group2)

print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
print(f"Cohen's d = {d:.3f}")
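SPSS does not always print an effect size alongside the t-test, so it helps to know what the number means. Assuming the common pooled-standard-deviation definition of Cohen's d (this guide does not state which definition `rss.cohens_d` uses), the value can be reproduced with the standard library:

```python
import math
import statistics

group1 = [85, 90, 78, 92, 88]
group2 = [75, 80, 72, 82, 78]

n1, n2 = len(group1), len(group2)
v1, v2 = statistics.variance(group1), statistics.variance(group2)

# Pooled standard deviation across both groups
pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))

# Standardized mean difference
d = (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd

print(f"Cohen's d = {d:.3f}")  # 1.927
```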

Task 3: Correlation

SPSS:

CORRELATIONS
  /VARIABLES=height weight.

Real Simple Stats:

import real_simple_stats as rss

height = [65, 70, 68, 72, 66]
weight = [150, 180, 165, 190, 155]

r = rss.pearson_correlation(height, weight)
print(f"r = {r:.3f}")

Task 4: Linear Regression

SPSS:

REGRESSION
  /DEPENDENT score
  /METHOD=ENTER hours_studied.

Real Simple Stats:

import real_simple_stats as rss

hours_studied = [1, 2, 3, 4, 5]
score = [55, 65, 70, 80, 85]

slope, intercept, r_value, p_value, std_err = rss.linear_regression(
    hours_studied, score
)

print(f"Equation: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"R² = {r_value**2:.3f}")
print(f"p = {p_value:.4f}")

Advantages of Real Simple Stats over SPSS

  1. Free and open-source (SPSS requires a paid license)

  2. Reproducible (code vs. clicking)

  3. Automatable (scripts vs. manual)

  4. Portable (runs anywhere Python runs)

  5. Integrates with Python ecosystem (pandas, matplotlib, etc.)


From Excel to Real Simple Stats

Common Excel Functions

Excel                           Real Simple Stats
=AVERAGE(A1:A10)                rss.mean(data)
=MEDIAN(A1:A10)                 rss.median(data)
=STDEV.S(A1:A10)                rss.sample_std_dev(data)
=CORREL(A1:A10, B1:B10)         rss.pearson_correlation(x, y)
=T.TEST(A1:A10, B1:B10, 2, 2)   rss.two_sample_t_test(x, y)
=SLOPE(Y1:Y10, X1:X10)          rss.linear_regression(x, y)[0]
=INTERCEPT(Y1:Y10, X1:X10)      rss.linear_regression(x, y)[1]


Example Migration: Data Analysis

Excel Workflow:

  1. Enter data in columns A and B

  2. Click Data → Data Analysis → t-Test

  3. Select ranges

  4. Click OK

  5. View output

Real Simple Stats Workflow:

import real_simple_stats as rss
import pandas as pd

# Read Excel file
df = pd.read_excel('data.xlsx')

# Perform t-test
t_stat, p_value = rss.two_sample_t_test(
    df['Group1'].tolist(),
    df['Group2'].tolist()
)

# Calculate effect size
d = rss.cohens_d(
    df['Group1'].tolist(),
    df['Group2'].tolist()
)

# Report results
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.4f}")
print(f"Cohen's d: {d:.3f}")

Advantages over Excel

  1. Reproducibility: Code can be re-run

  2. Scalability: Handle large datasets

  3. Automation: Process multiple files

  4. Version control: Track changes with Git

  5. Advanced statistics: More methods available
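The automation point can be illustrated with a short loop over a folder of exported CSV files. This is a sketch using only the standard library; the column name `score`, the file names, and the helpers `summarize_csv` and `summarize_folder` are made up for the example, and the same loop could call Real Simple Stats functions instead of `statistics`.

```python
import csv
import statistics
import tempfile
from pathlib import Path


def summarize_csv(path, column):
    """Read one numeric column from a CSV file and summarize it."""
    with open(path, newline="") as handle:
        values = [float(row[column]) for row in csv.DictReader(handle)]
    return {
        "file": Path(path).name,
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }


def summarize_folder(folder, column):
    """Summarize every CSV file in a folder, in alphabetical order."""
    return [summarize_csv(p, column) for p in sorted(Path(folder).glob("*.csv"))]


# Demo: write two small files to a temporary folder, then process them all
with tempfile.TemporaryDirectory() as tmp:
    samples = {"a.csv": [85, 90, 78], "b.csv": [75, 80, 72]}
    for name, scores in samples.items():
        with open(Path(tmp) / name, "w", newline="") as handle:
            writer = csv.writer(handle)
            writer.writerow(["score"])
            writer.writerows([[s] for s in scores])
    summaries = summarize_folder(tmp, "score")

for row in summaries:
    print(f"{row['file']}: n={row['n']}, mean={row['mean']:.2f}, sd={row['sd']:.2f}")
```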


🔄 Complete Migration Example

Scenario: Comparing Two Groups

R Code:

# Load data
group1 <- c(23, 25, 28, 30, 32)
group2 <- c(28, 30, 35, 38, 40)

# Descriptive statistics
mean1 <- mean(group1)
mean2 <- mean(group2)
sd1 <- sd(group1)
sd2 <- sd(group2)

# t-test (var.equal=TRUE requests the pooled test; R's default is Welch's)
result <- t.test(group1, group2, var.equal=TRUE)

# Effect size (requires package)
library(effsize)
d <- cohen.d(group1, group2)

# Report
cat(sprintf("Group 1: M=%.2f, SD=%.2f\n", mean1, sd1))
cat(sprintf("Group 2: M=%.2f, SD=%.2f\n", mean2, sd2))
cat(sprintf("t(%.0f)=%.2f, p=%.3f\n",
            result$parameter, result$statistic, result$p.value))
cat(sprintf("Cohen's d=%.2f\n", d$estimate))

Real Simple Stats Code:

import real_simple_stats as rss

# Load data
group1 = [23, 25, 28, 30, 32]
group2 = [28, 30, 35, 38, 40]

# Descriptive statistics
mean1 = rss.mean(group1)
mean2 = rss.mean(group2)
sd1 = rss.sample_std_dev(group1)
sd2 = rss.sample_std_dev(group2)

# t-test
t_stat, p_value = rss.two_sample_t_test(group1, group2)

# Effect size
d = rss.cohens_d(group1, group2)
interpretation = rss.interpret_effect_size(d, 'd')

# Report
print(f"Group 1: M={mean1:.2f}, SD={sd1:.2f}")
print(f"Group 2: M={mean2:.2f}, SD={sd2:.2f}")
print(f"t({len(group1)+len(group2)-2})={t_stat:.2f}, p={p_value:.3f}")
print(f"Cohen's d={d:.2f} ({interpretation})")

📋 Migration Checklist

Before Migration

  • [ ] Identify which functions you use most

  • [ ] Check if Real Simple Stats supports them

  • [ ] Review API Comparison

  • [ ] Test with sample data

During Migration

  • [ ] Install Real Simple Stats: pip install real-simple-stats

  • [ ] Convert data structures (data.frames → lists/arrays)

  • [ ] Translate function calls

  • [ ] Verify results match original

  • [ ] Update documentation/comments

After Migration

  • [ ] Run tests to ensure correctness

  • [ ] Update analysis scripts

  • [ ] Train team members

  • [ ] Document any limitations
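The "verify results match original" step is easier to repeat if it is scripted. One possible sketch: record the values from the original tool in a dict, then compare them with the migrated values within a tolerance. The function name `compare_results` and the example numbers are illustrative only.

```python
import math


def compare_results(original, migrated, rel_tol=1e-6, abs_tol=1e-9):
    """Return {name: (expected, actual)} for every value that does not match."""
    mismatches = {}
    for name, expected in original.items():
        actual = migrated.get(name)
        if actual is None or not math.isclose(
            expected, actual, rel_tol=rel_tol, abs_tol=abs_tol
        ):
            mismatches[name] = (expected, actual)
    return mismatches


# Values copied from the original tool versus values from the migrated script
original = {"mean": 27.6, "sd": 3.646917, "t": -2.348}
migrated = {"mean": 27.6, "sd": 3.646917, "t": -2.100}

print(compare_results(original, migrated))  # {'t': (-2.348, -2.1)}
```

Choose the tolerance to match the precision at which the original tool prints its results, since rounded output will rarely agree to many decimal places.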


Quick Reference Card

Most Common Translations

# Descriptive Statistics
mean(x)                     rss.mean(x)
sd(x) / np.std(x, ddof=1)   rss.sample_std_dev(x)
median(x)                   rss.median(x)

# Hypothesis Tests
t.test(x, y)                rss.two_sample_t_test(x, y)
chisq.test(obs, p=exp)      rss.chi_square_statistic(obs, exp)

# Regression
cor(x, y)                   rss.pearson_correlation(x, y)
lm(y ~ x)                   rss.linear_regression(x, y)
predict(model, newdata)     rss.regression_equation(x_new, slope, intercept)

# Distributions
pnorm(x, mean, sd)          rss.normal_cdf(x, mu, sigma)
qnorm(p, mean, sd)          rss.normal_ppf(p, mu, sigma)

Tips for Successful Migration

  1. Start small: Migrate one analysis at a time

  2. Verify results: Compare outputs with original software

  3. Use version control: Track changes with Git

  4. Document differences: Note any discrepancies

  5. Leverage Python ecosystem: Combine with pandas, matplotlib

  6. Ask for help: Use GitHub issues



Need help migrating? Open an issue with your use case!

Last Updated: 2025
Version: 0.3.0