Frequently Asked Questions (FAQ)
Common questions about Real Simple Stats, answered.
📦 Installation & Setup
Q: How do I install Real Simple Stats?
A: Use pip:
pip install real-simple-stats
For the latest development version:
pip install git+https://github.com/kylejones200/real_simple_stats.git
Q: What are the system requirements?
A:
Python: 3.7 or higher
Dependencies: NumPy, SciPy (automatically installed)
Optional: matplotlib (for plotting), pandas (for data handling)
Q: Can I use this in Google Colab or Jupyter?
A: Yes! Install in the first cell:
!pip install real-simple-stats
import real_simple_stats as rss
Q: Do I need to install matplotlib separately?
A: No, matplotlib is included as a dependency. However, if you only want the statistical functions without plotting, you can skip it.
General Usage
Q: How do I import the package?
A: Standard import:
import real_simple_stats as rss
# Use functions
mean = rss.mean([1, 2, 3, 4, 5])
Or import specific functions:
from real_simple_stats import mean, median, std_dev
mean([1, 2, 3])
Q: What data types does the package accept?
A: Most functions accept:
Python lists: [1, 2, 3, 4, 5]
NumPy arrays: np.array([1, 2, 3, 4, 5])
Tuples: (1, 2, 3, 4, 5)
For multivariate functions, use lists of lists or 2D NumPy arrays.
Q: Do functions modify my original data?
A: No! All functions return new values without modifying your input data.
data = [1, 2, 3, 4, 5]
result = rss.mean(data)
# data is unchanged
Q: What’s the difference between sample and population functions?
A:
Sample functions (e.g., sample_std_dev): Use \(n-1\) in the denominator (Bessel’s correction)
Population functions (e.g., population_std_dev): Use \(n\) in the denominator
Rule of thumb: Use sample functions for real-world data (most common).
# Sample standard deviation (n-1)
rss.sample_std_dev([1, 2, 3, 4, 5])
# Population standard deviation (n)
rss.population_std_dev([1, 2, 3, 4, 5])
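To make the two denominators concrete, here is a minimal sketch that reproduces both calculations with Python's standard library instead of Real Simple Stats. It assumes only that sample_std_dev and population_std_dev follow the textbook definitions described above.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations

sample_sd = math.sqrt(sum_sq / (n - 1))  # Bessel's correction: divide by n - 1
population_sd = math.sqrt(sum_sq / n)    # divide by n

# The standard library implements the same two definitions.
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))

print(f"sample: {sample_sd:.4f}, population: {population_sd:.4f}")
# sample: 1.5811, population: 1.4142
```

The sample version is the larger of the two because dividing by n - 1 compensates for estimating the mean from the same data.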
Statistical Tests
Q: When should I use a t-test vs. z-test?
A:
t-test: Unknown population standard deviation (most common)
z-test: Known population standard deviation (rare in practice)
# Unknown σ (use t-test)
t_stat, p_value = rss.one_sample_t_test(data, mu0=100)
# Known σ (use z-test)
z_stat, p_value = rss.one_sample_z_test(data, mu0=100, sigma=15)
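The only difference between the two statistics is where the standard error comes from. The sketch below computes both by hand with the standard library; the data values are illustrative, and it is an assumption that the rss functions use these textbook formulas.

```python
import math
import statistics

data = [102, 98, 110, 105, 95, 108, 112, 99]  # illustrative values
mu0 = 100    # hypothesized mean
sigma = 15   # population SD, assumed known (z-test only)

n = len(data)
mean = statistics.fmean(data)

# z-test: the standard error uses the known population SD.
z_stat = (mean - mu0) / (sigma / math.sqrt(n))

# t-test: the standard error uses the SD estimated from the sample.
s = statistics.stdev(data)
t_stat = (mean - mu0) / (s / math.sqrt(n))
df = n - 1  # degrees of freedom for the t distribution

print(f"z = {z_stat:.3f}, t = {t_stat:.3f} (df = {df})")
```

Because s is itself an estimate, the t statistic is compared against a t distribution with n - 1 degrees of freedom, which has heavier tails than the normal.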
Q: How do I interpret p-values?
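A:
p < 0.05: Statistically significant at the conventional α = 0.05 level (reject the null hypothesis)
p ≥ 0.05: Not statistically significant (fail to reject the null hypothesis)
Important: The p-value is NOT the probability that the null hypothesis is true! It is the probability of seeing data at least as extreme as yours if the null hypothesis were true.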
t_stat, p_value = rss.two_sample_t_test(group1, group2)
if p_value < 0.05:
    print("Significant difference between groups")
else:
    print("No significant difference")
Q: What’s the difference between one-tailed and two-tailed tests?
A:
Two-tailed (default): Tests if means are different (either direction)
One-tailed: Tests if one mean is specifically greater or less
Most Real Simple Stats functions use two-tailed tests by default.
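As a rough illustration of how the choice of tails changes the p-value, this sketch converts a single z statistic into one-tailed and two-tailed p-values using the standard library's NormalDist. The statistic is an illustrative value, not output from Real Simple Stats.

```python
from statistics import NormalDist

z_stat = 1.80  # illustrative test statistic
std_normal = NormalDist()  # mean 0, SD 1

p_upper = 1 - std_normal.cdf(z_stat)           # one-tailed: "greater than"
p_lower = std_normal.cdf(z_stat)               # one-tailed: "less than"
p_two = 2 * (1 - std_normal.cdf(abs(z_stat)))  # two-tailed: "different from"

print(f"upper: {p_upper:.4f}, lower: {p_lower:.4f}, two-tailed: {p_two:.4f}")
```

Here the upper-tailed p-value falls below 0.05 while the two-tailed one does not, which is why the direction of a one-tailed test should be fixed before looking at the data.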
Q: Should I use paired or independent t-test?
A:
Paired t-test: Same subjects measured twice (before/after, matched pairs)
Independent t-test: Different subjects in each group
# Paired (same subjects)
before = [120, 130, 125, 135, 140]
after = [115, 128, 118, 131, 133]  # differences must vary, or the paired t statistic is undefined
t_stat, p_value = rss.paired_t_test(before, after)
# Independent (different subjects)
group1 = [120, 130, 125, 135, 140]
group2 = [115, 125, 120, 130, 135]
t_stat, p_value = rss.two_sample_t_test(group1, group2)
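To show why the choice matters, the sketch below computes both test statistics by hand on the same numbers with the standard library. The values are illustrative and chosen so that the per-subject differences vary; the equal-variance formula for the independent test is an assumption about what two_sample_t_test computes.

```python
import math
import statistics

before = [120, 130, 125, 135, 140]
after = [115, 128, 118, 131, 133]  # illustrative values
n = len(before)

# Paired: analyze the per-subject differences as a single sample.
diffs = [b - a for b, a in zip(before, after)]
paired_t = statistics.fmean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Independent (equal-variance form): pool the two group variances.
pooled_var = (statistics.variance(before) + statistics.variance(after)) / 2
se = math.sqrt(pooled_var * (2 / n))
independent_t = (statistics.fmean(before) - statistics.fmean(after)) / se

print(f"paired t = {paired_t:.2f}, independent t = {independent_t:.2f}")
# paired t = 5.27, independent t = 0.99
```

The paired statistic is much larger because subtracting each subject's own baseline removes between-subject variation from the standard error.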
Q: What sample size do I need?
A: Use power analysis:
# For t-test with medium effect size (d=0.5), 80% power
result = rss.power_t_test(delta=0.5, power=0.8, sig_level=0.05)
print(f"Need {result['n']} participants per group")
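For intuition about where such a number comes from, here is a sketch of the common normal-approximation formula, n ≈ 2·((z₁₋α/₂ + z_power) / d)² per group. The helper name approx_n_per_group is hypothetical, and this is not necessarily the method power_t_test uses.

```python
import math
from statistics import NormalDist

def approx_n_per_group(d, power=0.8, sig_level=0.05):
    """Normal-approximation sample size per group (two-sided, two-sample)."""
    z_alpha = NormalDist().inv_cdf(1 - sig_level / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

print(approx_n_per_group(0.5))  # 63
```

An exact calculation based on the t distribution typically asks for one or two more participants per group than this approximation.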
Regression & Correlation
Q: What’s the difference between correlation and regression?
A:
Correlation (pearson_correlation): Measures strength of linear relationship (-1 to 1)
Regression (linear_regression): Predicts one variable from another
# Correlation
r = rss.pearson_correlation(x, y) # Just a number
# Regression
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)
# Can make predictions: y = slope*x + intercept
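The two quantities are built from the same sums of squares, which the following standard-library sketch makes explicit. The data is illustrative, and the formulas are the textbook ones rather than a confirmed description of the rss internals.

```python
import math

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
sxx = sum((a - mean_x) ** 2 for a in x)
syy = sum((b - mean_y) ** 2 for b in y)

r = sxy / math.sqrt(sxx * syy)  # correlation: unitless, between -1 and 1
slope = sxy / sxx               # regression: change in y per unit of x
intercept = mean_y - slope * mean_x

print(f"r = {r:.4f}, y = {slope:.1f}*x + {intercept:.1f}")
print(f"predicted y at x = 6: {slope * 6 + intercept:.1f}")
```

Correlation is symmetric in x and y, while the regression slope changes if the roles of the two variables are swapped.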
Q: How do I interpret R²?
A: R² (coefficient of determination) is the proportion of variance in the outcome that the model explains. As a rough guide (conventions vary by field):
R² = 0.00: No predictive power
R² = 0.25: Weak relationship
R² = 0.50: Moderate relationship
R² = 0.75: Strong relationship
R² = 1.00: Perfect prediction
slope, intercept, r_value, p_value, std_err = rss.linear_regression(x, y)
r_squared = r_value ** 2
print(f"Model explains {r_squared*100:.1f}% of variance")
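"Proportion of variance explained" can be checked directly from the residuals. This sketch uses illustrative data and its least-squares line to compute R² as 1 - SS_res / SS_tot; for simple linear regression this equals the squared correlation.

```python
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
slope, intercept = 0.6, 2.2  # least-squares fit for this data

mean_y = sum(y) / len(y)
predicted = [slope * xi + intercept for xi in x]

ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # unexplained
ss_tot = sum((yi - mean_y) ** 2 for yi in y)                   # total
r_squared = 1 - ss_res / ss_tot

print(f"R² = {r_squared:.2f}")  # R² = 0.60
```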
Q: Can I do multiple regression?
A: Yes! Use multiple_regression:
X = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6]] # Multiple predictors (must not be perfectly collinear)
y = [2, 4, 5, 4, 5]
result = rss.multiple_regression(X, y)
print(f"R² = {result['r_squared']:.3f}")
print(f"Coefficients: {result['coefficients']}")
Probability & Distributions
Q: How do I calculate probabilities for a normal distribution?
A:
# P(X ≤ x)
prob = rss.normal_cdf(x=100, mu=100, sigma=15)
# P(X > x) = 1 - P(X ≤ x)
prob = 1 - rss.normal_cdf(x=100, mu=100, sigma=15)
# P(a < X < b)
prob = rss.normal_cdf(b, mu, sigma) - rss.normal_cdf(a, mu, sigma)
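The same three patterns can be cross-checked with the standard library's NormalDist, which is handy for validating results. The sketch below uses an illustrative distribution with mean 100 and standard deviation 15.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

p_below = iq.cdf(115)                 # P(X <= 115)
p_above = 1 - iq.cdf(115)             # P(X > 115)
p_between = iq.cdf(115) - iq.cdf(85)  # P(85 < X < 115)

print(f"{p_below:.4f} {p_above:.4f} {p_between:.4f}")
# 0.8413 0.1587 0.6827
```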
Q: What’s the difference between PDF and CDF?
A:
PDF (Probability Density Function): Height of distribution curve
CDF (Cumulative Distribution Function): Area under curve up to x
For probabilities, use CDF:
# Probability that X ≤ 1.96 for standard normal
prob = rss.normal_cdf(1.96, 0, 1) # ≈ 0.975
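The "area under the curve" relationship can be verified numerically. This sketch integrates the standard normal PDF with the trapezoidal rule and compares the result with the CDF; it uses the standard library's NormalDist rather than Real Simple Stats.

```python
from statistics import NormalDist

std_normal = NormalDist()

# The PDF is a density (curve height), not a probability.
print(f"pdf(0) = {std_normal.pdf(0):.4f}")

# Approximate the area under the PDF from far in the left tail up to 1.96
# with the trapezoidal rule, then compare it with the CDF.
lower, upper, steps = -8.0, 1.96, 10_000
width = (upper - lower) / steps
heights = [std_normal.pdf(lower + i * width) for i in range(steps + 1)]
area = width * (sum(heights) - (heights[0] + heights[-1]) / 2)

print(f"area = {area:.4f}, cdf(1.96) = {std_normal.cdf(1.96):.4f}")
```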
Q: How do I find critical values?
A:
# For normal distribution (z-score)
z_critical = rss.normal_ppf(0.975, 0, 1) # 1.96 for 95% CI
# For chi-square
chi_critical = rss.critical_chi_square_value(alpha=0.05, df=5)
🔄 Advanced Topics
Q: What’s the difference between bootstrap and permutation test?
A:
Bootstrap: Estimates uncertainty (confidence intervals)
Permutation test: Tests hypotheses (p-values)
import numpy as np
# Bootstrap for CI
result = rss.bootstrap(data, np.mean, n_iterations=1000)
print(f"95% CI: {result['confidence_interval']}")
# Permutation test for hypothesis
result = rss.permutation_test(group1, group2,
lambda d1, d2: np.mean(d1) - np.mean(d2))
print(f"p-value: {result['p_value']}")
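The mechanics behind the two techniques differ in how they resample. Here is a minimal standard-library sketch with illustrative data: a percentile bootstrap resamples with replacement, while a permutation test shuffles the group labels. It is not the rss implementation.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the sketch is repeatable

group1 = [23, 25, 28, 30, 32, 35, 36]
group2 = [20, 21, 24, 25, 27, 29, 30]
n_iterations = 2000
n1 = len(group1)

# Bootstrap: resample WITH replacement to see how much the mean varies.
boot_means = sorted(
    statistics.fmean(rng.choices(group1, k=n1)) for _ in range(n_iterations)
)
ci_low = boot_means[int(0.025 * n_iterations)]
ci_high = boot_means[int(0.975 * n_iterations) - 1]

# Permutation test: shuffle group labels to see how often chance alone
# yields a mean difference at least as large as the observed one.
observed = statistics.fmean(group1) - statistics.fmean(group2)
pooled = group1 + group2
extreme = 0
for _ in range(n_iterations):
    rng.shuffle(pooled)
    diff = statistics.fmean(pooled[:n1]) - statistics.fmean(pooled[n1:])
    if abs(diff) >= abs(observed):
        extreme += 1
p_value = (extreme + 1) / (n_iterations + 1)  # add-one avoids p = 0

print(f"95% CI for mean of group1: [{ci_low:.2f}, {ci_high:.2f}]")
print(f"permutation p-value: {p_value:.3f}")
```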
Q: When should I use Bayesian vs. frequentist methods?
A:
Frequentist (t-tests, p-values): Traditional, widely accepted
Bayesian: Incorporates prior knowledge, gives probability of hypotheses
Use Bayesian when:
You have prior information
You want probability statements about parameters
You need to update beliefs with new data
# Bayesian update
post_alpha, post_beta = rss.beta_binomial_update(
prior_alpha=1, prior_beta=1, # Uniform prior
successes=7, trials=10
)
# Credible interval (Bayesian CI)
lower, upper = rss.credible_interval('beta',
{'alpha': post_alpha, 'beta': post_beta})
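For a Beta prior with binomial data, the update is simple arithmetic. The sketch below assumes beta_binomial_update performs the standard conjugate update; the Monte Carlo interval is a standard-library stand-in for credible_interval, not its actual implementation.

```python
import random

prior_alpha, prior_beta = 1, 1  # uniform prior
successes, trials = 7, 10

# Conjugate update: add successes to alpha and failures to beta.
post_alpha = prior_alpha + successes
post_beta = prior_beta + (trials - successes)
post_mean = post_alpha / (post_alpha + post_beta)

# Monte Carlo 95% credible interval (equal-tailed) from posterior draws.
rng = random.Random(0)
draws = sorted(rng.betavariate(post_alpha, post_beta) for _ in range(20_000))
lower = draws[int(0.025 * len(draws))]
upper = draws[int(0.975 * len(draws)) - 1]

print(f"posterior: Beta({post_alpha}, {post_beta}), mean = {post_mean:.3f}")
print(f"approx. 95% credible interval: [{lower:.2f}, {upper:.2f}]")
```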
Q: What’s PCA and when should I use it?
A: PCA (Principal Component Analysis) reduces dimensions while preserving variance.
Use when:
You have many correlated variables
You want to visualize high-dimensional data
You need to reduce multicollinearity
result = rss.pca(X, n_components=2)
print(f"Explained variance: {result['explained_variance']}")
Effect Sizes
Q: Why do I need effect sizes?
A: P-values tell you whether an effect is distinguishable from chance; effect sizes tell you how large it is.
Example:
# Significant but small effect
t_stat, p_value = rss.two_sample_t_test(group1, group2)
d = rss.cohens_d(group1, group2)
print(f"p-value: {p_value:.4f}") # p < 0.05 (significant)
print(f"Cohen's d: {d:.3f}") # d = 0.15 (tiny effect)
Interpretation: Statistically significant, but the effect is too small to matter much in practice.
Q: Which effect size should I use?
A:
Cohen’s d: Comparing two means
Eta-squared: ANOVA (multiple groups)
Cramér’s V: Categorical data (chi-square)
R²: Regression
# Two groups
d = rss.cohens_d(group1, group2)
# Multiple groups (ANOVA)
eta_sq = rss.eta_squared([group1, group2, group3])
# Categorical
v = rss.cramers_v([[10, 20], [30, 40]])
Q: How do I interpret Cohen’s d?
A:
Small: d ≈ 0.2
Medium: d ≈ 0.5
Large: d ≈ 0.8
d = rss.cohens_d(group1, group2)
interpretation = rss.interpret_effect_size(d, 'd')
print(f"Cohen's d = {d:.3f} ({interpretation})")
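As a sketch of what sits behind these benchmarks, the code below computes Cohen's d with the pooled standard deviation and attaches a label. Both helpers are hypothetical, the "negligible" band below 0.2 is an added convention, and the exact formula and cut-offs used by cohens_d and interpret_effect_size may differ.

```python
import math
import statistics

def cohens_d_sketch(a, b):
    """Cohen's d using the pooled standard deviation (hypothetical helper)."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)

def label(d):
    """Map |d| onto the conventional benchmarks (cut-offs are a convention)."""
    size = abs(d)
    if size < 0.2:
        return "negligible"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

group1 = [120, 130, 125, 135, 140]
group2 = [115, 128, 118, 131, 133]
d = cohens_d_sketch(group1, group2)
print(f"d = {d:.2f} ({label(d)})")  # d = 0.63 (medium)
```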
🔧 Technical Questions
Q: Are the functions vectorized?
A: Yes, most functions use NumPy internally for efficient computation.
Q: Can I use this with pandas DataFrames?
A: Yes! Convert columns to lists or arrays:
import pandas as pd
import real_simple_stats as rss
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Method 1: Convert to list
mean_A = rss.mean(df['A'].tolist())
# Method 2: Use values (NumPy array)
mean_A = rss.mean(df['A'].values)
# Regression
slope, intercept, *_ = rss.linear_regression(df['A'].values, df['B'].values)
Q: How accurate are the calculations?
A: Real Simple Stats uses SciPy and NumPy for numerical computations, which are industry-standard and highly accurate. Results match those from R, SPSS, and other statistical software.
Q: Can I use this for production/research?
A: Yes! The package is:
Well-tested (86% code coverage)
Based on established statistical methods
Uses reliable numerical libraries (SciPy, NumPy)
Documented with references
However, always validate results for critical applications.
Q: Is this package maintained?
A: Yes! Check the GitHub repository for:
Latest updates
Issue tracking
Contribution guidelines
Educational Questions
Q: Can I use this for teaching?
A: Absolutely! Real Simple Stats is designed for education:
Clear function names
Comprehensive docstrings
Step-by-step examples
Educational focus over performance
Q: Is there a textbook or course that uses this?
A: While not tied to a specific textbook, Real Simple Stats aligns with standard introductory statistics curricula. See INTERACTIVE_EXAMPLES.md for tutorials.
Q: How does this compare to R or SPSS?
A:
Simpler: Easier to learn than R
More accessible: Free and open-source (unlike SPSS)
Python-based: Integrates with data science ecosystem
Educational: Designed for learning, not just analysis
See MIGRATION_GUIDE.md for detailed comparisons.
🐛 Troubleshooting
Q: I get “ModuleNotFoundError: No module named ‘real_simple_stats’”
A: Install the package:
pip install real-simple-stats
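Make sure you install with the hyphenated name (real-simple-stats) and import with the underscored name (real_simple_stats), in the same Python environment you run your code in.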
Q: Functions return unexpected results
A: Check:
Data format: Are you passing lists/arrays?
Sample vs. population: Using correct function?
Parameter order: Check the docstring with help(rss.function_name)
# Check documentation
help(rss.two_sample_t_test)
Q: I get “ValueError: Input arrays must have the same length”
A: For paired tests and correlation, ensure both arrays have the same length:
# Wrong
x = [1, 2, 3]
y = [4, 5] # Different length!
# Correct
x = [1, 2, 3]
y = [4, 5, 6] # Same length
Q: Plots don’t show up
A:
import matplotlib.pyplot as plt
import real_simple_stats as rss
rss.plot_normal_histogram(data)
plt.show() # Add this!
Q: I get warnings about “divide by zero”
A: This can happen with:
Empty datasets
Zero variance (all values the same)
Zero expected frequencies (chi-square)
Check your data:
data = [5, 5, 5, 5, 5]
std = rss.sample_std_dev(data) # Will be 0
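One way to avoid the warning is to check for these cases before dividing. The helper below is a hypothetical standard-library sketch of such a guard, not part of Real Simple Stats.

```python
import statistics

def z_scores(data):
    """Standardize data, refusing inputs that would divide by zero."""
    if len(data) < 2:
        raise ValueError("need at least two values to estimate spread")
    sd = statistics.stdev(data)
    if sd == 0:
        raise ValueError("all values are identical, so the SD is 0")
    mean = statistics.fmean(data)
    return [(x - mean) / sd for x in data]

print(z_scores([4, 5, 6]))  # [-1.0, 0.0, 1.0]

try:
    z_scores([5, 5, 5, 5, 5])
except ValueError as err:
    print(f"skipped: {err}")
```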
Best Practices
Q: What’s the recommended workflow?
A:
Explore data: Use descriptive statistics
Visualize: Create plots
Test hypotheses: Run appropriate tests
Calculate effect sizes: Assess practical significance
Report results: Include all relevant statistics
import real_simple_stats as rss
# 1. Descriptive statistics
print(rss.five_number_summary(data))
# 2. Visualize
rss.plot_box_plot(data)
# 3. Test
t_stat, p_value = rss.one_sample_t_test(data, mu0=100)
# 4. Effect size
d = (rss.mean(data) - 100) / rss.sample_std_dev(data)  # One-sample Cohen's d
# 5. Report
print(f"t({len(data)-1}) = {t_stat:.2f}, p = {p_value:.3f}, d = {d:.2f}")
Q: How should I report results?
A: Include:
Test statistic and degrees of freedom
P-value
Effect size
Confidence interval (when appropriate)
Example:
"A two-sample t-test revealed a significant difference between groups,
t(18) = 2.45, p = .025, d = 0.73, 95% CI [0.5, 3.2]."
Q: Should I correct for multiple comparisons?
A: Yes, if you’re running multiple tests on the same dataset. Common methods:
Bonferroni correction: Divide α by number of tests
False Discovery Rate (FDR)
# 3 tests, use α = 0.05/3 = 0.0167
alpha_corrected = 0.05 / 3
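The two corrections can give different answers on the same p-values. This pure-Python sketch implements Bonferroni and the Benjamini-Hochberg FDR procedure; the function names and p-values are illustrative and are not part of the Real Simple Stats API.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a test only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Control the false discovery rate with the step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        reject[idx] = rank <= cutoff_rank
    return reject

p_values = [0.01, 0.04, 0.03]
print(bonferroni(p_values))          # [True, False, False]
print(benjamini_hochberg(p_values))  # [True, True, True]
```

Bonferroni is the more conservative of the two: it guards against any false positive, while FDR control tolerates a small proportion of false positives in exchange for more power.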
Additional Resources
Q: Where can I learn more?
A:
Documentation: ReadTheDocs
Examples: Interactive Tutorials
API Reference: Function Comparison
Math Details: Mathematical Formulas
Q: How do I report bugs or request features?
A:
Check existing issues
Create a new issue with:
Description of problem/feature
Example code (if applicable)
Expected vs. actual behavior
Q: Can I contribute?
A: Yes! See CONTRIBUTING.md for guidelines.
📞 Still Have Questions?
GitHub Issues: Ask a question
Documentation: Full docs
Examples: Interactive tutorials
Last Updated: 2025
Version: 0.3.0