Troubleshooting Guide
Solutions to common errors and issues when using Real Simple Stats.
🚨 Installation Issues
Error: “ModuleNotFoundError: No module named ‘real_simple_stats’”
Symptoms:
import real_simple_stats as rss
# ModuleNotFoundError: No module named 'real_simple_stats'
Solutions:
Install the package:
pip install real-simple-stats
Check installation:
pip list | grep real-simple-stats
Verify Python environment:
which python
which pip
For Jupyter/Colab:
!pip install real-simple-stats
import real_simple_stats as rss
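If the package is installed but the import still fails, the interpreter running your code may not be the one pip installed into. The following is a minimal sketch, using only the standard library, for checking this from inside Python. `is_installed` is a hypothetical helper name, not part of Real Simple Stats.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the running interpreter can locate the module."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter actually executing this code; compare it with `which python`.
print(f"Interpreter: {sys.executable}")
print(f"real_simple_stats importable: {is_installed('real_simple_stats')}")
```

If the printed interpreter path differs from the one pip reports, installing with `python -m pip` from that interpreter usually resolves the mismatch.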
Error: “pip: command not found”
Solution:
# Try pip3 instead
pip3 install real-simple-stats
# Or use python -m pip
python -m pip install real-simple-stats
Error: “Permission denied” during installation
Solution:
# Install for current user only
pip install --user real-simple-stats
# Or use virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install real-simple-stats
Error: Package version conflicts
Symptoms:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
Solutions:
Upgrade pip:
pip install --upgrade pip
Use virtual environment:
python -m venv clean_env
source clean_env/bin/activate
pip install real-simple-stats
Check dependencies:
pip show real-simple-stats
Import Errors
Error: “ImportError: cannot import name ‘function_name’”
Symptoms:
from real_simple_stats import nonexistent_function
# ImportError: cannot import name 'nonexistent_function'
Solutions:
Check function name:
import real_simple_stats as rss
print(dir(rss))  # List all available functions
Use correct import:
# Correct
from real_simple_stats import mean, median
# Or
import real_simple_stats as rss
rss.mean([1, 2, 3])
Check version:
import real_simple_stats
print(real_simple_stats.__version__)
Error: “AttributeError: module ‘real_simple_stats’ has no attribute ‘X’”
Cause: The function does not exist under that name, or the name is misspelled.
Solution:
# Check available functions
import real_simple_stats as rss
help(rss)
# Common naming mistakes:
# Wrong: rss.standard_deviation()
# Right: rss.sample_std_dev()
# Wrong: rss.ttest()
# Right: rss.two_sample_t_test()
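For near-miss spellings, the standard library's `difflib` can propose candidates from a module's public names. This is a sketch only: `suggest_names` is a hypothetical helper, and it is demonstrated on the built-in `math` module so it runs without the package installed.

```python
import difflib
import math


def suggest_names(module, wrong_name: str, n: int = 3) -> list:
    """Return public attribute names of `module` that resemble `wrong_name`."""
    public = [name for name in dir(module) if not name.startswith("_")]
    return difflib.get_close_matches(wrong_name, public, n=n)


# Demonstrated on the standard library's math module;
# pass the imported rss module the same way.
print(suggest_names(math, "sqr"))
```

String similarity only catches close spellings. When the real function has a quite different name, as in the examples above, browsing `dir(rss)` or `help(rss)` remains the reliable route.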
Data Input Errors
Error: “TypeError: ‘int’ object is not iterable”
Symptoms:
rss.mean(5)
# TypeError: 'int' object is not iterable
Cause: Passing single value instead of list/array.
Solution:
# Wrong
rss.mean(5)
# Correct
rss.mean([5])
rss.mean([1, 2, 3, 4, 5])
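When inputs come from user code that may pass either a single number or a sequence, a small normalizing step avoids this error. The sketch below assumes a hypothetical helper named `as_list`; it is not part of the package.

```python
from numbers import Number


def as_list(value):
    """Wrap a bare number in a list; pass other inputs through as a list."""
    if isinstance(value, Number):
        return [value]
    return list(value)


print(as_list(5))          # [5]
print(as_list((1, 2, 3)))  # [1, 2, 3]
```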
Error: “ValueError: Input arrays must have the same length”
Symptoms:
x = [1, 2, 3]
y = [4, 5]
rss.pearson_correlation(x, y)
# ValueError: Input arrays must have the same length
Cause: Mismatched array lengths for paired operations.
Solution:
# Check lengths
print(f"x length: {len(x)}, y length: {len(y)}")
# Ensure same length
x = [1, 2, 3]
y = [4, 5, 6] # Same length as x
rss.pearson_correlation(x, y)
Error: “ValueError: Data must contain at least one element”
Symptoms:
rss.mean([])
# ValueError: Data must contain at least one element
Cause: Empty dataset.
Solution:
# Check if data is empty
data = []
if len(data) > 0:
    mean = rss.mean(data)
else:
    print("No data to analyze")
# Or use try-except
try:
    mean = rss.mean(data)
except ValueError as e:
    print(f"Error: {e}")
Error: “TypeError: unsupported operand type(s)”
Symptoms:
data = ['1', '2', '3']
rss.mean(data)
# TypeError: unsupported operand type(s) for +: 'int' and 'str'
Cause: Non-numeric data (strings, None, etc.).
Solution:
# Convert strings to numbers
data = ['1', '2', '3']
data_numeric = [float(x) for x in data]
rss.mean(data_numeric)
# Handle missing values
data = [1, 2, None, 4, 5]
data_clean = [x for x in data if x is not None]
rss.mean(data_clean)
# With pandas
import pandas as pd
df = pd.DataFrame({'values': [1, 2, None, 4, 5]})
clean_data = df['values'].dropna().tolist()
rss.mean(clean_data)
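The conversion and missing-value steps above can be combined into one cleaning pass. This is a sketch under the assumption that dropping missing entries is acceptable for your analysis; `to_numeric` is a hypothetical helper name.

```python
import math


def to_numeric(data):
    """Convert values to float, dropping None and NaN entries."""
    cleaned = []
    for value in data:
        if value is None:
            continue
        number = float(value)  # raises ValueError for text like "abc"
        if not math.isnan(number):
            cleaned.append(number)
    return cleaned


print(to_numeric(["1", "2", None, 4, float("nan")]))  # [1.0, 2.0, 4.0]
```

Text that cannot be parsed as a number still raises `ValueError`, which is usually preferable to silently discarding it.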
🔢 Numerical Errors
Warning: “RuntimeWarning: invalid value encountered in divide”
Symptoms:
data = [5, 5, 5, 5, 5]
rss.sample_std_dev(data)
# RuntimeWarning: invalid value encountered in divide
Cause: Division by zero (e.g., zero variance).
Solution:
# Check for constant data
data = [5, 5, 5, 5, 5]
if len(set(data)) == 1:
    print("All values are the same (zero variance)")
else:
    std = rss.sample_std_dev(data)
# Or handle the warning
import warnings
with warnings.catch_warnings():
    # Silence only RuntimeWarning so other warnings stay visible
    warnings.simplefilter("ignore", category=RuntimeWarning)
    std = rss.sample_std_dev(data)
Error: “ZeroDivisionError: division by zero”
Symptoms:
rss.coefficient_of_variation([0, 0, 0])
# ZeroDivisionError: division by zero
Cause: Mean is zero (CV = std/mean).
Solution:
data = [0, 0, 0]
mean_val = rss.mean(data)
if mean_val == 0:
    print("Cannot calculate CV when mean is zero")
else:
    cv = rss.coefficient_of_variation(data)
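The same guard can be wrapped in a function that returns `None` instead of raising. The sketch below uses the standard library's `statistics` module and assumes the sample standard deviation (n - 1 denominator); the package's own definition may differ, and `safe_cv` is a hypothetical name.

```python
import statistics


def safe_cv(data):
    """Return sample std dev / mean, or None when the mean is zero."""
    mean_val = statistics.fmean(data)
    if mean_val == 0:
        return None
    return statistics.stdev(data) / mean_val


print(safe_cv([0, 0, 0]))  # None
print(safe_cv([1, 2, 3]))  # 0.5
```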
Error: “ValueError: math domain error”
Symptoms:
rss.normal_pdf(-1, 0, -1)
# ValueError: math domain error
Cause: Invalid parameters (e.g., negative standard deviation).
Solution:
# Check parameters
x = 1.5  # Point at which to evaluate the density
mu = 0
sigma = 1  # Must be positive!
if sigma <= 0:
    raise ValueError("Standard deviation must be positive")
result = rss.normal_pdf(x, mu, sigma)
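To see why the parameter check matters, it can help to write the density out explicitly. The following is a standalone sketch of the normal density formula with the validation built in; it is not the package's implementation, and the function here is defined locally for illustration.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x; sigma must be strictly positive."""
    if sigma <= 0:
        raise ValueError("Standard deviation must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


print(round(normal_pdf(0, 0, 1), 4))  # 0.3989
```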
Warning: “RuntimeWarning: overflow encountered”
Cause: Very large numbers in calculations.
Solution:
# Use log-scale for large factorials
import math
n = 200  # Example value
log_factorial = math.lgamma(n + 1)  # log(n!) without overflow
# Or limit input ranges
if n > 170:  # 171! is too large for a float
    print("Value too large for factorial")
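Working in log space extends to quantities built from factorials, such as binomial coefficients. This is a sketch using `math.lgamma`; `log_binomial` is a hypothetical helper and not a function of the package.

```python
import math


def log_binomial(n: int, k: int) -> float:
    """Natural log of C(n, k), computed with lgamma to avoid huge factorials."""
    if not 0 <= k <= n:
        raise ValueError("Require 0 <= k <= n")
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


# Small case for checking: C(10, 3) = 120
print(round(math.exp(log_binomial(10, 3))))  # 120
# Large case stays finite in log space
print(log_binomial(1000, 500))
```

Probabilities can then be assembled by adding logs and exponentiating only at the end, when the result is back in a representable range.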
Statistical Test Errors
Error: “ValueError: Degrees of freedom must be positive”
Symptoms:
rss.one_sample_t_test([1], mu0=0)
# ValueError: Degrees of freedom must be positive
Cause: Sample size too small (n=1 gives df=0).
Solution:
data = [1, 2, 3] # Need at least 2 observations
if len(data) < 2:
    print("Need at least 2 observations for t-test")
else:
    t_stat, p_value = rss.one_sample_t_test(data, mu0=0)
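The requirement follows from the formula: the t statistic divides by the sample standard deviation, which needs n - 1 >= 1. The standalone sketch below illustrates this with the standard library. Unlike the package function, this hypothetical `one_sample_t` returns the degrees of freedom rather than a p-value.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(data)
    if n < 2:
        raise ValueError("Need at least 2 observations (df = n - 1 must be positive)")
    sd = statistics.stdev(data)  # sample standard deviation, n - 1 denominator
    if sd == 0:
        raise ValueError("Zero variance: t statistic is undefined")
    t = (statistics.fmean(data) - mu0) / (sd / math.sqrt(n))
    return t, n - 1


t, df = one_sample_t([1, 2, 3], mu0=0)
print(f"t = {t:.3f}, df = {df}")  # t = 3.464, df = 2
```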
Error: “ValueError: Observed and expected lists must be the same length”
Symptoms:
observed = [10, 20, 30]
expected = [15, 25]
rss.chi_square_statistic(observed, expected)
# ValueError: Observed and expected lists must be the same length
Solution:
# Ensure same length
observed = [10, 20, 30]
expected = [15, 25, 35] # Same length
# Or check first
if len(observed) != len(expected):
    raise ValueError("Lengths must match")
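The lengths must match because the statistic pairs each observed count with its expected count. A standalone sketch of the usual Pearson chi-square sum makes this visible; `chi_square_sketch` is a hypothetical name and not the package's source.

```python
def chi_square_sketch(observed, expected):
    """Sum of (O - E)^2 / E over categories; inputs must align one-to-one."""
    if len(observed) != len(expected):
        raise ValueError("Observed and expected lists must be the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("Expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(round(chi_square_sketch([10, 20, 30], [15, 25, 35]), 3))  # 3.381
```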
Issue: P-value is NaN or inf
Cause: Numerical instability or invalid test conditions.
Solutions:
Check data validity:
import numpy as np
# Check for NaN or inf
if any(np.isnan(data)) or any(np.isinf(data)):
    print("Data contains NaN or inf values")
Check variance:
# Zero variance causes issues
if rss.sample_variance(data) == 0:
    print("Zero variance - all values are identical")
Check sample size:
if len(data) < 3:
    print("Sample size too small for reliable inference")
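The three checks above can be run together before any test is attempted. This sketch uses only the standard library; `diagnose_sample` and its `min_size` threshold are hypothetical and should be adapted to the test in question.

```python
import math
import statistics


def diagnose_sample(data, min_size=3):
    """Return a list of human-readable problems found in a numeric sample."""
    problems = []
    if any(math.isnan(x) or math.isinf(x) for x in data):
        problems.append("contains NaN or inf")
        return problems  # later checks are meaningless with non-finite values
    if len(data) < min_size:
        problems.append("sample size too small")
    if len(data) >= 2 and statistics.variance(data) == 0:
        problems.append("zero variance")
    return problems


print(diagnose_sample([5.0, 5.0, 5.0]))           # ['zero variance']
print(diagnose_sample([1.0, float("nan"), 2.0]))  # ['contains NaN or inf']
```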
🎨 Plotting Errors
Error: “No module named ‘matplotlib’”
Solution:
pip install matplotlib
Issue: Plots don’t show
Symptoms:
rss.plot_normal_histogram(data)
# Nothing appears
Solutions:
Add plt.show():
import matplotlib.pyplot as plt
import real_simple_stats as rss
rss.plot_normal_histogram(data)
plt.show()  # Add this!
For Jupyter:
%matplotlib inline
import real_simple_stats as rss
rss.plot_normal_histogram(data)
Check backend:
import matplotlib
print(matplotlib.get_backend())
# Change if needed
matplotlib.use('TkAgg')  # or 'Qt5Agg', 'MacOSX'
Error: “UserWarning: No artists with labels found”
Cause: Legend called but no labels defined.
Solution:
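# This is only a warning and can usually be ignored.
# Or suppress just this message, leaving other UserWarnings visible:
import warnings
warnings.filterwarnings('ignore', message='No artists with labels found', category=UserWarning)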
🔄 Advanced Function Errors
Error: “LinAlgError: Singular matrix”
Symptoms:
X = [[1, 2], [2, 4], [3, 6]] # Perfectly correlated
y = [1, 2, 3]
rss.multiple_regression(X, y)
# LinAlgError: Singular matrix
Cause: Perfect multicollinearity in predictors.
Solution:
# Check correlation between predictors
import numpy as np
X_array = np.array(X)
corr_matrix = np.corrcoef(X_array.T)
print(corr_matrix)
# Remove perfectly correlated variables
# Or use regularization (not in this package)
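To locate which predictors to drop, the correlation check can be automated. The following is a pure-Python sketch; `collinear_pairs` is a hypothetical helper that flags pairs of columns with correlation of magnitude 1 (within a tolerance).

```python
import math


def collinear_pairs(X, tol=1e-9):
    """Return index pairs of columns whose |Pearson r| is within tol of 1."""
    columns = list(zip(*X))  # transpose rows into columns
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            a, b = columns[i], columns[j]
            mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
            sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
            sxx = sum((x - mean_a) ** 2 for x in a)
            syy = sum((y - mean_b) ** 2 for y in b)
            if sxx == 0 or syy == 0:
                continue  # constant column: correlation undefined
            r = sxy / math.sqrt(sxx * syy)
            if abs(r) >= 1 - tol:
                pairs.append((i, j))
    return pairs


X = [[1, 2], [2, 4], [3, 6]]
print(collinear_pairs(X))  # [(0, 1)]
```

A pairwise check does not detect collinearity that involves three or more columns jointly (for example, one column equal to the sum of two others); a matrix rank check covers that case.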
Error: “ValueError: n_components must be <= min(n_samples, n_features)”
Symptoms:
X = [[1, 2], [3, 4]] # 2 samples, 2 features
rss.pca(X, n_components=3)
# ValueError: n_components must be <= 2
Solution:
desired_components = 3  # What you originally asked for
n_samples, n_features = len(X), len(X[0])
max_components = min(n_samples, n_features)
n_components = min(desired_components, max_components)
result = rss.pca(X, n_components=n_components)
Issue: Bootstrap/Permutation tests are slow
Cause: Too many iterations.
Solutions:
Reduce iterations for testing:
import numpy as np
# Fast (for testing)
result = rss.bootstrap(data, np.mean, n_iterations=100)
# Accurate (for final analysis)
result = rss.bootstrap(data, np.mean, n_iterations=10000)
Use progress indicator:
from tqdm import tqdm  # Requires: pip install tqdm
import numpy as np
n_iterations = 10000
results = []
for _ in tqdm(range(n_iterations)):
    sample = np.random.choice(data, size=len(data), replace=True)
    results.append(np.mean(sample))
Result Interpretation Issues
Issue: “Unexpected p-value”
Checklist:
Using correct test (one-sample vs. two-sample)?
Data in correct format?
Assumptions met (normality, equal variance)?
Using two-tailed vs. one-tailed correctly?
Debug:
# Check descriptive statistics
print(f"Group 1: mean={rss.mean(group1)}, std={rss.sample_std_dev(group1)}")
print(f"Group 2: mean={rss.mean(group2)}, std={rss.sample_std_dev(group2)}")
# Visualize
rss.plot_box_plot(group1)
rss.plot_box_plot(group2)
# Check assumptions
# (normality tests not in this package - use scipy.stats.shapiro)
Issue: “Effect size doesn’t match p-value”
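This is expected. The p-value depends on both the effect size and the sample size, so a large sample can make a trivial effect statistically significant; the effect size itself does not grow with the sample size.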
Example:
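# Large sample, small effect
# (the groups need some spread: constant values give zero variance)
import random
random.seed(42)
group1 = [random.gauss(100, 1) for _ in range(10000)]
group2 = [random.gauss(100.1, 1) for _ in range(10000)]
t_stat, p_value = rss.two_sample_t_test(group1, group2)
d = rss.cohens_d(group1, group2)
print(f"p-value: {p_value:.4f}")  # Very small (significant)
print(f"Cohen's d: {d:.3f}")  # About 0.1 in magnitude (trivial effect)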
Lesson: Always report both!
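The relationship can be made concrete without the package. In the equal-variance case, the t statistic is Cohen's d divided by sqrt(1/n1 + 1/n2), so it grows with sample size while d does not. The sketch below uses standalone standard-library versions of both quantities (hypothetical local functions, not the package's implementations) on the same shifted pattern at two sample sizes.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(pooled_var)


def pooled_t(a, b):
    """Equal-variance two-sample t statistic: d scaled by sample size."""
    na, nb = len(a), len(b)
    return cohens_d(a, b) / math.sqrt(1 / na + 1 / nb)


pattern = [-1.0, 0.0, 1.0]
for repeats in (10, 1000):
    g1 = pattern * repeats
    g2 = [x + 0.1 for x in g1]  # same spread, mean shifted by 0.1
    print(f"n={len(g1)}: d={cohens_d(g1, g2):.3f}, t={pooled_t(g1, g2):.2f}")
```

Across the two runs, d stays nearly unchanged while the magnitude of t increases roughly tenfold, which is what drives the p-value down.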
🔧 Performance Issues
Issue: Functions are slow
Solutions:
Use NumPy arrays:
import numpy as np
# Slower
data_list = list(range(10000))
# Faster
data_array = np.array(data_list)
Reduce bootstrap/permutation iterations:
import numpy as np
# Faster
result = rss.bootstrap(data, np.mean, n_iterations=1000)
# Slower but more accurate
result = rss.bootstrap(data, np.mean, n_iterations=10000)
Profile your code:
import time
start = time.time()
result = rss.some_function(data)
print(f"Time: {time.time() - start:.2f}s")
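For repeated measurements, a small wrapper keeps timing code out of the analysis itself. This sketch uses `time.perf_counter`, which is intended for measuring short durations; `timed` is a hypothetical helper, shown here on the built-in `sum` so it runs without the package.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds)."""
    start = time.perf_counter()  # monotonic, high-resolution clock
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


total, seconds = timed(sum, range(1_000_000))
print(f"sum={total}, time={seconds:.4f}s")
```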
🐛 Debugging Strategies
General Debugging Workflow
Check data:
print(f"Data type: {type(data)}")
print(f"Data length: {len(data)}")
print(f"First few values: {data[:5]}")
print(f"Data range: {min(data)} to {max(data)}")
Check for missing values:
import numpy as np
if any(x is None for x in data):
    print("Contains None values")
# Test floats only: np.isnan(None) raises TypeError
if any(isinstance(x, float) and np.isnan(x) for x in data):
    print("Contains NaN values")
Verify function signature:
help(rss.function_name)
Test with simple data:
# Use known values
simple_data = [1, 2, 3, 4, 5]
result = rss.mean(simple_data)  # Should be 3.0
Enable detailed errors:
import traceback
try:
    result = rss.some_function(data)
except Exception as e:
    traceback.print_exc()
📞 Getting Help
Before asking for help:
Read error message carefully
Check this troubleshooting guide
Review FAQ
Check API documentation
Search existing issues
When reporting issues:
Include:
Error message (full traceback)
Code to reproduce (minimal example)
Expected behavior
Actual behavior
Environment (Python version, OS, package version)
Template:
import real_simple_stats as rss
# Minimal reproducible example
data = [1, 2, 3, 4, 5]
result = rss.some_function(data)
# Error:
# [paste full error message]
# Expected: [describe expected result]
# Actual: [describe actual result]
# Environment:
# Python 3.9.0
# real-simple-stats 0.3.0
# macOS 12.0
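The environment lines in the template can be generated rather than typed by hand. The following is a sketch using the standard library (`importlib.metadata` requires Python 3.8 or later); `environment_report` is a hypothetical helper name.

```python
import platform
from importlib import metadata


def environment_report(package: str = "real-simple-stats") -> dict:
    """Collect the version details requested in the issue template."""
    try:
        package_version = metadata.version(package)
    except metadata.PackageNotFoundError:
        package_version = "not installed"
    return {
        "python": platform.python_version(),
        "package": f"{package} {package_version}",
        "os": platform.platform(),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```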
Additional Resources
FAQ: Common questions
API Reference: Function lookup
Examples: Interactive tutorials
GitHub Issues: Report bugs
Prevention Tips
Best Practices to Avoid Errors
Validate input data:
def validate_data(data):
    if not data:
        raise ValueError("Data is empty")
    if not all(isinstance(x, (int, float)) for x in data):
        raise TypeError("Data must be numeric")
    return True
Use type hints:
from typing import List
import real_simple_stats as rss
def my_analysis(data: List[float]) -> float:
    return rss.mean(data)
Handle exceptions gracefully:
try:
    result = rss.two_sample_t_test(group1, group2)
except ValueError as e:
    print(f"Invalid input: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Document your assumptions:
# Assumes:
# - Data is normally distributed
# - Equal variances
# - Independent samples
t_stat, p_value = rss.two_sample_t_test(group1, group2)
Last Updated: 2025
Version: 0.3.0
Still stuck? Open an issue on GitHub!