Troubleshooting Guide

Solutions to common errors and issues when using Real Simple Stats.


🚨 Installation Issues

Error: “ModuleNotFoundError: No module named ‘real_simple_stats’”

Symptoms:

import real_simple_stats as rss
# ModuleNotFoundError: No module named 'real_simple_stats'

Solutions:

  1. Install the package:

    pip install real-simple-stats
    
  2. Check installation:

    pip list | grep real-simple-stats
    
  3. Verify Python environment:

    which python   # On Windows: where python
    which pip      # On Windows: where pip
    
  4. For Jupyter/Colab:

    !pip install real-simple-stats
    import real_simple_stats as rss
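
A frequent cause of this error is installing into one interpreter and running another. The sketch below uses only the standard library to print the interpreter in use and report whether the package can be found from it. The helper name `is_installed` is an illustrative choice, not part of Real Simple Stats.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if the running interpreter can locate module_name."""
    return importlib.util.find_spec(module_name) is not None


# The interpreter that is actually executing this code
print(f"Interpreter: {sys.executable}")
print(f"real_simple_stats importable: {is_installed('real_simple_stats')}")
```

If the second line prints False, installing with that same interpreter (`python -m pip install real-simple-stats`, using the path printed on the first line) should put the package where the import can find it.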
    

Error: “pip: command not found”

Solution:

# Try pip3 instead
pip3 install real-simple-stats

# Or use python -m pip
python -m pip install real-simple-stats

Error: “Permission denied” during installation

Solution:

# Install for current user only
pip install --user real-simple-stats

# Or use virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install real-simple-stats

Error: Package version conflicts

Symptoms:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.

Solutions:

  1. Upgrade pip:

    pip install --upgrade pip
    
  2. Use virtual environment:

    python -m venv clean_env
    source clean_env/bin/activate  # On Windows: clean_env\Scripts\activate
    pip install real-simple-stats
    
  3. Check dependencies:

    pip show real-simple-stats
    

Import Errors

Error: “ImportError: cannot import name ‘function_name’”

Symptoms:

from real_simple_stats import nonexistent_function
# ImportError: cannot import name 'nonexistent_function'

Solutions:

  1. Check function name:

    import real_simple_stats as rss
    print(dir(rss))  # List all available names (functions, constants, submodules)
    
  2. Use correct import:

    # Correct
    from real_simple_stats import mean, median
    
    # Or
    import real_simple_stats as rss
    rss.mean([1, 2, 3])
    
  3. Check version:

    import real_simple_stats
    print(real_simple_stats.__version__)
    

Error: “AttributeError: module ‘real_simple_stats’ has no attribute ‘X’”

Cause: Function doesn’t exist or typo in name.

Solution:

# Check available functions
import real_simple_stats as rss
help(rss)

# Common naming mistakes:
# Wrong: rss.standard_deviation()
# Right: rss.sample_std_dev()

# Wrong: rss.ttest()
# Right: rss.two_sample_t_test()
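
When the right name is not obvious, the standard library's `difflib` can propose close matches from whatever a module exposes. This is a minimal sketch: `suggest_names` is a hypothetical helper, and `math` is used as a stand-in module so the example runs without the package installed.

```python
import difflib
import math  # stand-in module so the example runs without the package


def suggest_names(module, wrong_name: str, n: int = 3) -> list:
    """Return up to n public names in module that resemble wrong_name."""
    public = [name for name in dir(module) if not name.startswith("_")]
    return difflib.get_close_matches(wrong_name, public, n=n)


# "squrt" is a misspelling; "sqrt" should be among the suggestions
print(suggest_names(math, "squrt"))
```

With the package installed, pass `rss` in place of `math`. An empty list means nothing similar exists, which usually points to a version mismatch rather than a spelling mistake.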

Data Input Errors

Error: “TypeError: ‘int’ object is not iterable”

Symptoms:

rss.mean(5)
# TypeError: 'int' object is not iterable

Cause: Passing single value instead of list/array.

Solution:

# Wrong
rss.mean(5)

# Correct
rss.mean([5])
rss.mean([1, 2, 3, 4, 5])
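
If inputs sometimes arrive as a single number and sometimes as a sequence, normalizing them before the call avoids this error. The helper below is a sketch with a hypothetical name (`as_list`); it is not part of the package.

```python
from collections.abc import Iterable
from numbers import Number


def as_list(value):
    """Wrap a lone number in a list; convert other iterables to a list."""
    if isinstance(value, Number):
        return [value]
    if isinstance(value, Iterable) and not isinstance(value, (str, bytes)):
        return list(value)
    raise TypeError(
        f"Expected a number or a sequence of numbers, got {type(value).__name__}"
    )


print(as_list(5))          # [5]
print(as_list((1, 2, 3)))  # [1, 2, 3]
```

Strings are rejected on purpose: `"123"` is iterable, but treating it as three values is rarely what was intended.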

Error: “ValueError: Input arrays must have the same length”

Symptoms:

x = [1, 2, 3]
y = [4, 5]
rss.pearson_correlation(x, y)
# ValueError: Input arrays must have the same length

Cause: Mismatched array lengths for paired operations.

Solution:

# Check lengths
print(f"x length: {len(x)}, y length: {len(y)}")

# Ensure same length
x = [1, 2, 3]
y = [4, 5, 6]  # Same length as x
rss.pearson_correlation(x, y)

Error: “ValueError: Data must contain at least one element”

Symptoms:

rss.mean([])
# ValueError: Data must contain at least one element

Cause: Empty dataset.

Solution:

# Check if data is empty
data = []
if len(data) > 0:
    mean = rss.mean(data)
else:
    print("No data to analyze")

# Or use try-except
try:
    mean = rss.mean(data)
except ValueError as e:
    print(f"Error: {e}")

Error: “TypeError: unsupported operand type(s)”

Symptoms:

data = ['1', '2', '3']
rss.mean(data)
# TypeError: unsupported operand type(s) for +: 'int' and 'str'

Cause: Non-numeric data (strings, None, etc.).

Solution:

# Convert strings to numbers
data = ['1', '2', '3']
data_numeric = [float(x) for x in data]
rss.mean(data_numeric)

# Handle missing values
data = [1, 2, None, 4, 5]
data_clean = [x for x in data if x is not None]
rss.mean(data_clean)

# With pandas
import pandas as pd
df = pd.DataFrame({'values': [1, 2, None, 4, 5]})
clean_data = df['values'].dropna().tolist()
rss.mean(clean_data)
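
The conversions above can be combined into one cleaning step. The following sketch (hypothetical helper `clean_numeric`, standard library only) converts what it can to `float` and drops `None`, NaN, and entries that cannot be parsed.

```python
import math


def clean_numeric(values):
    """Convert values to floats, dropping None, NaN, and unparseable entries."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # e.g. "n/a" or a nested list
        if not math.isnan(number):
            cleaned.append(number)
    return cleaned


print(clean_numeric(["1", "2.5", None, "n/a", float("nan"), 4]))  # [1.0, 2.5, 4.0]
```

Dropping values silently can hide data-quality problems, so it is worth comparing `len(values)` with the length of the result before analysing the cleaned list.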

🔢 Numerical Errors

Warning: “RuntimeWarning: invalid value encountered in divide”

Symptoms:

data = [5, 5, 5, 5, 5]
rss.sample_std_dev(data)
# RuntimeWarning: invalid value encountered in divide

Cause: Division by zero (e.g., zero variance).

Solution:

# Check for constant data
data = [5, 5, 5, 5, 5]
if len(set(data)) == 1:
    print("All values are the same (zero variance)")
else:
    std = rss.sample_std_dev(data)

# Or suppress only this warning category
import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    std = rss.sample_std_dev(data)

Error: “ZeroDivisionError: division by zero”

Symptoms:

rss.coefficient_of_variation([0, 0, 0])
# ZeroDivisionError: division by zero

Cause: Mean is zero (CV = std/mean).

Solution:

data = [0, 0, 0]
mean_val = rss.mean(data)

if mean_val == 0:
    print("Cannot calculate CV when mean is zero")
else:
    cv = rss.coefficient_of_variation(data)
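
To see why a zero mean is fatal, it helps to compute the ratio by hand. This sketch uses the standard library's `statistics` module and the sample standard deviation (n - 1 in the denominator); whether `rss.coefficient_of_variation` uses the sample or population form is an assumption here, and `safe_cv` is a hypothetical name.

```python
import statistics


def safe_cv(data):
    """Return sample std / mean, or None when the mean is zero."""
    mean_val = statistics.mean(data)
    if mean_val == 0:
        return None
    return statistics.stdev(data) / mean_val


print(safe_cv([1, 2, 3]))  # 0.5
print(safe_cv([0, 0, 0]))  # None
```

The same guard applies to data that merely averages to zero, such as `[-1, 1]`, so checking the mean is more reliable than checking for all-zero input.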

Error: “ValueError: math domain error”

Symptoms:

rss.normal_pdf(-1, 0, -1)
# ValueError: math domain error

Cause: Invalid parameters (e.g., negative standard deviation).

Solution:

# Check parameters
x = -1
mu = 0
sigma = 1  # Must be positive!

if sigma <= 0:
    raise ValueError("Standard deviation must be positive")

result = rss.normal_pdf(x, mu, sigma)
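
For reference, the normal density is exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)), which only makes sense for sigma > 0. The standalone sketch below validates sigma before evaluating the formula. It is written from the textbook definition and is not the package's implementation.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of N(mu, sigma^2) at x; rejects non-positive sigma up front."""
    if sigma <= 0:
        raise ValueError("Standard deviation must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


print(round(normal_pdf(0, 0, 1), 4))  # 0.3989
```

Raising a descriptive error early is usually easier to debug than a bare "math domain error" surfacing from deep inside a calculation.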

Warning: “RuntimeWarning: overflow encountered”

Cause: Very large numbers in calculations.

Solution:

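# Use log-scale for large factorials
import math

n = 200
log_factorial = math.lgamma(n + 1)  # log(n!) without computing n! itself

# Or limit input ranges: 170! is the largest factorial that fits in a float
if n > 170:
    print("Value too large for a floating-point factorial")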
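
As a worked illustration of the log-scale approach, the sketch below computes the log of a binomial coefficient with `math.lgamma` and shows the overflow it avoids. The function name `log_binomial` is hypothetical; only standard-library calls are used.

```python
import math


def log_binomial(n: int, k: int) -> float:
    """Natural log of C(n, k), computed without forming any factorial."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


# 171! exceeds the largest float, so converting it fails
try:
    float(math.factorial(171))
except OverflowError as exc:
    print(f"OverflowError: {exc}")

# The log-scale version handles far larger inputs
print(round(math.exp(log_binomial(10, 3))))  # 120
print(log_binomial(1000, 500))               # about 689.47
```

Staying on the log scale until the final step, and exponentiating only when the result is known to be representable, keeps intermediate values small.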

Statistical Test Errors

Error: “ValueError: Degrees of freedom must be positive”

Symptoms:

rss.one_sample_t_test([1], mu0=0)
# ValueError: Degrees of freedom must be positive

Cause: Sample size too small (n=1 gives df=0).

Solution:

data = [1, 2, 3]  # Need at least 2 observations
if len(data) < 2:
    print("Need at least 2 observations for t-test")
else:
    t_stat, p_value = rss.one_sample_t_test(data, mu0=0)
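
The requirement follows from the formula: t = (mean - mu0) / (s / sqrt(n)) with df = n - 1, and the sample standard deviation s is undefined for a single observation. The sketch below computes the statistic by hand with the standard library. It is illustrative only, uses a hypothetical name (`one_sample_t`), and does not compute a p-value.

```python
import math
import statistics


def one_sample_t(data, mu0: float):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(data)
    if n < 2:
        raise ValueError("Need at least 2 observations (df = n - 1 must be positive)")
    std_err = statistics.stdev(data) / math.sqrt(n)
    if std_err == 0:
        raise ValueError("All values are identical; t is undefined")
    return (statistics.mean(data) - mu0) / std_err, n - 1


t_stat, df = one_sample_t([1, 2, 3], mu0=0)
print(f"t = {t_stat:.4f}, df = {df}")  # t = 3.4641, df = 2
```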

Error: “ValueError: Observed and expected lists must be the same length”

Symptoms:

observed = [10, 20, 30]
expected = [15, 25]
rss.chi_square_statistic(observed, expected)
# ValueError: Observed and expected lists must be the same length

Solution:

# Ensure same length
observed = [10, 20, 30]
expected = [15, 25, 35]  # Same length

# Or check first
if len(observed) != len(expected):
    raise ValueError("Lengths must match")
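
The lengths must match because the statistic pairs each observed count with its expected count: it is the sum of (O - E)^2 / E over categories. A hand-rolled sketch follows, with the hypothetical name `chi_square_by_hand` to make clear it is not the package's function.

```python
def chi_square_by_hand(observed, expected):
    """Sum of (O - E)^2 / E over categories."""
    if len(observed) != len(expected):
        raise ValueError("Observed and expected lists must be the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("Expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Three categories, 60 observations, equal expected counts
stat = chi_square_by_hand([18, 22, 20], [20, 20, 20])
print(f"chi-square = {stat:.2f}, df = {3 - 1}")  # chi-square = 0.40, df = 2
```

For a goodness-of-fit test the observed and expected totals should also agree (both sum to 60 here), which is worth checking alongside the lengths.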

Issue: P-value is NaN or inf

Cause: Numerical instability or invalid test conditions.

Solutions:

  1. Check data validity:

    import numpy as np
    
    # Check for NaN or inf
    if any(np.isnan(data)) or any(np.isinf(data)):
        print("Data contains NaN or inf values")
    
  2. Check variance:

    # Zero variance causes issues
    if rss.sample_variance(data) == 0:
        print("Zero variance - all values are identical")
    
  3. Check sample size:

    if len(data) < 3:
        print("Sample size too small for reliable inference")
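
The three checks above can be bundled into a single pre-flight function that runs before any test. This is a sketch using only the standard library; `preflight` and its `min_n` parameter are hypothetical names, and the default of 3 mirrors the sample-size check in this guide.

```python
import math
import statistics


def preflight(data, min_n: int = 3):
    """Return a list of problems likely to produce a NaN or inf p-value."""
    problems = []
    if any(math.isnan(x) or math.isinf(x) for x in data):
        problems.append("contains NaN or inf")
    if len(data) < min_n:
        problems.append(f"fewer than {min_n} observations")
    elif not problems and statistics.variance(data) == 0:
        # Variance is only checked on clean data of sufficient size
        problems.append("zero variance")
    return problems


print(preflight([5.0, 5.0, 5.0]))           # ['zero variance']
print(preflight([1.0, float("nan"), 3.0]))  # ['contains NaN or inf']
print(preflight([1.0, 2.0, 4.0]))           # []
```

An empty list means none of these particular problems were found; it does not guarantee that the test's statistical assumptions hold.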
    

🎨 Plotting Errors

Error: “No module named ‘matplotlib’”

Solution:

pip install matplotlib

Issue: Plots don’t show

Symptoms:

rss.plot_normal_histogram(data)
# Nothing appears

Solutions:

  1. Add plt.show():

    import matplotlib.pyplot as plt
    import real_simple_stats as rss
    
    rss.plot_normal_histogram(data)
    plt.show()  # Add this!
    
  2. For Jupyter:

    %matplotlib inline
    import real_simple_stats as rss
    
    rss.plot_normal_histogram(data)
    
  3. Check backend:

    import matplotlib
    print(matplotlib.get_backend())
    
    # Change if needed (preferably before importing matplotlib.pyplot)
    matplotlib.use('TkAgg')  # or 'Qt5Agg', 'MacOSX'
    

Error: “UserWarning: No artists with labels found”

Cause: Legend called but no labels defined.

Solution:

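# This warning is harmless. The cleaner fix is to pass label=... to each
# plotted series before calling legend(). To suppress only this warning:
import warnings
warnings.filterwarnings('ignore', message='No artists with labels found')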

🔄 Advanced Function Errors

Error: “LinAlgError: Singular matrix”

Symptoms:

X = [[1, 2], [2, 4], [3, 6]]  # Perfectly correlated
y = [1, 2, 3]
rss.multiple_regression(X, y)
# LinAlgError: Singular matrix

Cause: Perfect multicollinearity in predictors.

Solution:

# Check correlation between predictors
import numpy as np
X_array = np.array(X)
corr_matrix = np.corrcoef(X_array.T)
print(corr_matrix)

# Remove perfectly correlated variables
# Or use regularization (not in this package)
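
To find which predictors to remove, one option is to flag column pairs whose correlation is essentially plus or minus 1. The sketch below does this in pure Python; `collinear_pairs` and its `threshold` are hypothetical, chosen for illustration.

```python
import math


def collinear_pairs(rows, threshold: float = 0.999):
    """Return (i, j) column pairs whose |Pearson r| is at or above threshold."""
    columns = list(zip(*rows))
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            xs, ys = columns[i], columns[j]
            mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
            sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
            sxx = sum((a - mx) ** 2 for a in xs)
            syy = sum((b - my) ** 2 for b in ys)
            if sxx == 0 or syy == 0:
                continue  # constant column: correlation undefined
            if abs(sxy / math.sqrt(sxx * syy)) >= threshold:
                pairs.append((i, j))
    return pairs


X = [[1, 2], [2, 4], [3, 6]]
print(collinear_pairs(X))  # [(0, 1)]
```

Pairwise correlation will not catch a column that is a combination of several others. Comparing `np.linalg.matrix_rank(X)` with the number of columns is a more general check for that case.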

Error: “ValueError: n_components must be <= min(n_samples, n_features)”

Symptoms:

X = [[1, 2], [3, 4]]  # 2 samples, 2 features
rss.pca(X, n_components=3)
# ValueError: n_components must be <= 2

Solution:

n_samples, n_features = len(X), len(X[0])
max_components = min(n_samples, n_features)

desired_components = 3
n_components = min(desired_components, max_components)  # Capped at 2 here
result = rss.pca(X, n_components=n_components)

Issue: Bootstrap/Permutation tests are slow

Cause: Too many iterations.

Solutions:

  1. Reduce iterations for testing:

    import numpy as np
    
    # Fast (for testing)
    result = rss.bootstrap(data, np.mean, n_iterations=100)
    
    # Accurate (for final analysis)
    result = rss.bootstrap(data, np.mean, n_iterations=10000)
    
  2. Use progress indicator:

    from tqdm import tqdm  # Third-party: pip install tqdm
    import numpy as np
    
    n_iterations = 10000
    results = []
    for _ in tqdm(range(n_iterations)):
        sample = np.random.choice(data, size=len(data), replace=True)
        results.append(np.mean(sample))
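
To get a feel for the speed and accuracy trade-off without any third-party packages, here is a minimal percentile bootstrap using only the standard library. It is a sketch, not the package's `bootstrap`; the function name `bootstrap_ci`, its parameters, and the sample data are all made up for illustration.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_iterations=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)  # fixed seed so reruns give the same interval
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_iterations)
    )
    lower = estimates[int((alpha / 2) * n_iterations)]
    upper = estimates[int((1 - alpha / 2) * n_iterations) - 1]
    return lower, upper


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
print(bootstrap_ci(data, n_iterations=200))   # quick check
print(bootstrap_ci(data, n_iterations=5000))  # more stable endpoints
```

Run time grows linearly with `n_iterations`, while the endpoints settle down more slowly, so a few hundred iterations are often enough while developing and the larger count can be reserved for the final run.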
    

Result Interpretation Issues

Issue: “Unexpected p-value”

Checklist:

  1. Using correct test (one-sample vs. two-sample)?

  2. Data in correct format?

  3. Assumptions met (normality, equal variance)?

  4. Using two-tailed vs. one-tailed correctly?

Debug:

# Check descriptive statistics
print(f"Group 1: mean={rss.mean(group1)}, std={rss.sample_std_dev(group1)}")
print(f"Group 2: mean={rss.mean(group2)}, std={rss.sample_std_dev(group2)}")

# Visualize
rss.plot_box_plot(group1)
rss.plot_box_plot(group2)

# Check assumptions
# (normality tests not in this package - use scipy.stats.shapiro)

Issue: “Effect size doesn’t match p-value”

This is normal! For a fixed difference, the p-value shrinks as the sample size grows; the effect size does not.

Example:

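# Large sample, small effect
# (groups need some spread: constant groups have zero variance,
# which leaves both t and Cohen's d undefined)
import numpy as np

rng = np.random.default_rng(42)
group1 = rng.normal(100.0, 10.0, 10000).tolist()
group2 = rng.normal(100.5, 10.0, 10000).tolist()  # True difference: 0.05 SD
t_stat, p_value = rss.two_sample_t_test(group1, group2)
d = rss.cohens_d(group1, group2)

print(f"p-value: {p_value:.4f}")  # Typically very small (significant)
print(f"Cohen's d: {d:.3f}")      # Close to 0.05 in magnitude (trivial effect)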

Lesson: Always report both!
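
The arithmetic behind this is short. For two groups of equal size n compared with a pooled standard deviation, t = d * sqrt(n / 2), so a fixed effect size produces an ever larger t statistic as n grows. The sketch below tabulates that relationship; `t_from_d` is a hypothetical helper.

```python
import math


def t_from_d(d: float, n_per_group: int) -> float:
    """t statistic implied by Cohen's d for two equal-sized groups (pooled SD)."""
    return d * math.sqrt(n_per_group / 2)


d = 0.05  # a trivially small standardized difference
for n in (100, 1_000, 10_000, 100_000):
    print(f"n per group = {n:>7}: t = {t_from_d(d, n):.2f}")
```

With d held at 0.05, t passes the usual large-sample two-sided 5% cutoff of about 1.96 somewhere between 1,000 and 10,000 observations per group, even though the effect itself never changes.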


🔧 Performance Issues

Issue: Functions are slow

Solutions:

  1. Use NumPy arrays:

    import numpy as np
    
    # Slower
    data_list = list(range(10000))
    
    # Faster
    data_array = np.array(data_list)
    
  2. Reduce bootstrap/permutation iterations:

    # Faster
    result = rss.bootstrap(data, np.mean, n_iterations=1000)
    
    # Slower but more accurate
    result = rss.bootstrap(data, np.mean, n_iterations=10000)
    
  3. Profile your code:

    import time
    
    start = time.perf_counter()  # Monotonic, high-resolution timer
    result = rss.some_function(data)
    print(f"Time: {time.perf_counter() - start:.2f}s")
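
A single timing can be noisy. The standard library's `timeit` repeats the call and lets you take the best of several measurements. In this sketch `statistics.mean` stands in for whichever function is being profiled, so that the example runs without the package.

```python
import statistics
import timeit

data = list(range(10_000))

# 20 calls per measurement, 3 measurements; the minimum is the least noisy
timings = timeit.repeat(lambda: statistics.mean(data), number=20, repeat=3)
best_per_call = min(timings) / 20
print(f"Best time per call: {best_per_call * 1e6:.1f} microseconds")
```

Taking the minimum rather than the average filters out slowdowns caused by other processes on the machine.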
    

🐛 Debugging Strategies

General Debugging Workflow

  1. Check data:

    print(f"Data type: {type(data)}")
    print(f"Data length: {len(data)}")
    print(f"First few values: {data[:5]}")
    print(f"Data range: {min(data)} to {max(data)}")
    
  2. Check for missing values:

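    import numpy as np
    if any(x is None for x in data):
        print("Contains None values")
    # Only test floats for NaN: np.isnan(None) raises TypeError
    if any(isinstance(x, (float, np.floating)) and np.isnan(x) for x in data):
        print("Contains NaN values")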
    
  3. Verify function signature:

    help(rss.function_name)
    
  4. Test with simple data:

    # Use known values
    simple_data = [1, 2, 3, 4, 5]
    result = rss.mean(simple_data)  # Should be 3.0
    
  5. Enable detailed errors:

    import traceback
    
    try:
        result = rss.some_function(data)
    except Exception as e:
        traceback.print_exc()
    

📞 Getting Help

Before asking for help:

  1. Read error message carefully

  2. Check this troubleshooting guide

  3. Review FAQ

  4. Check API documentation

  5. Search existing issues

When reporting issues:

Include:

  • Error message (full traceback)

  • Code to reproduce (minimal example)

  • Expected behavior

  • Actual behavior

  • Environment (Python version, OS, package version)

Template:

import real_simple_stats as rss

# Minimal reproducible example
data = [1, 2, 3, 4, 5]
result = rss.some_function(data)

# Error:
# [paste full error message]

# Expected: [describe expected result]
# Actual: [describe actual result]

# Environment:
# Python 3.9.0
# real-simple-stats 0.3.0
# macOS 12.0

Prevention Tips

Best Practices to Avoid Errors

  1. Validate input data:

    from numbers import Real
    
    def validate_data(data):
        # len() works for lists and NumPy arrays; "not data" fails on arrays
        if len(data) == 0:
            raise ValueError("Data is empty")
        if not all(isinstance(x, Real) for x in data):
            raise TypeError("Data must be numeric")
        return True
    
  2. Use type hints:

    from typing import List
    
    def my_analysis(data: List[float]) -> float:
        return rss.mean(data)
    
  3. Handle exceptions gracefully:

    try:
        result = rss.two_sample_t_test(group1, group2)
    except ValueError as e:
        print(f"Invalid input: {e}")
    except Exception as e:
        print(f"Unexpected error: {e}")
    
  4. Document your assumptions:

    # Assumes:
    # - Data is normally distributed
    # - Equal variances
    # - Independent samples
    t_stat, p_value = rss.two_sample_t_test(group1, group2)
    

Last Updated: 2025
Version: 0.3.0

Still stuck? Open an issue on GitHub!