# Troubleshooting Guide

Solutions to common errors and issues when using Real Simple Stats.

---

## 🚨 Installation Issues

### Error: "ModuleNotFoundError: No module named 'real_simple_stats'"

**Symptoms:**
```python
import real_simple_stats as rss
# ModuleNotFoundError: No module named 'real_simple_stats'
```

**Solutions:**

1. **Install the package:**
   ```bash
   pip install real-simple-stats
   ```

2. **Check installation:**
   ```bash
   pip list | grep real-simple-stats
   ```

3. **Verify Python environment:**
   ```bash
   which python
   which pip
   ```

4. **For Jupyter/Colab:**
   ```python
   # %pip installs into the environment of the running kernel
   %pip install real-simple-stats
   import real_simple_stats as rss
   ```
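
If the install succeeds but the import still fails, the interpreter running your code may not be the one pip installed into. The following is a minimal sketch, using only the standard library, that reports which interpreter is active and whether it can see the module. `describe_install` is a hypothetical helper name for illustration, not part of Real Simple Stats.

```python
import importlib.util
import sys


def describe_install(module_name: str) -> str:
    """Report whether the current interpreter can import `module_name`."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        return f"{module_name} is NOT importable from {sys.executable}"
    return f"{module_name} found at {spec.origin} (interpreter: {sys.executable})"


print(describe_install("real_simple_stats"))
```

If the reported interpreter is not the one you expected, running `python -m pip install real-simple-stats` with that same interpreter installs into the matching environment.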

---

### Error: "pip: command not found"

**Solution:**
```bash
# Try pip3 instead
pip3 install real-simple-stats

# Or use python -m pip
python -m pip install real-simple-stats
```

---

### Error: "Permission denied" during installation

**Solution:**
```bash
# Install for current user only
pip install --user real-simple-stats

# Or use virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install real-simple-stats
```

---

### Error: Package version conflicts

**Symptoms:**
```
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed.
```

**Solutions:**

1. **Upgrade pip:**
   ```bash
   pip install --upgrade pip
   ```

2. **Use virtual environment:**
   ```bash
   python -m venv clean_env
   source clean_env/bin/activate
   pip install real-simple-stats
   ```

3. **Check dependencies:**
   ```bash
   pip show real-simple-stats
   ```

---

## Import Errors

### Error: "ImportError: cannot import name 'function_name'"

**Symptoms:**
```python
from real_simple_stats import nonexistent_function
# ImportError: cannot import name 'nonexistent_function'
```

**Solutions:**

1. **Check function name:**
   ```python
   import real_simple_stats as rss
   print(dir(rss))  # List all available functions
   ```

2. **Use correct import:**
   ```python
   # Correct
   from real_simple_stats import mean, median

   # Or
   import real_simple_stats as rss
   rss.mean([1, 2, 3])
   ```

3. **Check version:**
   ```python
   import real_simple_stats
   print(real_simple_stats.__version__)
   ```

---

### Error: "AttributeError: module 'real_simple_stats' has no attribute 'X'"

**Cause:** The function doesn't exist, or its name is misspelled.

**Solution:**
```python
# Check available functions
import real_simple_stats as rss
help(rss)

# Commonly mistaken names:
# Wrong: rss.standard_deviation()
# Right: rss.sample_std_dev()

# Wrong: rss.ttest()
# Right: rss.two_sample_t_test()
```
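
When the exact name is unknown, fuzzy matching against the module's public names can narrow the search. This is a sketch built on the standard library's `difflib`; `suggest_names` is a hypothetical helper, and the example runs against the built-in `math` module so it works without the package installed.

```python
import difflib
import math


def suggest_names(module, wrong_name, limit=3):
    """Return public names in `module` that look similar to `wrong_name`."""
    public = [name for name in dir(module) if not name.startswith("_")]
    return difflib.get_close_matches(wrong_name, public, n=limit, cutoff=0.6)


# Demonstrated on the standard `math` module so the example runs anywhere;
# pass `rss` instead to search Real Simple Stats.
print(suggest_names(math, "flor"))
```

An empty list means nothing was similar enough, in which case `dir(rss)` or `help(rss)` is the better starting point.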

---

## Data Input Errors

### Error: "TypeError: 'int' object is not iterable"

**Symptoms:**
```python
rss.mean(5)
# TypeError: 'int' object is not iterable
```

**Cause:** Passing a single value instead of a list or array.

**Solution:**
```python
# Wrong
rss.mean(5)

# Correct
rss.mean([5])
rss.mean([1, 2, 3, 4, 5])
```

---

### Error: "ValueError: Input arrays must have the same length"

**Symptoms:**
```python
x = [1, 2, 3]
y = [4, 5]
rss.pearson_correlation(x, y)
# ValueError: Input arrays must have the same length
```

**Cause:** Mismatched array lengths for paired operations.

**Solution:**
```python
# Check lengths
print(f"x length: {len(x)}, y length: {len(y)}")

# Ensure same length
x = [1, 2, 3]
y = [4, 5, 6]  # Same length as x
rss.pearson_correlation(x, y)
```

---

### Error: "ValueError: Data must contain at least one element"

**Symptoms:**
```python
rss.mean([])
# ValueError: Data must contain at least one element
```

**Cause:** Empty dataset.

**Solution:**
```python
# Check if data is empty
data = []
if len(data) > 0:
    mean = rss.mean(data)
else:
    print("No data to analyze")

# Or use try-except
try:
    mean = rss.mean(data)
except ValueError as e:
    print(f"Error: {e}")
```

---

### Error: "TypeError: unsupported operand type(s)"

**Symptoms:**
```python
data = ['1', '2', '3']
rss.mean(data)
# TypeError: unsupported operand type(s) for +: 'int' and 'str'
```

**Cause:** Non-numeric data (strings, None, etc.).

**Solution:**
```python
# Convert strings to numbers
data = ['1', '2', '3']
data_numeric = [float(x) for x in data]
rss.mean(data_numeric)

# Handle missing values
data = [1, 2, None, 4, 5]
data_clean = [x for x in data if x is not None]
rss.mean(data_clean)

# With pandas
import pandas as pd
df = pd.DataFrame({'values': [1, 2, None, 4, 5]})
clean_data = df['values'].dropna().tolist()
rss.mean(clean_data)
```
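
The conversions above can be combined into one cleaning step. This is a sketch under the assumption that silently dropping bad entries is acceptable for your analysis; `to_numeric` is a hypothetical helper, not a function from the package.

```python
import math


def to_numeric(values):
    """Convert entries to float, dropping None, NaN, and unparseable items."""
    cleaned = []
    for value in values:
        if value is None:
            continue
        try:
            number = float(value)
        except (TypeError, ValueError):
            continue  # e.g. "n/a" or a nested list
        if not math.isnan(number):
            cleaned.append(number)
    return cleaned


raw = ["1", "2.5", None, "n/a", float("nan"), 4]
print(to_numeric(raw))  # [1.0, 2.5, 4.0]
```

Comparing `len(raw)` with the length of the cleaned list shows how many entries were discarded, which is worth checking before trusting the result.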

---

## 🔢 Numerical Errors

### Warning: "RuntimeWarning: invalid value encountered in divide"

**Symptoms:**
```python
data = [5, 5, 5, 5, 5]
rss.sample_std_dev(data)
# RuntimeWarning: invalid value encountered in divide
```

**Cause:** Division by zero (e.g., zero variance).

**Solution:**
```python
# Check for constant data
data = [5, 5, 5, 5, 5]
if len(set(data)) == 1:
    print("All values are the same (zero variance)")
else:
    std = rss.sample_std_dev(data)

# Or silence only RuntimeWarning while computing
import warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=RuntimeWarning)
    std = rss.sample_std_dev(data)
```
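
A common place for zero variance to cause trouble is a later step that divides by the standard deviation, such as standardizing values. The sketch below guards that division. It uses the standard library's `statistics` module rather than Real Simple Stats, and `z_scores` is a hypothetical helper.

```python
import statistics


def z_scores(data):
    """Standardize `data`, or return None when the spread is zero."""
    spread = statistics.stdev(data)
    if spread == 0:
        return None  # dividing by zero would give NaN or inf
    center = statistics.fmean(data)
    return [(x - center) / spread for x in data]


print(z_scores([5, 5, 5, 5, 5]))  # None
print(z_scores([1, 2, 3]))        # [-1.0, 0.0, 1.0]
```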

---

### Error: "ZeroDivisionError: division by zero"

**Symptoms:**
```python
rss.coefficient_of_variation([0, 0, 0])
# ZeroDivisionError: division by zero
```

**Cause:** Mean is zero (CV = std/mean).

**Solution:**
```python
data = [0, 0, 0]
mean_val = rss.mean(data)

if mean_val == 0:
    print("Cannot calculate CV when mean is zero")
else:
    cv = rss.coefficient_of_variation(data)
```
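
A zero mean is not limited to all-zero data: any sample centered on zero, such as `[-1, 0, 1]`, hits the same division. The following sketch, written with the standard library and a hypothetical helper name `cv_or_none`, returns `None` in that case instead of raising.

```python
import statistics


def cv_or_none(data):
    """Return stdev / mean, or None when the mean is zero."""
    mean_val = statistics.fmean(data)
    if mean_val == 0:
        return None
    return statistics.stdev(data) / mean_val


print(cv_or_none([0, 0, 0]))   # None
print(cv_or_none([-1, 0, 1]))  # None
print(cv_or_none([2, 4, 6]))   # 0.5
```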

---

### Error: "ValueError: math domain error"

**Symptoms:**
```python
rss.normal_pdf(-1, 0, -1)
# ValueError: math domain error
```

**Cause:** Invalid parameters (e.g., negative standard deviation).

**Solution:**
```python
# Check parameters
x = -1
mu = 0
sigma = 1  # Must be positive!

if sigma <= 0:
    raise ValueError("Standard deviation must be positive")

result = rss.normal_pdf(x, mu, sigma)
```
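
To see why the standard deviation must be positive, it can help to write the density out by hand. This is an illustrative sketch of the textbook formula, not the package's implementation; `normal_pdf_sketch` is a hypothetical name, and the result is compared with the standard library's `NormalDist`.

```python
import math
from statistics import NormalDist


def normal_pdf_sketch(x, mu, sigma):
    """Normal density at x, validating sigma before any arithmetic."""
    if sigma <= 0:
        raise ValueError(f"sigma must be positive, got {sigma}")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


value = normal_pdf_sketch(-1, 0, 1)
# Cross-check against the standard library's implementation
print(value, NormalDist(0, 1).pdf(-1))
```

Validating up front turns an opaque "math domain error" into a message that names the offending parameter.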

---

### Warning: "RuntimeWarning: overflow encountered"

**Cause:** Very large numbers in calculations.

**Solution:**
```python
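import math

n = 200  # example input

# Work on the log scale for large factorials: lgamma(n + 1) equals log(n!)
log_factorial = math.lgamma(n + 1)

# Or limit input ranges: 171! and above do not fit in a float
if n > 170:
    print("Value too large for a floating-point factorial")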
```
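
The log-scale idea extends to quantities built from factorials, such as binomial coefficients. The sketch below assumes you only need the logarithm (or a ratio of such terms); `log_binomial` is a hypothetical helper built on `math.lgamma`.

```python
import math


def log_binomial(n, k):
    """Natural log of C(n, k), computed without forming large factorials."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


# Small case: exp(log C(10, 3)) should be close to 120
print(round(math.exp(log_binomial(10, 3))))

# Large case: C(2000, 1000) does not fit in a float, but its log does
print(log_binomial(2000, 1000))

# Converting 171! to a float is where the overflow appears
try:
    float(math.factorial(171))
except OverflowError as exc:
    print(f"OverflowError: {exc}")
```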

---

## Statistical Test Errors

### Error: "ValueError: Degrees of freedom must be positive"

**Symptoms:**
```python
rss.one_sample_t_test([1], mu0=0)
# ValueError: Degrees of freedom must be positive
```

**Cause:** Sample size too small (n=1 gives df=0).

**Solution:**
```python
data = [1, 2, 3]  # Need at least 2 observations
if len(data) < 2:
    print("Need at least 2 observations for t-test")
else:
    t_stat, p_value = rss.one_sample_t_test(data, mu0=0)
```
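
The degrees-of-freedom requirement follows from the test statistic itself: the sample standard deviation divides by `n - 1`. Here is a hand-written sketch of the one-sample t statistic using only the standard library. It is not the package's implementation, `one_sample_t` is a hypothetical name, and it returns the degrees of freedom rather than a p-value.

```python
import math
import statistics


def one_sample_t(data, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(data)
    if n < 2:
        raise ValueError("Need at least 2 observations (df = n - 1 must be positive)")
    std_err = statistics.stdev(data) / math.sqrt(n)
    if std_err == 0:
        raise ValueError("Zero variance: the t statistic is undefined")
    return (statistics.fmean(data) - mu0) / std_err, n - 1


t_stat, df = one_sample_t([1, 2, 3], mu0=0)
print(f"t = {t_stat:.3f}, df = {df}")  # t = 3.464, df = 2
```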

---

### Error: "ValueError: Observed and expected lists must be the same length"

**Symptoms:**
```python
observed = [10, 20, 30]
expected = [15, 25]
rss.chi_square_statistic(observed, expected)
# ValueError: Observed and expected lists must be the same length
```

**Solution:**
```python
# Ensure same length
observed = [10, 20, 30]
expected = [15, 25, 35]  # Same length

# Or check first
if len(observed) != len(expected):
    raise ValueError("Lengths must match")
```
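
The length requirement comes from the statistic pairing each observed count with its expected count: the sum of `(O - E)^2 / E` over categories. The following sketch spells that out with an error message that reports both lengths. `chi_square_sketch` is a hypothetical name and is not the package's implementation.

```python
def chi_square_sketch(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError(
            f"Length mismatch: {len(observed)} observed vs {len(expected)} expected"
        )
    if any(e <= 0 for e in expected):
        raise ValueError("Expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


print(round(chi_square_sketch([10, 20, 30], [15, 25, 35]), 4))  # 3.381
```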

---

### Issue: P-value is NaN or inf

**Cause:** Numerical instability or invalid test conditions.

**Solutions:**

1. **Check data validity:**
   ```python
   import numpy as np

   # Check for NaN or inf
   if any(np.isnan(data)) or any(np.isinf(data)):
       print("Data contains NaN or inf values")
   ```

2. **Check variance:**
   ```python
   # Zero variance causes issues
   if rss.sample_variance(data) == 0:
       print("Zero variance - all values are identical")
   ```

3. **Check sample size:**
   ```python
   if len(data) < 3:
       print("Sample size too small for reliable inference")
   ```
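
The three checks can be bundled into a single pre-flight function that runs before any test. This is a sketch that assumes the data is already numeric; `sample_problems` and its `min_size` default are hypothetical choices, not part of the package.

```python
import math


def sample_problems(data, min_size=3):
    """Return a list of human-readable problems found in `data`."""
    problems = []
    if len(data) < min_size:
        problems.append(f"only {len(data)} observation(s)")
    if any(not math.isfinite(x) for x in data):
        problems.append("contains NaN or inf")
    elif len(data) > 1 and len(set(data)) == 1:
        problems.append("zero variance (all values identical)")
    return problems


print(sample_problems([1.0, 2.0, 3.0]))           # []
print(sample_problems([5, 5, 5]))                 # ['zero variance (all values identical)']
print(sample_problems([1.0, float("nan"), 3.0]))  # ['contains NaN or inf']
```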

---

## 🎨 Plotting Errors

### Error: "No module named 'matplotlib'"

**Solution:**
```bash
pip install matplotlib
```

---

### Issue: Plots don't show

**Symptoms:**
```python
rss.plot_normal_histogram(data)
# Nothing appears
```

**Solutions:**

1. **Add plt.show():**
   ```python
   import matplotlib.pyplot as plt
   import real_simple_stats as rss

   rss.plot_normal_histogram(data)
   plt.show()  # Add this!
   ```

2. **For Jupyter:**
   ```python
   %matplotlib inline
   import real_simple_stats as rss

   rss.plot_normal_histogram(data)
   ```

3. **Check backend:**
   ```python
   import matplotlib
   print(matplotlib.get_backend())

   # Change if needed, ideally before importing matplotlib.pyplot
   matplotlib.use('TkAgg')  # or 'Qt5Agg', 'MacOSX'
   ```

---

### Warning: "UserWarning: No artists with labels found"

**Cause:** Legend called but no labels defined.

**Solution:**
```python
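# This is only a warning and can be ignored.
# To suppress just this message (the pattern matches the start of the warning text):
import warnings
warnings.filterwarnings('ignore', message='No artists with labels found', category=UserWarning)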
```

---

## 🔄 Advanced Function Errors

### Error: "LinAlgError: Singular matrix"

**Symptoms:**
```python
X = [[1, 2], [2, 4], [3, 6]]  # Perfectly correlated
y = [1, 2, 3]
rss.multiple_regression(X, y)
# LinAlgError: Singular matrix
```

**Cause:** Perfect multicollinearity in predictors.

**Solution:**
```python
# Check correlation between predictors
import numpy as np
X_array = np.array(X)
corr_matrix = np.corrcoef(X_array.T)
print(corr_matrix)

# Remove perfectly correlated variables
# Or use regularization (not in this package)
```
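
Flagging the offending columns can also be automated without NumPy. The sketch below checks every pair of predictor columns and reports those that are almost perfectly correlated. It only detects pairwise collinearity, so a column that is a combination of several others would go unnoticed. `pearson`, `collinear_pairs`, and the 0.999 threshold are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    sxy = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sxx = sum((x - mean_a) ** 2 for x in a)
    syy = sum((y - mean_b) ** 2 for y in b)
    return sxy / math.sqrt(sxx * syy)  # undefined if a column is constant


def collinear_pairs(X, threshold=0.999):
    """Return index pairs of columns whose |r| exceeds `threshold`."""
    columns = list(zip(*X))  # rows -> columns
    return [
        (i, j)
        for i, j in combinations(range(len(columns)), 2)
        if abs(pearson(columns[i], columns[j])) > threshold
    ]


X = [[1, 2], [2, 4], [3, 6]]
print(collinear_pairs(X))  # [(0, 1)]
```

Dropping one column from each reported pair is usually enough to make the regression solvable.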

---

### Error: "ValueError: n_components must be <= min(n_samples, n_features)"

**Symptoms:**
```python
X = [[1, 2], [3, 4]]  # 2 samples, 2 features
rss.pca(X, n_components=3)
# ValueError: n_components must be <= 2
```

**Solution:**
```python
X = [[1, 2], [3, 4]]
desired_components = 3

n_samples, n_features = len(X), len(X[0])
max_components = min(n_samples, n_features)

# Clamp the request to what the data can support
n_components = min(desired_components, max_components)
result = rss.pca(X, n_components=n_components)
```

---

### Issue: Bootstrap/Permutation tests are slow

**Cause:** Too many iterations.

**Solutions:**

1. **Reduce iterations for testing:**
   ```python
   import numpy as np

   # Fast (for testing)
   result = rss.bootstrap(data, np.mean, n_iterations=100)

   # Accurate (for final analysis)
   result = rss.bootstrap(data, np.mean, n_iterations=10000)
   ```

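2. **Use a progress indicator:**
   ```python
   from tqdm import tqdm
   import numpy as np

   data = [1, 2, 3, 4, 5]  # example data
   n_iterations = 10000

   # A hand-written bootstrap loop, so tqdm can report progress
   results = []
   for _ in tqdm(range(n_iterations)):
       sample = np.random.choice(data, size=len(data), replace=True)
       results.append(np.mean(sample))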
   ```
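
To judge how many iterations are enough, it can help to compare intervals at different iteration counts. The following is a standard-library sketch of a percentile bootstrap, not the package's `bootstrap` function. `bootstrap_ci`, its parameters, and the example data are hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_iterations=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat`."""
    rng = random.Random(seed)  # seeded so reruns give the same interval
    estimates = sorted(
        stat(rng.choices(data, k=len(data))) for _ in range(n_iterations)
    )
    lower = estimates[int((alpha / 2) * n_iterations)]
    upper = estimates[int((1 - alpha / 2) * n_iterations) - 1]
    return lower, upper


data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9]
for n in (100, 1000):
    print(n, bootstrap_ci(data, n_iterations=n))
```

Run time grows roughly in proportion to `n_iterations`, so a small count is reasonable while developing and a larger one for reported results.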

---

## Result Interpretation Issues

### Issue: "Unexpected p-value"

**Checklist:**
1. Using correct test (one-sample vs. two-sample)?
2. Data in correct format?
3. Assumptions met (normality, equal variance)?
4. Using two-tailed vs. one-tailed correctly?

**Debug:**
```python
# Check descriptive statistics
print(f"Group 1: mean={rss.mean(group1)}, std={rss.sample_std_dev(group1)}")
print(f"Group 2: mean={rss.mean(group2)}, std={rss.sample_std_dev(group2)}")

# Visualize
rss.plot_box_plot(group1)
rss.plot_box_plot(group2)

# Check assumptions
# (normality tests not in this package - use scipy.stats.shapiro)
```

---

### Issue: "Effect size doesn't match p-value"

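**This is normal!** The p-value depends on both the effect size and the sample size, while a standardized effect size such as Cohen's d does not grow as the sample gets larger.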

**Example:**
```python
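import random

random.seed(0)  # reproducible example data

# Large samples, small effect: true means differ by 0.05 standard deviations
group1 = [random.gauss(100, 1) for _ in range(20000)]
group2 = [random.gauss(100.05, 1) for _ in range(20000)]
t_stat, p_value = rss.two_sample_t_test(group1, group2)
d = rss.cohens_d(group1, group2)

print(f"p-value: {p_value:.4f}")  # Typically very small (significant)
print(f"Cohen's d: {d:.3f}")      # About 0.05 in magnitude (trivial effect)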
```

**Lesson:** Always report both!
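
The relationship can be made explicit. For two groups of equal size `n` and a pooled standard deviation, the t statistic equals `d * sqrt(n / 2)`, so a fixed effect size produces a larger t (and a smaller p-value) as `n` grows. The sketch below illustrates this under those assumptions; `t_from_d` is a hypothetical helper.

```python
import math


def t_from_d(d, n_per_group):
    """t statistic implied by Cohen's d for two equal-sized groups (pooled SD)."""
    return d * math.sqrt(n_per_group / 2)


d = 0.05  # a trivial effect
for n in (100, 2_000, 20_000):
    print(f"n = {n:>6} per group -> t = {t_from_d(d, n):.2f}")
```

With 100 observations per group the same trivial effect is nowhere near significance, while at 20,000 per group it gives a t statistic of about 5.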

---

## 🔧 Performance Issues

### Issue: Functions are slow

**Solutions:**

1. **Use NumPy arrays:**
   ```python
   import numpy as np

   # Slower
   data_list = list(range(10000))

   # Faster
   data_array = np.array(data_list)
   ```

2. **Reduce bootstrap/permutation iterations:**
   ```python
   # Faster
   result = rss.bootstrap(data, np.mean, n_iterations=1000)

   # Slower but more accurate
   result = rss.bootstrap(data, np.mean, n_iterations=10000)
   ```

3. **Profile your code:**
   ```python
   import time

   start = time.time()
   result = rss.some_function(data)
   print(f"Time: {time.time() - start:.2f}s")
   ```
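
A single `time.time()` measurement can be noisy for fast functions. As an alternative sketch, the standard library's `timeit` repeats the call and lets you keep the best measurement. Here `statistics.fmean` stands in for whichever function you want to time, since the example is meant to run without the package installed.

```python
import statistics
import timeit

data = list(range(10_000))

# Run the call 20 times per measurement, take 5 measurements, keep the best
timings = timeit.repeat(lambda: statistics.fmean(data), number=20, repeat=5)
best_per_call = min(timings) / 20
print(f"Best time per call: {best_per_call * 1e6:.1f} microseconds")
```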

---

## 🐛 Debugging Strategies

### General Debugging Workflow

1. **Check data:**
   ```python
   print(f"Data type: {type(data)}")
   print(f"Data length: {len(data)}")
   print(f"First few values: {data[:5]}")
   print(f"Data range: {min(data)} to {max(data)}")
   ```

2. **Check for missing values:**
   ```python
   import math
   if any(x is None for x in data):
       print("Contains None values")
   # Test only floats, so None or strings do not raise a TypeError
   if any(isinstance(x, float) and math.isnan(x) for x in data):
       print("Contains NaN values")
   ```

3. **Verify function signature:**
   ```python
   help(rss.function_name)
   ```

4. **Test with simple data:**
   ```python
   # Use known values
   simple_data = [1, 2, 3, 4, 5]
   result = rss.mean(simple_data)  # Should be 3.0
   ```

5. **Enable detailed errors:**
   ```python
   import traceback

   try:
       result = rss.some_function(data)
   except Exception as e:
       traceback.print_exc()
   ```

---

## 📞 Getting Help

### Before asking for help:

1. Read the error message carefully
2. Check this troubleshooting guide
3. Review [FAQ](FAQ.md)
4. Check [API documentation](API_COMPARISON.md)
5. Search [existing issues](https://github.com/kylejones200/real_simple_stats/issues)

### When reporting issues:

Include:
- **Error message** (full traceback)
- **Code to reproduce** (minimal example)
- **Expected behavior**
- **Actual behavior**
- **Environment** (Python version, OS, package version)

**Template:**
```python
import real_simple_stats as rss

# Minimal reproducible example
data = [1, 2, 3, 4, 5]
result = rss.some_function(data)

# Error:
# [paste full error message]

# Expected: [describe expected result]
# Actual: [describe actual result]

# Environment:
# Python 3.9.0
# real-simple-stats 0.3.0
# macOS 12.0
```
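
The environment details can be gathered programmatically rather than typed by hand. This is a sketch using the standard library; `environment_report` is a hypothetical helper, and the distribution name `real-simple-stats` is taken from the install command used in this guide.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def environment_report(dist_name="real-simple-stats"):
    """Collect the details requested in the issue template."""
    try:
        pkg_version = version(dist_name)
    except PackageNotFoundError:
        pkg_version = "not installed"
    return "\n".join([
        f"Python {platform.python_version()} ({sys.executable})",
        f"{dist_name} {pkg_version}",
        f"OS: {platform.platform()}",
    ])


print(environment_report())
```

Pasting the printed lines into the issue keeps the report accurate even when several Python environments are installed.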

---

## Additional Resources

- **FAQ**: [Common questions](FAQ.md)
- **API Reference**: [Function lookup](API_COMPARISON.md)
- **Examples**: [Interactive tutorials](INTERACTIVE_EXAMPLES.md)
- **GitHub Issues**: [Report bugs](https://github.com/kylejones200/real_simple_stats/issues)

---

## Prevention Tips

### Best Practices to Avoid Errors

1. **Validate input data:**
   ```python
   from numbers import Real

   def validate_data(data):
       # len() also works for NumPy arrays, where `not data` is ambiguous
       if len(data) == 0:
           raise ValueError("Data is empty")
       if not all(isinstance(x, Real) for x in data):
           raise TypeError("Data must be numeric")
       return True
   ```

2. **Use type hints:**
   ```python
   from typing import List

   def my_analysis(data: List[float]) -> float:
       return rss.mean(data)
   ```

3. **Handle exceptions gracefully:**
   ```python
   try:
       result = rss.two_sample_t_test(group1, group2)
   except ValueError as e:
       print(f"Invalid input: {e}")
   except Exception as e:
       print(f"Unexpected error: {e}")
   ```

4. **Document your assumptions:**
   ```python
   # Assumes:
   # - Data is normally distributed
   # - Equal variances
   # - Independent samples
   t_stat, p_value = rss.two_sample_t_test(group1, group2)
   ```

---

**Last Updated**: 2025
**Version**: 0.3.0

**Still stuck?** [Open an issue](https://github.com/kylejones200/real_simple_stats/issues) on GitHub!