Contributing to Real Simple Stats

We welcome contributions to Real Simple Stats! This guide will help you get started with contributing to the project.

Code Quality Standards

We maintain high code quality standards. All contributions must meet these requirements:

Code Style

  • Formatting: Code is automatically formatted with Black (88-character line length)

  • Linting: Must pass Flake8 linting with our configuration

  • Type Hints: All functions must have comprehensive type annotations

  • Docstrings: All public functions must have Google-style docstrings

Example of properly formatted function:

from typing import List


def calculate_mean(values: List[float]) -> float:
    """Calculate the arithmetic mean of a list of values.

    Args:
        values: List of numeric values to calculate mean for.
            Must contain at least one value.

    Returns:
        The arithmetic mean of the input values.

    Raises:
        ValueError: If the input list is empty.

    Example:
        >>> calculate_mean([1, 2, 3, 4, 5])
        3.0
    """
    if not values:
        raise ValueError("Cannot calculate mean of empty list")
    return sum(values) / len(values)

Testing Requirements

  • Test Coverage: New code should maintain or improve test coverage

  • Test Types: Include unit tests for all new functions

  • Edge Cases: Test error conditions and edge cases

  • Documentation: Examples in docstrings must run and produce the output they show
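One way to confirm that docstring examples run is the standard library's doctest module, sketched below. doctest.testmod() executes every >>> example in a module and compares the printed result with the expected output. Whether the project wires this into make test (for example through pytest's --doctest-modules option) is not stated in this guide, so treat this as a local check.

```python
import doctest
from typing import List


def calculate_mean(values: List[float]) -> float:
    """Calculate the arithmetic mean of a list of values.

    Example:
        >>> calculate_mean([1, 2, 3, 4, 5])
        3.0
    """
    if not values:
        raise ValueError("Cannot calculate mean of empty list")
    return sum(values) / len(values)


if __name__ == "__main__":
    # testmod() runs every >>> example in this module's docstrings and
    # returns a TestResults(failed, attempted) named tuple
    print(doctest.testmod())
```

The same check can be run on an existing file without editing it, via python -m doctest -v followed by the module's path.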

Example test structure:

import pytest

# Also import calculate_mean from the module that defines it


def test_calculate_mean():
    """Test mean calculation with various inputs."""
    # Test normal case
    assert calculate_mean([1, 2, 3, 4, 5]) == 3.0

    # Test edge cases
    assert calculate_mean([5]) == 5.0
    assert calculate_mean([1.5, 2.5]) == 2.0

    # Test error conditions
    with pytest.raises(ValueError):
        calculate_mean([])

Quality Checks

Before submitting, ensure all quality checks pass:

make format-check  # Check code formatting
make lint         # Check code style
make type-check   # Check type annotations
make test         # Run all tests
make test-cov     # Run tests with coverage report

Or run everything at once:

make quality

Types of Contributions

Bug Reports

When reporting bugs, please include:

  • Clear description of the issue

  • Steps to reproduce the problem

  • Expected vs actual behavior

  • Environment details (Python version, OS, package version)

  • Minimal code example that demonstrates the issue
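The environment details can be collected with a short standard-library script such as the sketch below. The distribution name real-simple-stats is an assumption; substitute the name that pip show reports for your installation if it differs.

```python
import platform
from importlib import metadata
from typing import Dict


def environment_report(distribution: str = "real-simple-stats") -> Dict[str, str]:
    """Collect the environment details requested in bug reports.

    The default distribution name is an assumption; pass the name that
    `pip show` reports for your installation if it differs.
    """
    try:
        package_version = metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # Report the absence instead of failing, since the bug itself
        # may be an installation problem
        package_version = "not installed"
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "package": f"{distribution} {package_version}",
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Pasting the printed lines into the issue covers the Python version, OS, and package version in one step.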

Feature Requests

For new features, please:

  • Check existing issues to avoid duplicates

  • Describe the use case and why it’s needed

  • Provide examples of how it would be used

  • Note any implementation complexity you foresee

Code Contributions

We welcome various types of code contributions:

New Statistical Functions
  • Implement additional statistical tests

  • Add new probability distributions

  • Extend descriptive statistics

Performance Improvements
  • Optimize existing algorithms

  • Add vectorized operations

  • Improve memory efficiency

Documentation
  • Improve existing documentation

  • Add examples and tutorials

  • Fix typos and clarify explanations

Testing
  • Increase test coverage

  • Add integration tests

  • Improve test quality

Infrastructure
  • Improve build processes

  • Enhance CI/CD pipelines

  • Update development tools

Coding Guidelines

Function Design

  • Single Responsibility: Each function should do one thing well

  • Clear Naming: Use descriptive names that explain what the function does

  • Input Validation: Validate inputs and provide clear error messages

  • Educational Value: Include mathematical explanations in docstrings

Statistical Accuracy

  • Verify Formulas: Ensure statistical formulas are mathematically correct

  • Test Against Known Values: Compare results with established statistical software

  • Handle Edge Cases: Consider what happens with small samples, extreme values, etc.

  • Document Assumptions: Clearly state any assumptions made by the function
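As a sketch of these checks, consider a hypothetical sample_variance function. The formula is the unbiased sample variance, the known-value check compares the result against Python's standard-library statistics.variance, and the documented assumption (at least two observations) is enforced with a clear error.

```python
import math
import statistics
from typing import Sequence


def sample_variance(values: Sequence[float]) -> float:
    """Unbiased sample variance: s^2 = sum((x - mean)^2) / (n - 1).

    Assumes the values are a sample from a larger population, which is
    why the denominator is n - 1 (Bessel's correction) rather than n.

    Raises:
        ValueError: If fewer than two values are given, since the
            n - 1 denominator would be zero.
    """
    n = len(values)
    if n < 2:
        raise ValueError("Sample variance requires at least two values")
    mean = sum(values) / n
    # Two-pass formula: subtracting the mean first avoids the precision
    # loss of the one-pass sum(x^2) - n * mean^2 shortcut
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Compare against an established implementation, using a tolerance
# because floating-point results can differ in the last few digits
data = [2.5, 3.1, 4.7, 5.0, 6.2]
assert math.isclose(sample_variance(data), statistics.variance(data), rel_tol=1e-12)

# Edge case: a single observation has no sample variance
try:
    sample_variance([1.0])
except ValueError:
    pass
else:
    raise AssertionError("expected ValueError for a single value")
```

The same pattern extends to results from other statistical software: record the reference value in the test and compare with math.isclose rather than ==.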

Error Handling

  • Meaningful Messages: Error messages should help users understand what went wrong

  • Appropriate Exceptions: Use standard Python exceptions (ValueError, TypeError, etc.)

  • Input Validation: Check inputs early and provide clear feedback

Example:

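import numbers

import numpy as np


def validate_values(values) -> None:  # illustrative helper name
    if not isinstance(values, (list, tuple, np.ndarray)):
        raise TypeError("Values must be a list, tuple, or numpy array")

    if len(values) == 0:
        raise ValueError("Cannot calculate statistics for empty dataset")

    # numbers.Real also accepts NumPy scalars such as np.int64, which an
    # (int, float) check would wrongly reject
    if not all(isinstance(x, numbers.Real) for x in values):
        raise ValueError("All values must be numeric (int or float)")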

Documentation Standards

Docstring Format

We use Google-style docstrings:

def function_name(param1: Type1, param2: Type2) -> ReturnType:
    """Brief description of what the function does.

    Longer description if needed, explaining the mathematical
    background or implementation details.

    Args:
        param1: Description of first parameter.
        param2: Description of second parameter.

    Returns:
        Description of return value.

    Raises:
        ExceptionType: Description of when this exception is raised.

    Example:
        >>> function_name(arg1, arg2)
        expected_output

    Note:
        Any additional notes about usage or mathematical background.
    """

Code Comments

  • Explain Why: Comments should explain why something is done, not what is done

  • Mathematical Context: Explain statistical concepts and formulas

  • Complex Logic: Break down complex calculations with comments
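For example, in a median calculation the comments can explain why each step is needed instead of restating the code. The function below is a hypothetical illustration; Python's own statistics.median uses the same convention of averaging the two central values for even-length data.

```python
from typing import Sequence


def median(values: Sequence[float]) -> float:
    """Return the median (middle value) of a non-empty sequence."""
    if len(values) == 0:
        raise ValueError("Cannot calculate median of empty dataset")
    # Sort a copy: the median depends on rank order, and sorting in place
    # would silently reorder the caller's data
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: exactly one value has as many values above it as below
        return float(ordered[mid])
    # Even count: no single middle value exists, so by convention the two
    # central values are averaged
    return (ordered[mid - 1] + ordered[mid]) / 2
```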

Release Process

Version Numbers

We follow semantic versioning (MAJOR.MINOR.PATCH):

  • MAJOR: Breaking changes to the API

  • MINOR: New features, backward compatible

  • PATCH: Bug fixes, backward compatible
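The rules can be restated as a small function. This is a hypothetical sketch, not part of the project's tooling: a MAJOR bump resets MINOR and PATCH to zero, and a MINOR bump resets PATCH. Pre-release and build suffixes are deliberately not handled.

```python
def bump_version(version: str, change: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for a given change type.

    Args:
        version: Current version, e.g. "1.4.2" (pre-release tags not handled).
        change: One of "major", "minor", or "patch".
    """
    parts = version.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"Expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, patch = (int(p) for p in parts)

    if change == "major":
        # Breaking API change: lower components restart from zero
        return f"{major + 1}.0.0"
    if change == "minor":
        # New backward-compatible feature: the patch counter restarts
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible bug fix: only the last component moves
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type {change!r}")
```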

Changelog

Every change is recorded in the changelog under one of these categories:

  • Added: New features

  • Changed: Changes in existing functionality

  • Deprecated: Soon-to-be removed features

  • Removed: Removed features

  • Fixed: Bug fixes

  • Security: Security improvements

Getting Help

If you need help with contributing:

  • Check Documentation: Read through this guide and the API documentation

  • Ask Questions: Open a GitHub issue with the “question” label

  • Join Discussions: Participate in GitHub discussions

  • Review Examples: Look at existing code for patterns and style

Communication

  • Be Respectful: Follow our code of conduct

  • Be Patient: Maintainers review contributions in their spare time

  • Be Descriptive: Provide clear descriptions in issues and pull requests

  • Be Collaborative: We’re all working together to improve the project

Recognition

Contributors are recognized in:

  • README: Major contributors listed

  • Changelog: Contributors credited for their changes

  • Documentation: Authors acknowledged in relevant sections

Thank you for contributing to Real Simple Stats!