Contributing to Real Simple Stats
We welcome contributions to Real Simple Stats! This guide will help you get started with contributing to the project.
Code Quality Standards
We maintain high code quality standards. All contributions must meet these requirements:
Code Style
Formatting: Code is automatically formatted with Black (88 character line length)
Linting: Must pass Flake8 linting with our configuration
Type Hints: All functions must have comprehensive type annotations
Docstrings: All public functions must have Google-style docstrings
Example of properly formatted function:
def calculate_mean(values: List[float]) -> float:
"""Calculate the arithmetic mean of a list of values.
Args:
values: List of numeric values to calculate mean for.
Must contain at least one value.
Returns:
The arithmetic mean of the input values.
Raises:
ValueError: If the input list is empty.
Example:
>>> calculate_mean([1, 2, 3, 4, 5])
3.0
"""
if not values:
raise ValueError("Cannot calculate mean of empty list")
return sum(values) / len(values)
Testing Requirements
Test Coverage: New code should maintain or improve test coverage
Test Types: Include unit tests for all new functions
Edge Cases: Test error conditions and edge cases
Documentation: Test examples in docstrings should work
Example test structure:
def test_calculate_mean():
"""Test mean calculation with various inputs."""
# Test normal case
assert calculate_mean([1, 2, 3, 4, 5]) == 3.0
# Test edge cases
assert calculate_mean([5]) == 5.0
assert calculate_mean([1.5, 2.5]) == 2.0
# Test error conditions
with pytest.raises(ValueError):
calculate_mean([])
Quality Checks
Before submitting, ensure all quality checks pass:
make format-check # Check code formatting
make lint # Check code style
make type-check # Check type annotations
make test # Run all tests
make test-cov # Run tests with coverage report
Or run everything at once:
make quality
Types of Contributions
Bug Reports
When reporting bugs, please include:
Clear description of the issue
Steps to reproduce the problem
Expected vs actual behavior
Environment details (Python version, OS, package version)
Minimal code example that demonstrates the issue
Feature Requests
For new features, please:
Check existing issues to avoid duplicates
Describe the use case and why it’s needed
Provide examples of how it would be used
Consider implementation complexity
Code Contributions
We welcome various types of code contributions:
- New Statistical Functions
Implement additional statistical tests
Add new probability distributions
Extend descriptive statistics
- Performance Improvements
Optimize existing algorithms
Add vectorized operations
Improve memory efficiency
- Documentation
Improve existing documentation
Add examples and tutorials
Fix typos and clarify explanations
- Testing
Increase test coverage
Add integration tests
Improve test quality
- Infrastructure
Improve build processes
Enhance CI/CD pipelines
Update development tools
Coding Guidelines
Function Design
Single Responsibility: Each function should do one thing well
Clear Naming: Use descriptive names that explain what the function does
Input Validation: Validate inputs and provide clear error messages
Educational Value: Include mathematical explanations in docstrings
Statistical Accuracy
Verify Formulas: Ensure statistical formulas are mathematically correct
Test Against Known Values: Compare results with established statistical software
Handle Edge Cases: Consider what happens with small samples, extreme values, etc.
Document Assumptions: Clearly state any assumptions made by the function
Error Handling
Meaningful Messages: Error messages should help users understand what went wrong
Appropriate Exceptions: Use standard Python exceptions (ValueError, TypeError, etc.)
Input Validation: Check inputs early and provide clear feedback
Example:
if not isinstance(values, (list, tuple, np.ndarray)):
raise TypeError("Values must be a list, tuple, or numpy array")
if len(values) == 0:
raise ValueError("Cannot calculate statistics for empty dataset")
if not all(isinstance(x, (int, float)) for x in values):
raise ValueError("All values must be numeric (int or float)")
Documentation Standards
Docstring Format
We use Google-style docstrings:
def function_name(param1: Type1, param2: Type2) -> ReturnType:
"""Brief description of what the function does.
Longer description if needed, explaining the mathematical
background or implementation details.
Args:
param1: Description of first parameter.
param2: Description of second parameter.
Returns:
Description of return value.
Raises:
ExceptionType: Description of when this exception is raised.
Example:
>>> function_name(arg1, arg2)
expected_output
Note:
Any additional notes about usage or mathematical background.
"""
Code Comments
Explain Why: Comments should explain why something is done, not what is done
Mathematical Context: Explain statistical concepts and formulas
Complex Logic: Break down complex calculations with comments
Release Process
Version Numbers
We follow semantic versioning (MAJOR.MINOR.PATCH):
MAJOR: Breaking changes to the API
MINOR: New features, backward compatible
PATCH: Bug fixes, backward compatible
Changelog
All changes are documented in the changelog with:
Added: New features
Changed: Changes in existing functionality
Deprecated: Soon-to-be removed features
Removed: Removed features
Fixed: Bug fixes
Security: Security improvements
Getting Help
If you need help with contributing:
Check Documentation: Read through this guide and the API documentation
Ask Questions: Open a GitHub issue with the “question” label
Join Discussions: Participate in GitHub discussions
Review Examples: Look at existing code for patterns and style
Communication
Be Respectful: Follow our code of conduct
Be Patient: Maintainers review contributions in their spare time
Be Descriptive: Provide clear descriptions in issues and pull requests
Be Collaborative: We’re all working together to improve the project
Recognition
Contributors are recognized in:
README: Major contributors listed
Changelog: Contributors credited for their changes
Documentation: Authors acknowledged in relevant sections
Thank you for contributing to Real Simple Stats!