Descriptive Statistics
The descriptive_statistics module provides functions for calculating basic statistical measures that describe the central tendency, variability, and distribution of datasets.
- real_simple_stats.descriptive_statistics.coefficient_of_variation(values: Sequence[float]) → float
- real_simple_stats.descriptive_statistics.detect_fake_statistics(survey_sponsor: str, is_voluntary: bool, correlation_not_causation: bool) → list[str]
Detect potential issues with statistical claims or studies.
- Parameters:
survey_sponsor – Organization sponsoring the survey/study
is_voluntary – Whether the survey uses voluntary response sampling
correlation_not_causation – Whether correlation is being presented as causation
- Returns:
List of warning messages about potential statistical issues
Example
>>> detect_fake_statistics("Diet Pill Company", True, True)
['Potential bias: Self-funded study', 'Warning: Voluntary response samples are biased', 'Warning: Correlation does not imply causation']
- real_simple_stats.descriptive_statistics.draw_cumulative_frequency_table(values: Sequence[int]) → dict[int, int]
Generate a cumulative frequency table from a list of discrete values.
- Parameters:
values – List of discrete integer values
- Returns:
Dictionary mapping each unique value to its cumulative frequency
Example
>>> draw_cumulative_frequency_table([1, 2, 1, 3, 2, 1])
{1: 3, 2: 5, 3: 6}
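One way such a table could be built is sketched below using only the standard library. The name cumulative_frequency_sketch is hypothetical and this is not the library's actual implementation; it assumes values are sorted in ascending order before the running total is taken, which reproduces the documented example.

```python
from collections import Counter
from itertools import accumulate


def cumulative_frequency_sketch(values):
    """Map each unique value, in ascending order, to its running count.

    Hypothetical helper for illustration; not part of real_simple_stats.
    """
    counts = Counter(values)                 # plain frequency of each value
    ordered = sorted(counts)                 # unique values, ascending
    running = accumulate(counts[v] for v in ordered)  # running totals
    return dict(zip(ordered, running))


print(cumulative_frequency_sketch([1, 2, 1, 3, 2, 1]))  # {1: 3, 2: 5, 3: 6}
```

The last cumulative frequency always equals the number of observations, which is a quick sanity check on any table of this kind.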
- real_simple_stats.descriptive_statistics.draw_frequency_table(values: Sequence[str | int]) → dict[str | int, int]
Generate a frequency table from a list of categorical or discrete values.
- Parameters:
values – List of categorical or discrete values to count
- Returns:
Dictionary mapping each unique value to its frequency
Example
>>> draw_frequency_table(['A', 'B', 'A', 'C', 'B', 'A'])
{'A': 3, 'B': 2, 'C': 1}
- real_simple_stats.descriptive_statistics.five_number_summary(values: Sequence[float]) → dict[str, float]
Return the five-number summary: min, Q1, median, Q3, max.
- Parameters:
values – List of numerical values
- Returns:
Dictionary with the keys min, Q1, median, Q3, and max
- Raises:
ValueError – If the input list is empty
Example
>>> five_number_summary([1, 2, 3, 4, 5])
{'min': 1, 'Q1': 1.5, 'median': 3, 'Q3': 4.5, 'max': 5}
>>> five_number_summary([5])
{'min': 5, 'Q1': 5, 'median': 5, 'Q3': 5, 'max': 5}
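Quartile conventions differ between tools, so it helps to see one convention that reproduces the documented outputs. The sketch below takes Q1 and Q3 as the medians of the lower and upper halves, excluding the middle value when the length is odd. The name five_number_sketch is hypothetical, and whether the library uses this exact method is an assumption.

```python
from statistics import median


def five_number_sketch(values):
    """Five-number summary using the median-of-halves quartile convention.

    Hypothetical helper for illustration; not part of real_simple_stats.
    """
    if not values:
        raise ValueError("values must not be empty")
    data = sorted(values)
    n = len(data)
    if n == 1:
        lower = upper = data                 # single value: every statistic equals it
    else:
        half = n // 2
        lower = data[:half]                  # values below the middle
        # Skip the middle element itself when n is odd.
        upper = data[half + 1:] if n % 2 else data[half:]
    return {
        "min": data[0],
        "Q1": median(lower),
        "median": median(data),
        "Q3": median(upper),
        "max": data[-1],
    }


print(five_number_sketch([1, 2, 3, 4, 5]))
# {'min': 1, 'Q1': 1.5, 'median': 3, 'Q3': 4.5, 'max': 5}
```

Under this convention a two-element input gives Q1 equal to the minimum and Q3 equal to the maximum.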
- real_simple_stats.descriptive_statistics.interquartile_range(values: Sequence[float]) → float
- real_simple_stats.descriptive_statistics.is_continuous(values: Sequence[float]) → bool
Determine if a variable is continuous (contains non-integer values).
- Parameters:
values – List of numerical values to check
- Returns:
True if any values are non-integers, False if all are integers
Example
>>> is_continuous([1.5, 2.0, 3.0])
True
>>> is_continuous([1.0, 2.0, 3.0])
False
- real_simple_stats.descriptive_statistics.is_discrete(values: Sequence[float]) → bool
Determine if a variable is discrete (all values are integers).
- Parameters:
values – List of numerical values to check
- Returns:
True if all values are integers, False otherwise
Example
>>> is_discrete([1.0, 2.0, 3.0])
True
>>> is_discrete([1.5, 2.0, 3.0])
False
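The two checks are complements of each other: a dataset counts as continuous exactly when it is not discrete. A minimal sketch of that relationship, assuming the test is simply whether each value has a fractional part (the names below are hypothetical, not the library's code):

```python
def is_discrete_sketch(values):
    """True when every value is a whole number (2.0 counts, 2.5 does not)."""
    return all(float(v).is_integer() for v in values)


def is_continuous_sketch(values):
    """True when at least one value has a fractional part."""
    return not is_discrete_sketch(values)


print(is_discrete_sketch([1.0, 2.0, 3.0]))    # True
print(is_continuous_sketch([1.5, 2.0, 3.0]))  # True
```

Note that this sketch treats an empty list as discrete, because all() is vacuously true; how the library handles that edge case is not stated in this reference.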
- real_simple_stats.descriptive_statistics.mean(values: Sequence[float]) → float
Calculate the arithmetic mean (average) of a dataset.
- Parameters:
values – List of numerical values
- Returns:
The arithmetic mean
- Raises:
ValueError – If the input list is empty
Example
>>> mean([1, 2, 3, 4, 5])
3.0
- real_simple_stats.descriptive_statistics.median(values: Sequence[float]) → float
Calculate the median (middle value) of a dataset.
- Parameters:
values – List of numerical values
- Returns:
The median value
- Raises:
ValueError – If the input list is empty
Example
>>> median([1, 2, 3, 4, 5])
3.0
>>> median([1, 2, 3, 4])
2.5
- real_simple_stats.descriptive_statistics.sample_std_dev(values: Sequence[float]) → float
Calculate the sample standard deviation of a dataset.
- Parameters:
values – List of numerical values
- Returns:
The sample standard deviation (square root of sample variance)
- Raises:
ValueError – If fewer than 2 values are provided
Example
>>> sample_std_dev([1, 2, 3, 4, 5])
1.5811388300841898
- real_simple_stats.descriptive_statistics.sample_variance(values: Sequence[float]) → float
Calculate the sample variance of a dataset.
Uses the sample variance formula with (n-1) degrees of freedom (Bessel’s correction).
- Parameters:
values – List of numerical values
- Returns:
The sample variance
- Raises:
ValueError – If fewer than 2 values are provided
Example
>>> sample_variance([1, 2, 3, 4, 5])
2.5
Functions Overview
Central Tendency
- mean(values): arithmetic mean (average) of a dataset. Raises ValueError if the input list is empty.
- median(values): middle value of a dataset. Raises ValueError if the input list is empty.
Both are documented in full in the API listing above.
Variability
- sample_variance(values): sample variance with (n-1) degrees of freedom (Bessel's correction). Raises ValueError if fewer than 2 values are provided. Documented in full in the API listing above.
Usage Examples
Basic Statistics
Calculate common descriptive statistics for a dataset:
from real_simple_stats import descriptive_statistics as desc
# Sample dataset
data = [12, 15, 18, 20, 22, 25, 28, 30, 32, 35]
# Central tendency
mean_val = desc.mean(data)
median_val = desc.median(data)
mode_val = desc.mode(data)
print(f"Mean: {mean_val}")
print(f"Median: {median_val}")
print(f"Mode: {mode_val}")
# Variability
variance_val = desc.variance(data)
std_dev = desc.standard_deviation(data)
cv = desc.coefficient_of_variation(data)
print(f"Variance: {variance_val:.2f}")
print(f"Standard Deviation: {std_dev:.2f}")
print(f"Coefficient of Variation: {cv:.2f}%")
Population vs Sample Statistics
Understanding the difference between population and sample statistics:
# Same dataset, different calculations
sample_data = [85, 90, 78, 92, 88, 76, 95, 82, 89, 91]
# Population statistics (when you have the entire population)
pop_variance = desc.variance(sample_data)
pop_std = desc.standard_deviation(sample_data)
# Sample statistics (when you have a sample from a larger population)
sample_variance = desc.sample_variance(sample_data)
sample_std = desc.sample_std_dev(sample_data)
print("Population Statistics:")
print(f" Variance: {pop_variance:.2f}")
print(f" Standard Deviation: {pop_std:.2f}")
print("Sample Statistics:")
print(f" Variance: {sample_variance:.2f}")
print(f" Standard Deviation: {sample_std:.2f}")
Error Handling
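The functions raise ValueError when the input is empty or too short for the requested statistic, and the example below expects non-numeric values to surface as a TypeError: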
import real_simple_stats.descriptive_statistics as desc
# Empty dataset
try:
    result = desc.mean([])
except ValueError as e:
    print(f"Error: {e}")
# Single value for sample statistics
try:
    result = desc.sample_variance([42])
except ValueError as e:
    print(f"Error: {e}")
# Five-number summary works with small datasets too
summary_single = desc.five_number_summary([5])
# For a single value, all stats equal that value
summary_two = desc.five_number_summary([1, 2])
# With two values, Q1=min and Q3=max
# Non-numeric data
try:
    result = desc.mean([1, 2, "three", 4])
except TypeError as e:
    print(f"Error: {e}")
Mathematical Background
Mean (Arithmetic Average)
The arithmetic mean is the sum of all values divided by the number of values:
\[\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i\]
Where:
- \(\bar{x}\) is the sample mean
- \(n\) is the number of observations
- \(x_i\) is the i-th observation
Median
The median is the middle value when data is arranged in ascending order:
For odd n: median = middle value
For even n: median = average of two middle values
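The odd and even cases can be sketched directly from the rule above. The function name median_sketch is hypothetical and this is not the library's source; it simply sorts the data and picks or averages the middle position(s).

```python
def median_sketch(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("values must not be empty")
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2:                                # odd n: single middle value
        return float(data[mid])
    return (data[mid - 1] + data[mid]) / 2   # even n: average the two middle values


print(median_sketch([1, 2, 3, 4, 5]))  # 3.0
print(median_sketch([1, 2, 3, 4]))     # 2.5
```

Sorting a copy rather than the input keeps the caller's list unchanged.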
Variance
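Population Variance:
\[\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2\]
where \(\mu\) is the population mean and \(N\) is the population size.
Sample Variance:
\[s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2\]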
Standard Deviation
The standard deviation is the square root of the variance:
Population: \(\sigma = \sqrt{\sigma^2}\)
Sample: \(s = \sqrt{s^2}\)
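The only difference between the population and sample versions is the divisor: n for the population, n-1 for the sample. The sketch below works this through by hand for the dataset used in the API examples and cross-checks the results against Python's standard statistics module; it does not call real_simple_stats.

```python
import math
import statistics

data = [1, 2, 3, 4, 5]
n = len(data)
x_bar = sum(data) / n                            # mean: 3.0
sum_sq = sum((x - x_bar) ** 2 for x in data)     # squared deviations: 10.0

pop_var = sum_sq / n                             # divide by n      -> 2.0
samp_var = sum_sq / (n - 1)                      # divide by n - 1  -> 2.5
pop_std = math.sqrt(pop_var)
samp_std = math.sqrt(samp_var)

# Cross-check against the standard library's implementations.
assert math.isclose(pop_var, statistics.pvariance(data))
assert math.isclose(samp_var, statistics.variance(data))
assert math.isclose(pop_std, statistics.pstdev(data))
assert math.isclose(samp_std, statistics.stdev(data))

print(pop_var, samp_var)  # 2.0 2.5
```

The sample values are always the larger of the two, and the gap shrinks as n grows.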
Coefficient of Variation
The coefficient of variation expresses the standard deviation as a percentage of the mean:
\[CV = \frac{s}{\bar{x}} \times 100\%\]
This allows comparison of variability between datasets with different units or scales.
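To illustrate the scale-independence, the sketch below computes the ratio for two datasets that differ only by a factor of 100. The helper cv_percent_sketch is hypothetical, and it assumes the sample standard deviation is used; whether the library's coefficient_of_variation uses the sample or population form is not stated in this reference.

```python
import statistics


def cv_percent_sketch(values):
    """Sample standard deviation as a percentage of the mean (assumed convention)."""
    mean_val = statistics.mean(values)
    if mean_val == 0:
        raise ValueError("coefficient of variation is undefined when the mean is zero")
    return statistics.stdev(values) / mean_val * 100


small_scale = [10, 20, 30]
large_scale = [1000, 2000, 3000]   # same shape, values 100 times larger

print(round(cv_percent_sketch(small_scale), 2))  # 50.0
print(round(cv_percent_sketch(large_scale), 2))  # 50.0
```

The standard deviations differ by a factor of 100, yet both datasets have the same relative spread.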
See Also
probability_utils - For probability calculations
hypothesis_testing - For statistical testing
../tutorials/basic_statistics - Tutorial on descriptive statistics