| Title: | Iterative Proportional Fitting |
|---|---|
| Description: | Fast raking for survey weighting. The computational core is written in Rust for speed. Supports multiple raking variables, automatic variable selection, weight bounding, and comprehensive diagnostics. |
| Authors: | Christopher T. Kenny [aut, cre] (ORCID: <https://orcid.org/0000-0002-9386-6860>) |
| Maintainer: | Christopher T. Kenny <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.0.0.9000 |
| Built: | 2026-06-08 06:30:30 UTC |
| Source: | https://github.com/christopherkenny/ipf |
A subset of the 2024 American National Election Study (ANES) Time Series face-to-face sample, containing demographic and vote choice variables for 966 respondents. Useful for demonstrating survey raking workflows.
anes24
A tibble with 966 rows and 7 columns:
Two-letter US state abbreviation. NA for respondents whose state is not identified (106 missing).
Respondent sex: "Male" or "Female" (5 missing).
Race/ethnicity: "White", "Black", "Hispanic", "Asian", or "Other" (11 missing).
Household income bracket: "Under $50k", "$50k-$100k", or "Over $100k" (47 missing).
Education: "Less than HS", "High school", "Some college", "Bachelor's", or "Graduate" (451 missing).
Marital status: "Married", "Widowed", "Divorced", "Separated", or "Never married" (277 missing).
2024 presidential vote choice: "Harris" or "Trump" (335 missing).
https://electionstudies.org/data-center/2024-time-series-study/
American National Election Studies. 2025. ANES 2024 Time Series Study Full Release (dataset and documentation). August 8, 2025 version. https://www.electionstudies.org/
Returns the original data frame with a .weight column appended.
    ## S3 method for class 'ipf_rake'
    augment(x, ...)

| Argument | Description |
|---|---|
| x | An ipf_rake object. |
| ... | Additional arguments (ignored). |
A tibble with all original columns plus .weight.
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4))
    )
    targets <- list(gender = c(M = 0.5, F = 0.5))
    result <- rake(data, targets)
    augment(result)
The design effect (deff) measures how much unequal weighting inflates variance.
The effective sample size is n / deff, where n is the number of observations.
    design_effect(weights)

| Argument | Description |
|---|---|
| weights | Numeric weight vector. |
A list with deff (design effect) and n_eff (effective sample size).
    w <- c(1.2, 0.8, 1.5, 0.5, 1.0)
    design_effect(w)
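For intuition, the quantities above can be reproduced in base R. The sketch below assumes Kish's approximation, deff = n * sum(w^2) / sum(w)^2; the package does not state which formula its Rust core uses, so treat this as an illustration rather than the package's implementation. The helper name kish_deff is hypothetical.

```r
# Kish's approximate design effect for unequal weights.
# ASSUMPTION: this formula is illustrative; the package may compute deff differently.
kish_deff <- function(w) {
  stopifnot(is.numeric(w), length(w) > 0, all(w >= 0), sum(w) > 0)
  n <- length(w)
  deff <- n * sum(w^2) / sum(w)^2
  # Effective sample size is n / deff, as described above.
  list(deff = deff, n_eff = n / deff)
}

w <- c(1.2, 0.8, 1.5, 0.5, 1.0)
res <- kish_deff(w)
res$deff   # 1.116
res$n_eff  # about 4.48
```

Under this formula, equal weights give a design effect of exactly 1, so the effective sample size equals the number of observations.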
Calculates discrepancy between the current weighted distribution and target distributions for each variable, then aggregates using the chosen method.
    find_discrepant_vars(
      data,
      targets,
      weights,
      choosemethod = "total",
      na_method = c("exclude", "bucket")
    )

| Argument | Description |
|---|---|
| data | Data frame. |
| targets | Named list of named numeric target vectors (proportions). |
| weights | Numeric weight vector. |
| choosemethod | Method for aggregating per-category discrepancies. One of "total", "max", "average", "totalsquared", "maxsquared", or "averagesquared". |
| na_method | How to handle NA values: "exclude" or "bucket". |
Named numeric vector of aggregate discrepancy per variable.
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4)),
      age = sample(c('young', 'old'), 100, replace = TRUE, prob = c(0.7, 0.3))
    )
    targets <- list(
      gender = c(M = 0.5, F = 0.5),
      age = c(young = 0.6, old = 0.4)
    )
    find_discrepant_vars(data, targets, weights = rep(1, 100))
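To make the discrepancy calculation concrete, here is a base R sketch of one plausible reading of the "total" method: the sum of absolute differences between weighted and target proportions within each variable. The helpers weighted_props and discrepancy_total are hypothetical names, and both the aggregation rule and the NA handling are assumptions, not the package's confirmed behavior.

```r
# Hypothetical helper: weighted share of each target level for one variable.
# NA values are dropped, mirroring an "exclude"-style rule (assumption).
weighted_props <- function(x, w, levels) {
  keep <- !is.na(x)
  totals <- vapply(levels, function(l) sum(w[keep & x == l]), numeric(1))
  totals / sum(w[keep])
}

# Hypothetical helper: sum of absolute gaps per variable (assumed "total" rule).
discrepancy_total <- function(data, targets, weights) {
  vapply(names(targets), function(v) {
    tgt <- targets[[v]] / sum(targets[[v]])  # normalise targets to sum to 1
    cur <- weighted_props(data[[v]], weights, names(tgt))
    sum(abs(cur - tgt))
  }, numeric(1))
}

data <- data.frame(
  gender = c("M", "M", "M", "F", "F"),
  age = c("young", "young", "young", "young", "old")
)
targets <- list(
  gender = c(M = 0.5, F = 0.5),
  age = c(young = 0.6, old = 0.4)
)
disc <- discrepancy_total(data, targets, weights = rep(1, 5))
disc  # gender 0.2, age 0.4
```

In this toy data, age is further from its targets than gender, so a selection rule based on these scores would prioritise age.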
Returns a single-row tibble with summary statistics.
    ## S3 method for class 'ipf_rake'
    glance(x, ...)

| Argument | Description |
|---|---|
| x | An ipf_rake object. |
| ... | Additional arguments (ignored). |
A single-row tibble with columns: converged, iterations, max_prop_err, deff, n_eff, n_obs, n_vars.
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4))
    )
    targets <- list(gender = c(M = 0.5, F = 0.5))
    result <- rake(data, targets)
    glance(result)
Print an ipf_rake object
    ## S3 method for class 'ipf_rake'
    print(x, ...)

| Argument | Description |
|---|---|
| x | An ipf_rake object. |
| ... | Additional arguments (ignored). |
Invisibly returns x.
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4))
    )
    targets <- list(gender = c(M = 0.5, F = 0.5))
    result <- rake(data, targets)
    print(result)
Adjusts survey weights so that weighted marginal distributions match known population targets. Supports automatic variable selection, iterative re-raking, and weight bounding.
    rake(
      data,
      targets,
      base_weights = NULL,
      cap = 5,
      bounds = NULL,
      type = c("nolim", "pctlim", "nlim"),
      pctlim = 0.05,
      nlim = 5L,
      choosemethod = c("total", "max", "average", "totalsquared", "maxsquared", "averagesquared"),
      na_method = c("exclude", "bucket"),
      iterate = TRUE,
      max_iter = 1000L,
      tol = 1e-06,
      verbose = FALSE,
      diagnostics_every = 0L
    )

| Argument | Description |
|---|---|
| data | A data frame or tibble containing the survey data. |
| targets | A named list of named numeric vectors specifying target proportions for each raking variable. Names of the list must match column names in data. |
| base_weights | Optional numeric vector of base (design) weights. Default NULL. |
| cap | Maximum weight value (ratio cap). Weights exceeding this value are trimmed and all weights are renormalized. Default 5. |
| bounds | Optional numeric vector of length 2 for weight bounding. Default NULL. |
| type | Variable selection method: "nolim", "pctlim", or "nlim". |
| pctlim | Discrepancy threshold for type = "pctlim". Default 0.05. |
| nlim | Number of variables for type = "nlim". Default 5. |
| choosemethod | Method for aggregating per-category discrepancies into a single variable score. One of "total", "max", "average", "totalsquared", "maxsquared", or "averagesquared". |
| na_method | How to handle NA values: "exclude" or "bucket". |
| iterate | Logical. Default TRUE. |
| max_iter | Maximum number of raking iterations. Default 1000. |
| tol | Convergence tolerance (max proportional error). Default 1e-06. |
| verbose | Logical. Default FALSE. |
| diagnostics_every | Record per-margin diagnostics every diagnostics_every iterations. Default 0. |
An ipf_rake object (S3 class) containing:
weights: final raked weight vector
data: the input data frame
converged: logical
iterations: number of iterations
max_prop_err: final max proportional error
targets: normalized targets used
vars_used: character vector of variables raked on
base_weights: original base weights
type, choosemethod, na_method, cap: settings used
deff, n_eff: design effect and effective sample size
diagnostics: tibble of per-iteration diagnostics
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4)),
      age = sample(c('young', 'old'), 100, replace = TRUE, prob = c(0.7, 0.3))
    )
    targets <- list(
      gender = c(M = 0.5, F = 0.5),
      age = c(young = 0.6, old = 0.4)
    )
    result <- rake(data, targets)
    print(result)
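For readers new to raking, the core idea behind iterative proportional fitting can be sketched in a few lines of base R. This is not the package's Rust implementation: it omits base weights, capping, bounds, variable selection, and NA handling, and it measures convergence as the largest absolute gap in proportions, which is an assumption. The names shares and rake_sketch are hypothetical.

```r
# Hypothetical helper: weighted share of each level (assumes no NA values).
shares <- function(x, w, levels) {
  vapply(levels, function(l) sum(w[x == l]), numeric(1)) / sum(w)
}

# Minimal raking loop. ASSUMPTIONS: complete data, every target level
# present in the data, and an absolute-gap convergence criterion.
rake_sketch <- function(data, targets, max_iter = 1000L, tol = 1e-6) {
  w <- rep(1, nrow(data))
  converged <- FALSE
  for (iter in seq_len(max_iter)) {
    # Adjust each margin in turn so its weighted shares hit the target.
    for (v in names(targets)) {
      tgt <- targets[[v]] / sum(targets[[v]])
      cur <- shares(data[[v]], w, names(tgt))
      lev <- as.character(data[[v]])
      w <- w * unname(tgt[lev] / cur[lev])
    }
    # Largest remaining gap between weighted and target shares.
    err <- max(vapply(names(targets), function(v) {
      tgt <- targets[[v]] / sum(targets[[v]])
      max(abs(shares(data[[v]], w, names(tgt)) - tgt))
    }, numeric(1)))
    if (err < tol) {
      converged <- TRUE
      break
    }
  }
  w <- w * length(w) / sum(w)  # rescale so the weights average 1
  list(weights = w, converged = converged, iterations = iter)
}

data <- data.frame(
  gender = c("M", "M", "M", "F", "F", "M", "F", "M", "M", "F"),
  age = c("young", "young", "old", "young", "old",
          "young", "young", "old", "young", "young")
)
targets <- list(
  gender = c(M = 0.5, F = 0.5),
  age = c(young = 0.6, old = 0.4)
)
fit <- rake_sketch(data, targets)
fit$converged
shares(data$gender, fit$weights, c("M", "F"))
shares(data$age, fit$weights, c("young", "old"))
```

Each pass fixes one margin exactly and slightly disturbs the others, which is why the loop repeats until all margins are within tolerance.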
Produces a detailed summary including per-variable diagnostic tables showing target vs. achieved distributions.
    ## S3 method for class 'ipf_rake'
    summary(object, ...)

| Argument | Description |
|---|---|
| object | An ipf_rake object. |
| ... | Additional arguments (ignored). |
Invisibly returns a list with convergence info, weight summary, design effect, and per-variable assessment tibbles.
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4))
    )
    targets <- list(gender = c(M = 0.5, F = 0.5))
    result <- rake(data, targets)
    summary(result)
Returns a one-row-per-variable-per-level tibble with target proportions, weighted proportions, and discrepancy.
    ## S3 method for class 'ipf_rake'
    tidy(x, ...)

| Argument | Description |
|---|---|
| x | An ipf_rake object. |
| ... | Additional arguments (ignored). |
A tibble with columns: variable, level, target, weighted_pct, discrepancy.
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4))
    )
    targets <- list(gender = c(M = 0.5, F = 0.5))
    result <- rake(data, targets)
    tidy(result)
Produces a per-variable diagnostic table comparing target distributions to unweighted and weighted distributions.
    weight_assess(
      data,
      targets,
      weights,
      base_weights = NULL,
      na_method = c("exclude", "bucket")
    )

| Argument | Description |
|---|---|
| data | Data frame. |
| targets | Named list of named numeric target vectors (proportions). |
| weights | Final raked weight vector. |
| base_weights | Original base weights before raking. Default NULL. |
| na_method | How to handle NA values: "exclude" or "bucket". |
Named list of tibbles, one per variable.
    data <- data.frame(
      gender = sample(c('M', 'F'), 100, replace = TRUE, prob = c(0.6, 0.4))
    )
    targets <- list(gender = c(M = 0.5, F = 0.5))
    result <- rake(data, targets)
    weight_assess(data, targets, result$weights)
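The kind of comparison described above can be approximated in base R for a single variable. The sketch below builds a small table of target, unweighted, and weighted proportions; the function name assess_sketch and its column names are hypothetical and are not meant to match the tibbles that weight_assess() actually returns.

```r
# Hypothetical helper: compare target, unweighted, and weighted shares
# for one variable. NA values are dropped (an "exclude"-style assumption).
assess_sketch <- function(x, target, weights) {
  keep <- !is.na(x)
  x <- as.character(x[keep])
  w <- weights[keep]
  lev <- names(target)
  unweighted <- vapply(lev, function(l) mean(x == l), numeric(1))
  weighted <- vapply(lev, function(l) sum(w[x == l]) / sum(w), numeric(1))
  data.frame(
    level = lev,
    target = unname(target) / sum(target),
    unweighted = unname(unweighted),
    weighted = unname(weighted),
    row.names = NULL
  )
}

gender <- c("M", "M", "M", "F", "F")
w <- c(5 / 6, 5 / 6, 5 / 6, 1.25, 1.25)  # weights that balance M and F
tab <- assess_sketch(gender, c(M = 0.5, F = 0.5), w)
tab
```

Here the unweighted sample is 60% M, while the weighted shares match the 50/50 target.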