The Malawi Integrated Household
Survey (IHS) series uses complex sampling designs rather than Simple
Random Sampling (SRS). To obtain unbiased, population-representative
estimates, survey weights, stratification, and clustering must be
accounted for. This vignette describes how to set up and use survey
designs with ihsMW.
To reduce fieldwork costs and improve accuracy, the Malawi National Statistical Office (NSO) designs the IHS using a stratified two-stage cluster sample.
- Strata: Usually defined by districts split into urban and rural areas.
- Primary Sampling Units (PSUs): The Enumeration Areas (EAs) selected in the first stage.
- Survey Weights: Inverse probabilities of selection, adjusted for non-response.
Because different households have different probabilities of selection (e.g. rural households might be over- or under-sampled relative to urban ones), unweighted statistics will be biased. Standard errors computed without accounting for clustering and stratification will also be wrong, and typically too small.
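The effect of unequal selection probabilities can be seen in a small simulation. The sketch below uses base R only and entirely made-up numbers (a toy population of 10,000 households, with urban households deliberately over-sampled); it is not IHS data and only illustrates why inverse-probability weights are needed.

```r
# Toy population: 9,000 rural and 1,000 urban households (made-up numbers).
set.seed(42)
pop <- data.frame(
  area   = rep(c("rural", "urban"), times = c(9000, 1000)),
  hhsize = c(rpois(9000, lambda = 5), rpois(1000, lambda = 3)) + 1
)

# Deliberately over-sample urban households: 500 drawn from each area.
rural_idx <- sample(which(pop$area == "rural"), 500)
urban_idx <- sample(which(pop$area == "urban"), 500)
smp <- pop[c(rural_idx, urban_idx), ]

# Selection probabilities and their inverses (the design weights).
smp$p_sel <- ifelse(smp$area == "rural", 500 / 9000, 500 / 1000)
smp$wgt   <- 1 / smp$p_sel

true_mean       <- mean(pop$hhsize)
unweighted_mean <- mean(smp$hhsize)                      # pulled towards urban households
weighted_mean   <- weighted.mean(smp$hhsize, w = smp$wgt) # re-balances the two areas

c(true = true_mean, unweighted = unweighted_mean, weighted = weighted_mean)
```

The weighted mean should land close to the population value, while the unweighted mean is pulled towards the over-sampled urban group. The weights also sum to the population size, which is a useful sanity check on real data.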
The ihs_svydesign() function creates a survey design
object by wrapping survey::svydesign(). It automatically
detects standard weight, strata, and PSU columns inside your harmonised
dataset:
library(ihsMW)
library(haven)
# Load and harmonise IHS5 data
raw_data <- read_dta("path/to/IHS5/hh_mod_a_filt.dta")
harmonised_data <- ihs_harmonise(raw_data, round = "IHS5")
# Create survey design object
# Automatically detects: hh_wgt/hhweight, stratum/strata, and ea_id/psu
design <- ihs_svydesign(harmonised_data)
If the standard columns are named differently in your data, you can specify them explicitly:
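The arguments that ihs_svydesign() accepts for this are not shown here, so the sketch below falls back on the survey::svydesign() call that it wraps. The data frame and the column names my_stratum, my_ea and my_weight are hypothetical placeholders for whatever your file uses.

```r
library(survey)

# Made-up data with non-standard column names (all names are hypothetical).
set.seed(1)
toy <- data.frame(
  my_stratum = rep(c("north_rural", "north_urban"), each = 40),
  my_ea      = rep(1:8, each = 10),            # 4 enumeration areas per stratum
  my_weight  = rep(c(120, 35), each = 40),
  hhsize     = rpois(80, lambda = 4) + 1
)

toy_design <- svydesign(
  ids     = ~my_ea,       # PSU / enumeration area
  strata  = ~my_stratum,  # sampling strata
  weights = ~my_weight,   # household weight
  data    = toy,
  nest    = TRUE          # treat PSU ids as nested within strata
)

est <- svymean(~hhsize, design = toy_design, na.rm = TRUE)
est
```

Setting nest = TRUE is a defensive choice for cases where PSU identifiers are only unique within a stratum; it is harmless when they are already globally unique.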
Once you have the survey design object, you can compute
representative statistics using the survey package:
library(survey)
# Nationally representative mean of household size
svymean(~hhsize, design = design, na.rm = TRUE)
# Nationally representative total of expenditure
svytotal(~food_exp, design = design, na.rm = TRUE)
# Calculate means grouped by a factor variable (e.g., region)
svyby(~food_exp, ~region, design = design, svymean, na.rm = TRUE)
If you prefer dplyr-like syntax, the srvyr
package works seamlessly with the survey design objects generated by
ihs_svydesign():
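A minimal sketch of that workflow follows. To keep it self-contained it builds a toy design with survey::svydesign() on made-up data, standing in for the object ihs_svydesign() would return; srvyr::as_survey() then converts it to a tbl_svy so the usual verbs apply. The column values are invented for illustration.

```r
library(survey)
library(dplyr)
library(srvyr)

# Made-up data: 2 regions, 4 strata, 2 enumeration areas per stratum.
set.seed(2)
toy <- data.frame(
  region   = rep(c("North", "South"), each = 40),
  stratum  = rep(c("n_rural", "n_urban", "s_rural", "s_urban"), each = 20),
  ea_id    = rep(1:8, each = 10),
  hh_wgt   = runif(80, min = 50, max = 150),
  food_exp = rgamma(80, shape = 2, rate = 0.01)
)
design <- svydesign(ids = ~ea_id, strata = ~stratum,
                    weights = ~hh_wgt, data = toy)

# Convert the survey.design object and summarise by region.
by_region <- design %>%
  as_survey() %>%
  group_by(region) %>%
  summarise(mean_food_exp = survey_mean(food_exp, na.rm = TRUE))

by_region
```

survey_mean() returns the estimate together with its standard error in a column suffixed _se, so the result mirrors what svyby() with svymean reports.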
ihsMW provides ihs_report(), which computes
a clean summary statistics table for publication. It supports survey
weights directly:
# Generate a summary statistics table with survey weights
report_tbl <- ihs_report(
data = harmonised_data,
vars = c("hhsize", "food_exp", "nonfood_exp"),
weights = "hh_wgt"
)
print(report_tbl)
You can also compute these weighted tables grouped by another variable:
- Do not subset the data with [ ] or dplyr::filter()
before creating the survey design, as this breaks the cluster/strata
structure and results in incorrect standard errors. Instead, define the
design on the full dataset first, and then use subset()
from the survey package or srvyr::filter() on
the design object.
- Computing weighted statistics (e.g. with weighted.mean()) without specifying clustering (PSUs)
and stratification will yield correct point estimates but incorrect
(usually too small) standard errors. Always use a full survey design
object for inference.
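Both points can be illustrated on made-up data. In the sketch below the variable female_head and all values are hypothetical; the example compares subsetting the design object with filtering the data first, and checks that weighted.mean() reproduces only the point estimate.

```r
library(survey)

# Made-up data: 2 strata, 6 enumeration areas per stratum, 10 households each.
set.seed(3)
toy <- data.frame(
  stratum     = rep(c("rural", "urban"), each = 60),
  ea_id       = rep(1:12, each = 10),
  hh_wgt      = rep(c(100, 40), each = 60),
  female_head = rbinom(120, size = 1, prob = 0.3),  # hypothetical indicator
  food_exp    = rgamma(120, shape = 2, rate = 0.01)
)

# Recommended: define the design on the full data, then subset the design.
full_design <- svydesign(ids = ~ea_id, strata = ~stratum,
                         weights = ~hh_wgt, data = toy)
sub_design  <- subset(full_design, female_head == 1)
est_domain  <- svymean(~food_exp, design = sub_design)

# Not recommended: filter the rows first, then define the design.
toy_filtered <- toy[toy$female_head == 1, ]
naive_design <- svydesign(ids = ~ea_id, strata = ~stratum,
                          weights = ~hh_wgt, data = toy_filtered)
est_naive    <- svymean(~food_exp, design = naive_design)

# weighted.mean() gives the same point estimate but no standard error at all.
wm <- weighted.mean(toy_filtered$food_exp, w = toy_filtered$hh_wgt)

rbind(domain = c(mean = coef(est_domain), se = SE(est_domain)),
      naive  = c(mean = coef(est_naive),  se = SE(est_naive)))
```

The two approaches agree on the mean; only the design-based subset carries the full sampling structure into the standard error, so the se column is where they can diverge.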