Comparing indicators across
multiple survey rounds is essential for longitudinal policy evaluation.
However, structural differences across the survey instruments pose a
major barrier. This vignette walks through how ihsMW
streamlines cross-round harmonisation and analysis.
Over the years, the Malawi Integrated Household Survey (IHS) questionnaires have evolved: between rounds, variables have been added, retired, renamed, or moved to different modules.
For instance, household size is stored as hhsize in some files and
hh_size in others, and elsewhere it is represented by variables that
count household members. Similarly, the names of nominal consumption
expenditure variables and agricultural crops change frequently between
rounds, which makes direct cross-round comparisons tedious and
error-prone.
To resolve this, ihsMW bundles a static crosswalk
database containing mappings for over 5,800 variables across IHS2, IHS3,
IHS4, and IHS5. It acts as a translation layer, mapping round-specific
variable names to consistent, harmonised names.
You can inspect the crosswalk programmatically:
Use ihs_harmonise() to standardise column names in raw
dataframes:
To compile a cross-round dataset for longitudinal analysis, load the data from each round, harmonise them separately, and bind them together:
```r
library(ihsMW)
library(haven)  # read_dta()
library(dplyr)  # bind_rows()

# Load and harmonise IHS4
ihs4_raw <- read_dta("path/to/IHS4/hh_mod_a_filt.dta")
ihs4_harm <- ihs_harmonise(ihs4_raw, round = "IHS4")

# Load and harmonise IHS5
ihs5_raw <- read_dta("path/to/IHS5/hh_mod_a_filt.dta")
ihs5_harm <- ihs_harmonise(ihs5_raw, round = "IHS5")

# Bind rows; ihs_harmonise() adds an `ihs_round` column automatically
pooled_data <- bind_rows(ihs4_harm, ihs5_harm)
```

Within a single survey round, information is split across multiple
modules (e.g., household demographics, agriculture, food consumption).
Use ihs_merge() to merge these dataframes:
```r
# Load the household demographics and crop harvest modules
hh_demog <- read_dta("path/to/IHS5/hh_mod_a_filt.dta") |> ihs_harmonise("IHS5")
hh_agri <- read_dta("path/to/IHS5/ag_mod_i.dta") |> ihs_harmonise("IHS5")

# Merge modules; common ID columns (e.g., case_id) are detected automatically
merged_data <- ihs_merge(hh_demog, hh_agri)
```

ihs_merge() checks whether the join produces unexpected row expansion
and issues a warning when a many-to-many join occurs.
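The reasoning behind such a check can be sketched in base R: an inner join can only return more rows than its larger input when the key is duplicated on both sides. The helper safe_merge() below is a hypothetical illustration of that logic, built on base merge(); it is not the code of ihs_merge(), and its warning messages are invented for this example.

```r
# Hypothetical helper illustrating a many-to-many guard around merge().
# By default it joins on all shared column names, like merge() itself.
safe_merge <- function(x, y, by = intersect(names(x), names(y))) {
  # Duplicate keys on both sides mean a many-to-many join
  many_x <- anyDuplicated(x[by]) > 0
  many_y <- anyDuplicated(y[by]) > 0
  if (many_x && many_y) {
    warning("many-to-many join on: ", paste(by, collapse = ", "))
  }
  out <- merge(x, y, by = by)
  # If the key is unique on either side, each row of the other side matches
  # at most once, so the result cannot exceed the larger input
  if (nrow(out) > max(nrow(x), nrow(y))) {
    warning("join expanded rows from ", max(nrow(x), nrow(y)),
            " to ", nrow(out))
  }
  out
}

# One-to-many: one row per household, several crop rows per household
hh    <- data.frame(case_id = c("a", "b"), hh_size = c(4, 6))
crops <- data.frame(case_id = c("a", "a", "b"),
                    crop    = c("maize", "beans", "maize"))
m <- safe_merge(hh, crops)  # joins on case_id without any warning
print(m)
```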
When comparing monetary values (e.g., household consumption, crop
sales) across different years, nominal values must be deflated to
account for inflation. ihs_deflate() uses bundled Malawi
Consumer Price Index (CPI) data to convert nominal values to real
values, with 2019 (IHS5 baseline) as the default reference year:
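The underlying arithmetic is the standard price-index conversion: a nominal value observed in year t is multiplied by CPI(base) / CPI(t). The sketch below applies it with made-up index values (they are NOT Malawi's actual CPI), and the deflate() helper is hypothetical rather than the interface of ihs_deflate().

```r
# Made-up CPI series for illustration only (NOT Malawi's actual CPI)
cpi <- c("2016" = 80, "2019" = 100)

# Hypothetical helper: real value = nominal * CPI[base_year] / CPI[year]
deflate <- function(nominal, year, cpi, base_year = 2019) {
  ratio <- cpi[as.character(base_year)] / cpi[as.character(year)]
  unname(nominal * ratio)
}

# 1,000 kwacha spent in 2016 and 1,000 kwacha spent in 2019
real <- deflate(c(1000, 1000), year = c(2016, 2019), cpi = cpi)
print(real)  # 1250 1000, both expressed in 2019 prices
```

Because the index is 25% higher in the base year, a 2016 amount is scaled up by 1.25, while amounts already in the base year are unchanged.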
To verify that the crosswalk mappings are valid and check how many
variables are successfully mapped, use
ihs_crosswalk_check():
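Conceptually, a coverage check compares the columns of a raw file against the crosswalk entries for its round. Here is a minimal base-R sketch of that comparison, assuming the same kind of long-format crosswalk as before; crosswalk_coverage(), its return fields, and the extra_var column are hypothetical, and ihs_crosswalk_check() may report its results quite differently.

```r
# Toy crosswalk for one round (hypothetical layout)
xwalk <- data.frame(
  round    = c("IHS5", "IHS5"),
  original = c("case_id", "hh_size"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: how many columns of `df` have a crosswalk entry?
crosswalk_coverage <- function(df, round, xwalk) {
  known  <- xwalk$original[xwalk$round == round]
  mapped <- names(df) %in% known
  list(
    n_mapped = sum(mapped),
    n_total  = ncol(df),
    share    = mean(mapped),
    unmapped = names(df)[!mapped]  # columns that would keep their raw names
  )
}

raw <- data.frame(case_id = "a", hh_size = 3, extra_var = 1)
coverage <- crosswalk_coverage(raw, "IHS5", xwalk)
str(coverage)
```

Listing the unmapped columns, rather than only a count, makes it easier to see which variables still need a manual mapping.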
Additionally, ihs_panel_ids() returns the standard ID
columns (e.g. household ID, individual ID, enumeration area ID, strata,
weights) for a given round to help you construct panel keys or verify
design structures:
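Once you know which ID columns a round uses, a quick way to verify a design assumption is to test whether those columns uniquely identify the rows of a table. The helper is_unique_key() and the individual-ID column pid below are hypothetical; take the real ID names from ihs_panel_ids() for your round.

```r
# Hypothetical helper: do `id_cols` uniquely identify the rows of `df`?
is_unique_key <- function(df, id_cols) {
  absent <- setdiff(id_cols, names(df))
  if (length(absent) > 0) {
    stop("missing ID columns: ", paste(absent, collapse = ", "))
  }
  anyDuplicated(df[id_cols]) == 0
}

# Toy individual-level roster: several people per household
roster <- data.frame(
  case_id = c("a", "a", "b"),
  pid     = c(1, 2, 1)  # hypothetical individual ID column
)

is_unique_key(roster, c("case_id", "pid"))  # TRUE: valid individual-level key
is_unique_key(roster, "case_id")            # FALSE: household ID alone repeats
```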