| Title: | Clean and Harmonise 'Malawi Integrated Household Survey' Data |
|---|---|
| Description: | An offline suite of tools to clean, aggregate, and harmonise data from the 'Malawi Integrated Household Survey' ('IHS'). Provides crop-specific unit conversions, stratified winsorization, and automatic cross-round harmonisation for complex survey designs. |
| Authors: | Vitumbiko Kayuni [aut, cre] (ORCID: <https://orcid.org/0009-0008-4331-3839>) |
| Maintainer: | Vitumbiko Kayuni <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.2.1 |
| Built: | 2026-06-08 20:36:19 UTC |
| Source: | https://github.com/vituk123/ihsmw |
Automatically detects variable types and applies sensible aggregations (e.g., 'sum' for continuous quantities, 'max' or logical OR for dummies). Throws warnings for ambiguous columns rather than failing silently.
ihs_aggregate(data, group_col = "case_id")
| Argument | Description |
|---|---|
| data | A data.frame at the individual or plot level |
| group_col | The column name identifying the household (e.g., "case_id" or "y4_hhid") |
A data.frame aggregated to the household level
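The type-aware aggregation described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names `is_dummy()` and `aggregate_household()`, the toy data, and the rule "a column is a dummy if all its non-missing values are 0 or 1" are all hypothetical.

```r
# Toy individual-level data; column names are hypothetical.
people <- data.frame(
  case_id  = c("A", "A", "B", "B", "B"),
  income   = c(100, 50, 20, 0, 30),  # continuous quantity: summed
  has_bank = c(0, 1, 0, 0, 0),       # 0/1 dummy: max, i.e. logical OR
  stringsAsFactors = FALSE
)

# Assumed rule: a numeric column is a dummy if every non-missing value is 0 or 1.
is_dummy <- function(x) is.numeric(x) && all(x[!is.na(x)] %in% c(0, 1))

aggregate_household <- function(data, group_col) {
  value_cols <- setdiff(names(data), group_col)
  groups <- sort(unique(data[[group_col]]))
  out <- data.frame(groups, stringsAsFactors = FALSE)
  names(out) <- group_col
  for (col in value_cols) {
    x <- data[[col]]
    if (is_dummy(x)) {
      fun <- function(v) max(v, na.rm = TRUE)
    } else if (is.numeric(x)) {
      fun <- function(v) sum(v, na.rm = TRUE)
    } else {
      # Warn instead of silently guessing how to combine the values.
      warning("Ambiguous column skipped: ", col)
      next
    }
    # tapply() returns one value per group, named by group.
    out[[col]] <- as.numeric(tapply(x, data[[group_col]], fun)[groups])
  }
  out
}

hh <- aggregate_household(people, "case_id")
```

One caveat of this rule: a genuinely continuous column that happens to contain only zeros and ones would be treated as a dummy, which is the kind of ambiguity a warning is meant to surface.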
This wrapper function applies standard cleaning procedures to Malawi IHS data. It handles missing value conversions, winsorization of continuous variables, and returns an audit log of all transformations applied.
ihs_clean(data, winsorize_vars = NULL, winsorize_by = NULL, probs = c(0.01, 0.99))
| Argument | Description |
|---|---|
| data | A data.frame (typically loaded from a '.dta' file) |
| winsorize_vars | Character vector of continuous variables to winsorize (e.g., consumption, harvest) |
| winsorize_by | Optional character string of a grouping variable (e.g., region) for stratified winsorization |
| probs | Numeric vector of length 2 specifying the lower and upper quantiles for winsorization. Default is 'c(0.01, 0.99)'. |
A data.frame with cleaning applied. The returned object has an 'ihs_audit' attribute containing a log of modifications.
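As a rough illustration of how a log can travel with a data.frame as an attribute, the base R sketch below records one cleaning step and reads it back with `attr()`. Only the attribute name 'ihs_audit' comes from the description above; the helper `clean_with_audit()`, the layout of the log (step, variable, n_changed), and the toy data are assumptions and may differ from what the package actually stores.

```r
# Hypothetical helper: recode missing codes in one variable and log the change.
clean_with_audit <- function(data, var, codes = c(-99, -98)) {
  is_code <- data[[var]] %in% codes
  data[[var]][is_code] <- NA
  entry <- data.frame(
    step      = "missing_codes_to_NA",
    variable  = var,
    n_changed = sum(is_code),
    stringsAsFactors = FALSE
  )
  # Append to any existing log; rbind(NULL, entry) simply returns entry.
  attr(data, "ihs_audit") <- rbind(attr(data, "ihs_audit"), entry)
  data
}

survey <- data.frame(age = c(34, -99, 51), income = c(1200, 800, -98))
survey <- clean_with_audit(survey, "age")
survey <- clean_with_audit(survey, "income")

# Read the log back from the attribute.
audit <- attr(survey, "ihs_audit")
```

Custom attributes may be dropped by some later operations on a data.frame, so reading the log right after cleaning is the cautious choice.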
Converts reported harvest units (e.g., Pails, Oxcarts, Heaps) into standard kilograms using official NSO crop-specific conversion factors.
ihs_convert_units(data, qty_col, unit_col, crop_col, unmapped = "warn")
| Argument | Description |
|---|---|
| data | A data.frame |
| qty_col | The name of the column containing the quantity |
| unit_col | The name of the column containing the unit code or name |
| crop_col | The name of the column containing the crop code |
| unmapped | Action to take when a unit cannot be mapped: '"warn"' (default), '"error"', or '"ignore"'. |
A data.frame with a new qty_col_kg column.
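The idea of a crop-specific conversion can be sketched as a lookup keyed on crop and unit. This is an illustration only: the conversion factors below are placeholders, not the NSO values, and the helper `convert_to_kg()`, the column names, and the output column `qty_kg` are hypothetical rather than taken from the package.

```r
# Placeholder factors for illustration only; these are NOT the official NSO factors.
factors <- data.frame(
  crop_code   = c(1, 1, 2),
  unit        = c("Pail", "Oxcart", "Pail"),
  kg_per_unit = c(10, 500, 8),
  stringsAsFactors = FALSE
)

harvest <- data.frame(
  crop_code = c(1, 2, 1, 2),
  unit      = c("Pail", "Pail", "Oxcart", "Heap"),
  qty       = c(3, 2, 1, 4),
  stringsAsFactors = FALSE
)

convert_to_kg <- function(data, factors, unmapped = c("warn", "error", "ignore")) {
  unmapped <- match.arg(unmapped)
  # Build a combined crop|unit key so the same unit can differ by crop.
  key_data    <- paste(data$crop_code, data$unit, sep = "|")
  key_factors <- paste(factors$crop_code, factors$unit, sep = "|")
  idx <- match(key_data, key_factors)  # NA where no factor exists
  n_missing <- sum(is.na(idx))
  if (n_missing > 0) {
    msg <- sprintf("%d row(s) have no conversion factor", n_missing)
    if (unmapped == "error") stop(msg)
    if (unmapped == "warn") warning(msg)
  }
  # Unmapped rows stay NA instead of receiving a guessed weight.
  data$qty_kg <- data$qty * factors$kg_per_unit[idx]
  data
}

converted <- suppressWarnings(convert_to_kg(harvest, factors))
```

Using `match()` rather than `merge()` keeps the rows in their original order, which makes the converted column easy to compare against the raw quantities.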
Evaluates the completeness and comparability of variables across the available IHS rounds (IHS2, IHS3, IHS4, IHS5) using the bundled crosswalk.
ihs_crosswalk_check(verbose = TRUE)
| Argument | Description |
|---|---|
| verbose | Logical. If |
A tibble containing the full crosswalk. If verbose is TRUE, also prints a summary.
## Not run:
# Check the crosswalk and print a report
cw <- ihs_crosswalk_check()
## End(Not run)
Takes a raw data.frame loaded from a Malawi IHS survey round (e.g. from a '.dta' file) and renames its columns to the standard harmonised variable names defined in the crosswalk.
ihs_harmonise(data, round = "IHS5", extra = FALSE)
| Argument | Description |
|---|---|
| data | A data.frame, typically read from a '.dta' file using |
| round | A character string specifying the IHS round (e.g., |
| extra | Logical. If FALSE (default), drops columns that are not in the harmonisation crosswalk or standard ID columns. If TRUE, keeps all original columns. |
A data.frame with columns renamed to standard 'harmonised_name's where applicable.
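Crosswalk-based renaming of this kind can be sketched in base R as follows. The crosswalk layout (one `harmonised_name` column plus one column of raw names per round), the raw variable names, and the helper `rename_with_crosswalk()` are assumptions made for illustration; the bundled crosswalk may be organised differently.

```r
# Hypothetical crosswalk: harmonised names and the raw names used in one round.
crosswalk <- data.frame(
  harmonised_name = c("hh_size", "region"),
  IHS5            = c("hhsize", "reg"),
  stringsAsFactors = FALSE
)

raw <- data.frame(
  case_id = c("A", "B"),
  hhsize  = c(4, 6),
  reg     = c(1, 3),
  note    = c("x", "y"),  # not in the crosswalk
  stringsAsFactors = FALSE
)

rename_with_crosswalk <- function(data, crosswalk, round,
                                  id_cols = "case_id", extra = FALSE) {
  # Position of each raw column in the crosswalk, NA if it is not listed.
  idx <- match(names(data), crosswalk[[round]])
  mapped <- !is.na(idx)
  names(data)[mapped] <- crosswalk$harmonised_name[idx[mapped]]
  if (!extra) {
    # Keep only harmonised columns and the ID columns.
    keep <- mapped | names(data) %in% id_cols
    data <- data[, keep, drop = FALSE]
  }
  data
}

harmonised <- rename_with_crosswalk(raw, crosswalk, round = "IHS5")
with_extra <- rename_with_crosswalk(raw, crosswalk, round = "IHS5", extra = TRUE)
```

With `extra = FALSE` the unmapped `note` column is dropped; with `extra = TRUE` it is kept under its original name.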
Searches the manual harmonisation crosswalk bundled within ihsMW for specific variables.
ihs_search(keyword, round = NULL, fields = c("name", "label", "module"))
| Argument | Description |
|---|---|
| keyword | A single search string to find (case-insensitive). |
| round | Limits search to a specific round. Valid inputs are |
| fields | A character vector of fields to include in the search. Valid fields are |
A tibble with cross-round harmonised search results.
ihs_search("consumption")
ihs_search("expenditure", round = "IHS5")
ihs_search("age", fields = c("name", "label"))
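A case-insensitive search across selected fields can be approximated in base R with `grepl()`. The sketch below runs on a small made-up crosswalk; the helper `search_crosswalk()` and the example rows are hypothetical, and the package may match keywords differently (for instance as fixed strings rather than regular expressions).

```r
# Made-up crosswalk rows for illustration.
crosswalk <- data.frame(
  name   = c("hh_size", "cons_total", "age"),
  label  = c("Household size", "Total consumption", "Age in years"),
  module = c("Roster", "Consumption", "Roster"),
  stringsAsFactors = FALSE
)

search_crosswalk <- function(crosswalk, keyword,
                             fields = c("name", "label", "module")) {
  hits <- rep(FALSE, nrow(crosswalk))
  for (f in fields) {
    # A row matches if the keyword appears in any of the requested fields.
    hits <- hits | grepl(keyword, crosswalk[[f]], ignore.case = TRUE)
  }
  crosswalk[hits, , drop = FALSE]
}

by_keyword <- search_crosswalk(crosswalk, "consumption")
by_module  <- search_crosswalk(crosswalk, "roster", fields = "module")
by_age     <- search_crosswalk(crosswalk, "age", fields = c("name", "label"))
```

Because `grepl()` treats the keyword as a regular expression here, special characters such as `.` or `(` in a keyword would need escaping in this sketch.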
Converts common negative missing codes (like -99 for "Refused" or -98 for "Don't Know") into standard R 'NA' values to prevent them from skewing numeric calculations.
ihs_standardize_missing(data)
| Argument | Description |
|---|---|
| data | A data.frame |
A data.frame with missing values standardized
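The effect of leaving negative missing codes in place is easy to see on a small example. The base R sketch below is an illustration, not the package's code: the helper `standardise_missing()` is hypothetical, and its code list contains only the two codes named above (-99 and -98), while the package may recognise more.

```r
survey <- data.frame(
  age    = c(34, -99, 51, -98),
  income = c(1200, 800, -99, 400),
  name   = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace the listed codes with NA in numeric columns only.
standardise_missing <- function(data, codes = c(-99, -98)) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      data[[col]][data[[col]] %in% codes] <- NA
    }
  }
  data
}

cleaned <- standardise_missing(survey)

raw_mean   <- mean(survey$age)                 # -28: distorted by the codes
clean_mean <- mean(cleaned$age, na.rm = TRUE)  # 42.5: respondents only
```

Character columns are left untouched, so a text value that happens to read "-99" would not be converted in this sketch.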
Caps extreme outliers at specified percentiles. Crucially, this function allows for stratified winsorization (e.g., by region) to avoid over-trimming poor/rich areas, and it creates new '_w' suffixed columns to preserve raw data provenance.
ihs_winsorize(data, vars, by = NULL, probs = c(0.01, 0.99))
| Argument | Description |
|---|---|
| data | A data.frame |
| vars | Character vector of column names to winsorize |
| by | Optional grouping variable name (e.g., "region") for stratified thresholds |
| probs | Numeric vector of lower and upper quantiles. Default 'c(0.01, 0.99)' |
A data.frame with new '*_w' columns added.
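Stratified winsorization can be sketched in base R with `quantile()` and `ave()`. This is an illustration under assumptions, not the package's implementation: the helpers `winsorise_vec()` and `winsorise_by()`, the simulated data, and the use of R's default quantile type are all choices made for the example.

```r
# Cap a vector at its own lower and upper quantiles.
winsorise_vec <- function(x, probs = c(0.01, 0.99)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Hypothetical helper: add '<var>_w' columns, computing thresholds within
# each level of 'by' when it is supplied.
winsorise_by <- function(data, vars, by = NULL, probs = c(0.01, 0.99)) {
  for (v in vars) {
    new_col <- paste0(v, "_w")
    if (is.null(by)) {
      data[[new_col]] <- winsorise_vec(data[[v]], probs)
    } else {
      data[[new_col]] <- ave(data[[v]], data[[by]],
                             FUN = function(x) winsorise_vec(x, probs))
    }
  }
  data
}

# Simulated data: a poorer and a richer region, with one extreme value.
set.seed(1)
hh <- data.frame(
  region = rep(c("North", "South"), each = 50),
  cons   = c(rnorm(50, 100, 10), rnorm(50, 1000, 100))
)
hh$cons[1] <- 10000  # outlier in the poorer region

out <- winsorise_by(hh, "cons", by = "region", probs = c(0.05, 0.95))
```

Because thresholds are computed within each region, the outlier in the poorer region is capped at that region's own upper quantile rather than at a pooled threshold dominated by the richer region, and the raw `cons` column is left unchanged alongside `cons_w`.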