Working Draft — LGIT framework unpublished. Shared for feedback only. Please do not cite or distribute without permission.

Methodology

How we extract LGIT-relevant data from HILDA Statistical Reports without microdata access

Key Principle: No Microdata Required

All data in this pilot is extracted from published tables and figures in the annual HILDA Statistical Reports. We do not use HILDA microdata, access to which requires an application and ethical approval. The pilot shows how far publicly available aggregate statistics alone can go.

Data Sources

HILDA Statistical Report 2025 · Chapters 3, 4, 8, 11 · Data through 2023 · Complete
HILDA Statistical Report 2024 · Chapters 3, 4, 8, 11 · Data through 2022 · Complete
HILDA Statistical Report 2023 · Chapters 3, 4, 8, 11 · Data through 2021 · Complete
HILDA Statistical Report 2022 · Chapters 3, 4, 8, 11 · Data through 2020 · Complete

Extraction Process

1. Table Identification

We identify tables containing time-series data relevant to LGIT constructs: distress (K10), income, inequality, job security, time stress, housing stress.

2. Manual Extraction

Values are manually transcribed from PDF tables into structured TypeScript. Each observation includes a source_item_id so that it can be traced back to the published item it was taken from.
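
As an illustration of what a transcribed observation might look like, here is a minimal sketch. Only source_item_id is named in the text above. Every other field name, the isTraceable helper, and the example values are assumptions made for illustration and are not taken from the pilot's code.

```typescript
// Hypothetical shape of one transcribed observation. Only `source_item_id`
// comes from the methodology text; all other field names are assumptions.
interface Observation {
  series_id: string;      // assumed: key of the series this value belongs to
  year: number;           // assumed: reference year of the data point
  value: number;          // the number as printed in the report
  report_year: number;    // assumed: edition of the Statistical Report
  source_item_id: string; // identifier of the published table or figure
}

// Placeholder values for illustration only, not figures from any report.
const exampleObservation: Observation = {
  series_id: "example_series",
  year: 2023,
  value: 0,
  report_year: 2025,
  source_item_id: "example-table-id",
};

// A basic check for common transcription slips: a blank source reference
// or a value that is not a finite number.
function isTraceable(obs: Observation): boolean {
  return obs.source_item_id.trim().length > 0 && isFinite(obs.value);
}
```

Running a check like this over every observation after a transcription session is one inexpensive way to catch rows that were entered without a source reference.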

3. LGIT Tagging

Each series is tagged with LGIT construct types: grievance_condition, shock_exposure, legitimacy_indicator, vulnerability_marker, etc.
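
The tagging step could be modelled along the following lines. The four tag names come from the list above. The union is deliberately incomplete because the source list ends in "etc.", and the SeriesDefinition fields, the placeholder series, and the seriesWithTag helper are hypothetical.

```typescript
// The four construct types named in the text. The source list ends with
// "etc.", so this union does not claim to be complete.
type LgitConstructType =
  | "grievance_condition"
  | "shock_exposure"
  | "legitimacy_indicator"
  | "vulnerability_marker";

// Hypothetical series definition; the field names are assumptions.
interface SeriesDefinition {
  id: string;
  label: string;
  tags: LgitConstructType[];
}

// Placeholder entries: these tag assignments are illustrative only.
const exampleSeries: SeriesDefinition[] = [
  { id: "series_a", label: "Placeholder series A", tags: ["grievance_condition"] },
  {
    id: "series_b",
    label: "Placeholder series B",
    tags: ["grievance_condition", "vulnerability_marker"],
  },
];

// Return every series that carries the given tag.
function seriesWithTag(
  defs: SeriesDefinition[],
  tag: LgitConstructType
): SeriesDefinition[] {
  return defs.filter((d) => d.tags.some((t) => t === tag));
}
```

Using a string-literal union rather than free-form strings means a mistyped tag fails at compile time, which matters when tags are entered by hand.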

4. Cross-Validation

Where the same measure appears in multiple reports, we keep the value from the most recent report, since later reports may revise figures published earlier.
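
The "most recent report wins" rule can be sketched as a small deduplication pass. The ReportedValue shape and the preferLatestReport name are assumptions for illustration. The rule itself is the one stated above.

```typescript
// Hypothetical record shape; the field names are assumptions.
interface ReportedValue {
  series_id: string;
  year: number;        // data year the value refers to
  value: number;
  report_year: number; // edition the value was transcribed from
}

// For each (series, data year) pair, keep only the value taken from the
// most recent report edition.
function preferLatestReport(values: ReportedValue[]): ReportedValue[] {
  const latest: Record<string, ReportedValue> = {};
  for (const v of values) {
    const key = `${v.series_id}:${v.year}`;
    const current = latest[key];
    if (current === undefined || v.report_year > current.report_year) {
      latest[key] = v;
    }
  }
  return Object.keys(latest).map((k) => latest[k]);
}
```

A variant that also records which values changed between editions would make revisions visible rather than silently overwriting them.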

LGIT Construct Mapping

LGIT Construct | HILDA Measures | Chapter
Psychological Distress | K10 scores, very high distress rate | Ch 8
Material Deprivation | Financial stress, poverty rates, income | Ch 3
Housing Stress | 30/40 rule, housing affordability | Ch 3
Job Insecurity | Job loss probability, perceived insecurity | Ch 4
Time Pressure | Time stress, WFH patterns, commute burden | Ch 11
Inequality | Gini coefficient, P90/P10 ratios | Ch 3
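
The mapping above can also be held as data, so that constructs can be looked up by chapter during extraction. The constructs, measures, and chapters below are copied from the table. The ConstructMapping type, the CONSTRUCT_MAP constant, and the constructsInChapter helper are hypothetical names and are not claimed to exist in the pilot's code.

```typescript
// Hypothetical representation of the construct mapping table.
interface ConstructMapping {
  construct: string;
  measures: string[];
  chapter: number;
}

const CONSTRUCT_MAP: ConstructMapping[] = [
  { construct: "Psychological Distress", measures: ["K10 scores", "very high distress rate"], chapter: 8 },
  { construct: "Material Deprivation", measures: ["Financial stress", "poverty rates", "income"], chapter: 3 },
  { construct: "Housing Stress", measures: ["30/40 rule", "housing affordability"], chapter: 3 },
  { construct: "Job Insecurity", measures: ["Job loss probability", "perceived insecurity"], chapter: 4 },
  { construct: "Time Pressure", measures: ["Time stress", "WFH patterns", "commute burden"], chapter: 11 },
  { construct: "Inequality", measures: ["Gini coefficient", "P90/P10 ratios"], chapter: 3 },
];

// List the constructs whose measures are drawn from a given chapter.
function constructsInChapter(chapter: number): string[] {
  return CONSTRUCT_MAP.filter((m) => m.chapter === chapter).map((m) => m.construct);
}
```

For example, constructsInChapter(3) returns the three constructs sourced from Chapter 3: Material Deprivation, Housing Stress, and Inequality.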

Limitations

No Individual-Level Analysis

Without microdata, we cannot perform individual-level regression, panel analysis, or causal inference. All analysis is at the aggregate level.

Publication Lag

HILDA reports are published ~12 months after data collection, and each report's title year runs two years ahead of its latest data year: the 2025 report contains data through 2023.

Price Base Differences

Income figures are expressed in different price bases across reports (Dec 2020 vs Dec 2023 dollars). Values from different reports must be converted to a common price base before they are compared directly.
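
One common way to make such an adjustment is to scale by the ratio of a price index at the two base periods. The sketch below assumes a CPI-style index is the appropriate deflator. The rebase function name is hypothetical, and the index levels in the example are placeholders, not actual CPI figures.

```typescript
// Convert an amount expressed in source-base dollars into target-base
// dollars, using the ratio of a price index at the two base periods.
function rebase(
  amount: number,
  indexAtSourceBase: number,
  indexAtTargetBase: number
): number {
  if (indexAtSourceBase <= 0 || indexAtTargetBase <= 0) {
    throw new Error("price index values must be positive");
  }
  return amount * (indexAtTargetBase / indexAtSourceBase);
}

// Placeholder index levels for illustration only, NOT real CPI values.
const placeholderSourceIndex = 100;
const placeholderTargetIndex = 125;
const rebasedAmount = rebase(1000, placeholderSourceIndex, placeholderTargetIndex);
```

With these placeholder levels, 1000 in source-base dollars becomes 1250 in target-base dollars. Real use would substitute the published index values for the two base periods.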

Technical Implementation

// Evidence layer location
apps/institute/src/lib/pilot-hilda-reports/
├── data/
│   ├── registry.ts      // 56 series definitions
│   └── observations.ts  // 481 observations
├── types.ts             // TypeScript interfaces
├── synthesis.ts         // Analysis utilities
└── index.ts             // Public API