Oriol Vallès Codina — Net Zero Industrial Policy Lab, Johns Hopkins SAIS
Published
April 10, 2026
Overview
This document analyses the relationship between SHAP feature importance — the predictive weight each HS product code receives in the ML clean-tech competitiveness model — and observed bilateral trade flows (exports, imports, net balance as % of GDP). The central question: do products the Random Forest identifies as predictive of competitiveness correspond to higher actual trade intensity?
The analysis covers the top 20 exporting economies, 10 clean technologies, and 189 HS product codes mapped through the NZIPL Green Dictionary. Models range from simple OLS by technology to hierarchical linear models (HLM) with random slopes and intercepts, controlling for product type, value-chain stage, country GDP, and time trend.
1. Scatterplot Gallery
All static PNG outputs from analysis/scatterplot/ organised by grouping.
Variable / Unit dropdowns: Exports / Imports / Balance × % of clean tech trade / % GDP / % total merchandise
Variable / Unit: Exports / Imports / Balance × % clean trade / % GDP / % total merchandise
Year range sliders: any window within 2018–2024; slide both to a single year for a snapshot
Country / Technology / Function / SHAP Category sidebar filters
Highlight Country: overlay a single country in amber for direct comparison
3. Key Findings
How to read the slope charts (Findings 2–3): The bar height is the OLS regression coefficient from log10(trade % GDP) ~ SHAP mean |z|. A positive slope means products the RF assigns higher importance to are also traded more intensively in practice — the ML model and actual trade are aligned. A negative slope means higher-SHAP products are traded less intensively. For Product Component, the strong positive slope () confirms that component-level supply chain capability is where the RF model’s predictions map most faithfully onto trade reality. For Final Product, the negative slope () is not a model failure — it reflects that final product HS codes (e.g., solar panels, wind turbines) all receive uniformly high* SHAP scores since they are adjacent to the competitive outcome the RF predicts. Actual final-product export intensity is driven by industrial policy, scale, and market access; the highest-SHAP final product codes are those most concentrated in 1–2 dominant exporters (China for PV, Denmark for wind), so among the other top-20 exporters that do export these products, SHAP and export intensity are negatively correlated. Slope magnitude: in log10 units, a slope of +0.15 corresponds to ~40% more exports per 1 SD increase in SHAP; a slope of −5 signals extreme concentration rather than a continuous gradient.
Finding E1 — SHAP importance predicts export intensity across most technologies
In 5 of 10 technology-level OLS regressions the SHAP slope on exports (% GDP) is statistically significant. The strongest fit is Heat Pumps (R² = 0.047, ***); the weakest is Geothermal (R² = 0, ns). This validates the ML model: products assigned high feature importance tend to be products in which top economies actually trade intensively.
Finding E2 — SHAP slope by product function
Product Components show the strongest positive slope (*): the ML model’s importance scores align most faithfully with trade reality at the component level. Final Products** show a significant negative slope — see interpretation note above.
Finding E3 — SHAP slope by value-chain stage
The SHAP–export relationship strengthens progressively from Upstream to Final Product stage, confirming that stage position captures a dimension of supply-chain position beyond product function alone.
Finding E4 — Function × stage interaction heatmap
Which specific cells in the value chain show the tightest ML–trade alignment for exports?
Finding E5 — High-SHAP products traded more intensively, gap widening
Finding I1 — SHAP predicts import intensity more strongly than exports
7 of 10 technology-level import regressions are significant vs 5 for exports. High-SHAP products are frequently capital goods — machinery and process equipment — that countries import to build production capacity before developing domestic export competitiveness.
Finding I2 — Import slope by product function
For imports, the function slope pattern differs from exports: Process Equipment often shows a stronger positive slope because capital goods used in manufacturing (the process chain) are imported even before export capacity develops.
Finding I3 — Import slope by stage
Upstream and midstream stages show stronger import–SHAP alignment than for exports, reflecting countries importing inputs they do not yet produce domestically.
Finding I4 — Function × stage interaction for imports
Finding B1 — SHAP predicts trade balance across technologies
A positive slope means countries with higher-SHAP products run a net surplus in those products — i.e., the ML model identifies products where competitive advantage translates into actual specialisation.
Finding B2 — Balance slope by product function
Finding B3 — Balance slope by stage
Finding B4 — Function × stage interaction for net balance
Cross-cutting: GDP size effect
Larger economies trade more in absolute terms, but after normalising by GDP the relationship is weak — confirming SHAP captures a structural capability signal independent of country scale.
4. Regression Analysis
3.1 Cross-Sectional OLS — by Technology
One OLS regression per technology, cross-section for 2023: log(trade % GDP) ~ SHAP mean |z|.
3.2 Hierarchical Linear Model — Cross-Sectional (2023)
The cross-sectional HLM pools all countries, technologies, and product types in a single model with the full covariate set. The model controls for product type, value-chain stage, and log-GDP, while allowing the SHAP slope to vary by technology (random slope) and the baseline trade level to vary by country (random intercept):
\(u_{0c} \sim \mathcal{N}(0,\sigma_c^2)\) = random country intercept — captures overall trade level not explained by SHAP or GDP
\((u_{0j}, u_{1j}) \sim \mathcal{N}(0, \Sigma_j)\) = random technology intercept + slope — captures how much the SHAP–trade relationship differs across value chains
Fixed-effect coefficients from the cross-sectional HLM (exports outcome). Error bars = ±1.96 SE (95% CI). Green = positive & significant; orange = negative & significant; grey = ns.
Random slopes by technology
Random SHAP slopes by technology (from HLM exports model). Each bar = technology-specific deviation from the global slope. Techs with wider bars show more heterogeneous SHAP-trade relationships.
Random intercepts by country
Random intercepts by country. Positive values = country trades more intensively than predicted by SHAP + GDP alone; negative = less than expected.
Variance decomposition
Variance explained by each random-effects component. A large country share means country-level factors dominate unexplained trade variance.
Random effects variance components table
Random Effects Variance Components — cross-sectional HLM
Metric
Group
Var 1
Var 2
Variance
SD/Corr
exports
iso3
(Intercept)
NA
0.09185
0.3031
exports
tech
(Intercept)
NA
1.08154
1.0400
exports
tech
shap_mean_z
NA
3.20124
1.7892
exports
tech
(Intercept)
shap_mean_z
-1.82582
-0.9812
exports
Residual
NA
NA
0.58346
0.7638
imports
iso3
(Intercept)
NA
0.02095
0.1447
imports
tech
(Intercept)
NA
1.29452
1.1378
imports
tech
shap_mean_z
NA
3.85837
1.9643
imports
tech
(Intercept)
shap_mean_z
-2.19926
-0.9841
imports
Residual
NA
NA
0.50662
0.7118
balance
iso3
(Intercept)
NA
0.00000
0.0022
balance
tech
(Intercept)
NA
0.00003
0.0052
balance
tech
shap_mean_z
NA
0.00009
0.0096
balance
tech
(Intercept)
shap_mean_z
-0.00005
-0.9827
balance
Residual
NA
NA
0.00008
0.0089
3.3 Panel Data Model (2018–2024)
The panel HLM extends the cross-sectional model with a centred year trend and the SHAP×Year interaction, testing whether the SHAP–trade alignment has strengthened over time. Log-GDP is included as a time-varying covariate:
SHAP × Year coefficient: A positive estimate confirms Finding 7 — the alignment between ML-predicted importance and actual trade intensity has strengthened since 2018. This can reflect (a) supply chains maturing and concentrating around economically efficient producers, (b) improving model fit over time, or (c) both.
7 / 10 significant for imports; median R² = 0.019. Best: Heat Pumps (R² = 0.168, ***). Imports outperform exports in 6 / 10 technologies.
1 / 10 significant for net balance. Best: Heat Pumps (R² = 0.081, ***).
What this tells us: SHAP feature importance is a significant cross-sectional predictor of trade intensity in 5 of 10 export regressions and 7 of 10 import regressions, indicating the ML signal captures real patterns in observed trade specialisation rather than statistical noise. Technologies with high R² (e.g. Heat Pumps) show the tightest coupling between ML-predicted production-readiness and observed trade flows. Where import predictability equals or exceeds export predictability, SHAP is detecting countries that import precisely because domestic productive capability is absent — the signal encodes both the presence and the absence of competitive advantage.
SHAP importance: β = 1.2763 (*). A 1-SD increase in SHAP is associated with a 1789% change in exports % GDP.
log(GDP): β = 0.0274 (ns). Each doubling of GDP associates with a 2% increase in export intensity — GDP normalisation does not fully eliminate scale effects.
Significant product function effects: Raw Material (-0.510, ***).
What this tells us: The HLM estimates the SHAP coefficient after absorbing country random intercepts and technology random slopes, so β is not inflated by dominant exporters (e.g. China in solar, Germany in wind turbines). A positive SHAP β confirms that ML-predicted feature importance is a globally valid predictor of export intensity independent of who-you-are and which-value-chain-you-are-in. The persistent log(GDP) coefficient means scale effects are not fully removed by per-GDP normalisation — larger economies export disproportionately more on a relative basis. Significant product function and stage dummies reveal that certain value-chain positions carry structural trade premiums or penalties beyond what SHAP alone predicts.
Variance Decomposition (Exports HLM)
Country intercepts: 2.9% of total variance — country-level factors (policy, history, institutions) dominate unexplained variation.
Technology intercepts: 34.5% — baseline trade levels differ markedly across value chains.
Technology SHAP slopes: 102.2% — the SHAP–trade alignment itself varies across technologies (justifies random-slope specification).
Residual: 18.6% — within-country, within-technology product-level variation unexplained by any covariate.
What this tells us: The variance partition reveals what drives differences in trade intensity that SHAP cannot explain. Country intercepts absorbing the largest share (here 2.9%) means that national-level factors — industrial policy, infrastructure, historical path-dependence, institutional capacity — are the dominant residual driver of trade intensity differences, dwarfing product-level ML scores. Technology intercepts (34.5%) reflect that some value chains (solar, wind) are globally traded at large volumes while others (nuclear, geothermal) are structurally thinner markets. The random-slope variance for SHAP (102.2%) confirms that the SHAP→trade relationship is heterogeneous across technologies, justifying the flexible specification. The residual (18.6%) captures product-level noise that no macro-level covariate can reach.
Panel HLM — Temporal Dynamics (2018–2024)
SHAP importance (main): β = 1.0400 (ns).
Year trend: β = -0.0299 (ns) — trade intensity is falling over 2018–2024 conditional on SHAP.
SHAP × Year: β = 0.0381 (ns) — the SHAP–trade link is tightening: high-SHAP products are becoming more intensively traded relative to low-SHAP products year-on-year.
What this tells us: The panel model tests whether the SHAP–trade alignment has changed over 2018–2024, going beyond the cross-sectional snapshot. Clean technology trade intensity is overall contracting during this period (conditional on SHAP), reflecting the global acceleration in deployment and traded volumes. The SHAP × Year interaction is tightening: the relationship between ML-predicted production capability and observed trade intensity is tightening over time. A tightening interaction means the market is becoming progressively more meritocratic — countries with higher production-readiness are capturing growing trade shares as the energy transition accelerates. A loosening interaction would imply geopolitical fragmentation, industrial policy distortions, or early-mover lock-in are beginning to decouple competitiveness signals from actual trade outcomes.
6. Model Comparison (ANOVA / Likelihood Ratio Tests)
Nested model comparison via likelihood ratio test isolates the contribution of each additional component: SHAP only → + GDP → + type/stage → + random slope. A significant χ² indicates the added terms improve model fit beyond chance.
Model Comparison — Likelihood Ratio Tests (Exports % GDP, cross-sectional 2023)
Model
Df
npar
AIC
BIC
logLik
Chisq
p-value
m0
M0: Null
NA
3
7516.1
7534.2
-3755.1
NA
—
m1
M1: +SHAP
1
4
7497.3
7521.4
-3744.6
20.87
<0.001
m2
M2: +logGDP
1
5
7499.2
7529.4
-3744.6
0.03
0.852
m3
M3: +Type/Stage
9
14
7274.5
7358.8
-3623.2
242.79
<0.001
m4
M4: +RandSlope
3
17
7140.1
7242.5
-3553.1
140.34
<0.001
Interpreting the ANOVA table: Each row tests whether adding the new terms significantly improves fit over the previous model. Significant χ² for M1 validates SHAP as a predictor; M2 shows whether GDP adds explanatory power on top of SHAP; M3 tests whether product type/stage structure matters beyond GDP; M4 tests whether allowing the SHAP slope to vary by technology (random slope) is warranted by the data.
7. Time Trajectories
SHAP feature importance is fixed per (product × technology) — it reflects average predictive weight, not a time-varying signal. Trajectories therefore appear as vertical paths in the scatter: horizontal position (SHAP) is pinned, vertical position (trade intensity) moves year to year.
Solar — exports % GDP trajectories (2018–2024). Each path = one country × product; dot size/opacity increases with year.
8. Special Case: Solar — India
India — Solar value chain SHAP vs Exports (% GDP), 2023.
India’s Solar scatter illustrates the Processing → Final Product gradient clearly: polysilicon and silicon wafer products (upstream, Raw Material · Upstream) cluster at low SHAP scores and low exports, while solar cell and module assembly inputs (Product Component · Downstream) are both higher-SHAP and higher-export. This is consistent with India’s current industrial position — strong in downstream assembly, still building upstream processing capacity.
Source Code
---title: "SHAP Feature Importance vs Trade Flows"subtitle: "Scatterplot Analysis · Regression Results · Random Effects · Key Findings"author: "Oriol Vallès Codina — Net Zero Industrial Policy Lab, Johns Hopkins SAIS"date: todayformat: html: theme: cosmo toc: true toc-depth: 3 toc-location: left toc-title: "Contents" number-sections: false self-contained: true page-layout: full fig-width: 10 fig-height: 6 code-fold: true code-tools: true grid: sidebar-width: 260px body-width: 900px include-in-header: text: | <link rel="preconnect" href="https://fonts.googleapis.com"> <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> <link href="https://fonts.googleapis.com/css2?family=Archivo:ital,wght@0,300;0,400;0,600;0,700;1,400&display=swap" rel="stylesheet"> <style> body, p, li, td, th, h1, h2, h3, h4, h5, h6, .toc-title, #TOC, .sidebar-title, blockquote, pre, .caption, input, select, button, label { font-family: 'Archivo', sans-serif !important; } /* ── Remove right margin band ── */ .page-columns .column-margin, .margin-column { display: none !important; } .page-layout-full #quarto-content { padding-right: 0 !important; } /* ── Equations: horizontal scroll on overflow ── */ .math.display, mjx-container[display="true"] { overflow-x: auto; max-width: 100%; } </style>execute-dir: projectexecute: echo: false warning: false message: false---```{r setup}library(arrow); library(dplyr); library(readr); library(tidyr)library(ggplot2); library(ggrepel); library(forcats)library(showtext); library(patchwork)library(knitr); library(kableExtra)has_lme4 <-requireNamespace("lme4", quietly=TRUE)if (has_lme4) library(lme4, warn.conflicts=FALSE)library(htmltools)font_add_google("Archivo", "Archivo")showtext_auto()ROOT <- here::here()YEAR <-2023LTOP_N <-20LIQR_MULT <-3NZ_GREEN <-"#073309"NZ_ORANGE <-"#CB5B1D"TS_COLORS <-c("Raw Material — Upstream"="#8A4200","Processed Material — Upstream"="#C17A2E","Processed Material — Midstream"="#073309","Process Equipment — Midstream"="#5B7266","Product Component — Midstream"="#1A8C28","Product Component — Downstream"="#34D045","Process Equipment — Downstream"="#6B7B8C","Final Product — Final Product"="#CB5B1D")# ── Load metadata ─────────────────────────────────────────────────────────────feat <-read_csv(file.path(ROOT,"data","pc","pc_features.csv"), show_col_types=FALSE) |>filter(!is.na(hs_code)) |>mutate(hs_code =formatC(as.integer(hs_code), width=6, flag="0")) |>select(tech, hs_code, prod_name=description, category, shap_mean_z)ctry <-read_csv(file.path(ROOT,"data","pc","pc_countries.csv"), show_col_types=FALSE) |>select(iso3, country, region)gd_meta <-read_csv(file.path(ROOT,"data","green_dict","green_dictionary.csv"),show_col_types=FALSE) |>mutate(code =formatC(as.integer(code), width=6, flag="0")) |>distinct(code, tech, type, stage) |>mutate(typestage =paste0(type, " \u2014 ", stage))# ── GDP ───────────────────────────────────────────────────────────────────────gdp_path <-c(file.path(ROOT,"data","gdp_long.csv"),file.path(path.expand("~"),"Documents","R","NZIPL","NetZeroValueChainExplorer","data","gdp_long.csv"))gdp_path <- gdp_path[file.exists(gdp_path)][1]gdp <-read_csv(gdp_path, show_col_types=FALSE) |>filter(indicator=="GDP-VAL", year==YEAR) |>select(iso3=country_iso3, gdp_mil_usd=value)gdp_panel_raw <-read_csv(gdp_path, show_col_types=FALSE) |>filter(indicator=="GDP-VAL", year >=2018) |>select(iso3=country_iso3, year, gdp_mil_usd=value)# ── Population ────────────────────────────────────────────────────────────────pop_path <-file.path(path.expand("~"),"Documents","R","NZIPL","NetZeroValueChainExplorer","data","worldbank_population.csv")pop_long <-if (file.exists(pop_path)) {tryCatch({ pw <-read_csv(pop_path, show_col_types=FALSE, skip=3) |>select(iso3=`Country Code`, matches("^[0-9]{4}$")) |>pivot_longer(-iso3, names_to="year", values_to="pop") |>mutate(year=as.integer(year), pop_mil=pop/1e6) pw }, error=function(e) NULL)} elseNULL# ── Trade bilateral cache ─────────────────────────────────────────────────────pc_isos <- ctry$iso3pc_codes <-unique(feat$hs_code)bil <-open_dataset(file.path(ROOT,"cache","bilateral_ds"))exp_df <- bil |>filter(exp_iso3 %in% pc_isos, product_code %in% pc_codes, year==YEAR) |>group_by(iso3=exp_iso3, hs_code=product_code) |>summarise(exports_kusd=sum(value,na.rm=TRUE), .groups="drop") |>collect()imp_df <- bil |>filter(imp_iso3 %in% pc_isos, product_code %in% pc_codes, year==YEAR) |>group_by(iso3=imp_iso3, hs_code=product_code) |>summarise(imports_kusd=sum(value,na.rm=TRUE), .groups="drop") |>collect()trade <-full_join(exp_df, imp_df, by=c("iso3","hs_code")) |>mutate(exports_kusd=coalesce(exports_kusd,0L),imports_kusd=coalesce(imports_kusd,0L),balance_kusd=exports_kusd-imports_kusd) |>left_join(gdp, by="iso3") |>mutate(exports_pct =if_else(gdp_mil_usd>0, exports_kusd*0.1/gdp_mil_usd, NA_real_),imports_pct =if_else(gdp_mil_usd>0, imports_kusd*0.1/gdp_mil_usd, NA_real_),balance_pct =if_else(gdp_mil_usd>0, balance_kusd*0.1/gdp_mil_usd, NA_real_))master <- trade |>filter(exports_kusd>0| imports_kusd>0) |>inner_join(feat, by="hs_code", relationship="many-to-many") |>left_join(gd_meta, by=c("hs_code"="code","tech"), relationship="many-to-many") |>left_join(ctry, by="iso3") |>mutate(typestage =if_else(typestage %in%names(TS_COLORS), typestage, "Unclassified"),type =coalesce(type, category),stage =coalesce(stage, "n/a") ) |>filter(!is.na(country), !is.na(shap_mean_z))# add log GDPmaster <- master |>mutate(log_gdp =if_else(!is.na(gdp_mil_usd) & gdp_mil_usd >0,log(gdp_mil_usd), NA_real_))top20_isos <- master |>group_by(iso3) |>summarise(tot=sum(exports_kusd),.groups="drop") |>slice_max(tot,n=TOP_N) |>pull(iso3)df_top <- master |>filter(iso3 %in% top20_isos)filter_outliers <-function(df, y_col, mult = IQR_MULT) { v <- df[[y_col]]; v_f <- v[is.finite(v)]if (length(v_f) <8) return(df) q25 <-quantile(v_f,.25); q75 <-quantile(v_f,.75); iqr <- q75-q25if (iqr <1e-12) { mu <-mean(v_f); sg <-sd(v_f) lower <- mu - mult*3*sg; upper <- mu + mult*3*sg } else { lower <- q25-mult*iqr; upper <- q75+mult*iqr } df |>filter(is.na(.data[[y_col]]) | (is.finite(.data[[y_col]]) & .data[[y_col]] >= lower & .data[[y_col]] <= upper))}# ── Panel data 2018–2024 ──────────────────────────────────────────────────────exp_panel <- bil |>filter(exp_iso3 %in% top20_isos, product_code %in% pc_codes, year >=2018) |>group_by(iso3=exp_iso3, hs_code=product_code, year) |>summarise(exports_kusd=sum(value,na.rm=TRUE), .groups="drop") |>collect()imp_panel <- bil |>filter(imp_iso3 %in% top20_isos, product_code %in% pc_codes, year >=2018) |>group_by(iso3=imp_iso3, hs_code=product_code, year) |>summarise(imports_kusd=sum(value,na.rm=TRUE), .groups="drop") |>collect()trade_panel <-full_join(exp_panel, imp_panel, by=c("iso3","hs_code","year")) |>mutate(exports_kusd=coalesce(exports_kusd,0), imports_kusd=coalesce(imports_kusd,0),balance_kusd=exports_kusd-imports_kusd) |>left_join(gdp_panel_raw, by=c("iso3","year")) |>mutate(exports_pct =if_else(gdp_mil_usd>0, exports_kusd*0.1/gdp_mil_usd, NA_real_),imports_pct =if_else(gdp_mil_usd>0, imports_kusd*0.1/gdp_mil_usd, NA_real_),balance_pct =if_else(gdp_mil_usd>0, balance_kusd*0.1/gdp_mil_usd, NA_real_))panel_master <- trade_panel |>filter(exports_kusd>0| imports_kusd>0) |>inner_join(feat, by="hs_code", relationship="many-to-many") |>left_join(gd_meta, by=c("hs_code"="code","tech"), relationship="many-to-many") |>left_join(ctry, by="iso3") |>mutate(typestage =if_else(typestage %in%names(TS_COLORS), typestage, "Unclassified"),type =coalesce(type, category),stage =coalesce(stage, "n/a"),year_c =scale(year)[,1],log_gdp =if_else(!is.na(gdp_mil_usd) & gdp_mil_usd >0, log(gdp_mil_usd), NA_real_) ) |>filter(!is.na(country), !is.na(shap_mean_z))# add population to panel if availableif (!is.null(pop_long)) { panel_master <- panel_master |>left_join(pop_long |>select(iso3, year, pop_mil), by=c("iso3","year")) |>mutate(log_pop =if_else(!is.na(pop_mil) & pop_mil >0, log(pop_mil), NA_real_))}```## OverviewThis document analyses the relationship between **SHAP feature importance** — the predictive weight each HS product code receives in the ML clean-tech competitiveness model — and **observed bilateral trade flows** (exports, imports, net balance as % of GDP). The central question: *do products the Random Forest identifies as predictive of competitiveness correspond to higher actual trade intensity?*The analysis covers the top **`r TOP_N` exporting economies**, **10 clean technologies**, and **`r n_distinct(master$hs_code)` HS product codes** mapped through the NZIPL Green Dictionary. Models range from simple OLS by technology to hierarchical linear models (HLM) with random slopes and intercepts, controlling for product type, value-chain stage, country GDP, and time trend.---## 1. Scatterplot Gallery {#gallery}All static PNG outputs from `analysis/scatterplot/` organised by grouping.```{r gallery}#| results: asis#| echo: falseBASE <-"../analysis/scatterplot"techs <-c("batteries","biofuel","electrolyzers","geothermal","heat_pumps","magnets","nuclear","solar","transmission","wind")tech_labels <-c("Batteries","Biofuel","Electrolyzers","Geothermal","Heat Pumps","Magnets","Nuclear","Solar","Transmission","Wind")metrics <-c("exports","imports","balance")metric_labels <-c("Exports","Imports","Balance")countries <-c("chl","ind","kor","mys","vnm")country_labels <-c("Chile","India","Korea","Malaysia","Vietnam")img <-function(path, caption="", width="100%") {if (file.exists(file.path(ROOT, sub("^\\.\\./","",path))))cat(sprintf('\n<figure style="margin:0 0 16px 0;">\n<img src="%s" style="width:%s;border-radius:4px;" alt="%s">\n%s\n</figure>\n', path, width, caption,if (nzchar(caption)) sprintf('<figcaption style="font-size:.85rem;color:#64748b;margin-top:4px;">%s</figcaption>', caption) else""))}grid_open <-function() cat('\n<div style="display:grid;grid-template-columns:1fr 1fr;gap:20px;align-items:start;">\n')grid_close <-function() cat('\n</div>\n')cat("\n::: {.panel-tabset}\n")cat("\n## All Technologies\n")for (i inseq_along(metrics)) {cat(sprintf("\n### %s\n", metric_labels[i]))img(file.path(BASE,"all_techs",sprintf("pc_%s_all_techs_top20.png", metrics[i])),sprintf("%s — all technologies · top 20 exporters", metric_labels[i]))}cat("\n## By Technology\n")cat("\n::: {.panel-tabset}\n")for (mi inseq_along(metrics)) {cat(sprintf("\n## %s\n", metric_labels[mi]))grid_open()for (ti inseq_along(techs))img(file.path(BASE,"by_tech",sprintf("pc_%s_scatter_%s.png", metrics[mi], techs[ti])), tech_labels[ti])grid_close()}cat("\n:::\n")cat("\n## By Country\n")cat("\n::: {.panel-tabset}\n")for (ci inseq_along(countries)) {cat(sprintf("\n## %s\n", country_labels[ci]))grid_open()for (mi inseq_along(metrics))img(file.path(BASE,"by_country",sprintf("pc_%s_scatter_%s.png", metrics[mi], countries[ci])), metric_labels[mi])grid_close()}cat("\n:::\n")cat("\n## Regression Tables\n")for (tf inlist(list(p=file.path(BASE,"tables","regression_summary.png"), c="Combined regression summary"),list(p=file.path(BASE,"tables","reg_table_exports_all_techs.png"),c="OLS — Exports"),list(p=file.path(BASE,"tables","reg_table_imports_all_techs.png"),c="OLS — Imports"),list(p=file.path(BASE,"tables","reg_table_balance_all_techs.png"),c="OLS — Balance"),list(p=file.path(BASE,"tables","hlm_fixed_effects.png"), c="HLM fixed effects"))) img(tf$p, tf$c)cat("\n:::\n")```---## 2. Interactive Scatter Viewer {#scatter-viewer}```{r scatter-link}#| results: asisscatter_html <-file.path(ROOT, "analysis", "scatterplot", "pc_scatter.html")scatter_thumb <-file.path(ROOT, "analysis", "scatterplot", "all_techs", "pc_exports_all_techs_top20.png")scatter_href <-"pc_scatter.html"# relative to final output at analysis/scatterplot/img_el <-if (file.exists(scatter_thumb)) { tags$img(src = knitr::image_uri(scatter_thumb),alt ="SHAP vs Exports — all technologies preview",style ="width:100%;height:240px;object-fit:cover;display:block;border-bottom:1px solid #dde;")} else { tags$div(style ="height:240px;background:#eef4f0;display:flex;align-items:center;justify-content:center;", tags$span(style ="font-size:13px;color:#999;font-family:Archivo,sans-serif;","\u23F3 Thumbnail not available"))}ef <-file.exists(scatter_html)card <- tags$a(href =if (ef) scatter_href else"#",target ="_blank",style =paste0("text-decoration:none;color:inherit;display:block;max-width:680px;",if (!ef) "pointer-events:none;"else""), tags$div(style ="border:1px solid #dde;border-radius:8px;overflow:hidden; box-shadow:0 3px 10px rgba(0,0,0,.10);margin:8px 0 16px;", tags$div(style ="background:#073309;height:4px;"), img_el, tags$div(style ="padding:12px 16px;", tags$p("SHAP Feature Importance vs Trade Flows — Interactive Scatterplot Viewer",style ="margin:0;font-size:.95rem;font-weight:700;color:#1e293b; font-family:Archivo,sans-serif;"), tags$p(if (ef) "\u2197 Open viewer · ~6 MB · exports / imports / balance \u00d7 % of GDP / % of clean tech trade"else"\u23F3 File not found — run pc_scatter.R to generate",style ="margin:5px 0 0;font-size:.8rem;color:#64748b;font-family:Archivo,sans-serif;"))))tagList(card)```::: {.callout-tip title="How to navigate the viewer"}- **Variable / Unit dropdowns**: Exports / Imports / Balance × % of clean tech trade / % GDP / % total merchandise- **Variable / Unit**: Exports / Imports / Balance × % clean trade / % GDP / % total merchandise- **Year range sliders**: any window within 2018–2024; slide both to a single year for a snapshot- **Country / Technology / Function / SHAP Category** sidebar filters- **Highlight Country**: overlay a single country in amber for direct comparison:::---## 3. Key Findings {#key-findings}```{r ols-prep}reg_stats <-function(df, y_col, group_col="tech") { df |>filter(!is.na(.data[[y_col]]), !is.na(shap_mean_z), is.finite(.data[[y_col]])) |>group_by(.data[[group_col]]) |>filter(n() >=5) |>group_modify(~ { fit <-tryCatch(lm(reformulate("shap_mean_z", y_col), data=.x), error=function(e) NULL)if (is.null(fit)) return(tibble(n=nrow(.x), slope=NA, intercept=NA, r2=NA, pval=NA)) s <-summary(fit); cf <-coef(s); f <- s$fstatisticif (!"shap_mean_z"%in%rownames(cf))return(tibble(n=nrow(fit$model), slope=NA, intercept=NA, r2=NA, pval=NA)) pv <-if (!is.null(f)) pf(f[1],f[2],f[3],lower.tail=FALSE) elseNAtibble(n=nrow(fit$model), slope=cf["shap_mean_z","Estimate"],intercept=cf["(Intercept)","Estimate"], r2=s$r.squared, pval=pv) }) |>ungroup() |>rename(group=1) |>mutate(p_label=case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns"))}df_exp <- df_top |>group_by(tech) |>group_modify(~filter_outliers(.x,"exports_pct")) |>ungroup()df_imp <- df_top |>group_by(tech) |>group_modify(~filter_outliers(.x,"imports_pct")) |>ungroup()df_bal <- df_top |>group_by(tech) |>group_modify(~filter_outliers(.x,"balance_pct")) |>ungroup()rdf_exp <-reg_stats(df_exp |>filter(exports_pct>0), "exports_pct")rdf_imp <-reg_stats(df_imp |>filter(imports_pct>0), "imports_pct")rdf_bal <-reg_stats(df_bal, "balance_pct")n_sig_exp <-sum(rdf_exp$p_label %in%c("*","**","***"), na.rm=TRUE)n_sig_imp <-sum(rdf_imp$p_label %in%c("*","**","***"), na.rm=TRUE)best_r2 <- rdf_exp |>slice_max(r2, n=1, with_ties=FALSE)worst_r2 <- rdf_exp |>slice_min(r2, n=1, with_ties=FALSE)type_order <-c("Raw Material","Processed Material","Process Equipment","Product Component","Final Product")stage_order <-c("Upstream","Midstream","Downstream","Final Product")type_pal <-c("Raw Material"="#8A4200","Processed Material"="#C17A2E","Process Equipment"="#5B7266","Product Component"="#1A8C28","Final Product"="#CB5B1D")stage_pal <-c("Upstream"="#8A4200","Midstream"="#073309","Downstream"="#1A8C28","Final Product"="#CB5B1D")# Generalised slope-by-group computationcompute_slope_by <-function(df, y_col, group_col, levels_vec, use_log=TRUE) { df_f <- df |>filter(!is.na(.data[[y_col]]), is.finite(.data[[y_col]]),!is.na(.data[[group_col]]), .data[[group_col]] %in% levels_vec)if (use_log) df_f <- df_f |>filter(.data[[y_col]] >0) df_f |>group_by(.data[[group_col]]) |>filter(n() >=10) |>group_modify(~ { y_v <-if (use_log) log10(.x[[y_col]]) else .x[[y_col]] fit <-tryCatch(lm(y_v ~ shap_mean_z, data=.x), error=function(e) NULL)if (is.null(fit)) return(tibble(slope=NA_real_,se=NA_real_,pval=NA_real_,n=nrow(.x))) cf <-coef(summary(fit))if (!"shap_mean_z"%in%rownames(cf))return(tibble(slope=NA_real_,se=NA_real_,pval=NA_real_,n=nrow(.x)))tibble(slope=cf["shap_mean_z","Estimate"], se=cf["shap_mean_z","Std. Error"],pval=cf["shap_mean_z","Pr(>|t|)"], n=nrow(.x)) }) |>ungroup() |>rename(group=1) |>mutate(sig=case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns"),group=factor(group, levels=levels_vec))}# Pre-compute slopes for all metricsst_type_exp <-compute_slope_by(df_exp,"exports_pct","type", type_order)st_type_imp <-compute_slope_by(df_imp,"imports_pct","type", type_order)st_type_bal <-compute_slope_by(df_bal,"balance_pct","type", type_order, use_log=FALSE)st_stage_exp <-compute_slope_by(df_exp,"exports_pct","stage",stage_order)st_stage_imp <-compute_slope_by(df_imp,"imports_pct","stage",stage_order)st_stage_bal <-compute_slope_by(df_bal,"balance_pct","stage",stage_order, use_log=FALSE)st_ts_exp <-compute_slope_by(df_exp,"exports_pct","typestage",names(TS_COLORS))st_ts_imp <-compute_slope_by(df_imp,"imports_pct","typestage",names(TS_COLORS))st_ts_bal <-compute_slope_by(df_bal,"balance_pct","typestage",names(TS_COLORS), use_log=FALSE)# Plotting helpersslope_bar <-function(slopes, pal, title, y_lbl="OLS slope") { slopes |>filter(!is.na(slope)) |>ggplot(aes(x=group, y=slope, fill=group)) +geom_col(width=0.65) +geom_errorbar(aes(ymin=slope-se,ymax=slope+se),width=0.25,color="#475569") +geom_text(aes(label=sig, y=slope+sign(slope)*se+sign(slope)*0.02),size=5,family="Archivo",color="#1e293b") +geom_hline(yintercept=0,linewidth=0.5,color="#94a3b8") +scale_fill_manual(values=pal, guide="none") +labs(title=title,subtitle="OLS slope · ±1 SE · year 2023 · *** p<0.001 ** p<0.01 * p<0.05",x=NULL, y=y_lbl) +theme_minimal(base_family="Archivo",base_size=15) +theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=12,color="#475569"),axis.text.x=element_text(size=13),panel.grid.major.x=element_blank(),panel.grid.minor=element_blank())}ts_heatmap <-function(df, y_col, title, use_log=TRUE) { d <- df |>filter(!is.na(.data[[y_col]]),is.finite(.data[[y_col]]),!is.na(type),type %in% type_order,!is.na(stage),stage %in% stage_order)if (use_log) d <- d |>filter(.data[[y_col]]>0) slopes <- d |>group_by(type,stage) |>filter(n()>=10) |>group_modify(~{ y_v <-if (use_log) log10(.x[[y_col]]) else .x[[y_col]] fit <-tryCatch(lm(y_v~shap_mean_z,data=.x),error=function(e) NULL)if (is.null(fit)) return(tibble(slope=NA_real_)) cf <-coef(summary(fit))if (!"shap_mean_z"%in%rownames(cf)) return(tibble(slope=NA_real_))tibble(slope=cf["shap_mean_z","Estimate"]) }) |>ungroup() |>mutate(type=factor(type,levels=type_order),stage=factor(stage,levels=stage_order)) full_grid <- tidyr::expand_grid(type=factor(type_order,levels=type_order),stage=factor(stage_order,levels=stage_order) ) |>left_join(slopes,by=c("type","stage"))ggplot(full_grid,aes(x=stage,y=type,fill=slope)) +geom_tile(color="white",linewidth=0.8) +geom_text(aes(label=if_else(!is.na(slope),sprintf("%.2f",slope),"–")),size=4.5,family="Archivo",color="white",fontface="bold") +scale_fill_gradient2(low="#CB5B1D",mid="#f1f5f9",high=NZ_GREEN,midpoint=0,na.value="#e2e8f0",name="OLS slope") +scale_x_discrete(position="top") +labs(title=title,subtitle="Each cell = OLS slope · log10(trade % GDP) ~ SHAP · grey/dash = <10 obs",x="Stage",y="Product Function") +theme_minimal(base_family="Archivo",base_size=15) +theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=12,color="#475569"),axis.text=element_text(size=13),panel.grid=element_blank())}```**How to read the slope charts (Findings 2–3):** The bar height is the OLS regression coefficient from `log10(trade % GDP) ~ SHAP mean |z|`. A **positive slope** means products the RF assigns higher importance to are also traded more intensively in practice — the ML model and actual trade are aligned. A **negative slope** means higher-SHAP products are traded *less* intensively. For **Product Component**, the strong positive slope (***) confirms that component-level supply chain capability is where the RF model's predictions map most faithfully onto trade reality. For **Final Product**, the negative slope (**) is not a model failure — it reflects that final product HS codes (e.g., solar panels, wind turbines) all receive *uniformly high* SHAP scores since they are adjacent to the competitive outcome the RF predicts. Actual final-product export intensity is driven by industrial policy, scale, and market access; the highest-SHAP final product codes are those most concentrated in 1–2 dominant exporters (China for PV, Denmark for wind), so among the other top-20 exporters that do export these products, SHAP and export intensity are negatively correlated. **Slope magnitude**: in log10 units, a slope of +0.15 corresponds to ~40% more exports per 1 SD increase in SHAP; a slope of −5 signals extreme concentration rather than a continuous gradient.---::: {.panel-tabset}## Exports### Finding E1 — SHAP importance predicts export intensity across most technologies {#f1}In **`r n_sig_exp`** of `r nrow(rdf_exp)` technology-level OLS regressions the SHAP slope on exports (% GDP) is statistically significant. The strongest fit is **`r best_r2$group`** (R² = `r round(best_r2$r2,3)`, `r best_r2$p_label`); the weakest is `r worst_r2$group` (R² = `r round(worst_r2$r2,3)`, `r worst_r2$p_label`). This validates the ML model: products assigned high feature importance tend to be products in which top economies actually trade intensively.```{r f1-plot}#| fig-width: 10#| fig-height: 4.5r2_bar <-function(rdf, title) { rdf |>mutate(group=fct_reorder(group,r2),fill=if_else(p_label%in%c("*","**","***"),NZ_GREEN,"#94a3b8"),label=paste0("R²=",round(r2,3)," ",p_label)) |>ggplot(aes(x=r2,y=group,fill=fill)) +geom_col(width=0.65) +geom_text(aes(label=label,x=r2+0.001),hjust=0,size=4.5,family="Archivo",color="#1e293b") +scale_fill_identity() +scale_x_continuous(expand=expansion(mult=c(0,.2)),labels=function(x)sprintf("%.3f",x)) +labs(title=title,subtitle="Cross-sectional OLS · year 2023 · top-20 exporting economies",x="R²",y=NULL,caption="Green = p<0.05 · Grey = not significant") +theme_minimal(base_family="Archivo",base_size=15) +theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=12,color="#475569"),panel.grid.major.y=element_blank(),panel.grid.minor=element_blank())}r2_bar(rdf_exp, "E1: R² by Technology — Exports % GDP ~ SHAP mean |z|")```### Finding E2 — SHAP slope by product function {#fe2}**Product Components** show the strongest positive slope (***): the ML model's importance scores align most faithfully with trade reality at the component level. **Final Products** show a significant *negative* slope — see interpretation note above.```{r fe2-plot}#| fig-width: 10#| fig-height: 4.5slope_bar(st_type_exp, type_pal, "E2: SHAP→Exports Slope by Product Function")```### Finding E3 — SHAP slope by value-chain stage {#fe3}The SHAP–export relationship strengthens progressively from Upstream to Final Product stage, confirming that stage position captures a dimension of supply-chain position beyond product function alone.```{r fe3-plot}#| fig-width: 9#| fig-height: 4.5slope_bar(st_stage_exp, stage_pal, "E3: SHAP→Exports Slope by Value-Chain Stage")```### Finding E4 — Function × stage interaction heatmap {#fe4}Which specific cells in the value chain show the tightest ML–trade alignment for exports?```{r fe4-plot}#| fig-width: 10#| fig-height: 5ts_heatmap(df_exp, "exports_pct", "E4: SHAP→Exports Slope — Product Function × Stage")```### Finding E5 — High-SHAP products traded more intensively, gap widening {#fe5}```{r fe5-plot}#| fig-width: 10#| fig-height: 5shap_q75 <-quantile(panel_master$shap_mean_z, 0.75, na.rm=TRUE)shap_q25 <-quantile(panel_master$shap_mean_z, 0.25, na.rm=TRUE)panel_master |>filter(exports_pct>0,is.finite(exports_pct),iso3%in%top20_isos) |>mutate(shap_grp=case_when(shap_mean_z>=shap_q75~"High SHAP",shap_mean_z<=shap_q25~"Low SHAP",TRUE~NA_character_)) |>filter(!is.na(shap_grp)) |>group_by(year,shap_grp) |>summarise(med=median(exports_pct,na.rm=TRUE),.groups="drop") |>ggplot(aes(x=year,y=med,color=shap_grp,group=shap_grp)) +geom_line(linewidth=1.2)+geom_point(size=2.5) +scale_y_log10(labels=function(x)paste0(signif(x,2),"%")) +scale_color_manual(values=c("High SHAP"=NZ_GREEN,"Low SHAP"=NZ_ORANGE),name=NULL) +labs(title="E5: High-SHAP Products Traded More Intensively — Gap Widens",subtitle="Median exports % GDP · top/bottom SHAP quartile · 2018–2024",x=NULL,y="Median exports % GDP (log)",caption="top-20 economies") +theme_minimal(base_family="Archivo",base_size=15) +theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=12,color="#475569"),legend.position="top",panel.grid.minor=element_blank())```## Imports### Finding I1 — SHAP predicts import intensity more strongly than exports {#fi1}**`r n_sig_imp`** of `r nrow(rdf_imp)` technology-level import regressions are significant vs **`r n_sig_exp`** for exports. High-SHAP products are frequently capital goods — machinery and process equipment — that countries *import* to build production capacity before developing domestic export competitiveness.```{r fi1-plot}#| fig-width: 10#| fig-height: 4.5r2_bar(rdf_imp, "I1: R² by Technology — Imports % GDP ~ SHAP mean |z|")```### Finding I2 — Import slope by product function {#fi2}For imports, the function slope pattern differs from exports: **Process Equipment** often shows a stronger positive slope because capital goods used *in manufacturing* (the process chain) are imported even before export capacity develops.```{r fi2-plot}#| fig-width: 10#| fig-height: 4.5slope_bar(st_type_imp, type_pal, "I2: SHAP→Imports Slope by Product Function")```### Finding I3 — Import slope by stage {#fi3}Upstream and midstream stages show stronger import–SHAP alignment than for exports, reflecting countries importing inputs they do not yet produce domestically.```{r fi3-plot}#| fig-width: 9#| fig-height: 4.5slope_bar(st_stage_imp, stage_pal, "I3: SHAP→Imports Slope by Value-Chain Stage")```### Finding I4 — Function × stage interaction for imports {#fi4}```{r fi4-plot}#| fig-width: 10#| fig-height: 5ts_heatmap(df_imp, "imports_pct", "I4: SHAP→Imports Slope — Product Function × Stage")```## Net Balance### Finding B1 — SHAP predicts trade balance across technologies {#fb1}A positive slope means countries with higher-SHAP products run a *net surplus* in those products — i.e., the ML model identifies products where competitive advantage translates into actual specialisation.```{r fb1-plot}#| fig-width: 10#| fig-height: 4.5r2_bar(rdf_bal, "B1: R² by Technology — Net Balance % GDP ~ SHAP mean |z|")```### Finding B2 — Balance slope by product function {#fb2}```{r fb2-plot}#| fig-width: 10#| fig-height: 4.5slope_bar(st_type_bal, type_pal, "B2: SHAP→Balance Slope by Product Function", y_lbl="OLS slope (raw balance)")```### Finding B3 — Balance slope by stage {#fb3}```{r fb3-plot}#| fig-width: 9#| fig-height: 4.5slope_bar(st_stage_bal, stage_pal, "B3: SHAP→Balance Slope by Value-Chain Stage", y_lbl="OLS slope (raw balance)")```### Finding B4 — Function × stage interaction for net balance {#fb4}```{r fb4-plot}#| fig-width: 10#| fig-height: 5ts_heatmap(df_bal, "balance_pct", "B4: SHAP→Balance Slope — Product Function × Stage", use_log=FALSE)```:::---### Cross-cutting: GDP size effect {#f-gdp}Larger economies trade more in absolute terms, but after normalising by GDP the relationship is weak — confirming SHAP captures a structural capability signal independent of country scale.```{r f-gdp-plot}#| fig-width: 10#| fig-height: 5ctry_agg <- df_top |>filter(exports_pct>0,is.finite(exports_pct),!is.na(gdp_mil_usd)) |>group_by(iso3,country,region,gdp_mil_usd) |>summarise(med_exp_pct=median(exports_pct,na.rm=TRUE),.groups="drop")REG_PAL <-c("East Asia & Pacific"="#073309","Europe & Central Asia"="#0369a1","North America"="#CB5B1D","South Asia"="#7c3aed","Latin America & Caribbean"="#be185d","Middle East & North Africa"="#b45309","Sub-Saharan Africa"="#15803d","Other"="#374151")ggplot(ctry_agg,aes(x=log(gdp_mil_usd),y=med_exp_pct,color=region,label=iso3)) +geom_smooth(method="lm",se=TRUE,formula=y~x,color="#94a3b8",fill="#e2e8f0",linewidth=0.8,show.legend=FALSE) +geom_point(size=3.5,alpha=0.85) +geom_text_repel(size=3.2,family="Archivo",max.overlaps=12,segment.color="#cbd5e1",show.legend=FALSE) +scale_y_log10(labels=function(x)paste0(signif(x,2),"%")) +scale_color_manual(values=REG_PAL,name="Region",na.value="#94a3b8") +labs(title="Cross-cutting: Country GDP vs Clean-Tech Export Intensity",subtitle="Weak positive slope confirms GDP normalisation controls for country size",x="log(GDP, million USD)",y="Median exports % GDP (log scale)") +theme_minimal(base_family="Archivo",base_size=15) +theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=12,color="#475569"),legend.position="right",panel.grid.minor=element_blank())```---## 4. Regression Analysis {#regression}### 3.1 Cross-Sectional OLS — by TechnologyOne OLS regression per technology, cross-section for `r YEAR`: `log(trade % GDP) ~ SHAP mean |z|`.```{r ols-tables}#| fig-width: 18#| fig-height: 7make_ols_tbl_gg <-function(rdf, title) { tbl <- rdf |>transmute(Technology=group, N=as.character(n),Slope=sprintf("%.4f",slope), R2=sprintf("%.3f",r2),`p-value`=case_when(pval<.001~"<0.001",TRUE~sprintf("%.3f",pval)),Sig.=p_label) nr <-nrow(tbl); nc <-ncol(tbl) tbl_long <- tbl |>mutate(.row=row_number()) |>pivot_longer(-.row, names_to=".col", values_to=".val") |>mutate(.col=factor(.col, levels=colnames(tbl))) hdr <-tibble(.row=0L, .col=factor(colnames(tbl),levels=colnames(tbl)), .val=colnames(tbl)) all_rows <-bind_rows(hdr, tbl_long); all_rows$.row <-as.numeric(all_rows$.row) sig_map <- rdf |>mutate(.row=row_number()) |>select(.row, p_label) row_bg <-tibble(.row=seq_len(nr)) |>left_join(sig_map, by=".row") |>mutate(fill=case_when(p_label=="***"~"#d1fae5",p_label=="**"~"#fef9c3", p_label=="*"~"#dbeafe",.row%%2==0~"#f8fafc",TRUE~"#ffffff"),xmin=-Inf, xmax=Inf, ymin=-(.row)-0.5, ymax=-(.row)+0.5) body_rows <- all_rows |>filter(.row>0) |>left_join(sig_map, by=".row") |>mutate(ff=if_else(!is.na(p_label)&p_label!="ns","bold","plain")) ul_rows <- sig_map |>filter(p_label=="***") p <-ggplot(all_rows, aes(x=as.integer(.col), y=-.row)) +annotate("rect",xmin=-Inf,xmax=Inf,ymin=-0.5,ymax=0.5,fill="#073309") +geom_rect(data=row_bg, aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax,fill=fill),color=NA, inherit.aes=FALSE) +scale_fill_identity() +geom_text(data=filter(all_rows,.row==0), aes(label=.val), color="white",fontface="bold", size=4.5, family="Archivo") +geom_text(data=body_rows, aes(label=.val,fontface=ff), color="#1e293b",size=4.2, family="Archivo") +labs(title=title,caption="■ p<0.001 (***) ■ p<0.01 (**) ■ p<0.05 (*) bold = sig underline = highly sig") +theme_void(base_family="Archivo") +theme(plot.title=element_text(size=13,face="bold",color=NZ_GREEN,hjust=0.5,margin=margin(b=8)),plot.caption=element_text(size=8,color="#64748b",hjust=0.5,margin=margin(t=6)),plot.background=element_rect(fill="#f8fafc",color=NA),plot.margin=margin(12,12,8,12)) +scale_x_continuous(expand=expansion(add=.5)) +scale_y_continuous(expand=expansion(add=.3))if (nrow(ul_rows) >0) { segs <-tibble(y=-(ul_rows$.row)-0.42, xmin=0.6, xmax=nc+0.4) p <- p +geom_segment(data=segs, aes(x=xmin,xend=xmax,y=y,yend=y),color="#065f46", linewidth=0.5, inherit.aes=FALSE) } p}p_e <-make_ols_tbl_gg(rdf_exp, "Exports % GDP")p_i <-make_ols_tbl_gg(rdf_imp, "Imports % GDP")p_b <-make_ols_tbl_gg(rdf_bal, "Net Balance % GDP")print((p_e | p_i | p_b) +plot_layout(guides="collect"))```---### 3.2 Hierarchical Linear Model — Cross-Sectional (`r YEAR`) {#hlm-cross}The cross-sectional HLM pools all countries, technologies, and product types in a single model with the full covariate set. The model controls for product type, value-chain stage, and log-GDP, while allowing the SHAP slope to vary by technology (random slope) and the baseline trade level to vary by country (random intercept):$$\begin{aligned}\log(y_{ict}) &= \underbrace{\beta_0 + \beta_1 \cdot \text{shap}_i + \beta_2 \log\text{GDP}_c}_{\text{core predictors}} \\[6pt]&\quad + \underbrace{\sum_k \gamma_k \cdot \mathbf{1}[\text{type}_i = k] \;+\; \sum_s \delta_s \cdot \mathbf{1}[\text{stage}_i = s]}_{\text{product type \& stage fixed effects}} \\[6pt]&\quad + \underbrace{u_{0c} + u_{0j} + u_{1j} \cdot \text{shap}_i}_{\text{random effects}} \;+\; \varepsilon_{ict}\end{aligned}$$where:- $i$ = HS product, $c$ = country, $j$ = technology- $\gamma_k$ = fixed effect for product type $k$ (Raw Material, Processed Material, Process Equipment, Product Component, Final Product)- $\delta_s$ = fixed effect for value-chain stage $s$ (Upstream, Midstream, Downstream)- $u_{0c} \sim \mathcal{N}(0,\sigma_c^2)$ = **random country intercept** — captures overall trade level not explained by SHAP or GDP- $(u_{0j}, u_{1j}) \sim \mathcal{N}(0, \Sigma_j)$ = **random technology intercept + slope** — captures how much the SHAP–trade relationship differs across value chains```{r hlm-cross}run_hlm_full <-function(df, y_col, log_y=TRUE) {if (!has_lme4) return(NULL)if (!has_lme4) return(NULL) df_m <- df |>filter(!is.na(.data[[y_col]]), is.finite(.data[[y_col]]),!is.na(log_gdp), !is.na(type), !is.na(stage))if (log_y) df_m <- df_m |>filter(.data[[y_col]] >0) |>mutate(y_use =log10(.data[[y_col]])) else df_m <- df_m |>mutate(y_use = .data[[y_col]])if (nrow(df_m) <50) return(NULL)tryCatch( lme4::lmer(y_use ~ shap_mean_z + log_gdp + type + stage + (1+ shap_mean_z | tech) + (1| iso3),data=df_m, REML=FALSE,control=lme4::lmerControl(optimizer="bobyqa")),error=function(e) NULL )}hlm_exp <-run_hlm_full(df_exp, "exports_pct")hlm_imp <-run_hlm_full(df_imp, "imports_pct")hlm_bal <-run_hlm_full(df_bal, "balance_pct", log_y=FALSE)hlm_tidy <-function(fit, metric) {if (is.null(fit)) return(NULL) fe <-as.data.frame(coef(summary(fit))) fe$term <-rownames(fe) fe |>mutate(metric = metric,z_val = Estimate/`Std. Error`,p_wald =2*pnorm(abs(z_val), lower.tail=FALSE),sig =case_when(p_wald<.001~"***",p_wald<.01~"**", p_wald<.05~"*",TRUE~"ns"),Term = term) |>select(Metric=metric, Term, Estimate, SE=`Std. Error`, z=z_val,`p (Wald)`=p_wald, Sig.=sig) |>mutate(across(c(Estimate,SE), ~round(.,4)), z=round(z,2),`p (Wald)`=if_else(`p (Wald)`<.001,"<0.001",sprintf("%.3f",`p (Wald)`)))}hlm_all <-bind_rows(hlm_tidy(hlm_exp, "exports"),hlm_tidy(hlm_imp, "imports"),hlm_tidy(hlm_bal, "balance"))if (has_lme4 &&!is.null(hlm_all) &&nrow(hlm_all) >0) { hlm_all |>kable(align=c("l","l","r","r","r","r","c"), booktabs=TRUE,caption=paste("HLM Fixed Effects —", YEAR,"(controls: log GDP, product type, value-chain stage)")) |>kable_styling(bootstrap_options=c("striped","hover","condensed"), font_size=13) |>row_spec(which(hlm_all$Sig.=="***"), bold=TRUE, background="#d1fae5") |>row_spec(which(hlm_all$Sig.=="**"), bold=TRUE, background="#fef9c3") |>row_spec(which(hlm_all$Sig.=="*"), bold=TRUE, background="#dbeafe")}```#### Coefficient plot — Fixed effects```{r coef-plot}#| fig-width: 11#| fig-height: 7#| fig-cap: "Fixed-effect coefficients from the cross-sectional HLM (exports outcome). Error bars = ±1.96 SE (95% CI). Green = positive & significant; orange = negative & significant; grey = ns."if (has_lme4 &&!is.null(hlm_exp)) { fe_df <-as.data.frame(coef(summary(hlm_exp))) fe_df$term <-rownames(fe_df) fe_df <- fe_df |>mutate(lo = Estimate -1.96*`Std. Error`,hi = Estimate +1.96*`Std. Error`,p =2*pnorm(abs(Estimate/`Std. Error`), lower.tail=FALSE),sig = p <0.05,col =case_when(sig & Estimate >0~ NZ_GREEN, sig & Estimate <0~ NZ_ORANGE,TRUE~"#94a3b8"),term_clean =gsub("type|stage","",term)) |>filter(term !="(Intercept)") |>mutate(term_clean =fct_reorder(term_clean, Estimate))ggplot(fe_df, aes(x=Estimate, xmin=lo, xmax=hi, y=term_clean, color=col)) +geom_vline(xintercept=0, linetype="dashed", color="#94a3b8", linewidth=0.6) +geom_errorbarh(height=0.3, linewidth=0.8) +geom_point(size=3.5) +scale_color_identity() +labs(title="HLM Fixed Effects — Exports % GDP (log10)",subtitle="Coefficients with 95% Wald CI · cross-sectional 2023",x="Coefficient estimate", y=NULL,caption="Green = significant positive · Orange = significant negative · Grey = ns") +theme_minimal(base_family="Archivo", base_size=12) +theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=11,color="#475569"),panel.grid.major.y=element_blank())}```#### Random slopes by technology```{r caterpillar-tech}#| fig-width: 9#| fig-height: 5#| fig-cap: "Random SHAP slopes by technology (from HLM exports model). Each bar = technology-specific deviation from the global slope. Techs with wider bars show more heterogeneous SHAP-trade relationships."if (has_lme4 &&!is.null(hlm_exp)) { re_tech <-as.data.frame(ranef(hlm_exp, condVar=TRUE)) re_slope <- re_tech |>filter(grpvar=="tech", term=="shap_mean_z") |>mutate(grp =fct_reorder(grp, condval),lo = condval -1.96*condsd,hi = condval +1.96*condsd,col =if_else(condval >0, NZ_GREEN, NZ_ORANGE))ggplot(re_slope, aes(x=condval, xmin=lo, xmax=hi, y=grp, color=col)) +geom_vline(xintercept=0, linetype="dashed", color="#94a3b8") +geom_errorbarh(height=0.35, linewidth=0.9) +geom_point(size=4) +scale_color_identity() +labs(title="Random SHAP Slopes by Technology",subtitle="Deviation from global SHAP slope · 95% CI",x="Random slope deviation (shap_mean_z)", y=NULL) +theme_minimal(base_family="Archivo", base_size=13) +theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=11,color="#475569"),panel.grid.major.y=element_blank())}```#### Random intercepts by country```{r caterpillar-country}#| fig-width: 9#| fig-height: 7#| fig-cap: "Random intercepts by country. Positive values = country trades more intensively than predicted by SHAP + GDP alone; negative = less than expected."if (has_lme4 &&!is.null(hlm_exp)) { re_ctry <-as.data.frame(ranef(hlm_exp, condVar=TRUE)) re_int <- re_ctry |>filter(grpvar=="iso3", term=="(Intercept)") |>left_join(ctry, by=c("grp"="iso3")) |>mutate(label =coalesce(country, grp),label =fct_reorder(label, condval),lo = condval -1.96*condsd,hi = condval +1.96*condsd,col =if_else(condval >0, NZ_GREEN, NZ_ORANGE))ggplot(re_int, aes(x=condval, xmin=lo, xmax=hi, y=label, color=col)) +geom_vline(xintercept=0, linetype="dashed", color="#94a3b8") +geom_errorbarh(height=0.4, linewidth=0.7) +geom_point(size=2.8) +scale_color_identity() +labs(title="Random Country Intercepts — Exports HLM",subtitle="Deviation from predicted level · controls: SHAP + log GDP + type + stage",x="Random intercept deviation", y=NULL,caption="Positive = more trade than model predicts · Negative = less") +theme_minimal(base_family="Archivo", base_size=11) +theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=11,color="#475569"),panel.grid.major.y=element_blank())}```#### Variance decomposition```{r vc-decomp}#| fig-width: 9#| fig-height: 4#| fig-cap: "Variance explained by each random-effects component. A large country share means country-level factors dominate unexplained trade variance."if (has_lme4 &&!is.null(hlm_exp)) { vc_list <-lapply(list(hlm_exp, hlm_imp, hlm_bal), function(fit) {if (is.null(fit)) return(NULL) vc <-as.data.frame(VarCorr(fit)) vc |>mutate(vcov=as.numeric(vcov)) |>filter(!is.na(vcov)) |>select(grp, var1, vcov) })names(vc_list) <-c("Exports","Imports","Balance") vc_df <-bind_rows(lapply(names(vc_list), function(nm) { v <- vc_list[[nm]]if (is.null(v)) return(NULL) tot <-sum(v$vcov, na.rm=TRUE) v |>mutate(metric=nm, pct=vcov/tot*100,Component=case_when( grp=="tech"& var1=="(Intercept)"~"Tech intercept", grp=="tech"& var1=="shap_mean_z"~"Tech SHAP slope", grp=="iso3"~"Country intercept", grp=="Residual"~"Residual",TRUE~paste(grp, var1))) })) |>filter(!is.na(metric))ggplot(vc_df, aes(x=metric, y=pct, fill=Component)) +geom_col(width=0.55) +scale_fill_manual(values=c("Tech intercept"="#073309","Tech SHAP slope"="#1A8C28","Country intercept"="#0369a1","Residual"="#cbd5e1" ), name=NULL) +labs(title="Variance Decomposition — Random Effects",subtitle="Share of total variance explained by each component",x=NULL, y="% of total variance") +theme_minimal(base_family="Archivo", base_size=13) +theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN),legend.position="right")}```#### Random effects variance components table```{r hlm-vc}vc_tidy <-function(fit, metric) {if (is.null(fit)) return(NULL)as.data.frame(VarCorr(fit)) |>mutate(metric=metric) |>select(metric, grp, var1, var2, vcov, sdcor)}vc_all <-bind_rows(vc_tidy(hlm_exp, "exports"),vc_tidy(hlm_imp, "imports"),vc_tidy(hlm_bal, "balance"))if (has_lme4 &&!is.null(vc_all) &&nrow(vc_all) >0) { vc_all |>mutate(vcov=round(vcov,5), sdcor=round(sdcor,4)) |>kable(caption="Random Effects Variance Components — cross-sectional HLM",col.names=c("Metric","Group","Var 1","Var 2","Variance","SD/Corr")) |>kable_styling(bootstrap_options=c("striped","condensed"), font_size=13)}```---### 3.3 Panel Data Model (2018–2024) {#panel-model}The panel HLM extends the cross-sectional model with a centred year trend and the SHAP×Year interaction, testing whether the SHAP–trade alignment has strengthened over time. Log-GDP is included as a time-varying covariate:$$\begin{aligned}\log(y_{ictj}) &= \beta_0 + \beta_1 \text{shap}_i + \beta_2 \tilde{t} + \beta_3 (\text{shap}_i \times \tilde{t}) + \beta_4 \log\text{GDP}_{ct} \\[6pt]&\quad + \sum_k \gamma_k \cdot \mathbf{1}[\text{type}_i=k] + \sum_s \delta_s \cdot \mathbf{1}[\text{stage}_i=s]\\[6pt]&\quad + u_{0c} + u_{0j} + u_{1j} \cdot \text{shap}_i + \varepsilon_{ictj}\end{aligned}$$where $\tilde{t} = (t - \bar{t}) / \text{sd}(t)$ is standardised year; $\beta_3 > 0$ means the SHAP–trade link *tightens over time*.```{r panel-model}#| cache: truerun_panel_hlm <-function(df, y_col, log_y=TRUE) {if (!has_lme4) return(NULL) df_m <- df |>filter(!is.na(.data[[y_col]]), is.finite(.data[[y_col]]),!is.na(log_gdp), !is.na(type), !is.na(stage))if (log_y) df_m <- df_m |>filter(.data[[y_col]] >0) |>mutate(y_use=log10(.data[[y_col]])) else df_m <- df_m |>mutate(y_use=.data[[y_col]])if (nrow(df_m) <100) return(NULL)tryCatch( lme4::lmer(y_use ~ shap_mean_z * year_c + log_gdp + type + stage + (1+ shap_mean_z | tech) + (1| iso3),data=df_m, REML=FALSE,control=lme4::lmerControl(optimizer="bobyqa")),error=function(e) { message(e$message); NULL } )}pm_data_exp <- panel_master |>filter(iso3 %in% top20_isos) |>group_by(tech) |>group_modify(~filter_outliers(.x,"exports_pct")) |>ungroup()pm_data_imp <- panel_master |>filter(iso3 %in% top20_isos) |>group_by(tech) |>group_modify(~filter_outliers(.x,"imports_pct")) |>ungroup()pm_data_bal <- panel_master |>filter(iso3 %in% top20_isos) |>group_by(tech) |>group_modify(~filter_outliers(.x,"balance_pct")) |>ungroup()pn_exp <-run_panel_hlm(pm_data_exp, "exports_pct")pn_imp <-run_panel_hlm(pm_data_imp, "imports_pct")pn_bal <-run_panel_hlm(pm_data_bal, "balance_pct", log_y=FALSE)panel_tidy <-function(fit, metric) {if (is.null(fit)) return(NULL) fe <-as.data.frame(coef(summary(fit))) fe$term <-rownames(fe) fe |>mutate(metric=metric,z_val=Estimate/`Std. Error`,p_wald=2*pnorm(abs(z_val), lower.tail=FALSE),sig=case_when(p_wald<.001~"***",p_wald<.01~"**",p_wald<.05~"*",TRUE~"ns"),Term=case_when( term=="(Intercept)"~"Intercept", term=="shap_mean_z"~"SHAP importance", term=="year_c"~"Year (standardised)", term=="log_gdp"~"log(GDP)", term=="shap_mean_z:year_c"~"SHAP × Year",grepl("^type",term) ~paste0("Type: ", gsub("^type","",term)),grepl("^stage",term) ~paste0("Stage: ", gsub("^stage","",term)),TRUE~ term )) |>select(Metric=metric, Term, Estimate, SE=`Std. Error`, z=z_val,`p (Wald)`=p_wald, Sig.=sig) |>mutate(across(c(Estimate,SE),~round(.,4)), z=round(z,2),`p (Wald)`=if_else(`p (Wald)`<.001,"<0.001",sprintf("%.3f",`p (Wald)`)))}panel_all <-bind_rows(panel_tidy(pn_exp, "exports"),panel_tidy(pn_imp, "imports"),panel_tidy(pn_bal, "balance"))if (has_lme4 &&!is.null(panel_all) &&nrow(panel_all) >0) { panel_all |>kable(align=c("l","l","r","r","r","r","c"), booktabs=TRUE,caption="Panel HLM Fixed Effects — 2018–2024 (controls: log GDP, type, stage)") |>kable_styling(bootstrap_options=c("striped","hover","condensed"), font_size=13) |>row_spec(which(panel_all$Sig.=="***"), bold=TRUE, background="#d1fae5") |>row_spec(which(panel_all$Sig.=="**"), bold=TRUE, background="#fef9c3") |>row_spec(which(panel_all$Sig.=="*"), bold=TRUE, background="#dbeafe")}```**SHAP × Year coefficient:** A positive estimate confirms Finding 7 — the alignment between ML-predicted importance and actual trade intensity has strengthened since 2018. This can reflect (a) supply chains maturing and concentrating around economically efficient producers, (b) improving model fit over time, or (c) both.#### Panel random effects```{r panel-vc}vc_panel <-bind_rows(vc_tidy(pn_exp, "exports"),vc_tidy(pn_imp, "imports"),vc_tidy(pn_bal, "balance"))if (has_lme4 &&!is.null(vc_panel) &&nrow(vc_panel) >0) { vc_panel |>mutate(vcov=round(vcov,5), sdcor=round(sdcor,4)) |>kable(caption="Panel Random Effects Variance Components",col.names=c("Metric","Group","Var 1","Var 2","Variance","SD/Corr")) |>kable_styling(bootstrap_options=c("striped","condensed"), font_size=13)}```---## 5. Key Statistical Highlights {#highlights}```{r highlights}#| results: asis#| echo: false# ── OLS summary stats ─────────────────────────────────────────────────────────n_sig_e <-sum(rdf_exp$p_label %in%c("*","**","***"), na.rm=TRUE)n_sig_i <-sum(rdf_imp$p_label %in%c("*","**","***"), na.rm=TRUE)n_sig_b <-sum(rdf_bal$p_label %in%c("*","**","***"), na.rm=TRUE)best_exp <- rdf_exp |>slice_max(r2, n=1, with_ties=FALSE)best_imp <- rdf_imp |>slice_max(r2, n=1, with_ties=FALSE)best_bal <- rdf_bal |>slice_max(r2, n=1, with_ties=FALSE)med_r2_e <-median(rdf_exp$r2, na.rm=TRUE)med_r2_i <-median(rdf_imp$r2, na.rm=TRUE)cat("### OLS Cross-Sectional Fit (year 2023)\n\n")cat(sprintf("- **%d / %d** technology-level regressions significant for **exports** (p < 0.05); median R² = **%.3f**. Best: **%s** (R² = %.3f, %s).\n", n_sig_e, nrow(rdf_exp), med_r2_e, best_exp$group, best_exp$r2, best_exp$p_label))cat(sprintf("- **%d / %d** significant for **imports**; median R² = **%.3f**. Best: **%s** (R² = %.3f, %s). Imports outperform exports in %d / %d technologies.\n", n_sig_i, nrow(rdf_imp), med_r2_i, best_imp$group, best_imp$r2, best_imp$p_label,sum(rdf_imp$r2 > rdf_exp$r2, na.rm=TRUE), nrow(rdf_exp)))cat(sprintf("- **%d / %d** significant for **net balance**. Best: **%s** (R² = %.3f, %s).\n\n", n_sig_b, nrow(rdf_bal), best_bal$group, best_bal$r2, best_bal$p_label))cat(sprintf("> **What this tells us:** SHAP feature importance is a significant cross-sectional predictor of trade intensity in **%d of %d** export regressions and **%d of %d** import regressions, indicating the ML signal captures real patterns in observed trade specialisation rather than statistical noise. Technologies with high R\u00b2 (e.g. **%s**) show the tightest coupling between ML-predicted production-readiness and observed trade flows. Where import predictability equals or exceeds export predictability, SHAP is detecting countries that *import* precisely because domestic productive capability is absent — the signal encodes both the presence and the absence of competitive advantage.\n\n", n_sig_e, nrow(rdf_exp), n_sig_i, nrow(rdf_imp), best_exp$group[1]))# ── HLM fixed effects ─────────────────────────────────────────────────────────if (has_lme4 &&!is.null(hlm_exp)) { fe <-as.data.frame(coef(summary(hlm_exp))) fe$term <-rownames(fe) fe <- fe |>mutate(z = Estimate /`Std. Error`,pval =2*pnorm(abs(z), lower.tail=FALSE),sig =case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns") ) get_fe <-function(t) fe[fe$term==t, ] shap_fe <-get_fe("shap_mean_z") gdp_fe <-get_fe("log_gdp") type_sig <- fe |>filter(grepl("^type", term), sig!="ns") |>mutate(lbl=paste0(gsub("^type","",term)," (",sprintf("%+.3f",Estimate),", ",sig,")")) stage_sig <- fe |>filter(grepl("^stage", term), sig!="ns") |>mutate(lbl=paste0(gsub("^stage","",term)," (",sprintf("%+.3f",Estimate),", ",sig,")"))cat("### HLM Fixed Effects — Exports (cross-sectional 2023)\n\n")if (nrow(shap_fe)>0&&!is.na(shap_fe$Estimate[1]))cat(sprintf("- **SHAP importance**: β = **%.4f** (%s). A 1-SD increase in SHAP is associated with a **%.0f%% change** in exports %% GDP.\n", shap_fe$Estimate[1], shap_fe$sig[1], (10^shap_fe$Estimate[1] -1)*100))if (nrow(gdp_fe)>0&&!is.na(gdp_fe$Estimate[1]))cat(sprintf("- **log(GDP)**: β = **%.4f** (%s). Each doubling of GDP associates with a %.0f%% %s in export intensity — GDP normalisation does not fully eliminate scale effects.\n", gdp_fe$Estimate[1], gdp_fe$sig[1],abs((10^(gdp_fe$Estimate[1]*log10(2)) -1)*100),ifelse(gdp_fe$Estimate[1]>0, "increase", "decrease")))if (nrow(type_sig)>0)cat(sprintf("- **Significant product function effects**: %s.\n", paste(type_sig$lbl, collapse="; ")))if (nrow(stage_sig)>0)cat(sprintf("- **Significant stage effects**: %s.\n\n", paste(stage_sig$lbl, collapse="; ")))elsecat("\n")cat(sprintf("> **What this tells us:** The HLM estimates the SHAP coefficient after absorbing country random intercepts and technology random slopes, so β is not inflated by dominant exporters (e.g. China in solar, Germany in wind turbines). A **%s** SHAP β confirms that ML-predicted feature importance is a globally valid predictor of export intensity independent of who-you-are and which-value-chain-you-are-in. The persistent log(GDP) coefficient means scale effects are not fully removed by per-GDP normalisation — larger economies export disproportionately more on a relative basis. Significant product function and stage dummies reveal that certain value-chain positions carry structural trade premiums or penalties beyond what SHAP alone predicts.\n\n",ifelse(nrow(shap_fe)>0&&!is.na(shap_fe$Estimate[1]) && shap_fe$Estimate[1]>0, "positive", "negative")))# ── Variance decomposition ───────────────────────────────────────────────── vc <-as.data.frame(VarCorr(hlm_exp)) |>mutate(vcov=as.numeric(vcov)) |>filter(!is.na(vcov)) tot <-sum(vc$vcov) pct_v <-function(grp, v1=NA) { row <-if (is.na(v1)) vc[vc$grp==grp,]else vc[vc$grp==grp &!is.na(vc$var1) & vc$var1==v1,]if (nrow(row)==0) return(NA)round(row$vcov[1]/tot*100, 1) } p_ctry <-pct_v("iso3") p_tech_int <-pct_v("tech","(Intercept)") p_tech_slp <-pct_v("tech","shap_mean_z") p_resid <-pct_v("Residual")cat("### Variance Decomposition (Exports HLM)\n\n")if (!is.na(p_ctry))cat(sprintf("- **Country intercepts**: **%.1f%%** of total variance — country-level factors (policy, history, institutions) dominate unexplained variation.\n", p_ctry))if (!is.na(p_tech_int))cat(sprintf("- **Technology intercepts**: **%.1f%%** — baseline trade levels differ markedly across value chains.\n", p_tech_int))if (!is.na(p_tech_slp))cat(sprintf("- **Technology SHAP slopes**: **%.1f%%** — the SHAP–trade alignment itself varies across technologies (justifies random-slope specification).\n", p_tech_slp))if (!is.na(p_resid))cat(sprintf("- **Residual**: **%.1f%%** — within-country, within-technology product-level variation unexplained by any covariate.\n\n", p_resid))cat(sprintf("> **What this tells us:** The variance partition reveals *what drives differences in trade intensity that SHAP cannot explain*. Country intercepts absorbing the largest share (here **%.1f%%**) means that national-level factors — industrial policy, infrastructure, historical path-dependence, institutional capacity — are the dominant residual driver of trade intensity differences, dwarfing product-level ML scores. Technology intercepts (%.1f%%) reflect that some value chains (solar, wind) are globally traded at large volumes while others (nuclear, geothermal) are structurally thinner markets. The random-slope variance for SHAP (%.1f%%) confirms that the SHAP→trade relationship is heterogeneous across technologies, justifying the flexible specification. The residual (%.1f%%) captures product-level noise that no macro-level covariate can reach.\n\n",ifelse(is.na(p_ctry), 0, p_ctry),ifelse(is.na(p_tech_int), 0, p_tech_int),ifelse(is.na(p_tech_slp), 0, p_tech_slp),ifelse(is.na(p_resid), 0, p_resid)))}# ── Panel: SHAP × Year ────────────────────────────────────────────────────────if (has_lme4 &&!is.null(pn_exp)) { pfe <-as.data.frame(coef(summary(pn_exp))) pfe$term <-rownames(pfe) pfe <- pfe |>mutate(z=Estimate/`Std. Error`, pval=2*pnorm(abs(z),lower.tail=FALSE),sig=case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns")) shap_yr <- pfe[pfe$term=="shap_mean_z:year_c",] shap_p <- pfe[pfe$term=="shap_mean_z",] yr_p <- pfe[pfe$term=="year_c",]cat("### Panel HLM — Temporal Dynamics (2018–2024)\n\n")if (nrow(shap_p)>0)cat(sprintf("- **SHAP importance** (main): β = **%.4f** (%s).\n", shap_p$Estimate, shap_p$sig))if (nrow(yr_p)>0&&!is.na(yr_p$Estimate[1]))cat(sprintf("- **Year trend**: β = **%.4f** (%s) — trade intensity is %s over 2018–2024 conditional on SHAP.\n", yr_p$Estimate[1], yr_p$sig[1], ifelse(yr_p$Estimate[1]>0, "rising", "falling")))if (nrow(shap_yr)>0&&!is.na(shap_yr$Estimate[1])) {cat(sprintf("- **SHAP × Year**: β = **%.4f** (%s) — the SHAP–trade link is **%s**: high-SHAP products are becoming %s traded relative to low-SHAP products year-on-year.\n\n", shap_yr$Estimate[1], shap_yr$sig[1],ifelse(shap_yr$Estimate[1]>0, "tightening", "loosening"),ifelse(shap_yr$Estimate[1]>0, "more intensively", "less intensively"))) } elsecat("\n") trend_word <-ifelse(nrow(yr_p)>0&&!is.na(yr_p$Estimate[1]) && yr_p$Estimate[1]>0, "expanding", "contracting") align_word <-ifelse(nrow(shap_yr)>0&&!is.na(shap_yr$Estimate[1]) && shap_yr$Estimate[1]>0, "tightening", "loosening")cat(sprintf("> **What this tells us:** The panel model tests whether the SHAP–trade alignment has *changed* over 2018–2024, going beyond the cross-sectional snapshot. Clean technology trade intensity is overall **%s** during this period (conditional on SHAP), reflecting the global acceleration in deployment and traded volumes. The SHAP × Year interaction is **%s**: the relationship between ML-predicted production capability and observed trade intensity is %s over time. A tightening interaction means the market is becoming progressively more meritocratic — countries with higher production-readiness are capturing growing trade shares as the energy transition accelerates. A loosening interaction would imply geopolitical fragmentation, industrial policy distortions, or early-mover lock-in are beginning to decouple competitiveness signals from actual trade outcomes.\n\n", trend_word, align_word, align_word))}```---## 6. Model Comparison (ANOVA / Likelihood Ratio Tests) {#anova}Nested model comparison via likelihood ratio test isolates the contribution of each additional component: SHAP only → + GDP → + type/stage → + random slope. A significant χ² indicates the added terms improve model fit beyond chance.$$\begin{aligned}M_0 &: y \sim 1 + (1 \mid \text{iso3}) & &\text{(null)} \\[4pt]M_1 &: y \sim \text{shap} + (1 \mid \text{iso3}) & &\text{(+ SHAP)} \\[4pt]M_2 &: y \sim \text{shap} + \log\text{GDP} + (1 \mid \text{iso3}) & &\text{(+ log GDP)} \\[4pt]M_3 &: y \sim \text{shap} + \log\text{GDP} + \text{type} + \text{stage} + (1 \mid \text{iso3}) & &\text{(+ type \& stage FE)} \\[4pt]M_4 &: y \sim \text{shap} + \log\text{GDP} + \text{type} + \text{stage} + (1 + \text{shap} \mid \text{tech}) + (1 \mid \text{iso3}) & &\text{(full HLM)}\end{aligned}$$```{r anova}#| fig-width: 9#| fig-height: 5if (has_lme4) { df_m_anova <- df_exp |>filter(exports_pct >0, is.finite(exports_pct),!is.na(log_gdp), !is.na(type), !is.na(stage)) |>mutate(y_use =log10(exports_pct)) m0 <-tryCatch(lme4::lmer(y_use ~1+ (1|iso3), data=df_m_anova, REML=FALSE,control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m1 <-tryCatch(lme4::lmer(y_use ~ shap_mean_z + (1|iso3), data=df_m_anova, REML=FALSE,control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m2 <-tryCatch(lme4::lmer(y_use ~ shap_mean_z + log_gdp + (1|iso3), data=df_m_anova, REML=FALSE,control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m3 <-tryCatch(lme4::lmer(y_use ~ shap_mean_z + log_gdp + type + stage + (1|iso3),data=df_m_anova, REML=FALSE,control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m4 <-tryCatch(lme4::lmer(y_use ~ shap_mean_z + log_gdp + type + stage + (1+ shap_mean_z | tech) + (1| iso3),data=df_m_anova, REML=FALSE,control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) models_ok <-!any(sapply(list(m0,m1,m2,m3,m4), is.null))if (models_ok) { av <-anova(m0, m1, m2, m3, m4) av_df <-as.data.frame(av) av_df$Model <-c("M0: Null","M1: +SHAP","M2: +logGDP","M3: +Type/Stage","M4: +RandSlope") av_df |>select(Model, any_of(c("Df","npar")), AIC, BIC, logLik, Chisq,`p-value`=`Pr(>Chisq)`) |>mutate(AIC=round(AIC,1), BIC=round(BIC,1), logLik=round(logLik,1),Chisq=round(Chisq,2),`p-value`=case_when(is.na(`p-value`) ~"—",`p-value`<.001~"<0.001",TRUE~sprintf("%.3f",`p-value`))) |>kable(caption="Model Comparison — Likelihood Ratio Tests (Exports % GDP, cross-sectional 2023)",align=c("l","r","r","r","r","r","r","r")) |>kable_styling(bootstrap_options=c("striped","hover","condensed"), font_size=13) |>row_spec(which(av_df$`Pr(>Chisq)`<0.05&!is.na(av_df$`Pr(>Chisq)`)),bold=TRUE, background="#d1fae5") } else {cat("> One or more models failed to converge — ANOVA table not available.\n") }} else {cat("> `lme4` not available — run `renv::install('lme4')` then re-render.\n")}```**Interpreting the ANOVA table:** Each row tests whether adding the new terms significantly improves fit over the previous model. Significant χ² for M1 validates SHAP as a predictor; M2 shows whether GDP adds explanatory power on top of SHAP; M3 tests whether product type/stage structure matters beyond GDP; M4 tests whether allowing the SHAP slope to vary by technology (random slope) is warranted by the data.---## 7. Time Trajectories {#trajectories}SHAP feature importance is **fixed per (product × technology)** — it reflects average predictive weight, not a time-varying signal. Trajectories therefore appear as **vertical paths** in the scatter: horizontal position (SHAP) is pinned, vertical position (trade intensity) moves year to year.```{r traj-solar}#| fig-width: 12#| fig-height: 7#| fig-cap: "Solar — exports % GDP trajectories (2018–2024). Each path = one country × product; dot size/opacity increases with year."solar_panel <- panel_master |>filter(tech=="Solar", iso3 %in% top20_isos, exports_pct >0, is.finite(exports_pct)) |>group_by(tech) |>group_modify(~filter_outliers(.x, "exports_pct")) |>ungroup() |>arrange(iso3, hs_code, year)COUNTRY_PAL <-c("#073309","#CB5B1D","#0369a1","#7c3aed","#be185d","#b45309","#dc2626","#15803d","#0891b2","#374151","#065f46","#92400e","#1e40af","#4c1d95","#881337")ctry_show <- solar_panel |>group_by(iso3, country) |>summarise(tot=sum(exports_pct,na.rm=TRUE),.groups="drop") |>slice_max(tot,n=15) |>pull(iso3)sol_plot <- solar_panel |>filter(iso3 %in% ctry_show)ggplot(sol_plot, aes(x=shap_mean_z, y=exports_pct,group=interaction(iso3,hs_code), color=country)) +geom_path(alpha=0.45, linewidth=0.5, linetype="dotted") +geom_point(aes(size=year, alpha=year)) +scale_size_continuous(range=c(1.5,3.5), guide="none") +scale_alpha_continuous(range=c(0.45,1), guide="none") +scale_color_manual(values=setNames(COUNTRY_PAL, unique(sol_plot$country)),name="Country") +scale_y_log10(labels=function(x) paste0(signif(x,3),"%")) +labs(title="Solar — SHAP Importance vs Export Intensity Trajectories (2018–2024)",subtitle="Each dotted path = one country × product · dot size/opacity increases with year (darker = 2024)",x="SHAP Feature Importance (mean |z|)",y="Exports % of GDP (log scale)",caption="Top 15 Solar exporters by cumulative exports % GDP · IQR outlier filter applied") +theme_minimal(base_family="Archivo", base_size=13) +theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN),plot.subtitle=element_text(size=12,color="#475569"),legend.position="right")```---## 8. Special Case: Solar — India {#solar-india}```{r solar-india}#| fig-width: 12#| fig-height: 8#| fig-cap: "India — Solar value chain SHAP vs Exports (% GDP), 2023."ind_solar <- master |>filter(iso3=="IND", tech=="Solar", exports_pct >0, is.finite(exports_pct))ind_solar_clean <-filter_outliers(ind_solar, "exports_pct")ts_map <-c(setNames(ifelse(names(TS_COLORS) %in%unique(ind_solar_clean$typestage), TS_COLORS[names(TS_COLORS)], TS_COLORS),names(TS_COLORS)), Unclassified="#94a3b8")SHAPE_MAP <-c(Machinery=16, Chemicals=17, Metals=15,Electronics=18, `Industrial Materials`=8, Other=3)ggplot(ind_solar_clean,aes(x=shap_mean_z, y=exports_pct, color=typestage, shape=category)) +geom_hline(yintercept=median(ind_solar_clean$exports_pct,na.rm=TRUE),linetype="dashed", color="#94a3b8", linewidth=0.5) +geom_smooth(method="lm", se=TRUE, formula=y~x, color=NZ_ORANGE,fill="#FBBF24", alpha=0.18, linewidth=1.2, show.legend=FALSE) +geom_point(size=3.5, alpha=0.82, stroke=0.4) +geom_text_repel(data=ind_solar_clean |>slice_max(abs(exports_pct), n=18),aes(label=paste0(hs_code," ",substr(prod_name,1,25))),size=3.0, lineheight=0.85, max.overlaps=16,segment.color="#94a3b8", segment.size=0.3, alpha=1,show.legend=FALSE, family="Archivo" ) +scale_color_manual(values=ts_map, name="Production Stage | Function", drop=TRUE) +scale_shape_manual(values=SHAPE_MAP[intersect(names(SHAPE_MAP),unique(ind_solar_clean$category))],name="SHAP Category", drop=TRUE) +scale_y_log10(labels=function(x) paste0(signif(x,3),"%")) +labs(title="India — Solar Value Chain: SHAP Feature Importance vs Export Intensity",subtitle=paste0("Year: ",YEAR," · Each point = one HS product code · ","Dashed line = median export intensity · ","Orange line = OLS fit (log-linear)"),x="SHAP Feature Importance (mean |z|)",y="Exports % of GDP (log scale)",caption="Source: BACI bilateral trade · NZIPL Green Dictionary · ML competitiveness model (Ishana Ratan, NZIPL)" ) +theme_minimal(base_family="Archivo", base_size=13) +theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN,margin=margin(b=6)),plot.subtitle=element_text(size=12,color="#475569",margin=margin(b=10)),plot.caption=element_text(size=9,color="#94a3b8",hjust=0),legend.position="right",panel.grid.major=element_line(color="#e2e8f0"),panel.grid.minor=element_blank() )```India's Solar scatter illustrates the **Processing → Final Product gradient** clearly: polysilicon and silicon wafer products (upstream, Raw Material · Upstream) cluster at low SHAP scores and low exports, while solar cell and module assembly inputs (Product Component · Downstream) are both higher-SHAP and higher-export. This is consistent with India's current industrial position — strong in downstream assembly, still building upstream processing capacity.