SHAP Feature Importance vs Trade Flows

Scatterplot Analysis · Regression Results · Random Effects · Key Findings

Author

Oriol Vallès Codina — Net Zero Industrial Policy Lab, Johns Hopkins SAIS

Published

April 10, 2026

Overview

This document analyses the relationship between SHAP feature importance — the predictive weight each HS product code receives in the ML clean-tech competitiveness model — and observed bilateral trade flows (exports, imports, net balance as % of GDP). The central question: do products the Random Forest identifies as predictive of competitiveness correspond to higher actual trade intensity?

The analysis covers the top 20 exporting economies, 10 clean technologies, and 189 HS product codes mapped through the NZIPL Green Dictionary. Models range from simple OLS by technology to hierarchical linear models (HLM) with random slopes and intercepts, controlling for product type, value-chain stage, country GDP, and time trend.

1. Scatterplot Gallery

All static PNG outputs from analysis/scatterplot/ organised by grouping.

Exports

Imports

Balance

2. Interactive Scatter Viewer

SHAP vs Exports — all technologies preview

SHAP Feature Importance vs Trade Flows — Interactive Scatterplot Viewer

↗ Open viewer · ~6 MB · exports / imports / balance × % of GDP / % of clean tech trade

How to navigate the viewer

Variable / Unit dropdowns: Exports / Imports / Balance × % of clean tech trade / % GDP / % total merchandise
Variable / Unit: Exports / Imports / Balance × % clean trade / % GDP / % total merchandise
Year range sliders: any window within 2018–2024; slide both to a single year for a snapshot
Country / Technology / Function / SHAP Category sidebar filters
Highlight Country: overlay a single country in amber for direct comparison

3. Key Findings

How to read the slope charts (Findings 2–3): The bar height is the OLS regression coefficient from log10(trade % GDP) ~ SHAP mean |z|. A positive slope means products the RF assigns higher importance to are also traded more intensively in practice — the ML model and actual trade are aligned. A negative slope means higher-SHAP products are traded less intensively. For Product Component, the strong positive slope () confirms that component-level supply chain capability is where the RF model’s predictions map most faithfully onto trade reality. For Final Product, the negative slope () is not a model failure — it reflects that final product HS codes (e.g., solar panels, wind turbines) all receive uniformly high* SHAP scores since they are adjacent to the competitive outcome the RF predicts. Actual final-product export intensity is driven by industrial policy, scale, and market access; the highest-SHAP final product codes are those most concentrated in 1–2 dominant exporters (China for PV, Denmark for wind), so among the other top-20 exporters that do export these products, SHAP and export intensity are negatively correlated. Slope magnitude: in log10 units, a slope of +0.15 corresponds to ~40% more exports per 1 SD increase in SHAP; a slope of −5 signals extreme concentration rather than a continuous gradient.

Finding E1 — SHAP importance predicts export intensity across most technologies

In 5 of 10 technology-level OLS regressions the SHAP slope on exports (% GDP) is statistically significant. The strongest fit is Heat Pumps (R² = 0.047, ***); the weakest is Geothermal (R² = 0, ns). This validates the ML model: products assigned high feature importance tend to be products in which top economies actually trade intensively.

Finding E2 — SHAP slope by product function

Product Components show the strongest positive slope (*): the ML model’s importance scores align most faithfully with trade reality at the component level. Final Products** show a significant negative slope — see interpretation note above.

Finding E3 — SHAP slope by value-chain stage

The SHAP–export relationship strengthens progressively from Upstream to Final Product stage, confirming that stage position captures a dimension of supply-chain position beyond product function alone.

Finding E4 — Function × stage interaction heatmap

Which specific cells in the value chain show the tightest ML–trade alignment for exports?

Finding E5 — High-SHAP products traded more intensively, gap widening

Finding I1 — SHAP predicts import intensity more strongly than exports

7 of 10 technology-level import regressions are significant vs 5 for exports. High-SHAP products are frequently capital goods — machinery and process equipment — that countries import to build production capacity before developing domestic export competitiveness.

Finding I2 — Import slope by product function

For imports, the function slope pattern differs from exports: Process Equipment often shows a stronger positive slope because capital goods used in manufacturing (the process chain) are imported even before export capacity develops.

Finding I3 — Import slope by stage

Upstream and midstream stages show stronger import–SHAP alignment than for exports, reflecting countries importing inputs they do not yet produce domestically.

Finding I4 — Function × stage interaction for imports

Finding B1 — SHAP predicts trade balance across technologies

A positive slope means countries with higher-SHAP products run a net surplus in those products — i.e., the ML model identifies products where competitive advantage translates into actual specialisation.

Finding B2 — Balance slope by product function

Finding B3 — Balance slope by stage

Finding B4 — Function × stage interaction for net balance

Cross-cutting: GDP size effect

Larger economies trade more in absolute terms, but after normalising by GDP the relationship is weak — confirming SHAP captures a structural capability signal independent of country scale.

4. Regression Analysis

3.1 Cross-Sectional OLS — by Technology

One OLS regression per technology, cross-section for 2023: log(trade % GDP) ~ SHAP mean |z|.

3.2 Hierarchical Linear Model — Cross-Sectional (2023)

The cross-sectional HLM pools all countries, technologies, and product types in a single model with the full covariate set. The model controls for product type, value-chain stage, and log-GDP, while allowing the SHAP slope to vary by technology (random slope) and the baseline trade level to vary by country (random intercept):

\[ \begin{aligned} \log(y_{ict}) &= \underbrace{\beta_0 + \beta_1 \cdot \text{shap}_i + \beta_2 \log\text{GDP}_c}_{\text{core predictors}} \\[6pt] &\quad + \underbrace{\sum_k \gamma_k \cdot \mathbf{1}[\text{type}_i = k] \;+\; \sum_s \delta_s \cdot \mathbf{1}[\text{stage}_i = s]}_{\text{product type \& stage fixed effects}} \\[6pt] &\quad + \underbrace{u_{0c} + u_{0j} + u_{1j} \cdot \text{shap}_i}_{\text{random effects}} \;+\; \varepsilon_{ict} \end{aligned} \]

where:

$i$ = HS product, $c$ = country, $j$ = technology
$\gamma_k$ = fixed effect for product type $k$ (Raw Material, Processed Material, Process Equipment, Product Component, Final Product)
$\delta_s$ = fixed effect for value-chain stage $s$ (Upstream, Midstream, Downstream)
$u_{0c} \sim \mathcal{N}(0,\sigma_c^2)$ = random country intercept — captures overall trade level not explained by SHAP or GDP
$(u_{0j}, u_{1j}) \sim \mathcal{N}(0, \Sigma_j)$ = random technology intercept + slope — captures how much the SHAP–trade relationship differs across value chains

HLM Fixed Effects — 2023 (controls: log GDP, product type, value-chain stage)
	Metric	Term	Estimate	SE	z	p (Wald)	Sig.
(Intercept)...1	exports	(Intercept)	-4.0352	0.6672	-6.05	<0.001	***
shap_mean_z...2	exports	shap_mean_z	1.2763	0.6421	1.99	0.047	*
log_gdp...3	exports	log_gdp	0.0274	0.0385	0.71	0.476	ns
typeElectronics...4	exports	typeElectronics	0.3530	0.2401	1.47	0.142	ns
typeFinal Product...5	exports	typeFinal Product	0.0188	0.1662	0.11	0.910	ns
typeMachinery...6	exports	typeMachinery	-0.1384	0.1746	-0.79	0.428	ns
typeMetals...7	exports	typeMetals	0.1604	0.1740	0.92	0.357	ns
typeProcess Equipment...8	exports	typeProcess Equipment	0.0871	0.1266	0.69	0.491	ns
typeProcessed Material...9	exports	typeProcessed Material	-0.1476	0.1319	-1.12	0.263	ns
typeProduct Component...10	exports	typeProduct Component	0.0690	0.1259	0.55	0.584	ns
typeRaw Material...11	exports	typeRaw Material	-0.5103	0.1316	-3.88	<0.001	***
stageMidstream...12	exports	stageMidstream	-0.0461	0.0418	-1.10	0.270	ns
(Intercept)...13	imports	(Intercept)	-2.8169	0.4738	-5.95	<0.001	***
shap_mean_z...14	imports	shap_mean_z	1.4519	0.6713	2.16	0.031	*
log_gdp...15	imports	log_gdp	-0.0375	0.0180	-2.08	0.038	*
typeElectronics...16	imports	typeElectronics	0.5066	0.2023	2.50	0.012	*
typeFinal Product...17	imports	typeFinal Product	-0.1857	0.1430	-1.30	0.194	ns
typeMachinery...18	imports	typeMachinery	-0.0630	0.1426	-0.44	0.659	ns
typeMetals...19	imports	typeMetals	0.1215	0.1465	0.83	0.407	ns
typeProcess Equipment...20	imports	typeProcess Equipment	0.1010	0.1036	0.97	0.330	ns
typeProcessed Material...21	imports	typeProcessed Material	-0.2018	0.1079	-1.87	0.062	ns
typeProduct Component...22	imports	typeProduct Component	0.0699	0.1031	0.68	0.498	ns
typeRaw Material...23	imports	typeRaw Material	-0.4325	0.1078	-4.01	<0.001	***
stageMidstream...24	imports	stageMidstream	-0.0448	0.0338	-1.33	0.184	ns
(Intercept)...25	balance	(Intercept)	-0.0161	0.0046	-3.53	<0.001	***
shap_mean_z...26	balance	shap_mean_z	-0.0076	0.0041	-1.84	0.065	ns
log_gdp...27	balance	log_gdp	0.0013	0.0003	4.90	<0.001	***
typeElectronics...28	balance	typeElectronics	-0.0050	0.0024	-2.04	0.041	*
typeFinal Product...29	balance	typeFinal Product	-0.0004	0.0018	-0.25	0.806	ns
typeMachinery...30	balance	typeMachinery	-0.0001	0.0018	-0.07	0.948	ns
typeMetals...31	balance	typeMetals	-0.0018	0.0018	-0.99	0.321	ns
typeProcess Equipment...32	balance	typeProcess Equipment	-0.0009	0.0013	-0.69	0.490	ns
typeProcessed Material...33	balance	typeProcessed Material	-0.0001	0.0013	-0.05	0.959	ns
typeProduct Component...34	balance	typeProduct Component	-0.0016	0.0013	-1.21	0.226	ns
typeRaw Material...35	balance	typeRaw Material	-0.0010	0.0014	-0.76	0.446	ns
stageMidstream...36	balance	stageMidstream	0.0000	0.0004	0.11	0.911	ns

Coefficient plot — Fixed effects

Fixed-effect coefficients from the cross-sectional HLM (exports outcome). Error bars = ±1.96 SE (95% CI). Green = positive & significant; orange = negative & significant; grey = ns.

Random slopes by technology

Random SHAP slopes by technology (from HLM exports model). Each bar = technology-specific deviation from the global slope. Techs with wider bars show more heterogeneous SHAP-trade relationships.

Random intercepts by country

Variance decomposition

Variance explained by each random-effects component. A large country share means country-level factors dominate unexplained trade variance.

Random effects variance components table

Random Effects Variance Components — cross-sectional HLM
Metric	Group	Var 1	Var 2	Variance	SD/Corr
exports	iso3	(Intercept)	NA	0.09185	0.3031
exports	tech	(Intercept)	NA	1.08154	1.0400
exports	tech	shap_mean_z	NA	3.20124	1.7892
exports	tech	(Intercept)	shap_mean_z	-1.82582	-0.9812
exports	Residual	NA	NA	0.58346	0.7638
imports	iso3	(Intercept)	NA	0.02095	0.1447
imports	tech	(Intercept)	NA	1.29452	1.1378
imports	tech	shap_mean_z	NA	3.85837	1.9643
imports	tech	(Intercept)	shap_mean_z	-2.19926	-0.9841
imports	Residual	NA	NA	0.50662	0.7118
balance	iso3	(Intercept)	NA	0.00000	0.0022
balance	tech	(Intercept)	NA	0.00003	0.0052
balance	tech	shap_mean_z	NA	0.00009	0.0096
balance	tech	(Intercept)	shap_mean_z	-0.00005	-0.9827
balance	Residual	NA	NA	0.00008	0.0089

3.3 Panel Data Model (2018–2024)

The panel HLM extends the cross-sectional model with a centred year trend and the SHAP×Year interaction, testing whether the SHAP–trade alignment has strengthened over time. Log-GDP is included as a time-varying covariate:

\[ \begin{aligned} \log(y_{ictj}) &= \beta_0 + \beta_1 \text{shap}_i + \beta_2 \tilde{t} + \beta_3 (\text{shap}_i \times \tilde{t}) + \beta_4 \log\text{GDP}_{ct} \\[6pt] &\quad + \sum_k \gamma_k \cdot \mathbf{1}[\text{type}_i=k] + \sum_s \delta_s \cdot \mathbf{1}[\text{stage}_i=s] \\[6pt] &\quad + u_{0c} + u_{0j} + u_{1j} \cdot \text{shap}_i + \varepsilon_{ictj} \end{aligned} \]

where $\tilde{t} = (t - \bar{t}) / \text{sd}(t)$ is standardised year; $\beta_3 > 0$ means the SHAP–trade link tightens over time.

Panel HLM Fixed Effects — 2018–2024 (controls: log GDP, type, stage)
	Metric	Term	Estimate	SE	z	p (Wald)	Sig.
(Intercept)...1	exports	Intercept	-3.4296	0.5586	-6.14	<0.001	***
shap_mean_z...2	exports	SHAP importance	1.0400	0.6172	1.69	0.092	ns
year_c...3	exports	Year (standardised)	-0.0299	0.0412	-0.73	0.468	ns
log_gdp...4	exports	log(GDP)	-0.0068	0.0306	-0.22	0.824	ns
typeElectronics...5	exports	Type: Electronics	0.1697	0.0802	2.12	0.034	*
typeFinal Product...6	exports	Type: Final Product	-0.0282	0.0642	-0.44	0.661	ns
typeMachinery...7	exports	Type: Machinery	-0.1305	0.0673	-1.94	0.052	ns
typeMetals...8	exports	Type: Metals	0.2182	0.0666	3.27	0.001	**
typeProcess Equipment...9	exports	Type: Process Equipment	0.0603	0.0493	1.22	0.221	ns
typeProcessed Material...10	exports	Type: Processed Material	-0.1352	0.0513	-2.64	0.008	**
typeProduct Component...11	exports	Type: Product Component	0.0989	0.0490	2.02	0.044	*
typeRaw Material...12	exports	Type: Raw Material	-0.5386	0.0513	-10.50	<0.001	***
stageMidstream...13	exports	Stage: Midstream	-0.0577	0.0160	-3.60	<0.001	***
shap_mean_z:year_c...14	exports	SHAP × Year	0.0381	0.0680	0.56	0.576	ns
(Intercept)...15	imports	Intercept	-2.4182	0.4737	-5.10	<0.001	***
shap_mean_z...16	imports	SHAP importance	1.1415	0.7345	1.55	0.120	ns
year_c...17	imports	Year (standardised)	-0.0423	0.0329	-1.28	0.199	ns
log_gdp...18	imports	log(GDP)	-0.0523	0.0145	-3.60	<0.001	***
typeElectronics...19	imports	Type: Electronics	0.1268	0.0645	1.97	0.049	*
typeFinal Product...20	imports	Type: Final Product	-0.3068	0.0531	-5.78	<0.001	***
typeMachinery...21	imports	Type: Machinery	-0.0909	0.0539	-1.69	0.091	ns
typeMetals...22	imports	Type: Metals	0.1048	0.0551	1.90	0.057	ns
typeProcess Equipment...23	imports	Type: Process Equipment	0.0530	0.0391	1.36	0.175	ns
typeProcessed Material...24	imports	Type: Processed Material	-0.2137	0.0407	-5.25	<0.001	***
typeProduct Component...25	imports	Type: Product Component	0.0441	0.0389	1.13	0.257	ns
typeRaw Material...26	imports	Type: Raw Material	-0.4148	0.0407	-10.20	<0.001	***
stageMidstream...27	imports	Stage: Midstream	-0.0426	0.0127	-3.35	<0.001	***
shap_mean_z:year_c...28	imports	SHAP × Year	0.0680	0.0543	1.25	0.210	ns
(Intercept)...29	balance	Intercept	-0.0162	0.0038	-4.28	<0.001	***
shap_mean_z...30	balance	SHAP importance	-0.0064	0.0038	-1.70	0.089	ns
year_c...31	balance	Year (standardised)	0.0003	0.0004	0.70	0.486	ns
log_gdp...32	balance	log(GDP)	0.0013	0.0002	5.57	<0.001	***
typeElectronics...33	balance	Type: Electronics	-0.0018	0.0008	-2.22	0.026	*
typeFinal Product...34	balance	Type: Final Product	-0.0007	0.0006	-1.14	0.253	ns
typeMachinery...35	balance	Type: Machinery	-0.0001	0.0007	-0.16	0.870	ns
typeMetals...36	balance	Type: Metals	-0.0011	0.0007	-1.55	0.121	ns
typeProcess Equipment...37	balance	Type: Process Equipment	-0.0007	0.0005	-1.44	0.149	ns
typeProcessed Material...38	balance	Type: Processed Material	-0.0004	0.0005	-0.84	0.402	ns
typeProduct Component...39	balance	Type: Product Component	-0.0011	0.0005	-2.36	0.018	*
typeRaw Material...40	balance	Type: Raw Material	-0.0011	0.0005	-2.24	0.025	*
stageMidstream...41	balance	Stage: Midstream	0.0000	0.0002	0.11	0.913	ns
shap_mean_z:year_c...42	balance	SHAP × Year	-0.0007	0.0007	-0.98	0.329	ns

SHAP × Year coefficient: A positive estimate confirms Finding 7 — the alignment between ML-predicted importance and actual trade intensity has strengthened since 2018. This can reflect (a) supply chains maturing and concentrating around economically efficient producers, (b) improving model fit over time, or (c) both.

Panel random effects

Panel Random Effects Variance Components
Metric	Group	Var 1	Var 2	Variance	SD/Corr
exports	iso3	(Intercept)	NA	0.09479	0.3079
exports	tech	(Intercept)	NA	1.26116	1.1230
exports	tech	shap_mean_z	NA	3.64581	1.9094
exports	tech	(Intercept)	shap_mean_z	-2.11484	-0.9863
exports	Residual	NA	NA	0.58521	0.7650
imports	iso3	(Intercept)	NA	0.01801	0.1342
imports	tech	(Intercept)	NA	1.79646	1.3403
imports	tech	shap_mean_z	NA	5.28997	2.3000
imports	tech	(Intercept)	shap_mean_z	-3.04866	-0.9889
imports	Residual	NA	NA	0.49805	0.7057
balance	iso3	(Intercept)	NA	0.00000	0.0022
balance	tech	(Intercept)	NA	0.00004	0.0061
balance	tech	shap_mean_z	NA	0.00013	0.0113
balance	tech	(Intercept)	shap_mean_z	-0.00007	-0.9860
balance	Residual	NA	NA	0.00007	0.0086

5. Key Statistical Highlights

OLS Cross-Sectional Fit (year 2023)

5 / 10 technology-level regressions significant for exports (p < 0.05); median R² = 0.011. Best: Heat Pumps (R² = 0.047, ***).
7 / 10 significant for imports; median R² = 0.019. Best: Heat Pumps (R² = 0.168, ***). Imports outperform exports in 6 / 10 technologies.
1 / 10 significant for net balance. Best: Heat Pumps (R² = 0.081, ***).

What this tells us: SHAP feature importance is a significant cross-sectional predictor of trade intensity in 5 of 10 export regressions and 7 of 10 import regressions, indicating the ML signal captures real patterns in observed trade specialisation rather than statistical noise. Technologies with high R² (e.g. Heat Pumps) show the tightest coupling between ML-predicted production-readiness and observed trade flows. Where import predictability equals or exceeds export predictability, SHAP is detecting countries that import precisely because domestic productive capability is absent — the signal encodes both the presence and the absence of competitive advantage.

HLM Fixed Effects — Exports (cross-sectional 2023)

SHAP importance: β = 1.2763 (*). A 1-SD increase in SHAP is associated with a 1789% change in exports % GDP.
log(GDP): β = 0.0274 (ns). Each doubling of GDP associates with a 2% increase in export intensity — GDP normalisation does not fully eliminate scale effects.
Significant product function effects: Raw Material (-0.510, ***).

What this tells us: The HLM estimates the SHAP coefficient after absorbing country random intercepts and technology random slopes, so β is not inflated by dominant exporters (e.g. China in solar, Germany in wind turbines). A positive SHAP β confirms that ML-predicted feature importance is a globally valid predictor of export intensity independent of who-you-are and which-value-chain-you-are-in. The persistent log(GDP) coefficient means scale effects are not fully removed by per-GDP normalisation — larger economies export disproportionately more on a relative basis. Significant product function and stage dummies reveal that certain value-chain positions carry structural trade premiums or penalties beyond what SHAP alone predicts.

Variance Decomposition (Exports HLM)

Country intercepts: 2.9% of total variance — country-level factors (policy, history, institutions) dominate unexplained variation.
Technology intercepts: 34.5% — baseline trade levels differ markedly across value chains.
Technology SHAP slopes: 102.2% — the SHAP–trade alignment itself varies across technologies (justifies random-slope specification).
Residual: 18.6% — within-country, within-technology product-level variation unexplained by any covariate.

What this tells us: The variance partition reveals what drives differences in trade intensity that SHAP cannot explain. Country intercepts absorbing the largest share (here 2.9%) means that national-level factors — industrial policy, infrastructure, historical path-dependence, institutional capacity — are the dominant residual driver of trade intensity differences, dwarfing product-level ML scores. Technology intercepts (34.5%) reflect that some value chains (solar, wind) are globally traded at large volumes while others (nuclear, geothermal) are structurally thinner markets. The random-slope variance for SHAP (102.2%) confirms that the SHAP→trade relationship is heterogeneous across technologies, justifying the flexible specification. The residual (18.6%) captures product-level noise that no macro-level covariate can reach.

Panel HLM — Temporal Dynamics (2018–2024)

SHAP importance (main): β = 1.0400 (ns).
Year trend: β = -0.0299 (ns) — trade intensity is falling over 2018–2024 conditional on SHAP.
SHAP × Year: β = 0.0381 (ns) — the SHAP–trade link is tightening: high-SHAP products are becoming more intensively traded relative to low-SHAP products year-on-year.

What this tells us: The panel model tests whether the SHAP–trade alignment has changed over 2018–2024, going beyond the cross-sectional snapshot. Clean technology trade intensity is overall contracting during this period (conditional on SHAP), reflecting the global acceleration in deployment and traded volumes. The SHAP × Year interaction is tightening: the relationship between ML-predicted production capability and observed trade intensity is tightening over time. A tightening interaction means the market is becoming progressively more meritocratic — countries with higher production-readiness are capturing growing trade shares as the energy transition accelerates. A loosening interaction would imply geopolitical fragmentation, industrial policy distortions, or early-mover lock-in are beginning to decouple competitiveness signals from actual trade outcomes.

6. Model Comparison (ANOVA / Likelihood Ratio Tests)

Nested model comparison via likelihood ratio test isolates the contribution of each additional component: SHAP only → + GDP → + type/stage → + random slope. A significant χ² indicates the added terms improve model fit beyond chance.

\[ \begin{aligned} M_0 &: y \sim 1 + (1 \mid \text{iso3}) & &\text{(null)} \\[4pt] M_1 &: y \sim \text{shap} + (1 \mid \text{iso3}) & &\text{(+ SHAP)} \\[4pt] M_2 &: y \sim \text{shap} + \log\text{GDP} + (1 \mid \text{iso3}) & &\text{(+ log GDP)} \\[4pt] M_3 &: y \sim \text{shap} + \log\text{GDP} + \text{type} + \text{stage} + (1 \mid \text{iso3}) & &\text{(+ type \& stage FE)} \\[4pt] M_4 &: y \sim \text{shap} + \log\text{GDP} + \text{type} + \text{stage} + (1 + \text{shap} \mid \text{tech}) + (1 \mid \text{iso3}) & &\text{(full HLM)} \end{aligned} \]

Model Comparison — Likelihood Ratio Tests (Exports % GDP, cross-sectional 2023)
	Model	Df	npar	AIC	BIC	logLik	Chisq	p-value
m0	M0: Null	NA	3	7516.1	7534.2	-3755.1	NA	—
m1	M1: +SHAP	1	4	7497.3	7521.4	-3744.6	20.87	<0.001
m2	M2: +logGDP	1	5	7499.2	7529.4	-3744.6	0.03	0.852
m3	M3: +Type/Stage	9	14	7274.5	7358.8	-3623.2	242.79	<0.001
m4	M4: +RandSlope	3	17	7140.1	7242.5	-3553.1	140.34	<0.001

Interpreting the ANOVA table: Each row tests whether adding the new terms significantly improves fit over the previous model. Significant χ² for M1 validates SHAP as a predictor; M2 shows whether GDP adds explanatory power on top of SHAP; M3 tests whether product type/stage structure matters beyond GDP; M4 tests whether allowing the SHAP slope to vary by technology (random slope) is warranted by the data.

7. Time Trajectories

SHAP feature importance is fixed per (product × technology) — it reflects average predictive weight, not a time-varying signal. Trajectories therefore appear as vertical paths in the scatter: horizontal position (SHAP) is pinned, vertical position (trade intensity) moves year to year.

Solar — exports % GDP trajectories (2018–2024). Each path = one country × product; dot size/opacity increases with year.

8. Special Case: Solar — India

India — Solar value chain SHAP vs Exports (% GDP), 2023.

India’s Solar scatter illustrates the Processing → Final Product gradient clearly: polysilicon and silicon wafer products (upstream, Raw Material · Upstream) cluster at low SHAP scores and low exports, while solar cell and module assembly inputs (Product Component · Downstream) are both higher-SHAP and higher-export. This is consistent with India’s current industrial position — strong in downstream assembly, still building upstream processing capacity.

--- title: "SHAP Feature Importance vs Trade Flows" subtitle: "Scatterplot Analysis · Regression Results · Random Effects · Key Findings" author: "Oriol Vallès Codina — Net Zero Industrial Policy Lab, Johns Hopkins SAIS" date: today format: html: theme: cosmo toc: true toc-depth: 3 toc-location: left toc-title: "Contents" number-sections: false self-contained: true page-layout: full fig-width: 10 fig-height: 6 code-fold: true code-tools: true grid: sidebar-width: 260px body-width: 900px include-in-header: text: | <link rel="preconnect" href="https://fonts.googleapis.com"> <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin> <link href="https://fonts.googleapis.com/css2?family=Archivo:ital,wght@0,300;0,400;0,600;0,700;1,400&display=swap" rel="stylesheet"> <style> body, p, li, td, th, h1, h2, h3, h4, h5, h6, .toc-title, #TOC, .sidebar-title, blockquote, pre, .caption, input, select, button, label { font-family: 'Archivo', sans-serif !important; } /* ── Remove right margin band ── */ .page-columns .column-margin, .margin-column { display: none !important; } .page-layout-full #quarto-content { padding-right: 0 !important; } /* ── Equations: horizontal scroll on overflow ── */ .math.display, mjx-container[display="true"] { overflow-x: auto; max-width: 100%; } </style> execute-dir: project execute: echo: false warning: false message: false --- ```{r setup} library(arrow); library(dplyr); library(readr); library(tidyr) library(ggplot2); library(ggrepel); library(forcats) library(showtext); library(patchwork) library(knitr); library(kableExtra) has_lme4 <- requireNamespace("lme4", quietly=TRUE) if (has_lme4) library(lme4, warn.conflicts=FALSE) library(htmltools) font_add_google("Archivo", "Archivo") showtext_auto() ROOT <- here::here() YEAR <- 2023L TOP_N <- 20L IQR_MULT <- 3 NZ_GREEN <- "#073309" NZ_ORANGE <- "#CB5B1D" TS_COLORS <- c( "Raw Material — Upstream" = "#8A4200", "Processed Material — Upstream" = "#C17A2E", "Processed Material — Midstream" = "#073309", "Process Equipment — Midstream" = "#5B7266", "Product Component — Midstream" = "#1A8C28", "Product Component — Downstream" = "#34D045", "Process Equipment — Downstream" = "#6B7B8C", "Final Product — Final Product" = "#CB5B1D" ) # ── Load metadata ───────────────────────────────────────────────────────────── feat <- read_csv(file.path(ROOT,"data","pc","pc_features.csv"), show_col_types=FALSE) |> filter(!is.na(hs_code)) |> mutate(hs_code = formatC(as.integer(hs_code), width=6, flag="0")) |> select(tech, hs_code, prod_name=description, category, shap_mean_z) ctry <- read_csv(file.path(ROOT,"data","pc","pc_countries.csv"), show_col_types=FALSE) |> select(iso3, country, region) gd_meta <- read_csv(file.path(ROOT,"data","green_dict","green_dictionary.csv"), show_col_types=FALSE) |> mutate(code = formatC(as.integer(code), width=6, flag="0")) |> distinct(code, tech, type, stage) |> mutate(typestage = paste0(type, " \u2014 ", stage)) # ── GDP ─────────────────────────────────────────────────────────────────────── gdp_path <- c(file.path(ROOT,"data","gdp_long.csv"), file.path(path.expand("~"),"Documents","R","NZIPL", "NetZeroValueChainExplorer","data","gdp_long.csv")) gdp_path <- gdp_path[file.exists(gdp_path)][1] gdp <- read_csv(gdp_path, show_col_types=FALSE) |> filter(indicator=="GDP-VAL", year==YEAR) |> select(iso3=country_iso3, gdp_mil_usd=value) gdp_panel_raw <- read_csv(gdp_path, show_col_types=FALSE) |> filter(indicator=="GDP-VAL", year >= 2018) |> select(iso3=country_iso3, year, gdp_mil_usd=value) # ── Population ──────────────────────────────────────────────────────────────── pop_path <- file.path(path.expand("~"),"Documents","R","NZIPL", "NetZeroValueChainExplorer","data","worldbank_population.csv") pop_long <- if (file.exists(pop_path)) { tryCatch({ pw <- read_csv(pop_path, show_col_types=FALSE, skip=3) |> select(iso3=`Country Code`, matches("^[0-9]{4}$")) |> pivot_longer(-iso3, names_to="year", values_to="pop") |> mutate(year=as.integer(year), pop_mil=pop/1e6) pw }, error=function(e) NULL) } else NULL # ── Trade bilateral cache ───────────────────────────────────────────────────── pc_isos <- ctry$iso3 pc_codes <- unique(feat$hs_code) bil <- open_dataset(file.path(ROOT,"cache","bilateral_ds")) exp_df <- bil |> filter(exp_iso3 %in% pc_isos, product_code %in% pc_codes, year==YEAR) |> group_by(iso3=exp_iso3, hs_code=product_code) |> summarise(exports_kusd=sum(value,na.rm=TRUE), .groups="drop") |> collect() imp_df <- bil |> filter(imp_iso3 %in% pc_isos, product_code %in% pc_codes, year==YEAR) |> group_by(iso3=imp_iso3, hs_code=product_code) |> summarise(imports_kusd=sum(value,na.rm=TRUE), .groups="drop") |> collect() trade <- full_join(exp_df, imp_df, by=c("iso3","hs_code")) |> mutate(exports_kusd=coalesce(exports_kusd,0L), imports_kusd=coalesce(imports_kusd,0L), balance_kusd=exports_kusd-imports_kusd) |> left_join(gdp, by="iso3") |> mutate(exports_pct = if_else(gdp_mil_usd>0, exports_kusd*0.1/gdp_mil_usd, NA_real_), imports_pct = if_else(gdp_mil_usd>0, imports_kusd*0.1/gdp_mil_usd, NA_real_), balance_pct = if_else(gdp_mil_usd>0, balance_kusd*0.1/gdp_mil_usd, NA_real_)) master <- trade |> filter(exports_kusd>0 | imports_kusd>0) |> inner_join(feat, by="hs_code", relationship="many-to-many") |> left_join(gd_meta, by=c("hs_code"="code","tech"), relationship="many-to-many") |> left_join(ctry, by="iso3") |> mutate( typestage = if_else(typestage %in% names(TS_COLORS), typestage, "Unclassified"), type = coalesce(type, category), stage = coalesce(stage, "n/a") ) |> filter(!is.na(country), !is.na(shap_mean_z)) # add log GDP master <- master |> mutate(log_gdp = if_else(!is.na(gdp_mil_usd) & gdp_mil_usd > 0, log(gdp_mil_usd), NA_real_)) top20_isos <- master |> group_by(iso3) |> summarise(tot=sum(exports_kusd),.groups="drop") |> slice_max(tot,n=TOP_N) |> pull(iso3) df_top <- master |> filter(iso3 %in% top20_isos) filter_outliers <- function(df, y_col, mult = IQR_MULT) { v <- df[[y_col]]; v_f <- v[is.finite(v)] if (length(v_f) < 8) return(df) q25 <- quantile(v_f,.25); q75 <- quantile(v_f,.75); iqr <- q75-q25 if (iqr < 1e-12) { mu <- mean(v_f); sg <- sd(v_f) lower <- mu - mult*3*sg; upper <- mu + mult*3*sg } else { lower <- q25-mult*iqr; upper <- q75+mult*iqr } df |> filter(is.na(.data[[y_col]]) | (is.finite(.data[[y_col]]) & .data[[y_col]] >= lower & .data[[y_col]] <= upper)) } # ── Panel data 2018–2024 ────────────────────────────────────────────────────── exp_panel <- bil |> filter(exp_iso3 %in% top20_isos, product_code %in% pc_codes, year >= 2018) |> group_by(iso3=exp_iso3, hs_code=product_code, year) |> summarise(exports_kusd=sum(value,na.rm=TRUE), .groups="drop") |> collect() imp_panel <- bil |> filter(imp_iso3 %in% top20_isos, product_code %in% pc_codes, year >= 2018) |> group_by(iso3=imp_iso3, hs_code=product_code, year) |> summarise(imports_kusd=sum(value,na.rm=TRUE), .groups="drop") |> collect() trade_panel <- full_join(exp_panel, imp_panel, by=c("iso3","hs_code","year")) |> mutate(exports_kusd=coalesce(exports_kusd,0), imports_kusd=coalesce(imports_kusd,0), balance_kusd=exports_kusd-imports_kusd) |> left_join(gdp_panel_raw, by=c("iso3","year")) |> mutate(exports_pct = if_else(gdp_mil_usd>0, exports_kusd*0.1/gdp_mil_usd, NA_real_), imports_pct = if_else(gdp_mil_usd>0, imports_kusd*0.1/gdp_mil_usd, NA_real_), balance_pct = if_else(gdp_mil_usd>0, balance_kusd*0.1/gdp_mil_usd, NA_real_)) panel_master <- trade_panel |> filter(exports_kusd>0 | imports_kusd>0) |> inner_join(feat, by="hs_code", relationship="many-to-many") |> left_join(gd_meta, by=c("hs_code"="code","tech"), relationship="many-to-many") |> left_join(ctry, by="iso3") |> mutate( typestage = if_else(typestage %in% names(TS_COLORS), typestage, "Unclassified"), type = coalesce(type, category), stage = coalesce(stage, "n/a"), year_c = scale(year)[,1], log_gdp = if_else(!is.na(gdp_mil_usd) & gdp_mil_usd > 0, log(gdp_mil_usd), NA_real_) ) |> filter(!is.na(country), !is.na(shap_mean_z)) # add population to panel if available if (!is.null(pop_long)) { panel_master <- panel_master |> left_join(pop_long |> select(iso3, year, pop_mil), by=c("iso3","year")) |> mutate(log_pop = if_else(!is.na(pop_mil) & pop_mil > 0, log(pop_mil), NA_real_)) } ``` ## Overview This document analyses the relationship between **SHAP feature importance** — the predictive weight each HS product code receives in the ML clean-tech competitiveness model — and **observed bilateral trade flows** (exports, imports, net balance as % of GDP). The central question: *do products the Random Forest identifies as predictive of competitiveness correspond to higher actual trade intensity?* The analysis covers the top **`r TOP_N` exporting economies**, **10 clean technologies**, and **`r n_distinct(master$hs_code)` HS product codes** mapped through the NZIPL Green Dictionary. Models range from simple OLS by technology to hierarchical linear models (HLM) with random slopes and intercepts, controlling for product type, value-chain stage, country GDP, and time trend. --- ## 1. Scatterplot Gallery {#gallery} All static PNG outputs from `analysis/scatterplot/` organised by grouping. ```{r gallery} #| results: asis #| echo: false BASE <- "../analysis/scatterplot" techs <- c("batteries","biofuel","electrolyzers","geothermal","heat_pumps", "magnets","nuclear","solar","transmission","wind") tech_labels <- c("Batteries","Biofuel","Electrolyzers","Geothermal","Heat Pumps", "Magnets","Nuclear","Solar","Transmission","Wind") metrics <- c("exports","imports","balance") metric_labels <- c("Exports","Imports","Balance") countries <- c("chl","ind","kor","mys","vnm") country_labels <- c("Chile","India","Korea","Malaysia","Vietnam") img <- function(path, caption="", width="100%") { if (file.exists(file.path(ROOT, sub("^\\.\\./","",path)))) cat(sprintf('\n<figure style="margin:0 0 16px 0;">\n<img src="%s" style="width:%s;border-radius:4px;" alt="%s">\n%s\n</figure>\n', path, width, caption, if (nzchar(caption)) sprintf('<figcaption style="font-size:.85rem;color:#64748b;margin-top:4px;">%s</figcaption>', caption) else "")) } grid_open <- function() cat('\n<div style="display:grid;grid-template-columns:1fr 1fr;gap:20px;align-items:start;">\n') grid_close <- function() cat('\n</div>\n') cat("\n::: {.panel-tabset}\n") cat("\n## All Technologies\n") for (i in seq_along(metrics)) { cat(sprintf("\n### %s\n", metric_labels[i])) img(file.path(BASE,"all_techs",sprintf("pc_%s_all_techs_top20.png", metrics[i])), sprintf("%s — all technologies · top 20 exporters", metric_labels[i])) } cat("\n## By Technology\n") cat("\n::: {.panel-tabset}\n") for (mi in seq_along(metrics)) { cat(sprintf("\n## %s\n", metric_labels[mi])) grid_open() for (ti in seq_along(techs)) img(file.path(BASE,"by_tech",sprintf("pc_%s_scatter_%s.png", metrics[mi], techs[ti])), tech_labels[ti]) grid_close() } cat("\n:::\n") cat("\n## By Country\n") cat("\n::: {.panel-tabset}\n") for (ci in seq_along(countries)) { cat(sprintf("\n## %s\n", country_labels[ci])) grid_open() for (mi in seq_along(metrics)) img(file.path(BASE,"by_country",sprintf("pc_%s_scatter_%s.png", metrics[mi], countries[ci])), metric_labels[mi]) grid_close() } cat("\n:::\n") cat("\n## Regression Tables\n") for (tf in list( list(p=file.path(BASE,"tables","regression_summary.png"), c="Combined regression summary"), list(p=file.path(BASE,"tables","reg_table_exports_all_techs.png"),c="OLS — Exports"), list(p=file.path(BASE,"tables","reg_table_imports_all_techs.png"),c="OLS — Imports"), list(p=file.path(BASE,"tables","reg_table_balance_all_techs.png"),c="OLS — Balance"), list(p=file.path(BASE,"tables","hlm_fixed_effects.png"), c="HLM fixed effects") )) img(tf$p, tf$c) cat("\n:::\n") ``` --- ## 2. Interactive Scatter Viewer {#scatter-viewer} ```{r scatter-link} #| results: asis scatter_html <- file.path(ROOT, "analysis", "scatterplot", "pc_scatter.html") scatter_thumb <- file.path(ROOT, "analysis", "scatterplot", "all_techs", "pc_exports_all_techs_top20.png") scatter_href <- "pc_scatter.html" # relative to final output at analysis/scatterplot/ img_el <- if (file.exists(scatter_thumb)) { tags$img( src = knitr::image_uri(scatter_thumb), alt = "SHAP vs Exports — all technologies preview", style = "width:100%;height:240px;object-fit:cover;display:block;border-bottom:1px solid #dde;") } else { tags$div( style = "height:240px;background:#eef4f0;display:flex;align-items:center;justify-content:center;", tags$span(style = "font-size:13px;color:#999;font-family:Archivo,sans-serif;", "\u23F3 Thumbnail not available")) } ef <- file.exists(scatter_html) card <- tags$a( href = if (ef) scatter_href else "#", target = "_blank", style = paste0("text-decoration:none;color:inherit;display:block;max-width:680px;", if (!ef) "pointer-events:none;" else ""), tags$div( style = "border:1px solid #dde;border-radius:8px;overflow:hidden; box-shadow:0 3px 10px rgba(0,0,0,.10);margin:8px 0 16px;", tags$div(style = "background:#073309;height:4px;"), img_el, tags$div( style = "padding:12px 16px;", tags$p("SHAP Feature Importance vs Trade Flows — Interactive Scatterplot Viewer", style = "margin:0;font-size:.95rem;font-weight:700;color:#1e293b; font-family:Archivo,sans-serif;"), tags$p(if (ef) "\u2197 Open viewer · ~6 MB · exports / imports / balance \u00d7 % of GDP / % of clean tech trade" else "\u23F3 File not found — run pc_scatter.R to generate", style = "margin:5px 0 0;font-size:.8rem;color:#64748b;font-family:Archivo,sans-serif;")))) tagList(card) ``` ::: {.callout-tip title="How to navigate the viewer"} - **Variable / Unit dropdowns**: Exports / Imports / Balance × % of clean tech trade / % GDP / % total merchandise - **Variable / Unit**: Exports / Imports / Balance × % clean trade / % GDP / % total merchandise - **Year range sliders**: any window within 2018–2024; slide both to a single year for a snapshot - **Country / Technology / Function / SHAP Category** sidebar filters - **Highlight Country**: overlay a single country in amber for direct comparison ::: --- ## 3. Key Findings {#key-findings} ```{r ols-prep} reg_stats <- function(df, y_col, group_col="tech") { df |> filter(!is.na(.data[[y_col]]), !is.na(shap_mean_z), is.finite(.data[[y_col]])) |> group_by(.data[[group_col]]) |> filter(n() >= 5) |> group_modify(~ { fit <- tryCatch(lm(reformulate("shap_mean_z", y_col), data=.x), error=function(e) NULL) if (is.null(fit)) return(tibble(n=nrow(.x), slope=NA, intercept=NA, r2=NA, pval=NA)) s <- summary(fit); cf <- coef(s); f <- s$fstatistic if (!"shap_mean_z" %in% rownames(cf)) return(tibble(n=nrow(fit$model), slope=NA, intercept=NA, r2=NA, pval=NA)) pv <- if (!is.null(f)) pf(f[1],f[2],f[3],lower.tail=FALSE) else NA tibble(n=nrow(fit$model), slope=cf["shap_mean_z","Estimate"], intercept=cf["(Intercept)","Estimate"], r2=s$r.squared, pval=pv) }) |> ungroup() |> rename(group=1) |> mutate(p_label=case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns")) } df_exp <- df_top |> group_by(tech) |> group_modify(~filter_outliers(.x,"exports_pct")) |> ungroup() df_imp <- df_top |> group_by(tech) |> group_modify(~filter_outliers(.x,"imports_pct")) |> ungroup() df_bal <- df_top |> group_by(tech) |> group_modify(~filter_outliers(.x,"balance_pct")) |> ungroup() rdf_exp <- reg_stats(df_exp |> filter(exports_pct>0), "exports_pct") rdf_imp <- reg_stats(df_imp |> filter(imports_pct>0), "imports_pct") rdf_bal <- reg_stats(df_bal, "balance_pct") n_sig_exp <- sum(rdf_exp$p_label %in% c("*","**","***"), na.rm=TRUE) n_sig_imp <- sum(rdf_imp$p_label %in% c("*","**","***"), na.rm=TRUE) best_r2 <- rdf_exp |> slice_max(r2, n=1, with_ties=FALSE) worst_r2 <- rdf_exp |> slice_min(r2, n=1, with_ties=FALSE) type_order <- c("Raw Material","Processed Material","Process Equipment", "Product Component","Final Product") stage_order <- c("Upstream","Midstream","Downstream","Final Product") type_pal <- c("Raw Material"="#8A4200","Processed Material"="#C17A2E", "Process Equipment"="#5B7266","Product Component"="#1A8C28", "Final Product"="#CB5B1D") stage_pal <- c("Upstream"="#8A4200","Midstream"="#073309", "Downstream"="#1A8C28","Final Product"="#CB5B1D") # Generalised slope-by-group computation compute_slope_by <- function(df, y_col, group_col, levels_vec, use_log=TRUE) { df_f <- df |> filter(!is.na(.data[[y_col]]), is.finite(.data[[y_col]]), !is.na(.data[[group_col]]), .data[[group_col]] %in% levels_vec) if (use_log) df_f <- df_f |> filter(.data[[y_col]] > 0) df_f |> group_by(.data[[group_col]]) |> filter(n() >= 10) |> group_modify(~ { y_v <- if (use_log) log10(.x[[y_col]]) else .x[[y_col]] fit <- tryCatch(lm(y_v ~ shap_mean_z, data=.x), error=function(e) NULL) if (is.null(fit)) return(tibble(slope=NA_real_,se=NA_real_,pval=NA_real_,n=nrow(.x))) cf <- coef(summary(fit)) if (!"shap_mean_z" %in% rownames(cf)) return(tibble(slope=NA_real_,se=NA_real_,pval=NA_real_,n=nrow(.x))) tibble(slope=cf["shap_mean_z","Estimate"], se=cf["shap_mean_z","Std. Error"], pval=cf["shap_mean_z","Pr(>|t|)"], n=nrow(.x)) }) |> ungroup() |> rename(group=1) |> mutate(sig=case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns"), group=factor(group, levels=levels_vec)) } # Pre-compute slopes for all metrics st_type_exp <- compute_slope_by(df_exp,"exports_pct","type", type_order) st_type_imp <- compute_slope_by(df_imp,"imports_pct","type", type_order) st_type_bal <- compute_slope_by(df_bal,"balance_pct","type", type_order, use_log=FALSE) st_stage_exp <- compute_slope_by(df_exp,"exports_pct","stage",stage_order) st_stage_imp <- compute_slope_by(df_imp,"imports_pct","stage",stage_order) st_stage_bal <- compute_slope_by(df_bal,"balance_pct","stage",stage_order, use_log=FALSE) st_ts_exp <- compute_slope_by(df_exp,"exports_pct","typestage",names(TS_COLORS)) st_ts_imp <- compute_slope_by(df_imp,"imports_pct","typestage",names(TS_COLORS)) st_ts_bal <- compute_slope_by(df_bal,"balance_pct","typestage",names(TS_COLORS), use_log=FALSE) # Plotting helpers slope_bar <- function(slopes, pal, title, y_lbl="OLS slope") { slopes |> filter(!is.na(slope)) |> ggplot(aes(x=group, y=slope, fill=group)) + geom_col(width=0.65) + geom_errorbar(aes(ymin=slope-se,ymax=slope+se),width=0.25,color="#475569") + geom_text(aes(label=sig, y=slope+sign(slope)*se+sign(slope)*0.02), size=5,family="Archivo",color="#1e293b") + geom_hline(yintercept=0,linewidth=0.5,color="#94a3b8") + scale_fill_manual(values=pal, guide="none") + labs(title=title, subtitle="OLS slope · ±1 SE · year 2023 · *** p<0.001 ** p<0.01 * p<0.05", x=NULL, y=y_lbl) + theme_minimal(base_family="Archivo",base_size=15) + theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=12,color="#475569"), axis.text.x=element_text(size=13), panel.grid.major.x=element_blank(),panel.grid.minor=element_blank()) } ts_heatmap <- function(df, y_col, title, use_log=TRUE) { d <- df |> filter(!is.na(.data[[y_col]]),is.finite(.data[[y_col]]), !is.na(type),type %in% type_order,!is.na(stage),stage %in% stage_order) if (use_log) d <- d |> filter(.data[[y_col]]>0) slopes <- d |> group_by(type,stage) |> filter(n()>=10) |> group_modify(~{ y_v <- if (use_log) log10(.x[[y_col]]) else .x[[y_col]] fit <- tryCatch(lm(y_v~shap_mean_z,data=.x),error=function(e) NULL) if (is.null(fit)) return(tibble(slope=NA_real_)) cf <- coef(summary(fit)) if (!"shap_mean_z"%in%rownames(cf)) return(tibble(slope=NA_real_)) tibble(slope=cf["shap_mean_z","Estimate"]) }) |> ungroup() |> mutate(type=factor(type,levels=type_order),stage=factor(stage,levels=stage_order)) full_grid <- tidyr::expand_grid( type=factor(type_order,levels=type_order), stage=factor(stage_order,levels=stage_order) ) |> left_join(slopes,by=c("type","stage")) ggplot(full_grid,aes(x=stage,y=type,fill=slope)) + geom_tile(color="white",linewidth=0.8) + geom_text(aes(label=if_else(!is.na(slope),sprintf("%.2f",slope),"–")), size=4.5,family="Archivo",color="white",fontface="bold") + scale_fill_gradient2(low="#CB5B1D",mid="#f1f5f9",high=NZ_GREEN, midpoint=0,na.value="#e2e8f0",name="OLS slope") + scale_x_discrete(position="top") + labs(title=title, subtitle="Each cell = OLS slope · log10(trade % GDP) ~ SHAP · grey/dash = <10 obs", x="Stage",y="Product Function") + theme_minimal(base_family="Archivo",base_size=15) + theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=12,color="#475569"), axis.text=element_text(size=13),panel.grid=element_blank()) } ``` **How to read the slope charts (Findings 2–3):** The bar height is the OLS regression coefficient from `log10(trade % GDP) ~ SHAP mean |z|`. A **positive slope** means products the RF assigns higher importance to are also traded more intensively in practice — the ML model and actual trade are aligned. A **negative slope** means higher-SHAP products are traded *less* intensively. For **Product Component**, the strong positive slope (***) confirms that component-level supply chain capability is where the RF model's predictions map most faithfully onto trade reality. For **Final Product**, the negative slope (**) is not a model failure — it reflects that final product HS codes (e.g., solar panels, wind turbines) all receive *uniformly high* SHAP scores since they are adjacent to the competitive outcome the RF predicts. Actual final-product export intensity is driven by industrial policy, scale, and market access; the highest-SHAP final product codes are those most concentrated in 1–2 dominant exporters (China for PV, Denmark for wind), so among the other top-20 exporters that do export these products, SHAP and export intensity are negatively correlated. **Slope magnitude**: in log10 units, a slope of +0.15 corresponds to ~40% more exports per 1 SD increase in SHAP; a slope of −5 signals extreme concentration rather than a continuous gradient. --- ::: {.panel-tabset} ## Exports ### Finding E1 — SHAP importance predicts export intensity across most technologies {#f1} In **`r n_sig_exp`** of `r nrow(rdf_exp)` technology-level OLS regressions the SHAP slope on exports (% GDP) is statistically significant. The strongest fit is **`r best_r2$group`** (R² = `r round(best_r2$r2,3)`, `r best_r2$p_label`); the weakest is `r worst_r2$group` (R² = `r round(worst_r2$r2,3)`, `r worst_r2$p_label`). This validates the ML model: products assigned high feature importance tend to be products in which top economies actually trade intensively. ```{r f1-plot} #| fig-width: 10 #| fig-height: 4.5 r2_bar <- function(rdf, title) { rdf |> mutate(group=fct_reorder(group,r2), fill=if_else(p_label%in%c("*","**","***"),NZ_GREEN,"#94a3b8"), label=paste0("R²=",round(r2,3)," ",p_label)) |> ggplot(aes(x=r2,y=group,fill=fill)) + geom_col(width=0.65) + geom_text(aes(label=label,x=r2+0.001),hjust=0,size=4.5,family="Archivo",color="#1e293b") + scale_fill_identity() + scale_x_continuous(expand=expansion(mult=c(0,.2)),labels=function(x)sprintf("%.3f",x)) + labs(title=title,subtitle="Cross-sectional OLS · year 2023 · top-20 exporting economies", x="R²",y=NULL,caption="Green = p<0.05 · Grey = not significant") + theme_minimal(base_family="Archivo",base_size=15) + theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=12,color="#475569"), panel.grid.major.y=element_blank(),panel.grid.minor=element_blank()) } r2_bar(rdf_exp, "E1: R² by Technology — Exports % GDP ~ SHAP mean |z|") ``` ### Finding E2 — SHAP slope by product function {#fe2} **Product Components** show the strongest positive slope (***): the ML model's importance scores align most faithfully with trade reality at the component level. **Final Products** show a significant *negative* slope — see interpretation note above. ```{r fe2-plot} #| fig-width: 10 #| fig-height: 4.5 slope_bar(st_type_exp, type_pal, "E2: SHAP→Exports Slope by Product Function") ``` ### Finding E3 — SHAP slope by value-chain stage {#fe3} The SHAP–export relationship strengthens progressively from Upstream to Final Product stage, confirming that stage position captures a dimension of supply-chain position beyond product function alone. ```{r fe3-plot} #| fig-width: 9 #| fig-height: 4.5 slope_bar(st_stage_exp, stage_pal, "E3: SHAP→Exports Slope by Value-Chain Stage") ``` ### Finding E4 — Function × stage interaction heatmap {#fe4} Which specific cells in the value chain show the tightest ML–trade alignment for exports? ```{r fe4-plot} #| fig-width: 10 #| fig-height: 5 ts_heatmap(df_exp, "exports_pct", "E4: SHAP→Exports Slope — Product Function × Stage") ``` ### Finding E5 — High-SHAP products traded more intensively, gap widening {#fe5} ```{r fe5-plot} #| fig-width: 10 #| fig-height: 5 shap_q75 <- quantile(panel_master$shap_mean_z, 0.75, na.rm=TRUE) shap_q25 <- quantile(panel_master$shap_mean_z, 0.25, na.rm=TRUE) panel_master |> filter(exports_pct>0,is.finite(exports_pct),iso3%in%top20_isos) |> mutate(shap_grp=case_when(shap_mean_z>=shap_q75~"High SHAP",shap_mean_z<=shap_q25~"Low SHAP",TRUE~NA_character_)) |> filter(!is.na(shap_grp)) |> group_by(year,shap_grp) |> summarise(med=median(exports_pct,na.rm=TRUE),.groups="drop") |> ggplot(aes(x=year,y=med,color=shap_grp,group=shap_grp)) + geom_line(linewidth=1.2)+geom_point(size=2.5) + scale_y_log10(labels=function(x)paste0(signif(x,2),"%")) + scale_color_manual(values=c("High SHAP"=NZ_GREEN,"Low SHAP"=NZ_ORANGE),name=NULL) + labs(title="E5: High-SHAP Products Traded More Intensively — Gap Widens", subtitle="Median exports % GDP · top/bottom SHAP quartile · 2018–2024", x=NULL,y="Median exports % GDP (log)",caption="top-20 economies") + theme_minimal(base_family="Archivo",base_size=15) + theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=12,color="#475569"),legend.position="top",panel.grid.minor=element_blank()) ``` ## Imports ### Finding I1 — SHAP predicts import intensity more strongly than exports {#fi1} **`r n_sig_imp`** of `r nrow(rdf_imp)` technology-level import regressions are significant vs **`r n_sig_exp`** for exports. High-SHAP products are frequently capital goods — machinery and process equipment — that countries *import* to build production capacity before developing domestic export competitiveness. ```{r fi1-plot} #| fig-width: 10 #| fig-height: 4.5 r2_bar(rdf_imp, "I1: R² by Technology — Imports % GDP ~ SHAP mean |z|") ``` ### Finding I2 — Import slope by product function {#fi2} For imports, the function slope pattern differs from exports: **Process Equipment** often shows a stronger positive slope because capital goods used *in manufacturing* (the process chain) are imported even before export capacity develops. ```{r fi2-plot} #| fig-width: 10 #| fig-height: 4.5 slope_bar(st_type_imp, type_pal, "I2: SHAP→Imports Slope by Product Function") ``` ### Finding I3 — Import slope by stage {#fi3} Upstream and midstream stages show stronger import–SHAP alignment than for exports, reflecting countries importing inputs they do not yet produce domestically. ```{r fi3-plot} #| fig-width: 9 #| fig-height: 4.5 slope_bar(st_stage_imp, stage_pal, "I3: SHAP→Imports Slope by Value-Chain Stage") ``` ### Finding I4 — Function × stage interaction for imports {#fi4} ```{r fi4-plot} #| fig-width: 10 #| fig-height: 5 ts_heatmap(df_imp, "imports_pct", "I4: SHAP→Imports Slope — Product Function × Stage") ``` ## Net Balance ### Finding B1 — SHAP predicts trade balance across technologies {#fb1} A positive slope means countries with higher-SHAP products run a *net surplus* in those products — i.e., the ML model identifies products where competitive advantage translates into actual specialisation. ```{r fb1-plot} #| fig-width: 10 #| fig-height: 4.5 r2_bar(rdf_bal, "B1: R² by Technology — Net Balance % GDP ~ SHAP mean |z|") ``` ### Finding B2 — Balance slope by product function {#fb2} ```{r fb2-plot} #| fig-width: 10 #| fig-height: 4.5 slope_bar(st_type_bal, type_pal, "B2: SHAP→Balance Slope by Product Function", y_lbl="OLS slope (raw balance)") ``` ### Finding B3 — Balance slope by stage {#fb3} ```{r fb3-plot} #| fig-width: 9 #| fig-height: 4.5 slope_bar(st_stage_bal, stage_pal, "B3: SHAP→Balance Slope by Value-Chain Stage", y_lbl="OLS slope (raw balance)") ``` ### Finding B4 — Function × stage interaction for net balance {#fb4} ```{r fb4-plot} #| fig-width: 10 #| fig-height: 5 ts_heatmap(df_bal, "balance_pct", "B4: SHAP→Balance Slope — Product Function × Stage", use_log=FALSE) ``` ::: --- ### Cross-cutting: GDP size effect {#f-gdp} Larger economies trade more in absolute terms, but after normalising by GDP the relationship is weak — confirming SHAP captures a structural capability signal independent of country scale. ```{r f-gdp-plot} #| fig-width: 10 #| fig-height: 5 ctry_agg <- df_top |> filter(exports_pct>0,is.finite(exports_pct),!is.na(gdp_mil_usd)) |> group_by(iso3,country,region,gdp_mil_usd) |> summarise(med_exp_pct=median(exports_pct,na.rm=TRUE),.groups="drop") REG_PAL <- c("East Asia & Pacific"="#073309","Europe & Central Asia"="#0369a1", "North America"="#CB5B1D","South Asia"="#7c3aed", "Latin America & Caribbean"="#be185d","Middle East & North Africa"="#b45309", "Sub-Saharan Africa"="#15803d","Other"="#374151") ggplot(ctry_agg,aes(x=log(gdp_mil_usd),y=med_exp_pct,color=region,label=iso3)) + geom_smooth(method="lm",se=TRUE,formula=y~x,color="#94a3b8",fill="#e2e8f0",linewidth=0.8,show.legend=FALSE) + geom_point(size=3.5,alpha=0.85) + geom_text_repel(size=3.2,family="Archivo",max.overlaps=12,segment.color="#cbd5e1",show.legend=FALSE) + scale_y_log10(labels=function(x)paste0(signif(x,2),"%")) + scale_color_manual(values=REG_PAL,name="Region",na.value="#94a3b8") + labs(title="Cross-cutting: Country GDP vs Clean-Tech Export Intensity", subtitle="Weak positive slope confirms GDP normalisation controls for country size", x="log(GDP, million USD)",y="Median exports % GDP (log scale)") + theme_minimal(base_family="Archivo",base_size=15) + theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=12,color="#475569"), legend.position="right",panel.grid.minor=element_blank()) ``` --- ## 4. Regression Analysis {#regression} ### 3.1 Cross-Sectional OLS — by Technology One OLS regression per technology, cross-section for `r YEAR`: `log(trade % GDP) ~ SHAP mean |z|`. ```{r ols-tables} #| fig-width: 18 #| fig-height: 7 make_ols_tbl_gg <- function(rdf, title) { tbl <- rdf |> transmute(Technology=group, N=as.character(n), Slope=sprintf("%.4f",slope), R2=sprintf("%.3f",r2), `p-value`=case_when(pval<.001~"<0.001",TRUE~sprintf("%.3f",pval)), Sig.=p_label) nr <- nrow(tbl); nc <- ncol(tbl) tbl_long <- tbl |> mutate(.row=row_number()) |> pivot_longer(-.row, names_to=".col", values_to=".val") |> mutate(.col=factor(.col, levels=colnames(tbl))) hdr <- tibble(.row=0L, .col=factor(colnames(tbl),levels=colnames(tbl)), .val=colnames(tbl)) all_rows <- bind_rows(hdr, tbl_long); all_rows$.row <- as.numeric(all_rows$.row) sig_map <- rdf |> mutate(.row=row_number()) |> select(.row, p_label) row_bg <- tibble(.row=seq_len(nr)) |> left_join(sig_map, by=".row") |> mutate(fill=case_when(p_label=="***"~"#d1fae5",p_label=="**"~"#fef9c3", p_label=="*"~"#dbeafe",.row%%2==0~"#f8fafc",TRUE~"#ffffff"), xmin=-Inf, xmax=Inf, ymin=-(.row)-0.5, ymax=-(.row)+0.5) body_rows <- all_rows |> filter(.row>0) |> left_join(sig_map, by=".row") |> mutate(ff=if_else(!is.na(p_label)&p_label!="ns","bold","plain")) ul_rows <- sig_map |> filter(p_label=="***") p <- ggplot(all_rows, aes(x=as.integer(.col), y=-.row)) + annotate("rect",xmin=-Inf,xmax=Inf,ymin=-0.5,ymax=0.5,fill="#073309") + geom_rect(data=row_bg, aes(xmin=xmin,xmax=xmax,ymin=ymin,ymax=ymax,fill=fill), color=NA, inherit.aes=FALSE) + scale_fill_identity() + geom_text(data=filter(all_rows,.row==0), aes(label=.val), color="white", fontface="bold", size=4.5, family="Archivo") + geom_text(data=body_rows, aes(label=.val,fontface=ff), color="#1e293b", size=4.2, family="Archivo") + labs(title=title, caption="■ p<0.001 (***) ■ p<0.01 (**) ■ p<0.05 (*) bold = sig underline = highly sig") + theme_void(base_family="Archivo") + theme(plot.title=element_text(size=13,face="bold",color=NZ_GREEN,hjust=0.5,margin=margin(b=8)), plot.caption=element_text(size=8,color="#64748b",hjust=0.5,margin=margin(t=6)), plot.background=element_rect(fill="#f8fafc",color=NA), plot.margin=margin(12,12,8,12)) + scale_x_continuous(expand=expansion(add=.5)) + scale_y_continuous(expand=expansion(add=.3)) if (nrow(ul_rows) > 0) { segs <- tibble(y=-(ul_rows$.row)-0.42, xmin=0.6, xmax=nc+0.4) p <- p + geom_segment(data=segs, aes(x=xmin,xend=xmax,y=y,yend=y), color="#065f46", linewidth=0.5, inherit.aes=FALSE) } p } p_e <- make_ols_tbl_gg(rdf_exp, "Exports % GDP") p_i <- make_ols_tbl_gg(rdf_imp, "Imports % GDP") p_b <- make_ols_tbl_gg(rdf_bal, "Net Balance % GDP") print((p_e | p_i | p_b) + plot_layout(guides="collect")) ``` --- ### 3.2 Hierarchical Linear Model — Cross-Sectional (`r YEAR`) {#hlm-cross} The cross-sectional HLM pools all countries, technologies, and product types in a single model with the full covariate set. The model controls for product type, value-chain stage, and log-GDP, while allowing the SHAP slope to vary by technology (random slope) and the baseline trade level to vary by country (random intercept): $$ \begin{aligned} \log(y_{ict}) &= \underbrace{\beta_0 + \beta_1 \cdot \text{shap}_i + \beta_2 \log\text{GDP}_c}_{\text{core predictors}} \\[6pt] &\quad + \underbrace{\sum_k \gamma_k \cdot \mathbf{1}[\text{type}_i = k] \;+\; \sum_s \delta_s \cdot \mathbf{1}[\text{stage}_i = s]}_{\text{product type \& stage fixed effects}} \\[6pt] &\quad + \underbrace{u_{0c} + u_{0j} + u_{1j} \cdot \text{shap}_i}_{\text{random effects}} \;+\; \varepsilon_{ict} \end{aligned} $$ where: - $i$ = HS product, $c$ = country, $j$ = technology - $\gamma_k$ = fixed effect for product type $k$ (Raw Material, Processed Material, Process Equipment, Product Component, Final Product) - $\delta_s$ = fixed effect for value-chain stage $s$ (Upstream, Midstream, Downstream) - $u_{0c} \sim \mathcal{N}(0,\sigma_c^2)$ = **random country intercept** — captures overall trade level not explained by SHAP or GDP - $(u_{0j}, u_{1j}) \sim \mathcal{N}(0, \Sigma_j)$ = **random technology intercept + slope** — captures how much the SHAP–trade relationship differs across value chains ```{r hlm-cross} run_hlm_full <- function(df, y_col, log_y=TRUE) { if (!has_lme4) return(NULL) if (!has_lme4) return(NULL) df_m <- df |> filter(!is.na(.data[[y_col]]), is.finite(.data[[y_col]]), !is.na(log_gdp), !is.na(type), !is.na(stage)) if (log_y) df_m <- df_m |> filter(.data[[y_col]] > 0) |> mutate(y_use = log10(.data[[y_col]])) else df_m <- df_m |> mutate(y_use = .data[[y_col]]) if (nrow(df_m) < 50) return(NULL) tryCatch( lme4::lmer(y_use ~ shap_mean_z + log_gdp + type + stage + (1 + shap_mean_z | tech) + (1 | iso3), data=df_m, REML=FALSE, control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL ) } hlm_exp <- run_hlm_full(df_exp, "exports_pct") hlm_imp <- run_hlm_full(df_imp, "imports_pct") hlm_bal <- run_hlm_full(df_bal, "balance_pct", log_y=FALSE) hlm_tidy <- function(fit, metric) { if (is.null(fit)) return(NULL) fe <- as.data.frame(coef(summary(fit))) fe$term <- rownames(fe) fe |> mutate(metric = metric, z_val = Estimate/`Std. Error`, p_wald = 2*pnorm(abs(z_val), lower.tail=FALSE), sig = case_when(p_wald<.001~"***",p_wald<.01~"**", p_wald<.05~"*",TRUE~"ns"), Term = term) |> select(Metric=metric, Term, Estimate, SE=`Std. Error`, z=z_val, `p (Wald)`=p_wald, Sig.=sig) |> mutate(across(c(Estimate,SE), ~round(.,4)), z=round(z,2), `p (Wald)`=if_else(`p (Wald)`<.001,"<0.001",sprintf("%.3f",`p (Wald)`))) } hlm_all <- bind_rows( hlm_tidy(hlm_exp, "exports"), hlm_tidy(hlm_imp, "imports"), hlm_tidy(hlm_bal, "balance") ) if (has_lme4 && !is.null(hlm_all) && nrow(hlm_all) > 0) { hlm_all |> kable(align=c("l","l","r","r","r","r","c"), booktabs=TRUE, caption=paste("HLM Fixed Effects —", YEAR, "(controls: log GDP, product type, value-chain stage)")) |> kable_styling(bootstrap_options=c("striped","hover","condensed"), font_size=13) |> row_spec(which(hlm_all$Sig.=="***"), bold=TRUE, background="#d1fae5") |> row_spec(which(hlm_all$Sig.=="**"), bold=TRUE, background="#fef9c3") |> row_spec(which(hlm_all$Sig.=="*"), bold=TRUE, background="#dbeafe") } ``` #### Coefficient plot — Fixed effects ```{r coef-plot} #| fig-width: 11 #| fig-height: 7 #| fig-cap: "Fixed-effect coefficients from the cross-sectional HLM (exports outcome). Error bars = ±1.96 SE (95% CI). Green = positive & significant; orange = negative & significant; grey = ns." if (has_lme4 && !is.null(hlm_exp)) { fe_df <- as.data.frame(coef(summary(hlm_exp))) fe_df$term <- rownames(fe_df) fe_df <- fe_df |> mutate(lo = Estimate - 1.96*`Std. Error`, hi = Estimate + 1.96*`Std. Error`, p = 2*pnorm(abs(Estimate/`Std. Error`), lower.tail=FALSE), sig = p < 0.05, col = case_when(sig & Estimate > 0 ~ NZ_GREEN, sig & Estimate < 0 ~ NZ_ORANGE, TRUE ~ "#94a3b8"), term_clean = gsub("type|stage","",term)) |> filter(term != "(Intercept)") |> mutate(term_clean = fct_reorder(term_clean, Estimate)) ggplot(fe_df, aes(x=Estimate, xmin=lo, xmax=hi, y=term_clean, color=col)) + geom_vline(xintercept=0, linetype="dashed", color="#94a3b8", linewidth=0.6) + geom_errorbarh(height=0.3, linewidth=0.8) + geom_point(size=3.5) + scale_color_identity() + labs(title="HLM Fixed Effects — Exports % GDP (log10)", subtitle="Coefficients with 95% Wald CI · cross-sectional 2023", x="Coefficient estimate", y=NULL, caption="Green = significant positive · Orange = significant negative · Grey = ns") + theme_minimal(base_family="Archivo", base_size=12) + theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=11,color="#475569"), panel.grid.major.y=element_blank()) } ``` #### Random slopes by technology ```{r caterpillar-tech} #| fig-width: 9 #| fig-height: 5 #| fig-cap: "Random SHAP slopes by technology (from HLM exports model). Each bar = technology-specific deviation from the global slope. Techs with wider bars show more heterogeneous SHAP-trade relationships." if (has_lme4 && !is.null(hlm_exp)) { re_tech <- as.data.frame(ranef(hlm_exp, condVar=TRUE)) re_slope <- re_tech |> filter(grpvar=="tech", term=="shap_mean_z") |> mutate(grp = fct_reorder(grp, condval), lo = condval - 1.96*condsd, hi = condval + 1.96*condsd, col = if_else(condval > 0, NZ_GREEN, NZ_ORANGE)) ggplot(re_slope, aes(x=condval, xmin=lo, xmax=hi, y=grp, color=col)) + geom_vline(xintercept=0, linetype="dashed", color="#94a3b8") + geom_errorbarh(height=0.35, linewidth=0.9) + geom_point(size=4) + scale_color_identity() + labs(title="Random SHAP Slopes by Technology", subtitle="Deviation from global SHAP slope · 95% CI", x="Random slope deviation (shap_mean_z)", y=NULL) + theme_minimal(base_family="Archivo", base_size=13) + theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=11,color="#475569"), panel.grid.major.y=element_blank()) } ``` #### Random intercepts by country ```{r caterpillar-country} #| fig-width: 9 #| fig-height: 7 #| fig-cap: "Random intercepts by country. Positive values = country trades more intensively than predicted by SHAP + GDP alone; negative = less than expected." if (has_lme4 && !is.null(hlm_exp)) { re_ctry <- as.data.frame(ranef(hlm_exp, condVar=TRUE)) re_int <- re_ctry |> filter(grpvar=="iso3", term=="(Intercept)") |> left_join(ctry, by=c("grp"="iso3")) |> mutate(label = coalesce(country, grp), label = fct_reorder(label, condval), lo = condval - 1.96*condsd, hi = condval + 1.96*condsd, col = if_else(condval > 0, NZ_GREEN, NZ_ORANGE)) ggplot(re_int, aes(x=condval, xmin=lo, xmax=hi, y=label, color=col)) + geom_vline(xintercept=0, linetype="dashed", color="#94a3b8") + geom_errorbarh(height=0.4, linewidth=0.7) + geom_point(size=2.8) + scale_color_identity() + labs(title="Random Country Intercepts — Exports HLM", subtitle="Deviation from predicted level · controls: SHAP + log GDP + type + stage", x="Random intercept deviation", y=NULL, caption="Positive = more trade than model predicts · Negative = less") + theme_minimal(base_family="Archivo", base_size=11) + theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=11,color="#475569"), panel.grid.major.y=element_blank()) } ``` #### Variance decomposition ```{r vc-decomp} #| fig-width: 9 #| fig-height: 4 #| fig-cap: "Variance explained by each random-effects component. A large country share means country-level factors dominate unexplained trade variance." if (has_lme4 && !is.null(hlm_exp)) { vc_list <- lapply(list(hlm_exp, hlm_imp, hlm_bal), function(fit) { if (is.null(fit)) return(NULL) vc <- as.data.frame(VarCorr(fit)) vc |> mutate(vcov=as.numeric(vcov)) |> filter(!is.na(vcov)) |> select(grp, var1, vcov) }) names(vc_list) <- c("Exports","Imports","Balance") vc_df <- bind_rows(lapply(names(vc_list), function(nm) { v <- vc_list[[nm]] if (is.null(v)) return(NULL) tot <- sum(v$vcov, na.rm=TRUE) v |> mutate(metric=nm, pct=vcov/tot*100, Component=case_when( grp=="tech" & var1=="(Intercept)" ~ "Tech intercept", grp=="tech" & var1=="shap_mean_z" ~ "Tech SHAP slope", grp=="iso3" ~ "Country intercept", grp=="Residual" ~ "Residual", TRUE ~ paste(grp, var1))) })) |> filter(!is.na(metric)) ggplot(vc_df, aes(x=metric, y=pct, fill=Component)) + geom_col(width=0.55) + scale_fill_manual(values=c( "Tech intercept" = "#073309", "Tech SHAP slope" = "#1A8C28", "Country intercept" = "#0369a1", "Residual" = "#cbd5e1" ), name=NULL) + labs(title="Variance Decomposition — Random Effects", subtitle="Share of total variance explained by each component", x=NULL, y="% of total variance") + theme_minimal(base_family="Archivo", base_size=13) + theme(plot.title=element_text(size=14,face="bold",color=NZ_GREEN), legend.position="right") } ``` #### Random effects variance components table ```{r hlm-vc} vc_tidy <- function(fit, metric) { if (is.null(fit)) return(NULL) as.data.frame(VarCorr(fit)) |> mutate(metric=metric) |> select(metric, grp, var1, var2, vcov, sdcor) } vc_all <- bind_rows( vc_tidy(hlm_exp, "exports"), vc_tidy(hlm_imp, "imports"), vc_tidy(hlm_bal, "balance") ) if (has_lme4 && !is.null(vc_all) && nrow(vc_all) > 0) { vc_all |> mutate(vcov=round(vcov,5), sdcor=round(sdcor,4)) |> kable(caption="Random Effects Variance Components — cross-sectional HLM", col.names=c("Metric","Group","Var 1","Var 2","Variance","SD/Corr")) |> kable_styling(bootstrap_options=c("striped","condensed"), font_size=13) } ``` --- ### 3.3 Panel Data Model (2018–2024) {#panel-model} The panel HLM extends the cross-sectional model with a centred year trend and the SHAP×Year interaction, testing whether the SHAP–trade alignment has strengthened over time. Log-GDP is included as a time-varying covariate: $$ \begin{aligned} \log(y_{ictj}) &= \beta_0 + \beta_1 \text{shap}_i + \beta_2 \tilde{t} + \beta_3 (\text{shap}_i \times \tilde{t}) + \beta_4 \log\text{GDP}_{ct} \\[6pt] &\quad + \sum_k \gamma_k \cdot \mathbf{1}[\text{type}_i=k] + \sum_s \delta_s \cdot \mathbf{1}[\text{stage}_i=s] \\[6pt] &\quad + u_{0c} + u_{0j} + u_{1j} \cdot \text{shap}_i + \varepsilon_{ictj} \end{aligned} $$ where $\tilde{t} = (t - \bar{t}) / \text{sd}(t)$ is standardised year; $\beta_3 > 0$ means the SHAP–trade link *tightens over time*. ```{r panel-model} #| cache: true run_panel_hlm <- function(df, y_col, log_y=TRUE) { if (!has_lme4) return(NULL) df_m <- df |> filter(!is.na(.data[[y_col]]), is.finite(.data[[y_col]]), !is.na(log_gdp), !is.na(type), !is.na(stage)) if (log_y) df_m <- df_m |> filter(.data[[y_col]] > 0) |> mutate(y_use=log10(.data[[y_col]])) else df_m <- df_m |> mutate(y_use=.data[[y_col]]) if (nrow(df_m) < 100) return(NULL) tryCatch( lme4::lmer(y_use ~ shap_mean_z * year_c + log_gdp + type + stage + (1 + shap_mean_z | tech) + (1 | iso3), data=df_m, REML=FALSE, control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) { message(e$message); NULL } ) } pm_data_exp <- panel_master |> filter(iso3 %in% top20_isos) |> group_by(tech) |> group_modify(~filter_outliers(.x,"exports_pct")) |> ungroup() pm_data_imp <- panel_master |> filter(iso3 %in% top20_isos) |> group_by(tech) |> group_modify(~filter_outliers(.x,"imports_pct")) |> ungroup() pm_data_bal <- panel_master |> filter(iso3 %in% top20_isos) |> group_by(tech) |> group_modify(~filter_outliers(.x,"balance_pct")) |> ungroup() pn_exp <- run_panel_hlm(pm_data_exp, "exports_pct") pn_imp <- run_panel_hlm(pm_data_imp, "imports_pct") pn_bal <- run_panel_hlm(pm_data_bal, "balance_pct", log_y=FALSE) panel_tidy <- function(fit, metric) { if (is.null(fit)) return(NULL) fe <- as.data.frame(coef(summary(fit))) fe$term <- rownames(fe) fe |> mutate(metric=metric, z_val=Estimate/`Std. Error`, p_wald=2*pnorm(abs(z_val), lower.tail=FALSE), sig=case_when(p_wald<.001~"***",p_wald<.01~"**",p_wald<.05~"*",TRUE~"ns"), Term=case_when( term=="(Intercept)" ~ "Intercept", term=="shap_mean_z" ~ "SHAP importance", term=="year_c" ~ "Year (standardised)", term=="log_gdp" ~ "log(GDP)", term=="shap_mean_z:year_c" ~ "SHAP × Year", grepl("^type",term) ~ paste0("Type: ", gsub("^type","",term)), grepl("^stage",term) ~ paste0("Stage: ", gsub("^stage","",term)), TRUE ~ term )) |> select(Metric=metric, Term, Estimate, SE=`Std. Error`, z=z_val, `p (Wald)`=p_wald, Sig.=sig) |> mutate(across(c(Estimate,SE),~round(.,4)), z=round(z,2), `p (Wald)`=if_else(`p (Wald)`<.001,"<0.001",sprintf("%.3f",`p (Wald)`))) } panel_all <- bind_rows( panel_tidy(pn_exp, "exports"), panel_tidy(pn_imp, "imports"), panel_tidy(pn_bal, "balance") ) if (has_lme4 && !is.null(panel_all) && nrow(panel_all) > 0) { panel_all |> kable(align=c("l","l","r","r","r","r","c"), booktabs=TRUE, caption="Panel HLM Fixed Effects — 2018–2024 (controls: log GDP, type, stage)") |> kable_styling(bootstrap_options=c("striped","hover","condensed"), font_size=13) |> row_spec(which(panel_all$Sig.=="***"), bold=TRUE, background="#d1fae5") |> row_spec(which(panel_all$Sig.=="**"), bold=TRUE, background="#fef9c3") |> row_spec(which(panel_all$Sig.=="*"), bold=TRUE, background="#dbeafe") } ``` **SHAP × Year coefficient:** A positive estimate confirms Finding 7 — the alignment between ML-predicted importance and actual trade intensity has strengthened since 2018. This can reflect (a) supply chains maturing and concentrating around economically efficient producers, (b) improving model fit over time, or (c) both. #### Panel random effects ```{r panel-vc} vc_panel <- bind_rows( vc_tidy(pn_exp, "exports"), vc_tidy(pn_imp, "imports"), vc_tidy(pn_bal, "balance") ) if (has_lme4 && !is.null(vc_panel) && nrow(vc_panel) > 0) { vc_panel |> mutate(vcov=round(vcov,5), sdcor=round(sdcor,4)) |> kable(caption="Panel Random Effects Variance Components", col.names=c("Metric","Group","Var 1","Var 2","Variance","SD/Corr")) |> kable_styling(bootstrap_options=c("striped","condensed"), font_size=13) } ``` --- ## 5. Key Statistical Highlights {#highlights} ```{r highlights} #| results: asis #| echo: false # ── OLS summary stats ───────────────────────────────────────────────────────── n_sig_e <- sum(rdf_exp$p_label %in% c("*","**","***"), na.rm=TRUE) n_sig_i <- sum(rdf_imp$p_label %in% c("*","**","***"), na.rm=TRUE) n_sig_b <- sum(rdf_bal$p_label %in% c("*","**","***"), na.rm=TRUE) best_exp <- rdf_exp |> slice_max(r2, n=1, with_ties=FALSE) best_imp <- rdf_imp |> slice_max(r2, n=1, with_ties=FALSE) best_bal <- rdf_bal |> slice_max(r2, n=1, with_ties=FALSE) med_r2_e <- median(rdf_exp$r2, na.rm=TRUE) med_r2_i <- median(rdf_imp$r2, na.rm=TRUE) cat("### OLS Cross-Sectional Fit (year 2023)\n\n") cat(sprintf( "- **%d / %d** technology-level regressions significant for **exports** (p < 0.05); median R² = **%.3f**. Best: **%s** (R² = %.3f, %s).\n", n_sig_e, nrow(rdf_exp), med_r2_e, best_exp$group, best_exp$r2, best_exp$p_label)) cat(sprintf( "- **%d / %d** significant for **imports**; median R² = **%.3f**. Best: **%s** (R² = %.3f, %s). Imports outperform exports in %d / %d technologies.\n", n_sig_i, nrow(rdf_imp), med_r2_i, best_imp$group, best_imp$r2, best_imp$p_label, sum(rdf_imp$r2 > rdf_exp$r2, na.rm=TRUE), nrow(rdf_exp))) cat(sprintf( "- **%d / %d** significant for **net balance**. Best: **%s** (R² = %.3f, %s).\n\n", n_sig_b, nrow(rdf_bal), best_bal$group, best_bal$r2, best_bal$p_label)) cat(sprintf( "> **What this tells us:** SHAP feature importance is a significant cross-sectional predictor of trade intensity in **%d of %d** export regressions and **%d of %d** import regressions, indicating the ML signal captures real patterns in observed trade specialisation rather than statistical noise. Technologies with high R\u00b2 (e.g. **%s**) show the tightest coupling between ML-predicted production-readiness and observed trade flows. Where import predictability equals or exceeds export predictability, SHAP is detecting countries that *import* precisely because domestic productive capability is absent — the signal encodes both the presence and the absence of competitive advantage.\n\n", n_sig_e, nrow(rdf_exp), n_sig_i, nrow(rdf_imp), best_exp$group[1])) # ── HLM fixed effects ───────────────────────────────────────────────────────── if (has_lme4 && !is.null(hlm_exp)) { fe <- as.data.frame(coef(summary(hlm_exp))) fe$term <- rownames(fe) fe <- fe |> mutate( z = Estimate / `Std. Error`, pval = 2 * pnorm(abs(z), lower.tail=FALSE), sig = case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns") ) get_fe <- function(t) fe[fe$term==t, ] shap_fe <- get_fe("shap_mean_z") gdp_fe <- get_fe("log_gdp") type_sig <- fe |> filter(grepl("^type", term), sig!="ns") |> mutate(lbl=paste0(gsub("^type","",term)," (",sprintf("%+.3f",Estimate),", ",sig,")")) stage_sig <- fe |> filter(grepl("^stage", term), sig!="ns") |> mutate(lbl=paste0(gsub("^stage","",term)," (",sprintf("%+.3f",Estimate),", ",sig,")")) cat("### HLM Fixed Effects — Exports (cross-sectional 2023)\n\n") if (nrow(shap_fe)>0 && !is.na(shap_fe$Estimate[1])) cat(sprintf("- **SHAP importance**: β = **%.4f** (%s). A 1-SD increase in SHAP is associated with a **%.0f%% change** in exports %% GDP.\n", shap_fe$Estimate[1], shap_fe$sig[1], (10^shap_fe$Estimate[1] - 1)*100)) if (nrow(gdp_fe)>0 && !is.na(gdp_fe$Estimate[1])) cat(sprintf("- **log(GDP)**: β = **%.4f** (%s). Each doubling of GDP associates with a %.0f%% %s in export intensity — GDP normalisation does not fully eliminate scale effects.\n", gdp_fe$Estimate[1], gdp_fe$sig[1], abs((10^(gdp_fe$Estimate[1]*log10(2)) - 1)*100), ifelse(gdp_fe$Estimate[1]>0, "increase", "decrease"))) if (nrow(type_sig)>0) cat(sprintf("- **Significant product function effects**: %s.\n", paste(type_sig$lbl, collapse="; "))) if (nrow(stage_sig)>0) cat(sprintf("- **Significant stage effects**: %s.\n\n", paste(stage_sig$lbl, collapse="; "))) else cat("\n") cat(sprintf( "> **What this tells us:** The HLM estimates the SHAP coefficient after absorbing country random intercepts and technology random slopes, so β is not inflated by dominant exporters (e.g. China in solar, Germany in wind turbines). A **%s** SHAP β confirms that ML-predicted feature importance is a globally valid predictor of export intensity independent of who-you-are and which-value-chain-you-are-in. The persistent log(GDP) coefficient means scale effects are not fully removed by per-GDP normalisation — larger economies export disproportionately more on a relative basis. Significant product function and stage dummies reveal that certain value-chain positions carry structural trade premiums or penalties beyond what SHAP alone predicts.\n\n", ifelse(nrow(shap_fe)>0 && !is.na(shap_fe$Estimate[1]) && shap_fe$Estimate[1]>0, "positive", "negative"))) # ── Variance decomposition ───────────────────────────────────────────────── vc <- as.data.frame(VarCorr(hlm_exp)) |> mutate(vcov=as.numeric(vcov)) |> filter(!is.na(vcov)) tot <- sum(vc$vcov) pct_v <- function(grp, v1=NA) { row <- if (is.na(v1)) vc[vc$grp==grp,] else vc[vc$grp==grp & !is.na(vc$var1) & vc$var1==v1,] if (nrow(row)==0) return(NA) round(row$vcov[1]/tot*100, 1) } p_ctry <- pct_v("iso3") p_tech_int <- pct_v("tech","(Intercept)") p_tech_slp <- pct_v("tech","shap_mean_z") p_resid <- pct_v("Residual") cat("### Variance Decomposition (Exports HLM)\n\n") if (!is.na(p_ctry)) cat(sprintf("- **Country intercepts**: **%.1f%%** of total variance — country-level factors (policy, history, institutions) dominate unexplained variation.\n", p_ctry)) if (!is.na(p_tech_int)) cat(sprintf("- **Technology intercepts**: **%.1f%%** — baseline trade levels differ markedly across value chains.\n", p_tech_int)) if (!is.na(p_tech_slp)) cat(sprintf("- **Technology SHAP slopes**: **%.1f%%** — the SHAP–trade alignment itself varies across technologies (justifies random-slope specification).\n", p_tech_slp)) if (!is.na(p_resid)) cat(sprintf("- **Residual**: **%.1f%%** — within-country, within-technology product-level variation unexplained by any covariate.\n\n", p_resid)) cat(sprintf( "> **What this tells us:** The variance partition reveals *what drives differences in trade intensity that SHAP cannot explain*. Country intercepts absorbing the largest share (here **%.1f%%**) means that national-level factors — industrial policy, infrastructure, historical path-dependence, institutional capacity — are the dominant residual driver of trade intensity differences, dwarfing product-level ML scores. Technology intercepts (%.1f%%) reflect that some value chains (solar, wind) are globally traded at large volumes while others (nuclear, geothermal) are structurally thinner markets. The random-slope variance for SHAP (%.1f%%) confirms that the SHAP→trade relationship is heterogeneous across technologies, justifying the flexible specification. The residual (%.1f%%) captures product-level noise that no macro-level covariate can reach.\n\n", ifelse(is.na(p_ctry), 0, p_ctry), ifelse(is.na(p_tech_int), 0, p_tech_int), ifelse(is.na(p_tech_slp), 0, p_tech_slp), ifelse(is.na(p_resid), 0, p_resid))) } # ── Panel: SHAP × Year ──────────────────────────────────────────────────────── if (has_lme4 && !is.null(pn_exp)) { pfe <- as.data.frame(coef(summary(pn_exp))) pfe$term <- rownames(pfe) pfe <- pfe |> mutate(z=Estimate/`Std. Error`, pval=2*pnorm(abs(z),lower.tail=FALSE), sig=case_when(pval<.001~"***",pval<.01~"**",pval<.05~"*",TRUE~"ns")) shap_yr <- pfe[pfe$term=="shap_mean_z:year_c",] shap_p <- pfe[pfe$term=="shap_mean_z",] yr_p <- pfe[pfe$term=="year_c",] cat("### Panel HLM — Temporal Dynamics (2018–2024)\n\n") if (nrow(shap_p)>0) cat(sprintf("- **SHAP importance** (main): β = **%.4f** (%s).\n", shap_p$Estimate, shap_p$sig)) if (nrow(yr_p)>0 && !is.na(yr_p$Estimate[1])) cat(sprintf("- **Year trend**: β = **%.4f** (%s) — trade intensity is %s over 2018–2024 conditional on SHAP.\n", yr_p$Estimate[1], yr_p$sig[1], ifelse(yr_p$Estimate[1]>0, "rising", "falling"))) if (nrow(shap_yr)>0 && !is.na(shap_yr$Estimate[1])) { cat(sprintf("- **SHAP × Year**: β = **%.4f** (%s) — the SHAP–trade link is **%s**: high-SHAP products are becoming %s traded relative to low-SHAP products year-on-year.\n\n", shap_yr$Estimate[1], shap_yr$sig[1], ifelse(shap_yr$Estimate[1]>0, "tightening", "loosening"), ifelse(shap_yr$Estimate[1]>0, "more intensively", "less intensively"))) } else cat("\n") trend_word <- ifelse(nrow(yr_p)>0 && !is.na(yr_p$Estimate[1]) && yr_p$Estimate[1]>0, "expanding", "contracting") align_word <- ifelse(nrow(shap_yr)>0 && !is.na(shap_yr$Estimate[1]) && shap_yr$Estimate[1]>0, "tightening", "loosening") cat(sprintf( "> **What this tells us:** The panel model tests whether the SHAP–trade alignment has *changed* over 2018–2024, going beyond the cross-sectional snapshot. Clean technology trade intensity is overall **%s** during this period (conditional on SHAP), reflecting the global acceleration in deployment and traded volumes. The SHAP × Year interaction is **%s**: the relationship between ML-predicted production capability and observed trade intensity is %s over time. A tightening interaction means the market is becoming progressively more meritocratic — countries with higher production-readiness are capturing growing trade shares as the energy transition accelerates. A loosening interaction would imply geopolitical fragmentation, industrial policy distortions, or early-mover lock-in are beginning to decouple competitiveness signals from actual trade outcomes.\n\n", trend_word, align_word, align_word)) } ``` --- ## 6. Model Comparison (ANOVA / Likelihood Ratio Tests) {#anova} Nested model comparison via likelihood ratio test isolates the contribution of each additional component: SHAP only → + GDP → + type/stage → + random slope. A significant χ² indicates the added terms improve model fit beyond chance. $$ \begin{aligned} M_0 &: y \sim 1 + (1 \mid \text{iso3}) & &\text{(null)} \\[4pt] M_1 &: y \sim \text{shap} + (1 \mid \text{iso3}) & &\text{(+ SHAP)} \\[4pt] M_2 &: y \sim \text{shap} + \log\text{GDP} + (1 \mid \text{iso3}) & &\text{(+ log GDP)} \\[4pt] M_3 &: y \sim \text{shap} + \log\text{GDP} + \text{type} + \text{stage} + (1 \mid \text{iso3}) & &\text{(+ type \& stage FE)} \\[4pt] M_4 &: y \sim \text{shap} + \log\text{GDP} + \text{type} + \text{stage} + (1 + \text{shap} \mid \text{tech}) + (1 \mid \text{iso3}) & &\text{(full HLM)} \end{aligned} $$ ```{r anova} #| fig-width: 9 #| fig-height: 5 if (has_lme4) { df_m_anova <- df_exp |> filter(exports_pct > 0, is.finite(exports_pct), !is.na(log_gdp), !is.na(type), !is.na(stage)) |> mutate(y_use = log10(exports_pct)) m0 <- tryCatch(lme4::lmer(y_use ~ 1 + (1|iso3), data=df_m_anova, REML=FALSE, control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m1 <- tryCatch(lme4::lmer(y_use ~ shap_mean_z + (1|iso3), data=df_m_anova, REML=FALSE, control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m2 <- tryCatch(lme4::lmer(y_use ~ shap_mean_z + log_gdp + (1|iso3), data=df_m_anova, REML=FALSE, control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m3 <- tryCatch(lme4::lmer(y_use ~ shap_mean_z + log_gdp + type + stage + (1|iso3), data=df_m_anova, REML=FALSE, control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) m4 <- tryCatch(lme4::lmer(y_use ~ shap_mean_z + log_gdp + type + stage + (1 + shap_mean_z | tech) + (1 | iso3), data=df_m_anova, REML=FALSE, control=lme4::lmerControl(optimizer="bobyqa")), error=function(e) NULL) models_ok <- !any(sapply(list(m0,m1,m2,m3,m4), is.null)) if (models_ok) { av <- anova(m0, m1, m2, m3, m4) av_df <- as.data.frame(av) av_df$Model <- c("M0: Null","M1: +SHAP","M2: +logGDP","M3: +Type/Stage","M4: +RandSlope") av_df |> select(Model, any_of(c("Df","npar")), AIC, BIC, logLik, Chisq, `p-value`=`Pr(>Chisq)`) |> mutate(AIC=round(AIC,1), BIC=round(BIC,1), logLik=round(logLik,1), Chisq=round(Chisq,2), `p-value`=case_when( is.na(`p-value`) ~ "—", `p-value`<.001 ~ "<0.001", TRUE ~ sprintf("%.3f",`p-value`))) |> kable(caption="Model Comparison — Likelihood Ratio Tests (Exports % GDP, cross-sectional 2023)", align=c("l","r","r","r","r","r","r","r")) |> kable_styling(bootstrap_options=c("striped","hover","condensed"), font_size=13) |> row_spec(which(av_df$`Pr(>Chisq)` < 0.05 & !is.na(av_df$`Pr(>Chisq)`)), bold=TRUE, background="#d1fae5") } else { cat("> One or more models failed to converge — ANOVA table not available.\n") } } else { cat("> `lme4` not available — run `renv::install('lme4')` then re-render.\n") } ``` **Interpreting the ANOVA table:** Each row tests whether adding the new terms significantly improves fit over the previous model. Significant χ² for M1 validates SHAP as a predictor; M2 shows whether GDP adds explanatory power on top of SHAP; M3 tests whether product type/stage structure matters beyond GDP; M4 tests whether allowing the SHAP slope to vary by technology (random slope) is warranted by the data. --- ## 7. Time Trajectories {#trajectories} SHAP feature importance is **fixed per (product × technology)** — it reflects average predictive weight, not a time-varying signal. Trajectories therefore appear as **vertical paths** in the scatter: horizontal position (SHAP) is pinned, vertical position (trade intensity) moves year to year. ```{r traj-solar} #| fig-width: 12 #| fig-height: 7 #| fig-cap: "Solar — exports % GDP trajectories (2018–2024). Each path = one country × product; dot size/opacity increases with year." solar_panel <- panel_master |> filter(tech=="Solar", iso3 %in% top20_isos, exports_pct > 0, is.finite(exports_pct)) |> group_by(tech) |> group_modify(~filter_outliers(.x, "exports_pct")) |> ungroup() |> arrange(iso3, hs_code, year) COUNTRY_PAL <- c("#073309","#CB5B1D","#0369a1","#7c3aed","#be185d", "#b45309","#dc2626","#15803d","#0891b2","#374151", "#065f46","#92400e","#1e40af","#4c1d95","#881337") ctry_show <- solar_panel |> group_by(iso3, country) |> summarise(tot=sum(exports_pct,na.rm=TRUE),.groups="drop") |> slice_max(tot,n=15) |> pull(iso3) sol_plot <- solar_panel |> filter(iso3 %in% ctry_show) ggplot(sol_plot, aes(x=shap_mean_z, y=exports_pct, group=interaction(iso3,hs_code), color=country)) + geom_path(alpha=0.45, linewidth=0.5, linetype="dotted") + geom_point(aes(size=year, alpha=year)) + scale_size_continuous(range=c(1.5,3.5), guide="none") + scale_alpha_continuous(range=c(0.45,1), guide="none") + scale_color_manual(values=setNames(COUNTRY_PAL, unique(sol_plot$country)), name="Country") + scale_y_log10(labels=function(x) paste0(signif(x,3),"%")) + labs(title="Solar — SHAP Importance vs Export Intensity Trajectories (2018–2024)", subtitle="Each dotted path = one country × product · dot size/opacity increases with year (darker = 2024)", x="SHAP Feature Importance (mean |z|)", y="Exports % of GDP (log scale)", caption="Top 15 Solar exporters by cumulative exports % GDP · IQR outlier filter applied") + theme_minimal(base_family="Archivo", base_size=13) + theme(plot.title=element_text(size=16,face="bold",color=NZ_GREEN), plot.subtitle=element_text(size=12,color="#475569"), legend.position="right") ``` --- ## 8. Special Case: Solar — India {#solar-india} ```{r solar-india} #| fig-width: 12 #| fig-height: 8 #| fig-cap: "India — Solar value chain SHAP vs Exports (% GDP), 2023." ind_solar <- master |> filter(iso3=="IND", tech=="Solar", exports_pct > 0, is.finite(exports_pct)) ind_solar_clean <- filter_outliers(ind_solar, "exports_pct") ts_map <- c(setNames( ifelse(names(TS_COLORS) %in% unique(ind_solar_clean$typestage), TS_COLORS[names(TS_COLORS)], TS_COLORS), names(TS_COLORS)), Unclassified="#94a3b8") SHAPE_MAP <- c(Machinery=16, Chemicals=17, Metals=15, Electronics=18, `Industrial Materials`=8, Other=3) ggplot(ind_solar_clean, aes(x=shap_mean_z, y=exports_pct, color=typestage, shape=category)) + geom_hline(yintercept=median(ind_solar_clean$exports_pct,na.rm=TRUE), linetype="dashed", color="#94a3b8", linewidth=0.5) + geom_smooth(method="lm", se=TRUE, formula=y~x, color=NZ_ORANGE, fill="#FBBF24", alpha=0.18, linewidth=1.2, show.legend=FALSE) + geom_point(size=3.5, alpha=0.82, stroke=0.4) + geom_text_repel( data=ind_solar_clean |> slice_max(abs(exports_pct), n=18), aes(label=paste0(hs_code," ",substr(prod_name,1,25))), size=3.0, lineheight=0.85, max.overlaps=16, segment.color="#94a3b8", segment.size=0.3, alpha=1, show.legend=FALSE, family="Archivo" ) + scale_color_manual(values=ts_map, name="Production Stage | Function", drop=TRUE) + scale_shape_manual(values=SHAPE_MAP[intersect(names(SHAPE_MAP),unique(ind_solar_clean$category))], name="SHAP Category", drop=TRUE) + scale_y_log10(labels=function(x) paste0(signif(x,3),"%")) + labs( title="India — Solar Value Chain: SHAP Feature Importance vs Export Intensity", subtitle=paste0("Year: ",YEAR," · Each point = one HS product code · ", "Dashed line = median export intensity · ", "Orange line = OLS fit (log-linear)"), x="SHAP Feature Importance (mean |z|)", y="Exports % of GDP (log scale)", caption="Source: BACI bilateral trade · NZIPL Green Dictionary · ML competitiveness model (Ishana Ratan, NZIPL)" ) + theme_minimal(base_family="Archivo", base_size=13) + theme( plot.title=element_text(size=16,face="bold",color=NZ_GREEN,margin=margin(b=6)), plot.subtitle=element_text(size=12,color="#475569",margin=margin(b=10)), plot.caption=element_text(size=9,color="#94a3b8",hjust=0), legend.position="right", panel.grid.major=element_line(color="#e2e8f0"), panel.grid.minor=element_blank() ) ``` India's Solar scatter illustrates the **Processing → Final Product gradient** clearly: polysilicon and silicon wafer products (upstream, Raw Material · Upstream) cluster at low SHAP scores and low exports, while solar cell and module assembly inputs (Product Component · Downstream) are both higher-SHAP and higher-export. This is consistent with India's current industrial position — strong in downstream assembly, still building upstream processing capacity.