Random Forest · SHAP Feature Importance · Replication Roadmap
Author
NZIPL Clean Value Chain Explorer
Published
April 8, 2026
1 Executive Summary
This document reviews the Predicted Competitiveness (PC) machine learning pipeline developed by NZIPL for the Atlas of the Global Clean Industrial Base. The pipeline:
1. Uses BACI bilateral trade data to compute Revealed Comparative Advantage (RCA) per country × HS6 product × year
2. Trains a Random Forest classifier for each of the 10 clean technologies to predict whether a country will achieve RCA > 1 (i.e., become competitively specialised)
3. Applies SHAP (SHapley Additive exPlanations) to extract which product-level RCA variables most drive each prediction
4. Exports predicted_comp scores (0–1 probability) and per-feature shap_mean_z importance values, the two datasets now embedded in CVCE
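As a minimal illustration of the RCA step, the sketch below computes the standard Balassa index for a single year: a country's export share in a product divided by the world's export share in that product. It is written in Python to match the original notebooks and assumes the pipeline uses this standard definition. The function name `compute_rca`, the input layout, and the toy figures are hypothetical and are not taken from RCA.ipynb.

```python
from collections import defaultdict


def compute_rca(exports):
    """Balassa RCA for one year.

    exports: dict mapping (country, product) -> export value (hypothetical layout).
    Returns a dict mapping (country, product) -> RCA.
    """
    country_total = defaultdict(float)
    product_total = defaultdict(float)
    world_total = 0.0
    for (country, product), value in exports.items():
        country_total[country] += value
        product_total[product] += value
        world_total += value

    rca = {}
    for (country, product), value in exports.items():
        # Guard against empty totals so the toy function never divides by zero.
        if country_total[country] == 0 or product_total[product] == 0:
            rca[(country, product)] = 0.0
            continue
        country_share = value / country_total[country]
        world_share = product_total[product] / world_total
        rca[(country, product)] = country_share / world_share
    return rca


# Toy data: two countries, two products (illustrative values only).
toy_exports = {
    ("A", "batteries"): 80.0, ("A", "wheat"): 20.0,
    ("B", "batteries"): 20.0, ("B", "wheat"): 80.0,
}
toy_rca = compute_rca(toy_exports)
specialised = {key for key, value in toy_rca.items() if value > 1}
```

In this toy case country A is specialised in batteries and country B in wheat, which is the RCA > 1 condition the classifier is trained to predict.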
On the SHAP ↔︎ Trade correlation
The scatter plots show a positive correlation between shap_mean_z (SHAP feature importance) and trade volumes. This is expected behaviour, not a flaw. The SHAP values measure how much a product’s export RCA drives the prediction of technology competitiveness. Because both the inputs (product-level RCA) and the target (technology-level RCA > 1) are derived from trade data, higher-volume products tend to have higher SHAP scores: the model is learning that existing export strength in related sectors predicts future competitiveness. This encodes the product-space logic of economic complexity.
2 Notebook Inventory
The Python notebooks are located at: OneDrive/Bentley Allan's files - NZIPL/Projects/001 Industrial Base/Modeling -- ML
notebooks <- tribble(
  ~Group, ~Notebook, ~Purpose,
  "Core ML", "Analysis/RCA Batteries Analysis.ipynb", "RF model + SHAP + rankings (Batteries)",
  "Core ML", "Analysis/RCA Solar Analysis.ipynb", "RF model + SHAP + rankings (Solar)",
  "Core ML", "Analysis/RCA Wind Analysis.ipynb", "RF model + SHAP + rankings (Wind)",
  "Core ML", "Analysis/RCA {Tech} Analysis.ipynb", "Same pattern for remaining 7 techs",
  "Features", "RCA Construction/Battery/Batteries_Vars.ipynb", "Feature matrix: lagged RCA for 69 battery HS codes + 20 proximate + macro",
  "Features", "RCA Construction/{Tech}/{Tech}_Vars.ipynb", "Same pattern for each of 10 techs",
  "Data Prep", "RCA/Code + Cleaned Files/RCA.ipynb", "Base RCA calculation from BACI",
  "Data Prep", "ML Code/Product Proximity Update.ipynb", "Product proximity (phi) matrix calculation",
  "Data Prep", "ML Data/Bentley_Backup/Imputation script.ipynb", "KNN imputation (k=3) for missing trade values",
  "Data Prep", "ML Data/Trade Data/GTA code.ipynb", "Global Trade Alert policy intervention processing",
  "Viz", "ML Code/Nature Paper Figures.ipynb", "Publication bubble charts (country × tech × score)",
  "Viz", "ML Code/Old/Country Deep Dives.ipynb", "RCA time series per country",
  "Utility", "FDI Methodology/Firm Names/Fuzzymatch.ipynb", "Fuzzy name matching for FDI firm data"
)

notebooks |>
  kable(col.names = c("Group", "Notebook", "Purpose"),
        caption = "NZIPL ML Pipeline — Notebook Inventory") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 13) |>
  column_spec(1, bold = TRUE, color = nzipl_green) |>
  row_spec(0, bold = TRUE, background = nzipl_green, color = "white")
feat |>
  filter(!is.na(hs_code)) |>
  mutate(tech = factor(tech, levels = names(TECH_COLORS))) |>
  ggplot(aes(x = shap_mean_z, y = reorder(category, shap_mean_z), color = category)) +
  geom_jitter(height = 0.2, alpha = 0.6, size = 1.5) +
  stat_summary(fun = mean, geom = "point", size = 3.5, shape = 18, color = "black") +
  facet_wrap(~ tech, ncol = 5, scales = "free_x") +
  scale_color_brewer(palette = "Dark2", guide = "none") +
  labs(title = "SHAP Importance Distribution by Category and Technology",
       subtitle = "Each dot = one HS6 code · Diamond = category mean",
       x = "SHAP mean |z-score|", y = NULL) +
  theme_nzipl()
4.3 Top-10 SHAP features per technology
feat |>
  filter(!is.na(hs_code)) |>
  group_by(tech) |>
  slice_max(shap_mean_z, n = 10) |>
  ungroup() |>
  mutate(hs_code = formatC(as.integer(hs_code), width = 6, flag = "0"),
         shap_mean_z = round(shap_mean_z, 4)) |>
  select(Technology = tech, `HS Code` = hs_code,
         Description = description, Category = category,
         `SHAP mean |z|` = shap_mean_z) |>
  kable(caption = "Top-10 SHAP features per technology (by mean |z-score|)") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 11) |>
  row_spec(0, bold = TRUE, background = nzipl_green, color = "white") |>
  collapse_rows(columns = 1, valign = "middle")
Top-10 SHAP features per technology (by mean |z-score|)

| Technology | HS Code | Description | Category | SHAP mean \|z\| |
|---|---|---|---|---|
| Batteries | 391190 | Polysulphides, polysulphones and similar products of chemical synthesis n.e.s. in chapter 39; in primary forms | Chemicals | 0.6565 |
| Batteries | 847990 | Machines and mechanical appliances; parts, of those having individual functions | Machinery | 0.6478 |
| Batteries | 370130 | Photographic plates and film; in the flat, sensitised, unexposed, with any side exceeding 225mm, of any materials other than paper, paperboard or textiles | Chemicals | 0.6424 |
| Batteries | 903190 | Instruments, appliances and machines; parts and accessories for those measuring or checking devices of heading no. 9031 | Machinery | 0.6392 |
| Batteries | 282540 | Nickel oxides and hydroxides | Chemicals | 0.6192 |
| Batteries | 902790 | Microtomes and parts and accessories thereof | Machinery | 0.6099 |
| Batteries | 291030 | Epoxides, epoxyalcohols, epoxyphenols and epoxyethers; with a three-membered ring and their halogenated, sulphonated, nitrated or nitrosated derivatives, 1-chloro-2,3-epoxypropane (epichlorohydrin) | Chemicals | 0.6020 |
| Batteries | 282200 | Cobalt oxides and hydroxides; commercial cobalt oxides | Chemicals | 0.5977 |
| Batteries | 382490 | Chemical products, preparations and residual products of the chemical or allied industries, n.e.s. or included in heading no. 3824 | Chemicals | 0.5843 |
| Batteries | 903180 | Instruments, appliances and machines; for measuring or checking n.e.s. in chapter 90 | Machinery | 0.5817 |
| Biofuel | 940390 | Furniture; parts | Machinery | 0.8113 |
| Biofuel | 381590 | Reaction initiators, reaction accelerators and catalytic preparations, unsupported, n.e.s. or included | Chemicals | 0.8098 |
| Biofuel | 844190 | Machinery; parts of machinery for making up paper pulp, paper or paperboard, including cutting machines of all kinds | Machinery | 0.7910 |
| Biofuel | 841939 | Dryers; for products n.e.s. in heading no. 8419, not used for domestic purposes | Machinery | 0.7764 |
| Biofuel | 121299 | Vegetable products (including unroasted chicory roots, chicorium intybus sativum variety); n.e.s. in chapter 12, fresh, chilled, frozen or dried, ground or unground, primarily for human consumption | Other | 0.7746 |
| Biofuel | 841381 | Pumps and liquid elevators; n.e.s. in heading no. 8413 | Machinery | 0.7726 |
| Biofuel | 441520 | Wood; pallets, box pallets and other load boards; pallet collars | Industrial Materials | 0.7718 |
| Biofuel | 842121 | Machinery; for filtering or purifying water | Machinery | 0.7502 |
| Biofuel | 281122 | Silicon dioxide | Chemicals | 0.7448 |
| Biofuel | 841410 | Pumps; vacuum | Machinery | 0.7413 |
| Electrolyzers | 845610 | Machine-tools; operated by laser or other light or photon beam process | Machinery | 0.7167 |
| Electrolyzers | 841480 | Pumps and compressors; for air, vacuum or gas, n.e.s. in heading no. 8414 | Machinery | 0.7120 |
| Electrolyzers | 390710 | Polyacetals; in primary forms | Chemicals | 0.6961 |
| Electrolyzers | 381512 | Catalysts, supported; reaction initiators, reaction accelerators and catalytic preparations, with precious metal or precious metal compounds as the active substance, n.e.s. or included | Chemicals | 0.6612 |
| Electrolyzers | 842121 | Machinery; for filtering or purifying water | Machinery | 0.6519 |
| Electrolyzers | 841370 | Pumps; centrifugal, n.e.s. in heading no. 8413, for liquids | Machinery | 0.6458 |
| Electrolyzers | 841490 | Pumps and compressors; parts, of air or vacuum pumps, air or other gas compressors and fans, ventilating or recycling hoods incorporating a fan | Machinery | 0.6453 |
| Electrolyzers | 381519 | Catalysts, supported; reaction initiators, reaction accelerators and catalytic preparations, with an active substance other than nickel or precious metals or their compounds, n.e.s. or included | Chemicals | 0.6421 |
| Electrolyzers | 711021 | Metals; palladium, unwrought or in powder form | Metals | 0.6421 |
| Electrolyzers | 842139 | Machinery; for filtering or purifying gases, other than intake air filters for internal combustion engines | Machinery | 0.6374 |
| Geothermal | 722490 | Steel, alloy; semi-finished products | Metals | 0.7247 |
| Geothermal | 842199 | Machinery; parts for filtering or purifying liquids or gases | Machinery | 0.7138 |
| Geothermal | 845931 | Machine-tools; for boring-milling by removing metal, numerically controlled | Machinery | 0.7132 |
| Geothermal | 845891 | Lathes; for removing metal, numerically controlled, other than horizontal lathes | Machinery | 0.6970 |
| Geothermal | 851580 | Welding machines and apparatus; n.e.s. in heading no. 8515, whether or not capable of cutting | Electronics | 0.6941 |
| Geothermal | 845811 | Lathes; for removing metal, horizontal, numerically controlled | Machinery | 0.6938 |
| Geothermal | 720270 | Ferro-alloys; ferro-molybdenum | Metals | 0.6911 |
| Geothermal | 842139 | Machinery; for filtering or purifying gases, other than intake air filters for internal combustion engines | Machinery | 0.6876 |
| Geothermal | 730721 | Steel, stainless; tube or pipe fittings, flanges, of stainless steel | Metals | 0.6804 |
| Geothermal | 401693 | Rubber; vulcanised (other than hard rubber), gaskets, washers and other seals, of non-cellular rubber | Chemicals | 0.6800 |
| Heat Pumps | 841410 | Pumps; vacuum | Machinery | 0.8217 |
| Heat Pumps | 841430 | Compressors; of a kind used in refrigerating equipment | Machinery | 0.7803 |
| Heat Pumps | 841590 | Air conditioning machines; with motor driven fan and elements for temperature control, parts thereof | Machinery | 0.7517 |
| Heat Pumps | 846241 | Machine-tools; punching or notching machines (including presses), including combined punching and shearing machines, numerically controlled, for working metal | Machinery | 0.7439 |
| Heat Pumps | 841490 | Pumps and compressors; parts, of air or vacuum pumps, air or other gas compressors and fans, ventilating or recycling hoods incorporating a fan | Machinery | 0.7278 |
| Heat Pumps | 841899 | Refrigerating or freezing equipment; parts thereof, other than furniture | Machinery | 0.7152 |
| Heat Pumps | 853400 | Circuits; printed | Electronics | 0.7095 |
| Heat Pumps | 820720 | Tools, interchangeable; (for machine or hand tools, whether or not power-operated), dies for drawing or extruding metal | Metals | 0.6900 |
| Heat Pumps | 841582 | Air conditioning machines; containing a motor driven fan, other than window or wall types, incorporating a refrigerating unit | Machinery | 0.6842 |
| Heat Pumps | 854121 | Electrical apparatus; transistors, (other than photosensitive), with a dissipation rate of less than 1W | Electronics | 0.6747 |
| Magnets | 820720 | Tools, interchangeable; (for machine or hand tools, whether or not power-operated), dies for drawing or extruding metal | Metals | 0.6669 |
| Magnets | 847982 | Machines; for mixing, kneading, crushing, grinding, screening, sifting, homogenising, emulsifying or stirring | Machinery | 0.6036 |
| Magnets | 283324 | Sulphates; of nickel | Chemicals | 0.5930 |
| Magnets | 680421 | Millstones, grindstones, grinding wheels and the like; of agglomerated synthetic or natural diamond | Industrial Materials | 0.5852 |
| Magnets | 847990 | Machines and mechanical appliances; parts, of those having individual functions | Machinery | 0.5751 |
| Magnets | 851440 | Heating equipment; for the heat treatment of materials by induction or dielectric loss, industrial or laboratory, other than furnaces and ovens | Electronics | 0.5583 |
| Magnets | 260300 | Copper ores and concentrates | Metals | 0.5537 |
| Magnets | 260600 | Aluminium ores and concentrates | Metals | 0.5289 |
| Nuclear | 841940 | Distilling or rectifying plant; not used for domestic purposes | Machinery | 0.6995 |
| Nuclear | 721922 | Steel, stainless; flat-rolled, width 600mm or more, hot-rolled, (not in coils), of a thickness of 4.75mm or more but not exceeding 10mm | Metals | 0.6501 |
| Nuclear | 720270 | Ferro-alloys; ferro-molybdenum | Metals | 0.6333 |
| Nuclear | 841989 | Machinery, plant and laboratory equipment; for treating materials by change of temperature, other than for making hot drinks or cooking or heating food | Machinery | 0.6170 |
| Nuclear | 840420 | Boilers; condensers, for steam or other vapour power units | Machinery | 0.5963 |
| Nuclear | 282520 | Lithium oxide and hydroxide | Chemicals | 0.5817 |
| Nuclear | 840690 | Turbines; parts of steam and other vapour turbines | Machinery | 0.5785 |
| Nuclear | 261390 | Molybdenum ores and concentrates; other than roasted | Metals | 0.5707 |
| Nuclear | 722790 | Steel, alloy; bars and rods, hot-rolled, in irregularly wound coils, n.e.s. in heading no. 7227 | Metals | 0.5611 |
| Nuclear | 722011 | Steel, stainless; flat-rolled, width less than 600mm, hot-rolled, of a thickness of 4.75mm or more | Metals | 0.5430 |
| Solar | 830249 | Mountings, fittings and similar articles; suitable for other than buildings or furniture, of base metal | Metals | 0.6722 |
| Solar | 321490 | Mastics; n.e.s. in heading no. 3214 | Chemicals | 0.6681 |
| Solar | 940390 | Furniture; parts | Machinery | 0.6591 |
| Solar | 392350 | Plastics; stoppers, lids, caps and other closures, for the conveyance or packing of goods | Chemicals | 0.6246 |
| Solar | 851430 | Furnaces and ovens; industrial or laboratory, other than those functioning by induction, dielectric loss or resistance heated | Electronics | 0.6240 |
| Solar | 760429 | Aluminium; alloys, bars, rods and profiles, other than hollow | Metals | 0.6019 |
| Solar | 392010 | Plastics; plates, sheets, film, foil and strip, of polymers of ethylene, non-cellular and not reinforced, laminated, supported or similarly combined with other materials | Chemicals | 0.5885 |
| Solar | 730890 | Iron or steel; structures and parts thereof, n.e.s. in heading no. 7308 | Metals | 0.5780 |
| Solar | 320890 | Paints and varnishes; based on polymers n.e.s. in heading no. 3208, dispersed or dissolved in a non-aqueous medium | Chemicals | 0.5717 |
| Solar | 841989 | Machinery, plant and laboratory equipment; for treating materials by change of temperature, other than for making hot drinks or cooking or heating food | Machinery | 0.5591 |
| Transmission | 270900 | Oils; petroleum oils and oils obtained from bituminous minerals, crude | Industrial Materials | 0.8174 |
| Transmission | 847990 | Machines and mechanical appliances; parts, of those having individual functions | Machinery | 0.8147 |
| Transmission | 850490 | Electrical transformers, static converters and inductors; parts thereof | Electronics | 0.8145 |
| Transmission | 847981 | Machines and mechanical appliances; for treating metal, including electric wire coil-winders | Machinery | 0.8052 |
| Transmission | 722611 | Steel, alloy; flat-rolled, width less than 600mm, of silicon-electrical steel, grain-oriented | Metals | 0.7743 |
| Transmission | 847989 | Machines and mechanical appliances; n.e.s. in item no. 8479.8, having individual functions | Machinery | 0.7709 |
| Transmission | 846241 | Machine-tools; punching or notching machines (including presses), including combined punching and shearing machines, numerically controlled, for working metal | Machinery | 0.7643 |
| Transmission | 842890 | Lifting machinery; handling, loading or unloading machinery n.e.s. in heading no. 8428 | Machinery | 0.7355 |
| Transmission | 850300 | Electric motors and generators; parts suitable for use solely or principally with the machines of heading no. 8501 or 8502 | Electronics | 0.7307 |
| Transmission | 845811 | Lathes; for removing metal, horizontal, numerically controlled | Machinery | 0.7247 |
| Wind | 730799 | Iron or steel; tube or pipe fittings, n.e.s. in item no. 7307.9, other than stainless steel | Metals | 0.6965 |
| Wind | 903180 | Instruments, appliances and machines; for measuring or checking n.e.s. in chapter 90 | Machinery | 0.6823 |
| Wind | 732690 | Iron or steel; articles n.e.s. in heading no. 7326 | Metals | 0.6669 |
| Wind | 850431 | Electrical transformers; n.e.s. in item no. 8504.2, having a power handling capacity not exceeding 1kVA | Electronics | 0.6213 |
| Wind | 401693 | Rubber; vulcanised (other than hard rubber), gaskets, washers and other seals, of non-cellular rubber | Chemicals | 0.6193 |
| Wind | 401699 | Rubber; vulcanised (other than hard rubber), articles n.e.s. in heading no. 4016, of non-cellular rubber | Chemicals | 0.6173 |
| Wind | 732399 | Iron or steel; table, kitchen and other household articles and parts thereof, of iron or steel n.e.s. in heading no. 7323 | Metals | 0.6138 |
| Wind | 871690 | Trailers, semi-trailers and other vehicles not mechanically propelled; parts thereof for heading no. 8716 | Machinery | 0.6111 |
| Wind | 321490 | Mastics; n.e.s. in heading no. 3214 | Chemicals | 0.6017 |
| Wind | 854460 | Insulated electric conductors; for a voltage exceeding 1000 volts | | |
7 Why SHAP Correlates with Trade — and What It Means
7.1 The causal logic
Product-space theory
The NZIPL model is grounded in economic complexity theory (Hidalgo & Hausmann 2009). Products that are “proximate” — meaning countries tend to export them together — share underlying productive capabilities (skilled labour, specialised equipment, supply chain infrastructure). The Random Forest is learning this capability-sharing structure: countries that already have RCA in upstream/related products are more likely to develop RCA in the target technology.
SHAP measures the contribution of each input RCA variable to the model’s prediction. High-SHAP products are the most informative capability proxies for the target technology.
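To make "contribution to the prediction" concrete, the sketch below computes exact Shapley values for one row of a toy scoring rule by averaging each feature's marginal effect over every feature ordering. This brute-force enumeration is only an illustration of the quantity being attributed. It is not the tree-based algorithm the SHAP library applies to a Random Forest, and it does not reproduce how shap_mean_z is aggregated. The function `toy_model`, its coefficients, and the input values are hypothetical.

```python
from itertools import permutations


def shapley_values(predict, x, baseline):
    """Exact Shapley values for row x relative to a baseline row.

    Averages each feature's marginal contribution over all orderings,
    so it is only practical for a handful of features.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for j in order:
            current[j] = x[j]          # switch feature j from baseline to actual
            updated = predict(current)
            phi[j] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]


def toy_model(rca):
    """Hypothetical stand-in for a fitted model: three lagged RCA inputs."""
    score = 0.5 * rca[0] + 0.2 * rca[1]
    if rca[0] > 1 and rca[2] > 1:      # interaction between features 0 and 2
        score += 0.3
    return score


x = [1.5, 0.5, 2.0]
baseline = [0.0, 0.0, 0.0]
contributions = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split equally between the two features that trigger it. This additivity is the property that makes per-feature importances comparable within one model.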
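Average SHAP importance by technology × SHAP category

| Technology | Category | N codes | Mean SHAP \|z\| | SD SHAP |
|---|---|---|---|---|
| Batteries | Machinery | 10 | 0.5770 | 0.0442 |
| Batteries | Chemicals | 15 | 0.5682 | 0.0471 |
| Batteries | Metals | 3 | 0.5561 | 0.0127 |
| Batteries | Electronics | 2 | 0.5286 | 0.0181 |
| Batteries | Industrial Materials | 1 | 0.5044 | NA |
| Biofuel | Machinery | 18 | 0.6721 | 0.0884 |
| Biofuel | Industrial Materials | 14 | 0.6474 | 0.0641 |
| Biofuel | Other | 10 | 0.6446 | 0.0787 |
| Biofuel | Chemicals | 17 | 0.6442 | 0.0871 |
| Biofuel | Metals | 7 | 0.5721 | 0.0394 |
| Electrolyzers | Chemicals | 15 | 0.5791 | 0.0520 |
| Electrolyzers | Machinery | 19 | 0.5777 | 0.0690 |
| Electrolyzers | Metals | 7 | 0.5702 | 0.0474 |
| Electrolyzers | Electronics | 1 | 0.5380 | NA |
| Geothermal | Electronics | 3 | 0.6505 | 0.0467 |
| Geothermal | Machinery | 27 | 0.6092 | 0.0620 |
| Geothermal | Chemicals | 6 | 0.6035 | 0.0740 |
| Geothermal | Metals | 14 | 0.5842 | 0.0708 |
| Geothermal | Industrial Materials | 4 | 0.5627 | 0.0340 |
| Heat Pumps | Machinery | 12 | 0.6756 | 0.0994 |
| Heat Pumps | Electronics | 3 | 0.6703 | 0.0416 |
| Heat Pumps | Chemicals | 1 | 0.6373 | NA |
| Heat Pumps | Metals | 8 | 0.6147 | 0.0587 |
| Heat Pumps | Industrial Materials | 2 | 0.5398 | 0.0253 |
| Magnets | Chemicals | 1 | 0.5930 | NA |
| Magnets | Machinery | 2 | 0.5894 | 0.0202 |
| Magnets | Industrial Materials | 1 | 0.5852 | NA |
| Magnets | Metals | 3 | 0.5832 | 0.0736 |
| Magnets | Electronics | 1 | 0.5583 | NA |
| Nuclear | Machinery | 5 | 0.5999 | 0.0690 |
| Nuclear | Metals | 11 | 0.5515 | 0.0490 |
| Nuclear | Chemicals | 5 | 0.5307 | 0.0305 |
| Nuclear | Industrial Materials | 1 | 0.5194 | NA |
| Solar | Machinery | 4 | 0.5651 | 0.0665 |
| Solar | Chemicals | 13 | 0.5632 | 0.0426 |
| Solar | Metals | 8 | 0.5566 | 0.0570 |
| Solar | Electronics | 8 | 0.5350 | 0.0408 |
| Solar | Industrial Materials | 1 | 0.5085 | NA |
| Transmission | Industrial Materials | 1 | 0.8174 | NA |
| Transmission | Electronics | 3 | 0.7534 | 0.0535 |
| Transmission | Machinery | 12 | 0.7043 | 0.0807 |
| Transmission | Chemicals | 1 | 0.6687 | NA |
| Transmission | Metals | 7 | 0.6450 | 0.0819 |
| Wind | Chemicals | 9 | 0.5674 | 0.0467 |
| Wind | Metals | 15 | 0.5599 | 0.0571 |
| Wind | Electronics | 6 | 0.5554 | 0.0520 |
| Wind | Machinery | 19 | 0.5552 | 0.0436 |
| Wind | Industrial Materials | 1 | 0.5007 | NA |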
Interpretation guide (SHAP |z|):
- > 0.75: Critical (direct industrial prerequisite for this technology)
- 0.60 – 0.75: High (closely related capability or upstream material)
- 0.50 – 0.60: Moderate (broader industrial base indicator)
- < 0.50: Low (weak signal, general economic complexity proxy)
The narrow observed range (0.50–0.82) across all products indicates that every feature in the final dataset passed a threshold filter: only above-threshold features were exported to CSV, so the Low band (< 0.50) does not occur in the exported data. The absolute scale is less important than the relative ranking within a technology.
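The interpretation bands can be applied mechanically, for example when labelling rows of the exported feature file. The sketch below is one possible encoding in Python. The guide lists 0.60 and 0.75 in two adjacent bands, so the boundary handling here (0.75 counts as High, 0.60 as High, 0.50 as Moderate) is an assumption, and the function name `shap_band` is hypothetical.

```python
def shap_band(shap_mean_z):
    """Map a SHAP mean |z| value to the interpretation band in the guide.

    Boundary handling is an assumption: the guide does not say which band
    the exact values 0.50, 0.60 and 0.75 belong to.
    """
    if shap_mean_z > 0.75:
        return "Critical"
    if shap_mean_z >= 0.60:
        return "High"
    if shap_mean_z >= 0.50:
        return "Moderate"
    return "Low"


# Values taken from the top-10 table in section 4.3.
examples = {
    ("Heat Pumps", "841410"): 0.8217,
    ("Batteries", "391190"): 0.6565,
    ("Solar", "841989"): 0.5591,
}
bands = {key: shap_band(value) for key, value in examples.items()}
```

Because absolute values are not comparable across technologies, a label produced this way is best read alongside the feature's rank within its own technology.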
The replication would add one new build script to the CVCE pipeline:
# scripts/build_data/11_build_ml_pc.R (proposed)
#
# Replicates the Python NZIPL ML pipeline in R:
# 1. Compute RCA matrix from bilateral_ds
# 2. Build product proximity (phi) using co-export correlation
# 3. Construct per-tech feature matrices (lagged RCA + macro)
# 4. Train ranger Random Forest (equivalent to sklearn RFC)
# 5. Compute SHAP values via fastshap
# 6. Export pc_scores.parquet + pc_features.csv (replacing datawheel inputs)
library(ranger)    # RF: similar to sklearn RFC
library(fastshap)  # SHAP values for any ML model
library(VIM)       # KNN imputation
Key advantages of replication:
- Re-run with each BACI update (2024 data is currently available; the model was trained on 2023)
- Add new features: ORBIS firm counts, patent data, supply chain distance
- Extend to new technologies or product-type subsets
- Produce SHAP values at the country-year level (not just aggregated)
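Step 3 of the proposed script (lagged RCA features with an RCA > 1 target) might look roughly like the sketch below. It is written in Python to mirror the original notebooks, not in R. It pairs each country-year's technology outcome with the product-level RCA observed `lag` years earlier, and drops country-years with no lagged observation. The function name, dictionary layouts, and toy values are hypothetical, and macro features are omitted.

```python
def build_lagged_rows(feature_rca, tech_rca, lag=1):
    """Join lagged product-level RCA features to a binary competitiveness label.

    feature_rca: {(country, year): {hs_code: rca}}   (hypothetical layout)
    tech_rca:    {(country, year): technology-level RCA}
    Returns a list of rows with features from year - lag and label = RCA > 1.
    """
    rows = []
    for (country, year), target in sorted(tech_rca.items()):
        lagged = feature_rca.get((country, year - lag))
        if lagged is None:
            continue  # no lagged observation for this country-year
        rows.append({
            "country": country,
            "year": year,
            "features": dict(lagged),
            "label": int(target > 1),
        })
    return rows


# Toy inputs: HS codes are from the tables above, RCA values are made up.
feature_rca = {
    ("A", 2020): {"282540": 1.4, "850490": 0.6},
    ("A", 2021): {"282540": 1.6, "850490": 0.9},
    ("B", 2021): {"282540": 0.3, "850490": 2.1},
}
tech_rca = {("A", 2021): 1.2, ("A", 2022): 0.8, ("B", 2021): 2.0}
rows = build_lagged_rows(feature_rca, tech_rca, lag=1)
```

Keeping `lag` as a parameter makes it straightforward to test the 2–3 year lags mentioned under Known Limitations without restructuring the feature build.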
8.3 The one hard piece: Product Proximity
The phi matrix (product-product proximity, used to identify which non-technology products to include as features) is the only component that requires significant new code. The formula is:
φ(p,p') = min[ P(RCA_p | RCA_p'), P(RCA_p' | RCA_p) ]
        = min[ (countries with RCA>1 in both) / (countries with RCA>1 in p'),
               (countries with RCA>1 in both) / (countries with RCA>1 in p) ]
This can be computed from the existing bilateral data in R, but it requires iterating over every pair among the ~5,000 HS6 products (roughly 12.5 million unordered pairs), which is computationally intensive (~2–4 hours for the full matrix, or ~10 min for the green tech subset).
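The proximity formula can be sketched in a few lines. The Python example below builds, for each product, the set of countries with RCA > 1 and then takes the minimum of the two conditional probabilities, which equals the co-export count divided by the larger of the two exporter counts. The function name `proximity`, the nested-dictionary input, and the toy values are assumptions for illustration. Products with no RCA > 1 exporter are left out because φ is undefined for them.

```python
def proximity(rca, threshold=1.0):
    """Pairwise product proximity (phi) from a country -> product -> RCA mapping.

    Returns {(p, q): phi} for each unordered product pair, with p < q.
    """
    exporters = {}
    for country, products in rca.items():
        for product, value in products.items():
            if value > threshold:
                exporters.setdefault(product, set()).add(country)

    names = sorted(exporters)
    phi = {}
    for i, p in enumerate(names):
        for q in names[i + 1:]:
            both = len(exporters[p] & exporters[q])
            # min(both/|p|, both/|q|) is the same as both / max(|p|, |q|)
            phi[(p, q)] = both / max(len(exporters[p]), len(exporters[q]))
    return phi


# Toy RCA values for three countries and three products (illustrative only).
toy = {
    "A": {"cells": 2.0, "cathodes": 1.5, "wheat": 0.2},
    "B": {"cells": 1.2, "cathodes": 0.8, "wheat": 3.0},
    "C": {"cells": 1.3, "cathodes": 1.1, "wheat": 0.4},
}
phi = proximity(toy)
```

Here cells and cathodes are co-exported by two of the three cell exporters, so their proximity is 2/3, while cathodes and wheat share no specialised exporter and get 0. Building the exporter sets once per product, not recomputing them for every pair, is one design choice that may help keep the full-matrix run manageable.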
9 Known Limitations
limits <- tribble(
  ~Issue, ~Description, ~Severity,
  "Survivorship bias", "Only countries with continuous trade records included; sparse traders dropped", "Medium",
  "Forward SHAP bias", "SHAP computed on full dataset, not temporally held-out test set", "High",
  "Lag depth", "1-year lag may be insufficient for policy impacts (typically 2–3 years)", "Low",
  "Hyperparameter tuning", "Fixed params (n=100, depth=10) across all techs; Solar/Wind only ones with GridSearchCV", "Medium",
  "Class imbalance", "Not handled; ~10-40% of country-years are 'competitive' depending on tech", "Medium",
  "RCA volatility", "Single extreme trade events can spike RCA for one year", "Low",
  "Target variable scope", "RCA > 1 on any category; not distinguishing manufacturing vs raw material competitiveness", "Medium",
  "SHAP range compression", "Only above-threshold features exported; absolute z-scores not comparable across techs", "Low"
)

limits |>
  kable(col.names = c("Issue", "Description", "Severity"),
        caption = "Known model limitations") |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"),
                full_width = TRUE, font_size = 13) |>
  column_spec(3,
              color = ifelse(limits$Severity == "High", "#dc2626",
                             ifelse(limits$Severity == "Medium", "#d97706", "#16a34a")),
              bold = TRUE) |>
  row_spec(0, bold = TRUE, background = nzipl_green, color = "white")
Known model limitations

| Issue | Description | Severity |
|---|---|---|
| Survivorship bias | Only countries with continuous trade records included; sparse traders dropped | Medium |
| Forward SHAP bias | SHAP computed on full dataset, not temporally held-out test set | High |
| Lag depth | 1-year lag may be insufficient for policy impacts (typically 2–3 years) | Low |
| Hyperparameter tuning | Fixed params (n=100, depth=10) across all techs; Solar/Wind only ones with GridSearchCV | Medium |
| Class imbalance | Not handled; ~10-40% of country-years are 'competitive' depending on tech | Medium |
| RCA volatility | Single extreme trade events can spike RCA for one year | Low |
| Target variable scope | RCA > 1 on any category; not distinguishing manufacturing vs raw material competitiveness | Medium |
| SHAP range compression | Only above-threshold features exported; absolute z-scores not comparable across techs | Low |
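The highest-severity item, forward SHAP bias, comes from explaining the model on the same years it was fitted on. One possible mitigation is to split country-year rows by time, fit on the earlier years, and compute SHAP values only on the later ones. The sketch below shows just the split, in Python. The function name, the row layout, and the cutoff year are hypothetical, and the model fitting and SHAP steps are deliberately left out.

```python
def temporal_split(rows, cutoff_year):
    """Split country-year rows into a training set and a later held-out set.

    Rows with year <= cutoff_year go to training; later years are held out,
    so no held-out observation predates any training observation.
    """
    train = [row for row in rows if row["year"] <= cutoff_year]
    held_out = [row for row in rows if row["year"] > cutoff_year]
    return train, held_out


# Toy rows; only the "year" key matters for the split.
toy_rows = [
    {"country": "A", "year": 2019, "label": 0},
    {"country": "A", "year": 2020, "label": 1},
    {"country": "B", "year": 2021, "label": 1},
    {"country": "B", "year": 2022, "label": 0},
]
train_rows, held_out_rows = temporal_split(toy_rows, cutoff_year=2020)
```

Under this scheme the model would be fitted on `train_rows` only and explained on `held_out_rows`, so importance scores reflect how features drive predictions for years the model has not seen.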