Mapping the Industrial Base for the New Energy Economy

Paper Summary — Nature submission · Ratan, Bagozzi, Goldman, Han, Sahay, Allan

Published: April 9, 2026

Summary: “Mapping the Industrial Base for the New Energy Economy”

Authors: Ishana Ratan · Benjamin E. Bagozzi · Jonas Goldman · Becky Han · Tim Sahay · Bentley B. Allan
Venue: Nature (submission in progress)
Latest draft: Industrial base manuscript 04072026.docx
Code: github.com/ishanaratan/CICE-V1 · Python 3.11.5
Data: BACI (CEPII), World Development Indicators, 2003–2023, 155 countries

Research Question

How can countries identify which clean energy technologies they are best positioned to compete in, and what specific industrial capabilities drive competitiveness in each?

Core Argument

Green industrial policy faces two structural risks:

Misallocation — countries invest in sectors where they lack the underlying capabilities to compete.
Strategic herding — poor information leads many countries to pile into the same few technologies (e.g., 2020–2021 hydrogen/solar manufacturing announcements worldwide), risking crowding and price collapse (analogous to 1960s commodity overinvestment).

The paper argues that a granular, inductive ML model — trained on trade-revealed capabilities rather than pre-specified theory — can map both the current competitive landscape and the upstream industrial base that underpins it, giving countries actionable tools to diversify and target investments strategically.

Contribution

Builds on three predecessor traditions:

Tradition	Prior work	This paper’s advance
Economic complexity / product space	Hidalgo & Hausmann (2007); Mealy & Teytelboym (2022)	Broader coverage (10 technologies vs. 3); adds process chain (machinery used in manufacturing, not just supply chain)
Green comparative advantage indices	Rosenow & Mealy (2024)	More inductive (random forest, no strong priors); includes process chain capital goods
Industrial policy theory	Amsden; Liu; Lane; Rodrik	Empirical operationalisation of upstream capability building

Key novel element: the process chain — HS codes for the machines and equipment used to manufacture each supply chain component (e.g., the mixers that transform cathode active material into slurry, not just the cathode itself). This captures the capital goods dimension of industrial capability that prior work misses.

Data & Methods

Dependent Variable

Binary indicator: RCA > 1 in the final product HS code(s) for each technology.
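RCA (Balassa index) = (product’s share of the country’s total exports) / (product’s share of total world exports). A value > 1 means the product makes up a larger share of the country’s exports than of world exports, i.e., the country exports more than its “fair share.”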
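A minimal sketch of the RCA calculation and the binary target. It assumes a single-year (countries × products) matrix of export values, such as one aggregated from BACI. The function names are hypothetical and are not taken from the CICE-V1 repository. The code also assumes that a technology's final-product codes (e.g., 854142 + 854143) are pooled into one product before RCA is computed; the paper may handle multi-code technologies differently.

```python
import numpy as np


def balassa_rca(exports: np.ndarray) -> np.ndarray:
    """Balassa RCA for a (countries x products) matrix of export values for one year."""
    X = np.asarray(exports, dtype=float)
    # Share of each product in the country's own export basket.
    country_share = X / X.sum(axis=1, keepdims=True)
    # Share of each product in total world exports.
    world_share = X.sum(axis=0) / X.sum()
    return country_share / world_share


def technology_target(exports: np.ndarray, final_cols: list[int]) -> np.ndarray:
    """Binary DV: 1 if the country's RCA in the technology's final product exceeds 1.

    Assumption: multi-code technologies are pooled into a single product first.
    """
    X = np.asarray(exports, dtype=float)
    tech = X[:, final_cols].sum(axis=1)                 # pooled final-product exports per country
    rca = (tech / X.sum(axis=1)) / (tech.sum() / X.sum())
    return (rca > 1).astype(int)


# Toy example: 3 countries x 2 products (column 0 = the technology's final product).
X = np.array([[50, 50], [10, 90], [40, 60]])
print(balassa_rca(X)[:, 0])        # approximately 1.5, 0.3, 1.2
print(technology_target(X, [0]))   # [1 0 1]
```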

Technology	HS Code(s)	Mean RCA	% RCA > 1
Solar	854142, 854143 (PV cells/panels)	0.26	5.87%
Wind	850231 (wind generating sets)	0.56	4.85%
Batteries	850760, 850780 (Li-ion + other accumulators)	0.22	4.92%
Electrolyzers	854330 (electrolysis apparatus)	0.32	7.90%
Heat Pumps	841861, 841581 (heat pumps; reversible air-conditioning units)	0.44	14.01%
Permanent Magnets	850511, 850519 (permanent magnets, metal and other)	0.27	5.10%
Nuclear	840110, 840140 (reactors + parts)	0.30	6.27%
Biofuels	220710, 220720 (ethanol)	2.15	21.38%
Geothermal	841950 (heat exchangers)	0.40	13.09%
Transmission	850431–850434 (transformers)	0.69	20.80%

Independent Variables (three sets)

RCA in supply chain products: RCA in each technology’s upstream and midstream HS codes, including the process chain (machinery used in manufacturing). Each component is assigned one or more 6-digit HS codes and classified as upstream, midstream, or downstream.
Co-export probabilities: the conditional probability of co-exporting each supply chain product with other HS codes. Products with a co-export probability > 0.55 are added as “proximate” predictors (following Hausmann & Hidalgo).
Country characteristics: log(FDI), log(GDP), log(trade), log(population), industry share of GDP, and high-tech exports share of GDP (World Development Indicators), plus tariff dummies (whether the country imposed, or was targeted by, a tariff on that technology).
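The co-export screen can be sketched as follows. The sketch assumes, as in the product-space literature, that a country “exports” a product when its RCA > 1. It also assumes that the co-export probability is P(RCA in product j > 1 | RCA in supply chain product i > 1), estimated across countries. Hidalgo et al.’s product-space proximity takes the minimum of the two directional conditional probabilities. The one-directional version here follows the summary’s wording. Function names are hypothetical.

```python
import numpy as np


def conditional_coexport(M: np.ndarray) -> np.ndarray:
    """P[i, j] = Pr(RCA_j > 1 | RCA_i > 1), estimated across countries.

    M is a binary (countries x products) matrix: M[c, p] = 1 if RCA[c, p] > 1.
    """
    M = np.asarray(M, dtype=float)
    co = M.T @ M                          # co[i, j]: countries with RCA > 1 in both i and j
    ubiquity = np.diag(co)[:, None]       # countries with RCA > 1 in product i
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(ubiquity > 0, co / ubiquity, 0.0)


def proximate_products(M: np.ndarray, supply_chain: list[int], threshold: float = 0.55) -> list[int]:
    """Non-supply-chain products whose co-export probability with any supply chain product exceeds the threshold."""
    P = conditional_coexport(M)
    hits = {int(j) for i in supply_chain for j in np.flatnonzero(P[i] > threshold)}
    return sorted(hits - set(supply_chain))
```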

Supply Chain Coverage

Technology	N Supply Chain Products	N Proximate Products
Solar	46	27
Wind	87	39
Batteries	69	20
Biofuels	63	16
Geothermal	74	24
Nuclear	38	6
Electrolyzers	48	23
Heat Pumps	26	4
Permanent Magnets	17	1
Transmission	25	3

Model

Algorithm: Random Forest classification (scikit-learn)
Sample split: 75% training / 25% test
Tuning: 5-fold cross-validation over tree depths {5, 10, 15} × n_trees {50, 100, 150}
Universe: 155 countries (population > 1 million), 2003–2023
One model per technology
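A scikit-learn sketch of the per-technology setup uses `RandomForestClassifier` and `GridSearchCV` with the split and grid stated above. Two choices are assumptions not stated in the summary: stratifying the split on the target, and selecting hyperparameters by ROC AUC. `fit_technology_model` is a hypothetical helper.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def fit_technology_model(X, y, random_state=0):
    """Tune and fit one random forest for one technology (hypothetical helper).

    75/25 train/test split, then 5-fold CV over max_depth x n_estimators.
    Stratification and ROC AUC scoring are assumptions.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid={"max_depth": [5, 10, 15], "n_estimators": [50, 100, 150]},
        cv=5,
        scoring="roc_auc",
    )
    search.fit(X_train, y_train)  # refits the best configuration on the full training set
    return search.best_estimator_, search.best_params_, (X_test, y_test)
```

To mirror the one-model-per-technology structure, the function would be called once per technology. In each call, `X` would hold that technology's supply chain, proximate, and country-characteristic features.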

SHAP Feature Importance

Post-estimation SHAP values identify each predictor’s marginal contribution to predictions. Raw SHAP values are standardised to a year-specific z-score and averaged across years. Features with mean |z| > 0.5 are retained as the “industrial base” for each technology.
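One possible reading of the SHAP aggregation is sketched below with NumPy. It operates on a precomputed per-observation SHAP matrix, which for a tree model could come from the `shap` package's `TreeExplainer`. The aggregation order is an assumption, and the repository may define it differently:

1. Within each year, take each feature's mean |SHAP| across observations.
2. Z-score those importances across features.
3. Average |z| across years.

```python
import numpy as np


def industrial_base(shap_values, years, feature_names, threshold=0.5):
    """Return {feature: mean |z|} for features whose mean |z| exceeds the threshold.

    Assumed procedure: per-year mean |SHAP| per feature, z-scored across
    features within that year, then |z| averaged across years.
    """
    S = np.abs(np.asarray(shap_values, dtype=float))   # (n_obs, n_features)
    years = np.asarray(years)
    z_rows = []
    for yr in np.unique(years):
        imp = S[years == yr].mean(axis=0)               # mean |SHAP| per feature in this year
        z_rows.append((imp - imp.mean()) / imp.std())   # year-specific z-score across features
    mean_abs_z = np.abs(np.vstack(z_rows)).mean(axis=0)
    return {f: float(s) for f, s in zip(feature_names, mean_abs_z) if s > threshold}
```

Under this literal reading, a feature that is consistently far below average would also pass the |z| filter. If only above-average features are intended, filtering on the signed mean z is the alternative.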

Five capability clusters are identified by aggregating SHAP importance according to HS code structure:

Cluster	Description
Electronics	Electrical apparatus, electronic devices, semiconductors, solar cells
Machinery	Pumps, cutting machines, process equipment in supply and process chains
Industrial Materials	Wood, stone, glass, gypsum, ceramics used in industrial production
Metals	Extracted minerals through finished metal products
Chemicals	Petrochemical byproducts, polymers, catalysts
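The sketch below illustrates how SHAP importance can be rolled up by HS code structure. Each 6-digit code is mapped to a cluster by its 2-digit HS chapter, and importance is summed within each cluster:

- 84 → Machinery
- 85 → Electronics (electrical equipment)
- 28–39 → Chemicals (including plastics)
- 26 and 72–83 → Metals (ores and base metals)
- 25, 44, and 68–70 → Industrial Materials (stone, wood, ceramics, glass)

This chapter mapping and the use of a sum are simplifying assumptions, not the paper's actual assignment.

```python
from collections import defaultdict

# Hypothetical HS-chapter-to-cluster mapping; not the paper's actual assignment.
CHAPTER_TO_CLUSTER = {
    **{ch: "Chemicals" for ch in range(28, 40)},
    **{ch: "Metals" for ch in [26, *range(72, 84)]},
    **{ch: "Industrial Materials" for ch in [25, 44, 68, 69, 70]},
    84: "Machinery",
    85: "Electronics",
}


def cluster_importance(features: dict[str, float]) -> dict[str, float]:
    """Sum feature importance (e.g. mean |z|) by capability cluster, keyed by 6-digit HS code."""
    totals: defaultdict[str, float] = defaultdict(float)
    for hs6, score in features.items():
        cluster = CHAPTER_TO_CLUSTER.get(int(hs6[:2]), "Other")
        totals[cluster] += score
    return dict(totals)
```

For example, zirconium dioxide (HS 282560), the top electrolyzer predictor, falls in chapter 28 and therefore in Chemicals.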

Key Results

1. Country Rankings

Top performers globally: China, Japan, Korea, Germany across most technologies.
US: strong in geothermal, electrolyzers, nuclear, and biofuels, but lags peers given its economic size.
Europe: Italy, Czechia, Denmark, France, Spain all competitive.
EMDEs: India (solar, wind, magnets, nuclear, biofuels), Malaysia (solar, magnets, transmission), Thailand (electrolyzers, transmission), Turkey, Mexico.

The industrial base for clean energy is highly concentrated in Asia and Europe.

2. Technology-Specific Industrial Bases

Each technology has a unique capability profile:

Solar — driven by metals (mounting structures) and chemicals (chemically treated glass); requires a well-rounded industrial base
Batteries — driven by machinery (precise cutting/rolling of copper foil) and chemicals (cathode slurry preparation)
Electrolyzers — chemicals dominant (zirconium dioxide as top predictor)
Transmission — machinery and metals
The diversity of profiles is the central finding: there is no universal industrial base for clean energy

3. Country-Level Industrial Base Mapping

Countries can be mapped on the five capability dimensions for any technology. Example:
Hungary (ranked 7th in batteries): strong RCA across chemicals, machinery, and electronics
Mexico (ranked 14th in batteries): some machinery capability, but a chemicals gap is the binding constraint
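A country's position on each dimension can be sketched as its mean RCA across the technology's industrial-base codes in that cluster. The weakest cluster is then flagged as the binding constraint. Both the averaging rule and `binding_constraint` are hypothetical illustrations of the Hungary/Mexico comparison, not the paper's method.

```python
from statistics import mean


def capability_profile(rca_by_hs6: dict[str, float], cluster_of: dict[str, str]) -> dict[str, float]:
    """Mean RCA of one country in each capability cluster of a technology's industrial base.

    cluster_of maps each industrial-base HS code to its cluster; codes the country
    does not export are treated as RCA = 0.
    """
    grouped: dict[str, list[float]] = {}
    for hs6, cluster in cluster_of.items():
        grouped.setdefault(cluster, []).append(rca_by_hs6.get(hs6, 0.0))
    return {cluster: mean(values) for cluster, values in grouped.items()}


def binding_constraint(profile: dict[str, float]) -> str:
    """Hypothetical rule: the cluster with the lowest mean RCA."""
    return min(profile, key=profile.get)
```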

4. Model Performance

Technology	Precision	Recall	F1	AUC
Solar	0.97	0.71	0.82	0.97
Wind	0.91	0.68	0.78	0.96
Batteries	0.96	0.51	0.67	0.97
Biofuels	0.92	0.62	0.74	0.96
Geothermal	0.95	0.83	0.89	0.98
Nuclear	0.94	0.63	0.75	0.97
Electrolyzers	0.85	0.52	0.65	0.92
Heat Pumps	0.86	0.65	0.74	0.97
Permanent Magnets	0.98	0.80	0.88	0.98
Transmission	0.91	0.68	0.78	0.96

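AUC ranges from 0.92 to 0.98. Best: Geothermal and Permanent Magnets (AUC 0.98). Weakest: Electrolyzers (AUC 0.92, F1 0.65). Class imbalance is a likely factor, although electrolyzers have a higher positive rate (7.90%) than solar, wind, or batteries. Recall is below precision for every technology: the models miss some competitive countries but rarely label uncompetitive ones as competitive.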
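Because F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), the F1 column can be recomputed from the other two. The check below confirms that every row of the performance table is internally consistent to two decimals.

```python
def f1(precision: float, recall: float) -> float:
    """F1 score: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) from the model performance table.
REPORTED = {
    "Solar": (0.97, 0.71, 0.82),
    "Wind": (0.91, 0.68, 0.78),
    "Batteries": (0.96, 0.51, 0.67),
    "Biofuels": (0.92, 0.62, 0.74),
    "Geothermal": (0.95, 0.83, 0.89),
    "Nuclear": (0.94, 0.63, 0.75),
    "Electrolyzers": (0.85, 0.52, 0.65),
    "Heat Pumps": (0.86, 0.65, 0.74),
    "Permanent Magnets": (0.98, 0.80, 0.88),
    "Transmission": (0.91, 0.68, 0.78),
}

mismatches = [t for t, (p, r, f) in REPORTED.items() if round(f1(p, r), 2) != f]
print(mismatches)  # prints []
```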

5. Rank Stability (Sensitivity)

The model is re-run over 50 random states. Top-10 rankings are highly stable. Below roughly rank 15–25, rankings become noisier: predicted competitiveness scores cluster near zero, so small absolute differences in score reorder countries. Policy implications are therefore drawn only from the stable top 10–15.
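The rank-stability check can be sketched as follows. The sketch assumes a matrix of predicted competitiveness scores with one row per random state and one column per country. The helper name is hypothetical. It reports each country's best and worst rank across runs, so countries whose scores cluster near zero show wide ranges.

```python
import numpy as np


def rank_stability(scores: np.ndarray, countries: list[str]) -> dict[str, tuple[int, int]]:
    """Best and worst rank per country across re-runs (rank 1 = most competitive).

    scores: (n_runs, n_countries) predicted competitiveness, one row per random state.
    """
    S = np.asarray(scores, dtype=float)
    order = np.argsort(-S, axis=1)                      # country indices by descending score, per run
    ranks = np.empty_like(order)
    rows = np.arange(S.shape[0])[:, None]
    ranks[rows, order] = np.arange(1, S.shape[1] + 1)   # invert the permutation: rank of each country
    return {c: (int(ranks[:, j].min()), int(ranks[:, j].max())) for j, c in enumerate(countries)}
```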

Discussion & Policy Implications

Reframe from “picking winners” to “building capabilities”: the five capability clusters suggest that upstream investment in machinery, chemicals, and electronics is more robustly beneficial than betting on specific final products.
Avoid mechanical interpretation: top SHAP predictors (e.g. flat-rolled chrome-coated steel for batteries) should be read as signals of the broader capability cluster needed, not as specific investment targets.
Diversification is critical: the model maps the full opportunity space across 10 technologies, enabling countries to identify uncrowded niches aligned with their industrial base.
India example: the model suggests India’s manufacturing push is more successful than commonly perceived; US-India supply chain investments may be well placed for long-term diversification.

Limitations

RCA as proxy: trade data cannot distinguish local value-added manufacturing from re-exports, so re-exports may inflate some country rankings.
Binary target: does not capture depth of specialisation — a country just above RCA = 1 is treated the same as a dominant exporter.
Cross-referencing with country-level production or value-added data is advisable before making specific industrial strategy decisions; this is not feasible in a universal cross-country study.

Figures Referenced in Paper

Figure	Description
Figure 1	Top 20 countries by predicted competitiveness across 9 technologies (2023)
Figure 2	Industrial base for Solar (left) and Batteries (right) by capability cluster
Figure 3	Radar plot: Battery industrial base, Hungary vs. Mexico
Figure 4	Technology competitiveness in selected countries (2023)
SI.21–30	Feature importance plots for all 10 technologies
SI.31–40	Rank stability plots (top 50 countries, 50 random states)

Relationship to CVCE / This Repository

The `data/pc/` files in this repo are derived from this paper’s model:

Paper output	CVCE file
Predicted competitiveness scores	`data/pc/pc_scores.parquet`
SHAP feature importance (mean |z|)	`data/pc/pc_features.csv`
Country RCA by category	`data/pc/pc_rca.parquet`
Country metadata	`data/pc/pc_countries.csv`

The CVCE scatterplot analysis (`scatterplot_report.html`) tests whether the SHAP feature importance from this paper predicts actual trade intensity, serving as an external validation of the model’s feature assignments against observed bilateral trade flows.

Key methodological note (see `ml_model_trade_variables.md`): the paper’s RF uses raw export volumes plus RCA as features (not GDP-normalised). The CVCE regression analysis therefore regresses GDP-normalised trade intensity on SHAP importance with log(GDP) as a covariate, to control for the residual country-size signal embedded in the SHAP scores.
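A minimal NumPy sketch of this specification follows. It treats GDP-normalised trade intensity as the outcome, consistent with testing whether SHAP importance predicts trade intensity, and uses SHAP importance plus log(GDP) as regressors. The unit of observation, the variable names, and the use of plain OLS without standard errors are assumptions; the CVCE analysis may use a different estimator or further controls.

```python
import numpy as np


def ols_with_gdp(trade_intensity, shap_importance, gdp) -> np.ndarray:
    """OLS of trade intensity on SHAP importance, controlling for log(GDP).

    Returns coefficients [intercept, beta_shap, beta_log_gdp].
    """
    y = np.asarray(trade_intensity, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                              # intercept
        np.asarray(shap_importance, dtype=float),     # SHAP importance (e.g. mean |z|)
        np.log(np.asarray(gdp, dtype=float)),         # country-size control
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```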