Mapping the Industrial Base for the New Energy Economy

Paper Summary — Nature submission · Ratan, Bagozzi, Goldman, Han, Sahay, Allan

Published: April 9, 2026

Summary: “Mapping the Industrial Base for the New Energy Economy”

Authors: Ishana Ratan · Benjamin E. Bagozzi · Jonas Goldman · Becky Han · Tim Sahay · Bentley B. Allan
Venue: Nature (submission in progress)
Latest draft: Industrial base manuscript 04072026.docx
Code: github.com/ishanaratan/CICE-V1 · Python 3.11.5
Data: BACI (CEPII), World Development Indicators, 2003–2023, 155 countries


Research Question

How can countries identify which clean energy technologies they are best positioned to compete in, and what specific industrial capabilities drive competitiveness in each?


Core Argument

Green industrial policy faces two structural risks:

  1. Misallocation — countries invest in sectors where they lack the underlying capabilities to compete.
  2. Strategic herding — poor information leads many countries to pile into the same few technologies (e.g., 2020–2021 hydrogen/solar manufacturing announcements worldwide), risking crowding and price collapse (analogous to 1960s commodity overinvestment).

The paper argues that a granular, inductive ML model — trained on trade-revealed capabilities rather than pre-specified theory — can map both the current competitive landscape and the upstream industrial base that underpins it, giving countries actionable tools to diversify and target investments strategically.


Contribution

Builds on three predecessor traditions:

Tradition | Prior work | This paper’s advance
Economic complexity / product space | Hidalgo & Hausmann (2007); Mealy & Teytelboym (2022) | Broader coverage (10 technologies vs. 3); adds process chain (machinery used in manufacturing, not just supply chain)
Green comparative advantage indices | Rosenow & Mealy (2024) | More inductive (random forest, no strong priors); includes process chain capital goods
Industrial policy theory | Amsden; Liu; Lane; Rodrik | Empirical operationalisation of upstream capability building

Key novel element: the process chain — HS codes for the machines and equipment used to manufacture each supply chain component (e.g., the mixers that transform cathode active material into slurry, not just the cathode itself). This captures the capital goods dimension of industrial capability that prior work misses.


Data & Methods

Dependent Variable

Binary indicator: RCA > 1 in the final product HS code(s) for each technology.
RCA = (product’s share of the country’s total exports) / (product’s share of total world exports). A value > 1 means the country exports more of that product than its “fair share.”
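
As an illustration of the dependent variable, here is a minimal sketch of the RCA calculation and the RCA > 1 flag. It assumes the standard Balassa form of the index; the function name `balassa_rca` and the toy export values are hypothetical and are not taken from the paper's code or from BACI.

```python
from collections import defaultdict


def balassa_rca(exports):
    """Compute Balassa RCA from a dict mapping (country, product) -> export value.

    RCA = (product's share of the country's exports)
          / (product's share of world exports).
    """
    country_total = defaultdict(float)
    product_total = defaultdict(float)
    world_total = 0.0
    for (country, product), value in exports.items():
        country_total[country] += value
        product_total[product] += value
        world_total += value

    rca = {}
    for (country, product), value in exports.items():
        country_share = value / country_total[country]
        world_share = product_total[product] / world_total
        rca[(country, product)] = country_share / world_share
    return rca


# Toy, made-up export values (not BACI data); "other" stands for all other products.
toy_exports = {
    ("A", "854142"): 30.0,
    ("A", "other"): 70.0,
    ("B", "854142"): 10.0,
    ("B", "other"): 190.0,
}

rca = balassa_rca(toy_exports)
# Binary target: True where the country has RCA > 1 in the product.
competitive = {key: value > 1 for key, value in rca.items()}
```

In this toy case, country A devotes 30% of its exports to the product against a world share of about 13%, so it is flagged as competitive while country B is not.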

Technology | HS Code(s) | Mean RCA | % RCA > 1
Solar | 854142, 854143 (PV cells/panels) | 0.26 | 5.87%
Wind | 850231 (wind generating sets) | 0.56 | 4.85%
Batteries | 850760, 850780 (Li-ion + other accumulators) | 0.22 | 4.92%
Electrolyzers | 854330 (electrolysis apparatus) | 0.32 | 7.90%
Heat Pumps | 841861, 841581 | 0.44 | 14.01%
Permanent Magnets | 850511, 850519 | 0.27 | 5.10%
Nuclear | 840110, 840140 (reactors + parts) | 0.30 | 6.27%
Biofuels | 220710, 220720 (ethanol) | 2.15 | 21.38%
Geothermal | 841950 (heat exchangers) | 0.40 | 13.09%
Transmission | 850431–850434 (transformers) | 0.69 | 20.80%

Independent Variables (three sets)

  1. RCA in supply chain products — each technology’s upstream and midstream HS codes, including the process chain (machinery for manufacturing). Assigned 6-digit HS codes per component; mapped as upstream/midstream/downstream.

  2. Co-export probabilities — conditional probability of co-exporting supply chain products with other HS codes. Products with co-export probability > 0.55 are added as “proximate” predictors (following Hausmann & Hidalgo).

  3. Country characteristics (World Development Indicators) — log(FDI), log(GDP), log(trade), log(population), industry share of GDP, high-tech exports share of GDP, tariff dummies (whether country imposed or was targeted by a tariff on that technology).
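
The co-export step in item 2 can be sketched as follows. This is a hedged illustration, not the paper's implementation: it assumes "co-exporting" means both products have RCA > 1 in the same country, and it uses a one-directional conditional probability (product-space work often takes the minimum of the two directions, which the summary does not specify). The function names and toy flags are hypothetical.

```python
def co_export_probability(rca_flags, given, other):
    """P(`other` has RCA > 1 | `given` has RCA > 1), measured across countries.

    rca_flags maps country -> {product: bool}, where True means RCA > 1.
    """
    exporters = [c for c, flags in rca_flags.items() if flags.get(given)]
    if not exporters:
        return 0.0
    both = sum(1 for c in exporters if rca_flags[c].get(other))
    return both / len(exporters)


def proximate_products(rca_flags, supply_chain, candidates, threshold=0.55):
    """Return candidates whose co-export probability with any supply chain
    product exceeds the threshold (0.55 in the summary above)."""
    proximate = set()
    for sc_product in supply_chain:
        for candidate in candidates:
            if candidate in supply_chain:
                continue  # already a predictor
            if co_export_probability(rca_flags, sc_product, candidate) > threshold:
                proximate.add(candidate)
    return sorted(proximate)


# Toy flags for four made-up countries and three made-up product labels.
toy_flags = {
    "A": {"sc1": True, "x": True, "y": False},
    "B": {"sc1": True, "x": True, "y": True},
    "C": {"sc1": True, "x": False, "y": False},
    "D": {"sc1": False, "x": True, "y": True},
}

added = proximate_products(toy_flags, supply_chain=["sc1"], candidates=["x", "y"])
```

Here two of the three countries that export `sc1` also export `x` (probability 2/3), so `x` clears the 0.55 threshold, while `y` (probability 1/3) does not.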

Supply Chain Coverage

Technology | N Supply Chain Products | N Proximate Products
Solar | 46 | 27
Wind | 87 | 39
Batteries | 69 | 20
Biofuels | 63 | 16
Geothermal | 74 | 24
Nuclear | 38 | 6
Electrolyzers | 48 | 23
Heat Pumps | 26 | 4
Permanent Magnets | 17 | 1
Transmission | 25 | 3

Model

  • Algorithm: Random Forest classification (scikit-learn)
  • Sample split: 75% training / 25% test
  • Tuning: 5-fold cross-validation over tree depths {5, 10, 15} × n_trees {50, 100, 150}
  • Universe: 155 countries (population > 1 million), 2003–2023
  • One model per technology
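
The setup listed above could be expressed in scikit-learn roughly as follows. This is a sketch on synthetic data, not the paper's code: the stratified split, the `roc_auc` scoring metric, the fixed random seeds, and the use of the predicted probability as the competitiveness score are all assumptions that the summary does not confirm.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for one technology's country-year panel:
# an imbalanced binary target (roughly 10% positives), like RCA > 1.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# 75% training / 25% test (stratification is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0,
)

# 5-fold cross-validation over the grid described above.
param_grid = {"max_depth": [5, 10, 15], "n_estimators": [50, 100, 150]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="roc_auc",  # assumed metric; not stated in the summary
)
search.fit(X_train, y_train)

best_model = search.best_estimator_
# Predicted probability of the positive class (RCA > 1) on held-out data.
predicted_score = best_model.predict_proba(X_test)[:, 1]
```

Fitting one such model per technology, each with its own feature set, would mirror the "one model per technology" design.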

SHAP Feature Importance

Post-estimation SHAP values identify each predictor’s marginal contribution to predictions. Raw SHAP values are standardised to a year-specific z-score and averaged across years. Features with mean |z| > 0.5 are retained as the “industrial base” for each technology.
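
A minimal sketch of the standardise, average, and threshold step is shown below. It takes per-year, per-feature SHAP summaries as given rather than computing them, and it assumes that z-scores are taken across features within each year and that "mean |z|" is the across-year mean of the absolute z-score. Neither detail is spelled out above, and the function name and toy values are hypothetical.

```python
from statistics import mean, pstdev


def industrial_base(shap_by_year, threshold=0.5):
    """Select features whose mean absolute year-specific z-score exceeds threshold.

    shap_by_year maps year -> {feature: SHAP summary value for that year}.
    """
    abs_z = {}
    for importances in shap_by_year.values():
        mu = mean(importances.values())
        sigma = pstdev(importances.values())
        for feature, value in importances.items():
            # Year-specific z-score (assumed: standardised across features).
            z = (value - mu) / sigma if sigma > 0 else 0.0
            abs_z.setdefault(feature, []).append(abs(z))

    # Average |z| across years, then apply the retention threshold.
    mean_abs_z = {feature: mean(values) for feature, values in abs_z.items()}
    retained = sorted(f for f, v in mean_abs_z.items() if v > threshold)
    return mean_abs_z, retained


# Toy, made-up SHAP summaries for four placeholder features over two years.
toy_shap = {
    2022: {"a": -0.30, "b": 0.00, "c": 0.02, "d": 0.28},
    2023: {"a": -0.20, "b": 0.02, "c": -0.02, "d": 0.20},
}

mean_abs_z, retained = industrial_base(toy_shap)
```

In the toy data, features `a` and `d` sit well over one standard deviation from the yearly mean in both years and are retained; `b` and `c` stay near zero and are dropped.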

Five capability clusters identified by aggregating SHAP by HS code structure:

Cluster | Description
Electronics | Electrical apparatus, electronic devices, semiconductors, solar cells
Machinery | Pumps, cutting machines, process equipment in supply and process chains
Industrial Materials | Wood, stone, glass, gypsum, ceramics used in industrial production
Metals | Extracted minerals through finished metal products
Chemicals | Petrochemical byproducts, polymers, catalysts

Key Results

1. Country Rankings

Top performers globally: China, Japan, Korea, Germany across most technologies.
US: strong in geothermal, electrolyzers, nuclear, and biofuels, but lags peers given its size.
Europe: Italy, Czechia, Denmark, France, Spain all competitive.
EMDEs: India (solar, wind, magnets, nuclear, biofuels), Malaysia (solar, magnets, transmission), Thailand (electrolyzers, transmission), Turkey, Mexico.

The industrial base for clean energy is highly concentrated in Asia and Europe.

2. Technology-Specific Industrial Bases

Each technology has a unique capability profile:

  • Solar — driven by metals (mounting structures) and chemicals (chemically treated glass); requires a well-rounded industrial base
  • Batteries — driven by machinery (precise cutting/rolling of copper foil) and chemicals (cathode slurry preparation)
  • Electrolyzers — chemicals dominant (zirconium dioxide as top predictor)
  • Transmission — machinery and metals
  • The diversity of profiles is the central finding: there is no universal industrial base for clean energy

3. Country-Level Industrial Base Mapping

Countries can be mapped on the five capability dimensions for any technology. Example:

  • Hungary (ranked 7th in batteries): strong RCA across chemicals, machinery, electronics
  • Mexico (ranked 14th): some machinery capability, but chemicals gap is the binding constraint

4. Model Performance

Technology | Precision | Recall | F1 | AUC
Solar | 0.97 | 0.71 | 0.82 | 0.97
Wind | 0.91 | 0.68 | 0.78 | 0.96
Battery | 0.96 | 0.51 | 0.67 | 0.97
Biofuel | 0.92 | 0.62 | 0.74 | 0.96
Geothermal | 0.95 | 0.83 | 0.89 | 0.98
Nuclear | 0.94 | 0.63 | 0.75 | 0.97
Electrolyzers | 0.85 | 0.52 | 0.65 | 0.92
Heat Pumps | 0.86 | 0.65 | 0.74 | 0.97
Permanent Magnets | 0.98 | 0.80 | 0.88 | 0.98
Transmission | 0.91 | 0.68 | 0.78 | 0.96

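AUC ranges from 0.92 to 0.98. Best: Geothermal and Permanent Magnets (0.98). Weakest: Electrolyzers (0.92; attributed to class imbalance). Across technologies, precision is consistently higher than recall, so the models miss some competitive country-years but rarely flag uncompetitive ones.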
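
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported precision and recall; the values are copied from the table above, and the helper name `f1_from` is hypothetical.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the table above.
reported = {
    "Solar": (0.97, 0.71, 0.82),
    "Wind": (0.91, 0.68, 0.78),
    "Battery": (0.96, 0.51, 0.67),
    "Biofuel": (0.92, 0.62, 0.74),
    "Geothermal": (0.95, 0.83, 0.89),
    "Nuclear": (0.94, 0.63, 0.75),
    "Electrolyzers": (0.85, 0.52, 0.65),
    "Heat Pumps": (0.86, 0.65, 0.74),
    "Permanent Magnets": (0.98, 0.80, 0.88),
    "Transmission": (0.91, 0.68, 0.78),
}

recomputed = {tech: round(f1_from(p, r), 2) for tech, (p, r, _) in reported.items()}
mismatches = sorted(
    tech for tech, (_, _, f1) in reported.items() if recomputed[tech] != f1
)
```

The recomputed values match the reported F1 column to two decimals, and the lower F1 scores (Battery, Electrolyzers) are driven by recall rather than precision.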

5. Rank Stability (Sensitivity)

The model is re-run over 50 random states. Top-10 rankings are extremely stable. Below roughly rank 15–25, rankings become noisier: predicted competitiveness scores cluster near zero there, so small score differences are enough to reorder countries. Implications are therefore drawn from the stable top-10–15 range.
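
The stability check can be illustrated with a small sketch that ranks countries within each run and reports each country's best and worst rank. The per-seed scores here are simulated by adding Gaussian noise to made-up base scores, as a stand-in for refitting the model under 50 random states; the function name, country labels, and noise level are all hypothetical.

```python
import random


def rank_spread(scores_by_seed):
    """Return country -> (best rank, worst rank) across runs.

    scores_by_seed is a list of dicts mapping country -> predicted score,
    one dict per random state.
    """
    ranks = {}
    for scores in scores_by_seed:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, country in enumerate(ordered, start=1):
            ranks.setdefault(country, []).append(position)
    return {country: (min(r), max(r)) for country, r in ranks.items()}


# Made-up base scores: three clear leaders and a tail clustered near zero.
base = {"A": 0.90, "B": 0.70, "C": 0.50, "X": 0.012, "Y": 0.011, "Z": 0.010}

runs = []
for seed in range(50):
    rng = random.Random(seed)
    # Simulated seed-to-seed variation (stand-in for refitting the model).
    runs.append({c: s + rng.gauss(0, 0.01) for c, s in base.items()})

spread = rank_spread(runs)
```

With well-separated scores the leaders keep the same rank in every run, while the near-zero tail reshuffles, which is the pattern described above.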


Discussion & Policy Implications

  • Reframe from “picking winners” to “building capabilities”: the five capability clusters suggest upstream investment in machinery, chemicals, electronics is more robustly beneficial than betting on specific final products.
  • Avoid mechanical interpretation: top SHAP predictors (e.g. flat-rolled chrome-coated steel for batteries) should be read as signals of the broader capability cluster needed, not as specific investment targets.
  • Diversification is critical: the model maps the full opportunity space across 10 technologies, enabling countries to identify uncrowded niches aligned with their industrial base.
  • India example: model suggests India’s manufacturing push is more successful than commonly perceived; US-India supply chain investments may be well-placed for long-term diversification.

Limitations

  1. RCA as proxy: trade data cannot distinguish local value-added manufacturing from re-exports, so re-exports may inflate some country rankings.
  2. Binary target: does not capture depth of specialisation — a country just above RCA = 1 is treated the same as a dominant exporter.
  3. No production-side validation: cross-referencing with country-level production or value-added data is advisable before making specific industrial strategy decisions, but was not possible in a universal cross-country study.

Figures Referenced in Paper

Figure | Description
Figure 1 | Top 20 countries by predicted competitiveness across 9 technologies (2023)
Figure 2 | Industrial base for Solar (left) and Batteries (right) by capability cluster
Figure 3 | Radar plot: Battery industrial base, Hungary vs. Mexico
Figure 4 | Technology competitiveness in selected countries (2023)
SI.21–30 | Feature importance plots for all 10 technologies
SI.31–40 | Rank stability plots (top 50 countries, 50 random states)

Relationship to CVCE / This Repository

The data/pc/ files in this repo are derived from this paper’s model:

Paper output | CVCE file
Predicted competitiveness scores | data/pc/pc_scores.parquet
SHAP feature importance (mean |z|) | data/pc/pc_features.csv
Country RCA by category | data/pc/pc_rca.parquet
Country metadata | data/pc/pc_countries.csv

The CVCE scatterplot analysis (scatterplot_report.html) tests whether the SHAP feature importance from this paper predicts actual trade intensity — serving as an external validation of the model’s feature assignments against observed bilateral trade flows.

Key methodological note (see ml_model_trade_variables.md): the paper’s RF uses raw export volumes + RCA as features (not GDP-normalised). The CVCE regression analysis therefore adds log(GDP) as a covariate when regressing SHAP importance on GDP-normalised trade intensity to control for the residual country-size signal embedded in the SHAP scores.