Mapping the Industrial Base for the New Energy Economy
Paper Summary — Nature submission · Ratan, Bagozzi, Goldman, Han, Sahay, Allan
Authors: Ishana Ratan · Benjamin E. Bagozzi · Jonas Goldman · Becky Han · Tim Sahay · Bentley B. Allan
Venue: Nature (submission in progress)
Latest draft: Industrial base manuscript 04072026.docx
Code: github.com/ishanaratan/CICE-V1 · Python 3.11.5
Data: BACI (CEPII), World Development Indicators, 2003–2023, 155 countries
Research Question
How can countries identify which clean energy technologies they are best positioned to compete in, and what specific industrial capabilities drive competitiveness in each?
Core Argument
Green industrial policy faces two structural risks:
- Misallocation — countries invest in sectors where they lack the underlying capabilities to compete.
- Strategic herding — poor information leads many countries to pile into the same few technologies (e.g., 2020–2021 hydrogen/solar manufacturing announcements worldwide), risking crowding and price collapse (analogous to 1960s commodity overinvestment).
The paper argues that a granular, inductive ML model — trained on trade-revealed capabilities rather than pre-specified theory — can map both the current competitive landscape and the upstream industrial base that underpins it, giving countries actionable tools to diversify and target investments strategically.
Contribution
Builds on three predecessor traditions:
| Tradition | Prior work | This paper’s advance |
|---|---|---|
| Economic complexity / product space | Hidalgo & Hausmann (2007); Mealy & Teytelboym (2022) | Broader coverage (10 technologies vs. 3); adds process chain (machinery used in manufacturing, not just supply chain) |
| Green comparative advantage indices | Rosenow & Mealy (2024) | More inductive (random forest, no strong priors); includes process chain capital goods |
| Industrial policy theory | Amsden; Liu; Lane; Rodrik | Empirical operationalisation of upstream capability building |
Key novel element: the process chain — HS codes for the machines and equipment used to manufacture each supply chain component (e.g., the mixers that transform cathode active material into slurry, not just the cathode itself). This captures the capital goods dimension of industrial capability that prior work misses.
Data & Methods
Dependent Variable
Binary indicator: RCA > 1 in the final product HS code(s) for each technology.
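RCA = (product’s share of the country’s total exports) / (product’s share of total world exports). A value > 1 means the product makes up a larger share of the country’s exports than of world trade, i.e., the country exports more than its “fair share.”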
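Here is a minimal Python sketch of the RCA formula and the binary target. It assumes a country × product matrix of export values for one year, for example aggregated from BACI flows. The helper names are hypothetical. The summary does not say how technologies with several HS codes are combined, so this sketch sums the exports of those codes into one column before computing RCA. That rule is an assumption.

```python
import numpy as np


def balassa_rca(exports: np.ndarray) -> np.ndarray:
    """Balassa RCA for a (countries x products) matrix of export values.

    RCA[c, p] = (exports[c, p] / country c's total exports)
                / (world exports of p / total world exports)
    Rows with zero total exports would divide by zero; drop them first.
    """
    exports = np.asarray(exports, dtype=float)
    country_share = exports / exports.sum(axis=1, keepdims=True)  # p's share of c's export basket
    world_share = exports.sum(axis=0) / exports.sum()              # p's share of world exports
    return country_share / world_share                             # broadcasts across rows


def technology_target(exports: np.ndarray, product_cols: list[int]) -> np.ndarray:
    """Binary target: 1 if RCA > 1 in the technology's final product(s), else 0.

    Hypothetical rule: the technology's HS-code columns are summed into one
    column before RCA is computed (an assumption, not the paper's stated method).
    """
    exports = np.asarray(exports, dtype=float)
    tech = exports[:, product_cols].sum(axis=1, keepdims=True)
    rest = np.delete(exports, product_cols, axis=1)
    rca = balassa_rca(np.hstack([tech, rest]))
    return (rca[:, 0] > 1).astype(int)
```

As a worked case, take a hypothetical country whose exports are 10% PV modules, where PV modules are 2% of world trade. Its RCA is 0.10 / 0.02 = 5, so its target is 1.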
| Technology | HS Code(s) | Mean RCA | % RCA > 1 |
|---|---|---|---|
| Solar | 854142, 854143 (PV cells/panels) | 0.26 | 5.87% |
| Wind | 850231 (wind generating sets) | 0.56 | 4.85% |
| Batteries | 850760, 850780 (Li-ion + other accumulators) | 0.22 | 4.92% |
| Electrolyzers | 854330 (electrolysis apparatus) | 0.32 | 7.90% |
| Heat Pumps | 841861, 841581 | 0.44 | 14.01% |
| Permanent Magnets | 850511, 850519 | 0.27 | 5.10% |
| Nuclear | 840110, 840140 (reactors + parts) | 0.30 | 6.27% |
| Biofuels | 220710, 220720 (ethanol) | 2.15 | 21.38% |
| Geothermal | 841950 (heat exchangers) | 0.40 | 13.09% |
| Transmission | 850431–850434 (transformers) | 0.69 | 20.80% |
Independent Variables (three sets)
RCA in supply chain products: each technology’s upstream and midstream HS codes, including the process chain (machinery for manufacturing). Each component is assigned 6-digit HS codes and mapped to an upstream, midstream, or downstream stage.
Co-export probabilities: the conditional probability of co-exporting supply chain products with other HS codes. Products with a co-export probability > 0.55 are added as “proximate” predictors (following Hausmann & Hidalgo).
Country characteristics: log(FDI), log(GDP), log(trade), log(population), industry share of GDP, and high-tech exports share of GDP from the World Development Indicators, plus tariff dummies (whether the country imposed, or was targeted by, a tariff on that technology).
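The co-export step can be sketched from a binary country × product matrix (1 where RCA > 1). For each supply chain product i and candidate product j, the code computes P(RCA_j > 1 | RCA_i > 1) and keeps candidates above 0.55. The summary does not say whether the paper uses this one-directional conditional or the symmetric product-space proximity, which is the minimum of the two conditionals. The function below therefore supports both. Its name and the helper's name are hypothetical.

```python
import numpy as np


def coexport_probability(m, symmetric=False):
    """Conditional co-export probabilities from a binary (countries x products) matrix.

    cond[i, j] = P(country has RCA > 1 in j | country has RCA > 1 in i).
    If symmetric=True, returns min(cond[i, j], cond[j, i]) (product-space proximity).
    """
    m = np.asarray(m, dtype=float)
    co = m.T @ m                    # co[i, j] = number of countries exporting both i and j
    ubiquity = np.diag(co)          # number of countries exporting each product
    with np.errstate(divide="ignore", invalid="ignore"):
        cond = np.where(ubiquity[:, None] > 0, co / ubiquity[:, None], 0.0)
    return np.minimum(cond, cond.T) if symmetric else cond


def proximate_products(m, supply_chain_cols, threshold=0.55, symmetric=False):
    """Indices of non-supply-chain products whose co-export probability with
    any supply chain product exceeds the threshold (hypothetical helper)."""
    sc = list(supply_chain_cols)
    prob = coexport_probability(m, symmetric=symmetric)
    candidates = [j for j in range(prob.shape[1]) if j not in set(sc)]
    return [j for j in candidates if (prob[sc, j] > threshold).any()]
```

The one-directional version asks how often exporters of a supply chain product also export the candidate. The symmetric version also penalises candidates that are exported widely by countries without the supply chain product.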
Supply Chain Coverage
| Technology | N Supply Chain Products | N Proximate Products |
|---|---|---|
| Solar | 46 | 27 |
| Wind | 87 | 39 |
| Batteries | 69 | 20 |
| Biofuels | 63 | 16 |
| Geothermal | 74 | 24 |
| Nuclear | 38 | 6 |
| Electrolyzers | 48 | 23 |
| Heat Pumps | 26 | 4 |
| Permanent Magnets | 17 | 1 |
| Transmission | 25 | 3 |
Model
- Algorithm: Random Forest classification (scikit-learn)
- Sample split: 75% training / 25% test
- Tuning: 5-fold cross-validation over tree depths {5, 10, 15} × n_trees {50, 100, 150}
- Universe: 155 countries (population > 1 million), 2003–2023
- One model per technology
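The model specification above maps onto standard scikit-learn components. This sketch fits one technology's forest with the stated 75/25 split and the 3 × 3 grid under 5-fold cross-validation. Several details are assumptions not given in the summary: stratifying the split on the target, GridSearchCV's default scoring (accuracy for classifiers), and the function name.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split


def fit_technology_model(X, y, random_state=0):
    """Fit one random forest for one technology, tuned by 5-fold CV.

    Split sizes and the grid follow the summary; stratification and the
    default scoring metric are assumptions of this sketch.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    grid = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid={"max_depth": [5, 10, 15], "n_estimators": [50, 100, 150]},
        cv=5,
    )
    grid.fit(X_train, y_train)  # refits the best configuration on the full training set
    return grid.best_estimator_, X_test, y_test
```

Calling fit_technology_model(X, y) once per technology gives the one-model-per-technology design. Here X would hold that technology's supply chain RCAs, proximate products, and country characteristics.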
SHAP Feature Importance
Post-estimation SHAP values identify each predictor’s marginal contribution to predictions. Raw SHAP values are standardised to a year-specific z-score and averaged across years. Features with mean |z| > 0.5 are retained as the “industrial base” for each technology.
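The standardisation step can be read more than one way. The sketch below implements one reading, which is an assumption:

1. Average signed SHAP values per feature within each year.
2. Z-score those means across features.
3. Average |z| across years and keep features above 0.5.

The SHAP matrix is assumed to hold positive-class values, for example computed with shap.TreeExplainer. The function name is hypothetical.

```python
import numpy as np


def industrial_base(shap_values, years, feature_names, threshold=0.5):
    """Select 'industrial base' features from SHAP values (one assumed reading).

    shap_values: (n_obs, n_features) positive-class SHAP values.
    years: length-n_obs array giving each observation's year.
    Returns {feature_name: mean |z|} for features with mean |z| > threshold.
    """
    shap_values = np.asarray(shap_values, dtype=float)
    years = np.asarray(years)
    abs_z_by_year = []
    for yr in np.unique(years):
        yearly_mean = shap_values[years == yr].mean(axis=0)              # per-feature mean SHAP in this year
        z = (yearly_mean - yearly_mean.mean()) / yearly_mean.std()       # year-specific z-score across features
        abs_z_by_year.append(np.abs(z))
    mean_abs_z = np.mean(abs_z_by_year, axis=0)                          # average |z| across years
    return {
        name: float(score)
        for name, score in zip(feature_names, mean_abs_z)
        if score > threshold
    }
```

Because |z| is used, this reading retains features with strongly negative as well as strongly positive average contributions.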
Five capability clusters identified by aggregating SHAP by HS code structure:
| Cluster | Description |
|---|---|
| Electronics | Electrical apparatus, electronic devices, semiconductors, solar cells |
| Machinery | Pumps, cutting machines, process equipment in supply and process chains |
| Industrial Materials | Wood, stone, glass, gypsum, ceramics used in industrial production |
| Metals | Extracted minerals through finished metal products |
| Chemicals | Petrochemical byproducts, polymers, catalysts |
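Aggregating importance by HS code structure can be sketched with a chapter-to-cluster lookup. The mapping below from 2-digit HS chapters to the five clusters is hypothetical. It was chosen to match the cluster descriptions in the table; the paper's actual assignment is not given in this summary. HS codes are kept as strings so that leading zeros survive.

```python
from collections import defaultdict

# Hypothetical mapping from 2-digit HS chapters to the five capability clusters.
CHAPTER_TO_CLUSTER = {
    **{ch: "Chemicals" for ch in range(27, 41)},        # 27 fuels/byproducts, 28-38 chemicals, 39-40 plastics/rubber
    **{ch: "Metals" for ch in (26, *range(72, 84))},    # 26 ores; 72-83 base metals and articles
    **{ch: "Industrial Materials" for ch in (25, 44, 68, 69, 70)},  # stone/gypsum, wood, stone articles, ceramics, glass
    84: "Machinery",
    85: "Electronics",
}


def cluster_importance(importance: dict[str, float]) -> dict[str, float]:
    """Sum feature importance by cluster, keyed on the HS chapter (first two digits)."""
    totals = defaultdict(float)
    for hs6, score in importance.items():
        cluster = CHAPTER_TO_CLUSTER.get(int(hs6[:2]), "Other")
        totals[cluster] += score
    return dict(totals)
```

Under this mapping, 282560 (zirconium dioxide) falls under Chemicals and 721050 (chromium-coated flat-rolled steel) falls under Metals.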
Key Results
1. Country Rankings
Top performers globally: China, Japan, Korea, Germany across most technologies.
US: strong in geothermal, electrolyzers, nuclear, and biofuels, but lags peers relative to its size.
Europe: Italy, Czechia, Denmark, France, Spain all competitive.
EMDEs: India (solar, wind, magnets, nuclear, biofuels), Malaysia (solar, magnets, transmission), Thailand (electrolyzers, transmission), Turkey, Mexico.
Industrial base for clean energy is highly concentrated in Asia and Europe.
2. Technology-Specific Industrial Bases
Each technology has a unique capability profile:
- Solar — driven by metals (mounting structures) and chemicals (chemically treated glass); requires a well-rounded industrial base
- Batteries — driven by machinery (precise cutting/rolling of copper foil) and chemicals (cathode slurry preparation)
- Electrolyzers — chemicals dominant (zirconium dioxide as top predictor)
- Transmission — machinery and metals
- The diversity of profiles is the central finding: there is no universal industrial base for clean energy
3. Country-Level Industrial Base Mapping
Countries can be mapped on the five capability dimensions for any technology. Example:
- Hungary (ranked 7th in batteries): strong RCA across chemicals, machinery, electronics
- Mexico (ranked 14th): some machinery capability, but chemicals gap is the binding constraint
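A radar-plot profile like the Hungary/Mexico comparison needs one score per cluster. One simple scoring rule, which is hypothetical, is the share of the technology's industrial-base products in each cluster where the country has RCA > 1. The paper's actual rule is not stated here. The toy values below are illustrative, not results.

```python
import numpy as np


def capability_profile(rca_row, feature_clusters, clusters):
    """Per-cluster share of industrial-base products with RCA > 1 (hypothetical rule).

    rca_row: the country's RCA in each industrial-base product.
    feature_clusters: the cluster label of each of those products.
    """
    rca_row = np.asarray(rca_row, dtype=float)
    feature_clusters = np.asarray(feature_clusters)
    profile = {}
    for c in clusters:
        mask = feature_clusters == c
        profile[c] = float((rca_row[mask] > 1).mean()) if mask.any() else float("nan")
    return profile
```

Consider a toy country with RCAs [1.5, 2.0, 1.2, 0.3, 1.1] over products labelled [Chemicals, Chemicals, Machinery, Machinery, Electronics]. Its profile is Chemicals 1.0, Machinery 0.5, Electronics 1.0. The lowest-scoring cluster is the natural candidate for a binding constraint. Weighting each product by its SHAP importance would be an alternative design.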
4. Model Performance
| Technology | Precision | Recall | F1 | AUC |
|---|---|---|---|---|
| Solar | 0.97 | 0.71 | 0.82 | 0.97 |
| Wind | 0.91 | 0.68 | 0.78 | 0.96 |
| Battery | 0.96 | 0.51 | 0.67 | 0.97 |
| Biofuel | 0.92 | 0.62 | 0.74 | 0.96 |
| Geothermal | 0.95 | 0.83 | 0.89 | 0.98 |
| Nuclear | 0.94 | 0.63 | 0.75 | 0.97 |
| Electrolyzers | 0.85 | 0.52 | 0.65 | 0.92 |
| Heat Pumps | 0.86 | 0.65 | 0.74 | 0.97 |
| Permanent Magnets | 0.98 | 0.80 | 0.88 | 0.98 |
| Transmission | 0.91 | 0.68 | 0.78 | 0.96 |
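AUC ranges from 0.92 to 0.98. Best: Geothermal, Permanent Magnets. Weakest: Electrolyzers (lowest AUC, precision, and F1). This is attributed to class imbalance, although the electrolyzer positive rate (7.90% with RCA > 1) is higher than for solar, wind, batteries, nuclear, or permanent magnets.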
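The metrics in the table are standard binary-classification scores. This sketch computes them with scikit-learn from held-out labels and predicted probabilities. It also checks that each reported F1 equals the harmonic mean of its precision and recall, F1 = 2PR / (P + R), up to two-decimal rounding. The 0.5 cut-off reproduces what RandomForestClassifier.predict does for a binary target; the summary does not otherwise state the threshold.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score


def evaluate(y_true, y_prob, threshold=0.5):
    """Precision, recall and F1 at a probability threshold, plus threshold-free ROC AUC."""
    y_pred = [int(p > threshold) for p in y_prob]  # same labels as predict() for a binary forest at 0.5
    return {
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "auc": roc_auc_score(y_true, y_prob),
    }


# (precision, recall, F1) as reported in the Model Performance table.
REPORTED = {
    "Solar": (0.97, 0.71, 0.82),
    "Wind": (0.91, 0.68, 0.78),
    "Battery": (0.96, 0.51, 0.67),
    "Biofuel": (0.92, 0.62, 0.74),
    "Geothermal": (0.95, 0.83, 0.89),
    "Nuclear": (0.94, 0.63, 0.75),
    "Electrolyzers": (0.85, 0.52, 0.65),
    "Heat Pumps": (0.86, 0.65, 0.74),
    "Permanent Magnets": (0.98, 0.80, 0.88),
    "Transmission": (0.91, 0.68, 0.78),
}


def f1_consistent(p, r, f1, tol=0.005):
    """True if F1 matches the harmonic mean of P and R within rounding tolerance."""
    return abs(2 * p * r / (p + r) - f1) < tol
```

All ten reported rows pass this consistency check.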
5. Rank Stability (Sensitivity)
The model was re-run over 50 random states. Top-10 rankings are very stable. Beyond roughly rank 15–25, rankings become noisier: predicted competitiveness scores cluster near zero, so small score differences reorder countries. Implications are therefore drawn from the stable top 10–15.
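The sensitivity exercise can be sketched as follows: refit the forest under different random_state values and track each country's rank. Three details are assumptions: fixed hyperparameters, the function name, and using the standard deviation of rank as the stability measure. Countries with near-identical scores can swap places from seed to seed, which is what makes the tail below the top 10–15 noisy.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def rank_stability(X_train, y_train, X_latest, n_seeds=50, n_estimators=100, max_depth=10):
    """Refit under n_seeds random states and summarise each country's rank (1 = best).

    X_latest: one row per country for the scoring year (e.g. 2023).
    Returns (mean_rank, rank_std), one entry per country.
    """
    ranks = []
    for seed in range(n_seeds):
        rf = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=seed
        )
        rf.fit(X_train, y_train)
        score = rf.predict_proba(X_latest)[:, 1]       # predicted competitiveness
        order = np.argsort(-score, kind="stable")      # best first; exact ties keep row order
        rank = np.empty_like(order)
        rank[order] = np.arange(1, len(score) + 1)
        ranks.append(rank)
    ranks = np.array(ranks)
    return ranks.mean(axis=0), ranks.std(axis=0)
```

Plotting rank_std against mean_rank for the top 50 countries gives a view comparable to the SI rank stability plots.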
Discussion & Policy Implications
- Reframe from “picking winners” to “building capabilities”: the five capability clusters suggest that upstream investment in machinery, chemicals, and electronics is more robustly beneficial than betting on specific final products.
- Avoid mechanical interpretation: top SHAP predictors (e.g. flat-rolled chrome-coated steel for batteries) should be read as signals of the broader capability cluster needed, not as specific investment targets.
- Diversification is critical: the model maps the full opportunity space across 10 technologies, enabling countries to identify uncrowded niches aligned with their industrial base.
- India example: model suggests India’s manufacturing push is more successful than commonly perceived; US-India supply chain investments may be well-placed for long-term diversification.
Limitations
- RCA as proxy: trade data cannot distinguish local value-added manufacturing from re-exports, so re-exports may inflate some country rankings.
- Binary target: does not capture depth of specialisation — a country just above RCA = 1 is treated the same as a dominant exporter.
- Cross-checking against country-level production or value-added data is advisable before making specific industrial strategy decisions; this is not feasible in a universal cross-country study.
Figures Referenced in Paper
| Figure | Description |
|---|---|
| Figure 1 | Top 20 countries by predicted competitiveness across 9 technologies (2023) |
| Figure 2 | Industrial base for Solar (left) and Batteries (right) by capability cluster |
| Figure 3 | Radar plot: Battery industrial base, Hungary vs. Mexico |
| Figure 4 | Technology competitiveness in selected countries (2023) |
| SI.21–30 | Feature importance plots for all 10 technologies |
| SI.31–40 | Rank stability plots (top 50 countries, 50 random states) |
Relationship to CVCE / This Repository
The data/pc/ files in this repo are derived from this paper’s model:
| Paper output | CVCE file |
|---|---|
| Predicted competitiveness scores | data/pc/pc_scores.parquet |
| SHAP feature importance (mean \|z\|) | data/pc/pc_features.csv |
| Country RCA by category | data/pc/pc_rca.parquet |
| Country metadata | data/pc/pc_countries.csv |
The CVCE scatterplot analysis (scatterplot_report.html) tests whether the SHAP feature importance from this paper predicts actual trade intensity — serving as an external validation of the model’s feature assignments against observed bilateral trade flows.
Key methodological note (see ml_model_trade_variables.md): the paper’s RF uses raw export volumes and RCA as features (not GDP-normalised). The CVCE regression analysis therefore adds log(GDP) as a covariate when relating SHAP importance to GDP-normalised trade intensity. This controls for the residual country-size signal embedded in the SHAP scores.
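Here is a minimal sketch of the size-controlled regression. It assumes one observation per country (or country × product), with three variables:

- outcome: GDP-normalised trade intensity
- predictor: SHAP importance
- covariate: log(GDP)

These roles follow the validation question described above. The function name and the plain OLS estimator (numpy.linalg.lstsq) are assumptions. ml_model_trade_variables.md remains the authoritative specification.

```python
import numpy as np


def fit_with_size_control(trade_intensity, shap_importance, gdp):
    """OLS of trade intensity on SHAP importance with log(GDP) as a covariate.

    Returns (intercept, beta_shap, beta_log_gdp).
    """
    y = np.asarray(trade_intensity, dtype=float)
    X = np.column_stack([
        np.ones_like(y),                                # intercept
        np.asarray(shap_importance, dtype=float),       # SHAP feature importance
        np.log(np.asarray(gdp, dtype=float)),           # country-size control
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return tuple(float(c) for c in coef)
```

If beta_shap stays positive once log(GDP) is included, the link between SHAP importance and observed trade is not simply a country-size effect.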