HS–NACE/NAICS Concordance: Methodology, Implementation & Database Comparison

Author

NZIPL · Clean Value Chain Explorer

Published

April 8, 2026

Code
library(dplyr)
library(readr)
library(tidyr)
library(knitr)
library(kableExtra)
library(ggplot2)
library(patchwork)
library(showtext)

font_add_google("Archivo", "Archivo")
showtext_auto()

data_dir <- "/Users/parvulesco/Documents/R/NZIPL-CVCE/data"

nace_tech_path    <- file.path(data_dir, "orbis", "orbis_nace_tech.csv")
xw_path           <- file.path(data_dir, "concordance", "nace_naics_crosswalk.csv")
hs_nace_path      <- file.path(data_dir, "hs_nace_correspondence_observations.csv")
xw_annotated_path <- file.path(data_dir, "concordance", "nace_naics_crosswalk_annotated.csv")

1. Overview

The CVCE project links HS 6-digit trade codes (from BACI and the Green Dictionary) to firm-level industry classifications to identify manufacturers of clean technologies across commercial databases. This document describes the concordance pipeline, validates three independent routes, and documents which routes connect to which databases.

1.1 Firm-level databases

Code
tibble::tribble(
  ~Database,           ~Provider,                 ~Firm_Universe,                    ~Geography,           ~Primary_Class, ~Max_Digits, ~Time_Coverage,  ~JHU_Access,               ~Effective_Route,
  "ORBIS",             "Bureau van Dijk / Moody's","Public + Private (~450M firms)",  "Global, 200+ countries","NACE Rev.2",  "4",         "1990s–present", "Yes (library licence)",   "Route A (HS→NACE4) — direct match",
  "S&P Capital IQ Pro","S&P Global",               "Public + Private (~6.5M firms)",  "Global",             "GICS + SIC + NAICS", "8 / 4 / 6", "2000s–present", "Partial — verify library","Routes B/C (HS→NACE→NAICS6); GICS mapping not yet implemented",
  "Compustat N. America","S&P Global / WRDS",       "Listed only (~30K firms)",         "US + Canada",        "NAICS 2017",   "6",         "1950s–present", "Yes (via WRDS)",          "Routes B/C (HS→NACE→NAICS6)",
  "Compustat Global",  "S&P Global / WRDS",        "Listed only (~44K firms)",         "80+ countries",      "SIC / GICS",   "4 / 8",     "1987–present",  "Yes (via WRDS)",          "SIC4 via manual mapping; GICS not yet implemented",
  "Refinitiv Eikon",   "LSEG",                     "Listed only (~70K firms)",         "Global",             "TRBC",         "6",         "1980s–present", "Check library",           "TRBC mapping not yet implemented"
) |>
  kable(
    col.names = c("Database","Provider","Firm Universe","Geography",
                  "Primary Classification","Max Digits","Time Coverage",
                  "JHU Access","Effective Route (this project)"),
    caption = "Firm-level databases for clean-tech research — classification systems and concordance routes"
  ) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = TRUE, font_size = 11) |>
  column_spec(1, bold = TRUE) |>
  column_spec(6, bold = TRUE) |>
  column_spec(9, italic = TRUE) |>
  row_spec(1, background = "#f0fdf4") |>
  row_spec(3:4, background = "#eff6ff") |>
  scroll_box(width = "100%")
Firm-level databases for clean-tech research — classification systems and concordance routes
| Database | Provider | Firm Universe | Geography | Primary Classification | Max Digits | Time Coverage | JHU Access | Effective Route (this project) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ORBIS | Bureau van Dijk / Moody's | Public + Private (~450M firms) | Global, 200+ countries | NACE Rev.2 | 4 | 1990s–present | Yes (library licence) | Route A (HS→NACE4) — direct match |
| S&P Capital IQ Pro | S&P Global | Public + Private (~6.5M firms) | Global | GICS + SIC + NAICS | 8 / 4 / 6 | 2000s–present | Partial — verify library | Routes B/C (HS→NACE→NAICS6); GICS mapping not yet implemented |
| Compustat N. America | S&P Global / WRDS | Listed only (~30K firms) | US + Canada | NAICS 2017 | 6 | 1950s–present | Yes (via WRDS) | Routes B/C (HS→NACE→NAICS6) |
| Compustat Global | S&P Global / WRDS | Listed only (~44K firms) | 80+ countries | SIC / GICS | 4 / 8 | 1987–present | Yes (via WRDS) | SIC4 via manual mapping; GICS not yet implemented |
| Refinitiv Eikon | LSEG | Listed only (~70K firms) | Global | TRBC | 6 | 1980s–present | Check library | TRBC mapping not yet implemented |
Note

Classification systems: ORBIS uses NACE Rev.2 (4-digit) for all firms globally (naceccod2 field). S&P Capital IQ Pro uses GICS (8-digit, global primary), SIC (4-digit, US historical), and NAICS (6-digit, North American firms) — it does not use NACE as a classification field. This means Route A (HS→NACE) provides a direct path to ORBIS only; reaching S&P and Compustat requires Routes B/C (NACE→NAICS). NACE itself is hard-capped at 4 digits — no standard NACE5/6 exists. The intermediate CPA 2.1 (6–8 digits, e.g., CPA 27.20.23 for lithium-ion batteries) is in our chain but not used as a primary field by any firm database.

1.2 Classification granularity

Code
tibble::tribble(
  ~Database,              ~Primary_Class,   ~Max_Digits, ~Batteries_Code,  ~Solar_Code,       ~Source_Field,           ~Notes,
  "ORBIS",                "NACE Rev.2",    "4", "2720",           "2611",            "naceccod2",      "Route A target — NACE used globally for all firms",
  "S&P Capital IQ",       "GICS",          "8", "GICS 20104010",  "GICS 20106010",   "primaryGICS",    "Route B/C target — GICS primary; also SIC4 + NAICS6 for NA firms",
  "S&P Capital IQ",       "NAICS 2017",    "6", "335911",         "334413",          "naics",          "Route B/C target — stored for North American firms only",
  "Compustat N. America", "NAICS 2017",    "6", "335911",         "334413",          "naics / sich",   "Route B/C target — SIC4 also available",
  "Compustat Global",     "SIC / GICS",    "4 / 8", "SIC 3691",  "SIC 3674",        "sich / gsubind", "No NAICS; no NACE — SIC4 or GICS8 only",
  "Trade data (BACI)",    "HS",            "6", "850760/850780",  "854143/854142",   "hs (6-digit)",   "Starting point for all three routes"
) |>
  kable(
    col.names = c("Database","Primary Classification","Max Digits",
                  "Batteries Code","Solar Code","Field Name","Notes"),
    caption = "Classification system granularity by database — with Batteries and Solar reference codes"
  ) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = TRUE, font_size = 11) |>
  column_spec(3, bold = TRUE) |>
  column_spec(4:5, color = "#073309", bold = TRUE) |>
  scroll_box(width = "100%")
Classification system granularity by database — with Batteries and Solar reference codes
| Database | Primary Classification | Max Digits | Batteries Code | Solar Code | Field Name | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| ORBIS | NACE Rev.2 | 4 | 2720 | 2611 | naceccod2 | Route A target — NACE used globally for all firms |
| S&P Capital IQ | GICS | 8 | GICS 20104010 | GICS 20106010 | primaryGICS | Route B/C target — GICS primary; also SIC4 + NAICS6 for NA firms |
| S&P Capital IQ | NAICS 2017 | 6 | 335911 | 334413 | naics | Route B/C target — stored for North American firms only |
| Compustat N. America | NAICS 2017 | 6 | 335911 | 334413 | naics / sich | Route B/C target — SIC4 also available |
| Compustat Global | SIC / GICS | 4 / 8 | SIC 3691 | SIC 3674 | sich / gsubind | No NAICS; no NACE — SIC4 or GICS8 only |
| Trade data (BACI) | HS | 6 | 850760/850780 | 854143/854142 | hs (6-digit) | Starting point for all three routes |
Code
tibble::tribble(
  ~db,                     ~classification, ~max_digits, ~type,
  "BACI / Green Dict",     "HS",            6L,          "Trade",
  "ORBIS",                 "NACE Rev.2",    4L,          "Firm (global)",
  "Compustat NA",          "NAICS 2017",    6L,          "Firm (US listed)",
  "Compustat Global",      "SIC",           4L,          "Firm (global listed)",
  "S&P Capital IQ",        "GICS",          8L,          "Firm (global)"
) |>
  mutate(db = factor(db, levels = rev(c(
    "BACI / Green Dict","ORBIS","Compustat NA","Compustat Global","S&P Capital IQ"
  )))) |>
  ggplot(aes(y = db, x = max_digits, fill = type)) +
  geom_col(width = 0.6) +
  geom_vline(xintercept = 6, linetype = "dashed", colour = "#b45309", linewidth = 0.8) +
  annotate("text", x = 6.1, y = 0.4, hjust = 0, vjust = 0,
           label = "HS 6-digit\n(trade reference)",
           family = "Archivo", size = 3, colour = "#b45309") +
  geom_text(aes(label = paste0(classification, "\n(", max_digits, "d)")),
            hjust = -0.05, family = "Archivo", size = 3.2) +
  scale_x_continuous(limits = c(0, 10), breaks = 2:8) +
  scale_fill_manual(values = c(
    "Trade"            = "#d1fae5",
    "Firm (global)"    = "#073309",
    "Firm (US listed)" = "#0369a1",
    "Firm (global listed)" = "#7c3aed"
  ), name = NULL) +
  labs(title = "Max Classification Depth by Database",
       x = "Number of digits", y = NULL) +
  theme_minimal(base_family = "Archivo", base_size = 12) +
  theme(plot.title = element_text(face = "bold", colour = "#073309"),
        legend.position = "bottom",
        panel.grid.major.y = element_blank())

Classification depth by database. Each level = one digit of granularity. Dashed line = HS6 reference (trade data). GICS8 is the most granular firm classification available.

1.3 Concordance routes

| Route | From → To | Connects to | Source | Details |
| --- | --- | --- | --- | --- |
| A — HS → NACE | HS 6-digit → NACE Rev.2 4-digit | ORBIS (naceccod2) | Eurostat RAMON: CN→CPA→NACE tables. File: hs_nace_correspondence_observations.csv | §2; 353 edges, 84 NACE codes, 95% coverage |
| B — HS → NAICS | HS 6-digit → NAICS 2017 6-digit (direct, no NACE needed) | S&P Capital IQ (naics); Compustat NA (naics); US Census LFTTD | concordance R package (Liao & Russ 2021, CRAN): hs5_naics + hs4_naics tables | §4.2; 14 HS2022 codes unmatched, use HS2017 anchor |
| C — NACE → NAICS | NACE 4-digit → NAICS 2017 6-digit (bridge from ORBIS to Compustat) | Same as B — ORBIS-to-Compustat cross-DB link | (i) Eurostat RAMON: NACE↔︎ISIC Rev.4; (ii) UN Statistics: ISIC Rev.4↔︎NAICS 2017 | §4.3; cross-check for B; manufacturing NACE only |
Note

Route B does not need NACE. It maps HS codes directly to NAICS using the concordance package’s internal lookup tables — no Eurostat intermediate. Route C is only needed when you start from NACE codes (i.e., when bridging ORBIS firm records to Compustat).

1.4 HS → NACE mapping chain

The full path from trade product to firm classification is:

HS6  →  CN (EU Combined Nomenclature)  →  CPA 2.1  →  NACE Rev.2 (4-digit)

The chain is built from Eurostat’s official CN–CPA–NACE correspondence tables. ORBIS stores firm activity codes in the field naceccod2 (NACE Rev.2). S&P Capital IQ and Compustat do not carry a NACE field (see §1.1), so they are reached through Routes B/C rather than through this chain. NACE Rev.2 has applied since 2008 (Regulation (EC) No 1893/2006), so the crosswalk does not need updating for firm-side classification changes.
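The chain can be illustrated on a few rows from the Batteries table in §2. This is a minimal sketch, not the project's build script: the column names cn8 and cpa are hypothetical, and it relies on two structural facts, namely that the first six digits of a CN code are the HS6 code and that the first four digits of a CPA 2.1 code are the NACE Rev.2 class.

```r
library(dplyr)

# Toy CN -> CPA rows copied from the Batteries table in §2.
# A real run would read the full Eurostat correspondence file instead.
# Column names (cn8, cpa) are assumptions for illustration.
cn_cpa <- tibble::tribble(
  ~cn8,       ~cpa,
  "85076000", "27.20.23",
  "26050000", "07.29.1",
  "28369100", "20.13.3"
)

hs_nace_toy <- cn_cpa |>
  mutate(
    # CN8 extends HS6 by two digits, so truncating recovers the HS6 code
    hs6   = substr(cn8, 1, 6),
    # Drop the dots, then keep the first four digits: the NACE class
    nace4 = substr(gsub(".", "", cpa, fixed = TRUE), 1, 4)
  ) |>
  distinct(hs6, nace4)

hs_nace_toy
```

Keeping the codes as character vectors throughout preserves leading zeros such as 0729.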

1.5 Coverage after mining patch

The original Eurostat-derived table focused on manufacturing (NACE divisions 10–33) and omitted mining/extraction (NACE 05–09) and agriculture (01–03). A patch adds 33 raw material HS codes mapped to their NACE mining equivalents, raising coverage from 86% to 95% of Green Dictionary codes.
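One way to apply such a patch is to append it to the base table and tag each row with its origin. The sketch below uses four rows from §2 as stand-ins. The object names base_map and mining_patch and the source column are hypothetical, not the project's actual patch file.

```r
library(dplyr)

# Stand-in for the Eurostat-derived manufacturing table (toy rows)
base_map <- tibble::tribble(
  ~code,    ~nace,
  "850760", "2720",
  "283691", "2013"
)

# Stand-in for the mining patch (toy rows; the real patch has 33 HS codes)
mining_patch <- tibble::tribble(
  ~code,    ~nace,
  "260500", "0729",
  "250410", "0891"
)

patched <- bind_rows(
  mutate(base_map, source = "eurostat"),
  mutate(mining_patch, source = "mining_patch")
) |>
  # If a pair appears in both tables, keep the first (Eurostat) row
  distinct(code, nace, .keep_all = TRUE)

count(patched, source)
```

Tagging the origin makes it easy to report coverage with and without the patch.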

Code
gd     <- read_csv("/Users/parvulesco/Documents/R/NZIPL-CVCE/data/green_dict/green_dictionary.csv",
                   show_col_types = FALSE)
hs_nace <- read_csv(hs_nace_path, show_col_types = FALSE)
gd_n    <- n_distinct(gd$code)
hs_n    <- n_distinct(hs_nace$code)
nace_n  <- n_distinct(hs_nace$nace)
pct     <- round(100 * hs_n / gd_n, 1)

tibble::tribble(
  ~Metric,                        ~Value,
  "Green Dictionary HS codes",    as.character(gd_n),
  "HS codes with NACE mapping",   paste0(hs_n, " (", pct, "%)"),
  "Unique NACE codes",            as.character(nace_n),
  "NACE digit level",             "4-digit (e.g. 2720, 0729, 1040)",
  "NACE division range",          "01 Agriculture → 33 Manufacturing",
  "Primary source",               "Eurostat CN–CPA–NACE (manufacturing)",
  "Mining patch source",          "Eurostat NACE Rev.2 structure (Reg. EC 1893/2006)"
) |>
  kable(col.names = c("Metric", "Value")) |>
  kable_styling(bootstrap_options = c("striped","condensed"), full_width = FALSE)
| Metric | Value |
| --- | --- |
| Green Dictionary HS codes | 372 |
| HS codes with NACE mapping | 353 (94.9%) |
| Unique NACE codes | 84 |
| NACE digit level | 4-digit (e.g. 2720, 0729, 1040) |
| NACE division range | 01 Agriculture → 33 Manufacturing |
| Primary source | Eurostat CN–CPA–NACE (manufacturing) |
| Mining patch source | Eurostat NACE Rev.2 structure (Reg. EC 1893/2006) |

The remaining gap is mostly process equipment (HS chapters 84–85: laser cutters, sintering furnaces, UV curing equipment, wind turbine generators) and a handful of specialised chemicals. These span multiple NACE divisions — no single clean NACE anchor exists without further context.
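The unmapped codes can be listed explicitly with an anti-join and grouped by HS chapter. This sketch runs on toy data: the two unmapped codes are the ones shown as gaps in §2, and gd_toy and hs_nace_toy stand in for the Green Dictionary and the concordance file.

```r
library(dplyr)

# Toy Green Dictionary: two mapped codes and two known gaps from §2
gd_toy <- tibble::tibble(code = c("850760", "260500", "854389", "850461"))

# Toy HS -> NACE table using the same column names as hs_nace
hs_nace_toy <- tibble::tribble(
  ~code,    ~nace,
  "850760", "2720",
  "260500", "0729"
)

# Codes in the dictionary with no NACE mapping, plus their HS chapter
unmapped <- gd_toy |>
  anti_join(hs_nace_toy, by = "code") |>
  mutate(hs_chapter = substr(code, 1, 2))

coverage_pct <- round(100 * (1 - nrow(unmapped) / n_distinct(gd_toy$code)), 1)

count(unmapped, hs_chapter)
```

Computing coverage from the anti-join counts only codes that appear in the dictionary, so stray rows in the concordance file cannot inflate the figure.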


2. Hierarchy for Selected Products

The HS → NACE → NAICS chain for key clean-tech products, with notes on where the 4-digit NACE ceiling causes information loss. Tabs show one technology each; within each tab the supply chain runs from raw material (upstream) to final product.

Note

ORBIS caveat: ORBIS stores firm activity codes (what firms do), not HS product output (what they make). NACE codes are a screening filter for likely producers — not a definitive producer list. Final Product NACE codes (e.g., 2720 for Batteries, 2611 for Solar) are the cleanest anchors.
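As a sketch of the screening step, the filter below keeps firms whose primary NACE code is a Final Product anchor. Only the field name naceccod2 comes from this document. The firm identifiers, the toy extract, and the non-target code 4711 are hypothetical.

```r
library(dplyr)

# Hypothetical ORBIS extract (firm_id values are invented)
orbis_toy <- tibble::tribble(
  ~firm_id, ~naceccod2,
  "F001",   "2720",
  "F002",   "2611",
  "F003",   "4711",
  "F004",   "2720"
)

# Final Product anchors named in the caveat above
final_product_nace <- c(Batteries = "2720", Solar = "2611")

# Screening filter: likely producers, not a definitive producer list
candidates <- orbis_toy |>
  filter(naceccod2 %in% final_product_nace) |>
  mutate(tech = names(final_product_nace)[match(naceccod2, final_product_nace)])

candidates
```

The result is a candidate list that still needs firm-level verification, since a NACE 2611 firm may make chips rather than solar cells.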

Code
tibble::tribble(
  ~HS6,     ~Product,                      ~Type,               ~Stage,        ~CN_code,    ~CPA,       ~NACE4, ~NACE_desc,                               ~NAICS6,  ~NAICS_desc,                     ~Info_loss,
  "850760", "Lithium-ion accumulators",    "Final Product",     "Final Prod.", "85076000",  "27.20.23", "2720", "Batteries & accumulators",               "335911", "Storage Battery Mfg",          "Low — NACE 2720 is battery-specific",
  "850780", "Other accumulators",          "Final Product",     "Final Prod.", "85078000",  "27.20.23", "2720", "Batteries & accumulators",               "335911", "Storage Battery Mfg",          "Low — same clean mapping",
  "260500", "Cobalt ore",                  "Raw Material",      "Upstream",    "26050000",  "07.29.1",  "0729", "Other non-ferrous metal ore mining",     "212299", "All Other Metal Ore Mining",    "Moderate — 0729 covers all non-ferrous ores",
  "250410", "Natural graphite (anode)",    "Raw Material",      "Upstream",    "25041000",  "08.91.1",  "0891", "Mining of chemical minerals",            "212325", "Clay & Ceramic Mining",         "Moderate — 0891 covers many industrial minerals",
  "283691", "Lithium carbonate",           "Processed Material","Midstream",   "28369100",  "20.13.3",  "2013", "Basic inorganic chemicals",              "325180", "Other Basic Inorganic Chem",    "High — 2013 covers all inorganic chemicals"
) |>
  select(-CN_code, -CPA) |>
  kable(col.names = c("HS6","Product","Type","Stage","NACE4","NACE Description","NAICS6","NAICS Description","4-digit info loss"),
        caption = "Batteries: HS → NACE → NAICS chain with information loss assessment") |>
  kable_styling(bootstrap_options = c("striped","condensed","hover"), full_width=TRUE, font_size=11) |>
  column_spec(5, bold=TRUE, color="#073309") |>
  column_spec(9, italic=TRUE, color="#6b7280") |>
  scroll_box(width="100%")
Batteries: HS → NACE → NAICS chain with information loss assessment
| HS6 | Product | Type | Stage | NACE4 | NACE Description | NAICS6 | NAICS Description | 4-digit info loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 850760 | Lithium-ion accumulators | Final Product | Final Prod. | 2720 | Batteries & accumulators | 335911 | Storage Battery Mfg | Low — NACE 2720 is battery-specific |
| 850780 | Other accumulators | Final Product | Final Prod. | 2720 | Batteries & accumulators | 335911 | Storage Battery Mfg | Low — same clean mapping |
| 260500 | Cobalt ore | Raw Material | Upstream | 0729 | Other non-ferrous metal ore mining | 212299 | All Other Metal Ore Mining | Moderate — 0729 covers all non-ferrous ores |
| 250410 | Natural graphite (anode) | Raw Material | Upstream | 0891 | Mining of chemical minerals | 212325 | Clay & Ceramic Mining | Moderate — 0891 covers many industrial minerals |
| 283691 | Lithium carbonate | Processed Material | Midstream | 2013 | Basic inorganic chemicals | 325180 | Other Basic Inorganic Chem | High — 2013 covers all inorganic chemicals |
Code
tibble::tribble(
  ~HS6,     ~Product,                      ~Type,               ~Stage,        ~NACE4, ~NACE_desc,                               ~NAICS6,  ~NAICS_desc,                       ~Info_loss,
  "854143", "Solar modules (assembled)",   "Final Product",     "Final Prod.", "2611", "Electronic components & boards",        "334413", "Semiconductor Device Mfg",        "HIGH — 2611 = all semiconductors; solar ~2% of sector",
  "854142", "Solar cells (unassembled)",   "Final Product",     "Final Prod.", "2611", "Electronic components & boards",        "334413", "Semiconductor Device Mfg",        "HIGH — same NACE as all chips/LEDs/displays",
  "854129", "Inverters",                   "Product Component", "Downstream",  "2611", "Electronic components & boards",        "334413", "Semiconductor Device Mfg",        "HIGH — power electronics lumped with semiconductors",
  "850171", "PV DC generators <50MW",      "Product Component", "Downstream",  "2711", "Electric motors, generators, transform","335312", "Motor & Generator Mfg",           "High — 2711 covers all motors/generators/transformers",
  "381800", "Doped silicon wafers",        "Processed Material","Midstream",   "2059", "Other chemical products NEC",           "325180", "Other Basic Inorganic Chem",       "High — 2059 is residual chemicals category",
  "260300", "Copper ore (PV frames/wire)", "Raw Material",      "Upstream",    "0729", "Other non-ferrous metal ore mining",    "212299", "All Other Metal Ore Mining",       "Moderate — 0729 covers all non-ferrous ores"
) |>
  kable(col.names = c("HS6","Product","Type","Stage","NACE4","NACE Description","NAICS6","NAICS Description","4-digit info loss"),
        caption = "Solar: HS → NACE → NAICS chain. Note high information loss at NACE 2611 — this code covers all electronic components globally.") |>
  kable_styling(bootstrap_options = c("striped","condensed","hover"), full_width=TRUE, font_size=11) |>
  column_spec(5, bold=TRUE, color="#073309") |>
  column_spec(9, italic=TRUE, color="#b45309") |>
  scroll_box(width="100%")
Solar: HS → NACE → NAICS chain. Note high information loss at NACE 2611 — this code covers all electronic components globally.
| HS6 | Product | Type | Stage | NACE4 | NACE Description | NAICS6 | NAICS Description | 4-digit info loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 854143 | Solar modules (assembled) | Final Product | Final Prod. | 2611 | Electronic components & boards | 334413 | Semiconductor Device Mfg | HIGH — 2611 = all semiconductors; solar ~2% of sector |
| 854142 | Solar cells (unassembled) | Final Product | Final Prod. | 2611 | Electronic components & boards | 334413 | Semiconductor Device Mfg | HIGH — same NACE as all chips/LEDs/displays |
| 854129 | Inverters | Product Component | Downstream | 2611 | Electronic components & boards | 334413 | Semiconductor Device Mfg | HIGH — power electronics lumped with semiconductors |
| 850171 | PV DC generators <50MW | Product Component | Downstream | 2711 | Electric motors, generators, transform | 335312 | Motor & Generator Mfg | High — 2711 covers all motors/generators/transformers |
| 381800 | Doped silicon wafers | Processed Material | Midstream | 2059 | Other chemical products NEC | 325180 | Other Basic Inorganic Chem | High — 2059 is residual chemicals category |
| 260300 | Copper ore (PV frames/wire) | Raw Material | Upstream | 0729 | Other non-ferrous metal ore mining | 212299 | All Other Metal Ore Mining | Moderate — 0729 covers all non-ferrous ores |
Code
tibble::tribble(
  ~HS6,     ~Product,                      ~Type,               ~Stage,        ~NACE4, ~NACE_desc,                               ~NAICS6,  ~NAICS_desc,                       ~Info_loss,
  "850231", "Wind turbine generators >750kW","Final Product",   "Final Prod.", "2811", "Engines & turbines (excl. aero/auto)",  "333611", "Turbine & Turbine Generator Mfg", "Low — NACE 2811 is fairly turbine-specific",
  "850239", "Wind turbine generators other", "Final Product",   "Final Prod.", "2711", "Electric motors, generators, transform","335312", "Motor & Generator Mfg",           "Moderate — 2711 covers all motors/generators",
  "732690", "Steel tower flanges",           "Product Component","Downstream", "2599", "Fabricated metal products NEC",         "332999", "All Other Fabricated Metal Mfg",  "High — 2599 is residual fabricated metals",
  "730820", "Towers and lattice masts",      "Product Component","Downstream", "2811", "Engines & turbines",                    "333611", "Turbine & Turbine Generator Mfg", "Moderate — mixed with turbines",
  "260300", "Copper ore (wiring)",           "Raw Material",     "Upstream",   "0729", "Other non-ferrous metal ore mining",    "212299", "All Other Metal Ore Mining",       "Moderate"
) |>
  kable(col.names = c("HS6","Product","Type","Stage","NACE4","NACE Description","NAICS6","NAICS Description","4-digit info loss"),
        caption = "Wind: HS → NACE → NAICS chain") |>
  kable_styling(bootstrap_options = c("striped","condensed","hover"), full_width=TRUE, font_size=11) |>
  column_spec(5, bold=TRUE, color="#073309") |>
  column_spec(9, italic=TRUE, color="#6b7280") |>
  scroll_box(width="100%")
Wind: HS → NACE → NAICS chain
| HS6 | Product | Type | Stage | NACE4 | NACE Description | NAICS6 | NAICS Description | 4-digit info loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 850231 | Wind turbine generators >750kW | Final Product | Final Prod. | 2811 | Engines & turbines (excl. aero/auto) | 333611 | Turbine & Turbine Generator Mfg | Low — NACE 2811 is fairly turbine-specific |
| 850239 | Wind turbine generators other | Final Product | Final Prod. | 2711 | Electric motors, generators, transform | 335312 | Motor & Generator Mfg | Moderate — 2711 covers all motors/generators |
| 732690 | Steel tower flanges | Product Component | Downstream | 2599 | Fabricated metal products NEC | 332999 | All Other Fabricated Metal Mfg | High — 2599 is residual fabricated metals |
| 730820 | Towers and lattice masts | Product Component | Downstream | 2811 | Engines & turbines | 333611 | Turbine & Turbine Generator Mfg | Moderate — mixed with turbines |
| 260300 | Copper ore (wiring) | Raw Material | Upstream | 0729 | Other non-ferrous metal ore mining | 212299 | All Other Metal Ore Mining | Moderate |
Code
tibble::tribble(
  ~HS6,     ~Product,                      ~Type,               ~Stage,        ~NACE4, ~NACE_desc,                               ~NAICS6,  ~NAICS_desc,                       ~Info_loss,
  "841861", "Heat pumps",                  "Final Product",     "Final Prod.", "2825", "Air-conditioning & refrigeration equip","333999", "All Other General Purpose Mfg",   "Moderate — 2825 is quite HVAC-specific",
  "841830", "Refrigerating units",         "Product Component", "Downstream",  "2825", "Air-conditioning & refrigeration equip","333999", "All Other General Purpose Mfg",   "Moderate — same NACE as final product",
  "260400", "Nickel ore (compressor)",     "Raw Material",      "Upstream",    "0729", "Other non-ferrous metal ore mining",    "212299", "All Other Metal Ore Mining",       "Moderate"
) |>
  kable(col.names = c("HS6","Product","Type","Stage","NACE4","NACE Description","NAICS6","NAICS Description","4-digit info loss"),
        caption = "Heat Pumps: HS → NACE → NAICS chain") |>
  kable_styling(bootstrap_options = c("striped","condensed","hover"), full_width=TRUE, font_size=11) |>
  column_spec(5, bold=TRUE, color="#073309") |>
  scroll_box(width="100%")
Heat Pumps: HS → NACE → NAICS chain
| HS6 | Product | Type | Stage | NACE4 | NACE Description | NAICS6 | NAICS Description | 4-digit info loss |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 841861 | Heat pumps | Final Product | Final Prod. | 2825 | Air-conditioning & refrigeration equip | 333999 | All Other General Purpose Mfg | Moderate — 2825 is quite HVAC-specific |
| 841830 | Refrigerating units | Product Component | Downstream | 2825 | Air-conditioning & refrigeration equip | 333999 | All Other General Purpose Mfg | Moderate — same NACE as final product |
| 260400 | Nickel ore (compressor) | Raw Material | Upstream | 0729 | Other non-ferrous metal ore mining | 212299 | All Other Metal Ore Mining | Moderate |
Code
tibble::tribble(
  ~HS6,     ~Product,                      ~Tech,           ~Type,               ~Stage,        ~NACE4, ~NACE_desc,                               ~Info_loss,
  "854389", "Electrolyzers",               "Electrolyzers", "Final Product",     "Final Prod.", NA,     "(no NACE — process equipment gap)",      "Gap: HS chapter 85 equipment spans NACE 2711/2790/2849",
  "850223", "AC generators (alternators)", "Electrolyzers", "Product Component", "Downstream",  "2711", "Electric motors, generators, transform",  "Moderate",
  "850461", "High-voltage transformers",   "Transmission",  "Final Product",     "Final Prod.", NA,     "(no NACE — process equipment gap)",      "Gap: transformers >350kVA not in concordance",
  "850431", "Transformers 1–16 kVA",       "Transmission",  "Final Product",     "Final Prod.", "2711", "Electric motors, generators, transform",  "Moderate — 2711 covers all generator types",
  "840120", "Nuclear reactors",            "Nuclear",       "Final Product",     "Final Prod.", "2899", "Machinery NEC",                           "High — 2899 is residual machinery"
) |>
  kable(col.names = c("HS6","Product","Technology","Type","Stage","NACE4","NACE Description","4-digit info loss"),
        caption = "Electrolyzers, Transmission, Nuclear: HS → NACE. Note NA = gap in concordance.") |>
  kable_styling(bootstrap_options = c("striped","condensed","hover"), full_width=TRUE, font_size=11) |>
  column_spec(6, bold=TRUE, color="#073309") |>
  column_spec(8, italic=TRUE, color="#b45309") |>
  scroll_box(width="100%")
Electrolyzers, Transmission, Nuclear: HS → NACE. Note NA = gap in concordance.
| HS6 | Product | Technology | Type | Stage | NACE4 | NACE Description | 4-digit info loss |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 854389 | Electrolyzers | Electrolyzers | Final Product | Final Prod. | NA | (no NACE — process equipment gap) | Gap: HS chapter 85 equipment spans NACE 2711/2790/2849 |
| 850223 | AC generators (alternators) | Electrolyzers | Product Component | Downstream | 2711 | Electric motors, generators, transform | Moderate |
| 850461 | High-voltage transformers | Transmission | Final Product | Final Prod. | NA | (no NACE — process equipment gap) | Gap: transformers >350kVA not in concordance |
| 850431 | Transformers 1–16 kVA | Transmission | Final Product | Final Prod. | 2711 | Electric motors, generators, transform | Moderate — 2711 covers all generator types |
| 840120 | Nuclear reactors | Nuclear | Final Product | Final Prod. | 2899 | Machinery NEC | High — 2899 is residual machinery |

3. NACE Coverage: Technology × Value-Chain Stage

Code
nt <- read_csv(nace_tech_path, show_col_types = FALSE)
annotated_local <- read_csv(xw_annotated_path, show_col_types = FALSE)

type_pal <- c(
  "Final Product"      = "#073309",
  "Product Component"  = "#15803d",
  "Processed Material" = "#0369a1",
  "Process Equipment"  = "#b45309",
  "Raw Material"       = "#d97706"
)
ambig_pal <- c(
  "Clean (1-to-1)"       = "#15803d",
  "Moderate (2-5 NAICS)" = "#b45309",
  "High (6+ NAICS)"      = "#b91c1c"
)

tech_order <- nt |>
  distinct(tech, nace_code) |>
  count(tech) |>
  arrange(-n) |>
  pull(tech)

p_type <- nt |>
  distinct(tech, nace_code, type) |>
  mutate(type = factor(type, levels = names(type_pal)),
         tech = factor(tech, levels = tech_order)) |>
  count(tech, type) |>
  ggplot(aes(x = tech, y = n, fill = type)) +
  geom_col(width = 0.72) +
  scale_fill_manual(values = type_pal, name = "Product type") +
  labs(title = "NACE Codes per Technology — by Product Type",
       subtitle = "67 unique NACE 4-digit codes (orbis_nace_tech.csv)",
       x = NULL, y = "NACE 4-digit codes") +
  theme_minimal(base_family = "Archivo", base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#073309"),
        legend.position = "bottom",
        axis.text.x = element_text(angle = 30, hjust = 1),
        panel.grid.major.x = element_blank())

viz_df <- annotated_local |>
  separate_rows(tech, sep = "; ") |>
  filter(!is.na(tech)) |>
  mutate(
    ambiguity_tier = factor(ambiguity_tier,
      levels = c("Clean (1-to-1)", "Moderate (2-5 NAICS)", "High (6+ NAICS)")),
    tech = factor(tech, levels = tech_order)
  )

p_ambig <- ggplot(viz_df, aes(x = tech, fill = ambiguity_tier)) +
  geom_bar(width = 0.72) +
  scale_fill_manual(values = ambig_pal, name = "NACE→NAICS ambiguity") +
  labs(title = "NACE→NAICS6 Mapping Ambiguity by Technology",
       subtitle = "Each segment = one NACE code; Clean = unique 1-to-1 NAICS mapping",
       x = NULL, y = "NACE codes") +
  theme_minimal(base_family = "Archivo", base_size = 11) +
  theme(plot.title = element_text(face = "bold", colour = "#073309"),
        legend.position = "bottom",
        axis.text.x = element_text(angle = 30, hjust = 1),
        panel.grid.major.x = element_blank())

print(p_type / p_ambig)

Top: NACE 4-digit codes per clean technology, filled by product type. Bottom: NACE→NAICS6 mapping ambiguity by technology (Clean = 1-to-1 mapping; High = 6+ NAICS codes sharing one NACE). Both charts share the same x-axis ordering.
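The ambiguity tiers could be derived from the number of NAICS6 codes attached to each NACE code. The sketch below assumes that derivation and reads the thresholds off the tier labels. The counts in nace_naics_toy are invented for illustration, and the annotated crosswalk may have been built differently.

```r
library(dplyr)

# Hypothetical NAICS counts per NACE code (not taken from the crosswalk)
nace_naics_toy <- tibble::tribble(
  ~nace_code, ~naics_n_mapped,
  "2720",     1L,
  "2711",     4L,
  "2059",     9L
)

tiered <- nace_naics_toy |>
  mutate(ambiguity_tier = cut(
    naics_n_mapped,
    # Right-closed bins: (0,1], (1,5], (5,Inf]
    breaks = c(0, 1, 5, Inf),
    labels = c("Clean (1-to-1)", "Moderate (2-5 NAICS)", "High (6+ NAICS)")
  ))

tiered
```

Because cut() returns a factor with the labels in break order, the result plots in the same tier order as the palette above.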
Show summary table code
nt |>
  group_by(tech) |>
  summarise(
    n_nace   = n_distinct(nace_code),
    nace_fp  = paste(sort(unique(nace_code[type == "Final Product"])), collapse = ", "),
    naics_fp = paste(sort(unique(naics6_dominant[type == "Final Product"])), collapse = ", "),
    .groups  = "drop"
  ) |>
  kable(col.names = c("Technology", "NACE codes total", "Final Product NACE4(s)", "Final Product NAICS6"),
        caption   = "Summary: NACE and NAICS coverage by technology") |>
  kable_styling(bootstrap_options = c("striped","condensed"), full_width = FALSE) |>
  column_spec(3, bold = TRUE, color = "#073309") |>
  column_spec(4, color = "#0369a1")
Summary: NACE and NAICS coverage by technology
| Technology | NACE codes total | Final Product NACE4(s) | Final Product NAICS6 |
| --- | --- | --- | --- |
| Batteries | 20 | 2720 | 335911 |
| Biofuel | 18 | 2014, 2059 | 325180, 325199 |
| Electrolyzers | 16 | 2849 | 333243 |
| Geothermal | 28 | 2420, 2711, 2811, 2825, 2892 | 331110, 333120, 333611, 333999, 335312 |
| Heat Pumps | 8 | 2825 | 333999 |
| Magnets | 12 | 2599 | 332999 |
| Nuclear | 13 | 2530 | 332410 |
| Solar | 21 | 2611 | 334413 |
| Transmission | 12 | 2711 | 335312 |
| Wind | 32 | 2811 | 333611 |

4. Concordance Routes: Implementation

Route A is documented in §1.3 and §1.4 (HS→NACE chain) and summarised in §4.1; the implementation detail in this section concerns Routes B and C. NAICS 2017 codes run to 6 digits, against 4 for NACE. Reference hierarchy for Batteries:

Show NAICS hierarchy table
tibble::tribble(
  ~Level,    ~Code,     ~Description,
  "2-digit", "33",      "Manufacturing",
  "3-digit", "335",     "Electrical Equipment, Appliances, and Components",
  "4-digit", "3359",    "Other Electrical Equipment and Component Manufacturing",
  "5-digit", "33591",   "Battery Manufacturing",
  "6-digit", "335911",  "Storage Battery Manufacturing (lithium-ion & other rechargeable cells)",
  "6-digit", "335912",  "Primary Battery Manufacturing (dry & wet)",
  "—",       "—",       "— vs NACE: 4-digit 2720 = Manufacture of batteries and accumulators —"
) |>
  kable(caption = "NAICS 2017 depth vs NACE 4-digit (Batteries reference case)") |>
  kable_styling(bootstrap_options = c("striped","condensed"), full_width = FALSE) |>
  column_spec(1, bold = TRUE) |>
  row_spec(5:6, background = "#f0fdf4") |>
  row_spec(7, background = "#fef9c3", italic = TRUE)
NAICS 2017 depth vs NACE 4-digit (Batteries reference case)
| Level | Code | Description |
| --- | --- | --- |
| 2-digit | 33 | Manufacturing |
| 3-digit | 335 | Electrical Equipment, Appliances, and Components |
| 4-digit | 3359 | Other Electrical Equipment and Component Manufacturing |
| 5-digit | 33591 | Battery Manufacturing |
| 6-digit | 335911 | Storage Battery Manufacturing (lithium-ion & other rechargeable cells) |
| 6-digit | 335912 | Primary Battery Manufacturing (dry & wet) |
| — | — | — vs NACE: 4-digit 2720 = Manufacture of batteries and accumulators — |

4.1 Route A — HS → NACE (Eurostat)

Covered in full in §1.3 and §1.4 (HS→NACE chain). Source: Eurostat RAMON CN→CPA→NACE tables. Output file: data/hs_nace_correspondence_observations.csv (353 edges, 84 NACE codes). Target database: ORBIS (naceccod2 field).

4.2 Route B — HS → NAICS directly (concordance package)

Chain: HS 6-digit → NAICS 2017 6-digit (no NACE intermediary)

The concordance package (Liao & Russ 2021, CRAN v2.0.0) ships HS↔︎NAICS lookup tables internally. hs5_naics covers HS2017; hs4_naics covers HS2012 as fallback. Target databases: S&P Capital IQ and Compustat NA (naics field), US Census LFTTD.

# Install once (not in renv — build-step only)
install.packages("concordance")
library(concordance)

# The package exposes HS-NAICS tables directly as data frames:
# hs5_naics  — HS2017 (6-digit) → NAICS 2017 (6-digit)
# hs4_naics  — HS2012 (6-digit) → NAICS 2017 (6-digit)
# head(hs5_naics)
#   HS5_6d HS5_4d HS5_2d NAICS_6d NAICS_4d NAICS_2d

# Load our existing HS→NACE edge list and normalize both code columns
hs_nace <- read_csv("data/hs_nace_correspondence_observations.csv") |>
  transmute(
    hs_code   = formatC(as.integer(code), width = 6, flag = "0"),
    nace_code = as.character(nace)
  ) |> distinct()

# Join: HS → NAICS (hs5 primary, hs4 fallback for codes absent from hs5_naics)
hs_mapped <- hs_nace |>
  left_join(hs5_naics |> transmute(hs_code=HS5_6d, naics6=NAICS_6d, rev="HS5"),
            by="hs_code") |>
  left_join(hs4_naics |> transmute(hs_code=HS4_6d, naics6_fb=NAICS_6d, rev_fb="HS4"),
            by="hs_code") |>
  mutate(naics6 = coalesce(naics6, naics6_fb))

# Aggregate to NACE: dominant NAICS = mode by HS-code count
nace_naics <- hs_mapped |>
  filter(!is.na(naics6)) |>
  group_by(nace_code, naics6) |> summarise(n=n(), .groups="drop") |>
  group_by(nace_code) |>
  summarise(
    naics6_dominant = first(naics6[order(-n)]),
    naics6_all      = paste(unique(naics6), collapse=";"),
    naics_n_mapped  = n_distinct(naics6),
    .groups = "drop"
  )

Coverage: 309 of 353 HS codes matched in hs5_naics (HS2017). The unmatched codes include HS2022 codes that were introduced after HS2017 (e.g. 854142 Solar Cells, 854143 Solar Modules). All 84 NACE codes still received a valid NAICS mapping via their other HS anchors.
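
The coverage figures can be checked with an anti-join. The sketch below uses toy stand-ins rather than the real files: hs_nace_toy and hs5_toy are hypothetical objects, and their HS, NACE and NAICS pairings are illustrative values only. Only the column names HS5_6d and NAICS_6d follow the package table.

```r
library(dplyr)

# Toy stand-ins (illustrative values only): hs_nace_toy mimics the HS->NACE
# edge list; hs5_toy mimics the HS5_6d / NAICS_6d columns of hs5_naics.
hs_nace_toy <- tibble::tibble(
  hs_code   = c("850760", "854140", "854142", "854143"),
  nace_code = c("2720",   "2611",   "2611",   "2611")
)
hs5_toy <- tibble::tibble(
  HS5_6d   = c("850760", "854140"),
  NAICS_6d = c("335911", "334413")
)

# HS codes with no counterpart in the lookup
unmatched <- hs_nace_toy |>
  anti_join(hs5_toy, by = c("hs_code" = "HS5_6d"))

# NACE codes that would lose every HS anchor (expected to be empty here)
orphan_nace <- hs_nace_toy |>
  left_join(hs5_toy, by = c("hs_code" = "HS5_6d")) |>
  group_by(nace_code) |>
  summarise(n_matched = sum(!is.na(NAICS_6d)), .groups = "drop") |>
  filter(n_matched == 0)

print(unmatched)
print(orphan_nace)
```

In this toy case the two post-2017 codes are unmatched, but NACE 2611 keeps a mapping through its remaining HS anchor, which is the pattern described above.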


4.3 Route C — NACE → NAICS via ISIC (ORBIS-to-Compustat bridge)

Chain: NACE Rev.2 (4-digit) → ISIC Rev.4 (4-digit) → NAICS 2017 (6-digit)

This route goes through the two official international correspondence tables without relying on HS codes as an intermediary.

Step 1: NACE Rev.2 → ISIC Rev.4 (Eurostat correspondence)

Eurostat publishes the official NACE Rev.2 ↔︎ ISIC Rev.4 correspondence via the RAMON classification server.

# Option 1: download from Eurostat RAMON (reproducible, requires internet)
nace_isic_url <- paste0(
  "https://ec.europa.eu/eurostat/ramon/nomenclatures/",
  "index.cfm?TargetUrl=LST_NOM_DTL_GLOSSARY&StrNom=NACE_REV2",
  "&StrLanguageCode=EN&IntPcKey=&IntKey=&bExport=true"
)
# In practice: download the Excel from RAMON manually, or use the
# correspondence embedded below (Section C manufacturing, our 67 codes).

# Option 2: Direct equivalence for Section C (Manufacturing)
# NACE Rev.2 was built from ISIC Rev.4. For Section C (Manufacturing),
# 4-digit NACE codes are IDENTICAL to ISIC Rev.4 at 4-digit level,
# with the following EU-specific exceptions:
#   NACE 2441–2446 → all map to ISIC 2420 (Manufacture of basic precious and other non-ferrous metals)
#   NACE 2611–2612 → both map to ISIC 2610 (Manufacture of electronic components and boards)
# NACE 2651 and 2652 need no exception: they correspond to ISIC 2651 and 2652.
# All other Section C 4-digit NACE codes are treated as isic4 = nace4

# Build NACE → ISIC for our 67 clean-tech codes:
nace_isic_exceptions <- tibble::tribble(
  ~nace_code, ~isic4,
  "2441",     "2420",   # NACE 244x (non-ferrous metals) → ISIC 2420
  "2442",     "2420",
  "2443",     "2420",
  "2444",     "2420",
  "2445",     "2420",
  "2446",     "2420",
  "2611",     "2610",   # Electronic components (NACE splits ISIC 2610)
  "2612",     "2610"
)

# For all Section C codes not in exceptions: NACE = ISIC
our_nace <- unique(read_csv("data/orbis/orbis_nace_tech.csv")$nace_code)

nace_to_isic <- tibble::tibble(nace_code = as.character(our_nace)) |>
  left_join(nace_isic_exceptions, by = "nace_code") |>
  mutate(isic4 = coalesce(isic4, nace_code))  # default: nace = isic
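
The identity default only makes sense for 4-digit codes inside Section C (NACE divisions 10 to 33). The following sketch shows a pre-check that flags codes for which the default should not be applied. The vector codes is a toy sample and in_section_c is a hypothetical name, not a column in the project files.

```r
# Hypothetical pre-check: flag codes that are not exactly 4 digits or that
# fall outside NACE Section C (divisions 10-33) before applying isic4 = nace4.
codes <- c("2720", "2611", "0710", "20112013")

is_four_digit <- grepl("^[0-9]{4}$", codes)
division      <- as.integer(substr(codes, 1, 2))
in_section_c  <- is_four_digit & division >= 10 & division <= 33

check <- data.frame(nace_code = codes, in_section_c, stringsAsFactors = FALSE)
print(check)
```

Codes flagged FALSE here (a mining code and an 8-digit composite) would need an explicit lookup rather than the default.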

Step 2: ISIC Rev.4 → NAICS 2017 (UN Statistics Division)

The UN Statistics Division publishes the official ISIC Rev.4 ↔︎ NAICS 2017 correspondence as a downloadable Excel (UN Correspondence Tables).

# Option 1: Download UN correspondence table (reproducible)
# File: "ISIC_Rev_4_correspondence_with_NAICS_2017.xlsx"
# Source: UN Statistics Division → Classifications → Correspondences

# Option 2: Use the concordance package's ISIC descriptions to validate,
# combined with the embedded key mappings below.

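# Key ISIC Rev.4 → NAICS 2017 mappings for our clean-tech codes
# (from UN Statistics Division correspondence table).
# Rows keyed 2441–2446 and 2611 are NACE-only codes: Step 1 remaps them to
# ISIC 2420 and 2610, so those rows are never matched in the join below.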
isic_naics_key <- tibble::tribble(
  ~isic4, ~naics6,   ~naics6_desc,
  "1041", "311225",  "Fats and Oils Refining and Blending",
  "1062", "311221",  "Wet Corn Milling",
  "1610", "321113",  "Sawmills",
  "1920", "324110",  "Petroleum Refineries",
  "2011", "325120",  "Industrial Gas Manufacturing",
  "2012", "325130",  "Synthetic Dye and Pigment Manufacturing",
  "2013", "325180",  "Other Basic Inorganic Chemical Manufacturing",
  "2014", "325190",  "Other Basic Organic Chemical Manufacturing",
  "2015", "325311",  "Nitrogenous Fertilizer Manufacturing",
  "2016", "325211",  "Plastics Material and Resin Manufacturing",
  "2017", "325212",  "Synthetic Rubber Manufacturing",
  "2030", "325510",  "Paint and Coating Manufacturing",
  "2052", "325520",  "Adhesive Manufacturing",
  "2059", "325998",  "All Other Miscellaneous Chemical Product Manufacturing",
  "2219", "326291",  "Rubber Product Manufacturing, NEC",
  "2221", "326113",  "Unlaminated Plastics Film and Sheet Manufacturing",
  "2229", "326199",  "All Other Plastics Product Manufacturing",
  "2311", "327211",  "Flat Glass Manufacturing",
  "2312", "327212",  "Other Pressed and Blown Glass Manufacturing",
  "2320", "327110",  "Pottery, Ceramics, and Plumbing Fixture Manufacturing",
  "2344", "327999",  "All Other Miscellaneous Nonmetallic Mineral Product Mfg",
  "2351", "327310",  "Cement Manufacturing",
  "2352", "327390",  "Other Structural Clay Product Manufacturing",
  "2369", "327990",  "All Other Nonmetallic Mineral Product Manufacturing",
  "2391", "327910",  "Abrasive Product Manufacturing",
  "2399", "327999",  "All Other Miscellaneous Nonmetallic Mineral Product Mfg",
  "2410", "331110",  "Iron and Steel Mills and Ferroalloy Manufacturing",
  "2420", "331410",  "Nonferrous Metal (except Copper and Aluminum) Smelting",
  "2431", "331420",  "Copper Rolling, Drawing, Extruding, and Alloying",
  "2432", "331491",  "Nonferrous Metal (ex Copper/Aluminum) Rolling, Drawing",
  "2433", "331313",  "Aluminum Foundries (except Die-Casting)",
  "2441", "331410",  "Nonferrous Metal Smelting and Refining",
  "2442", "331313",  "Aluminum Production and Processing",
  "2443", "331410",  "Lead, Zinc, Tin Smelting and Refining",
  "2444", "331410",  "Copper Smelting and Refining",
  "2445", "331410",  "Other Nonferrous Metal Smelting",
  "2446", "331410",  "Precious Metals Smelting and Refining",
  "2511", "332311",  "Prefabricated Metal Building and Component Manufacturing",
  "2530", "332410",  "Power Boiler and Heat Exchanger Manufacturing",
  "2572", "332510",  "Hardware Manufacturing",
  "2573", "332710",  "Machine Shops",
  "2594", "332999",  "All Other Miscellaneous Fabricated Metal Product Mfg",
  "2599", "332999",  "All Other Miscellaneous Fabricated Metal Product Mfg",
  "2610", "334413",  "Semiconductor and Related Device Manufacturing",
  "2611", "334413",  "Semiconductor and Related Device Manufacturing",
  "2651", "334515",  "Instrument Manufacturing for Measuring and Testing",
  "2660", "334510",  "Electromedical and Electrotherapeutic Apparatus Mfg",
  "2670", "333314",  "Optical Instrument and Lens Manufacturing",
  "2711", "335312",  "Motor and Generator Manufacturing",
  "2720", "335911",  "Storage Battery Manufacturing",
  "2732", "335929",  "Other Communication and Energy Wire Manufacturing",
  "2733", "335931",  "Current-Carrying Wiring Device Manufacturing",
  "2790", "335999",  "All Other Miscellaneous Electrical Equipment Mfg",
  "2811", "333612",  "Speed Changer, Industrial High-Speed Drive, and Gear Mfg",
  "2812", "333613",  "Mechanical Power Transmission Equipment Manufacturing",
  "2813", "333911",  "Pump and Pumping Equipment Manufacturing",
  "2815", "333923",  "Overhead Traveling Crane, Hoist, and Monorail System Mfg",
  "2821", "333111",  "Farm Machinery and Equipment Manufacturing",
  "2822", "333120",  "Construction Machinery Manufacturing",
  "2825", "333415",  "Air-Conditioning and Warm Air Heating Equipment Mfg",
  "2829", "333999",  "All Other General Purpose Machinery Manufacturing",
  "2841", "333514",  "Special Industry Machinery Manufacturing",
  "2849", "333249",  "Other Industrial and Commercial Machinery Mfg",
  "2891", "333514",  "Metalworking Machinery Manufacturing",
  "2892", "333131",  "Mining Machinery and Equipment Manufacturing",
  "2894", "333318",  "Other Commercial and Service Industry Machinery Mfg",
  "2896", "333220",  "Plastics and Rubber Industry Machinery Manufacturing",
  "2899", "333249",  "Other Special Industry Machinery Manufacturing",
  "2932", "336390",  "Other Motor Vehicle Parts Manufacturing",
  "3212", "339910",  "Jewelry and Silverware Manufacturing",
  "3812", "562119",  "Other Waste Collection"
)

# Chain: NACE → ISIC → NAICS (Route C)
nace_naics_routeC <- nace_to_isic |>
  left_join(isic_naics_key, by = "isic4") |>
  select(nace_code, isic4, naics6, naics6_desc)

Step 3: Compare Route B vs Route C

# Join both crosswalks and compare
comparison <- nace_naics_routeB |>    # from 08c_build_naics_crosswalk.R
  rename(naics6_routeB = naics6_dominant) |>
  left_join(
    nace_naics_routeC |> rename(naics6_routeC = naics6),
    by = "nace_code"
  ) |>
  mutate(agreement = naics6_routeB == naics6_routeC)

# Diagnostic: where do they disagree?
comparison |> filter(!agreement) |>
  select(nace_code, isic4, naics6_routeB, naics6_routeC)
Show Route C execution code
# Execute Route C for our NACE codes and compare with Route B results
nt <- read_csv(nace_tech_path, show_col_types = FALSE)
xw <- read_csv(xw_path, show_col_types = FALSE) |>
  mutate(nace_code = as.character(nace_code))

# NACE → ISIC: for Section C manufacturing, NACE = ISIC at 4-digit
# with key exceptions:
nace_isic_exc <- tibble::tribble(
  ~nace_code, ~isic4,
  "2441", "2420", "2442", "2420", "2443", "2420",
  "2444", "2420", "2445", "2420", "2446", "2420",
  "2611", "2610", "2612", "2610"
)

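# ISIC → NAICS key for Route C. Note: the column names below carry a "_routeB" / "_B"
# suffix; they are renamed to "_routeC" / "_C" when the comparison table is built.
isic_naics_key <- tibble::tribble(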
  ~isic4, ~naics6_routeB, ~naics6_desc_B,
  "1041","311225","Fats and Oils Refining","1062","311221","Wet Corn Milling",
  "1610","321113","Sawmills","1920","324110","Petroleum Refineries",
  "2011","325120","Industrial Gas Mfg","2012","325130","Synthetic Dye Mfg",
  "2013","325180","Basic Inorganic Chemical Mfg","2014","325190","Basic Organic Chemical Mfg",
  "2016","325211","Plastics Material and Resin Mfg","2017","325212","Synthetic Rubber Mfg",
  "2059","325998","Miscellaneous Chemical Product Mfg",
  "2219","326291","Rubber Product Mfg NEC","2221","326113","Plastics Film/Sheet Mfg",
  "2229","326199","Other Plastics Product Mfg",
  "2311","327211","Flat Glass Mfg","2312","327212","Other Glass Mfg",
  "2320","327110","Pottery and Ceramics Mfg","2344","327999","Nonmetallic Mineral NEC",
  "2351","327310","Cement Mfg","2352","327390","Structural Clay Mfg",
  "2369","327990","Nonmetallic Mineral Product Mfg","2391","327910","Abrasive Product Mfg",
  "2399","327999","Nonmetallic Mineral Product NEC",
  "2410","331110","Iron and Steel Mills","2420","331410","Nonferrous Metal Smelting",
  "2441","331410","Nonferrous Metal Refining","2442","331313","Aluminum Production",
  "2443","331410","Lead/Zinc/Tin Refining","2444","331410","Copper Refining",
  "2445","331410","Other Nonferrous Metal","2446","331410","Precious Metals Refining",
  "2511","332311","Prefab Metal Building Mfg","2530","332410","Heat Exchanger Mfg",
  "2572","332510","Hardware Mfg","2573","332710","Machine Shops",
  "2594","332999","Misc Fabricated Metal Mfg","2599","332999","Misc Fabricated Metal Mfg",
  "2610","334413","Semiconductor Device Mfg","2611","334413","Semiconductor Device Mfg",
  "2651","334515","Instruments for Measuring/Testing",
  "2660","334510","Electromedical Apparatus Mfg","2670","333314","Optical Instrument Mfg",
  "2711","335312","Motor and Generator Mfg","2720","335911","Storage Battery Mfg",
  "2732","335929","Other Energy Wire Mfg","2733","335931","Wiring Device Mfg",
  "2790","335999","Misc Electrical Equipment Mfg",
  "2811","333612","Speed Changer/Gear Mfg","2812","333613","Power Transmission Equip",
  "2813","333911","Pump and Pumping Equipment Mfg","2815","333923","Crane and Hoist Mfg",
  "2821","333111","Farm Machinery Mfg","2822","333120","Construction Machinery Mfg",
  "2825","333415","Air-Conditioning and Heating Equip Mfg",
  "2829","333999","General Purpose Machinery NEC",
  "2841","333514","Metalworking Machinery Mfg","2849","333249","Industrial Machinery NEC",
  "2891","333514","Metalworking Machinery","2892","333131","Mining Machinery Mfg",
  "2894","333318","Commercial Machinery Mfg","2896","333220","Plastics/Rubber Machinery Mfg",
  "2899","333249","Special Industry Machinery NEC",
  "2932","336390","Motor Vehicle Parts Mfg",
  "3212","339910","Jewelry and Silverware Mfg","3812","562119","Other Waste Collection"
)

our_nace <- xw |> distinct(nace_code)

nace_to_isic_tbl <- our_nace |>
  left_join(nace_isic_exc, by = "nace_code") |>
  mutate(isic4 = coalesce(isic4, nace_code))

routeC <- nace_to_isic_tbl |>
  left_join(isic_naics_key, by = "isic4")

comparison <- xw |>
  select(nace_code, naics6_routeB = naics6_dominant, naics_n_mapped) |>
  left_join(routeC |> select(nace_code, isic4, naics6_routeC = naics6_routeB, naics6_desc_C = naics6_desc_B),
            by = "nace_code") |>
  mutate(agreement = naics6_routeB == naics6_routeC,
         agreement_lbl = ifelse(is.na(agreement), "C missing",
                          ifelse(agreement, "Agree", "Differ")))

n_agree  <- sum(comparison$agreement == TRUE, na.rm = TRUE)
n_differ <- sum(comparison$agreement == FALSE, na.rm = TRUE)
n_miss   <- sum(is.na(comparison$agreement))

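Route C coverage: 30 NACE codes agree with Route B, 34 differ, and 20 are missing from Route C's embedded table. The 20 missing codes are the seven codes outside Section C (0111 to 0891), nine composite 8-digit codes that appear to concatenate two NACE codes (e.g. 20112013), and four codes with no entry in the executed key (1040, 2015, 2030, 2052).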
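
Exact 6-digit agreement is a strict test. As a rough sketch, the same comparison can be repeated at the NAICS 4-digit level to see how many disagreements are near misses within one industry group. The four rows are copied from the comparison table; the object cmp and the columns agree6 and agree4 are hypothetical names.

```r
library(dplyr)

# Four rows taken from the Route B vs Route C comparison table
cmp <- tibble::tibble(
  nace_code     = c("2720",   "2014",   "2811",   "3812"),
  naics6_routeB = c("335911", "325199", "333611", "325180"),
  naics6_routeC = c("335911", "325190", "333612", "562119")
)

# Compare at full depth and at the 4-digit industry-group level
cmp <- cmp |>
  mutate(
    agree6 = naics6_routeB == naics6_routeC,
    agree4 = substr(naics6_routeB, 1, 4) == substr(naics6_routeC, 1, 4)
  )

print(cmp)
```

In this sample, 2014 and 2811 differ at 6 digits but agree at 4 digits, whereas 3812 falls in different sectors under the two routes.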

Code
comparison |>
  select(nace_code, isic4, naics6_routeB, naics6_routeC,
         naics_n_mapped, agreement_lbl) |>
  arrange(agreement_lbl, nace_code) |>
  kable(
    col.names = c("NACE4","ISIC4",
                  "Route B NAICS6","Route C NAICS6","B: # NAICS","Status"),
    caption = "Route B (HS-based concordance pkg) vs Route C (Eurostat NACE→ISIC→NAICS)"
  ) |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"),
                full_width = TRUE, font_size = 11) |>
  column_spec(3, bold = TRUE, color = "#073309") |>
  column_spec(4, bold = TRUE, color = "#0369a1") |>
  column_spec(6, color = ifelse(
    comparison |> arrange(agreement_lbl, nace_code) |> pull(agreement_lbl) == "Agree",
    "#15803d", "#b45309"
  )) |>
  scroll_box(width = "100%", height = "380px")
Route B (HS-based concordance pkg) vs Route C (Eurostat NACE→ISIC→NAICS)
NACE4 ISIC4 Route B NAICS6 Route C NAICS6 B: # NAICS Status
1062 1062 311221 311221 1 Agree
1610 1610 321113 321113 1 Agree
2011 2011 325120 325120 1 Agree
2013 2013 325180 325180 12 Agree
2016 2016 325211 325211 2 Agree
2017 2017 325212 325212 1 Agree
2219 2219 326291 326291 3 Agree
2221 2221 326113 326113 3 Agree
2311 2311 327211 327211 1 Agree
2351 2351 327310 327310 1 Agree
2391 2391 327910 327910 1 Agree
2399 2399 327999 327999 2 Agree
2410 2410 331110 331110 3 Agree
2441 2420 331410 331410 3 Agree
2443 2420 331410 331410 2 Agree
2445 2420 331410 331410 5 Agree
2530 2530 332410 332410 1 Agree
2572 2572 332510 332510 2 Agree
2599 2599 332999 332999 18 Agree
2611 2610 334413 334413 1 Agree
2651 2651 334515 334515 8 Agree
2660 2660 334510 334510 1 Agree
2670 2670 333314 333314 6 Agree
2711 2711 335312 335312 4 Agree
2720 2720 335911 335911 1 Agree
2790 2790 335999 335999 15 Agree
2822 2822 333120 333120 6 Agree
2829 2829 333999 333999 14 Agree
2896 2896 333220 333220 3 Agree
3212 3212 339910 339910 3 Agree
0111 0111 111998 NA 2 C missing
0220 0220 113310 NA 1 C missing
0710 0710 212210 NA 1 C missing
0721 0721 212291 NA 1 C missing
0729 0729 212299 NA 5 C missing
0811 0811 212319 NA 2 C missing
0891 0891 212325 NA 3 C missing
1040 1040 311223 NA 2 C missing
20112013 20112013 325180 NA 1 C missing
2015 2015 325311 NA 1 C missing
2030 2030 325510 NA 1 C missing
2052 2052 325520 NA 1 C missing
22232733 22232733 326199 NA 1 C missing
22293299 22293299 316992 NA 9 C missing
24453832 24453832 331410 NA 4 C missing
26112612 26112612 334412 NA 1 C missing
26512670 26512670 334511 NA 4 C missing
27112790 27112790 327110 NA 8 C missing
28293250 28293250 333999 NA 1 C missing
28942899 28942899 333244 NA 4 C missing
1041 1041 311224 311225 4 Differ
1920 1920 325110 324110 1 Differ
2012 2012 325180 325130 4 Differ
2014 2014 325199 325190 3 Differ
2059 2059 325180 325998 9 Differ
2229 2229 322220 326199 2 Differ
2312 2312 327215 327212 1 Differ
2320 2320 327120 327110 3 Differ
2344 2344 327110 327999 2 Differ
2352 2352 327410 327390 1 Differ
2369 2369 327390 327990 2 Differ
2420 2420 331110 331410 3 Differ
2442 2420 331313 331410 8 Differ
2444 2420 331420 331410 5 Differ
2446 2420 325180 331410 2 Differ
2511 2511 332312 332311 4 Differ
2573 2573 332212 332710 7 Differ
2594 2594 332722 332999 1 Differ
2732 2732 331420 335929 4 Differ
2733 2733 335314 335931 3 Differ
2811 2811 333611 333612 1 Differ
2812 2812 333995 333613 7 Differ
2813 2813 333415 333911 13 Differ
2815 2815 333613 333923 9 Differ
2821 2821 333994 333111 3 Differ
2825 2825 333999 333415 19 Differ
2841 2841 333517 333514 7 Differ
2849 2849 333243 333249 6 Differ
2891 2891 333516 333514 8 Differ
2892 2892 333120 333131 4 Differ
2894 2894 333249 333318 2 Differ
2899 2899 333242 333249 19 Differ
2932 2932 333111 336390 8 Differ
3812 3812 325180 562119 2 Differ
Tip

Where routes agree: For clean 1-to-1 NACE codes (like 2720 → 335911 Batteries, 2611 → 334413 Semiconductors), both routes give identical results. Where they differ: mostly high-ambiguity codes (e.g. 2825, non-domestic cooling and ventilation equipment, with 19 candidate NAICS codes), where Route B picks the modal NAICS across many HS codes while Route C uses the single mapping embedded from the UN ISIC→NAICS table. Neither is definitively “correct”: these are genuinely many-to-many industries. Use naics6_all for full context.
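
One way to use naics6_all for context is to test whether the Route C code appears anywhere in Route B's full candidate set. This is a sketch on three rows from the tables in this document, with the 2813 list truncated for brevity. The object cmp_set and the column c_in_b_set are hypothetical names.

```r
# Three rows from the comparison and crosswalk tables
# (naics6_all for 2813 is truncated for brevity)
cmp_set <- data.frame(
  nace_code     = c("1041", "1920", "2813"),
  naics6_routeC = c("311225", "324110", "333911"),
  naics6_all    = c("311224;311225;311222;311223",
                    "325110",
                    "333415;333911;333912;333412"),
  stringsAsFactors = FALSE
)

# TRUE when the Route C code is one of Route B's candidate NAICS codes
cmp_set$c_in_b_set <- unname(mapply(
  function(code, all_codes) code %in% strsplit(all_codes, ";", fixed = TRUE)[[1]],
  cmp_set$naics6_routeC, cmp_set$naics6_all
))

print(cmp_set)
```

A "Differ" row where c_in_b_set is TRUE (1041, 2813) reflects a different choice among shared candidates, while FALSE (1920) means the routes do not overlap at all and the row deserves a manual check.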


5. NAICS6 Results: Implemented Crosswalk

The crosswalk in data/orbis/orbis_nace_tech.csv and data/concordance/nace_naics_crosswalk.csv uses Route B (HS-based, concordance package) as the primary NAICS source, because it is the most reproducible route and covers all 84 NACE codes without manual embedding.

New columns added to orbis_nace_tech.csv:

Column Description
naics6_dominant NAICS 2017 6-digit — modal mapping by HS-code count
naics4_dominant NAICS 2017 4-digit — parent of dominant
naics6_all All NAICS6 codes for this NACE, semicolon-separated
naics_n_mapped Count of distinct NAICS6 codes (1 = clean mapping)
naics6_desc NAICS 2017 description for dominant code
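
Two of these columns can be derived from the others, which gives a simple consistency check. This is a sketch on two rows from the crosswalk, assuming the semicolon-separated format shown above; xw_toy is a hypothetical object, not the project file.

```r
# Two rows from the crosswalk (NACE 2720 and 2016)
xw_toy <- data.frame(
  nace_code       = c("2720", "2016"),
  naics6_dominant = c("335911", "325211"),
  naics6_all      = c("335911", "325211;325212"),
  stringsAsFactors = FALSE
)

# naics4_dominant is the 4-digit prefix of the dominant 6-digit code
xw_toy$naics4_dominant <- substr(xw_toy$naics6_dominant, 1, 4)

# naics_n_mapped is the number of distinct codes in naics6_all
xw_toy$naics_n_mapped <- lengths(
  lapply(strsplit(xw_toy$naics6_all, ";", fixed = TRUE), unique)
)

print(xw_toy)
```

Recomputing these values and comparing them with the stored columns is a cheap way to catch a stale crosswalk file.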

5.1 Full crosswalk table

Code
if (file.exists(xw_path)) {
  nt_full <- read_csv(nace_tech_path, show_col_types = FALSE) |>
    distinct(nace_code, nace_desc) |>
    mutate(nace_code = as.character(nace_code))

  xw <- read_csv(xw_path, show_col_types = FALSE) |>
    mutate(nace_code = as.character(nace_code)) |>
    left_join(nt_full, by = "nace_code") |>
    select(nace_code, nace_desc, naics6_dominant, naics6_desc,
           naics_n_mapped, naics6_all) |>
    arrange(nace_code)

  xw |>
    kable(
      col.names = c("NACE4","NACE Description","NAICS6 (dominant)",
                    "NAICS Description","# NAICS codes","All NAICS6"),
      caption = "Full NACE Rev.2 → NAICS 2017 crosswalk (84 codes)"
    ) |>
    kable_styling(bootstrap_options = c("striped","hover","condensed"),
                  full_width = TRUE, font_size = 11) |>
    column_spec(3, bold = TRUE, color = "#073309") |>
    column_spec(5, color = ifelse(xw$naics_n_mapped > 1, "#b45309", "#374151"),
                bold = xw$naics_n_mapped > 5) |>
    scroll_box(width = "100%", height = "420px")
}
Full NACE Rev.2 → NAICS 2017 crosswalk (84 codes)
NACE4 NACE Description NAICS6 (dominant) NAICS Description # NAICS codes All NAICS6
0111 NA 111998 All Other Miscellaneous Crop Farming 2 111998;111930
0220 NA 113310 Logging 1 113310
0710 NA 212210 Iron Ore Mining 1 212210
0721 NA 212291 Uranium-Radium-Vanadium Ore Mining 1 212291
0729 NA 212299 All Other Metal Ore Mining 5 212299;212231;212234;212291;212222
0811 NA 212319 Other Crushed and Broken Stone Mining and Quarrying 2 212319;212322
0891 NA 212325 Clay and Ceramic and Refractory Minerals Mining 3 212325;327992;212392
1040 NA 311223 NA 2 311223;311224
1041 Manufacture of oils and fats 311224 Soybean and Other Oilseed Processing 4 311224;311225;311222;311223
1062 Manufacture of starches and starch products 311221 Wet Corn Milling 1 311221
1610 Sawmilling and planing of wood 321113 Sawmills 1 321113
1920 Manufacture of refined petroleum products 325110 Petrochemical Manufacturing 1 325110
2011 Manufacture of industrial gases 325120 Industrial Gas Manufacturing 1 325120
20112013 NA 325180 Other Basic Inorganic Chemical Manufacturing 1 325180
2012 Manufacture of dyes and pigments 325180 Other Basic Inorganic Chemical Manufacturing 4 325180;325188;325130;325131
2013 Manufacture of other inorganic basic chemicals 325180 Other Basic Inorganic Chemical Manufacturing 12 325180;325188;331410;331419;325181;325120;325311;325312;327992;331110;332410;333999
2014 Manufacture of other organic basic chemicals 325199 All Other Basic Organic Chemical Manufacturing 3 325199;325110;325193
2015 Manufacture of fertilisers and nitrogen compounds 325311 Nitrogenous Fertilizer Manufacturing 1 325311
2016 Manufacture of plastics in primary forms 325211 Plastics Material and Resin Manufacturing 2 325211;325212
2017 Manufacture of synthetic rubber in primary forms 325212 Synthetic Rubber Manufacturing 1 325212
2030 Manufacture of paints, varnishes and similar coatings, printing ink and mastics 325510 Paint and Coating Manufacturing 1 325510
2052 Manufacture of glues 325520 Adhesive Manufacturing 1 325520
2059 Manufacture of other chemical products nec 325180 Other Basic Inorganic Chemical Manufacturing 9 325180;325188;325199;325998;325194;325411;325613;327999;334413
2219 Manufacture of other rubber products 326291 Rubber Product Manufacturing for Mechanical Use 3 326291;326299;339991
2221 Manufacture of plastic plates, sheets, tubes and profiles 326113 Unlaminated Plastics Film and Sheet (except Packaging) Manufacturing 3 326113;326122;326220
22232733 NA 326199 All Other Plastics Product Manufacturing 1 326199
2229 Manufacture of other plastic products 322220 Paper Bag and Coated and Treated Paper Manufacturing 2 322220;322222
22293299 NA 316992 Women's Handbag and Purse Manufacturing 9 316992;323111;323118;326199;326220;339910;339913;339991;339999
2311 Manufacture of flat glass 327211 Flat Glass Manufacturing 1 327211
2312 Shaping and processing of flat glass 327215 Glass Product Manufacturing Made of Purchased Glass 1 327215
2320 Manufacture of refractory products 327120 Clay Building Material and Refractories Manufacturing 3 327120;327124;327125
2344 Manufacture of other technical ceramic products 327110 Pottery, Ceramics, and Plumbing Fixture Manufacturing 2 327110;327113
2351 Manufacture of cement 327310 Cement Manufacturing 1 327310
2352 Manufacture of lime and plaster 327410 Lime Manufacturing 1 327410
2369 Manufacture of other articles of concrete, plaster and cement 327390 Other Concrete Product Manufacturing 2 327390;327991
2391 Production of abrasive products 327910 Abrasive Product Manufacturing 1 327910
2399 Manufacture of other non-metallic mineral products nec 327999 All Other Miscellaneous Nonmetallic Mineral Product Manufacturing 2 327999;335991
2410 Manufacture of basic iron and steel and of ferro-alloys 331110 Iron and Steel Mills and Ferroalloy Manufacturing 3 331110;331111;331112
2420 Manufacture of tubes, pipes, hollow profiles and related fittings, of steel 331110 Iron and Steel Mills and Ferroalloy Manufacturing 3 331110;331111;332919
2441 Precious metals production 331410 Nonferrous Metal (except Aluminum) Smelting and Refining 3 331410;331419;331491
2442 Aluminium production 331313 Alumina Refining and Primary Aluminum Production 8 331313;331318;331312;331314;331316;331311;331319;332999
2443 Lead, zinc and tin production 331410 Nonferrous Metal (except Aluminum) Smelting and Refining 2 331410;331419
2444 Copper production 331420 Copper Rolling, Drawing, Extruding, and Alloying 5 331420;331410;331411;331421;331422
2445 Other non-ferrous metal production 331410 Nonferrous Metal (except Aluminum) Smelting and Refining 5 331410;331419;331491;910000;331492
24453832 NA 331410 Nonferrous Metal (except Aluminum) Smelting and Refining 4 331410;331419;331491;910000
2446 Processing of nuclear fuel 325180 Other Basic Inorganic Chemical Manufacturing 2 325180;325188
2511 Manufacture of metal structures and parts of structures 332312 Fabricated Structural Metal Manufacturing 4 332312;321991;332322;332323
2530 Manufacture of steam generators, except central heating hot water boilers 332410 Power Boiler and Heat Exchanger Manufacturing 1 332410
2572 Manufacture of locks and hinges 332510 Hardware Manufacturing 2 332510;333995
2573 Manufacture of tools 332212 NA 7 332212;332216;333517;333515;333514;333991;333131
2594 Manufacture of fasteners and screw machine products 332722 Bolt, Nut, Screw, Rivet, and Washer Manufacturing 1 332722
2599 Manufacture of other fabricated metal products nec 332999 All Other Miscellaneous Fabricated Metal Product Manufacturing 18 332999;339910;339911;339912;316993;316998;331110;331222;334417;336991;339994;339995;332215;332323;332618;331523;332112;337920
2611 Manufacture of electronic components 334413 Semiconductor and Related Device Manufacturing 1 334413
26112612 NA 334412 Bare Printed Circuit Board Manufacturing 1 334412
2651 Manufacture of instruments and appliances for measuring, testing and navigation 334515 Instrument Manufacturing for Measuring and Testing Electricity and Electrical Signals 8 334515;334516;333314;333316;333997;333999;334513;334519
26512670 NA 334511 Search, Detection, Navigation, Guidance, Aeronautical, and Nautical System and Instrument Manufacturing 4 334511;334513;334514;334515
2660 Manufacture of irradiation, electromedical and electrotherapeutic equipment 334510 Electromedical and Electrotherapeutic Apparatus Manufacturing 1 334510
2670 Manufacture of optical instruments and photographic equipment 333314 Optical Instrument and Lens Manufacturing 6 333314;332212;332216;333316;332510;339115
2711 Manufacture of electric motors, generators and transformers 335312 Motor and Generator Manufacturing 4 335312;333611;334416;335311
27112790 NA 327110 Pottery, Ceramics, and Plumbing Fixture Manufacturing 8 327110;327113;334413;334416;334418;334419;335311;335999
2720 Manufacture of batteries and accumulators 335911 Storage Battery Manufacturing 1 335911
2732 Manufacture of other electronic and electric wires and cables 331420 Copper Rolling, Drawing, Extruding, and Alloying 4 331420;331422;331491;335929
2733 Manufacture of wiring devices 335314 Relay and Industrial Control Manufacturing 3 335314;335931;334417
2790 Manufacture of other electrical equipment 335999 All Other Miscellaneous Electrical Equipment and Component Manufacturing 15 335999;333295;333242;333999;334511;339999;334510;334210;335129;339992;333992;334418;333912;326199;335932
2811 Manufacture of engines and turbines, except aircraft, vehicle and cycle engines 333611 Turbine and Turbine Generator Set Units Manufacturing 1 333611
2812 Manufacture of fluid power equipment 333995 Fluid Power Cylinder and Actuator Manufacturing 7 333995;333618;333911;333996;333111;333611;336415
2813 Manufacture of other pumps and compressors 333415 Air-Conditioning and Warm Air Heating Equipment and Commercial and Industrial Refrigeration Equipment Manufacturing 13 333415;333911;333912;333412;333413;335210;335211;333618;333996;336310;33631X;336390;336391
2815 Manufacture of bearings, gears, gearing and driving elements 333613 Mechanical Power Transmission Equipment Manufacturing 9 333613;332991;336310;332322;332722;333612;333618;335312;33631X
2821 Manufacture of ovens, furnaces and furnace burners 333994 Industrial Process Furnace and Oven Manufacturing 3 333994;333318;333991
2822 Manufacture of lifting and handling equipment 333120 Construction Machinery Manufacturing 6 333120;333131;333132;333994;333999;333924
2825 Manufacture of non-domestic cooling and ventilation equipment 333999 All Other Miscellaneous General Purpose Machinery Manufacturing 19 333999;332410;333220;333243;333249;333291;333318;333319;333991;335228;339113;333415;336390;333411;333413;336399;335312;336391;335222
2829 Manufacture of other general-purpose machinery nec 333999 All Other Miscellaneous General Purpose Machinery Manufacturing 14 333999;333249;333220;333243;333291;333241;333294;333318;333292;333298;333912;333991;333295;333993
28293250 NA 333999 All Other Miscellaneous General Purpose Machinery Manufacturing 1 333999
2841 Manufacture of metal forming machinery 333517 Machine Tool Manufacturing 7 333517;333512;333249;333513;333999;333295;333511
2849 Manufacture of other machine tools 333243 Sawmill, Woodworking, and Paper Machinery Manufacturing 6 333243;333249;333512;333515;333517;333999
2891 Manufacture of machinery for metallurgy 333516 NA 8 333516;333519;333249;333298;333513;333517;331511;331513
2892 Manufacture of machinery for mining, quarrying and construction 333120 Construction Machinery Manufacturing 4 333120;333132;333131;336611
2894 Manufacture of machinery for textile, apparel and leather production 333249 Other Industrial Machinery Manufacturing 2 333249;333292
28942899 NA 333244 Printing Machinery and Equipment Manufacturing 4 333244;333249;333292;333293
2896 Manufacture of plastic and rubber machinery 333220 NA 3 333220;333249;333295
2899 Manufacture of other special-purpose machinery nec 333242 Semiconductor Machinery Manufacturing 19 333242;333295;333999;333120;333132;333249;333318;333319;333415;333995;335210;335228;335999;333220;333241;333294;333513;333517;333519
2932 Manufacture of other parts and accessories for motor vehicles 333111 Farm Machinery and Equipment Manufacturing 8 333111;332111;333120;336330;336350;336390;336399;336340
3212 Manufacture of jewellery and related articles 339910 Jewelry and Silverware Manufacturing 3 339910;339913;327910
3812 Collection of hazardous waste 325180 Other Basic Inorganic Chemical Manufacturing 2 325180;325188

5.2 Key clean-tech mappings

Code
if (file.exists(nace_tech_path)) {
  nt <- read_csv(nace_tech_path, show_col_types = FALSE)
  key_nace <- c("2720","2611","2711","2651","2441","2442","2445","2013","2825","2829")

  nt |>
    mutate(nace_code = as.character(nace_code)) |>
    filter(nace_code %in% key_nace) |>
    distinct(nace_code, nace_desc, tech, naics6_dominant, naics6_desc, naics_n_mapped) |>
    arrange(nace_code, tech) |>
    kable(
      col.names = c("NACE4","NACE Description","Technology",
                    "NAICS6","NAICS Description","# NAICS"),
      caption = "Key clean-tech NACE → NAICS6 mappings"
    ) |>
    kable_styling(bootstrap_options = c("striped","condensed"), full_width = TRUE) |>
    column_spec(4, bold = TRUE, color = "#073309")
}
Key clean-tech NACE → NAICS6 mappings
| NACE4 | NACE Description | Technology | NAICS6 | NAICS Description | # NAICS |
|---|---|---|---|---|---|
| 2013 | Manufacture of other inorganic basic chemicals | Batteries | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Biofuel | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Electrolyzers | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Geothermal | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Magnets | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Nuclear | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Solar | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Transmission | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2013 | Manufacture of other inorganic basic chemicals | Wind | 325180 | Other Basic Inorganic Chemical Manufacturing | 12 |
| 2441 | Precious metals production | Biofuel | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 3 |
| 2441 | Precious metals production | Electrolyzers | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 3 |
| 2441 | Precious metals production | Solar | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 3 |
| 2442 | Aluminium production | Batteries | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Biofuel | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Heat Pumps | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Magnets | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Solar | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Transmission | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Wind | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2445 | Other non-ferrous metal production | Batteries | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 5 |
| 2445 | Other non-ferrous metal production | Biofuel | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 5 |
| 2445 | Other non-ferrous metal production | Electrolyzers | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 5 |
| 2611 | Manufacture of electronic components | Solar | 334413 | Semiconductor and Related Device Manufacturing | 1 |
| 2651 | Manufacture of instruments and appliances for measuring, testing and navigation | Batteries | 334515 | Instrument Manufacturing for Measuring and Testing Electricity and Electrical Signals | 8 |
| 2651 | Manufacture of instruments and appliances for measuring, testing and navigation | Wind | 334515 | Instrument Manufacturing for Measuring and Testing Electricity and Electrical Signals | 8 |
| 2711 | Manufacture of electric motors, generators and transformers | Geothermal | 335312 | Motor and Generator Manufacturing | 4 |
| 2711 | Manufacture of electric motors, generators and transformers | Solar | 335312 | Motor and Generator Manufacturing | 4 |
| 2711 | Manufacture of electric motors, generators and transformers | Transmission | 335312 | Motor and Generator Manufacturing | 4 |
| 2711 | Manufacture of electric motors, generators and transformers | Wind | 335312 | Motor and Generator Manufacturing | 4 |
| 2720 | Manufacture of batteries and accumulators | Batteries | 335911 | Storage Battery Manufacturing | 1 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Biofuel | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Electrolyzers | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Geothermal | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Heat Pumps | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2829 | Manufacture of other general-purpose machinery nec | Batteries | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Biofuel | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Electrolyzers | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Geothermal | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Nuclear | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Solar | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Transmission | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Wind | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |

5.3 Many-to-many ambiguity

Code
if (file.exists(xw_path)) {
  xw <- read_csv(xw_path, show_col_types = FALSE)
  tibble::tibble(
    Metric = c(
      "NACE codes with clean 1-to-1 NAICS mapping",
      "NACE codes with 2–5 NAICS codes (moderate ambiguity)",
      "NACE codes with 6+ NAICS codes (high ambiguity — use naics6_all)",
      "Median NAICS codes per NACE"
    ),
    Value = c(
      sum(xw$naics_n_mapped == 1),
      sum(xw$naics_n_mapped >= 2 & xw$naics_n_mapped <= 5),
      sum(xw$naics_n_mapped >= 6),
      median(xw$naics_n_mapped)
    )
  ) |>
    kable() |>
    kable_styling(bootstrap_options = c("condensed"), full_width = FALSE)
}
Metric Value
NACE codes with clean 1-to-1 NAICS mapping 26
NACE codes with 2–5 NAICS codes (moderate ambiguity) 37
NACE codes with 6+ NAICS codes (high ambiguity — use naics6_all) 21
Median NAICS codes per NACE 3

High-ambiguity codes (for example NACE 2829, 2899, 2013, 2651) are broad machinery, chemicals, and instrument categories that span many NAICS industries. For these, use naics6_all rather than naics6_dominant in analysis, or restrict the comparison to a specific product stage (Final Product NACE codes are all clean 1-to-1 mappings).
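As a minimal sketch of what "use naics6_all" can look like in practice, the block below expands the semicolon-separated naics6_all column into one row per NACE–NAICS6 pair and joins it to a firm table. The two crosswalk rows are copied from the crosswalk listing earlier in this section (NACE 3212 and 3812); `xw_toy`, `firms_toy`, `firm_id`, and `is_dominant` are hypothetical names used only for illustration, not objects from the CVCE pipeline.

```r
library(dplyr)
library(tidyr)

# Toy crosswalk rows copied from the listing above (not read from disk)
xw_toy <- tibble::tribble(
  ~nace_code, ~naics6_dominant, ~naics_n_mapped, ~naics6_all,
  "3212",     "339910",         3L,              "339910;339913;327910",
  "3812",     "325180",         2L,              "325180;325188"
)

# One row per NACE-NAICS6 pair; flag whether the pair is the dominant mapping
xw_long <- xw_toy |>
  separate_rows(naics6_all, sep = ";") |>
  rename(naics6 = naics6_all) |>
  mutate(is_dominant = naics6 == naics6_dominant)

# Hypothetical firm table keyed on NAICS6 (assumption: e.g. a Compustat extract)
firms_toy <- tibble::tibble(
  firm_id = c("F1", "F2", "F3"),
  naics6  = c("339913", "325188", "311111")
)

# Firms whose NAICS6 appears anywhere in naics6_all, not only as the dominant code
matched <- firms_toy |>
  inner_join(xw_long, by = "naics6")

print(matched)
```

Here F1 and F2 match only through non-dominant codes, so a join on naics6_dominant alone would have dropped them. On the full crosswalk a single NAICS6 can sit under several NACE codes (325180 appears under both 2013 and 3812), so the same join becomes many-to-many and firm counts should be de-duplicated on the firm identifier afterwards.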


6. Firm-Level Database Comparison

Note

The database landscape, classification granularity, and decision guide are in §1 (Overview) above, alongside the concordance routes. This section adds JHU-specific subscription cost estimates and a database selection guide organised by research question.

Code
tibble::tribble(
  ~Research_Question,                                    ~Recommended_DB,         ~Est_Cost,        ~Reason,
  "Global manufacturing firm locations (all countries)", "ORBIS",                 "$25–60K/yr",     "Largest private-firm coverage globally; NACE4 throughout",
  "US listed company financials + market data",          "Compustat N. America",  "Bundled in WRDS", "Deepest US coverage since 1950s; NAICS6 + SIC4",
  "Cross-country listed company financials",             "Compustat Global",      "Bundled in WRDS", "Consistent accounting standards across 80 countries",
  "Private equity / M&A / private firm deals",           "S&P Capital IQ Pro",   "$60–150K/yr",    "Detailed deal and transaction data on private companies; GICS8 globally",
  "IRA-linked US manufacturing capacity",                "Compustat + ORBIS",    "Combined",        "Compustat for listed US; ORBIS for private US + global",
  "Supply chain linkages between firms",                 "S&P Capital IQ Pro",   "$60–150K/yr",    "Supply chain relationships module"
) |>
  kable(col.names = c("Research Question","Recommended DB","Est. Annual Cost","Reason"),
        caption = "Database selection guide for clean-tech research") |>
  kable_styling(bootstrap_options = c("striped","hover","condensed"), full_width = TRUE) |>
  column_spec(2, bold = TRUE, color = "#073309")
Database selection guide for clean-tech research

| Research Question | Recommended DB | Est. Annual Cost | Reason |
|---|---|---|---|
| Global manufacturing firm locations (all countries) | ORBIS | $25–60K/yr | Largest private-firm coverage globally; NACE4 throughout |
| US listed company financials + market data | Compustat N. America | Bundled in WRDS | Deepest US coverage since 1950s; NAICS6 + SIC4 |
| Cross-country listed company financials | Compustat Global | Bundled in WRDS | Consistent accounting standards across 80 countries |
| Private equity / M&A / private firm deals | S&P Capital IQ Pro | $60–150K/yr | Detailed deal and transaction data on private companies; GICS8 globally |
| IRA-linked US manufacturing capacity | Compustat + ORBIS | Combined | Compustat for listed US; ORBIS for private US + global |
| Supply chain linkages between firms | S&P Capital IQ Pro | $60–150K/yr | Supply chain relationships module |

Note

JHU Library: access to ORBIS (BvD/Moody’s) and WRDS (Compustat NA + Global) is confirmed. For S&P Capital IQ Pro, verify with Sheridan Libraries, as access may be limited. Contact dataservices@jhu.edu. Cost estimates are indicative academic pricing.


7. Files Referenced

| File | Location | Description |
|---|---|---|
| hs_nace_correspondence_observations.csv | data/ | 320-edge HS→NACE concordance (Eurostat CN–CPA–NACE) |
| orbis_nace_tech.csv | data/orbis/ | 235-row NACE→tech lookup with NAICS6 columns |
| nace_naics_crosswalk.csv | data/concordance/ | 84-row standalone NACE→NAICS lookup table |
| 08_build_orbis.R | scripts/build_data/ | Builds ORBIS extract via NACE filter |
| 08b_build_orbis_avg.R | scripts/build_data/ | Collapses panel to one row per firm |
| 08c_build_naics_crosswalk.R | scripts/build_data/ | Builds NACE→NAICS via concordance pkg (Route B) |
| orbis_hs_to_nace_quarto.qmd | NetZeroValueChainExplorer/qmd/orbis/ | Full HS→NACE methodology |
| orbis_battery2720_world_map.qmd | NetZeroValueChainExplorer/qmd/orbis/ | Batteries NACE 2720 world map |
| nace_naics_crosswalk_annotated.csv | data/concordance/ | 84-row crosswalk with tech labels + ambiguity tier (generated below) |

8. Crosswalk CSV for Distribution

A clean, annotated version of the crosswalk is written to data/concordance/nace_naics_crosswalk_annotated.csv for sharing with collaborators who do not need to run R.

Code
nt_full <- read_csv(nace_tech_path, show_col_types = FALSE) |>
  mutate(nace_code = as.character(nace_code)) |>
  distinct(nace_code, nace_desc, tech)

xw_raw <- read_csv(xw_path, show_col_types = FALSE) |>
  mutate(nace_code = as.character(nace_code))

# Collapse technologies per NACE (multiple techs possible)
nace_techs <- nt_full |>
  group_by(nace_code, nace_desc) |>
  summarise(tech = paste(sort(unique(tech)), collapse = "; "), .groups = "drop")

annotated <- xw_raw |>
  left_join(nace_techs, by = "nace_code") |>
  mutate(
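    # Tier by number of mapped NAICS6 codes; rows with a missing count stay NA
    # instead of falling through to the "High" tier
    ambiguity_tier = case_when(
      is.na(naics_n_mapped)            ~ NA_character_,
      naics_n_mapped == 1              ~ "Clean (1-to-1)",
      naics_n_mapped <= 5              ~ "Moderate (2-5 NAICS)",
      TRUE                             ~ "High (6+ NAICS)"
    )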
  ) |>
  select(
    nace_code, nace_desc, tech,
    naics6_dominant, naics6_desc,
    naics4_dominant, naics_n_mapped, ambiguity_tier,
    naics6_all
  ) |>
  arrange(nace_code)

write_csv(annotated, xw_annotated_path)

Columns in nace_naics_crosswalk_annotated.csv:

| Column | Description |
|---|---|
| nace_code | NACE Rev.2 4-digit code |
| nace_desc | NACE description (from ORBIS extract) |
| tech | Clean technology/technologies linked to this NACE code |
| naics6_dominant | Most frequent NAICS 2017 6-digit code (Route B) |
| naics6_desc | Description of dominant NAICS6 |
| naics4_dominant | Parent NAICS 4-digit |
| naics_n_mapped | Count of distinct NAICS6 codes mapping to this NACE |
| ambiguity_tier | Clean / Moderate / High |
| naics6_all | All NAICS6 codes, semicolon-separated |

The file has 84 rows (one per NACE code) and is suitable for sharing as a standalone reference.
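Before sharing the file, it may be worth confirming that it really has one row per NACE code and that naics_n_mapped agrees with the codes listed in naics6_all. The sketch below runs both checks on a toy stand-in built from three rows shown earlier in this document; `annotated_toy`, `n_listed`, and `count_ok` are hypothetical names, and the check assumes naics6_all contains no repeated codes.

```r
library(dplyr)

# Toy stand-in for the annotated crosswalk (values copied from rows shown earlier)
annotated_toy <- tibble::tribble(
  ~nace_code, ~naics_n_mapped, ~ambiguity_tier,        ~naics6_all,
  "2932",     8L,              "High (6+ NAICS)",      "333111;332111;333120;336330;336350;336390;336399;336340",
  "3212",     3L,              "Moderate (2-5 NAICS)", "339910;339913;327910",
  "3812",     2L,              "Moderate (2-5 NAICS)", "325180;325188"
)

# Count the codes actually listed and compare with the stored count
checks <- annotated_toy |>
  mutate(
    n_listed = lengths(strsplit(naics6_all, ";", fixed = TRUE)),
    count_ok = n_listed == naics_n_mapped
  )

stopifnot(
  anyDuplicated(checks$nace_code) == 0,  # one row per NACE code
  all(checks$count_ok)                   # stored count matches listed codes
)
```

To apply the same checks to the real file, replace the toy tibble with the result of read_csv() on the annotated crosswalk path.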



9. JHU Faculty Using Firm-Level Data

SAIS and Krieger faculty who work with firm-level datasets are candidates for collaboration or shared data access. The concordance routes in §1.3 above connect to the datasets these faculty use: Route A (NACE → ORBIS), Route B (NAICS → Compustat), and Census micro-linkage (LFTTD, described below).

Census micro-linkage: ground truth for US firms

A fourth option beyond Routes A–C exists for US firms: match ORBIS records to US Census Bureau records directly, bypassing classification translation entirely. The key dataset enabling this is the LFTTD:

Note

What is the LFTTD? The Longitudinal Firm Trade Transactions Database (LFTTD) is a restricted U.S. Census Bureau microdata file that links Customs import/export transaction records (bill-of-lading level, product code, value, country) to firm identifiers from the Census Business Register. It contains the universe of US goods trade at the firm–product–partner level. Access requires an approved Census Research Data Center (RDC) project — it is not a commercial subscription. It is not an alternative to ORBIS; it is the micro-level linkage that makes it possible to validate whether a firm classified as NAICS 334413 (Semiconductors) in Census records actually exports HS 854143 (solar modules).

SAIS faculty Mine Senses and Pravin Krishna have used LFTTD-type Census transaction data in published work.

JHU faculty with confirmed firm-level data use

Code
tibble::tribble(
  ~Faculty,          ~Affiliation,         ~Field,                             ~Dataset_Used,              ~Notes,
  "Mine Z. Senses",  "SAIS",               "International trade & labor",       "U.S. Census LFTTD/LBD",    "Firm–transaction matched data; published JIE 2014, 2016",
  "Pravin Krishna",  "SAIS / Krieger",     "International economics & trade",   "Indian customs × firm balance sheets; Census matched data", "RDC-type micro linkage; NBER 2024",
  "Gordon M. Bodnar","SAIS",               "International corporate finance",   "Compustat (N. America)",   "Exchange rate exposure of US multinationals; confirmed via co-authored papers",
) |>
  kable(col.names = c("Faculty","Unit","Field","Dataset(s)","Evidence"),
        caption = "JHU faculty with confirmed firm-level / trade micro-data use") |>
  kable_styling(bootstrap_options = c("striped","condensed"), full_width = TRUE) |>
  column_spec(4, italic = TRUE)
JHU faculty with confirmed firm-level / trade micro-data use

| Faculty | Unit | Field | Dataset(s) | Evidence |
|---|---|---|---|---|
| Mine Z. Senses | SAIS | International trade & labor | U.S. Census LFTTD/LBD | Firm–transaction matched data; published JIE 2014, 2016 |
| Pravin Krishna | SAIS / Krieger | International economics & trade | Indian customs × firm balance sheets; Census matched data | RDC-type micro linkage; NBER 2024 |
| Gordon M. Bodnar | SAIS | International corporate finance | Compustat (N. America) | Exchange rate exposure of US multinationals; confirmed via co-authored papers |

Note

Sebnem Kalemli-Özcan (now at Brown University) is the leading methodologist for ORBIS use in economics. Her NBER working paper “How to Construct Nationally Representative Firm-Level Data from the Orbis Global Database” (NBER w21558, published in AEJ: Macro 2024) is the canonical reference for any serious ORBIS work, covering sample selection, consolidation, coverage gaps, and winsorization. It is useful reading before scaling up the CVCE ORBIS pipeline.