The CVCE project links HS 6-digit trade codes (from BACI and the Green Dictionary) to firm-level industry classifications to identify manufacturers of clean technologies across commercial databases. This document describes the concordance pipeline, validates three independent routes, and documents which routes connect to which databases.
1.1 Firm-level databases
Code
tibble::tribble(~Database, ~Provider, ~Firm_Universe, ~Geography, ~Primary_Class, ~Max_Digits, ~Time_Coverage, ~JHU_Access, ~Effective_Route,"ORBIS", "Bureau van Dijk / Moody's","Public + Private (~450M firms)", "Global, 200+ countries","NACE Rev.2", "4", "1990s–present", "Yes (library licence)", "Route A (HS→NACE4) — direct match","S&P Capital IQ Pro","S&P Global", "Public + Private (~6.5M firms)", "Global", "GICS + SIC + NAICS", "8 / 4 / 6", "2000s–present", "Partial — verify library","Routes B/C (HS→NACE→NAICS6); GICS mapping not yet implemented","Compustat N. America","S&P Global / WRDS", "Listed only (~30K firms)", "US + Canada", "NAICS 2017", "6", "1950s–present", "Yes (via WRDS)", "Routes B/C (HS→NACE→NAICS6)","Compustat Global", "S&P Global / WRDS", "Listed only (~44K firms)", "80+ countries", "SIC / GICS", "4 / 8", "1987–present", "Yes (via WRDS)", "SIC4 via manual mapping; GICS not yet implemented","Refinitiv Eikon", "LSEG", "Listed only (~70K firms)", "Global", "TRBC", "6", "1980s–present", "Check library", "TRBC mapping not yet implemented") |>kable(col.names =c("Database","Provider","Firm Universe","Geography","Primary Classification","Max Digits","Time Coverage","JHU Access","Effective Route (this project)"),caption ="Firm-level databases for clean-tech research — classification systems and concordance routes" ) |>kable_styling(bootstrap_options =c("striped","hover","condensed"),full_width =TRUE, font_size =11) |>column_spec(1, bold =TRUE) |>column_spec(6, bold =TRUE) |>column_spec(9, italic =TRUE) |>row_spec(1, background ="#f0fdf4") |>row_spec(3:4, background ="#eff6ff") |>scroll_box(width ="100%")
Firm-level databases for clean-tech research — classification systems and concordance routes
| Database | Provider | Firm Universe | Geography | Primary Classification | Max Digits | Time Coverage | JHU Access | Effective Route (this project) |
|---|---|---|---|---|---|---|---|---|
| ORBIS | Bureau van Dijk / Moody's | Public + Private (~450M firms) | Global, 200+ countries | NACE Rev.2 | 4 | 1990s–present | Yes (library licence) | Route A (HS→NACE4), direct match |
| S&P Capital IQ Pro | S&P Global | Public + Private (~6.5M firms) | Global | GICS + SIC + NAICS | 8 / 4 / 6 | 2000s–present | Partial, verify library | Routes B/C (HS→NACE→NAICS6); GICS mapping not yet implemented |
| Compustat N. America | S&P Global / WRDS | Listed only (~30K firms) | US + Canada | NAICS 2017 | 6 | 1950s–present | Yes (via WRDS) | Routes B/C (HS→NACE→NAICS6) |
| Compustat Global | S&P Global / WRDS | Listed only (~44K firms) | 80+ countries | SIC / GICS | 4 / 8 | 1987–present | Yes (via WRDS) | SIC4 via manual mapping; GICS not yet implemented |
| Refinitiv Eikon | LSEG | Listed only (~70K firms) | Global | TRBC | 6 | 1980s–present | Check library | TRBC mapping not yet implemented |
Note
Classification systems: ORBIS uses NACE Rev.2 (4-digit) for all firms globally (naceccod2 field). S&P Capital IQ Pro uses GICS (8-digit, global primary), SIC (4-digit, US historical), and NAICS (6-digit, North American firms) — it does not use NACE as a classification field. This means Route A (HS→NACE) provides a direct path to ORBIS only; reaching S&P and Compustat requires Routes B/C (NACE→NAICS). NACE itself is hard-capped at 4 digits — no standard NACE5/6 exists. The intermediate CPA 2.1 (6–8 digits, e.g., CPA 27.20.23 for lithium-ion batteries) is in our chain but not used as a primary field by any firm database.
1.2 Classification granularity
Code
tibble::tribble(~Database, ~Primary_Class, ~Max_Digits, ~Batteries_Code, ~Solar_Code, ~Source_Field, ~Notes,"ORBIS", "NACE Rev.2", "4", "2720", "2611", "naceccod2", "Route A target — NACE used globally for all firms","S&P Capital IQ", "GICS", "8", "GICS 20104010", "GICS 20106010", "primaryGICS", "Route B/C target — GICS primary; also SIC4 + NAICS6 for NA firms","S&P Capital IQ", "NAICS 2017", "6", "335911", "334413", "naics", "Route B/C target — stored for North American firms only","Compustat N. America", "NAICS 2017", "6", "335911", "334413", "naics / sich", "Route B/C target — SIC4 also available","Compustat Global", "SIC / GICS", "4 / 8", "SIC 3691", "SIC 3674", "sich / gsubind", "No NAICS; no NACE — SIC4 or GICS8 only","Trade data (BACI)", "HS", "6", "850760/850780", "854143/854142", "hs (6-digit)", "Starting point for all three routes") |>kable(col.names =c("Database","Primary Classification","Max Digits","Batteries Code","Solar Code","Field Name","Notes"),caption ="Classification system granularity by database — with Batteries and Solar reference codes" ) |>kable_styling(bootstrap_options =c("striped","hover","condensed"),full_width =TRUE, font_size =11) |>column_spec(3, bold =TRUE) |>column_spec(4:5, color ="#073309", bold =TRUE) |>scroll_box(width ="100%")
Classification system granularity by database — with Batteries and Solar reference codes
| Database | Primary Classification | Max Digits | Batteries Code | Solar Code | Field Name | Notes |
|---|---|---|---|---|---|---|
| ORBIS | NACE Rev.2 | 4 | 2720 | 2611 | naceccod2 | Route A target: NACE used globally for all firms |
| S&P Capital IQ | GICS | 8 | GICS 20104010 | GICS 20106010 | primaryGICS | Route B/C target: GICS primary; also SIC4 + NAICS6 for NA firms |
| S&P Capital IQ | NAICS 2017 | 6 | 335911 | 334413 | naics | Route B/C target: stored for North American firms only |
| Compustat N. America | NAICS 2017 | 6 | 335911 | 334413 | naics / sich | Route B/C target: SIC4 also available |
| Compustat Global | SIC / GICS | 4 / 8 | SIC 3691 | SIC 3674 | sich / gsubind | No NAICS; no NACE; SIC4 or GICS8 only |
| Trade data (BACI) | HS | 6 | 850760/850780 | 854143/854142 | hs (6-digit) | Starting point for all three routes |
Code
tibble::tribble(~db, ~classification, ~max_digits, ~type,"BACI / Green Dict", "HS", 6L, "Trade","ORBIS", "NACE Rev.2", 4L, "Firm (global)","Compustat NA", "NAICS 2017", 6L, "Firm (US listed)","Compustat Global", "SIC", 4L, "Firm (global listed)","S&P Capital IQ", "GICS", 8L, "Firm (global)") |>mutate(db =factor(db, levels =rev(c("BACI / Green Dict","ORBIS","Compustat NA","Compustat Global","S&P Capital IQ" )))) |>ggplot(aes(y = db, x = max_digits, fill = type)) +geom_col(width =0.6) +geom_vline(xintercept =6, linetype ="dashed", colour ="#b45309", linewidth =0.8) +annotate("text", x =6.1, y =0.4, hjust =0, vjust =0,label ="HS 6-digit\n(trade reference)",family ="Archivo", size =3, colour ="#b45309") +geom_text(aes(label =paste0(classification, "\n(", max_digits, "d)")),hjust =-0.05, family ="Archivo", size =3.2) +scale_x_continuous(limits =c(0, 10), breaks =2:8) +scale_fill_manual(values =c("Trade"="#d1fae5","Firm (global)"="#073309","Firm (US listed)"="#0369a1","Firm (global listed)"="#7c3aed" ), name =NULL) +labs(title ="Max Classification Depth by Database",x ="Number of digits", y =NULL) +theme_minimal(base_family ="Archivo", base_size =12) +theme(plot.title =element_text(face ="bold", colour ="#073309"),legend.position ="bottom",panel.grid.major.y =element_blank())
Classification depth by database. Each level = one digit of granularity. Dashed line = HS6 reference (trade data). GICS8 is the most granular firm classification available.
§4.2; 14 HS2022 codes unmatched, use HS2017 anchor
C — NACE → NAICS
NACE 4-digit → NAICS 2017 6-digit (bridge from ORBIS to Compustat)
Same as B — ORBIS-to-Compustat cross-DB link
(i) Eurostat RAMON: NACE↔︎ISIC Rev.4; (ii) UN Statistics: ISIC Rev.4↔︎NAICS 2017
§4.3; cross-check for B; manufacturing NACE only
Note
Route B does not need NACE. It maps HS codes directly to NAICS using the concordance package’s internal lookup tables — no Eurostat intermediate. Route C is only needed when you start from NACE codes (i.e., when bridging ORBIS firm records to Compustat).
1.4 HS → NACE mapping chain
The full path from trade product to firm classification is: HS 6-digit → CN 8-digit → CPA 2.1 → NACE Rev.2 (4-digit).
The chain is built from Eurostat’s official CN–CPA–NACE correspondence tables. ORBIS stores firm activity codes in the field naceccod2 (NACE Rev.2). S&P Capital IQ has no NACE field (see the note in §1.1), so it is reached through Routes B/C rather than Route A. NACE Rev.2 has been stable since 2008 (Regulation EC 1893/2006), so the crosswalk does not need to be updated for firm-side changes.
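The chain can be illustrated with a minimal base-R sketch. The two rows below are illustrative stand-ins rather than rows copied from the Eurostat tables, and the CPA code paired with the solar module is an assumption. The sketch relies on two structural facts only: the first six digits of a CN code are its HS code, and the first four digits of a CPA code are its NACE class.

```r
# Toy CN -> CPA correspondence (hypothetical rows, not the Eurostat file)
cn_cpa <- data.frame(
  cn8 = c("85076000", "85414300"),   # lithium-ion accumulators, PV modules
  cpa = c("27.20.23", "26.11.22"),   # second CPA code is an assumption
  stringsAsFactors = FALSE
)

# HS6 is the 6-digit prefix of the 8-digit CN code
cn_cpa$hs6 <- substr(cn_cpa$cn8, 1, 6)

# NACE class = first four digits of the CPA code ("27.20" -> "2720")
cn_cpa$nace4 <- gsub(".", "", substr(cn_cpa$cpa, 1, 5), fixed = TRUE)

# Collapse to distinct HS6 -> NACE4 edges
hs_nace <- unique(cn_cpa[, c("hs6", "nace4")])
print(hs_nace)
```

Under these assumptions the two edges land on the Batteries and Solar anchors used elsewhere in this document (2720 and 2611).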
1.5 Coverage after mining patch
The original Eurostat-derived table focused on manufacturing (NACE divisions 10–33) and omitted mining/extraction (NACE 05–09) and agriculture (01–03). A patch adds 33 raw material HS codes mapped to their NACE mining equivalents, raising coverage from 86% to 95% of Green Dictionary codes.
The remaining gap is mostly process equipment (HS chapters 84–85: laser cutters, sintering furnaces, UV curing equipment, wind turbine generators) and a handful of specialised chemicals. These span multiple NACE divisions — no single clean NACE anchor exists without further context.
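A coverage figure of this kind is simply the share of Green Dictionary codes that have at least one NACE match. The sketch below shows the calculation on a toy list of five HS codes; the vectors `green_dict`, `base_map` and `patch_map` are hypothetical and do not reproduce the project's 86% and 95% figures.

```r
# Share of dictionary codes that appear in a mapping table
coverage <- function(codes, mapped) mean(codes %in% mapped)

# Hypothetical inputs: five dictionary codes, two mapped by the base table,
# two more added by the mining patch, one (process equipment) left unmatched
green_dict <- c("850760", "854143", "260500", "283691", "845611")
base_map   <- c("850760", "854143")
patch_map  <- c("260500", "283691")

before    <- coverage(green_dict, base_map)
after     <- coverage(green_dict, union(base_map, patch_map))
unmatched <- setdiff(green_dict, union(base_map, patch_map))

cat(sprintf("before: %.0f%%, after: %.0f%%\n", 100 * before, 100 * after))
print(unmatched)
```

Listing `unmatched` alongside the percentage makes it easy to see whether the residual gap is concentrated in HS chapters 84–85, as described above.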
2. Hierarchy for Selected Products
The HS → NACE → NAICS chain for key clean-tech products, with notes on where the 4-digit NACE ceiling causes information loss. Tabs show one technology each; within each tab the supply chain runs from raw material (upstream) to final product.
Note
ORBIS caveat: ORBIS stores firm activity codes (what firms do), not HS product output (what they make). NACE codes are a screening filter for likely producers — not a definitive producer list. Final Product NACE codes (e.g., 2720 for Batteries, 2611 for Solar) are the cleanest anchors.
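As a sketch of what "screening filter" means in practice, the snippet below keeps only firms whose NACE code is one of the Final Product anchors. The `firms` table is invented for illustration; only the field name naceccod2 and the two anchor codes come from this document.

```r
# Hypothetical ORBIS-style extract (firm IDs and codes are made up)
firms <- data.frame(
  firm_id   = c("F1", "F2", "F3", "F4"),
  naceccod2 = c("2720", "2611", "2829", "4711"),
  stringsAsFactors = FALSE
)

# Final Product anchors named in the caveat above
final_product_nace <- c(Batteries = "2720", Solar = "2611")

# Screen: keep firms whose primary NACE code is one of the anchors
candidates <- firms[firms$naceccod2 %in% final_product_nace, ]
candidates$tech <- names(final_product_nace)[
  match(candidates$naceccod2, final_product_nace)
]
print(candidates)
```

The result is a list of likely producers to be checked further, not a confirmed producer list.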
Top: NACE 4-digit codes per clean technology, filled by product type. Bottom: NACE→NAICS6 mapping ambiguity by technology (Clean = 1-to-1 mapping; High = 6+ NAICS codes sharing one NACE). Both charts share the same x-axis ordering.
Show summary table code
nt |>group_by(tech) |>summarise(n_nace =n_distinct(nace_code),nace_fp =paste(sort(unique(nace_code[type =="Final Product"])), collapse =", "),naics_fp =paste(sort(unique(naics6_dominant[type =="Final Product"])), collapse =", "),.groups ="drop" ) |>kable(col.names =c("Technology", "NACE codes total", "Final Product NACE4(s)", "Final Product NAICS6"),caption ="Summary: NACE and NAICS coverage by technology") |>kable_styling(bootstrap_options =c("striped","condensed"), full_width =FALSE) |>column_spec(3, bold =TRUE, color ="#073309") |>column_spec(4, color ="#0369a1")
Summary: NACE and NAICS coverage by technology
| Technology | NACE codes total | Final Product NACE4(s) | Final Product NAICS6 |
|---|---|---|---|
| Batteries | 20 | 2720 | 335911 |
| Biofuel | 18 | 2014, 2059 | 325180, 325199 |
| Electrolyzers | 16 | 2849 | 333243 |
| Geothermal | 28 | 2420, 2711, 2811, 2825, 2892 | 331110, 333120, 333611, 333999, 335312 |
| Heat Pumps | 8 | 2825 | 333999 |
| Magnets | 12 | 2599 | 332999 |
| Nuclear | 13 | 2530 | 332410 |
| Solar | 21 | 2611 | 334413 |
| Transmission | 12 | 2711 | 335312 |
| Wind | 32 | 2811 | 333611 |
4. Concordance Routes: Implementation
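Route A is covered in full in §1.3 and §1.4 (HS→NACE chain) and is only summarised in §4.1. This section documents Routes B and C. NAICS 2017 goes to 6 digits, against NACE’s 4. Reference hierarchy for Batteries: NAICS 335 (Electrical Equipment, Appliance, and Component Manufacturing) → 3359 (Other Electrical Equipment and Component Manufacturing) → 33591 (Battery Manufacturing) → 335911 (Storage Battery Manufacturing).
NACE, by contrast, stops at the 4-digit class 2720 (Manufacture of batteries and accumulators).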
4.1 Route A — HS → NACE (Eurostat)
Covered in full in §1 (HS→NACE chain). Source: Eurostat RAMON CN→CPA→NACE tables. Output file: data/hs_nace_correspondence_observations.csv (353 edges, 84 NACE codes). Target database: ORBIS (naceccod2 field).
4.2 Route B — HS → NAICS directly (concordance package)
Chain: HS 6-digit → NAICS 2017 6-digit (no NACE intermediary)
The concordance package (Liao & Russ 2021, CRAN v2.0.0) ships HS↔︎NAICS lookup tables internally. hs5_naics covers HS2017; hs4_naics covers HS2012 as fallback. Target databases: S&P Capital IQ and Compustat NA (naics field), US Census LFTTD.
# Install once (not in renv; build-step only)
# install.packages("concordance")
library(concordance)
library(dplyr)
library(readr)

# The package exposes HS-NAICS tables directly as data frames:
#   hs5_naics: HS2017 (6-digit) → NAICS 2017 (6-digit)
#   hs4_naics: HS2012 (6-digit) → NAICS 2017 (6-digit)
# head(hs5_naics)
#   HS5_6d HS5_4d HS5_2d NAICS_6d NAICS_4d NAICS_2d

# Load our existing HS→NACE file and normalise both code columns
hs_nace <- read_csv("data/hs_nace_correspondence_observations.csv") |>
  transmute(
    hs_code   = formatC(as.integer(code), width = 6, flag = "0"),
    nace_code = as.character(nace)
  ) |>
  distinct()

# Join: HS → NAICS (hs5 primary, hs4 fallback for codes unmatched in hs5)
hs_mapped <- hs_nace |>
  left_join(
    hs5_naics |> transmute(hs_code = HS5_6d, naics6 = NAICS_6d, rev = "HS5"),
    by = "hs_code"
  ) |>
  left_join(
    hs4_naics |> transmute(hs_code = HS4_6d, naics6_fb = NAICS_6d, rev_fb = "HS4"),
    by = "hs_code"
  ) |>
  mutate(naics6 = coalesce(naics6, naics6_fb))

# Aggregate to NACE: dominant NAICS = mode by HS-code count
nace_naics <- hs_mapped |>
  filter(!is.na(naics6)) |>
  group_by(nace_code, naics6) |>
  summarise(n = n(), .groups = "drop") |>
  group_by(nace_code) |>
  summarise(
    naics6_dominant = first(naics6[order(-n)]),
    naics6_all      = paste(unique(naics6), collapse = ";"),
    naics_n_mapped  = n_distinct(naics6),
    .groups = "drop"
  )
Coverage: 309 of 353 HS codes matched in hs5_naics (HS2017). Some HS2022 codes are absent (introduced after HS2017 — e.g. 854142 Solar Cells, 854143 Solar Modules). All 84 NACE codes received a valid NAICS mapping via their other HS anchors.
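The aggregation rule used by Route B (dominant NAICS = mode by HS-code count) can be checked on a toy table without the concordance package. This is a base-R sketch with invented rows; `dominant_naics` is a hypothetical helper name, and its tie-breaking (the first code in sorted order wins) is an assumption that may differ from the dplyr pipeline.

```r
# Toy HS-level mapping: one row per HS code, already joined to NAICS
hs_mapped <- data.frame(
  nace_code = c("2720", "2720", "2825", "2825", "2825"),
  naics6    = c("335911", "335911", "333999", "333999", "333415"),
  stringsAsFactors = FALSE
)

# Modal NAICS code; which.max() returns the first maximum, so ties go to
# the code that sorts first
dominant_naics <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

by_nace <- split(hs_mapped$naics6, hs_mapped$nace_code)
summary_tbl <- data.frame(
  nace_code       = names(by_nace),
  naics6_dominant = vapply(by_nace, dominant_naics, character(1)),
  naics_n_mapped  = vapply(by_nace, function(x) length(unique(x)), integer(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(summary_tbl)
```

A NACE code with naics_n_mapped greater than 1 is exactly the case where the dominant code hides alternatives, which is why the full list is kept in naics6_all.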
4.3 Route C — NACE → NAICS via ISIC (ORBIS-to-Compustat bridge)
Step 1: NACE Rev.2 → ISIC Rev.4 (Eurostat RAMON)
Eurostat publishes the official NACE Rev.2 ↔︎ ISIC Rev.4 correspondence via the RAMON classification server.
library(dplyr)
library(readr)

# Option 1: download from Eurostat RAMON (reproducible, requires internet)
nace_isic_url <- paste0(
  "https://ec.europa.eu/eurostat/ramon/nomenclatures/",
  "index.cfm?TargetUrl=LST_NOM_DTL_GLOSSARY&StrNom=NACE_REV2",
  "&StrLanguageCode=EN&IntPcKey=&IntKey=&bExport=true"
)
# In practice: download the Excel from RAMON manually, or use the
# correspondence embedded below (Section C manufacturing, our 67 codes).

# Option 2: Direct equivalence for Section C (Manufacturing)
# NACE Rev.2 was built from ISIC Rev.4. For Section C (Manufacturing), this
# project treats 4-digit NACE codes as identical to ISIC Rev.4 4-digit codes
# (a simplification), with the following EU-specific exceptions:
#   NACE 2441–2446 → all map to ISIC 2420 (basic precious and other non-ferrous metals)
#   NACE 2611–2612 → map to ISIC 2610 (electronic components and boards)
# NACE 2651 and 2652 equal ISIC 2651 and 2652, so they need no exception row.
# All other Section C 4-digit NACE codes: isic4 = nace4

# Build NACE → ISIC for our 67 clean-tech codes:
nace_isic_exceptions <- tibble::tribble(
  ~nace_code, ~isic4,
  "2441", "2420",  # Precious metals production → ISIC basic precious metals
  "2442", "2420",
  "2443", "2420",
  "2444", "2420",
  "2445", "2420",
  "2446", "2420",
  "2611", "2610",  # Electronic components (NACE splits ISIC 2610)
  "2612", "2610"
)

# For all Section C codes not in exceptions: NACE = ISIC
our_nace <- unique(read_csv("data/orbis/orbis_nace_tech.csv")$nace_code)
nace_to_isic <- tibble::tibble(nace_code = as.character(our_nace)) |>
  left_join(nace_isic_exceptions, by = "nace_code") |>
  mutate(isic4 = coalesce(isic4, nace_code))  # default: nace = isic
Step 2: ISIC Rev.4 → NAICS 2017 (UN Statistics Division)
The UN Statistics Division publishes the official ISIC Rev.4 ↔︎ NAICS 2017 correspondence as a downloadable Excel (UN Correspondence Tables).
# Option 1: Download UN correspondence table (reproducible)
# File: "ISIC_Rev_4_correspondence_with_NAICS_2017.xlsx"
# Source: UN Statistics Division → Classifications → Correspondences

# Option 2: Use the concordance package's ISIC descriptions to validate,
# combined with the embedded key mappings below.

# Key ISIC Rev.4 → NAICS 2017 mappings for our clean-tech codes
# (from UN Statistics Division correspondence table):
isic_naics_key <- tibble::tribble(
  ~isic4, ~naics6, ~naics6_desc,
  "1041", "311225", "Fats and Oils Refining and Blending",
  "1062", "311221", "Wet Corn Milling",
  "1610", "321113", "Sawmills",
  "1920", "324110", "Petroleum Refineries",
  "2011", "325120", "Industrial Gas Manufacturing",
  "2012", "325130", "Synthetic Dye and Pigment Manufacturing",
  "2013", "325180", "Other Basic Inorganic Chemical Manufacturing",
  "2014", "325190", "Other Basic Organic Chemical Manufacturing",
  "2015", "325311", "Nitrogenous Fertilizer Manufacturing",
  "2016", "325211", "Plastics Material and Resin Manufacturing",
  "2017", "325212", "Synthetic Rubber Manufacturing",
  "2030", "325510", "Paint and Coating Manufacturing",
  "2052", "325520", "Adhesive Manufacturing",
  "2059", "325998", "All Other Miscellaneous Chemical Product Manufacturing",
  "2219", "326291", "Rubber Product Manufacturing, NEC",
  "2221", "326113", "Unlaminated Plastics Film and Sheet Manufacturing",
  "2229", "326199", "All Other Plastics Product Manufacturing",
  "2311", "327211", "Flat Glass Manufacturing",
  "2312", "327212", "Other Pressed and Blown Glass Manufacturing",
  "2320", "327110", "Pottery, Ceramics, and Plumbing Fixture Manufacturing",
  "2344", "327999", "All Other Miscellaneous Nonmetallic Mineral Product Mfg",
  "2351", "327310", "Cement Manufacturing",
  "2352", "327390", "Other Structural Clay Product Manufacturing",
  "2369", "327990", "All Other Nonmetallic Mineral Product Manufacturing",
  "2391", "327910", "Abrasive Product Manufacturing",
  "2399", "327999", "All Other Miscellaneous Nonmetallic Mineral Product Mfg",
  "2410", "331110", "Iron and Steel Mills and Ferroalloy Manufacturing",
  "2420", "331410", "Nonferrous Metal (except Copper and Aluminum) Smelting",
  "2431", "331420", "Copper Rolling, Drawing, Extruding, and Alloying",
  "2432", "331491", "Nonferrous Metal (ex Copper/Aluminum) Rolling, Drawing",
  "2433", "331313", "Aluminum Foundries (except Die-Casting)",
  "2441", "331410", "Nonferrous Metal Smelting and Refining",
  "2442", "331313", "Aluminum Production and Processing",
  "2443", "331410", "Lead, Zinc, Tin Smelting and Refining",
  "2444", "331410", "Copper Smelting and Refining",
  "2445", "331410", "Other Nonferrous Metal Smelting",
  "2446", "331410", "Precious Metals Smelting and Refining",
  "2511", "332311", "Prefabricated Metal Building and Component Manufacturing",
  "2530", "332410", "Power Boiler and Heat Exchanger Manufacturing",
  "2572", "332510", "Hardware Manufacturing",
  "2573", "332710", "Machine Shops",
  "2594", "332999", "All Other Miscellaneous Fabricated Metal Product Mfg",
  "2599", "332999", "All Other Miscellaneous Fabricated Metal Product Mfg",
  "2610", "334413", "Semiconductor and Related Device Manufacturing",
  "2611", "334413", "Semiconductor and Related Device Manufacturing",
  "2651", "334515", "Instrument Manufacturing for Measuring and Testing",
  "2660", "334510", "Electromedical and Electrotherapeutic Apparatus Mfg",
  "2670", "333314", "Optical Instrument and Lens Manufacturing",
  "2711", "335312", "Motor and Generator Manufacturing",
  "2720", "335911", "Storage Battery Manufacturing",
  "2732", "335929", "Other Communication and Energy Wire Manufacturing",
  "2733", "335931", "Current-Carrying Wiring Device Manufacturing",
  "2790", "335999", "All Other Miscellaneous Electrical Equipment Mfg",
  "2811", "333612", "Speed Changer, Industrial High-Speed Drive, and Gear Mfg",
  "2812", "333613", "Mechanical Power Transmission Equipment Manufacturing",
  "2813", "333911", "Pump and Pumping Equipment Manufacturing",
  "2815", "333923", "Overhead Traveling Crane, Hoist, and Monorail System Mfg",
  "2821", "333111", "Farm Machinery and Equipment Manufacturing",
  "2822", "333120", "Construction Machinery Manufacturing",
  "2825", "333415", "Air-Conditioning and Warm Air Heating Equipment Mfg",
  "2829", "333999", "All Other General Purpose Machinery Manufacturing",
  "2841", "333514", "Special Industry Machinery Manufacturing",
  "2849", "333249", "Other Industrial and Commercial Machinery Mfg",
  "2891", "333514", "Metalworking Machinery Manufacturing",
  "2892", "333131", "Mining Machinery and Equipment Manufacturing",
  "2894", "333318", "Other Commercial and Service Industry Machinery Mfg",
  "2896", "333220", "Plastics and Rubber Industry Machinery Manufacturing",
  "2899", "333249", "Other Special Industry Machinery Manufacturing",
  "2932", "336390", "Other Motor Vehicle Parts Manufacturing",
  "3212", "339910", "Jewelry and Silverware Manufacturing",
  "3812", "562119", "Other Waste Collection"
)

# Chain: NACE → ISIC → NAICS (this is the Route C result)
nace_naics_routeC <- nace_to_isic |>
  left_join(isic_naics_key, by = "isic4") |>
  select(nace_code, isic4, naics6, naics6_desc)
Step 3: Compare Route B vs Route C
# Join both crosswalks and compare
comparison <- nace_naics_routeB |>  # from 08c_build_naics_crosswalk.R
  rename(naics6_routeB = naics6_dominant) |>
  left_join(
    nace_naics_routeC |> rename(naics6_routeC = naics6),
    by = "nace_code"
  ) |>
  mutate(agreement = naics6_routeB == naics6_routeC)

# Diagnostic: where do they disagree?
comparison |>
  filter(!agreement) |>
  select(nace_code, isic4, naics6_routeB, naics6_routeC)
Show Route C execution code
library(dplyr)
library(readr)

# Execute Route C for our NACE codes and compare with Route B results
nt <- read_csv(nace_tech_path, show_col_types = FALSE)
xw <- read_csv(xw_path, show_col_types = FALSE) |>
  mutate(nace_code = as.character(nace_code))

# NACE → ISIC: for Section C manufacturing, NACE = ISIC at 4-digit
# with key exceptions:
nace_isic_exc <- tibble::tribble(
  ~nace_code, ~isic4,
  "2441", "2420", "2442", "2420", "2443", "2420",
  "2444", "2420", "2445", "2420", "2446", "2420",
  "2611", "2610", "2612", "2610"
)

# ISIC → NAICS key (Route C side of the comparison)
isic_naics_key <- tibble::tribble(
  ~isic4, ~naics6_routeC, ~naics6_desc_C,
  "1041", "311225", "Fats and Oils Refining",
  "1062", "311221", "Wet Corn Milling",
  "1610", "321113", "Sawmills",
  "1920", "324110", "Petroleum Refineries",
  "2011", "325120", "Industrial Gas Mfg",
  "2012", "325130", "Synthetic Dye Mfg",
  "2013", "325180", "Basic Inorganic Chemical Mfg",
  "2014", "325190", "Basic Organic Chemical Mfg",
  "2016", "325211", "Plastics Material and Resin Mfg",
  "2017", "325212", "Synthetic Rubber Mfg",
  "2059", "325998", "Miscellaneous Chemical Product Mfg",
  "2219", "326291", "Rubber Product Mfg NEC",
  "2221", "326113", "Plastics Film/Sheet Mfg",
  "2229", "326199", "Other Plastics Product Mfg",
  "2311", "327211", "Flat Glass Mfg",
  "2312", "327212", "Other Glass Mfg",
  "2320", "327110", "Pottery and Ceramics Mfg",
  "2344", "327999", "Nonmetallic Mineral NEC",
  "2351", "327310", "Cement Mfg",
  "2352", "327390", "Structural Clay Mfg",
  "2369", "327990", "Nonmetallic Mineral Product Mfg",
  "2391", "327910", "Abrasive Product Mfg",
  "2399", "327999", "Nonmetallic Mineral Product NEC",
  "2410", "331110", "Iron and Steel Mills",
  "2420", "331410", "Nonferrous Metal Smelting",
  "2441", "331410", "Nonferrous Metal Refining",
  "2442", "331313", "Aluminum Production",
  "2443", "331410", "Lead/Zinc/Tin Refining",
  "2444", "331410", "Copper Refining",
  "2445", "331410", "Other Nonferrous Metal",
  "2446", "331410", "Precious Metals Refining",
  "2511", "332311", "Prefab Metal Building Mfg",
  "2530", "332410", "Heat Exchanger Mfg",
  "2572", "332510", "Hardware Mfg",
  "2573", "332710", "Machine Shops",
  "2594", "332999", "Misc Fabricated Metal Mfg",
  "2599", "332999", "Misc Fabricated Metal Mfg",
  "2610", "334413", "Semiconductor Device Mfg",
  "2611", "334413", "Semiconductor Device Mfg",
  "2651", "334515", "Instruments for Measuring/Testing",
  "2660", "334510", "Electromedical Apparatus Mfg",
  "2670", "333314", "Optical Instrument Mfg",
  "2711", "335312", "Motor and Generator Mfg",
  "2720", "335911", "Storage Battery Mfg",
  "2732", "335929", "Other Energy Wire Mfg",
  "2733", "335931", "Wiring Device Mfg",
  "2790", "335999", "Misc Electrical Equipment Mfg",
  "2811", "333612", "Speed Changer/Gear Mfg",
  "2812", "333613", "Power Transmission Equip",
  "2813", "333911", "Pump and Pumping Equipment Mfg",
  "2815", "333923", "Crane and Hoist Mfg",
  "2821", "333111", "Farm Machinery Mfg",
  "2822", "333120", "Construction Machinery Mfg",
  "2825", "333415", "Air-Conditioning and Heating Equip Mfg",
  "2829", "333999", "General Purpose Machinery NEC",
  "2841", "333514", "Metalworking Machinery Mfg",
  "2849", "333249", "Industrial Machinery NEC",
  "2891", "333514", "Metalworking Machinery",
  "2892", "333131", "Mining Machinery Mfg",
  "2894", "333318", "Commercial Machinery Mfg",
  "2896", "333220", "Plastics/Rubber Machinery Mfg",
  "2899", "333249", "Special Industry Machinery NEC",
  "2932", "336390", "Motor Vehicle Parts Mfg",
  "3212", "339910", "Jewelry and Silverware Mfg",
  "3812", "562119", "Other Waste Collection"
)

our_nace <- xw |> distinct(nace_code)
nace_to_isic_tbl <- our_nace |>
  left_join(nace_isic_exc, by = "nace_code") |>
  mutate(isic4 = coalesce(isic4, nace_code))
routeC <- nace_to_isic_tbl |>
  left_join(isic_naics_key, by = "isic4")

comparison <- xw |>
  select(nace_code, naics6_routeB = naics6_dominant, naics_n_mapped) |>
  left_join(
    routeC |> select(nace_code, isic4, naics6_routeC, naics6_desc_C),
    by = "nace_code"
  ) |>
  mutate(
    agreement     = naics6_routeB == naics6_routeC,
    agreement_lbl = ifelse(is.na(agreement), "C missing",
                           ifelse(agreement, "Agree", "Differ"))
  )

n_agree  <- sum(comparison$agreement == TRUE, na.rm = TRUE)
n_differ <- sum(comparison$agreement == FALSE, na.rm = TRUE)
n_miss   <- sum(is.na(comparison$agreement))
Route C coverage: 30 NACE codes agree with Route B, 34 differ, 20 missing from Route C embedded table.
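The three counts can be turned into shares with a few lines of R. The sketch below uses only the numbers reported in the sentence above; the variable names are illustrative.

```r
# Counts reported above for the 84 NACE codes
counts <- c("Agree" = 30, "Differ" = 34, "C missing" = 20)
total  <- sum(counts)

# Share of all NACE codes in each status, in percent
shares <- round(100 * counts / total, 1)

# Agreement rate among codes that Route C actually covers
agree_rate <- counts[["Agree"]] / (counts[["Agree"]] + counts[["Differ"]])

print(total)
print(shares)
print(round(agree_rate, 3))
```

Reading the agreement rate over covered codes (30 of 64) rather than over all 84 separates genuine disagreement from gaps in the embedded Route C table.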
Code
comparison |>select(nace_code, isic4, naics6_routeB, naics6_routeC, naics_n_mapped, agreement_lbl) |>arrange(agreement_lbl, nace_code) |>kable(col.names =c("NACE4","ISIC4","Route B NAICS6","Route C NAICS6","B: # NAICS","Status"),caption ="Route B (HS-based concordance pkg) vs Route C (Eurostat NACE→ISIC→NAICS)" ) |>kable_styling(bootstrap_options =c("striped","hover","condensed"),full_width =TRUE, font_size =11) |>column_spec(3, bold =TRUE, color ="#073309") |>column_spec(4, bold =TRUE, color ="#0369a1") |>column_spec(6, color =ifelse( comparison |>arrange(agreement_lbl, nace_code) |>pull(agreement_lbl) =="Agree","#15803d", "#b45309" )) |>scroll_box(width ="100%", height ="380px")
Route B (HS-based concordance pkg) vs Route C (Eurostat NACE→ISIC→NAICS)
| NACE4 | ISIC4 | Route B NAICS6 | Route C NAICS6 | B: # NAICS | Status |
|---|---|---|---|---|---|
| 1062 | 1062 | 311221 | 311221 | 1 | Agree |
| 1610 | 1610 | 321113 | 321113 | 1 | Agree |
| 2011 | 2011 | 325120 | 325120 | 1 | Agree |
| 2013 | 2013 | 325180 | 325180 | 12 | Agree |
| 2016 | 2016 | 325211 | 325211 | 2 | Agree |
| 2017 | 2017 | 325212 | 325212 | 1 | Agree |
| 2219 | 2219 | 326291 | 326291 | 3 | Agree |
| 2221 | 2221 | 326113 | 326113 | 3 | Agree |
| 2311 | 2311 | 327211 | 327211 | 1 | Agree |
| 2351 | 2351 | 327310 | 327310 | 1 | Agree |
| 2391 | 2391 | 327910 | 327910 | 1 | Agree |
| 2399 | 2399 | 327999 | 327999 | 2 | Agree |
| 2410 | 2410 | 331110 | 331110 | 3 | Agree |
| 2441 | 2420 | 331410 | 331410 | 3 | Agree |
| 2443 | 2420 | 331410 | 331410 | 2 | Agree |
| 2445 | 2420 | 331410 | 331410 | 5 | Agree |
| 2530 | 2530 | 332410 | 332410 | 1 | Agree |
| 2572 | 2572 | 332510 | 332510 | 2 | Agree |
| 2599 | 2599 | 332999 | 332999 | 18 | Agree |
| 2611 | 2610 | 334413 | 334413 | 1 | Agree |
| 2651 | 2651 | 334515 | 334515 | 8 | Agree |
| 2660 | 2660 | 334510 | 334510 | 1 | Agree |
| 2670 | 2670 | 333314 | 333314 | 6 | Agree |
| 2711 | 2711 | 335312 | 335312 | 4 | Agree |
| 2720 | 2720 | 335911 | 335911 | 1 | Agree |
| 2790 | 2790 | 335999 | 335999 | 15 | Agree |
| 2822 | 2822 | 333120 | 333120 | 6 | Agree |
| 2829 | 2829 | 333999 | 333999 | 14 | Agree |
| 2896 | 2896 | 333220 | 333220 | 3 | Agree |
| 3212 | 3212 | 339910 | 339910 | 3 | Agree |
| 0111 | 0111 | 111998 | NA | 2 | C missing |
| 0220 | 0220 | 113310 | NA | 1 | C missing |
| 0710 | 0710 | 212210 | NA | 1 | C missing |
| 0721 | 0721 | 212291 | NA | 1 | C missing |
| 0729 | 0729 | 212299 | NA | 5 | C missing |
| 0811 | 0811 | 212319 | NA | 2 | C missing |
| 0891 | 0891 | 212325 | NA | 3 | C missing |
| 1040 | 1040 | 311223 | NA | 2 | C missing |
| 20112013 | 20112013 | 325180 | NA | 1 | C missing |
| 2015 | 2015 | 325311 | NA | 1 | C missing |
| 2030 | 2030 | 325510 | NA | 1 | C missing |
| 2052 | 2052 | 325520 | NA | 1 | C missing |
| 22232733 | 22232733 | 326199 | NA | 1 | C missing |
| 22293299 | 22293299 | 316992 | NA | 9 | C missing |
| 24453832 | 24453832 | 331410 | NA | 4 | C missing |
| 26112612 | 26112612 | 334412 | NA | 1 | C missing |
| 26512670 | 26512670 | 334511 | NA | 4 | C missing |
| 27112790 | 27112790 | 327110 | NA | 8 | C missing |
| 28293250 | 28293250 | 333999 | NA | 1 | C missing |
| 28942899 | 28942899 | 333244 | NA | 4 | C missing |
| 1041 | 1041 | 311224 | 311225 | 4 | Differ |
| 1920 | 1920 | 325110 | 324110 | 1 | Differ |
| 2012 | 2012 | 325180 | 325130 | 4 | Differ |
| 2014 | 2014 | 325199 | 325190 | 3 | Differ |
| 2059 | 2059 | 325180 | 325998 | 9 | Differ |
| 2229 | 2229 | 322220 | 326199 | 2 | Differ |
| 2312 | 2312 | 327215 | 327212 | 1 | Differ |
| 2320 | 2320 | 327120 | 327110 | 3 | Differ |
| 2344 | 2344 | 327110 | 327999 | 2 | Differ |
| 2352 | 2352 | 327410 | 327390 | 1 | Differ |
| 2369 | 2369 | 327390 | 327990 | 2 | Differ |
| 2420 | 2420 | 331110 | 331410 | 3 | Differ |
| 2442 | 2420 | 331313 | 331410 | 8 | Differ |
| 2444 | 2420 | 331420 | 331410 | 5 | Differ |
| 2446 | 2420 | 325180 | 331410 | 2 | Differ |
| 2511 | 2511 | 332312 | 332311 | 4 | Differ |
| 2573 | 2573 | 332212 | 332710 | 7 | Differ |
| 2594 | 2594 | 332722 | 332999 | 1 | Differ |
| 2732 | 2732 | 331420 | 335929 | 4 | Differ |
| 2733 | 2733 | 335314 | 335931 | 3 | Differ |
| 2811 | 2811 | 333611 | 333612 | 1 | Differ |
| 2812 | 2812 | 333995 | 333613 | 7 | Differ |
| 2813 | 2813 | 333415 | 333911 | 13 | Differ |
| 2815 | 2815 | 333613 | 333923 | 9 | Differ |
| 2821 | 2821 | 333994 | 333111 | 3 | Differ |
| 2825 | 2825 | 333999 | 333415 | 19 | Differ |
| 2841 | 2841 | 333517 | 333514 | 7 | Differ |
| 2849 | 2849 | 333243 | 333249 | 6 | Differ |
| 2891 | 2891 | 333516 | 333514 | 8 | Differ |
| 2892 | 2892 | 333120 | 333131 | 4 | Differ |
| 2894 | 2894 | 333249 | 333318 | 2 | Differ |
| 2899 | 2899 | 333242 | 333249 | 19 | Differ |
| 2932 | 2932 | 333111 | 336390 | 8 | Differ |
| 3812 | 3812 | 325180 | 562119 | 2 | Differ |
Tip
Where routes agree: for clean 1-to-1 NACE codes (such as 2720 → 335911 Batteries and 2611 → 334413 Semiconductors), both routes give identical results. Where they differ: mostly high-ambiguity codes (e.g. 2825, non-domestic cooling and ventilation equipment, which spans 19 NAICS codes), where Route B picks the modal NAICS across many HS codes while Route C uses the UN ISIC→NAICS primary mapping. Neither is definitively “correct”: these are genuinely many-to-many industries. Use naics6_all for full context.
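One way to use naics6_all is to screen firms against every candidate NAICS code for a NACE class and then flag whether each hit matches the dominant code. The sketch below assumes a hypothetical three-code excerpt for NACE 2825 (the real list has 19 entries) and an invented `firms` table.

```r
# Hypothetical excerpt of the crosswalk row for NACE 2825
naics6_all      <- "333999;333415;333413"
naics6_dominant <- "333999"

# Split the semicolon-separated list into individual candidate codes
candidates <- strsplit(naics6_all, ";", fixed = TRUE)[[1]]

# Invented Compustat-style firm table
firms <- data.frame(
  firm_id = c("A", "B", "C"),
  naics   = c("333415", "333999", "311221"),
  stringsAsFactors = FALSE
)

# Keep firms matching any candidate, and label the kind of match
hits <- firms[firms$naics %in% candidates, ]
hits$match_type <- ifelse(hits$naics == naics6_dominant, "dominant", "secondary")
print(hits)
```

Keeping the match type lets downstream analysis run once on dominant matches only and once on the wider set, as a sensitivity check.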
5. NAICS6 Results: Implemented Crosswalk
The crosswalk in data/orbis/orbis_nace_tech.csv and data/concordance/nace_naics_crosswalk.csv uses Route B (HS-based, concordance package) as the primary NAICS source, which is the most reproducible and covers all 84 NACE codes without manual embedding.
New columns added to orbis_nace_tech.csv:
| Column | Description |
|---|---|
| naics6_dominant | NAICS 2017 6-digit: modal mapping by HS-code count |
| naics4_dominant | NAICS 2017 4-digit: parent of dominant |
| naics6_all | All NAICS6 codes for this NACE, semicolon-separated |
| naics_n_mapped | Count of distinct NAICS6 codes (1 = clean mapping) |
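Two of these columns are pure functions of the others, which gives a cheap consistency check on the crosswalk file. The sketch below derives them on a toy two-row table; the member codes listed in naics6_all for NACE 2441 are hypothetical.

```r
# Toy crosswalk rows (the naics6_all members for 2441 are made up)
xw_toy <- data.frame(
  nace_code       = c("2720", "2441"),
  naics6_dominant = c("335911", "331410"),
  naics6_all      = c("335911", "331410;331420;325180"),
  stringsAsFactors = FALSE
)

# The 4-digit parent is the first four digits of the 6-digit code
xw_toy$naics4_dominant <- substr(xw_toy$naics6_dominant, 1, 4)

# The count of distinct codes is the number of entries in naics6_all
xw_toy$naics_n_mapped <- lengths(strsplit(xw_toy$naics6_all, ";", fixed = TRUE))

print(xw_toy)
```

Recomputing both columns from a freshly read crosswalk and comparing them with the stored values would catch codes that lost leading zeros or lists that were truncated on export.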
| NACE4 | NACE description | Technology | NAICS6 (dominant) | NAICS description | # NAICS |
|---|---|---|---|---|---|
| … | … | … | … | Nonferrous Metal (except Aluminum) Smelting and Refining | 3 |
| 2441 | Precious metals production | Electrolyzers | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 3 |
| 2441 | Precious metals production | Solar | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 3 |
| 2442 | Aluminium production | Batteries | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Biofuel | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Heat Pumps | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Magnets | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Solar | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Transmission | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2442 | Aluminium production | Wind | 331313 | Alumina Refining and Primary Aluminum Production | 8 |
| 2445 | Other non-ferrous metal production | Batteries | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 5 |
| 2445 | Other non-ferrous metal production | Biofuel | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 5 |
| 2445 | Other non-ferrous metal production | Electrolyzers | 331410 | Nonferrous Metal (except Aluminum) Smelting and Refining | 5 |
| 2611 | Manufacture of electronic components | Solar | 334413 | Semiconductor and Related Device Manufacturing | 1 |
| 2651 | Manufacture of instruments and appliances for measuring, testing and navigation | Batteries | 334515 | Instrument Manufacturing for Measuring and Testing Electricity and Electrical Signals | 8 |
| 2651 | Manufacture of instruments and appliances for measuring, testing and navigation | Wind | 334515 | Instrument Manufacturing for Measuring and Testing Electricity and Electrical Signals | 8 |
| 2711 | Manufacture of electric motors, generators and transformers | Geothermal | 335312 | Motor and Generator Manufacturing | 4 |
| 2711 | Manufacture of electric motors, generators and transformers | Solar | 335312 | Motor and Generator Manufacturing | 4 |
| 2711 | Manufacture of electric motors, generators and transformers | Transmission | 335312 | Motor and Generator Manufacturing | 4 |
| 2711 | Manufacture of electric motors, generators and transformers | Wind | 335312 | Motor and Generator Manufacturing | 4 |
| 2720 | Manufacture of batteries and accumulators | Batteries | 335911 | Storage Battery Manufacturing | 1 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Biofuel | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Electrolyzers | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Geothermal | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2825 | Manufacture of non-domestic cooling and ventilation equipment | Heat Pumps | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 19 |
| 2829 | Manufacture of other general-purpose machinery nec | Batteries | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Biofuel | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Electrolyzers | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Geothermal | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Nuclear | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Solar | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Transmission | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
| 2829 | Manufacture of other general-purpose machinery nec | Wind | 333999 | All Other Miscellaneous General Purpose Machinery Manufacturing | 14 |
5.3 Many-to-many ambiguity
Code
if (file.exists(xw_path)) {
  xw <- read_csv(xw_path, show_col_types = FALSE)

  # Summarise how many NAICS6 codes each NACE code maps to
  tibble::tibble(
    Metric = c(
      "NACE codes with clean 1-to-1 NAICS mapping",
      "NACE codes with 2–5 NAICS codes (moderate ambiguity)",
      "NACE codes with 6+ NAICS codes (high ambiguity — use naics6_all)",
      "Median NAICS codes per NACE"
    ),
    Value = c(
      sum(xw$naics_n_mapped == 1),
      sum(xw$naics_n_mapped >= 2 & xw$naics_n_mapped <= 5),
      sum(xw$naics_n_mapped >= 6),
      median(xw$naics_n_mapped)
    )
  ) |>
    kable() |>
    kable_styling(bootstrap_options = c("condensed"), full_width = FALSE)
}
| Metric | Value |
|---|---|
| NACE codes with clean 1-to-1 NAICS mapping | 26 |
| NACE codes with 2–5 NAICS codes (moderate ambiguity) | 37 |
| NACE codes with 6+ NAICS codes (high ambiguity — use naics6_all) | 21 |
| Median NAICS codes per NACE | 3 |
High-ambiguity codes (for example NACE 2829, 2899, 2013, and 2651) are generic machinery, chemicals, and instruments categories that span many NAICS industries. For these, either use naics6_all in the analysis rather than the single dominant code, or restrict the comparison to a specific product stage: Final Product NACE codes are all clean 1-to-1 mappings.
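As a rough illustration of the tiering rule above, the sketch below assigns the Clean / Moderate / High tiers from the count thresholds in the summary table and expands naics6_all into one row per NACE–NAICS pair for High-tier codes. It uses base R only so it runs without the project's packages. The column name nace4, the helper names, and the toy NAICS lists are assumptions made for illustration, not values from the real crosswalk. Keeping only the dominant code for Clean and Moderate tiers is also this sketch's choice rather than a documented project rule.

```r
# Toy crosswalk: the NAICS lists are illustrative only, not the real CVCE file.
xw_toy <- data.frame(
  nace4 = c("2720", "2445", "2829"),              # hypothetical column name
  naics6_dominant = c("335911", "331410", "333999"),
  naics6_all = c(
    "335911",
    "331410;331491;331492",
    "333999;333413;333414;333415;333912;333914"
  ),
  stringsAsFactors = FALSE
)

# Split the semicolon-separated code lists.
split_codes <- function(x) strsplit(x, ";", fixed = TRUE)

# Count the distinct NAICS6 codes listed for each NACE code.
xw_toy$naics_n_mapped <- vapply(
  split_codes(xw_toy$naics6_all),
  function(codes) length(unique(codes)),
  integer(1)
)

# Thresholds follow the summary table: 1 = Clean, 2-5 = Moderate, 6+ = High.
xw_toy$ambiguity_tier <- as.character(cut(
  xw_toy$naics_n_mapped,
  breaks = c(0, 1, 5, Inf),
  labels = c("Clean", "Moderate", "High")
))

# Long format: one row per NACE-NAICS pair.
# High-tier codes use every code in naics6_all; other tiers keep the dominant code.
expand_for_matching <- function(xw) {
  rows <- lapply(seq_len(nrow(xw)), function(i) {
    codes <- if (xw$ambiguity_tier[i] == "High") {
      unique(split_codes(xw$naics6_all[i])[[1]])
    } else {
      xw$naics6_dominant[i]
    }
    data.frame(
      nace4 = xw$nace4[i],
      naics6 = codes,
      ambiguity_tier = xw$ambiguity_tier[i],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

match_table <- expand_for_matching(xw_toy)
print(match_table)
```

The long table can then be joined to a firm file on naics6, with ambiguity_tier carried along so that matches from High-tier codes can be flagged or dropped in robustness checks.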
6. Firm-Level Database Comparison
Note
Database landscape, classification granularity, and decision guide are in §1 (Overview) above, alongside the concordance routes. This section adds JHU-specific subscription costs and a database selection guide.
Code
tibble::tribble(
  ~Research_Question, ~Recommended_DB, ~Est_Cost, ~Reason,
  "Global manufacturing firm locations (all countries)", "ORBIS", "$25–60K/yr", "Only DB with private firms globally; NACE4 globally",
  "US listed company financials + market data", "Compustat N. America", "Bundled in WRDS", "Deepest US coverage since 1950s; NAICS6 + SIC4",
  "Cross-country listed company financials", "Compustat Global", "Bundled in WRDS", "Consistent accounting standards across 80 countries",
  "Private equity / M&A / private firm deals", "S&P Capital IQ Pro", "$60–150K/yr", "Broadest private company coverage; GICS8 globally",
  "IRA-linked US manufacturing capacity", "Compustat + ORBIS", "Combined", "Compustat for listed US; ORBIS for private US + global",
  "Supply chain linkages between firms", "S&P Capital IQ Pro", "$60–150K/yr", "Supply chain relationships module"
) |>
  kable(
    col.names = c("Research Question", "Recommended DB", "Est. Annual Cost", "Reason"),
    caption = "Database selection guide for clean-tech research"
  ) |>
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"), full_width = TRUE) |>
  column_spec(2, bold = TRUE, color = "#073309")
Database selection guide for clean-tech research
| Research Question | Recommended DB | Est. Annual Cost | Reason |
|---|---|---|---|
| Global manufacturing firm locations (all countries) | ORBIS | $25–60K/yr | Only DB with private firms globally; NACE4 globally |
| US listed company financials + market data | Compustat N. America | Bundled in WRDS | Deepest US coverage since 1950s; NAICS6 + SIC4 |
| Cross-country listed company financials | Compustat Global | Bundled in WRDS | Consistent accounting standards across 80 countries |
| Private equity / M&A / private firm deals | S&P Capital IQ Pro | $60–150K/yr | Broadest private company coverage; GICS8 globally |
| IRA-linked US manufacturing capacity | Compustat + ORBIS | Combined | Compustat for listed US; ORBIS for private US + global |
| Supply chain linkages between firms | S&P Capital IQ Pro | $60–150K/yr | Supply chain relationships module |
Note
JHU Library: ORBIS (BvD/Moody’s) and WRDS (Compustat NA + Global) confirmed. S&P Capital IQ Pro: verify with Sheridan Libraries — access may be limited. Contact dataservices@jhu.edu. Cost estimates are indicative academic pricing.
84-row crosswalk with tech labels + ambiguity tier (generated below)
8. Crosswalk CSV for Distribution
A clean, annotated version of the crosswalk is written to data/concordance/nace_naics_crosswalk_annotated.csv for sharing with collaborators who do not need to run R.
Clean technology/technologies linked to this NACE code
| naics6_dominant | Most frequent NAICS 2017 6-digit code (Route B) |
| naics6_desc | Description of dominant NAICS6 |
| naics4_dominant | Parent NAICS 4-digit |
| naics_n_mapped | Count of distinct NAICS6 codes mapping to this NACE |
| ambiguity_tier | Clean / Moderate / High |
| naics6_all | All NAICS6 codes, semicolon-separated |
The file has 84 rows (one per NACE code) and is suitable for sharing as a standalone reference.
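Collaborators receiving the CSV may want a quick consistency check before using it. The sketch below is one possible checker, written in base R so it runs without the project's packages. It uses only the column names documented above; the function names and the three checks are assumptions of this sketch, not part of the CVCE pipeline. In particular, the check that the dominant code appears in naics6_all is a plausible expectation rather than a documented guarantee.

```r
# Hypothetical checker for the annotated crosswalk; column names follow the
# column descriptions above, everything else is illustrative.
required_cols <- c("naics6_dominant", "naics6_desc", "naics4_dominant",
                   "naics_n_mapped", "ambiguity_tier", "naics6_all")

# Split semicolon-separated code lists and trim stray whitespace.
split_all <- function(x) {
  lapply(strsplit(as.character(x), ";", fixed = TRUE), trimws)
}

check_crosswalk <- function(xw) {
  missing_cols <- setdiff(required_cols, names(xw))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  all_codes <- split_all(xw$naics6_all)
  dominant <- trimws(as.character(xw$naics6_dominant))
  n_listed <- vapply(all_codes, function(z) length(unique(z)), integer(1))
  data.frame(
    row = seq_len(nrow(xw)),
    # naics_n_mapped should equal the number of distinct codes in naics6_all
    count_ok = as.integer(xw$naics_n_mapped) == n_listed,
    # the 4-digit parent should be the first four digits of the dominant code
    parent_ok = substr(dominant, 1, 4) == trimws(as.character(xw$naics4_dominant)),
    # assumption: the dominant code is one of the codes in naics6_all
    dominant_listed = mapply(function(d, codes) d %in% codes,
                             dominant, all_codes, USE.NAMES = FALSE)
  )
}

# Toy rows (illustrative values) to show the checker on a well-formed input.
xw_example <- data.frame(
  naics6_dominant = c("335911", "331410"),
  naics6_desc = c("Storage Battery Manufacturing",
                  "Nonferrous Metal (except Aluminum) Smelting and Refining"),
  naics4_dominant = c("3359", "3314"),
  naics_n_mapped = c("1", "2"),
  ambiguity_tier = c("Clean", "Moderate"),
  naics6_all = c("335911", "331410; 331491"),
  stringsAsFactors = FALSE
)
print(check_crosswalk(xw_example))

# On the real file, read codes as text so they keep their exact digits.
path <- "data/concordance/nace_naics_crosswalk_annotated.csv"
if (file.exists(path)) {
  report <- check_crosswalk(read.csv(path, colClasses = "character"))
  print(colSums(!as.matrix(report[, -1])))  # failing rows per check
}
```

Reading every column as character avoids NAICS codes being silently converted to numbers, which matters once codes are compared as strings or split on semicolons.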
9. JHU Faculty Using Firm-Level Data
SAIS and Krieger faculty using firm-level datasets are relevant for potential collaboration or shared data access. The concordance routes in §1.3 above connect to the datasets these faculty use: Route A (NACE → ORBIS), Route B (NAICS → Compustat), and Census micro-linkage (LFTTD, described below).
Census micro-linkage: ground truth for US firms
A fourth option beyond Routes A–C exists for US firms: match ORBIS records to US Census Bureau records directly, bypassing classification translation entirely. The key dataset enabling this is the LFTTD:
Note
What is the LFTTD? The Longitudinal Firm Trade Transactions Database (LFTTD) is a restricted U.S. Census Bureau microdata file that links Customs import/export transaction records (bill-of-lading level, product code, value, country) to firm identifiers from the Census Business Register. It contains the universe of US goods trade at the firm–product–partner level. Access requires an approved Census Research Data Center (RDC) project — it is not a commercial subscription. It is not an alternative to ORBIS; it is the micro-level linkage that makes it possible to validate whether a firm classified as NAICS 334413 (Semiconductors) in Census records actually exports HS 854143 (solar modules).
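The validation idea in the note can be sketched on synthetic data. The example below computes the share of firms in a given NAICS6 industry that record at least one export of a given HS6 product. Everything here is hypothetical: the firm identifiers, the column names (firm_id, naics6, hs6, value), and the function export_hit_rate are invented for illustration, and real LFTTD variable names and access procedures differ.

```r
# Entirely synthetic stand-ins: LFTTD microdata can only be used inside an RDC.
firms <- data.frame(
  firm_id = c("F1", "F2", "F3", "F4"),
  naics6  = c("334413", "334413", "334413", "335911"),
  stringsAsFactors = FALSE
)
exports <- data.frame(
  firm_id = c("F1", "F1", "F2", "F4"),
  hs6     = c("854143", "854231", "854231", "850760"),
  value   = c(120, 30, 75, 200),
  stringsAsFactors = FALSE
)

# Share of firms in a NAICS6 industry with at least one export of an HS6 product.
export_hit_rate <- function(firms, exports, naics_code, hs_code) {
  in_industry <- firms$firm_id[firms$naics6 == naics_code]
  exporters   <- unique(exports$firm_id[exports$hs6 == hs_code])
  if (length(in_industry) == 0) return(NA_real_)
  mean(in_industry %in% exporters)
}

hit_rate <- export_hit_rate(firms, exports, "334413", "854143")
print(hit_rate)
```

A low hit rate for a NAICS–HS pair would suggest that the industry code is too broad to identify producers of that product, which is the kind of evidence the concordance routes alone cannot provide.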
SAIS faculty Mine Senses and Pravin Krishna have used LFTTD-type Census transaction data in published work.
JHU faculty with confirmed firm-level data use
Code
tibble::tribble(
  ~Faculty, ~Affiliation, ~Field, ~Dataset_Used, ~Notes,
  "Mine Z. Senses", "SAIS", "International trade & labor", "U.S. Census LFTTD/LBD", "Firm–transaction matched data; published JIE 2014, 2016",
  "Pravin Krishna", "SAIS / Krieger", "International economics & trade", "Indian customs × firm balance sheets; Census matched data", "RDC-type micro linkage; NBER 2024",
  "Gordon M. Bodnar", "SAIS", "International corporate finance", "Compustat (N. America)", "Exchange rate exposure of US multinationals; confirmed via co-authored papers"
) |>
  kable(
    col.names = c("Faculty", "Unit", "Field", "Dataset(s)", "Evidence"),
    caption = "JHU faculty with confirmed firm-level / trade micro-data use"
  ) |>
  kable_styling(bootstrap_options = c("striped", "condensed"), full_width = TRUE) |>
  column_spec(4, italic = TRUE)
JHU faculty with confirmed firm-level / trade micro-data use
| Faculty | Unit | Field | Dataset(s) | Evidence |
|---|---|---|---|---|
| Mine Z. Senses | SAIS | International trade & labor | U.S. Census LFTTD/LBD | Firm–transaction matched data; published JIE 2014, 2016 |
| Pravin Krishna | SAIS / Krieger | International economics & trade | Indian customs × firm balance sheets; Census matched data | RDC-type micro linkage; NBER 2024 |
| Gordon M. Bodnar | SAIS | International corporate finance | Compustat (N. America) | Exchange rate exposure of US multinationals; confirmed via co-authored papers |
Note
Sebnem Kalemli-Özcan (now at Brown University) is the leading methodologist for ORBIS use in economics. Her co-authored NBER working paper “How to Construct Nationally Representative Firm-Level Data from the Orbis Global Database” (NBER w21558, published in AEJ: Macro 2024) is the canonical reference for any serious ORBIS work, covering sample selection, consolidation, coverage gaps, and winsorization. It is useful reading before scaling up the CVCE ORBIS pipeline.