Identifying Battery Manufacturing Firms in ORBIS from HS Codes

Author

Oriol Vallès Codina

Published

December 13, 2025

1. Motivation and scope

This document identifies firms likely involved in battery manufacturing using ORBIS firm-level data. The starting point is a pair of HS product codes for batteries, which are mapped to economic activities (NACE) using Eurostat correspondences.

Caveat (non-negotiable): ORBIS does not contain HS-coded product output. Firms are identified via activity classifications (NACE), motivated by HS–CN–CPA correspondences. The mapping is inherently many-to-many and should be interpreted as a screening, not proof of HS-level production.


2. HS → CN → CPA → NACE mapping

2.1 HS codes of interest

We start from the following HS6 codes:

  • 850760 — Lithium-ion accumulators
  • 850780 — Other accumulators

2.2 CN–CPA correspondence (Eurostat)

Using Eurostat CN–CPA correspondence tables (CN 2022 ↔︎ CPA 2.1), we obtained:

CN (2022) CPA (2.1)
85076000 27.20.23
85078000 27.20.23

Thus, both HS codes map to CPA 27.20.23.

2.3 CPA → NACE anchor

CPA 27.20.23 belongs to the product family associated with:

  • NACE Rev.2 = 27.20
  • Manufacture of batteries and accumulators

This NACE code is used as the activity anchor in ORBIS.


3. ORBIS data and NACE encoding

3.1 Data source

The ORBIS data used here is a processed Parquet file containing firm identifiers, classifications, balance-sheet variables, and derived statistics.

It is not raw ORBIS: several variables (growth rates, moments, quantiles) appear to have been computed by a previous researcher.

3.2 Setup

Edit these paths to match your environment.

# ---- paths ----
orbis_parquet_path <- "../../../data/orbis.parquet" # adjust

# ---- connect (DuckDB reads Parquet lazily) ----
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:")

3.3 Schema inspection

schema_df <- DBI::dbGetQuery(
  con,
  sprintf("DESCRIBE SELECT * FROM read_parquet('%s')", orbis_parquet_path)
)
schema_df
           column_name column_type null  key default extra
1             column00     VARCHAR  YES <NA>    <NA>  <NA>
2                bvdid     VARCHAR  YES <NA>    <NA>  <NA>
3  CATEGORY_OF_COMPANY     VARCHAR  YES <NA>    <NA>  <NA>
4             CONSCODE     VARCHAR  YES <NA>    <NA>  <NA>
5        CLOSDATE_year     VARCHAR  YES <NA>    <NA>  <NA>
6                 FIAS     VARCHAR  YES <NA>    <NA>  <NA>
7                 TFAS     VARCHAR  YES <NA>    <NA>  <NA>
8                 TOAS     VARCHAR  YES <NA>    <NA>  <NA>
9                 DEPR     VARCHAR  YES <NA>    <NA>  <NA>
10              _40025     VARCHAR  YES <NA>    <NA>  <NA>
11       NAME_INTERNAT     VARCHAR  YES <NA>    <NA>  <NA>
12            POSTCODE     VARCHAR  YES <NA>    <NA>  <NA>
13       CITY_INTERNAT     VARCHAR  YES <NA>    <NA>  <NA>
14   REGION_IN_COUNTRY     VARCHAR  YES <NA>    <NA>  <NA>
15             COUNTRY     VARCHAR  YES <NA>    <NA>  <NA>
16                EBTA     VARCHAR  YES <NA>    <NA>  <NA>
17        major_sector     VARCHAR  YES <NA>    <NA>  <NA>
18  nace2_main_section     VARCHAR  YES <NA>    <NA>  <NA>
19           naceccod2     VARCHAR  YES <NA>    <NA>  <NA>
20           nacecdes2     VARCHAR  YES <NA>    <NA>  <NA>
21           Year_diff     VARCHAR  YES <NA>    <NA>  <NA>
22            Deflator     VARCHAR  YES <NA>    <NA>  <NA>
23     FIAS_growth_log     VARCHAR  YES <NA>    <NA>  <NA>
24     TFAS_growth_log     VARCHAR  YES <NA>    <NA>  <NA>
25             geo_ind     VARCHAR  YES <NA>    <NA>  <NA>
26         Data_source     VARCHAR  YES <NA>    <NA>  <NA>
27           IndexType     VARCHAR  YES <NA>    <NA>  <NA>
28               Index     VARCHAR  YES <NA>    <NA>  <NA>
29                Time     VARCHAR  YES <NA>    <NA>  <NA>
30                mean     VARCHAR  YES <NA>    <NA>  <NA>
31            kurtosis     VARCHAR  YES <NA>    <NA>  <NA>
32            variance     VARCHAR  YES <NA>    <NA>  <NA>
33         quantile_01     VARCHAR  YES <NA>    <NA>  <NA>
34         quantile_05     VARCHAR  YES <NA>    <NA>  <NA>
35         quantile_10     VARCHAR  YES <NA>    <NA>  <NA>
36         quantile_25     VARCHAR  YES <NA>    <NA>  <NA>
37         quantile_50     VARCHAR  YES <NA>    <NA>  <NA>
38         quantile_75     VARCHAR  YES <NA>    <NA>  <NA>
39         quantile_90     VARCHAR  YES <NA>    <NA>  <NA>
40         quantile_95     VARCHAR  YES <NA>    <NA>  <NA>
41         quantile_99     VARCHAR  YES <NA>    <NA>  <NA>
42        quantile_100     VARCHAR  YES <NA>    <NA>  <NA>

The relevant activity fields are:

  • naceccod2 — NACE Rev.2 code (string)
  • nacecdes2 — NACE description
  • nace2_main_section — high-level section

3.4 NACE encoding checks

We verify that 4-character NACE codes are digits-only.

nchar_dist <- DBI::dbGetQuery(
  con,
  sprintf(
    "
    SELECT
      length(naceccod2) AS len,
      COUNT(*) AS n
    FROM read_parquet('%s')
    GROUP BY 1
    ORDER BY 1
  ",
    orbis_parquet_path
  )
)
nchar_dist
  len        n
1   1        2
2   2      107
3   3   757431
4   4 22680915
digit_check <- DBI::dbGetQuery(
  con,
  sprintf(
    "
    SELECT
      SUM(CASE WHEN length(naceccod2)=4 AND regexp_full_match(naceccod2, '^[0-9]{4}$')
               THEN 1 ELSE 0 END) AS ok_4digit,
      SUM(CASE WHEN length(naceccod2)=4 THEN 1 ELSE 0 END) AS total_4char
    FROM read_parquet('%s')
  ",
    orbis_parquet_path
  )
)
digit_check
  ok_4digit total_4char
1  22680915    22680915

If ok_4digit == total_4char, then NACE4 is encoded as digits-only strings like "2720".


4. Extraction of battery manufacturing firms

4.1 Row-level extraction (NACE 2720)

We filter ORBIS rows with NACE Rev.2 code:

  • 2720 — Manufacture of batteries and accumulators
f <- DBI::dbGetQuery(
  con,
  sprintf(
    "
    SELECT *
    FROM read_parquet('%s')
    WHERE naceccod2 = '2720'
  ",
    orbis_parquet_path
  )
)

dim(f)
[1] 9889   42
str(f, max.level = 1)
'data.frame':   9889 obs. of  42 variables:
 $ column00           : chr  "2478" "3250" "5911" "7897" ...
 $ bvdid              : chr  "DZ09B0764222" "DZ98B0862237" "DZ05B0107843" "DZ09B0764222" ...
 $ CATEGORY_OF_COMPANY: chr  "MEDIUM SIZED COMPANY" "MEDIUM SIZED COMPANY" "MEDIUM SIZED COMPANY" "MEDIUM SIZED COMPANY" ...
 $ CONSCODE           : chr  "U1" "U1" "U1" "U1" ...
 $ CLOSDATE_year      : chr  "2011" "2011" "2012" "2012" ...
 $ FIAS               : chr  "1258925.89566195" "27440.0665062956" "88965.06" "77710.06" ...
 $ TFAS               : chr  "NA" "NA" "NA" "NA" ...
 $ TOAS               : chr  "5245143.70604891" "81268.6767126696" "100412.85" "81807.12" ...
 $ DEPR               : chr  "NA" "5455.81018696975" "-11594.57" "NA" ...
 $ _40025             : chr  "Corporate" "Corporate" "Corporate" "Corporate" ...
 $ NAME_INTERNAT      : chr  "Sarl Bergane Energy" "Sarl Sofacc" "Sarl STE Algerienne DES Accumulateurs Unite Oran" "Sarl Bergane Energy" ...
 $ POSTCODE           : chr  NA NA NA NA ...
 $ CITY_INTERNAT      : chr  "Berriane" "Berriane" "Hassi Bounif" "Berriane" ...
 $ REGION_IN_COUNTRY  : chr  "Ghardaia" "Ghardaia" "Oran" "Ghardaia" ...
 $ COUNTRY            : chr  "Algeria" "Algeria" "Algeria" "Algeria" ...
 $ EBTA               : chr  "NA" "23358.788139299" "-12840.47" "NA" ...
 $ major_sector       : chr  "Industrial, Electric & Electronic Machinery" "Industrial, Electric & Electronic Machinery" "Industrial, Electric & Electronic Machinery" "Industrial, Electric & Electronic Machinery" ...
 $ nace2_main_section : chr  "C - Manufacturing" "C - Manufacturing" "C - Manufacturing" "C - Manufacturing" ...
 $ naceccod2          : chr  "2720" "2720" "2720" "2720" ...
 $ nacecdes2          : chr  "Manufacture of batteries and accumulators" "Manufacture of batteries and accumulators" "Manufacture of batteries and accumulators" "Manufacture of batteries and accumulators" ...
 $ Year_diff          : chr  "1" "1" "1" "1" ...
 $ Deflator           : chr  "93.0593225571861" "93.0593225571861" "100" "100" ...
 $ FIAS_growth_log    : chr  "-2.78502945130906" "-0.106790099147275" "0.0544241875515256" "0.11352463757536" ...
 $ TFAS_growth_log    : chr  "NA" "NA" "NA" "NA" ...
 $ geo_ind            : chr  "Algeria Ghardaia Berriane 2011" "Algeria Ghardaia Berriane 2011" "Algeria Oran Hassi Bounif 2012" "Algeria Ghardaia Berriane 2012" ...
 $ Data_source        : chr  "Google Maps API" "Google Maps API" "Google Maps API" "Google Maps API" ...
 $ IndexType          : chr  "Index1" "Index1" "Index1" "Index1" ...
 $ Index              : chr  "44104" "44104" "45180" "44104" ...
 $ Time               : chr  "2011" "2011" "2012" "2012" ...
 $ mean               : chr  "27.3400991596588" "27.3400991596588" "24.3141939574904" "28.1336912066559" ...
 $ kurtosis           : chr  "1.64769923617101" "1.64769923617101" "1.64075687641879" "1.69208409768496" ...
 $ variance           : chr  "76.1899159220837" "76.1899159220837" "61.3591105593609" "96.2355460172974" ...
 $ quantile_01        : chr  "14.7206357192994" "14.7206357192994" "11.7148934364319" "11.6192949771881" ...
 $ quantile_05        : chr  "16.070845413208" "16.070845413208" "13.4513666629792" "14.4646921157837" ...
 $ quantile_10        : chr  "16.5799991607666" "16.5799991607666" "14.5896010398865" "15.6790790557862" ...
 $ quantile_25        : chr  "18.7607173919678" "18.7607173919678" "17.4301400184632" "18.8800129890442" ...
 $ quantile_50        : chr  "26.6600170135498" "26.6600170135498" "23.2173137664795" "27.3502235412598" ...
 $ quantile_75        : chr  "35.7472076416016" "35.7472076416016" "31.4587111473084" "37.0483531951904" ...
 $ quantile_90        : chr  "39.6790298461914" "39.6790298461914" "35.2621440887451" "41.677209854126" ...
 $ quantile_95        : chr  "41.0148834228516" "41.0148834228516" "35.9908561706543" "43.3350086212158" ...
 $ quantile_99        : chr  "42.3754103088379" "42.3754103088379" "37.8746026992798" "44.5432813644409" ...
 $ quantile_100       : chr  "43.2498626708984" "43.2498626708984" "38.5001335144043" "45.2242965698242" ...

Note: Multiple rows per firm may exist (panel / index structure), so rows ≠ firms.

4.2 Firm-level counts

n_firms <- length(unique(f$bvdid))
n_rows <- nrow(f)

data.frame(n_rows = n_rows, n_distinct_firms = n_firms)
  n_rows n_distinct_firms
1   9889             2651

4.3 Quick label sanity check

label_check <- DBI::dbGetQuery(
  con,
  sprintf(
    "
    SELECT naceccod2, nacecdes2, COUNT(*) AS n
    FROM read_parquet('%s')
    WHERE naceccod2 = '2720'
    GROUP BY 1,2
    ORDER BY n DESC
  ",
    orbis_parquet_path
  )
)
label_check
  naceccod2                                 nacecdes2    n
1      2720 Manufacture of batteries and accumulators 9889

5. Coverage diagnostics

5.1 Temporal coverage

table(f$CLOSDATE_year)

2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 
  29   33   29   33   41   40  602  687 1020  919  601  879  983  528  628  732 
2017 2018 2019 
 811  711  583 

5.2 Country coverage

sort(table(f$COUNTRY), decreasing = TRUE)

               China              Germany                India 
                6766                  433                  343 
               Italy   Russian Federation        Taiwan, China 
                 299                  231                  228 
      United Kingdom               France              Ukraine 
                 177                  128                  128 
              Poland    Republic of Korea                Japan 
                 125                  114                   86 
            Malaysia                Spain          Netherlands 
                  85                   78                   57 
             Belgium              Hungary              Austria 
                  55                   46                   37 
              Sweden       Czech Republic          Philippines 
                  34                   33                   32 
            Bulgaria               Norway               Turkey 
                  30                   24                   24 
             Algeria              Romania               Serbia 
                  23                   22                   22 
          Luxembourg               Greece             Portugal 
                  21                   20                   19 
              Mexico            Australia             Colombia 
                  17                   15                   15 
             Finland            Singapore             Slovakia 
                  15                   15                   15 
             Morocco               Brazil             Slovenia 
                  12                   11                   10 
             Croatia              Denmark              Estonia 
                   7                    7                    7 
     North Macedonia             Pakistan              Uruguay 
                   7                    7                    4 
Hong Kong SAR, China               Cyprus         South Africa 
                   2                    1                    1 
            Thailand 
                   1 

5.3 Location completeness

table(is.na(f$POSTCODE))

FALSE  TRUE 
 9747   142 
table(is.na(f$CITY_INTERNAT))

FALSE  TRUE 
 9332   557 
table(is.na(f$REGION_IN_COUNTRY))

FALSE  TRUE 
 9217   672 
table(is.na(f$geo_ind))

FALSE 
 9889 

Observation: geo_ind appears fully populated; postcode/city/region are mostly available but not universal. The field geo_ind encodes a composite string (country–region–city–year) and can be parsed later for spatial aggregation.


6. Interpretation of key variables

6.1 Core economic variables (firm-level)

These are standard ORBIS-style quantities (when present and well-defined):

  • FIAS — Fixed assets (capital stock proxy)
  • TFAS — Tangible fixed assets
  • TOAS — Total assets (firm size)
  • DEPR — Depreciation (capital consumption)
  • EBTA — Earnings before taxes (profitability proxy)

These are the economically meaningful firm variables to start with.


7. Methodological caveat (explicit, for reuse)

# ORBIS provides firm activity codes (NACE), not HS product output.
# Firms are screened using NACE Rev.2 = 2720 (Manufacture of batteries and accumulators),
# motivated by Eurostat HS/CN 850760–850780 → CPA 27.20.23 → NACE 27.20.
# The mapping is many-to-many: NACE activity ≠ HS production.
# Results identify likely battery manufacturers by activity classification, not definitive HS producers.

8. Next steps (not executed here)

  • Construct a firm-level table (one row per bvdid) for reporting and deduplication.
  • Explore balance-sheet magnitudes and growth variables (after confirming units/deflation).
  • Parse geo_ind for spatial aggregation.
  • Only then: mapping (country map first; firm-level geocoding only if justified).