## Introduction

Black carbon (BC) is a notoriously difficult aerosol species to
characterize and quantify (e.g. Andreae et al. **2006**; Petzold et al. **2013**), which is why each reported property of these particles
is primarily defined by the measurement technique used. Research on these
particles has been conducted with respect to air quality and health-related problems
since the 1950s (Novakov and Rosen **2013**), but BC has also been studied with respect to post-nuclear-war
scenarios of so-called nuclear winter (e.g. Crutzen and Birks **1982**). In this extreme case it is
predicted that smoke from extensive fires will block the incoming light from the
sun, which will cool the surface of the Earth. However, in the ambient atmosphere
the mass fraction of BC is typically small compared to other aerosol species such as
sulfates, nitrates, sea salt, dust, or non-black organics (Seinfeld and
Pandis **2016**). Nevertheless,
despite its minute relative contribution, it has been proposed that BC is second only
to CO_{2} in contributing to global climate change (Hansen and
Nazarenko **2004**; Johnson et al. **2019**; Jones et al. **2018**). The reason for this is the many possible
feedback mechanisms that can be activated if BC changes the surface albedo of snow
and ice. In contrast to the nuclear winter scenarios, BC in the present atmosphere
is expected to have a net warming effect on the Earth’s climate system
(IPCC: Climate Change **2014**). Characterizing the ratio between scattering and absorbing
aerosols and its evolution over time from long-term observations provides necessary
knowledge for constraining the radiative forcing of aerosols in General Circulation
Models (GCMs).

BC is formed during combustion of carbon fuels and freshly emitted particles are
typically chain aggregates of small spheres, called primary spheres. These primary
spheres are often in the range 20-50 nm in diameter. In conditions of high
BC concentrations, for instance in vehicle tailpipes, the aggregates
grow rapidly and reach characteristic sizes of several hundred nanometers in
diameter (Kittelson **1998**). Because these particles differ from perfect spheres,
they are often described by their fractal dimension (e.g. Wang et al. **2017**), which characterizes the
fluffiness of the particle compared to a compact sphere. Traditionally, the strong
light absorption by BC has been used to quantify the amount present in exhaust, in
the atmosphere, or in snow and ice (Clarke and Noone **2007**; Hansen et al. **1984**). The measurement principle is to collect particles on a
filter substrate that will be stained by BC particles (and other light
absorbing particles). The sample spot is compared to a clean reference
surface of the filter and the blackness of the sample area, or rate of blackening,
is converted to equivalent BC (eBC) (Petzold et al. **2013**). In recent years,
advances in aerosol technology have allowed for single-particle analysis of aerosols, and
in particular the introduction of the SP2 (Single Particle Soot
Photometer) instrument (Schwarz et al. **2006**) has brought new insight into the properties of BC.
The SP2 uses incandescence to determine the amount of BC in each particle. Since the
emitted infrared light is due to the remaining refractory material, this BC is
often referred to as rBC (Petzold et al. **2013**). The main advantages of these measurements are that the total
BC mass can be distributed as a function of particle size and that the state of
mixing with non-refractory material can be determined. Additionally, characterizing
the size-dependent ratio between scattering and absorbing aerosols and its evolution
over time from long-term observations provides necessary knowledge for constraining
the radiative forcing of aerosols in GCMs.

Based on results comparing long-term data from remote locations and numerical
transport models, it is apparent that process understanding is lacking
concerning the factors that control even the magnitude and seasonal variation of the
first-order metric, the eBC mass concentration in the Arctic (Korhonen et
al. **2008**). Hence, more observed
metrics that describe BC in several dimensions could help to better
understand the interactions between BC and the environment. This includes improved
BC source characterization and BC interaction with clouds and precipitation.

Currently, size-resolved BC information from state-of-the-art instrumentation does not extend as far back in time, and does not have the same spatial coverage, as traditional filter-based light absorption measurements. In an attempt to alleviate this lack of multi-dimensional data, this study explores the use of a statistical relationship between aerosol size distributions observed with a DMPS (Differential Mobility Particle Sizer) system and light absorption measured with a custom-built PSAP (Particle Soot Absorption Photometer). The bulk aerosol absorption measured by the PSAP is distributed over the size distribution based on the correlation between individual size bins and the total absorption signal. This approach is tested in the laboratory and applied to long-term data from the Zeppelin station, Ny-Ålesund, Svalbard, in order to explore this derived additional dimension in the data.

## Rationale

The study is divided into two main parts. The first part presents a laboratory experiment conducted at Stockholm University, which aimed at comparing observed size-resolved eBC data with the same type of data inferred from correlations between the observed size distribution and the observed total eBC concentration. This first part serves as a proof of concept of the statistical approach. In the second part, the statistical approach is applied to long-term data to explore the added value of combining the data and to test the integrity of the results.

Section three briefly describes the instrumentation used in the laboratory experiment and the monitoring program. In section four, the laboratory experiment and its results are presented. In section five, the statistical approach is applied to the long-term data and the results are presented. To analyze the data we used k-means clustering on the observed size distributions; a brief description of this methodology is included in section five. A flowchart outlining the methodological steps taken to derive a size-resolved BC number distribution is shown in Figure S1. The results from the statistical approach are compared to reported Arctic observations using the SP2 instrument.

Statistical relationships between size distributions and BC have been reported before
by Krecl et al. (**2017**) and Olivares
et al. (**2007**); however, it is our
opinion that the statistical approach tested in this study adds value to existing
long-term data sets. This method of characterizing BC will be advantageous in
studies of the BC life cycle and the processes controlling it.

## Instrumentation and data

### PSAPs and MAAP

To measure eBC at the Zeppelin station and during the laboratory experiment,
custom-built PSAPs were used. The PSAP instruments measure the rate of
change in light transmission (wavelength of 526 nm) as
particles are collected on a filter medium (Krecl et al. **2007**). For reference,
transmission is also measured over a clean filter area to compensate for any
variations in the light source intensity. This is achieved by splitting the
light from a single light source in two using light pipes in combination with
two light detectors, one beneath the unloaded area of the filter and one beneath
the particle-loaded area of the filter. The PSAPs used in this study
(one instrument at the Zeppelin station and two during the laboratory
experiment) are essentially identical with respect to the sensing part
of the instrument. The main difference is that the instrument at the Zeppelin
station required a manual change of the filter once the transmission of light
reached 50%. The two instruments used in the laboratory study were
designed to change filter area automatically once the transmission threshold is
reached. One feature of the custom-built PSAP, compared to the commercial
versions, is the option to change the sample flow rate as the concentration of
light-absorbing particles changes in the ambient air. Every hour, the
instrument checks whether the ambient concentration has increased or decreased, and the
sample flow is adjusted accordingly to maintain a small variation in the signal-to-noise
ratio (Krecl et al. **2007**). This allows for a larger dynamic range in the
observed concentrations compared to operating with a fixed sample flow rate.
This feature was not used when the PSAP was connected to the DMPS system, where the
flow rate was 1 L min^{−1}.

The primary absorption signal was corrected for the enhancement in the signal
from the filter medium, the loading factor, and the influence of embedded light-scattering
aerosols following Bond et al. (**1999**). This correction depends on the
scattering coefficient of the ambient aerosol. Ideally, an independent
instrument such as a Nephelometer (Anderson and Ogren **1998**) is used. The Zeppelin
PSAP data are corrected for scattering at three levels depending on data
availability. When the scattering coefficient from the integrating Nephelometer
(TSI 3563) is available, these data are used. If they are not
available, a scattering coefficient is calculated based on the observed size
distribution and an assumed chemical composition. If neither the Nephelometer data
nor the size distribution data are available, a constant single
scattering albedo (ω) is assumed. For the Zeppelin data,
this correction uses a value of ω = 0.925.
For the laboratory studies, only a constant ω was used, and in this case
it was chosen to be 0.85. During the laboratory experiment a MAAP (Multi
Angle Absorption Photometer) was also used to compare with the PSAP
results. The MAAP instrument is a commercial instrument that collects particles
on a filter and determines the rate of change in light transmission
(wavelength of 637 nm, Müller et al. **2011**), which is reported as a
mass concentration of eBC (Petzold et al., **2002**; Petzold and Schönlinner, **2004**). The particular advantage
of this instrument is that it internally corrects for scattering enhancement
effects by measuring the scattering from the sample at multiple angles in real
time.
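As a rough illustration of the constant-ω fallback described above, the sketch below assumes the commonly quoted form of the Bond et al. (1999) correction, σ_{abs} = (σ_{raw} − K1·σ_{sp})/K2 with K1 = 0.02 and K2 = 1.22, and closes it with σ_{sp} = ω/(1 − ω)·σ_{abs}. Both the constants and this closure are assumptions made for illustration; the exact implementation in the custom-built PSAP is not given here, and the function name is hypothetical.

```python
# Assumed constants, as commonly quoted for the Bond et al. (1999) correction.
K1 = 0.02  # fraction of scattering misread as absorption (assumption)
K2 = 1.22  # filter enhancement factor (assumption)


def correct_absorption_constant_ssa(sigma_raw, omega):
    """Correct a raw filter absorption coefficient using a constant
    single scattering albedo omega.

    Solves sigma_abs = (sigma_raw - K1 * sigma_sp) / K2 together with
    sigma_sp = omega / (1 - omega) * sigma_abs for sigma_abs.
    """
    if not 0.0 <= omega < 1.0:
        raise ValueError("omega must be in [0, 1)")
    scat_to_abs = omega / (1.0 - omega)  # sigma_sp / sigma_abs
    return sigma_raw / (K2 + K1 * scat_to_abs)


# Correction factors for the two omega values quoted in the text.
zeppelin_factor = correct_absorption_constant_ssa(1.0, 0.925)
lab_factor = correct_absorption_constant_ssa(1.0, 0.85)
```

Under these assumptions the corrected signal is roughly 68% of the raw signal for ω = 0.925 and 75% for ω = 0.85, which shows how sensitive the result is to the chosen albedo.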

### DMPS

The DMPS systems used at the Zeppelin station (Ström et al.
**2003**; Tunved
et al. **2013**) and the
DMPS system used in the laboratory experiment (Salter et al. **2015**) share many similarities.
Both systems use a so-called Vienna-type DMA (Differential Mobility
Analyzer) in a closed-loop configuration, where the sheath flow is controlled
using critical orifices in a returning flow of air. The sample air to sheath air
ratio is around 1:5; this fairly low ratio is a compromise that
increases the counting efficiency at the expense of the precision of the particle
size measurement. Before entering the DMA, the sample flow passes through a Ni63
bipolar charger in order to neutralize the charge distribution on the aerosols.
The assumed charge probability distribution is used in the inversion routine to
correct for the fraction of singly and doubly charged particles
(Wiedensohler **1988**).
The length of both DMAs is 28 cm. A TSI (Thermal
System Inc., USA) model 3010 CPC (Condensation Particle
Counter) was used to count particles at the Zeppelin station and a TSI
model 3272 was used in the laboratory experiment. To check the sizing
consistency of the laboratory system, two sizes of latex spheres
(100 nm and 200 nm diameter) were nebulized in
deionized water. The agreement was within 2 percent in size. Particles in the
range 20 to 630 nm diameter were used for the Zeppelin data and the range
13 to 406 nm was used for the laboratory study.

### Zeppelin data

The data from the Zeppelin Observatory are available in the EBAS database
(http://ebas.nilu.no/) as hourly averages. We have
selected the period between 2002 and 2010 as this represents the start of PSAP
measurements and the period is well characterized with respect to number size
distribution (NSD) statistics (Tunved et al. **2013**). In order to emphasize
measurements when the station is out of cloud, only data points when the
relative humidity (RH) was below 95% are used. A total
of 34432 hourly averages with concurrent PSAP and DMPS data below an RH of
95% are available for the period in question, which corresponds to about
43% data coverage of the entire 9 years.

## Feasibility study

The laboratory experiment served three specific purposes. One, to show that the two PSAP instruments used in the experiment operated identically. Two, to show that the two PSAP instruments could readily be compared with an independent instrument, in this case the MAAP. Three, to show that distributing the total PSAP signal over the size distribution, based on correlations between individual size bins and the PSAP signal, yields properties consistent with those of direct size-resolved measurements. The data for this study were collected during the winter of 2018/2019 at Stockholm University (59.37°N, 18.06°E), approximately 5 km north of the city center of Stockholm and close to a highway.

### Instrument intercomparison

The laboratory experiments took place during the winter period 2018/2019 and were divided into two phases. The two experimental setups are illustrated in Fig. 1. In the first phase, the two PSAPs and the MAAP were connected to the same ambient air inlet and operated in parallel for approximately one week.

During the second phase, PSAP2 was connected to the aerosol outlet of the DMA of
the DMPS system. Figure 2a shows the
time series of the observed light absorption coefficient,
σ_{abs} (m^{−1}), by the
three instruments during the first phase of the experiment. The primary mass
concentration values from the MAAP are converted using the mass absorption
coefficient (MAC_{MAAP} =
6.6 m^{2}g^{−1}) given by the
manufacturer for the operating wavelength of 637 nm in order to arrive at
an absorption coefficient for the MAAP instrument. The σ_{abs}
observed by the MAAP is further adjusted to correspond to the same wavelength as
the PSAP instruments by using the ratio of the wavelengths, 637/526
(Bergstrom et al., **2002**). Figures 2b-d
present the corresponding scatter plots between the PSAPs and the MAAP.
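The two-step MAAP conversion described above can be sketched as follows. The input unit (µg m^{−3}) and the function name are assumptions for illustration; the MAC value and the wavelength ratio are taken from the text, and the ratio is applied as a simple inverse-wavelength scaling.

```python
MAC_MAAP = 6.6       # m^2 g^-1, manufacturer value at 637 nm (from the text)
LAMBDA_MAAP = 637.0  # nm, MAAP operating wavelength
LAMBDA_PSAP = 526.0  # nm, PSAP operating wavelength


def maap_ebc_to_sigma_abs(ebc_ug_m3):
    """Convert MAAP eBC mass (assumed to be in ug m^-3) to an absorption
    coefficient in m^-1 at the PSAP wavelength."""
    ebc_g_m3 = ebc_ug_m3 * 1e-6          # ug m^-3 -> g m^-3
    sigma_637 = ebc_g_m3 * MAC_MAAP      # absorption at 637 nm, m^-1
    # Scale to 526 nm with the wavelength ratio 637/526.
    return sigma_637 * (LAMBDA_MAAP / LAMBDA_PSAP)


sigma_example = maap_ebc_to_sigma_abs(1.0)  # roughly 8e-6 m^-1
```

With these numbers, 1 µg m^{−3} of eBC reported by the MAAP corresponds to an absorption coefficient of roughly 8 × 10^{−6} m^{−1} at 526 nm.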

From the short laboratory test, we can establish that all three instruments agree
well and that in particular PSAP1 and PSAP2 behave almost identically, with
slopes very close to the 1:1 line. If we use the primary output of the
MAAP, which is given as eBC mass concentration, we can derive a MAC value for
the PSAP that harmonizes the PSAP and MAAP mass concentration observations.
Fig. 3 presents the statistical
analysis of this MAC value, which centers around
9.4 m^{2}g^{−1}. This is within the range of literature
values from a European survey performed by Zanatta et al. (**2016**).
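One simple way to obtain such a harmonizing MAC is to take the ratio of the PSAP absorption coefficient to the MAAP eBC mass for each paired sample and summarize the ratios. The sketch below uses the median of these ratios; this choice, the function name, the input unit, and the sample numbers are assumptions for illustration and not necessarily the statistic shown in Fig. 3.

```python
from statistics import median


def harmonizing_mac(sigma_abs_psap, ebc_maap_ug_m3):
    """Median ratio of PSAP absorption (m^-1) to MAAP eBC mass
    (assumed ug m^-3), returned in m^2 g^-1."""
    ratios = [
        sigma / (ebc * 1e-6)          # m^-1 / (g m^-3) = m^2 g^-1
        for sigma, ebc in zip(sigma_abs_psap, ebc_maap_ug_m3)
        if ebc > 0                    # skip samples without eBC signal
    ]
    return median(ratios)


# Made-up paired samples, constructed to give a ratio of 9.4 m^2 g^-1.
sigma_samples = [9.4e-6, 4.7e-6, 1.88e-5]
ebc_samples = [1.0, 0.5, 2.0]
mac_example = harmonizing_mac(sigma_samples, ebc_samples)
```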

### Comparing size resolved σ_{abs} with statistical correlation derived σ_{abs}

The second phase was performed between 11-01-2019 and 25-02-2019. In Fig. 4 the total σ_{abs}
observed by PSAP1 is presented as hourly averages (blue
line) together with the hourly integrated value for PSAP2 (red
dots). An interesting feature in the time series is that between
24-01-2019 23:54 and 25-01-2019 08:50 a smoke plume from a nearby fire
(at about 5 km distance) reached the inlet and resulted in
a strong signal.

The DMPS was configured to scan 15 positions in 60 minutes (4 minutes per voltage setting). Due to the parameter settings used, the smallest voltage (first bin) was always 0 V and the smallest size measured (second bin) was 13 nm. The last bin was near the maximum voltage of the system and this setting did not provide a stable measurement. Therefore, the largest size reported here is 406 nm. Hence, the thirteen size bins used are: 13, 17, 23, 31, 41, 55, 73, 97, 129, 172, 229, 305, and 406 nm mobility diameter.
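The thirteen listed diameters appear to be evenly spaced on a logarithmic axis between 13 and 406 nm. This is an inference from the listed values rather than something stated explicitly, so the sketch below simply checks that assumption and derives the corresponding bin width dlogD_{p}; the function name is hypothetical.

```python
import math

# Bin diameters in nm as listed in the text.
LISTED_BINS_NM = [13, 17, 23, 31, 41, 55, 73, 97, 129, 172, 229, 305, 406]


def log_spaced_bins(d_min, d_max, n):
    """Return n diameters evenly spaced in log space from d_min to d_max."""
    step = math.log(d_max / d_min) / (n - 1)
    return [d_min * math.exp(i * step) for i in range(n)]


bins = log_spaced_bins(13.0, 406.0, 13)
rounded_bins = [round(d) for d in bins]

# Width of one bin on a log10 diameter axis (assumes equal log spacing).
dlog_dp = math.log10(406.0 / 13.0) / 12
```

Rounding the log-spaced values to whole nanometers reproduces the listed bins, and the resulting dlogD_{p} of about 0.125 is the width needed to convert dN/dlogD_{p} to a per-bin number.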

Due to the time lag between setting the voltage in the DMA and the detection of
particles in the sensors, the initial task was to adjust the time series
accordingly. This was readily performed by visually inspecting the time series
and harmonizing the oscillation due to the scanning of the DMA voltage with the
CPC and PSAP2 signals. In this process, it became apparent that the zero-voltage
bin presented a small but persistent negative average signal in
σ_{abs}; an example is shown in Fig. S2. The exact reason for this effect is not resolved
(see Supplement for more discussion). To reduce this small
effect, an average absorption coefficient
(1.096 × 10^{−8}
m^{−1}) derived from all zero-voltage data was added to the
PSAP2 signal. A total of 892 scans (hourly time stamps) were
available from the second phase.

For each size bin scanned (D_{p}), the raw counts
ΔN from the CPC in the DMPS are used by the inversion software to
calculate the aerosol number size distribution
(dN/dlogD_{p}). The calculations take into account the
geometric dimensions of the DMA, the sample and sheath flows, the transfer function,
and the single and multiple charge statistics. By multiplying the derived
dN/dlogD_{p} with dlogD_{p} we can directly compare the raw
ΔN input with the inverted dN output for each size bin. The ratio
dN/ΔN replaces the complete inversion routine with a single
transformation factor as a function of particle size. Close inspection of this
factor shows that it changes only slightly for a given size over the measuring
period. The median, mean and quartiles of this factor are presented in Fig. 5.
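A minimal sketch of how such a per-bin transformation factor could be computed from inverted and raw data is given below. The data layout (one tuple per bin per scan) and the function names are assumptions for illustration, and the numbers in the example are made up.

```python
from statistics import median


def transformation_factor(dn_dlogdp, dlogdp, delta_n):
    """Ratio dN / delta_N for one bin of one scan, or None if no raw counts."""
    if delta_n <= 0:
        return None
    return dn_dlogdp * dlogdp / delta_n


def median_factor_per_bin(scans):
    """scans: list of scans, each a list of (dn_dlogdp, dlogdp, delta_n)
    tuples, one tuple per size bin. Returns the median factor per bin."""
    n_bins = len(scans[0])
    factors = []
    for b in range(n_bins):
        values = [transformation_factor(*scan[b]) for scan in scans]
        values = [v for v in values if v is not None]
        factors.append(median(values) if values else None)
    return factors


# Two made-up scans with two bins each.
example_scans = [
    [(800.0, 0.125, 50.0), (400.0, 0.125, 10.0)],
    [(1600.0, 0.125, 100.0), (480.0, 0.125, 10.0)],
]
example_factors = median_factor_per_bin(example_scans)
```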

Hence, by using the factor presented in Fig.
5, it is possible to directly approximate the ambient dN based on the
observed ΔN behind the DMA without using the inversion routine. To our
knowledge there is no specific inversion routine for absorption coefficients
measured behind a DMA using ambient aerosols. We therefore simply assume that
the particles responsible for the light absorption in the PSAP2 can be scaled
using the same transformation factors as presented in Fig. 5. This is based on the assumption that particles in
a specific mobility bin can be corrected using the same factor irrespective of how
the light-absorbing material is distributed among these particles (as
cores, as individual particles, etc.). Hence, particles with or without
light-absorbing material are corrected in the same manner, as long as they have
the same mobility size. In theory, this would yield dσ_{abs} for
each size bin, and integrating over the sizes would give a
σ_{abs} value similar to that observed by PSAP1. The integrated PSAP2
data are compared with the PSAP1 signal in Fig.
4. From Fig. 4, it is clear
that the variation in the two data sets is very similar, but there is a
significant difference in magnitude: the integrated
dσ_{abs} value is about a factor of 2-3 greater than the
total measurement by PSAP1. The offset tends to be smaller in the
first half of the experiment and to increase towards the end.
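The scaling and integration step can be sketched as follows: the zero-voltage offset quoted earlier is added to each size-resolved PSAP2 value, the result is multiplied by the transformation factor for that bin, and the bins are summed. The order of these operations, the function name, and the example numbers are assumptions for illustration.

```python
ZERO_OFFSET = 1.096e-8  # m^-1, average zero-voltage signal quoted in the text


def integrated_absorption(sigma_abs_per_bin, factors, offset=ZERO_OFFSET):
    """Scale size-resolved PSAP2 absorption to ambient conditions and sum.

    sigma_abs_per_bin: absorption measured behind the DMA for each bin (m^-1)
    factors: dN / delta_N transformation factor for each bin
    Returns (d_sigma_abs per bin, integrated sigma_abs).
    """
    d_sigma = [
        (sigma + offset) * factor
        for sigma, factor in zip(sigma_abs_per_bin, factors)
    ]
    return d_sigma, sum(d_sigma)


# Made-up two-bin example.
d_sigma_example, total_example = integrated_absorption([1e-8, 2e-8], [2.0, 3.0])
```

The integrated value returned here is the quantity that would be compared against the bulk PSAP1 signal.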

The reason for this offset is not known. To scale the observed absorption for
each size bin to ambient conditions, the transformation factor dN/ΔN
(see Fig. 5) is derived from,
and thus follows, the correction used in the DMA system. However, if the charge
probability distribution for a given mobility size differs between light-absorbing
particles and other particles, this might introduce a bias in the
results. Studies show that fractal particles may differ by some 30% in
charge probability, but certainly by less than a factor of two (Lall et al.
**2006**). The process above
essentially calculates an average absorption coefficient per particle as a
function of size. This single averaged absorbing particle is multiplied by the
ambient dN for the same size, and finally integrated over the size range.
Additionally, the factor dN/ΔN takes only doubly charged particles into
consideration; other multiply charged particles are not accounted for, which
could contribute to the skewed absorption signal of PSAP2
(Cotterell et al. **2020**). Even a few large particles could lead to an
overestimation of the absorption on the filter. This is underlined by the data
collected during and after the fire event (starting 23.01.),
during which many small newly formed particles were measured (see
Fig. 4). Another possible
source of error is that size-dependent optical properties are not considered. In our
procedure, the correction for enhancement in absorption due to light-scattering
particles was performed on the bulk signal and not specifically for each
particle size. Based on theoretical Mie calculations, light absorption
depends on the size of the BC particle and the state of mixing with light-scattering
material. Hence, if the fraction of internally versus externally mixed
particles changes with particle size, this might introduce a difference between
the integrated σ_{abs} based on PSAP2 behind the DMA and the
ambient σ_{abs} measured by PSAP1. For want of more information,
we can only offer the speculations above for the observed offset, together with the
additional analysis provided in the Supplement.

The overall hypothesis is that the total absorption signal
can realistically be attributed to different particle sizes according to the
correlation between specific DMPS size bins and the observed total absorption. In
Fig. 6, the normalized average
distribution observed by PSAP2 downstream of the DMA is compared with the
distribution derived from statistical methods using PSAP1 and the number size
distributions. The statistical distribution is generated in four steps.
One, the Pearson correlation coefficient (r) is calculated between
PSAP1 and each size bin of the DMPS data. Two, negative values are set to
zero (r ≥ 0). Three, the correlation
coefficient is transformed to a scaling factor that is related to r^{2}
(Hull **1927**).

The reason for this factor is that the correlation by itself does not serve well
as a prognostic probability variable. Hull (**1927**) investigated the predictability of the
correlation coefficient and found that the adjustment in Eq. (1) provided
better predictive skill (see also Supplement). Four, the
distribution is normalized to one. As can be seen in Fig. 6, the normalized f_{s} distribution
(based on the Pearson correlation using PSAP1) and the median
size-resolved observations (using PSAP2) match well with respect
to the main mode, but the mode of the correlation-based distribution
is less pronounced. This correlation distribution therefore
suggests a larger contribution from smaller particles. This may reflect
causality, but also the fact that the absorption signal per particle in each size bin
decreases strongly with decreasing particle size. Hence, the signal-to-noise
ratio of PSAP2 behind the DMA is too small for small particles unless there
are very many of them.
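The four steps above can be sketched in a few lines. Since Eq. (1) is not reproduced in this text, the sketch assumes that the scaling factor takes the form of Hull's index of forecasting efficiency, f_{s} = 1 − √(1 − r^{2}); this form is an assumption and should be checked against the original equation. All function names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def hull_factor(r):
    """ASSUMED form of the scaling factor: 1 - sqrt(1 - r^2).
    Negative correlations are set to zero first (step two)."""
    r = min(max(r, 0.0), 1.0)  # clamp, also guards against rounding above 1
    return 1.0 - math.sqrt(1.0 - r * r)


def absorption_fraction_by_bin(sigma_abs, bin_series):
    """sigma_abs: time series of total absorption.
    bin_series: one time series of dN/dlogDp per size bin.
    Returns the normalized distribution f_s (sums to one)."""
    factors = [hull_factor(pearson_r(series, sigma_abs)) for series in bin_series]
    total = sum(factors)
    return [f / total for f in factors] if total > 0 else factors


# Made-up example: three bins with r = 1, r = -1 and r = 0.8.
sigma_series = [1.0, 2.0, 3.0, 4.0]
example_bins = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 4.0]]
f_s_example = absorption_fraction_by_bin(sigma_series, example_bins)
```

Under this assumed transformation, a bin with r = 0.8 receives a weight of only 0.4 relative to a perfectly correlated bin, which illustrates how moderate correlations are down-weighted compared with using r directly.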

## Statistical approach applied to long-term DMPS and eBC data from Svalbard

The first step in this process was to merge the DMPS data and the PSAP data
with observations of ambient relative humidity (RH). The last
variable was used to screen out data from high-RH conditions. This was done because the
aerosol inlet used during the sampling period was not a true whole-air inlet. High
ambient RH is indicative of current or recent cloud-processed air that may
influence the results. We selected a threshold of 95% hourly averaged RH to
screen the data from possible significant interference by clouds. A second step was
to exclude the biomass burning plume from early May 2006 (Stohl et al. **2007**). This plume gave such record
concentrations that it displayed anomalies in very many observed variables, and thus
this period (26 April to 5 May) was removed from the data set. After
data reduction, a total of 34240 hourly averages are available, which represents an
additional data reduction of approximately 0.6%.
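The two screening steps can be expressed as a simple per-record filter. The sketch below assumes that the plume window covers 26 April through 5 May 2006 inclusive; the function name and this exact window boundary are assumptions. The last lines reproduce the quoted additional data reduction from the two record counts given in the text.

```python
from datetime import datetime

RH_THRESHOLD = 95.0                 # percent, hourly averaged RH
PLUME_START = datetime(2006, 4, 26)
PLUME_END = datetime(2006, 5, 6)    # exclusive; assumes 5 May is removed in full


def keep_record(timestamp, rh):
    """Return True if an hourly record passes both screening steps."""
    if rh >= RH_THRESHOLD:
        return False                # possible in-cloud or cloud-processed air
    if PLUME_START <= timestamp < PLUME_END:
        return False                # biomass burning plume period
    return True


# Record counts before and after removing the plume period (from the text).
records_before, records_after = 34432, 34240
extra_reduction = (records_before - records_after) / records_before
```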

### Clustering of DMPS data

As an alternative to grouping the remaining data according to the months of the
year, we instead used the strategy to group the size distributions using the
technique of clustering. This has proven a useful tool for sorting size
distributions according to the stage of the aerosol life cycle, rather than the date
of the year (Tunved and Ström **2019**). Of course, some clusters are very much
linked to particular seasons due to available sunlight or transport patterns and
associated meteorological history and source profiles.

In this study, we have used a MATLAB version (R2018b, Statistics and
Machine Learning Toolbox) of k-means clustering
(kmeans.m) to perform a clustering of the hourly size
distributions. The mathematical function will maximize the inter-cluster
variability while at the same time minimizing the intra-cluster variability.
Although there are tools available to aid in the selection of the number of
clusters, the choice is in the end made subjectively by the user. Larger data
sets can potentially resolve more clusters that are unique and meaningful, and
the best mathematical solution for the optimal number of clusters is not necessarily
the best in terms of providing useful information. For this data set we
initially worked with 12 clusters (both fewer and more clusters were
explored, but 12 clusters proved the best balance between cluster
number and cluster information). This concerns not only the actual
number size distribution properties, but also associated
parameters such as diurnal and seasonal variability of occurrence.
For a more detailed description of the rationale for deriving the optimal number of
clusters, see Tunved and Ström (**2019**). The clustering was performed on hourly
averaged data, using “max iterations” of 10 000 and
“number of replicates” set to 10 in MATLAB. The distance
function applied was the squared Euclidean distance, where the distance is
calculated from the centroids defined as the mean of the points in each cluster,
d(x,c)=(x − c)(x − c)′,
where x is an observation (i.e. the size distribution vector)
and c is the centroid. The squared Euclidean distance
emphasizes extreme situations more than other measures of distance. Therefore, sporadic events
such as new particle formation are easily accentuated. For more information
on the clustering approach see Supplement.
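For readers without access to MATLAB, the clustering settings described above (squared Euclidean distance, mean centroids, an iteration cap, and several replicates from which the lowest-cost solution is kept) can be sketched with a plain Lloyd-type k-means. This is a simplified stand-in with random initialization, not the MATLAB routine used in the study, and the function names are hypothetical.

```python
import random


def sq_euclid(x, c):
    """Squared Euclidean distance d(x, c) = (x - c)(x - c)'."""
    return sum((a - b) ** 2 for a, b in zip(x, c))


def kmeans(points, k, max_iter=10000, replicates=10, seed=0):
    """Return (labels, centroids) of the replicate with the lowest total
    within-cluster squared distance."""
    rng = random.Random(seed)
    best = None
    for _ in range(replicates):
        # Random initialization: pick k distinct observations as centroids.
        centroids = [list(p) for p in rng.sample(points, k)]
        labels = None
        for _ in range(max_iter):
            new_labels = [
                min(range(k), key=lambda j: sq_euclid(p, centroids[j]))
                for p in points
            ]
            if new_labels == labels:
                break  # assignments are stable, so the run has converged
            labels = new_labels
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:  # keep the old centroid if a cluster is empty
                    centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        cost = sum(sq_euclid(p, centroids[lab]) for p, lab in zip(points, labels))
        if best is None or cost < best[0]:
            best = (cost, labels, centroids)
    return best[1], best[2]


# Made-up example: two well separated groups of "size distributions".
example_points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
example_labels, example_centroids = kmeans(example_points, k=2)
```

In practice each observation would be an hourly size distribution vector with one element per DMPS size bin, and k would be set to 12.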

The 12 different clusters are presented in Fig. 7. Cluster 1 is a common cluster that represents the lowest number densities with a dominating accumulation mode. This is interpreted as an aged and cloud-processed air mass. Clusters 2 through 6 are dominated by the smallest particles and are interpreted as the evolution of new particle formation and subsequent growth. Clusters 7 through 9 are dominated by Aitken mode particles, but the integral number density is less than in clusters 2 through 6 and the distributions show a more developed accumulation mode. Clusters 10 through 12 are dominated by the accumulation mode. In this context, these clusters are interpreted as long-range transport of polluted air corresponding to Arctic haze situations. To simplify the analysis and to increase the statistical material in each group, clusters 2 through 7 are grouped into one category referred to as “Nucleation”, clusters 8 through 9 are grouped into one category referred to as “Intermediate”, and clusters 10 through 12 are grouped into one category referred to as “Polluted”. Cluster 1 is kept as it is and referred to as “Washout”. These four categories are presented in Fig. 8 (see Table 1 for details).

The seasonal variation of the four categories is presented in Fig. 9. The Washout category is present
throughout the year, with a decrease during the summer and an enhanced peak during
the cleanest months of the year, September through November. The Nucleation
category is the least frequent of the four categories and is essentially only
present during the most sunlit period of the year between May and August. The
Intermediate category is also mainly present during the summer, but with a
somewhat broader distribution compared to the Nucleation category. Finally, the
Polluted category is a winter and spring phenomenon which peaks in March and
April. A similar breakdown of each group over time of day revealed a strong
diurnal preference for Group 2 (Nucleation), with a clear maximum
of members found between 12:00 and 18:00 UTC (see Fig. S5). This in turn is suggestive of local
processes linked to the intensity of solar radiation. The other three groups showed
small (Intermediate group) or insignificant (Washout and
Polluted groups) diurnal preference. We can view the Washout category as
something of a baseline distribution and the other three as superimposed
perturbations. The Nucleation and Intermediate categories are attributed to the
seasonal variation of sunlight in the Arctic region, whereas the Polluted
category is attributed to the seasonal pattern of long-range transport. Hence,
these two main processes serve as complements to the Washout category. In fact, the
Nucleation category is probably more washed out than the Washout category
itself, as the removal of aerosol condensational surface area by precipitation sets
the stage for new particle formation (Tunved et al. **2013**).

### Statistical distribution of integral absorption signal as function of particle size

For each category presented in Fig. 8,
the correlation between the integral PSAP signal and dN/dlogD_{p} is
calculated as described in section four and illustrated in the flowchart in Fig. S1.
This procedure results in a size-dependent normalized absorption signal for each
category. The normalized correlation is further multiplied by the median signal
for the entire category in units of m^{−1} and divided by the
MAC value of 9.4 m^{2}g^{−1} derived in Section 4. Hence, correlations are
transformed into dM_{eBC}/dlogD_{p} mass distributions, which
are presented in Fig. 10 as
dM_{eBC}/dlogD_{p} distributions normalized to the maximum
value. In the derivation of the normalized dM_{eBC}/dlogD_{p}
distributions, the correlations were calculated both for the
complete data set and for bootstrapped data sets. The bootstrapping procedure was
performed on 5% subsections of paired NSD and PSAP data, and the resulting
dM_{eBC}/dlogD_{p} distributions were pooled, giving a range of
distributions indicated by the error bars in Fig. 10.
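The conversion from a normalized correlation distribution to an eBC mass distribution amounts to a multiplication and a division, sketched below. The function names, the output unit (ng m^{−3}), and the example values are assumptions for illustration; whether a further division by dlogD_{p} is applied is not specified here, so the sketch returns the mass attributed to each bin.

```python
MAC_PSAP = 9.4  # m^2 g^-1, value derived from the laboratory comparison


def ebc_mass_per_bin(f_norm, sigma_abs_median):
    """Distribute a median absorption coefficient (m^-1) over size bins.

    f_norm: normalized correlation distribution (sums to one)
    Returns eBC mass per bin in ng m^-3.
    """
    total_mass_ng = sigma_abs_median / MAC_PSAP * 1e9  # g m^-3 -> ng m^-3
    return [f * total_mass_ng for f in f_norm]


def normalize_to_peak(values):
    """Scale a distribution so that its maximum equals one."""
    peak = max(values)
    return [v / peak for v in values] if peak > 0 else list(values)


# Made-up example: a median absorption of 9.4e-8 m^-1 equals 10 ng m^-3 eBC.
mass_example = ebc_mass_per_bin([0.25, 0.5, 0.25], 9.4e-8)
shape_example = normalize_to_peak(mass_example)
```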

To date, only a few size-resolved observations from Svalbard are
available. Ohata et al. (**2019**) presented an average distribution over an approximately
two-week period in March 2017 based on SP2 observations conducted at the
Zeppelin station. The SP2 mass median diameter (MMD) derived
from the incandescence signal was 228 nm and the geometric standard
deviation (σ_{g}) was 1.74. The average integral
rBC mass was 28 ng m^{−3}. Zanatta et al. (**2018**) presented average
parameters for a short campaign conducted at the Zeppelin station in 2012
between 22 March and 11 April. They reported an MMD of 251 nm, a
σ_{g} of 1.22 and an average rBC of 39
ng m^{−3} (median 37
ng m^{−3}). With respect to the time of the year, both
these observations would best fit the Polluted category (cf. Fig. 9), but the relatively low
integral values would better fit some of the other categories. Raatikainen et
al. (**2015**) reported
observations from Pallas (in northern Finland) during the winter
2011-2012. For this season the geometric mean diameter was 199 nm with
σ_{g}=1.7, and an integral value of 27
ng m^{−3}. Other SP2 observations related to the Arctic were
reported by Liu et al. (**2015**), based on aircraft data in March 2013
between northern Norway and Svalbard. They observed an MMD range of
190-210 nm and a σ_{g} range of 1.55-1.65; integral values
varied greatly, from about 20 to above 100 ng m^{−3},
depending on the origin of the air mass. Lower integral values (typically
below 10 ng m^{−3}) were observed by Taketani et al.
(**2016**) from a
cruise near the Bering Strait in September 2014. However, the modal parameters were
similar to the studies mentioned above: the MMD ranged from about
170 to 190 nm and σ_{g} was about 1.8.

Despite differences in area, integral concentration, altitude,
and time of year, the mass distributions observed by the SP2 are
rather similar across the studies, with an MMD around 200 nm and a
σ_{g} around 1.7. This typical
SP2 dM/dlogD_{p} distribution is included in Fig. 10, normalized to a peak value of
1, for comparison with the statistically derived
dM_{eBC}/dlogD_{p}. Statistics on the derived eBC mass are given in Table
1. The
calculated mass varies substantially between the clusters, from a median
of around 9.6 ng m^{−3} (quartile range
3.8-19.4 ng m^{−3}) in the Washout group to around 50 ng
m^{−3} (18.0-110.4 ng m^{−3})
for the Polluted group. These values are in the same range as the
SP2 observations reported above.
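
As an illustration of the typical SP2 reference curve described above, the following minimal sketch evaluates a lognormal dM/dlogD_{p} shape with an MMD of 200 nm and a σ_{g} of 1.7, scaled so that the peak equals 1. The function name `lognormal_dlogdp` and the diameter grid are hypothetical choices made for this example only.

```python
import math

def lognormal_dlogdp(dp_nm, median_nm=200.0, sigma_g=1.7):
    """Lognormal dM/dlogDp shape, scaled so that the peak value equals 1."""
    z = math.log10(dp_nm / median_nm) / math.log10(sigma_g)
    return math.exp(-0.5 * z * z)

# Logarithmically spaced diameters from 10 nm to about 1000 nm (20 bins per decade).
diameters_nm = [10 ** (1 + 0.05 * i) for i in range(41)]
reference = [lognormal_dlogdp(d) for d in diameters_nm]
```

Because the curve is symmetric in log D_{p}, the value at 100 nm equals the value at 400 nm, which is a quick sanity check on any implementation.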

The characteristic modal parameters from the SP2 studies reported above best fit
the Intermediate category. From Fig. 10,
it is evident that the majority of the derived dM_{eBC}/dlogD_{p}
distributions have a shape that agrees well with the SP2 data. The derived
mass distributions for the Washout and Polluted groups are situated on either
side of the SP2 range: the Washout group peaks at around 220-250 nm and the
Polluted group at around 180 nm. The normalized eBC distributions for
the Nucleation and Intermediate groups peak below the typical SP2 range
(at 70 nm and 130 nm, respectively). It should be
noted that 77% of the data belong to the Washout and Polluted groups, which
suggests that the statistical method is consistent with the SP2 distributions for
a large portion of the data.

To test how consistently the derived eBC mass distributions relate to the
DMPS size distributions, we convert the dM_{eBC}/dlogD_{p}
distributions to dN_{eBC}/dlogD_{p} distributions. For this test
we assume that eBC is externally mixed and that the density decreases with
increasing particle size. The density function assumes that BC particles
consist of agglomerates of primary spheres with a diameter (a)
of 20 nm and a density (ρ_{prim}) of 2
g cm^{−3}, and a fractal dimension
(D_{f}) of 2.5. A value of 2.5 is chosen to represent a
generally aged aerosol type. Fresh BC particles are very fluffy, with
D_{f} around 1.8, but with aging these particles become more compact
and D_{f} can increase well above 2 (e.g. Colbeck et al. **1990**; Khalizov et al. **2013**). For instance, at the
remote Pico Mountain observatory in the Azores, approximately 70% of the
BC particles were found to be highly compact with a D_{f} > 2.67
(China et al. **2015**).
The D_{f} = 2.5 used in this study is an assumed property. Using
a different D_{f} will mainly influence the number density of the
accumulation mode particles. The effect of using different
D_{f} values is illustrated in Fig. S6.
The number of primary spheres (N) is related to the fractal
dimension through the so-called radius of gyration (R_{g}) as

It has been shown that the electrical mobility diameter measured by the DMPS is
reasonably well represented by R_{g} for fractal agglomerates
(Sorensen **2011**).
Hence, we simplify the size dependent density by comparing the mass of N primary
spheres for agglomerates of size 2 R_{g} with the mass of
compact spheres of diameter D_{p}. This is simply Eq. (2) divided by
(D_{p}/(2a))^{3}. Assuming
that 2 R_{g} = D_{p}, the size dependent density
(ρ_{BC}(D_{p})) can take the simple
form

Incidentally, this gives a size dependent effective density similar to that
observed by Rissler et al. (**2013**) for a heavy duty engine at idling conditions.
Using the eBC mass distributions presented in Fig. 10 and Eq. (3),
dN_{eBC}/dlogD_{p} distributions can be inferred. These are
presented in Fig. 11 together with the DMPS
observed size distributions (as in Fig.
8) for comparison.
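
The relations referred to as Eqs. (2) and (3) are not reproduced in this version of the text. As a hedged reconstruction only, the standard fractal scaling law for agglomerates, combined with the steps described above, would give the expressions below. The prefactor k_{0} (of order unity) is an assumption and is not specified in the text. In the standard form a denotes the primary sphere radius, whereas the text above quotes a as a diameter; this sketch does not resolve that difference.

```latex
% Eq. (2): standard fractal scaling law (prefactor k_0 is an assumption)
N = k_0 \left( \frac{R_g}{a} \right)^{D_f}

% Dividing by the number of primary spheres that fill a compact sphere,
% (D_p / (2a))^3, and setting 2 R_g = D_p gives Eq. (3):
\rho_{BC}(D_p) = \rho_{prim} \, k_0 \left( \frac{D_p}{2a} \right)^{D_f - 3}
```

With D_{f} = 2.5 the exponent is −0.5, so the effective density falls off with the square root of D_{p}, in line with the stated assumption that density decreases with increasing particle size.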

It is important to emphasize that the comparison presented in Fig. 11 is highly simplified and that the
amplitude of dN_{eBC}/dlogD_{p} is inversely proportional to the
MAC value used and depends on the assumptions made in Eqs. (2) and
(3). The
assumption of an external mixture is not made to capture the actual conditions
at a remote location such as Svalbard, but rather to represent the limiting case
of the smallest number of particles required to account for the eBC mass in each size
bin. Ideally, the inferred eBC NSD should be less than or, at most, equal to the
size distribution observed by the DMPS.
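
The conversion from mass to number, and the consistency check against the DMPS, can be sketched as follows. This is a minimal illustration that assumes the size dependent density has the form ρ_{prim} k_{0} (D_{p}/(2a))^{D_{f}−3} with a prefactor k_{0} = 1 (an assumption, not given in the text) and takes a = 20 nm as quoted above. For D_{p} < 2a this expression exceeds ρ_{prim}, which the sketch does not correct. The function names and the three-bin example values are hypothetical.

```python
import math

RHO_PRIM = 2.0   # g cm^-3, primary sphere density (from the text)
A_NM = 20.0      # nm, primary sphere size a as quoted in the text
D_F = 2.5        # fractal dimension (from the text)
K0 = 1.0         # prefactor of the scaling law (assumed, not given in the text)

def rho_bc(dp_nm, a_nm=A_NM, d_f=D_F, k0=K0, rho_prim=RHO_PRIM):
    """Size dependent effective density in g cm^-3, assuming 2*R_g = D_p."""
    return rho_prim * k0 * (dp_nm / (2.0 * a_nm)) ** (d_f - 3.0)

def mass_to_number(dm_ng_m3, dp_nm):
    """Convert dM/dlogDp (ng m^-3) to dN/dlogDp (cm^-3) for one size bin."""
    volume_cm3 = math.pi / 6.0 * (dp_nm * 1e-7) ** 3   # 1 nm = 1e-7 cm
    particle_mass_g = rho_bc(dp_nm) * volume_cm3
    dm_g_cm3 = dm_ng_m3 * 1e-15                        # 1 ng m^-3 = 1e-15 g cm^-3
    return dm_g_cm3 / particle_mass_g

def bins_exceeding_dmps(dp_nm, dm_ebc, dn_dmps):
    """Return the diameters where the inferred eBC number exceeds the DMPS number."""
    return [d for d, m, n in zip(dp_nm, dm_ebc, dn_dmps) if mass_to_number(m, d) > n]

# Hypothetical three-bin example (values invented for illustration only).
dp = [50.0, 100.0, 200.0]
dm = [0.5, 2.0, 10.0]       # ng m^-3
dmps = [3.0, 100.0, 80.0]   # cm^-3
flagged = bins_exceeding_dmps(dp, dm, dmps)
```

In this invented example only the 50 nm bin is flagged, which mimics the kind of violation discussed for small particles; real data would of course use the full size grid.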

From Fig. 11 it is clear that the
Polluted category does not always satisfy this condition and that the eBC NSD can
significantly exceed the DMPS distribution for particles smaller than about
100 nm. The other three categories are within bounds, but present very
different features. The main eBC modes in the Washout and Nucleation categories
are located above and below 100 nm, respectively. Whereas the Washout
dN_{eBC}/dlogD_{p} mode is essentially co-located with the
main mode based on DMPS observations, the Nucleation category presents the eBC
mode between the main modes observed by the DMPS. The overall shape of the Intermediate
category eBC NSD resembles the shape of the DMPS NSD, with an enhanced peak of
small particles around 40 to 50 nm.

The Polluted category is particularly interesting as it displays two eBC NSD
modes of comparable amplitude. The larger mode peaks just above 100 nm,
which is between the two DMPS modes at 150 nm and 40 nm, whereas
the smaller eBC mode is around 50 nm. This smaller eBC mode stands out
as its range significantly exceeds the smaller DMPS mode. Whereas the
three other categories present plausible derived
dN_{eBC}/dlogD_{p} distributions, the small mode of the
Polluted category is not realistic. Despite this, it is an interesting mode
since it is linked in some way to the observed variability in particle light
absorption and polluted air masses. Further research, outside the scope of this
study, is needed to resolve this feature.

## Summary and conclusions

This study was conducted in two parts, one laboratory experiment to compare size
resolved measurements of σ_{abs} to independent integral
observations of σ_{abs}, and one statistical application to derive
size resolved eBC from long-term observation of σ_{abs} and aerosol
size distributions in the Arctic. In the laboratory study it was first established
that the two PSAP instruments used in the study agreed very well and that, by using a
scaling factor MAC = 9.4 m^{2} g^{−1}, the two PSAP
instruments agreed closely with the MAAP measurements both in numerical value and
in temporal variation when all instruments were operating in parallel.

One of the PSAP instruments was placed downstream of the DMA in parallel to the CPC
in the DMPS system, while the second PSAP instrument continued to measure the
ambient air directly. By assuming that the σ_{abs} signal measured
behind the DMA could be inverted with the same factor as the CPC signal (the combined
effect of charge probability and transfer function), a direct measurement
of the size dependent σ_{abs} was obtained. This size dependent
incremental light absorption was integrated and compared to the total measurement.
Overall, the two measurements correlated well over time, but the integrated value
was more than a factor of two greater. The exact reason for this is not known and we
can only speculate on the cause. One possibility is that the assumption that light
absorbing particles can be corrected in the DMPS system in the same way as other
particles detected by the CPC is not valid, so that the problem is related to the
sampling of light absorbing particles. The other possibility is that the optical
response of the particles on the filter differs when the measurements are made for
limited size ranges rather than for the bulk aerosol.

The observed size dependent σ_{abs} was normalized to unity and
compared to the statistically derived distribution based on about 892 hourly
averages. The derived distribution is essentially based on the correlation between
individual size bins of the particle size distribution and the total observed
σ_{abs}. The modes were located at similar diameters, between 220 and
250 nm, but the widths of the distributions differed. The
σ_{g} was approximately 1.8 for the observed distribution and
2.8 for the derived distribution. Either the observed distribution is narrower
because of measurement limitations above and below the mode (at small sizes
very little particle material is available downstream of the DMA), or the
derived distribution is broader because the distribution of correlation coefficients
is not accentuated enough by Eq.
(1).

Encouraged by the co-location of the observed and derived modes of the normalized
σ_{abs} distributions (Fig. 6), the statistical method was applied to a large data set
from the Zeppelin station, Svalbard. The data cover the years 2002-2010 and were
screened to exclude RH above 95% in order to reduce the effect of in-cloud measurements. A
total of 34240 hourly averaged data points of concurrently measured size
distributions, σ_{abs}, and RH were available. The data were
initially clustered into 12 groups based on the particle size distributions. These
were further grouped into four categories named Washout, Nucleation, Intermediate,
and Polluted.

Each of the categories represents a unique derived dM_{eBC}/dlogD_{p}
with different characteristics, which in shape resembles available SP2 observations.
Whereas the Washout, Nucleation and Intermediate categories present plausible derived
dN_{eBC}/dlogD_{p} distributions, this is not the case for the
small particle eBC mode of the Polluted category. The eBC NSD mode around
50 nm is often overestimated compared to the DMPS NSD, which is an
intriguing observation. The mode clearly shows up as a result of the linkage to
variations in light absorbing particles, but requires further research. It is
important to emphasize that the amplitude of dN_{eBC}/dlogD_{p} is
inversely proportional to the MAC value used: a greater MAC value will decrease the
dN_{eBC}/dlogD_{p}. A systematic over- or under-estimation of
σ_{abs} will directly affect the result, as will the
assumptions made about the size dependent density or light absorbing properties. Because this
calculation is based on the assumption that eBC is externally mixed, the inferred
dN_{eBC}/dlogD_{p} represents the minimum number of particles
containing eBC.
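
The dependence on MAC can be made concrete with a short sketch. It assumes the usual relation eBC = σ_{abs}/MAC, with σ_{abs} in Mm^{−1} and MAC in m^{2} g^{−1}; the function name `ebc_ng_m3` and the σ_{abs} value of 0.47 Mm^{−1} are hypothetical and chosen only for illustration.

```python
def ebc_ng_m3(sigma_abs_Mm, mac_m2_g=9.4):
    """Equivalent BC mass (ng m^-3) from sigma_abs (Mm^-1) and MAC (m^2 g^-1)."""
    # 1 Mm^-1 = 1e-6 m^-1; dividing by MAC gives g m^-3; 1 g = 1e9 ng.
    return sigma_abs_Mm * 1e-6 / mac_m2_g * 1e9

low_mac = ebc_ng_m3(0.47, mac_m2_g=9.4)    # MAC used in this study
high_mac = ebc_ng_m3(0.47, mac_m2_g=18.8)  # doubled MAC, for comparison only
```

Doubling the MAC halves the derived eBC mass, and therefore also halves every bin of the inferred dN_{eBC}/dlogD_{p}.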

Based on the investigations above we can draw the following conclusions: The
comparison between the PSAP and MAAP instruments used in this study shows excellent
agreement, and the scaling factor MAC between PSAP and MAAP was
9.4 m^{2} g^{−1}.

The location of the mode of the size dependent σ_{abs} agrees very
well (within about 10%) between the statistically derived
distribution and that observed using a combination of PSAP and DMA. However, the
σ_{g} is larger for the statistical distribution than for the
observed distribution, and the amplitude of the integrated observed distribution is
2-3 times larger than the observed total σ_{abs}. The latter
discrepancy is not resolved and needs further investigation.

The statistical approach applied to long-term Arctic data presents more variability
in the derived dM_{eBC}/dlogD_{p} between the different groups than
has been reported for rBC observed using the SP2 instrument. On the other hand,
both the Washout and Polluted groups (76% of the data) are
associated with dM_{eBC}/dlogD_{p} distributions that agree very
well with typical SP2 distributions.

The category labelled Polluted is particularly interesting because the derived eBC NSD does not agree well with the DMPS NSD for small particles around 50 nm. This indicates that, for clusters belonging to the Polluted category, at least some cases show a correlation with particles around 50 nm that are not necessarily light absorbing eBC particles. More in-depth analysis of this category is of particular interest. It is nevertheless concluded that the shape of the derived eBC mass distribution still agrees very well with observed SP2 data.

This study demonstrated the feasibility of using the statistical relation between the observed size distribution and the light absorption to gain insight into particle size dependent properties where direct observations are not available, which is especially useful for the analysis of historical data. Here, we used clustered distributions merged into four categories, but many other ways of grouping the data are possible, e.g. by season, by optical properties, or by linking with trajectories and transport patterns.

## Additional File

The additional file for this article can be found as follows:

Supplemental File 1. Basic flow chart of the working order. DOI: https://doi.org/10.1080/16000889.2021.1933775.s1