
Commit 7188e77

correct sectioning for PDF version + references section
1 parent de8ebb8 commit 7188e77

1 file changed: published-202301-chagneux-macrolitter.qmd (54 additions, 48 deletions)
@@ -77,7 +77,7 @@ format:
 jupyter: python3
 ---
 
-## Introduction
+# Introduction
 
 
 Litter pollution concerns every part of the globe. Each year, almost ten
@@ -301,9 +301,9 @@ plt.show()
 ```
 
 
-## Related works
+# Related works
 
-### AI-automated counting
+## AI-automated counting
 
 Counting from images has been an ongoing challenge in computer vision. Most
 works can be divided into (i) detection-based methods where objects are
@@ -318,7 +318,7 @@ achieve better counts at every frame, none of these methods actually attempt
 to produce global video counts.
 
 
-### Computer vision for macro litter monitoring
+## Computer vision for macro litter monitoring
 
 Automatic macro litter monitoring in rivers is still a relatively nascent
 initiative, yet there have already been several attempts at using DNN-based
@@ -338,7 +338,7 @@ proposed to count litter directly in videos.
 
 
 
-### Multi-object tracking
+## Multi-object tracking
 
 Multi-object tracking usually involves object detection, data association and
 track management, with a very large number of methods already existing before
@@ -370,16 +370,16 @@ objects requires a new movement model, to take into account missing detections
 and large camera movements.
 
 
-## Datasets for training and evaluation
+# Datasets for training and evaluation
 
 Our main dataset of annotated images is used to train the object detector.
 Then, only for evaluation purposes, we provide videos with annotated object
 positions and known global counts. Our motivation is to avoid relying on
 training data that requires this resource-consuming process.
 
-### Images
+## Images
 
-#### Data collection
+### Data collection
 
 With help from volunteers, we compile photographs of litter stranded on river
 banks after increased river discharge, shot directly from kayaks navigating at
@@ -389,7 +389,7 @@ The resulting pictures depict trash items under the same conditions as the
 video footage we wish to count on, while spanning a wide variety of
 backgrounds, light conditions, viewing angles and picture quality.
 
-#### Bounding box annotation
+### Bounding box annotation
 
 For object detection applications, the images are annotated using a custom
 online platform where each object is located using a bounding box. In this
@@ -461,9 +461,9 @@ for imgId in imgIds:
     plt.axis('off')
 ```
 
-### Video sequences
+## Video sequences
 
-#### Data collection
+### Data collection
 
 For evaluation, an on-field study was conducted with 20 volunteers to manually
 count litter along three different riverbank sections in April 2021, on the
@@ -476,14 +476,14 @@ videos amount to 20 minutes of footage at 24 frames per second (fps) and a
 resolution of 1920x1080 pixels.
 
 
-#### Track annotation
+### Track annotation
 
 On video footage, we manually recovered all visible object trajectories on
 each river section using an online video annotation tool (more details
 in @sec-video-dataset-appendix for the precise methodology). From that, we
 obtained a collection of distinct object tracks spanning the entire footage.
 
-## Optical flow-based counting via Bayesian filtering and confidence regions
+# Optical flow-based counting via Bayesian filtering and confidence regions
 
 Our counting method is divided into several interacting blocks. First, a
 detector outputs a set of predicted positions for objects in the current
@@ -494,9 +494,9 @@ tracking module, proposing distinct tracks for each object. A final
 postprocessing step only keeps the best tracks which are enumerated to yield
 the final count.
 
-### Detector
+## Detector
 
-#### Center-based anchor-free detection
+### Center-based anchor-free detection
 
 In most benchmarks, the prediction quality of object attributes like bounding
 boxes is often used to improve tracking. For counting, however, point
@@ -525,7 +525,7 @@ detector outputs a set $\mathcal{D}_n = \{z_n^i\}_{1 \leq i \leq D_n}$ where
 each $z_n^i = (x_n^i,y_n^i)$ specifies the coordinates of one of the $D_n$
 detected objects.
 
-### Training {#sec-detector_training}
+## Training {#sec-detector_training}
 
 Training the detector is done similarly as in @Proenca2020.
 
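The context above describes a point-based detector output $\mathcal{D}_n = \{z_n^i\}$. As a hedged sketch of how center-based anchor-free detectors typically turn a predicted heatmap into such point sets (generic CenterNet-style post-processing, not this repository's code):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def extract_centers(heatmap, threshold=0.3):
    """Turn an [H, W] center heatmap into an array of (x, y) peak coordinates.

    A pixel is kept if it is a 3x3 local maximum scoring above `threshold`,
    the usual max-pooling NMS of anchor-free detectors such as CenterNet.
    """
    local_max = maximum_filter(heatmap, size=3) == heatmap
    ys, xs = np.nonzero(local_max & (heatmap > threshold))
    return np.stack([xs, ys], axis=1)  # one row per detected object z_n^i

# Toy usage: a heatmap with two bumps yields two point detections.
hm = np.zeros((8, 8)); hm[2, 3] = 0.9; hm[6, 6] = 0.8
print(extract_centers(hm))  # [[3 2] [6 6]]
```
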
@@ -559,9 +559,9 @@ $$
 \right.
 $$
 
-### Bayesian tracking with optical flow {#sec-bayesian_tracking}
+## Bayesian tracking with optical flow {#sec-bayesian_tracking}
 
-#### Optical flow
+### Optical flow
 
 Between two timesteps $n-1$ and $n$, the optical flow $\Delta_n$ is a mapping
 satisfying the following consistency constraint (@paragios2006):
@@ -576,7 +576,7 @@ coordinate on that grid. To estimate $\Delta_n$, we choose a simple
 unsupervised Gunner-Farneback algorithm which does not require further
 annotations, see @farneback2003two for details.
 
-#### State space model {#sec-state_space_model}
+### State space model {#sec-state_space_model}
 
 Using optical flow as a building block, we posit a state space model where
 estimates of $\Delta_n$ are used as a time and state-dependent offset for the
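
As a sketch of this building block using OpenCV's Farneback implementation (typical parameter values; the frames and the read-out position below are hypothetical, and this is not the repository's code):

```python
import cv2
import numpy as np

# Two consecutive grayscale frames (hypothetical arrays, e.g. read from video).
frame_prev = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
frame_curr = np.random.randint(0, 255, (480, 640), dtype=np.uint8)

# Dense Gunnar Farneback optical flow: flow[y, x] = (dx, dy) between frames.
flow = cv2.calcOpticalFlowFarneback(
    frame_prev, frame_curr, None,
    pyr_scale=0.5, levels=3, winsize=15,
    iterations=3, poly_n=5, poly_sigma=1.2, flags=0,
)

# Read the estimated offset Delta_n at a tracked object's position (x, y);
# this is the time- and state-dependent offset fed to the motion model.
x, y = 320, 240
dx, dy = flow[y, x]
predicted_position = (x + dx, y + dy)
```
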
@@ -604,7 +604,7 @@ centered Gaussian random variables with covariance matrix $R$.
 In the following, $Q$ and $R$ are assumed to be diagonal, and are
 hyperparameters set to values given in @sec-covariance_matrices.
 
-#### Approximations of the filtering distributions
+### Approximations of the filtering distributions
 
 Denoting $u_{1:k} = (u_1,\ldots,u_k)$ for any $k$ and sequence $(u_i)_{i \geq
 0}$, Bayesian filtering aims at computing the conditional distribution of
@@ -649,14 +649,14 @@ observations, as the contribution of $\Delta_k$ in every transition ensures
 that each filter can cope with arbitrary inter-frame motion to keep track of
 its target.
 
-#### Generating potential object tracks
+### Generating potential object tracks
 
 The full MOT algorithm consists of a set of single-object trackers following
 the previous model, but each provided with distinct observations at every
 frame. These separate filters provide track proposals for every object
 detected in the video.
 
-### Data association using confidence regions {#sec-data_association}
+## Data association using confidence regions {#sec-data_association}
 
 Throughout the video, depending on various conditions on the incoming
 detections, existing trackers must be updated (with or without a new
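
As a hedged sketch of the confidence-region association named in this hunk, assuming Gaussian predictive laws with diagonal covariance and a square neighbourhood $V_\delta$ (function names and the threshold are illustrative, not taken from the repository):

```python
import numpy as np
from scipy.stats import norm

def region_mass(mean, var, z, delta):
    """Mass P(i, l) that a 2-D Gaussian predictive law (diagonal covariance)
    puts on the square neighbourhood V_delta(z) around detection z."""
    lo, hi = z - delta, z + delta
    per_axis = norm.cdf(hi, mean, np.sqrt(var)) - norm.cdf(lo, mean, np.sqrt(var))
    return float(np.prod(per_axis))

def associate(predictions, detections, delta=10.0, min_mass=0.5):
    """Greedy matching: each filter l claims the detection with the highest
    predictive mass in its delta-neighbourhood, if that mass is large enough."""
    matches, taken = {}, set()
    for l, (mean, var) in enumerate(predictions):
        scores = [(region_mass(mean, var, z, delta), i)
                  for i, z in enumerate(detections) if i not in taken]
        if scores:
            mass, i = max(scores)
            if mass >= min_mass:
                matches[l] = i
                taken.add(i)
    return matches  # unmatched filters are updated without an observation
```
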
@@ -714,7 +714,7 @@ V_\delta(z_n^i))$, which is standard in modern MOT.
 Visual representation of the tracking pipeline.
 :::
 
-### Counting
+## Counting
 
 At the end of the video, the previous process returns a set of candidate
 tracks. For counting purposes, we find that simple heuristics can be further
@@ -734,7 +734,7 @@ optimized for best count performance (see @sec-tau_kappa_appendix for a
 more comprehensive study).
 
 
-## Metrics for MOT-based counting
+# Metrics for MOT-based counting
 
 Counting in videos using embedded moving cameras is not a common task, and as
 such it requires a specific evaluation protocol to understand and compare the
@@ -743,7 +743,7 @@ even if some do provide insights to assist evaluation of count performance.
 Second, considering only raw counts on long videos gives little information on
 which of the final counts effectively arise from well detected objects.
 
-### Count-related MOT metrics
+## Count-related MOT metrics
 
 Popular MOT benchmarks usually report several sets of metrics such as ClearMOT
 (@bernardin2008) or IDF1 (@RistaniSZCT16) which can account for
@@ -753,7 +753,7 @@ and association using the Jaccard index. The following components of their
 work are relevant to our task (we provide equation numbers in the original
 paper for formal definitions).
 
-#### Detection
+### Detection
 
 First, when considering all frames independently, traditional detection recall
 ($\mathsf{DetRe}$) and precision ($\mathsf{DetPr}$) can be computed to assess the capabilities
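
Since $\mathsf{DetRe}$ and $\mathsf{DetPr}$ recur throughout the following hunks, a minimal sketch of their standard frame-level definitions may help (illustrative code, not the HOTA reference implementation):

```python
def detection_metrics(frames):
    """Frame-level detection recall and precision.

    `frames` is a list of (tp, fn, fp) tuples, one per video frame:
    matched detections, missed ground-truth objects, spurious detections.
    """
    tp = sum(f[0] for f in frames)
    fn = sum(f[1] for f in frames)
    fp = sum(f[2] for f in frames)
    det_re = tp / (tp + fn)  # DetRe: fraction of ground-truth objects found
    det_pr = tp / (tp + fp)  # DetPr: fraction of detections that are correct
    return det_re, det_pr

# Toy example: 80 matches, 15 misses, 5 false alarms over a clip.
print(detection_metrics([(80, 15, 5)]))  # (0.842..., 0.941...)
```
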
@@ -781,7 +781,7 @@ Thus, low $\mathsf{DetRe}$ could theoretically be compensated with robust tracki
 Again, this suggests that low $\mathsf{DetPr}$ may allow decent counting performance.
 
 
-#### Association
+### Association
 
 HOTA association metrics are built to measure tracking performance
 irrespective of the detection capabilities, by comparing predicted tracks
@@ -818,15 +818,15 @@ Consequently, $\mathsf{AssRe}$ does not account for tracks predicted from stream
 arising from rocks, water reflections, etc).
 Since such tracks induce false counts, a tracker which produces the fewest is better, but MOT metrics do not measure it.
 
-### Count metrics
+## Count metrics
 
 Denoting by $\mathsf{\hat{N}}$ and $\mathsf{N}$ the respective predicted and ground truth
 counts for the validation material, the error $\mathsf{\hat{N}} - \mathsf{N}$ is misleading as
 no information is provided on the quality of the predicted counts.
 Additionally, results on the original validation footage do not measure the
 statistical variability of the proposed estimators.
 
-#### Count decomposition
+### Count decomposition
 
 Define $i \in [\![1, \mathsf{N}]\!]$ and $j \in [\![1, \mathsf{\hat{N}}]\!]$ the labels of the
 annotated ground truth tracks and the predicted tracks, respectively. At
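
A brief numeric illustration of why the raw error alone can mislead (invented numbers, not results from the paper): with $\mathsf{N} = 20$ ground-truth objects, a tracker that recovers only 13 of them while producing 7 spurious tracks still reports

$$\mathsf{\hat{N}} - \mathsf{N} = (13 + 7) - 20 = 0,$$

a perfect-looking score despite 7 missed objects and 7 false counts, which is exactly what the count decomposition introduced in this hunk is meant to expose.
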
@@ -883,7 +883,7 @@ For more complicated data, an adaptation of such contributions into proper
 counting metrics could be valuable.
 
 
-#### Statistics
+### Statistics
 
 Since the original validation set comprises only a few unequally long videos,
 only absolute results are available. Splitting the original sequences into
@@ -893,7 +893,8 @@ $\hat{\sigma}_{\mathsf{\hat{N}}_\bullet}$ the associated empirical standard devi
 computed on the set of short sequences.
 
 
-## Experiments
+# Experiments
+
 We denote by $S_1$, $S_2$ and $S_3$ the three river sections of the evaluation material and split the associated footage into independent segments of 30 seconds. We further divide this material into two distinct validation (6min30) and test (7min) splits.
 
 
@@ -922,7 +923,7 @@ intermediate framerate to capture all objects while reducing the computational
 burden.
 
 
-### Detection
+## Detection
 
 In the following section, we present the performance of the trained detector.
 Having annotated all frames of the evaluation videos, we directly compute
@@ -1003,7 +1004,8 @@ table_det.index = ['S1','S2','S3','All']
 display(table_det)
 ```
 
-### Counts
+## Counts
+
 To fairly compare the three solutions, we calibrate the hyperparameters of our
 postprocessing block on the validation split and keep the values that minimize
 the overall count error $\mathsf{\hat{N}}$ for each of them separately (see
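
The calibration described here amounts to a small validation-split grid search; a sketch under assumed names (`count_error`, the grids, and their values are hypothetical, not taken from the repository):

```python
from itertools import product

# Hypothetical candidate grids for the postprocessing hyperparameters.
kappa_grid = [1, 3, 5, 7]
tau_grid = [2, 3, 4, 5]

def calibrate(count_error, kappa_grid, tau_grid):
    """Pick the (kappa, tau) pair minimising the overall count error.

    `count_error(kappa, tau)` is assumed to run the full pipeline and
    return |N_hat - N| summed over the validation segments."""
    return min(product(kappa_grid, tau_grid),
               key=lambda pair: count_error(*pair))
```
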
@@ -1123,7 +1125,7 @@ results_All
 ```
 
 
-##### Detailed results on individual segments
+#### Detailed results on individual segments
 
 ```{python}
 
@@ -1140,7 +1142,7 @@ for ax, title, tracker_name in zip(axes, pretty_method_names, method_names):
 
 
 
-## Practical impact and future goals
+# Practical impact and future goals
 
 We successfully tackled video object counting on river banks, in particular issues which could be addressed independently of detection quality.
 Moreover the methodology developed to assess count quality enables us to precisely highlight the challenges that pertain to video object counting on river banks.
@@ -1156,12 +1158,11 @@ The resulting field data will help better understand litter origin, allowing to
 Correlations between macro litter density and environmental parameters will be studied (e.g., population density, catchment size, land use and hydromorphology).
 Finally, our work naturally benefits any extension of macrolitter monitoring in other areas (urban, coastal, etc) that may rely on a similar setup of moving cameras.
 
+# Supplements
 
-## Supplements
+## Details on the image dataset {#sec-image_dataset_appendix}
 
-### Details on the image dataset {#sec-image_dataset_appendix}
-
-#### Categories
+### Categories
 
 In this work, we do not seek to precisely predict the proportions of the
 different types of counted litter. However, we build our dataset to allow
@@ -1196,9 +1197,9 @@ Trash categories defined to facilitate porting to a counting system that allows
 :::
 
 
-### Details on the evaluation videos {#sec-video-dataset-appendix}
+## Details on the evaluation videos {#sec-video-dataset-appendix}
 
-#### River segments
+### River segments
 
 In this section, we provide further details on the evaluation material.
 @fig-river-sections shows the setup and positioning of the three river
@@ -1228,9 +1229,9 @@ The following items provide further details on the exact annotation process.
 - We do not provide inferred locations when an object is fully occluded, but tracks restart with the same identity whenever the object becomes visible again.
 - Tracks stop whenever an object becomes indistinguishable and will not reappear again.
 
-### Implementation details for the tracking module {#sec-tracking_module_appendix}
+## Implementation details for the tracking module {#sec-tracking_module_appendix}
 
-#### Covariance matrices for state and observation noises {#sec-covariance_matrices}
+### Covariance matrices for state and observation noises {#sec-covariance_matrices}
 
 In our state space model, $Q$ models the noise associated with the movement
 model we posit in @sec-bayesian_tracking involving optical flow estimates,
@@ -1253,7 +1254,7 @@ As long as values are meaningful relative to the image dimensions and the size o
 
 
 
-#### Influence of $\tau$ and $\kappa$ {#sec-tau_kappa_appendix}
+### Influence of $\tau$ and $\kappa$ {#sec-tau_kappa_appendix}
 
 An understanding of $\kappa$, $\tau$ and $\nu$ can be stated as follows.
 For any track, given a value for $\kappa$ and $\nu$, an observation at time $n$ is only kept if there are also $\nu \cdot \kappa$ observations in the temporal window of size $\kappa$ that surrounds $n$ (windows are centered around $n$ except at the start and end of the track).
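
The windowing rule quoted in this hunk is self-contained enough to sketch; boundary handling below is simplified and names are illustrative (the role of $\tau$ falls outside the quoted context):

```python
def kept_observations(times, kappa, nu):
    """Keep an observation at time n only if the size-kappa window around n
    contains at least nu * kappa observations.

    `times` is the sorted list of frame indices at which the track was
    observed; windows are shifted inward at the track boundaries
    (a sketch, not the paper's exact code)."""
    kept = []
    for n in times:
        lo = min(max(n - kappa / 2, times[0]), times[-1] - kappa)
        hi = lo + kappa
        support = sum(1 for t in times if lo <= t <= hi)
        if support >= nu * kappa:
            kept.append(n)
    return kept

# Toy track: a burst of detections on frames 0..4 plus one stray detection
# at frame 20; the stray observation is discarded.
print(kept_observations([0, 1, 2, 3, 4, 20], kappa=4, nu=0.75))  # [0, 1, 2, 3, 4]
```
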
@@ -1339,7 +1340,7 @@ for method_name, pretty_method_name in zip(method_names, pretty_method_names):
     hyperparameters(method_name.split('kappa')[0][:-1], pretty_method_name)
 ```
 
-### Bayesian filtering {#sec-bayesian_filtering}
+## Bayesian filtering {#sec-bayesian_filtering}
 
 Considering a state space model with $(X_k, Z_k)_{k \geq 0}$ the random processes for the states and observations, respectively, the filtering recursions are given by:
 
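The recursions themselves sit outside the diff context, but the neighbouring hunks pin down the special case $Q_k = Q$, $R_k = R$, $B_k = I$, $b_k = 0$. A minimal Kalman predict/update sketch of that linear-Gaussian case (a generic textbook implementation, not the repository's tracker, where the optical-flow offset $\Delta_k$ would additionally shift the predicted mean):

```python
import numpy as np

def kalman_step(m, P, z, A, Q, R, H=None):
    """One predict/update cycle of the filtering recursions for a
    linear-Gaussian state space model (generic textbook form).

    m, P: filtering mean and covariance at time k-1; z: new observation."""
    H = np.eye(len(m)) if H is None else H
    # Predict: propagate the previous filtering law through the dynamics.
    m_pred = A @ m
    P_pred = A @ P @ A.T + Q
    # Update: condition the predictive law on the new observation z.
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    m_new = m_pred + K @ (z - H @ m_pred)
    P_new = (np.eye(len(m)) - K @ H) @ P_pred
    return m_new, P_new
```
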
@@ -1367,7 +1368,7 @@ $$Q_k = Q, R_k = R,$$
 $$B_k = I, b_k = 0.$$
 
 
-### Computing the confidence regions {#sec-confidence_regions_appendix}
+## Computing the confidence regions {#sec-confidence_regions_appendix}
 
 
 In words, $P(i,\ell)$ is the mass in $V_\delta(z_n^i) \subset \mathbb{R}^2$ of the
@@ -1412,7 +1413,7 @@ the corresponding confidence regions (see @sec-tracking_module_appendix above).
 this distribution when SMC is used, and performance comparisons between the
 EKF, UKF and SMC versions of our trackers are discussed.
 
-#### SMC-based tracking
+### SMC-based tracking
 
 Denote $\mathbb{Q}_k$ the filtering distribution (ie. that of $Z_k$ given $X_{1:k}$) for the HMM $(X_k,Z_k)_{k \geq 1}$ (omitting the dependency on the observations for notation ease).
 Using a set of samples $\{X_k^i\}_{1 \leq i \leq N}$ and importance weights $\{w_k^i\}_{1 \leq i \leq N}$, SMC methods build an approximation of the following form:
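
The display introduced by this sentence is elided by the diff, but SMC methods build the standard weighted-particle approximation $\sum_i w_k^i \delta_{X_k^i}$; under that assumption, the mass a filter assigns to a region $V_\delta(z)$ reduces to summing the normalised weights of the particles inside it:

```python
import numpy as np

def region_mass_smc(particles, weights, z, delta):
    """Estimate the mass of V_delta(z) (a square neighbourhood around a
    detection z) under the weighted-particle approximation
    sum_i w_i * delta_{X_i}: sum the normalised weights of the particles
    landing inside the region."""
    inside = np.all(np.abs(particles - z) <= delta, axis=1)
    return float(np.sum(weights[inside]) / np.sum(weights))

# Toy usage with 1000 hypothetical 2-D particles and uniform weights.
rng = np.random.default_rng(0)
particles = rng.normal(loc=[100.0, 50.0], scale=5.0, size=(1000, 2))
weights = np.ones(1000)
print(region_mass_smc(particles, weights, z=np.array([100.0, 50.0]), delta=10.0))
```
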
@@ -1450,7 +1451,7 @@ recover object identities via $\widehat{\mathbb{L}}_n^{\ell}(V_\delta(z_n^i))$
 computed for all incoming detections $\mathcal{D}_n = \{z_n^i\}_{1 \leq i \leq D_n}$ and each of the $1 \leq \ell \leq L_n$ filters, where $\widehat{\mathbb{L}}_n^{\ell}$ is the predictive distribution associated with the $\ell$-th filter.
 
 
-#### Performance comparison
+### Performance comparison
 
 In theory, sampling-based methods like UKF and SMC are better suited for
 nonlinear state space models like the one we propose in @sec-state_space_model.
@@ -1462,3 +1463,8 @@ This comforts us into keeping it as a faster and computationally simpler option.
 That said, this conclusion might not hold in scenarios where camera motion is even stronger, which was our main motivation to develop a flexible tracking solution and to provide implementations of UKF and SMC versions.
 This allows easier extension of our work to more challenging data.
 
+# References {.unnumbered}
+
+::: {#refs}
+:::
+
