Commit be571e6

Updating paper
1 parent 766e777 commit be571e6

7 files changed: 838 additions and 373 deletions

README.rst

Lines changed: 13 additions & 0 deletions
@@ -92,6 +92,18 @@ illustrates an example of one of these "triples".
9292
The name "ShitSpotter" is an homage to my earlier work: `HotSpotter <https://github.com/Erotemic/hotspotter>`_, which later became `IBEIS <https://github.com/Erotemic/ibeis>`_. That work focused on individual animal identification, particularly zebras. After 2017, this work was continued in Wildbook by `WildMe <https://www.wildme.org/>`_, which merged with `Conservation X Labs <https://www.conservationxlabs.com/>`_ in 2024.
9393

9494

95+
Experiments
96+
===========
97+
98+
We are working on polishing docker images to reproduce our existing experiments
99+
and serve as a foundation for new experiments.
100+
101+
Initial images are available on dockerhub:
102+
https://hub.docker.com/repository/docker/erotemic/shitspotter/general
103+
104+
Documentation "coming soon"™
105+
106+
95107
Downloading the Data
96108
====================
97109

@@ -153,6 +165,7 @@ Recent Updates
153165
Check back for updates, but because this is a personal project, it might take
154166
some time for it to fully drop.
155167

168+
* 2025-08-02 - Paper is under peer review with slightly positive reviews. Grounding DINO and YOLO models are now trainable. The small test dataset is the main limitation; we are working to rectify this with Roboflow data (note: a subset of our dataset is hosted there, but under an incorrect license). An initial docker image to reproduce experiments is published.
156169
* 2025-07-04 - Releasing new data on IPFS. The growth seems to be increasing. Will take 7-9 more years to get 30k images.
157170
* 2025-04-20 - The number of images is now over 9000! The dataset is now `mirrored on hugging face <https://huggingface.co/datasets/erotemic/shitspotter>`__.
158171
* 2025-03-09 - Bunch of new images, with somewhat of a domain shift. The detectron model is good at annotating new images, but still not good enough. More work to be done.

dockerfiles/shitspotter.dockerfile

Lines changed: 115 additions & 62 deletions
@@ -1,26 +1,17 @@
11
# syntax=docker/dockerfile:1.5
22
FROM nvidia/cuda:12.4.1-cudnn-devel-ubuntu22.04
33

4-
ENV PIP_ROOT_USER_ACTION=ignore
5-
6-
# Control which python version we are using
7-
ARG PYTHON_VERSION=3.13
8-
9-
# Control the version of uv
10-
ARG UV_VERSION=0.7.19
114

125
# ------------------------------------
136
# Step 1: Install System Prerequisites
147
# ------------------------------------
15-
RUN <<EOF
8+
9+
RUN --mount=type=cache,target=/var/cache/apt \
10+
--mount=type=cache,target=/var/lib/apt/lists <<EOF
1611
#!/bin/bash
1712
set -e
1813
apt update -q
1914
DEBIAN_FRONTEND=noninteractive apt install -q -y --no-install-recommends \
20-
bzip2 \
21-
rsync \
22-
tmux \
23-
fd-find jq htop tree \
2415
curl \
2516
wget \
2617
git \
@@ -32,23 +23,29 @@ apt clean
3223
rm -rf /var/lib/apt/lists/*
3324
EOF
3425

35-
# Set the shell to bash to auto-activate enviornments
26+
# Set the shell to bash to auto-activate environments
3627
SHELL ["/bin/bash", "-l", "-c"]
3728

38-
# ------------------
29+
3930
# Step 2: Install uv
4031
# ------------------
4132
# Here we take a few extra steps to pin to a verified version of the uv
4233
# installer. This increases reproducibility and security against the main
4334
# astral domain, but not against those linked in the main installer.
4435
# The "normal" way to install the latest uv is:
4536
# curl -LsSf https://astral.sh/uv/install.sh | bash
46-
RUN <<EOF
37+
38+
# Control the version of uv
39+
ARG UV_VERSION=0.8.4
40+
41+
RUN --mount=type=cache,target=/root/.cache <<EOF
4742
#!/bin/bash
4843
set -e
4944
mkdir /bootstrap
5045
cd /bootstrap
46+
# For new releases see: https://github.com/astral-sh/uv/releases
5147
declare -A UV_INSTALL_KNOWN_HASHES=(
48+
["0.8.4"]="601321180a10e0187c99d8a15baa5ccc11b03494c2ca1152fc06f5afeba0a460"
5249
["0.7.20"]="3b7ca115ec2269966c22201b3a82a47227473bef2fe7066c62ea29603234f921"
5350
["0.7.19"]="e636668977200d1733263a99d5ea66f39d4b463e324bb655522c8782d85a8861"
5451
)
@@ -59,11 +56,14 @@ if [[ -z "$EXPECTED_SHA256" ]]; then
5956
exit 1
6057
fi
6158
curl -LsSf https://astral.sh/uv/$UV_VERSION/install.sh > $DOWNLOAD_PATH
62-
echo "$EXPECTED_SHA256 $DOWNLOAD_PATH" | sha256sum --check
59+
report_bad_checksum(){
60+
echo "Got unexpected checksum"
61+
sha256sum "$DOWNLOAD_PATH"
62+
exit 1
63+
}
64+
echo "$EXPECTED_SHA256 $DOWNLOAD_PATH" | sha256sum --check || report_bad_checksum
6365
# Run the install script
6466
bash /bootstrap/uv-install-v${UV_VERSION}.sh
65-
# Cleanup for smaller images
66-
rm -rf /root/.cache/
6767
EOF
6868

6969

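The pinned-installer check in Step 2 can be tried outside of Docker. The following is a minimal sketch of the same pattern: a table of known hashes, then `sha256sum --check` with a fallback that reports the digest that was actually seen. The function name `verify_known_hash`, the `KNOWN_HASHES` table, and the temporary demo file are hypothetical stand-ins for illustration (they are not part of the dockerfile), and the sketch assumes GNU coreutils. It returns a non-zero status instead of calling `exit` so that it can be reused in an interactive shell.

```shell
#!/bin/bash
# Hypothetical helper: verify a file against an expected SHA-256 digest.
verify_known_hash() {
    local expected="$1"
    local path="$2"
    if [[ -z "$expected" ]]; then
        echo "No known hash for this version" >&2
        return 2
    fi
    # sha256sum --check reads lines of the form "<hash>  <path>"
    if ! echo "$expected  $path" | sha256sum --check --status; then
        echo "Got unexpected checksum" >&2
        sha256sum "$path" >&2
        return 1
    fi
}

# Stand-in for the downloaded installer (this is NOT the real uv script).
DEMO_FILE=$(mktemp)
printf 'hello\n' > "$DEMO_FILE"

# Table of known hashes keyed by a label, analogous to UV_INSTALL_KNOWN_HASHES.
declare -A KNOWN_HASHES=(
    ["demo"]="$(sha256sum "$DEMO_FILE" | cut -d' ' -f1)"
)

verify_known_hash "${KNOWN_HASHES[demo]}" "$DEMO_FILE" && echo "checksum ok"
```

Printing the actual digest on failure, as the updated dockerfile does, makes it easier to tell a stale hash table from a corrupted download.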
@@ -73,7 +73,13 @@ EOF
7373
# This step mirrors a normal virtualenv development environment inside the
7474
# container, which can prevent subtle issues when running as root inside
7575
# containers.
76-
RUN <<EOF
76+
77+
# Control which python version we are using
78+
ARG PYTHON_VERSION=3.10
79+
80+
ENV PIP_ROOT_USER_ACTION=ignore
81+
82+
RUN --mount=type=cache,target=/root/.cache <<EOF
7783
#!/bin/bash
7884
export PATH="$HOME/.local/bin:$PATH"
7985
# Use uv to install the requested python version and seed the venv
@@ -83,27 +89,69 @@ BASHRC_CONTENTS='
8389
export HOME="/root"
8490
export PATH="$HOME/.local/bin:$PATH"
8591
# Auto-activate the venv on login
86-
source /root/venv'$PYTHON_VERSION'/bin/activate
92+
source $HOME/venv'$PYTHON_VERSION'/bin/activate
8793
'
8894
# It is important to add the content to all of these files so
89-
# subsequent run commands use the the context we setup here.
95+
# subsequent run commands use the context we set up here.
9096
echo "$BASHRC_CONTENTS" >> $HOME/.bashrc
9197
echo "$BASHRC_CONTENTS" >> $HOME/.profile
98+
echo "$BASHRC_CONTENTS" >> $HOME/.bash_profile
9299
EOF
93100

94101

95-
RUN mkdir -p /root/code/shitspotter
102+
# -----------------------------------
103+
# Step 4: Ensure venv auto-activation
104+
# -----------------------------------
105+
# This step creates an entrypoint script that ensures any command passed to
106+
# `docker run` is executed inside a login shell where the virtual environment
107+
# is auto-activated. It handles complex cases like multi-arg commands and
108+
# ensures quoting is preserved accurately.
109+
RUN <<EOF
110+
#!/bin/bash
111+
set -e
112+
113+
# We use a quoted heredoc to write the entrypoint script literally, with no variable expansion.
114+
cat <<'__EOSCRIPT__' > /entrypoint.sh
115+
#!/bin/bash
116+
set -e
117+
118+
# Reconstruct the full command line safely, quoting each argument
119+
args=()
120+
for arg in "$@"; do
121+
args+=("$(printf "%q" "$arg")")
122+
done
123+
124+
# Join arguments into a command string that can be executed by bash -c
125+
# This preserves exact argument semantics (including quotes, spaces, etc.)
126+
cmd="${args[*]}"
127+
128+
# Execute the reconstructed command inside a login shell
129+
# This ensures virtualenv activation via .bash_profile
130+
exec bash -l -c "$cmd"
131+
__EOSCRIPT__
132+
133+
# Print the script at build time for visibility/debugging
134+
cat /entrypoint.sh
135+
136+
chmod +x /entrypoint.sh
137+
EOF
138+
139+
# Set the entrypoint to our script that activates the virtual environment first
140+
ENTRYPOINT ["/entrypoint.sh"]
96141

97-
# Control the version of REPO (by default uses the current branch)
98-
ARG REPO_GIT_HASH=HEAD
99142

100143
# ---------------------------------
101-
# Step 4: Checkout and install REPO
144+
# Step 5: Checkout and install REPO
102145
# ---------------------------------
103146
# Based on the state of the repo this copies the host .git data over and then
104147
# checks out the exact version of REPO requested by REPO_GIT_HASH. It then
105148
# performs a basic install of shitspotter into the virtual environment.
106149

150+
RUN mkdir -p /root/code/shitspotter
151+
152+
# Control the version of REPO (by default uses the current branch)
153+
ARG REPO_GIT_HASH=HEAD
154+
107155
# NOTE: our .dockerignore file prevents us from copying in populated secrets /
108156
# credentials
109157
COPY .git /root/code/shitspotter/.git
@@ -147,35 +195,6 @@ cd /root/code/YOLO-v9
147195
uv pip install -e .
148196
EOF
149197

150-
151-
152-
# -----------------------------------
153-
# Step 5: Ensure venv auto-activation
154-
# -----------------------------------
155-
# This final steps ensures that commands the user provides to docker run
156-
# will always run in in the context of the virtual environment.
157-
RUN <<EOF
158-
#!/bin/bash
159-
set -e
160-
# write the entrypoint script
161-
echo '#!/bin/bash
162-
set -e
163-
# Build the escaped command string
164-
cmd=""
165-
for arg in "$@"; do
166-
# Use printf %q to properly escape each argument for bash
167-
cmd+=$(printf "%q " "$arg")
168-
done
169-
# Remove trailing space
170-
cmd=${cmd% }
171-
exec bash -lc "$cmd"
172-
' > entrypoint.sh
173-
chmod +x /entrypoint.sh
174-
EOF
175-
176-
# Set the entrypoint to our script that activates the virtual enviornment first
177-
ENTRYPOINT ["/entrypoint.sh"]
178-
179198
# Set the default workdir to the shitspotter code repo
180199
WORKDIR /root/code/shitspotter
181200

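The entrypoint script relies on bash's `printf %q` to re-quote each argument before the joined string is handed to `bash -l -c`. The sketch below illustrates that round trip under simplifying assumptions: the helper name `requote_args` is hypothetical, and it uses a plain `bash -c` rather than a login shell so that the result does not depend on any profile files.

```shell
#!/bin/bash
# Hypothetical helper mirroring the entrypoint's quoting loop.
requote_args() {
    local args=()
    local arg
    for arg in "$@"; do
        # %q emits the argument in a form bash can re-read as a single word
        args+=("$(printf '%q' "$arg")")
    done
    # "${args[*]}" joins with the first character of IFS (a space by default)
    printf '%s\n' "${args[*]}"
}

# Arguments containing a space, a double quote, and a dollar sign
cmd=$(requote_args printf '%s|' "two words" 'quote"inside' '$HOME')

# Re-parsing the string yields the original arguments, unexpanded
bash -c "$cmd"   # prints: two words|quote"inside|$HOME|
echo
```

Without the `%q` step, joining the arguments with spaces would split `two words` into two arguments and expand `$HOME` inside the inner shell.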
@@ -200,18 +219,27 @@ REPO_GIT_HASH=$(git rev-parse --short=12 HEAD)
200219
201220
python ./dockerfiles/setup_staging.py
202221
203-
# Build REPO in a reproducible way.
222+
# Determine version of repo, uv, and python to use
223+
export REPO_GIT_HASH=$(git rev-parse --short=12 HEAD)
224+
export UV_VERSION=0.8.4
225+
export PYTHON_VERSION=3.11
226+
227+
# Build the image with version-specific tags
204228
DOCKER_BUILDKIT=1 docker build --progress=plain \
205-
-t shitspotter:$REPO_GIT_HASH-uv0.7.29-python3.11 \
206-
--build-arg PYTHON_VERSION=3.11 \
207-
--build-arg UV_VERSION=0.7.19 \
229+
-t shitspotter:${REPO_GIT_HASH}-uv${UV_VERSION}-python${PYTHON_VERSION} \
230+
--build-arg PYTHON_VERSION=$PYTHON_VERSION \
231+
--build-arg UV_VERSION=$UV_VERSION \
208232
--build-arg REPO_GIT_HASH=$REPO_GIT_HASH \
209233
-f ./dockerfiles/shitspotter.dockerfile .
210234
211-
212-
# Add latest tags for convinience
213-
docker tag shitspotter:$REPO_GIT_HASH-uv0.7.29-python3.11 shitspotter:latest-uv0.7.29-python3.11
214-
docker tag shitspotter:$REPO_GIT_HASH-uv0.7.29-python3.11 shitspotter:latest
235+
# Add concise tags for easier reuse
236+
export IMAGE_QUALNAME=shitspotter:${REPO_GIT_HASH}-uv${UV_VERSION}-python${PYTHON_VERSION}
237+
export NAME1=shitspotter:latest-uv${UV_VERSION}-python${PYTHON_VERSION}
238+
export NAME2=shitspotter:latest-python${PYTHON_VERSION}
239+
export NAME3=shitspotter:latest
240+
docker tag $IMAGE_QUALNAME $NAME1
241+
docker tag $IMAGE_QUALNAME $NAME2
242+
docker tag $IMAGE_QUALNAME $NAME3
215243
216244
# Verify that GPUs are visible and that each shitspotter command works
217245
docker run --gpus=all -it shitspotter:latest nvidia-smi
@@ -220,6 +248,31 @@ docker run --gpus=all -it shitspotter:latest nvidia-smi
220248
# (TODO: show how to replicate experiments)
221249
docker run --gpus=all -it shitspotter:latest bash
222250
251+
# 1) Authenticate (recommended: use a Docker Hub access token)
252+
# Create a token in Docker Hub -> Account Settings -> Security
253+
# Then run:
254+
# echo "<your-access-token>" | docker login --username "$DOCKERHUB_USER" --password-stdin
255+
#
256+
# If you must, you can use interactive login:
257+
docker login
258+
259+
export DH_USER="erotemic"
260+
261+
# 2) Create remote-qualified tags
262+
docker tag $IMAGE_QUALNAME $DH_USER/$IMAGE_QUALNAME
263+
docker tag $NAME1 $DH_USER/$NAME1
264+
docker tag $NAME2 $DH_USER/$NAME2
265+
docker tag $NAME3 $DH_USER/$NAME3
266+
267+
# 3) Push the tags
268+
docker push $DH_USER/$IMAGE_QUALNAME
269+
docker push $DH_USER/$NAME1
270+
docker push $DH_USER/$NAME2
271+
docker push $DH_USER/$NAME3
274+
275+
223276
' > /dev/null
224277

225278
EOF
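The tagging notes in the dockerfile derive one fully qualified tag and three shorter aliases from the repo hash, uv version, and Python version. As a rough sketch of that naming scheme, the hypothetical helper `compose_tags` below prints the four names and then shows, as a dry run, the `docker tag` and `docker push` commands that would publish them. The hash value is a placeholder; in the real workflow it comes from `git rev-parse --short=12 HEAD`.

```shell
#!/bin/bash
# Hypothetical helper: print the fully qualified tag followed by its aliases.
compose_tags() {
    local repo_hash="$1" uv_version="$2" python_version="$3"
    local suffix="uv${uv_version}-python${python_version}"
    echo "shitspotter:${repo_hash}-${suffix}"
    echo "shitspotter:latest-${suffix}"
    echo "shitspotter:latest-python${python_version}"
    echo "shitspotter:latest"
}

DH_USER="erotemic"
# Placeholder hash, used only for illustration
compose_tags "0123456789ab" "0.8.4" "3.11" | while read -r tag; do
    # Dry run: print the commands instead of executing them
    echo "docker tag $tag $DH_USER/$tag"
    echo "docker push $DH_USER/$tag"
done
```

Generating every name from the same three variables avoids the kind of drift that hard-coded tag strings allow, where a version written into a tag no longer matches the version passed as a build argument.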

experiments/grounding-dino-experiments/run_grounding_dino_experiments_v1.sh

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
__doc__='
22
SeeAlso, for tuned version of grounding dino:
3-
~/code/shitspotter/dev/poc/tune_grounding_dino.sh
3+
~/code/shitspotter/experiments/grounding-dino-experiments/tune_grounding_dino.sh
44
'
55

66
#
