OBPR = "One Box Per Ring"

One idea for counting rings was to treat each ring as a separate object. It turns out that a lot of rings get missed if you do this. But it DOES usually detect the outer ring, which is why I decided to detect just the outer boxes with IceVision, use these to crop the image, and then try counting the rings separately in the cropped sub-images.
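To make "outer box" concrete, here is a minimal sketch of picking the outermost box from a set of nested boxes. In this notebook the outer boxes actually come from the trained detector; the helper names `contains` and `outer_boxes` are my own for illustration, and the sample boxes are the first five annotation rows shown further down.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (xmin, ymin, xmax, ymax)


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely inside `outer` (edges may touch)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def outer_boxes(boxes: List[Box]) -> List[Box]:
    """Keep only the boxes that are not contained in any other box."""
    unique = list(dict.fromkeys(boxes))  # drop exact duplicates, keep order
    return [b for i, b in enumerate(unique)
            if not any(i != j and contains(o, b) for j, o in enumerate(unique))]


# Nested boxes for one set of concentric rings (from the annotations table)
boxes = [(130, 114, 265, 281), (144, 130, 251, 265), (157, 147, 238, 248),
         (171, 164, 224, 231), (184, 181, 211, 214)]

crops = outer_boxes(boxes)
print(crops)  # [(130, 114, 265, 281)]
```

Each surviving box is a crop region in (xmin, ymin, xmax, ymax) order, which is the same order that PIL's `Image.crop` expects for its box tuple.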

Installing IceVision and IceData

If you are on Colab, run the following cell; otherwise, check the installation instructions.

As always, let's import everything from icevision. We will also need pandas (you might need to install it with pip install pandas).

from icevision.all import *
import pandas as pd
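
Download dataset

The original IceVision tutorial uses a small sample of the chess dataset (the full dataset is offered by Roboflow). Here we swap in the espiownage CycleGAN dataset instead; the chess and SPNET loaders are left commented out for reference.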

#data_dir = icedata.load_data(data_url, 'chess_sample') / 'chess_sample-master'

# SPNET Real Dataset link (currently proprietary, thus link may not work)
#data_url = "https://anonymized.machine.com/~drscotthawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 

# espiownage cyclegan dataset:
data_url = 'https://anonymized.machine.com/~drscotthawley/espiownage-cyclegan.tgz'
data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'

Understand the data format

In this task we were given a .csv file with annotations, let's take a look at that.

df = pd.read_csv(data_dir / "bboxes/annotations_obpr.csv")
df.head()

   filename              width  height  label  xmin  ymin  xmax  ymax
0  steelpan_0000000.png    512     384  ring    130   114   265   281
1  steelpan_0000000.png    512     384  ring    144   130   251   265
2  steelpan_0000000.png    512     384  ring    157   147   238   248
3  steelpan_0000000.png    512     384  ring    171   164   224   231
4  steelpan_0000000.png    512     384  ring    184   181   211   214

At first glance, we can make the following assumptions:

  • Multiple rows with the same filename, width, height
  • A label for each row
  • A bbox [xmin, ymin, xmax, ymax] for each row
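These assumptions are cheap to check before writing the parser. Here is a small sketch that uses only the standard library, so it runs without the dataset; the inlined CSV text is just the five rows shown above, and the comma-separated layout is an assumption based on `pd.read_csv` defaults.

```python
import csv
import io
from collections import Counter

# The five rows shown above, inlined so the example is self-contained
sample_csv = """filename,width,height,label,xmin,ymin,xmax,ymax
steelpan_0000000.png,512,384,ring,130,114,265,281
steelpan_0000000.png,512,384,ring,144,130,251,265
steelpan_0000000.png,512,384,ring,157,147,238,248
steelpan_0000000.png,512,384,ring,171,164,224,231
steelpan_0000000.png,512,384,ring,184,181,211,214
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))

# Multiple rows per filename: count the boxes for each image
boxes_per_image = Counter(r["filename"] for r in rows)

# Each filename should map to exactly one (width, height) pair
sizes = {(r["filename"], r["width"], r["height"]) for r in rows}
one_size_per_image = len(sizes) == len(boxes_per_image)

# Every row has a label and a well-formed box (xmin < xmax, ymin < ymax)
labels = {r["label"] for r in rows}
boxes_ok = all(int(r["xmin"]) < int(r["xmax"]) and int(r["ymin"]) < int(r["ymax"])
               for r in rows)

print(boxes_per_image, one_size_per_image, labels, boxes_ok)
```

The same checks on the real file would be a `groupby('filename')` on the DataFrame loaded above.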

Once we know what our data provides, we can create our custom Parser.


The label "ring" is going to take up too much space when we plot images. Let's change it to "R":

df['label'] = "R"
df.head()

   filename              width  height  label  xmin  ymin  xmax  ymax
0  steelpan_0000000.png    512     384  R       130   114   265   281
1  steelpan_0000000.png    512     384  R       144   130   251   265
2  steelpan_0000000.png    512     384  R       157   147   238   248
3  steelpan_0000000.png    512     384  R       171   164   224   231
4  steelpan_0000000.png    512     384  R       184   181   211   214

Create the Parser

The first step is to create a template record for our specific type of dataset, in this case we're doing standard object detection:

template_record = ObjectDetectionRecord()

Now use the generate_template method, which prints out all the steps we have to implement:

Parser.generate_template(template_record)
class MyParser(Parser):
    def __init__(self, template_record):
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_filepath(<Union[str, Path]>)
# but currently not a priority!
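class ChessParser(Parser):
    def __init__(self, template_record, data_dir):
        # the base Parser needs the template record to build new records
        super().__init__(template_record=template_record)
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations_obpr.csv")
        self.df['label'] = 'R'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
    def __iter__(self) -> Any:
        # one item per annotation row, i.e. one per bounding box
        for o in self.df.itertuples():
            yield o
    def __len__(self) -> int:
        return len(self.df)
    def record_id(self, o) -> Hashable:
        # rows that share a filename end up in the same record
        return o.filename
    def parse_fields(self, o, record, is_new):
        if is_new:
            # per-image fields only need to be set once
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        # per-box fields: keep labels and bboxes in step with each other
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])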
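
The parser yields one row per bounding box, and rows with the same record_id get merged into one record. Here is a rough standalone approximation of that grouping in plain Python. It is NOT IceVision's implementation, only a sketch of the behavior as I understand it; the `Row` tuple and the dict-based records are stand-ins, and the second image is made up for illustration.

```python
from collections import namedtuple

Row = namedtuple("Row", "filename width height label xmin ymin xmax ymax")

rows = [
    Row("steelpan_0000000.png", 512, 384, "R", 130, 114, 265, 281),
    Row("steelpan_0000000.png", 512, 384, "R", 144, 130, 251, 265),
    Row("steelpan_0000001.png", 512, 384, "R", 10, 20, 110, 120),  # hypothetical
]

records = {}
for o in rows:
    record_id = o.filename               # plays the role of record_id(o)
    is_new = record_id not in records    # first time this image is seen
    if is_new:
        # per-image fields are set once, like the `if is_new:` branch
        records[record_id] = {"img_size": (o.width, o.height),
                              "bboxes": [], "labels": []}
    # per-box fields are appended for every row
    records[record_id]["bboxes"].append((o.xmin, o.ymin, o.xmax, o.ymax))
    records[record_id]["labels"].append(o.label)

for rid, rec in records.items():
    print(rid, len(rec["bboxes"]), "boxes")
```

This is why a single record further down can hold many bboxes even though each CSV row carries only one.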

Let's randomly split the data and parse it with Parser.parse:

parser = ChessParser(template_record, data_dir)
train_records, valid_records = parser.parse()

Let's take a look at one record:

show_record(train_records[5], display_label=False, figsize=(14, 10))

	- Record ID: 798
	- Image size ImgSize(width=512, height=384)
	- Filepath: /home/drscotthawley/.icevision/data/espiownage-cyclegan/espiownage-cyclegan/images/steelpan_0000919.png
	- Img: None
	- BBoxes: [<BBox (xmin:258, ymin:213, xmax:427, ymax:318)>, <BBox (xmin:286, ymin:230, xmax:399, ymax:301)>, <BBox (xmin:314, ymin:248, xmax:371, ymax:283)>, <BBox (xmin:205, ymin:3, xmax:350, ymax:200)>, <BBox (xmin:220, ymin:23, xmax:335, ymax:180)>, <BBox (xmin:234, ymin:42, xmax:321, ymax:161)>, <BBox (xmin:249, ymin:62, xmax:306, ymax:141)>, <BBox (xmin:263, ymin:82, xmax:292, ymax:121)>, <BBox (xmin:50, ymin:216, xmax:243, ymax:357)>, <BBox (xmin:69, ymin:230, xmax:224, ymax:343)>, <BBox (xmin:89, ymin:244, xmax:204, ymax:329)>, <BBox (xmin:108, ymin:258, xmax:185, ymax:315)>, <BBox (xmin:127, ymin:272, xmax:166, ymax:301)>, <BBox (xmin:59, ymin:105, xmax:154, ymax:216)>, <BBox (xmin:67, ymin:115, xmax:146, ymax:206)>, <BBox (xmin:75, ymin:124, xmax:138, ymax:197)>, <BBox (xmin:83, ymin:133, xmax:130, ymax:188)>, <BBox (xmin:91, ymin:142, xmax:122, ymax:179)>, <BBox (xmin:99, ymin:151, xmax:114, ymax:170)>]
	- Class Map: <ClassMap: {'background': 0, 'R': 1}>
	- Labels: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Moving On...

Following the Getting Started "refrigerator" notebook...

# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
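
For a feel of what the validation transform does to our 512x384 images, here is the arithmetic as a small sketch. It assumes resize_and_pad scales the longest side down to image_size and then pads the shorter side up to a square, which is my understanding of its behavior; `resize_and_pad_geometry` is a hypothetical helper, not an IceVision function.

```python
def resize_and_pad_geometry(width, height, size):
    """Return (scale, resized (w, h), total padding (x, y)) for a square target."""
    scale = size / max(width, height)          # longest side maps to `size`
    new_w, new_h = round(width * scale), round(height * scale)
    return scale, (new_w, new_h), (size - new_w, size - new_h)


image_size = 384
assert image_size % 128 == 0  # the EfficientDet constraint noted above

scale, resized, padding = resize_and_pad_geometry(512, 384, image_size)
print(scale, resized, padding)  # 0.75 (384, 288) (0, 96)
```

So each image shrinks by a factor of 0.75 and gets 96 pixels of vertical padding in total, and bounding-box coordinates shrink by the same factor.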

Look at the (augmented) target data:

samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)

model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
selection = 0

extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args
(<module 'icevision.models.mmdet.models.retinanet' from '/home/drscotthawley/envs/icevision/lib/python3.8/site-packages/icevision/models/mmdet/models/retinanet/__init__.py'>,
 <icevision.models.mmdet.models.retinanet.backbones.resnet_fpn.MMDetRetinanetBackboneConfig at 0x7eff25efcb80>,
 {})

model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
model_type.show_batch(first(valid_dl), ncols=4)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)

# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)
learn.fine_tune(60, 1e-4, freeze_epochs=2)
epoch train_loss valid_loss COCOMetric time
0 0.928480 0.602156 0.249890 00:28
1 0.553398 0.474446 0.333411 00:27
epoch train_loss valid_loss COCOMetric time
0 0.437952 0.420487 0.348959 00:30
1 0.407828 0.389949 0.369970 00:30
2 0.389400 0.380634 0.371569 00:30
3 0.376582 0.361188 0.393257 00:30
4 0.362269 0.348352 0.396370 00:30
5 0.352502 0.346673 0.408938 00:30
6 0.345822 0.329053 0.414611 00:30
7 0.334411 0.326302 0.420059 00:29
8 0.328194 0.316225 0.422598 00:29
9 0.324628 0.315352 0.429932 00:29
10 0.314684 0.320508 0.431626 00:29
11 0.309845 0.315450 0.434806 00:29
12 0.307848 0.305349 0.448769 00:29
13 0.307842 0.293209 0.447368 00:29
14 0.305091 0.300741 0.446580 00:29
15 0.294238 0.291090 0.459869 00:28
16 0.288746 0.281675 0.461619 00:29
17 0.287281 0.283271 0.460128 00:29
18 0.289213 0.278813 0.455900 00:28
19 0.283405 0.274307 0.458390 00:29
20 0.280650 0.303495 0.440220 00:28
21 0.274419 0.281923 0.468723 00:29
22 0.271343 0.265222 0.460714 00:28
23 0.269648 0.266300 0.477265 00:28
24 0.264892 0.265283 0.477649 00:28
25 0.263941 0.257803 0.473233 00:28
26 0.267473 0.264576 0.471951 00:29
27 0.261077 0.262606 0.474387 00:29
28 0.259938 0.257200 0.482154 00:28
29 0.254449 0.258375 0.479738 00:28
30 0.253290 0.259928 0.479199 00:28
31 0.250123 0.250104 0.484753 00:28
32 0.250764 0.250883 0.477746 00:28
33 0.250196 0.266613 0.489192 00:28
34 0.245433 0.248948 0.484409 00:28
35 0.244292 0.250585 0.495278 00:29
36 0.241775 0.245834 0.498617 00:28
37 0.242332 0.251036 0.496346 00:28
38 0.236041 0.244724 0.489617 00:28
39 0.236889 0.247994 0.488461 00:29
40 0.231236 0.249484 0.495947 00:28
41 0.230529 0.249377 0.479443 00:28
42 0.230867 0.243312 0.491363 00:28
43 0.230171 0.241391 0.499601 00:28
44 0.230219 0.243975 0.502050 00:28
45 0.230918 0.243116 0.499362 00:28
46 0.224446 0.240595 0.495946 00:28
47 0.223183 0.239601 0.497498 00:28
48 0.225230 0.239358 0.495366 00:28
49 0.227244 0.243232 0.506848 00:28
50 0.220151 0.238877 0.504132 00:28
51 0.222786 0.238536 0.501295 00:28
52 0.219587 0.237296 0.498865 00:28
53 0.224671 0.238217 0.498187 00:28
54 0.222557 0.237781 0.500376 00:28
55 0.221216 0.237311 0.501271 00:28
56 0.220398 0.237355 0.500388 00:28
57 0.222505 0.238207 0.504028 00:29
58 0.219006 0.238140 0.502263 00:28
59 0.220286 0.238082 0.502299 00:28

model_type.show_results(model, valid_ds, detection_threshold=.5)