OBPR = "One Box Per Ring"

One idea for counting rings was to treat each ring as an object. It turns out that a lot of rings get missed if you do this. But it DOES usually detect the outer ring, which is why I decided to just detect outer boxes with IceVision, use these to crop the image, and then try counting the rings separately using the cropped sub-images.
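
To make the cropping idea concrete, here is a minimal sketch in plain Python. It is not the code used in this project: the helper names `contains`, `outer_boxes` and `crop` are hypothetical, the image is a blank stand-in, and the box coordinates are just illustrative nested boxes in (xmin, ymin, xmax, ymax) form.

```python
def contains(outer, inner):
    """True if `inner` lies entirely inside `outer`; boxes are (xmin, ymin, xmax, ymax)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])

def outer_boxes(boxes):
    """Keep only the boxes that are not nested inside some other, different box."""
    return [b for b in boxes
            if not any(o != b and contains(o, b) for o in boxes)]

def crop(image, box):
    """Crop a row-major image (a list of pixel rows); xmax and ymax are exclusive."""
    xmin, ymin, xmax, ymax = box
    return [row[xmin:xmax] for row in image[ymin:ymax]]

# Three nested boxes around the same ring pattern (illustrative values)
boxes = [(130, 114, 265, 281), (144, 130, 251, 265), (157, 147, 238, 248)]
image = [[0] * 512 for _ in range(384)]  # blank 512x384 stand-in for a real frame

outer = outer_boxes(boxes)
patch = crop(image, outer[0])
print(outer, "->", len(patch[0]), "x", len(patch))
```

With real detections you would crop each predicted outer box this way and hand the sub-image to a separate ring counter.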

Installing IceVision and IceData

If on Colab, run the following cell; otherwise, check the installation instructions.

 
import os

if os.path.exists('/content'):  # this directory exists on Colab
    try:
        !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
        !chmod +x install_colab.sh && ./install_colab.sh
    except Exception:
        print("Ignore the error messages and just keep going")
else:
    # For Icevision Install of MMD.  cf. https://airctic.com/0.8.1/install/
    import torch, re
    tv, cv = torch.__version__, torch.version.cuda
    tv = re.sub(r'\+cu.*', '', tv)  # strip the CUDA suffix, e.g. '+cu111'
    TORCH_VERSION = 'torch' + tv[0:-1] + '0'  # replace the last character (patch digit) with 0
    CUDA_VERSION = 'cu' + cv.replace('.', '')

    print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")

    !pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
    !pip install mmdet -qq
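
To see what the version-string manipulation above produces, here is a small standalone sketch. The function name `mmcv_index_url` and the example version strings are my own assumptions for illustration; the string operations are the same ones used in the install cell.

```python
import re

def mmcv_index_url(torch_version, cuda_version):
    """Build the mmcv wheel index URL from a torch version and a CUDA version string."""
    tv = re.sub(r'\+cu.*', '', torch_version)   # '1.9.1+cu111' -> '1.9.1'
    torch_tag = 'torch' + tv[0:-1] + '0'         # swap the last character for '0'
    cuda_tag = 'cu' + cuda_version.replace('.', '')
    return f"https://download.openmmlab.com/mmcv/dist/{cuda_tag}/{torch_tag}/index.html"

# Hypothetical versions, standing in for torch.__version__ and torch.version.cuda
print(mmcv_index_url('1.9.1+cu111', '11.1'))
```

Two caveats: only the final character is replaced, so a two-digit patch version would not be handled correctly, and `torch.version.cuda` is `None` on CPU-only builds, in which case the `.replace` call in the cell would fail.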

Imports

As always, let's import everything from icevision. We will also need pandas (you might need to install it with pip install pandas).

from icevision.all import *
import pandas as pd
INFO     - The mmdet config folder already exists. No need to downloaded it. Path : /home/drscotthawley/.icevision/mmdetection_configs/mmdetection_configs-2.10.0/configs | icevision.models.mmdet.download_configs:download_mmdet_configs:17

Download dataset

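The tutorial this notebook is adapted from used a small sample of the chess dataset (the full dataset is offered by Roboflow). Here we use the espiownage CycleGAN dataset instead; the other loaders are left commented out.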

#data_dir = icedata.load_data(data_url, 'chess_sample') / 'chess_sample-master'

# SPNET Real Dataset link (currently proprietary, thus link may not work)
#data_url = "https://anonymized.machine.com/~drscotthawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 

# espiownage cyclegan dataset:
data_url = 'https://anonymized.machine.com/~drscotthawley/espiownage-cyclegan.tgz'
data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'

Understand the data format

In this task we were given a .csv file with annotations. Let's take a look at it.

df = pd.read_csv(data_dir / "bboxes/annotations_obpr.csv")
df.head()
   filename              width  height  label  xmin  ymin  xmax  ymax
0  steelpan_0000000.png  512    384     ring   130   114   265   281
1  steelpan_0000000.png  512    384     ring   144   130   251   265
2  steelpan_0000000.png  512    384     ring   157   147   238   248
3  steelpan_0000000.png  512    384     ring   171   164   224   231
4  steelpan_0000000.png  512    384     ring   184   181   211   214

At first glance, we can make the following assumptions:

  • Multiple rows with the same filename, width, height
  • A label for each row
  • A bbox [xmin, ymin, xmax, ymax] for each row

Once we know what our data provides we can create our custom Parser.

set(np.array(df['label']).flatten())
{'ring'}

"Ring" is going to take up too much space when we plot images. Let's change it to "R":

df['label'] = "R"
df.head()
   filename              width  height  label  xmin  ymin  xmax  ymax
0  steelpan_0000000.png  512    384     R      130   114   265   281
1  steelpan_0000000.png  512    384     R      144   130   251   265
2  steelpan_0000000.png  512    384     R      157   147   238   248
3  steelpan_0000000.png  512    384     R      171   164   224   231
4  steelpan_0000000.png  512    384     R      184   181   211   214

Create the Parser

The first step is to create a template record for our specific type of dataset. In this case we're doing standard object detection:

template_record = ObjectDetectionRecord()

Now use the generate_template method, which prints out all the steps we have to implement.

Parser.generate_template(template_record)
class MyParser(Parser):
    def __init__(self, template_record):
        super().__init__(template_record=template_record)
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_img_size(<ImgSize>)
        record.set_filepath(<Union[str, Path]>)
        record.detection.add_bboxes(<Sequence[BBox]>)
        record.detection.set_class_map(<ClassMap>)
        record.detection.add_labels(<Sequence[Hashable]>)
# but currently not a priority!
class ChessParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations_obpr.csv")
        self.df['label'] = 'R'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])
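
To illustrate how `record_id` and `is_new` work together, here is a rough stand-in written with plain dictionaries instead of IceVision records. This is only a sketch of the grouping logic as I understand it, not IceVision's actual implementation; the `Row` tuple mimics what `itertuples()` yields, and the third row is a made-up example.

```python
from collections import namedtuple

# Mimics the rows yielded by df.itertuples()
Row = namedtuple("Row", "filename width height label xmin ymin xmax ymax")
rows = [
    Row("steelpan_0000000.png", 512, 384, "R", 130, 114, 265, 281),
    Row("steelpan_0000000.png", 512, 384, "R", 144, 130, 251, 265),
    Row("steelpan_0000001.png", 512, 384, "R", 10, 20, 110, 120),  # hypothetical row
]

records = {}
for o in rows:
    record_id = o.filename                 # same role as ChessParser.record_id
    is_new = record_id not in records      # first time we see this image?
    if is_new:
        # per-image fields are set once, like set_filepath / set_img_size
        records[record_id] = {"size": (o.width, o.height), "bboxes": [], "labels": []}
    # per-row fields are appended every time, like add_bboxes / add_labels
    records[record_id]["bboxes"].append((o.xmin, o.ymin, o.xmax, o.ymax))
    records[record_id]["labels"].append(o.label)

for rid, rec in records.items():
    print(rid, len(rec["bboxes"]), "boxes")
```

So many CSV rows collapse into one record per image, which is why the record shown later in this notebook holds a whole list of boxes.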

Let's parse the data and randomly split it into training and validation sets with Parser.parse:

parser = ChessParser(template_record, data_dir)
train_records, valid_records = parser.parse()
INFO     - Autofixing records | icevision.parsers.parser:parse:136
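
For intuition about what a random split does, here is a standalone sketch using only the standard library. As far as I know, `parse()` defaults to a random 80/20 split when no splitter is passed, but treat that ratio as an assumption; the `random_split` helper, the seed, and the file names are hypothetical.

```python
import random

def random_split(ids, train_frac=0.8, seed=42):
    """Shuffle record ids reproducibly and cut them into train / valid lists."""
    ids = list(ids)
    random.Random(seed).shuffle(ids)
    cut = round(train_frac * len(ids))
    return ids[:cut], ids[cut:]

filenames = [f"steelpan_{i:07d}.png" for i in range(10)]  # hypothetical ids
train_ids, valid_ids = random_split(filenames)
print(len(train_ids), "train,", len(valid_ids), "valid")
```

The split is done per record (per image), not per CSV row, so all boxes belonging to one image end up on the same side.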

Let's take a look at one record:

show_record(train_records[5], display_label=False, figsize=(14, 10))
train_records[0]
BaseRecord

common: 
	- Record ID: 798
	- Image size ImgSize(width=512, height=384)
	- Filepath: /home/drscotthawley/.icevision/data/espiownage-cyclegan/espiownage-cyclegan/images/steelpan_0000919.png
	- Img: None
detection: 
	- BBoxes: [<BBox (xmin:258, ymin:213, xmax:427, ymax:318)>, <BBox (xmin:286, ymin:230, xmax:399, ymax:301)>, <BBox (xmin:314, ymin:248, xmax:371, ymax:283)>, <BBox (xmin:205, ymin:3, xmax:350, ymax:200)>, <BBox (xmin:220, ymin:23, xmax:335, ymax:180)>, <BBox (xmin:234, ymin:42, xmax:321, ymax:161)>, <BBox (xmin:249, ymin:62, xmax:306, ymax:141)>, <BBox (xmin:263, ymin:82, xmax:292, ymax:121)>, <BBox (xmin:50, ymin:216, xmax:243, ymax:357)>, <BBox (xmin:69, ymin:230, xmax:224, ymax:343)>, <BBox (xmin:89, ymin:244, xmax:204, ymax:329)>, <BBox (xmin:108, ymin:258, xmax:185, ymax:315)>, <BBox (xmin:127, ymin:272, xmax:166, ymax:301)>, <BBox (xmin:59, ymin:105, xmax:154, ymax:216)>, <BBox (xmin:67, ymin:115, xmax:146, ymax:206)>, <BBox (xmin:75, ymin:124, xmax:138, ymax:197)>, <BBox (xmin:83, ymin:133, xmax:130, ymax:188)>, <BBox (xmin:91, ymin:142, xmax:122, ymax:179)>, <BBox (xmin:99, ymin:151, xmax:114, ymax:170)>]
	- Class Map: <ClassMap: {'background': 0, 'R': 1}>
	- Labels: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Moving On...

Following the Getting Started "refrigerator" notebook...

# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
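
The "divisible by 128" constraint mentioned in the comment above is easy to check or enforce. A tiny sketch, with a hypothetical helper name `next_multiple`:

```python
def next_multiple(size, base=128):
    """Round `size` up to the nearest multiple of `base` (ceiling division)."""
    return -(-size // base) * base

print(384 % 128 == 0)        # the size used here already satisfies the constraint
print(next_multiple(400))    # an arbitrary size would be rounded up
```

This only matters if EfficientDet is selected below; the other models do not need it, but keeping one size makes it easy to switch between them.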

Look at the (augmented) target data:

samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
selection = 0


extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args
(<module 'icevision.models.mmdet.models.retinanet' from '/home/drscotthawley/envs/icevision/lib/python3.8/site-packages/icevision/models/mmdet/models/retinanet/__init__.py'>,
 <icevision.models.mmdet.models.retinanet.backbones.resnet_fpn.MMDetRetinanetBackboneConfig at 0x7eff25efcb80>,
 {})
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
/home/drscotthawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/builder.py:16: UserWarning: ``build_anchor_generator`` would be deprecated soon, please use ``build_prior_generator`` 
  warnings.warn(
Use load_from_local loader
The model and loaded state dict do not match exactly

size mismatch for bbox_head.retina_cls.weight: copying a param with shape torch.Size([720, 256, 3, 3]) from checkpoint, the shape in current model is torch.Size([9, 256, 3, 3]).
size mismatch for bbox_head.retina_cls.bias: copying a param with shape torch.Size([720]) from checkpoint, the shape in current model is torch.Size([9]).
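
My reading of the size-mismatch messages above is that they are expected rather than a problem: the pretrained checkpoint was trained on COCO (80 classes), and RetinaNet's default configuration predicts one score per class for each of 9 anchors at every location, so the classification head is the one layer that cannot be copied over. The arithmetic, as a sketch (the helper name is hypothetical, and the 9-anchor, 80-class figures are the standard defaults as I understand them):

```python
def retina_cls_channels(num_classes, num_anchors=9):
    """Output channels of the classification head: one score per anchor per class."""
    return num_anchors * num_classes

print(retina_cls_channels(80))  # checkpoint head, trained on 80 COCO classes
print(retina_cls_channels(1))   # our head, with the single foreground class 'R'
```

The numbers match the shapes in the warning, which suggests the background entry in the class map does not get its own output channel. The rest of the network still loads its pretrained weights, and the new head is what gets trained first during the frozen epochs.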
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
model_type.show_batch(first(valid_dl), ncols=4)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
learn.lr_find()

# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)
/home/drscotthawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:324: UserWarning: ``grid_anchors`` would be deprecated soon. Please use ``grid_priors`` 
  warnings.warn('``grid_anchors`` would be deprecated soon. '
/home/drscotthawley/envs/icevision/lib/python3.8/site-packages/mmdet/core/anchor/anchor_generator.py:360: UserWarning: ``single_level_grid_anchors`` would be deprecated soon. Please use ``single_level_grid_priors`` 
  warnings.warn(
SuggestedLRs(lr_min=5.754399462603033e-05, lr_steep=9.120108734350652e-05)
learn.fine_tune(60, 1e-4, freeze_epochs=2)
epoch train_loss valid_loss COCOMetric time
0 0.928480 0.602156 0.249890 00:28
1 0.553398 0.474446 0.333411 00:27
epoch train_loss valid_loss COCOMetric time
0 0.437952 0.420487 0.348959 00:30
1 0.407828 0.389949 0.369970 00:30
2 0.389400 0.380634 0.371569 00:30
3 0.376582 0.361188 0.393257 00:30
4 0.362269 0.348352 0.396370 00:30
5 0.352502 0.346673 0.408938 00:30
6 0.345822 0.329053 0.414611 00:30
7 0.334411 0.326302 0.420059 00:29
8 0.328194 0.316225 0.422598 00:29
9 0.324628 0.315352 0.429932 00:29
10 0.314684 0.320508 0.431626 00:29
11 0.309845 0.315450 0.434806 00:29
12 0.307848 0.305349 0.448769 00:29
13 0.307842 0.293209 0.447368 00:29
14 0.305091 0.300741 0.446580 00:29
15 0.294238 0.291090 0.459869 00:28
16 0.288746 0.281675 0.461619 00:29
17 0.287281 0.283271 0.460128 00:29
18 0.289213 0.278813 0.455900 00:28
19 0.283405 0.274307 0.458390 00:29
20 0.280650 0.303495 0.440220 00:28
21 0.274419 0.281923 0.468723 00:29
22 0.271343 0.265222 0.460714 00:28
23 0.269648 0.266300 0.477265 00:28
24 0.264892 0.265283 0.477649 00:28
25 0.263941 0.257803 0.473233 00:28
26 0.267473 0.264576 0.471951 00:29
27 0.261077 0.262606 0.474387 00:29
28 0.259938 0.257200 0.482154 00:28
29 0.254449 0.258375 0.479738 00:28
30 0.253290 0.259928 0.479199 00:28
31 0.250123 0.250104 0.484753 00:28
32 0.250764 0.250883 0.477746 00:28
33 0.250196 0.266613 0.489192 00:28
34 0.245433 0.248948 0.484409 00:28
35 0.244292 0.250585 0.495278 00:29
36 0.241775 0.245834 0.498617 00:28
37 0.242332 0.251036 0.496346 00:28
38 0.236041 0.244724 0.489617 00:28
39 0.236889 0.247994 0.488461 00:29
40 0.231236 0.249484 0.495947 00:28
41 0.230529 0.249377 0.479443 00:28
42 0.230867 0.243312 0.491363 00:28
43 0.230171 0.241391 0.499601 00:28
44 0.230219 0.243975 0.502050 00:28
45 0.230918 0.243116 0.499362 00:28
46 0.224446 0.240595 0.495946 00:28
47 0.223183 0.239601 0.497498 00:28
48 0.225230 0.239358 0.495366 00:28
49 0.227244 0.243232 0.506848 00:28
50 0.220151 0.238877 0.504132 00:28
51 0.222786 0.238536 0.501295 00:28
52 0.219587 0.237296 0.498865 00:28
53 0.224671 0.238217 0.498187 00:28
54 0.222557 0.237781 0.500376 00:28
55 0.221216 0.237311 0.501271 00:28
56 0.220398 0.237355 0.500388 00:28
57 0.222505 0.238207 0.504028 00:29
58 0.219006 0.238140 0.502263 00:28
59 0.220286 0.238082 0.502299 00:28

model_type.show_results(model, valid_ds, detection_threshold=.5)
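
The `detection_threshold` argument controls which predictions get drawn: only boxes whose confidence score clears the threshold are kept. A minimal sketch of that filtering step, using made-up detections and a hypothetical `keep_confident` helper (whether the library treats the boundary as inclusive is an assumption here):

```python
def keep_confident(detections, threshold=0.5):
    """Keep detections whose score is at least `threshold`."""
    return [d for d in detections if d["score"] >= threshold]

# Made-up predictions for one image: label, confidence score, (xmin, ymin, xmax, ymax)
detections = [
    {"label": "R", "score": 0.91, "bbox": (130, 114, 265, 281)},
    {"label": "R", "score": 0.47, "bbox": (144, 130, 251, 265)},
    {"label": "R", "score": 0.12, "bbox": (300, 40, 340, 90)},
]

kept = keep_confident(detections, threshold=0.5)
print(len(kept), "of", len(detections), "detections kept")
```

Lowering the threshold would recover more of the inner rings at the cost of more false positives. Since the plan is to rely only on the outer boxes, a fairly strict threshold seems reasonable.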