This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebooks, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- shawley, July 1, 2021

Installing IceVision and IceData

If you are on Colab, run the following cell; otherwise, check the installation instructions.

#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")
 
import torch, re 
tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*', '', tv)  # raw string avoids an invalid-escape warning for '\+'
TORCH_VERSION = 'torch'+tv[0:-1]+'0'
CUDA_VERSION = 'cu'+cv.replace('.','')

print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
print(f"CUDA available = {torch.cuda.is_available()}, Device count = {torch.cuda.device_count()}, Current device = {torch.cuda.current_device()}")
print(f"Device name = {torch.cuda.get_device_name()}")
TORCH_VERSION=torch1.8.0; CUDA_VERSION=cu101
CUDA available = True, Device count = 1, Current device = 0
Device name = GeForce RTX 2080 Ti
#!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
#!pip install mmdet -qq
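
The version tags above are built by string slicing, which assumes a single-digit patch number. As a minimal sketch, the same tags can be derived by splitting on dots instead. The helper names `torch_tag` and `cuda_tag` are hypothetical and not part of torch or mmcv, and this variant strips any `+...` suffix rather than only `+cu...`:

```python
import re

def torch_tag(torch_version: str) -> str:
    """Turn e.g. '1.8.1+cu101' into 'torch1.8.0' (patch number forced to 0)."""
    base = re.sub(r"\+.*$", "", torch_version)   # drop a local suffix such as '+cu101'
    major, minor = base.split(".")[:2]
    return f"torch{major}.{minor}.0"

def cuda_tag(cuda_version: str) -> str:
    """Turn e.g. '10.1' into 'cu101'."""
    return "cu" + cuda_version.replace(".", "")

print(torch_tag("1.8.1+cu101"), cuda_tag("10.1"))
```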

Imports

As always, let's import everything from icevision. Additionally, we will also need pandas (you might need to install it with pip install pandas).

from icevision.all import *
import pandas as pd
from espiownage.core import *

Download dataset

We're going to use the espiownage "fake" dataset (steelpan images with bounding-box annotations), fetched with get_data from espiownage.core.

dataset_name = 'fake'
data_dir = get_data(dataset_name)

CSV data format

df = pd.read_csv(data_dir / 'bboxes/annotations.csv')
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 8 135 110 322 287
1 steelpan_0000000.png 512 384 4 399 4 462 103
2 steelpan_0000000.png 512 384 2 20 132 79 211
3 steelpan_0000000.png 512 384 4 353 175 504 254
4 steelpan_0000000.png 512 384 6 75 34 162 105

At first glance, we can make the following assumptions:

  • Multiple rows with the same filename, width, height
  • A label for each row
  • A bbox [xmin, ymin, xmax, ymax] for each row

Once we know what our data provides we can create our custom Parser.
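
These assumptions can be checked directly before writing the parser. The following is a minimal sketch using only the standard library and the five rows shown in df.head(), re-typed as CSV text; the helper name `check_annotations` is hypothetical:

```python
import csv
import io
from collections import defaultdict

# The five rows shown above, re-typed as CSV text
SAMPLE = """filename,width,height,label,xmin,ymin,xmax,ymax
steelpan_0000000.png,512,384,8,135,110,322,287
steelpan_0000000.png,512,384,4,399,4,462,103
steelpan_0000000.png,512,384,2,20,132,79,211
steelpan_0000000.png,512,384,4,353,175,504,254
steelpan_0000000.png,512,384,6,75,34,162,105
"""

def check_annotations(csv_text):
    """Group rows by filename; verify image sizes agree and boxes fit inside the image."""
    sizes = {}
    boxes = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["filename"]
        size = (int(row["width"]), int(row["height"]))
        # every row of a given file must report the same image size
        assert sizes.setdefault(name, size) == size, f"size mismatch in {name}"
        xmin, ymin, xmax, ymax = (int(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
        # each box must be non-empty and lie within the image
        assert 0 <= xmin < xmax <= size[0] and 0 <= ymin < ymax <= size[1], f"bad box in {name}"
        boxes[name].append((xmin, ymin, xmax, ymax))
    return dict(boxes)

boxes_per_image = check_annotations(SAMPLE)
print({name: len(b) for name, b in boxes_per_image.items()})
```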

set(np.array(df['label']).flatten())
{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
df['label'] /= 2
df['label'] = df['label'].apply(int) 
print(set(np.array(df['label']).flatten()))
df['label'] = "_"+df['label'].apply(str)+"_"
{0, 1, 2, 3, 4, 5}
df['label'] = 'AN'  # antinode
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 AN 135 110 322 287
1 steelpan_0000000.png 512 384 AN 399 4 462 103
2 steelpan_0000000.png 512 384 AN 20 132 79 211
3 steelpan_0000000.png 512 384 AN 353 175 504 254
4 steelpan_0000000.png 512 384 AN 75 34 162 105
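
The label cells above do two things in sequence: halving bins the twelve integer labels into six, and the final assignment then overwrites every label with the single class 'AN', so the intermediate binned and string-wrapped labels are discarded. A minimal sketch of the same arithmetic in plain Python (for non-negative integers, `int(x / 2)` equals `x // 2`):

```python
labels = list(range(12))               # the twelve raw labels, 0 through 11

binned = [x // 2 for x in labels]      # same result as int(x / 2) for x >= 0
print(sorted(set(binned)))             # six bins remain

single_class = ["AN" for _ in labels]  # collapse everything to one class
print(set(single_class))
```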

Create the Parser

template_record = ObjectDetectionRecord()

Now use the method generate_template that will print out all the necessary steps we have to implement.

Parser.generate_template(template_record)
class MyParser(Parser):
    def __init__(self, template_record):
        super().__init__(template_record=template_record)
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_filepath(<Union[str, Path]>)
        record.set_img_size(<ImgSize>)
        record.detection.set_class_map(<ClassMap>)
        record.detection.add_labels(<Sequence[Hashable]>)
        record.detection.add_bboxes(<Sequence[BBox]>)

We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:

  • __init__: What happens here is completely up to you, normally we have to pass some reference to our data, data_dir in our case.

  • __iter__: This tells our parser how to iterate over our data, each item returned here will be passed to parse_fields as o. In our case we call df.itertuples to iterate over all df rows.

  • __len__: How many items we will be iterating over.

  • record_id: Should return a Hashable (int, str, etc). In our case we want all the dataset items that have the same filename to be unified in the same record.

  • parse_fields: Here is where the attributes of the record are collected, the template will suggest what methods we need to call on the record and what parameters it expects. The parameter o it receives is the item returned by __iter__.
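
How these methods cooperate can be pictured with a toy driver loop. This is a simplified sketch of the idea only, not IceVision's actual implementation; `ToyRecord` and `toy_parse` are hypothetical names, and the second filename in the sample rows is made up for illustration:

```python
from collections import namedtuple

Row = namedtuple("Row", "filename width height label xmin ymin xmax ymax")

class ToyRecord:
    """Hypothetical stand-in for a record: one image, many boxes."""
    def __init__(self, record_id):
        self.record_id = record_id
        self.img_size = None
        self.labels = []
        self.bboxes = []

def toy_parse(rows):
    """Merge rows that share a record id into a single record."""
    records = {}
    for o in rows:                      # what __iter__ yields
        rid = o.filename                # what record_id(o) returns
        is_new = rid not in records
        record = records.setdefault(rid, ToyRecord(rid))
        if is_new:                      # per-image fields are set only once
            record.img_size = (o.width, o.height)
        # per-annotation fields are appended for every row
        record.labels.append(o.label)
        record.bboxes.append((o.xmin, o.ymin, o.xmax, o.ymax))
    return list(records.values())

rows = [
    Row("steelpan_0000000.png", 512, 384, "AN", 135, 110, 322, 287),
    Row("steelpan_0000000.png", 512, 384, "AN", 399, 4, 462, 103),
    Row("steelpan_0000001.png", 512, 384, "AN", 20, 132, 79, 211),  # made-up filename
]
records = toy_parse(rows)
print([(r.record_id, len(r.bboxes)) for r in records])
```

The `is_new` flag is what lets several CSV rows contribute boxes to one record without resetting its file path or image size each time.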

class BBoxParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        self.df['label'] = 'AN'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])

Let's parse the data and randomly split it into training and validation records with Parser.parse:

parser = BBoxParser(template_record, data_dir)
train_records, valid_records = parser.parse()
INFO     - Autofixing records | icevision.parsers.parser:parse:122
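
Called without arguments, parse splits the records at random. The sketch below assumes an 80/20 train/validation ratio; treat that ratio as an assumption and check the IceVision documentation for the actual default. `random_split` is a hypothetical helper, not an IceVision function:

```python
import random

def random_split(record_ids, probs=(0.8, 0.2), seed=None):
    """Shuffle the unique record ids and cut them into consecutive chunks sized by probs."""
    ids = sorted(set(record_ids))
    random.Random(seed).shuffle(ids)
    splits, start = [], 0
    for i, p in enumerate(probs):
        # the last split takes whatever remains, so no id is lost to rounding
        end = len(ids) if i == len(probs) - 1 else start + round(p * len(ids))
        splits.append(ids[start:end])
        start = end
    return splits

ids = [f"steelpan_{i:07d}.png" for i in range(100)]
train_ids, valid_ids = random_split(ids, seed=0)
print(len(train_ids), len(valid_ids))
```

Splitting by record id rather than by CSV row keeps all boxes of one image on the same side of the split.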

Let's take a look at one record:

show_record(train_records[5], display_label=False, figsize=(14, 10))
train_records[0]
BaseRecord

common: 
	- Filepath: /home/shawley/.espiownage/data/espiownage-fake/images/steelpan_0000732.png
	- Img: None
	- Image size ImgSize(width=512, height=384)
	- Record ID: steelpan_0000732.png
detection: 
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1, 1, 1]
	- BBoxes: [<BBox (xmin:211, ymin:212, xmax:336, ymax:331)>, <BBox (xmin:227, ymin:0, xmax:500, ymax:148)>, <BBox (xmin:56, ymin:99, xmax:197, ymax:236)>]
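
The record above stores labels as integers: the class map reserves id 0 for 'background', so the single class 'AN' becomes 1. A minimal sketch of that mapping, mirroring the printed ClassMap rather than reimplementing IceVision's class (`build_class_map` is a hypothetical helper):

```python
def build_class_map(classes, background="background"):
    """Map class names to integer ids, reserving id 0 for the background class."""
    names = [background] + [c for c in classes if c != background]
    return {name: idx for idx, name in enumerate(names)}

class_map = build_class_map(["AN"])
print(class_map)                                  # {'background': 0, 'AN': 1}
print([class_map[name] for name in ["AN"] * 3])   # [1, 1, 1], as in the record above
print(len(class_map))                             # 2
```

The length counts the background entry, so a dataset with one object class yields two classes.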

Moving On...

Following the Getting Started "refrigerator" notebook...

# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
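
For the validation transform, resizing and padding a 512x384 image to a 384-pixel square works out as follows. This is a sketch of the geometry only, assuming the longer side is scaled to image_size and the shorter side is then padded; `letterbox_geometry` is a hypothetical helper and does not call Albumentations:

```python
def letterbox_geometry(width, height, target):
    """Scale so the longer side equals target, then report the padding needed for a square."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    return scale, (new_w, new_h), (target - new_w, target - new_h)

scale, resized, padding = letterbox_geometry(512, 384, 384)
print(scale, resized, padding)

# Box coordinates scale by the same factor, before any padding offset is applied
xmin, ymin, xmax, ymax = 135, 110, 322, 287
scaled_box = [v * scale for v in (xmin, ymin, xmax, ymax)]
print(scaled_box)
```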

This next cell generates an error. Ignore it and move on.

samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
selection = 1


extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args
(<module 'icevision.models.torchvision.retinanet' from '/home/shawley/envs/icevision/lib/python3.6/site-packages/icevision/models/torchvision/retinanet/__init__.py'>,
 <icevision.models.torchvision.retinanet.backbones.resnet_fpn.RetinanetTorchvisionBackboneConfig at 0x7f2ec50246d8>,
 {})
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /home/shawley/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
model_type.show_batch(first(valid_dl), ncols=4)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
learn.lr_find(end_lr=5e-3)

# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)
SuggestedLRs(lr_min=7.945936522446573e-05, lr_steep=9.127484372584149e-05)
kfold = False
epochs = 11 if kfold else 30   # go faster for kfold; 10 is good enough ;-)
freeze_epochs=2
print(f"Training for {epochs} epochs, starting with {freeze_epochs} frozen epochs...")
learn.fine_tune(epochs, 1e-4, freeze_epochs=freeze_epochs)
Training for 30 epochs, starting with 2 frozen epochs...
epoch train_loss valid_loss COCOMetric time
0 1.071140 0.992633 0.066605 00:38
1 0.830804 0.818732 0.244911 00:38
epoch train_loss valid_loss COCOMetric time
0 0.598979 0.617029 0.404402 00:52
1 0.508841 0.526212 0.479217 00:51
2 0.441530 0.437959 0.570448 00:50
3 0.392238 0.392943 0.577387 00:49
4 0.350390 0.334665 0.649274 00:49
5 0.313766 0.315748 0.623470 00:48
6 0.301350 0.260728 0.712127 00:48
7 0.274593 0.257943 0.719304 00:48
8 0.273574 0.242692 0.719628 00:48
9 0.255794 0.221418 0.737301 00:48
10 0.241270 0.205354 0.758872 00:48
11 0.244133 0.219039 0.742489 00:47
12 0.220957 0.194552 0.754311 00:48
13 0.228145 0.258578 0.640521 00:47
14 0.213714 0.188448 0.759919 00:47
15 0.202714 0.189642 0.738343 00:47
16 0.194675 0.188408 0.749062 00:47
17 0.194372 0.170616 0.774808 00:48
18 0.194591 0.186191 0.749064 00:47
19 0.185809 0.201155 0.707918 00:47
20 0.180845 0.173587 0.746402 00:47
21 0.173546 0.166388 0.778691 00:47
22 0.179162 0.160369 0.786542 00:47
23 0.178859 0.159345 0.777532 00:47
24 0.167058 0.173141 0.750707 00:47
25 0.164910 0.162669 0.763373 00:47
26 0.167588 0.170730 0.749880 00:47
27 0.169934 0.172657 0.745413 00:47
28 0.165541 0.172417 0.745669 00:47
29 0.162258 0.172917 0.746560 00:47
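
The final epoch is not the best one in the log above. A small sketch that scans the COCOMetric column of the 30 unfrozen epochs (values copied from the table) for its maximum:

```python
# COCOMetric for unfrozen epochs 0-29, copied from the training log
coco = [
    0.404402, 0.479217, 0.570448, 0.577387, 0.649274, 0.623470,
    0.712127, 0.719304, 0.719628, 0.737301, 0.758872, 0.742489,
    0.754311, 0.640521, 0.759919, 0.738343, 0.749062, 0.774808,
    0.749064, 0.707918, 0.746402, 0.778691, 0.786542, 0.777532,
    0.750707, 0.763373, 0.749880, 0.745413, 0.745669, 0.746560,
]

# index of the epoch with the highest validation COCOMetric
best_epoch = max(range(len(coco)), key=coco.__getitem__)
print(best_epoch, coco[best_epoch], coco[-1])
```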
model_type.show_results(model, valid_ds, detection_threshold=.5)
if False:
    checkpoint_path = 'espi-retinanet-checkpoint-fake.pth'
    # model_name and backbone_name must match the model trained above (selection == 1)
    save_icevision_checkpoint(model, 
                        model_name='torchvision.retinanet', 
                        backbone_name='resnet50_fpn',
                        classes=parser.class_map.get_classes(), 
                        img_size=image_size, 
                        filename=checkpoint_path,
                        meta={'icevision_version': '0.9.1'})

Inference

Inference on this model would proceed the same as with the Real dataset. But these were all fake, so the necessity or utility of doing so is not evident at this time.

Note that you'd want to restart the kernel and load the model from a checkpoint first; otherwise you'll likely get a CUDA out-of-memory (OOM) error.