This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebook, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- shawley, July 1, 2021

Installing IceVision and IceData

If on Colab, run the following cell; otherwise check the installation instructions.

#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")
 
import torch, re 
tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*', '', tv)  # raw string avoids an invalid-escape warning for '\+'
TORCH_VERSION = 'torch'+tv[0:-1]+'0'
CUDA_VERSION = 'cu'+cv.replace('.','')

print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
print(f"CUDA available = {torch.cuda.is_available()}, Device count = {torch.cuda.device_count()}, Current device = {torch.cuda.current_device()}")
print(f"Device name = {torch.cuda.get_device_name()}")
TORCH_VERSION=torch1.8.0; CUDA_VERSION=cu102
CUDA available = True, Device count = 1, Current device = 0
Device name = TITAN X (Pascal)
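
The version-tag logic in the cell above can be tried without a GPU or even without torch installed. The sketch below is a variant that splits on dots instead of slicing off the last character; the function name `mmcv_index_tags` is made up for illustration, and the input strings are sample values, not read from a real install.

```python
import re

def mmcv_index_tags(torch_version: str, cuda_version: str):
    """Build the (torch tag, CUDA tag) pair used in the mmcv wheel index URL.

    Hypothetical helper: mirrors the notebook cell, but works on plain strings.
    """
    # Drop a local build suffix such as "+cu102".
    base = re.sub(r"\+cu.*", "", torch_version)
    # Keep major.minor and force the patch number to 0, e.g. "1.8.1" -> "1.8.0".
    major, minor = base.split(".")[:2]
    torch_tag = f"torch{major}.{minor}.0"
    cuda_tag = "cu" + cuda_version.replace(".", "")
    return torch_tag, cuda_tag

print(mmcv_index_tags("1.8.1+cu102", "10.2"))  # ('torch1.8.0', 'cu102')
```

Splitting on dots is a design choice: it behaves the same as the slice for single-digit patch numbers, and it does not depend on how many characters the patch number has.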
#!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
#!pip install mmdet -qq

Imports

As always, let's import everything from icevision. We will also need pandas (you might need to install it with pip install pandas).

from icevision.all import *
import pandas as pd
INFO     - Downloading default `.ttf` font file - SpaceGrotesk-Medium.ttf from https://raw.githubusercontent.com/airctic/storage/master/SpaceGrotesk-Medium.ttf to /home/drscotthawley/.icevision/fonts/SpaceGrotesk-Medium.ttf | icevision.visualize.utils:get_default_font:66
INFO     - Downloading mmdet configs | icevision.models.mmdet.download_configs:download_mmdet_configs:31

Download dataset

We're going to use the espiownage-cyclegan dataset of steelpan images, read here from a local directory. The commented-out lines show alternative downloads via icedata.load_data.

#!rm -rf  /root/.icevision/data/espiownage-cyclegan
#data_url = "https://anonymized.machine.com/~drscotthawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 

# can use public espiownage cyclegan dataset:
#data_url = 'https://anonymized.machine.com/~drscotthawley/espiownage-cyclegan.tgz'
#data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'

# or local data already there:
from pathlib import Path
data_dir = Path('/home/drscotthawley/datasets/espiownage-cyclegan')

Understand the data format

In this task we were given a .csv file with annotations, let's take a look at that.

!!! danger "Important"
Replace data_dir with your own path for the dataset directory.

df = pd.read_csv(data_dir / "bboxes/annotations.csv")
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 10 130 114 265 281
1 steelpan_0000000.png 512 384 4 272 37 377 178
2 steelpan_0000000.png 512 384 10 415 292 480 353
3 steelpan_0000000.png 512 384 10 36 21 109 158
4 steelpan_0000002.png 512 384 2 100 161 163 218

At first glance, we can make the following assumptions:

  • Multiple rows with the same filename, width, height
  • A label for each row
  • A bbox [xmin, ymin, xmax, ymax] for each row
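
These assumptions can be checked rather than eyeballed. The following is a minimal sketch on a toy frame whose rows are copied from the df.head() output above; the names `df_toy`, `consistent_sizes`, and `valid_boxes` are illustrative only, and on the real data you would run the same checks on df.

```python
import pandas as pd

# Toy rows copied from the df.head() output above.
df_toy = pd.DataFrame(
    [
        ("steelpan_0000000.png", 512, 384, 10, 130, 114, 265, 281),
        ("steelpan_0000000.png", 512, 384, 4, 272, 37, 377, 178),
        ("steelpan_0000002.png", 512, 384, 2, 100, 161, 163, 218),
    ],
    columns=["filename", "width", "height", "label", "xmin", "ymin", "xmax", "ymax"],
)

# Each filename should map to exactly one (width, height) pair.
sizes_per_file = df_toy.groupby("filename")[["width", "height"]].nunique()
consistent_sizes = bool((sizes_per_file == 1).all().all())

# Every box should have positive extent and lie inside its image.
valid_boxes = bool(
    (
        (df_toy.xmin >= 0) & (df_toy.ymin >= 0)
        & (df_toy.xmin < df_toy.xmax) & (df_toy.ymin < df_toy.ymax)
        & (df_toy.xmax <= df_toy.width) & (df_toy.ymax <= df_toy.height)
    ).all()
)

# Number of annotation rows (boxes) per image.
boxes_per_file = df_toy.groupby("filename").size()
print(consistent_sizes, valid_boxes, boxes_per_file.to_dict())
```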

Once we know what our data provides we can create our custom Parser.

set(np.array(df['label']).flatten())
{2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22}
#df['label'] = ["Object"]*len(df)#  "_"+df['label'].apply(str)   # force label to be string-like
df['label'] /= 2
#df.head()
df['label'] = df['label'].apply(int) 
print(set(np.array(df['label']).flatten()))
df['label'] = "_"+df['label'].apply(str)+"_"
{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 _5_ 130 114 265 281
1 steelpan_0000000.png 512 384 _2_ 272 37 377 178
2 steelpan_0000000.png 512 384 _5_ 415 292 480 353
3 steelpan_0000000.png 512 384 _5_ 36 21 109 158
4 steelpan_0000002.png 512 384 _1_ 100 161 163 218
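
The relabeling above (halve the even ring counts, then wrap them in underscores so they read as strings) can also be written with integer floor division, which avoids the float round trip. This is a small sketch on the first five label values from the CSV; `raw`, `halved`, and `as_text` are names introduced here for illustration.

```python
import pandas as pd

raw = pd.Series([10, 4, 10, 10, 2], name="label")  # first rows of the CSV

# Floor division keeps the dtype integral, so no separate int cast is needed.
halved = raw // 2

# Wrap in underscores to force string-like class names.
as_text = "_" + halved.astype(str) + "_"

print(as_text.tolist())  # ['_5_', '_2_', '_5_', '_5_', '_1_']
```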
df['label'] = 'AN'  # antinode
df.head()
filename width height label xmin ymin xmax ymax
0 steelpan_0000000.png 512 384 AN 130 114 265 281
1 steelpan_0000000.png 512 384 AN 272 37 377 178
2 steelpan_0000000.png 512 384 AN 415 292 480 353
3 steelpan_0000000.png 512 384 AN 36 21 109 158
4 steelpan_0000002.png 512 384 AN 100 161 163 218

Create the Parser

The first step is to create a template record for our specific type of dataset, in this case we're doing standard object detection:

template_record = ObjectDetectionRecord()

Now use the method generate_template that will print out all the necessary steps we have to implement.

Parser.generate_template(template_record)
class MyParser(Parser):
    def __init__(self, template_record):
        super().__init__(template_record=template_record)
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_img_size(<ImgSize>)
        record.set_filepath(<Union[str, Path]>)
        record.detection.add_bboxes(<Sequence[BBox]>)
        record.detection.set_class_map(<ClassMap>)
        record.detection.add_labels(<Sequence[Hashable]>)

We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:

  • __init__: What happens here is completely up to you, normally we have to pass some reference to our data, data_dir in our case.

  • __iter__: This tells our parser how to iterate over our data, each item returned here will be passed to parse_fields as o. In our case we call df.itertuples to iterate over all df rows.

  • __len__: How many items we will be iterating over.

  • record_id: Should return a Hashable (int, str, etc). In our case we want all the dataset items that have the same filename to be unified in the same record.

  • parse_fields: Here is where the attributes of the record are collected, the template will suggest what methods we need to call on the record and what parameters it expects. The parameter o it receives is the item returned by __iter__.

!!! danger "Important"
Be sure to pass the correct type on all record methods!
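
The interplay between record_id, is_new, and parse_fields described above can be sketched in plain Python. This is a toy stand-in and not IceVision's actual implementation: `Row`, `group_rows`, and the dict-based records are invented for illustration, and the rows are taken from the df.head() output.

```python
from collections import namedtuple

Row = namedtuple("Row", "filename width height label xmin ymin xmax ymax")

rows = [
    Row("steelpan_0000000.png", 512, 384, "AN", 130, 114, 265, 281),
    Row("steelpan_0000000.png", 512, 384, "AN", 272, 37, 377, 178),
    Row("steelpan_0000002.png", 512, 384, "AN", 100, 161, 163, 218),
]

def group_rows(rows):
    """Toy version of the parse loop: build one record (a dict) per record id."""
    records = {}
    for o in rows:
        record_id = o.filename                # plays the role of record_id(o)
        is_new = record_id not in records
        if is_new:
            # Per-image fields are set once, on the first row seen for this id.
            records[record_id] = {"size": (o.width, o.height), "bboxes": [], "labels": []}
        # Per-annotation fields are appended for every row.
        records[record_id]["bboxes"].append((o.xmin, o.ymin, o.xmax, o.ymax))
        records[record_id]["labels"].append(o.label)
    return records

records = group_rows(rows)
print({k: len(v["bboxes"]) for k, v in records.items()})
```

Three annotation rows collapse into two records, which is why the parser below returns the filename from record_id and only sets the file path and image size when is_new is true.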

class BBoxParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        #self.df['label'] /= 2
        #self.df['label'] = self.df['label'].apply(int) 
        #self.df['label'] = "_"+self.df['label'].apply(str)+"_"
        self.df['label'] = 'AN'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])

Let's parse the data and randomly split it into training and validation records with Parser.parse:

parser = BBoxParser(template_record, data_dir)
train_records, valid_records = parser.parse()
INFO     - Autofixing records | icevision.parsers.parser:parse:136
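
For intuition, a random split over record ids can be sketched as follows. The 80/20 ratio and the 1041 synthetic ids are assumptions, chosen because they are consistent with the dataset sizes reported later in this notebook (832 and 209); `random_split` is a hypothetical helper and not the splitter IceVision uses internally.

```python
import random

def random_split(ids, train_fraction=0.8, seed=42):
    """Shuffle unique ids and cut them into a train part and a validation part."""
    ids = sorted(set(ids))                 # one entry per image, not per box
    random.Random(seed).shuffle(ids)       # seeded so the split is repeatable
    n_train = int(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Synthetic ids standing in for the real filenames.
ids = [f"steelpan_{i:07d}.png" for i in range(1041)]
train_ids, valid_ids = random_split(ids)
print(len(train_ids), len(valid_ids))
```

Splitting by id rather than by CSV row keeps all boxes of one image on the same side of the split.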

Let's take a look at one record:

show_record(train_records[5], display_label=False, figsize=(14, 10))
train_records[0]
BaseRecord

common: 
	- Record ID: 870
	- Image size ImgSize(width=512, height=384)
	- Filepath: /home/drscotthawley/datasets/espiownage-cyclegan/images/steelpan_0040039.png
	- Img: None
detection: 
	- BBoxes: [<BBox (xmin:263, ymin:193, xmax:396, ymax:346)>, <BBox (xmin:128, ymin:68, xmax:411, ymax:113)>, <BBox (xmin:53, ymin:196, xmax:232, ymax:369)>, <BBox (xmin:9, ymin:6, xmax:112, ymax:117)>, <BBox (xmin:97, ymin:124, xmax:384, ymax:185)>]
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1, 1, 1, 1, 1]

Moving On...

Following the Getting Started "refrigerator" notebook...

# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
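
The validation transform turns each 512x384 image into a 384x384 one, so any boxes the model later reports live in that resized frame rather than in original pixels. The sketch below shows the geometry under two assumptions that are not confirmed by this notebook: the longer side is scaled to the target size with the aspect ratio preserved, and the padding is split evenly on both sides. Both helper functions are hypothetical.

```python
def resize_and_pad_geometry(width, height, target):
    """Scale so the longer side equals `target`, then pad to a square.

    Assumes aspect-preserving resize and centered padding.
    """
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_x, pad_y = (target - new_w) // 2, (target - new_h) // 2
    return scale, pad_x, pad_y

def box_to_original(box, scale, pad_x, pad_y):
    """Map an (xmin, ymin, xmax, ymax) box from the padded frame to original pixels."""
    xmin, ymin, xmax, ymax = box
    return ((xmin - pad_x) / scale, (ymin - pad_y) / scale,
            (xmax - pad_x) / scale, (ymax - pad_y) / scale)

scale, pad_x, pad_y = resize_and_pad_geometry(512, 384, 384)
print(scale, pad_x, pad_y)  # 0.75 0 48
print(box_to_original((96.0, 48.0, 192.0, 336.0), scale, pad_x, pad_y))
```

If the real transform pads differently (for example only at the bottom), the offsets change, so it is worth verifying against one known annotation before relying on the mapping.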

This next cell generates an error. Ignore it and move on.

samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
selection = 1


extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args
(<module 'icevision.models.torchvision.retinanet' from '/home/drscotthawley/envs/icevision/lib/python3.8/site-packages/icevision/models/torchvision/retinanet/__init__.py'>,
 <icevision.models.torchvision.retinanet.backbones.resnet_fpn.RetinanetTorchvisionBackboneConfig at 0x7fdf2dd9ce50>,
 {})
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /home/drscotthawley/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
model_type.show_batch(first(valid_dl), ncols=4)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
learn.lr_find(end_lr=0.005)

# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)
SuggestedLRs(lr_min=2.7542287716642023e-05, lr_steep=7.585775892948732e-05)
learn.fine_tune(60, 1e-4, freeze_epochs=2)
epoch train_loss valid_loss COCOMetric time
0 1.382472 1.138447 0.014189 00:24
1 0.995744 0.985307 0.090372 00:23
epoch train_loss valid_loss COCOMetric time
0 0.766973 0.737682 0.266701 00:33
1 0.704099 0.664647 0.345390 00:33
2 0.641785 0.597058 0.410615 00:33
3 0.588479 0.552740 0.439253 00:33
4 0.553743 0.534752 0.484354 00:33
5 0.528012 0.492559 0.502902 00:32
6 0.496259 0.467164 0.507011 00:32
7 0.468638 0.433915 0.542065 00:32
8 0.448554 0.413170 0.579916 00:32
9 0.434068 0.412798 0.551295 00:32
10 0.415386 0.388108 0.586605 00:32
11 0.406240 0.382528 0.589393 00:32
12 0.400146 0.385356 0.559640 00:32
13 0.383361 0.367989 0.580080 00:32
14 0.368630 0.329652 0.634962 00:32
15 0.349198 0.333366 0.622955 00:32
16 0.378034 0.332617 0.638347 00:32
17 0.347078 0.316014 0.636328 00:32
18 0.337173 0.296142 0.671272 00:32
19 0.334409 0.324778 0.615452 00:32
20 0.336331 0.300936 0.659581 00:32
21 0.320485 0.295569 0.667316 00:32
22 0.307609 0.316991 0.574209 00:32
23 0.310142 0.277255 0.662101 00:32
24 0.293880 0.299024 0.638539 00:32
25 0.295195 0.267777 0.675304 00:32
26 0.289908 0.282522 0.637798 00:31
27 0.282468 0.287645 0.635229 00:32
28 0.282700 0.281933 0.664142 00:32
29 0.276684 0.266250 0.644042 00:31
30 0.274813 0.267006 0.670515 00:32
31 0.266677 0.245335 0.706987 00:32
32 0.268640 0.258879 0.673365 00:31
33 0.260452 0.255824 0.696242 00:32
34 0.263646 0.250630 0.698721 00:31
35 0.254323 0.248900 0.695740 00:31
36 0.246985 0.247855 0.688055 00:32
37 0.246890 0.263726 0.656197 00:31
38 0.246568 0.243103 0.702485 00:32
39 0.244392 0.267593 0.651396 00:31
40 0.239273 0.249232 0.677462 00:32
41 0.238626 0.245842 0.678853 00:31
42 0.229082 0.265359 0.653194 00:32
43 0.230608 0.260192 0.665803 00:32
44 0.228591 0.250828 0.675536 00:32
45 0.230032 0.255471 0.662977 00:32
46 0.225583 0.246671 0.682974 00:31
47 0.225184 0.243084 0.674873 00:32
48 0.220221 0.241577 0.687320 00:32
49 0.226940 0.254195 0.662080 00:32
50 0.215349 0.246749 0.677838 00:31
51 0.222931 0.231992 0.695828 00:32
52 0.216862 0.242454 0.677646 00:31
53 0.217495 0.236846 0.693383 00:32
54 0.211591 0.243640 0.682728 00:32
55 0.213098 0.237190 0.692352 00:32
56 0.224707 0.244036 0.678120 00:31
57 0.214665 0.245788 0.673550 00:31
58 0.219662 0.242171 0.682184 00:32
59 0.220014 0.242718 0.680125 00:31
model_type.show_results(model, valid_ds, detection_threshold=.5)

Inference

preds = model_type.predict(model, valid_ds, keep_images=True)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-a2cbe0697b78> in <module>
----> 1 preds, losses = model_type.predict(model, valid_ds, keep_images=True, with_loss=True)

TypeError: predict() got an unexpected keyword argument 'with_loss'
show_preds(preds=preds[0:10])
len(train_ds), len(valid_ds), len(preds)
(832, 209, 209)

let's try to figure out how to get what we want from these predictions. hmmm

preds[0].pred
BaseRecord

common: 
	- Img: 384x384x3 <np.ndarray> Image
	- Record ID: 539
	- Image size ImgSize(width=384, height=384)
detection: 
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1]
	- Scores: [0.9868553]
	- BBoxes: [<BBox (xmin:142.2591094970703, ymin:156.2466278076172, xmax:333.04168701171875, ymax:267.3779602050781)>]
preds[0].pred.detection.scores
array([0.9868553], dtype=float32)
preds[0].pred.detection.bboxes
[<BBox (xmin:142.2591094970703, ymin:156.2466278076172, xmax:333.04168701171875, ymax:267.3779602050781)>]
preds[0].pred.detection.bboxes[0].xmin

def get_bblist(pred):
    "Return the predicted boxes as a list of [xmin, ymin, xmax, ymax] lists."
    return [[bb.xmin, bb.ymin, bb.xmax, bb.ymax] for bb in pred.pred.detection.bboxes]

get_bblist(preds[0])      
[[142.25911, 156.24663, 333.0417, 267.37796]]
results = []
for i in range(len(preds)):
    #print(f"i = {i}, file = {str(Path(valid_ds[i].common.filepath).stem)+'.csv'}, bboxes = {get_bblist(preds[i])}, scores={preds[i].pred.detection.scores}\n")
    worst_score = np.min(np.array(preds[i].pred.detection.scores))
    line_list = [str(Path(valid_ds[i].common.filepath).stem)+'.csv', get_bblist(preds[i]), preds[i].pred.detection.scores, worst_score, i]
    results.append(line_list)
    
# store as pandas dataframe
res_df = pd.DataFrame(results, columns=['filename', 'bblist','scores','worst_score','i'])
res_df = res_df.sort_values('worst_score')  # order by worst score as a "top losses" kind of thing
res_df.head() # take a look
filename bblist scores worst_score i
12 steelpan_0000029.csv [[42.684723, 187.99416, 183.18018, 268.6339], [191.2504, 75.9324, 323.02982, 229.93192], [68.3033, 67.521286, 170.44449, 165.69962], [51.211563, 267.44662, 249.46263, 322.21536], [33.59114, 206.9505, 212.26224, 289.34067]] [0.9714665, 0.9657926, 0.7204041, 0.70633274, 0.5111434] 0.511143 12
189 steelpan_0000904.csv [[200.8158, 174.86542, 281.0075, 250.05515], [234.00565, 75.93068, 330.38516, 176.80026], [85.0496, 183.49402, 189.56403, 333.149], [64.77986, 52.04474, 238.72475, 127.76965], [14.119356, 103.22791, 48.50573, 148.12149]] [0.9960259, 0.992378, 0.99137294, 0.912177, 0.511592] 0.511592 189
66 steelpan_0000049.csv [[15.12051, 118.76639, 70.77783, 183.96085], [248.91342, 105.94249, 336.9633, 247.08269], [293.78885, 264.37354, 336.80228, 308.43597], [142.19037, 184.63148, 180.11258, 227.51576]] [0.9898268, 0.9850744, 0.95944643, 0.5253186] 0.525319 66
47 steelpan_0000888.csv [[294.75272, 53.51385, 366.92, 116.89061], [229.65038, 95.167015, 287.82156, 189.70688], [82.16948, 205.09933, 121.94702, 250.89099], [158.75662, 223.72386, 324.54315, 322.34283], [50.855408, 113.68904, 192.70499, 194.04472]] [0.9938643, 0.99187475, 0.9840629, 0.9813324, 0.5274254] 0.527425 47
151 steelpan_0000784.csv [[17.847351, 52.1698, 67.796936, 101.52548], [227.98306, 78.69093, 316.62363, 199.97267], [6.6651897, 184.91397, 68.176956, 242.81845], [166.20135, 257.3072, 276.95096, 315.14948], [267.2783, 269.17236, 326.0624, 326.73273], [114.752144, 116.17899, 157.22328, 320.7459]] [0.98965514, 0.9828661, 0.9827445, 0.83460915, 0.7973198, 0.5497492] 0.549749 151
res_df.to_csv('bboxes_top_losses_cg.csv', index=False)
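
One caveat with ranking by worst score: np.min raises a ValueError on an empty array, which would happen for any image where the model returns no detections. That did not occur in this run, but a guarded version is sketched below on made-up scores; `worst_score` as a function and the toy filenames are hypothetical.

```python
import numpy as np
import pandas as pd

def worst_score(scores):
    """Lowest detection score, or NaN when nothing was detected."""
    scores = np.asarray(scores, dtype=float)
    return float(scores.min()) if scores.size else float("nan")

toy = [
    ("a.csv", [0.97, 0.51]),
    ("b.csv", [0.99]),
    ("c.csv", []),            # hypothetical image with no detections
]
toy_df = pd.DataFrame(
    [(name, s, worst_score(s)) for name, s in toy],
    columns=["filename", "scores", "worst_score"],
)

# sort_values puts NaN last by default, so empty predictions end up at the bottom.
toy_df = toy_df.sort_values("worst_score")
print(toy_df["filename"].tolist())  # ['a.csv', 'b.csv', 'c.csv']
```

Whether images with no detections belong at the bottom or the top of a "top losses" list is a judgment call; passing na_position='first' to sort_values would surface them first instead.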

Reminder: which computer did we run this on?? lol

!hostname
lecun