This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebook, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- shawley, July 1, 2021

Installing IceVision and IceData

If you are on Colab, run the following cell; otherwise, check the installation instructions.

#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")

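import torch, re
tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*', '', tv)             # strip a local version suffix such as "+cu102"
TORCH_VERSION = 'torch' + tv[0:-1] + '0'   # replace the last character with 0, e.g. 1.8.1 -> torch1.8.0
CUDA_VERSION = 'cu' + cv.replace('.', '')  # e.g. "10.2" -> cu102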

print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
print(f"CUDA available = {torch.cuda.is_available()}, Device count = {torch.cuda.device_count()}, Current device = {torch.cuda.current_device()}")
print(f"Device name = {torch.cuda.get_device_name()}")

TORCH_VERSION=torch1.8.0; CUDA_VERSION=cu102
CUDA available = True, Device count = 1, Current device = 0
Device name = TITAN X (Pascal)

#!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
#!pip install mmdet -qq

Imports

As always, let's import everything from icevision. Additionally, we will also need pandas (you might need to install it with pip install pandas).

from icevision.all import *
import pandas as pd

INFO     - Downloading default `.ttf` font file - SpaceGrotesk-Medium.ttf from https://raw.githubusercontent.com/airctic/storage/master/SpaceGrotesk-Medium.ttf to /home/drscotthawley/.icevision/fonts/SpaceGrotesk-Medium.ttf | icevision.visualize.utils:get_default_font:66
INFO     - Downloading mmdet configs | icevision.models.mmdet.download_configs:download_mmdet_configs:31

Download dataset

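We're going to use the espiownage CycleGAN dataset of steelpan images, for which bounding-box annotations are provided in a .csv file. You can download the public copy or point data_dir at a copy that is already on disk.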

#!rm -rf  /root/.icevision/data/espiownage-cyclegan

#data_url = "https://anonymized.machine.com/~drscotthawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 

# can use public espiownage cyclegan dataset:
#data_url = 'https://anonymized.machine.com/~drscotthawley/espiownage-cyclegan.tgz'
#data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'

# or local data already there:
from pathlib import Path
data_dir = Path('/home/drscotthawley/datasets/espiownage-cyclegan')

Understand the data format

In this task we were given a .csv file with annotations. Let's take a look at it.

!!! danger "Important"
Replace data_dir with your own path for the dataset directory.

df = pd.read_csv(data_dir / "bboxes/annotations.csv")
df.head()

At first glance, we can make the following assumptions:

Multiple rows with the same filename, width, height
A label for each row
A bbox [xmin, ymin, xmax, ymax] for each row
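These assumptions are cheap to check before writing the parser. The following is a minimal sketch that uses only the standard library; the inline rows are copied from the df.head() output shown in this notebook, and the helper name check_rows is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from this notebook's df.head() output (labels as displayed there).
SAMPLE_CSV = """filename,width,height,label,xmin,ymin,xmax,ymax
steelpan_0000000.png,512,384,_5_,130,114,265,281
steelpan_0000000.png,512,384,_2_,272,37,377,178
steelpan_0000000.png,512,384,_5_,415,292,480,353
steelpan_0000000.png,512,384,_5_,36,21,109,158
steelpan_0000002.png,512,384,_1_,100,161,163,218
"""

def check_rows(rows):
    """Hypothetical helper: return a list of problems found in annotation rows."""
    problems = []
    sizes = defaultdict(set)  # filename -> set of (width, height) pairs seen
    for n, row in enumerate(rows):
        w, h = int(row["width"]), int(row["height"])
        xmin, ymin, xmax, ymax = (int(row[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
        sizes[row["filename"]].add((w, h))
        if not row["label"]:
            problems.append(f"row {n}: missing label")
        if not (0 <= xmin < xmax <= w and 0 <= ymin < ymax <= h):
            problems.append(f"row {n}: bbox degenerate or outside the image")
    for name, seen in sizes.items():
        if len(seen) > 1:
            problems.append(f"{name}: inconsistent image sizes {sorted(seen)}")
    return problems

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = check_rows(rows)

# Count how many annotation rows (boxes) each image has.
boxes_per_file = defaultdict(int)
for row in rows:
    boxes_per_file[row["filename"]] += 1

print(problems, dict(boxes_per_file))
```

An empty problems list means every row has a label, every box lies inside its image, and each filename always appears with the same width and height.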

Once we know what our data provides we can create our custom Parser.

set(np.array(df['label']).flatten())

{2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22}

#df['label'] = ["Object"]*len(df)#  "_"+df['label'].apply(str)   # force label to be string-like

df['label'] /= 2
#df.head()
df['label'] = df['label'].apply(int) 
print(set(np.array(df['label']).flatten()))
df['label'] = "_"+df['label'].apply(str)+"_"

{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

df.head()

df['label'] = 'AN'  # antinode
df.head()

Create the Parser

The first step is to create a template record for our specific type of dataset. In this case we're doing standard object detection:

template_record = ObjectDetectionRecord()

Now use the generate_template method, which prints out all the steps we have to implement.

Parser.generate_template(template_record)

class MyParser(Parser):
    def __init__(self, template_record):
        super().__init__(template_record=template_record)
    def __iter__(self) -> Any:
    def __len__(self) -> int:
    def record_id(self, o: Any) -> Hashable:
    def parse_fields(self, o: Any, record: BaseRecord, is_new: bool):
        record.set_img_size(<ImgSize>)
        record.set_filepath(<Union[str, Path]>)
        record.detection.add_bboxes(<Sequence[BBox]>)
        record.detection.set_class_map(<ClassMap>)
        record.detection.add_labels(<Sequence[Hashable]>)

We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:

__init__: What happens here is completely up to you. Normally we pass in some reference to our data, data_dir in our case.
__iter__: This tells our parser how to iterate over our data. Each item returned here is passed to parse_fields as o. In our case we call df.itertuples to iterate over all df rows.
__len__: How many items we will be iterating over.
record_id: Should return a Hashable (int, str, etc.). In our case we want all the dataset items that have the same filename to be unified in the same record, so we return the filename.
parse_fields: This is where the attributes of the record are collected. The template suggests which methods to call on the record and what parameters they expect. The parameter o is the item returned by __iter__.
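To make the interplay between these methods concrete, here is a simplified stand-in written in plain Python. It is not IceVision's implementation; the dictionary-based records and the parse_all function are hypothetical and only illustrate how rows that share an ID end up in one record, with is_new set for the first row of each.

```python
# Rows as (filename, label, xmin, ymin, xmax, ymax); box values copied from
# this notebook's df.head() output, with every label set to 'AN'.
ROWS = [
    ("steelpan_0000000.png", "AN", 130, 114, 265, 281),
    ("steelpan_0000000.png", "AN", 272, 37, 377, 178),
    ("steelpan_0000002.png", "AN", 100, 161, 163, 218),
]

def record_id(row):
    # Rows that share a filename should share a record.
    return row[0]

def parse_fields(row, record, is_new):
    if is_new:
        # Per-image attributes only need to be set once.
        record["filepath"] = row[0]
    # Per-annotation attributes are appended for every row.
    record["bboxes"].append(row[2:])
    record["labels"].append(row[1])

def parse_all(rows):
    """Hypothetical driver: group rows into records keyed by record_id."""
    records = {}
    for row in rows:
        rid = record_id(row)
        is_new = rid not in records
        if is_new:
            records[rid] = {"filepath": None, "bboxes": [], "labels": []}
        parse_fields(row, records[rid], is_new)
    return records

records = parse_all(ROWS)
for rid, rec in records.items():
    print(rid, len(rec["bboxes"]), rec["labels"])
```

Three rows produce two records here: the first image collects two boxes and the second image one.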

!!! danger "Important"
Be sure to pass the correct type on all record methods!

class BBoxParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        #self.df['label'] /= 2
        #self.df['label'] = self.df['label'].apply(int) 
        #self.df['label'] = "_"+self.df['label'].apply(str)+"_"
        self.df['label'] = 'AN'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])

Let's parse the data and randomly split it into training and validation records with Parser.parse:

parser = BBoxParser(template_record, data_dir)

train_records, valid_records = parser.parse()

INFO     - Autofixing records | icevision.parsers.parser:parse:136
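The random split performed by parser.parse() above works on whole records, so all boxes of one image land in the same subset. The idea can be sketched in plain Python as follows. The helper split_record_ids, the 0.2 validation fraction, and the seed are assumptions for illustration; the dataset sizes reported later in this notebook (832 training and 209 validation items) are consistent with roughly an 80/20 split.

```python
import random

def split_record_ids(record_ids, valid_fraction=0.2, seed=42):
    """Hypothetical helper: shuffle unique IDs and split them into train/valid lists."""
    ids = sorted(set(record_ids))   # one entry per image, in a stable order
    rng = random.Random(seed)       # local RNG so the split is reproducible
    rng.shuffle(ids)
    n_valid = round(len(ids) * valid_fraction)
    return ids[n_valid:], ids[:n_valid]

# Made-up IDs for illustration only.
ids = [f"steelpan_{i:07d}.png" for i in range(10)]
train_ids, valid_ids = split_record_ids(ids)
print(len(train_ids), len(valid_ids))
```

Fixing the seed keeps the same images in the validation set across runs, which matters when comparing results between training sessions.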

Let's take a look at one record:

show_record(train_records[5], display_label=False, figsize=(14, 10))

train_records[0]

BaseRecord

common: 
	- Record ID: 870
	- Image size ImgSize(width=512, height=384)
	- Filepath: /home/drscotthawley/datasets/espiownage-cyclegan/images/steelpan_0040039.png
	- Img: None
detection: 
	- BBoxes: [<BBox (xmin:263, ymin:193, xmax:396, ymax:346)>, <BBox (xmin:128, ymin:68, xmax:411, ymax:113)>, <BBox (xmin:53, ymin:196, xmax:232, ymax:369)>, <BBox (xmin:9, ymin:6, xmax:112, ymax:117)>, <BBox (xmin:97, ymin:124, xmax:384, ymax:185)>]
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1, 1, 1, 1, 1]

Moving On...

Following the Getting Started "refrigerator" notebook...

# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])

# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)

This next cell generates an error. Ignore it and move on.

samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)

model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)

selection = 1


extra_args = {}

if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x

elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn

elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size

elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size

model_type, backbone, extra_args

(<module 'icevision.models.torchvision.retinanet' from '/home/drscotthawley/envs/icevision/lib/python3.8/site-packages/icevision/models/torchvision/retinanet/__init__.py'>,
 <icevision.models.torchvision.retinanet.backbones.resnet_fpn.RetinanetTorchvisionBackboneConfig at 0x7fdf2dd9ce50>,
 {})

model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)

Downloading: "https://download.pytorch.org/models/resnet50-19c8e357.pth" to /home/drscotthawley/.cache/torch/hub/checkpoints/resnet50-19c8e357.pth

train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)

model_type.show_batch(first(valid_dl), ncols=4)

metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
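The COCO bounding-box metric used above is built on intersection over union (IoU) between predicted and ground-truth boxes, averaged over several IoU thresholds. The following is a minimal sketch of IoU for two boxes in (xmin, ymin, xmax, ymax) form. It is not the code COCOMetric runs internally, and the function name iou is hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))    # identical boxes
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))    # half-overlapping boxes
print(iou((0, 0, 10, 10), (20, 20, 30, 30)))  # disjoint boxes
```

A prediction counts as a match at a given threshold only if its IoU with a ground-truth box reaches that threshold, which is why loosely placed boxes lower the metric even when the object is found.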

learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)

learn.lr_find(end_lr=0.005)

# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)

SuggestedLRs(lr_min=2.7542287716642023e-05, lr_steep=7.585775892948732e-05)

learn.fine_tune(60, 1e-4, freeze_epochs=2)

model_type.show_results(model, valid_ds, detection_threshold=.5)

Inference

preds = model_type.predict(model, valid_ds, keep_images=True)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-a2cbe0697b78> in <module>
----> 1 preds, losses = model_type.predict(model, valid_ds, keep_images=True, with_loss=True)

TypeError: predict() got an unexpected keyword argument 'with_loss'

show_preds(preds=preds[0:10])

len(train_ds), len(valid_ds), len(preds)

(832, 209, 209)

Let's try to figure out how to get what we want from these predictions. Hmmm.

preds[0].pred

BaseRecord

common: 
	- Img: 384x384x3 <np.ndarray> Image
	- Record ID: 539
	- Image size ImgSize(width=384, height=384)
detection: 
	- Class Map: <ClassMap: {'background': 0, 'AN': 1}>
	- Labels: [1]
	- Scores: [0.9868553]
	- BBoxes: [<BBox (xmin:142.2591094970703, ymin:156.2466278076172, xmax:333.04168701171875, ymax:267.3779602050781)>]

preds[0].pred.detection.scores

array([0.9868553], dtype=float32)

preds[0].pred.detection.bboxes

[<BBox (xmin:142.2591094970703, ymin:156.2466278076172, xmax:333.04168701171875, ymax:267.3779602050781)>]

preds[0].pred.detection.bboxes[0].xmin

def get_bblist(pred):
    # collect each predicted box as a plain [xmin, ymin, xmax, ymax] list
    return [[bb.xmin, bb.ymin, bb.xmax, bb.ymax] for bb in pred.pred.detection.bboxes]

get_bblist(preds[0])

[[142.25911, 156.24663, 333.0417, 267.37796]]

results = []
for i, pred in enumerate(preds):
    scores = pred.pred.detection.scores
    # an image with no detections has no scores; store NaN rather than letting np.min raise
    worst_score = np.min(scores) if len(scores) else np.nan
    fname = Path(valid_ds[i].common.filepath).stem + '.csv'
    results.append([fname, get_bblist(pred), scores, worst_score, i])
    
# store as pandas dataframe
res_df = pd.DataFrame(results, columns=['filename', 'bblist','scores','worst_score','i'])
res_df = res_df.sort_values('worst_score')  # order by worst score as a "top losses" kind of thing
res_df.head() # take a look

res_df.to_csv('bboxes_top_losses_cg.csv', index=False)
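One caveat with the CSV written above: the bblist and scores columns hold lists and arrays, which are written to the file as their string representations, so reading the file back yields strings rather than numbers. A possible workaround, sketched here with only the standard library and not what this notebook does, is to JSON-encode such columns before saving and decode them after loading. The row below reuses the first prediction shown earlier; the filename and the helper names encode_row and decode_row are hypothetical.

```python
import csv
import io
import json

row = {
    "filename": "steelpan_example.csv",  # hypothetical filename
    "bblist": [[142.25911, 156.24663, 333.0417, 267.37796]],
    "scores": [0.9868553],
}

def encode_row(row):
    # float(...) also turns NumPy scalars into plain Python floats, which json can serialize.
    return {
        "filename": row["filename"],
        "bblist": json.dumps([[float(v) for v in box] for box in row["bblist"]]),
        "scores": json.dumps([float(s) for s in row["scores"]]),
    }

def decode_row(row):
    return {
        "filename": row["filename"],
        "bblist": json.loads(row["bblist"]),
        "scores": json.loads(row["scores"]),
    }

# Write to an in-memory buffer instead of a file to keep the example self-contained.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["filename", "bblist", "scores"])
writer.writeheader()
writer.writerow(encode_row(row))

buf.seek(0)
restored = [decode_row(r) for r in csv.DictReader(buf)]
print(restored[0]["bblist"], restored[0]["scores"])
```

With pandas, the same json.dumps and json.loads calls could be applied to the affected columns with Series.apply before to_csv and after read_csv.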

Reminder: which computer did we run this on?? lol

!hostname

lecun

	filename	width	height	label	xmin	ymin	xmax	ymax
0	steelpan_0000000.png	512	384	_5_	130	114	265	281
1	steelpan_0000000.png	512	384	_2_	272	37	377	178
2	steelpan_0000000.png	512	384	_5_	415	292	480	353
3	steelpan_0000000.png	512	384	_5_	36	21	109	158
4	steelpan_0000002.png	512	384	_1_	100	161	163	218

epoch	train_loss	valid_loss	COCOMetric	time
0	1.382472	1.138447	0.014189	00:24
1	0.995744	0.985307	0.090372	00:23

epoch	train_loss	valid_loss	COCOMetric	time
0	0.766973	0.737682	0.266701	00:33
1	0.704099	0.664647	0.345390	00:33
2	0.641785	0.597058	0.410615	00:33
3	0.588479	0.552740	0.439253	00:33
4	0.553743	0.534752	0.484354	00:33
5	0.528012	0.492559	0.502902	00:32
6	0.496259	0.467164	0.507011	00:32
7	0.468638	0.433915	0.542065	00:32
8	0.448554	0.413170	0.579916	00:32
9	0.434068	0.412798	0.551295	00:32
10	0.415386	0.388108	0.586605	00:32
11	0.406240	0.382528	0.589393	00:32
12	0.400146	0.385356	0.559640	00:32
13	0.383361	0.367989	0.580080	00:32
14	0.368630	0.329652	0.634962	00:32
15	0.349198	0.333366	0.622955	00:32
16	0.378034	0.332617	0.638347	00:32
17	0.347078	0.316014	0.636328	00:32
18	0.337173	0.296142	0.671272	00:32
19	0.334409	0.324778	0.615452	00:32
20	0.336331	0.300936	0.659581	00:32
21	0.320485	0.295569	0.667316	00:32
22	0.307609	0.316991	0.574209	00:32
23	0.310142	0.277255	0.662101	00:32
24	0.293880	0.299024	0.638539	00:32
25	0.295195	0.267777	0.675304	00:32
26	0.289908	0.282522	0.637798	00:31
27	0.282468	0.287645	0.635229	00:32
28	0.282700	0.281933	0.664142	00:32
29	0.276684	0.266250	0.644042	00:31
30	0.274813	0.267006	0.670515	00:32
31	0.266677	0.245335	0.706987	00:32
32	0.268640	0.258879	0.673365	00:31
33	0.260452	0.255824	0.696242	00:32
34	0.263646	0.250630	0.698721	00:31
35	0.254323	0.248900	0.695740	00:31
36	0.246985	0.247855	0.688055	00:32
37	0.246890	0.263726	0.656197	00:31
38	0.246568	0.243103	0.702485	00:32
39	0.244392	0.267593	0.651396	00:31
40	0.239273	0.249232	0.677462	00:32
41	0.238626	0.245842	0.678853	00:31
42	0.229082	0.265359	0.653194	00:32
43	0.230608	0.260192	0.665803	00:32
44	0.228591	0.250828	0.675536	00:32
45	0.230032	0.255471	0.662977	00:32
46	0.225583	0.246671	0.682974	00:31
47	0.225184	0.243084	0.674873	00:32
48	0.220221	0.241577	0.687320	00:32
49	0.226940	0.254195	0.662080	00:32
50	0.215349	0.246749	0.677838	00:31
51	0.222931	0.231992	0.695828	00:32
52	0.216862	0.242454	0.677646	00:31
53	0.217495	0.236846	0.693383	00:32
54	0.211591	0.243640	0.682728	00:32
55	0.213098	0.237190	0.692352	00:32
56	0.224707	0.244036	0.678120	00:31
57	0.214665	0.245788	0.673550	00:31
58	0.219662	0.242171	0.682184	00:32
59	0.220014	0.242718	0.680125	00:31

	filename	bblist	scores	worst_score	i
12	steelpan_0000029.csv	[[42.684723, 187.99416, 183.18018, 268.6339], [191.2504, 75.9324, 323.02982, 229.93192], [68.3033, 67.521286, 170.44449, 165.69962], [51.211563, 267.44662, 249.46263, 322.21536], [33.59114, 206.9505, 212.26224, 289.34067]]	[0.9714665, 0.9657926, 0.7204041, 0.70633274, 0.5111434]	0.511143	12
189	steelpan_0000904.csv	[[200.8158, 174.86542, 281.0075, 250.05515], [234.00565, 75.93068, 330.38516, 176.80026], [85.0496, 183.49402, 189.56403, 333.149], [64.77986, 52.04474, 238.72475, 127.76965], [14.119356, 103.22791, 48.50573, 148.12149]]	[0.9960259, 0.992378, 0.99137294, 0.912177, 0.511592]	0.511592	189
66	steelpan_0000049.csv	[[15.12051, 118.76639, 70.77783, 183.96085], [248.91342, 105.94249, 336.9633, 247.08269], [293.78885, 264.37354, 336.80228, 308.43597], [142.19037, 184.63148, 180.11258, 227.51576]]	[0.9898268, 0.9850744, 0.95944643, 0.5253186]	0.525319	66
47	steelpan_0000888.csv	[[294.75272, 53.51385, 366.92, 116.89061], [229.65038, 95.167015, 287.82156, 189.70688], [82.16948, 205.09933, 121.94702, 250.89099], [158.75662, 223.72386, 324.54315, 322.34283], [50.855408, 113.68904, 192.70499, 194.04472]]	[0.9938643, 0.99187475, 0.9840629, 0.9813324, 0.5274254]	0.527425	47
151	steelpan_0000784.csv	[[17.847351, 52.1698, 67.796936, 101.52548], [227.98306, 78.69093, 316.62363, 199.97267], [6.6651897, 184.91397, 68.176956, 242.81845], [166.20135, 257.3072, 276.95096, 315.14948], [267.2783, 269.17236, 326.0624, 326.73273], [114.752144, 116.17899, 157.22328, 320.7459]]	[0.98965514, 0.9828661, 0.9827445, 0.83460915, 0.7973198, 0.5497492]	0.549749	151

IceVision Bboxes - CycleGAN Data