This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebooks, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- drscotthawley, July 1, 2021
Installing IceVision and IceData
If on Colab, run the following cell; otherwise check the installation instructions.
#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")
 
import torch, re 
tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*','',tv)               # strip any '+cuXXX' suffix from the version string
TORCH_VERSION = 'torch'+tv[0:-1]+'0'       # e.g. '1.9.1' -> 'torch1.9.0' (mmcv wheels are built per minor version)
CUDA_VERSION = 'cu'+cv.replace('.','')     # e.g. '11.1' -> 'cu111'
print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
print(f"CUDA available = {torch.cuda.is_available()}, Device count = {torch.cuda.device_count()}, Current device = {torch.cuda.current_device()}")
print(f"Device name = {torch.cuda.get_device_name()}")
print("hostname:")
!hostname
#!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
#!pip install mmdet -qq
from icevision.all import *
import pandas as pd
The original notebooks used a small sample of the chess dataset (the full dataset is offered by Roboflow); here we point at our own data instead:
#data_dir = icedata.load_data(data_url, 'chess_sample') / 'chess_sample-master'
# SPNET Real Dataset link (currently proprietary, thus link may not work)
#data_url = "https://anonymized.machine.com/~drscotthawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 
# public espiownage cyclegan dataset:
#data_url = 'https://anonymized.machine.com/~drscotthawley/espiownage-cyclegan.tgz'
#data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'
# local data already there:
from pathlib import Path
data_dir = Path('/home/drscotthawley/datasets/espiownage-spnet')  # real data is local and private
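Since this path is local, a quick sanity check on the assumed layout (an images/ folder and bboxes/annotations.csv, as used below) can catch path mistakes early:
# sanity-check the assumed dataset layout
assert data_dir.exists(), f"{data_dir} does not exist"
assert (data_dir / 'bboxes' / 'annotations.csv').exists(), "annotations.csv not found"
print(len(list((data_dir / 'images').glob('*'))), "files in images/")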
In this task we were given a .csv file with annotations; let's take a look at it.
!!! danger "Important"
    Replace `data_dir` with your own path for the dataset directory.
df = pd.read_csv(data_dir / "bboxes/annotations.csv")
df.head()
At first glance, we can make the following assumptions:
- Multiple rows share the same filename, width, and height
- Each row has a label
- Each row has a bbox [xmin, ymin, xmax, ymax]
Once we know what our data provides we can create our custom Parser.
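For reference, here's a minimal illustration of the schema the parser below assumes (column names taken from how parse_fields uses them; the values are made up):
# illustrative only: the CSV schema the custom parser expects
example_df = pd.DataFrame([
    dict(filename='frame_0001.png', width=512, height=384, label=3.0,
         xmin=10, ymin=20, xmax=110, ymax=120),
    dict(filename='frame_0001.png', width=512, height=384, label=5.0,
         xmin=200, ymin=50, xmax=300, ymax=150),  # same file, second box
])
example_df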
set(np.array(df['label']).flatten())   # inspect the unique label values
#df['label'] = ["Object"]*len(df)#  "_"+df['label'].apply(str)   # force label to be string-like
df['label'] /= 2                       # halve the numeric labels (an experiment, superseded below)
#df.head()
df['label'] = df['label'].apply(int)   # convert to int
print(set(np.array(df['label']).flatten()))
df['label'] = "_"+df['label'].apply(str)+"_"   # make labels string-like
df.head()
df['label'] = 'AN'  # collapse everything to a single 'antinode' class
df.head()
The first step is to create a template record for our specific type of dataset, in this case we're doing standard object detection:
template_record = ObjectDetectionRecord()
Now use the method `generate_template`, which will print out all the necessary steps we have to implement.
Parser.generate_template(template_record)
We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:
- `__init__`: What happens here is completely up to you; normally we have to pass some reference to our data, `data_dir` in our case.
- `__iter__`: This tells our parser how to iterate over our data; each item returned here will be passed to `parse_fields` as `o`. In our case we call `df.itertuples` to iterate over all `df` rows.
- `__len__`: How many items we will be iterating over.
- `record_id`: Should return a `Hashable` (`int`, `str`, etc.). In our case we want all the dataset items that have the same `filename` to be unified in the same record.
- `parse_fields`: Here is where the attributes of the record are collected; the template will suggest what methods we need to call on the record and what parameters it expects. The parameter `o` it receives is the item returned by `__iter__`.
!!! danger "Important"
    Be sure to pass the correct type on all record methods!
class BBoxParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        #self.df['label'] /= 2
        #self.df['label'] = self.df['label'].apply(int) 
        #self.df['label'] = "_"+self.df['label'].apply(str)+"_"
        self.df['label'] = 'AN'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])
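Before running the full parse, a quick smoke test of the iterator confirms the column names line up (just a convenience check, not part of the original workflow):
# smoke test: pull one raw row from the parser's iterator
p = BBoxParser(template_record, data_dir)
o = next(iter(p))
print(o.filename, o.label, o.xmin, o.ymin, o.xmax, o.ymax)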
Let's randomly split the data and parse it with `Parser.parse`:
parser = BBoxParser(template_record, data_dir)
train_records, valid_records = parser.parse()
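By default `parse` uses a random split; if you want it reproducible, you can pass a splitter explicitly (assuming I recall the IceVision API correctly):
# optional: explicit, seeded 80/20 split
# train_records, valid_records = parser.parse(data_splitter=RandomSplitter([0.8, 0.2], seed=42))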
Let's take a look at one record:
show_record(train_records[5], display_label=False, figsize=(14, 10))
train_records[0]
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
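A one-line check makes that divisibility constraint explicit:
assert image_size % 128 == 0, "EfficientDet input sizes must be divisible by 128"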
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])
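If you're curious what `aug_tfms` bundles, you can list the underlying albumentations transforms (names vary by IceVision version):
# peek at the augmentation pipeline; these are albumentations transforms
print([type(t).__name__ for t in tfms.A.aug_tfms(size=image_size, presize=512)])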
# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
samples = [train_ds[0] for _ in range(3)]  # same record, three different random augmentations
show_samples(samples, ncols=3)
# default model choice (note: the selection block below immediately overrides these)
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
selection = 0
extra_args = {}
if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x
elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn
elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size
elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size
model_type, backbone, extra_args
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
model_type.show_batch(first(valid_dl), ncols=4)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
learn.lr_find(end_lr=0.005)  # lower end_lr than the default; a lower end_lr is likewise recommended for Sparse-RCNN
learn.fine_tune(60, 7e-5, freeze_epochs=2)
model_type.show_results(model, valid_ds, detection_threshold=.5)
learn.save('iv_bbox_spnet')
learn.load('iv_bbox_spnet')  # reload the checkpoint saved above
preds = model_type.predict(model, valid_ds, keep_images=True)
show_preds(preds=preds[0:10])
len(train_ds), len(valid_ds), len(preds)
Let's figure out how to extract what we want from these predictions.
preds[1].pred.detection.scores
preds[1].pred.detection.bboxes
preds[1].pred.detection.bboxes[0].xmin
def get_bblist(pred):
    """Return a prediction's bounding boxes as a list of [xmin, ymin, xmax, ymax] lists."""
    my_bblist = []
    bblist = pred.pred.detection.bboxes
    for i in range(len(bblist)):
        my_bblist.append([bblist[i].xmin, bblist[i].ymin, bblist[i].xmax, bblist[i].ymax])
    return my_bblist
get_bblist(preds[1])      
results = []
for i in range(len(preds)):
    if (len(preds[i].pred.detection.scores) == 0): continue   # skip records with zero predicted boxes (happens occasionally)
    #print(f"i = {i}, file = {str(Path(valid_ds[i].common.filepath).stem)+'.csv'}, bboxes = {get_bblist(preds[i])}, scores={preds[i].pred.detection.scores}\n")
    worst_score = np.min(np.array(preds[i].pred.detection.scores))
    line_list = [str(Path(valid_ds[i].common.filepath).stem)+'.csv', get_bblist(preds[i]), preds[i].pred.detection.scores, worst_score, i]
    results.append(line_list)
    
# store as pandas dataframe
res_df = pd.DataFrame(results, columns=['filename', 'bblist','scores','worst_score','i'])
res_df = res_df.sort_values('worst_score')  # order by worst score as a "top losses" kind of thing
res_df.head() # take a look
res_df.to_csv('bboxes_top_losses_spnet.csv', index=False)
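Since we kept the prediction index i, we can also eyeball the lowest-scoring predictions directly, using the objects already in memory:
# show the predictions with the lowest 'worst_score' (most suspect detections)
worst_idxs = res_df['i'].values[:4]
show_preds(preds=[preds[j] for j in worst_idxs])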