This is a mashup of IceVision's "Custom Parser" example and their "Getting Started (Object Detection)" notebooks, used to analyze the SPNet Real dataset, for which I generated bounding boxes. -- drscotthawley, July 1, 2021
Installing IceVision and IceData
If on Colab, run the following cell; otherwise check the installation instructions.
#try:
#    !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
#    !chmod +x install_colab.sh && ./install_colab.sh
#except:
#    print("Ignore the error messages and just keep going")
 
import torch, re 
tv, cv = torch.__version__, torch.version.cuda
tv = re.sub(r'\+cu.*','',tv)               # strip any '+cuXXX' suffix from the version string
TORCH_VERSION = 'torch'+tv[0:-1]+'0'       # e.g. '1.9.1' -> 'torch1.9.0' (mmcv wheels are built per minor version)
CUDA_VERSION = 'cu'+cv.replace('.','')     # e.g. '11.1' -> 'cu111'
print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
print(f"CUDA available = {torch.cuda.is_available()}, Device count = {torch.cuda.device_count()}, Current device = {torch.cuda.current_device()}")
print(f"Device name = {torch.cuda.get_device_name()}")
print("hostname:")
!hostname
#!pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
#!pip install mmdet -qq
from icevision.all import *
import pandas as pd
The original notebooks used a small sample of the chess dataset (the full dataset is offered by Roboflow); here we point at our own data instead:
#data_dir = icedata.load_data(data_url, 'chess_sample') / 'chess_sample-master'
# SPNET Real Dataset link (currently proprietary, thus link may not work)
#data_url = "https://anonymized.machine.com/~drscotthawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master' 
# public espiownage cyclegan dataset:
#data_url = 'https://anonymized.machine.com/~drscotthawley/espiownage-cyclegan.tgz'
#data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'
# local data already there:
from pathlib import Path
data_dir = Path('/home/drscotthawley/datasets/espiownage-spnet')  # real data is local and private
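Since this path is local, a quick sanity check on the assumed layout (an images/ folder and bboxes/annotations.csv, as used below) can catch path mistakes early:
# sanity-check the assumed dataset layout
assert data_dir.exists(), f"{data_dir} does not exist"
assert (data_dir / 'bboxes' / 'annotations.csv').exists(), "annotations.csv not found"
print(len(list((data_dir / 'images').glob('*'))), "files in images/")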
In this task we were given a .csv file with annotations; let's take a look at it.
!!! danger "Important"
    Replace `data_dir` with your own path for the dataset directory.
df = pd.read_csv(data_dir / "bboxes/annotations.csv")
df.head()
At first glance, we can make the following assumptions:
- Multiple rows share the same filename, width, and height
- Each row has a label
- Each row has a bbox [xmin, ymin, xmax, ymax]
Once we know what our data provides we can create our custom Parser.
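For reference, here's a minimal illustration of the schema the parser below assumes (column names taken from how parse_fields uses them; the values are made up):
# illustrative only: the CSV schema the custom parser expects
example_df = pd.DataFrame([
    dict(filename='frame_0001.png', width=512, height=384, label=3.0,
         xmin=10, ymin=20, xmax=110, ymax=120),
    dict(filename='frame_0001.png', width=512, height=384, label=5.0,
         xmin=200, ymin=50, xmax=300, ymax=150),  # same file, second box
])
example_df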
set(np.array(df['label']).flatten())   # inspect the unique label values
#df['label'] = ["Object"]*len(df)#  "_"+df['label'].apply(str)   # force label to be string-like
df['label'] /= 2                       # halve the numeric labels (an experiment, superseded below)
#df.head()
df['label'] = df['label'].apply(int)   # convert to int
print(set(np.array(df['label']).flatten()))
df['label'] = "_"+df['label'].apply(str)+"_"   # make labels string-like
df.head()
df['label'] = 'AN'  # collapse everything to a single 'antinode' class
df.head()
The first step is to create a template record for our specific type of dataset, in this case we're doing standard object detection:
template_record = ObjectDetectionRecord()
Now use the method `generate_template`, which will print out all the necessary steps we have to implement.
Parser.generate_template(template_record)
We can copy the template and use it as our starting point. Let's go over each of the methods we have to define:
- `__init__`: What happens here is completely up to you; normally we have to pass some reference to our data, `data_dir` in our case.
- `__iter__`: This tells our parser how to iterate over our data; each item returned here will be passed to `parse_fields` as `o`. In our case we call `df.itertuples` to iterate over all `df` rows.
- `__len__`: How many items we will be iterating over.
- `record_id`: Should return a `Hashable` (`int`, `str`, etc.). In our case we want all the dataset items that have the same `filename` to be unified in the same record.
- `parse_fields`: Here is where the attributes of the record are collected; the template will suggest what methods we need to call on the record and what parameters it expects. The parameter `o` it receives is the item returned by `__iter__`.
!!! danger "Important"
    Be sure to pass the correct type on all record methods!
class BBoxParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations.csv")
        #self.df['label'] /= 2
        #self.df['label'] = self.df['label'].apply(int) 
        #self.df['label'] = "_"+self.df['label'].apply(str)+"_"
        self.df['label'] = 'AN'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))
        
    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o
        
    def __len__(self) -> int:
        return len(self.df)
        
    def record_id(self, o) -> Hashable:
        return o.filename
        
    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)
        
        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])
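Before running the full parse, a quick smoke test of the iterator confirms the column names line up (just a convenience check, not part of the original workflow):
# smoke test: pull one raw row from the parser's iterator
p = BBoxParser(template_record, data_dir)
o = next(iter(p))
print(o.filename, o.label, o.xmin, o.ymin, o.xmax, o.ymax)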
Let's randomly split the data and parse it with `Parser.parse`:
parser = BBoxParser(template_record, data_dir)
train_records, valid_records = parser.parse()
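By default `parse` uses a random split; if you want it reproducible, you can pass a splitter explicitly (assuming I recall the IceVision API correctly):
# optional: explicit, seeded 80/20 split
# train_records, valid_records = parser.parse(data_splitter=RandomSplitter([0.8, 0.2], seed=42))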
Let's take a look at one record:
show_record(train_records[5], display_label=False, figsize=(14, 10))
train_records[0]
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384  
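A one-line check makes that divisibility constraint explicit:
assert image_size % 128 == 0, "EfficientDet input sizes must be divisible by 128"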
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])
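If you're curious what `aug_tfms` bundles, you can list the underlying albumentations transforms (names vary by IceVision version):
# peek at the augmentation pipeline; these are albumentations transforms
print([type(t).__name__ for t in tfms.A.aug_tfms(size=image_size, presize=512)])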
# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
samples = [train_ds[0] for _ in range(3)]  # same record, three different random augmentations
show_samples(samples, ncols=3)
# default model choice (note: the selection block below immediately overrides these)
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
selection = 0
extra_args = {}
if selection == 0:
  model_type = models.mmdet.retinanet
  backbone = model_type.backbones.resnet50_fpn_1x
elif selection == 1:
  # The Retinanet model is also implemented in the torchvision library
  model_type = models.torchvision.retinanet
  backbone = model_type.backbones.resnet50_fpn
elif selection == 2:
  model_type = models.ross.efficientdet
  backbone = model_type.backbones.tf_lite0
  # The efficientdet model requires an img_size parameter
  extra_args['img_size'] = image_size
elif selection == 3:
  model_type = models.ultralytics.yolov5
  backbone = model_type.backbones.small
  # The yolov5 model requires an img_size parameter
  extra_args['img_size'] = image_size
model_type, backbone, extra_args
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args) 
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
model_type.show_batch(first(valid_dl), ncols=4)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
learn.lr_find(end_lr=0.005)  # lower end_lr than the default; a lower end_lr is likewise recommended for Sparse-RCNN
learn.fine_tune(60, 7e-5, freeze_epochs=2)
model_type.show_results(model, valid_ds, detection_threshold=.5)
learn.save('iv_bbox_spnet')
learn.load('iv_bbox_spnet')  # reload the checkpoint saved above
preds = model_type.predict(model, valid_ds, keep_images=True)
show_preds(preds=preds[0:10])
len(train_ds), len(valid_ds), len(preds)
Let's figure out how to extract what we want from these predictions.
preds[1].pred.detection.scores
preds[1].pred.detection.bboxes
preds[1].pred.detection.bboxes[0].xmin
def get_bblist(pred):
    """Return a prediction's bounding boxes as a list of [xmin, ymin, xmax, ymax] lists."""
    my_bblist = []
    bblist = pred.pred.detection.bboxes
    for i in range(len(bblist)):
        my_bblist.append([bblist[i].xmin, bblist[i].ymin, bblist[i].xmax, bblist[i].ymax])
    return my_bblist
get_bblist(preds[1])      
results = []
for i in range(len(preds)):
    if (len(preds[i].pred.detection.scores) == 0): continue   # skip records with zero predicted boxes (happens occasionally)
    #print(f"i = {i}, file = {str(Path(valid_ds[i].common.filepath).stem)+'.csv'}, bboxes = {get_bblist(preds[i])}, scores={preds[i].pred.detection.scores}\n")
    worst_score = np.min(np.array(preds[i].pred.detection.scores))
    line_list = [str(Path(valid_ds[i].common.filepath).stem)+'.csv', get_bblist(preds[i]), preds[i].pred.detection.scores, worst_score, i]
    results.append(line_list)
    
# store as pandas dataframe
res_df = pd.DataFrame(results, columns=['filename', 'bblist','scores','worst_score','i'])
res_df = res_df.sort_values('worst_score')  # order by worst score as a "top losses" kind of thing
res_df.head() # take a look
res_df.to_csv('bboxes_top_losses_spnet.csv', index=False)
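Since we kept the prediction index i, we can also eyeball the lowest-scoring predictions directly, using the objects already in memory:
# show the predictions with the lowest 'worst_score' (most suspect detections)
worst_idxs = res_df['i'].values[:4]
show_preds(preds=[preds[j] for j in worst_idxs])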