OBPR = "One Box Per Ring"
One idea for counting rings was to treat each ring as an object. It turns out that a lot of rings get missed if you do this. But it DOES usually detect the outer ring, which is why I decided to just detect outer boxes with IceVision, use these to crop the image, and then try counting the rings separately using the cropped sub-images.
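To make that two-stage plan concrete, here's a minimal sketch of the cropping step, assuming we already have predicted outer-ring boxes as (xmin, ymin, xmax, ymax) tuples. The crop_to_box helper, the filename, and the box coordinates are all made up for illustration; the real boxes would come from the detector trained below.
from PIL import Image

def crop_to_box(img, box, pad=0):
    """Crop a PIL image to an (xmin, ymin, xmax, ymax) box, with optional padding."""
    xmin, ymin, xmax, ymax = box
    return img.crop((max(xmin - pad, 0), max(ymin - pad, 0),
                     min(xmax + pad, img.width), min(ymax + pad, img.height)))

img = Image.open('steelpan_frame.png')              # hypothetical filename
boxes = [(10, 20, 110, 120), (200, 40, 300, 140)]   # e.g. predicted outer-ring boxes
sub_images = [crop_to_box(img, b, pad=4) for b in boxes]  # one sub-image per detection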
Installing IceVision and IceData
If on Colab, run the following cell; otherwise check the installation instructions.
import os

if os.path.exists('/content'):  # we're on Colab
    try:
        !wget https://raw.githubusercontent.com/airctic/icevision/master/install_colab.sh
        !chmod +x install_colab.sh && ./install_colab.sh
    except:
        print("Ignore the error messages and just keep going")
if not os.path.exists('/content'):  # not on Colab: install mmcv/mmdet manually
    # For IceVision install of MMDetection. cf. https://airctic.com/0.8.1/install/
    import torch, re
    tv, cv = torch.__version__, torch.version.cuda
    tv = re.sub(r'\+cu.*', '', tv)
    TORCH_VERSION = 'torch' + tv[0:-1] + '0'
    CUDA_VERSION = 'cu' + cv.replace('.', '')
    print(f"TORCH_VERSION={TORCH_VERSION}; CUDA_VERSION={CUDA_VERSION}")
    !pip install -qq mmcv-full=="1.3.8" -f https://download.openmmlab.com/mmcv/dist/{CUDA_VERSION}/{TORCH_VERSION}/index.html --upgrade
    !pip install mmdet -qq
from icevision.all import *
import pandas as pd
This notebook is adapted from IceVision's custom-parser tutorial, which uses a small sample of a chess dataset (the full dataset is offered by Roboflow). Here we instead load the espiownage "cyclegan" dataset; the alternative data URLs are left commented out below.
#data_dir = icedata.load_data(data_url, 'chess_sample') / 'chess_sample-master'
# SPNET Real Dataset link (currently proprietary, thus link may not work)
#data_url = "https://anonymized.machine.com/~drscotthawley/spnet_sample-master.zip"
#data_dir = icedata.load_data(data_url, 'spnet_sample') / 'spnet_sample-master'
# espiownage cyclegan dataset:
data_url = 'https://anonymized.machine.com/~drscotthawley/espiownage-cyclegan.tgz'
data_dir = icedata.load_data(data_url, 'espiownage-cyclegan') / 'espiownage-cyclegan'
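As a quick check we can list what was extracted (just a sketch using plain pathlib; the only paths we actually rely on later are bboxes/ and images/):
sorted(p.name for p in data_dir.iterdir())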
In this task we were given a .csv file with annotations; let's take a look at it.
df = pd.read_csv(data_dir / "bboxes/annotations_obpr.csv")
df.head()
At first glance, we can make the following assumptions:
- Multiple rows with the same filename, width, height
- A label for each row
- A bbox [xmin, ymin, xmax, ymax] for each row
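Since OBPR means one box per ring, each image's ring count should just be its number of rows. As a quick sanity check (a sketch; the column names are taken from the table above), we can group by filename:
ring_counts = df.groupby('filename').size()   # boxes per image = rings per image (OBPR)
ring_counts.describe()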
Once we know what our data provides, we can create our custom Parser.
set(np.array(df['label']).flatten())
"Ring" is going to take up too much space when we plot images. Let's change it to "R":
df['label'] = "R"
df.head()
The first step is to create a template record for our specific type of dataset, in this case we're doing standard object detection:
template_record = ObjectDetectionRecord()
Now use the method generate_template, which will print out all the necessary steps we have to implement.
Parser.generate_template(template_record)
# TODO: rename this class (it keeps the name from the chess tutorial) -- but currently not a priority!
class ChessParser(Parser):
    def __init__(self, template_record, data_dir):
        super().__init__(template_record=template_record)
        self.data_dir = data_dir
        self.df = pd.read_csv(data_dir / "bboxes/annotations_obpr.csv")
        self.df['label'] = 'R'  # make them all the same object
        self.class_map = ClassMap(list(self.df['label'].unique()))

    def __iter__(self) -> Any:
        for o in self.df.itertuples():
            yield o

    def __len__(self) -> int:
        return len(self.df)

    def record_id(self, o) -> Hashable:
        return o.filename

    def parse_fields(self, o, record, is_new):
        if is_new:
            record.set_filepath(self.data_dir / 'images' / o.filename)
            record.set_img_size(ImgSize(width=o.width, height=o.height))
            record.detection.set_class_map(self.class_map)

        record.detection.add_bboxes([BBox.from_xyxy(o.xmin, o.ymin, o.xmax, o.ymax)])
        record.detection.add_labels([o.label])
Let's randomly split the data and parse it with Parser.parse:
parser = ChessParser(template_record, data_dir)
train_records, valid_records = parser.parse()
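As a quick check (a sketch; the exact numbers depend on the random split), we can see how many records landed in each split and what the class map contains:
print(len(train_records), len(valid_records))
parser.class_map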
Let's take a look at one record:
show_record(train_records[5], display_label=False, figsize=(14, 10))
train_records[0]
# size is set to 384 because EfficientDet requires its inputs to be divisible by 128
image_size = 384
train_tfms = tfms.A.Adapter([*tfms.A.aug_tfms(size=image_size, presize=512), tfms.A.Normalize()])
valid_tfms = tfms.A.Adapter([*tfms.A.resize_and_pad(image_size), tfms.A.Normalize()])
# Datasets
train_ds = Dataset(train_records, train_tfms)
valid_ds = Dataset(valid_records, valid_tfms)
samples = [train_ds[0] for _ in range(3)]
show_samples(samples, ncols=3)
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x(pretrained=True)
selection = 0
extra_args = {}
if selection == 0:
model_type = models.mmdet.retinanet
backbone = model_type.backbones.resnet50_fpn_1x
elif selection == 1:
# The Retinanet model is also implemented in the torchvision library
model_type = models.torchvision.retinanet
backbone = model_type.backbones.resnet50_fpn
elif selection == 2:
model_type = models.ross.efficientdet
backbone = model_type.backbones.tf_lite0
# The efficientdet model requires an img_size parameter
extra_args['img_size'] = image_size
elif selection == 3:
model_type = models.ultralytics.yolov5
backbone = model_type.backbones.small
# The yolov5 model requires an img_size parameter
extra_args['img_size'] = image_size
model_type, backbone, extra_args
model = model_type.model(backbone=backbone(pretrained=True), num_classes=len(parser.class_map), **extra_args)
train_dl = model_type.train_dl(train_ds, batch_size=8, num_workers=4, shuffle=True)
valid_dl = model_type.valid_dl(valid_ds, batch_size=8, num_workers=4, shuffle=False)
model_type.show_batch(first(valid_dl), ncols=4)
metrics = [COCOMetric(metric_type=COCOMetricType.bbox)]
learn = model_type.fastai.learner(dls=[train_dl, valid_dl], model=model, metrics=metrics)
learn.lr_find()
# For Sparse-RCNN, use lower `end_lr`
# learn.lr_find(end_lr=0.005)
learn.fine_tune(60, 1e-4, freeze_epochs=2)
model_type.show_results(model, valid_ds, detection_threshold=.5)
A future step would be to vary the detection_threshold (set to 0.5 above) and plot an ROC curve.
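As a first step toward that kind of evaluation, here's a sketch that runs inference over the validation set and compares predicted box counts to ground-truth counts per image. It assumes IceVision's infer_dl / predict_from_dl API as used in its tutorials; the exact attributes on the returned Prediction objects may differ between versions.
infer_dl = model_type.infer_dl(valid_ds, batch_size=8)
preds = model_type.predict_from_dl(model, infer_dl, keep_images=False)

# Compare predicted vs. ground-truth box counts for a few validation images
for p in preds[:5]:
    n_pred = len(p.pred.detection.bboxes)           # boxes the model predicted
    n_true = len(p.ground_truth.detection.bboxes)   # boxes in the annotations
    print(f"pred: {n_pred:3d}   true: {n_true:3d}")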