These dataclasses bundle patch embeddings with their spatial and mask metadata, replacing scattered positional return values and manual mask indexing throughout the codebase.
PatchState
Holds a set of patch embeddings at a single spatial scale, along with their grid positions and masks. Provides convenience properties for common operations like filtering visible patches.
HierarchicalPatchState
A list of PatchState objects ordered coarsest → finest (index 0 = global/CLS level). It is currently used with two levels (CLS + patches) and is designed to extend to Swin-style multi-scale hierarchies later.
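The ordering convention can be illustrated with a minimal sketch. The `coarsest` and `finest` accessors appear in the sample usage in this document; the field name `levels` and the `__len__` helper are assumptions made for illustration, not the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class HierarchicalPatchState:
    """Hypothetical sketch: levels are ordered coarsest -> finest."""

    # Assumed field name; each entry would be a PatchState in the real code.
    levels: List[Any] = field(default_factory=list)

    @property
    def coarsest(self) -> Any:
        # Index 0 holds the global/CLS level.
        return self.levels[0]

    @property
    def finest(self) -> Any:
        # The last index holds the highest-resolution patches.
        return self.levels[-1]

    def __len__(self) -> int:
        return len(self.levels)


# Strings stand in for PatchState objects to keep the example self-contained.
h = HierarchicalPatchState(levels=["cls_state", "patch_state"])
print(h.coarsest, h.finest, len(h))
```

Reading the finest level from the end of the list means the accessor would keep working if intermediate scales were added between the CLS level and the patch level.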
EncoderOutput
Full encoder output bundling the hierarchical patch states with the full (pre-MAE-masking) positions and masks needed by the decoder for reconstruction.
Bundle of patch embeddings at a single spatial scale with their metadata.
PatchState attributes:
    emb: (B, N, dim) patch embeddings
    pos: (N, 2) grid coordinates (row, col) for each patch
    non_empty: (B, N) content mask; 1 where the patch has content (e.g. notes), 0 for empty
    mae_mask: (N,) MAE visibility mask; 1 = visible, 0 = masked out for reconstruction
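A dataclass with these four fields might look roughly like the sketch below. It uses NumPy arrays as a stand-in for whatever tensor library the codebase actually uses, and the property bodies are assumptions inferred from the documented shapes rather than the real implementation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class PatchState:
    """Illustrative stand-in; NumPy replaces the real tensor type."""

    emb: np.ndarray        # (B, N, dim) patch embeddings
    pos: np.ndarray        # (N, 2) grid coordinates (row, col)
    non_empty: np.ndarray  # (B, N) content mask, 1 where the patch has content
    mae_mask: np.ndarray   # (N,) MAE visibility mask, 1 = visible

    @property
    def dim(self) -> int:
        return self.emb.shape[-1]

    @property
    def num_patches(self) -> int:
        return self.emb.shape[1]

    @property
    def visible(self) -> "PatchState":
        """Return a new PatchState holding only the MAE-visible patches."""
        keep = self.mae_mask.astype(bool)
        return PatchState(
            emb=self.emb[:, keep],
            pos=self.pos[keep],
            non_empty=self.non_empty[:, keep],
            mae_mask=self.mae_mask[keep],
        )


# Example: 2 samples, 4 patches on a 2x2 grid, 8-dim embeddings.
ps = PatchState(
    emb=np.zeros((2, 4, 8)),
    pos=np.array([[0, 0], [0, 1], [1, 0], [1, 1]]),
    non_empty=np.ones((2, 4), dtype=bool),
    mae_mask=np.array([1, 0, 1, 0], dtype=bool),
)
vis = ps.visible
print(vis.emb.shape, vis.pos.tolist())
```

Because the same boolean index is applied to every field, the embeddings, positions, and masks stay aligned along the patch axis after filtering.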
EncoderOutput attributes:
    patches: Encoded representations (visible patches only)
    full_pos: (N_full, 2) all grid positions before MAE masking (needed by decoder)
    full_non_empty: (B, N_full) all content masks before MAE masking
    mae_mask: (N_full,) the MAE mask applied (1 = visible, 0 = masked)
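To show why the decoder needs the full-length mask, here is a hedged sketch of the usual MAE reconstruction step: visible embeddings are placed back at their original grid slots and the remaining slots are filled with a mask token. The function name `scatter_visible`, the `mask_token` argument, and the assumption that visible patches keep their original relative order are all illustrative and not taken from the codebase. NumPy stands in for the real tensor library.

```python
import numpy as np


def scatter_visible(vis_emb, mae_mask, mask_token):
    """Place visible embeddings on the full grid; fill the rest with mask_token.

    vis_emb:    (B, N_visible, dim) encoder output for visible patches
    mae_mask:   (N_full,) 1 = visible, 0 = masked
    mask_token: (dim,) placeholder embedding (learned in a real model)
    """
    keep = mae_mask.astype(bool)
    batch, n_vis, dim = vis_emb.shape
    assert n_vis == int(keep.sum()), "visible count must match the mask"
    # Start with the mask token everywhere, then overwrite the visible slots.
    full = np.broadcast_to(mask_token, (batch, keep.shape[0], dim)).copy()
    full[:, keep] = vis_emb
    return full


mae_mask = np.array([1, 0, 1, 0])
vis_emb = np.ones((2, 2, 3))
full = scatter_visible(vis_emb, mae_mask, mask_token=np.zeros(3))
print(full.shape)
print(full[0, :, 0].tolist())
```

The resulting array has one row per entry in full_pos, so positional information for every grid cell can be added before decoding.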
Sample usage
The encoder returns an EncoderOutput containing a HierarchicalPatchState:
    enc_out = encoder(img, mask_ratio=0.5)

    # Access the patch hierarchy
    cls_state = enc_out.patches.coarsest    # PatchState with CLS token
    patch_state = enc_out.patches.finest    # PatchState with patch embeddings
Working with PatchState — filtering, shapes, masks:
    ps = enc_out.patches.finest
    ps.emb            # (B, N_visible, dim): patch embeddings
    ps.pos            # (N_visible, 2): grid coordinates (row, col)
    ps.non_empty      # (B, N_visible): content mask (1 = has notes)
    ps.mae_mask       # (N_visible,): all True for already-filtered patches
    ps.dim            # embedding dimension
    ps.num_patches    # number of patches
    vis = ps.visible  # new PatchState with only MAE-visible patches