Here, we provide strong features for temporal action localization on HACS and Epic-Kitchens-100.
dataset | model | resolution | features | classification | average mAP |
---|---|---|---|---|---|
EK100 | TAda2D | 8 x 8 | features:code dc05 | classification:code 2j51 | 13.18 (A) |
HACS | TAda2D | 8 x 8 | features:code 23kv | - | 32.3 |
EK100 | ViViT Fact. Enc.-B16x2 | 32 x 2 | coming soon | coming soon | 18.30 (A) |
Annotations used for temporal action localization with our codebase can be found here:code r30w.
Pre-trained localization models using these features can be found in the MODEL_ZOO.md.
After downloading the compressed feature files, first extract the .pkl
files as follows. For example, for TAda2D HACS features:
cat features_s16_fps30_val_2G.tar.gz?? | tar zx
cat features_s16_fps30_train_2G.tar.gz?? | tar zx
By the above commands, you should have two folders named features_s16_fps30_train
and features_s16_fps30_val
, containing some .pkl
files. Each .pkl
file corresponds to one video.
To load the features, please use the load_feature
function in datasets/base/epickitchen100_feature.py
:
def load_feature(path):
if type(path) is str:
with open(path, 'rb') as f:
data = torch.load(f)
else:
data = torch.load(path)
return data
For Epic-Kitchen-100, we divide each video to multiple clips, and the length of each clip is 5 secs. To perform action localization, features are first concatenated using _transform_feature_scale
function in datasets/base/epickitchen100_feature.py
. For example, during training, if the action segment is [8.5, 16.1]
, it will require three clip features: [[5.0,10.0], [10.0, 15.0], [15.0, 20.0]]
. From the features from these clips, we obtain features for the ground truth action segment. For more details, please refer to epickitchen100_feature.py.