FACT: Frame-Action Cross-Attention Temporal Modeling for Efficient Supervised Action Segmentation

In this work, we propose an efficient Frame-Action Cross-attention Temporal modeling (FACT) framework that (i) performs temporal modeling on the frame and action levels in parallel and (ii) leverages this parallelism to iteratively transfer information in both directions between action and frame features and refine them.

We achieve state-of-the-art results on four datasets while enjoying a lower computational cost.

Preparation

1. Install the Requirements

pip3 install -r requirements.txt

2. Prepare Codes

mkdir FACT_actseg
cd FACT_actseg
git clone https://github.com/ZijiaLewisLu/CVPR2024-FACT.git
mv CVPR2024-FACT src
mkdir data

3. Prepare Data

Download the Breakfast and GTEA data from link1 or link2, and place them in FACT_actseg/data.
Download the EgoProceL and Epic-Kitchens data from here, and place them in FACT_actseg/data.
Features for Epic-Kitchens can be downloaded via this script and extracted with utils/extract_epic_kitchen.py.
After this, FACT_actseg/data should contain four folders, one for each dataset.
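
As a quick sanity check on the layout described above, the following minimal sketch counts the sub-folders under the data directory. It is not part of the repository: the helper names `list_dataset_dirs` and `check_data_layout` are hypothetical, and the check only verifies the number of folders, not their names or contents.

```python
from pathlib import Path


def list_dataset_dirs(data_root):
    """Return the sorted names of the sub-folders directly under data_root."""
    root = Path(data_root)
    if not root.is_dir():
        raise FileNotFoundError(f"data folder not found: {root}")
    # Plain files (archives, notes) are ignored; only folders are counted.
    return sorted(p.name for p in root.iterdir() if p.is_dir())


def check_data_layout(data_root, expected_count=4):
    """Return (ok, names); ok is True when exactly expected_count folders exist."""
    names = list_dataset_dirs(data_root)
    return len(names) == expected_count, names
```

Calling `check_data_layout("data")` from inside FACT_actseg returns a flag and the folder names found, which makes a missing or misplaced dataset easy to spot before training.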

Training

Training is configured with YAML files, and all the configurations are listed in configs. You can use the following commands to run the experiments.

cd FACT_actseg
# breakfast
python3 -m src.train --cfg src/configs/breakfast.yaml --set aux.gpu 0 split "split1"
# gtea
python3 -m src.train --cfg src/configs/gtea.yaml --set aux.gpu 0 split "split1"
# egoprocel
python3 -m src.train --cfg src/configs/egoprocel.yaml --set aux.gpu 0 split "split1"
# epic-kitchens
python3 -m src.train --cfg src/configs/epic-kitchens.yaml --set aux.gpu 0 split "split1"
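
The commands above pass dotted keys such as aux.gpu after --set to override values from the YAML file. The sketch below illustrates one way such key/value pairs could be applied to a nested configuration. It is an assumption for illustration only, not the repository's parser: `parse_value` and `apply_overrides` are hypothetical names, and the config is modeled as a plain nested dict.

```python
import ast


def parse_value(text):
    """Interpret a command-line string as a Python literal when possible."""
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        # Bare words such as split1 are kept as plain strings.
        return text


def apply_overrides(cfg, pairs):
    """Apply a flat [key, value, key, value, ...] list to a nested dict in place."""
    if len(pairs) % 2 != 0:
        raise ValueError("overrides must come in key/value pairs")
    for key, raw in zip(pairs[0::2], pairs[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        # Walk (or create) the intermediate levels named by the dotted key.
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = parse_value(raw)
    return cfg
```

With this sketch, `apply_overrides(cfg, ["aux.gpu", "0", "split", "split1"])` sets `cfg["aux"]["gpu"]` to the integer 0 and `cfg["split"]` to the string "split1". The shell removes the quotes around "split1" before the program sees it.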

By default, logs are saved to FACT_actseg/log/<experiment-path>. Evaluation results are saved as Checkpoint objects, defined in utils/evaluate.py. Losses and metrics are also visualized with wandb.

Pre-Trained Models

Pre-trained model weights can be downloaded from here. You can place the files under FACT_actseg/ckpts and test the models with the following command.

python3 -m src.eval

We lost the original data and model weights in a disk failure. The models provided here were retrained afterward, so their performance differs slightly from the results reported in the paper.

Breakfast models
GTEA models
EgoProceL models
Epic-Kitchens models

Citation

@inproceedings{
    lu2024fact,
    title={{FACT}: Frame-Action Cross-Attention Temporal Modeling for Efficient Supervised Action Segmentation},
    author={Zijia Lu and Ehsan Elhamifar},
    booktitle={Conference on Computer Vision and Pattern Recognition 2024},
    year={2024},
}
