Video Swin Transformer: Environment Setup and Running

https://github.com/SwinTransformer/Video-Swin-Transformer

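I ran into a lot of errors along the way, but I no longer remember most of them.

So I'll record only the steps that are essential.

Environment setup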

apex

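My GPU is an RTX 3090, and on the RTX 30 series apex would not install properly because of version compatibility problems.

I lost a lot of time struggling with this...

Fortunately, NGC provides an image that solves the problem.

I set up my Docker environment with an image from the page below.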

https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags

mmcv

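For some reason I got errors whenever opencv-python was installed.

So I uninstalled opencv-python and installed mmcv-full.

mmcv is distributed as two packages, mmcv and mmcv-full, and only one of them can be installed at a time.

In addition, the following line needs to be added to the Dockerfile.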

RUN apt-get -y install libgl1-mesa-glx
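
Since the conflict described above comes down to which packages are installed side by side, a quick look at the environment can help before debugging further. The sketch below uses only the Python standard library. The helper name `installed_versions` and the list of package names it checks are my own choices for illustration and are not part of mmcv.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names assumed to be relevant to the conflict described above.
    found = installed_versions(["mmcv", "mmcv-full", "opencv-python"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    if found["mmcv"] and found["mmcv-full"]:
        print("Both mmcv and mmcv-full are installed; keep only one.")
```

Running this inside the container shows at a glance whether a stray opencv-python or a second mmcv package is present.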

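Training on a custom dataset

I want to build a dataset loader that takes videos as input.

Creating the annotation file

filepath label

Each line holds a video path and its label, separated by a space.

Here is an example.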

some/path/000.mp4 1
some/path/001.mp4 1
some/path/002.mp4 2
some/path/003.mp4 2
some/path/004.mp4 3
some/path/005.mp4 3
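
A file in this format can be generated with a short script rather than written by hand. The sketch below assumes a hypothetical layout in which every class has its own subdirectory under a root directory (root/class_name/video.mp4), and it assigns integer labels starting at 0 in sorted order of the class names. Neither the layout, the labeling scheme, nor the function names come from the repository, so adjust them to your own data.

```python
from pathlib import Path


def build_annotation_lines(root, extensions=(".mp4",)):
    """Build 'filepath label' lines from a root/class_name/video layout."""
    root = Path(root)
    # One subdirectory per class; sorted so label assignment is reproducible.
    class_dirs = sorted(p for p in root.iterdir() if p.is_dir())
    lines = []
    for label, class_dir in enumerate(class_dirs):
        for video in sorted(class_dir.iterdir()):
            if video.suffix.lower() in extensions:
                lines.append(f"{video.as_posix()} {label}")
    return lines


def write_annotation_file(root, out_path):
    """Write one sample per line to out_path and return the sample count."""
    lines = build_annotation_lines(root)
    Path(out_path).write_text("".join(f"{line}\n" for line in lines))
    return len(lines)
```

Calling `write_annotation_file("/path/to/train", "train_list.txt")` would then produce a list file in the same shape as the example above.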

Implementing the custom dataset

After git clone, implement the custom dataset under mmaction/datasets.

I created a class named MYDataset in a file named my_dataset.py.

import os.path as osp

from .base import BaseDataset
from .builder import DATASETS


@DATASETS.register_module()
class MYDataset(BaseDataset):
    """Video dataset for action recognition.

    The dataset loads raw videos and applies the specified transforms to
    return a dict containing the frame tensors and other information.

    The ann_file is a text file with multiple lines, and each line indicates
    a sample video with the filepath and label, which are separated by
    whitespace. Example of an annotation file:

    .. code-block:: txt

        some/path/000.mp4 1
        some/path/001.mp4 1
        some/path/002.mp4 2
        some/path/003.mp4 2
        some/path/004.mp4 3
        some/path/005.mp4 3


    Args:
        ann_file (str): Path to the annotation file.
        pipeline (list[dict | callable]): A sequence of data transforms.
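        start_index (int): Start index for frames. Since frames loaded
            from videos count from 0, it defaults to 0.
        **kwargs: Keyword arguments passed to ``BaseDataset``.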
    """

    def __init__(self, ann_file, pipeline, start_index=0, **kwargs):
        super().__init__(ann_file, pipeline, start_index=start_index, **kwargs)

    def load_annotations(self):
        """Load annotation file to get video information."""
        if self.ann_file.endswith('.json'):
            return self.load_json_annotations()

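        video_infos = []
        with open(self.ann_file, 'r') as fin:
            for line in fin:
                line = line.strip()
                if not line:
                    # Skip blank lines (e.g. a trailing newline at end of file).
                    continue
                # The label is the last space-separated token; everything
                # before it is the path, which may itself contain spaces.
                line_split = line.split(' ')
                label = int(line_split.pop())
                filename = ' '.join(line_split)
                video_infos.append(dict(filename=filename, label=label))
        return video_infos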
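
Before starting a long training run, it can be worth checking the annotation file on its own, outside of mmaction. The sketch below mirrors the parsing idea used in load_annotations (the last token is the label, the rest is the path) and counts how many samples each label has. The function names `parse_annotation_line` and `count_labels` are hypothetical helpers of my own, not part of the repository.

```python
from collections import Counter


def parse_annotation_line(line):
    """Split one 'filepath label' line; the path itself may contain spaces."""
    filename, label = line.strip().rsplit(maxsplit=1)
    return filename, int(label)


def count_labels(lines):
    """Count samples per label, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        _, label = parse_annotation_line(line)
        counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "some/path/000.mp4 1",
        "some/path/001.mp4 1",
        "some/path/002.mp4 2",
    ]
    for label, n in sorted(count_labels(sample).items()):
        print(f"label {label}: {n} videos")
```

A line that does not end in an integer raises a ValueError here, which is an easy way to find malformed entries before the dataset class ever sees them.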

Registering the dataset

In mmaction/datasets/__init__.py, import the dataset you created and add it to __all__.

...
from .my_dataset import MYDataset

__all__ = [
    ...
    'BaseMiniBatchBlending', 'CutmixBlending', 'MixupBlending', 'LabelSmoothing', 'DATASETS',
    'PIPELINES', 'BLENDINGS', 'PoseDataset', 'MYDataset'
]
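
The import in __init__.py matters because a registration decorator only runs when the module containing the class is actually imported. The following is a deliberately simplified stand-in for a registry, not mmcv's real Registry implementation. The class `ToyRegistry` and its methods are hypothetical and exist only to show the mechanism.

```python
class ToyRegistry:
    """Simplified, hypothetical registry mapping class names to classes."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Returns a decorator that stores the class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        # Look up cfg['type'] and pass the remaining keys to the constructor.
        kwargs = dict(cfg)
        cls = self._modules[kwargs.pop("type")]
        return cls(**kwargs)


DATASETS = ToyRegistry("dataset")


@DATASETS.register_module()
class MYDataset:
    def __init__(self, ann_file):
        self.ann_file = ann_file


# A config dict names the class as a string; the registry resolves it.
dataset = DATASETS.build(dict(type="MYDataset", ann_file="train_list.txt"))
print(type(dataset).__name__, dataset.ann_file)
```

If the module that defines the class is never imported, the decorator never runs and the lookup by name fails with a KeyError in this toy version, which is the same kind of failure the import in __init__.py is meant to prevent.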

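Creating the config

Create my_config.py inside the configs folder.

I copied the config file of the model I wanted to use and changed dataset_type and the data file paths.

Adjust the paths to match the data you are training on.

dataset_type = 'MYDataset'  # name of the dataset class created above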
data_root = '/mount/video/kinetics600_5per/kinetics600_5per/train'
data_root_val = '/mount/video/kinetics600_5per/kinetics600_5per/test'
ann_file_train = '/mount/video/kinetics600_5per/kinetics600_5per/train_list.txt'
ann_file_val = '/mount/video/kinetics600_5per/kinetics600_5per/test_list.txt'
ann_file_test = '/mount/video/kinetics600_5per/kinetics600_5per/test_list.txt'
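
For context, these variables are normally consumed further down in the same config by a `data` dict. The sketch below shows that pattern as I understand it from mmaction2-style configs; the empty pipeline lists are placeholders of my own, and the file you copied will already define the real ones, so treat this as an illustration and check it against your own config.

```python
# Values from the config above.
dataset_type = 'MYDataset'
data_root = '/mount/video/kinetics600_5per/kinetics600_5per/train'
data_root_val = '/mount/video/kinetics600_5per/kinetics600_5per/test'
ann_file_train = '/mount/video/kinetics600_5per/kinetics600_5per/train_list.txt'
ann_file_val = '/mount/video/kinetics600_5per/kinetics600_5per/test_list.txt'
ann_file_test = '/mount/video/kinetics600_5per/kinetics600_5per/test_list.txt'

# Placeholders only: a copied config already defines these lists
# with the model's actual preprocessing steps.
train_pipeline = []
val_pipeline = []
test_pipeline = []

# Each split names the dataset class by string and points at its list file.
data = dict(
    train=dict(
        type=dataset_type,
        ann_file=ann_file_train,
        data_prefix=data_root,
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=ann_file_val,
        data_prefix=data_root_val,
        pipeline=val_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=ann_file_test,
        data_prefix=data_root_val,
        pipeline=test_pipeline))
```

One thing to keep in mind: the load_annotations method shown earlier does not join data_prefix onto each filename, so with that class the paths in the annotation file presumably need to be absolute, or at least resolvable from the working directory.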

Build

The package has to be built with the command below for the dataset I created to be registered.

pip install -v -e .

With that, everything is ready for training.
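
To confirm that the registration actually took effect, the registry can be queried from Python. This is a minimal sketch that assumes `DATASETS` is importable from mmaction.datasets (it appears in the __all__ list shown earlier) and that the registry exposes a `get` method returning None for unknown names, as mmcv's Registry does in the versions I have used. The helper name `is_dataset_registered` is my own.

```python
def is_dataset_registered(name):
    """Return True/False if mmaction is importable, or None if it is not."""
    try:
        # DATASETS is exported by mmaction/datasets/__init__.py (see __all__).
        from mmaction.datasets import DATASETS
    except ImportError:
        return None
    # Assumption: get() returns the registered class, or None if unknown.
    return DATASETS.get(name) is not None


if __name__ == "__main__":
    status = is_dataset_registered("MYDataset")
    if status is None:
        print("mmaction is not importable in this environment")
    elif status:
        print("MYDataset is registered")
    else:
        print("MYDataset is NOT registered; check __init__.py and the build step")
```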