diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/.gitignore b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/.gitignore new file mode 100644 index 0000000000000000000000000000000000000000..306db4fda1fea41ff79900b4aaf55ccd2b222e1b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/.gitignore @@ -0,0 +1,121 @@ +# Byte-compiled / optimized / DLL files +__pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg +MANIFEST + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +.hypothesis/ +.pytest_cache/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py +db.sqlite3 + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# pyenv +.python-version + +# celery beat schedule file +celerybeat-schedule + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ +env.bak/ +venv.bak/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ + +data +.vscode +.idea + +# custom +*.pkl +*.pkl.json +*.log.json +work_dirs/ +work_dirs +pretrained +pretrained/ +# Pytorch +*.pth +trash/ +trash diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/LICENSE b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/LICENSE new file mode 100644 index 0000000000000000000000000000000000000000..c14f578420e1f6ff6f60ca714839d0ed329a1051 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/LICENSE @@ -0,0 +1,64 @@ +NVIDIA Source Code License for SegFormer + +1. Definitions + +“Licensor” means any person or entity that distributes its Work. + +“Software” means the original work of authorship made available under this License. + +“Work” means the Software and any additions to or derivative works of the Software that are made available under +this License. + +The terms “reproduce,” “reproduction,” “derivative works,” and “distribution” have the meaning as provided under +U.S. copyright law; provided, however, that for the purposes of this License, derivative works shall not include +works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work. + +Works, including the Software, are “made available” under this License by including in or with the Work either +(a) a copyright notice referencing the applicability of this License to the Work, or (b) a copy of this License. + +2. License Grant + +2.1 Copyright Grant. Subject to the terms and conditions of this License, each Licensor grants to you a perpetual, +worldwide, non-exclusive, royalty-free, copyright license to reproduce, prepare derivative works of, publicly +display, publicly perform, sublicense and distribute its Work and any resulting derivative works in any form. + +3. Limitations + +3.1 Redistribution. 
You may reproduce or distribute the Work only if (a) you do so under this License, (b) you
+include a complete copy of this License with your distribution, and (c) you retain without modification any
+copyright, patent, trademark, or attribution notices that are present in the Work.
+
+3.2 Derivative Works. You may specify that additional or different terms apply to the use, reproduction, and
+distribution of your derivative works of the Work (“Your Terms”) only if (a) Your Terms provide that the use
+limitation in Section 3.3 applies to your derivative works, and (b) you identify the specific derivative works
+that are subject to Your Terms. Notwithstanding Your Terms, this License (including the redistribution
+requirements in Section 3.1) will continue to apply to the Work itself.
+
+3.3 Use Limitation. The Work and any derivative works thereof only may be used or intended for use
+non-commercially. Notwithstanding the foregoing, NVIDIA and its affiliates may use the Work and any derivative
+works commercially. As used herein, “non-commercially” means for research or evaluation purposes only.
+
+3.4 Patent Claims. If you bring or threaten to bring a patent claim against any Licensor (including any claim,
+cross-claim or counterclaim in a lawsuit) to enforce any patents that you allege are infringed by any Work, then
+your rights under this License from such Licensor (including the grant in Section 2.1) will terminate immediately.
+
+3.5 Trademarks. This License does not grant any rights to use any Licensor’s or its affiliates’ names, logos,
+or trademarks, except as necessary to reproduce the notices described in this License.
+
+3.6 Termination. If you violate any term of this License, then your rights under this License (including the
+grant in Section 2.1) will terminate immediately.
+
+4. Disclaimer of Warranty.
+
+THE WORK IS PROVIDED “AS IS” WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING
+WARRANTIES OR CONDITIONS OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE OR NON-INFRINGEMENT. YOU
+BEAR THE RISK OF UNDERTAKING ANY ACTIVITIES UNDER THIS LICENSE.
+
+5. Limitation of Liability.
+
+EXCEPT AS PROHIBITED BY APPLICABLE LAW, IN NO EVENT AND UNDER NO LEGAL THEORY, WHETHER IN TORT (INCLUDING
+NEGLIGENCE), CONTRACT, OR OTHERWISE SHALL ANY LICENSOR BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY DIRECT,
+INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF OR RELATED TO THIS LICENSE, THE USE OR
+INABILITY TO USE THE WORK (INCLUDING BUT NOT LIMITED TO LOSS OF GOODWILL, BUSINESS INTERRUPTION, LOST PROFITS OR
+DATA, COMPUTER FAILURE OR MALFUNCTION, OR ANY OTHER COMMERCIAL DAMAGES OR LOSSES), EVEN IF THE LICENSOR HAS BEEN
+ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..27f8edffb2609e961d8cdf00c5f75443029e2801
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/README.md
@@ -0,0 +1,65 @@
+# SegFormer
+
+- Reference implementation:
+```
+url=https://github.com/NVlabs/SegFormer
+```
+
+# Requirements
+
+- Install the requirements:
+
+  ```
+  pip install docutils myst-parser sphinx sphinx_copybutton sphinx_markdown_tables
+  pip install -e git+https://github.com/gaotongxiao/pytorch_sphinx_theme.git#egg=pytorch_sphinx_theme
+  pip install cityscapesscripts
+  pip install matplotlib mmcls numpy packaging prettytable
+  pip install codecov flake8 interrogate pytest xdoctest yapf
+  ```
+
+  Install the modified `mmcv-1.2.7` library:
+
+  ```shell
+  # Uninstall any previously installed mmcv-1.2.7
+  pip3 uninstall mmcv-1.2.7
+
+  # Install the modified mmcv-1.2.7
+  cd mmcv-1.2.7
+  python3 setup.py install
+  ```
+  torch and apex must be pinned to the ascend20220315 release; later versions fail with SyncBN errors.
+  In addition, create a `pretrained` directory in the project root and place the file mit_b0.pth in it. The file can be obtained from:
+  obs://ascend-pytorch-model-file/验收-训练/cv/semantic_segmentation/segformer/mit_b0.pth
+
+# Accuracy and Performance
+
+| Name | Accuracy | Performance |
+| :----: | :---: | :--: |
+| GPU-1p | - | 8.75 |
+| GPU-8p | 76.91 | 50.16 |
+| NPU-1p | - | 8.82 |
+| NPU-8p | 76.40 | 54.48 |
+
+# Self-validation Report
+```shell
+# 1p train perf
+# Check that the performance log file is produced correctly
+bash test/train_performance_1p.sh --data_path=real_data_path
+# Result: OK
+
+# 8p train perf
+# Check that the performance log file is produced correctly
+bash test/train_performance_8p.sh --data_path=real_data_path
+# Result: OK
+
+# 1p train full
+# Check that the performance/accuracy log files are produced and the model checkpoint is saved correctly
+bash test/train_full_1p.sh --data_path=real_data_path
+# Result: OK
+
+# 8p train full
+# Check that the performance/accuracy log files are produced and the model checkpoint is saved correctly
+bash test/train_full_8p.sh --data_path=real_data_path
+# Result: OK
+
+```
\ No newline at end of file
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/bind_pyt.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/bind_pyt.py
new file mode 100644
index 0000000000000000000000000000000000000000..d05a5a495594f5b729f4ae371b7260cbb273ae0c
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/bind_pyt.py
@@ -0,0 +1,141 @@
+# Copyright (c) 2019-2021 NVIDIA CORPORATION. All rights reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
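+
+# This launcher mirrors torch.distributed.launch: it spawns one training
+# process per device, exports the standard distributed environment variables
+# (MASTER_ADDR, MASTER_PORT, WORLD_SIZE, NODE_RANK, RANK, LOCAL_RANK), and
+# wraps each process in numactl, pinning it to a dedicated CPU core range
+# (and, unless --no_membind is passed, to its local memory node).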
+
+import sys
+import subprocess
+import os
+import socket
+from argparse import ArgumentParser, REMAINDER
+
+import torch
+
+
+def parse_args():
+    """
+    Helper function parsing the command line options
+    @retval ArgumentParser
+    """
+    parser = ArgumentParser(description="PyTorch distributed training launch "
+                                        "helper utility that spawns "
+                                        "multiple distributed processes")
+
+    # Optional arguments for the launch helper
+    parser.add_argument("--nnodes", type=int, default=1,
+                        help="The number of nodes to use for distributed "
+                             "training")
+    parser.add_argument("--node_rank", type=int, default=0,
+                        help="The rank of the node for multi-node distributed "
+                             "training")
+    parser.add_argument("--nproc_per_node", type=int, default=8,
+                        help="The number of processes to launch on each node, "
+                             "for GPU training, this is recommended to be set "
+                             "to the number of GPUs in your system so that "
+                             "each process can be bound to a single GPU.")
+    parser.add_argument("--master_addr", default="127.0.0.1", type=str,
+                        help="Master node (rank 0)'s address, should be either "
+                             "the IP address or the hostname of node 0, for "
+                             "single node multi-proc training, the "
+                             "--master_addr can simply be 127.0.0.1")
+    parser.add_argument("--master_port", default=29688, type=int,
+                        help="Master node (rank 0)'s free port that needs to "
+                             "be used for communication during distributed "
+                             "training")
+    parser.add_argument('--no_hyperthreads', action='store_true',
+                        help='Flag to disable binding to hyperthreads')
+    parser.add_argument('--no_membind', action='store_true',
+                        help='Flag to disable memory binding')
+
+    # non-optional arguments for binding
+    parser.add_argument("--nsockets_per_node", type=int, required=True,
+                        help="Number of CPU sockets on a node")
+    parser.add_argument("--ncores_per_socket", type=int, required=True,
+                        help="Number of CPU cores per socket")
+
+    # positional
+    parser.add_argument("training_script", type=str,
+                        help="The full path to the single GPU training "
+                             "program/script to be launched in parallel, "
+                             "followed by all the arguments for the "
+                             "training script")
+
+    # rest from the training program
+    parser.add_argument("--data_path", type=str, default='')
+    parser.add_argument('training_script_args', nargs=REMAINDER)
+    return parser.parse_args()
+
+
+def main():
+    args = parse_args()
+
+    # variables for numactl binding
+
+    NSOCKETS = args.nsockets_per_node
+    NGPUS_PER_SOCKET = (args.nproc_per_node // args.nsockets_per_node) + (
+        1 if (args.nproc_per_node % args.nsockets_per_node) else 0)
+    NCORES_PER_GPU = args.ncores_per_socket // NGPUS_PER_SOCKET
+
+    # world size in terms of number of processes
+    dist_world_size = args.nproc_per_node * args.nnodes
+
+    # set PyTorch distributed related environmental variables
+    current_env = os.environ.copy()
+    current_env["MASTER_ADDR"] = args.master_addr
+    current_env["MASTER_PORT"] = str(args.master_port)
+    current_env["WORLD_SIZE"] = str(dist_world_size)
+    current_env['NODE_RANK'] = str(args.node_rank)
+
+    processes = []
+
+    for local_rank in range(0, args.nproc_per_node):
+        # each process's rank
+        dist_rank = args.nproc_per_node * args.node_rank + local_rank
+        current_env["RANK"] = str(dist_rank)
+        current_env['LOCAL_RANK'] = str(local_rank)
+
+        # form numactl binding command
+        cpu_ranges = [local_rank * NCORES_PER_GPU,
+                      (local_rank + 1) * NCORES_PER_GPU - 1,
+                      local_rank * NCORES_PER_GPU + (NCORES_PER_GPU * NGPUS_PER_SOCKET * NSOCKETS),
+                      (local_rank + 1) * NCORES_PER_GPU + (NCORES_PER_GPU * NGPUS_PER_SOCKET * NSOCKETS) - 1]
+
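+        # cpu_ranges holds two inclusive core ranges for this rank: entries
+        # [0], [1] span its physical cores, and entries [2], [3] are the same
+        # range offset by the node's total physical core count
+        # (NCORES_PER_GPU * NGPUS_PER_SOCKET * NSOCKETS), which is where Linux
+        # typically numbers the hyperthread siblings; --no_hyperthreads keeps
+        # only the first range.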
numactlargs = [] + if args.no_hyperthreads: + numactlargs += ["--physcpubind={}-{}".format(*cpu_ranges[0:2])] + else: + numactlargs += ["--physcpubind={}-{},{}-{}".format(*cpu_ranges)] + + if not args.no_membind: + memnode = local_rank // NGPUS_PER_SOCKET + numactlargs += ["--membind={}".format(memnode)] + + # spawn the processes + cmd = ["/usr/bin/numactl"] \ + + numactlargs \ + + [sys.executable, + "-u", + args.training_script, + "--local_rank={}".format(local_rank) + ] \ + + args.training_script_args + + process = subprocess.Popen(cmd, env=current_env) + processes.append(process) + + for process in processes: + process.wait() + + +if __name__ == "__main__": + main() + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..efc8b4bb20c981f3db6df7eb52b3dc0744c94cc0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/ade20k.py @@ -0,0 +1,54 @@ +# dataset settings +dataset_type = 'ADE20KDataset' +data_root = 'data/ade/ADEChallengeData2016' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 512) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', reduce_zero_label=True), + dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 512), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/chase_db1.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/chase_db1.py new file mode 100644 index 0000000000000000000000000000000000000000..298594ea925f87f22b37094a2ec50e370aec96a0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/chase_db1.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'ChaseDB1Dataset' +data_root = 'data/CHASE_DB1' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (960, 999) +crop_size = (128, 128) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + 
dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f21867c63e1835f6fceb61f066e802fd8fd2a735 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes.py @@ -0,0 +1,54 @@ +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 1024) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes_768x768.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes_768x768.py new file mode 100644 index 
0000000000000000000000000000000000000000..fde9d7c7d8076dabff081fce0989eec6a6f5ff07 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes_768x768.py @@ -0,0 +1,35 @@ +_base_ = './cityscapes.py' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (768, 768) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2049, 1025), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2049, 1025), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + train=dict(pipeline=train_pipeline), + val=dict(pipeline=test_pipeline), + test=dict(pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes_769x769.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes_769x769.py new file mode 100644 index 0000000000000000000000000000000000000000..336c7b254fe392b4703039fec86a83acdbd2e1a5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/cityscapes_769x769.py @@ -0,0 +1,35 @@ +_base_ = './cityscapes.py' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (769, 769) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2049, 1025), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2049, 1025), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + train=dict(pipeline=train_pipeline), + val=dict(pipeline=test_pipeline), + test=dict(pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/drive.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/drive.py new file mode 100644 index 0000000000000000000000000000000000000000..06e8ff606e0d2a4514ec8b7d2c6c436a32efcbf4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/drive.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'DRIVEDataset' +data_root = 'data/DRIVE' +img_norm_cfg = dict( + mean=[123.675, 116.28, 
103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (584, 565) +crop_size = (64, 64) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/hrf.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/hrf.py new file mode 100644 index 0000000000000000000000000000000000000000..242d790eb1b83e75cf6b7eaa7a35c674099311ad --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/hrf.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'HRFDataset' +data_root = 'data/HRF' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (2336, 3504) +crop_size = (256, 256) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + 
img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..ff65bad1b86d7e3a5980bb5b9fc55798dc8df5f4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_context.py @@ -0,0 +1,60 @@ +# dataset settings +dataset_type = 'PascalContextDataset' +data_root = 'data/VOCdevkit/VOC2010/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) + +img_scale = (520, 520) +crop_size = (480, 480) + +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClassContext', + split='ImageSets/SegmentationContext/train.txt', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClassContext', + split='ImageSets/SegmentationContext/val.txt', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClassContext', + split='ImageSets/SegmentationContext/val.txt', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_voc12.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_voc12.py new file mode 100644 index 0000000000000000000000000000000000000000..ba1d42d0c5781f56dc177d860d856bb34adce555 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_voc12.py @@ -0,0 +1,57 @@ +# dataset settings +dataset_type = 'PascalVOCDataset' +data_root = 'data/VOCdevkit/VOC2012' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 512) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', 
+ img_scale=(2048, 512), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClass', + split='ImageSets/Segmentation/train.txt', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClass', + split='ImageSets/Segmentation/val.txt', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClass', + split='ImageSets/Segmentation/val.txt', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_voc12_aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_voc12_aug.py new file mode 100644 index 0000000000000000000000000000000000000000..3f23b6717d53ad29f02dd15046802a2631a5076b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/pascal_voc12_aug.py @@ -0,0 +1,9 @@ +_base_ = './pascal_voc12.py' +# dataset settings +data = dict( + train=dict( + ann_dir=['SegmentationClass', 'SegmentationClassAug'], + split=[ + 'ImageSets/Segmentation/train.txt', + 'ImageSets/Segmentation/aug.txt' + ])) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/stare.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/stare.py new file mode 100644 index 0000000000000000000000000000000000000000..3f71b25488cc11a6b4d582ac52b5a24e1ad1cf8e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/datasets/stare.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'STAREDataset' +data_root = 'data/STARE' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (605, 700) +crop_size = (128, 128) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + 
data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/default_runtime.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/default_runtime.py new file mode 100644 index 0000000000000000000000000000000000000000..b564cc4e7e7d9a67dacaaddecb100e4d8f5c005b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/default_runtime.py @@ -0,0 +1,14 @@ +# yapf:disable +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook', by_epoch=False), + # dict(type='TensorboardLoggerHook') + ]) +# yapf:enable +dist_params = dict(backend='nccl') +log_level = 'INFO' +load_from = None +resume_from = None +workflow = [('train', 1)] +cudnn_benchmark = True diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ann_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ann_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..a2cb653827e44e6015b3b83bc578003e614a6aa1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ann_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='ANNHead', + in_channels=[1024, 2048], + in_index=[2, 3], + channels=512, + project_channels=256, + query_scales=(1, ), + key_pool_scales=(1, 3, 6, 8), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/apcnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/apcnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..c8f5316cbcf3896ba9de7ca2c801eba512f01d5e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/apcnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='APCHead', + in_channels=2048, + in_index=3, + channels=512, + pool_scales=(1, 2, 3, 6), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=dict(type='SyncBN', requires_grad=True), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + 
in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ccnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ccnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..794148f576b9e215c3c6963e73dffe98204b7717 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ccnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='CCHead', + in_channels=2048, + in_index=3, + channels=512, + recurrence=2, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/cgnet.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/cgnet.py new file mode 100644 index 0000000000000000000000000000000000000000..eff8d9458c877c5db894957e0b1b4597e40da6ab --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/cgnet.py @@ -0,0 +1,35 @@ +# model settings +norm_cfg = dict(type='SyncBN', eps=1e-03, requires_grad=True) +model = dict( + type='EncoderDecoder', + backbone=dict( + type='CGNet', + norm_cfg=norm_cfg, + in_channels=3, + num_channels=(32, 64, 128), + num_blocks=(3, 21), + dilations=(2, 4), + reductions=(8, 16)), + decode_head=dict( + type='FCNHead', + in_channels=256, + in_index=2, + channels=256, + num_convs=0, + concat_input=False, + dropout_ratio=0, + num_classes=19, + norm_cfg=norm_cfg, + loss_decode=dict( + type='CrossEntropyLoss', + use_sigmoid=False, + loss_weight=1.0, + class_weight=[ + 2.5959933, 6.7415504, 3.5354059, 9.8663225, 9.690899, 9.369352, + 10.289121, 9.953208, 4.3097677, 9.490387, 7.674431, 9.396905, + 10.347791, 6.3927646, 10.226669, 10.241062, 10.280587, + 10.396974, 10.055647 + ])), + # model training and testing settings + train_cfg=dict(sampler=None), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/danet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/danet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..2c934939fac48525f22ad86f489a041dd7db7d09 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/danet_r50-d8.py @@ -0,0 +1,44 @@ +# model 
settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DAHead', + in_channels=2048, + in_index=3, + channels=512, + pam_channels=64, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..d7a43bee01422ad4795dd27874e0cd4bb6cbfecf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='ASPPHead', + in_channels=2048, + in_index=3, + channels=512, + dilations=(1, 12, 24, 36), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3_unet_s5-d16.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3_unet_s5-d16.py new file mode 100644 index 0000000000000000000000000000000000000000..0cd262999d8b2cb8e14a5c32190ae73f479d8e81 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3_unet_s5-d16.py @@ -0,0 +1,50 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained=None, + backbone=dict( + type='UNet', + in_channels=3, + base_channels=64, + num_stages=5, + strides=(1, 1, 1, 1, 1), + enc_num_convs=(2, 2, 2, 2, 2), + dec_num_convs=(2, 2, 2, 2), + downsamples=(True, True, True, True), + enc_dilations=(1, 1, 1, 1, 1), + dec_dilations=(1, 1, 1, 1), + with_cp=False, + conv_cfg=None, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + upsample_cfg=dict(type='InterpConv'), + norm_eval=False), + decode_head=dict( 
+ type='ASPPHead', + in_channels=64, + in_index=4, + channels=16, + dilations=(1, 12, 24, 36), + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=128, + in_index=3, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='slide', crop_size=256, stride=170)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3plus_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3plus_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..050e39e091d816df9028d23aa3ecf9db74e441e1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/deeplabv3plus_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DepthwiseSeparableASPPHead', + in_channels=2048, + in_index=3, + channels=512, + dilations=(1, 12, 24, 36), + c1_in_channels=256, + c1_channels=48, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/dmnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/dmnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..d22ba52640bebd805b3b8d07025e276dfb023759 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/dmnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DMHead', + in_channels=2048, + in_index=3, + channels=512, + filter_sizes=(1, 3, 5, 7), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=dict(type='SyncBN', requires_grad=True), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + 
align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/dnl_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/dnl_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..edb4c174c51e34c103737ba39bfc48bf831e561d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/dnl_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DNLHead', + in_channels=2048, + in_index=3, + channels=512, + dropout_ratio=0.1, + reduction=2, + use_scale=True, + mode='embedded_gaussian', + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/emanet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/emanet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..26adcd430926de0862204a71d345f2543167f27b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/emanet_r50-d8.py @@ -0,0 +1,47 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='EMAHead', + in_channels=2048, + in_index=3, + channels=256, + ema_channels=512, + num_bases=64, + num_stages=3, + momentum=0.1, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/encnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/encnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..be777123a886503172a95fe0719e956a147bbd68 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/encnet_r50-d8.py @@ -0,0 +1,48 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='EncHead', + in_channels=[512, 1024, 2048], + in_index=(1, 2, 3), + channels=512, + num_codes=32, + use_se_loss=True, + add_lateral=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), + loss_se_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.2)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fast_scnn.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fast_scnn.py new file mode 100644 index 0000000000000000000000000000000000000000..32fdeb659355a5ce5ef2cc7c2f30742703811cdf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fast_scnn.py @@ -0,0 +1,57 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True, momentum=0.01) +model = dict( + type='EncoderDecoder', + backbone=dict( + type='FastSCNN', + downsample_dw_channels=(32, 48), + global_in_channels=64, + global_block_channels=(64, 96, 128), + global_block_strides=(2, 2, 1), + global_out_channels=128, + higher_in_channels=64, + lower_in_channels=128, + fusion_out_channels=128, + out_indices=(0, 1, 2), + norm_cfg=norm_cfg, + align_corners=False), + decode_head=dict( + type='DepthwiseSeparableFCNHead', + in_channels=128, + channels=128, + concat_input=False, + num_classes=19, + in_index=-1, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)), + auxiliary_head=[ + dict( + type='FCNHead', + in_channels=128, + channels=32, + num_convs=1, + num_classes=19, + in_index=-2, + norm_cfg=norm_cfg, + concat_input=False, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)), + dict( + type='FCNHead', + in_channels=64, + channels=32, + num_convs=1, + num_classes=19, + in_index=-3, + norm_cfg=norm_cfg, + concat_input=False, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)), + ], + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_hr18.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_hr18.py new file mode 100644 index 0000000000000000000000000000000000000000..c3e299bc89ada56ca14bbffcbdb08a586b8ed9e9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_hr18.py @@ -0,0 +1,52 @@ +# model settings +norm_cfg = 
dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://msra/hrnetv2_w18', + backbone=dict( + type='HRNet', + norm_cfg=norm_cfg, + norm_eval=False, + extra=dict( + stage1=dict( + num_modules=1, + num_branches=1, + block='BOTTLENECK', + num_blocks=(4, ), + num_channels=(64, )), + stage2=dict( + num_modules=1, + num_branches=2, + block='BASIC', + num_blocks=(4, 4), + num_channels=(18, 36)), + stage3=dict( + num_modules=4, + num_branches=3, + block='BASIC', + num_blocks=(4, 4, 4), + num_channels=(18, 36, 72)), + stage4=dict( + num_modules=3, + num_branches=4, + block='BASIC', + num_blocks=(4, 4, 4, 4), + num_channels=(18, 36, 72, 144)))), + decode_head=dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + channels=sum([18, 36, 72, 144]), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + concat_input=False, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..5e98f6cc918b6146fc6d613c6918e825ef1355c3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_r50-d8.py @@ -0,0 +1,45 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='FCNHead', + in_channels=2048, + in_index=3, + channels=512, + num_convs=2, + concat_input=True, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_unet_s5-d16.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_unet_s5-d16.py new file mode 100644 index 0000000000000000000000000000000000000000..a33e7972877f902d0e7d18401ca675e3e4e60a18 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fcn_unet_s5-d16.py @@ -0,0 +1,51 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained=None, + backbone=dict( + type='UNet', + in_channels=3, + base_channels=64, + num_stages=5, + strides=(1, 1, 1, 1, 1), + enc_num_convs=(2, 2, 2, 2, 2), + dec_num_convs=(2, 2, 2, 2), + downsamples=(True, True, True, True), + enc_dilations=(1, 1, 1, 1, 1), + dec_dilations=(1, 1, 1, 1), + with_cp=False, + conv_cfg=None, + 
norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + upsample_cfg=dict(type='InterpConv'), + norm_eval=False), + decode_head=dict( + type='FCNHead', + in_channels=64, + in_index=4, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=128, + in_index=3, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='slide', crop_size=256, stride=170)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fpn_r50.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fpn_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..86ab327db92e44c14822d65f1c9277cb007f17c1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/fpn_r50.py @@ -0,0 +1,36 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 1, 1), + strides=(1, 2, 2, 2), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + neck=dict( + type='FPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=4), + decode_head=dict( + type='FPNHead', + in_channels=[256, 256, 256, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/gcnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/gcnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..3d2ad69f5c22adfe79d5fdabf920217628987166 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/gcnet_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='GCHead', + in_channels=2048, + in_index=3, + channels=512, + ratio=1 / 4., + pooling_type='att', + fusion_types=('channel_add', ), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model 
training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/lraspp_m-v3-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/lraspp_m-v3-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..93258242a90695cc94a7c6bd41562d6a75988771 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/lraspp_m-v3-d8.py @@ -0,0 +1,25 @@ +# model settings +norm_cfg = dict(type='SyncBN', eps=0.001, requires_grad=True) +model = dict( + type='EncoderDecoder', + backbone=dict( + type='MobileNetV3', + arch='large', + out_indices=(1, 3, 16), + norm_cfg=norm_cfg), + decode_head=dict( + type='LRASPPHead', + in_channels=(16, 24, 960), + in_index=(0, 1, 2), + channels=128, + input_transform='multiple_select', + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/nonlocal_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/nonlocal_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..5674a39854cafd1f2e363bac99c58ccae62f24da --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/nonlocal_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='NLHead', + in_channels=2048, + in_index=3, + channels=512, + dropout_ratio=0.1, + reduction=2, + use_scale=True, + mode='embedded_gaussian', + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ocrnet_hr18.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ocrnet_hr18.py new file mode 100644 index 0000000000000000000000000000000000000000..c60f62a7cdf3f5c5096a7a7e725e8268fddcb057 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ocrnet_hr18.py @@ -0,0 +1,68 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='CascadeEncoderDecoder', + num_stages=2, + pretrained='open-mmlab://msra/hrnetv2_w18', + backbone=dict( + type='HRNet', + norm_cfg=norm_cfg, + norm_eval=False, + extra=dict( + stage1=dict( + num_modules=1, + num_branches=1, + block='BOTTLENECK', + num_blocks=(4, ), + num_channels=(64, )), + stage2=dict( + num_modules=1, + num_branches=2, + 
block='BASIC', + num_blocks=(4, 4), + num_channels=(18, 36)), + stage3=dict( + num_modules=4, + num_branches=3, + block='BASIC', + num_blocks=(4, 4, 4), + num_channels=(18, 36, 72)), + stage4=dict( + num_modules=3, + num_branches=4, + block='BASIC', + num_blocks=(4, 4, 4, 4), + num_channels=(18, 36, 72, 144)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + channels=sum([18, 36, 72, 144]), + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + concat_input=False, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + channels=512, + ocr_channels=256, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + ], + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ocrnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ocrnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..615aa3ff703942b6c22b2d6e9642504dd3e41ebd --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/ocrnet_r50-d8.py @@ -0,0 +1,47 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='CascadeEncoderDecoder', + num_stages=2, + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=[ + dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=2048, + in_index=3, + channels=512, + ocr_channels=256, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ], + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pointrend_r50.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pointrend_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..9d323dbf9466d41e0800aa57ef84045f3d874bdf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pointrend_r50.py @@ -0,0 +1,56 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='CascadeEncoderDecoder', + num_stages=2, + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 1, 1), + strides=(1, 2, 2, 2), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + neck=dict( + type='FPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=4), + 
decode_head=[ + dict( + type='FPNHead', + in_channels=[256, 256, 256, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + dict( + type='PointHead', + in_channels=[256], + in_index=[0], + channels=256, + num_fcs=3, + coarse_pred_each_layer=True, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ], + # model training and testing settings + train_cfg=dict( + num_points=2048, oversample_ratio=3, importance_sample_ratio=0.75), + test_cfg=dict( + mode='whole', + subdivision_steps=2, + subdivision_num_points=8196, + scale_factor=2)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/psanet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/psanet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..689513fa9d2a40f14bf0ae4ae61f38f0dcc1b3da --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/psanet_r50-d8.py @@ -0,0 +1,49 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='PSAHead', + in_channels=2048, + in_index=3, + channels=512, + mask_size=(97, 97), + psa_type='bi-direction', + compact=False, + shrink_factor=2, + normalization_factor=1.0, + psa_softmax=True, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pspnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pspnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..f451e08ad2eb0732dcb806b1851eb978d4acf136 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pspnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='PSPHead', + in_channels=2048, + in_index=3, + channels=512, + pool_scales=(1, 2, 3, 6), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', 
+ in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pspnet_unet_s5-d16.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pspnet_unet_s5-d16.py new file mode 100644 index 0000000000000000000000000000000000000000..fcff9ec4f41fad158344ecd77313dc14564f3682 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/pspnet_unet_s5-d16.py @@ -0,0 +1,50 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained=None, + backbone=dict( + type='UNet', + in_channels=3, + base_channels=64, + num_stages=5, + strides=(1, 1, 1, 1, 1), + enc_num_convs=(2, 2, 2, 2, 2), + dec_num_convs=(2, 2, 2, 2), + downsamples=(True, True, True, True), + enc_dilations=(1, 1, 1, 1, 1), + dec_dilations=(1, 1, 1, 1), + with_cp=False, + conv_cfg=None, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + upsample_cfg=dict(type='InterpConv'), + norm_eval=False), + decode_head=dict( + type='PSPHead', + in_channels=64, + in_index=4, + channels=16, + pool_scales=(1, 2, 3, 6), + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=128, + in_index=3, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='slide', crop_size=256, stride=170)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/upernet_r50.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/upernet_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..10974962fdd7136031fd06de1700f497d355ceaa --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/models/upernet_r50.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 1, 1), + strides=(1, 2, 2, 2), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='UPerHead', + in_channels=[256, 512, 1024, 2048], + in_index=[0, 1, 2, 3], + pool_scales=(1, 2, 3, 6), + channels=512, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_160k.py new file mode 100644 index 0000000000000000000000000000000000000000..52603890b10f25faf8eec9f9e5a4468fae09b811 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_160k.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=160000) +checkpoint_config = dict(by_epoch=False, interval=16000) +evaluation = dict(interval=16000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_20k.py new file mode 100644 index 0000000000000000000000000000000000000000..bf780a1b6f6521833c6a5859675147824efa599d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_20k.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=20000) +checkpoint_config = dict(by_epoch=False, interval=2000) +evaluation = dict(interval=2000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_40k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_40k.py new file mode 100644 index 0000000000000000000000000000000000000000..cdbf841abcb26eed87bf76ab816aff4bae0630ee --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_40k.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=40000) +checkpoint_config = dict(by_epoch=False, interval=4000) +evaluation = dict(interval=4000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_80k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_80k.py new file mode 100644 index 0000000000000000000000000000000000000000..c190cee6bdc7922b688ea75dc8f152fa15c24617 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/_base_/schedules/schedule_80k.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=80000) +checkpoint_config = dict(by_epoch=False, interval=8000) +evaluation = dict(interval=8000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/README.md new file mode 100644 index 
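The four schedule files above share the same SGD optimizer and `poly` learning-rate policy; they differ only in `max_iters` and the checkpoint/evaluation interval. As an illustration of the per-iteration decay (a sketch of the formula, not mmcv's `PolyLrUpdaterHook` itself):

```python
# Sketch of the per-iteration 'poly' decay (by_epoch=False).
def poly_lr(base_lr, min_lr, power, cur_iter, max_iters):
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

# schedule_160k.py values: lr=0.01, min_lr=1e-4, power=0.9
print(poly_lr(0.01, 1e-4, 0.9, 0, 160000))       # 0.01 at iter 0
print(poly_lr(0.01, 1e-4, 0.9, 80000, 160000))   # ~0.0054 at the midpoint
print(poly_lr(0.01, 1e-4, 0.9, 160000, 160000))  # floors at min_lr=1e-4
```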
0000000000000000000000000000000000000000..7fc1648311d8f6789fd2ed99789973afbb940531 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/README.md @@ -0,0 +1,52 @@ +# Asymmetric Non-local Neural Networks for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{annn, + author = {Zhen Zhu and + Mengde Xu and + Song Bai and + Tengteng Huang and + Xiang Bai}, + title = {Asymmetric Non-local Neural Networks for Semantic Segmentation}, + booktitle={International Conference on Computer Vision}, + year = {2019}, + url = {http://arxiv.org/abs/1908.07678}, +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| ANN | R-50-D8 | 512x1024 | 40000 | 6 | 3.71 | 77.40 | 78.57 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x1024_40k_cityscapes/ann_r50-d8_512x1024_40k_cityscapes_20200605_095211-049fc292.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x1024_40k_cityscapes/ann_r50-d8_512x1024_40k_cityscapes_20200605_095211.log.json) | +| ANN | R-101-D8 | 512x1024 | 40000 | 9.5 | 2.55 | 76.55 | 78.85 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x1024_40k_cityscapes/ann_r101-d8_512x1024_40k_cityscapes_20200605_095243-adf6eece.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x1024_40k_cityscapes/ann_r101-d8_512x1024_40k_cityscapes_20200605_095243.log.json) | +| ANN | R-50-D8 | 769x769 | 40000 | 6.8 | 1.70 | 78.89 | 80.46 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_769x769_40k_cityscapes/ann_r50-d8_769x769_40k_cityscapes_20200530_025712-2b46b04d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_769x769_40k_cityscapes/ann_r50-d8_769x769_40k_cityscapes_20200530_025712.log.json) | +| ANN | R-101-D8 | 769x769 | 40000 | 10.7 | 1.15 | 79.32 | 80.94 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_769x769_40k_cityscapes/ann_r101-d8_769x769_40k_cityscapes_20200530_025720-059bff28.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_769x769_40k_cityscapes/ann_r101-d8_769x769_40k_cityscapes_20200530_025720.log.json) | +| ANN | R-50-D8 | 512x1024 | 80000 | - | - | 77.34 | 78.65 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x1024_80k_cityscapes/ann_r50-d8_512x1024_80k_cityscapes_20200607_101911-5a9ad545.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x1024_80k_cityscapes/ann_r50-d8_512x1024_80k_cityscapes_20200607_101911.log.json) | +| ANN | R-101-D8 | 512x1024 | 80000 | - | - | 77.14 | 78.81 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x1024_80k_cityscapes/ann_r101-d8_512x1024_80k_cityscapes_20200607_013728-aceccc6e.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x1024_80k_cityscapes/ann_r101-d8_512x1024_80k_cityscapes_20200607_013728.log.json) | +| ANN | R-50-D8 | 769x769 | 80000 | - | - | 78.88 | 80.57 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_769x769_80k_cityscapes/ann_r50-d8_769x769_80k_cityscapes_20200607_044426-cc7ff323.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_769x769_80k_cityscapes/ann_r50-d8_769x769_80k_cityscapes_20200607_044426.log.json) | +| ANN | R-101-D8 | 769x769 | 80000 | - | - | 78.80 | 80.34 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_769x769_80k_cityscapes/ann_r101-d8_769x769_80k_cityscapes_20200607_013713-a9d4be8d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_769x769_80k_cityscapes/ann_r101-d8_769x769_80k_cityscapes_20200607_013713.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| ANN | R-50-D8 | 512x512 | 80000 | 9.1 | 21.01 | 41.01 | 42.30 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_80k_ade20k/ann_r50-d8_512x512_80k_ade20k_20200615_014818-26f75e11.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_80k_ade20k/ann_r50-d8_512x512_80k_ade20k_20200615_014818.log.json) | +| ANN | R-101-D8 | 512x512 | 80000 | 12.5 | 14.12 | 42.94 | 44.18 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_80k_ade20k/ann_r101-d8_512x512_80k_ade20k_20200615_014818-c0153543.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_80k_ade20k/ann_r101-d8_512x512_80k_ade20k_20200615_014818.log.json) | +| ANN | R-50-D8 | 512x512 | 160000 | - | - | 41.74 | 42.62 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_160k_ade20k/ann_r50-d8_512x512_160k_ade20k_20200615_231733-892247bc.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_160k_ade20k/ann_r50-d8_512x512_160k_ade20k_20200615_231733.log.json) | +| ANN | R-101-D8 | 512x512 | 160000 | - | - | 42.94 | 44.06 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_160k_ade20k/ann_r101-d8_512x512_160k_ade20k_20200615_231733-955eb1ec.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_160k_ade20k/ann_r101-d8_512x512_160k_ade20k_20200615_231733.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | 
+|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| ANN | R-50-D8 | 512x512 | 20000 | 6 | 20.92 | 74.86 | 76.13 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_20k_voc12aug/ann_r50-d8_512x512_20k_voc12aug_20200617_222246-dfcb1c62.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_20k_voc12aug/ann_r50-d8_512x512_20k_voc12aug_20200617_222246.log.json) | +| ANN | R-101-D8 | 512x512 | 20000 | 9.5 | 13.94 | 77.47 | 78.70 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_20k_voc12aug/ann_r101-d8_512x512_20k_voc12aug_20200617_222246-2fad0042.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_20k_voc12aug/ann_r101-d8_512x512_20k_voc12aug_20200617_222246.log.json) | +| ANN | R-50-D8 | 512x512 | 40000 | - | - | 76.56 | 77.51 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_40k_voc12aug/ann_r50-d8_512x512_40k_voc12aug_20200613_231314-b5dac322.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r50-d8_512x512_40k_voc12aug/ann_r50-d8_512x512_40k_voc12aug_20200613_231314.log.json) | +| ANN | R-101-D8 | 512x512 | 40000 | - | - | 76.70 | 78.06 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_40k_voc12aug/ann_r101-d8_512x512_40k_voc12aug_20200613_231314-bd205bbe.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ann/ann_r101-d8_512x512_40k_voc12aug/ann_r101-d8_512x512_40k_voc12aug_20200613_231314.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d494e07333217e0c6830d36d1bb58fa78b03cfb0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1eeff0b030cf1db8c6ec9740fa65db44b2026d58 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..9e43af541f6e3df3f36479e736bb0c03fc916970 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..d854f2e4223731f443369febc500dbccdc524d9d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..893c53b1ca4bf9788e4d94f0f53cfe92a93f48ce --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..a64dac670ed4d4632e7b9791ec5f8a334dcea78e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..59508248490b3edbac1c46b4fcc7891f99655b9b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a9c712d1ccfd62ddf6f12ff01ea347ca1995013b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ann_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..00b2594ba8a1c9edc90cca7a6d7c3334fa209edc --- /dev/null +++ 
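Each R-101 variant above is a two-line config: it inherits everything from the corresponding R-50 file via `_base_` and overrides only the pretrained weights and the backbone depth. Conceptually the config loader performs a recursive dict merge; a minimal sketch of that behaviour (a simplification of mmcv.Config's merge, which also handles base-file lists and deletion keys):

```python
# Minimal sketch of how `_base_` overrides compose (illustrative only).
def merge_cfg(base, override):
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_cfg(merged[key], value)   # merge nested dicts
        else:
            merged[key] = value                           # replace outright
    return merged

base = dict(pretrained='open-mmlab://resnet50_v1c',
            backbone=dict(type='ResNetV1c', depth=50, num_stages=4))
override = dict(pretrained='open-mmlab://resnet101_v1c',
                backbone=dict(depth=101))
print(merge_cfg(base, override))
# backbone keeps type and num_stages; only depth and pretrained change
```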
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..ef7b369dd9e12b2282a30da14f99dd4547c53a7b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..ca6bb248ac867d463c274f975c884aa80a57730f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..071f190261c4e8f4a80a5da12a88e0cfcdfef0d8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', '../_base_/datasets/pascal_voc12_aug.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..82a1c9386c51fb0ada436e51702beb961a534b26 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', '../_base_/datasets/pascal_voc12_aug.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..5e04aa7c6ac050d119e07b715e2082f692e1a1de --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', '../_base_/datasets/ade20k.py', + 
'../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..4912bdb9fb298518ae084eb7df0ad22d3e4ff84f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d1cc072b152986102286f503e3d7b92999bf414c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ann/ann_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/ann_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c2ab106a29c1a135fc7a726df9f6f22536357ced --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/README.md @@ -0,0 +1,39 @@ +# Adaptive Pyramid Context Network for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@InProceedings{He_2019_CVPR, +author = {He, Junjun and Deng, Zhongying and Zhou, Lei and Wang, Yali and Qiao, Yu}, +title = {Adaptive Pyramid Context Network for Semantic Segmentation}, +booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, +month = {June}, +year = {2019} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| APCNet | R-50-D8 | 512x1024 | 40000 | 7.7 | 3.57 | 78.02 | 79.26 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x1024_40k_cityscapes/apcnet_r50-d8_512x1024_40k_cityscapes_20201214_115717-5e88fa33.pth) | 
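The 769x769 variants above evaluate with `test_cfg=dict(mode='slide', ...)`: inference runs on overlapping 769x769 crops placed every 513 pixels, and the overlapping predictions are averaged. A rough sketch of the crop grid (simplified from mmseg's slide inference, which also sums overlapping logits and normalises by a count map; assumes the image is at least crop-sized):

```python
# Sketch of the window grid behind test_cfg mode='slide' (illustrative only).
def slide_windows(h, w, crop=769, stride=513):
    ys = list(range(0, h - crop + 1, stride))
    xs = list(range(0, w - crop + 1, stride))
    if ys[-1] + crop < h:
        ys.append(h - crop)        # last row of windows touches the border
    if xs[-1] + crop < w:
        xs.append(w - crop)
    return [(y, x, y + crop, x + crop) for y in ys for x in xs]

print(len(slide_windows(1024, 2048)))  # 8 windows per 1024x2048 image
```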
[log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x1024_40k_cityscapes/apcnet_r50-d8_512x1024_40k_cityscapes-20201214_115717.log.json) | +| APCNet | R-101-D8 | 512x1024 | 40000 | 11.2 | 2.15 | 79.08 | 80.34 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x1024_40k_cityscapes/apcnet_r101-d8_512x1024_40k_cityscapes_20201214_115716-abc9d111.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x1024_40k_cityscapes/apcnet_r101-d8_512x1024_40k_cityscapes-20201214_115716.log.json) | +| APCNet | R-50-D8 | 769x769 | 40000 | 8.7 | 1.52 | 77.89 | 79.75 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_769x769_40k_cityscapes/apcnet_r50-d8_769x769_40k_cityscapes_20201214_115717-2a2628d7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_769x769_40k_cityscapes/apcnet_r50-d8_769x769_40k_cityscapes-20201214_115717.log.json) | +| APCNet | R-101-D8 | 769x769 | 40000 | 12.7 | 1.03 | 77.96 | 79.24 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_769x769_40k_cityscapes/apcnet_r101-d8_769x769_40k_cityscapes_20201214_115718-b650de90.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_769x769_40k_cityscapes/apcnet_r101-d8_769x769_40k_cityscapes-20201214_115718.log.json) | +| APCNet | R-50-D8 | 512x1024 | 80000 | - | - | 78.96 | 79.94 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x1024_80k_cityscapes/apcnet_r50-d8_512x1024_80k_cityscapes_20201214_115716-987f51e3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x1024_80k_cityscapes/apcnet_r50-d8_512x1024_80k_cityscapes-20201214_115716.log.json) | +| APCNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.64 | 80.61 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x1024_80k_cityscapes/apcnet_r101-d8_512x1024_80k_cityscapes_20201214_115705-b1ff208a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x1024_80k_cityscapes/apcnet_r101-d8_512x1024_80k_cityscapes-20201214_115705.log.json) | +| APCNet | R-50-D8 | 769x769 | 80000 | - | - | 78.79 | 80.35 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_769x769_80k_cityscapes/apcnet_r50-d8_769x769_80k_cityscapes_20201214_115718-7ea9fa12.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_769x769_80k_cityscapes/apcnet_r50-d8_769x769_80k_cityscapes-20201214_115718.log.json) | +| APCNet | R-101-D8 | 769x769 | 80000 | - | - | 78.45 | 79.91 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_769x769_80k_cityscapes/apcnet_r101-d8_769x769_80k_cityscapes_20201214_115716-a7fbc2ab.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_769x769_80k_cityscapes/apcnet_r101-d8_769x769_80k_cityscapes-20201214_115716.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | 
+|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| APCNet | R-50-D8 | 512x512 | 80000 | 10.1 | 19.61 | 42.20 | 43.30 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x512_80k_ade20k/apcnet_r50-d8_512x512_80k_ade20k_20201214_115705-a8626293.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x512_80k_ade20k/apcnet_r50-d8_512x512_80k_ade20k-20201214_115705.log.json) | +| APCNet | R-101-D8 | 512x512 | 80000 | 13.6 | 13.10 | 45.54 | 46.65 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x512_80k_ade20k/apcnet_r101-d8_512x512_80k_ade20k_20201214_115704-c656c3fb.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x512_80k_ade20k/apcnet_r101-d8_512x512_80k_ade20k-20201214_115704.log.json) | +| APCNet | R-50-D8 | 512x512 | 160000 | - | - | 43.40 | 43.94 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x512_160k_ade20k/apcnet_r50-d8_512x512_160k_ade20k_20201214_115706-25fb92c2.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r50-d8_512x512_160k_ade20k/apcnet_r50-d8_512x512_160k_ade20k-20201214_115706.log.json) | +| APCNet | R-101-D8 | 512x512 | 160000 | - | - | 45.41 | 46.63 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x512_160k_ade20k/apcnet_r101-d8_512x512_160k_ade20k_20201214_115705-73f9a8d7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/apcnet/apcnet_r101-d8_512x512_160k_ade20k/apcnet_r101-d8_512x512_160k_ade20k-20201214_115705.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1e1cec67355abae33d518417eb96eae111f16d2b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './apcnet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..04cb006ba146268e1d3278151bc6ea00a4fb1bfe --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './apcnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 
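Any config/checkpoint pair linked in these tables can be loaded for inference. A minimal sketch, assuming this tree keeps the upstream mmsegmentation 0.x Python API (`init_segmentor`/`inference_segmentor`) and that the paths below, which are hypothetical stand-ins, point at locally downloaded files:

```python
# Hypothetical local paths; substitute any pair from the tables above.
from mmseg.apis import init_segmentor, inference_segmentor

config = 'configs/apcnet/apcnet_r50-d8_512x512_80k_ade20k.py'
checkpoint = 'apcnet_r50-d8_512x512_80k_ade20k_20201214_115705-a8626293.pth'

model = init_segmentor(config, checkpoint, device='cuda:0')
result = inference_segmentor(model, 'demo.png')  # list of per-pixel label maps
print(result[0].shape)
```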
0000000000000000000000000000000000000000..1ce2279a0fbfd6fcc7cd20e3f552b1a39f47d943 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './apcnet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..8f10b98406c88256c66d3bbe241c149791d68feb --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './apcnet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5c44ebcaf36075e67208c5f033d1e5f9a78dda4e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './apcnet_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..616984575dda73a13fc5870f60ae6ffa30d6b01b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './apcnet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..99c61a942e4868315ce4a9404d113f73fed4a4ea --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/apcnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..62a0627ae2e9bb17974068e56ee660093e944e0d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/apcnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..f7821c559d2f92d23b28e07e040a54cfc425eefc --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/apcnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..daafa5fbc12c3ed6c10b5234d520166f774e0f94 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/apcnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..3db6140cb97da1d202fd464d01f793276effa629 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/apcnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..9cac4254f37bc3755bff869a10eb3cb75db4d943 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/apcnet/apcnet_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/apcnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..044d5896781de5824fc5a009d8c0eadf47a44e4e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/README.md @@ -0,0 +1,47 @@ +# CCNet: Criss-Cross Attention for Semantic Segmentation + +## Introduction + +[ALGORITHM] + 
+```latex +@article{huang2018ccnet, + title={CCNet: Criss-Cross Attention for Semantic Segmentation}, + author={Huang, Zilong and Wang, Xinggang and Huang, Lichao and Huang, Chang and Wei, Yunchao and Liu, Wenyu}, + booktitle={ICCV}, + year={2019} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| CCNet | R-50-D8 | 512x1024 | 40000 | 6 | 3.32 | 77.76 | 78.87 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x1024_40k_cityscapes/ccnet_r50-d8_512x1024_40k_cityscapes_20200616_142517-4123f401.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x1024_40k_cityscapes/ccnet_r50-d8_512x1024_40k_cityscapes_20200616_142517.log.json) | +| CCNet | R-101-D8 | 512x1024 | 40000 | 9.5 | 2.31 | 76.35 | 78.19 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x1024_40k_cityscapes/ccnet_r101-d8_512x1024_40k_cityscapes_20200616_142540-a3b84ba6.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x1024_40k_cityscapes/ccnet_r101-d8_512x1024_40k_cityscapes_20200616_142540.log.json) | +| CCNet | R-50-D8 | 769x769 | 40000 | 6.8 | 1.43 | 78.46 | 79.93 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_769x769_40k_cityscapes/ccnet_r50-d8_769x769_40k_cityscapes_20200616_145125-76d11884.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_769x769_40k_cityscapes/ccnet_r50-d8_769x769_40k_cityscapes_20200616_145125.log.json) | +| CCNet | R-101-D8 | 769x769 | 40000 | 10.7 | 1.01 | 76.94 | 78.62 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_769x769_40k_cityscapes/ccnet_r101-d8_769x769_40k_cityscapes_20200617_101428-4f57c8d0.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_769x769_40k_cityscapes/ccnet_r101-d8_769x769_40k_cityscapes_20200617_101428.log.json) | +| CCNet | R-50-D8 | 512x1024 | 80000 | - | - | 79.03 | 80.16 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x1024_80k_cityscapes/ccnet_r50-d8_512x1024_80k_cityscapes_20200617_010421-869a3423.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x1024_80k_cityscapes/ccnet_r50-d8_512x1024_80k_cityscapes_20200617_010421.log.json) | +| CCNet | R-101-D8 | 512x1024 | 80000 | - | - | 78.87 | 79.90 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x1024_80k_cityscapes/ccnet_r101-d8_512x1024_80k_cityscapes_20200617_203935-ffae8917.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x1024_80k_cityscapes/ccnet_r101-d8_512x1024_80k_cityscapes_20200617_203935.log.json) | +| CCNet | R-50-D8 | 769x769 | 80000 | - | - | 79.29 | 81.08 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_769x769_80k_cityscapes/ccnet_r50-d8_769x769_80k_cityscapes_20200617_010421-73eed8ca.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_769x769_80k_cityscapes/ccnet_r50-d8_769x769_80k_cityscapes_20200617_010421.log.json) | +| CCNet | R-101-D8 | 769x769 | 80000 | - | - | 79.45 | 80.66 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_769x769_80k_cityscapes/ccnet_r101-d8_769x769_80k_cityscapes_20200618_011502-ad3cd481.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_769x769_80k_cityscapes/ccnet_r101-d8_769x769_80k_cityscapes_20200618_011502.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| CCNet | R-50-D8 | 512x512 | 80000 | 8.8 | 20.89 | 41.78 | 42.98 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_80k_ade20k/ccnet_r50-d8_512x512_80k_ade20k_20200615_014848-aa37f61e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_80k_ade20k/ccnet_r50-d8_512x512_80k_ade20k_20200615_014848.log.json) | +| CCNet | R-101-D8 | 512x512 | 80000 | 12.2 | 14.11 | 43.97 | 45.13 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_80k_ade20k/ccnet_r101-d8_512x512_80k_ade20k_20200615_014848-1f4929a3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_80k_ade20k/ccnet_r101-d8_512x512_80k_ade20k_20200615_014848.log.json) | +| CCNet | R-50-D8 | 512x512 | 160000 | - | - | 42.08 | 43.13 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_160k_ade20k/ccnet_r50-d8_512x512_160k_ade20k_20200616_084435-7c97193b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_160k_ade20k/ccnet_r50-d8_512x512_160k_ade20k_20200616_084435.log.json) | +| CCNet | R-101-D8 | 512x512 | 160000 | - | - | 43.71 | 45.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_160k_ade20k/ccnet_r101-d8_512x512_160k_ade20k_20200616_000644-e849e007.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_160k_ade20k/ccnet_r101-d8_512x512_160k_ade20k_20200616_000644.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| CCNet | R-50-D8 | 512x512 | 20000 | 6 | 20.45 | 76.17 | 77.51 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_20k_voc12aug/ccnet_r50-d8_512x512_20k_voc12aug_20200617_193212-fad81784.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_20k_voc12aug/ccnet_r50-d8_512x512_20k_voc12aug_20200617_193212.log.json) | +| CCNet | R-101-D8 | 512x512 | 20000 | 9.5 | 13.64 | 77.27 | 79.02 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_20k_voc12aug/ccnet_r101-d8_512x512_20k_voc12aug_20200617_193212-0007b61d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_20k_voc12aug/ccnet_r101-d8_512x512_20k_voc12aug_20200617_193212.log.json) | +| CCNet | R-50-D8 | 512x512 | 40000 | - | - | 75.96 | 77.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_40k_voc12aug/ccnet_r50-d8_512x512_40k_voc12aug_20200613_232127-c2a15f02.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r50-d8_512x512_40k_voc12aug/ccnet_r50-d8_512x512_40k_voc12aug_20200613_232127.log.json) | +| CCNet | R-101-D8 | 512x512 | 40000 | - | - | 77.87 | 78.90 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_40k_voc12aug/ccnet_r101-d8_512x512_40k_voc12aug_20200613_232127-c30da577.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ccnet/ccnet_r101-d8_512x512_40k_voc12aug/ccnet_r101-d8_512x512_40k_voc12aug_20200613_232127.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d2bac38ca6760af6441ede5a04409ed495ef87f3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..989928ab7f98da86e291451040ff85669a9fbddb --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..c32bf48751f0a18983bff0d99310870b71801663 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..53eb77c0cd6690668ee7c2a666bd85b9a5f7e73b --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..d7eb668f39bbd22a1f42628428bc19d1645e9865 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..029c1d525b809b61dc8e548ebe4fb26e5c68a8be --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..43f05fab05ee4e20c3509a923118fe9818543cbd --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..654f377b6f6152c9bd98d33824a39a41d7510c3f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './ccnet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..6a4316dde57206fe369e72fa0d32a529fe1a1932 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/ccnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 
index 0000000000000000000000000000000000000000..16e34356e9f8566ec73e3c25c771e281d3eeb975 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/ccnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..1ad94d8988bb822c1571816255464126d9d5b95d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/ccnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..bbcd29ccea8dcf9f67f1cd198dacd5dab380b265 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/ccnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..947b8ac8ce1ddf7906ad39788c6992df3b506d29 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/ccnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..1a1f49cf6b112afdadf1841571f51b98c010ddf8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/ccnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 
0000000000000000000000000000000000000000..580d59ca6995ea95a9345ef3ea574ea5b57e9cfb
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_769x769_40k_cityscapes.py
@@ -0,0 +1,9 @@
+_base_ = [
+    '../_base_/models/ccnet_r50-d8.py',
+    '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_40k.py'
+]
+model = dict(
+    decode_head=dict(align_corners=True),
+    auxiliary_head=dict(align_corners=True),
+    test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513)))
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_769x769_80k_cityscapes.py
new file mode 100644
index 0000000000000000000000000000000000000000..c6dac64377bb3f73fdf5c836fa9c38757f75ff76
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ccnet/ccnet_r50-d8_769x769_80k_cityscapes.py
@@ -0,0 +1,9 @@
+_base_ = [
+    '../_base_/models/ccnet_r50-d8.py',
+    '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_80k.py'
+]
+model = dict(
+    decode_head=dict(align_corners=True),
+    auxiliary_head=dict(align_corners=True),
+    test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513)))
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..00ba387203a257dbb485d68134c88cb43780722d
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/README.md
@@ -0,0 +1,23 @@
+# CGNet: A Light-weight Context Guided Network for Semantic Segmentation
+
+## Introduction
+
+[ALGORITHM]
+
+```latex
+@article{wu2018cgnet,
+  title={CGNet: A Light-weight Context Guided Network for Semantic Segmentation},
+  author={Wu, Tianyi and Tang, Sheng and Zhang, Rui and Zhang, Yongdong},
+  journal={arXiv preprint arXiv:1811.08201},
+  year={2018}
+}
+```
+
+## Results and models
+
+### Cityscapes
+
+| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download |
+|-----------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------|
+| CGNet | M3N21 | 680x680 | 60000 | 7.5 | 30.51 | 65.63 | 68.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/cgnet/cgnet_680x680_60k_cityscapes/cgnet_680x680_60k_cityscapes_20201101_110253-4c0b2f2d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/cgnet/cgnet_680x680_60k_cityscapes/cgnet_680x680_60k_cityscapes-20201101_110253.log.json) |
+| CGNet | M3N21 | 512x1024 | 60000 | 8.3 | 31.14 | 68.27 | 70.33 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/cgnet/cgnet_512x1024_60k_cityscapes/cgnet_512x1024_60k_cityscapes_20201101_110254-124ea03b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/cgnet/cgnet_512x1024_60k_cityscapes/cgnet_512x1024_60k_cityscapes-20201101_110254.log.json) |
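The two CGNet configs that follow override the shared `_base_` schedules: they train with Adam and decay the learning rate with a 'poly' policy (`lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False)`). As a rough illustration of what that policy computes (a sketch mirroring the config values, not code from this repository):

```python
# Sketch of the 'poly' learning-rate decay used by the CGNet configs below.
# base_lr, power and min_lr mirror the config values; illustrative only.

def poly_lr(cur_iter: int,
            max_iters: int = 60000,
            base_lr: float = 0.001,
            power: float = 0.9,
            min_lr: float = 1e-4) -> float:
    """Decay base_lr polynomially towards min_lr over max_iters iterations."""
    coeff = (1 - cur_iter / max_iters) ** power
    return (base_lr - min_lr) * coeff + min_lr

print(poly_lr(0))       # 0.001 at the start of training
print(poly_lr(30000))   # ~0.00058 halfway through
print(poly_lr(60000))   # 0.0001 (the configured floor) at the end
```

diff --git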
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/cgnet_512x1024_60k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/cgnet_512x1024_60k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..11421ef9d375d01b01c333c3705d6eb6e3348ee8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/cgnet_512x1024_60k_cityscapes.py @@ -0,0 +1,66 @@ +_base_ = ['../_base_/models/cgnet.py', '../_base_/default_runtime.py'] + +# optimizer +optimizer = dict(type='Adam', lr=0.001, eps=1e-08, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +total_iters = 60000 +checkpoint_config = dict(by_epoch=False, interval=4000) +evaluation = dict(interval=4000, metric='mIoU') + +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[72.39239876, 82.90891754, 73.15835921], std=[1, 1, 1], to_rgb=True) +crop_size = (512, 1024) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', flip_ratio=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=8, + workers_per_gpu=8, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/cgnet_680x680_60k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/cgnet_680x680_60k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..2b2f8eefb7dbecf81fcd2db54644493480825246 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/cgnet/cgnet_680x680_60k_cityscapes.py @@ -0,0 +1,50 @@ +_base_ = [ + '../_base_/models/cgnet.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py' +] + +# optimizer +optimizer = dict(type='Adam', lr=0.001, eps=1e-08, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +total_iters = 60000 +checkpoint_config = dict(by_epoch=False, interval=4000) +evaluation = dict(interval=4000, metric='mIoU') + +img_norm_cfg = dict( + mean=[72.39239876, 82.90891754, 73.15835921], std=[1, 1, 1], to_rgb=True) +crop_size = (680, 680) +train_pipeline = [ + 
dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
+    dict(type='RandomCrop', crop_size=crop_size),
+    dict(type='RandomFlip', flip_ratio=0.5),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 1024),
+        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+data = dict(
+    samples_per_gpu=8,
+    workers_per_gpu=8,
+    train=dict(pipeline=train_pipeline),
+    val=dict(pipeline=test_pipeline),
+    test=dict(pipeline=test_pipeline))
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..f49ccf96194f820f509aa9448493ba12ead91953
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/README.md
@@ -0,0 +1,47 @@
+# Dual Attention Network for Scene Segmentation
+
+## Introduction
+
+[ALGORITHM]
+
+```latex
+@inproceedings{fu2018dual,
+  title={Dual Attention Network for Scene Segmentation},
+  author={Fu, Jun and Liu, Jing and Tian, Haijie and Li, Yong and Bao, Yongjun and Fang, Zhiwei and Lu, Hanqing},
+  booktitle={The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
+  year={2019}
+}
+```
+
+## Results and models
+
+### Cityscapes
+
+| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download |
+|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------|
+| DANet | R-50-D8 | 512x1024 | 40000 | 7.4 | 2.66 | 78.74 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x1024_40k_cityscapes/danet_r50-d8_512x1024_40k_cityscapes_20200605_191324-c0dbfa5f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x1024_40k_cityscapes/danet_r50-d8_512x1024_40k_cityscapes_20200605_191324.log.json) |
+| DANet | R-101-D8 | 512x1024 | 40000 | 10.9 | 1.99 | 80.52 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x1024_40k_cityscapes/danet_r101-d8_512x1024_40k_cityscapes_20200605_200831-c57a7157.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x1024_40k_cityscapes/danet_r101-d8_512x1024_40k_cityscapes_20200605_200831.log.json) |
+| DANet | R-50-D8 | 769x769 | 40000 | 8.8 | 1.56 | 78.88 | 80.62 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_769x769_40k_cityscapes/danet_r50-d8_769x769_40k_cityscapes_20200530_025703-76681c60.pth) |
[log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_769x769_40k_cityscapes/danet_r50-d8_769x769_40k_cityscapes_20200530_025703.log.json) | +| DANet | R-101-D8 | 769x769 | 40000 | 12.8 | 1.07 | 79.88 | 81.47 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_769x769_40k_cityscapes/danet_r101-d8_769x769_40k_cityscapes_20200530_025717-dcb7fd4e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_769x769_40k_cityscapes/danet_r101-d8_769x769_40k_cityscapes_20200530_025717.log.json) | +| DANet | R-50-D8 | 512x1024 | 80000 | - | - | 79.34 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x1024_80k_cityscapes/danet_r50-d8_512x1024_80k_cityscapes_20200607_133029-2bfa2293.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x1024_80k_cityscapes/danet_r50-d8_512x1024_80k_cityscapes_20200607_133029.log.json) | +| DANet | R-101-D8 | 512x1024 | 80000 | - | - | 80.41 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x1024_80k_cityscapes/danet_r101-d8_512x1024_80k_cityscapes_20200607_132918-955e6350.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x1024_80k_cityscapes/danet_r101-d8_512x1024_80k_cityscapes_20200607_132918.log.json) | +| DANet | R-50-D8 | 769x769 | 80000 | - | - | 79.27 | 80.96 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_769x769_80k_cityscapes/danet_r50-d8_769x769_80k_cityscapes_20200607_132954-495689b4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_769x769_80k_cityscapes/danet_r50-d8_769x769_80k_cityscapes_20200607_132954.log.json) | +| DANet | R-101-D8 | 769x769 | 80000 | - | - | 80.47 | 82.02 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_769x769_80k_cityscapes/danet_r101-d8_769x769_80k_cityscapes_20200607_132918-f3a929e7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_769x769_80k_cityscapes/danet_r101-d8_769x769_80k_cityscapes_20200607_132918.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DANet | R-50-D8 | 512x512 | 80000 | 11.5 | 21.20 | 41.66 | 42.90 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_80k_ade20k/danet_r50-d8_512x512_80k_ade20k_20200615_015125-edb18e08.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_80k_ade20k/danet_r50-d8_512x512_80k_ade20k_20200615_015125.log.json) | +| DANet | R-101-D8 | 512x512 | 80000 | 15 | 14.18 | 43.64 | 45.19 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_80k_ade20k/danet_r101-d8_512x512_80k_ade20k_20200615_015126-d0357c73.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_80k_ade20k/danet_r101-d8_512x512_80k_ade20k_20200615_015126.log.json) | +| DANet | R-50-D8 | 
512x512 | 160000 | - | - | 42.45 | 43.25 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_160k_ade20k/danet_r50-d8_512x512_160k_ade20k_20200616_082340-9cb35dcd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_160k_ade20k/danet_r50-d8_512x512_160k_ade20k_20200616_082340.log.json) | +| DANet | R-101-D8 | 512x512 | 160000 | - | - | 44.17 | 45.02 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_160k_ade20k/danet_r101-d8_512x512_160k_ade20k_20200616_082348-23bf12f9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_160k_ade20k/danet_r101-d8_512x512_160k_ade20k_20200616_082348.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DANet | R-50-D8 | 512x512 | 20000 | 6.5 | 20.94 | 74.45 | 75.69 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_20k_voc12aug/danet_r50-d8_512x512_20k_voc12aug_20200618_070026-9e9e3ab3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_20k_voc12aug/danet_r50-d8_512x512_20k_voc12aug_20200618_070026.log.json) | +| DANet | R-101-D8 | 512x512 | 20000 | 9.9 | 13.76 | 76.02 | 77.23 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_20k_voc12aug/danet_r101-d8_512x512_20k_voc12aug_20200618_070026-d48d23b2.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_20k_voc12aug/danet_r101-d8_512x512_20k_voc12aug_20200618_070026.log.json) | +| DANet | R-50-D8 | 512x512 | 40000 | - | - | 76.37 | 77.29 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_40k_voc12aug/danet_r50-d8_512x512_40k_voc12aug_20200613_235526-426e3a64.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r50-d8_512x512_40k_voc12aug/danet_r50-d8_512x512_40k_voc12aug_20200613_235526.log.json) | +| DANet | R-101-D8 | 512x512 | 40000 | - | - | 76.51 | 77.32 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_40k_voc12aug/danet_r101-d8_512x512_40k_voc12aug_20200613_223031-788e232a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/danet/danet_r101-d8_512x512_40k_voc12aug/danet_r101-d8_512x512_40k_voc12aug_20200613_223031.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..3bfb9bdb3064275c2ac3bf2a057ef8eb79c308df --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff 
--git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d80b2ec160ae1c41499d45242713a99122d8adf8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..0f22d0fb6362252ac02f3f152a42997c68b90343 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..709f93cba3e3bca6ce0635457ab1823b04123bf8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..5c623eb56836760694b50f3e4e66aa0f1fc069df --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..bd31bc8f283fe8c322ee4876deadb89569dc1743 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..597d76de79610780b03cd91dba5f3a4f10147bcd --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_769x769_40k_cityscapes.py' +model = 
dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..70f9b31966128e8d9ec37859f57a7edfd8e6d1b2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './danet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1b70c5b8d49f04661e23604ca4da56a82b1b99c9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/danet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..03734310d7338c75d48c914cb325500961c04a79 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/danet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..22aaf857c3212d0b36b0b04e7990616025a3ef9b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/danet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..010f86f1aac1b5c827dec29f692d137dc1c399bf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/danet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_40k_voc12aug.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_40k_voc12aug.py
new file mode 100644
index 0000000000000000000000000000000000000000..0cef0f09bfa2290d14fc3a783ea500d6c3da2931
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_40k_voc12aug.py
@@ -0,0 +1,7 @@
+_base_ = [
+    '../_base_/models/danet_r50-d8.py',
+    '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_40k.py'
+]
+model = dict(
+    decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21))
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_80k_ade20k.py
new file mode 100644
index 0000000000000000000000000000000000000000..154e84890ed73fe4813dddc8c321de6cd2854fc1
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_512x512_80k_ade20k.py
@@ -0,0 +1,6 @@
+_base_ = [
+    '../_base_/models/danet_r50-d8.py', '../_base_/datasets/ade20k.py',
+    '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py'
+]
+model = dict(
+    decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150))
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_769x769_40k_cityscapes.py
new file mode 100644
index 0000000000000000000000000000000000000000..5c5b94e5a27d7f902d4bdea7ef6c4ef0b816bb99
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_769x769_40k_cityscapes.py
@@ -0,0 +1,9 @@
+_base_ = [
+    '../_base_/models/danet_r50-d8.py',
+    '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_40k.py'
+]
+model = dict(
+    decode_head=dict(align_corners=True),
+    auxiliary_head=dict(align_corners=True),
+    test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513)))
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_769x769_80k_cityscapes.py
new file mode 100644
index 0000000000000000000000000000000000000000..c7237ae03c601204dc7c03018ca17ed363090569
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/danet/danet_r50-d8_769x769_80k_cityscapes.py
@@ -0,0 +1,9 @@
+_base_ = [
+    '../_base_/models/danet_r50-d8.py',
+    '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py',
+    '../_base_/schedules/schedule_80k.py'
+]
+model = dict(
+    decode_head=dict(align_corners=True),
+    auxiliary_head=dict(align_corners=True),
+    test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513)))
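The 769x769 configs above (for DANet here, and likewise for the APCNet and CCNet variants earlier in this diff) evaluate with `test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))`, i.e. overlapping sliding-window inference rather than whole-image inference. A minimal sketch of how such windows tile an image, using the crop and stride values from these configs (an illustration of the idea, not the repository's implementation):

```python
# Sketch of sliding-window ("slide") test inference positions, assuming the
# crop/stride values from the 769x769 configs above; illustrative only.

def slide_windows(h, w, crop=(769, 769), stride=(513, 513)):
    """Yield (y1, x1, y2, x2) windows covering an h x w image."""
    ch, cw = crop
    sh, sw = stride
    ys = list(range(0, max(h - ch, 0) + 1, sh))
    xs = list(range(0, max(w - cw, 0) + 1, sw))
    # shift in one extra window if the bottom/right border is not yet covered
    if ys[-1] + ch < h:
        ys.append(h - ch)
    if xs[-1] + cw < w:
        xs.append(w - cw)
    for y in ys:
        for x in xs:
            yield y, x, min(y + ch, h), min(x + cw, w)

# a 1024x2048 Cityscapes image is covered by 2 x 4 = 8 overlapping crops;
# logits in the overlaps are averaged before the final per-pixel argmax
print(len(list(slide_windows(1024, 2048))))  # 8
```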
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..c4994f6469051efd4881546acfcefef76616332e
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/README.md
@@ -0,0 +1,66 @@
+# Rethinking atrous convolution for semantic image segmentation
+
+## Introduction
+
+[ALGORITHM]
+
+```latex
+@article{chen2017rethinking,
+  title={Rethinking atrous convolution for semantic image segmentation},
+  author={Chen, Liang-Chieh and Papandreou, George and Schroff, Florian and Adam, Hartwig},
+  journal={arXiv preprint arXiv:1706.05587},
+  year={2017}
+}
+```
+
+## Results and models
+
+Note: `D-8` here corresponds to the output stride 8 setting for the DeepLab series.
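+
+For intuition, the output stride is controlled by the backbone's per-stage `strides` and `dilations` (compare the `R-101-D16-MG124` configs later in this directory, which switch to output stride 16). A hypothetical pair of backbone overrides, shown only to illustrate the difference:
+
+```python
+# Illustrative only, not a config from this diff. A D-8 backbone keeps the
+# last two ResNet stages at stride 1 and dilates them, so features stay at
+# 1/8 input resolution; D-16 only dilates the last stage (1/16 resolution).
+backbone_d8 = dict(strides=(1, 2, 1, 1), dilations=(1, 1, 2, 4))
+backbone_d16 = dict(strides=(1, 2, 2, 1), dilations=(1, 1, 1, 2))  # as in the D16-MG124 configs
+```
+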
+### Cityscapes
+
+| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download |
+|-----------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------|
+| DeepLabV3 | R-50-D8 | 512x1024 | 40000 | 6.1 | 2.57 | 79.09 | 80.45 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes/deeplabv3_r50-d8_512x1024_40k_cityscapes_20200605_022449-acadc2f8.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes/deeplabv3_r50-d8_512x1024_40k_cityscapes_20200605_022449.log.json) |
+| DeepLabV3 | R-101-D8 | 512x1024 | 40000 | 9.6 | 1.92 | 77.12 | 79.61 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x1024_40k_cityscapes/deeplabv3_r101-d8_512x1024_40k_cityscapes_20200605_012241-7fd3f799.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x1024_40k_cityscapes/deeplabv3_r101-d8_512x1024_40k_cityscapes_20200605_012241.log.json) |
+| DeepLabV3 | R-50-D8 | 769x769 | 40000 | 6.9 | 1.11 | 78.58 | 79.89 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_769x769_40k_cityscapes/deeplabv3_r50-d8_769x769_40k_cityscapes_20200606_113723-7eda553c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_769x769_40k_cityscapes/deeplabv3_r50-d8_769x769_40k_cityscapes_20200606_113723.log.json) |
+| DeepLabV3 | R-101-D8 | 769x769 | 40000 | 10.9 | 0.83 | 79.27 | 80.11 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_769x769_40k_cityscapes/deeplabv3_r101-d8_769x769_40k_cityscapes_20200606_113809-c64f889f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_769x769_40k_cityscapes/deeplabv3_r101-d8_769x769_40k_cityscapes_20200606_113809.log.json) |
+| DeepLabV3 | R-18-D8 | 512x1024 | 80000 | 1.7 | 13.78 | 76.70 | 78.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18-d8_512x1024_80k_cityscapes/deeplabv3_r18-d8_512x1024_80k_cityscapes_20201225_021506-23dffbe2.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18-d8_512x1024_80k_cityscapes/deeplabv3_r18-d8_512x1024_80k_cityscapes-20201225_021506.log.json) |
+| DeepLabV3 | R-50-D8 | 512x1024 | 80000 | - | - | 79.32 | 80.57 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes/deeplabv3_r50-d8_512x1024_80k_cityscapes_20200606_113404-b92cfdd4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes/deeplabv3_r50-d8_512x1024_80k_cityscapes_20200606_113404.log.json) |
+| DeepLabV3 | R-101-D8 | 512x1024 | 80000 | - | - | 80.20 | 81.21 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes/deeplabv3_r101-d8_512x1024_80k_cityscapes_20200606_113503-9e428899.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes/deeplabv3_r101-d8_512x1024_80k_cityscapes_20200606_113503.log.json) |
+| DeepLabV3 | R-18-D8 | 769x769 | 80000 | 1.9 | 5.55 | 76.60 | 78.26 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18-d8_769x769_80k_cityscapes/deeplabv3_r18-d8_769x769_80k_cityscapes_20201225_021506-6452126a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18-d8_769x769_80k_cityscapes/deeplabv3_r18-d8_769x769_80k_cityscapes-20201225_021506.log.json) |
+| DeepLabV3 | R-50-D8 | 769x769 | 80000 | - | - | 79.89 | 81.06 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_769x769_80k_cityscapes/deeplabv3_r50-d8_769x769_80k_cityscapes_20200606_221338-788d6228.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_769x769_80k_cityscapes/deeplabv3_r50-d8_769x769_80k_cityscapes_20200606_221338.log.json) |
+| DeepLabV3 | R-101-D8 | 769x769 | 80000 | - | - | 79.67 | 80.81 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_769x769_80k_cityscapes/deeplabv3_r101-d8_769x769_80k_cityscapes_20200607_013353-60e95418.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_769x769_80k_cityscapes/deeplabv3_r101-d8_769x769_80k_cityscapes_20200607_013353.log.json) |
+| DeepLabV3 | R-101-D16-MG124 | 512x1024 | 40000 | 4.7 | 6.96 | 76.71 | 78.63 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_40k_cityscapes/deeplabv3_r101-d16-mg124_512x1024_40k_cityscapes_20200908_005644-67b0c992.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_40k_cityscapes/deeplabv3_r101-d16-mg124_512x1024_40k_cityscapes-20200908_005644.log.json) |
+| DeepLabV3 | R-101-D16-MG124 | 512x1024 | 80000 | - | - | 78.36 | 79.84 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_80k_cityscapes/deeplabv3_r101-d16-mg124_512x1024_80k_cityscapes_20200908_005644-57bb8425.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_80k_cityscapes/deeplabv3_r101-d16-mg124_512x1024_80k_cityscapes-20200908_005644.log.json) |
+| DeepLabV3 | R-18b-D8 | 512x1024 | 80000 | 1.6 | 13.93 | 76.26 | 77.88 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18b-d8_512x1024_80k_cityscapes/deeplabv3_r18b-d8_512x1024_80k_cityscapes_20201225_094144-46040cef.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18b-d8_512x1024_80k_cityscapes/deeplabv3_r18b-d8_512x1024_80k_cityscapes-20201225_094144.log.json) |
+| DeepLabV3 | R-50b-D8 | 512x1024 | 80000 | 6.0 | 2.74 | 79.63 | 80.98 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50b-d8_512x1024_80k_cityscapes/deeplabv3_r50b-d8_512x1024_80k_cityscapes_20201225_155148-ec368954.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50b-d8_512x1024_80k_cityscapes/deeplabv3_r50b-d8_512x1024_80k_cityscapes-20201225_155148.log.json) |
+| DeepLabV3 | R-101b-D8 | 512x1024 | 80000 | 9.5 | 1.81 |
80.01 | 81.21 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101b-d8_512x1024_80k_cityscapes/deeplabv3_r101b-d8_512x1024_80k_cityscapes_20201226_171821-8fd49503.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101b-d8_512x1024_80k_cityscapes/deeplabv3_r101b-d8_512x1024_80k_cityscapes-20201226_171821.log.json) | +| DeepLabV3 | R-18b-D8 | 769x769 | 80000 | 1.8 | 5.79 | 76.63 | 77.51 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18b-d8_769x769_80k_cityscapes/deeplabv3_r18b-d8_769x769_80k_cityscapes_20201225_094144-fdc985d9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r18b-d8_769x769_80k_cityscapes/deeplabv3_r18b-d8_769x769_80k_cityscapes-20201225_094144.log.json) | +| DeepLabV3 | R-50b-D8 | 769x769 | 80000 | 6.8 | 1.16 | 78.80 | 80.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50b-d8_769x769_80k_cityscapes/deeplabv3_r50b-d8_769x769_80k_cityscapes_20201225_155404-87fb0cf4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50b-d8_769x769_80k_cityscapes/deeplabv3_r50b-d8_769x769_80k_cityscapes-20201225_155404.log.json) | +| DeepLabV3 | R-101b-D8| 769x769 | 80000 | 10.7 | 0.82 | 79.41 | 80.73 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101b-d8_769x769_80k_cityscapes/deeplabv3_r101b-d8_769x769_80k_cityscapes_20201226_190843-9142ee57.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101b-d8_769x769_80k_cityscapes/deeplabv3_r101b-d8_769x769_80k_cityscapes-20201226_190843.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|-----------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DeepLabV3 | R-50-D8 | 512x512 | 80000 | 8.9 | 14.76 | 42.42 | 43.28 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_80k_ade20k/deeplabv3_r50-d8_512x512_80k_ade20k_20200614_185028-0bb3f844.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_80k_ade20k/deeplabv3_r50-d8_512x512_80k_ade20k_20200614_185028.log.json) | +| DeepLabV3 | R-101-D8 | 512x512 | 80000 | 12.4 | 10.14 | 44.08 | 45.19 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_80k_ade20k/deeplabv3_r101-d8_512x512_80k_ade20k_20200615_021256-d89c7fa4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_80k_ade20k/deeplabv3_r101-d8_512x512_80k_ade20k_20200615_021256.log.json) | +| DeepLabV3 | R-50-D8 | 512x512 | 160000 | - | - | 42.66 | 44.09 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_160k_ade20k/deeplabv3_r50-d8_512x512_160k_ade20k_20200615_123227-5d0ee427.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_160k_ade20k/deeplabv3_r50-d8_512x512_160k_ade20k_20200615_123227.log.json) | +| 
DeepLabV3 | R-101-D8 | 512x512 | 160000 | - | - | 45.00 | 46.66 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_160k_ade20k/deeplabv3_r101-d8_512x512_160k_ade20k_20200615_105816-b1f72b3b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_160k_ade20k/deeplabv3_r101-d8_512x512_160k_ade20k_20200615_105816.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|-----------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DeepLabV3 | R-50-D8 | 512x512 | 20000 | 6.1 | 13.88 | 76.17 | 77.42 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_20k_voc12aug/deeplabv3_r50-d8_512x512_20k_voc12aug_20200617_010906-596905ef.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_20k_voc12aug/deeplabv3_r50-d8_512x512_20k_voc12aug_20200617_010906.log.json) | +| DeepLabV3 | R-101-D8 | 512x512 | 20000 | 9.6 | 9.81 | 78.70 | 79.95 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_20k_voc12aug/deeplabv3_r101-d8_512x512_20k_voc12aug_20200617_010932-8d13832f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_20k_voc12aug/deeplabv3_r101-d8_512x512_20k_voc12aug_20200617_010932.log.json) | +| DeepLabV3 | R-50-D8 | 512x512 | 40000 | - | - | 77.68 | 78.78 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_40k_voc12aug/deeplabv3_r50-d8_512x512_40k_voc12aug_20200613_161546-2ae96e7e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r50-d8_512x512_40k_voc12aug/deeplabv3_r50-d8_512x512_40k_voc12aug_20200613_161546.log.json) | +| DeepLabV3 | R-101-D8 | 512x512 | 40000 | - | - | 77.92 | 79.18 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_40k_voc12aug/deeplabv3_r101-d8_512x512_40k_voc12aug_20200613_161432-0017d784.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_512x512_40k_voc12aug/deeplabv3_r101-d8_512x512_40k_voc12aug_20200613_161432.log.json) | + +### Pascal Context + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|-----------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DeepLabV3 | R-101-D8 | 480x480 | 40000 | 9.2 | 7.09 | 46.55 | 47.81 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_480x480_40k_pascal_context/deeplabv3_r101-d8_480x480_40k_pascal_context_20200911_204118-1aa27336.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_480x480_40k_pascal_context/deeplabv3_r101-d8_480x480_40k_pascal_context-20200911_204118.log.json) | +| DeepLabV3 | R-101-D8 | 480x480 | 80000 | - | - | 46.42 | 47.53 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_480x480_80k_pascal_context/deeplabv3_r101-d8_480x480_80k_pascal_context_20200911_170155-2a21fff3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3/deeplabv3_r101-d8_480x480_80k_pascal_context/deeplabv3_r101-d8_480x480_80k_pascal_context-20200911_170155.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f20f260e23a95dfee9dfdceef9badab992246f53 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_40k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3_r50-d8_512x1024_40k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet101_v1c', + backbone=dict( + depth=101, + dilations=(1, 1, 1, 2), + strides=(1, 2, 2, 1), + multi_grid=(1, 2, 4)), + decode_head=dict( + dilations=(1, 6, 12, 18), + sampler=dict(type='OHEMPixelSampler', min_kept=100000))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..de4a8a5e9f030f1e8a8802596885186163f23eed --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d16-mg124_512x1024_80k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet101_v1c', + backbone=dict( + depth=101, + dilations=(1, 1, 1, 2), + strides=(1, 2, 2, 1), + multi_grid=(1, 2, 4)), + decode_head=dict( + dilations=(1, 6, 12, 18), + sampler=dict(type='OHEMPixelSampler', min_kept=100000))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..0b5256f7b7b053cbe8d9e4ca2ec6139bb02387f6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_480x480_40k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_480x480_40k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..001b7a69c15299fc1fe5b269a5accf92c5ece032 --- /dev/null 
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_480x480_80k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_480x480_80k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..8c707c79d659bc544d242352bcb29686eb40b004 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..6804a5781369d1031f179d421a3b5a160fd575d3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..df6f36ef7c3b71ba7979aa7a1b226b3e3ebd9bb4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..40f5f62373e59d1c6c01ca3f57777698461127c9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..fb2be22f8bc2e10cdfba4f58b2ad1ced913b4ea4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..796ba3fb142394c4d93a29ba57548dca59d8d02b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e6d58a67b3b4dddf3da42efca30fa599e623f183 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..13094a98ee9be3cf8c88370e1e111cb4dde03ec4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5186bf614bc9ebffe47323ea61afbc9604be265b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = './deeplabv3_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d185db95adc61734f11f0dcd7b6c45aa652680b0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r101b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = './deeplabv3_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18-d8_512x1024_80k_cityscapes.py new file mode 100644 index 
0000000000000000000000000000000000000000..e084e95c70b0b7b0c9dcc3388d6b7d3d51d54b6d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './deeplabv3_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a990c076536ad9455a9203f5b6a60157f2f2f99f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './deeplabv3_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..b25e725ed98324e6ea648567740dc67e0413b4f9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './deeplabv3_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..fd920f0ca7c690d3d1c44f5f7be1cbea18fa14d4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r18b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './deeplabv3_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..9d493ef527bb161be98d0e4ea433104b3bb9ff48 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_480x480_40k_pascal_context.py @@ -0,0 +1,10 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', + '../_base_/datasets/pascal_context.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=60), + auxiliary_head=dict(num_classes=60), + 
test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..71a0fda48aa2538e4d913e73e94a71564377ea50 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_480x480_80k_pascal_context.py @@ -0,0 +1,10 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', + '../_base_/datasets/pascal_context.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=60), + auxiliary_head=dict(num_classes=60), + test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..8e7420d24a20b662286266cac58cab4721dc8df3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..132787db98d3fc9df5ed62e31738c82da8c279bf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..b4a9d4e1b9123b3c965cd430237ce9fcc7018a11 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..f62da1a8090da389a77d77a9887926af2a7ded49 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..492bd3dfdce331070cb9645dbe55142e9b662da1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..78f4d0d9de3d6b8dd2b097531317956d8e3b19f1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e35d1988f0bb7ad47a73ef1a64b73d9b40e0ba40 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..dd7c16580d0620bc854f2c6eb7c881bdcd23020a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/deeplabv3_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 
513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e742d9a5ec2b6addf829cb802de27ea1afd53301 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..332d9cfb79fb698c7867f0f80053c1fd29bf2c1d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3/deeplabv3_r50b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/README.md new file mode 100644 index 0000000000000000000000000000000000000000..dc02660428fe534605ae3bf9659382c282379a4e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/README.md @@ -0,0 +1,68 @@ +# Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{deeplabv3plus2018, + title={Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation}, + author={Liang-Chieh Chen and Yukun Zhu and George Papandreou and Florian Schroff and Hartwig Adam}, + booktitle={ECCV}, + year={2018} +} +``` + +## Results and models + +Note: +`D-8`/`D-16` here correspond to the output stride 8/16 setting for the DeepLab series. +`MG-124` stands for multi-grid dilation in the last stage of ResNet.
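For concreteness, the backbone override that implements this `D-16`/`MG-124` setting appears in the `deeplabv3plus_r101-d16-mg124_*` configs added later in this diff; a minimal sketch is reproduced below, with inline comments that are an editorial reading of the fields rather than part of the original config:

```python
# Sketch of the D-16 + MG-124 override on top of the R-50-D8 base config
# (mirrors deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes.py in this diff).
_base_ = './deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py'
model = dict(
    pretrained='open-mmlab://resnet101_v1c',
    backbone=dict(
        depth=101,
        dilations=(1, 1, 1, 2),  # dilate only the last ResNet stage
        strides=(1, 2, 2, 1),    # last stage keeps stride 1: stem /4 x 1*2*2*1 -> output stride 16
        multi_grid=(1, 2, 4)),   # MG-124: per-block dilation multipliers within that stage
    decode_head=dict(
        dilations=(1, 6, 12, 18),  # smaller ASPP rates to suit the stride-16 feature map
        sampler=dict(type='OHEMPixelSampler', min_kept=100000)))
```

The `OHEMPixelSampler` entry is part of the same upstream config: it computes the loss only over the hardest pixels, keeping at least `min_kept` of them, and is independent of the output-stride change.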
+ +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DeepLabV3+ | R-50-D8 | 512x1024 | 40000 | 7.5 | 3.94 | 79.61 | 81.01 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes/deeplabv3plus_r50-d8_512x1024_40k_cityscapes_20200605_094610-d222ffcd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes/deeplabv3plus_r50-d8_512x1024_40k_cityscapes_20200605_094610.log.json) | +| DeepLabV3+ | R-101-D8 | 512x1024 | 40000 | 11 | 2.60 | 80.21 | 81.82 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_40k_cityscapes/deeplabv3plus_r101-d8_512x1024_40k_cityscapes_20200605_094614-3769eecf.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_40k_cityscapes/deeplabv3plus_r101-d8_512x1024_40k_cityscapes_20200605_094614.log.json) | +| DeepLabV3+ | R-50-D8 | 769x769 | 40000 | 8.5 | 1.72 | 78.97 | 80.46 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes/deeplabv3plus_r50-d8_769x769_40k_cityscapes_20200606_114143-1dcb0e3c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes/deeplabv3plus_r50-d8_769x769_40k_cityscapes_20200606_114143.log.json) | +| DeepLabV3+ | R-101-D8 | 769x769 | 40000 | 12.5 | 1.15 | 79.46 | 80.50 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_769x769_40k_cityscapes/deeplabv3plus_r101-d8_769x769_40k_cityscapes_20200606_114304-ff414b9e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_769x769_40k_cityscapes/deeplabv3plus_r101-d8_769x769_40k_cityscapes_20200606_114304.log.json) | +| DeepLabV3+ | R-18-D8 | 512x1024 | 80000 | 2.2 | 14.27 | 76.89 | 78.76 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes/deeplabv3plus_r18-d8_512x1024_80k_cityscapes_20201226_080942-cff257fe.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes/deeplabv3plus_r18-d8_512x1024_80k_cityscapes-20201226_080942.log.json) | +| DeepLabV3+ | R-50-D8 | 512x1024 | 80000 | - | - | 80.09 | 81.13 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes/deeplabv3plus_r50-d8_512x1024_80k_cityscapes_20200606_114049-f9fb496d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes/deeplabv3plus_r50-d8_512x1024_80k_cityscapes_20200606_114049.log.json) | +| DeepLabV3+ | R-101-D8 | 512x1024 | 80000 | - | - | 80.97 | 82.03 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes/deeplabv3plus_r101-d8_512x1024_80k_cityscapes_20200606_114143-068fcfe9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes/deeplabv3plus_r101-d8_512x1024_80k_cityscapes_20200606_114143.log.json) | +| DeepLabV3+ | R-18-D8 | 769x769 | 80000 | 2.5 | 5.74 | 76.26 | 77.91 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes/deeplabv3plus_r18-d8_769x769_80k_cityscapes_20201226_083346-f326e06a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes/deeplabv3plus_r18-d8_769x769_80k_cityscapes-20201226_083346.log.json) | +| DeepLabV3+ | R-50-D8 | 769x769 | 80000 | - | - | 79.83 | 81.48 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_769x769_80k_cityscapes/deeplabv3plus_r50-d8_769x769_80k_cityscapes_20200606_210233-0e9dfdc4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_769x769_80k_cityscapes/deeplabv3plus_r50-d8_769x769_80k_cityscapes_20200606_210233.log.json) | +| DeepLabV3+ | R-101-D8 | 769x769 | 80000 | - | - | 80.98 | 82.18 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_769x769_80k_cityscapes/deeplabv3plus_r101-d8_769x769_80k_cityscapes_20200607_000405-a7573d20.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_769x769_80k_cityscapes/deeplabv3plus_r101-d8_769x769_80k_cityscapes_20200607_000405.log.json) | +| DeepLabV3+ | R-101-D16-MG124 | 512x1024 | 40000 | 5.8 | 7.48 | 79.09 | 80.36 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes/deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes_20200908_005644-cf9ce186.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes/deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes-20200908_005644.log.json) | +| DeepLabV3+ | R-101-D16-MG124 | 512x1024 | 80000 | 9.9 | - | 79.90 | 81.33 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes/deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes_20200908_005644-ee6158e0.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes/deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes-20200908_005644.log.json) | +| DeepLabV3+ | R-18b-D8 | 512x1024 | 80000 | 2.1 | 14.95 | 75.87 | 77.52 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes_20201226_090828-e451abd9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes-20201226_090828.log.json) | +| DeepLabV3+ | R-50b-D8 | 512x1024 | 80000 | 7.4 | 3.94 | 80.28 | 81.44 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50b-d8_512x1024_80k_cityscapes/deeplabv3plus_r50b-d8_512x1024_80k_cityscapes_20201225_213645-a97e4e43.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50b-d8_512x1024_80k_cityscapes/deeplabv3plus_r50b-d8_512x1024_80k_cityscapes-20201225_213645.log.json) | +| DeepLabV3+ | R-101b-D8| 512x1024 | 80000 | 10.9 | 2.60 | 80.16 | 81.41 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101b-d8_512x1024_80k_cityscapes/deeplabv3plus_r101b-d8_512x1024_80k_cityscapes_20201226_190843-9c3c93a4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101b-d8_512x1024_80k_cityscapes/deeplabv3plus_r101b-d8_512x1024_80k_cityscapes-20201226_190843.log.json) | +| DeepLabV3+ | R-18b-D8 | 769x769 | 80000 | 2.4 | 5.96 | 76.36 | 78.24 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes/deeplabv3plus_r18b-d8_769x769_80k_cityscapes_20201226_151312-2c868aff.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes/deeplabv3plus_r18b-d8_769x769_80k_cityscapes-20201226_151312.log.json) | +| DeepLabV3+ | R-50b-D8 | 769x769 | 80000 | 8.4 | 1.72 | 79.41 | 80.56 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50b-d8_769x769_80k_cityscapes/deeplabv3plus_r50b-d8_769x769_80k_cityscapes_20201225_224655-8b596d1c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50b-d8_769x769_80k_cityscapes/deeplabv3plus_r50b-d8_769x769_80k_cityscapes-20201225_224655.log.json) | +| DeepLabV3+ | R-101b-D8| 769x769 | 80000 | 12.3 | 1.10 | 79.88 | 81.46 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101b-d8_769x769_80k_cityscapes/deeplabv3plus_r101b-d8_769x769_80k_cityscapes_20201226_205041-227cdf7c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101b-d8_769x769_80k_cityscapes/deeplabv3plus_r101b-d8_769x769_80k_cityscapes-20201226_205041.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DeepLabV3+ | R-50-D8 | 512x512 | 80000 | 10.6 | 21.01 | 42.72 | 43.75 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_80k_ade20k/deeplabv3plus_r50-d8_512x512_80k_ade20k_20200614_185028-bf1400d8.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_80k_ade20k/deeplabv3plus_r50-d8_512x512_80k_ade20k_20200614_185028.log.json) | +| DeepLabV3+ | R-101-D8 | 512x512 | 80000 | 14.1 | 14.16 | 44.60 | 46.06 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_80k_ade20k/deeplabv3plus_r101-d8_512x512_80k_ade20k_20200615_014139-d5730af7.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_80k_ade20k/deeplabv3plus_r101-d8_512x512_80k_ade20k_20200615_014139.log.json) | +| DeepLabV3+ | R-50-D8 | 512x512 | 160000 | - | - | 43.95 | 44.93 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_160k_ade20k/deeplabv3plus_r50-d8_512x512_160k_ade20k_20200615_124504-6135c7e0.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_160k_ade20k/deeplabv3plus_r50-d8_512x512_160k_ade20k_20200615_124504.log.json) | +| DeepLabV3+ | R-101-D8 | 512x512 | 160000 | - | - | 45.47 | 46.35 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_160k_ade20k/deeplabv3plus_r101-d8_512x512_160k_ade20k_20200615_123232-38ed86bb.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_160k_ade20k/deeplabv3plus_r101-d8_512x512_160k_ade20k_20200615_123232.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DeepLabV3+ | R-50-D8 | 512x512 | 20000 | 7.6 | 21 | 75.93 | 77.50 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_20k_voc12aug/deeplabv3plus_r50-d8_512x512_20k_voc12aug_20200617_102323-aad58ef1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_20k_voc12aug/deeplabv3plus_r50-d8_512x512_20k_voc12aug_20200617_102323.log.json) | +| DeepLabV3+ | R-101-D8 | 512x512 | 20000 | 11 | 13.88 | 77.22 | 78.59 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_20k_voc12aug/deeplabv3plus_r101-d8_512x512_20k_voc12aug_20200617_102345-c7ff3d56.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_20k_voc12aug/deeplabv3plus_r101-d8_512x512_20k_voc12aug_20200617_102345.log.json) | +| DeepLabV3+ | R-50-D8 | 512x512 | 40000 | - | - | 76.81 | 77.57 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_40k_voc12aug/deeplabv3plus_r50-d8_512x512_40k_voc12aug_20200613_161759-e1b43aa9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r50-d8_512x512_40k_voc12aug/deeplabv3plus_r50-d8_512x512_40k_voc12aug_20200613_161759.log.json) | +| DeepLabV3+ | R-101-D8 | 512x512 | 40000 | - | - | 78.62 | 79.53 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_40k_voc12aug/deeplabv3plus_r101-d8_512x512_40k_voc12aug_20200613_205333-faf03387.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_512x512_40k_voc12aug/deeplabv3plus_r101-d8_512x512_40k_voc12aug_20200613_205333.log.json) | + +### Pascal Context + +| Method | Backbone | 
Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DeepLabV3+ | R-101-D8 | 480x480 | 40000 | - | 9.09 | 47.30 | 48.47 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_480x480_40k_pascal_context/deeplabv3plus_r101-d8_480x480_40k_pascal_context_20200911_165459-d3c8a29e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_480x480_40k_pascal_context/deeplabv3plus_r101-d8_480x480_40k_pascal_context-20200911_165459.log.json) | +| DeepLabV3+ | R-101-D8 | 480x480 | 80000 | - | - | 47.23 | 48.26 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_480x480_80k_pascal_context/deeplabv3plus_r101-d8_480x480_80k_pascal_context_20200911_155322-145d3ee8.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/deeplabv3plus/deeplabv3plus_r101-d8_480x480_80k_pascal_context/deeplabv3plus_r101-d8_480x480_80k_pascal_context-20200911_155322.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..bf39d2f12b719b1c91e38bef71f0f5232543b0dc --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_40k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet101_v1c', + backbone=dict( + depth=101, + dilations=(1, 1, 1, 2), + strides=(1, 2, 2, 1), + multi_grid=(1, 2, 4)), + decode_head=dict( + dilations=(1, 6, 12, 18), + sampler=dict(type='OHEMPixelSampler', min_kept=100000))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..c53ec41baf9043029549b4893b2380372ea5ecd9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d16-mg124_512x1024_80k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet101_v1c', + backbone=dict( + depth=101, + dilations=(1, 1, 1, 2), + strides=(1, 2, 2, 1), + multi_grid=(1, 2, 4)), + decode_head=dict( + dilations=(1, 6, 12, 18), + sampler=dict(type='OHEMPixelSampler', min_kept=100000))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_480x480_40k_pascal_context.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..68e2b072e4b8d076e8c3e929dfdc73bcd24ce859 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_480x480_40k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_480x480_40k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..3a46c28608add5325ec1decf33624c3c00bff1d7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_480x480_80k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_480x480_80k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d6ce85aea5a960e76f8154a5319c7c52e98c4c45 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0ebbd3c70ee5e33c6ef4ae76b6c6a6ce828d07b4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..a75c9d3019b13d01c0dd13dae53bce3d15791d52 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 
0000000000000000000000000000000000000000..ebb1a8eaee16de7443ab3e79e02a37340de511d7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..3caa6cf8ae61d467628378d99a919c9db1253b91 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..53fd3a909585367ca59eb827c2fbbab4cdf234ea --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..c3c92eb26f8fead94f5ad7ac7d7fb60d92c57114 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5ea9cdb5b639e5284cd46e02ce1b67b4729950f7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..398d9759cafc1d01e78c138abd249808531a97b9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ 
+_base_ = './deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..136449083f7a9efbad6df94f1acd04170147aaba --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r101b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = './deeplabv3plus_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..aff70c93e6142ddda3a874d9dfd57ec6c4cd89b3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + c1_in_channels=64, + c1_channels=12, + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0172d9a87d6dc1c75bf75a9c48363eb985d389a8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18-d8_769x769_80k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3plus_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + c1_in_channels=64, + c1_channels=12, + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..b90b292b03a80aa37b8ca236746cf7cddc4ac27e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + c1_in_channels=64, + c1_channels=12, + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..b49da3581d9697e726e114b1564fc58a55ef1099 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r18b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = './deeplabv3plus_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + c1_in_channels=64, + c1_channels=12, + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..318845de1e2124a4dff3348749ec5a13d78d686f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_480x480_40k_pascal_context.py @@ -0,0 +1,10 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/pascal_context.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=60), + auxiliary_head=dict(num_classes=60), + test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..1736c2397a9b2a4b4fb12eee8175e5ee98eaf805 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_480x480_80k_pascal_context.py @@ -0,0 +1,10 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/pascal_context.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=60), + auxiliary_head=dict(num_classes=60), + test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..7243d0390f6394fdd528c881bb128b2c13d08037 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/cityscapes.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py new file 
mode 100644 index 0000000000000000000000000000000000000000..3304d3677f5357f1a3e343b39fcd97b238abdb5e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/cityscapes.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..1491e3b8247c9d163d6016caf2fcd8043a053b7e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..1056ad4d1e2a4f956d12f6daf506620fab27dd17 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..e36c83ba601884b81c06ee69445a94e76224c828 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..352d870bc8eab11974640c4b2d9c80dc6fbbaaf2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e4bda3eded693bfd44a8c86ced7ae6ee9963c583 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1420b97a4bd0dc0f5451623697666012a2de635c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/deeplabv3plus_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..dd8e1da9c7b1d86bc8a0c834bbede9d0fd40acf5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..c0ba019136c2e4f33b015be3d82505bee2066655 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/deeplabv3plus/deeplabv3plus_r50b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './deeplabv3plus_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..9b12c8d862fb7b7633c5b2f4a1c357803abdcd32 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/README.md @@ -0,0 +1,39 @@ +# Dynamic Multi-scale Filters for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex 
+@InProceedings{He_2019_ICCV, +author = {He, Junjun and Deng, Zhongying and Qiao, Yu}, +title = {Dynamic Multi-Scale Filters for Semantic Segmentation}, +booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)}, +month = {October}, +year = {2019} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DMNet | R-50-D8 | 512x1024 | 40000 | 7.0 | 3.66 | 77.78 | 79.14 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x1024_40k_cityscapes/dmnet_r50-d8_512x1024_40k_cityscapes_20201214_115717-5e88fa33.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x1024_40k_cityscapes/dmnet_r50-d8_512x1024_40k_cityscapes-20201214_115717.log.json) | +| DMNet | R-101-D8 | 512x1024 | 40000 | 10.6 | 2.54 | 78.37 | 79.72 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x1024_40k_cityscapes/dmnet_r101-d8_512x1024_40k_cityscapes_20201214_115716-abc9d111.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x1024_40k_cityscapes/dmnet_r101-d8_512x1024_40k_cityscapes-20201214_115716.log.json) | +| DMNet | R-50-D8 | 769x769 | 40000 | 7.9 | 1.57 | 78.49 | 80.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_769x769_40k_cityscapes/dmnet_r50-d8_769x769_40k_cityscapes_20201214_115717-2a2628d7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_769x769_40k_cityscapes/dmnet_r50-d8_769x769_40k_cityscapes-20201214_115717.log.json) | +| DMNet | R-101-D8 | 769x769 | 40000 | 12.0 | 1.01 | 77.62 | 78.94 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_769x769_40k_cityscapes/dmnet_r101-d8_769x769_40k_cityscapes_20201214_115718-b650de90.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_769x769_40k_cityscapes/dmnet_r101-d8_769x769_40k_cityscapes-20201214_115718.log.json) | +| DMNet | R-50-D8 | 512x1024 | 80000 | - | - | 79.07 | 80.22 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x1024_80k_cityscapes/dmnet_r50-d8_512x1024_80k_cityscapes_20201214_115716-987f51e3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x1024_80k_cityscapes/dmnet_r50-d8_512x1024_80k_cityscapes-20201214_115716.log.json) | +| DMNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.64 | 80.67 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x1024_80k_cityscapes/dmnet_r101-d8_512x1024_80k_cityscapes_20201214_115705-b1ff208a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x1024_80k_cityscapes/dmnet_r101-d8_512x1024_80k_cityscapes-20201214_115705.log.json) | +| DMNet | R-50-D8 | 769x769 | 80000 | - | - | 79.22 | 80.55 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_769x769_80k_cityscapes/dmnet_r50-d8_769x769_80k_cityscapes_20201214_115718-7ea9fa12.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_769x769_80k_cityscapes/dmnet_r50-d8_769x769_80k_cityscapes-20201214_115718.log.json) | +| DMNet | R-101-D8 | 769x769 | 80000 | - | - | 79.19 | 80.65 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_769x769_80k_cityscapes/dmnet_r101-d8_769x769_80k_cityscapes_20201214_115716-a7fbc2ab.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_769x769_80k_cityscapes/dmnet_r101-d8_769x769_80k_cityscapes-20201214_115716.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DMNet | R-50-D8 | 512x512 | 80000 | 9.4 | 20.95 | 42.37 | 43.62 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x512_80k_ade20k/dmnet_r50-d8_512x512_80k_ade20k_20201214_115705-a8626293.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x512_80k_ade20k/dmnet_r50-d8_512x512_80k_ade20k-20201214_115705.log.json) | +| DMNet | R-101-D8 | 512x512 | 80000 | 13.0 | 13.88 | 45.34 | 46.13 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x512_80k_ade20k/dmnet_r101-d8_512x512_80k_ade20k_20201214_115704-c656c3fb.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x512_80k_ade20k/dmnet_r101-d8_512x512_80k_ade20k-20201214_115704.log.json) | +| DMNet | R-50-D8 | 512x512 | 160000 | - | - | 43.15 | 44.17 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x512_160k_ade20k/dmnet_r50-d8_512x512_160k_ade20k_20201214_115706-25fb92c2.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r50-d8_512x512_160k_ade20k/dmnet_r50-d8_512x512_160k_ade20k-20201214_115706.log.json) | +| DMNet | R-101-D8 | 512x512 | 160000 | - | - | 45.42 | 46.76 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x512_160k_ade20k/dmnet_r101-d8_512x512_160k_ade20k_20201214_115705-73f9a8d7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dmnet/dmnet_r101-d8_512x512_160k_ade20k/dmnet_r101-d8_512x512_160k_ade20k-20201214_115705.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..fd6897691d3f8f200783fae7bfe231735f25a11b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dmnet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..116cbdcede32bf24ad95f04291e98754011172c9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dmnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..d78d46c040f75d16225307d4b4151b87e6e3db29 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './dmnet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..9713b731a47df9c5e23d26a08ad17d03a0d5e9fe --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './dmnet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..6b222e730073dd42df618db5660ee9d4117f3956 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dmnet_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f36d490e9c9b31de7eedf735d2712e55f35db998 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dmnet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1f9a917fa4223bd2428f2b2d10eac446f7ecc71a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/dmnet_r50-d8.py', 
'../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1b38f90dc4318f23d32971e7afbf90a327768f2d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/dmnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..a8fbd9beb11f3d1308ce2cd12da2a177c2d39478 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/dmnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..74f6d6a85a06e96580a3c8d5843f660c85bca5ad --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/dmnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..19841547a42315164de547a4121cfd64739cf24b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/dmnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..31d95f96eb10025c2ad054cde4c81f47db21f0f2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dmnet/dmnet_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/dmnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', 
'../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..172dfe1a0f07646c5f8cc47ddd62f5cb6da85a55 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/README.md @@ -0,0 +1,42 @@ +# Disentangled Non-Local Neural Networks + +## Introduction + +[ALGORITHM] + +This example reproduces ["Disentangled Non-Local Neural Networks"](https://arxiv.org/abs/2006.06668) for semantic segmentation and is still a work in progress. + +## Citation + +```latex +@inproceedings{yin2020disentangled, + title={Disentangled Non-Local Neural Networks}, + author={Minghao Yin and Zhuliang Yao and Yue Cao and Xiu Li and Zheng Zhang and Stephen Lin and Han Hu}, + year={2020}, + booktitle={ECCV} +} +``` + +## Results and models (in progress) + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|---------:|----------------|------:|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| dnl | R-50-D8 | 512x1024 | 40000 | 7.3 | 2.56 | 78.61 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x1024_40k_cityscapes/dnl_r50-d8_512x1024_40k_cityscapes_20200904_233629-53d4ea93.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x1024_40k_cityscapes/dnl_r50-d8_512x1024_40k_cityscapes-20200904_233629.log.json) | +| dnl | R-101-D8 | 512x1024 | 40000 | 10.9 | 1.96 | 78.31 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x1024_40k_cityscapes/dnl_r101-d8_512x1024_40k_cityscapes_20200904_233629-9928ffef.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x1024_40k_cityscapes/dnl_r101-d8_512x1024_40k_cityscapes-20200904_233629.log.json) | +| dnl | R-50-D8 | 769x769 | 40000 | 9.2 | 1.50 | 78.44 | 80.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_769x769_40k_cityscapes/dnl_r50-d8_769x769_40k_cityscapes_20200820_232206-0f283785.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_769x769_40k_cityscapes/dnl_r50-d8_769x769_40k_cityscapes-20200820_232206.log.json) | +| dnl | R-101-D8 | 769x769 | 40000 | 12.6 | 1.02 | 76.39 | 77.77 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_769x769_40k_cityscapes/dnl_r101-d8_769x769_40k_cityscapes_20200820_171256-76c596df.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_769x769_40k_cityscapes/dnl_r101-d8_769x769_40k_cityscapes-20200820_171256.log.json) | +| dnl | R-50-D8 | 512x1024 | 80000 | - | - | 79.33 | - | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x1024_80k_cityscapes/dnl_r50-d8_512x1024_80k_cityscapes_20200904_233629-58b2f778.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x1024_80k_cityscapes/dnl_r50-d8_512x1024_80k_cityscapes-20200904_233629.log.json) | +| dnl | R-101-D8 | 512x1024 | 80000 | - | - | 80.41 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x1024_80k_cityscapes/dnl_r101-d8_512x1024_80k_cityscapes_20200904_233629-758e2dd4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x1024_80k_cityscapes/dnl_r101-d8_512x1024_80k_cityscapes-20200904_233629.log.json) | +| dnl | R-50-D8 | 769x769 | 80000 | - | - | 79.36 | 80.70 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_769x769_80k_cityscapes/dnl_r50-d8_769x769_80k_cityscapes_20200820_011925-366bc4c7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_769x769_80k_cityscapes/dnl_r50-d8_769x769_80k_cityscapes-20200820_011925.log.json) | +| dnl | R-101-D8 | 769x769 | 80000 | - | - | 79.41 | 80.68 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_769x769_80k_cityscapes/dnl_r101-d8_769x769_80k_cityscapes_20200821_051111-95ff84ab.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_769x769_80k_cityscapes/dnl_r101-d8_769x769_80k_cityscapes-20200821_051111.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|---------:|----------------|------:|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| DNL | R-50-D8 | 512x512 | 80000 | 8.8 | 20.66 | 41.76 | 42.99 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x512_80k_ade20k/dnl_r50-d8_512x512_80k_ade20k_20200826_183354-1cf6e0c1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x512_80k_ade20k/dnl_r50-d8_512x512_80k_ade20k-20200826_183354.log.json) | +| DNL | R-101-D8 | 512x512 | 80000 | 12.8 | 12.54 | 43.76 | 44.91 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x512_80k_ade20k/dnl_r101-d8_512x512_80k_ade20k_20200826_183354-d820d6ea.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x512_80k_ade20k/dnl_r101-d8_512x512_80k_ade20k-20200826_183354.log.json) | +| DNL | R-50-D8 | 512x512 | 160000 | - | - | 41.87 | 43.01 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x512_160k_ade20k/dnl_r50-d8_512x512_160k_ade20k_20200826_183350-37837798.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r50-d8_512x512_160k_ade20k/dnl_r50-d8_512x512_160k_ade20k-20200826_183350.log.json) | +| DNL | R-101-D8 | 512x512 | 160000 | - | - | 44.25 | 45.78 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x512_160k_ade20k/dnl_r101-d8_512x512_160k_ade20k_20200826_183350-ed522c61.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/dnlnet/dnl_r101-d8_512x512_160k_ade20k/dnl_r101-d8_512x512_160k_ade20k-20200826_183350.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1a36e3c80a13f91e37e4d90b7ae47c7e0d204144 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dnl_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0f2e1b6da7e63841f4429b1caed5fbe9d537c4f8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dnl_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..aca44e478b67d5a226681c099e64fe67d93cf39b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './dnl_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..ebd27a1d1c6bf0e983fafed2e5659701dadb8f24 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './dnl_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..575e9d01343a4563e0d3ba89b361ea8e358d2dee --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dnl_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..4f1b9e19411eb963d16fd2a8174529e69ecd5a1a --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './dnl_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f7aa7444d4c8022563db642478beec4dc5ab0dab --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/dnl_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..fdff93f543af6bac93949e68532daea45e437167 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/dnl_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..5305689d09b944f6e37aa85567ce3f29fc6974a7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/dnl_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..09604c39729abfc9015eb971069b987c8d8a82cb --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/dnl_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0666199b63e604b09fe8187c378589c25d0d311b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/dnl_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + 
'../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f7b07c4f47629c07faa013b9d1eae3462d898c6f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/dnlnet/dnl_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,12 @@ +_base_ = [ + '../_base_/models/dnl_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) +optimizer = dict( + paramwise_cfg=dict( + custom_keys=dict(theta=dict(wd_mult=0.), phi=dict(wd_mult=0.)))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..40df946ed446cf862847dad084324412e5aa52ee --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/README.md @@ -0,0 +1,26 @@ +# Expectation-Maximization Attention Networks for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{li2019expectation, + title={Expectation-maximization attention networks for semantic segmentation}, + author={Li, Xia and Zhong, Zhisheng and Wu, Jianlong and Yang, Yibo and Lin, Zhouchen and Liu, Hong}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision}, + pages={9167--9176}, + year={2019} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|---------:|----------------|------:|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| EMANet | R-50-D8 | 512x1024 | 80000 | 5.4 | 4.58 | 77.59 | 79.44 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_512x1024_80k_cityscapes/emanet_r50-d8_512x1024_80k_cityscapes_20200901_100301-c43fcef1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_512x1024_80k_cityscapes/emanet_r50-d8_512x1024_80k_cityscapes-20200901_100301.log.json) | +| EMANet | R-101-D8 | 512x1024 | 80000 | 6.2 | 2.87 | 79.10 | 81.21 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_512x1024_80k_cityscapes/emanet_r101-d8_512x1024_80k_cityscapes_20200901_100301-2d970745.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_512x1024_80k_cityscapes/emanet_r101-d8_512x1024_80k_cityscapes-20200901_100301.log.json) | +| EMANet | R-50-D8 | 769x769 | 80000 | 8.9 | 1.97 | 79.33 | 80.49 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_769x769_80k_cityscapes/emanet_r50-d8_769x769_80k_cityscapes_20200901_100301-16f8de52.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r50-d8_769x769_80k_cityscapes/emanet_r50-d8_769x769_80k_cityscapes-20200901_100301.log.json) | +| EMANet | R-101-D8 | 769x769 | 80000 | 10.1 | 1.22 | 79.62 | 81.00 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_769x769_80k_cityscapes/emanet_r101-d8_769x769_80k_cityscapes_20200901_100301-47a324ce.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/emanet/emanet_r101-d8_769x769_80k_cityscapes/emanet_r101-d8_769x769_80k_cityscapes-20200901_100301.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..58f28b43f55f54c7a604960735963e6b7c13b6f1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './emanet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..c5dbf20b0fcc7bc1dd077bd8b7077772251d4c1a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './emanet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..73b7788bf924be2e1588596a88f0155ddc37358e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/emanet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..699aa212c3518901b2f84db3f062c16b023c7538 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/emanet/emanet_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/emanet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/README.md 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..6ba42f69fae5e52254b34195b6cd0ed689c5bf6c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/README.md @@ -0,0 +1,39 @@ +# Context Encoding for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@InProceedings{Zhang_2018_CVPR, +author = {Zhang, Hang and Dana, Kristin and Shi, Jianping and Zhang, Zhongyue and Wang, Xiaogang and Tyagi, Ambrish and Agrawal, Amit}, +title = {Context Encoding for Semantic Segmentation}, +booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)}, +month = {June}, +year = {2018} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| encnet | R-50-D8 | 512x1024 | 40000 | 8.6 | 4.58 | 75.67 | 77.08 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x1024_40k_cityscapes/encnet_r50-d8_512x1024_40k_cityscapes_20200621_220958-68638a47.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x1024_40k_cityscapes/encnet_r50-d8_512x1024_40k_cityscapes-20200621_220958.log.json) | +| encnet | R-101-D8 | 512x1024 | 40000 | 12.1 | 2.66 | 75.81 | 77.21 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x1024_40k_cityscapes/encnet_r101-d8_512x1024_40k_cityscapes_20200621_220933-35e0a3e8.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x1024_40k_cityscapes/encnet_r101-d8_512x1024_40k_cityscapes-20200621_220933.log.json) | +| encnet | R-50-D8 | 769x769 | 40000 | 9.8 | 1.82 | 76.24 | 77.85 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_769x769_40k_cityscapes/encnet_r50-d8_769x769_40k_cityscapes_20200621_220958-3bcd2884.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_769x769_40k_cityscapes/encnet_r50-d8_769x769_40k_cityscapes-20200621_220958.log.json) | +| encnet | R-101-D8 | 769x769 | 40000 | 13.7 | 1.26 | 74.25 | 76.25 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_769x769_40k_cityscapes/encnet_r101-d8_769x769_40k_cityscapes_20200621_220933-2fafed55.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_769x769_40k_cityscapes/encnet_r101-d8_769x769_40k_cityscapes-20200621_220933.log.json) | +| encnet | R-50-D8 | 512x1024 | 80000 | - | - | 77.94 | 79.13 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x1024_80k_cityscapes/encnet_r50-d8_512x1024_80k_cityscapes_20200622_003554-fc5c5624.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x1024_80k_cityscapes/encnet_r50-d8_512x1024_80k_cityscapes-20200622_003554.log.json) | +| encnet | R-101-D8 | 512x1024 | 80000 | - | - | 78.55 | 79.47 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x1024_80k_cityscapes/encnet_r101-d8_512x1024_80k_cityscapes_20200622_003555-1de64bec.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x1024_80k_cityscapes/encnet_r101-d8_512x1024_80k_cityscapes-20200622_003555.log.json) | +| encnet | R-50-D8 | 769x769 | 80000 | - | - | 77.44 | 78.72 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_769x769_80k_cityscapes/encnet_r50-d8_769x769_80k_cityscapes_20200622_003554-55096dcb.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_769x769_80k_cityscapes/encnet_r50-d8_769x769_80k_cityscapes-20200622_003554.log.json) | +| encnet | R-101-D8 | 769x769 | 80000 | - | - | 76.10 | 76.97 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_769x769_80k_cityscapes/encnet_r101-d8_769x769_80k_cityscapes_20200622_003555-470ef79d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_769x769_80k_cityscapes/encnet_r101-d8_769x769_80k_cityscapes-20200622_003555.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| encnet | R-50-D8 | 512x512 | 80000 | 10.1 | 22.81 | 39.53 | 41.17 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x512_80k_ade20k/encnet_r50-d8_512x512_80k_ade20k_20200622_042412-44b46b04.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x512_80k_ade20k/encnet_r50-d8_512x512_80k_ade20k-20200622_042412.log.json) | +| encnet | R-101-D8 | 512x512 | 80000 | 13.6 | 14.87 | 42.11 | 43.61 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x512_80k_ade20k/encnet_r101-d8_512x512_80k_ade20k_20200622_101128-dd35e237.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x512_80k_ade20k/encnet_r101-d8_512x512_80k_ade20k-20200622_101128.log.json) | +| encnet | R-50-D8 | 512x512 | 160000 | - | - | 40.10 | 41.71 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x512_160k_ade20k/encnet_r50-d8_512x512_160k_ade20k_20200622_101059-b2db95e0.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r50-d8_512x512_160k_ade20k/encnet_r50-d8_512x512_160k_ade20k-20200622_101059.log.json) | +| encnet | R-101-D8 | 512x512 | 160000 | - | - | 42.61 | 44.01 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x512_160k_ade20k/encnet_r101-d8_512x512_160k_ade20k_20200622_073348-7989641f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/encnet/encnet_r101-d8_512x512_160k_ade20k/encnet_r101-d8_512x512_160k_ade20k-20200622_073348.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x1024_40k_cityscapes.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f34373d9ebab5ef6f4c01e3eab8a97c288495be0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0b0207b3144460d25229e3ac4c4d0d9fc1d34292 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..8fec6ba255f33d48a66a831de4571346a7a2bd2e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..c264af998b5ef6a9e521db204205fb998cce68a9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..8a6968ea583758191fa8e94497c7186e653c7afb --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..94151004ea88394373cf8f135b065d5056b11179 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff 
--git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d6ade67b76ce04e1ede3ff99aab4863705cff446 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..55648c08b2c4eb78d7d5ae65482e5e5b291c058a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './encnet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..4ea6ed0e84f3aa7d2c7acd8dd5c459a8cd3ce45c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d2feeef7e982550481365f8187cb1a50f0fafcc9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..2a5dc203cc793860aae7743d16c4fb9a564ad1d8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 
0000000000000000000000000000000000000000..9cb7952cede58165d2ed0f35d2208ad1ffb65232 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..81f3cbfbf516e833821c49deecd8f167170021f0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..835375cb0447378fc76431158eb0b8fc011d36bc --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d311e33f56ba431a882b0e7079001b0e9932a011 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..7b535f3c80818ce6b692b66f18ceee8e7b181fdc --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), 
stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50s-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50s-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..600b701a7194ead496cc924bee897b6096e1c7ca --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/encnet/encnet_r50s-d8_512x512_80k_ade20k.py @@ -0,0 +1,8 @@ +_base_ = [ + '../_base_/models/encnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + backbone=dict(stem_channels=128), + decode_head=dict(num_classes=150), + auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fastscnn/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fastscnn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..bb87a9f7aeb8d3630aeccc04b585a4dfec9f2b7e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fastscnn/README.md @@ -0,0 +1,22 @@ +# Fast-SCNN for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@article{poudel2019fast, + title={Fast-scnn: Fast semantic segmentation network}, + author={Poudel, Rudra PK and Liwicki, Stephan and Cipolla, Roberto}, + journal={arXiv preprint arXiv:1902.04502}, + year={2019} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|-----------|-----------|--------:|----------|----------------|------:|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| Fast-SCNN | Fast-SCNN | 512x1024 | 80000 | 8.4 | 63.61 | 69.06 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fast_scnn/fast_scnn_4x8_80k_lr0.12_cityscapes-f5096c79.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fast_scnn/fast_scnn_4x8_80k_lr0.12_cityscapes-20200807_165744.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fastscnn/fast_scnn_4x8_80k_lr0.12_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fastscnn/fast_scnn_4x8_80k_lr0.12_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..3d9c9999370c8b1c28af3063a3aded0d88c91caf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fastscnn/fast_scnn_4x8_80k_lr0.12_cityscapes.py @@ -0,0 +1,10 @@ +_base_ = [ + '../_base_/models/fast_scnn.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] + +# Reconfigure the data sampler. +data = dict(samples_per_gpu=2, workers_per_gpu=4) + +# Reconfigure the optimizer. 
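+# Editorial note (a hedged assumption, not text from the upstream authors):
+# Fast-SCNN is, to the best of our knowledge, trained from scratch here (the
+# base model config defines no ImageNet-pretrained weights), which is
+# presumably why this schedule can use a much larger base learning rate
+# (0.12) and a smaller weight decay (4e-5) than the ImageNet-pretrained
+# ResNet configs in the neighboring directories.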
+optimizer = dict(type='SGD', lr=0.12, momentum=0.9, weight_decay=4e-5) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..95ca2ac0439c3a33ef13f2e22d7fa7a83754142b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/README.md @@ -0,0 +1,66 @@ +# Fully Convolutional Networks for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@article{shelhamer2017fully, + title={Fully convolutional networks for semantic segmentation}, + author={Shelhamer, Evan and Long, Jonathan and Darrell, Trevor}, + journal={IEEE transactions on pattern analysis and machine intelligence}, + volume={39}, + number={4}, + pages={640--651}, + year={2017}, + publisher={IEEE} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | R-50-D8 | 512x1024 | 40000 | 5.7 | 4.17 | 72.25 | 73.36 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x1024_40k_cityscapes/fcn_r50-d8_512x1024_40k_cityscapes_20200604_192608-efe53f0d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x1024_40k_cityscapes/fcn_r50-d8_512x1024_40k_cityscapes_20200604_192608.log.json) | +| FCN | R-101-D8 | 512x1024 | 40000 | 9.2 | 2.66 | 75.45 | 76.58 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x1024_40k_cityscapes/fcn_r101-d8_512x1024_40k_cityscapes_20200604_181852-a883d3a1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x1024_40k_cityscapes/fcn_r101-d8_512x1024_40k_cityscapes_20200604_181852.log.json) | +| FCN | R-50-D8 | 769x769 | 40000 | 6.5 | 1.80 | 71.47 | 72.54 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_769x769_40k_cityscapes/fcn_r50-d8_769x769_40k_cityscapes_20200606_113104-977b5d02.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_769x769_40k_cityscapes/fcn_r50-d8_769x769_40k_cityscapes_20200606_113104.log.json) | +| FCN | R-101-D8 | 769x769 | 40000 | 10.4 | 1.19 | 73.93 | 75.14 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_769x769_40k_cityscapes/fcn_r101-d8_769x769_40k_cityscapes_20200606_113208-7d4ab69c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_769x769_40k_cityscapes/fcn_r101-d8_769x769_40k_cityscapes_20200606_113208.log.json) | +| FCN | R-18-D8 | 512x1024 | 80000 | 1.7 | 14.65 | 71.11 | 72.91 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18-d8_512x1024_80k_cityscapes/fcn_r18-d8_512x1024_80k_cityscapes_20201225_021327-6c50f8b4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18-d8_512x1024_80k_cityscapes/fcn_r18-d8_512x1024_80k_cityscapes-20201225_021327.log.json) | +| FCN | R-50-D8 | 512x1024 | 80000 | - | - | 73.61 | 
74.24 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x1024_80k_cityscapes/fcn_r50-d8_512x1024_80k_cityscapes_20200606_113019-03aa804d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x1024_80k_cityscapes/fcn_r50-d8_512x1024_80k_cityscapes_20200606_113019.log.json) | +| FCN | R-101-D8 | 512x1024 | 80000 | - | - | 75.13 | 75.94 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x1024_80k_cityscapes/fcn_r101-d8_512x1024_80k_cityscapes_20200606_113038-3fb937eb.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x1024_80k_cityscapes/fcn_r101-d8_512x1024_80k_cityscapes_20200606_113038.log.json) | +| FCN | R-18-D8 | 769x769 | 80000 | 1.9 | 6.40 | 70.80 | 73.16 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18-d8_769x769_80k_cityscapes/fcn_r18-d8_769x769_80k_cityscapes_20201225_021451-9739d1b8.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18-d8_769x769_80k_cityscapes/fcn_r18-d8_769x769_80k_cityscapes-20201225_021451.log.json) | +| FCN | R-50-D8 | 769x769 | 80000 | - | - | 72.64 | 73.32 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_769x769_80k_cityscapes/fcn_r50-d8_769x769_80k_cityscapes_20200606_195749-f5caeabc.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_769x769_80k_cityscapes/fcn_r50-d8_769x769_80k_cityscapes_20200606_195749.log.json) | +| FCN | R-101-D8 | 769x769 | 80000 | - | - | 75.52 | 76.61 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_769x769_80k_cityscapes/fcn_r101-d8_769x769_80k_cityscapes_20200606_214354-45cbac68.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_769x769_80k_cityscapes/fcn_r101-d8_769x769_80k_cityscapes_20200606_214354.log.json) | +| FCN | R-18b-D8 | 512x1024 | 80000 | 1.6 | 16.74 | 70.24 | 72.77 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18b-d8_512x1024_80k_cityscapes/fcn_r18b-d8_512x1024_80k_cityscapes_20201225_230143-92c0f445.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18b-d8_512x1024_80k_cityscapes/fcn_r18b-d8_512x1024_80k_cityscapes-20201225_230143.log.json) | +| FCN | R-50b-D8 | 512x1024 | 80000 | 5.6 | 4.20 | 75.65 | 77.59 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50b-d8_512x1024_80k_cityscapes/fcn_r50b-d8_512x1024_80k_cityscapes_20201225_094221-82957416.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50b-d8_512x1024_80k_cityscapes/fcn_r50b-d8_512x1024_80k_cityscapes-20201225_094221.log.json) | +| FCN | R-101b-D8| 512x1024 | 80000 | 9.1 | 2.73 | 77.37 | 78.77 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101b-d8_512x1024_80k_cityscapes/fcn_r101b-d8_512x1024_80k_cityscapes_20201226_160213-4543858f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101b-d8_512x1024_80k_cityscapes/fcn_r101b-d8_512x1024_80k_cityscapes-20201226_160213.log.json) | +| FCN | R-18b-D8 | 769x769 | 80000 | 1.7 | 6.70 | 69.66 | 72.07 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18b-d8_769x769_80k_cityscapes/fcn_r18b-d8_769x769_80k_cityscapes_20201226_004430-32d504e5.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r18b-d8_769x769_80k_cityscapes/fcn_r18b-d8_769x769_80k_cityscapes-20201226_004430.log.json) | +| FCN | R-50b-D8 | 769x769 | 80000 | 6.3 | 1.82 | 73.83 | 76.60 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50b-d8_769x769_80k_cityscapes/fcn_r50b-d8_769x769_80k_cityscapes_20201225_094223-94552d38.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50b-d8_769x769_80k_cityscapes/fcn_r50b-d8_769x769_80k_cityscapes-20201225_094223.log.json) | +| FCN | R-101b-D8| 769x769 | 80000 | 10.3 | 1.15 | 77.02 | 78.67 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101b-d8_769x769_80k_cityscapes/fcn_r101b-d8_769x769_80k_cityscapes_20201226_170012-82be37e2.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101b-d8_769x769_80k_cityscapes/fcn_r101b-d8_769x769_80k_cityscapes-20201226_170012.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | R-50-D8 | 512x512 | 80000 | 8.5 | 23.49 | 35.94 | 37.94 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_80k_ade20k/fcn_r50-d8_512x512_80k_ade20k_20200614_144016-f8ac5082.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_80k_ade20k/fcn_r50-d8_512x512_80k_ade20k_20200614_144016.log.json) | +| FCN | R-101-D8 | 512x512 | 80000 | 12 | 14.78 | 39.61 | 40.83 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_80k_ade20k/fcn_r101-d8_512x512_80k_ade20k_20200615_014143-bc1809f7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_80k_ade20k/fcn_r101-d8_512x512_80k_ade20k_20200615_014143.log.json) | +| FCN | R-50-D8 | 512x512 | 160000 | - | - | 36.10 | 38.08 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_160k_ade20k/fcn_r50-d8_512x512_160k_ade20k_20200615_100713-4edbc3b4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_160k_ade20k/fcn_r50-d8_512x512_160k_ade20k_20200615_100713.log.json) | +| FCN | R-101-D8 | 512x512 | 160000 | - | - | 39.91 | 41.40 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_160k_ade20k/fcn_r101-d8_512x512_160k_ade20k_20200615_105816-fd192bd5.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_160k_ade20k/fcn_r101-d8_512x512_160k_ade20k_20200615_105816.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | R-50-D8 | 512x512 | 20000 | 5.7 | 23.28 | 67.08 | 69.94 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_20k_voc12aug/fcn_r50-d8_512x512_20k_voc12aug_20200617_010715-52dc5306.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_20k_voc12aug/fcn_r50-d8_512x512_20k_voc12aug_20200617_010715.log.json) | +| FCN | R-101-D8 | 512x512 | 20000 | 9.2 | 14.81 | 71.16 | 73.57 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_20k_voc12aug/fcn_r101-d8_512x512_20k_voc12aug_20200617_010842-0bb4e798.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_20k_voc12aug/fcn_r101-d8_512x512_20k_voc12aug_20200617_010842.log.json) | +| FCN | R-50-D8 | 512x512 | 40000 | - | - | 66.97 | 69.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_40k_voc12aug/fcn_r50-d8_512x512_40k_voc12aug_20200613_161222-5e2dbf40.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r50-d8_512x512_40k_voc12aug/fcn_r50-d8_512x512_40k_voc12aug_20200613_161222.log.json) | +| FCN | R-101-D8 | 512x512 | 40000 | - | - | 69.91 | 72.38 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_40k_voc12aug/fcn_r101-d8_512x512_40k_voc12aug_20200613_161240-4c8bcefd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_512x512_40k_voc12aug/fcn_r101-d8_512x512_40k_voc12aug_20200613_161240.log.json) | + +### Pascal Context + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | R-101-D8 | 480x480 | 40000 | - | 9.93 | 44.14 | 45.67 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_480x480_40k_pascal_context/fcn_r101-d8_480x480_40k_pascal_context_20200911_212515-9b565a6d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_480x480_40k_pascal_context/fcn_r101-d8_480x480_40k_pascal_context-20200911_212515.log.json) | +| FCN | R-101-D8 | 480x480 | 80000 | - | - | 44.47 | 45.74 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_480x480_80k_pascal_context/fcn_r101-d8_480x480_80k_pascal_context_20200915_032644-a3828480.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fcn/fcn_r101-d8_480x480_80k_pascal_context/fcn_r101-d8_480x480_80k_pascal_context-20200915_032644.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..f3a15b41054318d508e98685632921f262029de0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_480x480_40k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_480x480_40k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..bdccfd99ba0c25646f02850483c2cdf679fdbf3d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_480x480_80k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_480x480_80k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..7918dd10d05cd98dbc02f02ef1b93e3134f52357 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..528110dc73c15008869a9ad9851ef487f0c952c7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..1bf6780f2c821052692ddcb904bd10e6256c1e71 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..09a5fe5468f0155f8fd0bf2cd1574a33624d8492 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..eafefaa67565513c277c5eb42e3661a88133cb27 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', 
backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..6d0294530f4c817b352cb020d111e3248690ae1f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..6b4cc571294fa45b4442c2bfeb9fda13a14fc5c2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..3503c76935e294c881130b309999d32f13df8839 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1b9bf60fc13364ca1b7b3842664950f653426e67 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = './fcn_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f36eb02e68707d502cbe315ff8f6f25b232dee92 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r101b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = './fcn_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5a1d29e480cb46a763cb17d2105b3f040153d417 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './fcn_r50-d8_512x1024_80k_cityscapes.py' 
+model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..6644a58dea86fd38e208abbedffe4f836e677078 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './fcn_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..92accfc703fc398d2845d7dc2f1d5336f24738e8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './fcn_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5dd34dd2134c745275c66adc5488b4b9f68d6809 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r18b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './fcn_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..fdc6314f704e61d064f5fb7bdd30bc38a9e87ee5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_480x480_40k_pascal_context.py @@ -0,0 +1,8 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/pascal_context.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=60), + test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..0870f928b82b24b8179305f6c9fc7f6013fb481e --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_480x480_80k_pascal_context.py @@ -0,0 +1,8 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/pascal_context.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=60), + test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..401c6ea7330d45d8f7604a1da63fc6e15faea424 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..990a085eda2f2dc47f1a1289bfbf2726ad8c9c4f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..9ca7fd23cedc0567a015bd5f8641a509ead6110a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..17206a5171dcc357c589a1711afa52d87faeece0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/pascal_voc12_aug.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..8cec429c3e27ad2543b7e38fa206e6606fda4d5a --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/pascal_voc12_aug.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..ef194cb594eb76316324066e23e48184d8cede27 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..fca98c1d9ace73a61ae395914e5960832216bf67 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..7d75cd9f49343355b14c7d60bb0df0936ffe0278 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/fcn_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..28ef13f8d17e977f710ba9a863f182b1f80dc8cf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50b-d8_769x769_80k_cityscapes.py new file mode 100644 index 
0000000000000000000000000000000000000000..106f7b6a1ece974c9f732ee813724bd8bda3bef3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fcn/fcn_r50b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './fcn_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/README.md new file mode 100644 index 0000000000000000000000000000000000000000..8d12e4d78025313fb9689bb320b07520bd7648ef --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/README.md @@ -0,0 +1,25 @@ +# Mixed Precision Training + +## Introduction + +[OTHERS] + +```latex +@article{micikevicius2017mixed, + title={Mixed precision training}, + author={Micikevicius, Paulius and Narang, Sharan and Alben, Jonah and Diamos, Gregory and Elsen, Erich and Garcia, David and Ginsburg, Boris and Houston, Michael and Kuchaiev, Oleksii and Venkatesh, Ganesh and others}, + journal={arXiv preprint arXiv:1710.03740}, + year={2017} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | R-101-D8 | 512x1024 | 80000 | 5.50 | 2.66 | 76.80 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/fcn_r101-d8_512x1024_80k_fp16_cityscapes/fcn_r101-d8_512x1024_80k_fp16_cityscapes-50245227.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/fcn_r101-d8_512x1024_80k_fp16_cityscapes/fcn_r101-d8_512x1024_80k_fp16_cityscapes_20200717_230921.log.json) | +| PSPNet | R-101-D8 | 512x1024 | 80000 | 5.47 | 2.68 | 79.46 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/pspnet_r101-d8_512x1024_80k_fp16_cityscapes/pspnet_r101-d8_512x1024_80k_fp16_cityscapes-ade37931.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/pspnet_r101-d8_512x1024_80k_fp16_cityscapes/pspnet_r101-d8_512x1024_80k_fp16_cityscapes_20200717_230919.log.json) | +| DeepLabV3 | R-101-D8 | 512x1024 | 80000 | 5.91 | 1.93 | 80.48 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/deeplabv3_r101-d8_512x1024_80k_fp16_cityscapes/deeplabv3_r101-d8_512x1024_80k_fp16_cityscapes-bc86dc84.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/deeplabv3_r101-d8_512x1024_80k_fp16_cityscapes/deeplabv3_r101-d8_512x1024_80k_fp16_cityscapes_20200717_230920.log.json) | +| DeepLabV3+ | R-101-D8 | 512x1024 | 80000 | 6.46 | 2.60 | 80.46 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/deeplabv3plus_r101-d8_512x1024_80k_fp16_cityscapes/deeplabv3plus_r101-d8_512x1024_80k_fp16_cityscapes-cc58bc8d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/fp16/deeplabv3plus_r101-d8_512x1024_80k_fp16_cityscapes/deeplabv3plus_r101-d8_512x1024_80k_fp16_cityscapes_20200717_230920.log.json) | diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/deeplabv3_r101-d8_512x1024_80k_fp16_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/deeplabv3_r101-d8_512x1024_80k_fp16_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..60d8350e98605c2445ce8359ca9a7a1951fe0085 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/deeplabv3_r101-d8_512x1024_80k_fp16_cityscapes.py @@ -0,0 +1,3 @@ +_base_ = '../deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes.py' +# fp16 settings +optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/deeplabv3plus_r101-d8_512x1024_80k_fp16_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/deeplabv3plus_r101-d8_512x1024_80k_fp16_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..c263d6907ec9ddfdad32b380a7d926d1391e393c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/deeplabv3plus_r101-d8_512x1024_80k_fp16_cityscapes.py @@ -0,0 +1,3 @@ +_base_ = '../deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes.py' +# fp16 settings +optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/fcn_r101-d8_512x1024_80k_fp16_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/fcn_r101-d8_512x1024_80k_fp16_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..8100a8e64d780c1b9c272b57e4e171c5e1a4120b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/fcn_r101-d8_512x1024_80k_fp16_cityscapes.py @@ -0,0 +1,3 @@ +_base_ = '../fcn/fcn_r101-d8_512x1024_80k_cityscapes.py' +# fp16 settings +optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/pspnet_r101-d8_512x1024_80k_fp16_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/pspnet_r101-d8_512x1024_80k_fp16_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..aefac2953abf65c10023599ec42c709a794935c8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/fp16/pspnet_r101-d8_512x1024_80k_fp16_cityscapes.py @@ -0,0 +1,3 @@ +_base_ = '../pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py' +# fp16 settings +optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.) 
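The four fp16 variants above all follow the same three-line pattern: inherit a complete training recipe through `_base_` and override only `optimizer_config`. Below is a minimal sketch of that pattern (not itself part of this diff); it reuses the FCN-R50 Cityscapes base config added earlier and assumes mmcv's `Fp16OptimizerHook` is importable, as the configs above rely on:

```python
# Sketch only: an fp16 variant derived from a base config added in this diff.
_base_ = '../fcn/fcn_r50-d8_512x1024_80k_cityscapes.py'
# fp16 settings: replacing the default OptimizerHook with Fp16OptimizerHook
# turns on mixed precision training; the static loss scale of 512 multiplies
# the loss before backward so small fp16 gradients do not underflow.
optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.)
```

Everything else (model, dataset, schedule, runtime) is inherited from the base file, which is why each fp16 config in this diff stays this short.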
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..b840d5bf9f844e7af89803b48a43b76b887ced36 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/README.md @@ -0,0 +1,48 @@ +# GCNet: Non-local Networks Meet Squeeze-Excitation Networks and Beyond + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{cao2019gcnet, + title={Gcnet: Non-local networks meet squeeze-excitation networks and beyond}, + author={Cao, Yue and Xu, Jiarui and Lin, Stephen and Wei, Fangyun and Hu, Han}, + booktitle={Proceedings of the IEEE International Conference on Computer Vision Workshops}, + pages={0--0}, + year={2019} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| GCNet | R-50-D8 | 512x1024 | 40000 | 5.8 | 3.93 | 77.69 | 78.56 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x1024_40k_cityscapes/gcnet_r50-d8_512x1024_40k_cityscapes_20200618_074436-4b0fd17b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x1024_40k_cityscapes/gcnet_r50-d8_512x1024_40k_cityscapes_20200618_074436.log.json) | +| GCNet | R-101-D8 | 512x1024 | 40000 | 9.2 | 2.61 | 78.28 | 79.34 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x1024_40k_cityscapes/gcnet_r101-d8_512x1024_40k_cityscapes_20200618_074436-5e62567f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x1024_40k_cityscapes/gcnet_r101-d8_512x1024_40k_cityscapes_20200618_074436.log.json) | +| GCNet | R-50-D8 | 769x769 | 40000 | 6.5 | 1.67 | 78.12 | 80.09 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_769x769_40k_cityscapes/gcnet_r50-d8_769x769_40k_cityscapes_20200618_182814-a26f4471.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_769x769_40k_cityscapes/gcnet_r50-d8_769x769_40k_cityscapes_20200618_182814.log.json) | +| GCNet | R-101-D8 | 769x769 | 40000 | 10.5 | 1.13 | 78.95 | 80.71 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_769x769_40k_cityscapes/gcnet_r101-d8_769x769_40k_cityscapes_20200619_092550-ca4f0a84.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_769x769_40k_cityscapes/gcnet_r101-d8_769x769_40k_cityscapes_20200619_092550.log.json) | +| GCNet | R-50-D8 | 512x1024 | 80000 | - | - | 78.48 | 80.01 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x1024_80k_cityscapes/gcnet_r50-d8_512x1024_80k_cityscapes_20200618_074450-ef8f069b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x1024_80k_cityscapes/gcnet_r50-d8_512x1024_80k_cityscapes_20200618_074450.log.json) | +| GCNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.03 | 79.84 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x1024_80k_cityscapes/gcnet_r101-d8_512x1024_80k_cityscapes_20200618_074450-778ebf69.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x1024_80k_cityscapes/gcnet_r101-d8_512x1024_80k_cityscapes_20200618_074450.log.json) | +| GCNet | R-50-D8 | 769x769 | 80000 | - | - | 78.68 | 80.66 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_769x769_80k_cityscapes/gcnet_r50-d8_769x769_80k_cityscapes_20200619_092516-4839565b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_769x769_80k_cityscapes/gcnet_r50-d8_769x769_80k_cityscapes_20200619_092516.log.json) | +| GCNet | R-101-D8 | 769x769 | 80000 | - | - | 79.18 | 80.71 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_769x769_80k_cityscapes/gcnet_r101-d8_769x769_80k_cityscapes_20200619_092628-8e043423.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_769x769_80k_cityscapes/gcnet_r101-d8_769x769_80k_cityscapes_20200619_092628.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| GCNet | R-50-D8 | 512x512 | 80000 | 8.5 | 23.38 | 41.47 | 42.85 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_80k_ade20k/gcnet_r50-d8_512x512_80k_ade20k_20200614_185146-91a6da41.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_80k_ade20k/gcnet_r50-d8_512x512_80k_ade20k_20200614_185146.log.json) | +| GCNet | R-101-D8 | 512x512 | 80000 | 12 | 15.20 | 42.82 | 44.54 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_80k_ade20k/gcnet_r101-d8_512x512_80k_ade20k_20200615_020811-c3fcb6dd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_80k_ade20k/gcnet_r101-d8_512x512_80k_ade20k_20200615_020811.log.json) | +| GCNet | R-50-D8 | 512x512 | 160000 | - | - | 42.37 | 43.52 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_160k_ade20k/gcnet_r50-d8_512x512_160k_ade20k_20200615_224122-d95f3e1f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_160k_ade20k/gcnet_r50-d8_512x512_160k_ade20k_20200615_224122.log.json) | +| GCNet | R-101-D8 | 512x512 | 160000 | - | - | 43.69 | 45.21 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_160k_ade20k/gcnet_r101-d8_512x512_160k_ade20k_20200615_225406-615528d7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_160k_ade20k/gcnet_r101-d8_512x512_160k_ade20k_20200615_225406.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | 
+|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| GCNet | R-50-D8 | 512x512 | 20000 | 5.8 | 23.35 | 76.42 | 77.51 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_20k_voc12aug/gcnet_r50-d8_512x512_20k_voc12aug_20200617_165701-3cbfdab1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_20k_voc12aug/gcnet_r50-d8_512x512_20k_voc12aug_20200617_165701.log.json) | +| GCNet | R-101-D8 | 512x512 | 20000 | 9.2 | 14.80 | 77.41 | 78.56 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_20k_voc12aug/gcnet_r101-d8_512x512_20k_voc12aug_20200617_165713-6c720aa9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_20k_voc12aug/gcnet_r101-d8_512x512_20k_voc12aug_20200617_165713.log.json) | +| GCNet | R-50-D8 | 512x512 | 40000 | - | - | 76.24 | 77.63 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_40k_voc12aug/gcnet_r50-d8_512x512_40k_voc12aug_20200613_195105-9797336d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r50-d8_512x512_40k_voc12aug/gcnet_r50-d8_512x512_40k_voc12aug_20200613_195105.log.json) | +| GCNet | R-101-D8 | 512x512 | 40000 | - | - | 77.84 | 78.59 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_40k_voc12aug/gcnet_r101-d8_512x512_40k_voc12aug_20200613_185806-1e38208d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/gcnet/gcnet_r101-d8_512x512_40k_voc12aug/gcnet_r101-d8_512x512_40k_voc12aug_20200613_185806.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..27bd9422dad49bc5a06f577ee45cd834bdbe3912 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..7f0f83fe39da31fe9a5b497e0481e1c79a33e764 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 
0000000000000000000000000000000000000000..9888120f65b045df1c7d4d05fb010373abf82ccf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..1b70ca8e46a0409379f5ae9809ce03de203426ad --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..b17c7a12b547ee4e1cd60d667c575eab06eb071c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..a2183fc2db1ff188b0ad5418e55f71005da926cc --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..08a6031f20234b1cc1d792ea5d4891613503a185 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5efb61339cdbdde585f7814e9650be2e2df654ac --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './gcnet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x1024_40k_cityscapes.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..610467c07204140bf604f8dda2aa57978c565ed3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..155e28f42194112703bb21473e5e3dd0fca40d49 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..1549a4d5bf10cd3fd6e3bd57bf7a48e7e5e1ede8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..a496204bdb061d975c40cb7ef2aaada40e020a13 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..d85cf6550fea5da7cf1fa078eb4fa30e017166b4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_80k_ade20k.py new file mode 100644 
index 0000000000000000000000000000000000000000..89d5e1ae0f3ef44626f3b5534c504cbce7389a32 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..332495d3d7f7d7c7c0e0aca4e379cd54e2ed07de --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d6d9cb1c64bcf8c3e952b6f8adc11bec0403d106 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/gcnet/gcnet_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/gcnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4d77cefe3e51091bc2264df8b201968addd81e8d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/README.md @@ -0,0 +1,59 @@ +# Deep High-Resolution Representation Learning for Human Pose Estimation + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{SunXLW19, + title={Deep High-Resolution Representation Learning for Human Pose Estimation}, + author={Ke Sun and Bin Xiao and Dong Liu and Jingdong Wang}, + booktitle={CVPR}, + year={2019} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | HRNetV2p-W18-Small | 512x1024 | 40000 | 1.7 | 23.74 | 73.86 | 75.91 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_40k_cityscapes/fcn_hr18s_512x1024_40k_cityscapes_20200601_014216-93db27d0.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_40k_cityscapes/fcn_hr18s_512x1024_40k_cityscapes_20200601_014216.log.json) | +| FCN | HRNetV2p-W18 | 512x1024 | 40000 | 2.9 | 12.97 | 77.19 | 78.92 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_40k_cityscapes/fcn_hr18_512x1024_40k_cityscapes_20200601_014216-f196fb4e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_40k_cityscapes/fcn_hr18_512x1024_40k_cityscapes_20200601_014216.log.json) | +| FCN | HRNetV2p-W48 | 512x1024 | 40000 | 6.2 | 6.42 | 78.48 | 79.69 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x1024_40k_cityscapes/fcn_hr48_512x1024_40k_cityscapes_20200601_014240-a989b146.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x1024_40k_cityscapes/fcn_hr48_512x1024_40k_cityscapes_20200601_014240.log.json) | +| FCN | HRNetV2p-W18-Small | 512x1024 | 80000 | - | - | 75.31 | 77.48 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_80k_cityscapes/fcn_hr18s_512x1024_80k_cityscapes_20200601_202700-1462b75d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_80k_cityscapes/fcn_hr18s_512x1024_80k_cityscapes_20200601_202700.log.json) | +| FCN | HRNetV2p-W18 | 512x1024 | 80000 | - | - | 78.65 | 80.35 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_80k_cityscapes/fcn_hr18_512x1024_80k_cityscapes_20200601_223255-4e7b345e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_80k_cityscapes/fcn_hr18_512x1024_80k_cityscapes_20200601_223255.log.json) | +| FCN | HRNetV2p-W48 | 512x1024 | 80000 | - | - | 79.93 | 80.72 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x1024_80k_cityscapes/fcn_hr48_512x1024_80k_cityscapes_20200601_202606-58ea95d6.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x1024_80k_cityscapes/fcn_hr48_512x1024_80k_cityscapes_20200601_202606.log.json) | +| FCN | HRNetV2p-W18-Small | 512x1024 | 160000 | - | - | 76.31 | 78.31 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_160k_cityscapes/fcn_hr18s_512x1024_160k_cityscapes_20200602_190901-4a0797ea.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x1024_160k_cityscapes/fcn_hr18s_512x1024_160k_cityscapes_20200602_190901.log.json) | +| FCN | HRNetV2p-W18 | 512x1024 | 160000 | - | - | 78.80 | 80.74 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_160k_cityscapes/fcn_hr18_512x1024_160k_cityscapes_20200602_190822-221e4a4f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x1024_160k_cityscapes/fcn_hr18_512x1024_160k_cityscapes_20200602_190822.log.json) | +| FCN | HRNetV2p-W48 | 512x1024 | 160000 | - | - | 80.65 | 81.92 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x1024_160k_cityscapes/fcn_hr48_512x1024_160k_cityscapes_20200602_190946-59b7973e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x1024_160k_cityscapes/fcn_hr48_512x1024_160k_cityscapes_20200602_190946.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | 
mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------:|----------|----------------|------:|--------------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | HRNetV2p-W18-Small | 512x512 | 80000 | 3.8 | 38.66 | 31.38 | 32.45 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_80k_ade20k/fcn_hr18s_512x512_80k_ade20k_20200614_144345-77fc814a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_80k_ade20k/fcn_hr18s_512x512_80k_ade20k_20200614_144345.log.json) | +| FCN | HRNetV2p-W18 | 512x512 | 80000 | 4.9 | 22.57 | 35.51 | 36.80 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_80k_ade20k/fcn_hr18_512x512_80k_ade20k_20200614_185145-66f20cb7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_80k_ade20k/fcn_hr18_512x512_80k_ade20k_20200614_185145.log.json) | +| FCN | HRNetV2p-W48 | 512x512 | 80000 | 8.2 | 21.23 | 41.90 | 43.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_80k_ade20k/fcn_hr48_512x512_80k_ade20k_20200614_193946-7ba5258d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_80k_ade20k/fcn_hr48_512x512_80k_ade20k_20200614_193946.log.json) | +| FCN | HRNetV2p-W18-Small | 512x512 | 160000 | - | - | 33.00 | 34.55 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_160k_ade20k/fcn_hr18s_512x512_160k_ade20k_20200614_214413-870f65ac.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_160k_ade20k/fcn_hr18s_512x512_160k_ade20k_20200614_214413.log.json) | +| FCN | HRNetV2p-W18 | 512x512 | 160000 | - | - | 36.79 | 38.58 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_160k_ade20k/fcn_hr18_512x512_160k_ade20k_20200614_214426-ca961836.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_160k_ade20k/fcn_hr18_512x512_160k_ade20k_20200614_214426.log.json) | +| FCN | HRNetV2p-W48 | 512x512 | 160000 | - | - | 42.02 | 43.86 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_160k_ade20k/fcn_hr48_512x512_160k_ade20k_20200614_214407-a52fc02c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_160k_ade20k/fcn_hr48_512x512_160k_ade20k_20200614_214407.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | HRNetV2p-W18-Small | 512x512 | 20000 | 1.8 | 43.36 | 65.20 | 68.55 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_20k_voc12aug/fcn_hr18s_512x512_20k_voc12aug_20200617_224503-56e36088.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_20k_voc12aug/fcn_hr18s_512x512_20k_voc12aug_20200617_224503.log.json) | +| FCN | HRNetV2p-W18 | 512x512 | 20000 | 2.9 | 23.48 | 72.30 | 74.71 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_20k_voc12aug/fcn_hr18_512x512_20k_voc12aug_20200617_224503-488d45f7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_20k_voc12aug/fcn_hr18_512x512_20k_voc12aug_20200617_224503.log.json) | +| FCN | HRNetV2p-W48 | 512x512 | 20000 | 6.2 | 22.05 | 75.87 | 78.58 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_20k_voc12aug/fcn_hr48_512x512_20k_voc12aug_20200617_224419-89de05cd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_20k_voc12aug/fcn_hr48_512x512_20k_voc12aug_20200617_224419.log.json) | +| FCN | HRNetV2p-W18-Small | 512x512 | 40000 | - | - | 66.61 | 70.00 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_40k_voc12aug/fcn_hr18s_512x512_40k_voc12aug_20200614_000648-4f8d6e7f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18s_512x512_40k_voc12aug/fcn_hr18s_512x512_40k_voc12aug_20200614_000648.log.json) | +| FCN | HRNetV2p-W18 | 512x512 | 40000 | - | - | 72.90 | 75.59 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_40k_voc12aug/fcn_hr18_512x512_40k_voc12aug_20200613_224401-1b4b76cd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr18_512x512_40k_voc12aug/fcn_hr18_512x512_40k_voc12aug_20200613_224401.log.json) | +| FCN | HRNetV2p-W48 | 512x512 | 40000 | - | - | 76.24 | 78.49 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_40k_voc12aug/fcn_hr48_512x512_40k_voc12aug_20200613_222111-1b0f18bc.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_512x512_40k_voc12aug/fcn_hr48_512x512_40k_voc12aug_20200613_222111.log.json) | + +### Pascal Context + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | HRNetV2p-W48 | 480x480 | 40000 | 6.1 | 8.86 | 45.14 | 47.42 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_480x480_40k_pascal_context/fcn_hr48_480x480_40k_pascal_context_20200911_164852-667d00b0.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_480x480_40k_pascal_context/fcn_hr48_480x480_40k_pascal_context-20200911_164852.log.json) | +| FCN | HRNetV2p-W48 | 480x480 | 80000 | - | - | 45.84 | 47.84 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_480x480_80k_pascal_context/fcn_hr48_480x480_80k_pascal_context_20200911_155322-847a6711.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/hrnet/fcn_hr48_480x480_80k_pascal_context/fcn_hr48_480x480_80k_pascal_context-20200911_155322.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..5ff05aa595399d77ee51552c243e489f395a820e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_480x480_40k_pascal_context.py @@ -0,0 +1,8 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/pascal_context.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=60), + test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..cf315a4f0e6f397768572c590a634cc1b9d298a9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_480x480_80k_pascal_context.py @@ -0,0 +1,8 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/pascal_context.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=60), + test_cfg=dict(mode='slide', crop_size=(480, 480), stride=(320, 320))) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_160k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_160k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..9f04e935c39b08de66629f913b30675ffff2a8fe --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_160k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..99760c36d8399204ca8e35f32690bcd369676852 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a653dda19255214a1a412b645abddd3fc5c0d853 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + 
'../_base_/models/fcn_hr18.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..45ed99b6813324a58575f9bb74ce0534626e10c4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_160k_ade20k.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict(decode_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..f06448b168af4d2dcc5a1f96e4430a7948b7e170 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_20k_voc12aug.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/pascal_voc12_aug.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_20k.py' +] +model = dict(decode_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..d74e95943afca04ba4073e411e0b713985384129 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_40k_voc12aug.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/pascal_voc12_aug.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(decode_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..52bc9f5e91f2fdf9ce8f9e3a873902dd8db56522 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18_512x512_80k_ade20k.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/fcn_hr18.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict(decode_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..d09931048f762cd2ac224d62c2fe2ed8e0e148c8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_480x480_40k_pascal_context.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_480x480_40k_pascal_context.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + 
stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..584b7135fd95464f3d2c965440a0b92161cde09a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_480x480_80k_pascal_context.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_480x480_80k_pascal_context.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_160k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_160k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..ddbe3801f99dc21120548af85c55c7cdcfadaea2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_160k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_512x1024_160k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..4e31d26e093b6cb2d59b24bb3060c92bd7dccdea --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_512x1024_40k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..ee2831d99d859c419b158b5f828d8a84063564ea --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_160k_ade20k.py new file mode 100644 index 
0000000000000000000000000000000000000000..22a3ce0b38f36efc96595fe1c3ef428fc1575eb0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_160k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_512x512_160k_ade20k.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..d0de5df75242e58ba572277d6fc5cf93675a097e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_20k_voc12aug.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_512x512_20k_voc12aug.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..409db3c628edf63cd40e002f436884ce1fb75970 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_40k_voc12aug.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_512x512_40k_voc12aug.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..a8400979b1e94dd42343de656ffbc5fbb7a07944 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr18s_512x512_80k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = './fcn_hr18_512x512_80k_ade20k.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..0e2d96cb6ce7249852cb1d9b36a2f24bdce00199 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_480x480_40k_pascal_context.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_480x480_40k_pascal_context.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 
96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..e28164e3dc9d321bf0a97b37f14f3184f95a27a5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_480x480_80k_pascal_context.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_480x480_80k_pascal_context.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_160k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_160k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..394a61c99f038c94fce58ac9c422b7c3ee4b5f50 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_160k_cityscapes.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_512x1024_160k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d37ab1d09ef51b1321ed8b3634fd99445efee543 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_40k_cityscapes.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_512x1024_40k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a9bab32b52ca41155062c7655986ed84677a8280 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x1024_80k_cityscapes.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..dff4fea85ced568c38d39408d459697e88ca0faa --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_160k_ade20k.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_512x512_160k_ade20k.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..a8d1deb98659d05755c6316c2aff2295afb0bb9c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_20k_voc12aug.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_512x512_20k_voc12aug.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..1084a57e978195df6d45a9a00415953ddbaeeb51 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_40k_voc12aug.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_512x512_40k_voc12aug.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..7eca7fa4b8102c6225af3b484ffff5bdc7c0f201 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/hrnet/fcn_hr48_512x512_80k_ade20k.py @@ -0,0 +1,10 @@ +_base_ = './fcn_hr18_512x512_80k_ade20k.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=dict( + in_channels=[48, 96, 192, 384], channels=sum([48, 96, 192, 384]))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/README.md new file mode 100644 index 0000000000000000000000000000000000000000..e0e75e028db703129551402ae9d63c6a2861dc4b --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/README.md @@ -0,0 +1,35 @@ +# MobileNetV2: Inverted Residuals and Linear Bottlenecks + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{sandler2018mobilenetv2, + title={Mobilenetv2: Inverted residuals and linear bottlenecks}, + author={Sandler, Mark and Howard, Andrew and Zhu, Menglong and Zhmoginov, Andrey and Chen, Liang-Chieh}, + booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition}, + pages={4510--4520}, + year={2018} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|---------:|----------------|------:|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | M-V2-D8 | 512x1024 | 80000 | 3.4 | 14.2 | 61.54 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/fcn_m-v2-d8_512x1024_80k_cityscapes/fcn_m-v2-d8_512x1024_80k_cityscapes_20200825_124817-d24c28c1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/fcn_m-v2-d8_512x1024_80k_cityscapes/fcn_m-v2-d8_512x1024_80k_cityscapes-20200825_124817.log.json) | +| PSPNet | M-V2-D8 | 512x1024 | 80000 | 3.6 | 11.2 | 70.23 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/pspnet_m-v2-d8_512x1024_80k_cityscapes/pspnet_m-v2-d8_512x1024_80k_cityscapes_20200825_124817-19e81d51.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/pspnet_m-v2-d8_512x1024_80k_cityscapes/pspnet_m-v2-d8_512x1024_80k_cityscapes-20200825_124817.log.json) | +| DeepLabV3 | M-V2-D8 | 512x1024 | 80000 | 3.9 | 8.4 | 73.84 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3_m-v2-d8_512x1024_80k_cityscapes/deeplabv3_m-v2-d8_512x1024_80k_cityscapes_20200825_124836-bef03590.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3_m-v2-d8_512x1024_80k_cityscapes/deeplabv3_m-v2-d8_512x1024_80k_cityscapes-20200825_124836.log.json) | +| DeepLabV3+ | M-V2-D8 | 512x1024 | 80000 | 5.1 | 8.4 | 75.20 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3plus_m-v2-d8_512x1024_80k_cityscapes/deeplabv3plus_m-v2-d8_512x1024_80k_cityscapes_20200825_124836-d256dd4b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3plus_m-v2-d8_512x1024_80k_cityscapes/deeplabv3plus_m-v2-d8_512x1024_80k_cityscapes-20200825_124836.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | 
+|------------|----------|-----------|--------:|---------:|----------------|------:|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | M-V2-D8 | 512x512 | 160000 | 6.5 | 64.4 | 19.71 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/fcn_m-v2-d8_512x512_160k_ade20k/fcn_m-v2-d8_512x512_160k_ade20k_20200825_214953-c40e1095.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/fcn_m-v2-d8_512x512_160k_ade20k/fcn_m-v2-d8_512x512_160k_ade20k-20200825_214953.log.json) | +| PSPNet | M-V2-D8 | 512x512 | 160000 | 6.5 | 57.7 | 29.68 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/pspnet_m-v2-d8_512x512_160k_ade20k/pspnet_m-v2-d8_512x512_160k_ade20k_20200825_214953-f5942f7a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/pspnet_m-v2-d8_512x512_160k_ade20k/pspnet_m-v2-d8_512x512_160k_ade20k-20200825_214953.log.json) | +| DeepLabV3 | M-V2-D8 | 512x512 | 160000 | 6.8 | 39.9 | 34.08 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3_m-v2-d8_512x512_160k_ade20k/deeplabv3_m-v2-d8_512x512_160k_ade20k_20200825_223255-63986343.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3_m-v2-d8_512x512_160k_ade20k/deeplabv3_m-v2-d8_512x512_160k_ade20k-20200825_223255.log.json) | +| DeepLabV3+ | M-V2-D8 | 512x512 | 160000 | 8.2 | 43.1 | 34.02 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3plus_m-v2-d8_512x512_160k_ade20k/deeplabv3plus_m-v2-d8_512x512_160k_ade20k_20200825_223255-465a01d4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v2/deeplabv3plus_m-v2-d8_512x512_160k_ade20k/deeplabv3plus_m-v2-d8_512x512_160k_ade20k-20200825_223255.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3_m-v2-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3_m-v2-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..267483d88ff25d75dc18c5c2d37375cd77c9639c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3_m-v2-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,12 @@ +_base_ = '../deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3_m-v2-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3_m-v2-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..e15b8cc82b09ac3e64875936cdfd0f663aaba936 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3_m-v2-d8_512x512_160k_ade20k.py @@ -0,0 +1,12 @@ 
+_base_ = '../deeplabv3/deeplabv3_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3plus_m-v2-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3plus_m-v2-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d4533d79a25771905d7f1900bf7b34037885a77a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3plus_m-v2-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,12 @@ +_base_ = '../deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320, c1_in_channels=24), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3plus_m-v2-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3plus_m-v2-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..7615a7c19a3f19635b71801a55e4544be4d215b5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/deeplabv3plus_m-v2-d8_512x512_160k_ade20k.py @@ -0,0 +1,12 @@ +_base_ = '../deeplabv3plus/deeplabv3plus_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320, c1_in_channels=24), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/fcn_m-v2-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/fcn_m-v2-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a535bd0ed8a4883134acdc52cf3f77c8d897ce82 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/fcn_m-v2-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,12 @@ +_base_ = '../fcn/fcn_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/fcn_m-v2-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/fcn_m-v2-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..c5f6ab0d62e269e44dac016eb5ac58f49c1fa292 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/fcn_m-v2-d8_512x512_160k_ade20k.py @@ -0,0 +1,12 @@ +_base_ = 
'../fcn/fcn_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/pspnet_m-v2-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/pspnet_m-v2-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..7403bee864d833bcc31160665e4b54fdd738cc13 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/pspnet_m-v2-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,12 @@ +_base_ = '../pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/pspnet_m-v2-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/pspnet_m-v2-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..5b72ac830be29b865ed52adaf41f2fe800f252cc --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v2/pspnet_m-v2-d8_512x512_160k_ade20k.py @@ -0,0 +1,12 @@ +_base_ = '../pspnet/pspnet_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='mmcls://mobilenet_v2', + backbone=dict( + _delete_=True, + type='MobileNetV2', + widen_factor=1., + strides=(1, 2, 2, 1, 1, 1, 1), + dilations=(1, 1, 1, 2, 2, 4, 4), + out_indices=(1, 2, 4, 6)), + decode_head=dict(in_channels=320), + auxiliary_head=dict(in_channels=96)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/README.md new file mode 100644 index 0000000000000000000000000000000000000000..2bad2a731c63ba51ee05518af893618cf7ed94a0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/README.md @@ -0,0 +1,28 @@ +# Searching for MobileNetV3 + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{Howard_2019_ICCV, + title={Searching for MobileNetV3}, + author={Howard, Andrew and Sandler, Mark and Chu, Grace and Chen, Liang-Chieh and Chen, Bo and Tan, Mingxing and Wang, Weijun and Zhu, Yukun and Pang, Ruoming and Vasudevan, Vijay and Le, Quoc V. 
and Adam, Hartwig}, + booktitle={The IEEE International Conference on Computer Vision (ICCV)}, + pages={1314--1324}, + month={October}, + year={2019}, + doi={10.1109/ICCV.2019.00140} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|---------:|----------------|------:|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| LRASPP | M-V3-D8 | 512x1024 | 320000 | 8.9 | 15.22 | 69.54 | 70.89 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3-d8_512x1024_320k_cityscapes/lraspp_m-v3-d8_512x1024_320k_cityscapes_20201224_220337-cfe8fb07.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3-d8_512x1024_320k_cityscapes/lraspp_m-v3-d8_512x1024_320k_cityscapes-20201224_220337.log.json)| +| LRASPP | M-V3-D8 (scratch) | 512x1024 | 320000 | 8.9 | 14.77 | 67.87 | 69.78 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes/lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes_20201224_220337-9f29cd72.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes/lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes-20201224_220337.log.json)| +| LRASPP | M-V3s-D8 | 512x1024 | 320000 | 5.3 | 23.64 | 64.11 | 66.42 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3s-d8_512x1024_320k_cityscapes/lraspp_m-v3s-d8_512x1024_320k_cityscapes_20201224_223935-61565b34.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3s-d8_512x1024_320k_cityscapes/lraspp_m-v3s-d8_512x1024_320k_cityscapes-20201224_223935.log.json)| +| LRASPP | M-V3s-D8 (scratch) | 512x1024 | 320000 | 5.3 | 24.50 | 62.74 | 65.01 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3s-d8_scratch_512x1024_320k_cityscapes/lraspp_m-v3s-d8_scratch_512x1024_320k_cityscapes_20201224_223935-03daeabb.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/mobilenet_v3/lraspp_m-v3s-d8_scratch_512x1024_320k_cityscapes/lraspp_m-v3s-d8_scratch_512x1024_320k_cityscapes-20201224_223935.log.json)| diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3-d8_512x1024_320k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3-d8_512x1024_320k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e59a78b48be3a0997a31524fd78e7fad5636bc82 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3-d8_512x1024_320k_cityscapes.py @@ -0,0 +1,11 @@ +_base_ = [ + '../_base_/models/lraspp_m-v3-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] + +model = dict(pretrained='open-mmlab://contrib/mobilenet_v3_large') + +# Re-config the data sampler. 
+data = dict(samples_per_gpu=4, workers_per_gpu=4) + +runner = dict(type='IterBasedRunner', max_iters=320000) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a3c5435142db6b1f81421f5fd96d07ece32b5f38 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/lraspp_m-v3-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] + +# Re-config the data sampler. +data = dict(samples_per_gpu=4, workers_per_gpu=4) + +runner = dict(type='IterBasedRunner', max_iters=320000) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3s-d8_512x1024_320k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3s-d8_512x1024_320k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d4e368b2a11ed6433d8f2594a2cc3184fe5ddfff --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3s-d8_512x1024_320k_cityscapes.py @@ -0,0 +1,23 @@ +_base_ = './lraspp_m-v3-d8_512x1024_320k_cityscapes.py' +norm_cfg = dict(type='SyncBN', eps=0.001, requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://contrib/mobilenet_v3_small', + backbone=dict( + type='MobileNetV3', + arch='small', + out_indices=(0, 1, 12), + norm_cfg=norm_cfg), + decode_head=dict( + type='LRASPPHead', + in_channels=(16, 16, 576), + in_index=(0, 1, 2), + channels=128, + input_transform='multiple_select', + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3s-d8_scratch_512x1024_320k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3s-d8_scratch_512x1024_320k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0c5f707200c5d8b6d39493762baf59023dcaad11 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/mobilenet_v3/lraspp_m-v3s-d8_scratch_512x1024_320k_cityscapes.py @@ -0,0 +1,22 @@ +_base_ = './lraspp_m-v3-d8_scratch_512x1024_320k_cityscapes.py' +norm_cfg = dict(type='SyncBN', eps=0.001, requires_grad=True) +model = dict( + type='EncoderDecoder', + backbone=dict( + type='MobileNetV3', + arch='small', + out_indices=(0, 1, 12), + norm_cfg=norm_cfg), + decode_head=dict( + type='LRASPPHead', + in_channels=(16, 16, 576), + in_index=(0, 1, 2), + channels=128, + input_transform='multiple_select', + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/README.md new file mode 100644 index 0000000000000000000000000000000000000000..76352e265a9db69640c650e192f66ab75b1f19b3 
--- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/README.md @@ -0,0 +1,48 @@ +# Non-local Neural Networks + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{wang2018non, + title={Non-local neural networks}, + author={Wang, Xiaolong and Girshick, Ross and Gupta, Abhinav and He, Kaiming}, + booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition}, + pages={7794--7803}, + year={2018} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|----------|----------|-----------|--------:|----------|----------------|------:|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| NonLocal | R-50-D8 | 512x1024 | 40000 | 7.4 | 2.72 | 78.24 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x1024_40k_cityscapes/nonlocal_r50-d8_512x1024_40k_cityscapes_20200605_210748-c75e81e3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x1024_40k_cityscapes/nonlocal_r50-d8_512x1024_40k_cityscapes_20200605_210748.log.json) | +| NonLocal | R-101-D8 | 512x1024 | 40000 | 10.9 | 1.95 | 78.66 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x1024_40k_cityscapes/nonlocal_r101-d8_512x1024_40k_cityscapes_20200605_210748-d63729fa.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x1024_40k_cityscapes/nonlocal_r101-d8_512x1024_40k_cityscapes_20200605_210748.log.json) | +| NonLocal | R-50-D8 | 769x769 | 40000 | 8.9 | 1.52 | 78.33 | 79.92 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_769x769_40k_cityscapes/nonlocal_r50-d8_769x769_40k_cityscapes_20200530_045243-82ef6749.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_769x769_40k_cityscapes/nonlocal_r50-d8_769x769_40k_cityscapes_20200530_045243.log.json) | +| NonLocal | R-101-D8 | 769x769 | 40000 | 12.8 | 1.05 | 78.57 | 80.29 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_769x769_40k_cityscapes/nonlocal_r101-d8_769x769_40k_cityscapes_20200530_045348-8fe9a9dc.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_769x769_40k_cityscapes/nonlocal_r101-d8_769x769_40k_cityscapes_20200530_045348.log.json) | +| NonLocal | R-50-D8 | 512x1024 | 80000 | - | - | 78.01 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x1024_80k_cityscapes/nonlocal_r50-d8_512x1024_80k_cityscapes_20200607_193518-d6839fae.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x1024_80k_cityscapes/nonlocal_r50-d8_512x1024_80k_cityscapes_20200607_193518.log.json) | +| NonLocal | R-101-D8 | 512x1024 | 80000 | - | - | 78.93 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x1024_80k_cityscapes/nonlocal_r101-d8_512x1024_80k_cityscapes_20200607_183411-32700183.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x1024_80k_cityscapes/nonlocal_r101-d8_512x1024_80k_cityscapes_20200607_183411.log.json) | +| NonLocal | R-50-D8 | 769x769 | 80000 | - | - | 79.05 | 80.68 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_769x769_80k_cityscapes/nonlocal_r50-d8_769x769_80k_cityscapes_20200607_193506-1f9792f6.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_769x769_80k_cityscapes/nonlocal_r50-d8_769x769_80k_cityscapes_20200607_193506.log.json) | +| NonLocal | R-101-D8 | 769x769 | 80000 | - | - | 79.40 | 80.85 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_769x769_80k_cityscapes/nonlocal_r101-d8_769x769_80k_cityscapes_20200607_183428-0e1fa4f9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_769x769_80k_cityscapes/nonlocal_r101-d8_769x769_80k_cityscapes_20200607_183428.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|----------|----------|-----------|--------:|----------|----------------|------:|--------------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| NonLocal | R-50-D8 | 512x512 | 80000 | 9.1 | 21.37 | 40.75 | 42.05 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_80k_ade20k/nonlocal_r50-d8_512x512_80k_ade20k_20200615_015801-5ae0aa33.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_80k_ade20k/nonlocal_r50-d8_512x512_80k_ade20k_20200615_015801.log.json) | +| NonLocal | R-101-D8 | 512x512 | 80000 | 12.6 | 13.97 | 42.90 | 44.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_80k_ade20k/nonlocal_r101-d8_512x512_80k_ade20k_20200615_015758-24105919.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_80k_ade20k/nonlocal_r101-d8_512x512_80k_ade20k_20200615_015758.log.json) | +| NonLocal | R-50-D8 | 512x512 | 160000 | - | - | 42.03 | 43.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_160k_ade20k/nonlocal_r50-d8_512x512_160k_ade20k_20200616_005410-baef45e3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_160k_ade20k/nonlocal_r50-d8_512x512_160k_ade20k_20200616_005410.log.json) | +| NonLocal | R-101-D8 | 512x512 | 160000 | - | - | 43.36 | 44.83 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_160k_ade20k/nonlocal_r101-d8_512x512_160k_ade20k_20200616_003422-affd0f8d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_160k_ade20k/nonlocal_r101-d8_512x512_160k_ade20k_20200616_003422.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | 
+|----------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| NonLocal | R-50-D8 | 512x512 | 20000 | 6.4 | 21.21 | 76.20 | 77.12 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_20k_voc12aug/nonlocal_r50-d8_512x512_20k_voc12aug_20200617_222613-07f2a57c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_20k_voc12aug/nonlocal_r50-d8_512x512_20k_voc12aug_20200617_222613.log.json) | +| NonLocal | R-101-D8 | 512x512 | 20000 | 9.8 | 14.01 | 78.15 | 78.86 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_20k_voc12aug/nonlocal_r101-d8_512x512_20k_voc12aug_20200617_222615-948c68ab.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_20k_voc12aug/nonlocal_r101-d8_512x512_20k_voc12aug_20200617_222615.log.json) | +| NonLocal | R-50-D8 | 512x512 | 40000 | - | - | 76.65 | 77.47 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_40k_voc12aug/nonlocal_r50-d8_512x512_40k_voc12aug_20200614_000028-0139d4a9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r50-d8_512x512_40k_voc12aug/nonlocal_r50-d8_512x512_40k_voc12aug_20200614_000028.log.json) | +| NonLocal | R-101-D8 | 512x512 | 40000 | - | - | 78.27 | 79.12 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_40k_voc12aug/nonlocal_r101-d8_512x512_40k_voc12aug_20200614_000028-7e5ff470.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/nonlocal_net/nonlocal_r101-d8_512x512_40k_voc12aug/nonlocal_r101-d8_512x512_40k_voc12aug_20200614_000028.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..ef7b06dd3806c1d93be41943ab4d7d49f68ac830 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..7a1e66cf1c239eac3c6a4876a35d82e7b6ccec2e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..df9c2aca9c7c1999d74a08a58aca5d220f7df54a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..490f9873a29f2626ad764825eec97f16ee7f9f96 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..40d9190fba223251b794c105b036e4794865f785 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..0c6f60dac7b457d3b936a5f7f43eb84713c77e05 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..23e6da7f23180c2350253ea400f444c0c3064fd6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0627e2b5a76dead859212d4cab116c160df21404 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './nonlocal_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..9d4dc7390370d0ffe21e7dcb686eeff7261952c4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..b0672b687ade8d554b71fdf0bc54de9f024fa30c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..b1adfbab882d9825a3f348ed99e401d1f164cd11 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..2e808d8072f34d09a7b0859f90261dd66c8815dd --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..66b443abec3282242c0f794a2f91e066596e7ee9 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..8a7a2f509ba6627ad5ab972ac090362bbcd2ecb7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..75adef324877d56c157b457eecbf8446aa6b192f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a0726c293d6026898110f7fa55d5e7d2d55d7a02 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/nonlocal_net/nonlocal_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/nonlocal_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0a4c75c708330be9163dae675b0c1f84d7722728 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/README.md @@ -0,0 +1,69 @@ +# Object-Contextual Representations for Semantic Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@article{YuanW18, + title={Ocnet: Object context network for scene parsing}, + author={Yuhui Yuan and Jingdong Wang}, + journal={arXiv preprint arXiv:1809.00916}, + year={2018} +} + +@inproceedings{YuanCW20, + 
title={Object-Contextual Representations for Semantic Segmentation}, + author={Yuhui Yuan and Xilin Chen and Jingdong Wang}, + booktitle={ECCV}, + year={2020} +} +``` + +## Results and models + +### Cityscapes + +#### HRNet backbone + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| OCRNet | HRNetV2p-W18-Small | 512x1024 | 40000 | 3.5 | 10.45 | 74.30 | 75.95 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes/ocrnet_hr18s_512x1024_40k_cityscapes_20200601_033304-fa2436c2.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes/ocrnet_hr18s_512x1024_40k_cityscapes_20200601_033304.log.json) | +| OCRNet | HRNetV2p-W18 | 512x1024 | 40000 | 4.7 | 7.50 | 77.72 | 79.49 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes/ocrnet_hr18_512x1024_40k_cityscapes_20200601_033320-401c5bdd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes/ocrnet_hr18_512x1024_40k_cityscapes_20200601_033320.log.json) | +| OCRNet | HRNetV2p-W48 | 512x1024 | 40000 | 8 | 4.22 | 80.58 | 81.79 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x1024_40k_cityscapes/ocrnet_hr48_512x1024_40k_cityscapes_20200601_033336-55b32491.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x1024_40k_cityscapes/ocrnet_hr48_512x1024_40k_cityscapes_20200601_033336.log.json) | +| OCRNet | HRNetV2p-W18-Small | 512x1024 | 80000 | - | - | 77.16 | 78.66 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes/ocrnet_hr18s_512x1024_80k_cityscapes_20200601_222735-55979e63.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes/ocrnet_hr18s_512x1024_80k_cityscapes_20200601_222735.log.json) | +| OCRNet | HRNetV2p-W18 | 512x1024 | 80000 | - | - | 78.57 | 80.46 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes/ocrnet_hr18_512x1024_80k_cityscapes_20200614_230521-c2e1dd4a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes/ocrnet_hr18_512x1024_80k_cityscapes_20200614_230521.log.json) | +| OCRNet | HRNetV2p-W48 | 512x1024 | 80000 | - | - | 80.70 | 81.87 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x1024_80k_cityscapes/ocrnet_hr48_512x1024_80k_cityscapes_20200601_222752-9076bcdf.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x1024_80k_cityscapes/ocrnet_hr48_512x1024_80k_cityscapes_20200601_222752.log.json) | +| OCRNet | HRNetV2p-W18-Small | 512x1024 | 160000 | - | - | 78.45 | 79.97 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes/ocrnet_hr18s_512x1024_160k_cityscapes_20200602_191005-f4a7af28.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes/ocrnet_hr18s_512x1024_160k_cityscapes_20200602_191005.log.json) | +| OCRNet | HRNetV2p-W18 | 512x1024 | 160000 | - | - | 79.47 | 80.91 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes/ocrnet_hr18_512x1024_160k_cityscapes_20200602_191001-b9172d0c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes/ocrnet_hr18_512x1024_160k_cityscapes_20200602_191001.log.json) | +| OCRNet | HRNetV2p-W48 | 512x1024 | 160000 | - | - | 81.35 | 82.70 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x1024_160k_cityscapes/ocrnet_hr48_512x1024_160k_cityscapes_20200602_191037-dfbf1b0c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x1024_160k_cityscapes/ocrnet_hr48_512x1024_160k_cityscapes_20200602_191037.log.json) | + +#### ResNet backbone + +| Method | Backbone | Crop Size | Batch Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------|----------|-----------|----------------|------|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| OCRNet | R-101-D8 | 512x1024 | 8 | 40000 | - | - | 80.09 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_r101-d8_512x1024_40k_b8_cityscapes/ocrnet_r101-d8_512x1024_40k_b8_cityscapes-02ac0f13.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_r101-d8_512x1024_40k_b8_cityscapes/ocrnet_r101-d8_512x1024_40k_b8_cityscapes_20200717_110721.log.json) | +| OCRNet | R-101-D8 | 512x1024 | 16 | 40000 | 8.8 | 3.02 | 80.30 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_r101-d8_512x1024_40k_b16_cityscapes/ocrnet_r101-d8_512x1024_40k_b16_cityscapes-db500f80.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_r101-d8_512x1024_40k_b16_cityscapes/ocrnet_r101-d8_512x1024_40k_b16_cityscapes_20200723_193726.log.json) | +| OCRNet | R-101-D8 | 512x1024 | 16 | 80000 | 8.8 | 3.02 | 80.81 | - | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_r101-d8_512x1024_80k_b16_cityscapes/ocrnet_r101-d8_512x1024_80k_b16_cityscapes-78688424.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_r101-d8_512x1024_80k_b16_cityscapes/ocrnet_r101-d8_512x1024_80k_b16_cityscapes_20200723_192421.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| OCRNet | HRNetV2p-W18-Small | 512x512 | 80000 | 6.7 | 28.98 | 
35.06 | 35.80 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_80k_ade20k/ocrnet_hr18s_512x512_80k_ade20k_20200615_055600-e80b62af.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_80k_ade20k/ocrnet_hr18s_512x512_80k_ade20k_20200615_055600.log.json) | +| OCRNet | HRNetV2p-W18 | 512x512 | 80000 | 7.9 | 18.93 | 37.79 | 39.16 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_80k_ade20k/ocrnet_hr18_512x512_80k_ade20k_20200615_053157-d173d83b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_80k_ade20k/ocrnet_hr18_512x512_80k_ade20k_20200615_053157.log.json) | +| OCRNet | HRNetV2p-W48 | 512x512 | 80000 | 11.2 | 16.99 | 43.00 | 44.30 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_80k_ade20k/ocrnet_hr48_512x512_80k_ade20k_20200615_021518-d168c2d1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_80k_ade20k/ocrnet_hr48_512x512_80k_ade20k_20200615_021518.log.json) | +| OCRNet | HRNetV2p-W18-Small | 512x512 | 160000 | - | - | 37.19 | 38.40 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_160k_ade20k/ocrnet_hr18s_512x512_160k_ade20k_20200615_184505-8e913058.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_160k_ade20k/ocrnet_hr18s_512x512_160k_ade20k_20200615_184505.log.json) | +| OCRNet | HRNetV2p-W18 | 512x512 | 160000 | - | - | 39.32 | 40.80 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_160k_ade20k/ocrnet_hr18_512x512_160k_ade20k_20200615_200940-d8fcd9d1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_160k_ade20k/ocrnet_hr18_512x512_160k_ade20k_20200615_200940.log.json) | +| OCRNet | HRNetV2p-W48 | 512x512 | 160000 | - | - | 43.25 | 44.88 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_160k_ade20k/ocrnet_hr48_512x512_160k_ade20k_20200615_184705-a073726d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_160k_ade20k/ocrnet_hr48_512x512_160k_ade20k_20200615_184705.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|--------------------|-----------|--------:|----------|----------------|------:|--------------:|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| OCRNet | HRNetV2p-W18-Small | 512x512 | 20000 | 3.5 | 31.55 | 71.70 | 73.84 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_20k_voc12aug/ocrnet_hr18s_512x512_20k_voc12aug_20200617_233913-02b04fcb.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_20k_voc12aug/ocrnet_hr18s_512x512_20k_voc12aug_20200617_233913.log.json) | +| OCRNet | HRNetV2p-W18 | 512x512 | 20000 | 4.7 | 19.91 | 74.75 | 77.11 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_20k_voc12aug/ocrnet_hr18_512x512_20k_voc12aug_20200617_233932-8954cbb7.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_20k_voc12aug/ocrnet_hr18_512x512_20k_voc12aug_20200617_233932.log.json) | +| OCRNet | HRNetV2p-W48 | 512x512 | 20000 | 8.1 | 17.83 | 77.72 | 79.87 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_20k_voc12aug/ocrnet_hr48_512x512_20k_voc12aug_20200617_233932-9e82080a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_20k_voc12aug/ocrnet_hr48_512x512_20k_voc12aug_20200617_233932.log.json) | +| OCRNet | HRNetV2p-W18-Small | 512x512 | 40000 | - | - | 72.76 | 74.60 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_40k_voc12aug/ocrnet_hr18s_512x512_40k_voc12aug_20200614_002025-42b587ac.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18s_512x512_40k_voc12aug/ocrnet_hr18s_512x512_40k_voc12aug_20200614_002025.log.json) | +| OCRNet | HRNetV2p-W18 | 512x512 | 40000 | - | - | 74.98 | 77.40 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_40k_voc12aug/ocrnet_hr18_512x512_40k_voc12aug_20200614_015958-714302be.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr18_512x512_40k_voc12aug/ocrnet_hr18_512x512_40k_voc12aug_20200614_015958.log.json) | +| OCRNet | HRNetV2p-W48 | 512x512 | 40000 | - | - | 77.14 | 79.71 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_40k_voc12aug/ocrnet_hr48_512x512_40k_voc12aug_20200614_015958-255bc5ce.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/ocrnet/ocrnet_hr48_512x512_40k_voc12aug/ocrnet_hr48_512x512_40k_voc12aug_20200614_015958.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1c86eba17c46a863091d999b1a090e1237202ec5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_160k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/ocrnet_hr18.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..2c73b3839c8c1bc859eb3b8864256a00cfd022fe --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/ocrnet_hr18.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..506ad9319a9418f50650c477698c9b5cb9bf6663 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/ocrnet_hr18.py', 
'../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..a3c86e18ea65c6aaa36a4fb6e2708f08c7ae1698 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_160k_ade20k.py @@ -0,0 +1,35 @@ +_base_ = [ + '../_base_/models/ocrnet_hr18.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict(decode_head=[ + dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + channels=sum([18, 36, 72, 144]), + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + concat_input=False, + dropout_ratio=-1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + channels=512, + ocr_channels=256, + dropout_ratio=-1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), +]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..ab9d6446c9089bfae533b9dcd66e1352d81f74d0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_20k_voc12aug.py @@ -0,0 +1,36 @@ +_base_ = [ + '../_base_/models/ocrnet_hr18.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict(decode_head=[ + dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + channels=sum([18, 36, 72, 144]), + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + concat_input=False, + dropout_ratio=-1, + num_classes=21, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + channels=512, + ocr_channels=256, + dropout_ratio=-1, + num_classes=21, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), +]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..df79a9cf13963d26384b00ced0cf5efa9f68a420 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_40k_voc12aug.py @@ -0,0 +1,36 @@ +_base_ = [ + '../_base_/models/ocrnet_hr18.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + 
'../_base_/schedules/schedule_40k.py' +] +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict(decode_head=[ + dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + channels=sum([18, 36, 72, 144]), + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + concat_input=False, + dropout_ratio=-1, + num_classes=21, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + channels=512, + ocr_channels=256, + dropout_ratio=-1, + num_classes=21, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), +]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..6ad67722a50c2b2ece5fcb7f0dd1819061ff6b3e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18_512x512_80k_ade20k.py @@ -0,0 +1,35 @@ +_base_ = [ + '../_base_/models/ocrnet_hr18.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict(decode_head=[ + dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + channels=sum([18, 36, 72, 144]), + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + concat_input=False, + dropout_ratio=-1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + channels=512, + ocr_channels=256, + dropout_ratio=-1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), +]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..fc7909785f743071cad2cd1032000405435f81d4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_160k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './ocrnet_hr18_512x1024_160k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..923731f74f80c11e196f6099b1c84875686cd441 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = 
'./ocrnet_hr18_512x1024_40k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..be6bf16a2fd234f3526bf8fb8c30179f1ef9df78 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './ocrnet_hr18_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..81f3d5cb91607134bb1d844d78df7a3c411c134d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_160k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = './ocrnet_hr18_512x512_160k_ade20k.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..ceb944815b5a979ddb72015295375f6fe0c31a89 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_20k_voc12aug.py @@ -0,0 +1,9 @@ +_base_ = './ocrnet_hr18_512x512_20k_voc12aug.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..70babc91c99eb99ee4f941b34ea886236531832e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_40k_voc12aug.py @@ -0,0 +1,9 @@ +_base_ = './ocrnet_hr18_512x512_40k_voc12aug.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..36e77219ac2d7ee6795db7c40ad7341749a3b1c7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr18s_512x512_80k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = './ocrnet_hr18_512x512_80k_ade20k.py' +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w18_small', + backbone=dict( + extra=dict( + stage1=dict(num_blocks=(2, )), + stage2=dict(num_blocks=(2, 2)), + stage3=dict(num_modules=3, num_blocks=(2, 2, 2)), + stage4=dict(num_modules=2, num_blocks=(2, 2, 2, 2))))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_160k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_160k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..c094391b1dfcef2fa6278f0c181fb50c303f7a4c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_160k_cityscapes.py @@ -0,0 +1,39 @@ +_base_ = './ocrnet_hr18_512x1024_160k_cityscapes.py' +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[48, 96, 192, 384], + channels=sum([48, 96, 192, 384]), + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + kernel_size=1, + num_convs=1, + norm_cfg=norm_cfg, + concat_input=False, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[48, 96, 192, 384], + channels=512, + ocr_channels=256, + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + norm_cfg=norm_cfg, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0aada9d8dcd792fd4fc7da8908cc11d44a9ff521 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_40k_cityscapes.py @@ -0,0 +1,39 @@ +_base_ = './ocrnet_hr18_512x1024_40k_cityscapes.py' +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[48, 96, 192, 384], + channels=sum([48, 96, 192, 384]), + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + kernel_size=1, + num_convs=1, + norm_cfg=norm_cfg, + concat_input=False, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[48, 96, 
192, 384], + channels=512, + ocr_channels=256, + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + norm_cfg=norm_cfg, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1b2e0094393151fa8975a0d53c48b6048b7e1929 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x1024_80k_cityscapes.py @@ -0,0 +1,39 @@ +_base_ = './ocrnet_hr18_512x1024_80k_cityscapes.py' +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[48, 96, 192, 384], + channels=sum([48, 96, 192, 384]), + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + kernel_size=1, + num_convs=1, + norm_cfg=norm_cfg, + concat_input=False, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[48, 96, 192, 384], + channels=512, + ocr_channels=256, + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + norm_cfg=norm_cfg, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..3b3e8af9538e6ce3c929a902e3d1ee5be53469a5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_160k_ade20k.py @@ -0,0 +1,39 @@ +_base_ = './ocrnet_hr18_512x512_160k_ade20k.py' +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[48, 96, 192, 384], + channels=sum([48, 96, 192, 384]), + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + kernel_size=1, + num_convs=1, + norm_cfg=norm_cfg, + concat_input=False, + dropout_ratio=-1, + num_classes=150, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[48, 96, 192, 384], + channels=512, + ocr_channels=256, + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + norm_cfg=norm_cfg, + dropout_ratio=-1, + num_classes=150, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_20k_voc12aug.py new file mode 100644 
index 0000000000000000000000000000000000000000..c2dd6d1158bd31ecdd7874827fd37bffb5d26db6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_20k_voc12aug.py @@ -0,0 +1,39 @@ +_base_ = './ocrnet_hr18_512x512_20k_voc12aug.py' +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[48, 96, 192, 384], + channels=sum([48, 96, 192, 384]), + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + kernel_size=1, + num_convs=1, + norm_cfg=norm_cfg, + concat_input=False, + dropout_ratio=-1, + num_classes=21, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[48, 96, 192, 384], + channels=512, + ocr_channels=256, + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + norm_cfg=norm_cfg, + dropout_ratio=-1, + num_classes=21, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..89e6309f55f6b939f7d79271513da4934bbacbb6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_40k_voc12aug.py @@ -0,0 +1,39 @@ +_base_ = './ocrnet_hr18_512x512_40k_voc12aug.py' +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 96, 192, 384)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[48, 96, 192, 384], + channels=sum([48, 96, 192, 384]), + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + kernel_size=1, + num_convs=1, + norm_cfg=norm_cfg, + concat_input=False, + dropout_ratio=-1, + num_classes=21, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[48, 96, 192, 384], + channels=512, + ocr_channels=256, + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + norm_cfg=norm_cfg, + dropout_ratio=-1, + num_classes=21, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..04971226eb0fd6461b715358ac955dfb78102992 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_hr48_512x512_80k_ade20k.py @@ -0,0 +1,39 @@ +_base_ = './ocrnet_hr18_512x512_80k_ade20k.py' +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + pretrained='open-mmlab://msra/hrnetv2_w48', + backbone=dict( + extra=dict( + stage2=dict(num_channels=(48, 96)), + stage3=dict(num_channels=(48, 96, 192)), + stage4=dict(num_channels=(48, 
96, 192, 384)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[48, 96, 192, 384], + channels=sum([48, 96, 192, 384]), + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + kernel_size=1, + num_convs=1, + norm_cfg=norm_cfg, + concat_input=False, + dropout_ratio=-1, + num_classes=150, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=[48, 96, 192, 384], + channels=512, + ocr_channels=256, + input_transform='resize_concat', + in_index=(0, 1, 2, 3), + norm_cfg=norm_cfg, + dropout_ratio=-1, + num_classes=150, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ]) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_40k_b16_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_40k_b16_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..3dd70b74a0bf912d8a6fd39f1f26be7f7571ccd6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_40k_b16_cityscapes.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/ocrnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) +optimizer = dict(lr=0.02) +lr_config = dict(min_lr=2e-4) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_40k_b8_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_40k_b8_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e34f3432e581ff506c9d2951c98b5aad7b1be6a5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_40k_b8_cityscapes.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/ocrnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_80k_b16_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_80k_b16_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..33d96c76f68b92217ed38afe9538144dfedc4fd2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/ocrnet/ocrnet_r101-d8_512x1024_80k_b16_cityscapes.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/ocrnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) +optimizer = dict(lr=0.02) +lr_config = dict(min_lr=2e-4) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/README.md new file mode 100644 index 0000000000000000000000000000000000000000..0dea3e31f8c1eb3da33251fa1e10227ae98561e3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/README.md @@ -0,0 +1,32 @@ +# PointRend: Image Segmentation as Rendering + +## Introduction + +[ALGORITHM] 
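+Like every config family added in this change, the PointRend configs below rely on mmcv-style `_base_` inheritance: a derived file lists its parent configs and overrides only the fields that differ, with nested dicts merged recursively rather than replaced. As a minimal sketch (assuming the `mmcv.Config` API this codebase builds on; the path is one of the files added in this diff), a merged config can be loaded and inspected like this:
+
+```python
+# Minimal sketch, not part of the checked-in code: load a PointRend config
+# added below and inspect the merged result. Assumes mmcv's Config API,
+# with paths relative to the SegFormer directory of this repo.
+from mmcv import Config
+
+cfg = Config.fromfile(
+    'configs/point_rend/pointrend_r50_512x1024_80k_cityscapes.py')
+
+# _base_ files are merged recursively, so model, dataset, and schedule
+# settings from all parent configs are visible on the final object.
+print(cfg.model.decode_head[1]['type'])  # 'PointHead', the refinement head
+print(cfg.lr_config.warmup, cfg.lr_config.warmup_iters)  # 'linear' 200
+```
+
+The same merge semantics explain the two-line R-101 variants throughout this diff: `model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101))` overrides exactly those two fields, and everything else is inherited unchanged from the R-50 parent config.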
+ +``` +@misc{alex2019pointrend, + title={PointRend: Image Segmentation as Rendering}, + author={Alexander Kirillov and Yuxin Wu and Kaiming He and Ross Girshick}, + year={2019}, + eprint={1912.08193}, + archivePrefix={arXiv}, + primaryClass={cs.CV} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|-----------|----------|-----------|--------:|---------:|----------------|------:|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PointRend | R-50 | 512x1024 | 80000 | 3.1 | 8.48 | 76.47 | 78.13 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r50_512x1024_80k_cityscapes/pointrend_r50_512x1024_80k_cityscapes_20200711_015821-bb1ff523.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r50_512x1024_80k_cityscapes/pointrend_r50_512x1024_80k_cityscapes-20200715_214714.log.json) | +| PointRend | R-101 | 512x1024 | 80000 | 4.2 | 7.00 | 78.30 | 79.97 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r101_512x1024_80k_cityscapes/pointrend_r101_512x1024_80k_cityscapes_20200711_170850-d0ca84be.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r101_512x1024_80k_cityscapes/pointrend_r101_512x1024_80k_cityscapes-20200715_214824.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|-----------|----------|-----------|--------:|---------:|----------------|------:|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PointRend | R-50 | 512x512 | 160000 | 5.1 | 17.31 | 37.64 | 39.17 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r50_512x512_160k_ade20k/pointrend_r50_512x512_160k_ade20k_20200807_232644-ac3febf2.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r50_512x512_160k_ade20k/pointrend_r50_512x512_160k_ade20k-20200807_232644.log.json) | +| PointRend | R-101 | 512x512 | 160000 | 6.1 | 15.50 | 40.02 | 41.60 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r101_512x512_160k_ade20k/pointrend_r101_512x512_160k_ade20k_20200808_030852-8834902a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/point_rend/pointrend_r101_512x512_160k_ade20k/pointrend_r101_512x512_160k_ade20k-20200808_030852.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r101_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r101_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a8c14c8cf91d7cbcc05065a6dc387101dff8cdf6 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r101_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './pointrend_r50_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r101_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r101_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..4d1f8c8154431b056fb8371772f03dfa49ac1ad3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r101_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './pointrend_r50_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r50_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r50_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..96cbaa48d61ee208117d074e9f06bf4218407d78 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r50_512x1024_80k_cityscapes.py @@ -0,0 +1,5 @@ +_base_ = [ + '../_base_/models/pointrend_r50.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +lr_config = dict(warmup='linear', warmup_iters=200) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r50_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r50_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..db8c634c0f889c69ce80f86c445c493dcfdbd3c8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/point_rend/pointrend_r50_512x512_160k_ade20k.py @@ -0,0 +1,32 @@ +_base_ = [ + '../_base_/models/pointrend_r50.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict(decode_head=[ + dict( + type='FPNHead', + in_channels=[256, 256, 256, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=-1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + dict( + type='PointHead', + in_channels=[256], + in_index=[0], + channels=256, + num_fcs=3, + coarse_pred_each_layer=True, + dropout_ratio=-1, + num_classes=150, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) +]) +lr_config = dict(warmup='linear', warmup_iters=200) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..fcb24103b8e2dd649c0ad8938319f201e3254d19 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/README.md @@ -0,0 +1,48 @@ +# PSANet: Point-wise Spatial Attention Network for Scene Parsing + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{zhao2018psanet, + title={Psanet: Point-wise spatial attention network for scene parsing}, 
+ author={Zhao, Hengshuang and Zhang, Yi and Liu, Shu and Shi, Jianping and Change Loy, Chen and Lin, Dahua and Jia, Jiaya}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + pages={267--283}, + year={2018} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PSANet | R-50-D8 | 512x1024 | 40000 | 7 | 3.17 | 77.63 | 79.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x1024_40k_cityscapes/psanet_r50-d8_512x1024_40k_cityscapes_20200606_103117-99fac37c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x1024_40k_cityscapes/psanet_r50-d8_512x1024_40k_cityscapes_20200606_103117.log.json) | +| PSANet | R-101-D8 | 512x1024 | 40000 | 10.5 | 2.20 | 79.14 | 80.19 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x1024_40k_cityscapes/psanet_r101-d8_512x1024_40k_cityscapes_20200606_001418-27b9cfa7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x1024_40k_cityscapes/psanet_r101-d8_512x1024_40k_cityscapes_20200606_001418.log.json) | +| PSANet | R-50-D8 | 769x769 | 40000 | 7.9 | 1.40 | 77.99 | 79.64 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_769x769_40k_cityscapes/psanet_r50-d8_769x769_40k_cityscapes_20200530_033717-d5365506.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_769x769_40k_cityscapes/psanet_r50-d8_769x769_40k_cityscapes_20200530_033717.log.json) | +| PSANet | R-101-D8 | 769x769 | 40000 | 11.9 | 0.98 | 78.43 | 80.26 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_769x769_40k_cityscapes/psanet_r101-d8_769x769_40k_cityscapes_20200530_035107-997da1e6.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_769x769_40k_cityscapes/psanet_r101-d8_769x769_40k_cityscapes_20200530_035107.log.json) | +| PSANet | R-50-D8 | 512x1024 | 80000 | - | - | 77.24 | 78.69 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x1024_80k_cityscapes/psanet_r50-d8_512x1024_80k_cityscapes_20200606_161842-ab60a24f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x1024_80k_cityscapes/psanet_r50-d8_512x1024_80k_cityscapes_20200606_161842.log.json) | +| PSANet | R-101-D8 | 512x1024 | 80000 | - | - | 79.31 | 80.53 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x1024_80k_cityscapes/psanet_r101-d8_512x1024_80k_cityscapes_20200606_161823-0f73a169.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x1024_80k_cityscapes/psanet_r101-d8_512x1024_80k_cityscapes_20200606_161823.log.json) | +| PSANet | R-50-D8 | 769x769 | 80000 | - | - | 79.31 | 80.91 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_769x769_80k_cityscapes/psanet_r50-d8_769x769_80k_cityscapes_20200606_225134-fe42f49e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_769x769_80k_cityscapes/psanet_r50-d8_769x769_80k_cityscapes_20200606_225134.log.json) | +| PSANet | R-101-D8 | 769x769 | 80000 | - | - | 79.69 | 80.89 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_769x769_80k_cityscapes/psanet_r101-d8_769x769_80k_cityscapes_20200606_214550-7665827b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_769x769_80k_cityscapes/psanet_r101-d8_769x769_80k_cityscapes_20200606_214550.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PSANet | R-50-D8 | 512x512 | 80000 | 9 | 18.91 | 41.14 | 41.91 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_80k_ade20k/psanet_r50-d8_512x512_80k_ade20k_20200614_144141-835e4b97.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_80k_ade20k/psanet_r50-d8_512x512_80k_ade20k_20200614_144141.log.json) | +| PSANet | R-101-D8 | 512x512 | 80000 | 12.5 | 13.13 | 43.80 | 44.75 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_80k_ade20k/psanet_r101-d8_512x512_80k_ade20k_20200614_185117-1fab60d4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_80k_ade20k/psanet_r101-d8_512x512_80k_ade20k_20200614_185117.log.json) | +| PSANet | R-50-D8 | 512x512 | 160000 | - | - | 41.67 | 42.95 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_160k_ade20k/psanet_r50-d8_512x512_160k_ade20k_20200615_161258-148077dd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_160k_ade20k/psanet_r50-d8_512x512_160k_ade20k_20200615_161258.log.json) | +| PSANet | R-101-D8 | 512x512 | 160000 | - | - | 43.74 | 45.38 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_160k_ade20k/psanet_r101-d8_512x512_160k_ade20k_20200615_161537-dbfa564c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_160k_ade20k/psanet_r101-d8_512x512_160k_ade20k_20200615_161537.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PSANet | 
R-50-D8 | 512x512 | 20000 | 6.9 | 18.24 | 76.39 | 77.34 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_20k_voc12aug/psanet_r50-d8_512x512_20k_voc12aug_20200617_102413-2f1bbaa1.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_20k_voc12aug/psanet_r50-d8_512x512_20k_voc12aug_20200617_102413.log.json) | +| PSANet | R-101-D8 | 512x512 | 20000 | 10.4 | 12.63 | 77.91 | 79.30 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_20k_voc12aug/psanet_r101-d8_512x512_20k_voc12aug_20200617_110624-946fef11.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_20k_voc12aug/psanet_r101-d8_512x512_20k_voc12aug_20200617_110624.log.json) | +| PSANet | R-50-D8 | 512x512 | 40000 | - | - | 76.30 | 77.35 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_40k_voc12aug/psanet_r50-d8_512x512_40k_voc12aug_20200613_161946-f596afb5.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r50-d8_512x512_40k_voc12aug/psanet_r50-d8_512x512_40k_voc12aug_20200613_161946.log.json) | +| PSANet | R-101-D8 | 512x512 | 40000 | - | - | 77.73 | 79.05 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_40k_voc12aug/psanet_r101-d8_512x512_40k_voc12aug_20200613_161946-1f560f9e.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/psanet/psanet_r101-d8_512x512_40k_voc12aug/psanet_r101-d8_512x512_40k_voc12aug_20200613_161946.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..69d212f158552cf5a24f62174b24a9d4976477bb --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..bc25d6aaf67ccb7e9fcb44ba2d803bebfa31b160 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..7f6795e5ef0e4bf1d10ee7ed4f608bf93ac24216 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_20k_voc12aug.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..1a3c43495bbf9d302216d7ddf62df75446907a36 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..f62eef9773ddf41d996104de571bcda00c488e14 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..f8865a7c4d795d9de3f5bc6b762b305b3cabc22f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..ffc99f010903267fc7c1893f4a6b0dcd2cbe42e6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..6a9efc55ad2062facf3a568f8cdbba76c8c55950 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './psanet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..6671fcb4bf8430bc0128cd93a4b8cedea1856b03 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', 
'../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a441013a4c1adc39fc064dbac23caaac9efdc4a6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..9c6364eb43e2abc95011205b569627ff9367d0e5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(mask_size=(66, 66), num_classes=150), + auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..af06cb66cc808c206d6946a4b2420a6942d3dc7e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..803c42da35eda861bf32ce0e7866cdc9fad96d0d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..0141a6d0925c2a2aa37517670a9f12ac7d3a02d4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + 
decode_head=dict(mask_size=(66, 66), num_classes=150), + auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..690f8b5ef359be8a8be3a2d768aede24216a8706 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..0966b4770cc649e95525c366b09801408b99567a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/psanet/psanet_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/psanet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..931cad900680281d6970626b511124704e954c43 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/README.md @@ -0,0 +1,62 @@ +# Pyramid Scene Parsing Network + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{zhao2017pspnet, + title={Pyramid Scene Parsing Network}, + author={Zhao, Hengshuang and Shi, Jianping and Qi, Xiaojuan and Wang, Xiaogang and Jia, Jiaya}, + booktitle={CVPR}, + year={2017} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PSPNet | R-50-D8 | 512x1024 | 40000 | 6.1 | 4.07 | 77.85 | 79.18 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338.log.json) | +| PSPNet | R-101-D8 | 512x1024 | 40000 | 9.6 | 2.68 | 78.34 | 79.74 | 
[model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_40k_cityscapes/pspnet_r101-d8_512x1024_40k_cityscapes_20200604_232751-467e7cf4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_40k_cityscapes/pspnet_r101-d8_512x1024_40k_cityscapes_20200604_232751.log.json) | +| PSPNet | R-50-D8 | 769x769 | 40000 | 6.9 | 1.76 | 78.26 | 79.88 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_769x769_40k_cityscapes/pspnet_r50-d8_769x769_40k_cityscapes_20200606_112725-86638686.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_769x769_40k_cityscapes/pspnet_r50-d8_769x769_40k_cityscapes_20200606_112725.log.json) | +| PSPNet | R-101-D8 | 769x769 | 40000 | 10.9 | 1.15 | 79.08 | 80.28 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_769x769_40k_cityscapes/pspnet_r101-d8_769x769_40k_cityscapes_20200606_112753-61c6f5be.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_769x769_40k_cityscapes/pspnet_r101-d8_769x769_40k_cityscapes_20200606_112753.log.json) | +| PSPNet | R-18-D8 | 512x1024 | 80000 | 1.7 | 15.71 | 74.87 | 76.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes/pspnet_r18-d8_512x1024_80k_cityscapes_20201225_021458-09ffa746.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes/pspnet_r18-d8_512x1024_80k_cityscapes-20201225_021458.log.json) | +| PSPNet | R-50-D8 | 512x1024 | 80000 | - | - | 78.55 | 79.79 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes/pspnet_r50-d8_512x1024_80k_cityscapes_20200606_112131-2376f12b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes/pspnet_r50-d8_512x1024_80k_cityscapes_20200606_112131.log.json) | +| PSPNet | R-101-D8 | 512x1024 | 80000 | - | - | 79.76 | 81.01 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes/pspnet_r101-d8_512x1024_80k_cityscapes_20200606_112211-e1e1100f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes/pspnet_r101-d8_512x1024_80k_cityscapes_20200606_112211.log.json) | +| PSPNet | R-18-D8 | 769x769 | 80000 | 1.9 | 6.20 | 75.90 | 77.86 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18-d8_769x769_80k_cityscapes/pspnet_r18-d8_769x769_80k_cityscapes_20201225_021458-3deefc62.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18-d8_769x769_80k_cityscapes/pspnet_r18-d8_769x769_80k_cityscapes-20201225_021458.log.json) | +| PSPNet | R-50-D8 | 769x769 | 80000 | - | - | 79.59 | 80.69 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_769x769_80k_cityscapes/pspnet_r50-d8_769x769_80k_cityscapes_20200606_210121-5ccf03dd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_769x769_80k_cityscapes/pspnet_r50-d8_769x769_80k_cityscapes_20200606_210121.log.json) | +| PSPNet | R-101-D8 | 769x769 | 80000 | - | - | 79.77 | 81.06 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_769x769_80k_cityscapes/pspnet_r101-d8_769x769_80k_cityscapes_20200606_225055-dba412fa.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_769x769_80k_cityscapes/pspnet_r101-d8_769x769_80k_cityscapes_20200606_225055.log.json) | +| PSPNet | R-18b-D8 | 512x1024 | 80000 | 1.5 | 16.28 | 74.23 | 75.79 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18b-d8_512x1024_80k_cityscapes/pspnet_r18b-d8_512x1024_80k_cityscapes_20201226_063116-26928a60.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18b-d8_512x1024_80k_cityscapes/pspnet_r18b-d8_512x1024_80k_cityscapes-20201226_063116.log.json) | +| PSPNet | R-50b-D8 | 512x1024 | 80000 | 6.0 | 4.30 | 78.22 | 79.46 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50b-d8_512x1024_80k_cityscapes/pspnet_r50b-d8_512x1024_80k_cityscapes_20201225_094315-6344287a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50b-d8_512x1024_80k_cityscapes/pspnet_r50b-d8_512x1024_80k_cityscapes-20201225_094315.log.json) | +| PSPNet | R-101b-D8| 512x1024 | 80000 | 9.5 | 2.76 | 79.69 | 80.79 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101b-d8_512x1024_80k_cityscapes/pspnet_r101b-d8_512x1024_80k_cityscapes_20201226_170012-3a4d38ab.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101b-d8_512x1024_80k_cityscapes/pspnet_r101b-d8_512x1024_80k_cityscapes-20201226_170012.log.json) | +| PSPNet | R-18b-D8 | 769x769 | 80000 | 1.7 | 6.41 | 74.92 | 76.90 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18b-d8_769x769_80k_cityscapes/pspnet_r18b-d8_769x769_80k_cityscapes_20201226_080942-bf98d186.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r18b-d8_769x769_80k_cityscapes/pspnet_r18b-d8_769x769_80k_cityscapes-20201226_080942.log.json) | +| PSPNet | R-50b-D8 | 769x769 | 80000 | 6.8 | 1.88 | 78.50 | 79.96 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50b-d8_769x769_80k_cityscapes/pspnet_r50b-d8_769x769_80k_cityscapes_20201225_094316-4c643cf6.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50b-d8_769x769_80k_cityscapes/pspnet_r50b-d8_769x769_80k_cityscapes-20201225_094316.log.json) | +| PSPNet | R-101b-D8| 769x769 | 80000 | 10.8 | 1.17 | 78.87 | 80.04 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101b-d8_769x769_80k_cityscapes/pspnet_r101b-d8_769x769_80k_cityscapes_20201226_171823-f0e7c293.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101b-d8_769x769_80k_cityscapes/pspnet_r101b-d8_769x769_80k_cityscapes-20201226_171823.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PSPNet | R-50-D8 | 512x512 | 80000 | 8.5 | 23.53 | 41.13 | 41.94 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_80k_ade20k/pspnet_r50-d8_512x512_80k_ade20k_20200615_014128-15a8b914.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_80k_ade20k/pspnet_r50-d8_512x512_80k_ade20k_20200615_014128.log.json) | +| PSPNet | R-101-D8 | 512x512 | 80000 | 12 | 15.30 | 43.57 | 44.35 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_80k_ade20k/pspnet_r101-d8_512x512_80k_ade20k_20200614_031423-b6e782f0.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_80k_ade20k/pspnet_r101-d8_512x512_80k_ade20k_20200614_031423.log.json) | +| PSPNet | R-50-D8 | 512x512 | 160000 | - | - | 42.48 | 43.44 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_160k_ade20k/pspnet_r50-d8_512x512_160k_ade20k_20200615_184358-1890b0bd.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_160k_ade20k/pspnet_r50-d8_512x512_160k_ade20k_20200615_184358.log.json) | +| PSPNet | R-101-D8 | 512x512 | 160000 | - | - | 44.39 | 45.35 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_160k_ade20k/pspnet_r101-d8_512x512_160k_ade20k_20200615_100650-967c316f.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_160k_ade20k/pspnet_r101-d8_512x512_160k_ade20k_20200615_100650.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PSPNet | R-50-D8 | 512x512 | 20000 | 6.1 | 23.59 | 76.78 | 77.61 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_20k_voc12aug/pspnet_r50-d8_512x512_20k_voc12aug_20200617_101958-ed5dfbd9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_20k_voc12aug/pspnet_r50-d8_512x512_20k_voc12aug_20200617_101958.log.json) | +| PSPNet | R-101-D8 | 512x512 | 20000 | 9.6 | 15.02 | 78.47 | 79.25 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_20k_voc12aug/pspnet_r101-d8_512x512_20k_voc12aug_20200617_102003-4aef3c9a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_20k_voc12aug/pspnet_r101-d8_512x512_20k_voc12aug_20200617_102003.log.json) | +| PSPNet | R-50-D8 | 512x512 | 40000 | - | - | 77.29 | 78.48 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_40k_voc12aug/pspnet_r50-d8_512x512_40k_voc12aug_20200613_161222-ae9c1b8c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r50-d8_512x512_40k_voc12aug/pspnet_r50-d8_512x512_40k_voc12aug_20200613_161222.log.json) | +| PSPNet | R-101-D8 | 512x512 | 40000 | - | - | 78.52 | 79.57 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_40k_voc12aug/pspnet_r101-d8_512x512_40k_voc12aug_20200613_161222-bc933b18.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_512x512_40k_voc12aug/pspnet_r101-d8_512x512_40k_voc12aug_20200613_161222.log.json) 
| + +### Pascal Context + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PSPNet | R-101-D8 | 480x480 | 40000 | 8.8 | 9.68 | 46.60 | 47.78 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_480x480_40k_pascal_context/pspnet_r101-d8_480x480_40k_pascal_context_20200911_211210-bf0f5d7c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_480x480_40k_pascal_context/pspnet_r101-d8_480x480_40k_pascal_context-20200911_211210.log.json) | +| PSPNet | R-101-D8 | 480x480 | 80000 | - | - | 46.03 | 47.15 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_480x480_80k_pascal_context/pspnet_r101-d8_480x480_80k_pascal_context_20200911_190530-c86d6233.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/pspnet/pspnet_r101-d8_480x480_80k_pascal_context/pspnet_r101-d8_480x480_80k_pascal_context-20200911_190530.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..0b5a990604a77238375cb6d2b8298a382a457dd6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_480x480_40k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_480x480_40k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..fda9110603d71e14cab6e537949be191f2adf6db --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_480x480_80k_pascal_context.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_480x480_80k_pascal_context.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..38fee11bc23d8c92c529acd0c02a68204e34ab91 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..9931a07bc2d137eb49b3fa4dad8f8681d4f5e943 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..6107b41544378ad371cee95ee5ebc2e98ccbd9ad --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..2221b202d6c53c4b04f2431d3344379cbfe06dd7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..15f578b6002c481ada06befc3ea66accbbdd1f66 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..fb7c3d55d57b09296ea24889b218f9a0fb997463 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..c6e7e58508f31627766b8ab748bd81cd51c77eca --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..59b8c6dd5ef234334bcdfa3d5e3594b7a9989b17 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..ab8a3d3e3fcc12dd41223af190e2ae04f14d1cb8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = './pspnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..1a7cb708e551e90a12ad4267e2af6938c353f0ba --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r101b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = './pspnet_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet101', + backbone=dict(type='ResNet', depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d914f93c023a6384e0e856b8608280cef589d5c6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './pspnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5893e66a41cad73e8fb24aa58dc78ef002aecca5 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './pspnet_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18b-d8_512x1024_80k_cityscapes.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..abeeedf84387d7846a8a2c10480b94c9d8405559 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './pspnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..284be6d09af1806b99bee5b85286b55ce02e8cbd --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r18b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = './pspnet_r50-d8_769x769_80k_cityscapes.py' +model = dict( + pretrained='torchvision://resnet18', + backbone=dict(type='ResNet', depth=18), + decode_head=dict( + in_channels=512, + channels=128, + ), + auxiliary_head=dict(in_channels=256, channels=64)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_480x480_40k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_480x480_40k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..86da94de5b32576f04240a2d02dfeccc0d6ddd45 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_480x480_40k_pascal_context.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', + '../_base_/datasets/pascal_context.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=60), auxiliary_head=dict(num_classes=60)) +test_cfg = dict(mode='slide', crop_size=(480, 480), stride=(320, 320)) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_480x480_80k_pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_480x480_80k_pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..cbb02714b9e252bab38b3f9d9095dabe570b9005 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_480x480_80k_pascal_context.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', + '../_base_/datasets/pascal_context.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=60), auxiliary_head=dict(num_classes=60)) +test_cfg = dict(mode='slide', crop_size=(480, 480), stride=(320, 320)) +optimizer = dict(type='SGD', lr=0.004, momentum=0.9, weight_decay=0.0001) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..5deb5872b00a30d5c18a980c4d6c1b0d915908b9 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..4e9972849d6899fe43f435284d0e0b1bc3b0e7a9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..86584573a3d1afac73041b85516112ac21f1f17c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..cd88154d5e0be1a519e973331e0a14ae8a7de13e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..f0c20c12f6bcf04b732dccaa4bfdba10bd10b5e6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..52efdf51d7d66c3205c1448c45ae281649a0901e --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..145cadb24016eeea87fccff8171c5b0dfb78f7ab --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..23a81eb7ef56a4cd8e7c9da65b86f3d0e562001a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50-d8_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/pspnet_r50-d8.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50b-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50b-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..946bf4fc84236942a4462c2daa7637cace4e90cf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50b-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_512x1024_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50b-d8_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50b-d8_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..b6087dcf9f7cc04e12a2b9bcbde7abc4a56e972e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/pspnet/pspnet_r50b-d8_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './pspnet_r50-d8_769x769_80k_cityscapes.py' +model = dict(pretrained='torchvision://resnet50', backbone=dict(type='ResNet')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/README.md new file mode 100644 index 0000000000000000000000000000000000000000..31bac01ec9f659d6c30f220a104c385326d3f04b --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/README.md @@ -0,0 +1,34 @@ +# ResNeSt: Split-Attention Networks + +## Introduction + +[ALGORITHM] + +```latex +@article{zhang2020resnest, +title={ResNeSt: Split-Attention Networks}, +author={Zhang, Hang and Wu, Chongruo and Zhang, Zhongyue and Zhu, Yi and Zhang, Zhi and Lin, Haibin and Sun, Yue and He, Tong and Muller, Jonas and Manmatha, R. and Li, Mu and Smola, Alexander}, +journal={arXiv preprint arXiv:2004.08955}, +year={2020} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|---------:|----------------|------:|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | S-101-D8 | 512x1024 | 80000 | 11.4 | 2.39 | 77.56 | 78.98 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/fcn_s101-d8_512x1024_80k_cityscapes/fcn_s101-d8_512x1024_80k_cityscapes_20200807_140631-f8d155b3.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/fcn_s101-d8_512x1024_80k_cityscapes/fcn_s101-d8_512x1024_80k_cityscapes-20200807_140631.log.json) | +| PSPNet | S-101-D8 | 512x1024 | 80000 | 11.8 | 2.52 | 78.57 | 79.19 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/pspnet_s101-d8_512x1024_80k_cityscapes/pspnet_s101-d8_512x1024_80k_cityscapes_20200807_140631-c75f3b99.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/pspnet_s101-d8_512x1024_80k_cityscapes/pspnet_s101-d8_512x1024_80k_cityscapes-20200807_140631.log.json) | +| DeepLabV3 | S-101-D8 | 512x1024 | 80000 | 11.9 | 1.88 | 79.67 | 80.51 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3_s101-d8_512x1024_80k_cityscapes/deeplabv3_s101-d8_512x1024_80k_cityscapes_20200807_144429-b73c4270.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3_s101-d8_512x1024_80k_cityscapes/deeplabv3_s101-d8_512x1024_80k_cityscapes-20200807_144429.log.json) | +| DeepLabV3+ | S-101-D8 | 512x1024 | 80000 | 13.2 | 2.36 | 79.62 | 80.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3plus_s101-d8_512x1024_80k_cityscapes/deeplabv3plus_s101-d8_512x1024_80k_cityscapes_20200807_144429-1239eb43.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3plus_s101-d8_512x1024_80k_cityscapes/deeplabv3plus_s101-d8_512x1024_80k_cityscapes-20200807_144429.log.json) | + +### ADE20k + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|------------|----------|-----------|--------:|---------:|----------------|------:|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FCN | S-101-D8 | 512x512 | 160000 | 14.2 | 
12.86 | 45.62 | 46.16 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/fcn_s101-d8_512x512_160k_ade20k/fcn_s101-d8_512x512_160k_ade20k_20200807_145416-d3160329.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/fcn_s101-d8_512x512_160k_ade20k/fcn_s101-d8_512x512_160k_ade20k-20200807_145416.log.json) | +| PSPNet | S-101-D8 | 512x512 | 160000 | 14.2 | 13.02 | 45.44 | 46.28 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/pspnet_s101-d8_512x512_160k_ade20k/pspnet_s101-d8_512x512_160k_ade20k_20200807_145416-a6daa92a.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/pspnet_s101-d8_512x512_160k_ade20k/pspnet_s101-d8_512x512_160k_ade20k-20200807_145416.log.json) | +| DeepLabV3 | S-101-D8 | 512x512 | 160000 | 14.6 | 9.28 | 45.71 | 46.59 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3_s101-d8_512x512_160k_ade20k/deeplabv3_s101-d8_512x512_160k_ade20k_20200807_144503-17ecabe5.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3_s101-d8_512x512_160k_ade20k/deeplabv3_s101-d8_512x512_160k_ade20k-20200807_144503.log.json) | +| DeepLabV3+ | S-101-D8 | 512x512 | 160000 | 16.2 | 11.96 | 46.47 | 47.27 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3plus_s101-d8_512x512_160k_ade20k/deeplabv3plus_s101-d8_512x512_160k_ade20k_20200807_144503-27b26226.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/resnest/deeplabv3plus_s101-d8_512x512_160k_ade20k/deeplabv3plus_s101-d8_512x512_160k_ade20k-20200807_144503.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3_s101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3_s101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f98398690eb3e1e77975d7fb94ea865424aa331b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3_s101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = '../deeplabv3/deeplabv3_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3_s101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3_s101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..e3924ad679cb3d7ba731322f9cdb67410baae59a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3_s101-d8_512x512_160k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = '../deeplabv3/deeplabv3_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3plus_s101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3plus_s101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..69bef7238345cf6aabb126012af992602f910287 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3plus_s101-d8_512x1024_80k_cityscapes.py 
@@ -0,0 +1,9 @@ +_base_ = '../deeplabv3plus/deeplabv3plus_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3plus_s101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3plus_s101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..d51bccb965dafc40d7859219d132dc9467740a1b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/deeplabv3plus_s101-d8_512x512_160k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = '../deeplabv3plus/deeplabv3plus_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/fcn_s101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/fcn_s101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..33fa0252d8b4cc786f1297605c169ee6068195a4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/fcn_s101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = '../fcn/fcn_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/fcn_s101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/fcn_s101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..dcee8c280e833825f84b944c6db21e9a43125e06 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/fcn_s101-d8_512x512_160k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = '../fcn/fcn_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/pspnet_s101-d8_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/pspnet_s101-d8_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..9737849cbd7470b03ef3fcb3b1225283370eb503 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/pspnet_s101-d8_512x1024_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = '../pspnet/pspnet_r101-d8_512x1024_80k_cityscapes.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/pspnet_s101-d8_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/pspnet_s101-d8_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..6a622eae963401e143004a62ff53071ddbf61c01 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/resnest/pspnet_s101-d8_512x512_160k_ade20k.py @@ -0,0 +1,9 @@ +_base_ = '../pspnet/pspnet_r101-d8_512x512_160k_ade20k.py' +model = dict( + pretrained='open-mmlab://resnest101', + backbone=dict( + type='ResNeSt', + stem_channels=128, + radix=2, + reduction_factor=4, + avg_down_stride=True)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/README.md new file mode 100644 index 0000000000000000000000000000000000000000..c73ade624817b61be2f226c6fae03d0b023c570a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/README.md @@ -0,0 +1,35 @@ +# Panoptic Feature Pyramid Networks + +## Introduction + +[ALGORITHM] + +```latex +@article{Kirillov_2019, + title={Panoptic Feature Pyramid Networks}, + ISBN={9781728132938}, + url={http://dx.doi.org/10.1109/CVPR.2019.00656}, + DOI={10.1109/cvpr.2019.00656}, + journal={2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)}, + publisher={IEEE}, + author={Kirillov, Alexander and Girshick, Ross and He, Kaiming and Dollar, Piotr}, + year={2019}, + month={Jun} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|---------:|----------------|------:|---------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FPN | R-50 | 512x1024 | 80000 | 2.8 | 13.54 | 74.52 | 76.08 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r50_512x1024_80k_cityscapes/fpn_r50_512x1024_80k_cityscapes_20200717_021437-94018a0d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r50_512x1024_80k_cityscapes/fpn_r50_512x1024_80k_cityscapes-20200717_021437.log.json) | +| FPN | R-101 | 512x1024 | 80000 | 3.9 | 10.29 | 75.80 | 77.40 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r101_512x1024_80k_cityscapes/fpn_r101_512x1024_80k_cityscapes_20200717_012416-c5800d4c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r101_512x1024_80k_cityscapes/fpn_r101_512x1024_80k_cityscapes-20200717_012416.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|--------|----------|-----------|--------:|---------:|----------------|------:|---------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| FPN | R-50 | 512x512 | 160000 | 4.9 | 55.77 | 37.49 | 39.09 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r50_512x512_160k_ade20k/fpn_r50_512x512_160k_ade20k_20200718_131734-5b5a6ab9.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r50_512x512_160k_ade20k/fpn_r50_512x512_160k_ade20k-20200718_131734.log.json) | +| 
FPN | R-101 | 512x512 | 160000 | 5.9 | 40.58 | 39.35 | 40.72 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r101_512x512_160k_ade20k/fpn_r101_512x512_160k_ade20k_20200718_131734-306b5004.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/sem_fpn/fpn_r101_512x512_160k_ade20k/fpn_r101_512x512_160k_ade20k-20200718_131734.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r101_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r101_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..d1d3c98e72ec64565f2f93e39ee3511a38469753 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r101_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './fpn_r50_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r18_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r18_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..dd54a65db6ab93624e21affb6a1cc4600b652743 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r18_512x512_80k_ade20k.py @@ -0,0 +1,4 @@ +_base_ = './fpn_r50_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet18_v1c', + backbone=dict(depth=18), + neck=dict(in_channels=[64, 128, 256, 512])) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r50_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r50_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..7a6456acc756745f56e62430cc56854d198695e6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_r50_512x512_80k_ade20k.py @@ -0,0 +1,18 @@ +_base_ = [ + '../_base_/models/fpn_r50.py', + '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py' +] +model = dict(decode_head=dict(num_classes=150)) + +gpu_factor = 2  # mmseg defaults to 4-GPU training; with 8 GPUs here, double the lr and halve the iterations (lr*2, iter/2) +# optimizer +optimizer = dict(type='SGD', lr=0.01*gpu_factor, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=0.0, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=80000//gpu_factor) +checkpoint_config = dict(by_epoch=False, interval=8000//gpu_factor) +evaluation = dict(interval=8000, metric='mIoU') + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_x101324d_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_x101324d_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..86db4a27fb469a4d4060f511ac5d51fcbe048fe9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_x101324d_512x512_80k_ade20k.py @@ -0,0 +1,7 @@ +_base_ = './fpn_r50_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnext101_32x4d', + backbone=dict( + type='ResNeXt', + depth=101, + groups=32, + base_width=4)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_x101644d_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_x101644d_512x512_80k_ade20k.py new file mode 100644 
index 0000000000000000000000000000000000000000..6e5a798eb67a7080f94653b979134ce5213f4a34 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/sem_fpn/fpn_x101644d_512x512_80k_ade20k.py @@ -0,0 +1,7 @@ +_base_ = './fpn_r50_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnext101_64x4d', + backbone=dict( + type='ResNeXt', + depth=101, + groups=64, + base_width=4)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..d815510a19ade68c4962f04b8dee2c317f1788ce --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/README.md @@ -0,0 +1,50 @@ +# U-Net: Convolutional Networks for Biomedical Image Segmentation + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{ronneberger2015u, + title={U-net: Convolutional networks for biomedical image segmentation}, + author={Ronneberger, Olaf and Fischer, Philipp and Brox, Thomas}, + booktitle={International Conference on Medical image computing and computer-assisted intervention}, + pages={234--241}, + year={2015}, + organization={Springer} +} +``` + +## Results and models + +### DRIVE + +| Backbone | Head | Image Size | Crop Size | Stride | Lr schd | Mem (GB) | Inf time (fps) | Dice | download | +|--------|----------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| UNet-S5-D16 | FCN | 584x565 | 64x64 | 42x42 | 40000 | 0.680 | - | 78.67 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_64x64_40k_drive/fcn_unet_s5-d16_64x64_40k_drive_20201223_191051-26cee593.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_64x64_40k_drive/fcn_unet_s5-d16_64x64_40k_drive-20201223_191051.log.json) | +| UNet-S5-D16 | PSPNet | 584x565 | 64x64 | 42x42 | 40000 | 0.599 | - | 78.62 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_64x64_40k_drive/pspnet_unet_s5-d16_64x64_40k_drive_20201227_181818-aac73387.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_64x64_40k_drive/pspnet_unet_s5-d16_64x64_40k_drive-20201227_181818.log.json) | +| UNet-S5-D16 | DeepLabV3 | 584x565 | 64x64 | 42x42 | 40000 | 0.596 | - | 78.69 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_64x64_40k_drive/deeplabv3_unet_s5-d16_64x64_40k_drive_20201226_094047-0671ff20.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_64x64_40k_drive/deeplabv3_unet_s5-d16_64x64_40k_drive-20201226_094047.log.json) | + +### STARE + +| Backbone | Head | Image Size | Crop Size | Stride | Lr schd | Mem (GB) | Inf time (fps) | Dice | download | 
+|--------|----------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| UNet-S5-D16 | FCN | 605x700 | 128x128 | 85x85 | 40000 | 0.968 | - | 81.02 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_128x128_40k_stare/fcn_unet_s5-d16_128x128_40k_stare_20201223_191051-6ea7cfda.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_128x128_40k_stare/fcn_unet_s5-d16_128x128_40k_stare-20201223_191051.log.json) | +| UNet-S5-D16 | PSPNet | 605x700 | 128x128 | 85x85 | 40000 | 0.982 | - | 81.22 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_128x128_40k_stare/pspnet_unet_s5-d16_128x128_40k_stare_20201227_181818-3c2923c4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_128x128_40k_stare/pspnet_unet_s5-d16_128x128_40k_stare-20201227_181818.log.json) | +| UNet-S5-D16 | DeepLabV3 | 605x700 | 128x128 | 85x85 | 40000 | 0.999 | - | 80.93 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_128x128_40k_stare/deeplabv3_unet_s5-d16_128x128_40k_stare_20201226_094047-93dcb93c.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_128x128_40k_stare/deeplabv3_unet_s5-d16_128x128_40k_stare-20201226_094047.log.json) | + +### CHASE_DB1 + +| Backbone | Head | Image Size | Crop Size | Stride | Lr schd | Mem (GB) | Inf time (fps) | Dice | download | +|--------|----------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| UNet-S5-D16 | FCN | 960x999 | 128x128 | 85x85 | 40000 | 0.968 | - | 80.24 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_128x128_40k_chase_db1/fcn_unet_s5-d16_128x128_40k_chase_db1_20201223_191051-95852f45.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_128x128_40k_chase_db1/fcn_unet_s5-d16_128x128_40k_chase_db1-20201223_191051.log.json) | +| UNet-S5-D16 | PSPNet | 960x999 | 128x128 | 85x85 | 40000 | 0.982 | - | 80.36 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_128x128_40k_chase_db1/pspnet_unet_s5-d16_128x128_40k_chase_db1_20201227_181818-68d4e609.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_128x128_40k_chase_db1/pspnet_unet_s5-d16_128x128_40k_chase_db1-20201227_181818.log.json) | +| UNet-S5-D16 | DeepLabV3 | 960x999 | 128x128 | 85x85 | 40000 | 0.999 | - | 80.47 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_128x128_40k_chase_db1/deeplabv3_unet_s5-d16_128x128_40k_chase_db1_20201226_094047-4c5aefa3.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_128x128_40k_chase_db1/deeplabv3_unet_s5-d16_128x128_40k_chase_db1-20201226_094047.log.json) | + +### HRF + +| Backbone | Head | Image Size | Crop Size | Stride | Lr schd | Mem (GB) | Inf time (fps) | Dice | download | +|--------|----------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| UNet-S5-D16 | FCN | 2336x3504 | 256x256 | 170x170 | 40000 | 2.525 | - | 79.45 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_256x256_40k_hrf/fcn_unet_s5-d16_256x256_40k_hrf_20201223_173724-df3ec8c4.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/fcn_unet_s5-d16_256x256_40k_hrf/fcn_unet_s5-d16_256x256_40k_hrf-20201223_173724.log.json) | +| UNet-S5-D16 | PSPNet | 2336x3504 | 256x256 | 170x170 | 40000 | 2.588 | - | 80.07 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_256x256_40k_hrf/pspnet_unet_s5-d16_256x256_40k_hrf_20201227_181818-fdb7e29b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/pspnet_unet_s5-d16_256x256_40k_hrf/pspnet_unet_s5-d16_256x256_40k_hrf-20201227_181818.log.json) | +| UNet-S5-D16 | DeepLabV3 | 2336x3504 | 256x256 | 170x170 | 40000 | 2.604 | - | 80.21 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_256x256_40k_hrf/deeplabv3_unet_s5-d16_256x256_40k_hrf_20201226_094047-3a1fdf85.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/unet/deeplabv3_unet_s5-d16_256x256_40k_hrf/deeplabv3_unet_s5-d16_256x256_40k_hrf-20201226_094047.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_128x128_40k_chase_db1.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_128x128_40k_chase_db1.py new file mode 100644 index 0000000000000000000000000000000000000000..c706cf3548e311a7930e5b58299e05af30c43d98 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_128x128_40k_chase_db1.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/deeplabv3_unet_s5-d16.py', + '../_base_/datasets/chase_db1.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(128, 128), stride=(85, 85))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_128x128_40k_stare.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_128x128_40k_stare.py new file mode 100644 index 0000000000000000000000000000000000000000..0ef02dcc491871f148b1ad038d281d250eb6e2f4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_128x128_40k_stare.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/deeplabv3_unet_s5-d16.py', '../_base_/datasets/stare.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(128, 128), stride=(85, 85))) +evaluation = dict(metric='mDice') diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_256x256_40k_hrf.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_256x256_40k_hrf.py new file mode 100644 index 0000000000000000000000000000000000000000..118428bc44d3078517e231399b131db492f2bc7e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_256x256_40k_hrf.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/deeplabv3_unet_s5-d16.py', '../_base_/datasets/hrf.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(256, 256), stride=(170, 170))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_64x64_40k_drive.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_64x64_40k_drive.py new file mode 100644 index 0000000000000000000000000000000000000000..1f8862a0e89243d67634f37c3aca94ca98feff5c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/deeplabv3_unet_s5-d16_64x64_40k_drive.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/deeplabv3_unet_s5-d16.py', '../_base_/datasets/drive.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(64, 64), stride=(42, 42))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_128x128_40k_chase_db1.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_128x128_40k_chase_db1.py new file mode 100644 index 0000000000000000000000000000000000000000..2bc52d96293f214adf1e3e1878746ed8bd2434f6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_128x128_40k_chase_db1.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_unet_s5-d16.py', '../_base_/datasets/chase_db1.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(128, 128), stride=(85, 85))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_128x128_40k_stare.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_128x128_40k_stare.py new file mode 100644 index 0000000000000000000000000000000000000000..5d836c61dfd568dd4d29d876980001067dcaa200 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_128x128_40k_stare.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_unet_s5-d16.py', '../_base_/datasets/stare.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(128, 128), stride=(85, 85))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_256x256_40k_hrf.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_256x256_40k_hrf.py new file mode 100644 index 0000000000000000000000000000000000000000..be8eec77792f4eb16475dc5ab8607fb5682f0acf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_256x256_40k_hrf.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_unet_s5-d16.py', '../_base_/datasets/hrf.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' 
+] +model = dict(test_cfg=dict(crop_size=(256, 256), stride=(170, 170))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_64x64_40k_drive.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_64x64_40k_drive.py new file mode 100644 index 0000000000000000000000000000000000000000..80483ade4a4bc3dc5cb8805e8b74c100e872da0c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/fcn_unet_s5-d16_64x64_40k_drive.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/fcn_unet_s5-d16.py', '../_base_/datasets/drive.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(64, 64), stride=(42, 42))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_128x128_40k_chase_db1.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_128x128_40k_chase_db1.py new file mode 100644 index 0000000000000000000000000000000000000000..b085a17d6bab5f4d33668bfcf232e30f2a9830fe --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_128x128_40k_chase_db1.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/pspnet_unet_s5-d16.py', + '../_base_/datasets/chase_db1.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(128, 128), stride=(85, 85))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_128x128_40k_stare.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_128x128_40k_stare.py new file mode 100644 index 0000000000000000000000000000000000000000..9d729cea699e1c845549c74b52703c9ee3273662 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_128x128_40k_stare.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/pspnet_unet_s5-d16.py', '../_base_/datasets/stare.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(128, 128), stride=(85, 85))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_256x256_40k_hrf.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_256x256_40k_hrf.py new file mode 100644 index 0000000000000000000000000000000000000000..f57c9166b67a18fd74f474754b3baec6584b17cf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_256x256_40k_hrf.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/pspnet_unet_s5-d16.py', '../_base_/datasets/hrf.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(256, 256), stride=(170, 170))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_64x64_40k_drive.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_64x64_40k_drive.py new file mode 100644 index 0000000000000000000000000000000000000000..7b5421ad6877e4b35b0a6ae6e516e577404547ce --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/unet/pspnet_unet_s5-d16_64x64_40k_drive.py @@ -0,0 +1,6 @@ +_base_ = [ + 
'../_base_/models/pspnet_unet_s5-d16.py', '../_base_/datasets/drive.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] +model = dict(test_cfg=dict(crop_size=(64, 64), stride=(42, 42))) +evaluation = dict(metric='mDice') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/README.md new file mode 100644 index 0000000000000000000000000000000000000000..4d53a92f9bdb67ca9e4c3974ee368ca49d84619c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/README.md @@ -0,0 +1,48 @@ +# Unified Perceptual Parsing for Scene Understanding + +## Introduction + +[ALGORITHM] + +```latex +@inproceedings{xiao2018unified, + title={Unified perceptual parsing for scene understanding}, + author={Xiao, Tete and Liu, Yingcheng and Zhou, Bolei and Jiang, Yuning and Sun, Jian}, + booktitle={Proceedings of the European Conference on Computer Vision (ECCV)}, + pages={418--434}, + year={2018} +} +``` + +## Results and models + +### Cityscapes + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|---------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| UPerNet | R-50 | 512x1024 | 40000 | 6.4 | 4.25 | 77.10 | 78.37 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x1024_40k_cityscapes/upernet_r50_512x1024_40k_cityscapes_20200605_094827-aa54cb54.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x1024_40k_cityscapes/upernet_r50_512x1024_40k_cityscapes_20200605_094827.log.json) | +| UPerNet | R-101 | 512x1024 | 40000 | 7.4 | 3.79 | 78.69 | 80.11 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x1024_40k_cityscapes/upernet_r101_512x1024_40k_cityscapes_20200605_094933-ebce3b10.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x1024_40k_cityscapes/upernet_r101_512x1024_40k_cityscapes_20200605_094933.log.json) | +| UPerNet | R-50 | 769x769 | 40000 | 7.2 | 1.76 | 77.98 | 79.70 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_769x769_40k_cityscapes/upernet_r50_769x769_40k_cityscapes_20200530_033048-92d21539.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_769x769_40k_cityscapes/upernet_r50_769x769_40k_cityscapes_20200530_033048.log.json) | +| UPerNet | R-101 | 769x769 | 40000 | 8.4 | 1.56 | 79.03 | 80.77 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_769x769_40k_cityscapes/upernet_r101_769x769_40k_cityscapes_20200530_040819-83c95d01.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_769x769_40k_cityscapes/upernet_r101_769x769_40k_cityscapes_20200530_040819.log.json) | +| UPerNet | R-50 | 512x1024 | 80000 | - | - | 78.19 | 79.19 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x1024_80k_cityscapes/upernet_r50_512x1024_80k_cityscapes_20200607_052207-848beca8.pth) | 
[log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x1024_80k_cityscapes/upernet_r50_512x1024_80k_cityscapes_20200607_052207.log.json) | +| UPerNet | R-101 | 512x1024 | 80000 | - | - | 79.40 | 80.46 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x1024_80k_cityscapes/upernet_r101_512x1024_80k_cityscapes_20200607_002403-f05f2345.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x1024_80k_cityscapes/upernet_r101_512x1024_80k_cityscapes_20200607_002403.log.json) | +| UPerNet | R-50 | 769x769 | 80000 | - | - | 79.39 | 80.92 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_769x769_80k_cityscapes/upernet_r50_769x769_80k_cityscapes_20200607_005107-82ae7d15.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_769x769_80k_cityscapes/upernet_r50_769x769_80k_cityscapes_20200607_005107.log.json) | +| UPerNet | R-101 | 769x769 | 80000 | - | - | 80.10 | 81.49 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_769x769_80k_cityscapes/upernet_r101_769x769_80k_cityscapes_20200607_001014-082fc334.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_769x769_80k_cityscapes/upernet_r101_769x769_80k_cityscapes_20200607_001014.log.json) | + +### ADE20K + +| Method | Backbone | Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|---------|----------|-----------|--------:|----------|----------------|------:|--------------:|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| UPerNet | R-50 | 512x512 | 80000 | 8.1 | 23.40 | 40.70 | 41.81 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_80k_ade20k/upernet_r50_512x512_80k_ade20k_20200614_144127-ecc8377b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_80k_ade20k/upernet_r50_512x512_80k_ade20k_20200614_144127.log.json) | +| UPerNet | R-101 | 512x512 | 80000 | 9.1 | 20.34 | 42.91 | 43.96 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_80k_ade20k/upernet_r101_512x512_80k_ade20k_20200614_185117-32e4db94.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_80k_ade20k/upernet_r101_512x512_80k_ade20k_20200614_185117.log.json) | +| UPerNet | R-50 | 512x512 | 160000 | - | - | 42.05 | 42.78 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_160k_ade20k/upernet_r50_512x512_160k_ade20k_20200615_184328-8534de8d.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_160k_ade20k/upernet_r50_512x512_160k_ade20k_20200615_184328.log.json) | +| UPerNet | R-101 | 512x512 | 160000 | - | - | 43.82 | 44.85 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_160k_ade20k/upernet_r101_512x512_160k_ade20k_20200615_161951-91b32684.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_160k_ade20k/upernet_r101_512x512_160k_ade20k_20200615_161951.log.json) | + +### Pascal VOC 2012 + Aug + +| Method | Backbone | 
Crop Size | Lr schd | Mem (GB) | Inf time (fps) | mIoU | mIoU(ms+flip) | download | +|---------|----------|-----------|--------:|----------|----------------|------:|--------------:|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| UPerNet | R-50 | 512x512 | 20000 | 6.4 | 23.17 | 74.82 | 76.35 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_20k_voc12aug/upernet_r50_512x512_20k_voc12aug_20200617_165330-5b5890a7.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_20k_voc12aug/upernet_r50_512x512_20k_voc12aug_20200617_165330.log.json) | +| UPerNet | R-101 | 512x512 | 20000 | 7.5 | 19.98 | 77.10 | 78.29 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_20k_voc12aug/upernet_r101_512x512_20k_voc12aug_20200617_165629-f14e7f27.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_20k_voc12aug/upernet_r101_512x512_20k_voc12aug_20200617_165629.log.json) | +| UPerNet | R-50 | 512x512 | 40000 | - | - | 75.92 | 77.44 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_40k_voc12aug/upernet_r50_512x512_40k_voc12aug_20200613_162257-ca9bcc6b.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r50_512x512_40k_voc12aug/upernet_r50_512x512_40k_voc12aug_20200613_162257.log.json) | +| UPerNet | R-101 | 512x512 | 40000 | - | - | 77.43 | 78.56 | [model](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_40k_voc12aug/upernet_r101_512x512_40k_voc12aug_20200613_163549-e26476ac.pth) | [log](https://download.openmmlab.com/mmsegmentation/v0.5/upernet/upernet_r101_512x512_40k_voc12aug/upernet_r101_512x512_40k_voc12aug_20200613_163549.log.json) | diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..b90b597d831a664761d6051397d2b1862feb59c6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x1024_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_512x1024_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..420ca2e42836099213c1f91cb925088cfe7c1269 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x1024_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_512x1024_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_160k_ade20k.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..146f13eb79053cc69d4934d294aad9ba723b2577 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_160k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_512x512_160k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..56345d1806482ac822d709893fe6942f44be6f74 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_20k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_512x512_20k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..0669b741b9b3e3e1a309147b920d3d2a1952ab75 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_40k_voc12aug.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_512x512_40k_voc12aug.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_80k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..abfb9c5d9f35407d590cdc3325006b396ec52820 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_512x512_80k_ade20k.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_512x512_80k_ade20k.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..e5f3a3fae18cb769fd04b0c669785c5728cf479f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_769x769_40k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_769x769_40k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..a709165657d257df4fc76148d225261c63f88d8a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r101_769x769_80k_cityscapes.py @@ -0,0 +1,2 @@ +_base_ = './upernet_r50_769x769_80k_cityscapes.py' +model = dict(pretrained='open-mmlab://resnet101_v1c', backbone=dict(depth=101)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x1024_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x1024_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..d621e89ce62c06424db7c2e5f5fd00a0a2e85a61 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x1024_40k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_40k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x1024_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x1024_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..95fffcc76c2ff4f61f8dd80a00d35b7875262a50 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x1024_80k_cityscapes.py @@ -0,0 +1,4 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', '../_base_/datasets/cityscapes.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_160k_ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_160k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..f5dd9aa4ed59d4939bcb49ffe129a9935e303201 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_160k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_160k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_20k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_20k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..95f5c09567144db47e07fc802b114bedd6a00725 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_20k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_20k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_40k_voc12aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_40k_voc12aug.py new file mode 100644 index 0000000000000000000000000000000000000000..9621fd1f5c24e582b4a1eda18fcc0a13d2bcb953 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_40k_voc12aug.py @@ -0,0 +1,7 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', + '../_base_/datasets/pascal_voc12_aug.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(num_classes=21), auxiliary_head=dict(num_classes=21)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_80k_ade20k.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_80k_ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..f561e309e3bddb439c90af930c4de5a0c7e209a7 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_512x512_80k_ade20k.py @@ -0,0 +1,6 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', '../_base_/datasets/ade20k.py', + '../_base_/default_runtime.py', '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(num_classes=150), auxiliary_head=dict(num_classes=150)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_769x769_40k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_769x769_40k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..89b18aa2840d12e67339ce0b7a0561fa2ba0c6fa --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_769x769_40k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_40k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_769x769_80k_cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_769x769_80k_cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..29af98f2ebe341998fcf93f8a5c018cabcc0c0ba --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/configs/upernet/upernet_r50_769x769_80k_cityscapes.py @@ -0,0 +1,9 @@ +_base_ = [ + '../_base_/models/upernet_r50.py', + '../_base_/datasets/cityscapes_769x769.py', '../_base_/default_runtime.py', + '../_base_/schedules/schedule_80k.py' +] +model = dict( + decode_head=dict(align_corners=True), + auxiliary_head=dict(align_corners=True), + test_cfg=dict(mode='slide', crop_size=(769, 769), stride=(513, 513))) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/demo/image_demo.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/demo/image_demo.py new file mode 100644 index 0000000000000000000000000000000000000000..183f23871b7ff4607066b4e29727313e575b14e4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/demo/image_demo.py @@ -0,0 +1,29 @@ +from argparse import ArgumentParser + +from mmseg.apis import inference_segmentor, init_segmentor, show_result_pyplot +from mmseg.core.evaluation import get_palette + + +def main(): + parser = ArgumentParser() + parser.add_argument('img', help='Image file') + parser.add_argument('config', help='Config file') + parser.add_argument('checkpoint', help='Checkpoint file') + parser.add_argument( + '--device', default='cuda:0', help='Device used for inference') + parser.add_argument( + '--palette', + default='cityscapes', + help='Color palette used for segmentation map') + args = parser.parse_args() + + # build the model from a config file and a checkpoint file + model = init_segmentor(args.config, args.checkpoint, device=args.device) + # test a single image + result = inference_segmentor(model, args.img) + # show the results + show_result_pyplot(model, args.img, result, get_palette(args.palette)) + + +if __name__ == '__main__': + main() diff 
--git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docker/Dockerfile b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docker/Dockerfile new file mode 100644 index 0000000000000000000000000000000000000000..8e090f73a9e5b8aa09eee256e7876c8b4401f055 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docker/Dockerfile @@ -0,0 +1,22 @@ +ARG PYTORCH="1.6.0" +ARG CUDA="10.1" +ARG CUDNN="7" + +FROM pytorch/pytorch:${PYTORCH}-cuda${CUDA}-cudnn${CUDNN}-devel + +ENV TORCH_CUDA_ARCH_LIST="6.0 6.1 7.0+PTX" +ENV TORCH_NVCC_FLAGS="-Xfatbin -compress-all" +ENV CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" + +RUN apt-get update && apt-get install -y git ninja-build libglib2.0-0 libsm6 libxrender-dev libxext6 \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/* + +# Install mmsegmentation +RUN conda clean --all + +RUN pip install mmcv-full==latest+torch1.6.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html +RUN git clone https://github.com/open-mmlab/mmsegmentation.git /mmsegmentation +WORKDIR /mmsegmentation +RUN pip install -r requirements/build.txt +RUN pip install --no-cache-dir -e . diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/Makefile b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/Makefile new file mode 100644 index 0000000000000000000000000000000000000000..d4bb2cbb9eddb1bb1b4f366623044af8e4830919 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/Makefile @@ -0,0 +1,20 @@ +# Minimal makefile for Sphinx documentation +# + +# You can set these variables from the command line, and also +# from the environment for the first two. +SPHINXOPTS ?= +SPHINXBUILD ?= sphinx-build +SOURCEDIR = . +BUILDDIR = _build + +# Put it first so that "make" without argument is like "make help". +help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/changelog.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/changelog.md new file mode 100644 index 0000000000000000000000000000000000000000..faf1df3d217d728ea84f8c4685722018ae523a72 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/changelog.md @@ -0,0 +1,157 @@ +## Changelog + +### V0.11 (02/02/2021) + +**Highlights** + +- Support memory efficient test, add more UNet models. + +**Bug Fixes** + +- Fixed TTA resize scale ([#334](https://github.com/open-mmlab/mmsegmentation/pull/334)) +- Fixed CI for pip 20.3 ([#307](https://github.com/open-mmlab/mmsegmentation/pull/307)) +- Fixed ADE20k test ([#359](https://github.com/open-mmlab/mmsegmentation/pull/359)) + +**New Features** + +- Support memory efficient test ([#330](https://github.com/open-mmlab/mmsegmentation/pull/330)) +- Add more UNet benchmarks ([#324](https://github.com/open-mmlab/mmsegmentation/pull/324)) +- Support Lovasz Loss ([#351](https://github.com/open-mmlab/mmsegmentation/pull/351)) + +**Improvements** + +- Move train_cfg/test_cfg inside model ([#341](https://github.com/open-mmlab/mmsegmentation/pull/341)) + +### V0.10 (01/01/2021) + +**Highlights** + +- Support MobileNetV3, DMNet, APCNet. Add models of ResNet18V1b, ResNet18V1c, ResNet50V1b. 
+ +**Bug Fixes** + +- Fixed CPU TTA ([#276](https://github.com/open-mmlab/mmsegmentation/pull/276)) +- Fixed CI for pip 20.3 ([#307](https://github.com/open-mmlab/mmsegmentation/pull/307)) + +**New Features** + +- Add ResNet18V1b, ResNet18V1c, ResNet50V1b, ResNet101V1b models ([#316](https://github.com/open-mmlab/mmsegmentation/pull/316)) +- Support MobileNetV3 ([#268](https://github.com/open-mmlab/mmsegmentation/pull/268)) +- Add 4 retinal vessel segmentation benchmarks ([#315](https://github.com/open-mmlab/mmsegmentation/pull/315)) +- Support DMNet ([#313](https://github.com/open-mmlab/mmsegmentation/pull/313)) +- Support APCNet ([#299](https://github.com/open-mmlab/mmsegmentation/pull/299)) + +**Improvements** + +- Refactor Documentation page ([#311](https://github.com/open-mmlab/mmsegmentation/pull/311)) +- Support resize data augmentation according to original image size ([#291](https://github.com/open-mmlab/mmsegmentation/pull/291)) + +### V0.9 (30/11/2020) + +**Highlights** + +- Support 4 medical datasets, UNet and CGNet. + +**New Features** + +- Support RandomRotate transform ([#215](https://github.com/open-mmlab/mmsegmentation/pull/215), [#260](https://github.com/open-mmlab/mmsegmentation/pull/260)) +- Support RGB2Gray transform ([#227](https://github.com/open-mmlab/mmsegmentation/pull/227)) +- Support Rerange transform ([#228](https://github.com/open-mmlab/mmsegmentation/pull/228)) +- Support ignore_index for BCE loss ([#210](https://github.com/open-mmlab/mmsegmentation/pull/210)) +- Add modelzoo statistics ([#263](https://github.com/open-mmlab/mmsegmentation/pull/263)) +- Support Dice evaluation metric ([#225](https://github.com/open-mmlab/mmsegmentation/pull/225)) +- Support Adjust Gamma transform ([#232](https://github.com/open-mmlab/mmsegmentation/pull/232)) +- Support CLAHE transform ([#229](https://github.com/open-mmlab/mmsegmentation/pull/229)) + +**Bug Fixes** + +- Fixed detail API link ([#267](https://github.com/open-mmlab/mmsegmentation/pull/267)) + +### V0.8 (03/11/2020) + +**Highlights** + +- Support 4 medical datasets, UNet and CGNet. + +**New Features** + +- Support customizing runner ([#118](https://github.com/open-mmlab/mmsegmentation/pull/118)) +- Support UNet ([#161](https://github.com/open-mmlab/mmsegmentation/pull/162)) +- Support CHASE_DB1, DRIVE, STARE, HRF ([#203](https://github.com/open-mmlab/mmsegmentation/pull/203)) +- Support CGNet ([#223](https://github.com/open-mmlab/mmsegmentation/pull/223)) + +### V0.7 (07/10/2020) + +**Highlights** + +- Support Pascal Context dataset and customizing dataset classes. 
+ +**Bug Fixes** + +- Fixed CPU inference ([#153](https://github.com/open-mmlab/mmsegmentation/pull/153)) + +**New Features** + +- Add DeepLab OS16 models ([#154](https://github.com/open-mmlab/mmsegmentation/pull/154)) +- Support Pascal Context dataset ([#133](https://github.com/open-mmlab/mmsegmentation/pull/133)) +- Support customizing dataset classes ([#71](https://github.com/open-mmlab/mmsegmentation/pull/71)) +- Support customizing dataset palette ([#157](https://github.com/open-mmlab/mmsegmentation/pull/157)) + +**Improvements** + +- Support 4D tensor output in ONNX ([#150](https://github.com/open-mmlab/mmsegmentation/pull/150)) +- Remove redundancies in ONNX export ([#160](https://github.com/open-mmlab/mmsegmentation/pull/160)) +- Migrate to MMCV DepthwiseSeparableConv ([#158](https://github.com/open-mmlab/mmsegmentation/pull/158)) +- Migrate to MMCV collect_env ([#137](https://github.com/open-mmlab/mmsegmentation/pull/137)) +- Use img_prefix and seg_prefix for loading ([#153](https://github.com/open-mmlab/mmsegmentation/pull/153)) + +### V0.6 (10/09/2020) + +**Highlights** + +- Support new methods, i.e. MobileNetV2, EMANet, DNL, PointRend, Semantic FPN, Fast-SCNN and ResNeSt. + +**Bug Fixes** + +- Fixed sliding inference ONNX export ([#90](https://github.com/open-mmlab/mmsegmentation/pull/90)) + +**New Features** + +- Support MobileNet v2 ([#86](https://github.com/open-mmlab/mmsegmentation/pull/86)) +- Support EMANet ([#34](https://github.com/open-mmlab/mmsegmentation/pull/34)) +- Support DNL ([#37](https://github.com/open-mmlab/mmsegmentation/pull/37)) +- Support PointRend ([#109](https://github.com/open-mmlab/mmsegmentation/pull/109)) +- Support Semantic FPN ([#94](https://github.com/open-mmlab/mmsegmentation/pull/94)) +- Support Fast-SCNN ([#58](https://github.com/open-mmlab/mmsegmentation/pull/58)) +- Support ResNeSt backbone ([#47](https://github.com/open-mmlab/mmsegmentation/pull/47)) +- Support ONNX export (experimental) ([#12](https://github.com/open-mmlab/mmsegmentation/pull/12)) + +**Improvements** + +- Support Upsample in ONNX ([#100](https://github.com/open-mmlab/mmsegmentation/pull/100)) +- Support Windows install (experimental) ([#75](https://github.com/open-mmlab/mmsegmentation/pull/75)) +- Add more OCRNet results ([#20](https://github.com/open-mmlab/mmsegmentation/pull/20)) +- Add PyTorch 1.6 CI ([#64](https://github.com/open-mmlab/mmsegmentation/pull/64)) +- Get version and githash automatically ([#55](https://github.com/open-mmlab/mmsegmentation/pull/55)) + +### v0.5.1 (11/08/2020) + +**Highlights** + +- Support FP16 and more generalized OHEM + +**Bug Fixes** + +- Fixed Pascal VOC conversion script (#19) +- Fixed OHEM weight assign bug (#54) +- Fixed palette type when palette is not given (#27) + +**New Features** + +- Support FP16 (#21) +- Generalized OHEM (#54) + +**Improvements** + +- Add load-from flag (#33) +- Fixed the training tricks doc about setting different learning rates for parts of the model (#26) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/conf.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/conf.py new file mode 100644 index 0000000000000000000000000000000000000000..f472acb30abdbcf5191926a8d89f478c1210744c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/conf.py @@ -0,0 +1,88 @@ +# Configuration file for the Sphinx documentation builder. +# +# This file only contains a selection of the most common options. 
For a full +# list see the documentation: +# https://www.sphinx-doc.org/en/master/usage/configuration.html + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import subprocess +import sys + +sys.path.insert(0, os.path.abspath('..')) + +# -- Project information ----------------------------------------------------- + +project = 'MMSegmentation' +copyright = '2020-2020, OpenMMLab' +author = 'MMSegmentation Authors' +version_file = '../mmseg/version.py' + + +def get_version(): + with open(version_file, 'r') as f: + exec(compile(f.read(), version_file, 'exec')) + return locals()['__version__'] + + +# The full version, including alpha/beta/rc tags +release = get_version() + +# -- General configuration --------------------------------------------------- + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.autodoc', + 'sphinx.ext.napoleon', + 'sphinx.ext.viewcode', + 'recommonmark', + 'sphinx_markdown_tables', +] + +autodoc_mock_imports = ['matplotlib', 'pycocotools', 'mmseg.version'] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# +source_suffix = { + '.rst': 'restructuredtext', + '.md': 'markdown', +} + +# The master toctree document. +master_doc = 'index' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path. +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = 'sphinx_rtd_theme' + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] + + +def builder_inited_handler(app): + subprocess.run(['./stat.py']) + + +def setup(app): + app.connect('builder-inited', builder_inited_handler) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/dataset_prepare.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/dataset_prepare.md new file mode 100644 index 0000000000000000000000000000000000000000..5407339f13909c3bf32556dc273076d8bb351ba6 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/dataset_prepare.md @@ -0,0 +1,165 @@ +## Prepare datasets + +It is recommended to symlink the dataset root to `$MMSEGMENTATION/data`. +If your folder structure is different, you may need to change the corresponding paths in config files. 
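For example, assuming the datasets actually live under `/data/openmmlab` (a hypothetical path used only for illustration), the symlink could be created as in the sketch below; this is not a command shipped with the repo, just one way to set up the recommended layout.

```shell
# Hypothetical example: expose an external dataset root inside the repo.
# Replace /data/openmmlab with wherever your datasets are actually stored.
cd $MMSEGMENTATION
ln -s /data/openmmlab data
```

With the symlink (or a real `data` directory) in place, the expected layout is: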
+
+```none
+mmsegmentation
+├── mmseg
+├── tools
+├── configs
+├── data
+│ ├── cityscapes
+│ │ ├── leftImg8bit
+│ │ │ ├── train
+│ │ │ ├── val
+│ │ ├── gtFine
+│ │ │ ├── train
+│ │ │ ├── val
+│ ├── VOCdevkit
+│ │ ├── VOC2012
+│ │ │ ├── JPEGImages
+│ │ │ ├── SegmentationClass
+│ │ │ ├── ImageSets
+│ │ │ │ ├── Segmentation
+│ │ ├── VOC2010
+│ │ │ ├── JPEGImages
+│ │ │ ├── SegmentationClassContext
+│ │ │ ├── ImageSets
+│ │ │ │ ├── SegmentationContext
+│ │ │ │ │ ├── train.txt
+│ │ │ │ │ ├── val.txt
+│ │ │ ├── trainval_merged.json
+│ │ ├── VOCaug
+│ │ │ ├── dataset
+│ │ │ │ ├── cls
+│ ├── ade
+│ │ ├── ADEChallengeData2016
+│ │ │ ├── annotations
+│ │ │ │ ├── training
+│ │ │ │ ├── validation
+│ │ │ ├── images
+│ │ │ │ ├── training
+│ │ │ │ ├── validation
+│ ├── CHASE_DB1
+│ │ ├── images
+│ │ │ ├── training
+│ │ │ ├── validation
+│ │ ├── annotations
+│ │ │ ├── training
+│ │ │ ├── validation
+│ ├── DRIVE
+│ │ ├── images
+│ │ │ ├── training
+│ │ │ ├── validation
+│ │ ├── annotations
+│ │ │ ├── training
+│ │ │ ├── validation
+│ ├── HRF
+│ │ ├── images
+│ │ │ ├── training
+│ │ │ ├── validation
+│ │ ├── annotations
+│ │ │ ├── training
+│ │ │ ├── validation
+│ ├── STARE
+│ │ ├── images
+│ │ │ ├── training
+│ │ │ ├── validation
+│ │ ├── annotations
+│ │ │ ├── training
+│ │ │ ├── validation
+
+```
+
+### Cityscapes
+
+The data can be found [here](https://www.cityscapes-dataset.com/downloads/) after registration.
+
+By convention, `**labelTrainIds.png` are used for Cityscapes training.
+We provide a [script](https://github.com/open-mmlab/mmsegmentation/blob/master/tools/convert_datasets/cityscapes.py) based on [cityscapesscripts](https://github.com/mcordts/cityscapesScripts)
+to generate `**labelTrainIds.png`.
+
+```shell
+# --nproc 8 means using 8 processes for conversion; it can be omitted.
+python tools/convert_datasets/cityscapes.py data/cityscapes --nproc 8
+```
+
+### Pascal VOC
+
+Pascal VOC 2012 can be downloaded from [here](http://host.robots.ox.ac.uk/pascal/VOC/voc2012/VOCtrainval_11-May-2012.tar).
+Besides, most recent works on the Pascal VOC dataset usually exploit extra augmentation data, which can be found [here](http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz).
+
+If you would like to use the augmented VOC dataset, please run the following command to convert the augmentation annotations into the proper format.
+
+```shell
+# --nproc 8 means using 8 processes for conversion; it can be omitted.
+python tools/convert_datasets/voc_aug.py data/VOCdevkit data/VOCdevkit/VOCaug --nproc 8
+```
+
+Please refer to [concat dataset](https://github.com/open-mmlab/mmsegmentation/blob/master/docs/tutorials/new_dataset.md#concatenate-dataset) for details about how to concatenate them and train them together.
+
+### ADE20K
+
+The training and validation set of ADE20K can be downloaded from this [link](http://data.csail.mit.edu/places/ADEchallenge/ADEChallengeData2016.zip).
+The test set may also be downloaded from [here](http://data.csail.mit.edu/places/ADEchallenge/release_test.zip).
+
+### Pascal Context
+
+The training and validation set of Pascal Context can be downloaded from [here](http://host.robots.ox.ac.uk/pascal/VOC/voc2010/VOCtrainval_03-May-2010.tar). You may also download the test set from [here](http://host.robots.ox.ac.uk:8080/eval/downloads/VOC2010test.tar) after registration.
+
+To split the training and validation set from the original dataset, you may download trainval_merged.json from [here](https://codalabuser.blob.core.windows.net/public/trainval_merged.json).
+
+If you would like to use the Pascal Context dataset, please install [Detail](https://github.com/zhanghang1989/detail-api) and then run the following command to convert the annotations into the proper format.
+
+```shell
+python tools/convert_datasets/pascal_context.py data/VOCdevkit data/VOCdevkit/VOC2010/trainval_merged.json
+```
+
+### CHASE DB1
+
+The training and validation set of CHASE DB1 can be downloaded from [here](https://staffnet.kingston.ac.uk/~ku15565/CHASE_DB1/assets/CHASEDB1.zip).
+
+To convert the CHASE DB1 dataset to MMSegmentation format, run the following command:
+
+```shell
+python tools/convert_datasets/chase_db1.py /path/to/CHASEDB1.zip
+```
+
+The script will create the directory structure automatically.
+
+### DRIVE
+
+The training and validation set of DRIVE can be downloaded from [here](https://drive.grand-challenge.org/). Before that, you should register an account. Currently '1st_manual' is not provided officially.
+
+To convert the DRIVE dataset to MMSegmentation format, run the following command:
+
+```shell
+python tools/convert_datasets/drive.py /path/to/training.zip /path/to/test.zip
+```
+
+The script will create the directory structure automatically.
+
+### HRF
+
+First, download [healthy.zip](https://www5.cs.fau.de/fileadmin/research/datasets/fundus-images/healthy.zip), [glaucoma.zip](https://www5.cs.fau.de/fileadmin/research/datasets/fundus-images/glaucoma.zip), [diabetic_retinopathy.zip](https://www5.cs.fau.de/fileadmin/research/datasets/fundus-images/diabetic_retinopathy.zip), [healthy_manualsegm.zip](https://www5.cs.fau.de/fileadmin/research/datasets/fundus-images/healthy_manualsegm.zip), [glaucoma_manualsegm.zip](https://www5.cs.fau.de/fileadmin/research/datasets/fundus-images/glaucoma_manualsegm.zip) and [diabetic_retinopathy_manualsegm.zip](https://www5.cs.fau.de/fileadmin/research/datasets/fundus-images/diabetic_retinopathy_manualsegm.zip).
+
+To convert the HRF dataset to MMSegmentation format, run the following command:
+
+```shell
+python tools/convert_datasets/hrf.py /path/to/healthy.zip /path/to/healthy_manualsegm.zip /path/to/glaucoma.zip /path/to/glaucoma_manualsegm.zip /path/to/diabetic_retinopathy.zip /path/to/diabetic_retinopathy_manualsegm.zip
+```
+
+The script will create the directory structure automatically.
+
+### STARE
+
+First, download [stare-images.tar](http://cecas.clemson.edu/~ahoover/stare/probing/stare-images.tar), [labels-ah.tar](http://cecas.clemson.edu/~ahoover/stare/probing/labels-ah.tar) and [labels-vk.tar](http://cecas.clemson.edu/~ahoover/stare/probing/labels-vk.tar).
+
+To convert the STARE dataset to MMSegmentation format, run the following command:
+
+```shell
+python tools/convert_datasets/stare.py /path/to/stare-images.tar /path/to/labels-ah.tar /path/to/labels-vk.tar
+```
+
+The script will create the directory structure automatically.
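+
+The medical-image conversion scripts above (CHASE DB1, DRIVE, HRF, STARE) all produce the `images`/`annotations` x `training`/`validation` layout shown at the top of this page. The snippet below is a small hypothetical sanity check (not part of the toolbox) that you may run after conversion; the dataset root passed to it is just an example.
+
+```python
+import os
+import os.path as osp
+
+
+def check_dataset(root, splits=('training', 'validation')):
+    """Print the file count of each images/annotations split directory."""
+    for sub in ('images', 'annotations'):
+        for split in splits:
+            path = osp.join(root, sub, split)
+            assert osp.isdir(path), f'missing directory: {path}'
+            print(f'{path}: {len(os.listdir(path))} files')
+
+
+check_dataset('data/STARE')
+```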
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/get_started.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/get_started.md
new file mode 100644
index 0000000000000000000000000000000000000000..3182c53451dae3024a4e99aace1c766856773d66
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/get_started.md
@@ -0,0 +1,193 @@
+## Prerequisites
+
+- Linux or macOS (Windows support is experimental)
+- Python 3.6+
+- PyTorch 1.3+
+- CUDA 9.2+ (If you build PyTorch from source, CUDA 9.0 is also compatible)
+- GCC 5+
+- [MMCV](https://mmcv.readthedocs.io/en/latest/#installation)
+
+Note: You need to run `pip uninstall mmcv` first if you have mmcv installed.
+If mmcv and mmcv-full are both installed, there will be a `ModuleNotFoundError`.
+
+## Installation
+
+a. Create a conda virtual environment and activate it.
+
+```shell
+conda create -n open-mmlab python=3.7 -y
+conda activate open-mmlab
+```
+
+b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/).
+Here we use PyTorch 1.6.0 and CUDA 10.1.
+You may also switch to another version by specifying the version number.
+
+```shell
+conda install pytorch=1.6.0 torchvision cudatoolkit=10.1 -c pytorch
+```
+
+c. Install [MMCV](https://mmcv.readthedocs.io/en/latest/) following the [official instructions](https://mmcv.readthedocs.io/en/latest/#installation).
+Either `mmcv` or `mmcv-full` is compatible with MMSegmentation, but for methods like CCNet and PSANet, the CUDA ops in `mmcv-full` are required.
+
+**Install mmcv for Linux:**
+
+The pre-built mmcv-full (with PyTorch 1.5 and CUDA 10.1) can be installed by running: (other available versions can be found [here](https://mmcv.readthedocs.io/en/latest/#install-with-pip))
+
+```shell
+pip install mmcv-full==latest+torch1.5.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html
+```
+
+**Install mmcv for Windows (Experimental):**
+
+For Windows, the installation of MMCV requires a native C++ compiler, such as cl.exe. Please add the compiler to %PATH%.
+
+A typical path for cl.exe looks like the following if you have Windows SDK and Visual Studio installed on your computer:
+
+```shell
+C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\bin\Hostx86\x64
+```
+
+Otherwise, you can download the cl compiler from the web and then set up its path.
+
+Then, clone mmcv from GitHub and install it via pip:
+
+```shell
+git clone https://github.com/open-mmlab/mmcv.git
+cd mmcv
+pip install -e .
+```
+
+Or simply:
+
+```shell
+pip install mmcv
+```
+
+Currently, mmcv-full is not supported on Windows.
+
+d. Install MMSegmentation.
+
+```shell
+pip install mmsegmentation # install the latest release
+```
+
+or
+
+```shell
+pip install git+https://github.com/open-mmlab/mmsegmentation.git # install the master branch
+```
+
+Alternatively, if you would like to install MMSegmentation in `dev` mode, run the following:
+
+```shell
+git clone https://github.com/open-mmlab/mmsegmentation.git
+cd mmsegmentation
+pip install -e .  # or "python setup.py develop"
+```
+
+Note:
+
+1. When training or testing models on Windows, please ensure that all the '\\' in paths are replaced with '/'. Add .replace('\\', '/') to your python code wherever path strings occur.
+2. The `version+git_hash` will also be saved in the trained model's meta, e.g. 0.5.0+c415a2e.
+3. When MMSegmentation is installed in `dev` mode, any local modifications made to the code will take effect without the need to reinstall it.
+4. If you would like to use `opencv-python-headless` instead of `opencv-python`,
+   you can install it before installing MMCV.
+5. Some dependencies are optional. Simply running `pip install -e .` will only install the minimum runtime requirements.
+   To use optional dependencies like `cityscapesscripts` either install them manually with `pip install -r requirements/optional.txt` or specify desired extras when calling `pip` (e.g. `pip install -e .[optional]`). Valid keys for the extras field are: `all`, `tests`, `build`, and `optional`.
+
+### A from-scratch setup script
+
+#### Linux
+
+Here is a full script for setting up mmsegmentation with conda and linking the dataset path (supposing that your dataset path is $DATA_ROOT).
+
+```shell
+conda create -n open-mmlab python=3.7 -y
+conda activate open-mmlab
+
+conda install pytorch=1.6.0 torchvision cudatoolkit=10.1 -c pytorch
+pip install mmcv-full==latest+torch1.5.0+cu101 -f https://download.openmmlab.com/mmcv/dist/index.html
+git clone https://github.com/open-mmlab/mmsegmentation.git
+cd mmsegmentation
+pip install -e .  # or "python setup.py develop"
+
+mkdir data
+ln -s $DATA_ROOT data
+```
+
+#### Windows (Experimental)
+
+Here is a full script for setting up mmsegmentation with conda and linking the dataset path (supposing that your dataset path is
+%DATA_ROOT%. Notice: It must be an absolute path).
+
+```shell
+conda create -n open-mmlab python=3.7 -y
+conda activate open-mmlab
+
+conda install pytorch=1.6.0 torchvision cudatoolkit=10.1 -c pytorch
+set PATH=full\path\to\your\cpp\compiler;%PATH%
+pip install mmcv
+
+git clone https://github.com/open-mmlab/mmsegmentation.git
+cd mmsegmentation
+pip install -e .  # or "python setup.py develop"
+
+mklink /D data %DATA_ROOT%
+```
+
+#### Developing with multiple MMSegmentation versions
+
+The train and test scripts already modify the `PYTHONPATH` to ensure the scripts use the MMSegmentation in the current directory.
+
+To use the default MMSegmentation installed in the environment rather than the one you are working with, you can remove the following line in those scripts
+
+```shell
+PYTHONPATH="$(dirname $0)/..":$PYTHONPATH
+```
+
+## Verification
+
+To verify whether MMSegmentation and the required environment are installed correctly, we can run the sample Python code below to initialize a segmentor and run inference on a demo image:
+
+```python
+from mmseg.apis import inference_segmentor, init_segmentor
+import mmcv
+
+config_file = 'configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py'
+checkpoint_file = 'checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth'
+
+# build the model from a config file and a checkpoint file
+model = init_segmentor(config_file, checkpoint_file, device='cuda:0')
+
+# test a single image and show the results
+img = 'test.jpg'  # or img = mmcv.imread(img), which will only load it once
+result = inference_segmentor(model, img)
+# visualize the results in a new window
+model.show_result(img, result, show=True)
+# or save the visualization results to image files
+model.show_result(img, result, out_file='result.jpg')
+
+# test a video and show the results
+video = mmcv.VideoReader('video.mp4')
+for frame in video:
+   result = inference_segmentor(model, frame)
+   model.show_result(frame, result, wait_time=1)
+```
+
+The above code is supposed to run successfully once you finish the installation.
+
+We also provide a demo script to test a single image.
+
+```shell
+python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} [--device ${DEVICE_NAME}] [--palette ${PALETTE}]
+```
+
+Examples:
+
+```shell
+python demo/image_demo.py demo/demo.jpg configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
+    checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth --device cuda:0 --palette cityscapes
+```
+
+A notebook demo can be found in [demo/inference_demo.ipynb](../demo/inference_demo.ipynb).
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/inference.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/inference.md
new file mode 100644
index 0000000000000000000000000000000000000000..d7bc21b65acb9da4a38ade92be6ccf3c56574982
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/inference.md
@@ -0,0 +1,101 @@
+## Inference with pretrained models
+
+We provide testing scripts to evaluate a whole dataset (Cityscapes, PASCAL VOC, ADE20k, etc.),
+and also some high-level APIs for easier integration into other projects.
+
+### Test a dataset
+
+- single GPU
+- single node multiple GPUs
+- multiple nodes
+
+You can use the following commands to test a dataset.
+
+```shell
+# single-gpu testing
+python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]
+
+# multi-gpu testing
+./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}]
+```
+
+Optional arguments:
+
+- `RESULT_FILE`: Filename of the output results in pickle format. If not specified, the results will not be saved to a file.
+- `EVAL_METRICS`: Items to be evaluated on the results. Allowed values depend on the dataset, e.g., `mIoU` is available for all datasets. Cityscapes can be evaluated with the `cityscapes` metric as well as the standard `mIoU` metric.
+- `--show`: If specified, segmentation results will be plotted on the images and shown in a new window. It is only applicable to single GPU testing and used for debugging and visualization. Please make sure that a GUI is available in your environment, otherwise you may encounter an error like `cannot connect to X server`.
+- `--show-dir`: If specified, segmentation results will be plotted on the images and saved to the specified directory. It is only applicable to single GPU testing and used for debugging and visualization. You do NOT need a GUI available in your environment for using this option.
+- `--eval-options`: Optional parameters during evaluation. When `efficient_test=True`, it will save intermediate results to local files to save CPU memory. Make sure that you have enough local storage space (more than 20GB).
+
+Examples:
+
+Assume that you have already downloaded the checkpoints to the directory `checkpoints/`.
+
+1. Test PSPNet and visualize the results. Press any key for the next image.
+
+   ```shell
+   python tools/test.py configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
+       checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth \
+       --show
+   ```
+
+2. Test PSPNet and save the painted images for later visualization.
+
+   ```shell
+   python tools/test.py configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
+       checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth \
+       --show-dir psp_r50_512x1024_40ki_cityscapes_results
+   ```
+
+3. Test PSPNet on PASCAL VOC (without saving the test results) and evaluate the mIoU.
+
+   ```shell
+   python tools/test.py configs/pspnet/pspnet_r50-d8_512x1024_20k_voc12aug.py \
+       checkpoints/pspnet_r50-d8_512x1024_20k_voc12aug_20200605_003338-c57ef100.pth \
+       --eval mIoU
+   ```
+
+4. Test PSPNet with 4 GPUs, and evaluate the standard mIoU and cityscapes metric.
+
+   ```shell
+   ./tools/dist_test.sh configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
+       checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth \
+       4 --out results.pkl --eval mIoU cityscapes
+   ```
+
+   Note: There is some gap (~0.1%) between the cityscapes mIoU and our mIoU. The reason is that the cityscapes metric averages each class weighted by class size by default,
+   while we use the simple unweighted version for all datasets.
+
+5. Test PSPNet on the cityscapes test split with 4 GPUs, and generate the png files to be submitted to the official evaluation server.
+
+   First, add the following to the config file `configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py`,
+
+   ```python
+   data = dict(
+       test=dict(
+           img_dir='leftImg8bit/test',
+           ann_dir='gtFine/test'))
+   ```
+
+   Then run the test.
+
+   ```shell
+   ./tools/dist_test.sh configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
+       checkpoints/pspnet_r50-d8_512x1024_40k_cityscapes_20200605_003338-2966598c.pth \
+       4 --format-only --eval-options "imgfile_prefix=./pspnet_test_results"
+   ```
+
+   You will get png files under the `./pspnet_test_results` directory.
+   You may run `zip -r results.zip pspnet_test_results/` and submit the zip file to the [evaluation server](https://www.cityscapes-dataset.com/submit/).
+
+6. Run a CPU-memory-efficient test of DeepLabV3+ on Cityscapes (without saving the test results) and evaluate the mIoU.
+
+   ```shell
+   python tools/test.py \
+       configs/deeplabv3plus/deeplabv3plus_r18-d8_512x1024_80k_cityscapes.py \
+       deeplabv3plus_r18-d8_512x1024_80k_cityscapes_20201226_080942-cff257fe.pth \
+       --eval-options efficient_test=True \
+       --eval mIoU
+   ```
+
+   Checking the CPU memory footprint with `pmap`, the test used 2.25GB of CPU memory with `efficient_test=True` and 11.06GB with `efficient_test=False`. This optional parameter can save a lot of memory.
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/make.bat b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/make.bat
new file mode 100644
index 0000000000000000000000000000000000000000..922152e96a04a242e6fc40f124261d74890617d8
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/make.bat
@@ -0,0 +1,35 @@
+@ECHO OFF
+
+pushd %~dp0
+
+REM Command file for Sphinx documentation
+
+if "%SPHINXBUILD%" == "" (
+	set SPHINXBUILD=sphinx-build
+)
+set SOURCEDIR=.
+set BUILDDIR=_build
+
+if "%1" == "" goto help
+
+%SPHINXBUILD% >NUL 2>NUL
+if errorlevel 9009 (
+	echo.
+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
+	echo.installed, then set the SPHINXBUILD environment variable to point
+	echo.to the full path of the 'sphinx-build' executable. Alternatively you
+	echo.may add the Sphinx directory to PATH.
+	echo.
+	echo.If you don't have Sphinx installed, grab it from
+	echo.http://sphinx-doc.org/
+	exit /b 1
+)
+
+%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+goto end
+
+:help
+%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
+
+:end
+popd
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/model_zoo.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/model_zoo.md
new file mode 100644
index 0000000000000000000000000000000000000000..2d4c1c2ac999c771e7048c36bd94d316457f0e50
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/model_zoo.md
@@ -0,0 +1,163 @@
+# Benchmark and Model Zoo
+
+## Common settings
+
+* We use distributed training with 4 GPUs by default.
+* All pytorch-style pretrained backbones on ImageNet are trained by ourselves, following the same procedure as in the [paper](https://arxiv.org/pdf/1812.01187.pdf).
+  Our ResNet-style backbones are based on the ResNetV1c variant, where the 7x7 conv in the input stem is replaced with three 3x3 convs.
+* For consistency across different hardware, we report the GPU memory as the maximum value of `torch.cuda.max_memory_allocated()` for all 4 GPUs with `torch.backends.cudnn.benchmark=False`.
+  Note that this value is usually less than what `nvidia-smi` shows.
+* We report the inference time as the total time of network forwarding and post-processing, excluding the data loading time.
+  Results are obtained with the script `tools/benchmark.py` which computes the average time on 200 images with `torch.backends.cudnn.benchmark=False`.
+* There are two inference modes in this framework.
+
+  * `slide` mode: The `test_cfg` will be like `dict(mode='slide', crop_size=(769, 769), stride=(513, 513))`.
+
+    In this mode, multiple patches will be cropped from the input image and passed into the network individually.
+    The crop size and stride between patches are specified by `crop_size` and `stride`.
+    Overlapping areas are merged by averaging.
+
+  * `whole` mode: The `test_cfg` will be like `dict(mode='whole')`.
+
+    In this mode, the whole image will be passed into the network directly.
+
+  By default, we use `slide` inference for models trained on 769x769 inputs and `whole` inference for the rest.
+* For an input size of 8x+1 (e.g. 769), `align_corners=True` is adopted as a traditional practice.
+  Otherwise, for an input size of 8x (e.g. 512, 1024), `align_corners=False` is adopted.
+
+## Baselines
+
+### FCN
+
+Please refer to [FCN](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/fcn) for details.
+
+### PSPNet
+
+Please refer to [PSPNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/pspnet) for details.
+
+### DeepLabV3
+
+Please refer to [DeepLabV3](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3) for details.
+
+### PSANet
+
+Please refer to [PSANet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/psanet) for details.
+
+### DeepLabV3+
+
+Please refer to [DeepLabV3+](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/deeplabv3plus) for details.
+
+### UPerNet
+
+Please refer to [UPerNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/upernet) for details.
+
+### NonLocal Net
+
+Please refer to [NonLocal Net](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/nlnet) for details.
+
+### EncNet
+
+Please refer to [EncNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/encnet) for details.
+ +### CCNet + +Please refer to [CCNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ccnet) for details. + +### DANet + +Please refer to [DANet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/danet) for details. + +### APCNet + +Please refer to [APCNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/apcnet) for details. + +### HRNet + +Please refer to [HRNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/hrnet) for details. + +### GCNet + +Please refer to [GCNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/gcnet) for details. + +### DMNet + +Please refer to [DMNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/dmnet) for details. + +### ANN + +Please refer to [ANN](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ann) for details. + +### OCRNet + +Please refer to [OCRNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/ocrnet) for details. + +### Fast-SCNN + +Please refer to [Fast-SCNN](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/fastscnn) for details. + +### ResNeSt + +Please refer to [ResNeSt](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/resnest) for details. + +### Semantic FPN + +Please refer to [Semantic FPN](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/semfpn) for details. + +### PointRend + +Please refer to [PointRend](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/point_rend) for details. + +### MobileNetV2 + +Please refer to [MobileNetV2](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/mobilenet_v2) for details. + +### MobileNetV3 + +Please refer to [MobileNetV3](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/mobilenet_v3) for details. + +### EMANet + +Please refer to [EMANet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/emanet) for details. + +### DNLNet + +Please refer to [DNLNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/dnlnet) for details. + +### CGNet + +Please refer to [CGNet](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/cgnet) for details. + +### Mixed Precision (FP16) Training + +Please refer [Mixed Precision (FP16) Training](https://github.com/open-mmlab/mmsegmentation/blob/master/configs/fp16/README.md) for details. + +## Speed benchmark + +### Hardware + +* 8 NVIDIA Tesla V100 (32G) GPUs +* Intel(R) Xeon(R) Gold 6148 CPU @ 2.40GHz + +### Software environment + +* Python 3.7 +* PyTorch 1.5 +* CUDA 10.1 +* CUDNN 7.6.03 +* NCCL 2.4.08 + +### Training speed + +For fair comparison, we benchmark all implementations with ResNet-101V1c. +The input size is fixed to 1024x512 with batch size 2. + +The training speed is reported as followed, in terms of second per iter (s/iter). The lower, the better. + +| Implementation | PSPNet (s/iter) | DeepLabV3+ (s/iter) | +|----------------|-----------------|---------------------| +| [MMSegmentation](https://github.com/open-mmlab/mmsegmentation) | **0.83** | **0.85** | +| [SegmenTron](https://github.com/LikeLy-Journey/SegmenTron) | 0.84 | 0.85 | +| [CASILVision](https://github.com/CSAILVision/semantic-segmentation-pytorch) | 1.15 | N/A | +| [vedaseg](https://github.com/Media-Smart/vedaseg) | 0.95 | 1.25 | + +Note: The output stride of DeepLabV3+ is 8. 
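+
+As a hedged sketch of the memory-reporting convention from the common settings above (the model and input are placeholders, and `torch.cuda.reset_peak_memory_stats` assumes PyTorch 1.4+; the published numbers come from the actual training and benchmark scripts):
+
+```python
+import torch
+
+torch.backends.cudnn.benchmark = False  # matches the reporting convention above
+
+
+def peak_memory_mb(model, inputs):
+    """Peak allocated GPU memory (MB) for one forward pass."""
+    torch.cuda.reset_peak_memory_stats()
+    with torch.no_grad():
+        model(inputs)
+    return torch.cuda.max_memory_allocated() / 1024**2
+```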
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/stat.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/stat.py new file mode 100644 index 0000000000000000000000000000000000000000..3aaf0607004e1aa9d92001da78521afe51057034 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/stat.py @@ -0,0 +1,62 @@ +#!/usr/bin/env python +import functools as func +import glob +import os.path as osp +import re + +import numpy as np + +url_prefix = 'https://github.com/open-mmlab/mmsegmentation/blob/master/' + +files = sorted(glob.glob('../configs/*/README.md')) + +stats = [] +titles = [] +num_ckpts = 0 + +for f in files: + url = osp.dirname(f.replace('../', url_prefix)) + + with open(f, 'r') as content_file: + content = content_file.read() + + title = content.split('\n')[0].replace('#', '').strip() + ckpts = set(x.lower().strip() + for x in re.findall(r'https?://download.*\.pth', content) + if 'mmsegmentation' in x) + if len(ckpts) == 0: + continue + + _papertype = [x for x in re.findall(r'\[([A-Z]+)\]', content)] + assert len(_papertype) > 0 + papertype = _papertype[0] + + paper = set([(papertype, title)]) + + titles.append(title) + num_ckpts += len(ckpts) + statsmsg = f""" +\t* [{papertype}] [{title}]({url}) ({len(ckpts)} ckpts) +""" + stats.append((paper, ckpts, statsmsg)) + +allpapers = func.reduce(lambda a, b: a.union(b), [p for p, _, _ in stats]) +msglist = '\n'.join(x for _, _, x in stats) + +papertypes, papercounts = np.unique([t for t, _ in allpapers], + return_counts=True) +countstr = '\n'.join( + [f' - {t}: {c}' for t, c in zip(papertypes, papercounts)]) + +modelzoo = f""" +# Model Zoo Statistics + +* Number of papers: {len(set(titles))} +{countstr} + +* Number of checkpoints: {num_ckpts} +{msglist} +""" + +with open('modelzoo_statistics.md', 'w') as f: + f.write(modelzoo) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/train.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/train.md new file mode 100644 index 0000000000000000000000000000000000000000..1deac95f7d18185ff0586f81095587471eac46c4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/train.md @@ -0,0 +1,83 @@ +## Train a model + +MMSegmentation implements distributed training and non-distributed training, +which uses `MMDistributedDataParallel` and `MMDataParallel` respectively. + +All outputs (log files and checkpoints) will be saved to the working directory, +which is specified by `work_dir` in the config file. + +By default we evaluate the model on the validation set after some iterations, you can change the evaluation interval by adding the interval argument in the training config. + +```python +evaluation = dict(interval=4000) # This evaluate the model per 4000 iterations. +``` + +**\*Important\***: The default learning rate in config files is for 4 GPUs and 2 img/gpu (batch size = 4x2 = 8). +Equivalently, you may also use 8 GPUs and 1 imgs/gpu since all models using cross-GPU SyncBN. + +To trade speed with GPU memory, you may pass in `--options model.backbone.with_cp=True` to enable checkpoint in backbone. + +### Train with a single GPU + +```shell +python tools/train.py ${CONFIG_FILE} [optional arguments] +``` + +If you want to specify the working directory in the command, you can add an argument `--work-dir ${YOUR_WORK_DIR}`. 
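+
+For example (the working directory name here is illustrative):
+
+```shell
+python tools/train.py configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
+    --work-dir work_dirs/psp_r50_512x1024_40ki_cityscapes
+```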
+
+### Train with multiple GPUs
+
+```shell
+./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
+```
+
+Optional arguments are:
+
+- `--no-validate` (**not suggested**): By default, the codebase will perform evaluation every k iterations during training. To disable this behavior, use `--no-validate`.
+- `--work-dir ${WORK_DIR}`: Override the working directory specified in the config file.
+- `--resume-from ${CHECKPOINT_FILE}`: Resume from a previous checkpoint file (to continue the training process).
+- `--load-from ${CHECKPOINT_FILE}`: Load weights from a checkpoint file (to start finetuning for another task).
+
+Difference between `resume-from` and `load-from`:
+
+- `resume-from` loads both the model weights and the optimizer state, including the iteration number.
+- `load-from` loads only the model weights and starts training from iteration 0.
+
+### Train with multiple machines
+
+If you run MMSegmentation on a cluster managed with [slurm](https://slurm.schedmd.com/), you can use the script `slurm_train.sh`. (This script also supports single machine training.)
+
+```shell
+[GPUS=${GPUS}] ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE} --work-dir ${WORK_DIR}
+```
+
+Here is an example of using 16 GPUs to train PSPNet on the dev partition.
+
+```shell
+GPUS=16 ./tools/slurm_train.sh dev pspr50 configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py /nfs/xxxx/psp_r50_512x1024_40ki_cityscapes
+```
+
+You can check [slurm_train.sh](../tools/slurm_train.sh) for full arguments and environment variables.
+
+If you have multiple machines connected simply via Ethernet, you can refer to the
+PyTorch [launch utility](https://pytorch.org/docs/stable/distributed_deprecated.html#launch-utility).
+Training is usually slow if you do not have high-speed networking like InfiniBand.
+
+### Launch multiple jobs on a single machine
+
+If you launch multiple jobs on a single machine, e.g., 2 jobs of 4-GPU training on a machine with 8 GPUs,
+you need to specify different ports (29500 by default) for each job to avoid communication conflicts. Otherwise, there will be an error message saying `RuntimeError: Address already in use`.
+
+If you use `dist_train.sh` to launch training jobs, you can set the port in the commands with the environment variable `PORT`.
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 PORT=29500 ./tools/dist_train.sh ${CONFIG_FILE} 4
+CUDA_VISIBLE_DEVICES=4,5,6,7 PORT=29501 ./tools/dist_train.sh ${CONFIG_FILE} 4
+```
+
+If you use `slurm_train.sh` to launch training jobs, you can set the port in the commands with the environment variable `MASTER_PORT`.
+
+```shell
+MASTER_PORT=29500 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE}
+MASTER_PORT=29501 ./tools/slurm_train.sh ${PARTITION} ${JOB_NAME} ${CONFIG_FILE}
+```
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/config.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/config.md
new file mode 100644
index 0000000000000000000000000000000000000000..b243c06d5b60fc09135b227235241c6583325a6b
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/config.md
@@ -0,0 +1,381 @@
+# Tutorial 1: Learn about Configs
+
+We incorporate modular and inheritance design into our config system, which makes it convenient to conduct various experiments.
+If you wish to inspect the config file, you may run `python tools/print_config.py /PATH/TO/CONFIG` to see the complete config.
+You may also pass `--options xxx.yyy=zzz` to see the updated config.
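+
+For example, the following call (the override value is only illustrative) prints the complete PSPNet config with the learning rate changed:
+
+```shell
+python tools/print_config.py configs/pspnet/pspnet_r50-d8_512x1024_40k_cityscapes.py \
+    --options optimizer.lr=0.005
+```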
+
+## Config File Structure
+
+There are 4 basic component types under `config/_base_`: dataset, model, schedule, and default_runtime.
+Many methods, such as DeepLabV3 and PSPNet, can be easily constructed with one of each.
+The configs that are composed of components from `_base_` are called _primitive_.
+
+For all configs under the same folder, it is recommended to have only **one** _primitive_ config. All other configs should inherit from the _primitive_ config. In this way, the maximum inheritance level is 3.
+
+For easy understanding, we recommend contributors to inherit from existing methods.
+For example, if some modification is made based on DeepLabV3, users may first inherit the basic DeepLabV3 structure by specifying `_base_ = ../deeplabv3/deeplabv3_r50_512x1024_40ki_cityscapes.py`, then modify the necessary fields in the config files.
+
+If you are building an entirely new method that does not share the structure with any of the existing methods, you may create a folder `xxxnet` under `configs`.
+
+Please refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#config) for detailed documentation.
+
+## Config Name Style
+
+We follow the style below to name config files. Contributors are advised to follow the same style.
+
+```
+{model}_{backbone}_[misc]_[gpu x batch_per_gpu]_{resolution}_{schedule}_{dataset}
+```
+
+`{xxx}` is a required field and `[yyy]` is optional.
+
+- `{model}`: model type like `psp`, `deeplabv3`, etc.
+- `{backbone}`: backbone type like `r50` (ResNet-50), `x101` (ResNeXt-101).
+- `[misc]`: miscellaneous settings/plugins of the model, e.g. `dconv`, `gcb`, `attention`, `mstrain`.
+- `[gpu x batch_per_gpu]`: GPUs and samples per GPU, `8x2` is used by default.
+- `{schedule}`: training schedule, `20ki` means 20k iterations.
+- `{dataset}`: dataset like `cityscapes`, `voc12aug`, `ade`.
+
+## An Example of PSPNet
+
+To help the users have a basic idea of a complete config and the modules in a modern semantic segmentation system,
+we make brief comments on the config of PSPNet using ResNet50V1c as follows.
+For more detailed usage and the corresponding alternatives for each module, please refer to the API documentation.
+
+```python
+norm_cfg = dict(type='SyncBN', requires_grad=True)  # Segmentation usually uses SyncBN
+model = dict(
+    type='EncoderDecoder',  # Name of segmentor
+    pretrained='open-mmlab://resnet50_v1c',  # The ImageNet pretrained backbone to be loaded
+    backbone=dict(
+        type='ResNetV1c',  # The type of backbone. Please refer to mmseg/backbone/resnet.py for details.
+        depth=50,  # Depth of backbone. Normally 50, 101 are used.
+        num_stages=4,  # Number of stages of backbone.
+        out_indices=(0, 1, 2, 3),  # The index of output feature maps produced in each stage.
+        dilations=(1, 1, 2, 4),  # The dilation rate of each layer.
+        strides=(1, 2, 1, 1),  # The stride of each layer.
+        norm_cfg=dict(  # The configuration of norm layer.
+            type='SyncBN',  # Type of norm layer. Usually it is SyncBN.
+            requires_grad=True),  # Whether to train the gamma and beta in norm
+        norm_eval=False,  # Whether to freeze the statistics in BN
+        style='pytorch',  # The style of backbone, 'pytorch' means that stride 2 layers are in 3x3 conv, 'caffe' means stride 2 layers are in 1x1 convs.
+        contract_dilation=True),  # When dilation > 1, whether to contract the first layer of dilation.
+    decode_head=dict(
+        type='PSPHead',  # Type of decode head. Please refer to mmseg/models/decode_heads for available options.
+        in_channels=2048,  # Input channel of decode head.
+        in_index=3,  # The index of feature map to select.
+        channels=512,  # The intermediate channels of decode head.
+        pool_scales=(1, 2, 3, 6),  # The avg pooling scales of PSPHead. Please refer to paper for details.
+        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
+        num_classes=19,  # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
+        norm_cfg=dict(type='SyncBN', requires_grad=True),  # The configuration of norm layer.
+        align_corners=False,  # The align_corners argument for resize in decoding.
+        loss_decode=dict(  # Config of loss function for the decode_head.
+            type='CrossEntropyLoss',  # Type of loss used for segmentation.
+            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
+            loss_weight=1.0)),  # Loss weight of decode head.
+    auxiliary_head=dict(
+        type='FCNHead',  # Type of auxiliary head. Please refer to mmseg/models/decode_heads for available options.
+        in_channels=1024,  # Input channel of auxiliary head.
+        in_index=2,  # The index of feature map to select.
+        channels=256,  # The intermediate channels of decode head.
+        num_convs=1,  # Number of convs in FCNHead. It is usually 1 in auxiliary head.
+        concat_input=False,  # Whether to concat output of convs with input before classification layer.
+        dropout_ratio=0.1,  # The dropout ratio before final classification layer.
+        num_classes=19,  # Number of segmentation classes. Usually 19 for cityscapes, 21 for VOC, 150 for ADE20k.
+        norm_cfg=dict(type='SyncBN', requires_grad=True),  # The configuration of norm layer.
+        align_corners=False,  # The align_corners argument for resize in decoding.
+        loss_decode=dict(  # Config of loss function for the decode_head.
+            type='CrossEntropyLoss',  # Type of loss used for segmentation.
+            use_sigmoid=False,  # Whether to use sigmoid activation for segmentation.
+            loss_weight=0.4)))  # Loss weight of auxiliary head, which is usually 0.4 of decode head.
+train_cfg = dict()  # train_cfg is just a placeholder for now.
+test_cfg = dict(mode='whole')  # The test mode, options are 'whole' and 'slide'. 'whole': whole image fully-convolutional test. 'slide': sliding crop window on the image.
+dataset_type = 'CityscapesDataset'  # Dataset type, this will be used to define the dataset.
+data_root = 'data/cityscapes/'  # Root path of data.
+img_norm_cfg = dict(  # Image normalization config to normalize the input images.
+    mean=[123.675, 116.28, 103.53],  # Mean values used when pre-training the pre-trained backbone models.
+    std=[58.395, 57.12, 57.375],  # Standard deviation used when pre-training the pre-trained backbone models.
+    to_rgb=True)  # The channel order of images used when pre-training the pre-trained backbone models.
+crop_size = (512, 1024)  # The crop size during training.
+train_pipeline = [  # Training pipeline.
+    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path.
+    dict(type='LoadAnnotations'),  # Second pipeline to load annotations for current image.
+    dict(type='Resize',  # Augmentation pipeline that resizes the images and their annotations.
+        img_scale=(2048, 1024),  # The largest scale of image.
+        ratio_range=(0.5, 2.0)),  # The augmented scale range as ratio.
+    dict(type='RandomCrop',  # Augmentation pipeline that randomly crops a patch from the current image.
+        crop_size=(512, 1024),  # The crop size of patch.
+        cat_max_ratio=0.75),  # The max area ratio that could be occupied by single category.
+    dict(
+        type='RandomFlip',  # Augmentation pipeline that flips the images and their annotations
+        flip_ratio=0.5),  # The ratio or probability to flip
+    dict(type='PhotoMetricDistortion'),  # Augmentation pipeline that distorts the current image with several photometric methods.
+    dict(
+        type='Normalize',  # Augmentation pipeline that normalizes the input images
+        mean=[123.675, 116.28, 103.53],  # These keys are the same as in img_norm_cfg since the
+        std=[58.395, 57.12, 57.375],  # keys of img_norm_cfg are used here as arguments
+        to_rgb=True),
+    dict(type='Pad',  # Augmentation pipeline that pads the image to the specified size.
+        size=(512, 1024),  # The output size of padding.
+        pad_val=0,  # The padding value for image.
+        seg_pad_val=255),  # The padding value of 'gt_semantic_seg'.
+    dict(type='DefaultFormatBundle'),  # Default format bundle to gather data in the pipeline
+    dict(type='Collect',  # Pipeline that decides which keys in the data should be passed to the segmentor
+        keys=['img', 'gt_semantic_seg'])
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),  # First pipeline to load images from file path
+    dict(
+        type='MultiScaleFlipAug',  # A wrapper that encapsulates the test time augmentations
+        img_scale=(2048, 1024),  # Decides the largest scale for testing, used for the Resize pipeline
+        flip=False,  # Whether to flip images during testing
+        transforms=[
+            dict(type='Resize',  # Use resize augmentation
+                keep_ratio=True),  # Whether to keep the ratio between height and width; the img_scale set here will be suppressed by the img_scale set above.
+            dict(type='RandomFlip'),  # Although RandomFlip is added to the pipeline, it is not used when flip=False
+            dict(
+                type='Normalize',  # Normalization config, the values are from img_norm_cfg
+                mean=[123.675, 116.28, 103.53],
+                std=[58.395, 57.12, 57.375],
+                to_rgb=True),
+            dict(type='ImageToTensor',  # Convert image to tensor
+                keys=['img']),
+            dict(type='Collect',  # Collect pipeline that collects the necessary keys for testing.
+                keys=['img'])
+        ])
+]
+data = dict(
+    samples_per_gpu=2,  # Batch size of a single GPU
+    workers_per_gpu=2,  # Workers to pre-fetch data for each single GPU
+    train=dict(  # Train dataset config
+        type='CityscapesDataset',  # Type of dataset, refer to mmseg/datasets/ for details.
+        data_root='data/cityscapes/',  # The root of dataset.
+        img_dir='leftImg8bit/train',  # The image directory of dataset.
+        ann_dir='gtFine/train',  # The annotation directory of dataset.
+        pipeline=[  # pipeline, this is passed by the train_pipeline created before.
+ dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict( + type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=(512, 1024), cat_max_ratio=0.75), + dict(type='RandomFlip', flip_ratio=0.5), + dict(type='PhotoMetricDistortion'), + dict( + type='Normalize', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_rgb=True), + dict(type='Pad', size=(512, 1024), pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) + ]), + val=dict( # Validation dataset config + type='CityscapesDataset', + data_root='data/cityscapes/', + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=[ # Pipeline is passed by test_pipeline created before + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict( + type='Normalize', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_rgb=True), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) + ]), + test=dict( + type='CityscapesDataset', + data_root='data/cityscapes/', + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=[ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict( + type='Normalize', + mean=[123.675, 116.28, 103.53], + std=[58.395, 57.12, 57.375], + to_rgb=True), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) + ])) +log_config = dict( # config to register logger hook + interval=50, # Interval to print the log + hooks=[ + # dict(type='TensorboardLoggerHook') # The Tensorboard logger is also supported + dict(type='TextLoggerHook', by_epoch=False) + ]) +dist_params = dict(backend='nccl') # Parameters to setup distributed training, the port can also be set. +log_level = 'INFO' # The level of logging. +load_from = None # load models as a pre-trained model from a given path. This will not resume training. +resume_from = None # Resume checkpoints from a given path, the training will be resumed from the iteration when the checkpoint's is saved. +workflow = [('train', 1)] # Workflow for runner. [('train', 1)] means there is only one workflow and the workflow named 'train' is executed once. The workflow trains the model by 40000 iterations according to the `runner.max_iters`. +cudnn_benchmark = True # Whether use cudnn_benchmark to speed up, which is fast for fixed input size. +optimizer = dict( # Config used to build optimizer, support all the optimizers in PyTorch whose arguments are also the same as those in PyTorch + type='SGD', # Type of optimizers, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/optimizer/default_constructor.py#L13 for more details + lr=0.01, # Learning rate of optimizers, see detail usages of the parameters in the documentation of PyTorch + momentum=0.9, # Momentum + weight_decay=0.0005) # Weight decay of SGD +optimizer_config = dict() # Config used to build the optimizer hook, refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/optimizer.py#L8 for implementation details. +lr_config = dict( + policy='poly', # The policy of scheduler, also support Step, CosineAnnealing, Cyclic, etc. 
Refer to details of supported LrUpdater from https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py#L9.
+    power=0.9,  # The power of polynomial decay.
+    min_lr=0.0001,  # The minimum learning rate to stabilize the training.
+    by_epoch=False)  # Whether to count by epoch or not.
+runner = dict(
+    type='IterBasedRunner',  # Type of runner to use (i.e. IterBasedRunner or EpochBasedRunner)
+    max_iters=40000)  # Total number of iterations. For EpochBasedRunner use `max_epochs`
+checkpoint_config = dict(  # Config to set the checkpoint hook, Refer to https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/checkpoint.py for implementation.
+    by_epoch=False,  # Whether to count by epoch or not.
+    interval=4000)  # The save interval.
+evaluation = dict(  # The config to build the evaluation hook. Please refer to mmseg/core/evaluation/eval_hook.py for details.
+    interval=4000,  # The interval of evaluation.
+    metric='mIoU')  # The evaluation metric.
+
+
+```
+
+## FAQ
+
+### Ignore some fields in the base configs
+
+Sometimes, you may set `_delete_=True` to ignore some of the fields in the base configs.
+You may refer to [mmcv](https://mmcv.readthedocs.io/en/latest/utils.html#inherit-from-base-config-with-ignored-fields) for a simple illustration.
+
+In MMSegmentation, for example, suppose you would like to change the backbone of PSPNet, whose config is the following.
+
+```python
+norm_cfg = dict(type='SyncBN', requires_grad=True)
+model = dict(
+    type='EncoderDecoder',
+    pretrained='torchvision://resnet50',
+    backbone=dict(
+        type='ResNetV1c',
+        depth=50,
+        num_stages=4,
+        out_indices=(0, 1, 2, 3),
+        dilations=(1, 1, 2, 4),
+        strides=(1, 2, 1, 1),
+        norm_cfg=norm_cfg,
+        norm_eval=False,
+        style='pytorch',
+        contract_dilation=True),
+    decode_head=dict(...),
+    auxiliary_head=dict(...))
+```
+
+Since `ResNet` and `HRNet` use different keywords to construct, the whole `backbone` field needs to be replaced.
+
+```python
+_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
+norm_cfg = dict(type='SyncBN', requires_grad=True)
+model = dict(
+    pretrained='open-mmlab://msra/hrnetv2_w32',
+    backbone=dict(
+        _delete_=True,
+        type='HRNet',
+        norm_cfg=norm_cfg,
+        extra=dict(
+            stage1=dict(
+                num_modules=1,
+                num_branches=1,
+                block='BOTTLENECK',
+                num_blocks=(4, ),
+                num_channels=(64, )),
+            stage2=dict(
+                num_modules=1,
+                num_branches=2,
+                block='BASIC',
+                num_blocks=(4, 4),
+                num_channels=(32, 64)),
+            stage3=dict(
+                num_modules=4,
+                num_branches=3,
+                block='BASIC',
+                num_blocks=(4, 4, 4),
+                num_channels=(32, 64, 128)),
+            stage4=dict(
+                num_modules=3,
+                num_branches=4,
+                block='BASIC',
+                num_blocks=(4, 4, 4, 4),
+                num_channels=(32, 64, 128, 256)))),
+    decode_head=dict(...),
+    auxiliary_head=dict(...))
+```
+
+The `_delete_=True` replaces all old keys in the `backbone` field with the new keys.
+
+### Use intermediate variables in configs
+
+Some intermediate variables are used in the config files, like `train_pipeline`/`test_pipeline` in datasets.
+It's worth noting that when modifying intermediate variables in the children configs, users need to pass the intermediate variables into the corresponding fields again.
+For example, suppose we would like to change the multi-scale strategy to train/test a PSPNet. `train_pipeline`/`test_pipeline` are the intermediate variables we would like to modify.
+
+```python
+_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
+crop_size = (512, 1024)
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(1.0, 2.0)),  # change to [1., 2.]
+    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+    dict(type='RandomFlip', flip_ratio=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 1024),
+        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],  # change to multi scale testing
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+data = dict(
+    train=dict(pipeline=train_pipeline),
+    val=dict(pipeline=test_pipeline),
+    test=dict(pipeline=test_pipeline))
+```
+
+We first define the new `train_pipeline`/`test_pipeline` and pass them into `data`.
+
+Similarly, if we would like to switch from `SyncBN` to `BN` or `MMSyncBN`, we need to substitute every `norm_cfg` in the config.
+
+```python
+_base_ = '../pspnet/psp_r50_512x1024_40ki_cityscapes.py'
+norm_cfg = dict(type='BN', requires_grad=True)
+model = dict(
+    backbone=dict(norm_cfg=norm_cfg),
+    decode_head=dict(norm_cfg=norm_cfg),
+    auxiliary_head=dict(norm_cfg=norm_cfg))
+```
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_datasets.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_datasets.md
new file mode 100644
index 0000000000000000000000000000000000000000..020d51316e15a7f6926f49d81dcd2509f5170e07
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_datasets.md
@@ -0,0 +1,172 @@
+# Tutorial 2: Customize Datasets
+
+## Customize datasets by reorganizing data
+
+The simplest way is to reorganize your data into the folder structure below.
+
+An example of the file structure is as follows.
+
+```none
+├── data
+│ ├── my_dataset
+│ │ ├── img_dir
+│ │ │ ├── train
+│ │ │ │ ├── xxx{img_suffix}
+│ │ │ │ ├── yyy{img_suffix}
+│ │ │ │ ├── zzz{img_suffix}
+│ │ │ ├── val
+│ │ ├── ann_dir
+│ │ │ ├── train
+│ │ │ │ ├── xxx{seg_map_suffix}
+│ │ │ │ ├── yyy{seg_map_suffix}
+│ │ │ │ ├── zzz{seg_map_suffix}
+│ │ │ ├── val
+
+```
+
+A training pair will consist of the files with the same suffix in img_dir/ann_dir.
+
+If the `split` argument is given, only part of the files in img_dir/ann_dir will be loaded.
+We may specify the prefix of the files we would like to be included in the split txt.
+
+More specifically, for a split txt like the following,
+
+```none
+xxx
+zzz
+```
+
+Only
+`data/my_dataset/img_dir/train/xxx{img_suffix}`,
+`data/my_dataset/img_dir/train/zzz{img_suffix}`,
+`data/my_dataset/ann_dir/train/xxx{seg_map_suffix}`,
+`data/my_dataset/ann_dir/train/zzz{seg_map_suffix}` will be loaded.
+
+Note: The annotations are images of shape (H, W), whose pixel values should fall in the range `[0, num_classes - 1]`.
+You may use the `'P'` mode of [pillow](https://pillow.readthedocs.io/en/stable/handbook/concepts.html#palette) to create your annotation images with color.
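+
+As a hedged illustration of such a `'P'`-mode annotation (the classes, palette, and path below are arbitrary):
+
+```python
+import numpy as np
+from PIL import Image
+
+# A dummy (H, W) label map whose pixel values lie in [0, num_classes - 1].
+seg_map = np.zeros((512, 512), dtype=np.uint8)
+seg_map[100:200, 100:200] = 1  # mark a square of class 1
+
+ann = Image.fromarray(seg_map).convert('P')
+# The palette only affects visualization; the stored pixel values are what
+# the dataset loader actually reads as class indices.
+palette = [0, 0, 0, 128, 0, 0]  # class 0 -> black, class 1 -> dark red
+ann.putpalette(palette + [0] * (768 - len(palette)))
+ann.save('data/my_dataset/ann_dir/train/xxx.png')
+```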
+
+## Customize datasets by mixing dataset
+
+MMSegmentation also supports mixing datasets for training.
+Currently it supports concatenating and repeating datasets.
+
+### Repeat dataset
+
+We use `RepeatDataset` as a wrapper to repeat the dataset.
+For example, suppose the original dataset is `Dataset_A`. To repeat it, the config looks like the following
+
+```python
+dataset_A_train = dict(
+        type='RepeatDataset',
+        times=N,
+        dataset=dict(  # This is the original config of Dataset_A
+            type='Dataset_A',
+            ...
+            pipeline=train_pipeline
+        )
+    )
+```
+
+### Concatenate dataset
+
+There are 2 ways to concatenate the dataset.
+
+1. If the datasets you want to concatenate are of the same type but have different annotation files,
+   you can concatenate the dataset configs like the following.
+
+   1. You may concatenate two `ann_dir`.
+
+      ```python
+      dataset_A_train = dict(
+          type='Dataset_A',
+          img_dir = 'img_dir',
+          ann_dir = ['anno_dir_1', 'anno_dir_2'],
+          pipeline=train_pipeline
+      )
+      ```
+
+   2. You may concatenate two `split`.
+
+      ```python
+      dataset_A_train = dict(
+          type='Dataset_A',
+          img_dir = 'img_dir',
+          ann_dir = 'anno_dir',
+          split = ['split_1.txt', 'split_2.txt'],
+          pipeline=train_pipeline
+      )
+      ```
+
+   3. You may concatenate two `ann_dir` and `split` simultaneously.
+
+      ```python
+      dataset_A_train = dict(
+          type='Dataset_A',
+          img_dir = 'img_dir',
+          ann_dir = ['anno_dir_1', 'anno_dir_2'],
+          split = ['split_1.txt', 'split_2.txt'],
+          pipeline=train_pipeline
+      )
+      ```
+
+   In this case, `anno_dir_1` and `anno_dir_2` correspond to `split_1.txt` and `split_2.txt`.
+
+2. In case the datasets you want to concatenate are of different types, you can concatenate the dataset configs like the following.
+
+   ```python
+   dataset_A_train = dict()
+   dataset_B_train = dict()
+
+   data = dict(
+       imgs_per_gpu=2,
+       workers_per_gpu=2,
+       train = [
+           dataset_A_train,
+           dataset_B_train
+       ],
+       val = dataset_A_val,
+       test = dataset_A_test
+       )
+   ```
+
+A more complex example that repeats `Dataset_A` and `Dataset_B` by N and M times, respectively, and then concatenates the repeated datasets is as follows.
+
+```python
+dataset_A_train = dict(
+    type='RepeatDataset',
+    times=N,
+    dataset=dict(
+        type='Dataset_A',
+        ...
+        pipeline=train_pipeline
+    )
+)
+dataset_A_val = dict(
+    ...
+    pipeline=test_pipeline
+)
+dataset_A_test = dict(
+    ...
+    pipeline=test_pipeline
+)
+dataset_B_train = dict(
+    type='RepeatDataset',
+    times=M,
+    dataset=dict(
+        type='Dataset_B',
+        ...
+        pipeline=train_pipeline
+    )
+)
+data = dict(
+    imgs_per_gpu=2,
+    workers_per_gpu=2,
+    train = [
+        dataset_A_train,
+        dataset_B_train
+    ],
+    val = dataset_A_val,
+    test = dataset_A_test
+)
+
+```
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_models.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_models.md
new file mode 100644
index 0000000000000000000000000000000000000000..f637fd6f0431ea0de748e39806be51d1b4849c8e
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_models.md
@@ -0,0 +1,234 @@
+# Tutorial 4: Customize Models
+
+## Customize optimizer
+
+Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`.
+You need to first implement the new optimizer in a file, e.g., in `mmseg/core/optimizer/my_optimizer.py`:
+
+```python
+from mmcv.runner import OPTIMIZERS
+from torch.optim import Optimizer
+
+
+@OPTIMIZERS.register_module()
+class MyOptimizer(Optimizer):
+
+    def __init__(self, a, b, c):
+        ...
+
+```
+
+Then add this module in `mmseg/core/optimizer/__init__.py` so that the registry will
+find the new module and add it:
+
+```python
+from .my_optimizer import MyOptimizer
+```
+
+Then you can use `MyOptimizer` in the `optimizer` field of config files.
+In the configs, the optimizers are defined by the field `optimizer` like the following:
+
+```python
+optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
+```
+
+To use your own optimizer, the field can be changed as
+
+```python
+optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
+```
+
+We already support using all the optimizers implemented by PyTorch, and the only modification is to change the `optimizer` field of config files.
+For example, if you want to use `Adam` (though the performance may drop a lot), the modification could be as follows.
+
+```python
+optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
+```
+
+The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
+
+## Customize optimizer constructor
+
+Some models may have parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers.
+The users can do such fine-grained parameter tuning through customizing the optimizer constructor.
+
+```python
+from mmcv.utils import build_from_cfg
+
+from mmcv.runner import OPTIMIZER_BUILDERS
+from .cocktail_optimizer import CocktailOptimizer
+
+
+@OPTIMIZER_BUILDERS.register_module()
+class CocktailOptimizerConstructor(object):
+
+    def __init__(self, optimizer_cfg, paramwise_cfg=None):
+        ...
+
+    def __call__(self, model):
+        ...
+        return my_optimizer
+
+```
+
+## Develop new components
+
+There are mainly 2 types of components in MMSegmentation.
+
+- backbone: usually stacks of convolutional networks to extract feature maps, e.g., ResNet, HRNet.
+- head: the component for semantic segmentation map decoding.
+
+### Add new backbones
+
+Here we show how to develop new components with an example of MobileNet.
+
+1. Create a new file `mmseg/models/backbones/mobilenet.py`.
+
+```python
+import torch.nn as nn
+
+from ..registry import BACKBONES
+
+
+@BACKBONES.register_module()
+class MobileNet(nn.Module):
+
+    def __init__(self, arg1, arg2):
+        pass
+
+    def forward(self, x):  # should return a tuple
+        pass
+
+    def init_weights(self, pretrained=None):
+        pass
+```
+
+2. Import the module in `mmseg/models/backbones/__init__.py`.
+
+```python
+from .mobilenet import MobileNet
+```
+
+3. Use it in your config file.
+
+```python
+model = dict(
+    ...
+    backbone=dict(
+        type='MobileNet',
+        arg1=xxx,
+        arg2=xxx),
+    ...
+```
+
+### Add new heads
+
+In MMSegmentation, we provide a base [BaseDecodeHead](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/decode_heads/decode_head.py) for all segmentation heads.
+All newly implemented decode heads should be derived from it.
+Here we show how to develop a new head with the example of [PSPNet](https://arxiv.org/abs/1612.01105) as follows.
+
+First, add a new decode head in `mmseg/models/decode_heads/psp_head.py`.
+PSPNet implements a decode head for segmentation decoding.
+
+### Add new heads
+
+In MMSegmentation, we provide a base [BaseDecodeHead](https://github.com/open-mmlab/mmsegmentation/blob/master/mmseg/models/decode_heads/decode_head.py) for all segmentation heads.
+All newly implemented decode heads should be derived from it.
+Here we show how to develop a new head with [PSPNet](https://arxiv.org/abs/1612.01105) as an example.
+
+First, add a new decode head in `mmseg/models/decode_heads/psp_head.py`.
+PSPNet implements a decode head for segmentation decoding.
+To implement a decode head, we basically need to implement three functions of the new module, as shown below.
+
+```python
+@HEADS.register_module()
+class PSPHead(BaseDecodeHead):
+
+    def __init__(self, pool_scales=(1, 2, 3, 6), **kwargs):
+        super(PSPHead, self).__init__(**kwargs)
+        ...
+
+    def init_weights(self):
+        ...
+
+    def forward(self, inputs):
+        ...
+```
+
+Next, the users need to add the module in `mmseg/models/decode_heads/__init__.py` so that the corresponding registry can find and load it.
+
+The config file of PSPNet is as follows
+
+```python
+norm_cfg = dict(type='SyncBN', requires_grad=True)
+model = dict(
+    type='EncoderDecoder',
+    pretrained='pretrain_model/resnet50_v1c_trick-2cccc1ad.pth',
+    backbone=dict(
+        type='ResNetV1c',
+        depth=50,
+        num_stages=4,
+        out_indices=(0, 1, 2, 3),
+        dilations=(1, 1, 2, 4),
+        strides=(1, 2, 1, 1),
+        norm_cfg=norm_cfg,
+        norm_eval=False,
+        style='pytorch',
+        contract_dilation=True),
+    decode_head=dict(
+        type='PSPHead',
+        in_channels=2048,
+        in_index=3,
+        channels=512,
+        pool_scales=(1, 2, 3, 6),
+        dropout_ratio=0.1,
+        num_classes=19,
+        norm_cfg=norm_cfg,
+        align_corners=False,
+        loss_decode=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)))
+```
+
+### Add new loss
+
+Assume you want to add a new loss `MyLoss` for segmentation decoding.
+To add a new loss function, the users need to implement it in `mmseg/models/losses/my_loss.py`.
+The decorator `weighted_loss` enables the loss to be weighted for each element.
+
+```python
+import torch
+import torch.nn as nn
+
+from ..builder import LOSSES
+from .utils import weighted_loss
+
+
+@weighted_loss
+def my_loss(pred, target):
+    assert pred.size() == target.size() and target.numel() > 0
+    loss = torch.abs(pred - target)
+    return loss
+
+
+@LOSSES.register_module()
+class MyLoss(nn.Module):
+
+    def __init__(self, reduction='mean', loss_weight=1.0):
+        super(MyLoss, self).__init__()
+        self.reduction = reduction
+        self.loss_weight = loss_weight
+
+    def forward(self,
+                pred,
+                target,
+                weight=None,
+                avg_factor=None,
+                reduction_override=None):
+        assert reduction_override in (None, 'none', 'mean', 'sum')
+        reduction = (
+            reduction_override if reduction_override else self.reduction)
+        loss = self.loss_weight * my_loss(
+            pred, target, weight, reduction=reduction, avg_factor=avg_factor)
+        return loss
+```
+
+Then the users need to add it in `mmseg/models/losses/__init__.py`.
+
+```python
+from .my_loss import MyLoss, my_loss
+```
+
+To use it, modify the `loss_decode` field in the head.
+`loss_weight` could be used to balance multiple losses.
+
+```python
+loss_decode=dict(type='MyLoss', loss_weight=1.0)
+```
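+As a quick sanity check (a hypothetical snippet, assuming the loss above is on the import path and registered), the new loss can be built from its config and called directly:
+
+```python
+import torch
+
+from mmseg.models import build_loss
+
+# Build MyLoss from a config dict, exactly as the head would.
+criterion = build_loss(dict(type='MyLoss', loss_weight=1.0))
+pred = torch.rand(2, 19, 16, 16)
+target = torch.rand(2, 19, 16, 16)
+print(criterion(pred, target))  # scalar: mean |pred - target| * loss_weight
+```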
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_runtime.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_runtime.md
new file mode 100644
index 0000000000000000000000000000000000000000..dd67ef54f639fa0b3b9a01727d16352752d30899
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/customize_runtime.md
@@ -0,0 +1,243 @@
+# Tutorial 6: Customize Runtime Settings
+
+## Customize optimization settings
+
+### Customize optimizer supported by PyTorch
+
+We already support all the optimizers implemented by PyTorch; the only modification needed is to change the `optimizer` field of the config file.
+For example, if you want to use `Adam` (note that the performance could drop a lot), the modification could be as follows.
+
+```python
+optimizer = dict(type='Adam', lr=0.0003, weight_decay=0.0001)
+```
+
+To modify the learning rate of the model, the users only need to modify the `lr` in the config of the optimizer. The users can directly set arguments following the [API doc](https://pytorch.org/docs/stable/optim.html?highlight=optim#module-torch.optim) of PyTorch.
+
+### Customize self-implemented optimizer
+
+#### 1. Define a new optimizer
+
+A customized optimizer could be defined as follows.
+
+Assume you want to add an optimizer named `MyOptimizer`, which has arguments `a`, `b`, and `c`.
+You need to create a new directory named `mmseg/core/optimizer`,
+and then implement the new optimizer in a file, e.g., in `mmseg/core/optimizer/my_optimizer.py`:
+
+```python
+from .registry import OPTIMIZERS
+from torch.optim import Optimizer
+
+
+@OPTIMIZERS.register_module()
+class MyOptimizer(Optimizer):
+
+    def __init__(self, a, b, c):
+        ...
+```
+
+#### 2. Add the optimizer to registry
+
+To find the module defined above, it should first be imported into the main namespace. There are two options to achieve this.
+
+- Modify `mmseg/core/optimizer/__init__.py` to import it.
+
+  The newly defined module should be imported in `mmseg/core/optimizer/__init__.py` so that the registry will
+  find the new module and add it:
+
+```python
+from .my_optimizer import MyOptimizer
+```
+
+- Use `custom_imports` in the config to manually import it
+
+```python
+custom_imports = dict(imports=['mmseg.core.optimizer.my_optimizer'], allow_failed_imports=False)
+```
+
+The module `mmseg.core.optimizer.my_optimizer` will be imported at the beginning of the program and the class `MyOptimizer` is then automatically registered.
+Note that only the package containing the class `MyOptimizer` should be imported.
+`mmseg.core.optimizer.my_optimizer.MyOptimizer` **cannot** be imported directly.
+
+Actually, users can use a totally different file directory structure with this importing method, as long as the module root can be located in `PYTHONPATH`.
+
+#### 3. Specify the optimizer in the config file
+
+Then you can use `MyOptimizer` in the `optimizer` field of config files.
+In the configs, the optimizers are defined by the field `optimizer` like the following:
+
+```python
+optimizer = dict(type='SGD', lr=0.02, momentum=0.9, weight_decay=0.0001)
+```
+
+To use your own optimizer, the field can be changed to
+
+```python
+optimizer = dict(type='MyOptimizer', a=a_value, b=b_value, c=c_value)
+```
+
+### Customize optimizer constructor
+
+Some models may have parameter-specific settings for optimization, e.g. weight decay for BatchNorm layers.
+The users can do such fine-grained parameter tuning by customizing the optimizer constructor.
+
+```python
+from mmcv.utils import build_from_cfg
+
+from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
+from mmseg.utils import get_root_logger
+from .my_optimizer import MyOptimizer
+
+
+@OPTIMIZER_BUILDERS.register_module()
+class MyOptimizerConstructor(object):
+
+    def __init__(self, optimizer_cfg, paramwise_cfg=None):
+        ...
+
+    def __call__(self, model):
+        ...
+        return my_optimizer
+```
+
+The default optimizer constructor is implemented [here](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/optimizer/default_constructor.py#L11), and it could also serve as a template for new optimizer constructors.
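+As a concrete illustration, below is a minimal sketch of a custom constructor that disables weight decay for normalization layers (the class name and the grouping rule are made up for illustration; a real constructor would also need to handle biases, frozen parameters, etc.):
+
+```python
+import torch.nn as nn
+from mmcv.runner.optimizer import OPTIMIZER_BUILDERS, OPTIMIZERS
+from mmcv.utils import build_from_cfg
+
+
+@OPTIMIZER_BUILDERS.register_module()
+class NoDecayNormOptimizerConstructor(object):
+
+    def __init__(self, optimizer_cfg, paramwise_cfg=None):
+        self.optimizer_cfg = optimizer_cfg
+
+    def __call__(self, model):
+        norm_params, other_params = [], []
+        for module in model.modules():
+            params = list(module.parameters(recurse=False))
+            if isinstance(module, (nn.modules.batchnorm._BatchNorm,
+                                   nn.GroupNorm, nn.LayerNorm)):
+                norm_params += params
+            else:
+                other_params += params
+        optimizer_cfg = self.optimizer_cfg.copy()
+        # Two parameter groups: norm parameters get zero weight decay.
+        optimizer_cfg['params'] = [
+            dict(params=other_params),
+            dict(params=norm_params, weight_decay=0.)
+        ]
+        return build_from_cfg(optimizer_cfg, OPTIMIZERS)
+```
+
+It would then be selected through the `constructor` key of the optimizer config, e.g. `optimizer = dict(constructor='NoDecayNormOptimizerConstructor', type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005)`.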
+
+### Additional settings
+
+Tricks not implemented by the optimizer should be implemented through the optimizer constructor (e.g., parameter-wise learning rates) or hooks. We list some common settings that could stabilize or accelerate training. Feel free to create a PR or issue for more settings.
+
+- __Use gradient clip to stabilize training__:
+  Some models need gradient clipping to stabilize the training process. An example is as below:
+
+  ```python
+  optimizer_config = dict(
+      _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
+  ```
+
+  If your config inherits a base config which already sets `optimizer_config`, you might need `_delete_=True` to override the unnecessary settings. See the [config documentation](https://mmsegmentation.readthedocs.io/en/latest/config.html) for more details.
+
+- __Use momentum schedule to accelerate model convergence__:
+  We support a momentum scheduler that modifies the model's momentum according to the learning rate, which could make the model converge faster.
+  A momentum scheduler is usually used together with an LR scheduler; for example, the following config is used in 3D detection to accelerate convergence.
+  For more details, please refer to the implementation of [CyclicLrUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/lr_updater.py#L327) and [CyclicMomentumUpdater](https://github.com/open-mmlab/mmcv/blob/f48241a65aebfe07db122e9db320c31b685dc674/mmcv/runner/hooks/momentum_updater.py#L130).
+
+  ```python
+  lr_config = dict(
+      policy='cyclic',
+      target_ratio=(10, 1e-4),
+      cyclic_times=1,
+      step_ratio_up=0.4,
+  )
+  momentum_config = dict(
+      policy='cyclic',
+      target_ratio=(0.85 / 0.95, 1),
+      cyclic_times=1,
+      step_ratio_up=0.4,
+  )
+  ```
+
+## Customize training schedules
+
+By default we use the poly learning rate policy with a 40k/80k schedule; this calls [`PolyLrUpdaterHook`](https://github.com/open-mmlab/mmcv/blob/826d3a7b68596c824fa1e2cb89b6ac274f52179c/mmcv/runner/hooks/lr_updater.py#L196) in MMCV.
+We support many other learning rate schedules [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/runner/hooks/lr_updater.py), such as the `CosineAnnealing` and `Poly` schedules. Here are some examples
+
+- Step schedule:
+
+  ```python
+  lr_config = dict(policy='step', step=[9, 10])
+  ```
+
+- CosineAnnealing schedule:
+
+  ```python
+  lr_config = dict(
+      policy='CosineAnnealing',
+      warmup='linear',
+      warmup_iters=1000,
+      warmup_ratio=1.0 / 10,
+      min_lr_ratio=1e-5)
+  ```
+
+## Customize workflow
+
+Workflow is a list of (phase, epochs) pairs that specifies the running order and the number of epochs per phase.
+By default it is set to
+
+```python
+workflow = [('train', 1)]
+```
+
+which means running 1 epoch for training.
+Sometimes the user may want to check some metrics (e.g. loss, accuracy) of the model on the validation set.
+In such a case, we can set the workflow as
+
+```python
+[('train', 1), ('val', 1)]
+```
+
+so that 1 epoch of training and 1 epoch of validation are run iteratively.
+
+**Note**:
+
+1. The parameters of the model will not be updated during a val epoch.
+2. The keyword `total_epochs` in the config only controls the number of training epochs and will not affect the validation workflow.
+3. Workflows `[('train', 1), ('val', 1)]` and `[('train', 1)]` will not change the behavior of `EvalHook` because `EvalHook` is called by `after_train_epoch`, and the validation workflow only affects hooks that are called through `after_val_epoch`. Therefore, the only difference between `[('train', 1), ('val', 1)]` and `[('train', 1)]` is that the runner will calculate losses on the validation set after each training epoch.
+
+## Customize hooks
+
+### Use hooks implemented in MMCV
+
+If the hook is already implemented in MMCV, you can directly modify the config to use it as below
+
+```python
+custom_hooks = [
+    dict(type='MyHook', a=a_value, b=b_value, priority='NORMAL')
+]
+```
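+For reference, a custom hook such as the `MyHook` used above would be implemented and registered roughly as follows (a minimal sketch; the constructor arguments and the chosen hook point are illustrative):
+
+```python
+from mmcv.runner import HOOKS, Hook
+
+
+@HOOKS.register_module()
+class MyHook(Hook):
+
+    def __init__(self, a, b):
+        self.a = a
+        self.b = b
+
+    def after_train_iter(self, runner):
+        # Runs after every training iteration; e.g. inspect
+        # runner.outputs or write to runner.log_buffer here.
+        pass
+```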
+
+### Modify default runtime hooks
+
+There are some common hooks that are not registered through `custom_hooks`; they are
+
+- log_config
+- checkpoint_config
+- evaluation
+- lr_config
+- optimizer_config
+- momentum_config
+
+Among those hooks, only the logger hook has `VERY_LOW` priority; the others' priority is `NORMAL`.
+The above-mentioned tutorials already cover how to modify `optimizer_config`, `momentum_config`, and `lr_config`.
+Here we show what we can do with `log_config`, `checkpoint_config`, and `evaluation`.
+
+#### Checkpoint config
+
+The MMCV runner will use `checkpoint_config` to initialize [`CheckpointHook`](https://github.com/open-mmlab/mmcv/blob/9ecd6b0d5ff9d2172c49a182eaa669e9f27bb8e7/mmcv/runner/hooks/checkpoint.py#L9).
+
+```python
+checkpoint_config = dict(interval=1)
+```
+
+The users could set `max_keep_ckpts` to save only a small number of checkpoints, or decide whether to store the state dict of the optimizer via `save_optimizer`. More details of the arguments are [here](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.CheckpointHook).
+
+#### Log config
+
+The `log_config` wraps multiple logger hooks and enables setting intervals. Currently MMCV supports `WandbLoggerHook`, `MlflowLoggerHook`, and `TensorboardLoggerHook`.
+The detailed usage can be found in the [doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.LoggerHook).
+
+```python
+log_config = dict(
+    interval=50,
+    hooks=[
+        dict(type='TextLoggerHook'),
+        dict(type='TensorboardLoggerHook')
+    ])
+```
+
+#### Evaluation config
+
+The config of `evaluation` will be used to initialize the [`EvalHook`](https://github.com/open-mmlab/mmsegmentation/blob/e3f6f655d69b777341aec2fe8829871cc0beadcb/mmseg/core/evaluation/eval_hooks.py#L7).
+Except for the key `interval`, other arguments such as `metric` will be passed to `dataset.evaluate()`.
+
+```python
+evaluation = dict(interval=1, metric='mIoU')
+```
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/data_pipeline.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/data_pipeline.md
new file mode 100644
index 0000000000000000000000000000000000000000..1eecfe91d433ca897ae47a5ae9b8c5f406a3fe2b
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/data_pipeline.md
@@ -0,0 +1,171 @@
+# Tutorial 3: Customize Data Pipelines
+
+## Design of Data pipelines
+
+Following typical conventions, we use `Dataset` and `DataLoader` for data loading
+with multiple workers. `Dataset` returns a dict of data items corresponding to
+the arguments of the models' forward method.
+Since the data in semantic segmentation may not be the same size,
+we introduce a new `DataContainer` type in MMCV to help collect and distribute
+data of different sizes.
+See [here](https://github.com/open-mmlab/mmcv/blob/master/mmcv/parallel/data_container.py) for more details.
+
+The data preparation pipeline and the dataset are decoupled. Usually a dataset
+defines how to process the annotations and a data pipeline defines all the steps to prepare a data dict.
+A pipeline consists of a sequence of operations. Each operation takes a dict as input and also outputs a dict for the next transform.
+
+The operations are categorized into data loading, pre-processing, formatting and test-time augmentation.
+
+Here is a pipeline example for PSPNet.
+
+```python
+img_norm_cfg = dict(
+    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+crop_size = (512, 1024)
+train_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(type='LoadAnnotations'),
+    dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
+    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+    dict(type='RandomFlip', flip_ratio=0.5),
+    dict(type='PhotoMetricDistortion'),
+    dict(type='Normalize', **img_norm_cfg),
+    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+    dict(type='DefaultFormatBundle'),
+    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+]
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 1024),
+        # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
+        flip=False,
+        transforms=[
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+```
+
+For each operation, we list the related dict fields that are added/updated/removed.
+
+### Data loading
+
+`LoadImageFromFile`
+
+- add: img, img_shape, ori_shape
+
+`LoadAnnotations`
+
+- add: gt_semantic_seg, seg_fields
+
+### Pre-processing
+
+`Resize`
+
+- add: scale, scale_idx, pad_shape, scale_factor, keep_ratio
+- update: img, img_shape, *seg_fields
+
+`RandomFlip`
+
+- add: flip
+- update: img, *seg_fields
+
+`Pad`
+
+- add: pad_fixed_size, pad_size_divisor
+- update: img, pad_shape, *seg_fields
+
+`RandomCrop`
+
+- update: img, pad_shape, *seg_fields
+
+`Normalize`
+
+- add: img_norm_cfg
+- update: img
+
+`SegRescale`
+
+- update: gt_semantic_seg
+
+`PhotoMetricDistortion`
+
+- update: img
+
+### Formatting
+
+`ToTensor`
+
+- update: specified by `keys`.
+
+`ImageToTensor`
+
+- update: specified by `keys`.
+
+`Transpose`
+
+- update: specified by `keys`.
+
+`ToDataContainer`
+
+- update: specified by `fields`.
+
+`DefaultFormatBundle`
+
+- update: img, gt_semantic_seg
+
+`Collect`
+
+- add: img_meta (the keys of img_meta are specified by `meta_keys`)
+- remove: all other keys except for those specified by `keys`
+
+### Test time augmentation
+
+`MultiScaleFlipAug`
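+By default the configs in this repository run single-scale testing (`flip=False` and no `img_ratios`). Multi-scale and flip test-time augmentation can be switched on by listing the ratios and setting `flip=True`; the values below are the ones commented out in the configs of this repository:
+
+```python
+test_pipeline = [
+    dict(type='LoadImageFromFile'),
+    dict(
+        type='MultiScaleFlipAug',
+        img_scale=(2048, 1024),
+        img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75],  # multi-scale testing
+        flip=True,  # also average over horizontal flips
+        transforms=[  # reuses img_norm_cfg from the example above
+            dict(type='Resize', keep_ratio=True),
+            dict(type='RandomFlip'),
+            dict(type='Normalize', **img_norm_cfg),
+            dict(type='ImageToTensor', keys=['img']),
+            dict(type='Collect', keys=['img']),
+        ])
+]
+```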
+
+## Extend and use custom pipelines
+
+1. Write a new pipeline in any file, e.g., `my_pipeline.py`. It takes a dict as input and returns a dict.
+
+    ```python
+    from mmseg.datasets import PIPELINES
+
+
+    @PIPELINES.register_module()
+    class MyTransform:
+
+        def __call__(self, results):
+            results['dummy'] = True
+            return results
+    ```
+
+2. Import the new class.
+
+    ```python
+    from .my_pipeline import MyTransform
+    ```
+
+3. Use it in config files.
+
+    ```python
+    img_norm_cfg = dict(
+        mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
+    crop_size = (512, 1024)
+    train_pipeline = [
+        dict(type='LoadImageFromFile'),
+        dict(type='LoadAnnotations'),
+        dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)),
+        dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
+        dict(type='RandomFlip', flip_ratio=0.5),
+        dict(type='PhotoMetricDistortion'),
+        dict(type='Normalize', **img_norm_cfg),
+        dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
+        dict(type='MyTransform'),
+        dict(type='DefaultFormatBundle'),
+        dict(type='Collect', keys=['img', 'gt_semantic_seg']),
+    ]
+    ```
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/training_tricks.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/training_tricks.md
new file mode 100644
index 0000000000000000000000000000000000000000..98a201fa649d94525facd8237d32effd6560c658
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/tutorials/training_tricks.md
@@ -0,0 +1,52 @@
+# Tutorial 5: Training Tricks
+
+MMSegmentation supports the following training tricks out of the box.
+
+## Different Learning Rate (LR) for Backbone and Heads
+
+In semantic segmentation, some methods make the LR of the heads larger than that of the backbone to achieve better performance or faster convergence.
+
+In MMSegmentation, you may add the following lines to the config to make the LR of the heads 10 times that of the backbone.
+
+```python
+optimizer=dict(
+    paramwise_cfg = dict(
+        custom_keys={
+            'head': dict(lr_mult=10.)}))
+```
+
+With this modification, the LR of any parameter group with `'head'` in its name will be multiplied by 10.
+You may refer to the [MMCV doc](https://mmcv.readthedocs.io/en/latest/api.html#mmcv.runner.DefaultOptimizerConstructor) for further details.
+
+## Online Hard Example Mining (OHEM)
+
+We implement pixel samplers [here](https://github.com/open-mmlab/mmsegmentation/tree/master/mmseg/core/seg/sampler) for training-time sampling.
+Here is an example config that trains PSPNet with OHEM enabled.
+
+```python
+_base_ = './pspnet_r50-d8_512x1024_40k_cityscapes.py'
+model=dict(
+    decode_head=dict(
+        sampler=dict(type='OHEMPixelSampler', thresh=0.7, min_kept=100000)))
+```
+
+In this way, only pixels with a confidence score under 0.7 are used for training, and we keep at least 100000 pixels during training. If `thresh` is not specified, the pixels with the top `min_kept` losses will be selected.
+
+## Class Balanced Loss
+
+For datasets with an unbalanced class distribution, you may change the loss weight of each class.
+Here is an example for the Cityscapes dataset.
+
+```python
+_base_ = './pspnet_r50-d8_512x1024_40k_cityscapes.py'
+model=dict(
+    decode_head=dict(
+        loss_decode=dict(
+            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0,
+            # DeepLab used this class weight for cityscapes
+            class_weight=[0.8373, 0.9180, 0.8660, 1.0345, 1.0166, 0.9969, 0.9754,
+                          1.0489, 0.8786, 1.0023, 0.9539, 0.9843, 1.1116, 0.9037,
+                          1.0865, 1.0955, 1.0865, 1.1529, 1.0507])))
+```
+
+`class_weight` will be passed into `CrossEntropyLoss` as the `weight` argument. Please refer to the [PyTorch Doc](https://pytorch.org/docs/stable/nn.html?highlight=crossentropy#torch.nn.CrossEntropyLoss) for details.
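+The weights above are precomputed constants. If you need weights for your own dataset, one common recipe (not the one behind the DeepLab weights above) is median frequency balancing over the training labels, e.g.:
+
+```python
+import numpy as np
+
+
+def median_frequency_weights(pixel_counts):
+    """Per-class weights from total pixel counts per class.
+
+    Rare classes get weights > 1, frequent classes < 1.
+    Assumes every class appears at least once in the train set.
+    """
+    pixel_counts = np.asarray(pixel_counts, dtype=np.float64)
+    freq = pixel_counts / pixel_counts.sum()
+    return (np.median(freq) / freq).tolist()
+
+
+# class_weight = median_frequency_weights(counts_over_your_train_set)
+```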
diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/useful_tools.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/useful_tools.md
new file mode 100644
index 0000000000000000000000000000000000000000..514b5680ee4eff351affd3bfa169c11d5dee9b9f
--- /dev/null
+++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/docs/useful_tools.md
@@ -0,0 +1,64 @@
+Apart from training/testing scripts, we provide lots of useful tools under the
+`tools/` directory.
+
+### Get the FLOPs and params (experimental)
+
+We provide a script adapted from [flops-counter.pytorch](https://github.com/sovrasov/flops-counter.pytorch) to compute the FLOPs and params of a given model.
+
+```shell
+python tools/get_flops.py ${CONFIG_FILE} [--shape ${INPUT_SHAPE}]
+```
+
+You will get a result like this.
+
+```none
+==============================
+Input shape: (3, 2048, 1024)
+Flops: 1429.68 GMac
+Params: 48.98 M
+==============================
+```
+
+**Note**: This tool is still experimental and we do not guarantee that the number is correct. You may well use the result for simple comparisons, but double check it before you adopt it in technical reports or papers.
+
+(1) FLOPs are related to the input shape while parameters are not. The default input shape is (1, 3, 1280, 800).
+(2) Some operators, such as GN and custom operators, are not counted into FLOPs.
+
+### Publish a model
+
+Before you upload a model to AWS, you may want to
+(1) convert the model weights to CPU tensors, (2) delete the optimizer states and
+(3) compute the hash of the checkpoint file and append the hash id to the filename.
+
+```shell
+python tools/publish_model.py ${INPUT_FILENAME} ${OUTPUT_FILENAME}
+```
+
+E.g.,
+
+```shell
+python tools/publish_model.py work_dirs/pspnet/latest.pth psp_r50_hszhao_200ep.pth
+```
+
+The final output filename will be `psp_r50_512x1024_40ki_cityscapes-{hash id}.pth`.
+
+### Convert to ONNX (experimental)
+
+We provide a script to convert a model to [ONNX](https://github.com/onnx/onnx) format. The converted model could be visualized by tools like [Netron](https://github.com/lutzroeder/netron). Besides, we also support comparing the output results between the PyTorch and ONNX models.
+
+```shell
+python tools/pytorch2onnx.py ${CONFIG_FILE} --checkpoint ${CHECKPOINT_FILE} --output-file ${ONNX_FILE} [--shape ${INPUT_SHAPE} --verify]
+```
+
+**Note**: This tool is still experimental. Some customized operators are not supported for now.
+
+## Miscellaneous
+
+### Print the entire config
+
+`tools/print_config.py` prints the whole config verbatim, expanding all its imports.
+ +```shell +python tools/print_config.py ${CONFIG} [-h] [--options ${OPTIONS [OPTIONS...]}] +``` diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/ade20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/ade20k.py new file mode 100644 index 0000000000000000000000000000000000000000..efc8b4bb20c981f3db6df7eb52b3dc0744c94cc0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/ade20k.py @@ -0,0 +1,54 @@ +# dataset settings +dataset_type = 'ADE20KDataset' +data_root = 'data/ade/ADEChallengeData2016' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 512) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', reduce_zero_label=True), + dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 512), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/ade20k_repeat.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/ade20k_repeat.py new file mode 100644 index 0000000000000000000000000000000000000000..27fac27761ca53368d9f8cba78f1d68bc1d10416 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/ade20k_repeat.py @@ -0,0 +1,57 @@ +# dataset settings +dataset_type = 'ADE20KDataset' +data_root = 'data/ade/ADEChallengeData2016' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 512) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', reduce_zero_label=True), + dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 512), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + 
transforms=[ + dict(type='AlignedResize', keep_ratio=True, size_divisor=32), # Ensure the long and short sides are divisible by 32 + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=50, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/chase_db1.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/chase_db1.py new file mode 100644 index 0000000000000000000000000000000000000000..298594ea925f87f22b37094a2ec50e370aec96a0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/chase_db1.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'ChaseDB1Dataset' +data_root = 'data/CHASE_DB1' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (960, 999) +crop_size = (128, 128) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes.py new file mode 100644 index 0000000000000000000000000000000000000000..f21867c63e1835f6fceb61f066e802fd8fd2a735 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes.py @@ -0,0 +1,54 @@ +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[123.675, 
116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 1024) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_1024x1024_repeat.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_1024x1024_repeat.py new file mode 100644 index 0000000000000000000000000000000000000000..fdf36ee3cf719f82effc2559704d5270dce60fe0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_1024x1024_repeat.py @@ -0,0 +1,57 @@ +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = '' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (1024, 1024) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=500, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + 
pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_768x768_repeat.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_768x768_repeat.py new file mode 100644 index 0000000000000000000000000000000000000000..ca140341a05a5f8eaf1bb9df3dea215f44a9ff21 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_768x768_repeat.py @@ -0,0 +1,57 @@ +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (768, 768) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=500, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_repeat.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_repeat.py new file mode 100644 index 0000000000000000000000000000000000000000..ef92413029c14d9e9482e729add52cd88aaeffb4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/cityscapes_repeat.py @@ -0,0 +1,57 @@ +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 1024) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', 
keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=300, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/drive.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/drive.py new file mode 100644 index 0000000000000000000000000000000000000000..06e8ff606e0d2a4514ec8b7d2c6c436a32efcbf4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/drive.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'DRIVEDataset' +data_root = 'data/DRIVE' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (584, 565) +crop_size = (64, 64) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/hrf.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/hrf.py new file mode 100644 index 0000000000000000000000000000000000000000..242d790eb1b83e75cf6b7eaa7a35c674099311ad --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/hrf.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'HRFDataset' +data_root = 'data/HRF' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (2336, 3504) +crop_size = (256, 256) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + 
dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/mapillary_1024x1024_repeat.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/mapillary_1024x1024_repeat.py new file mode 100644 index 0000000000000000000000000000000000000000..58c9c32daf55b8a0a3c3c274c10a3d9d16b8b2c2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/mapillary_1024x1024_repeat.py @@ -0,0 +1,58 @@ +# dataset settings +dataset_type = 'MapillaryDataset' +data_root = 'data/Mapillary/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (1024, 1024) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='MaillaryHack'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 1.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='AlignedResize', keep_ratio=True, size_divisor=32), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=100, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='training/images', + ann_dir='training/labels', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='validation/images', + ann_dir='validation/labels', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='validation/images', + ann_dir='validation/labels', + 
pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/mapillary_768x768_repeat.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/mapillary_768x768_repeat.py new file mode 100644 index 0000000000000000000000000000000000000000..c032dc8b098f891ff2e74d170bd9c5a28252b504 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/mapillary_768x768_repeat.py @@ -0,0 +1,58 @@ +# dataset settings +dataset_type = 'MapillaryDataset' +data_root = 'data/Mapillary/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (768, 768) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='MaillaryHack'), + dict(type='Resize', img_scale=(2048, 1024), ratio_range=(0.5, 1.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 1024), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='AlignedResize', keep_ratio=True, size_divisor=32), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=100, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='training/images', + ann_dir='training/labels', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='validation/images', + ann_dir='validation/labels', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='validation/images', + ann_dir='validation/labels', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_context.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_context.py new file mode 100644 index 0000000000000000000000000000000000000000..ff65bad1b86d7e3a5980bb5b9fc55798dc8df5f4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_context.py @@ -0,0 +1,60 @@ +# dataset settings +dataset_type = 'PascalContextDataset' +data_root = 'data/VOCdevkit/VOC2010/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) + +img_scale = (520, 520) +crop_size = (480, 480) + +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 
0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClassContext', + split='ImageSets/SegmentationContext/train.txt', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClassContext', + split='ImageSets/SegmentationContext/val.txt', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClassContext', + split='ImageSets/SegmentationContext/val.txt', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_voc12.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_voc12.py new file mode 100644 index 0000000000000000000000000000000000000000..ba1d42d0c5781f56dc177d860d856bb34adce555 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_voc12.py @@ -0,0 +1,57 @@ +# dataset settings +dataset_type = 'PascalVOCDataset' +data_root = 'data/VOCdevkit/VOC2012' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 512) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(2048, 512), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 512), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClass', + split='ImageSets/Segmentation/train.txt', + pipeline=train_pipeline), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClass', + split='ImageSets/Segmentation/val.txt', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='JPEGImages', + ann_dir='SegmentationClass', + split='ImageSets/Segmentation/val.txt', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_voc12_aug.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_voc12_aug.py new file mode 100644 index 0000000000000000000000000000000000000000..3f23b6717d53ad29f02dd15046802a2631a5076b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/pascal_voc12_aug.py @@ -0,0 +1,9 @@ +_base_ = 
'./pascal_voc12.py' +# dataset settings +data = dict( + train=dict( + ann_dir=['SegmentationClass', 'SegmentationClassAug'], + split=[ + 'ImageSets/Segmentation/train.txt', + 'ImageSets/Segmentation/aug.txt' + ])) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/stare.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/stare.py new file mode 100644 index 0000000000000000000000000000000000000000..3f71b25488cc11a6b4d582ac52b5a24e1ad1cf8e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/datasets/stare.py @@ -0,0 +1,59 @@ +# dataset settings +dataset_type = 'STAREDataset' +data_root = 'data/STARE' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +img_scale = (605, 700) +crop_size = (128, 128) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=img_scale, ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']) +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=img_scale, + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75, 2.0], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']) + ]) +] + +data = dict( + samples_per_gpu=4, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=40000, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/default_runtime.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/default_runtime.py new file mode 100644 index 0000000000000000000000000000000000000000..b564cc4e7e7d9a67dacaaddecb100e4d8f5c005b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/default_runtime.py @@ -0,0 +1,14 @@ +# yapf:disable +log_config = dict( + interval=50, + hooks=[ + dict(type='TextLoggerHook', by_epoch=False), + # dict(type='TensorboardLoggerHook') + ]) +# yapf:enable +dist_params = dict(backend='nccl') +log_level = 'INFO' +load_from = None +resume_from = None +workflow = [('train', 1)] +cudnn_benchmark = True diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ann_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ann_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..a2cb653827e44e6015b3b83bc578003e614a6aa1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ann_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = 
dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='ANNHead', + in_channels=[1024, 2048], + in_index=[2, 3], + channels=512, + project_channels=256, + query_scales=(1, ), + key_pool_scales=(1, 3, 6, 8), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/apcnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/apcnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..c8f5316cbcf3896ba9de7ca2c801eba512f01d5e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/apcnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='APCHead', + in_channels=2048, + in_index=3, + channels=512, + pool_scales=(1, 2, 3, 6), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=dict(type='SyncBN', requires_grad=True), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ccnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ccnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..794148f576b9e215c3c6963e73dffe98204b7717 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ccnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='CCHead', + in_channels=2048, + in_index=3, + channels=512, + recurrence=2, + 
dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/cgnet.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/cgnet.py new file mode 100644 index 0000000000000000000000000000000000000000..eff8d9458c877c5db894957e0b1b4597e40da6ab --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/cgnet.py @@ -0,0 +1,35 @@ +# model settings +norm_cfg = dict(type='SyncBN', eps=1e-03, requires_grad=True) +model = dict( + type='EncoderDecoder', + backbone=dict( + type='CGNet', + norm_cfg=norm_cfg, + in_channels=3, + num_channels=(32, 64, 128), + num_blocks=(3, 21), + dilations=(2, 4), + reductions=(8, 16)), + decode_head=dict( + type='FCNHead', + in_channels=256, + in_index=2, + channels=256, + num_convs=0, + concat_input=False, + dropout_ratio=0, + num_classes=19, + norm_cfg=norm_cfg, + loss_decode=dict( + type='CrossEntropyLoss', + use_sigmoid=False, + loss_weight=1.0, + class_weight=[ + 2.5959933, 6.7415504, 3.5354059, 9.8663225, 9.690899, 9.369352, + 10.289121, 9.953208, 4.3097677, 9.490387, 7.674431, 9.396905, + 10.347791, 6.3927646, 10.226669, 10.241062, 10.280587, + 10.396974, 10.055647 + ])), + # model training and testing settings + train_cfg=dict(sampler=None), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/danet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/danet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..2c934939fac48525f22ad86f489a041dd7db7d09 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/danet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DAHead', + in_channels=2048, + in_index=3, + channels=512, + pam_channels=64, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3_r50-d8.py 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..d7a43bee01422ad4795dd27874e0cd4bb6cbfecf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='ASPPHead', + in_channels=2048, + in_index=3, + channels=512, + dilations=(1, 12, 24, 36), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3_unet_s5-d16.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3_unet_s5-d16.py new file mode 100644 index 0000000000000000000000000000000000000000..0cd262999d8b2cb8e14a5c32190ae73f479d8e81 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3_unet_s5-d16.py @@ -0,0 +1,50 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained=None, + backbone=dict( + type='UNet', + in_channels=3, + base_channels=64, + num_stages=5, + strides=(1, 1, 1, 1, 1), + enc_num_convs=(2, 2, 2, 2, 2), + dec_num_convs=(2, 2, 2, 2), + downsamples=(True, True, True, True), + enc_dilations=(1, 1, 1, 1, 1), + dec_dilations=(1, 1, 1, 1), + with_cp=False, + conv_cfg=None, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + upsample_cfg=dict(type='InterpConv'), + norm_eval=False), + decode_head=dict( + type='ASPPHead', + in_channels=64, + in_index=4, + channels=16, + dilations=(1, 12, 24, 36), + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=128, + in_index=3, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='slide', crop_size=256, stride=170)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3plus_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3plus_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..050e39e091d816df9028d23aa3ecf9db74e441e1 --- /dev/null +++ 
b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/deeplabv3plus_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DepthwiseSeparableASPPHead', + in_channels=2048, + in_index=3, + channels=512, + dilations=(1, 12, 24, 36), + c1_in_channels=256, + c1_channels=48, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/dmnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/dmnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..d22ba52640bebd805b3b8d07025e276dfb023759 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/dmnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DMHead', + in_channels=2048, + in_index=3, + channels=512, + filter_sizes=(1, 3, 5, 7), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=dict(type='SyncBN', requires_grad=True), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/dnl_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/dnl_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..edb4c174c51e34c103737ba39bfc48bf831e561d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/dnl_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + 
style='pytorch', + contract_dilation=True), + decode_head=dict( + type='DNLHead', + in_channels=2048, + in_index=3, + channels=512, + dropout_ratio=0.1, + reduction=2, + use_scale=True, + mode='embedded_gaussian', + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/emanet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/emanet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..26adcd430926de0862204a71d345f2543167f27b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/emanet_r50-d8.py @@ -0,0 +1,47 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='EMAHead', + in_channels=2048, + in_index=3, + channels=256, + ema_channels=512, + num_bases=64, + num_stages=3, + momentum=0.1, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/encnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/encnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..be777123a886503172a95fe0719e956a147bbd68 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/encnet_r50-d8.py @@ -0,0 +1,48 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='EncHead', + in_channels=[512, 1024, 2048], + in_index=(1, 2, 3), + channels=512, + num_codes=32, + use_se_loss=True, + add_lateral=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0), + loss_se_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.2)), + 
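+ # With use_sigmoid=True the SE loss is a per-image multi-label BCE: EncNet's
+ # context-encoding branch predicts which classes occur in the image (weight 0.2).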
auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fast_scnn.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fast_scnn.py new file mode 100644 index 0000000000000000000000000000000000000000..32fdeb659355a5ce5ef2cc7c2f30742703811cdf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fast_scnn.py @@ -0,0 +1,57 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True, momentum=0.01) +model = dict( + type='EncoderDecoder', + backbone=dict( + type='FastSCNN', + downsample_dw_channels=(32, 48), + global_in_channels=64, + global_block_channels=(64, 96, 128), + global_block_strides=(2, 2, 1), + global_out_channels=128, + higher_in_channels=64, + lower_in_channels=128, + fusion_out_channels=128, + out_indices=(0, 1, 2), + norm_cfg=norm_cfg, + align_corners=False), + decode_head=dict( + type='DepthwiseSeparableFCNHead', + in_channels=128, + channels=128, + concat_input=False, + num_classes=19, + in_index=-1, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)), + auxiliary_head=[ + dict( + type='FCNHead', + in_channels=128, + channels=32, + num_convs=1, + num_classes=19, + in_index=-2, + norm_cfg=norm_cfg, + concat_input=False, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)), + dict( + type='FCNHead', + in_channels=64, + channels=32, + num_convs=1, + num_classes=19, + in_index=-3, + norm_cfg=norm_cfg, + concat_input=False, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=True, loss_weight=0.4)), + ], + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_hr18.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_hr18.py new file mode 100644 index 0000000000000000000000000000000000000000..c3e299bc89ada56ca14bbffcbdb08a586b8ed9e9 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_hr18.py @@ -0,0 +1,52 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://msra/hrnetv2_w18', + backbone=dict( + type='HRNet', + norm_cfg=norm_cfg, + norm_eval=False, + extra=dict( + stage1=dict( + num_modules=1, + num_branches=1, + block='BOTTLENECK', + num_blocks=(4, ), + num_channels=(64, )), + stage2=dict( + num_modules=1, + num_branches=2, + block='BASIC', + num_blocks=(4, 4), + num_channels=(18, 36)), + stage3=dict( + num_modules=4, + num_branches=3, + block='BASIC', + num_blocks=(4, 4, 4), + num_channels=(18, 36, 72)), + stage4=dict( + num_modules=3, + num_branches=4, + block='BASIC', + num_blocks=(4, 4, 4, 4), + num_channels=(18, 36, 72, 144)))), + decode_head=dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + channels=sum([18, 36, 72, 144]), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + 
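+ # input_transform='resize_concat' upsamples all four HRNet branch outputs to
+ # the highest resolution and concatenates them, hence
+ # channels=sum([18, 36, 72, 144]) (=270) feeding the 1x1 classifier conv.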
concat_input=False, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..5e98f6cc918b6146fc6d613c6918e825ef1355c3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_r50-d8.py @@ -0,0 +1,45 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='FCNHead', + in_channels=2048, + in_index=3, + channels=512, + num_convs=2, + concat_input=True, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_unet_s5-d16.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_unet_s5-d16.py new file mode 100644 index 0000000000000000000000000000000000000000..a33e7972877f902d0e7d18401ca675e3e4e60a18 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fcn_unet_s5-d16.py @@ -0,0 +1,51 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained=None, + backbone=dict( + type='UNet', + in_channels=3, + base_channels=64, + num_stages=5, + strides=(1, 1, 1, 1, 1), + enc_num_convs=(2, 2, 2, 2, 2), + dec_num_convs=(2, 2, 2, 2), + downsamples=(True, True, True, True), + enc_dilations=(1, 1, 1, 1, 1), + dec_dilations=(1, 1, 1, 1), + with_cp=False, + conv_cfg=None, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + upsample_cfg=dict(type='InterpConv'), + norm_eval=False), + decode_head=dict( + type='FCNHead', + in_channels=64, + in_index=4, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=128, + in_index=3, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='slide', crop_size=256, stride=170)) diff --git 
a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fpn_r50.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fpn_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..86ab327db92e44c14822d65f1c9277cb007f17c1 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/fpn_r50.py @@ -0,0 +1,36 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 1, 1), + strides=(1, 2, 2, 2), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + neck=dict( + type='FPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=4), + decode_head=dict( + type='FPNHead', + in_channels=[256, 256, 256, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/gcnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/gcnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..3d2ad69f5c22adfe79d5fdabf920217628987166 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/gcnet_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='GCHead', + in_channels=2048, + in_index=3, + channels=512, + ratio=1 / 4., + pooling_type='att', + fusion_types=('channel_add', ), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/lraspp_m-v3-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/lraspp_m-v3-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..93258242a90695cc94a7c6bd41562d6a75988771 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/lraspp_m-v3-d8.py @@ -0,0 +1,25 @@ +# model settings +norm_cfg = dict(type='SyncBN', eps=0.001, requires_grad=True) +model = dict( + type='EncoderDecoder', + backbone=dict( + type='MobileNetV3', + arch='large', + out_indices=(1, 3, 16), + norm_cfg=norm_cfg), + decode_head=dict( + 
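+ # Lite R-ASPP (LR-ASPP) head from the MobileNetV3 paper: a lightweight ASPP
+ # variant that gates features with a global-average-pooling attention branch.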
type='LRASPPHead', + in_channels=(16, 24, 960), + in_index=(0, 1, 2), + channels=128, + input_transform='multiple_select', + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/nonlocal_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/nonlocal_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..5674a39854cafd1f2e363bac99c58ccae62f24da --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/nonlocal_r50-d8.py @@ -0,0 +1,46 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='NLHead', + in_channels=2048, + in_index=3, + channels=512, + dropout_ratio=0.1, + reduction=2, + use_scale=True, + mode='embedded_gaussian', + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ocrnet_hr18.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ocrnet_hr18.py new file mode 100644 index 0000000000000000000000000000000000000000..c60f62a7cdf3f5c5096a7a7e725e8268fddcb057 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ocrnet_hr18.py @@ -0,0 +1,68 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='CascadeEncoderDecoder', + num_stages=2, + pretrained='open-mmlab://msra/hrnetv2_w18', + backbone=dict( + type='HRNet', + norm_cfg=norm_cfg, + norm_eval=False, + extra=dict( + stage1=dict( + num_modules=1, + num_branches=1, + block='BOTTLENECK', + num_blocks=(4, ), + num_channels=(64, )), + stage2=dict( + num_modules=1, + num_branches=2, + block='BASIC', + num_blocks=(4, 4), + num_channels=(18, 36)), + stage3=dict( + num_modules=4, + num_branches=3, + block='BASIC', + num_blocks=(4, 4, 4), + num_channels=(18, 36, 72)), + stage4=dict( + num_modules=3, + num_branches=4, + block='BASIC', + num_blocks=(4, 4, 4, 4), + num_channels=(18, 36, 72, 144)))), + decode_head=[ + dict( + type='FCNHead', + in_channels=[18, 36, 72, 144], + channels=sum([18, 36, 72, 144]), + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + kernel_size=1, + num_convs=1, + concat_input=False, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + 
type='OCRHead', + in_channels=[18, 36, 72, 144], + in_index=(0, 1, 2, 3), + input_transform='resize_concat', + channels=512, + ocr_channels=256, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + ], + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ocrnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ocrnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..615aa3ff703942b6c22b2d6e9642504dd3e41ebd --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/ocrnet_r50-d8.py @@ -0,0 +1,47 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='CascadeEncoderDecoder', + num_stages=2, + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=[ + dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + dict( + type='OCRHead', + in_channels=2048, + in_index=3, + channels=512, + ocr_channels=256, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ], + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pointrend_r50.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pointrend_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..9d323dbf9466d41e0800aa57ef84045f3d874bdf --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pointrend_r50.py @@ -0,0 +1,56 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='CascadeEncoderDecoder', + num_stages=2, + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 1, 1), + strides=(1, 2, 2, 2), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + neck=dict( + type='FPN', + in_channels=[256, 512, 1024, 2048], + out_channels=256, + num_outs=4), + decode_head=[ + dict( + type='FPNHead', + in_channels=[256, 256, 256, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=-1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + dict( + type='PointHead', + in_channels=[256], + in_index=[0], + channels=256, + num_fcs=3, + coarse_pred_each_layer=True, + dropout_ratio=-1, + num_classes=19, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)) + ], + # model training and testing settings + train_cfg=dict( + num_points=2048, 
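+ # PointRend point sampling: draw oversample_ratio * num_points candidates,
+ # keep the importance_sample_ratio fraction with the most uncertain coarse
+ # predictions, and fill the remainder with uniformly sampled points.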
oversample_ratio=3, importance_sample_ratio=0.75), + test_cfg=dict( + mode='whole', + subdivision_steps=2, + subdivision_num_points=8196, + scale_factor=2)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/psanet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/psanet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..689513fa9d2a40f14bf0ae4ae61f38f0dcc1b3da --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/psanet_r50-d8.py @@ -0,0 +1,49 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='PSAHead', + in_channels=2048, + in_index=3, + channels=512, + mask_size=(97, 97), + psa_type='bi-direction', + compact=False, + shrink_factor=2, + normalization_factor=1.0, + psa_softmax=True, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pspnet_r50-d8.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pspnet_r50-d8.py new file mode 100644 index 0000000000000000000000000000000000000000..f451e08ad2eb0732dcb806b1851eb978d4acf136 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pspnet_r50-d8.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 2, 4), + strides=(1, 2, 1, 1), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='PSPHead', + in_channels=2048, + in_index=3, + channels=512, + pool_scales=(1, 2, 3, 6), + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pspnet_unet_s5-d16.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pspnet_unet_s5-d16.py new file mode 100644 index 
0000000000000000000000000000000000000000..fcff9ec4f41fad158344ecd77313dc14564f3682 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/pspnet_unet_s5-d16.py @@ -0,0 +1,50 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained=None, + backbone=dict( + type='UNet', + in_channels=3, + base_channels=64, + num_stages=5, + strides=(1, 1, 1, 1, 1), + enc_num_convs=(2, 2, 2, 2, 2), + dec_num_convs=(2, 2, 2, 2), + downsamples=(True, True, True, True), + enc_dilations=(1, 1, 1, 1, 1), + dec_dilations=(1, 1, 1, 1), + with_cp=False, + conv_cfg=None, + norm_cfg=norm_cfg, + act_cfg=dict(type='ReLU'), + upsample_cfg=dict(type='InterpConv'), + norm_eval=False), + decode_head=dict( + type='PSPHead', + in_channels=64, + in_index=4, + channels=16, + pool_scales=(1, 2, 3, 6), + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=128, + in_index=3, + channels=64, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=2, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='slide', crop_size=256, stride=170)) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/segformer.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/segformer.py new file mode 100644 index 0000000000000000000000000000000000000000..347ac3d9a462d4f1218e0baf52de4ca347680ca3 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/segformer.py @@ -0,0 +1,24 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained=None, + backbone=dict( + type='IMTRv21_5', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/upernet_r50.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/upernet_r50.py new file mode 100644 index 0000000000000000000000000000000000000000..10974962fdd7136031fd06de1700f497d355ceaa --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/models/upernet_r50.py @@ -0,0 +1,44 @@ +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +model = dict( + type='EncoderDecoder', + pretrained='open-mmlab://resnet50_v1c', + backbone=dict( + type='ResNetV1c', + depth=50, + num_stages=4, + out_indices=(0, 1, 2, 3), + dilations=(1, 1, 1, 1), + strides=(1, 2, 2, 2), + norm_cfg=norm_cfg, + norm_eval=False, + style='pytorch', + contract_dilation=True), + decode_head=dict( + type='UPerHead', + in_channels=[256, 512, 1024, 2048], + in_index=[0, 1, 2, 3], + pool_scales=(1, 2, 3, 6), + channels=512, + dropout_ratio=0.1, + 
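+ # UPerHead (UPerNet) runs the pyramid pooling module (pool_scales above) on
+ # the deepest stage and fuses all four backbone stages in FPN fashion.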
num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + auxiliary_head=dict( + type='FCNHead', + in_channels=1024, + in_index=2, + channels=256, + num_convs=1, + concat_input=False, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + loss_decode=dict( + type='CrossEntropyLoss', use_sigmoid=False, loss_weight=0.4)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_160k.py new file mode 100644 index 0000000000000000000000000000000000000000..52603890b10f25faf8eec9f9e5a4468fae09b811 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_160k.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=160000) +checkpoint_config = dict(by_epoch=False, interval=16000) +evaluation = dict(interval=16000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_160k_adamw.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_160k_adamw.py new file mode 100644 index 0000000000000000000000000000000000000000..f8624ab699577cd27ad42bb4053b9a73678cde1b --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_160k_adamw.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='AdamW', lr=0.0002, weight_decay=0.0001) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=0.0, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=160000) +checkpoint_config = dict(by_epoch=False, interval=4000) +evaluation = dict(interval=40000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_20k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_20k.py new file mode 100644 index 0000000000000000000000000000000000000000..bf780a1b6f6521833c6a5859675147824efa599d --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_20k.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=20000) +checkpoint_config = dict(by_epoch=False, interval=2000) +evaluation = dict(interval=2000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_40k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_40k.py new file mode 100644 index 0000000000000000000000000000000000000000..cdbf841abcb26eed87bf76ab816aff4bae0630ee --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_40k.py @@ -0,0 +1,9 @@ +# optimizer 
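+# mmcv's 'poly' policy decays lr(iter) = base_lr * (1 - iter/max_iters)**power,
+# floored at min_lr; e.g. with lr=0.01, power=0.9 and max_iters=40000, the lr
+# at iter 20000 is 0.01 * (0.5 ** 0.9), about 0.0054.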
+optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=40000) +checkpoint_config = dict(by_epoch=False, interval=4000) +evaluation = dict(interval=4000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_40k_adamw.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_40k_adamw.py new file mode 100644 index 0000000000000000000000000000000000000000..dc2fcd07a6a69e42494ae8f42aa6ddce96ee336f --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_40k_adamw.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='AdamW', lr=0.0002, weight_decay=0.0001) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=0.0, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=40000) +checkpoint_config = dict(by_epoch=False, interval=4000) +evaluation = dict(interval=4000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_80k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_80k.py new file mode 100644 index 0000000000000000000000000000000000000000..c190cee6bdc7922b688ea75dc8f152fa15c24617 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_80k.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='SGD', lr=0.01, momentum=0.9, weight_decay=0.0005) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=1e-4, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=80000) +checkpoint_config = dict(by_epoch=False, interval=8000) +evaluation = dict(interval=8000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_80k_adamw.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_80k_adamw.py new file mode 100644 index 0000000000000000000000000000000000000000..b3d15f6c2dfb3332088af25b749f8c4c20e831b4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/_base_/schedules/schedule_80k_adamw.py @@ -0,0 +1,9 @@ +# optimizer +optimizer = dict(type='AdamW', lr=0.0002, weight_decay=0.0001) +optimizer_config = dict() +# learning policy +lr_config = dict(policy='poly', power=0.9, min_lr=0.0, by_epoch=False) +# runtime settings +runner = dict(type='IterBasedRunner', max_iters=80000) +checkpoint_config = dict(by_epoch=False, interval=4000) +evaluation = dict(interval=4000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.1024x1024.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.1024x1024.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..3f6614ba738d0042719896b6aef48e7906b48e12 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.1024x1024.city.160k.py @@ -0,0 +1,51 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/cityscapes_1024x1024_repeat.py', + 
'../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b0.pth', + backbone=dict( + type='mit_b0', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[32, 64, 160, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=256), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + # test_cfg=dict(mode='whole')) + test_cfg=dict(mode='slide', crop_size=(1024,1024), stride=(768,768))) + +# data +data = dict(samples_per_gpu=4, workers_per_gpu=8) +evaluation = dict(interval=4000, metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.0001, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) + })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.512x1024.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.512x1024.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..0cdae7f11d8040d832b44e1af9cceeec5a649a08 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.512x1024.city.160k.py @@ -0,0 +1,106 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='BN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b0.pth', + backbone=dict( + type='mit_b0', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[32, 64, 160, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=256), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (512, 1024) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(1024, 512), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(1024, 512), + # img_ratios=[0.5, 0.75,
1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=1, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=500, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) + +evaluation = dict(interval=4000, metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) + })) + + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.512x512.ade.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.512x512.ade.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..ec024b85fdf7b8071b449fb1abb5318ffb34bf2c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.512x512.ade.160k.py @@ -0,0 +1,48 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/ade20k_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b0.pth', + backbone=dict( + type='mit_b0', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[32, 64, 160, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=256), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
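+ # custom_keys match by substring of the parameter name: 'head' parameters
+ # train with a 10x lr while 'norm' and 'pos_block' parameters are exempt
+ # from weight decay; _delete_=True above replaces, rather than merges with,
+ # the optimizer inherited from the _base_ schedule.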
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + +data = dict(samples_per_gpu=2) +evaluation = dict(interval=16000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.640x1280.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.640x1280.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..c88ffbd8e46357ea3d71f1628bc2d6d8314a6520 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.640x1280.city.160k.py @@ -0,0 +1,105 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b0.pth', + backbone=dict( + type='mit_b0', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[32, 64, 160, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=256), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (640, 1280) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(1280, 640), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(1280, 640), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=1, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=500, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) + +evaluation = dict(interval=4000, metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.768x768.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.768x768.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..02be43c086efa84bb9f9e907b00a76d12fc87c43 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B0/segformer.b0.768x768.city.160k.py @@ -0,0 +1,106 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b0.pth', + backbone=dict( + type='mit_b0', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[32, 64, 160, 256], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=256), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + # test_cfg=dict(mode='whole')) + test_cfg=dict(mode='slide', crop_size=(768,768), stride=(768,768))) + +# dataset settings +dataset_type = 'CityscapesDataset' +data_root = 'data/cityscapes/' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (768, 768) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations'), + dict(type='Resize', img_scale=(1536, 768), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(1536, 768), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='Resize', keep_ratio=True), + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=1, + workers_per_gpu=2, + train=dict( + type='RepeatDataset', + times=500, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/train', + ann_dir='gtFine/train', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='leftImg8bit/val', + ann_dir='gtFine/val', + pipeline=test_pipeline)) + +evaluation = dict(interval=4000, metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..554492421daf3ca7a49b264860b7d90974c3bcf4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B1/segformer.b1.1024x1024.city.160k.py @@ -0,0 +1,51 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/cityscapes_1024x1024_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b1.pth', + backbone=dict( + type='mit_b1', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=256), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + # test_cfg=dict(mode='whole')) + test_cfg=dict(mode='slide', crop_size=(1024,1024), stride=(768,768))) + +# data +data = dict(samples_per_gpu=1) +evaluation = dict(interval=4000, metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
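+                                                 # note: test_cfg above uses mode='slide' -- inference runs
+                                                 # on 1024x1024 crops at stride 768 and averages overlapping
+                                                 # logits, keeping memory bounded on full-resolution
+                                                 # Cityscapes frames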
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..1b68633b9ecdb4a053ff0ec13bb8bc62a4e8a10e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B1/segformer.b1.512x512.ade.160k.py @@ -0,0 +1,48 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/ade20k_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b1.pth', + backbone=dict( + type='mit_b1', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=256), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) + })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + +data = dict(samples_per_gpu=2) +evaluation = dict(interval=16000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B2/segformer.b2.1024x1024.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B2/segformer.b2.1024x1024.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..fc0f76e7a559bc2fc0b2bdf05f0f6fb9254dd33e --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B2/segformer.b2.1024x1024.city.160k.py @@ -0,0 +1,51 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/cityscapes_1024x1024_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b2.pth', + backbone=dict( + type='mit_b2', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + # test_cfg=dict(mode='whole')) + test_cfg=dict(mode='slide', crop_size=(1024,1024), stride=(768,768))) + +# data +data = dict(samples_per_gpu=1) +evaluation = dict(interval=4000, 
metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) + })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B2/segformer.b2.512x512.ade.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B2/segformer.b2.512x512.ade.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..d7f736bacab6fb2784cc24fe8097874e80b90e8a --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B2/segformer.b2.512x512.ade.160k.py @@ -0,0 +1,48 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/ade20k_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b2.pth', + backbone=dict( + type='mit_b2', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + # type='MLPHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + +data = dict(samples_per_gpu=2) +evaluation = dict(interval=16000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B3/segformer.b3.1024x1024.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B3/segformer.b3.1024x1024.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..0a2c47a9e7bda3a9b25e1f8d79e15ba1f80d9c27 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B3/segformer.b3.1024x1024.city.160k.py @@ -0,0 +1,51 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/cityscapes_1024x1024_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b3.pth', + backbone=dict( + type='mit_b3', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + # test_cfg=dict(mode='whole')) + test_cfg=dict(mode='slide', crop_size=(1024,1024), stride=(768,768))) + +# data +data = dict(samples_per_gpu=1) +evaluation = dict(interval=4000, metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B3/segformer.b3.512x512.ade.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B3/segformer.b3.512x512.ade.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..2fa27342c05ffc71a0b9a61ecb1a91cd2d2368a2 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B3/segformer.b3.512x512.ade.160k.py @@ -0,0 +1,48 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/ade20k_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b3.pth', + backbone=dict( + type='mit_b3', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) + })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + +data = dict(samples_per_gpu=2) +evaluation = dict(interval=16000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B4/segformer.b4.1024x1024.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B4/segformer.b4.1024x1024.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..55d515de12af55fbd00be01bdfe94a839d4cc866 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B4/segformer.b4.1024x1024.city.160k.py @@ -0,0 +1,51 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/cityscapes_1024x1024_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b4.pth', + backbone=dict( + type='mit_b4', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + # test_cfg=dict(mode='whole')) + test_cfg=dict(mode='slide', crop_size=(1024,1024), stride=(768,768))) + +# data +data = dict(samples_per_gpu=1) +evaluation = dict(interval=4000, 
metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) + })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B4/segformer.b4.512x512.ade.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B4/segformer.b4.512x512.ade.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..1bbb6f4e8441b547a9bdd09923a296c267ca4eb4 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B4/segformer.b4.512x512.ade.160k.py @@ -0,0 +1,48 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/ade20k_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b4.pth', + backbone=dict( + type='mit_b4', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + +data = dict(samples_per_gpu=2) +evaluation = dict(interval=16000, metric='mIoU') diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B5/segformer.b5.1024x1024.city.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B5/segformer.b5.1024x1024.city.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..166651d759448ef22867387c231a63900a0a33e8 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B5/segformer.b5.1024x1024.city.160k.py @@ -0,0 +1,51 @@ +_base_ = [ + '../../_base_/models/segformer.py', + '../../_base_/datasets/cityscapes_1024x1024_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b5.pth', + backbone=dict( + type='mit_b5', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=19, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + # test_cfg=dict(mode='whole')) + test_cfg=dict(mode='slide', crop_size=(1024,1024), stride=(768,768))) + +# data +data = dict(samples_per_gpu=1) +evaluation = dict(interval=4000, metric='mIoU') + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
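+                                                 # find_unused_parameters=True (set above) lets
+                                                 # DistributedDataParallel tolerate parameters that
+                                                 # receive no gradient in a given forward pass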
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B5/segformer.b5.640x640.ade.160k.py b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B5/segformer.b5.640x640.ade.160k.py new file mode 100644 index 0000000000000000000000000000000000000000..d315f00a959a36666eaac469c601e9f99da76123 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/local_configs/segformer/B5/segformer.b5.640x640.ade.160k.py @@ -0,0 +1,105 @@ +_base_ = [ + '../../_base_/models/segformer.py', + # '../../_base_/datasets/ade20k_repeat.py', + '../../_base_/default_runtime.py', + '../../_base_/schedules/schedule_160k_adamw.py' +] + +# data settings +dataset_type = 'ADE20KDataset' +data_root = 'data/ade/ADEChallengeData2016' +img_norm_cfg = dict( + mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True) +crop_size = (640, 640) +train_pipeline = [ + dict(type='LoadImageFromFile'), + dict(type='LoadAnnotations', reduce_zero_label=True), + dict(type='Resize', img_scale=(2048, 640), ratio_range=(0.5, 2.0)), + dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75), + dict(type='RandomFlip', prob=0.5), + dict(type='PhotoMetricDistortion'), + dict(type='Normalize', **img_norm_cfg), + dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255), + dict(type='DefaultFormatBundle'), + dict(type='Collect', keys=['img', 'gt_semantic_seg']), +] +test_pipeline = [ + dict(type='LoadImageFromFile'), + dict( + type='MultiScaleFlipAug', + img_scale=(2048, 640), + # img_ratios=[0.5, 0.75, 1.0, 1.25, 1.5, 1.75], + flip=False, + transforms=[ + dict(type='AlignedResize', keep_ratio=True, size_divisor=32), # Ensure the long and short sides are divisible by 32 + dict(type='RandomFlip'), + dict(type='Normalize', **img_norm_cfg), + dict(type='ImageToTensor', keys=['img']), + dict(type='Collect', keys=['img']), + ]) +] +data = dict( + samples_per_gpu=2, + workers_per_gpu=4, + train=dict( + type='RepeatDataset', + times=50, + dataset=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/training', + ann_dir='annotations/training', + pipeline=train_pipeline)), + val=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline), + test=dict( + type=dataset_type, + data_root=data_root, + img_dir='images/validation', + ann_dir='annotations/validation', + pipeline=test_pipeline)) + +# model settings +norm_cfg = dict(type='SyncBN', requires_grad=True) +find_unused_parameters = True +model = dict( + type='EncoderDecoder', + pretrained='pretrained/mit_b5.pth', + backbone=dict( + type='mit_b5', + style='pytorch'), + decode_head=dict( + type='SegFormerHead', + in_channels=[64, 128, 320, 512], + in_index=[0, 1, 2, 3], + feature_strides=[4, 8, 16, 32], + channels=128, + dropout_ratio=0.1, + num_classes=150, + norm_cfg=norm_cfg, + align_corners=False, + decoder_params=dict(embed_dim=768), + loss_decode=dict(type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)), + # model training and testing settings + train_cfg=dict(), + test_cfg=dict(mode='whole')) + +# optimizer +optimizer = dict(_delete_=True, type='AdamW', lr=0.00006, betas=(0.9, 0.999), weight_decay=0.01, + paramwise_cfg=dict(custom_keys={'pos_block': dict(decay_mult=0.), + 'norm': dict(decay_mult=0.), + 'head': dict(lr_mult=10.) 
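+                                                 # the all-MLP decoder is widened to embed_dim=768 for
+                                                 # B2-B5 (the B0/B1 configs in this diff use 256), matching
+                                                 # the larger backbone channel widths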
+ })) + +lr_config = dict(_delete_=True, policy='poly', + warmup='linear', + warmup_iters=1500, + warmup_ratio=1e-6, + power=1.0, min_lr=0.0, by_epoch=False) + +evaluation = dict(interval=16000, metric='mIoU') + diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/MANIFEST.in b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/MANIFEST.in new file mode 100644 index 0000000000000000000000000000000000000000..16f9cc8938d9ba15f999bf69343f492837604845 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/MANIFEST.in @@ -0,0 +1,5 @@ +include requirements/runtime.txt +include mmcv/model_zoo/open_mmlab.json mmcv/model_zoo/deprecated.json mmcv/model_zoo/mmcls.json +include mmcv/ops/csrc/*.cuh mmcv/ops/csrc/*.hpp +include mmcv/ops/csrc/pytorch/*.cu mmcv/ops/csrc/pytorch/*.cpp +include mmcv/ops/csrc/parrots/*.cu mmcv/ops/csrc/parrots/*.cpp diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/PKG-INFO b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/PKG-INFO new file mode 100644 index 0000000000000000000000000000000000000000..7dee09da7c0905f6218d4d6638e9b03db35fca9c --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/PKG-INFO @@ -0,0 +1,19 @@ +Metadata-Version: 1.1 +Name: mmcv +Version: 1.2.7 +Summary: OpenMMLab Computer Vision Foundation +Home-page: https://github.com/open-mmlab/mmcv +Author: MMCV Authors +Author-email: openmmlab@gmail.com +License: UNKNOWN +Description: UNKNOWN +Keywords: computer vision +Platform: UNKNOWN +Classifier: Development Status :: 4 - Beta +Classifier: License :: OSI Approved :: Apache Software License +Classifier: Operating System :: OS Independent +Classifier: Programming Language :: Python :: 3 +Classifier: Programming Language :: Python :: 3.6 +Classifier: Programming Language :: Python :: 3.7 +Classifier: Programming Language :: Python :: 3.8 +Classifier: Topic :: Utilities diff --git a/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/README.md b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/README.md new file mode 100644 index 0000000000000000000000000000000000000000..a6649c0e00c4e97400a4c4c4ef095e14f5ceb1e0 --- /dev/null +++ b/PyTorch/contrib/cv/semantic_segmentation/SegFormer/mmcv-1.2.7/README.md @@ -0,0 +1,160 @@ +
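+Pre-built mmcv-full wheels are provided for the CUDA / PyTorch combinations below; "install" marks an available build:
+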
+| CUDA | torch 1.7 | torch 1.6 | torch 1.5 | torch 1.4 | torch 1.3 |
+| ---- | --------- | --------- | --------- | --------- | --------- |
+| 11.0 | install   |           |           |           |           |
+| 10.2 | install   | install   | install   |           |           |
+| 10.1 | install   | install   | install   | install   | install   |
+| 9.2  | install   | install   | install   | install   | install   |
+| cpu  | install   | install   | install   | install   | install   |
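+
+Each "install" cell in the original table expanded to a pip command. As a hedged
+sketch only (the `latest+torchX.Y.Z+cuNNN` local-version tag and the index URL
+follow the convention of contemporary mmcv releases and are assumptions, not
+verified for every cell):
+
+```bash
+# assumed example: mmcv-full wheel built for PyTorch 1.6.0 + CUDA 10.2
+pip install mmcv-full==latest+torch1.6.0+cu102 \
+    -f https://download.openmmlab.com/mmcv/dist/index.html
+```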