LibMTL.architecture
- class AbsArchitecture(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: torch.nn.Module
An abstract class for MTL architectures.
- Parameters
task_name (list) – A list of strings for all tasks.
encoder_class (class) – A neural network class.
decoders (dict) – A dictionary of name-decoder pairs of type (str, torch.nn.Module).
rep_grad (bool) – If True, the gradient of the representation for each task can be computed.
multi_input (bool) – True if each task has its own input data, False otherwise.
device (torch.device) – The device where the model and data will be allocated.
kwargs (dict) – A dictionary of hyperparameters of architectures.
- forward(self, inputs, task_name=None)
- Parameters
inputs (torch.Tensor) – The input data.
task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.
- Returns
A dictionary of name-prediction pairs of type (str, torch.Tensor).
- Return type
dict
- get_share_params(self)
Return the shared parameters of the model.
- zero_grad_share_params(self)
Set gradients of the shared parameters to zero.
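Concrete architectures subclass AbsArchitecture and implement forward so that it honors the name-prediction contract above. The sketch below is a minimal, hypothetical subclass written only to illustrate that contract; it assumes the base constructor stores task_name, encoder_class, and decoders as attributes.

    import torch
    import torch.nn as nn
    from LibMTL.architecture import AbsArchitecture

    class ToyArch(AbsArchitecture):
        # Hypothetical architecture: one shared encoder, one decoder per task.
        def __init__(self, task_name, encoder_class, decoders, rep_grad,
                     multi_input, device, **kwargs):
            super(ToyArch, self).__init__(task_name, encoder_class, decoders,
                                          rep_grad, multi_input, device, **kwargs)
            self.encoder = self.encoder_class()

        def forward(self, inputs, task_name=None):
            rep = self.encoder(inputs)
            if task_name is not None:   # multi_input mode: one task per call
                return {task_name: self.decoders[task_name](rep)}
            return {task: self.decoders[task](rep) for task in self.task_name}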
- class HPS(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.abstract_arch.AbsArchitecture
Hard Parameter Sharing (HPS).
This method is proposed in Multitask Learning: A Knowledge-Based Source of Inductive Bias (ICML 1993) and implemented by us.
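A minimal end-to-end sketch of HPS. The encoder, decoders, tasks, and loss below are toy assumptions made for illustration; only the call pattern (constructor, forward, shared-parameter helpers) comes from the API above.

    import torch
    import torch.nn as nn
    from LibMTL.architecture import HPS

    class ToyEncoder(nn.Module):               # hypothetical encoder class
        def __init__(self):
            super(ToyEncoder, self).__init__()
            self.fc = nn.Linear(16, 32)
        def forward(self, x):
            return torch.relu(self.fc(x))

    task_name = ['task_a', 'task_b']
    decoders = nn.ModuleDict({t: nn.Linear(32, 1) for t in task_name})

    model = HPS(task_name, ToyEncoder, decoders, rep_grad=False,
                multi_input=False, device=torch.device('cpu'))

    preds = model(torch.randn(8, 16))          # {'task_a': ..., 'task_b': ...}
    loss = sum(p.pow(2).mean() for p in preds.values())   # placeholder loss
    model.zero_grad_share_params()             # zero grads of shared parameters
    loss.backward()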
- class Cross_stitch(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.abstract_arch.AbsArchitecture
Cross-stitch Networks (Cross_stitch).
This method is proposed in Cross-stitch Networks for Multi-task Learning (CVPR 2016) and implemented by us.
Warning
Cross_stitch does not work with the multi-input MTL problem, i.e., multi_input must be False. Cross_stitch is only supported by ResNet-based encoders.
- class MMoE(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.abstract_arch.AbsArchitecture
Multi-gate Mixture-of-Experts (MMoE).
This method is proposed in Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts (KDD 2018) and implemented by us.
- Parameters
img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.
num_experts (int) – The number of experts shared by all tasks. Each expert is an encoder network.
- forward(self, inputs, task_name=None)
- Parameters
inputs (torch.Tensor) – The input data.
task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.
- Returns
A dictionary of name-prediction pairs of type (str, torch.Tensor).
- Return type
dict
- get_share_params(self)
Return the shared parameters of the model.
- zero_grad_share_params(self)
Set gradients of the shared parameters to zero.
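A hedged construction sketch for MMoE. Passing img_size and num_experts as keyword arguments (collected into **kwargs) is an assumption consistent with the signature above, and the convolutional expert is hypothetical.

    import torch
    import torch.nn as nn
    from LibMTL.architecture import MMoE

    class ConvExpert(nn.Module):               # hypothetical expert network
        def __init__(self):
            super(ConvExpert, self).__init__()
            self.net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool2d(1), nn.Flatten())
        def forward(self, x):
            return self.net(x)                 # [batch, 8]

    task_name = ['task_a', 'task_b']
    decoders = nn.ModuleDict({t: nn.Linear(8, 1) for t in task_name})

    model = MMoE(task_name, ConvExpert, decoders, rep_grad=False,
                 multi_input=False, device=torch.device('cpu'),
                 img_size=[3, 32, 32],         # documented hyperparameter
                 num_experts=4)                # four shared experts
    preds = model(torch.randn(2, 3, 32, 32))   # dict keyed by task name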
- class MTAN(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.abstract_arch.AbsArchitecture
Multi-Task Attention Network (MTAN).
This method is proposed in End-To-End Multi-Task Learning With Attention (CVPR 2019) and implemented by modifying the official PyTorch implementation.
Warning
MTAN is only supported by ResNet-based encoders.
- forward(self, inputs, task_name=None)
- Parameters
inputs (torch.Tensor) – The input data.
task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.
- Returns
A dictionary of name-prediction pairs of type (str, torch.Tensor).
- Return type
dict
- get_share_params(self)
Return the shared parameters of the model.
- zero_grad_share_params(self)
Set gradients of the shared parameters to zero.
- class CGC(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.MMoE.MMoE
Customized Gate Control (CGC).
This method is proposed in Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (ACM RecSys 2020 Best Paper) and implemented by us.
- Parameters
img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.
num_experts (list) – The numbers of experts shared by all tasks and specific to each task, respectively. Each expert is an encoder network.
- forward(self, inputs, task_name=None)
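For CGC, num_experts is a list rather than an int. A plausible reading of the description above (stated here as an assumption) is that one entry gives the shared expert count and the remaining entries give the per-task counts, in task order. A constructor-only sketch, reusing the hypothetical ConvExpert, task_name, and decoders from the MMoE example:

    import torch
    from LibMTL.architecture import CGC

    model = CGC(task_name, ConvExpert, decoders, rep_grad=False,
                multi_input=False, device=torch.device('cpu'),
                img_size=[3, 32, 32],
                num_experts=[2, 1, 1])         # assumed: 2 shared experts, then
                                               # 1 task-specific expert per task
    preds = model(torch.randn(2, 3, 32, 32))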
- class PLE(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.abstract_arch.AbsArchitecture
Progressive Layered Extraction (PLE).
This method is proposed in Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (ACM RecSys 2020 Best Paper) and implemented by us.
- Parameters
img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.
num_experts (list) – The numbers of experts shared by all tasks and specific to each task, respectively. Each expert is an encoder network.
Warning
PLE does not work with the multi-input MTL problem, i.e., multi_input must be False.
- forward(self, inputs, task_name=None)
- Parameters
inputs (torch.Tensor) – The input data.
task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.
- Returns
A dictionary of name-prediction pairs of type (str, torch.Tensor).
- Return type
dict
- get_share_params(self)
Return the shared parameters of the model.
- zero_grad_share_params(self)
Set gradients of the shared parameters to zero.
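PLE takes the same documented hyperparameters as CGC. Another constructor-only sketch under the same assumptions, again reusing the toy pieces from the MMoE example:

    import torch
    from LibMTL.architecture import PLE

    model = PLE(task_name, ConvExpert, decoders, rep_grad=False,
                multi_input=False,             # required: see the warning above
                device=torch.device('cpu'),
                img_size=[3, 32, 32],
                num_experts=[2, 1, 1])         # shared count, then per-task counts
    preds = model(torch.randn(2, 3, 32, 32))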
- class DSelect_k(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.MMoE.MMoE
DSelect-k.
This method is proposed in DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning (NeurIPS 2021) and implemented by modifying the official TensorFlow implementation.
- Parameters
img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.
num_experts (int) – The number of experts shared by all tasks. Each expert is an encoder network.
num_nonzeros (int) – The number of selected experts.
kgamma (float, default=1.0) – A scaling parameter for the smooth-step function.
- forward(self, inputs, task_name=None)
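DSelect_k extends MMoE's hyperparameters with num_nonzeros and kgamma. A hedged construction sketch, once more reusing the toy pieces from the MMoE example:

    import torch
    from LibMTL.architecture import DSelect_k

    model = DSelect_k(task_name, ConvExpert, decoders, rep_grad=False,
                      multi_input=False, device=torch.device('cpu'),
                      img_size=[3, 32, 32],
                      num_experts=4,           # experts shared by all tasks
                      num_nonzeros=2,          # each gate selects 2 experts
                      kgamma=1.0)              # smooth-step scaling (the default)
    preds = model(torch.randn(2, 3, 32, 32))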
- class LTB(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)
Bases: LibMTL.architecture.abstract_arch.AbsArchitecture
Learning To Branch (LTB).
This method is proposed in Learning to Branch for Multi-Task Learning (ICML 2020) and implemented by us.
Warning
LTB does not work with the multi-input MTL problem, i.e., multi_input must be False.
- forward(self, inputs, task_name=None)
- Parameters
inputs (torch.Tensor) – The input data.
task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.
- Returns
A dictionary of name-prediction pairs of type (str, torch.Tensor).
- Return type
dict
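A closing usage note that applies to every architecture in this module, LTB included: get_share_params() and zero_grad_share_params() let a trainer handle the shared parameters separately from the task-specific ones. A sketch, reusing the HPS model built earlier:

    import torch

    # `model` is any architecture instance, e.g. the HPS model built above.
    shared_opt = torch.optim.SGD(model.get_share_params(), lr=1e-2)

    preds = model(torch.randn(8, 16))
    loss = sum(p.pow(2).mean() for p in preds.values())
    model.zero_grad_share_params()     # zero only the shared-parameter grads
    loss.backward()
    shared_opt.step()                  # update only the shared parameters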