LibMTL.architecture

class AbsArchitecture(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: torch.nn.Module

An abstract class for MTL architectures.

Parameters
  • task_name (list) – A list of strings for all tasks.

  • encoder_class (class) – A neural network class.

  • decoders (dict) – A dictionary of name-decoder pairs of type (str, torch.nn.Module).

  • rep_grad (bool) – If True, the gradient of the representation for each task can be computed.

  • multi_input (bool) – True if each task has its own input data, False otherwise.

  • device (torch.device) – The device where model and data will be allocated.

  • kwargs (dict) – A dictionary of hyperparameters of architectures.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
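
The three methods above are the whole contract: forward maps inputs to per-task predictions, while get_share_params and zero_grad_share_params expose the shared part of the model. A minimal sketch of a custom subclass, assuming the base constructor stores its arguments under attributes of the same names (task_name, encoder_class, decoders); MyArch is hypothetical, not part of LibMTL:

    from LibMTL.architecture import AbsArchitecture

    class MyArch(AbsArchitecture):
        def __init__(self, task_name, encoder_class, decoders,
                     rep_grad, multi_input, device, **kwargs):
            super().__init__(task_name, encoder_class, decoders,
                             rep_grad, multi_input, device, **kwargs)
            # Assumption: the base class keeps its arguments as attributes.
            self.encoder = self.encoder_class()  # one shared encoder instance

        def forward(self, inputs, task_name=None):
            rep = self.encoder(inputs)  # shared representation
            tasks = [task_name] if task_name is not None else self.task_name
            return {task: self.decoders[task](rep) for task in tasks}

        def get_share_params(self):
            return self.encoder.parameters()

        def zero_grad_share_params(self):
            self.encoder.zero_grad()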

class HPS(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Hard Parameter Sharing (HPS).

This method was proposed in Multitask Learning: A Knowledge-Based Source of Inductive Bias (ICML 1993) and is implemented by us.
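
A toy end-to-end use following the signature above; the tiny encoder and linear decoders are illustrative stand-ins, not LibMTL components:

    import torch
    import torch.nn as nn
    from LibMTL.architecture import HPS

    class ToyEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

        def forward(self, x):
            return self.net(x)

    task_name = ['left', 'right']
    # A dict of name-decoder pairs; ModuleDict also registers the parameters.
    decoders = nn.ModuleDict({t: nn.Linear(32, 1) for t in task_name})

    # Note that the encoder *class* is passed, not an instance.
    model = HPS(task_name, ToyEncoder, decoders, rep_grad=False,
                multi_input=False, device=torch.device('cpu'))
    preds = model(torch.randn(8, 16))  # {'left': ..., 'right': ...}, each (8, 1)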

class Cross_stitch(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Cross-stitch Networks (Cross_stitch).

This method was proposed in Cross-stitch Networks for Multi-task Learning (CVPR 2016) and is implemented by us.

Warning

  • Cross_stitch does not work with multi-input problems, i.e., multi_input must be False.

  • Cross_stitch is only supported by ResNet-based encoders.
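
The core operation is easy to state in isolation. Below is a sketch of a single cross-stitch unit as described in the CVPR 2016 paper, not LibMTL's internal code: a learned T×T matrix linearly mixes the per-task feature maps at a given layer, initialized to the identity so each task starts from its own features:

    import torch
    import torch.nn as nn

    class CrossStitchUnit(nn.Module):
        def __init__(self, num_tasks):
            super().__init__()
            # alpha[i, j] weights how much of task j's features flow into task i.
            self.alpha = nn.Parameter(torch.eye(num_tasks))

        def forward(self, feats):  # feats: list of (N, C, H, W), one per task
            stacked = torch.stack(feats)  # (T, N, C, H, W)
            mixed = torch.einsum('ij,j...->i...', self.alpha, stacked)
            return list(mixed.unbind(0))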

class MMoE(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Multi-gate Mixture-of-Experts (MMoE).

This method was proposed in Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts (KDD 2018) and is implemented by us.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (int) – The number of experts shared by all tasks. Each expert is an encoder network.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
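
To make the gating concrete, here is a self-contained sketch of the multi-gate combination from the KDD 2018 paper (not LibMTL's exact implementation): each task owns a softmax gate over a shared pool of experts, so different tasks can weight the same experts differently:

    import torch
    import torch.nn as nn

    class ToyMMoE(nn.Module):
        def __init__(self, in_dim, rep_dim, num_experts, task_name):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Linear(in_dim, rep_dim) for _ in range(num_experts)])
            self.gates = nn.ModuleDict(
                {t: nn.Linear(in_dim, num_experts) for t in task_name})

        def forward(self, x):  # x: (N, in_dim)
            expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (N, E, rep)
            reps = {}
            for t, gate in self.gates.items():
                w = torch.softmax(gate(x), dim=-1)  # (N, E) per-task mixture weights
                reps[t] = (w.unsqueeze(-1) * expert_out).sum(dim=1)  # (N, rep)
            return reps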

class MTAN(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Multi-Task Attention Network (MTAN).

This method was proposed in End-To-End Multi-Task Learning With Attention (CVPR 2019) and is implemented by modifying the official PyTorch implementation.

Warning

MTAN is only supported by ResNet-based encoders.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
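
The distinguishing piece is the per-task attention module. A simplified sketch of the idea from the CVPR 2019 paper follows (the full method also chains attention across backbone stages, which is omitted here): each task learns a soft mask that gates the shared features:

    import torch
    import torch.nn as nn

    class ToyTaskAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.mask = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.BatchNorm2d(channels), nn.Sigmoid())

        def forward(self, shared_feat):  # (N, C, H, W) from the shared backbone
            return self.mask(shared_feat) * shared_feat  # element-wise soft mask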

class CGC(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.MMoE.MMoE

Customized Gate Control (CGC).

This method was proposed in Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (ACM RecSys 2020 Best Paper) and is implemented by us.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (list) – A list giving the number of experts shared by all tasks, followed by the number of experts specific to each task. Each expert is an encoder network.

forward(self, inputs, task_name=None)

Same parameters and return type as MMoE.forward().
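
What CGC changes relative to MMoE is the expert pool: some experts stay shared while others belong to a single task, and each task's gate mixes only the shared experts plus its own. A self-contained sketch of that gating from the RecSys 2020 paper, not LibMTL's code:

    import torch
    import torch.nn as nn

    class ToyCGC(nn.Module):
        def __init__(self, in_dim, rep_dim, num_shared, num_specific, task_name):
            super().__init__()
            self.shared = nn.ModuleList(
                [nn.Linear(in_dim, rep_dim) for _ in range(num_shared)])
            self.specific = nn.ModuleDict(
                {t: nn.ModuleList([nn.Linear(in_dim, rep_dim)
                                   for _ in range(num_specific)]) for t in task_name})
            self.gates = nn.ModuleDict(
                {t: nn.Linear(in_dim, num_shared + num_specific) for t in task_name})

        def forward(self, x):  # x: (N, in_dim)
            shared_out = [e(x) for e in self.shared]
            reps = {}
            for t in self.gates:
                # Each task mixes the shared experts with its own experts only.
                experts = torch.stack(
                    shared_out + [e(x) for e in self.specific[t]], dim=1)  # (N, E, rep)
                w = torch.softmax(self.gates[t](x), dim=-1)
                reps[t] = (w.unsqueeze(-1) * experts).sum(dim=1)
            return reps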
class PLE(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Progressive Layered Extraction (PLE).

This method was proposed in Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (ACM RecSys 2020 Best Paper) and is implemented by us.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (list) – A list giving the number of experts shared by all tasks, followed by the number of experts specific to each task. Each expert is an encoder network.

Warning

  • PLE does not work with multi-input problems, i.e., multi_input must be False.

  • PLE is only supported by ResNet-based encoders.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
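
PLE stacks CGC-style extraction layers, keeping a shared representation that flows from one layer to the next alongside the task-specific ones; only the last layer's task representations reach the decoders. A self-contained sketch of that progressive structure, with one shared and one task-specific expert per layer for brevity (the paper and LibMTL allow several of each):

    import torch
    import torch.nn as nn

    class ToyExtractionLayer(nn.Module):
        def __init__(self, dim, task_name):
            super().__init__()
            self.task_name = task_name
            self.shared = nn.Linear(dim, dim)  # one shared expert
            self.specific = nn.ModuleDict({t: nn.Linear(dim, dim) for t in task_name})
            self.gates = nn.ModuleDict({t: nn.Linear(dim, 2) for t in task_name})
            self.shared_gate = nn.Linear(dim, 1 + len(task_name))

        def forward(self, task_in, shared_in):
            shared_e = self.shared(shared_in)
            spec_e = {t: self.specific[t](x) for t, x in task_in.items()}
            task_out = {}
            for t, x in task_in.items():
                # Task gates see the shared expert and their own expert only.
                experts = torch.stack([shared_e, spec_e[t]], dim=1)  # (N, 2, dim)
                w = torch.softmax(self.gates[t](x), dim=-1)
                task_out[t] = (w.unsqueeze(-1) * experts).sum(dim=1)
            # The shared gate sees every expert, shared and task-specific.
            experts = torch.stack([shared_e] + [spec_e[t] for t in self.task_name], dim=1)
            w = torch.softmax(self.shared_gate(shared_in), dim=-1)
            shared_out = (w.unsqueeze(-1) * experts).sum(dim=1)
            return task_out, shared_out

    task_name = ['a', 'b']
    layers = nn.ModuleList([ToyExtractionLayer(32, task_name) for _ in range(2)])
    x = torch.randn(8, 32)
    task_rep, shared_rep = {t: x for t in task_name}, x
    for layer in layers:  # progressive: each layer's outputs feed the next
        task_rep, shared_rep = layer(task_rep, shared_rep)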

class DSelect_k(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.MMoE.MMoE

DSelect-k.

This method was proposed in DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning (NeurIPS 2021) and is implemented by modifying the official TensorFlow implementation.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (int) – The number of experts shared by all tasks. Each expert is an encoder network.

  • num_nonzeros (int) – The number of selected experts.

  • kgamma (float, default=1.0) – A scaling parameter for the smooth-step function.

forward(self, inputs, task_name=None)

Same parameters and return type as MMoE.forward().
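
The kgamma parameter scales the smooth-step function that makes expert selection differentiable. As described in the NeurIPS 2021 paper, it is a cubic that is exactly 0 below -gamma/2, exactly 1 above gamma/2, and smooth in between; a sketch:

    import torch

    def smooth_step(t: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
        # Cubic interpolation on [-gamma/2, gamma/2], constant outside, so the
        # output saturates at exactly 0 or 1 while staying differentiable.
        cubic = -2.0 / gamma ** 3 * t ** 3 + 3.0 / (2.0 * gamma) * t + 0.5
        return torch.where(t <= -gamma / 2, torch.zeros_like(t),
                           torch.where(t >= gamma / 2, torch.ones_like(t), cubic))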
class LTB(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Learning To Branch (LTB).

This method was proposed in Learning to Branch for Multi-Task Learning (ICML 2020) and is implemented by us.

Warning

  • LTB does not work with multi-input problems, i.e., multi_input must be False.

  • LTB is only supported by ResNet-based encoders.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict
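
LTB learns where the network should branch: each block in one layer picks which block in the previous layer to take input from, and the choice is relaxed with a Gumbel-softmax so it stays differentiable and can be annealed toward a hard tree. A sketch of that selection step (illustrative, not LibMTL's code):

    import torch
    import torch.nn.functional as F

    def select_parent(parent_feats: torch.Tensor, logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
        # parent_feats: (P, N, C, H, W) outputs of the P candidate parent blocks;
        # logits: (P,) learnable branching scores for one child block.
        w = F.gumbel_softmax(logits, tau=temperature)  # soft one-hot over parents
        return torch.einsum('p,p...->...', w, parent_feats)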