LibMTL.architecture

class AbsArchitecture(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: torch.nn.Module

An abstract class for MTL architectures.

Parameters
  • task_name (list) – A list of strings for all tasks.

  • encoder_class (class) – A neural network class.

  • decoders (dict) – A dictionary of name-decoder pairs of type (str, torch.nn.Module).

  • rep_grad (bool) – If True, the gradient of the representation for each task can be computed.

  • multi_input (bool) – True if each task has its own input data, False otherwise.

  • device (torch.device) – The device where model and data will be allocated.

  • kwargs (dict) – A dictionary of hyperparameters of architectures.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
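
The three methods above are the whole contract: forward maps inputs to per-task predictions, while get_share_params and zero_grad_share_params expose the shared part of the model. A minimal sketch of a custom subclass, assuming the base constructor stores its arguments under attributes of the same names (task_name, encoder_class, decoders); MyArch is hypothetical, not part of LibMTL:

    from LibMTL.architecture import AbsArchitecture

    class MyArch(AbsArchitecture):
        def __init__(self, task_name, encoder_class, decoders,
                     rep_grad, multi_input, device, **kwargs):
            super().__init__(task_name, encoder_class, decoders,
                             rep_grad, multi_input, device, **kwargs)
            # Assumption: the base class keeps its arguments as attributes.
            self.encoder = self.encoder_class()  # one shared encoder instance

        def forward(self, inputs, task_name=None):
            rep = self.encoder(inputs)  # shared representation
            tasks = [task_name] if task_name is not None else self.task_name
            return {task: self.decoders[task](rep) for task in tasks}

        def get_share_params(self):
            return self.encoder.parameters()

        def zero_grad_share_params(self):
            self.encoder.zero_grad()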

class HPS(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Hard Parameter Sharing (HPS).

This method was proposed in Multitask Learning: A Knowledge-Based Source of Inductive Bias (ICML 1993) and is implemented by us.
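
A toy end-to-end use following the signature above; the tiny encoder and linear decoders are illustrative stand-ins, not LibMTL components:

    import torch
    import torch.nn as nn
    from LibMTL.architecture import HPS

    class ToyEncoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU())

        def forward(self, x):
            return self.net(x)

    task_name = ['left', 'right']
    # A dict of name-decoder pairs; ModuleDict also registers the parameters.
    decoders = nn.ModuleDict({t: nn.Linear(32, 1) for t in task_name})

    # Note that the encoder *class* is passed, not an instance.
    model = HPS(task_name, ToyEncoder, decoders, rep_grad=False,
                multi_input=False, device=torch.device('cpu'))
    preds = model(torch.randn(8, 16))  # {'left': ..., 'right': ...}, each (8, 1)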

class Cross_stitch(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Cross-stitch Networks (Cross_stitch).

This method was proposed in Cross-stitch Networks for Multi-task Learning (CVPR 2016) and is implemented by us.

Warning

  • Cross_stitch does not work with multi-input problems, i.e., multi_input must be False.

  • Cross_stitch is only supported by ResNet-based encoders.
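
The core operation is easy to state in isolation. Below is a sketch of a single cross-stitch unit as described in the CVPR 2016 paper, not LibMTL's internal code: a learned T×T matrix linearly mixes the per-task feature maps at a given layer, initialized to the identity so each task starts from its own features:

    import torch
    import torch.nn as nn

    class CrossStitchUnit(nn.Module):
        def __init__(self, num_tasks):
            super().__init__()
            # alpha[i, j] weights how much of task j's features flow into task i.
            self.alpha = nn.Parameter(torch.eye(num_tasks))

        def forward(self, feats):  # feats: list of (N, C, H, W), one per task
            stacked = torch.stack(feats)  # (T, N, C, H, W)
            mixed = torch.einsum('ij,j...->i...', self.alpha, stacked)
            return list(mixed.unbind(0))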

class MMoE(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Multi-gate Mixture-of-Experts (MMoE).

This method was proposed in Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts (KDD 2018) and is implemented by us.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (int) – The number of experts shared by all tasks. Each expert is an encoder network.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
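
To make the gating concrete, here is a self-contained sketch of the multi-gate combination from the KDD 2018 paper (not LibMTL's exact implementation): each task owns a softmax gate over a shared pool of experts, so different tasks can weight the same experts differently:

    import torch
    import torch.nn as nn

    class ToyMMoE(nn.Module):
        def __init__(self, in_dim, rep_dim, num_experts, task_name):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Linear(in_dim, rep_dim) for _ in range(num_experts)])
            self.gates = nn.ModuleDict(
                {t: nn.Linear(in_dim, num_experts) for t in task_name})

        def forward(self, x):  # x: (N, in_dim)
            expert_out = torch.stack([e(x) for e in self.experts], dim=1)  # (N, E, rep)
            reps = {}
            for t, gate in self.gates.items():
                w = torch.softmax(gate(x), dim=-1)  # (N, E) per-task mixture weights
                reps[t] = (w.unsqueeze(-1) * expert_out).sum(dim=1)  # (N, rep)
            return reps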

class MTAN(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Multi-Task Attention Network (MTAN).

This method was proposed in End-To-End Multi-Task Learning With Attention (CVPR 2019) and is implemented by modifying the official PyTorch implementation.

Warning

MTAN is only supported by ResNet-based encoders.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
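
The distinguishing piece is the per-task attention module. A simplified sketch of the idea from the CVPR 2019 paper follows (the full method also chains attention across backbone stages, which is omitted here): each task learns a soft mask that gates the shared features:

    import torch
    import torch.nn as nn

    class ToyTaskAttention(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.mask = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, channels, kernel_size=1),
                nn.BatchNorm2d(channels), nn.Sigmoid())

        def forward(self, shared_feat):  # (N, C, H, W) from the shared backbone
            return self.mask(shared_feat) * shared_feat  # element-wise soft mask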

class CGC(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.MMoE.MMoE

Customized Gate Control (CGC).

This method was proposed in Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (ACM RecSys 2020 Best Paper) and is implemented by us.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (list) – A list giving the number of experts shared by all tasks, followed by the number of experts specific to each task. Each expert is an encoder network.

forward(self, inputs, task_name=None)

Same parameters and return type as MMoE.forward().
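
What CGC changes relative to MMoE is the expert pool: some experts stay shared while others belong to a single task, and each task's gate mixes only the shared experts plus its own. A self-contained sketch of that gating from the RecSys 2020 paper, not LibMTL's code:

    import torch
    import torch.nn as nn

    class ToyCGC(nn.Module):
        def __init__(self, in_dim, rep_dim, num_shared, num_specific, task_name):
            super().__init__()
            self.shared = nn.ModuleList(
                [nn.Linear(in_dim, rep_dim) for _ in range(num_shared)])
            self.specific = nn.ModuleDict(
                {t: nn.ModuleList([nn.Linear(in_dim, rep_dim)
                                   for _ in range(num_specific)]) for t in task_name})
            self.gates = nn.ModuleDict(
                {t: nn.Linear(in_dim, num_shared + num_specific) for t in task_name})

        def forward(self, x):  # x: (N, in_dim)
            shared_out = [e(x) for e in self.shared]
            reps = {}
            for t in self.gates:
                # Each task mixes the shared experts with its own experts only.
                experts = torch.stack(
                    shared_out + [e(x) for e in self.specific[t]], dim=1)  # (N, E, rep)
                w = torch.softmax(self.gates[t](x), dim=-1)
                reps[t] = (w.unsqueeze(-1) * experts).sum(dim=1)
            return reps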
class PLE(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Progressive Layered Extraction (PLE).

This method was proposed in Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (ACM RecSys 2020 Best Paper) and is implemented by us.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (list) – A list giving the number of experts shared by all tasks, followed by the number of experts specific to each task. Each expert is an encoder network.

Warning

  • PLE does not work with multi-input problems, i.e., multi_input must be False.

  • PLE is only supported by ResNet-based encoders.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict

get_share_params(self)

Return the shared parameters of the model.

zero_grad_share_params(self)

Set gradients of the shared parameters to zero.
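
PLE stacks CGC-style extraction layers, keeping a shared representation that flows from one layer to the next alongside the task-specific ones; only the last layer's task representations reach the decoders. A self-contained sketch of that progressive structure, with one shared and one task-specific expert per layer for brevity (the paper and LibMTL allow several of each):

    import torch
    import torch.nn as nn

    class ToyExtractionLayer(nn.Module):
        def __init__(self, dim, task_name):
            super().__init__()
            self.task_name = task_name
            self.shared = nn.Linear(dim, dim)  # one shared expert
            self.specific = nn.ModuleDict({t: nn.Linear(dim, dim) for t in task_name})
            self.gates = nn.ModuleDict({t: nn.Linear(dim, 2) for t in task_name})
            self.shared_gate = nn.Linear(dim, 1 + len(task_name))

        def forward(self, task_in, shared_in):
            shared_e = self.shared(shared_in)
            spec_e = {t: self.specific[t](x) for t, x in task_in.items()}
            task_out = {}
            for t, x in task_in.items():
                # Task gates see the shared expert and their own expert only.
                experts = torch.stack([shared_e, spec_e[t]], dim=1)  # (N, 2, dim)
                w = torch.softmax(self.gates[t](x), dim=-1)
                task_out[t] = (w.unsqueeze(-1) * experts).sum(dim=1)
            # The shared gate sees every expert, shared and task-specific.
            experts = torch.stack([shared_e] + [spec_e[t] for t in self.task_name], dim=1)
            w = torch.softmax(self.shared_gate(shared_in), dim=-1)
            shared_out = (w.unsqueeze(-1) * experts).sum(dim=1)
            return task_out, shared_out

    task_name = ['a', 'b']
    layers = nn.ModuleList([ToyExtractionLayer(32, task_name) for _ in range(2)])
    x = torch.randn(8, 32)
    task_rep, shared_rep = {t: x for t in task_name}, x
    for layer in layers:  # progressive: each layer's outputs feed the next
        task_rep, shared_rep = layer(task_rep, shared_rep)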

class DSelect_k(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.MMoE.MMoE

DSelect-k.

This method was proposed in DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning (NeurIPS 2021) and is implemented by modifying the official TensorFlow implementation.

Parameters
  • img_size (list) – The size of the input data. For example, [3, 224, 224] denotes input images of size 3x224x224.

  • num_experts (int) – The number of experts shared by all tasks. Each expert is an encoder network.

  • num_nonzeros (int) – The number of selected experts.

  • kgamma (float, default=1.0) – A scaling parameter for the smooth-step function.

forward(self, inputs, task_name=None)

Same parameters and return type as MMoE.forward().
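
The kgamma parameter scales the smooth-step function that makes expert selection differentiable. As described in the NeurIPS 2021 paper, it is a cubic that is exactly 0 below -gamma/2, exactly 1 above gamma/2, and smooth in between; a sketch:

    import torch

    def smooth_step(t: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
        # Cubic interpolation on [-gamma/2, gamma/2], constant outside, so the
        # output saturates at exactly 0 or 1 while staying differentiable.
        cubic = -2.0 / gamma ** 3 * t ** 3 + 3.0 / (2.0 * gamma) * t + 0.5
        return torch.where(t <= -gamma / 2, torch.zeros_like(t),
                           torch.where(t >= gamma / 2, torch.ones_like(t), cubic))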
class LTB(task_name, encoder_class, decoders, rep_grad, multi_input, device, **kwargs)

Bases: LibMTL.architecture.abstract_arch.AbsArchitecture

Learning To Branch (LTB).

This method was proposed in Learning to Branch for Multi-Task Learning (ICML 2020) and is implemented by us.

Warning

  • LTB does not work with multi-input problems, i.e., multi_input must be False.

  • LTB is only supported by ResNet-based encoders.

forward(self, inputs, task_name=None)
Parameters
  • inputs (torch.Tensor) – The input data.

  • task_name (str, default=None) – The task name corresponding to inputs if multi_input is True.

Returns

A dictionary of name-prediction pairs of type (str, torch.Tensor).

Return type

dict
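
LTB learns where the network should branch: each block in one layer picks which block in the previous layer to take input from, and the choice is relaxed with a Gumbel-softmax so it stays differentiable and can be annealed toward a hard tree. A sketch of that selection step (illustrative, not LibMTL's code):

    import torch
    import torch.nn.functional as F

    def select_parent(parent_feats: torch.Tensor, logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
        # parent_feats: (P, N, C, H, W) outputs of the P candidate parent blocks;
        # logits: (P,) learnable branching scores for one child block.
        w = F.gumbel_softmax(logits, tau=temperature)  # soft one-hot over parents
        return torch.einsum('p,p...->...', w, parent_feats)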