An interpretable deep learning model for classifying adaptor protein complexes from sequence information

Publication: Contribution to journal, Journal article, Research, peer-reviewed

Standard

An interpretable deep learning model for classifying adaptor protein complexes from sequence information. / Kha, Quang Hien; Tran, Thi Oanh; Nguyen, Trinh Trung Duong; Nguyen, Van Nui; Than, Khoat; Le, Nguyen Quoc Khanh.

In: Methods, Vol. 207, 2022, p. 90-96.

Harvard

Kha, QH, Tran, TO, Nguyen, TTD, Nguyen, VN, Than, K & Le, NQK 2022, 'An interpretable deep learning model for classifying adaptor protein complexes from sequence information', Methods, vol. 207, pp. 90-96. https://doi.org/10.1016/j.ymeth.2022.09.007

APA

Kha, Q. H., Tran, T. O., Nguyen, T. T. D., Nguyen, V. N., Than, K., & Le, N. Q. K. (2022). An interpretable deep learning model for classifying adaptor protein complexes from sequence information. Methods, 207, 90-96. https://doi.org/10.1016/j.ymeth.2022.09.007

Vancouver

Kha QH, Tran TO, Nguyen TTD, Nguyen VN, Than K, Le NQK. An interpretable deep learning model for classifying adaptor protein complexes from sequence information. Methods. 2022;207:90-96. https://doi.org/10.1016/j.ymeth.2022.09.007

Author

Kha, Quang Hien ; Tran, Thi Oanh ; Nguyen, Trinh Trung Duong ; Nguyen, Van Nui ; Than, Khoat ; Le, Nguyen Quoc Khanh. / An interpretable deep learning model for classifying adaptor protein complexes from sequence information. In: Methods. 2022 ; Vol. 207. pp. 90-96.

Bibtex

@article{2989614e78c0444087f8cc6013e8b255,
title = "An interpretable deep learning model for classifying adaptor protein complexes from sequence information",
abstract = "Adaptor proteins (APs) are a family of proteins that aids in intracellular membrane trafficking, and their impairments or defects are closely related to various disorders. Traditional methods to identify and classify APs require time and complex techniques, which were then advanced by machine learning and computational approaches to facilitate the APs recognition task. However, most studies focused on recognizing separate ones in the APs family or the APs in general with non-APs, lacking one comprehensive strategy to distinguish the complexes of AP subtypes. Herein, we proposed a novel method to implement one novel task as discriminating the AP complexes in the APs family, utilizing an interpretable deep neural network architecture on sequence-based encoding features. This work also introduced a benchmark data set of AP complexes originating from the UniProt and GeneOntology databases. To assess the robustness of our proposed method, we compared our performance to various machine learning algorithms and feature extraction strategies. Furthermore, the interpretation of the model's prediction performance was implemented using t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and SHapley Additive exPlanations (SHAP) analysis to show the distribution of AP complexes on optimal features. The promising performance of our architecture can assist scientists not only in AP complexes distinction but also in general protein sequences. Moreover, we have also made our work publicly on GitHub https://github.com/khanhlee/adaptor-dnn.",
keywords = "Adaptor protein, Computational biology, Deep neural network, Interpretable machine learning, Protein function prediction, Sequence analysis",
author = "Kha, {Quang Hien} and Tran, {Thi Oanh} and Nguyen, {Trinh Trung Duong} and Nguyen, {Van Nui} and Than, Khoat and Le, {Nguyen Quoc Khanh}",
note = "Funding Information: This work was funded by Gia Lam Urban Development and Investment Company Limited, Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA18.",
year = "2022",
doi = "10.1016/j.ymeth.2022.09.007",
language = "English",
volume = "207",
pages = "90--96",
journal = "Methods",
issn = "1046-2023",
publisher = "Academic Press",

}

RIS

TY - JOUR

T1 - An interpretable deep learning model for classifying adaptor protein complexes from sequence information

AU - Kha, Quang Hien

AU - Tran, Thi Oanh

AU - Nguyen, Trinh Trung Duong

AU - Nguyen, Van Nui

AU - Than, Khoat

AU - Le, Nguyen Quoc Khanh

N1 - Funding Information: This work was funded by Gia Lam Urban Development and Investment Company Limited, Vingroup and supported by Vingroup Innovation Foundation (VINIF) under project code VINIF.2019.DA18.

PY - 2022

Y1 - 2022

AB - Adaptor proteins (APs) are a family of proteins that aids in intracellular membrane trafficking, and their impairments or defects are closely related to various disorders. Traditional methods to identify and classify APs require time and complex techniques, which were then advanced by machine learning and computational approaches to facilitate the APs recognition task. However, most studies focused on recognizing separate ones in the APs family or the APs in general with non-APs, lacking one comprehensive strategy to distinguish the complexes of AP subtypes. Herein, we proposed a novel method to implement one novel task as discriminating the AP complexes in the APs family, utilizing an interpretable deep neural network architecture on sequence-based encoding features. This work also introduced a benchmark data set of AP complexes originating from the UniProt and GeneOntology databases. To assess the robustness of our proposed method, we compared our performance to various machine learning algorithms and feature extraction strategies. Furthermore, the interpretation of the model's prediction performance was implemented using t-distributed stochastic neighbor embedding (t-SNE), uniform manifold approximation and projection (UMAP), and SHapley Additive exPlanations (SHAP) analysis to show the distribution of AP complexes on optimal features. The promising performance of our architecture can assist scientists not only in AP complexes distinction but also in general protein sequences. Moreover, we have also made our work publicly on GitHub https://github.com/khanhlee/adaptor-dnn.

KW - Adaptor protein

KW - Computational biology

KW - Deep neural network

KW - Interpretable machine learning

KW - Protein function prediction

KW - Sequence analysis

U2 - 10.1016/j.ymeth.2022.09.007

DO - 10.1016/j.ymeth.2022.09.007

M3 - Journal article

C2 - 36174933

AN - SCOPUS:85139011804

VL - 207

SP - 90

EP - 96

JO - Methods

JF - Methods

SN - 1046-2023

ER -
