{"id":20713,"date":"2025-01-10T13:29:47","date_gmt":"2025-01-10T12:29:47","guid":{"rendered":"https:\/\/sano.science\/?post_type=research&#038;p=20713"},"modified":"2025-02-21T12:40:24","modified_gmt":"2025-02-21T11:40:24","slug":"effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data","status":"publish","type":"research","link":"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/","title":{"rendered":"Effects of data transformation and model selection on feature importance in microbiome classification data"},"content":{"rendered":"\n<h2 class=\"wp-block-heading eplus-wrapper\" id=\"h-zuzanna-karwowska-oliver-aasmets-estonian-biobank-research-team-tomasz-kosciolek-elin-org\">Zuzanna Karwowska, Oliver Aasmets, Estonian Biobank research team, Tomasz Kosciolek, Elin Org<\/h2>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<p class=\" eplus-wrapper\">The effective classification of host phenotypes through microbiome data is essential for the progression of microbiome-centered treatments, where machine learning serves as a pivotal tool. The inherent complexity of the gut microbiome, coupled with issues like data sparsity, compositionality, and variability across populations, poses substantial challenges. Although transforming microbiome data can mitigate some of these difficulties, its application in machine learning endeavors remains largely underinvestigated.<br>In our study, we examined more than 8500 samples across 24 shotgun metagenomic datasets, discovering that it is feasible to differentiate between healthy and diseased states using microbiome data, with minimal reliance on specific algorithms or data transformations. We found that presence-absence data transformations were as effective as those based on abundance, and that accurate classification could be achieved using only a limited set of predictive features. Despite similar levels of classification accuracy across different transformations, the key features identified varied significantly, underscoring the importance of reevaluating the detection of biomarkers through machine learning.<br>Our results demonstrate that while microbiome data transformations have a substantial impact on feature selection, they do not significantly alter classification accuracy. This indicates that although the classification process is stable across various transformations, careful consideration is necessary in the selection of features for biomarker discovery using machine learning. This study not only contributes valuable insights into the application of machine learning to microbiome data but also points to crucial areas for future research.<\/p>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<p class=\" eplus-wrapper\"><strong>DOI<\/strong>: 10.1186\/s40168-024-01996-6<\/p>\n\n\n\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n\t\n    \n        \n\t\t\t<a href=\"https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC11699698\/\" target=\"_self\"  class=\"button primary \">\n\n\t\t\t\t<span>\n\t\t\t\t\tREAD HERE\n\t\t\t\t<\/span>\n\n\t\t\t<\/a>\n\n        \n    \n","protected":false},"excerpt":{"rendered":"<p>Journal paper in: Springer Nature &#8211; BioMed Central, 2025<\/p>\n","protected":false},"featured_media":0,"template":"","research_type":[8],"research_team":[113],"class_list":["post-20713","research","type-research","status-publish","hentry","research_type-publications","research_team-structural-and-functional-genomics"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.3 (Yoast SEO v27.3) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Effects of data transformation and model selection on feature importance in microbiome classification data - Centre for Computational Personalized Medicine<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Effects of data transformation and model selection on feature importance in microbiome classification data\" \/>\n<meta property=\"og:description\" content=\"Journal paper in: Springer Nature - BioMed Central, 2025\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/\" \/>\n<meta property=\"og:site_name\" content=\"Centre for Computational Personalized Medicine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/sano.science\/\" \/>\n<meta property=\"article:modified_time\" content=\"2025-02-21T11:40:24+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@sanoscience\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/sano.science\\\/research\\\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\\\/\",\"url\":\"https:\\\/\\\/sano.science\\\/research\\\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\\\/\",\"name\":\"Effects of data transformation and model selection on feature importance in microbiome classification data - Centre for Computational Personalized Medicine\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/sano.science\\\/#website\"},\"datePublished\":\"2025-01-10T12:29:47+00:00\",\"dateModified\":\"2025-02-21T11:40:24+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/sano.science\\\/research\\\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/sano.science\\\/research\\\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/sano.science\\\/research\\\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/sano.science\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Research\",\"item\":\"https:\\\/\\\/sano.science\\\/research\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Publications\",\"item\":\"https:\\\/\\\/sano.science\\\/research-type\\\/publications\\\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Effects of data transformation and model selection on feature importance in microbiome classification data\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/sano.science\\\/#website\",\"url\":\"https:\\\/\\\/sano.science\\\/\",\"name\":\"Centre for Computational Personalized Medicine\",\"description\":\"Sano \u2013 Centre for Computational Medicine\",\"publisher\":{\"@id\":\"https:\\\/\\\/sano.science\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/sano.science\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/sano.science\\\/#organization\",\"name\":\"Sano \u2013 Centre for Computational Medicine\",\"alternateName\":\"Sano\",\"url\":\"https:\\\/\\\/sano.science\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/sano.science\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/sano.science\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/logo_sano_podstawowe.png\",\"contentUrl\":\"https:\\\/\\\/sano.science\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/logo_sano_podstawowe.png\",\"width\":700,\"height\":265,\"caption\":\"Sano \u2013 Centre for Computational Medicine\"},\"image\":{\"@id\":\"https:\\\/\\\/sano.science\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/sano.science\\\/\",\"https:\\\/\\\/x.com\\\/sanoscience\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/sanoscience\\\/\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UCDZ_8TcjMWUG2ZcgKKgfpwQ\",\"https:\\\/\\\/bsky.app\\\/profile\\\/sanoscience.bsky.social\"]}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Effects of data transformation and model selection on feature importance in microbiome classification data - Centre for Computational Personalized Medicine","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/","og_locale":"en_US","og_type":"article","og_title":"Effects of data transformation and model selection on feature importance in microbiome classification data","og_description":"Journal paper in: Springer Nature - BioMed Central, 2025","og_url":"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/","og_site_name":"Centre for Computational Personalized Medicine","article_publisher":"https:\/\/www.facebook.com\/sano.science\/","article_modified_time":"2025-02-21T11:40:24+00:00","twitter_card":"summary_large_image","twitter_site":"@sanoscience","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/","url":"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/","name":"Effects of data transformation and model selection on feature importance in microbiome classification data - Centre for Computational Personalized Medicine","isPartOf":{"@id":"https:\/\/sano.science\/#website"},"datePublished":"2025-01-10T12:29:47+00:00","dateModified":"2025-02-21T11:40:24+00:00","breadcrumb":{"@id":"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/sano.science\/research\/effects-of-data-transformation-and-model-selection-on-feature-importance-in-microbiome-classification-data\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/sano.science\/"},{"@type":"ListItem","position":2,"name":"Research","item":"https:\/\/sano.science\/research\/"},{"@type":"ListItem","position":3,"name":"Publications","item":"https:\/\/sano.science\/research-type\/publications\/"},{"@type":"ListItem","position":4,"name":"Effects of data transformation and model selection on feature importance in microbiome classification data"}]},{"@type":"WebSite","@id":"https:\/\/sano.science\/#website","url":"https:\/\/sano.science\/","name":"Centre for Computational Personalized Medicine","description":"Sano \u2013 Centre for Computational Medicine","publisher":{"@id":"https:\/\/sano.science\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sano.science\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/sano.science\/#organization","name":"Sano \u2013 Centre for Computational Medicine","alternateName":"Sano","url":"https:\/\/sano.science\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/sano.science\/#\/schema\/logo\/image\/","url":"https:\/\/sano.science\/wp-content\/uploads\/2024\/05\/logo_sano_podstawowe.png","contentUrl":"https:\/\/sano.science\/wp-content\/uploads\/2024\/05\/logo_sano_podstawowe.png","width":700,"height":265,"caption":"Sano \u2013 Centre for Computational Medicine"},"image":{"@id":"https:\/\/sano.science\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/sano.science\/","https:\/\/x.com\/sanoscience","https:\/\/www.linkedin.com\/company\/sanoscience\/","https:\/\/www.youtube.com\/channel\/UCDZ_8TcjMWUG2ZcgKKgfpwQ","https:\/\/bsky.app\/profile\/sanoscience.bsky.social"]}]}},"acf":[],"gutenberg_blocks":[{"blockName":"custom-styles","attrs":{"styles":""}},{"blockName":"core\/heading","attrs":{"epAnimationGeneratedClass":"edplus_anim-8PQu2K","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<h2 class=\"wp-block-heading eplus-wrapper\" id=\"h-zuzanna-karwowska-oliver-aasmets-estonian-biobank-research-team-tomasz-kosciolek-elin-org\">Zuzanna Karwowska, Oliver Aasmets, Estonian Biobank research team, Tomasz Kosciolek, Elin Org<\/h2>\n","innerContent":["\n<h2 class=\"wp-block-heading eplus-wrapper\" id=\"h-zuzanna-karwowska-oliver-aasmets-estonian-biobank-research-team-tomasz-kosciolek-elin-org\">Zuzanna Karwowska, Oliver Aasmets, Estonian Biobank research team, Tomasz Kosciolek, Elin Org<\/h2>\n"]},{"blockName":"core\/spacer","attrs":{"height":"50px","epAnimationGeneratedClass":"edplus_anim-UZJmFC","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n","innerContent":["\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n"]},{"blockName":"core\/paragraph","attrs":{"epAnimationGeneratedClass":"edplus_anim-kOAyOm","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<p class=\" eplus-wrapper\">The effective classification of host phenotypes through microbiome data is essential for the progression of microbiome-centered treatments, where machine learning serves as a pivotal tool. The inherent complexity of the gut microbiome, coupled with issues like data sparsity, compositionality, and variability across populations, poses substantial challenges. Although transforming microbiome data can mitigate some of these difficulties, its application in machine learning endeavors remains largely underinvestigated.<br>In our study, we examined more than 8500 samples across 24 shotgun metagenomic datasets, discovering that it is feasible to differentiate between healthy and diseased states using microbiome data, with minimal reliance on specific algorithms or data transformations. We found that presence-absence data transformations were as effective as those based on abundance, and that accurate classification could be achieved using only a limited set of predictive features. Despite similar levels of classification accuracy across different transformations, the key features identified varied significantly, underscoring the importance of reevaluating the detection of biomarkers through machine learning.<br>Our results demonstrate that while microbiome data transformations have a substantial impact on feature selection, they do not significantly alter classification accuracy. This indicates that although the classification process is stable across various transformations, careful consideration is necessary in the selection of features for biomarker discovery using machine learning. This study not only contributes valuable insights into the application of machine learning to microbiome data but also points to crucial areas for future research.<\/p>\n","innerContent":["\n<p class=\" eplus-wrapper\">The effective classification of host phenotypes through microbiome data is essential for the progression of microbiome-centered treatments, where machine learning serves as a pivotal tool. The inherent complexity of the gut microbiome, coupled with issues like data sparsity, compositionality, and variability across populations, poses substantial challenges. Although transforming microbiome data can mitigate some of these difficulties, its application in machine learning endeavors remains largely underinvestigated.<br>In our study, we examined more than 8500 samples across 24 shotgun metagenomic datasets, discovering that it is feasible to differentiate between healthy and diseased states using microbiome data, with minimal reliance on specific algorithms or data transformations. We found that presence-absence data transformations were as effective as those based on abundance, and that accurate classification could be achieved using only a limited set of predictive features. Despite similar levels of classification accuracy across different transformations, the key features identified varied significantly, underscoring the importance of reevaluating the detection of biomarkers through machine learning.<br>Our results demonstrate that while microbiome data transformations have a substantial impact on feature selection, they do not significantly alter classification accuracy. This indicates that although the classification process is stable across various transformations, careful consideration is necessary in the selection of features for biomarker discovery using machine learning. This study not only contributes valuable insights into the application of machine learning to microbiome data but also points to crucial areas for future research.<\/p>\n"]},{"blockName":"core\/spacer","attrs":{"height":"40px","epAnimationGeneratedClass":"edplus_anim-HPXIta","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n","innerContent":["\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n"]},{"blockName":"core\/paragraph","attrs":{"epAnimationGeneratedClass":"edplus_anim-H79dq1","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<p class=\" eplus-wrapper\"><strong>DOI<\/strong>: 10.1186\/s40168-024-01996-6<\/p>\n","innerContent":["\n<p class=\" eplus-wrapper\"><strong>DOI<\/strong>: 10.1186\/s40168-024-01996-6<\/p>\n"]},{"blockName":"core\/spacer","attrs":{"height":"40px","epAnimationGeneratedClass":"edplus_anim-HPXIta","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n","innerContent":["\n<div style=\"height:40px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n"]},{"blockName":"acf\/button","attrs":{"title":"READ HERE","button_type":"link","url":"https:\/\/pmc.ncbi.nlm.nih.gov\/articles\/PMC11699698\/","button_style":"primary","target":"_self","button_extra_classes":""},"innerBlocks":[],"innerHTML":"","innerContent":[]}],"meta_data":{"is_automatically_other_posts":true,"number_of_posts":"3","is_automatically_check_also_posts":true},"_links":{"self":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research\/20713","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research"}],"about":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/types\/research"}],"version-history":[{"count":6,"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research\/20713\/revisions"}],"predecessor-version":[{"id":21581,"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research\/20713\/revisions\/21581"}],"wp:attachment":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/media?parent=20713"}],"wp:term":[{"taxonomy":"research_type","embeddable":true,"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research_type?post=20713"},{"taxonomy":"research_team","embeddable":true,"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research_team?post=20713"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}