{"id":12905,"date":"2023-07-25T22:09:27","date_gmt":"2023-07-25T20:09:27","guid":{"rendered":"https:\/\/sano.science\/?post_type=research&#038;p=12905"},"modified":"2024-01-05T13:39:25","modified_gmt":"2024-01-05T12:39:25","slug":"sanda-a-small-and-incomplete-dataset-analyser","status":"publish","type":"research","link":"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/","title":{"rendered":"SaNDA: A small and iNcomplete dataset analyser"},"content":{"rendered":"\n<h2 class=\"wp-block-heading eplus-wrapper\"><strong>Alfredo Ibias, Varun Ravi Varma, Karol Capa\u0142a, Luca Gherardini, Jose Sousa<\/strong> <\/h2>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n<p class=\" eplus-wrapper\">In personalized health, small datasets with missing data are quite common. Current&nbsp;Machine Learning&nbsp;methods are unable to process such datasets in a meaningful way due to the huge data volume requirement. To address this problem, we propose a new Small and iNcomplete Dataset Analyser (SaNDA) to process such datasets in a meaningful way. Due to the characteristics of these datasets and the&nbsp;criticality&nbsp;of the domain, an explainable method is mandatory to facilitate decision-making interpretation. Thus, SaNDA prioritises&nbsp;explainability&nbsp;over efficiency by design. We evaluated our proposal against&nbsp;Random Forest&nbsp;as a baseline for explainable methods, and against gcForest as state-of-the-art for small datasets. We observed that our proposal outperforms Random Forest when there is more missing data and\/or a lower number of entries in the dataset, obtaining less favourable results over larger, well-curated datasets. It is also preferable to gcForest due to its explainability and privacy protection capabilities. 
Given the difficulties in obtaining complete, reliable data in the healthcare field, we consider that our proposal could be useful for practitioners.<\/p>\n\n\n\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n\n\n\n\t\n    \n        \n\t\t\t<a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0020025523006631\" target=\"_blank\" rel= \"noopener noreferrer nofollow\" class=\"button primary \">\n\n\t\t\t\t<span>\n\t\t\t\t\tREAD HERE\n\t\t\t\t<\/span>\n\n\t\t\t<\/a>\n\n        \n    \n","protected":false},"excerpt":{"rendered":"<p>In: Information Sciences, 2023<\/p>\n","protected":false},"featured_media":0,"template":"","research_type":[8],"research_team":[14],"class_list":["post-12905","research","type-research","status-publish","hentry","research_type-publications","research_team-computational-intelligence"],"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v27.5 (Yoast SEO v27.5) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>SaNDA: A small and iNcomplete dataset analyser - Centre for Computational Personalized Medicine<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"SaNDA: A small and iNcomplete dataset analyser\" \/>\n<meta property=\"og:description\" content=\"In: Information Sciences, 2023\" \/>\n<meta property=\"og:url\" content=\"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/\" \/>\n<meta property=\"og:site_name\" content=\"Centre for Computational Personalized Medicine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/sano.science\/\" 
\/>\n<meta property=\"article:modified_time\" content=\"2024-01-05T12:39:25+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:site\" content=\"@sanoscience\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/sano.science\\\/research\\\/sanda-a-small-and-incomplete-dataset-analyser\\\/\",\"url\":\"https:\\\/\\\/sano.science\\\/research\\\/sanda-a-small-and-incomplete-dataset-analyser\\\/\",\"name\":\"SaNDA: A small and iNcomplete dataset analyser - Centre for Computational Personalized Medicine\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/sano.science\\\/#website\"},\"datePublished\":\"2023-07-25T20:09:27+00:00\",\"dateModified\":\"2024-01-05T12:39:25+00:00\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/sano.science\\\/research\\\/sanda-a-small-and-incomplete-dataset-analyser\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/sano.science\\\/research\\\/sanda-a-small-and-incomplete-dataset-analyser\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/sano.science\\\/research\\\/sanda-a-small-and-incomplete-dataset-analyser\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/sano.science\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Research\",\"item\":\"https:\\\/\\\/sano.science\\\/research\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Publications\",\"item\":\"https:\\\/\\\/sano.science\\\/research-type\\\/publications\\\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"SaNDA: A small and iNcomplete dataset 
analyser\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/sano.science\\\/#website\",\"url\":\"https:\\\/\\\/sano.science\\\/\",\"name\":\"Centre for Computational Personalized Medicine\",\"description\":\"Sano \u2013 Centre for Computational Medicine\",\"publisher\":{\"@id\":\"https:\\\/\\\/sano.science\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/sano.science\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/sano.science\\\/#organization\",\"name\":\"Sano \u2013 Centre for Computational Medicine\",\"alternateName\":\"Sano\",\"url\":\"https:\\\/\\\/sano.science\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/sano.science\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/sano.science\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/logo_sano_podstawowe.png\",\"contentUrl\":\"https:\\\/\\\/sano.science\\\/wp-content\\\/uploads\\\/2024\\\/05\\\/logo_sano_podstawowe.png\",\"width\":700,\"height\":265,\"caption\":\"Sano \u2013 Centre for Computational Medicine\"},\"image\":{\"@id\":\"https:\\\/\\\/sano.science\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/sano.science\\\/\",\"https:\\\/\\\/x.com\\\/sanoscience\",\"https:\\\/\\\/www.linkedin.com\\\/company\\\/sanoscience\\\/\",\"https:\\\/\\\/www.youtube.com\\\/channel\\\/UCDZ_8TcjMWUG2ZcgKKgfpwQ\",\"https:\\\/\\\/bsky.app\\\/profile\\\/sanoscience.bsky.social\"]}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"SaNDA: A small and iNcomplete dataset analyser - Centre for Computational Personalized Medicine","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/","og_locale":"en_US","og_type":"article","og_title":"SaNDA: A small and iNcomplete dataset analyser","og_description":"In: Information Sciences, 2023","og_url":"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/","og_site_name":"Centre for Computational Personalized Medicine","article_publisher":"https:\/\/www.facebook.com\/sano.science\/","article_modified_time":"2024-01-05T12:39:25+00:00","twitter_card":"summary_large_image","twitter_site":"@sanoscience","twitter_misc":{"Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/","url":"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/","name":"SaNDA: A small and iNcomplete dataset analyser - Centre for Computational Personalized 
Medicine","isPartOf":{"@id":"https:\/\/sano.science\/#website"},"datePublished":"2023-07-25T20:09:27+00:00","dateModified":"2024-01-05T12:39:25+00:00","breadcrumb":{"@id":"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/sano.science\/research\/sanda-a-small-and-incomplete-dataset-analyser\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/sano.science\/"},{"@type":"ListItem","position":2,"name":"Research","item":"https:\/\/sano.science\/research\/"},{"@type":"ListItem","position":3,"name":"Publications","item":"https:\/\/sano.science\/research-type\/publications\/"},{"@type":"ListItem","position":4,"name":"SaNDA: A small and iNcomplete dataset analyser"}]},{"@type":"WebSite","@id":"https:\/\/sano.science\/#website","url":"https:\/\/sano.science\/","name":"Centre for Computational Personalized Medicine","description":"Sano \u2013 Centre for Computational Medicine","publisher":{"@id":"https:\/\/sano.science\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/sano.science\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/sano.science\/#organization","name":"Sano \u2013 Centre for Computational 
Medicine","alternateName":"Sano","url":"https:\/\/sano.science\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/sano.science\/#\/schema\/logo\/image\/","url":"https:\/\/sano.science\/wp-content\/uploads\/2024\/05\/logo_sano_podstawowe.png","contentUrl":"https:\/\/sano.science\/wp-content\/uploads\/2024\/05\/logo_sano_podstawowe.png","width":700,"height":265,"caption":"Sano \u2013 Centre for Computational Medicine"},"image":{"@id":"https:\/\/sano.science\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/sano.science\/","https:\/\/x.com\/sanoscience","https:\/\/www.linkedin.com\/company\/sanoscience\/","https:\/\/www.youtube.com\/channel\/UCDZ_8TcjMWUG2ZcgKKgfpwQ","https:\/\/bsky.app\/profile\/sanoscience.bsky.social"]}]}},"acf":[],"gutenberg_blocks":[{"blockName":"custom-styles","attrs":{"styles":""}},{"blockName":"core\/heading","attrs":{"epAnimationGeneratedClass":"edplus_anim-IxLYNe","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<h2 class=\"wp-block-heading eplus-wrapper\"><strong>Alfredo Ibias, Varun Ravi Varma, Karol Capa\u0142a, Luca Gherardini, Jose Sousa<\/strong> <\/h2>\n","innerContent":["\n<h2 class=\"wp-block-heading eplus-wrapper\"><strong>Alfredo Ibias, Varun Ravi Varma, Karol Capa\u0142a, Luca Gherardini, Jose Sousa<\/strong> <\/h2>\n"]},{"blockName":"core\/spacer","attrs":{"height":"50px","epAnimationGeneratedClass":"edplus_anim-G8C2CJ","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n","innerContent":["\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n"]},{"blockName":"core\/paragraph","attrs":{"epAnimationGeneratedClass":"edplus_anim-D5g7jz","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<p class=\" eplus-wrapper\">In personalized health, small datasets with missing data are quite common. 
Current&nbsp;Machine Learning&nbsp;methods are unable to process such datasets in a meaningful way due to the huge data volume requirement. To address this problem, we propose a new Small and iNcomplete Dataset Analyser (SaNDA) to process such datasets in a meaningful way. Due to the characteristics of these datasets and the&nbsp;criticality&nbsp;of the domain, an explainable method is mandatory to facilitate decision-making interpretation. Thus, SaNDA prioritises&nbsp;explainability&nbsp;over efficiency by design. We evaluated our proposal against&nbsp;Random Forest&nbsp;as a baseline for explainable methods, and against gcForest as state-of-the-art for small datasets. We observed that our proposal outperforms Random Forest when there is more missing data and\/or a lower number of entries in the dataset, obtaining less favourable results over larger, well-curated datasets. It is also preferable to gcForest due to its explainability and privacy protection capabilities. Given the difficulties in obtaining complete, reliable data in the healthcare field, we consider that our proposal could be useful for practitioners.<\/p>\n","innerContent":["\n<p class=\" eplus-wrapper\">In personalized health, small datasets with missing data are quite common. Current&nbsp;Machine Learning&nbsp;methods are unable to process such datasets in a meaningful way due to the huge data volume requirement. To address this problem, we propose a new Small and iNcomplete Dataset Analyser (SaNDA) to process such datasets in a meaningful way. Due to the characteristics of these datasets and the&nbsp;criticality&nbsp;of the domain, an explainable method is mandatory to facilitate decision-making interpretation. Thus, SaNDA prioritises&nbsp;explainability&nbsp;over efficiency by design. We evaluated our proposal against&nbsp;Random Forest&nbsp;as a baseline for explainable methods, and against gcForest as state-of-the-art for small datasets. 
We observed that our proposal outperforms Random Forest when there is more missing data and\/or a lower number of entries in the dataset, obtaining less favourable results over larger, well-curated datasets. It is also preferable to gcForest due to its explainability and privacy protection capabilities. Given the difficulties in obtaining complete, reliable data in the healthcare field, we consider that our proposal could be useful for practitioners.<\/p>\n"]},{"blockName":"core\/spacer","attrs":{"height":"50px","epAnimationGeneratedClass":"edplus_anim-Zsu9XH","epGeneratedClass":"eplus-wrapper"},"innerBlocks":[],"innerHTML":"\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n","innerContent":["\n<div style=\"height:50px\" aria-hidden=\"true\" class=\"wp-block-spacer eplus-wrapper\"><\/div>\n"]},{"blockName":"acf\/button","attrs":{"title":"READ HERE","button_type":"link","url":"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S0020025523006631","button_style":"primary","target":"_blank","button_extra_classes":""},"innerBlocks":[],"innerHTML":"","innerContent":[]}],"meta_data":{"is_automatically_other_posts":true,"number_of_posts":"3","is_automatically_check_also_posts":true},"_links":{"self":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research\/12905","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research"}],"about":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/types\/research"}],"version-history":[{"count":7,"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research\/12905\/revisions"}],"predecessor-version":[{"id":14677,"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research\/12905\/revisions\/14677"}],"wp:attachment":[{"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/media?parent=12905"}],"wp:term":[{"taxonomy":"research_type","embeddable":true,"href":"https:\/\/sano.science\/ind
ex.php\/wp-json\/wp\/v2\/research_type?post=12905"},{"taxonomy":"research_team","embeddable":true,"href":"https:\/\/sano.science\/index.php\/wp-json\/wp\/v2\/research_team?post=12905"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}