{"id":51,"date":"2026-05-21T15:10:12","date_gmt":"2026-05-21T22:10:12","guid":{"rendered":"https:\/\/drstats.pipelinedatascience.org\/?p=51"},"modified":"2026-05-21T15:41:20","modified_gmt":"2026-05-21T22:41:20","slug":"intuition-is-a-magnificently-powerful-engine","status":"publish","type":"post","link":"https:\/\/drstats.pipelinedatascience.org\/?p=51","title":{"rendered":"A Data Science Lesson Inspired by UCSF&#8217;s Health Atlas &#8212; Part 1"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I am not going to sugarcoat this: I am a statistician, and I believe a good deal of what we call \u201cdata science\u201d has to do with statistics and statistical thinking!<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But then again, I have been around for some time, and have collaborated enough with computer scientists, mathematicians, biologists, neuroscientists, physicians, and education experts to know that while statistics is very, and I mean VERY, useful in tackling big-time data science problems, its applicability and robustness are greatly amplified when it draws strength from the approaches practiced in CS, the hard sciences, Math Ed, philosophy, and many other disciplines.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One thing I learned from many years of teaching data analysis in the classroom is that <em>intuition is a magnificently powerful engine<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The other day I was attending this cool workshop where the magnificent instructors introduced us to this super awesome website: <a href=\"https:\/\/healthatlas.ucsf.edu\/\">UCSF Health Atlas.<\/a><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">And let me tell you something: once you enter that site, the teacher in you immediately goes into self-drive mode.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"576\" height=\"327\" 
src=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/drstats-health-atlas-fig01-screenshot.png\" alt=\"Screenshot of the UCSF Health Atlas\" class=\"wp-image-54\" srcset=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/drstats-health-atlas-fig01-screenshot.png 576w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/drstats-health-atlas-fig01-screenshot-300x170.png 300w\" sizes=\"auto, (max-width: 576px) 100vw, 576px\" \/><figcaption class=\"wp-element-caption\">The UCSF Health Atlas Dashboard<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">You start zooming in and out of maps, changing variables, comparing counties, switching color palettes, checking rates, spotting patterns, and before you know it, you have exported half the database onto your laptop.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">And that\u2019s exactly what happened to me.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I downloaded every file I could get my hands on.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>And then the fun really started.<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I began poking around the files like a <em>raccoon<\/em> that had just discovered an unattended Costco dumpster.<br><br>Some variables immediately jumped out at me. Educational attainment. Preschool enrollment. Poverty indicators. 
Disability prevalence.<br><br>Now listen, before we go any further, let me clarify something important.<\/p>\n\n\n\n<p class=\"is-style-default has-medium-font-size wp-block-paragraph\"><strong>Data scientists are professionally trained overthinkers.<\/strong><\/p>\n\n\n\n<p class=\"is-style-default wp-block-paragraph\">You see one variable and your brain says, \u201cnice.\u201d<br>You see two variables and your brain says, \u201chmmmm.\u201d<br>You see six variables and a map, and suddenly you\u2019re Sherlock Holmes wearing cargo shorts and debugging R code at 2:17 in the morning.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">One thing led to another, and before I knew it, I had settled on a simple question:<\/p>\n\n\n\n<p class=\"is-style-default wp-block-paragraph\"><strong>Do counties with lower educational opportunity and higher economic vulnerability also tend to show higher disability prevalence?<\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Notice something very important here.<\/p>\n\n\n\n<p class=\"has-text-align-center is-style-default wp-block-paragraph\"><strong>I did NOT begin with a sophisticated statistical model.<br><\/strong><br>I began with curiosity.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">That, my friends, is a huge part of data science.<br><br>The statistics come later. The machine learning comes later. 
The fancy words come later.<br><br>But first comes the \u201cI wonder if\u2026\u201d<br><br>And that little sentence is incredibly powerful.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"668\" src=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image.png\" alt=\"\" class=\"wp-image-84\" srcset=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image.png 936w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-300x214.png 300w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-768x548.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/figure>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><em>Disability prevalence versus educational attainment and poverty indicators.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>The first thing I wanted to see <\/strong>was how these variables behaved individually across the country.<br><br>A couple of things immediately became apparent. Disability prevalence across U.S. counties is not uniformly distributed. Neither is educational attainment. Some counties exhibit dramatically higher rates of adults lacking a high-school diploma, while others show substantially higher bachelor\u2019s degree attainment.<br><br>Already, your intuition starts <em>whispering<\/em> to you.<br><br>The scatterplots and correlation maps reinforced the visual patterns. Counties with higher percentages of adults lacking a high-school diploma tended to exhibit higher disability prevalence. Counties with higher proportions of extremely low-income households also tended to show elevated disability prevalence.<br><br>But the strongest visual relationship appeared to involve bachelor\u2019s degree attainment. 
The relationship was strikingly negative: counties with more bachelor\u2019s degree holders tended to show lower disability prevalence.<br><br>At this point, however, the maps started telling an even more compelling story.<br><br><strong>And THIS is where data science becomes essential.<\/strong><br><br>Clusters of elevated disability prevalence appeared throughout portions of Appalachia and the rural South. Similar spatial clustering emerged for low educational attainment and economic vulnerability.<br><br><strong>Now, naturally, the statistician in me couldn\u2019t resist fitting a model.<br><\/strong><br><strong>Actually, several models.<br><\/strong><br>First, I constructed a simple socioeconomic vulnerability index combining poverty burden, lower educational attainment, and weaker preschool enrollment indicators.<br><br>The relationship was remarkably clear. Counties with greater socioeconomic vulnerability tended to exhibit substantially higher disability prevalence.<br><br>Next, I fitted a standard multiple regression model predicting disability prevalence from poverty and educational indicators. 
The model explained approximately one-third of the county-level variation in disability prevalence.<br><br>But then came my favorite part.<br><br><strong>The spatial model.<br><\/strong><br>The spatial generalized additive model substantially improved predictive performance, explaining over half of the observed variability in disability prevalence.<br><br>And perhaps even more interestingly, the residual maps and Moran\u2019s I statistics still showed spatial structure that the model had not captured.<br><br><strong>Translation?<br><br>Even after accounting for education and poverty, geography was STILL trying to tell us something.<br><\/strong><br>That is one of the deepest lessons in all of data science.<br><br>Good analyses rarely \u201cfinish\u201d a problem.<br><br><strong>Instead, good analyses reveal better questions.<br><\/strong><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">And that may be the coolest part of the whole thing!<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"791\" height=\"1024\" src=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-1-791x1024.png\" alt=\"\" class=\"wp-image-88\" srcset=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-1-791x1024.png 791w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-1-232x300.png 232w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-1-768x994.png 768w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-1.png 936w\" sizes=\"auto, (max-width: 791px) 100vw, 791px\" \/><\/figure>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><em>Geographic patterns in disability prevalence, education, and poverty.<\/em><\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"576\" 
src=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-2.png\" alt=\"\" class=\"wp-image-90\" srcset=\"https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-2.png 936w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-2-300x185.png 300w, https:\/\/drstats.pipelinedatascience.org\/wp-content\/uploads\/2026\/05\/image-2-768x473.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><\/figure>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\"><em>Residual geographic structure after spatial modeling.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I am not going to sugarcoat this: I am a statistician, and I believe a good deal of what we call \u201cdata science\u201d has to do with statistics and statistical thinking! But then again, I have been around for some time, and have collaborated enough with computer scientists, mathematicians, biologists, neuroscientists, physicians, and education experts 
[&hellip;]<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,5,1],"tags":[9,7],"class_list":["post-51","post","type-post","status-publish","format-standard","hentry","category-health-atlas-lesson","category-lesson-ideas","category-uncategorized","tag-data-science-lesson","tag-ucsf-health-atlas"],"_links":{"self":[{"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=\/wp\/v2\/posts\/51","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=51"}],"version-history":[{"count":16,"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=\/wp\/v2\/posts\/51\/revisions"}],"predecessor-version":[{"id":91,"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=\/wp\/v2\/posts\/51\/revisions\/91"}],"wp:attachment":[{"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=51"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=51"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/drstats.pipelinedatascience.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=51"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}