<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Eric Ekholm</title>
<link>https://www.ericekholm.com/blog.html</link>
<atom:link href="https://www.ericekholm.com/blog.xml" rel="self" type="application/rss+xml"/>
<description></description>
<generator>quarto-1.3.361</generator>
<lastBuildDate>Tue, 12 Nov 2024 05:00:00 GMT</lastBuildDate>
<item>
  <title>Deploying an API with R, Quarto, and Google Cloud</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/gcp-quarto-api/index.html</link>
  <description><![CDATA[ 




<p>I wrote a step-by-step tutorial on how to build an API with R and the plumber package that renders Quarto documents, and how to deploy it on Google Cloud. You can check out the README in <a href="https://github.com/ekholme/gcp_quarto_api">this repo</a> for a full walkthrough.</p>



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2024,
  author = {Ekholm, Eric},
  title = {Deploying an {API} with {R,} {Quarto,} and {Google} {Cloud}},
  date = {2024-11-12},
  url = {https://www.ericekholm.com/posts/gcp-quarto-api},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2024" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2024. <span>“Deploying an API with R, Quarto, and Google
Cloud.”</span> November 12, 2024. <a href="https://www.ericekholm.com/posts/gcp-quarto-api">https://www.ericekholm.com/posts/gcp-quarto-api</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>GCP</category>
  <category>Quarto</category>
  <category>plumber</category>
  <guid>https://www.ericekholm.com/posts/gcp-quarto-api/index.html</guid>
  <pubDate>Tue, 12 Nov 2024 05:00:00 GMT</pubDate>
</item>
<item>
  <title>Introducing Bluey Colors</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/bluey-colors/index.html</link>
  <description><![CDATA[ 




<p>I just released an R package that provides Bluey-inspired color palettes to use in ggplot2. Check it out <a href="https://ekholme.github.io/blueycolors/">here</a>.</p>



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2024,
  author = {Ekholm, Eric},
  title = {Introducing {Bluey} {Colors}},
  date = {2024-05-16},
  url = {https://www.ericekholm.com/posts/bluey-colors},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2024" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2024. <span>“Introducing Bluey Colors.”</span> May 16,
2024. <a href="https://www.ericekholm.com/posts/bluey-colors">https://www.ericekholm.com/posts/bluey-colors</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>Bluey</category>
  <category>Package Development</category>
  <guid>https://www.ericekholm.com/posts/bluey-colors/index.html</guid>
  <pubDate>Thu, 16 May 2024 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Probability of Drawing a Full House</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/full-house/index.html</link>
  <description><![CDATA[ 




<p>I recently saw someone mention that they received an interview question for a DS position in which they were asked to calculate the probability of drawing a full house when drawing 5 cards from a standard 52-card deck.</p>
<p>So let’s solve that in Julia.</p>
<section id="solving-analytically" class="level2">
<h2 class="anchored" data-anchor-id="solving-analytically">Solving analytically</h2>
<p>The function we want is <code>binomial(n::Integer, k::Integer)</code>, which returns the <a href="https://en.wikipedia.org/wiki/Binomial_coefficient">binomial coefficient</a> – the number of ways to choose <code>k</code> out of <code>n</code> items.</p>
<p>Let’s look at some examples. First, if we try 4C1 (4 choose 1), we expect to just get 4 – there are 4 different ways to choose 1 item from a group of 4 items.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<pre><code>4</code></pre>
</div>
</div>
<p>Now imagine we choose 2 different items from a group of 4. We expect to get 6 (assuming we don’t care about order, i.e.&nbsp;that 1,2 is the same as 2,1):</p>
<ol type="1">
<li>1, 2</li>
<li>1, 3</li>
<li>1, 4</li>
<li>2, 3</li>
<li>2, 4</li>
<li>3, 4</li>
</ol>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb3-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>6</code></pre>
</div>
</div>
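<p>As a sanity check on those two results, here is a minimal sketch in base Julia that enumerates the unordered pairs directly and compares the count against <code>binomial</code> and the factorial formula n! / (k! (n - k)!). The helper name <code>count_pairs</code> is made up for this illustration.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># count_pairs is a hypothetical helper, not part of Base
function count_pairs(n::Integer)
    total = 0
    # j always starts one above i, so (1, 2) and (2, 1) are only counted once
    for i in 1:n, j in (i + 1):n
        total += 1
    end
    return total
end

# all three ways of counting should agree for 4 choose 2
count_pairs(4) == binomial(4, 2) == div(factorial(4), factorial(2) * factorial(2))</code></pre></div>
</div>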
<p>So let’s solve the actual problem now. A full house is 5 cards comprising 3-of-a-kind and a pair. There are 52 cards in a deck – 4 suits comprising 13 unique values (2, 3, …, Ace) each.</p>
<p>The approach here is to calculate the number of ways to get a full house and divide that by the number of ways to draw 5 cards from a deck. We can start with the number of ways to draw 5 cards from a deck (the denominator) first, since it’s the most straightforward:</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb5-1">denom <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">52</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<pre><code>2598960</code></pre>
</div>
</div>
<p>Then let’s calculate the number of ways we can get three of a kind. There are 13 different card values and 4 different suits. We need to choose 1 value with 3 different suits:</p>
<ul>
<li><code>binomial(13, 1)</code> gives us the number of ways to choose 1 value from 13 options (which is just 13)</li>
<li><code>binomial(4, 3)</code> gives us the number of ways to choose 3 different suits from 4 possible options</li>
</ul>
<p>Each choice of value can be paired with each choice of suits, so we multiply the two counts together:</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb7-1">three_kind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<pre><code>52</code></pre>
</div>
</div>
<p>Then we do the same thing for drawing a pair. There are now 12 different card values (we can’t get a pair of the value that we already drew three-of-a-kind for), and we need to choose 1 value with 2 different suits:</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb9-1">two_kind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">binomial</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>72</code></pre>
</div>
</div>
<p>And from here, we can compute the probability of a full house by multiplying the two counts and dividing by the total number of hands:</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb11-1">(three_kind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> two_kind) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> denom</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>0.0014405762304921968</code></pre>
</div>
</div>
<p>So there’s a 0.144% chance of drawing a full house from a typical 52-card deck.</p>
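<p>The same steps can also be folded into one expression. This is just a sketch with variable names of my own choosing; it uses Julia's rational operator <code>//</code> so the answer stays exact rather than a floating-point approximation.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># number of distinct full-house hands: (value and suits for the triple) * (value and suits for the pair)
full_house_hands = binomial(13, 1) * binomial(4, 3) * binomial(12, 1) * binomial(4, 2)

# exact probability as a Rational; this reduces to 6//4165
p_exact = full_house_hands // binomial(52, 5)

# convert to a float to compare with the decimal result above (about 0.00144)
Float64(p_exact)</code></pre></div>
</div>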
</section>
<section id="solving-with-simulation" class="level2">
<h2 class="anchored" data-anchor-id="solving-with-simulation">Solving with simulation</h2>
<p>We could also take a simulation approach to solving this. First, let’s create a deck of cards. Suits don’t matter for a full house, so the deck is just the values 1 through 13 repeated 4 times.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb13-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span></span>
<span id="cb13-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">StatsBase</span></span>
<span id="cb13-3"></span>
<span id="cb13-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span>.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seed!</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span>
<span id="cb13-5"></span>
<span id="cb13-6">deck <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">repeat</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">13</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>52-element Vector{Int64}:
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
  ⋮
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13</code></pre>
</div>
</div>
<p>Then we’ll create a few functions to help us with the simulation:</p>
<ol type="1">
<li><code>make_hands()</code> will draw <code>n</code> 5-card hands from the deck;</li>
<li><code>is_full_house()</code> will check whether any given hand is a full house;</li>
<li><code>count_full_house()</code> takes a vector of hands and counts the number of them that are a full house.</li>
</ol>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb15-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_hands</span>(deck<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">AbstractVector{&lt;:Integer}</span>, n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">Int64</span>)</span>
<span id="cb15-2">    v <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Vector</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">{Vector{Int64}}</span>(<span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">undef</span>, n)</span>
<span id="cb15-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>n</span>
<span id="cb15-4">        v[i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(deck, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>; replace<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">false</span>)</span>
<span id="cb15-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb15-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> v</span>
<span id="cb15-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb15-8"></span>
<span id="cb15-9"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_full_house</span>(hand<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">AbstractVector{&lt;:Integer}</span>)</span>
<span id="cb15-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extrema</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">values</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">countmap</span>(hand))) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb15-11"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb15-12"></span>
<span id="cb15-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count_full_house</span>(hands<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">Vector{Vector{Int64}}</span>)</span>
<span id="cb15-14">    s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb15-15">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eachindex</span>(hands)</span>
<span id="cb15-16">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is_full_house</span>(hands[i])</span>
<span id="cb15-17">            s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb15-18">        <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb15-19">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb15-20">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> s</span>
<span id="cb15-21"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<pre><code>count_full_house (generic function with 1 method)</code></pre>
</div>
</div>
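<p>It may not be obvious why comparing against <code>(2, 3)</code> is enough. In a 5-card hand, the value counts have to sum to 5, so the smallest count being 2 and the largest being 3 can only happen with one pair plus one three-of-a-kind. Here is a small sketch with hand-picked example hands (the variable names are just for illustration):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using StatsBase   # provides countmap, as used in is_full_house()

full_house = [7, 7, 7, 12, 12]   # three 7s and two 12s
two_pair   = [7, 7, 12, 12, 3]   # value counts are 2, 2, 1
four_kind  = [9, 9, 9, 9, 2]     # value counts are 4, 1

extrema(values(countmap(full_house)))   # (2, 3), the only pattern we accept
extrema(values(countmap(two_pair)))     # (1, 2)
extrema(values(countmap(four_kind)))    # (1, 4)</code></pre></div>
</div>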
<p>Then from here we just run our simulation.</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb17-1">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1_000_000</span></span>
<span id="cb17-2"></span>
<span id="cb17-3">hands <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_hands</span>(deck, n);</span>
<span id="cb17-4"></span>
<span id="cb17-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count_full_house</span>(hands) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> n</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<pre><code>0.001446</code></pre>
</div>
</div>
<p>And we see that we get roughly the same answer as we did previously.</p>
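<p>How close should we expect the two answers to be? As a rough sketch, assume each simulated hand is an independent trial that succeeds with the analytic probability. Then the standard error of the simulated proportion is sqrt(p(1 - p) / n), and we can express the gap between the two answers in standard errors:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">p = 3744 / 2598960            # analytic probability from the first section
n = 1_000_000                 # number of simulated hands
simulated = 0.001446          # proportion reported by the simulation above

se = sqrt(p * (1 - p) / n)    # standard error of a sample proportion
abs(simulated - p) / se       # gap between the two answers, in standard errors</code></pre></div>
</div>
<p>With these numbers the standard error is on the order of 4e-5, so the simulated value sits well within one standard error of the analytic one.</p>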


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2024,
  author = {Ekholm, Eric},
  title = {Probability of {Drawing} a {Full} {House}},
  date = {2024-03-29},
  url = {https://www.ericekholm.com/posts/full-house},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2024" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2024. <span>“Probability of Drawing a Full House.”</span>
March 29, 2024. <a href="https://www.ericekholm.com/posts/full-house">https://www.ericekholm.com/posts/full-house</a>.
</div></div></section></div> ]]></description>
  <category>Julia</category>
  <category>Probability</category>
  <guid>https://www.ericekholm.com/posts/full-house/index.html</guid>
  <pubDate>Fri, 29 Mar 2024 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Crossfit Open ’24 Analysis</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/cf-open-24/index.html</link>
  <description><![CDATA[ 




<p>Now that the 2024 Open is over, I figured it might be fun to look at some of the data from this year’s top 100 finishers. There’s a lot we can look at here, and I’m sure I’m going to leave a few things out – feel free to drop me a line if there’s something, like, urgent that I missed.</p>
<p>If you’re into <a href="https://www.r-project.org/">R</a> and want to do your own analysis, you can check out my work-in-progress <a href="https://github.com/ekholme/crossfitgames">crossfitgames package</a> that has some tools for fetching and processing data from the CrossFit API. It’s in kind of a janky state right now, but it works fine for what I want to do today.</p>
<p>I’m including the code used to pull/clean data and create graphs here. It’ll be folded up by default, and I won’t really explain what it’s doing step-by-step, but if you’re into that sort of thing, you can take a look.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(crossfitgames)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gt)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(hrbrthemes)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lubridate)</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#placeholder for now</span></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ipsum_rc</span>())</span>
<span id="cb1-9"></span>
<span id="cb1-10">women_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">open_leaderboard</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"women"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb1-11">men_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">open_leaderboard</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2024</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"men"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">top_n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb1-12"></span>
<span id="cb1-13">women_lb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_final_leaderboard</span>(women_raw) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">div =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"women"</span>)</span>
<span id="cb1-15"></span>
<span id="cb1-16">men_lb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_final_leaderboard</span>(men_raw) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">div =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"men"</span>)</span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#combine the two dataframes</span></span>
<span id="cb1-20">all_lb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(women_lb, men_lb)</span>
<span id="cb1-21"></span>
<span id="cb1-22">women_workout_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workout_results</span>(women_raw) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">div =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"women"</span>)</span>
<span id="cb1-24"></span>
<span id="cb1-25">men_workout_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workout_results</span>(men_raw) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">div =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"men"</span>)</span>
<span id="cb1-27"></span>
<span id="cb1-28">all_workouts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rbind</span>(women_workout_res, men_workout_res)</span></code></pre></div>
</details>
</div>
<section id="top-finishers" class="level2">
<h2 class="anchored" data-anchor-id="top-finishers">Top Finishers</h2>
<p>First, let’s look at the top finishers for the men and women. We’ll start with the women:</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">women_lb <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_min</span>(score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>div) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb2-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb2-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb2-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 10 Women"</span>,</span>
<span id="cb2-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2024 CF Open"</span></span>
<span id="cb2-8">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="mrtwwheggs" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>html {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
}

:where(#mrtwwheggs) .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

:where(#mrtwwheggs) .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

:where(#mrtwwheggs) .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

:where(#mrtwwheggs) .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

:where(#mrtwwheggs) .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

:where(#mrtwwheggs) .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

:where(#mrtwwheggs) .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

:where(#mrtwwheggs) .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

:where(#mrtwwheggs) .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

:where(#mrtwwheggs) .gt_from_md > :first-child {
  margin-top: 0;
}

:where(#mrtwwheggs) .gt_from_md > :last-child {
  margin-bottom: 0;
}

:where(#mrtwwheggs) .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

:where(#mrtwwheggs) .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#mrtwwheggs) .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

:where(#mrtwwheggs) .gt_row_group_first td {
  border-top-width: 2px;
}

:where(#mrtwwheggs) .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#mrtwwheggs) .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_first_summary_row.thick {
  border-top-width: 2px;
}

:where(#mrtwwheggs) .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#mrtwwheggs) .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

:where(#mrtwwheggs) .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-left: 4px;
  padding-right: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#mrtwwheggs) .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#mrtwwheggs) .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#mrtwwheggs) .gt_left {
  text-align: left;
}

:where(#mrtwwheggs) .gt_center {
  text-align: center;
}

:where(#mrtwwheggs) .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

:where(#mrtwwheggs) .gt_font_normal {
  font-weight: normal;
}

:where(#mrtwwheggs) .gt_font_bold {
  font-weight: bold;
}

:where(#mrtwwheggs) .gt_font_italic {
  font-style: italic;
}

:where(#mrtwwheggs) .gt_super {
  font-size: 65%;
}

:where(#mrtwwheggs) .gt_two_val_uncert {
  display: inline-block;
  line-height: 1em;
  text-align: right;
  font-size: 60%;
  vertical-align: -0.25em;
  margin-left: 0.1em;
}

:where(#mrtwwheggs) .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 75%;
  vertical-align: 0.4em;
}

:where(#mrtwwheggs) .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

:where(#mrtwwheggs) .gt_slash_mark {
  font-size: 0.7em;
  line-height: 0.7em;
  vertical-align: 0.15em;
}

:where(#mrtwwheggs) .gt_fraction_numerator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: 0.45em;
}

:where(#mrtwwheggs) .gt_fraction_denominator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: -0.05em;
}
</style>
<table class="gt_table">
  <thead class="gt_header">
    <tr>
      <th colspan="3" class="gt_heading gt_title gt_font_normal" style="">Top 10 Women</th>
    </tr>
    <tr>
      <th colspan="3" class="gt_heading gt_subtitle gt_font_normal gt_bottom_border" style="">2024 CF Open</th>
    </tr>
  </thead>
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">rank</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1">athlete</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">score</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td class="gt_row gt_right">1</td>
<td class="gt_row gt_left">Grace Walton</td>
<td class="gt_row gt_right">35</td></tr>
    <tr><td class="gt_row gt_right">2</td>
<td class="gt_row gt_left">Mirjam von Rohr</td>
<td class="gt_row gt_right">40</td></tr>
    <tr><td class="gt_row gt_right">3</td>
<td class="gt_row gt_left">Anikha Greer</td>
<td class="gt_row gt_right">72</td></tr>
    <tr><td class="gt_row gt_right">4</td>
<td class="gt_row gt_left">Arielle Loewen</td>
<td class="gt_row gt_right">77</td></tr>
    <tr><td class="gt_row gt_right">5</td>
<td class="gt_row gt_left">Carolyne Prevost</td>
<td class="gt_row gt_right">91</td></tr>
    <tr><td class="gt_row gt_right">6</td>
<td class="gt_row gt_left">Christina Agerbeck</td>
<td class="gt_row gt_right">97</td></tr>
    <tr><td class="gt_row gt_right">7</td>
<td class="gt_row gt_left">Julia Blazejowska</td>
<td class="gt_row gt_right">113</td></tr>
    <tr><td class="gt_row gt_right">8</td>
<td class="gt_row gt_left">Seher Kaya</td>
<td class="gt_row gt_right">114</td></tr>
    <tr><td class="gt_row gt_right">8</td>
<td class="gt_row gt_left">Katrina DiGiacomo</td>
<td class="gt_row gt_right">114</td></tr>
    <tr><td class="gt_row gt_right">10</td>
<td class="gt_row gt_left">Kara Saunders</td>
<td class="gt_row gt_right">115</td></tr>
    <tr><td class="gt_row gt_right">10</td>
<td class="gt_row gt_left">Aimee Cringle</td>
<td class="gt_row gt_right">115</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
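<p>One thing to keep in mind here is that lower scores are better in the Open, since your score is the sum of your placements across the workouts. With 3 workouts, the best possible score is 3 (first place in every workout). Also note that the table has 11 rows rather than 10: two athletes tied for 10th, and <code>slice_min()</code> keeps ties by default.</p>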
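<p>To make the scoring rule concrete, here is a minimal sketch using made-up placements. The <code>toy_results</code> data frame and its two athletes are hypothetical, and the column names are only assumed to mirror the ones used elsewhere in this post:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Hypothetical placements for two made-up athletes (not real Open results)
toy_results &lt;- data.frame(
  athlete = rep(c("Athlete A", "Athlete B"), each = 3),
  workout = rep(c("w1", "w2", "w3"), times = 2),
  workout_place = c(1, 1, 1, 10, 2, 30)
)

# The overall score is the sum of the per-workout placements; lower is better
toy_scores &lt;- toy_results |&gt;
  group_by(athlete) |&gt;
  summarize(score = sum(workout_place)) |&gt;
  arrange(score)

toy_scores</code></pre></div>
<p>Here Athlete A gets the best possible score of 3, while Athlete B’s single 30th-place finish pushes the total to 42 despite a 2nd-place workout.</p>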
<p>On the women’s side, the most recognizable names are Loewen and Saunders, and a lot of the bigger names (Tia, Laura, Emma Lawson) are absent. This shouldn’t be concerning, but it’s potentially interesting.</p>
<p>And moving to the men’s side:</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">men_lb <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_min</span>(score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>div) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb3-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb3-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 10 Men"</span>,</span>
<span id="cb3-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2024 CF Open"</span></span>
<span id="cb3-8">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="okiistpnxi" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>html {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
}

:where(#okiistpnxi) .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

:where(#okiistpnxi) .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

:where(#okiistpnxi) .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

:where(#okiistpnxi) .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

:where(#okiistpnxi) .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

:where(#okiistpnxi) .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

:where(#okiistpnxi) .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

:where(#okiistpnxi) .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

:where(#okiistpnxi) .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

:where(#okiistpnxi) .gt_from_md > :first-child {
  margin-top: 0;
}

:where(#okiistpnxi) .gt_from_md > :last-child {
  margin-bottom: 0;
}

:where(#okiistpnxi) .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

:where(#okiistpnxi) .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#okiistpnxi) .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

:where(#okiistpnxi) .gt_row_group_first td {
  border-top-width: 2px;
}

:where(#okiistpnxi) .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#okiistpnxi) .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_first_summary_row.thick {
  border-top-width: 2px;
}

:where(#okiistpnxi) .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#okiistpnxi) .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

:where(#okiistpnxi) .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-left: 4px;
  padding-right: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#okiistpnxi) .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#okiistpnxi) .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#okiistpnxi) .gt_left {
  text-align: left;
}

:where(#okiistpnxi) .gt_center {
  text-align: center;
}

:where(#okiistpnxi) .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

:where(#okiistpnxi) .gt_font_normal {
  font-weight: normal;
}

:where(#okiistpnxi) .gt_font_bold {
  font-weight: bold;
}

:where(#okiistpnxi) .gt_font_italic {
  font-style: italic;
}

:where(#okiistpnxi) .gt_super {
  font-size: 65%;
}

:where(#okiistpnxi) .gt_two_val_uncert {
  display: inline-block;
  line-height: 1em;
  text-align: right;
  font-size: 60%;
  vertical-align: -0.25em;
  margin-left: 0.1em;
}

:where(#okiistpnxi) .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 75%;
  vertical-align: 0.4em;
}

:where(#okiistpnxi) .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

:where(#okiistpnxi) .gt_slash_mark {
  font-size: 0.7em;
  line-height: 0.7em;
  vertical-align: 0.15em;
}

:where(#okiistpnxi) .gt_fraction_numerator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: 0.45em;
}

:where(#okiistpnxi) .gt_fraction_denominator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: -0.05em;
}
</style>
<table class="gt_table">
  <thead class="gt_header">
    <tr>
      <th colspan="3" class="gt_heading gt_title gt_font_normal" style="">Top 10 Men</th>
    </tr>
    <tr>
      <th colspan="3" class="gt_heading gt_subtitle gt_font_normal gt_bottom_border" style="">2024 CF Open</th>
    </tr>
  </thead>
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">rank</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1">athlete</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">score</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td class="gt_row gt_right">1</td>
<td class="gt_row gt_left">Jonne Koski</td>
<td class="gt_row gt_right">30</td></tr>
    <tr><td class="gt_row gt_right">2</td>
<td class="gt_row gt_left">Saxon Panchik</td>
<td class="gt_row gt_right">39</td></tr>
    <tr><td class="gt_row gt_right">3</td>
<td class="gt_row gt_left">Jay Crouch</td>
<td class="gt_row gt_right">64</td></tr>
    <tr><td class="gt_row gt_right">4</td>
<td class="gt_row gt_left">Luka Vunjak</td>
<td class="gt_row gt_right">81</td></tr>
    <tr><td class="gt_row gt_right">5</td>
<td class="gt_row gt_left">Noah Ohlsen</td>
<td class="gt_row gt_right">82</td></tr>
    <tr><td class="gt_row gt_right">6</td>
<td class="gt_row gt_left">Fabian Beneito</td>
<td class="gt_row gt_right">93</td></tr>
    <tr><td class="gt_row gt_right">7</td>
<td class="gt_row gt_left">Cale Layman</td>
<td class="gt_row gt_right">95</td></tr>
    <tr><td class="gt_row gt_right">8</td>
<td class="gt_row gt_left">Jeffrey Adler</td>
<td class="gt_row gt_right">103</td></tr>
    <tr><td class="gt_row gt_right">9</td>
<td class="gt_row gt_left">Brandon Luckett</td>
<td class="gt_row gt_right">124</td></tr>
    <tr><td class="gt_row gt_right">10</td>
<td class="gt_row gt_left">Patrick Vellner</td>
<td class="gt_row gt_right">139</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<p>The story is a little different here: there are lots of recognizable names that have historically done well at the Games, including Vellner, Adler, Ohlsen, Koski, Panchik, and Crouch.</p>
</section>
<section id="men-and-women-scores-by-rank" class="level2">
<h2 class="anchored" data-anchor-id="men-and-women-scores-by-rank">Men and Women Scores by Rank</h2>
<p>Another way we might look at the data is to compare the scores of the men to those of the women. This can give us a sense of the “depth” or level of competition across the field.</p>
<p>For instance, we can plot each athlete’s rank (overall placement) against the total score for men and women:</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(all_lb, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> rank, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> div)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_reverse</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb4-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Score"</span>,</span>
<span id="cb4-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rank (Overall Place)"</span>,</span>
<span id="cb4-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Score by Overall Rank"</span>,</span>
<span id="cb4-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 100 Men and Women"</span></span>
<span id="cb4-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Division"</span>)</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/cf-open-24/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>In cases where, for a given rank, the women’s score is lower than the men’s score, we can (sort of) say the women outperformed the men. For example, the 100th-place woman had considerably fewer points than the 100th-place man. This might tell us that, relative to the rest of her field, the 100th-place woman is “better” than the 100th-place man. Of course it’s not that straightforward – we could just as reasonably conclude that the men’s field is deeper than the women’s field – but it’s a fun thought exercise.</p>
<p>What is interesting, though, is that the score gap between the 1st- and 100th-place women is much smaller than the gap between the 1st- and 100th-place men, and that we see separation between the men and women around 50th place.</p>
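<p>If you’d rather put numbers on those gaps than eyeball the plot, something like the following sketch could work. It uses only the top-3 scores from each of the two tables above, stored in a hypothetical <code>toy_lb</code> data frame; the <code>"women"</code> label for <code>div</code> is an assumption. With the full leaderboard, tied ranks would need handling before joining, since a join on <code>rank</code> would match tied rows more than once:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(dplyr)

# Top-3 scores from each table above; toy_lb is a hypothetical stand-in for all_lb
toy_lb &lt;- data.frame(
  rank  = rep(1:3, times = 2),
  score = c(30, 39, 64, 35, 40, 72),
  div   = rep(c("men", "women"), each = 3)
)

men &lt;- toy_lb |&gt;
  filter(div == "men") |&gt;
  select(rank, men_score = score)

women &lt;- toy_lb |&gt;
  filter(div == "women") |&gt;
  select(rank, women_score = score)

# Gap at each rank: negative means the man at that rank has the lower (better) score
gap_by_rank &lt;- inner_join(men, women, by = "rank") |&gt;
  mutate(gap = men_score - women_score)

# Within-division gap between the best and worst score in the data
score_range &lt;- toy_lb |&gt;
  group_by(div) |&gt;
  summarize(score_range = max(score) - min(score))

gap_by_rank
score_range</code></pre></div>
<p>On these three ranks the men’s scores are slightly lower at every rank, which is the opposite of what the plot shows around 100th place, so the sign of the gap clearly depends on where in the field you look.</p>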
</section>
<section id="variance-in-workouts-women" class="level2">
<h2 class="anchored" data-anchor-id="variance-in-workouts-women">Variance in Workouts – Women</h2>
<p>Another interesting data point we can look at is the spread between an athlete’s best finish and their worst finish. Since the Open is only 3 events, this spread will explain a lot of their overall variance (at the Games, where there are more events, the best-to-worst spread is less important). Obviously, all of these athletes finished top 100 overall, so nobody completely bombed anything.</p>
<p>Let’s start with the women:</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">women_best_worst <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> women_workout_res <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb5-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">worst =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(workout_place),</span>
<span id="cb5-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">best =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(workout_place),</span>
<span id="cb5-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">spread =</span> worst <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> best</span>
<span id="cb5-7">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(women_lb, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"athlete"</span>)</span>
<span id="cb5-9"></span>
<span id="cb5-10">women_best_worst <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> rank, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> best)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_segment</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xend =</span> worst, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yend =</span> rank), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"steelblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_reverse</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb5-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Finish Position (Spread)"</span>,</span>
<span id="cb5-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rank (Overall Place)"</span>,</span>
<span id="cb5-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Best and Worst Finishes by Overall Rank"</span>,</span>
<span id="cb5-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Women"</span></span>
<span id="cb5-19">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/cf-open-24/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>Ok, so the way you interpret this plot:</p>
<ul>
<li>The athlete’s overall rank is on the y-axis (the vertical axis)</li>
<li>The blue bar represents the spread of their performance. The left end is their best performance, the right end is their worst performance, and the width is the gap between best and worst</li>
</ul>
<p>Keep in mind that having a small spread between your best and worst finish isn’t <em>necessarily</em> a good thing if your best finish is itself relatively poor (i.e., a high placement number).</p>
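<p>To make that caveat concrete, here is a small sketch with two made-up athletes (the names and placements are hypothetical, and <code>total</code> assumes the overall score is just the sum of the per-event placements). It reuses the same <code>group_by()</code> and <code>summarize()</code> pattern as the code above:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical placements for two athletes across three events
toy &lt;- tibble(
  athlete = rep(c("Athlete A", "Athlete B"), each = 3),
  workout_place = c(5, 300, 10, 150, 160, 170)
)

toy_summary &lt;- toy |&gt;
  group_by(athlete) |&gt;
  summarize(
    worst = max(workout_place),
    best = min(workout_place),
    spread = worst - best,
    total = sum(workout_place) # assumed scoring: sum of placements, lower is better
  )

toy_summary
# Athlete A: worst 300, best 5, spread 295, total 315
# Athlete B: worst 170, best 150, spread 20, total 480</code></pre></div>
</div>
<p>Athlete B has by far the smaller spread but the worse total, since lower is better here.</p>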
<p>All in all, though, we tend to see bigger spreads the further down we get in the rankings, which kinda makes sense because a single bad event can crush your overall score with just 3 total events.</p>
<p>From here, we might want to look at which women had the largest best-to-worst event spreads:</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">women_best_worst <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(spread, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>div) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb6-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Largest Differences between Best and Worst Event"</span>,</span>
<span id="cb6-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Women, CF Open 2024"</span></span>
<span id="cb6-8">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="lhkfpexygq" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<table class="gt_table">
  <thead class="gt_header">
    <tr>
      <th colspan="6" class="gt_heading gt_title gt_font_normal" style="">Largest Differences between Best and Worst Event</th>
    </tr>
    <tr>
      <th colspan="6" class="gt_heading gt_subtitle gt_font_normal gt_bottom_border" style="">Women, CF Open 2024</th>
    </tr>
  </thead>
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1">athlete</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">worst</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">best</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">spread</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">rank</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">score</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td class="gt_row gt_left">Emily Rolfe</td>
<td class="gt_row gt_right">504</td>
<td class="gt_row gt_right">22</td>
<td class="gt_row gt_right">482</td>
<td class="gt_row gt_right">99</td>
<td class="gt_row gt_right">596</td></tr>
    <tr><td class="gt_row gt_left">Caitlin Bernardin</td>
<td class="gt_row gt_right">475</td>
<td class="gt_row gt_right">18</td>
<td class="gt_row gt_right">457</td>
<td class="gt_row gt_right">89</td>
<td class="gt_row gt_right">543</td></tr>
    <tr><td class="gt_row gt_left">Tracy Johnson</td>
<td class="gt_row gt_right">450</td>
<td class="gt_row gt_right">39</td>
<td class="gt_row gt_right">411</td>
<td class="gt_row gt_right">100</td>
<td class="gt_row gt_right">605</td></tr>
    <tr><td class="gt_row gt_left">Laura Horvath</td>
<td class="gt_row gt_right">395</td>
<td class="gt_row gt_right">6</td>
<td class="gt_row gt_right">389</td>
<td class="gt_row gt_right">62</td>
<td class="gt_row gt_right">408</td></tr>
    <tr><td class="gt_row gt_left">Aizhan Zharasova</td>
<td class="gt_row gt_right">400</td>
<td class="gt_row gt_right">33</td>
<td class="gt_row gt_right">367</td>
<td class="gt_row gt_right">94</td>
<td class="gt_row gt_right">573</td></tr>
    <tr><td class="gt_row gt_left">Sara Alicia Fernandez Costas</td>
<td class="gt_row gt_right">354</td>
<td class="gt_row gt_right">6</td>
<td class="gt_row gt_right">348</td>
<td class="gt_row gt_right">77</td>
<td class="gt_row gt_right">477</td></tr>
    <tr><td class="gt_row gt_left">Baylee Rayl Christophel</td>
<td class="gt_row gt_right">380</td>
<td class="gt_row gt_right">39</td>
<td class="gt_row gt_right">341</td>
<td class="gt_row gt_right">80</td>
<td class="gt_row gt_right">499</td></tr>
    <tr><td class="gt_row gt_left">Makenna Enslin</td>
<td class="gt_row gt_right">382</td>
<td class="gt_row gt_right">48</td>
<td class="gt_row gt_right">334</td>
<td class="gt_row gt_right">85</td>
<td class="gt_row gt_right">514</td></tr>
    <tr><td class="gt_row gt_left">Addison DesRosiers</td>
<td class="gt_row gt_right">348</td>
<td class="gt_row gt_right">20</td>
<td class="gt_row gt_right">328</td>
<td class="gt_row gt_right">76</td>
<td class="gt_row gt_right">475</td></tr>
    <tr><td class="gt_row gt_left">Linda Keesman</td>
<td class="gt_row gt_right">363</td>
<td class="gt_row gt_right">41</td>
<td class="gt_row gt_right">322</td>
<td class="gt_row gt_right">98</td>
<td class="gt_row gt_right">584</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<p>We see Laura Horvath on here, and I think she did (relatively) poorly on the first event compared to her other 2 event finishes, hence the large spread:</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">women_workout_res <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Laura Horvath"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>div) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb7-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Laura Horvath Open Finishes"</span></span>
<span id="cb7-7">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="kwggryfbws" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<table class="gt_table">
  <thead class="gt_header">
    <tr>
      <th colspan="5" class="gt_heading gt_title gt_font_normal gt_bottom_border" style="">Laura Horvath Open Finishes</th>
    </tr>
    
  </thead>
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1">athlete</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">workout_num</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">workout_place</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">points</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1">score</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td class="gt_row gt_left">Laura Horvath</td>
<td class="gt_row gt_right">1</td>
<td class="gt_row gt_right">395</td>
<td class="gt_row gt_right">395</td>
<td class="gt_row gt_left">7:03</td></tr>
    <tr><td class="gt_row gt_left">Laura Horvath</td>
<td class="gt_row gt_right">2</td>
<td class="gt_row gt_right">6</td>
<td class="gt_row gt_right">6</td>
<td class="gt_row gt_left">911 reps</td></tr>
    <tr><td class="gt_row gt_left">Laura Horvath</td>
<td class="gt_row gt_right">3</td>
<td class="gt_row gt_right">7</td>
<td class="gt_row gt_right">7</td>
<td class="gt_row gt_left">8:59</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<p>Right, so she finished 395th in event 1, but then 6th and 7th.</p>
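<p>For a sense of how much that one event cost her, here is a quick bit of arithmetic using the points from the table above. This assumes the overall score is just the sum of the per-event points, which lines up with the 408 shown for her in the earlier table:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Laura Horvath's points for events 1-3, copied from the table above
horvath_points &lt;- c(395, 6, 7)

# Assumed scoring: overall score is the sum of per-event points
total_score &lt;- sum(horvath_points)
total_score # 408

# Share of her total that came from event 1 alone
event_1_share &lt;- round(horvath_points[1] / total_score, 3)
event_1_share # 0.968</code></pre></div>
</div>
<p>So roughly 97% of her total came from a single workout.</p>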
</section>
<section id="variance-in-workouts-men" class="level2">
<h2 class="anchored" data-anchor-id="variance-in-workouts-men">Variance in Workouts – Men</h2>
<p>And we can do the same thing for the men:</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">men_best_worst <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> men_workout_res <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb8-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">worst =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(workout_place),</span>
<span id="cb8-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">best =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(workout_place),</span>
<span id="cb8-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">spread =</span> worst <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> best</span>
<span id="cb8-7">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(men_lb, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"athlete"</span>)</span>
<span id="cb8-9"></span>
<span id="cb8-10">men_best_worst <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb8-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> rank, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> best)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_segment</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xend =</span> worst, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yend =</span> rank), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"steelblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_reverse</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb8-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Finish Position (Spread)"</span>,</span>
<span id="cb8-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rank (Overall Place)"</span>,</span>
<span id="cb8-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Best and Worst Finishes by Overall Rank"</span>,</span>
<span id="cb8-18">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Men"</span></span>
<span id="cb8-19">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/cf-open-24/index_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>This illustrates the point that a small spread isn’t necessarily ideal: whoever finished 98th overall has a tiny spread, but all of his finishes were around 250th.</p>
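<p>To make the spread-versus-placement point concrete, here’s a minimal sketch on made-up data. The <code>toy_res</code> tibble, the two athletes, and the <code>avg_place</code> column are all hypothetical and aren’t part of the analysis above; the idea is just that putting a mean finish next to the spread separates “consistent and good” from “consistent and mediocre.”</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical toy data, not taken from the leaderboard
toy_res &lt;- tibble(
  athlete = rep(c("Athlete A", "Athlete B"), each = 3),
  workout_place = c(248, 252, 255, 3, 40, 300)
)

# Same summary as above, plus the mean finish for context
toy_summary &lt;- toy_res |&gt;
  group_by(athlete) |&gt;
  summarize(
    worst = max(workout_place),
    best = min(workout_place),
    spread = worst - best,
    avg_place = mean(workout_place)
  )

toy_summary</code></pre></div>
</div>
<p>In this toy example, Athlete A has a spread of 7 but averages roughly 252nd, while Athlete B has a spread of 297 and averages roughly 114th.</p>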
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">men_best_worst <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_max</span>(spread, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb9-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb9-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Largest Differences between Best and Worst Event"</span>,</span>
<span id="cb9-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Men, CF Open 2024"</span></span>
<span id="cb9-7">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="uhlnjkpcmn" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>html {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
}

:where(#uhlnjkpcmn) .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

:where(#uhlnjkpcmn) .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

:where(#uhlnjkpcmn) .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

:where(#uhlnjkpcmn) .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

:where(#uhlnjkpcmn) .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

:where(#uhlnjkpcmn) .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

:where(#uhlnjkpcmn) .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

:where(#uhlnjkpcmn) .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

:where(#uhlnjkpcmn) .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

:where(#uhlnjkpcmn) .gt_from_md > :first-child {
  margin-top: 0;
}

:where(#uhlnjkpcmn) .gt_from_md > :last-child {
  margin-bottom: 0;
}

:where(#uhlnjkpcmn) .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

:where(#uhlnjkpcmn) .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#uhlnjkpcmn) .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

:where(#uhlnjkpcmn) .gt_row_group_first td {
  border-top-width: 2px;
}

:where(#uhlnjkpcmn) .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#uhlnjkpcmn) .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_first_summary_row.thick {
  border-top-width: 2px;
}

:where(#uhlnjkpcmn) .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#uhlnjkpcmn) .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

:where(#uhlnjkpcmn) .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-left: 4px;
  padding-right: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#uhlnjkpcmn) .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#uhlnjkpcmn) .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#uhlnjkpcmn) .gt_left {
  text-align: left;
}

:where(#uhlnjkpcmn) .gt_center {
  text-align: center;
}

:where(#uhlnjkpcmn) .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

:where(#uhlnjkpcmn) .gt_font_normal {
  font-weight: normal;
}

:where(#uhlnjkpcmn) .gt_font_bold {
  font-weight: bold;
}

:where(#uhlnjkpcmn) .gt_font_italic {
  font-style: italic;
}

:where(#uhlnjkpcmn) .gt_super {
  font-size: 65%;
}

:where(#uhlnjkpcmn) .gt_two_val_uncert {
  display: inline-block;
  line-height: 1em;
  text-align: right;
  font-size: 60%;
  vertical-align: -0.25em;
  margin-left: 0.1em;
}

:where(#uhlnjkpcmn) .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 75%;
  vertical-align: 0.4em;
}

:where(#uhlnjkpcmn) .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

:where(#uhlnjkpcmn) .gt_slash_mark {
  font-size: 0.7em;
  line-height: 0.7em;
  vertical-align: 0.15em;
}

:where(#uhlnjkpcmn) .gt_fraction_numerator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: 0.45em;
}

:where(#uhlnjkpcmn) .gt_fraction_denominator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: -0.05em;
}
</style>
<table class="gt_table">
  <thead class="gt_header">
    <tr>
      <th colspan="7" class="gt_heading gt_title gt_font_normal" style="">Largest Differences between Best and Worst Event</th>
    </tr>
    <tr>
      <th colspan="7" class="gt_heading gt_subtitle gt_font_normal gt_bottom_border" style="">Men, CF Open 2024</th>
    </tr>
  </thead>
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1">athlete</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">worst</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">best</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">spread</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">rank</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1">score</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1">div</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td class="gt_row gt_left">Colin Bosshard</td>
<td class="gt_row gt_right">605</td>
<td class="gt_row gt_right">3</td>
<td class="gt_row gt_right">602</td>
<td class="gt_row gt_right">86</td>
<td class="gt_row gt_right">659</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Lazar Đukić</td>
<td class="gt_row gt_right">609</td>
<td class="gt_row gt_right">26</td>
<td class="gt_row gt_right">583</td>
<td class="gt_row gt_right">88</td>
<td class="gt_row gt_right">662</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Henry Matthews</td>
<td class="gt_row gt_right">609</td>
<td class="gt_row gt_right">35</td>
<td class="gt_row gt_right">574</td>
<td class="gt_row gt_right">90</td>
<td class="gt_row gt_right">689</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Taylor Self</td>
<td class="gt_row gt_right">565</td>
<td class="gt_row gt_right">2</td>
<td class="gt_row gt_right">563</td>
<td class="gt_row gt_right">84</td>
<td class="gt_row gt_right">650</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Fernando Llaneza Pardillos</td>
<td class="gt_row gt_right">567</td>
<td class="gt_row gt_right">13</td>
<td class="gt_row gt_right">554</td>
<td class="gt_row gt_right">82</td>
<td class="gt_row gt_right">606</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Bailey MacDonald</td>
<td class="gt_row gt_right">577</td>
<td class="gt_row gt_right">51</td>
<td class="gt_row gt_right">526</td>
<td class="gt_row gt_right">95</td>
<td class="gt_row gt_right">723</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Carlos Ferrara Coloma</td>
<td class="gt_row gt_right">466</td>
<td class="gt_row gt_right">5</td>
<td class="gt_row gt_right">461</td>
<td class="gt_row gt_right">70</td>
<td class="gt_row gt_right">509</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Tyler Christophel</td>
<td class="gt_row gt_right">522</td>
<td class="gt_row gt_right">82</td>
<td class="gt_row gt_right">440</td>
<td class="gt_row gt_right">96</td>
<td class="gt_row gt_right">725</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Javier Gonzalez Fernandez</td>
<td class="gt_row gt_right">497</td>
<td class="gt_row gt_right">64</td>
<td class="gt_row gt_right">433</td>
<td class="gt_row gt_right">100</td>
<td class="gt_row gt_right">754</td>
<td class="gt_row gt_left">men</td></tr>
    <tr><td class="gt_row gt_left">Chandler Smith</td>
<td class="gt_row gt_right">442</td>
<td class="gt_row gt_right">12</td>
<td class="gt_row gt_right">430</td>
<td class="gt_row gt_right">72</td>
<td class="gt_row gt_right">525</td>
<td class="gt_row gt_left">men</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<p>These spreads are wider than the ones we saw for the women, which again suggests that the men’s field may be more variable than the women’s.</p>
</section>
<section id="section" class="level2">
<h2 class="anchored" data-anchor-id="section">24.1</h2>
<p>Now let’s look briefly at individual workout results. Obviously, we’ll start with 24.1. You can see the workout <a href="https://games.crossfit.com/workouts/open/2024/1">here</a>.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">wk1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> all_workouts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(workout_num <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time_score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">period_to_seconds</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ms</span>(score)))</span>
<span id="cb10-4"></span>
<span id="cb10-5">wk1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> time_score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> workout_place, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> div)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_reverse</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Division"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb10-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Time (Seconds)"</span>,</span>
<span id="cb10-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Workout Place"</span>,</span>
<span id="cb10-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Time to Complete 24.1 by Workout Place"</span>,</span>
<span id="cb10-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 100 Overall Men and Women"</span></span>
<span id="cb10-15">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/cf-open-24/index_files/figure-html/unnamed-chunk-9-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>I think this is a pretty cool plot in that it shows the “shape” of the performances by these athletes. The waterfall shape suggests that there’s more separation (horizontal space) among the very top finishers, while the increasingly steep slope toward the right end of the graph suggests there’s less time between worse-finishing places, which makes sense.</p>
<p>See that one pink dot all the way to the left – who crushed this workout that hard?</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">wk1 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">slice_min</span>(time_score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb11-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(athlete)</span></code></pre></div>
</details>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "Colten Mertens"</code></pre>
</div>
</div>
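<p>As an aside on the <code>time_score</code> column used above, here’s a small sketch of what the conversion does to a single score string. It assumes lubridate is loaded, since <code>ms()</code> and <code>period_to_seconds()</code> come from that package. The score “7:03” is taken from the Laura Horvath table earlier; <code>example_score</code> and <code>example_seconds</code> are just illustrative names.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(lubridate)

# A "minutes:seconds" score string, as it appears in the leaderboard data
example_score &lt;- "7:03"

# ms() parses the string into a period; period_to_seconds() flattens it to seconds
example_seconds &lt;- period_to_seconds(ms(example_score))

example_seconds</code></pre></div>
</div>
<p>This should come out to 423 seconds (7 × 60 + 3), which is the kind of value plotted on the x-axis.</p>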
</section>
<section id="section-1" class="level2">
<h2 class="anchored" data-anchor-id="section-1">24.2</h2>
<p>We’ll do the same thing for 24.2. You can see the workout description <a href="https://games.crossfit.com/workouts/open/2024/2">here</a>.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">wk2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> all_workouts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(workout_num <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n_reps =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">parse_number</span>(score))</span>
<span id="cb13-4"></span>
<span id="cb13-5">wk2 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb13-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> n_reps, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> workout_place, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> div)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_reverse</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Division"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb13-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Reps"</span>,</span>
<span id="cb13-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Workout Place"</span>,</span>
<span id="cb13-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Reps Completed in 24.2 by Workout Place"</span>,</span>
<span id="cb13-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 100 Overall Men and Women"</span></span>
<span id="cb13-15">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/cf-open-24/index_files/figure-html/unnamed-chunk-11-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>We see basically the inverse pattern here: the curve flattens toward the right (since more reps is better in this case). What’s interesting is that this flattening occurs at around the top ~100 finishers of the workout for women, but at maybe the top ~30 for men. This might have something to do with the scoring (1 double-under obviously isn’t equivalent to 1 deadlift), or it might be an indication that the top 100 women were considerably better than everyone behind them in this workout. Again, it’s kinda hard to tell.</p>
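<p>One way to put a number on that flattening would be to compute the gap in reps between each finisher and the next one within a division. This is a rough sketch on made-up scores, not the real 24.2 results; <code>toy_wk2</code>, <code>place</code>, and <code>gap_to_next</code> are hypothetical names, and the real version would start from <code>wk2</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical toy scores, not the real 24.2 results
toy_wk2 &lt;- tibble(
  div = rep(c("men", "women"), each = 4),
  n_reps = c(930, 900, 895, 893, 911, 905, 880, 860)
)

# Within each division, sort best-to-worst and measure the drop to the next finisher
toy_gaps &lt;- toy_wk2 |&gt;
  group_by(div) |&gt;
  arrange(desc(n_reps), .by_group = TRUE) |&gt;
  mutate(
    place = row_number(),
    gap_to_next = n_reps - lead(n_reps)
  ) |&gt;
  ungroup()

toy_gaps</code></pre></div>
</div>
<p>Small gaps mean a tightly packed stretch of the leaderboard, and large gaps mean more separation. The last finisher in each division gets <code>NA</code> because there’s nobody behind them to compare against.</p>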
</section>
<section id="section-2" class="level2">
<h2 class="anchored" data-anchor-id="section-2">24.3</h2>
<p>And we’ll wrap up by looking at the same type of plot for 24.3. You can see the workout description <a href="https://games.crossfit.com/workouts/open/2024/3">here</a>.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">wk3 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> all_workouts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(workout_num <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">time_score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">period_to_seconds</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ms</span>(score)))</span>
<span id="cb14-4"></span>
<span id="cb14-5">wk3 <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb14-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> time_score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> workout_place, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> div)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_reverse</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_discrete</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Division"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb14-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb14-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Time (Seconds)"</span>,</span>
<span id="cb14-12">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Workout Place"</span>,</span>
<span id="cb14-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Time to Complete 24.3 by Workout Place"</span>,</span>
<span id="cb14-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 100 Overall Men and Women"</span></span>
<span id="cb14-15">  )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/cf-open-24/index_files/figure-html/unnamed-chunk-12-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>Again, we see basically the same pattern we saw in 24.1.</p>
<p>I’ll probably do something like this again for the Quarterfinals and Semis, so if there’s anything else people would like to see, let me know.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2024,
  author = {Ekholm, Eric},
  title = {Crossfit {Open} ’24 {Analysis}},
  date = {2024-03-19},
  url = {https://www.ericekholm.com/posts/cf-open-24},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2024" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2024. <span>“Crossfit Open ’24 Analysis.”</span> March 19,
2024. <a href="https://www.ericekholm.com/posts/cf-open-24">https://www.ericekholm.com/posts/cf-open-24</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>Crossfit</category>
  <category>EDA</category>
  <guid>https://www.ericekholm.com/posts/cf-open-24/index.html</guid>
  <pubDate>Tue, 19 Mar 2024 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Kaduzs!</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/kaduzs/index.html</link>
  <description><![CDATA[ 




<p>What an amazing Crossfit Games we just had! Other outlets have documented all of the storylines leading into the Games better than I could, so I’m not going to recap those here. But suffice it to say that there was a lot of hype going into the Games, and they lived up to all of that hype and then some.</p>
<p>The point of this blog post is to prolong the Games-weekend high by digging into the available data a little bit and exploring how the weekend shook out. If you’re into <a href="https://www.r-project.org/">R</a> and want to use this data yourself, check out my (work-in-progress) <a href="https://github.com/ekholme/crossfitgames"><code>{crossfitgames}</code></a> package, which will help you retrieve and process data from the Crossfit API.</p>
<p>By default, the code in this post is folded up (you can click to expand it), and I won’t necessarily explain what it does. But again, if you’re into that sort of thing, you can view it.</p>
<p>So let’s get into the data.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(crossfitgames)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(harrypotter) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># for colors</span></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(hrbrthemes)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(gt)</span>
<span id="cb1-6"></span>
<span id="cb1-7"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ipsum_rc</span>())</span>
<span id="cb1-8"></span>
<span id="cb1-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(</span>
<span id="cb1-10">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.fill =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-11">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-12">    ),</span>
<span id="cb1-13">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.color =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-14">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-15">    )</span>
<span id="cb1-16">)</span>
<span id="cb1-17"></span>
<span id="cb1-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get men's data</span></span>
<span id="cb1-19">men_23 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">games_leaderboard</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"men"</span>)</span>
<span id="cb1-20"></span>
<span id="cb1-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># get women's data</span></span>
<span id="cb1-22">women_23 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">games_leaderboard</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2023</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"women"</span>)</span>
<span id="cb1-23"></span>
<span id="cb1-24"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># use extractor functions to get final leaderboard and by-workout results</span></span>
<span id="cb1-25">men_lb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_final_leaderboard</span>(men_23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-26">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Men"</span>)</span>
<span id="cb1-27"></span>
<span id="cb1-28">women_lb <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_final_leaderboard</span>(women_23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-29">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Women"</span>)</span>
<span id="cb1-30"></span>
<span id="cb1-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># by workout results</span></span>
<span id="cb1-32">men_workout_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workout_results</span>(men_23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-33">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Men"</span>)</span>
<span id="cb1-34"></span>
<span id="cb1-35">women_workout_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract_workout_results</span>(women_23) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-36">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Women"</span>)</span>
<span id="cb1-37"></span>
<span id="cb1-38"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># combining men and women dfs</span></span>
<span id="cb1-39">lb_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(men_lb, women_lb)</span>
<span id="cb1-40"></span>
<span id="cb1-41">workout_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(men_workout_df, women_workout_df)</span></code></pre></div>
</details>
</div>
<section id="overall-points" class="level1">
<h1>Overall Points</h1>
<p>This isn’t the most exciting graph, but it makes sense to start by looking at the overall points. If you followed the Games coverage over the weekend, though, you probably know all of this already.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1">lb_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb2-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, score), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> score, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> score <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb2-8">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Points"</span>,</span>
<span id="cb2-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb2-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Points by Athlete, 2023 CFG"</span>,</span>
<span id="cb2-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Only top 20 included"</span></span>
<span id="cb2-12">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb2-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb2-15">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/kaduzs/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>One thing that sticks out to me when looking at this is how close places 4-6 were to one another on both the men’s and the women’s sides. Only 17 points separated Gabi Migala from Alexis Raptis, and only 10 points separated Brent Fikowski from Jonne Koski.</p>
</section>
<section id="number-of-top-3-finishes" class="level1">
<h1>Number of Top 3 Finishes</h1>
<p>Next, let’s take a look at the number of top 3 finishes by all of our athletes. To preserve some space, I’m only going to include those athletes with multiple top 3 finishes.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">workout_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(workout_place <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(division, athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, n), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb3-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb3-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Number of Top 3 Finishes"</span>,</span>
<span id="cb3-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Number of Top 3 Finishes by Athlete"</span>,</span>
<span id="cb3-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Only athletes with multiple top 3 finishes shown"</span></span>
<span id="cb3-13">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb3-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb3-16">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/kaduzs/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>So there are a few things that stand out to me here:</p>
<ul>
<li>Laura absolutely crushed it. She had 6 total top 3 finishes (5 of which were wins), which was 2 more than the next-closest person.</li>
<li>Emma Lawson, even though she ended up in 2nd and led for a decent amount of the competition, only had 2 top 3 finishes.</li>
<li>Justin Medeiros still had 3 top 3 finishes, despite placing 13th overall, which just goes to show how damaging a few <em>very bad</em> finishes can be.</li>
</ul>
</section>
<section id="event-placement-variability" class="level1">
<h1>Event Placement Variability</h1>
<p>Medeiros’s performance provides a nice segue into the next thing we’ll look at – variability. That is, to what extent were athletes consistent in their finishes? The plot below shows each athlete’s <em>average</em> event placement as a point, and it shows the standard error of this estimate (the variability) as bars extending around the point. The wider the bars, the more variable (less consistent) the athlete was in their finishes.</p>
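<p>To make the standard error a bit more concrete, here is a minimal sketch using made-up placements for two hypothetical athletes (this is not real Games data). The <code>std_err()</code> helper is purely illustrative and not part of any package. It divides by the square root of however many placements it is given, whereas the code for the plot hard-codes 12.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical event placements for two made-up athletes (not real data)
consistent &lt;- c(4, 5, 6, 5, 4, 6)
streaky &lt;- c(1, 1, 2, 20, 25, 3)

# Standard error of the mean: sample standard deviation over sqrt(n)
std_err &lt;- function(x) sd(x) / sqrt(length(x))

# Average placement and its standard error for each athlete
data.frame(
  athlete = c("consistent", "streaky"),
  avg = c(mean(consistent), mean(streaky)),
  std_err = c(std_err(consistent), std_err(streaky))
)</code></pre></div>
<p>The streaky athlete ends up with the larger standard error, so in a plot like the one below their bars would be wider.</p>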
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">top_20_athletes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lb_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(athlete)</span>
<span id="cb4-4"></span>
<span id="cb4-5">workout_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> top_20_athletes) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(division, athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb4-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">std_err =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(workout_place) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>),</span>
<span id="cb4-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">avg =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(workout_place)</span>
<span id="cb4-11">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(lb_df, athlete, rank)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> avg, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>rank), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_errorbarh</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmin =</span> avg <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> std_err, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmax =</span> avg <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> std_err), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">height =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-17">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-18">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb4-19">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Avg Event Placement"</span>,</span>
<span id="cb4-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb4-21">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Average Event Placement and Variability"</span>,</span>
<span id="cb4-22">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 20 Athletes Only"</span></span>
<span id="cb4-23">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_hp_d</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-25">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb4-26">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb4-27">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb4-28">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/kaduzs/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>One thing to note here is that the y-axes are sorted by final leaderboard position, whereas the position of the point on the x-axis for each athlete is the average event finish. If all events were weighted equally, these points would cascade downward and outward – the average event placement would increase as you go down the leaderboard. But since the point gaps between places increase as athletes are cut, it’s possible for athletes to have numerically higher (that is, worse) average event finishes and yet still finish better on the leaderboard. This is the case for athletes who do better on Saturday and Sunday than they did on Thursday and Friday. We can see this with Gabi Migala and Pat Vellner. Inversely, athletes who do worse on Saturday and Sunday will have lower (better) average event finishes relative to their overall place. We can see this with Chandler Smith and Roman Khrennikov (who didn’t do well on Sunday for obvious reasons).</p>
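<p>To make the weighting point concrete, here’s a minimal base R sketch with made-up numbers. The scoring function, the gap sizes, and both athletes’ placements are hypothetical and are not the actual Games points table; the only thing carried over from above is the idea that the gap between adjacent places gets bigger late in the competition.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical scoring: 100 points for 1st, minus a fixed gap per place after that
points_for &lt;- function(place, gap) 100 - gap * (place - 1)

# Assumed gaps: 3 points per place in the first two events, 5 in the last two
gaps &lt;- c(3, 3, 5, 5)

athlete_a &lt;- c(11, 10, 2, 2)  # weak start, strong finish
athlete_b &lt;- c(2, 2, 10, 10)  # strong start, weak finish

# Average event finish (lower is better)
avg_a &lt;- mean(athlete_a)
avg_b &lt;- mean(athlete_b)

# Total points (higher is better)
total_a &lt;- sum(points_for(athlete_a, gaps))
total_b &lt;- sum(points_for(athlete_b, gaps))</code></pre></div>
<p>In this toy setup, athlete A has the worse average finish (6.25 vs. 6) but still ends up with more points (333 vs. 304), because A’s good finishes came when each place was worth more.</p>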
<p>Beyond that, though, it’s interesting to look at the width of some of these error bars. On the women’s side, we can see how narrow Emma Lawson’s bars were, showing that she consistently finished around 7th place in events. Likewise for Chandler Smith on the men’s side – he seems much more consistent than many of the other top 20, and I’m not sure anyone would say consistency was Chandler’s vibe before this year. It’s also probably worth mentioning that Roman’s performance would (probably) have been much more consistent had he not broken his foot.</p>
<p>Inversely, we can see that some athletes were very inconsistent: for instance, Justin Medeiros, Pat Vellner, and Sam Kwant on the men’s side, and Katrin, Emma Cary, and Olivia Kerstetter on the women’s side. This all makes sense – Justin, Pat, and Sam all had a few terrible events, but, as we saw earlier, they were also top 3 in multiple events. Likewise for Katrin, Olivia, and Emma.</p>
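<p>For reference, the error bars above span one standard error on either side of each athlete’s average, where the standard error is the standard deviation of the event finishes divided by the square root of the number of events. Here’s a small base R sketch of that calculation. Both sets of placements are invented for illustration, and using <code>length(x)</code> instead of a hard-coded 12 is my substitution so the helper works for any number of events.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Standard error of the mean: sd divided by the square root of the event count
std_err &lt;- function(x) sd(x) / sqrt(length(x))

# Hypothetical finishes across 12 events (not real athletes)
consistent &lt;- c(7, 6, 8, 7, 7, 6, 8, 7, 7, 8, 6, 7)
streaky &lt;- c(1, 2, 30, 1, 28, 3, 2, 35, 1, 3, 25, 2)

se_consistent &lt;- std_err(consistent)
se_streaky &lt;- std_err(streaky)

# Bar endpoints as drawn in the plot: average minus and plus one standard error
bar_consistent &lt;- mean(consistent) + c(-1, 1) * se_consistent
bar_streaky &lt;- mean(streaky) + c(-1, 1) * se_streaky</code></pre></div>
<p>The athlete who hovers around 7th gets a narrow bar, and the one who swings between event wins and finishes in the 20s and 30s gets a much wider one, which is the pattern described above.</p>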
</section>
<section id="top-10-athlete-finishes-by-event" class="level1">
<h1>Top 10 Athlete Finishes by Event</h1>
<p>The error bars in the above plot tell us something about variability, but we might want to look at the actual event finishes to unpack this variability a bit more. The plot below is a jitter plot, which is basically just a scatter plot with the points jittered a little so they don’t overlap. Since this might be too busy otherwise, I’m going to drop down to just the top 10 athletes in each division here.</p>
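<p>If jittering is new to you, here’s a tiny sketch of the idea using base R’s <code>jitter()</code> as a stand-in for what <code>geom_jitter()</code> does on the plot. The values are made up: five points that would all sit on the same row get nudged by a small random amount, at most 0.15 in either direction, so they no longer sit exactly on top of each other.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">set.seed(1)  # only so the sketch is reproducible

# Five hypothetical finishes that would all be drawn on row 3 of the y-axis
athlete_row &lt;- rep(3, 5)

# Nudge each point by a uniform random amount between -0.15 and 0.15
jittered_row &lt;- jitter(athlete_row, amount = 0.15)

# The largest nudge applied to any point
max_shift &lt;- max(abs(jittered_row - athlete_row))</code></pre></div>
<p>In the plot below, only the vertical position is jittered; the x position (the actual workout place) is left alone so the finishes stay readable.</p>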
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">top_10_athletes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lb_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(athlete)</span>
<span id="cb5-4"></span>
<span id="cb5-5">workout_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> top_10_athletes) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(lb_df, athlete, rank)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> workout_place, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>rank), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_jitter</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">width =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">height =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">6</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb5-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Workout Place"</span>,</span>
<span id="cb5-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb5-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Variability in Workout Placement"</span>,</span>
<span id="cb5-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Top 10 athletes per division"</span></span>
<span id="cb5-16">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-17">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_hp_d</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-18">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb5-19">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb5-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb5-21">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/kaduzs/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>Right, so here we can get a better sense of each athlete’s individual performances. We can see, for example, that both Katrin and Emma Cary <em>mostly</em> did well, but they had a few events where they bombed. Same thing with Vellner (and probably Medeiros too, if he were displayed here). Jelle Hoste’s finishes appear much more smoothly spread – it’s not like he vacillated between dominating events and bombing them. He won 1 (the 5k), finished a few others in the top 10, finished a few others in the top 20, and then had a handful of worse finishes. We see a similar pattern for Fikowski, Paige Powers, Gabi Migala, and Danielle Brandon.</p>
</section>
<section id="placement-for-top-5-athletes-across-the-competition" class="level1">
<h1>Placement for Top 5 Athletes Across the Competition</h1>
<p>As a penultimate glance at the data, let’s track the overall placement of the eventual top 5 finishers across the competition. I’m going to color the eventual winners in gold and the others in gray, which will make the winners’ trajectories easier to follow and make the graph less busy (although it does make it slightly harder to follow the non-winners’ trajectories).</p>
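<p>The “placement after each workout” idea boils down to two steps: a running total of points per athlete, then a ranking of those running totals within each workout. Here’s a base R sketch of those two steps on a toy data frame. The athletes and point values are hypothetical, and I’m using <code>ave()</code> here just to keep the sketch free of package dependencies.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># Hypothetical points for three athletes over three workouts
toy &lt;- data.frame(
  athlete = rep(c("A", "B", "C"), each = 3),
  workout_num = rep(1:3, times = 3),
  points = c(100, 60, 70,   # athlete A
             80, 100, 60,   # athlete B
             60, 80, 110)   # athlete C
)

# Sort so each athlete's workouts are in order before taking the running total
toy &lt;- toy[order(toy$athlete, toy$workout_num), ]

# Step 1: cumulative points per athlete
toy$cum_pts &lt;- ave(toy$points, toy$athlete, FUN = cumsum)

# Step 2: rank within each workout, most cumulative points first
toy$cum_rank &lt;- ave(-toy$cum_pts, toy$workout_num,
                    FUN = function(v) rank(v, ties.method = "first"))</code></pre></div>
<p>In this toy data, athlete A leads after workout 1 but slides to 3rd by workout 3, while athlete C climbs from 3rd to 1st, which is the kind of trajectory the plot below is meant to show.</p>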
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">top_5_athletes <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lb_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(athlete)</span>
<span id="cb6-4"></span>
<span id="cb6-5">athlete_cumul <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> workout_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Jason Smith"</span>)  <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(</span>
<span id="cb6-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cum_pts =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cumsum</span>(points)</span>
<span id="cb6-10">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(workout_num, division) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(cum_pts), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.by_group =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cum_rank =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) </span>
<span id="cb6-15"></span>
<span id="cb6-16">top_5_cumul <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> athlete_cumul <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-17">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> top_5_athletes) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(lb_df, athlete, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">final_rank =</span> rank))</span>
<span id="cb6-20"></span>
<span id="cb6-21">top_5_cumul <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb6-22">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> workout_num, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> cum_rank, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> final_rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">group =</span> athlete)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-23">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> final_rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-25">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(</span>
<span id="cb6-26">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(</span>
<span id="cb6-27">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace_all</span>(athlete, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^(.*) (.*)$"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">2"</span>),</span>
<span id="cb6-28">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> final_rank,</span>
<span id="cb6-29">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">12.5</span></span>
<span id="cb6-30">        ),</span>
<span id="cb6-31">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb6-32">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-33">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-34">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-35">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb6-36">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Workout Number"</span>,</span>
<span id="cb6-37">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Place after Workout"</span>,</span>
<span id="cb6-38">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Placement of the Top 5 Athletes over the Competition"</span></span>
<span id="cb6-39">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-40">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grey50"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#e59950ff"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-41">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_size_manual</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-42">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_reverse</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-43">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb6-44">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb6-45">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.minor.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>()</span>
<span id="cb6-46">    )</span></code></pre></div>
</details>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/kaduzs/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>The first thing that stands out to me is what we all know – Roman dominated most of the men’s competition, and it was a real bummer that he broke his foot. Another thing that stands out to me is that Dallin Pepper made a huge late charge to go from ~14th after event 7 up to 5th at the end of Sunday, which is a lot of ground to make up in 5 events. We can also see Vellner doing Vellner things – pulling himself out of an early hole to eventually take 2nd.</p>
<p>On the women’s side, we can see that Emma Lawson was basically always at or near the top. Laura had a bit of a roller-coaster ride to get back to the top spot, but she ended up there when it mattered. The other thing on the women’s side that jumps out is that Gabi Migala had a trajectory sort of similar to Dallin Pepper’s – she dug herself into a bit of a hole over the first half of the competition that she climbed out of by the end.</p>
</section>
<section id="exploring-lauras-event-wins" class="level1">
<h1>Exploring Laura’s Event Wins</h1>
<p>And let’s end with a little bit about Laura Horvath. I’ve been a huge Laura fan since 2018, and I’m so happy for her that she finally got a W this year. I hope this is just the first of many for her, and I truly think she’s a great ambassador for the sport.</p>
<p>All of that said, one of the things you couldn’t help but notice when watching her was how dominant her event wins were. The table below shows her score for each of her event wins as well as the score of the woman who finished in 2nd.</p>
<div class="cell">
<details>
<summary>Code</summary>
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">lh_wins <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">9</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">11</span>)</span>
<span id="cb7-2"></span>
<span id="cb7-3">lh_win_workouts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> workout_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(</span>
<span id="cb7-5">    workout_num <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> lh_wins,</span>
<span id="cb7-6">    workout_place <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>,</span>
<span id="cb7-7">    division <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Women"</span></span>
<span id="cb7-8">    ) </span>
<span id="cb7-9"></span>
<span id="cb7-10">comparison_tbl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lh_win_workouts <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"workout_place"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"points"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"division"</span>))</span>
<span id="cb7-12"></span>
<span id="cb7-13">lh_tbl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> comparison_tbl <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Laura Horvath"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lh_score =</span> score)</span>
<span id="cb7-16"></span>
<span id="cb7-17">other_tbl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> comparison_tbl <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Laura Horvath"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">other_athlete =</span> athlete, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">other_score =</span> score)</span>
<span id="cb7-20"></span>
<span id="cb7-21">comp_tbl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lh_tbl <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-22">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(other_tbl, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"workout_num"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-23">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>athlete)</span>
<span id="cb7-24"></span>
<span id="cb7-25">comp_tbl <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-26">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gt</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-27">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cols_label</span>(</span>
<span id="cb7-28">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">workout_num =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Workout #"</span>,</span>
<span id="cb7-29">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lh_score =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Laura Score"</span>,</span>
<span id="cb7-30">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">other_athlete =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2nd Place Finisher"</span>,</span>
<span id="cb7-31">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">other_score =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"2nd Place Score"</span></span>
<span id="cb7-32">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-33">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_style</span>(</span>
<span id="cb7-34">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">style =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb7-35">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cell_text</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">weight =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bold"</span>)</span>
<span id="cb7-36">    ),</span>
<span id="cb7-37">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">locations =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cells_column_labels</span>()</span>
<span id="cb7-38">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span> </span>
<span id="cb7-39">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tab_header</span>(</span>
<span id="cb7-40">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Margin of Victory for Laura's Workout Wins"</span></span>
<span id="cb7-41">  )</span></code></pre></div>
</details>
<div class="cell-output-display">

<div id="fxsgsiznlx" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;">
<style>html {
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
}

:where(#fxsgsiznlx) .gt_table {
  display: table;
  border-collapse: collapse;
  margin-left: auto;
  margin-right: auto;
  color: #333333;
  font-size: 16px;
  font-weight: normal;
  font-style: normal;
  background-color: #FFFFFF;
  width: auto;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #A8A8A8;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #A8A8A8;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_heading {
  background-color: #FFFFFF;
  text-align: center;
  border-bottom-color: #FFFFFF;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_title {
  color: #333333;
  font-size: 125%;
  font-weight: initial;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-color: #FFFFFF;
  border-bottom-width: 0;
}

:where(#fxsgsiznlx) .gt_subtitle {
  color: #333333;
  font-size: 85%;
  font-weight: initial;
  padding-top: 0;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-color: #FFFFFF;
  border-top-width: 0;
}

:where(#fxsgsiznlx) .gt_bottom_border {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_col_headings {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_col_heading {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 6px;
  padding-left: 5px;
  padding-right: 5px;
  overflow-x: hidden;
}

:where(#fxsgsiznlx) .gt_column_spanner_outer {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: normal;
  text-transform: inherit;
  padding-top: 0;
  padding-bottom: 0;
  padding-left: 4px;
  padding-right: 4px;
}

:where(#fxsgsiznlx) .gt_column_spanner_outer:first-child {
  padding-left: 0;
}

:where(#fxsgsiznlx) .gt_column_spanner_outer:last-child {
  padding-right: 0;
}

:where(#fxsgsiznlx) .gt_column_spanner {
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: bottom;
  padding-top: 5px;
  padding-bottom: 5px;
  overflow-x: hidden;
  display: inline-block;
  width: 100%;
}

:where(#fxsgsiznlx) .gt_group_heading {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
}

:where(#fxsgsiznlx) .gt_empty_group_heading {
  padding: 0.5px;
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  vertical-align: middle;
}

:where(#fxsgsiznlx) .gt_from_md > :first-child {
  margin-top: 0;
}

:where(#fxsgsiznlx) .gt_from_md > :last-child {
  margin-bottom: 0;
}

:where(#fxsgsiznlx) .gt_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  margin: 10px;
  border-top-style: solid;
  border-top-width: 1px;
  border-top-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 1px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 1px;
  border-right-color: #D3D3D3;
  vertical-align: middle;
  overflow-x: hidden;
}

:where(#fxsgsiznlx) .gt_stub {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#fxsgsiznlx) .gt_stub_row_group {
  color: #333333;
  background-color: #FFFFFF;
  font-size: 100%;
  font-weight: initial;
  text-transform: inherit;
  border-right-style: solid;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
  padding-left: 5px;
  padding-right: 5px;
  vertical-align: top;
}

:where(#fxsgsiznlx) .gt_row_group_first td {
  border-top-width: 2px;
}

:where(#fxsgsiznlx) .gt_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#fxsgsiznlx) .gt_first_summary_row {
  border-top-style: solid;
  border-top-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_first_summary_row.thick {
  border-top-width: 2px;
}

:where(#fxsgsiznlx) .gt_last_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_grand_summary_row {
  color: #333333;
  background-color: #FFFFFF;
  text-transform: inherit;
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#fxsgsiznlx) .gt_first_grand_summary_row {
  padding-top: 8px;
  padding-bottom: 8px;
  padding-left: 5px;
  padding-right: 5px;
  border-top-style: double;
  border-top-width: 6px;
  border-top-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_striped {
  background-color: rgba(128, 128, 128, 0.05);
}

:where(#fxsgsiznlx) .gt_table_body {
  border-top-style: solid;
  border-top-width: 2px;
  border-top-color: #D3D3D3;
  border-bottom-style: solid;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_footnotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_footnote {
  margin: 0px;
  font-size: 90%;
  padding-left: 4px;
  padding-right: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#fxsgsiznlx) .gt_sourcenotes {
  color: #333333;
  background-color: #FFFFFF;
  border-bottom-style: none;
  border-bottom-width: 2px;
  border-bottom-color: #D3D3D3;
  border-left-style: none;
  border-left-width: 2px;
  border-left-color: #D3D3D3;
  border-right-style: none;
  border-right-width: 2px;
  border-right-color: #D3D3D3;
}

:where(#fxsgsiznlx) .gt_sourcenote {
  font-size: 90%;
  padding-top: 4px;
  padding-bottom: 4px;
  padding-left: 5px;
  padding-right: 5px;
}

:where(#fxsgsiznlx) .gt_left {
  text-align: left;
}

:where(#fxsgsiznlx) .gt_center {
  text-align: center;
}

:where(#fxsgsiznlx) .gt_right {
  text-align: right;
  font-variant-numeric: tabular-nums;
}

:where(#fxsgsiznlx) .gt_font_normal {
  font-weight: normal;
}

:where(#fxsgsiznlx) .gt_font_bold {
  font-weight: bold;
}

:where(#fxsgsiznlx) .gt_font_italic {
  font-style: italic;
}

:where(#fxsgsiznlx) .gt_super {
  font-size: 65%;
}

:where(#fxsgsiznlx) .gt_two_val_uncert {
  display: inline-block;
  line-height: 1em;
  text-align: right;
  font-size: 60%;
  vertical-align: -0.25em;
  margin-left: 0.1em;
}

:where(#fxsgsiznlx) .gt_footnote_marks {
  font-style: italic;
  font-weight: normal;
  font-size: 75%;
  vertical-align: 0.4em;
}

:where(#fxsgsiznlx) .gt_asterisk {
  font-size: 100%;
  vertical-align: 0;
}

:where(#fxsgsiznlx) .gt_slash_mark {
  font-size: 0.7em;
  line-height: 0.7em;
  vertical-align: 0.15em;
}

:where(#fxsgsiznlx) .gt_fraction_numerator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: 0.45em;
}

:where(#fxsgsiznlx) .gt_fraction_denominator {
  font-size: 0.6em;
  line-height: 0.6em;
  vertical-align: -0.05em;
}
</style>
<table class="gt_table">
  <thead class="gt_header">
    <tr>
      <th colspan="4" class="gt_heading gt_title gt_font_normal gt_bottom_border" style="">Margin of Victory for Laura's Workout Wins</th>
    </tr>
    
  </thead>
  <thead class="gt_col_headings">
    <tr>
      <th class="gt_col_heading gt_columns_bottom_border gt_right" rowspan="1" colspan="1" style="font-weight: bold;">Workout #</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" style="font-weight: bold;">Laura Score</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" style="font-weight: bold;">2nd Place Finisher</th>
      <th class="gt_col_heading gt_columns_bottom_border gt_left" rowspan="1" colspan="1" style="font-weight: bold;">2nd Place Score</th>
    </tr>
  </thead>
  <tbody class="gt_table_body">
    <tr><td class="gt_row gt_right">2</td>
<td class="gt_row gt_left">13:50.40</td>
<td class="gt_row gt_left">Arielle Loewen</td>
<td class="gt_row gt_left">14:43.54</td></tr>
    <tr><td class="gt_row gt_right">5</td>
<td class="gt_row gt_left">04:36.18</td>
<td class="gt_row gt_left">Alex Gazan</td>
<td class="gt_row gt_left">05:50.32</td></tr>
    <tr><td class="gt_row gt_right">9</td>
<td class="gt_row gt_left">470 lb</td>
<td class="gt_row gt_left">Christine Kolenbrander</td>
<td class="gt_row gt_left">447 lb</td></tr>
    <tr><td class="gt_row gt_right">10</td>
<td class="gt_row gt_left">08:41.22</td>
<td class="gt_row gt_left">Alex Gazan</td>
<td class="gt_row gt_left">08:59.04</td></tr>
    <tr><td class="gt_row gt_right">11</td>
<td class="gt_row gt_left">08:36.46</td>
<td class="gt_row gt_left">Danielle Brandon</td>
<td class="gt_row gt_left">08:48.42</td></tr>
  </tbody>
  
  
</table>
</div>
</div>
</div>
<p>The first 2 wins are absolutely nutzo. Winning by nearly a minute (about 53 seconds) in a ~14 minute event is crazy enough, but then <em>winning by over a minute (about 74 seconds) in a 5 minute event</em> is bonkers. Her next 3 wins don’t quite reach those same levels of insanity, but none of them are particularly close. In a lifting event where 1-5 lbs separated many athletes, Laura won by 23 lbs. And in events 10 and 11, she still won by 18 and 12 seconds, which is a lot (go ahead and count to 12 slowly and see how long it feels).</p>
<p>All in all, I’m pumped for Laura, pumped for Adler (who I didn’t write much about here, but he does seem like a good dude), and pumped for another great Games weekend.</p>
<p>I may dig into some other aspects of this data more in the future – who knows – but for now this is it.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2023,
  author = {Ekholm, Eric},
  title = {Kaduzs!},
  date = {2023-08-07},
  url = {https://www.ericekholm.com/posts/kaduzs},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2023" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2023. <span>“Kaduzs!”</span> August 7, 2023. <a href="https://www.ericekholm.com/posts/kaduzs">https://www.ericekholm.com/posts/kaduzs</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>Crossfit</category>
  <category>EDA</category>
  <guid>https://www.ericekholm.com/posts/kaduzs/index.html</guid>
  <pubDate>Mon, 07 Aug 2023 04:00:00 GMT</pubDate>
</item>
<item>
  <title>String Matching in Julia</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/string-match-jl/index.html</link>
  <description><![CDATA[ 




<p>Yesterday, I stumbled across this <a href="https://josiahparry.com/posts/2023-04-13-counting-chars/">couple-month-old blog post from Josiah Parry</a> walking through creating R, Rust, and C++ functions to compare multiple candidate strings to a reference string (his real-world application for this is geohashing, but in the demo he uses arbitrary strings).</p>
<p>Those languages are cool and all, but what about <em>Julia</em>? The gist of his blog is that Rust is super fast. And since the whole, like, raison d’être of Julia is that it’s fast, I figured I’d write a version of this in Julia as well. I’m still new-ish to Julia, so I’d love it if any experts could tell me how to optimize this even further.</p>
<section id="load-packages-and-generate-data" class="level1">
<h1>Load Packages and Generate Data</h1>
<p>For this, we just need the <code>Random</code> package to set a seed and sample our strings as well as the <code>BenchmarkTools</code> package to benchmark the function performance.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">BenchmarkTools</span></span>
<span id="cb1-3"></span>
<span id="cb1-4"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span>.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seed!</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#function to generate some strings</span></span>
<span id="cb1-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_strings</span>(n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">Int</span>)</span>
<span id="cb1-8">    v <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Vector</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">{String}</span>(<span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">undef</span>, n)</span>
<span id="cb1-9"></span>
<span id="cb1-10">    letters <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"abcde"</span></span>
<span id="cb1-11">    numbers <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"12345"</span></span>
<span id="cb1-12"></span>
<span id="cb1-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">∈</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eachindex</span>(v)</span>
<span id="cb1-14">        x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">randstring</span>(letters, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb1-15">        y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">randstring</span>(numbers, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb1-16">        v[i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> y</span>
<span id="cb1-17">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb1-18">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> v</span>
<span id="cb1-19"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<pre><code>make_strings (generic function with 1 method)</code></pre>
</div>
</div>
<p>This will make a vector of length <code>n</code> where each element is a 7-character string. In each of these strings, the first 4 characters will be sampled (with replacement) from <code>"abcde"</code>, and the last 3 characters will be sampled (with replacement) from <code>"12345"</code>.</p>
<p>Next we’ll set <code>n</code> to 100,000 and generate our strings. I’ll also make an arbitrary reference string to compare the candidate strings against. Note that we’re not benchmarking any of this stuff – just the comparisons that will come later.</p>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb3-1">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">100_000</span></span>
<span id="cb3-2"></span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#returns a vector of 100k strings</span></span>
<span id="cb3-4">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_strings</span>(n);</span>
<span id="cb3-5"></span>
<span id="cb3-6">ref <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"aade124"</span> <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#making a reference string to compare against</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>"aade124"</code></pre>
</div>
</div>
</section>
<section id="write-comparison-functions" class="level1">
<h1>Write Comparison Functions</h1>
<p>So now we want to compare each element of <code>x</code> to <code>ref</code>. The goal is to count how many characters match until we hit the first pair of characters that don’t match. For example, if we’re comparing <code>"abcd123"</code> to <code>"abde123"</code>, the result would be <code>2</code>, since the first two characters (<code>ab</code> vs <code>ab</code>) match in each, but the third characters (<code>c</code> vs <code>d</code>) don’t.</p>
<p>My first step here is to write a function that compares 1 string to 1 string – that is, I’m not worrying about the fact that I want to do this for all of the elements in <code>x</code> yet – I just want to do it for 1 element.</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb5-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">compare_strings</span>(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">String</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">String</span>)</span>
<span id="cb5-2">    s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span></span>
<span id="cb5-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">∈</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eachindex</span>(x)</span>
<span id="cb5-4">        x[i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> y[i] ? <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">break</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb5-5">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb5-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> s</span>
<span id="cb5-7"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<pre><code>compare_strings (generic function with 1 method)</code></pre>
</div>
</div>
<p>This will:</p>
<ol type="1">
<li>Create a counter, <code>s</code> (for sum) and set it equal to 0;</li>
<li>For each index <code>i</code> (position) in x – recall that x and y will have the same length – compare <code>x[i]</code> and <code>y[i]</code>;</li>
<li>If they’re not equal, <code>break</code> the loop and return <code>s</code>;</li>
<li>If they are equal, increment <code>s</code> by one and keep going</li>
</ol>
<p>We can check that this works by using the previous example strings:</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">compare_strings</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"abcd123"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"abde123"</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<pre><code>2</code></pre>
</div>
</div>
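<p>As a side note, here’s a minimal sketch of the same counting steps written with <code>zip()</code> instead of indexing. The name <code>common_prefix_length()</code> is hypothetical (it isn’t part of the code in this post), and the looser <code>AbstractString</code> annotations are my assumption. Because <code>zip()</code> stops at the end of the shorter input, this variant doesn’t rely on the two strings having the same length.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># Hypothetical variant: count how many leading characters two strings share
function common_prefix_length(x::AbstractString, y::AbstractString)
    s = 0
    # zip pairs up characters and stops when the shorter string runs out
    for (a, b) in zip(x, y)
        a == b || break   # stop at the first mismatch
        s += 1
    end
    return s
end

common_prefix_length("abcd123", "abde123")  # 2, same as compare_strings() above
common_prefix_length("abc", "abcdef")       # 3, unequal lengths are fine</code></pre></div>
</div>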
<p>Now we want to write a version of this function that accepts a vector of strings and compares each element of that vector to the reference string. The cool thing about Julia is that its <a href="https://docs.julialang.org/en/v1/manual/methods">multiple dispatch</a> feature allows us to define another <code>compare_strings()</code> function that accepts different types of arguments.</p>
<p>So we can write the following and it’s perfectly acceptable and, honestly, way better IMO than how you might have to handle this in <code>R</code> or <code>Python</code>:</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb9-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">compare_strings</span>(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">Vector{String}</span>, y<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">String</span>)</span>
<span id="cb9-2">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> [<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">compare_strings</span>(i, y) for i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> x]</span>
<span id="cb9-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>compare_strings (generic function with 2 methods)</code></pre>
</div>
</div>
<p>Notice that the new function has the same name (<code>compare_strings()</code>) but its <code>x</code> argument is a vector of strings rather than a single string. Then, inside the function, we just call our other method that requires <code>x</code> to be a single string. We make these calls inside an array comprehension to iterate over all of the elements in <code>x</code>.</p>
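<p>To see the dispatch pattern in isolation, here’s a small sketch with a toy function. <code>n_chars()</code> is a hypothetical name, not something from this post. It also uses <code>AbstractString</code> and <code>AbstractVector</code> annotations, which is my assumption about what you might want: a method annotated with <code>Vector{String}</code> won’t match the <code>Vector{SubString{String}}</code> that <code>split()</code> returns, whereas the looser annotations will.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># Scalar method: works on any single string
n_chars(x::AbstractString) = length(x)

# Vector method: same name, different argument type; delegates to the scalar method
n_chars(xs::AbstractVector{&lt;:AbstractString}) = [n_chars(x) for x in xs]

n_chars("abc")                # dispatches to the scalar method
n_chars(["ab", "cde"])        # dispatches to the vector method
n_chars(split("a bc def"))    # a Vector{SubString{String}} also hits the vector method
length(methods(n_chars))      # 2 methods on one generic function</code></pre></div>
</div>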
</section>
<section id="benchmark" class="level1">
<h1>Benchmark</h1>
<p>Now we just run the benchmark to see how our code does:</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb11-1"><span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">@benchmark</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">compare_strings</span>(x, ref)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<div class="ansi-escaped-output">
<pre>BenchmarkTools.Trial: 5038 samples with 1 evaluation.
 Range <span class="ansi-bright-black-fg">(</span><span class="ansi-cyan-fg ansi-bold">min</span> … <span class="ansi-magenta-fg">max</span><span class="ansi-bright-black-fg">):  </span><span class="ansi-cyan-fg ansi-bold">880.500 μs</span> … <span class="ansi-magenta-fg">  2.713 ms</span>  <span class="ansi-bright-black-fg">┊</span> GC <span class="ansi-bright-black-fg">(</span>min … max<span class="ansi-bright-black-fg">): </span>0.00% … 26.16%
 Time  <span class="ansi-bright-black-fg">(</span><span class="ansi-blue-fg ansi-bold">median</span><span class="ansi-bright-black-fg">):     </span><span class="ansi-blue-fg ansi-bold">951.200 μs               </span><span class="ansi-bright-black-fg">┊</span> GC <span class="ansi-bright-black-fg">(</span>median<span class="ansi-bright-black-fg">):    </span>0.00%
 Time  <span class="ansi-bright-black-fg">(</span><span class="ansi-green-fg ansi-bold">mean</span> ± <span class="ansi-green-fg">σ</span><span class="ansi-bright-black-fg">):   </span><span class="ansi-green-fg ansi-bold">984.414 μs</span> ± <span class="ansi-green-fg">115.531 μs</span>  <span class="ansi-bright-black-fg">┊</span> GC <span class="ansi-bright-black-fg">(</span>mean ± σ<span class="ansi-bright-black-fg">):  </span>0.87% ±  4.08%
    ▁▆██<span class="ansi-blue-fg">▄</span>▂ <span class="ansi-green-fg"> </span>                                                     
  ▂▅████<span class="ansi-blue-fg">█</span>█▇<span class="ansi-green-fg">▇</span>▆▅▅▄▃▃▃▃▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▁▂▁▁▂▂▂▁▁▂▂▂▂▁▂▂▂▂▂▂▂▂▂▂ ▃
  880 μs<span class="ansi-bright-black-fg">           Histogram: frequency by time</span>         1.58 ms <span class="ansi-bold">&lt;</span>
 Memory estimate<span class="ansi-bright-black-fg">: </span><span class="ansi-yellow-fg">781.30 KiB</span>, allocs estimate<span class="ansi-bright-black-fg">: </span><span class="ansi-yellow-fg">2</span>.</pre>
</div>
</div>
</div>
<p>Obviously this isn’t an apples-to-apples comparison with the code Josiah wrote – we have different machines, different input vectors, he was calling both Rust and C++ from R, etc. But the point remains that Julia is also fast…just in case people hadn’t heard :)</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2023,
  author = {Ekholm, Eric},
  title = {String {Matching} in {Julia}},
  date = {2023-06-09},
  url = {https://www.ericekholm.com/posts/string-match-jl},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2023" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2023. <span>“String Matching in Julia.”</span> June 9,
2023. <a href="https://www.ericekholm.com/posts/string-match-jl">https://www.ericekholm.com/posts/string-match-jl</a>.
</div></div></section></div> ]]></description>
  <guid>https://www.ericekholm.com/posts/string-match-jl/index.html</guid>
  <pubDate>Fri, 09 Jun 2023 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Stranger Strings</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/stranger-strings/index.html</link>
  <description><![CDATA[ 




<p>In my quest to continue learning how to do things in Julia, I wanted to play around with last week’s <a href="https://github.com/rfordatascience/tidytuesday/tree/master/data/2022/2022-10-18">#TidyTuesday dataset</a>, which was the dialogue from every episode of Stranger Things. In data-analysis-dabbling in Julia so far, I’ve more or less avoided strings. This has mostly been because I’ve been focusing on numerical topics (like maximum likelihood estimation), but also because working with strings can be a pain. That said, it felt like time to explore strings in Julia, and this dataset provided a good opportunity to practice.</p>
<p>The goal of this analysis is to do something fairly straightforward – I’m going to count the most frequently used words in the series. But this will require learning some fundamental tools like tokenizing, pivoting/reshaping data, and cleaning text data, among others.</p>
<p>As always, the point of this is to work through my own learning process. I’m certainly not claiming to be an expert, and if you are an expert and can recommend better approaches, I’d love to hear them!</p>
<p>So let’s get to it.</p>
<section id="setup-and-examine-data" class="level1">
<h1>Setup and Examine Data</h1>
<p>First, let’s load the packages we’ll use and read the data in:</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">CSV </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for reading CSVs</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">DataFrames </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#dataframe utilities</span></span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Chain </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#chain macro, similar to R's pipe</span></span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Languages </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for stopwords</span></span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">CairoMakie </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#plotting</span></span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Statistics </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for median</span></span>
<span id="cb1-7"></span>
<span id="cb1-8">st_things_dialogue <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> CSV.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">download</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-10-18/stranger_things_all_dialogue.csv"</span>), DataFrame);</span></code></pre></div>
</div>
<p>And then we can look at the size of the dataframe:</p>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">size</span>(st_things_dialogue)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>(32519, 8)</code></pre>
</div>
</div>
<p>As well as see the first few rows:</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">first</span>(st_things_dialogue, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<div class="data-frame"><p>3 rows × 8 columns (omitted printing of 1 columns)</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">season</th>
<th data-quarto-table-cell-role="th">episode</th>
<th data-quarto-table-cell-role="th">line</th>
<th data-quarto-table-cell-role="th">raw_text</th>
<th data-quarto-table-cell-role="th">stage_direction</th>
<th data-quarto-table-cell-role="th">dialogue</th>
<th data-quarto-table-cell-role="th">start_time</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="String">String</th>
<th data-quarto-table-cell-role="th" title="Union{Missing, String}">String?</th>
<th data-quarto-table-cell-role="th" title="Union{Missing, String}">String?</th>
<th data-quarto-table-cell-role="th" title="Dates.Time">Time</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>[crickets chirping]</td>
<td>[crickets chirping]</td>
<td><em>missing</em></td>
<td>00:00:07</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>[alarm blaring]</td>
<td>[alarm blaring]</td>
<td><em>missing</em></td>
<td>00:00:49</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>1</td>
<td>3</td>
<td>[panting]</td>
<td>[panting]</td>
<td><em>missing</em></td>
<td>00:00:52</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>So we can see that dialogue might be <code>missing</code> if the line is just stage directions. For our purposes here, let’s just use the lines with dialogue. To do this, we can call the <code>dropmissing()</code> function, passing in the DataFrame and the column that must be non-missing for a row to be kept, which is <code>:dialogue</code> in this case. Note that Julia uses a leading <code>:</code> to denote symbols.</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb5-1">dialogue_complete <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dropmissing</span>(st_things_dialogue, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<div class="data-frame"><p>26,435 rows × 8 columns (omitted printing of 4 columns)</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">season</th>
<th data-quarto-table-cell-role="th">episode</th>
<th data-quarto-table-cell-role="th">line</th>
<th data-quarto-table-cell-role="th">raw_text</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="String">String</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>[Mike] Something is coming. Something hungry for blood.</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>A shadow grows on the wall behind you, swallowing you in darkness.</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>-It is almost here. -What is it?</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">4</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>What if it's the Demogorgon?</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">5</td>
<td>1</td>
<td>1</td>
<td>13</td>
<td>Oh, Jesus, we're so screwed if it's the Demogorgon.</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">6</td>
<td>1</td>
<td>1</td>
<td>14</td>
<td>It's not the Demogorgon.</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">7</td>
<td>1</td>
<td>1</td>
<td>15</td>
<td>An army of troglodytes charge into the chamber!</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">8</td>
<td>1</td>
<td>1</td>
<td>16</td>
<td>-Troglodytes? -Told ya. [chuckling]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">9</td>
<td>1</td>
<td>1</td>
<td>17</td>
<td>-[snorts] -[all chuckling]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">10</td>
<td>1</td>
<td>1</td>
<td>18</td>
<td>[softly] Wait a minute.</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">11</td>
<td>1</td>
<td>1</td>
<td>19</td>
<td>Did you hear that?</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">12</td>
<td>1</td>
<td>1</td>
<td>20</td>
<td>That... that sound?</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">13</td>
<td>1</td>
<td>1</td>
<td>21</td>
<td>Boom... boom...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">14</td>
<td>1</td>
<td>1</td>
<td>22</td>
<td>-[yells] Boom! -[slams table]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">15</td>
<td>1</td>
<td>1</td>
<td>23</td>
<td>That didn't come from the troglodytes. No, that...</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">16</td>
<td>1</td>
<td>1</td>
<td>24</td>
<td>That came from something else.</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">17</td>
<td>1</td>
<td>1</td>
<td>25</td>
<td>-The Demogorgon! -[all groaning]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">18</td>
<td>1</td>
<td>1</td>
<td>26</td>
<td>-We're in deep shit. -Will, your action!</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">19</td>
<td>1</td>
<td>1</td>
<td>27</td>
<td>-I don't know! -Fireball him!</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">20</td>
<td>1</td>
<td>1</td>
<td>28</td>
<td>I'd have to roll a 13 or higher!</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">21</td>
<td>1</td>
<td>1</td>
<td>29</td>
<td>Too risky. Cast a protection spell.</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">22</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td>-Don't be a pussy. Fireball him! -Cast Protection.</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">23</td>
<td>1</td>
<td>1</td>
<td>31</td>
<td>The Demogorgon is tired of your silly human bickering!</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">24</td>
<td>1</td>
<td>1</td>
<td>32</td>
<td>It stomps towards you.</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">25</td>
<td>1</td>
<td>1</td>
<td>33</td>
<td>-Boom! -Fireball him!</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">26</td>
<td>1</td>
<td>1</td>
<td>34</td>
<td>-Another stomp, boom! -Cast Protection.</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">27</td>
<td>1</td>
<td>1</td>
<td>35</td>
<td>-He roars in anger! -[all clamoring]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">28</td>
<td>1</td>
<td>1</td>
<td>36</td>
<td>-Fireball! -[die clattering]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">29</td>
<td>1</td>
<td>1</td>
<td>37</td>
<td>-Oh, shit! -[Lucas] Where'd it go?</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">30</td>
<td>1</td>
<td>1</td>
<td>38</td>
<td>[Lucas] Where is it? [Will] I don't know!</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
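<p>Here’s a tiny sketch of what <code>dropmissing()</code> does when it’s given a column, using a made-up three-row table (<code>toy</code> and its values are hypothetical, not taken from the dataset). Only the named column is checked, so rows with a <code>missing</code> in another column are kept.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using DataFrames

# Made-up data: row 2 has no dialogue; rows 1 and 3 have no stage direction
toy = DataFrame(
    line = 1:3,
    stage_direction = [missing, "[door slams]", missing],
    dialogue = ["Hello", missing, "Bye"],
)

# Keep only rows where :dialogue is not missing; :stage_direction is ignored
kept = dropmissing(toy, :dialogue)

nrow(kept)             # 2
eltype(kept.dialogue)  # String, since the column no longer needs to allow missing</code></pre></div>
</div>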
</section>
<section id="reshape-data" class="level1">
<h1>Reshape Data</h1>
<p>Cool, so this will get us just rows that actually have dialogue. But what we can see is that each row is a <em>line</em> of dialogue, whereas we actually want to tokenize this so that each row is a word.</p>
<p>To do this, we can use the <code>split</code> function, which lets us split a string at whatever delimiter we provide. In this case, that’s a space. For example:</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">split</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a man a plan a canal panama"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>7-element Vector{SubString{String}}:
 "a"
 "man"
 "a"
 "plan"
 "a"
 "canal"
 "panama"</code></pre>
</div>
</div>
<p>Or, using our actual data:</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">split</span>(dialogue_complete.dialogue[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">" "</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>7-element Vector{SubString{String}}:
 "Something"
 "is"
 "coming."
 "Something"
 "hungry"
 "for"
 "blood."</code></pre>
</div>
</div>
<p>It’s worth noting that, when called without a delimiter, <code>split()</code> splits on whitespace, so we can drop the final argument as well:</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb10-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">split</span>(dialogue_complete.dialogue[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>7-element Vector{SubString{String}}:
 "Something"
 "is"
 "coming."
 "Something"
 "hungry"
 "for"
 "blood."</code></pre>
</div>
</div>
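<p>The two forms give the same result here, but they aren’t identical in general. A quick sketch with a made-up string (not from the dataset) that contains a double space and a tab shows the difference: with an explicit <code>" "</code> delimiter, empty fields are kept and the tab isn’t treated as a separator, whereas the no-delimiter form splits on any whitespace and drops empty fields.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">s = "Wait  a\tminute."   # two spaces, then a tab

with_delim = split(s, " ")  # ["Wait", "", "a\tminute."]
no_delim = split(s)         # ["Wait", "a", "minute."]</code></pre></div>
</div>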
<p>So this gives us the first step of what we want to do in tokenizing the dialogue.</p>
<p>Let’s start putting this into a <code>chain</code>, which is similar to R’s pipe concept. And apparently there are several different chains/pipes in Julia, but the <code>Chain.jl</code> package seems reasonable to me so let’s just use that one.</p>
<p>We can begin a chain operation with the <code>@chain</code> macro, followed by the dataframe name and a <code>begin</code> keyword. We then list all of our operations and close the block with the <code>end</code> keyword. Like <code>tidyverse</code> functions in R, most <code>DataFrames.jl</code> functions expect a dataframe as the first argument, which makes them work well with chains.</p>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb12-1">df_split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">@chain</span> dialogue_complete <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">begin</span></span>
<span id="cb12-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb12-3">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>season,</span>
<span id="cb12-4">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>episode,</span>
<span id="cb12-5">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>line,</span>
<span id="cb12-6">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ByRow</span>(split) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split</span>
<span id="cb12-7">    )</span>
<span id="cb12-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<div class="data-frame"><p>26,435 rows × 4 columns</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">season</th>
<th data-quarto-table-cell-role="th">episode</th>
<th data-quarto-table-cell-role="th">line</th>
<th data-quarto-table-cell-role="th">dialogue_split</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Vector{SubString{String}}">Array…</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>["Something", "is", "coming.", "Something", "hungry", "for", "blood."]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>["A", "shadow", "grows", "on", "the", "wall", "behind", "you,", "swallowing", "you", "in", "darkness."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>["It", "is", "almost", "here.", "What", "is", "it?"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">4</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>["What", "if", "it's", "the", "Demogorgon?"]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">5</td>
<td>1</td>
<td>1</td>
<td>13</td>
<td>["Oh,", "Jesus,", "we're", "so", "screwed", "if", "it's", "the", "Demogorgon."]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">6</td>
<td>1</td>
<td>1</td>
<td>14</td>
<td>["It's", "not", "the", "Demogorgon."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">7</td>
<td>1</td>
<td>1</td>
<td>15</td>
<td>["An", "army", "of", "troglodytes", "charge", "into", "the", "chamber!"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">8</td>
<td>1</td>
<td>1</td>
<td>16</td>
<td>["Troglodytes?", "Told", "ya."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">9</td>
<td>1</td>
<td>1</td>
<td>17</td>
<td>[]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">10</td>
<td>1</td>
<td>1</td>
<td>18</td>
<td>["Wait", "a", "minute."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">11</td>
<td>1</td>
<td>1</td>
<td>19</td>
<td>["Did", "you", "hear", "that?"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">12</td>
<td>1</td>
<td>1</td>
<td>20</td>
<td>["That...", "that", "sound?"]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">13</td>
<td>1</td>
<td>1</td>
<td>21</td>
<td>["Boom...", "boom..."]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">14</td>
<td>1</td>
<td>1</td>
<td>22</td>
<td>["Boom!"]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">15</td>
<td>1</td>
<td>1</td>
<td>23</td>
<td>["That", "didn't", "come", "from", "the", "troglodytes.", "No,", "that..."]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">16</td>
<td>1</td>
<td>1</td>
<td>24</td>
<td>["That", "came", "from", "something", "else."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">17</td>
<td>1</td>
<td>1</td>
<td>25</td>
<td>["The", "Demogorgon!"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">18</td>
<td>1</td>
<td>1</td>
<td>26</td>
<td>["We're", "in", "deep", "shit.", "Will,", "your", "action!"]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">19</td>
<td>1</td>
<td>1</td>
<td>27</td>
<td>["I", "don't", "know!", "Fireball", "him!"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">20</td>
<td>1</td>
<td>1</td>
<td>28</td>
<td>["I'd", "have", "to", "roll", "a", "13", "or", "higher!"]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">21</td>
<td>1</td>
<td>1</td>
<td>29</td>
<td>["Too", "risky.", "Cast", "a", "protection", "spell."]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">22</td>
<td>1</td>
<td>1</td>
<td>30</td>
<td>["Don't", "be", "a", "pussy.", "Fireball", "him!", "Cast", "Protection."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">23</td>
<td>1</td>
<td>1</td>
<td>31</td>
<td>["The", "Demogorgon", "is", "tired", "of", "your", "silly", "human", "bickering!"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">24</td>
<td>1</td>
<td>1</td>
<td>32</td>
<td>["It", "stomps", "towards", "you."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">25</td>
<td>1</td>
<td>1</td>
<td>33</td>
<td>["Boom!", "Fireball", "him!"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">26</td>
<td>1</td>
<td>1</td>
<td>34</td>
<td>["Another", "stomp,", "boom!", "Cast", "Protection."]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">27</td>
<td>1</td>
<td>1</td>
<td>35</td>
<td>["He", "roars", "in", "anger!"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">28</td>
<td>1</td>
<td>1</td>
<td>36</td>
<td>["Fireball!"]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">29</td>
<td>1</td>
<td>1</td>
<td>37</td>
<td>["Oh,", "shit!", "Where'd", "it", "go?"]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">30</td>
<td>1</td>
<td>1</td>
<td>38</td>
<td>["Where", "is", "it?", "I", "don't", "know!"]</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>Technically we don’t <em>need</em> to chain anything above since we’re just doing one operation (<code>select()</code>) right now, but we’ll add more soon.</p>
<p>One thing you might notice in the final line within <code>select()</code> is that the notation for “doing things” to a column (this comes from DataFrames.jl rather than base Julia) is <code>input_col =&gt; function =&gt; output_col</code>. In the case above, we’re supplying an anonymous function (that’s the <code>x -&gt; fun(x, …)</code> syntax) and wrapping it in a special <code>ByRow()</code> function, which applies the function to each row’s value instead of to the whole column at once.</p>
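<p>To make that pattern concrete, here’s a minimal sketch on a made-up two-row data frame (the <code>toy</code> table and its column names are hypothetical, not part of the Stranger Things data). It should show that <code>ByRow(f)</code> gives the same result as broadcasting <code>f</code> over the column yourself:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using DataFrames

# Hypothetical toy data, just for illustration
toy = DataFrame(id = 1:2, text = ["a b", "c d e"])

# ByRow(f) calls f once per row value in :text
by_row = select(toy, :id, :text =&gt; ByRow(x -&gt; length(split(x))) =&gt; :n_words)

# Without ByRow, the function receives the whole column, so we broadcast ourselves
by_col = select(toy, :id, :text =&gt; (col -&gt; length.(split.(col))) =&gt; :n_words)

by_row == by_col  # both hold n_words = [2, 3]</code></pre></div>
</div>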
<p>All that said, <code>df_split</code> doesn’t quite give us what we want if we look at its first two rows:</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">first</span>(df_split, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<div class="data-frame"><p>2 rows × 4 columns</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">season</th>
<th data-quarto-table-cell-role="th">episode</th>
<th data-quarto-table-cell-role="th">line</th>
<th data-quarto-table-cell-role="th">dialogue_split</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Vector{SubString{String}}">Array…</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>["Something", "is", "coming.", "Something", "hungry", "for", "blood."]</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>["A", "shadow", "grows", "on", "the", "wall", "behind", "you,", "swallowing", "you", "in", "darkness."]</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>Our <code>dialogue_split</code> column is a vector of vectors. To get around this, we want to flatten the column so that each row contains a single word. The nice thing about our chain operation above is that we can just plunk the <code>flatten()</code> function right on the end to do this:</p>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb14-1">df_split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">@chain</span> dialogue_complete <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">begin</span></span>
<span id="cb14-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb14-3">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>season,</span>
<span id="cb14-4">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>episode,</span>
<span id="cb14-5">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>line,</span>
<span id="cb14-6">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ByRow</span>(split) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split</span>
<span id="cb14-7">    )</span>
<span id="cb14-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">flatten</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split)</span>
<span id="cb14-9"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<div class="data-frame"><p>145,243 rows × 4 columns</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">season</th>
<th data-quarto-table-cell-role="th">episode</th>
<th data-quarto-table-cell-role="th">line</th>
<th data-quarto-table-cell-role="th">dialogue_split</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="SubString{String}">SubStrin…</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>Something</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>is</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>coming.</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">4</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>Something</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">5</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>hungry</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">6</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>for</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">7</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>blood.</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">8</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>A</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">9</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>shadow</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">10</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>grows</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">11</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>on</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">12</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>the</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">13</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>wall</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">14</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>behind</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">15</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>you,</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">16</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>swallowing</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">17</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>you</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">18</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>in</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">19</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>darkness.</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">20</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>It</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">21</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>is</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">22</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>almost</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">23</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>here.</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">24</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>What</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">25</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>is</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">26</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>it?</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">27</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>What</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">28</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>if</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">29</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>it's</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">30</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>the</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>Better! Now let’s check out the first 10 elements of our dialogue split column:</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">first</span>(df_split.<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>SubString{String}["Something", "is", "coming.", "Something", "hungry", "for", "blood.", "A", "shadow", "grows"]</code></pre>
</div>
</div>
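<p>If <code>flatten()</code> is new to you, here’s a small sketch of what it does on its own, using a hypothetical toy table rather than the real data. Each element of the nested column gets its own row, and the values in the other columns are repeated to match:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using DataFrames

# Hypothetical toy data: one vector of words per line
toy = DataFrame(line = [1, 2], words = [["a", "b"], ["c"]])

# One output row per element of :words
flat = flatten(toy, :words)

flat.line   # [1, 1, 2]
flat.words  # ["a", "b", "c"]</code></pre></div>
</div>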
</section>
<section id="clean-text" class="level1">
<h1>Clean Text</h1>
<p>So, it’s not ideal that we have punctuation in here. We don’t want, for instance, “blood” to be considered a different word than “blood.” when we count words later. Same deal for uppercase and lowercase letters – we want “something” to be the same as “Something”. So we need to strip punctuation and lowercase everything.</p>
<p>First, we can write a small function to strip punctuation from the start and end of each word.</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb17-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">strip_punc</span>(x)</span>
<span id="cb17-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">strip</span>(x, [<span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">','</span>, <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">';'</span>, <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">'.'</span>, <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">'?'</span>, <span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">'!'</span>])</span>
<span id="cb17-3"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="13">
<pre><code>strip_punc (generic function with 1 method)</code></pre>
</div>
</div>
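<p>One thing worth knowing before we use it: <code>strip()</code> only trims characters from the start and end of a string, so punctuation in the middle of a word is left alone. A quick sketch (the function is repeated so the snippet runs on its own, and the example strings are made up):</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># Same definition as above, repeated so this snippet is self-contained
strip_punc(x) = strip(x, [',', ';', '.', '?', '!'])

a = strip_punc("boom...")       # "boom"
b = strip_punc("it's")          # "it's" (the apostrophe isn't in our list)
c = strip_punc("wait...what?")  # "wait...what" (interior dots survive)</code></pre></div>
</div>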
<p>And Julia already has a <code>lowercase()</code> function built in. Now, let’s jam these on the end of the chain we already have:</p>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb19-1">df_split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">@chain</span> dialogue_complete <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">begin</span></span>
<span id="cb19-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(</span>
<span id="cb19-3">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>season,</span>
<span id="cb19-4">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>episode,</span>
<span id="cb19-5">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>line,</span>
<span id="cb19-6">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ByRow</span>(split) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split</span>
<span id="cb19-7">    )</span>
<span id="cb19-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">flatten</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split)</span>
<span id="cb19-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">transform</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ByRow</span>(lowercase) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split)</span>
<span id="cb19-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">transform</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_split <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ByRow</span>(strip_punc) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_stripped)</span>
<span id="cb19-11"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<div class="data-frame"><p>145,243 rows × 5 columns</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">season</th>
<th data-quarto-table-cell-role="th">episode</th>
<th data-quarto-table-cell-role="th">line</th>
<th data-quarto-table-cell-role="th">dialogue_split</th>
<th data-quarto-table-cell-role="th">dialogue_stripped</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="String">String</th>
<th data-quarto-table-cell-role="th" title="SubString{String}">SubStrin…</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>something</td>
<td>something</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>is</td>
<td>is</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>coming.</td>
<td>coming</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">4</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>something</td>
<td>something</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">5</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>hungry</td>
<td>hungry</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">6</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>for</td>
<td>for</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">7</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>blood.</td>
<td>blood</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">8</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>a</td>
<td>a</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">9</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>shadow</td>
<td>shadow</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">10</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>grows</td>
<td>grows</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">11</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>on</td>
<td>on</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">12</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>the</td>
<td>the</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">13</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>wall</td>
<td>wall</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">14</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>behind</td>
<td>behind</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">15</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>you,</td>
<td>you</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">16</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>swallowing</td>
<td>swallowing</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">17</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>you</td>
<td>you</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">18</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>in</td>
<td>in</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">19</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>darkness.</td>
<td>darkness</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">20</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>it</td>
<td>it</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">21</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>is</td>
<td>is</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">22</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>almost</td>
<td>almost</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">23</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>here.</td>
<td>here</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">24</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>what</td>
<td>what</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">25</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>is</td>
<td>is</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">26</td>
<td>1</td>
<td>1</td>
<td>11</td>
<td>it?</td>
<td>it</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">27</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>what</td>
<td>what</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">28</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>if</td>
<td>if</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">29</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>it's</td>
<td>it's</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">30</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>the</td>
<td>the</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>Confirming that this worked:</p>
<div class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb20-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">show</span>(df_split.<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_stripped[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>])</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>SubString{String}["something", "is", "coming", "something", "hungry", "for", "blood", "a", "shadow", "grows"]</code></pre>
</div>
</div>
<p>Splendid.</p>
</section>
<section id="remove-stop-words" class="level1">
<h1>Remove Stop Words</h1>
<p>The next step is to get rid of stop words, because we don’t really care about counting those. There’s a list of stop words in the <code>Languages.jl</code> package that we’ll use.</p>
<div class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb22-1">stops <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stopwords</span>(Languages.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">English</span>())</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="16">
<pre><code>488-element Vector{String}:
 "a"
 "about"
 "above"
 "across"
 "after"
 "again"
 "against"
 "all"
 "almost"
 "alone"
 "along"
 "already"
 "also"
 ⋮
 "you'd"
 "you'll"
 "young"
 "younger"
 "youngest"
 "your"
 "you're"
 "yours"
 "yourself"
 "yourselves"
 "you've"
 "z"</code></pre>
</div>
</div>
<p>Swell. Now that we have this, we can subset (filter in R terms) our dataset to include only rows with words not in the list of stop words.</p>
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb24-1">dialogue_no_stops <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">subset</span>(</span>
<span id="cb24-2">    df_split,</span>
<span id="cb24-3">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_stripped <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> .!<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">in</span>.(x, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Ref</span>(stops))</span>
<span id="cb24-4">    )</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="17">
<div class="data-frame"><p>50,812 rows × 5 columns</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">season</th>
<th data-quarto-table-cell-role="th">episode</th>
<th data-quarto-table-cell-role="th">line</th>
<th data-quarto-table-cell-role="th">dialogue_split</th>
<th data-quarto-table-cell-role="th">dialogue_stripped</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
<th data-quarto-table-cell-role="th" title="String">String</th>
<th data-quarto-table-cell-role="th" title="SubString{String}">SubStrin…</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>coming.</td>
<td>coming</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>hungry</td>
<td>hungry</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>1</td>
<td>1</td>
<td>9</td>
<td>blood.</td>
<td>blood</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">4</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>shadow</td>
<td>shadow</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">5</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>grows</td>
<td>grows</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">6</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>wall</td>
<td>wall</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">7</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>swallowing</td>
<td>swallowing</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">8</td>
<td>1</td>
<td>1</td>
<td>10</td>
<td>darkness.</td>
<td>darkness</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">9</td>
<td>1</td>
<td>1</td>
<td>12</td>
<td>demogorgon?</td>
<td>demogorgon</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">10</td>
<td>1</td>
<td>1</td>
<td>13</td>
<td>oh,</td>
<td>oh</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">11</td>
<td>1</td>
<td>1</td>
<td>13</td>
<td>jesus,</td>
<td>jesus</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">12</td>
<td>1</td>
<td>1</td>
<td>13</td>
<td>screwed</td>
<td>screwed</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">13</td>
<td>1</td>
<td>1</td>
<td>13</td>
<td>demogorgon.</td>
<td>demogorgon</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">14</td>
<td>1</td>
<td>1</td>
<td>14</td>
<td>demogorgon.</td>
<td>demogorgon</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">15</td>
<td>1</td>
<td>1</td>
<td>15</td>
<td>army</td>
<td>army</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">16</td>
<td>1</td>
<td>1</td>
<td>15</td>
<td>troglodytes</td>
<td>troglodytes</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">17</td>
<td>1</td>
<td>1</td>
<td>15</td>
<td>charge</td>
<td>charge</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">18</td>
<td>1</td>
<td>1</td>
<td>15</td>
<td>chamber!</td>
<td>chamber</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">19</td>
<td>1</td>
<td>1</td>
<td>16</td>
<td>troglodytes?</td>
<td>troglodytes</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">20</td>
<td>1</td>
<td>1</td>
<td>16</td>
<td>told</td>
<td>told</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">21</td>
<td>1</td>
<td>1</td>
<td>16</td>
<td>ya.</td>
<td>ya</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">22</td>
<td>1</td>
<td>1</td>
<td>18</td>
<td>wait</td>
<td>wait</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">23</td>
<td>1</td>
<td>1</td>
<td>18</td>
<td>minute.</td>
<td>minute</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">24</td>
<td>1</td>
<td>1</td>
<td>19</td>
<td>hear</td>
<td>hear</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">25</td>
<td>1</td>
<td>1</td>
<td>20</td>
<td>sound?</td>
<td>sound</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">26</td>
<td>1</td>
<td>1</td>
<td>21</td>
<td>boom...</td>
<td>boom</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">27</td>
<td>1</td>
<td>1</td>
<td>21</td>
<td>boom...</td>
<td>boom</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">28</td>
<td>1</td>
<td>1</td>
<td>22</td>
<td>boom!</td>
<td>boom</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">29</td>
<td>1</td>
<td>1</td>
<td>23</td>
<td>troglodytes.</td>
<td>troglodytes</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">30</td>
<td>1</td>
<td>1</td>
<td>24</td>
<td>else.</td>
<td>else</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
<td>⋮</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>If you’re not familiar with Julia, the <code>.</code> is a way to broadcast/vectorize operations, which mostly aren’t vectorized by default. The <code>Ref()</code> wrapper around our stop words tells broadcasting to treat the whole vector as a single value, so each word gets checked against the full list instead of Julia trying to pair up the words and the stop words element by element. Either way, this does what we want!</p>
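<p>For what it’s worth, here’s a small sketch of what <code>Ref()</code> changes, using made-up <code>words</code> and <code>stops</code> vectors rather than the real data. As I understand Julia’s broadcasting, wrapping a collection in <code>Ref()</code> makes it behave like a single value, so every word is checked against the whole list:</p>
<div class="cell">
<div class="sourceCode cell-code"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># Hypothetical toy vectors, just for illustration
words = ["the", "demogorgon", "is", "coming"]
stops = ["the", "is"]

# Ref(stops) is treated as a scalar, so each word is tested against all of stops
keep = .!in.(words, Ref(stops))   # [false, true, false, true]
kept = words[keep]                # ["demogorgon", "coming"]

# Without Ref, broadcasting tries to pair words with stops element by element,
# and the lengths (4 vs. 2) don't match
no_ref_fails = try
    in.(words, stops)
    false
catch err
    err isa DimensionMismatch
end

# A non-broadcast alternative: in(stops) returns a one-argument function
kept_filter = filter(!in(stops), words)   # ["demogorgon", "coming"]</code></pre></div>
</div>
<p>The <code>filter(!in(stops), words)</code> form sidesteps broadcasting entirely, which some people find easier to read. Which one you use is mostly a matter of taste.</p>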
</section>
<section id="getting-the-top-20-words" class="level1">
<h1>Getting the Top 20 Words</h1>
<p>We’re almost there, fam. We’ve got a dataset in the format we want it in, and we’ve done some light cleaning. Now, let’s count how often each word is used and select the top 20 most common. Again, we’re going to chain some operations together.</p>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb25-1">top_20 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">@chain</span> dialogue_no_stops <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">begin</span></span>
<span id="cb25-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">groupby</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_stripped)</span>
<span id="cb25-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">combine</span>(nrow <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=&gt;</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>count)</span>
<span id="cb25-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sort</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>count, rev <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">true</span>)</span>
<span id="cb25-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">first</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>)</span>
<span id="cb25-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="18">
<div class="data-frame"><p>20 rows × 2 columns</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">dialogue_stripped</th>
<th data-quarto-table-cell-role="th">count</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="SubString{String}">SubStrin…</th>
<th data-quarto-table-cell-role="th" title="Int64">Int64</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>♪</td>
<td>1386</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>yeah</td>
<td>1106</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>okay</td>
<td>960</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">4</td>
<td>oh</td>
<td>670</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">5</td>
<td>hey</td>
<td>631</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">6</td>
<td>shit</td>
<td>456</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">7</td>
<td>gonna</td>
<td>427</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">8</td>
<td>uh</td>
<td>396</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">9</td>
<td>mean</td>
<td>310</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">10</td>
<td>time</td>
<td>284</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">11</td>
<td>sorry</td>
<td>281</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">12</td>
<td>look</td>
<td>242</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">13</td>
<td>tell</td>
<td>240</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">14</td>
<td>mike</td>
<td>234</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">15</td>
<td>stop</td>
<td>227</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">16</td>
<td>maybe</td>
<td>225</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">17</td>
<td>please</td>
<td>224</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">18</td>
<td>max</td>
<td>213</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">19</td>
<td>god</td>
<td>211</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">20</td>
<td>little</td>
<td>211</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>I’m actually not going to explain the above because I think it’s pretty intuitive if you’ve been following along so far and are familiar with either R or Python functions (the function names here are pretty descriptive, I think).</p>
</section>
<section id="plotting" class="level1">
<h1>Plotting</h1>
<p>Ok, so, as much as I like Julia so far, plotting does feel difficult. I’ve mostly used <code>Makie</code> and its counterparts, and I think I’m almost starting to get a handle on them, but they definitely don’t feel as intuitive to me as, say, <code>ggplot2</code>.</p>
<p>Full transparency – making this little plot took me more time than I wanted it to, and it’s entirely due to labeling the y-axis ticks. So, uh, here’s the code to make the plot, and just know that I don’t fully understand why some options accept vectors while others want tuples.</p>
<div class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb26-1">lbls <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Rank "</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reverse</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">string</span>.(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>))</span>
<span id="cb26-2"></span>
<span id="cb26-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">barplot</span>(</span>
<span id="cb26-4">    <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nrow</span>(top_20),</span>
<span id="cb26-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reverse</span>(top_20.count),</span>
<span id="cb26-6">    direction <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>x,</span>
<span id="cb26-7">    bar_labels <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">reverse</span>(top_20.<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>dialogue_stripped),</span>
<span id="cb26-8">    flip_labels_at <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">median</span>(top_20.count),</span>
<span id="cb26-9">    axis <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb26-10">        yticks <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, lbls),</span>
<span id="cb26-11">        title <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Most Common Words in Stranger Things"</span>,</span>
<span id="cb26-12">        xlabel <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Times Said"</span></span>
<span id="cb26-13">    ),</span>
<span id="cb26-14">)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<p><img src="https://www.ericekholm.com/posts/stranger-strings/index_files/figure-html/cell-19-output-1.svg" class="img-fluid"></p>
</div>
</div>
<p>Et voila – we’ve taken a dataframe with dialogue, tokenized it, cleaned it a little bit, and found the top 20 most common words. We could modify our list of stop words a little if we wanted to get rid of things like “oh”, “okay”, “uh”, and whatnot, but I’m not going to bother with that here. I hope you learned as much from reading this as I did from writing it!</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric},
  title = {Stranger {Strings}},
  date = {2022-10-26},
  url = {https://www.ericekholm.com/posts/stranger-strings},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2022. <span>“Stranger Strings.”</span> October 26, 2022.
<a href="https://www.ericekholm.com/posts/stranger-strings">https://www.ericekholm.com/posts/stranger-strings</a>.
</div></div></section></div> ]]></description>
  <category>Julia</category>
  <category>Text Analysis</category>
  <category>TidyTuesday</category>
  <guid>https://www.ericekholm.com/posts/stranger-strings/index.html</guid>
  <pubDate>Wed, 26 Oct 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>MLE Learning Out Loud 2: Logistic Regression</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/mle-logreg-julia/index.html</link>
  <description><![CDATA[ 




<p>In a <a href="https://www.ericekholm.com/posts/mle-learning-julia/">previous post</a>, I did some “learning out loud” by practicing estimating a few models via maximum likelihood by hand. In this short blog, I figured I could extend this learning by applying what I learned previously to logistic regression.</p>
<p>As a reminder, the point of these “learning out loud” posts is to give myself a medium to work through concepts. Hopefully these metacognitive exercises will benefit others, too. The concepts I’m covering here are things that I’m either learning anew or brushing back up on after not using for a while. But either way, I’m not trying to portray myself as an expert. If you are an expert and you notice I’m doing something wrong, I’d love to hear from you!</p>
<section id="stating-the-problem" class="level1">
<h1>Stating the Problem</h1>
<p>So, what I want to do here is get point estimates for the coefficients in a logistic regression model “by hand” (or mostly by hand). I’m going to be doing this in Julia, because I’m also interested in getting better at Julia stuff, but obviously the concepts are the same across any programming language.</p>
</section>
<section id="setup" class="level1">
<h1>Setup</h1>
<p>First, we’ll load the libraries we’re using here and set a seed:</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">GLM </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#to check my work against</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Distributions </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for the Bernoulli distribution</span></span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#to set a seed</span></span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Optim </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#to do the actual optimizing</span></span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Statistics </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#mean and std</span></span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">RDatasets </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#to get data</span></span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span>.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seed!</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<pre><code>TaskLocalRNG()</code></pre>
</div>
</div>
</section>
<section id="load-and-preprocess-data" class="level1">
<h1>Load and Preprocess Data</h1>
<p>Next, we’ll load in some data and do some light preprocessing. We’ll use the <code>Default</code> data from the <code>RDatasets</code> package, which presents features describing a given person as well as a binary indicator of whether they defaulted on a credit card payment.</p>
<p>After loading the data, we’ll pull out the default variable, dummy code it, and then assign it to a vector called <code>y</code>. We’ll also select just the “balance” and “income” columns of the data and assign those to <code>X</code>. There are other columns we could use as predictors, but that’s not really the point here.</p>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb3-1">data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> RDatasets.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dataset</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ISLR"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Default"</span>)</span>
<span id="cb3-2"></span>
<span id="cb3-3">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [r.Default <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Yes"</span> ? <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> for r <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">eachrow</span>(data)]</span>
<span id="cb3-4"></span>
<span id="cb3-5">X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> data[<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>, [<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>Balance, <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>Income]]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<div class="data-frame"><p>10,000 rows × 2 columns</p>
<table class="data-frame table table-sm table-striped small" data-quarto-postprocess="true">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th">Balance</th>
<th data-quarto-table-cell-role="th">Income</th>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th"></th>
<th data-quarto-table-cell-role="th" title="Float64">Float64</th>
<th data-quarto-table-cell-role="th" title="Float64">Float64</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td data-quarto-table-cell-role="th">1</td>
<td>729.526</td>
<td>44361.6</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">2</td>
<td>817.18</td>
<td>12106.1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">3</td>
<td>1073.55</td>
<td>31767.1</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">4</td>
<td>529.251</td>
<td>35704.5</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">5</td>
<td>785.656</td>
<td>38463.5</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">6</td>
<td>919.589</td>
<td>7491.56</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">7</td>
<td>825.513</td>
<td>24905.2</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">8</td>
<td>808.668</td>
<td>17600.5</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">9</td>
<td>1161.06</td>
<td>37468.5</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">10</td>
<td>0.0</td>
<td>29275.3</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">11</td>
<td>0.0</td>
<td>21871.1</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">12</td>
<td>1220.58</td>
<td>13268.6</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">13</td>
<td>237.045</td>
<td>28251.7</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">14</td>
<td>606.742</td>
<td>44994.6</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">15</td>
<td>1112.97</td>
<td>23810.2</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">16</td>
<td>286.233</td>
<td>45042.4</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">17</td>
<td>0.0</td>
<td>50265.3</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">18</td>
<td>527.54</td>
<td>17636.5</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">19</td>
<td>485.937</td>
<td>61566.1</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">20</td>
<td>1095.07</td>
<td>26464.6</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">21</td>
<td>228.953</td>
<td>50500.2</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">22</td>
<td>954.262</td>
<td>32457.5</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">23</td>
<td>1055.96</td>
<td>51317.9</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">24</td>
<td>641.984</td>
<td>30466.1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">25</td>
<td>773.212</td>
<td>34353.3</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">26</td>
<td>855.009</td>
<td>25211.3</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">27</td>
<td>643.0</td>
<td>41473.5</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">28</td>
<td>1454.86</td>
<td>32189.1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">29</td>
<td>615.704</td>
<td>39376.4</td>
</tr>
<tr class="even">
<td data-quarto-table-cell-role="th">30</td>
<td>1119.57</td>
<td>16556.1</td>
</tr>
<tr class="odd">
<td data-quarto-table-cell-role="th">⋮</td>
<td>⋮</td>
<td>⋮</td>
</tr>
</tbody>
</table>
</div>
</div>
</div>
<p>Next, we’ll z-score the predictor variables, convert them to a matrix, and prepend a column vector of ones to the matrix (so we can estimate the intercept). The <code>mapcols()</code> function from <code>DataFrames.jl</code> will apply the <code>z_score</code> function to each column of <code>X</code> (there are only 2 in this case).</p>
<p>First, we’ll define a z-score function:</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb4-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">z_score</span>(x)</span>
<span id="cb4-2">    u <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(x)</span>
<span id="cb4-3">    s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">std</span>(x)</span>
<span id="cb4-4"></span>
<span id="cb4-5">    res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">Float64</span>[]</span>
<span id="cb4-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lastindex</span>(x)</span>
<span id="cb4-7">        tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (x[i] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> u) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> s</span>
<span id="cb4-8">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">push!</span>(res, tmp)</span>
<span id="cb4-9">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb4-10"></span>
<span id="cb4-11">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> res</span>
<span id="cb4-12"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<pre><code>z_score (generic function with 1 method)</code></pre>
</div>
</div>
<p>And then we’ll actually apply it.</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb6-1">Xz <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hcat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ones</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(y)), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Matrix</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mapcols</span>(z_score, X)))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<pre><code>10000×3 Matrix{Float64}:
 1.0  -0.218824    0.813147
 1.0  -0.037614   -1.60542
 1.0   0.492386   -0.131206
 1.0  -0.632861    0.164023
 1.0  -0.102786    0.370897
 1.0   0.174098   -1.95142
 1.0  -0.0203871  -0.645722
 1.0  -0.0552131  -1.19344
 1.0   0.673295    0.296293
 1.0  -1.727      -0.31805
 1.0  -1.727      -0.873227
 1.0   0.796355   -1.51825
 1.0  -1.23695    -0.394799
 ⋮                
 1.0  -1.727       0.616625
 1.0   0.338849   -1.01252
 1.0  -0.957166   -0.610505
 1.0  -0.36504     1.59599
 1.0   0.571147    0.897805
 1.0   0.213889    1.73331
 1.0  -1.37056    -1.39173
 1.0  -0.255977    1.46029
 1.0  -0.160036   -1.03896
 1.0   0.02075     1.88347
 1.0   1.51667     0.236351
 1.0  -1.31163    -1.24874</code></pre>
</div>
</div>
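<p>As an aside, the same z-score can be sketched more compactly with broadcasting. The name <code>z_score_bc</code> is made up here for illustration, and this assumes <code>x</code> is a plain vector of numbers; it should give the same values as the loop version above.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using Statistics

# hypothetical alternative to z_score(): same math, written with broadcasting
z_score_bc(x) = (x .- mean(x)) ./ std(x)

# toy input: mean is 2.0 and (sample) standard deviation is 1.0
z = z_score_bc([1.0, 2.0, 3.0]) # [-1.0, 0.0, 1.0]</code></pre></div>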
</section>
<section id="define-a-logistic-function" class="level1">
<h1>Define a Logistic Function</h1>
<p>Next, we’ll write a function that implements the logistic transformation. This is built into the <code>StatsFuns.jl</code> package, but I want to write it out by hand to reinforce what it is. We’ll use this to predict y values with a given input (which will actually be X*<img src="https://latex.codecogs.com/png.latex?%5Cbeta">).</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb8-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">my_logistic</span>(x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">exp</span>(x))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>my_logistic (generic function with 1 method)</code></pre>
</div>
</div>
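<p>One caveat with writing the function this way: for very large inputs, <code>exp(x)</code> overflows to <code>Inf</code> and the ratio becomes <code>NaN</code>. That probably won’t come up with z-scored predictors, but here is a rough sketch of a variant that avoids it. The name <code>stable_logistic</code> is hypothetical, not something from a package.</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">my_logistic(x) = exp(x) / (1 + exp(x))

# hypothetical variant: branch on the sign so exp() never gets a large positive argument
stable_logistic(x) = x &gt;= 0 ? 1 / (1 + exp(-x)) : exp(x) / (1 + exp(x))

my_logistic(0.0)        # 0.5
my_logistic(1000.0)     # NaN, because exp(1000.0) is Inf and Inf / Inf is NaN
stable_logistic(1000.0) # 1.0</code></pre></div>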
</section>
<section id="define-a-maximum-likelihood-estimator" class="level1">
<h1>Define a Maximum Likelihood Estimator</h1>
<p>Now that we have some data, we can write a function that uses maximum likelihood estimation to give us the best <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> parameters for our given <strong>X</strong> and y. If you want to brush up on maximum likelihood, you can read <a href="https://www.ericekholm.com/posts/mle-learning-julia/">my previous “learning out loud” post</a>, or you can probably find materials written by someone who knows way more than I do. Either way, I’m not going to recap what MLE is here.</p>
<p>Let’s define our function that we’ll use to estimate <img src="https://latex.codecogs.com/png.latex?%5Cbeta">. The important thing to keep in mind is that the return value of this function isn’t the <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> values, but rather the negative log likelihood, since this is the quantity we want to minimize.</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb10-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ml_logreg</span>(x, y, b)</span>
<span id="cb10-2"></span>
<span id="cb10-3">    ŷ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">my_logistic</span>.(x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>b)</span>
<span id="cb10-4">    res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dt" style="color: #AD0000;
background-color: null;
font-style: inherit;">Float64</span>[]</span>
<span id="cb10-5"></span>
<span id="cb10-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lastindex</span>(y)</span>
<span id="cb10-7">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">push!</span>(res, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logpdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Bernoulli</span>(ŷ[i]), y[i]))</span>
<span id="cb10-8">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb10-9"></span>
<span id="cb10-10">    ret <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">-sum</span>(res)</span>
<span id="cb10-11"></span>
<span id="cb10-12">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> ret</span>
<span id="cb10-13"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>ml_logreg (generic function with 1 method)</code></pre>
</div>
</div>
<p>So what’s going on in this code?</p>
<ol type="1">
<li>We’re getting <img src="https://latex.codecogs.com/png.latex?y%CC%82"> estimates for a given x and b by running them through the <code>my_logistic()</code> function. This will give us a 10000x1 vector.</li>
<li>We’re instantiating an empty vector that will (eventually) contain Float64 values.</li>
<li>For each index in <img src="https://latex.codecogs.com/png.latex?y%CC%82"> (i.e.&nbsp;1 through 10000), we’re getting the log-likelihood of the true outcome (y[i]) given a Bernoulli distribution parameterized by success rate <img src="https://latex.codecogs.com/png.latex?y%CC%82">[i].</li>
</ol>
<p>I think this is the trickiest part of the whole problem, so I want to put it into words to make sure I understand it. In our problem, our y values are either 0 or 1. And the output of the <code>my_logistic()</code> function is going to be, for each y, a predicted probability that <img src="https://latex.codecogs.com/png.latex?y%20=%201">, i.e.&nbsp;a predicted success rate. Since a Bernoulli distribution is parameterized by a given success rate and models the outcome of a single yes/no (1/0) trial, it makes sense to use this to generate the likelihoods we want to maximize.</p>
<p>More concretely, the likelihoods we get will be dependent on:</p>
<ol type="1">
<li>the provided success rate <em>p</em>, and</li>
<li>the actual outcome</li>
</ol>
<p>Values of <em>p</em> that are closer to the actual outcome will produce larger log-likelihoods:</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb12-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logpdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Bernoulli</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span>), <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>-0.6931471805599453</code></pre>
</div>
</div>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#will be larger than the previous</span></span>
<span id="cb14-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logpdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Bernoulli</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.8</span>), <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<pre><code>-0.2231435513142097</code></pre>
</div>
</div>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb16-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#will be even larger</span></span>
<span id="cb16-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logpdf</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Bernoulli</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.99</span>), <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<pre><code>-0.01005033585350145</code></pre>
</div>
</div>
<p>And inversely, you can imagine that if the outcome were 0, we’d want our predicted success rate to be very low.</p>
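<p>To make the outcome-0 case concrete, here is a small sketch using the same <code>logpdf()</code> and <code>Bernoulli()</code> calls as above. The two success rates are arbitrary values chosen for illustration; the first call should return the larger (less negative) log-likelihood, because .01 is much closer to an outcome of 0 than .8 is.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using Distributions

#outcome is 0, and the predicted success rate is very low
low_p = logpdf(Bernoulli(.01), 0)

#outcome is 0, but the predicted success rate is high
high_p = logpdf(Bernoulli(.8), 0)

#true: the low success rate fits an outcome of 0 better
low_p &gt; high_p</code></pre></div>
</div>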
<p>Returning to our <code>ml_logreg()</code> function, what we’re doing then is applying this logic to all of our <img src="https://latex.codecogs.com/png.latex?y%CC%82"> and corresponding <em>y</em> values (i.e.&nbsp;we’re getting the log-likelihood of <em>y</em> for a given <img src="https://latex.codecogs.com/png.latex?y%CC%82">), and then we’re creating a vector with all of these log-likelihoods – that’s what the <code>push!(...)</code> notation is doing – pushing these log-likelihoods to the empty float vector we created.</p>
<p>Finally, we’re summing all of our log-likelihoods and then multiplying the result by negative one, since the optimizer we’re using actually wants to <em>minimize</em> a loss function rather than <em>maximize</em> a likelihood.</p>
<p>We can run this function by providing any X, y, and <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, and it’ll give us back a negative log-likelihood – the negative sum of all of the individual log-likelihoods.</p>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb18-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#just some arbitrary numbers to test the function with</span></span>
<span id="cb18-2">start_vals <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span>]</span>
<span id="cb18-3"></span>
<span id="cb18-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ml_logreg</span>(Xz, y, start_vals)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<pre><code>7372.506385031871</code></pre>
</div>
</div>
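<p>The loop-and-sum logic described above can also be sketched in a more compact, broadcasted form. This is just an illustration under some assumptions: <code>neg_loglik()</code> is a hypothetical helper (not the <code>ml_logreg()</code> function from this post), it takes predicted probabilities directly rather than X and <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, and the toy values are made up.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using Distributions

#hypothetical helper: negative log-likelihood of 0/1 outcomes y
#given one predicted success rate per outcome
function neg_loglik(preds, y)
    #broadcasting builds one Bernoulli per prediction and scores each outcome
    lls = logpdf.(Bernoulli.(preds), y)
    #flip the sign so that an optimizer can minimize it
    return -sum(lls)
end

#toy values, made up for illustration
y_toy = [1, 0, 1, 1]
good_preds = [.9, .1, .8, .7]
bad_preds = [.2, .9, .3, .4]

#true: predictions closer to the outcomes give a smaller negative log-likelihood
neg_loglik(good_preds, y_toy) &lt; neg_loglik(bad_preds, y_toy)</code></pre></div>
</div>
<p>The broadcasted version avoids growing a vector with <code>push!()</code>, but the result should be the same sum either way.</p>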
</section>
<section id="optimize-beta" class="level1">
<h1>Optimize <img src="https://latex.codecogs.com/png.latex?%5Cbeta"></h1>
<p>So the above gives us the negative log-likelihood for a starting value of <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, but we want to find the <em>best</em> values of <img src="https://latex.codecogs.com/png.latex?%5Cbeta">. To do that, we can optimize the function. Like I said in my previous post, the optimizers are written by people much smarter than I am, so I’m just going to use the <code>Optim</code> package rather than futz around with doing any, like, calculus by hand – although maybe that’s a topic for a later learning out loud post.</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb20-1">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">optimize</span>(b <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ml_logreg</span>(Xz, y, b), start_vals)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<pre><code> * Status: success

 * Candidate solution
    Final objective value:     7.894831e+02

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    103
    f(x) calls:    190</code></pre>
</div>
</div>
<p>And then we can get the <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> coefficients that minimize the loss function (i.e.&nbsp;that maximize the likelihood):</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb22-1">Optim.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">minimizer</span>(res)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="13">
<pre><code>3-element Vector{Float64}:
 -6.125561839584853
  2.731586594783831
  0.2775242967112382</code></pre>
</div>
</div>
<p>And just to confirm that we did this correctly, we can check our point estimates against what we’d get if we fit the model using the <code>GLM</code> package.</p>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb24-1">logreg_res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glm</span>(Xz, y, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Binomial</span>())</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<pre><code>GeneralizedLinearModel{GLM.GlmResp{Vector{Float64}, Binomial{Float64}, LogitLink}, GLM.DensePredChol{Float64, LinearAlgebra.Cholesky{Float64, Matrix{Float64}}}}:

Coefficients:
─────────────────────────────────────────────────────────────────
        Coef.  Std. Error       z  Pr(&gt;|z|)  Lower 95%  Upper 95%
─────────────────────────────────────────────────────────────────
x1  -6.12557    0.187562   -32.66    &lt;1e-99  -6.49318   -5.75795
x2   2.73159    0.109984    24.84    &lt;1e-99   2.51602    2.94715
x3   0.277522   0.0664854    4.17    &lt;1e-04   0.147213   0.407831
─────────────────────────────────────────────────────────────────</code></pre>
</div>
</div>
<p>Cool beans!</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric},
  title = {MLE {Learning} {Out} {Loud} 2: {Logistic} {Regression}},
  date = {2022-09-28},
  url = {https://www.ericekholm.com/posts/mle-logreg-julia},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2022. <span>“MLE Learning Out Loud 2: Logistic
Regression.”</span> September 28, 2022. <a href="https://www.ericekholm.com/posts/mle-logreg-julia">https://www.ericekholm.com/posts/mle-logreg-julia</a>.
</div></div></section></div> ]]></description>
  <category>Julia</category>
  <category>Learning Out Loud</category>
  <category>Maximum Likelihood</category>
  <category>Logistic Regression</category>
  <guid>https://www.ericekholm.com/posts/mle-logreg-julia/index.html</guid>
  <pubDate>Wed, 28 Sep 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Generating Data with a Given Correlation</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <dc:creator>EE </dc:creator>
  <link>https://www.ericekholm.com/posts/cor-generate-data/index.html</link>
  <description><![CDATA[ 




<p>This is going to be a short one, but I saw a comment on Twitter recently about an interview question where someone was asked to generate a dataset with variables X and Y that are correlated at <em>r</em> = .8. So I figured I’d write out some code that does this as a way to practice in Julia a little bit more.</p>
<p>First we load our packages.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Statistics</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Distributions</span></span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">CairoMakie </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for plotting</span></span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random </span><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#to set a seed</span></span>
<span id="cb1-5"></span>
<span id="cb1-6"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span>.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seed!</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<pre><code>TaskLocalRNG()</code></pre>
</div>
</div>
<p>The approach here is going to be to define a covariance (correlation) matrix and a vector of means, then define a multivariate normal distribution parameterized by these things. We’ll then use this distribution to generate our data.</p>
<p>First we’ll define <img src="https://latex.codecogs.com/png.latex?%5CSigma">, which is our covariance matrix. Since we’re generating a dataset with only 2 variables, this will be a 2x2 matrix, where the diagonals will be 1 and the off-diagonals will be .8, which is the correlation we want between X and Y.</p>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb3-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#define our covariance matrix</span></span>
<span id="cb3-2">Σ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.8</span>] [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.8</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>]]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>2×2 Matrix{Float64}:
 1.0  0.8
 0.8  1.0</code></pre>
</div>
</div>
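<p>A side note on why a correlation matrix can stand in for a covariance matrix here: with both variances equal to 1, the covariance between X and Y is the same number as the correlation. If the variables had other standard deviations, the correlation matrix would need to be rescaled. A minimal sketch, where the standard deviations (2 and 5) and the names <code>R</code>, <code>sds</code>, and <code>Σ_scaled</code> are made up for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">#correlation matrix with r = .8
R = [1.0 0.8; 0.8 1.0]

#hypothetical standard deviations for X and Y
sds = [2.0, 5.0]

#covariance[i, j] = correlation[i, j] * sd[i] * sd[j]
Σ_scaled = R .* (sds * sds')

#dividing the covariance by both standard deviations recovers the correlation
Σ_scaled[1, 2] / (sds[1] * sds[2])</code></pre></div>
</div>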
<p>Then we’ll define a mean vector. This will be a 2-element vector (one for each variable), but we don’t actually care what the values are here, so let’s just make them 0.</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb5-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#define a mean vector</span></span>
<span id="cb5-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#we don't actually care what these values are, though</span></span>
<span id="cb5-3">μ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">zeros</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="4">
<pre><code>2-element Vector{Float64}:
 0.0
 0.0</code></pre>
</div>
</div>
<p>Now we can define a distribution given <img src="https://latex.codecogs.com/png.latex?%5CSigma"> and <img src="https://latex.codecogs.com/png.latex?%5Cmu"></p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb7-1">d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> Distributions.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">MvNormal</span>(μ, Σ)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<pre><code>FullNormal(
dim: 2
μ: [0.0, 0.0]
Σ: [1.0 0.8; 0.8 1.0]
)</code></pre>
</div>
</div>
<p>And then we can draw a sample from this distribution</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb9-1">s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand</span>(d, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>2×200 Matrix{Float64}:
 -1.40556   0.469524  -1.19092  -0.40408   …  -0.244792  0.874835  -0.719764
 -0.595655  1.01141   -1.84189  -0.550097      0.250661  1.72269   -0.862095</code></pre>
</div>
</div>
<p>To confirm this works as expected, we can plot the sample:</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb11-1">CairoMakie.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scatter</span>(s)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<p><img src="https://www.ericekholm.com/posts/cor-generate-data/index_files/figure-html/cell-7-output-1.svg" class="img-fluid"></p>
</div>
</div>
<p>It looks like a .8 correlation to me. But to do a final check, we can get the correlation matrix of our sample.</p>
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#we need to transpose the matrix from 2x200 to 200x2, hence s' instead of s</span></span>
<span id="cb12-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(s<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">'</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>2×2 Matrix{Float64}:
 1.0       0.769654
 0.769654  1.0</code></pre>
</div>
</div>
<p>Close enough. Our correlation won’t be <em>exactly</em> equal to .8 using this approach since we’re sampling from a distribution, but there’s really no difference (imo) between a .77 correlation and a .80 correlation.</p>
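<p>To see that the gap is just sampling noise, one option is to draw samples of different sizes and watch the sample correlation settle toward .8. This is a sketch rather than part of the analysis above; <code>sample_cor()</code> is a helper defined here for illustration, and the sample sizes and seed are arbitrary.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using Distributions
using Statistics
using Random

Random.seed!(408)

#same setup as above: zero means, unit variances, correlation of .8
d = MvNormal(zeros(2), [1.0 0.8; 0.8 1.0])

#draw a 2 x n sample, transpose it, and pull out the X-Y correlation
sample_cor(n) = cor(rand(d, n)')[1, 2]

#larger samples should tend to land closer to .8
for n in [200, 2_000, 200_000]
    println(n, ": ", sample_cor(n))
end</code></pre></div>
</div>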



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric and , EE},
  title = {Generating {Data} with a {Given} {Correlation}},
  date = {2022-09-08},
  url = {https://www.ericekholm.com/posts/cor-generate-data},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric, and EE. 2022. <span>“Generating Data with a Given
Correlation.”</span> September 8, 2022. <a href="https://www.ericekholm.com/posts/cor-generate-data">https://www.ericekholm.com/posts/cor-generate-data</a>.
</div></div></section></div> ]]></description>
  <category>Julia</category>
  <category>Tutorial</category>
  <category>Brief</category>
  <guid>https://www.ericekholm.com/posts/cor-generate-data/index.html</guid>
  <pubDate>Thu, 08 Sep 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>MLE Learning Out Loud</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <dc:creator>EE </dc:creator>
  <link>https://www.ericekholm.com/posts/mle-learning-julia/index.html</link>
  <description><![CDATA[ 




<p><em>Disclaimer! The whole point of these “learning out loud” blog posts is to give myself a venue in which to practice/learn various statistics and programming concepts. I’m deciding to post these on my website both to normalize this notion of learning in public and also to invite people who know more than me to provide feedback. If I get something wrong, I’d love for you to tell me!</em></p>
<section id="maury-povich-as-a-metaphor-for-maximum-likelihood-estimation" class="level1">
<h1>Maury Povich as a metaphor for maximum likelihood estimation</h1>
<p>So this obviously isn’t 100% mathematically rigorous, but based on my understanding of maximum likelihood estimation (MLE), I think it’s kind of like the Maury Povich show…</p>
<p>Back when I was in high school, some of my friends and I used to eat lunch in our track coach’s classroom and watch the Maury Povich show. For those of you that haven’t ever watched <em>Maury</em>, it’s an…interesting study of human behavior…and worth checking out. But basically it’s like Jerry Springer or any of these other daytime drama-fests, covering everything from infidelity to obesity to insane behavior and everything in between. But Maury’s specialty was paternity tests.</p>
<p>Although the details of the paternity test episodes differed slightly, a common pattern was that a pregnant woman along with multiple men would come on the show, and each of the men would take a paternity test. Maury would ask the men and the woman to describe how confident they were in the results of the test, and the men would usually offer up something like:</p>
<p><em>“I am a thousand percent sure I am not the father.”</em></p>
<p>Which would then elicit the next man to say:</p>
<p><em>“Well I am one million percent sure I’m not the father!”</em></p>
<p>Which would in turn elicit animated reactions from the audience, the mother, and the other potential father(s) on the stage.</p>
<p><strong>So how’s this like maximum likelihood estimation?</strong></p>
<p>So my understanding of the logic of maximum likelihood estimation (MLE) is that, given a set of data, we can estimate the likelihood of a distribution parameterized by a given set of parameters. Imagine we have a bunch of measures of adult heights, and we assume that height is normally distributed. We know that a normal distribution is defined by its mean and its standard deviation. And so using our set of data, we can estimate the likelihood of any combination of mean and standard deviation (i.e.&nbsp;any set of parameters) given this data. And the parameters with the maximum likelihood are the “best” given our set of data. We’ll walk through this with examples later.</p>
<p>What matters here though is that the actual number describing the likelihood (or the log-likelihood, more likely) doesn’t really matter. It’s not arbitrary, but it’ll differ depending upon how many observations are in your dataset, the distribution you’re using, etc. The values of the (log)likelihood relative to one another are what matters. And in this respect I’m reminded of Maury’s paternity tests.</p>
<p>It doesn’t matter if a guest on the show says he’s 100% sure the baby isn’t his. If the guy next to him says he’s 110% sure the baby’s not his, then he’s more certain than the first guy. Likewise, if the first guy says he’s one million percent sure the baby isn’t his, he still “loses” if the guy next to him says he’s 2 million percent sure. The actual number doesn’t matter – what matters is the estimate relative to the other estimates.</p>
</section>
<section id="some-examples" class="level1">
<h1>Some Examples</h1>
<p>I’m not 100% sure the Maury analogy actually holds up, but whatever, let’s work through some examples.</p>
<p>First we’ll load some necessary packages.</p>
<div class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Distributions</span></span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">CairoMakie</span></span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span></span>
<span id="cb1-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Optim</span></span>
<span id="cb1-5"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">GLM</span></span>
<span id="cb1-6"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">using</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">DataFrames</span></span></code></pre></div>
</div>
</section>
<section id="case-1-fitting-a-normal-distribution" class="level1">
<h1>Case 1: Fitting a Normal Distribution</h1>
<p>This is the simplest case. First, we’re going to generate some sample data, s, from a normal distribution with <img src="https://latex.codecogs.com/png.latex?%5Cmu%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%201"></p>
<div class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb2-1"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">Random</span>.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seed!</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span>
<span id="cb2-2">s <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Normal</span>(), <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">10000</span>)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="3">
<pre><code>10000-element Vector{Float64}:
 -1.4055556573814212
  0.8813161144909877
  0.4695240597638853
  1.0596565592604608
 -1.1909245261358548
 -1.4819187811057175
 -0.40408041211016915
 -0.37805385034816524
 -1.5132047920081557
  2.2528479354589197
 -1.6595728371412546
  1.321172026499611
 -1.5741912720732054
  ⋮
 -0.6706076665047674
  1.313413766916552
 -0.5776340358208154
  2.2968511578121857
  0.6020915294889897
  0.19216658269979192
  0.8936776607551574
 -0.5898756308872724
  0.2424739897566387
  0.7926169568329148
 -0.46603730352631795
 -0.6572491362891565</code></pre>
</div>
</div>
<p>Then we’ll generate a bunch of normal distributions with various means and standard deviations</p>
<div class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb4-1">μs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>(<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>)</span>
<span id="cb4-2">σs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>;]</span>
<span id="cb4-3"></span>
<span id="cb4-4">ds <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb4-5"></span>
<span id="cb4-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> μs, j <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> σs</span>
<span id="cb4-7">    d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Normal</span>(i, j)</span>
<span id="cb4-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">push!</span>(ds, d)</span>
<span id="cb4-9"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
</div>
<p>So our task now is going to be to determine the likelihood of each distribution (defined with a given set of parameters) given our data, <em>s</em>, that we’ve drawn from a normal distribution with <img src="https://latex.codecogs.com/png.latex?%5Cmu%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%201">.</p>
<p>To do this, we use the probability density function (pdf) of our normal distribution to determine the likelihood of the parameters for any given observation. Fortunately, Julia (and other languages) have tools that can help us do this without having to write out the entire equation by hand. That said, here’s the equation – even though I’m not going to call it directly, it’s probably useful to see it.</p>
<p><img src="https://latex.codecogs.com/png.latex?f(x)%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5Csqrt%7B2%5Cpi%7D%7D%20%5Cexp%5B-%5Cfrac%7B(x%20-%20%5Cmu)%5E2%7D%7B2%5Csigma%5E2%7D%5D"></p>
<p>Let’s take a look at the first observation and the first distribution we defined:</p>
<p>The first value in our sample is:</p>
<div class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb5-1">s[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="5">
<pre><code>-1.4055556573814212</code></pre>
</div>
</div>
<p>And the first distribution we’ll look at is</p>
<div class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb7-1">ds[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>Normal{Float64}(μ=-2.0, σ=0.5)</code></pre>
</div>
</div>
<p>And if we look at the pdf of this, we get:</p>
<div class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pdf</span>(ds[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], s[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="7">
<pre><code>0.39356088133821826</code></pre>
</div>
</div>
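<p>To connect that number back to the density formula, here’s a minimal sketch that writes the normal pdf out by hand in plain Julia. The name <code>normal_pdf</code> is just something made up for illustration (it isn’t part of <code>Distributions.jl</code>), and the inputs are the values printed above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># hand-written normal density; normal_pdf is a made-up helper name
function normal_pdf(x, μ, σ)
    z = (x - μ) / σ
    return exp(-z^2 / 2) / (σ * sqrt(2π))
end

# first sample value and the parameters of ds[1], copied from the output above
p = normal_pdf(-1.4055556573814212, -2.0, 0.5)</code></pre></div>
</div>
<p>Up to floating-point rounding, <code>p</code> should agree with what <code>pdf(ds[1], s[1])</code> returned.</p>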
<p>I’m not a statistician (hence these learning posts), but my understanding of this is that it generally represents the “fit” of the distribution (and its parameters) to the given sample/data point. Because these are densities rather than probabilities, the values are never negative, but they aren’t capped at 1 (a normal distribution with a small enough standard deviation has a density above 1 near its mean). Higher values indicate better fit/higher likelihood.</p>
<p>The next step is to convert this to a log scale, since logging allows us to sum things rather than multiply them (which we’re gonna do soon).</p>
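<p>There’s also a practical reason to prefer sums, as far as I understand it: multiplying lots of small densities together can underflow to zero in floating point, while adding their logs stays well-behaved. Here’s a minimal sketch with made-up numbers (400 pretend densities of 0.1, not our actual sample):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># 400 made-up density values, each equal to 0.1
vals = fill(0.1, 400)

# the true product is about 1e-400, too small for a Float64, so it underflows to 0.0
lik = prod(vals)

# the sum of logs is an ordinary, finite number
loglik = sum(log.(vals))</code></pre></div>
</div>
<p>Back to our actual sample:</p>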
<div class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logpdf</span>(ds[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], s[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb11-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#same as log(pdf(ds[1], s[1]))</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="8">
<pre><code>-0.9325195055871961</code></pre>
</div>
</div>
<p>So this gives us the log likelihood of a given data point. But now we need to do this for all of the data points in our sample to determine the “fit”/likelihood of the distribution to our whole sample.</p>
<div class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb13-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">logpdf</span>.(ds[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], s))</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<pre><code>-103363.07786213113</code></pre>
</div>
</div>
<p>Apparently <code>Distributions.jl</code> gives us a helper for this via <code>loglikelihood</code>, so the above is the same as:</p>
<div class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">loglikelihood</span>(ds[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>], s)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="10">
<pre><code>-103363.07786213112</code></pre>
</div>
</div>
<p>So this gives us the (log)likelihood of a distribution (normal, in this case, defined by parameters <img src="https://latex.codecogs.com/png.latex?%5Cmu"> and <img src="https://latex.codecogs.com/png.latex?%5Csigma">) given our sample. That is, the relative plausibility of the parameters given our data. The goal then is to pick the <em>best</em> distribution/parameters, which we can do by <em>maximizing the likelihood</em>. In Maury terms, we want to find the guy who’s most sure that the baby isn’t his.</p>
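<p>Or, apparently, it’s more common to minimize the negative loglikelihood, which gets us to the same place, since whatever parameters maximize the loglikelihood also minimize its negative (I think this is closely related to what gets called log loss).</p>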
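<p>To see why those two framings give the same answer, here’s a small sketch with a hand-rolled loglikelihood on a toy sample. Everything here (<code>toy</code>, <code>candidates</code>, <code>normal_loglik</code>) is made up for illustration; the point is just that the index of the largest loglikelihood matches the index of the smallest negative loglikelihood.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># toy sample and candidate means (hypothetical values, σ fixed at 1)
toy = [-0.3, 0.1, 0.4, -0.2]
candidates = [-1.0, 0.0, 1.0]

# loglikelihood of a Normal(μ, σ) for a whole sample, written by hand
normal_loglik(μ, σ, xs) = sum(-log(σ) - 0.5 * log(2π) .- (xs .- μ) .^ 2 ./ (2σ^2))

lls_toy = [normal_loglik(μ, 1.0, toy) for μ in candidates]
nlls_toy = -lls_toy

# both pick the same candidate
best_by_max = argmax(lls_toy)
best_by_min = argmin(nlls_toy)</code></pre></div>
</div>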
<p>So let’s do this for all of the distributions we specified earlier:</p>
<div class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb17-1">lls <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> []</span>
<span id="cb17-2"></span>
<span id="cb17-3"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> ds</span>
<span id="cb17-4">    res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">-loglikelihood</span>(i, s)</span>
<span id="cb17-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">push!</span>(lls, res)</span>
<span id="cb17-6"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span>
<span id="cb17-7"></span>
<span id="cb17-8">lls <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Float64</span>.(lls)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="11">
<pre><code>20-element Vector{Float64}:
 103363.07786213112
  34465.6764159677
  24477.94356153769
  22439.92990862643
  42816.83247490566
  19329.115069161333
  17750.58296295708
  18655.789571924823
  22270.587087680186
  14192.553722354956
  15467.666808820915
  17371.649235223234
  41724.3417004547
  19055.992375548587
  17629.195099129196
  18587.50889852165
 101178.09631322921
  33919.43102874222
  24235.16783388192
  22303.36856182006</code></pre>
</div>
</div>
<p>And then we can plot the negative loglikelihoods we get:</p>
<div class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb19-1">ind <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(ds))</span>
<span id="cb19-2"></span>
<span id="cb19-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lines</span>(ind, lls)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="12">
<p><img src="https://www.ericekholm.com/posts/mle-learning-julia/index_files/figure-html/cell-12-output-1.svg" class="img-fluid"></p>
</div>
</div>
<p>Notice that our negative loglikelihood is minimized in the 10th distribution, so let’s take a look at what that is:</p>
<div class="cell" data-execution_count="12">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb20-1">ds[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="13">
<pre><code>Normal{Float64}(μ=0.0, σ=1.0)</code></pre>
</div>
</div>
<p>This makes sense! This was the distribution that we drew our samples from!</p>
<p>If we want to do this without looking at a plot, we can apparently do this:</p>
<div class="cell" data-execution_count="13">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#get the index of the minimum value in lls</span></span>
<span id="cb22-2">min_ll <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">findall</span>(lls <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.==</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">minimum</span>(lls))</span>
<span id="cb22-3"></span>
<span id="cb22-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#get the distribution at this index</span></span>
<span id="cb22-5">ds[min_ll]</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="14">
<pre><code>1-element Vector{Any}:
 Normal{Float64}(μ=0.0, σ=1.0)</code></pre>
</div>
</div>
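<p>A slightly more direct route, as far as I can tell, is Base Julia’s <code>argmin</code>, which returns the index of the smallest element (the first one, if there are ties), or <code>findmin</code>, which returns the value and the index together. A minimal sketch on a made-up vector:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># made-up negative loglikelihoods for three hypothetical candidates
toy_nlls = [5.0, 2.0, 7.0]

# index of the smallest value
best = argmin(toy_nlls)

# smallest value and its index in one call
val, idx = findmin(toy_nlls)</code></pre></div>
</div>
<p>With the real data, <code>ds[argmin(lls)]</code> would hand back the distribution itself rather than a one-element vector.</p>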
<p>So this tells us that (of the distributions we tested!) the most likely distribution given our data is a normal distribution with mean of 0 and standard deviation of 1. This doesn’t necessarily mean that this <img src="https://latex.codecogs.com/png.latex?%5Cmu%20=%200"> and <img src="https://latex.codecogs.com/png.latex?%5Csigma%20=%201"> are the <em>optimal</em> parameters. There could be better parameters that we didn’t test, and so in the future we’d probably want to use some sort of optimization function that can search over the parameters for us.</p>
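<p>For the normal distribution specifically, my understanding is that we don’t even need an optimizer, because the maximum likelihood estimates have a closed form: the estimated mean is the sample mean, and the estimated standard deviation is the square root of the average squared deviation from that mean (dividing by <em>n</em> rather than <em>n</em> - 1). A minimal sketch on a made-up sample, with hypothetical variable names:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using Statistics

# made-up sample, just for illustration
toy_sample = [1.0, 2.0, 3.0, 4.0]

# closed-form maximum likelihood estimates for a normal distribution
mu_hat = mean(toy_sample)
sigma_hat = sqrt(mean((toy_sample .- mu_hat) .^ 2))</code></pre></div>
</div>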
</section>
<section id="case-2-simple-linear-regression" class="level1">
<h1>Case 2: Simple Linear Regression</h1>
<p>So now let’s move on a bit and try a simple linear regression. First we’ll just generate some fake data and a “ground truth” function:</p>
<div class="cell" data-execution_count="14">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#generate some x values</span></span>
<span id="cb24-2">x <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb24-3"></span>
<span id="cb24-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#generate error</span></span>
<span id="cb24-5">ϵ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Normal</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(x))</span>
<span id="cb24-6"></span>
<span id="cb24-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#define a function relating x to y</span></span>
<span id="cb24-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>(x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>x</span>
<span id="cb24-9"></span>
<span id="cb24-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#generate y as f(x) plus error</span></span>
<span id="cb24-11">y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f</span>.(x) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.+</span> ϵ</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="15">
<pre><code>101-element Vector{Float64}:
  2.4461255238293758
  1.2496723713219855
  1.5014319678963934
  1.8228982131749496
 -0.4320716243406362
  0.9628100854937409
  2.475384749019799
  2.047025196204242
  1.7487030877341891
  1.4865883008076408
  0.405749179591091
  2.5585877608457355
  2.956751811280712
  ⋮
 19.232878652117428
 17.53450559237765
 19.290332010598092
 18.144129060042054
 18.413568163778812
 17.89449730367819
 19.175282300038607
 20.826640516579364
 19.21519350783753
 19.070233221768582
 21.072296712369102
 19.469128284822276</code></pre>
</div>
</div>
<p>And then we can plot the x and y values we just created:</p>
<div class="cell" data-execution_count="15">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb26-1">CairoMakie.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scatter</span>(x, y)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="16">
<p><img src="https://www.ericekholm.com/posts/mle-learning-julia/index_files/figure-html/cell-16-output-1.svg" class="img-fluid"></p>
</div>
</div>
<p>Another way to think about the above is that we expect a linear relationship between x and y in the form of</p>
<p><img src="https://latex.codecogs.com/png.latex?y%20=%20%5Calpha%20+%20%5Cbeta%20x%20+%20%5Cepsilon"></p>
<p>We need to estimate alpha and beta in a way that optimally fits this line, and we can do this with maximum likelihood. We can take advantage of the fact that linear regression assumes that residuals are normally distributed with an expected value (mean) of 0, since this will provide us with a distribution we can try to parameterize optimally. Since the residuals are dependent upon the predicted values of y, and since the predicted values of y are dependent on the intercept (<img src="https://latex.codecogs.com/png.latex?%5Calpha">) and the coefficient (<img src="https://latex.codecogs.com/png.latex?%5Cbeta">), this will give us a way to estimate the terms in the regression line.</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Csigma"> is not super important to us, but we still need to estimate it. We can compute the negative loglikelihood of a given set of parameters using the function below (negative because the optimizer we’ll use later minimizes rather than maximizes).</p>
<div class="cell" data-execution_count="16">
<div class="sourceCode cell-code" id="cb27" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb27-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max_ll_reg</span>(x, y, params)</span>
<span id="cb27-2"></span>
<span id="cb27-3">    α <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> params[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb27-4">    β <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> params[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb27-5">    σ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> params[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb27-6"></span>
<span id="cb27-7">    ŷ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> α <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.+</span> x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.*</span>β</span>
<span id="cb27-8"></span>
<span id="cb27-9">    resids <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.-</span> ŷ</span>
<span id="cb27-10"></span>
<span id="cb27-11">    d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Normal</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, σ)</span>
<span id="cb27-12"></span>
<span id="cb27-13">    ll <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">-loglikelihood</span>(d, resids)</span>
<span id="cb27-14">    </span>
<span id="cb27-15">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> ll</span>
<span id="cb27-16"></span>
<span id="cb27-17"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="17">
<pre><code>max_ll_reg (generic function with 1 method)</code></pre>
</div>
</div>
<p>And let’s see how this works by passing in some generic values: .5 as the intercept, 2 as the beta coefficient, and 1 as the error standard deviation (the second argument to <code>Normal</code> is a standard deviation, not a variance).</p>
<div class="cell" data-execution_count="17">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb29-1">yy <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max_ll_reg</span>(x, y, [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="18">
<pre><code>137.00423917573127</code></pre>
</div>
</div>
<p>The next step then is to optimize this. We pass some starting values and our <code>max_ll_reg</code> function into an optimizer, tell it to find the optimal values for the parameters (<img src="https://latex.codecogs.com/png.latex?%5Calpha">, <img src="https://latex.codecogs.com/png.latex?%5Cbeta">, and <img src="https://latex.codecogs.com/png.latex?%5Csigma">), and then the magical optimizing algorithm written by people much smarter than me will give us our coefficients.</p>
<div class="cell" data-execution_count="18">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb31-1">res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">optimize</span>(params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max_ll_reg</span>(x, y, params), [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>])</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="19">
<pre><code> * Status: success

 * Candidate solution
    Final objective value:     1.361561e+02

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    130
    f(x) calls:    234</code></pre>
</div>
</div>
<p>And then this will give us the maximum likelihood solution for our regression equation.</p>
<div class="cell" data-execution_count="19">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb33-1">Optim.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">minimizer</span>(res)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="20">
<pre><code>3-element Vector{Float64}:
 0.6262632240571052
 1.9908770881500015
 0.9315933450872507</code></pre>
</div>
</div>
<p>We can check this by fitting the model with the <code>GLM</code> package:</p>
<div class="cell" data-execution_count="20">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb35-1">data <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">DataFrame</span>(X <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x, Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y)</span>
<span id="cb35-2"></span>
<span id="cb35-3">ols_res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(<span class="pp" style="color: #AD0000;
background-color: null;
font-style: inherit;">@formula</span>(Y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> X), data)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="21">
<pre><code>StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}}}}, Matrix{Float64}}

Y ~ 1 + X

Coefficients:
────────────────────────────────────────────────────────────────────────
                Coef.  Std. Error      t  Pr(&gt;|t|)  Lower 95%  Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  0.626272   0.185875    3.37    0.0011   0.257455   0.995089
X            1.99088    0.0321144  61.99    &lt;1e-80   1.92715    2.0546
────────────────────────────────────────────────────────────────────────</code></pre>
</div>
</div>
<p>Et voilà, we get essentially the same <img src="https://latex.codecogs.com/png.latex?%5Calpha"> and <img src="https://latex.codecogs.com/png.latex?%5Cbeta"> (they match to about four decimal places, with the small differences presumably coming from the optimizer’s stopping tolerance)! The coefficients aren’t exactly the same as the ones we specified when generating the data, but that’s because of the error we introduced.</p>
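<p>One wrinkle worth flagging: the optimizer is free to try any real number for the third parameter, but a standard deviation has to be positive. A common workaround (it’s an assumption on my part that it would help here, since the run above converged fine without it) is to let the optimizer work with log(σ) and exponentiate inside the function. Here’s a minimal sketch with the normal loglikelihood written by hand so it runs in plain Julia; <code>nll_reg_logsigma</code> and the toy data are hypothetical names, not anything from the packages used in this post.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># negative loglikelihood for simple regression, with params = [α, β, log(σ)]
function nll_reg_logsigma(x, y, params)
    α, β, logσ = params
    σ = exp(logσ)              # always positive, whatever the optimizer proposes
    resids = y .- (α .+ β .* x)
    n = length(resids)
    # negative of the summed normal log density with mean 0 and sd σ
    return n * (logσ + 0.5 * log(2π)) + sum(resids .^ 2) / (2σ^2)
end

# toy data that lie exactly on y = 1 + 2x (made up for illustration)
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_y = [1.0, 3.0, 5.0, 7.0]

# the result stays finite even for a negative third parameter
nll_a = nll_reg_logsigma(toy_x, toy_y, [1.0, 2.0, 0.0])
nll_b = nll_reg_logsigma(toy_x, toy_y, [1.0, 2.0, -5.0])</code></pre></div>
</div>
<p>A function like this could be handed to <code>optimize</code> in place of <code>max_ll_reg</code>, with <code>exp</code> applied to the third element of the result to get back to σ.</p>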
<p>It’s maybe also worth noting that Julia lets us solve the equation via the <code>\</code> operator, which provides a shorthand for solving systems of linear equations (and, for a tall matrix like ours with more rows than columns, returns the least-squares solution):</p>
<div class="cell" data-execution_count="21">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb37-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#we have to include a column of 1s in the matrix to get the intercept</span></span>
<span id="cb37-2">xmat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hcat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ones</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(x)), x)</span>
<span id="cb37-3"></span>
<span id="cb37-4">xmat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span> y</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="22">
<pre><code>2-element Vector{Float64}:
 0.6262717121103298
 1.9908750493937837</code></pre>
</div>
</div>
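<p>For simple regression there’s also a textbook closed form, which makes for one more sanity check: the slope is the covariance of x and y divided by the variance of x, and the intercept is the mean of y minus the slope times the mean of x. A minimal sketch on made-up data that lie exactly on a line (so the answer is known ahead of time):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using Statistics
using LinearAlgebra

# made-up data on the line y = 1 + 2x
toy_x = [0.0, 1.0, 2.0, 3.0]
toy_y = [1.0, 3.0, 5.0, 7.0]

# closed-form least-squares estimates
slope = cov(toy_x, toy_y) / var(toy_x)
intercept = mean(toy_y) - slope * mean(toy_x)

# same answer from the backslash operator with a column of 1s
coefs = hcat(ones(length(toy_x)), toy_x) \ toy_y</code></pre></div>
</div>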
</section>
<section id="case-3-multiple-regression" class="level1">
<h1>Case 3: Multiple Regression</h1>
<p>And I think we can extend the same logic above to multiple regression. The first step is to generate some data:</p>
<div class="cell" data-execution_count="22">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb39-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#make a 100x3 matrix of random numbers</span></span>
<span id="cb39-2">tmp <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">randn</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>)</span>
<span id="cb39-3"></span>
<span id="cb39-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#prepend a column of 1s (so we can get the intercept)</span></span>
<span id="cb39-5">𝐗 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hcat</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ones</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>), tmp)</span>
<span id="cb39-6"></span>
<span id="cb39-7"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#provide 'ground truth' coefficients</span></span>
<span id="cb39-8">𝚩 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb39-9"></span>
<span id="cb39-10"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#define a function to multiply X by B</span></span>
<span id="cb39-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f₂</span>(X) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> X<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>𝚩</span>
<span id="cb39-12"></span>
<span id="cb39-13"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#create some error</span></span>
<span id="cb39-14">ϵ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rand</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Normal</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span>), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">size</span>(𝐗)[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>])</span>
<span id="cb39-15"></span>
<span id="cb39-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#make outcome values that comprise our generating function plus error</span></span>
<span id="cb39-17">𝐘 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">f₂</span>(𝐗) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> ϵ</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="23">
<pre><code>100-element Vector{Float64}:
  1.9539840955199055
  2.2807060973619135
  2.406665555383834
 -0.014731760693462048
  0.6002032048684441
 -3.531148702629271
  3.194699866301238
 -1.17858666703318
 -0.31117832513371646
  1.5595030004201824
  7.314823243199307
 -2.1182414214687673
 -3.502694667516171
  ⋮
  9.205296659574488
  4.233455153011074
  5.620823053237128
 -2.4088759447640156
  5.127431971734125
  1.0043279157869205
  3.6343775497324184
  2.611885689812401
  0.11077494956658729
  3.06142672043232
  1.961141975667116
 -0.013605916161866072</code></pre>
</div>
</div>
<p>Then we can define another function to return the negative loglikelihood. This is the same as the simple regression function above, except it’s generalized to allow for more than 1 slope coefficient. Julia provides some neat functionality via the <code>begin</code> and <code>end</code> keywords that let us access the first and last elements of a vector, and we can even do things like <code>end-1</code> to get the second-to-last element, which is pretty nifty.</p>
<div class="cell" data-execution_count="23">
<div class="sourceCode cell-code" id="cb41" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb41-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max_ll_mreg</span>(x, y, params)</span>
<span id="cb41-2">    𝚩 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> params[begin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb41-3">    σ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> params[<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span>]</span>
<span id="cb41-4"></span>
<span id="cb41-5">    ŷ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>𝚩</span>
<span id="cb41-6"></span>
<span id="cb41-7">    resids <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.-</span> ŷ</span>
<span id="cb41-8"></span>
<span id="cb41-9">    d <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">Normal</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, σ)</span>
<span id="cb41-10"></span>
<span id="cb41-11">    ll <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">-loglikelihood</span>(d, resids)</span>
<span id="cb41-12"></span>
<span id="cb41-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> ll</span>
<span id="cb41-14"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="24">
<pre><code>max_ll_mreg (generic function with 1 method)</code></pre>
</div>
</div>
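<p>As a quick illustration of that indexing, here’s a minimal sketch with a made-up parameter vector (the values and variable names below are hypothetical, and using <code>begin</code> inside an index assumes Julia 1.4 or later):</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># Hypothetical parameter vector: three coefficients followed by σ
params = [0.5, 1.0, 2.0, 0.3]

coefs = params[begin:end-1]      # everything except the last element
sigma = params[end]              # the last element
second_to_last = params[end-1]   # the second-to-last element
first_elem = params[begin]       # the first element</code></pre></div>
<p>This is the same split that <code>max_ll_mreg</code> does: all but the last element are treated as coefficients, and the last element is treated as σ.</p>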
<p>Then we can do the same thing as before – provide some starting parameters (four coefficients plus σ), and tell our super-smart optimizer function to give us the parameters that maximize the likelihood (by minimizing its negative log).</p>
<div class="cell" data-execution_count="24">
<div class="sourceCode cell-code" id="cb43" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb43-1">start_params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> [<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">4.0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>]</span>
<span id="cb43-2"></span>
<span id="cb43-3">mreg_res <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">optimize</span>(params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max_ll_mreg</span>(𝐗, 𝐘, params), start_params)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="25">
<pre><code> * Status: success

 * Candidate solution
    Final objective value:     6.860916e+01

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    221
    f(x) calls:    392</code></pre>
</div>
</div>
<p>And then we can show the results:</p>
<div class="cell" data-execution_count="25">
<div class="sourceCode cell-code" id="cb45" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb45-1">Optim.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">minimizer</span>(mreg_res)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="26">
<pre><code>5-element Vector{Float64}:
 0.42976727494283473
 1.0367345683471323
 1.8923643524058003
 3.0304421915621127
 0.4805357922701782</code></pre>
</div>
</div>
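<p>And we can check that the first four values match the least-squares coefficients, which Julia’s backslash operator gives us directly (the fifth value from the optimizer is σ, which the least-squares solve doesn’t return):</p>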
<div class="cell" data-execution_count="26">
<div class="sourceCode cell-code" id="cb47" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb47-1">𝐗 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\</span> 𝐘</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="27">
<pre><code>4-element Vector{Float64}:
 0.42976708591782187
 1.0367343909692166
 1.8923640529063954
 3.0304412982361364</code></pre>
</div>
</div>
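<p>The backslash check covers the coefficients but not σ. For a normal linear model, the maximum likelihood estimate of σ also has a closed form: the square root of the mean squared residual (dividing by n rather than n minus the number of coefficients). Here’s a minimal sketch of that check using made-up data. The names <code>X</code>, <code>y</code>, and <code>beta_true</code> are hypothetical stand-ins, not the 𝐗 and 𝐘 from this post:</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia">using Random

Random.seed!(1)                        # arbitrary seed, just for reproducibility
n = 200
X = hcat(ones(n), randn(n, 2))         # made-up design matrix with an intercept column
beta_true = [0.5, 1.0, 2.0]            # made-up "true" coefficients
y = X * beta_true .+ 0.5 .* randn(n)   # made-up outcome with normal noise

coefs_hat = X \ y                      # least-squares coefficients
resids = y .- X * coefs_hat
sigma_hat = sqrt(sum(resids .^ 2) / n) # MLE of σ: divide by n, not n - p</code></pre></div>
<p>Applied to the real data, the same calculation should land very close to the last element of <code>Optim.minimizer(mreg_res)</code>, up to the optimizer’s tolerance.</p>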
<p>Alternatively, we could have written out the log of the joint pdf for the normal distribution by hand, like below.</p>
<p>First we can define this function:</p>
<div class="cell" data-execution_count="27">
<div class="sourceCode cell-code" id="cb49" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb49-1"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">alt_mle_mlr</span>(x, y, params)</span>
<span id="cb49-2">    𝚩 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> params[begin<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]</span>
<span id="cb49-3">    σ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> params[<span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span>]</span>
<span id="cb49-4"></span>
<span id="cb49-5">    ŷ <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> x<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>𝚩</span>
<span id="cb49-6"></span>
<span id="cb49-7">    n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(ŷ)</span>
<span id="cb49-8"></span>
<span id="cb49-9">    ll <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">*log</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>π) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> n<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(σ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>((y <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.-</span> ŷ)<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.^</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>σ<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">^</span><span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb49-10">    </span>
<span id="cb49-11">    ll <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>ll</span>
<span id="cb49-12"></span>
<span id="cb49-13">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">return</span> ll</span>
<span id="cb49-14"><span class="kw" style="color: #003B4F;
background-color: null;
font-style: inherit;">end</span></span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="28">
<pre><code>alt_mle_mlr (generic function with 1 method)</code></pre>
</div>
</div>
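<p>For reference, the <code>ll</code> line in that function (before the sign flip) is the log of the joint normal density, assuming independent errors with mean 0 and standard deviation σ. The math markup here is my assumption about how this page renders equations:</p>
<p><span class="math display">\[
\ell(\mathbf{B}, \sigma) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
\]</span></p>
<p>The function then returns the negative of this, since the optimizer minimizes.</p>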
<p>Then see what the negative log-likelihood is given our starting parameters:</p>
<div class="cell" data-execution_count="28">
<div class="sourceCode cell-code" id="cb51" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb51-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">alt_mle_mlr</span>(𝐗, 𝐘, start_params)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="29">
<pre><code>174.11351826768353</code></pre>
</div>
</div>
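<p>If you want to convince yourself that the closed-form expression really is the log of the joint pdf, one way is to compare it against the sum of the pointwise log densities. This is just a sketch with made-up residuals and a made-up σ, and <code>normal_pdf</code> is a hypothetical helper I’m defining here, not something from a package:</p>
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"># Density of Normal(0, σ) at a single residual r (hypothetical helper)
normal_pdf(r, σ) = exp(-r^2 / (2σ^2)) / sqrt(2π * σ^2)

resids = [0.3, -0.1, 0.4, -0.6]   # made-up residuals
σ = 0.5                           # made-up standard deviation
n = length(resids)

# Log of the joint pdf: the sum of the pointwise log densities
ll_pointwise = sum(log.(normal_pdf.(resids, σ)))

# Closed form used in alt_mle_mlr (before the sign flip)
ll_closed = -n/2 * log(2π) - n/2 * log(σ^2) - sum(resids .^ 2) / (2σ^2)

isapprox(ll_pointwise, ll_closed)</code></pre></div>
<p>The two agree because the log of a product of independent densities is the sum of their logs.</p>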
<p>Then optimize the function:</p>
<div class="cell" data-execution_count="29">
<div class="sourceCode cell-code" id="cb53" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb53-1">mreg_res2 <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">optimize</span>(params <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-&gt;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">alt_mle_mlr</span>(𝐗, 𝐘, params), start_params)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="30">
<pre><code> * Status: success

 * Candidate solution
    Final objective value:     6.860916e+01

 * Found with
    Algorithm:     Nelder-Mead

 * Convergence measures
    √(Σ(yᵢ-ȳ)²)/n ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    221
    f(x) calls:    392</code></pre>
</div>
</div>
<p>And check the results:</p>
<div class="cell" data-execution_count="30">
<div class="sourceCode cell-code" id="cb55" style="background: #f1f3f5;"><pre class="sourceCode julia code-with-copy"><code class="sourceCode julia"><span id="cb55-1">Optim.<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">minimizer</span>(mreg_res2)</span></code></pre></div>
<div class="cell-output cell-output-display" data-execution_count="31">
<pre><code>5-element Vector{Float64}:
 0.42976727494283473
 1.0367345683471323
 1.8923643524058003
 3.0304421915621127
 0.4805357922701782</code></pre>
</div>
</div>
<p>And there we go. Hopefully that was helpful for some others. I’ll probably do some more of these “learning out loud” posts as I dig into some more math, Julia, or other topics.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric and , EE},
  title = {MLE {Learning} {Out} {Loud}},
  date = {2022-08-31},
  url = {https://www.ericekholm.com/posts/mle-learning-julia},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric, and EE. 2022. <span>“MLE Learning Out Loud.”</span> August
31, 2022. <a href="https://www.ericekholm.com/posts/mle-learning-julia">https://www.ericekholm.com/posts/mle-learning-julia</a>.
</div></div></section></div> ]]></description>
  <category>Julia</category>
  <category>Learning Out Loud</category>
  <category>Maximum Likelihood</category>
  <guid>https://www.ericekholm.com/posts/mle-learning-julia/index.html</guid>
  <pubDate>Wed, 31 Aug 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Crossfit Games Analysis</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/crossfit-games-2022/index.html</link>
  <description><![CDATA[ 




<p>For me, the Crossfit Games is one of the most exciting weekends of the year. And between the shift in programming and the closeness of the competition on both the men’s and women’s sides, this year was probably the best Games we’ve seen in a while.</p>
<p>To prolong the Games-high, I decided to dive into the leaderboard data a bit and see what we can take away from the athletes’ performances. I noticed that <a href="https://morningchalkup.com/crossfit-games-leaderboard/?season=2022&amp;stage=3&amp;comp=16&amp;division=2">Morning Chalk Up has a Games leaderboard</a> that I could scrape, and so I <a href="https://github.com/ekholme/cfg">cobbled together a package</a> to let me pull down the data and analyze it. If you’re into <code>R</code> and want to use the package, go ahead. Just beware that it’s pretty fragile at the moment (I threw it together in a couple of hours on Tuesday), although I’ll probably put in a bit of work to improve it when I have more time.</p>
<p>In general, I won’t explain what all of the code in this post does, but I’ll include it in case folks are curious. If you’re not into <code>R</code> or coding and just want to read the text, then feel free to ignore all of the code throughout :)</p>
<p>Without any further ado, let’s get into it.</p>
<section id="setup" class="level2">
<h2 class="anchored" data-anchor-id="setup">Setup</h2>
<p>First we’ll do a little bit of setup, and we’ll also pull down the data.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(cfg)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(eemisc)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(harrypotter)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(glue)</span>
<span id="cb1-6"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(ggtext)</span>
<span id="cb1-7"></span>
<span id="cb1-8"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>())</span>
<span id="cb1-9"></span>
<span id="cb1-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(</span>
<span id="cb1-11">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.fill =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-12">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-13">    ),</span>
<span id="cb1-14">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.color =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-15">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-16">    )</span>
<span id="cb1-17">)</span>
<span id="cb1-18"></span>
<span id="cb1-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#get data</span></span>
<span id="cb1-20">women_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fetch_leaderboard</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"women"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-21">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Women"</span>)</span>
<span id="cb1-22"></span>
<span id="cb1-23">men_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fetch_leaderboard</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"men"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Men"</span>)</span>
<span id="cb1-25"></span>
<span id="cb1-26">combined <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(women_df, men_df)</span>
<span id="cb1-27"></span>
<span id="cb1-28">long_by_event <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> combined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-29">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(athlete, division, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"event"</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb1-30">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb1-31">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"event"</span>),</span>
<span id="cb1-32">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"event"</span>,</span>
<span id="cb1-33">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"event_place"</span></span>
<span id="cb1-34">    )</span></code></pre></div>
</div>
</section>
<section id="points" class="level2">
<h2 class="anchored" data-anchor-id="points">Points</h2>
<p>Let’s first take a look at all of the athletes’ total points. Hopefully this is pretty self-explanatory.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(combined, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> points, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, points), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> points, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> points <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb2-6">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Points"</span>,</span>
<span id="cb2-7">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb2-8">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Points by Athlete at the 2022 CFG"</span></span>
<span id="cb2-9">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb2-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb2-12">    )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/crossfit-games-2022/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid" width="768"></p>
</div>
</div>
</section>
<section id="number-of-top-3-finishes" class="level2">
<h2 class="anchored" data-anchor-id="number-of-top-3-finishes">Number of Top 3 Finishes</h2>
<p>Another thing that might be interesting to check out is the number of top 3 finishes from each athlete:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">long_by_event <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(event_place <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(division, athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb3-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, n), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb3-9">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb3-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"# of Top 3 Finishes"</span>,</span>
<span id="cb3-11">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Number of Top 3 Finishes by Athlete"</span>,</span>
<span id="cb3-12">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Only athletes with multiple top 3 finishes shown"</span></span>
<span id="cb3-13">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb3-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb3-16">    )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/crossfit-games-2022/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>It’s probably not surprising that Tia dominated here, finishing in the top 3 in 8 of the 14 total scored events. One thing that’s interesting to me, though, is that every athlete who podiumed had 5 (or more) top 3 finishes, while nobody else had more than 3.</p>
</section>
<section id="event-placement-variability" class="level2">
<h2 class="anchored" data-anchor-id="event-placement-variability">Event Placement Variability</h2>
<p>We hear a lot about the importance of consistency in CrossFit. After all, the whole point of CrossFit is to be able to do everything well. We always hear that you need to be well rounded and need to avoid bombing events. And whenever Pat Vellner or Laura Horvath or whoever has a bad event, we hear about how that’s not the way to win the Games.</p>
<p>So given all of that, let’s spend some time looking at event-to-event variability for each athlete. To make it a bit easier, I’ll limit this to the top 20 athletes for the men and women.</p>
<p>In the graph below, I’ll show athletes’ average event placement with a dot, and a measure of their variability with a bar. It doesn’t really matter how variability is calculated (it’s just the standard error of the mean in this case), but suffice it to say that the wider the bar is, the more variable (less consistent) that athlete’s performance was.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">top_20 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> combined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>athlete[combined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>place <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>]</span>
<span id="cb4-2"></span>
<span id="cb4-3">long_by_event <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> top_20) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(division, athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sem =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(event_place)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">avg =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(event_place)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb4-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> avg, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>avg), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_errorbarh</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmin =</span> avg <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> sem, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xmax =</span> avg <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sem), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">height =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb4-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb4-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event Placement"</span>,</span>
<span id="cb4-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event Avg Placement and Variability"</span></span>
<span id="cb4-16">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-17">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_hp_d</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-18">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb4-19">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.major.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb4-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb4-21">    )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/crossfit-games-2022/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid" width="768"></p>
</div>
</div>
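<p>For reference, the standard error of the mean behind those bars is the standard deviation of an athlete’s placements divided by the square root of the number of events. Here’s a minimal sketch with made-up placements (the athletes and numbers are hypothetical, not Games data). It uses <code>n()</code> for the event count rather than a fixed 14, which would only matter if some athletes had fewer scored events.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical placements for two made-up athletes (not real Games data)
toy &lt;- data.frame(
    athlete = rep(c("Athlete A", "Athlete B"), each = 4),
    event_place = c(2, 3, 2, 3, 1, 20, 5, 30)
)

toy_sem &lt;- toy |&gt;
    group_by(athlete) |&gt;
    summarize(
        n_events = n(),
        avg = mean(event_place),
        # standard error of the mean: sd / sqrt(number of events)
        sem = sd(event_place) / sqrt(n_events)
    )

toy_sem</code></pre></div>
</div>
<p>In this toy data, Athlete B has the wider bar because their placements swing between 1st and 30th, while Athlete A hovers around 2nd and 3rd.</p>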
<p>So we can see a lot of differences between athletes here. Justin Medeiros and Roman Khrennikov were the two most consistent athletes (followed by Tia Toomey). We can also see that Noah Ohlsen and BKG were very consistent, albeit consistently toward the high-middle. On the flip side, we can see people like Laura Horvath, Gui Malheiros, and Dani Speegle had a lot of variability between events – meaning they finished very well in some and very poorly in others.</p>
</section>
<section id="finish-range" class="level2">
<h2 class="anchored" data-anchor-id="finish-range">Finish Range</h2>
<p>The analysis above takes all of the athletes’ events into account to calculate variability, but we might just care about each athlete’s best and worst performance, and the spread between those.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">finish_range <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> long_by_event <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> top_20) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(division, athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(</span>
<span id="cb5-5">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">worst =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(event_place),</span>
<span id="cb5-6">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">best =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(event_place)</span>
<span id="cb5-7">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb5-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(combined, place, athlete), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"athlete"</span>)</span>
<span id="cb5-10"></span>
<span id="cb5-11"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(finish_range, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> division, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>place))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_segment</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> best, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yend =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(athlete, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>place), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xend =</span> worst)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> worst), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> best), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-15">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> worst, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> worst)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> best, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> best)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-17">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_hp_d</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> </span>
<span id="cb5-18">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-19">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb5-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event Finish"</span>,</span>
<span id="cb5-21">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>,</span>
<span id="cb5-22">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Best and Worst Finishes for Top 20 Athletes"</span></span>
<span id="cb5-23">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb5-25">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb5-26">    )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/crossfit-games-2022/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid" width="768"></p>
</div>
</div>
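<p>If we want the spread as a single number per athlete, it’s just the worst finish minus the best. A small sketch, using a made-up stand-in for <code>finish_range</code> (the rows below are hypothetical, not real results):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical stand-in for finish_range (made-up rows, not real results)
toy_range &lt;- data.frame(
    athlete = c("Athlete A", "Athlete B", "Athlete C"),
    best = c(2, 1, 4),
    worst = c(15, 39, 22)
)

# Spread between worst and best finish, largest spread first
toy_spread &lt;- toy_range |&gt;
    mutate(spread = worst - best) |&gt;
    arrange(desc(spread))

toy_spread</code></pre></div>
</div>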
<p>Although this shows something slightly different than the previous graph, it basically reaffirms what we saw there. One interesting takeaway is that Justin Medeiros won the Games without actually winning any events. Another interesting point is that Roman Khrennikov had the best “worst event finish” out of everyone, never finishing lower than 15th, which is pretty incredible. On the women’s side, we see that quite a few athletes spanned the whole range between first and last (recall that Emily Rolfe withdrew, so last was 39th for much of the competition). Laura Horvath, Dani Speegle, and Lucy Campbell all went from worst to first, and a few others (Kara Saunders, Amanda Barnhart) managed nearly the same. To be fair, there were a few instances of this on the men’s side, too (Adler, Malheiros, Pepper).</p>
</section>
<section id="score-variability-vs-overall-finish" class="level2">
<h2 class="anchored" data-anchor-id="score-variability-vs-overall-finish">Score Variability vs Overall Finish</h2>
<p>This doesn’t really tell us the story about how much consistency matters, though. On the one hand, Medeiros and Khrennikov were the most consistent and also finished top 2. On the other hand, Noah Ohlsen was also incredibly consistent, but he finished 12th, whereas Laura Horvath had a lot of variability in her performances and finished 3rd. So let’s plot athletes’ overall finish (on the X axis) against their score variability (on the Y axis). If these are strongly related, we’d expect to see them form a line. If they’re not strongly related, they’ll look like a blob.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">sem_by_place <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> long_by_event <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(division, athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-3">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sem =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(event_place)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sqrt</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(combined, athlete, place), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"athlete"</span>) </span>
<span id="cb6-6">    </span>
<span id="cb6-7">sem_by_place <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb6-8">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> place, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> sem, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_hp_d</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb6-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Place"</span>,</span>
<span id="cb6-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Variability"</span></span>
<span id="cb6-15">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-16">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb6-17">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb6-18">    )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/crossfit-games-2022/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>It doesn’t really seem like there’s much here. But we can calculate the actual correlation coefficient to get a number to summarize the relationship.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(sem_by_place<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>sem, sem_by_place<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>place)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -0.1818332</code></pre>
</div>
</div>
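<p>So there’s a small negative correlation here. Since a lower place number means a better finish, a negative coefficient means that people who varied more between events actually tended to place slightly better overall, not worse. But it’s a pretty small relationship.</p>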
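<p>Since place is a rank, a rank-based (Spearman) correlation is a reasonable cross-check on the default Pearson coefficient, and <code>cor.test()</code> gives a rough sense of how uncertain the estimate is. Here’s a sketch on made-up numbers (the vectors below are hypothetical, not the Games data):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical values: overall place (1 = best) and a variability measure
toy_place &lt;- 1:10
toy_sem &lt;- c(3.1, 2.8, 3.4, 2.5, 2.9, 2.2, 2.6, 1.9, 2.3, 1.8)

# Pearson (the default) and Spearman rank correlation
pearson &lt;- cor(toy_sem, toy_place)
spearman &lt;- cor(toy_sem, toy_place, method = "spearman")

# In this toy data both are negative: higher variability goes with a
# numerically lower (better) place
c(pearson = pearson, spearman = spearman)

# Test and p-value for the rank correlation
cor.test(toy_sem, toy_place, method = "spearman")</code></pre></div>
</div>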
</section>
<section id="point-trajectories-for-top-5" class="level2">
<h2 class="anchored" data-anchor-id="point-trajectories-for-top-5">Point Trajectories for Top 5</h2>
<p>What always makes the Games interesting is the races for the podium, and this year had some great ones. On the men’s side, there was a lot of jostling between the top 3 for position. On the women’s side, Tia didn’t pull away until about event 10, and even after Tia and Mal had locked up the gold and silver, there was an incredibly exciting race for 3rd thanks to a big comeback from Laura Horvath. So let’s plot the point trajectories for the athletes that ended up finishing in the top 5:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">top_5 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> combined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>athlete[combined<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>place <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>]</span>
<span id="cb9-2"></span>
<span id="cb9-3">scores_men <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">running_scores</span>(men_df) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Men"</span>)</span>
<span id="cb9-5"></span>
<span id="cb9-6">scores_women <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">running_scores</span>(women_df) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">division =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Women"</span>)</span>
<span id="cb9-8"></span>
<span id="cb9-9">scores_combined <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(scores_men, scores_women) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> top_5)</span>
<span id="cb9-11">    </span>
<span id="cb9-12">max_athlete <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> scores_combined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(athlete) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-14">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">score =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(cum_points))</span>
<span id="cb9-15"></span>
<span id="cb9-16">scores_combined <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> scores_combined <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb9-17">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(max_athlete, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"athlete"</span>)</span>
<span id="cb9-18"></span>
<span id="cb9-19"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(scores_combined, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> event, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> cum_points, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> athlete)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-20">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-21">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-22">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_replace_all</span>(athlete, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^(.*) (.*)"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">\\</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">2"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> score), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">14.5</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-23">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(division)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-24">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">limits =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">19</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">breaks =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">14</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-25">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb9-26">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Point Trajectories for Top 5 Athletes"</span>,</span>
<span id="cb9-27">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event #"</span>,</span>
<span id="cb9-28">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Points"</span></span>
<span id="cb9-29">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-30">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb9-31">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.minor.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb9-32">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb9-33">    )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/crossfit-games-2022/index_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid" width="768"></p>
</div>
</div>
</section>
<section id="laura-horvaths-comeback" class="level2">
<h2 class="anchored" data-anchor-id="laura-horvaths-comeback">Laura Horvath’s Comeback</h2>
<p>And the last thing I’ll include here is a visualization of Laura Horvath’s huge comeback. I’m a major LH fan, and even though I had a moment of doubt after the HSPU event, I was stoked to see her come back and take 3rd after absolutely crushing it during Sunday’s events.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">scores <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">running_scores</span>(women_df)</span>
<span id="cb10-2"></span>
<span id="cb10-3">lh_scores <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> scores[scores<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>athlete <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Laura Horvath"</span>, ] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|&gt;</span></span>
<span id="cb10-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(event, cum_points, athlete)</span>
<span id="cb10-5"></span>
<span id="cb10-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># function to get 3rd place for a given event</span></span>
<span id="cb10-7">third_place <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x, event) {</span>
<span id="cb10-8">    tmp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">event_leaderboard</span>(x, event)</span>
<span id="cb10-9"></span>
<span id="cb10-10">    res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> tmp[tmp<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>place <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"event"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cum_points"</span>)]</span>
<span id="cb10-11"></span>
<span id="cb10-12">    res<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>athlete <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"third"</span></span>
<span id="cb10-13"></span>
<span id="cb10-14">    res</span>
<span id="cb10-15">}</span>
<span id="cb10-16"></span>
<span id="cb10-17">events <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(scores<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>event)</span>
<span id="cb10-18"></span>
<span id="cb10-19">thirds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dfr</span>(events, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">third_place</span>(women_df, .x))</span>
<span id="cb10-20"></span>
<span id="cb10-21">x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bind_rows</span>(lh_scores, thirds)</span>
<span id="cb10-22"></span>
<span id="cb10-23">hungary_green <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#436F4D"</span></span>
<span id="cb10-24">bronze <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"#CD7F32"</span></span>
<span id="cb10-25"></span>
<span id="cb10-26"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(x, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> event, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> cum_points, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> athlete)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-27">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-28">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">shape =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-29">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> cum_points)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-30">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_manual</span>(</span>
<span id="cb10-31">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(hungary_green, bronze)</span>
<span id="cb10-32">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-33">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb10-34">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"&lt;span style='color:{hungary_green}'&gt;Laura Horvath's&lt;/span&gt; Charge to the &lt;span style='color:{bronze}'&gt;Podium&lt;/span&gt;"</span>),</span>
<span id="cb10-35">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Total Points"</span>,</span>
<span id="cb10-36">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Event #"</span>,</span>
<span id="cb10-37">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">subtitle =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glue</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"The &lt;span style='color:{bronze}'&gt;bronze line&lt;/span&gt; represents 3rd place after any given event"</span>)</span>
<span id="cb10-38">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-39">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ee</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">12</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-40">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb10-41">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>,</span>
<span id="cb10-42">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.minor =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb10-43">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.minor.x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb10-44">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">panel.grid.minor.y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_blank</span>(),</span>
<span id="cb10-45">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">plot.title =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">element_markdown</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">16</span>)</span>
<span id="cb10-46">    )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/crossfit-games-2022/index_files/figure-html/unnamed-chunk-9-1.png" class="img-fluid" width="768"></p>
</div>
</div>
<p>That’s all for now – maybe I’ll do another once the Rogue Invitational rolls around in October. And who knows, maybe Tia will hang up her shoes and we’ll have a new top dog in the women’s division.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric},
  title = {Crossfit {Games} {Analysis}},
  date = {2022-08-12},
  url = {https://www.ericekholm.com/posts/crossfit-games-2022},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2022. <span>“Crossfit Games Analysis.”</span> August 12,
2022. <a href="https://www.ericekholm.com/posts/crossfit-games-2022">https://www.ericekholm.com/posts/crossfit-games-2022</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>Crossfit</category>
  <category>EDA</category>
  <guid>https://www.ericekholm.com/posts/crossfit-games-2022/index.html</guid>
  <pubDate>Fri, 12 Aug 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Modifying the Default Quarto Blog Structure</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/demo-quarto-site/index.html</link>
  <description><![CDATA[ 




<p>So, <a href="https://quarto.org">Quarto</a> is pretty great. I just finished migrating my own personal website (which you’re already looking at, so I won’t link to it) to use Quarto (rather than <a href="https://rstudio.github.io/distill/">{distill}</a>), and I’m liking it a lot so far. I’m particularly excited about Quarto’s built-in support for multiple languages – I’m starting to learn Julia, and so I’m going to be doing some “learning out loud” (a la Jesse Mostipak) and blogging about my Julia journey.</p>
<p>This is a bit of a quibble, but I’m not a huge fan of the default Quarto blog structure – not necessarily the styling/theming or anything like that, but more so the fact that it sets your blog posts as the index (home page) of the site, whereas I’d prefer a more generic landing page with a little bit about me and some social media links, etc. Which the default Quarto blog does include as an <code>about</code> page! So the point of this post is to show you how to modify the default Quarto layout so that:</p>
<ul>
<li>your website index (home page, <code>sitename.com</code>) is a brief “summary” of yourself;</li>
<li>your blog posts are listed at <code>sitename.com/blog</code>; and</li>
<li>you have a more extensive about page at <code>sitename.com/about</code></li>
</ul>
<p>Beyond that, I’m not going to go into how to change the styling, how to publish, or anything like that, because there are already much better tutorials out there (<a href="https://blog.djnavarro.net/posts/2022-04-20_porting-to-quarto/">Danielle Navarro’s blog</a> is great if you’re trying to migrate from distill to Quarto).</p>
<section id="step-0-install-quarto" class="level1">
<h1>Step 0: Install Quarto</h1>
<p>Maybe this is obvious, but you’ll need Quarto installed if you want to use it. You can download it <a href="https://quarto.org/docs/get-started/">here</a>.</p>
</section>
<section id="step-1-make-a-site" class="level1">
<h1>Step 1: Make a site</h1>
<p>I prefer to do this from the command line, but I think you can do it from RStudio as well (I use VSCode, so I’m not 100% up-to-date on all of the RStudio IDE features).</p>
<p>You can do this from the command line by changing your working directory to wherever you want your site’s folder to live, then creating a default site via:</p>
<pre><code>quarto create-project PROJECT_NAME --type website:blog</code></pre>
<p>This will create a generic sample site with the following file structure:</p>
<p><img src="https://www.ericekholm.com/posts/demo-quarto-site/img/default_tree.png" class="img-fluid"></p>
<p>(n.b.&nbsp;that you may not have the README file in your directory)</p>
</section>
<section id="step-2-render-the-site" class="level1">
<h1>Step 2: Render the site</h1>
<p>This is sort of optional, but if you want to get a sense of what the default site looks like, you can render all of the files in it. From the command line, set your current directory to the site folder (that you just created), then run</p>
<pre><code>quarto render</code></pre>
<p>and you should get some notifications that your posts are rendering. Once they’re done, you will see a <code>_site</code> folder. This has all of your files rendered inside of it, and if you open <code>index.html</code>, you can navigate to your site’s home page, which (by default) is the blog listing. This is something we’re going to change. You’ll also notice the <code>about.html</code> page, which is what we actually want to make our index.</p>
</section>
<section id="step-3-change-some-file-names" class="level1">
<h1>Step 3: Change some file names</h1>
<p>In your root directory (i.e.&nbsp;whatever you set PROJECT_NAME to earlier; not <code>_site</code>), we want to do the following:</p>
<ul>
<li>change the file name of <code>index.qmd</code> to <code>blog.qmd</code></li>
<li>change the file name of <code>about.qmd</code> to <code>index.qmd</code></li>
</ul>
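<p>If you’d rather do the renaming from the command line, here’s a minimal sketch. The <code>rename_pages</code> helper is a hypothetical name I’m introducing for illustration (it isn’t part of Quarto), and it assumes a POSIX-style shell and that you pass it the path to your project root:</p>
<pre><code># Hypothetical helper (not a Quarto command): swap the default pages.
# Pass the project root, i.e. the folder Quarto created for you.
rename_pages() {
    project_dir="${1:-.}"
    # Rename the listing page first so the name index.qmd is free to reuse.
    mv "$project_dir/index.qmd" "$project_dir/blog.qmd" || return 1
    # The old about page becomes the new home page.
    mv "$project_dir/about.qmd" "$project_dir/index.qmd"
}</code></pre>
<p>You’d call this as <code>rename_pages PROJECT_NAME</code>. The order matters: if you renamed <code>about.qmd</code> first, it would overwrite the blog listing.</p>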
<p>You’ll also want to open up the <code>blog.qmd</code> file and change the title (inside the YAML header) to “Blog”. The header should now look like this:</p>
<p><img src="https://www.ericekholm.com/posts/demo-quarto-site/img/blog_yaml.png" class="img-fluid"></p>
<p>Likewise, you’ll want to change the title of <code>index.qmd</code> (which was <code>about.qmd</code>) from “About” to something else – maybe your name or the name of the site. So it’ll look something like this:</p>
<p><img src="https://www.ericekholm.com/posts/demo-quarto-site/img/index_yaml.png" class="img-fluid"></p>
<p>You’ll also obviously want to change the image, include your own social media links, etc., but we won’t cover that here.</p>
</section>
<section id="step-4-make-a-new-about-page" class="level1">
<h1>Step 4: Make a new about page</h1>
<p>I like having the home page (index) be like a brief summary of me, and then I like having a more detailed “About” page just in case people are interested in reading more. We just set the index to be that brief intro or whatever you want to call it, so now we need to create a new about page.</p>
<p>In your root folder (PROJECT_NAME), create a new file called <code>about.qmd</code>.</p>
<p>And then within that you can add in whatever content you want – a brief bio, some pictures, whatever. The only thing I include at the outset (besides the content) is a YAML header that looks like this:</p>
<pre><code>---
title: "About Me"
---</code></pre>
</section>
<section id="step-5-modify-your-_quarto.yml-file" class="level1">
<h1>Step 5: Modify your _quarto.yml file</h1>
<p>Now we need to modify the <code>_quarto.yml</code> file, which is in the root directory (PROJECT_NAME). This file provides Quarto with some “big picture” instructions on the overall layout and styling of your site.</p>
<p>Basically we need to do 2 things in this file:</p>
<ol type="1">
<li>add <code>blog.qmd</code> to the <code>navbar</code> menu; and</li>
<li>add your site url.</li>
</ol>
<p>You also might want to change your website title. Either way, after you’re done, your <code>_quarto.yml</code> file should look something like this:</p>
<p><img src="https://www.ericekholm.com/posts/demo-quarto-site/img/quarto_yml.png" class="img-fluid"></p>
</section>
<section id="step-6-render-your-site-again" class="level1">
<h1>Step 6: Render your site (again)</h1>
<p>Now we’re ready to render again! Before you render, make sure you save all of your files. As before, we can render the site from the command line via:</p>
<pre><code>quarto render</code></pre>
<p>And your files should update in the <code>_site</code> directory. And that’s pretty much it. You can explore the files in your <code>_site</code> directory and see that our “postcard” is now the index/home page, and your blog and about me pages are accessible via the navbar.</p>
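<p>If you want a quick sanity check before clicking through the files, something like the sketch below should do it. The <code>check_site</code> helper is a hypothetical name (it isn’t a Quarto command), and it assumes that Quarto writes each top-level <code>.qmd</code> file to a matching <code>.html</code> file in the output folder:</p>
<pre><code># Hypothetical helper (not a Quarto command): confirm the three pages rendered.
# Takes the output folder as its argument and defaults to _site.
check_site() {
    site_dir="${1:-_site}"
    for page in index.html blog.html about.html; do
        if [ -f "$site_dir/$page" ]; then
            echo "found $page"
        else
            # Stop at the first page that is missing.
            echo "missing $page"
            return 1
        fi
    done
}</code></pre>
<p>Running <code>check_site</code> from your project root after <code>quarto render</code> should report all three pages. If one is missing, the usual culprit is a file name that didn’t get changed in Step 3.</p>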
</section>
<section id="step-7-customize-and-deploy" class="level1">
<h1>Step 7: Customize and deploy</h1>
<p>I’m not going to cover these things, but now that we have our website set up the way we want it, the next steps are to customize the style/theming, add content, include your own social media links, etc. And then finally deploy your site! <a href="https://quarto.org/docs/websites/website-blog.html">The main Quarto site has some great resources on how to do all of this</a>.</p>
</section>
<section id="wrapping-up" class="level1">
<h1>Wrapping Up</h1>
<p>You can see this demo site (I didn’t customize anything beyond what we just walked through) <a href="https://demo-quarto-site.netlify.app/">here</a>, and you can see the site’s GitHub repo <a href="https://github.com/ekholme/demo_quarto_site">here</a>.</p>
<p>Hope this helps some folks! Happy Quarto-ing!</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric},
  title = {Modifying the {Default} {Quarto} {Blog} {Structure}},
  date = {2022-07-22},
  url = {https://www.ericekholm.com/posts/demo-quarto-site},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2022. <span>“Modifying the Default Quarto Blog
Structure.”</span> July 22, 2022. <a href="https://www.ericekholm.com/posts/demo-quarto-site">https://www.ericekholm.com/posts/demo-quarto-site</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>Quarto</category>
  <category>Tutorial</category>
  <category>website</category>
  <guid>https://www.ericekholm.com/posts/demo-quarto-site/index.html</guid>
  <pubDate>Fri, 22 Jul 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Function Writing Metacognition</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/function-writing-metacognition/index.html</link>
  <description><![CDATA[ 




<p>If you’re a sane and respectable person, you keep percentages in your data formatted as decimals (e.g.&nbsp;50% as .5, 71% as .71). However, you may also find that you need to present these numbers in reports/visuals/tables in a more readable way (i.e.&nbsp;as “50%”). If you’re like me, you’ll often find yourself creating additional columns that take your (correct) percent-as-decimal columns and turn them into strings. The other day, I realized that I was doing this a lot, and so I wrote a very simple function to handle this for me, which I added to my personal/miscellaneous R package, <a href="https://github.com/ekholme/eemisc"><code>{eemisc}</code></a>. Here’s that function:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">pct_to_string <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x) {</span>
<span id="cb1-2">   <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>)</span>
<span id="cb1-3">}</span></code></pre></div>
</div>
<p>Very simple, very straightforward, but it’ll save me a little bit of typing. One common (for me) use case is:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb2-2"></span>
<span id="cb2-3">tmp <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb2-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">grp =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"b"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"c"</span>),</span>
<span id="cb2-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pct =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">41</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">52</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">63</span>),</span>
<span id="cb2-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">txt =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pct_to_string</span>(pct)</span>
<span id="cb2-7">)</span>
<span id="cb2-8"></span>
<span id="cb2-9"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(tmp, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> pct, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> grp)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"steelblue"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> txt, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> pct <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">01</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">5</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_x_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">percent_format</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-13">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/function-writing-metacognition/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Which directly prints our string-formatted percent on the bar.</p>
<p>While writing this function, though, I made a few tweaks to the “base” version above, and I thought this would be a decent opportunity to write a metacognitive reflection on the process of developing this function. Hopefully this is helpful for people just starting out with writing R functions.</p>
<section id="base-function" class="level2">
<h2 class="anchored" data-anchor-id="base-function">Base Function</h2>
<p>Right, so, the point of this function is to take a percent (as a decimal) and turn it into a string. The base version of that function is the same as we presented above:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">pct_to_string <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x) {</span>
<span id="cb3-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>)</span>
<span id="cb3-3">}</span></code></pre></div>
</div>
<p>And there’s nothing necessarily wrong with this. It works like we’d expect it to:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pct_to_string</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "10%"</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pct_to_string</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">111</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "11.1%"</code></pre>
</div>
</div>
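<p>One side effect worth knowing about: because <code>round()</code> returns a number, trailing zeros are dropped when the result is pasted, which is why <code>.1</code> comes back as “10%” rather than “10.0%”. If you’d rather always show a fixed number of decimal places, something like the sketch below using <code>sprintf()</code> is one option. The name <code>pct_to_string_fixed()</code> is made up for illustration and isn’t part of the function developed in this post.</p>

```r
# hypothetical variant (not the function from this post):
# always show exactly one decimal place
pct_to_string_fixed <- function(x) {
    # "%.1f" formats with one decimal place; "%%" is a literal percent sign
    sprintf("%.1f%%", 100 * x)
}

pct_to_string_fixed(c(.1, .111))
```

<p>This should return “10.0%” and “11.1%”. Whether the trailing zero helps or hurts is a matter of taste.</p>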
</section>
<section id="digits-argument" class="level2">
<h2 class="anchored" data-anchor-id="digits-argument">Digits Argument</h2>
<p>The vast majority of the time, I’ll want to present these string percentages with a single decimal place (e.g.&nbsp;“11.1%”). It’s pretty rare – at least in the contexts I work in – for the hundredths place in a percentage to matter much, and including it detracts more than it helps. If two “groups” score 11.12% and 11.14%, these are functionally identical in my mind.</p>
<p>That said, there may be some cases where I do want to include the hundredths place. More likely, there may be cases where I don’t want to include any decimal places. In the base version of <code>pct_to_string()</code>, I hardcoded the function to provide 1 decimal place (<code>digits = 1</code>). But to allow for some flexibility, I want to make <code>digits</code> an argument. Since I’ll be setting it to <code>1</code> 99.9% (see what I did there) of the time, I’ll just set 1 as the default. So now our function looks like this:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">pct_to_string <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) {</span>
<span id="cb8-2">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> digits), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>)</span>
<span id="cb8-3">}</span></code></pre></div>
</div>
<p>And we can see this in action like so:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#not specifying a digits argument will use the default of 1</span></span>
<span id="cb9-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pct_to_string</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1111</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "11.1%"</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#but we can specify a different number of digits if we want</span></span>
<span id="cb11-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pct_to_string</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1111</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "11.11%"</code></pre>
</div>
</div>
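<p>The “no decimal places” case mentioned earlier is just <code>digits = 0</code>. Here’s a small sketch of that, with the function repeated so the chunk runs on its own. The input values are made up.</p>

```r
# same function as above, repeated so this chunk is self-contained
pct_to_string <- function(x, digits = 1) {
    paste0(round(100 * x, digits = digits), "%")
}

# digits = 0 rounds to whole percentages; the function is vectorized,
# so it works on several values at once
pct_to_string(c(.1111, .5, .876), digits = 0)
```

<p>That should give “11%”, “50%”, and “88%”.</p>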
</section>
<section id="checking-bounds" class="level2">
<h2 class="anchored" data-anchor-id="checking-bounds">Checking Bounds</h2>
<p>Most often, the percent data I’m working with is bounded between 0 and 1 (0% - 100%). For instance, if I’m looking at the pass rates for standardized tests, I’m doing something wrong if I have a number greater than 1 or less than 0.</p>
<p>Another note is that, although I’m pretty consistent (insistent? both?) about formatting my percents as decimals, I sometimes pull data from sources where this isn’t the case, and it comes in as, e.g., 80.5 (rather than .805). The Virginia Department of Education tends to format their data this way.</p>
<p>Given both of these tidbits, I want to add an argument to <code>pct_to_string()</code> that checks if values of <code>x</code> are between 0 and 1. In my case, this is mostly to help catch mistakes before I make them. For instance, I want it to stop me if I try to multiply <code>80.5 * 100</code> because I didn’t realize the input was 80.5 and not .805. Additionally, because it’s so common that my percents are between 0 and 1, I want it to stop me if I’m working outside of this range.</p>
<p>To accomplish this, I’ll add a <code>check_bounds</code> argument to <code>pct_to_string()</code>. I want this to be a logical argument that, if set to <code>TRUE</code>, will stop the function from running if any values of x are less than 0 or greater than 1.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">pct_to_string <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">check_bounds =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) {</span>
<span id="cb13-2"></span>
<span id="cb13-3">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (check_bounds <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) {</span>
<span id="cb13-4">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stop</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"all elements of `x` must be between 0 and 1. If you are intentionally using a percentage outside of these bounds, set `check_bounds = FALSE`"</span>)</span>
<span id="cb13-5">    }</span>
<span id="cb13-6"></span>
<span id="cb13-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> digits), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>)</span>
<span id="cb13-8">}</span></code></pre></div>
</div>
<p>So let’s see how this works, now:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">a <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb14-2"></span>
<span id="cb14-3"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#this should work</span></span>
<span id="cb14-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pct_to_string</span>(a)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "0%"   "10%"  "20%"  "30%"  "40%"  "50%"  "60%"  "70%"  "80%"  "90%" 
[11] "100%"</code></pre>
</div>
</div>
<p>Note that the below will throw an error, so I’m going to capture it using <code>safely()</code> from the <code>{purrr}</code> package and then return the error:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">seq</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.1</span>, .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb16-2"></span>
<span id="cb16-3">safe_pts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safely</span>(pct_to_string)</span>
<span id="cb16-4"></span>
<span id="cb16-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_pts</span>(b)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>error</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;simpleError in .f(...): all elements of `x` must be between 0 and 1. If you are intentionally using a percentage outside of these bounds, set `check_bounds = FALSE`&gt;</code></pre>
</div>
</div>
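<p>For the other situation mentioned earlier, where a source reports percentages on a 0-100 scale, the bounds check catches the problem and the fix is to rescale before formatting. A minimal sketch, where <code>vdoe_pct</code> is made-up data rather than anything pulled from a real source:</p>

```r
# function from above, repeated so this chunk is self-contained
pct_to_string <- function(x, digits = 1, check_bounds = TRUE) {
    if (check_bounds == TRUE & !(min(x) >= 0 & max(x) <= 1)) {
        stop("all elements of `x` must be between 0 and 1. If you are intentionally using a percentage outside of these bounds, set `check_bounds = FALSE`")
    }
    paste0(round(100 * x, digits = digits), "%")
}

# made-up pass rates reported on a 0-100 scale
vdoe_pct <- c(80.5, 76.2, 91)

# pct_to_string(vdoe_pct) would stop with the bounds error,
# so convert to decimals first
pct_to_string(vdoe_pct / 100)
```

<p>Setting <code>check_bounds = FALSE</code> isn’t the right move here, since that would happily format 80.5 as “8050%”.</p>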
<p>One note is that you could write this function to throw a warning or a message rather than an error depending on your needs. For my personal use cases, I think it makes more sense to throw an error rather than a warning, but your mileage may vary.</p>
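<p>As a rough sketch of what the warning version might look like (the name <code>pct_to_string_warn()</code> and the wording of the warning are made up for illustration):</p>

```r
# hypothetical variant that warns instead of stopping
pct_to_string_warn <- function(x, digits = 1, check_bounds = TRUE) {
    if (check_bounds && !(min(x) >= 0 && max(x) <= 1)) {
        warning("some elements of `x` are outside of 0 and 1", call. = FALSE)
    }
    paste0(round(100 * x, digits = digits), "%")
}

# capture the warning message to show what gets raised
tryCatch(
    pct_to_string_warn(c(.5, 1.1)),
    warning = function(w) conditionMessage(w)
)
```

<p>Called without the <code>tryCatch()</code>, this version still returns the formatted strings (“50%” and “110%” here) and raises the warning alongside them, which is the main practical difference from <code>stop()</code>.</p>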
</section>
<section id="input-checks" class="level2">
<h2 class="anchored" data-anchor-id="input-checks">Input Checks</h2>
<p>A final thing I want to do is add a few statements to check that the input values I’m providing are valid. If this is a function that really is just for me, I might not do this (mostly out of laziness), but I’m also going to add it to a package that other people at my work use, so I think it makes sense to include these.</p>
<p>Basically, these will just throw an error if you try to pass an invalid value to one of the function arguments. Like with the <code>check_bounds</code> piece earlier, this entails using an <code>if</code> statement to evaluate some parameter values, and then, if these are <code>TRUE</code>, to stop the function and instead return an error message. I also want to make sure that the error messages are actually helpful. We can add these like so:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1">pct_to_string <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">check_bounds =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>) {</span>
<span id="cb18-2">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.numeric</span>(x)) {</span>
<span id="cb18-3">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stop</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"`x` must be numeric"</span>)</span>
<span id="cb18-4">    }</span>
<span id="cb18-5"></span>
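<span id="cb18-6">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.numeric</span>(digits) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">||</span> digits <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">||</span> digits <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) {</span>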
<span id="cb18-7">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stop</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"`digits` must be a non-negative integer"</span>)</span>
<span id="cb18-8">    }</span>
<span id="cb18-9"></span>
<span id="cb18-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.logical</span>(check_bounds)) {</span>
<span id="cb18-11">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stop</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"`check_bounds` must be TRUE or FALSE"</span>)</span>
<span id="cb18-12">    }</span>
<span id="cb18-13"></span>
<span id="cb18-14">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (check_bounds <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">min</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&amp;</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)) {</span>
<span id="cb18-15">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stop</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"all elements of `x` must be between 0 and 1. If you are intentionally using a percentage outside of these bounds, set `check_bounds = FALSE`"</span>)</span>
<span id="cb18-16">    }</span>
<span id="cb18-17"></span>
<span id="cb18-18">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">paste0</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> digits), <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"%"</span>)</span>
<span id="cb18-19">}</span>
<span id="cb18-20"></span>
<span id="cb18-21"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># and wrapping this with safely() again to show errors</span></span>
<span id="cb18-22">safe_pts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safely</span>(pct_to_string)</span></code></pre></div>
</div>
<p>And we can see what happens if we pass in invalid values:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_pts</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>error</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;simpleError in .f(...): `digits` must be a non-negative integer&gt;</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_pts</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>error</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;simpleError in .f(...): `x` must be numeric&gt;</code></pre>
</div>
</div>
<p>etc. etc.</p>
<p>Technically, you don’t need some of these checks. If you try to pass a non-numeric value (<code>x</code>) to <code>round()</code>, you’ll get an error. Likewise if you give it an invalid value for its <code>digits</code> argument:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># first creating a safe function to catch error</span></span>
<span id="cb23-2">safe_round <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safely</span>(round)</span>
<span id="cb23-3"></span>
<span id="cb23-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_round</span>(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>error</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;simpleError in .Primitive("round")(x, digits): non-numeric argument to mathematical function&gt;</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_round</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"a"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">digits =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>error</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>&lt;simpleError in .Primitive("round")(x, digits): non-numeric argument to mathematical function&gt;</code></pre>
</div>
</div>
<p>But these error messages aren’t quite as helpful as the ones we’ve written.</p>
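<p>If writing out each <code>if</code>/<code>stop()</code> pair feels verbose, one alternative is base R’s <code>stopifnot()</code>, which (in R 4.0.0 and later) uses the name of each argument as the error message. The sketch below is just one way to do it. The name <code>pct_to_string2()</code> is made up for illustration, and the <code>digits</code> check uses <code>is.numeric()</code> plus a whole-number test so that a plain <code>1</code> or <code>0</code> passes, which is my assumption about the intent.</p>

```r
# hypothetical rewrite of the input checks using stopifnot()
# (named arguments become the error messages in R >= 4.0.0)
pct_to_string2 <- function(x, digits = 1, check_bounds = TRUE) {
    stopifnot(
        "`x` must be numeric" = is.numeric(x),
        "`digits` must be a non-negative integer" =
            is.numeric(digits) && digits >= 0 && digits %% 1 == 0,
        "`check_bounds` must be TRUE or FALSE" = is.logical(check_bounds)
    )

    if (check_bounds && !(min(x) >= 0 && max(x) <= 1)) {
        stop("all elements of `x` must be between 0 and 1. If you are intentionally using a percentage outside of these bounds, set `check_bounds = FALSE`")
    }

    paste0(round(100 * x, digits = digits), "%")
}

# valid input works as before
pct_to_string2(.1111)

# invalid input surfaces the custom message
tryCatch(
    pct_to_string2(.1, digits = "a"),
    error = function(e) conditionMessage(e)
)
```

<p>The checks run in order, so the first one that fails determines the message you see.</p>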
</section>
<section id="wrapping-up" class="level2">
<h2 class="anchored" data-anchor-id="wrapping-up">Wrapping Up</h2>
<p>That’s pretty much it – my thought process for creating a pretty simple function to convert percents to strings, as well as how I would build out this function. Hopefully this metacognitive activity was useful for people who are just starting out writing their own R functions!</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric},
  title = {Function {Writing} {Metacognition}},
  date = {2022-05-31},
  url = {https://www.ericekholm.com/posts/function-writing-metacognition},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2022. <span>“Function Writing Metacognition.”</span> May
31, 2022. <a href="https://www.ericekholm.com/posts/function-writing-metacognition">https://www.ericekholm.com/posts/function-writing-metacognition</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>programming</category>
  <category>metacognition</category>
  <guid>https://www.ericekholm.com/posts/function-writing-metacognition/index.html</guid>
  <pubDate>Tue, 31 May 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Combining pmap and do.call</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/combining-pmap-and-docall/index.html</link>
  <description><![CDATA[ 




<p>The point of this blog post is to walk through a pattern I’ve started using in some of my analyses that combines <code>do.call()</code>, <code>purrr::pmap()</code>, and some wrapper functions to customize how a given analysis gets run. I’ll start by demonstrating <code>do.call()</code> and <code>pmap()</code> separately, then show how you can use them together to do some cool things. I’m not going to go super in-depth on either <code>do.call()</code> or <code>pmap()</code>, so it might be worthwhile to look into some of the documentation for those functions separately.</p>
<p>Also – I’m going to use the <a href="https://allisonhorst.github.io/palmerpenguins/"><code>{palmerpenguins}</code></a> data here to illustrate this workflow. And, like, as is typically the case with toy data, the point here isn’t to run a suite of analyses that answer meaningful questions about this data, but rather to demonstrate how to combine these functions in a way that could help you answer meaningful questions for your own data.</p>
<p>With all of that said, onward and upward!</p>
<section id="setup" class="level1">
<h1>Setup</h1>
<p>To start, let’s load the packages we’ll need.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">knitr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span>opts_chunk<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">echo =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">warning =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">message =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)</span>
<span id="cb1-2"></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins)</span></code></pre></div>
</div>
<p>Let’s also take a quick peeksie at the penguins data, although the content of the data isn’t terribly important here.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(penguins)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 344
Columns: 8
$ species           &lt;fct&gt; Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            &lt;fct&gt; Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    &lt;dbl&gt; 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     &lt;dbl&gt; 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm &lt;int&gt; 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       &lt;int&gt; 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               &lt;fct&gt; male, female, female, NA, female, male, female, male…
$ year              &lt;int&gt; 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…</code></pre>
</div>
</div>
<p>Cool cool. Now, let’s assume we want to analyze this penguins data. Let’s say we want to estimate a mean, a correlation coefficient, and fit a linear regression, and that this is our workflow (n.b.&nbsp;again that this probably shouldn’t be your <em>actual</em> workflow when you analyze data).</p>
<p>Let’s say we want to get the mean body mass – this is easy for us.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 4201.754</code></pre>
</div>
</div>
<p>Another way we can do the exact same thing is with <code>do.call()</code>. <code>do.call()</code> has a “what” argument, to which you provide the function you want to call (or the character string name of the function), and an “args” argument, where you list the arguments to pass to “what”. It has some other arguments, too, but I’m going to ignore those here. So, the call below does the exact same thing we did previously:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">what =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 4201.754</code></pre>
</div>
</div>
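<p>As a small aside on the “what” argument: here’s a minimal sketch showing that a function object works just as well as the function’s name in quotes. The vector <code>toy_x</code> is made up for illustration and isn’t part of the penguins data.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># toy data, invented for this example
toy_x &lt;- c(1, 2, NA)

# pass the function's name as a character string...
by_name &lt;- do.call(what = "mean", args = list(toy_x, na.rm = TRUE))

# ...or pass the function object itself
by_object &lt;- do.call(what = mean, args = list(toy_x, na.rm = TRUE))

# both calls are equivalent to mean(toy_x, na.rm = TRUE)
identical(by_name, by_object)</code></pre></div>
</div>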
<p>The nice thing about <code>do.call()</code> is that it’s very flexible. Say we wanted to run a correlation between body mass and bill depth. We can do this by directly calling the <code>cor()</code> function:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># option 1:</span></span>
<span id="cb8-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(</span>
<span id="cb8-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g,</span>
<span id="cb8-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bill_depth_mm, </span>
<span id="cb8-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">use =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pairwise.complete.obs"</span></span>
<span id="cb8-6">)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -0.4719156</code></pre>
</div>
</div>
<p>Or we can do the exact same thing via <code>do.call()</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># option 2</span></span>
<span id="cb10-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>,</span>
<span id="cb10-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb10-4">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g,</span>
<span id="cb10-5">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bill_depth_mm, </span>
<span id="cb10-6">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">use =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pairwise.complete.obs"</span></span>
<span id="cb10-7">    )</span>
<span id="cb10-8">)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -0.4719156</code></pre>
</div>
</div>
<p>Or say we wanted to run a linear regression with body mass regressed on bill depth and sex. Again, we can call <code>lm()</code> directly:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb12" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb12-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># option 1:</span></span>
<span id="cb12-2">res1 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(body_mass_g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> bill_depth_mm <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sex, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> penguins, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.action =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"na.omit"</span>)</span>
<span id="cb12-3"></span>
<span id="cb12-4">broom<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(res1)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      &lt;dbl&gt;         &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0.642         0.640  483.      296. 2.34e-74     2 -2529. 5066. 5081.
# … with 3 more variables: deviance &lt;dbl&gt;, df.residual &lt;int&gt;, nobs &lt;int&gt;</code></pre>
</div>
</div>
<p>Or via <code>do.call()</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#option 2</span></span>
<span id="cb14-2">res2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">args =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb14-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> body_mass_g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> bill_depth_mm <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sex,</span>
<span id="cb14-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> penguins,</span>
<span id="cb14-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.action =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"na.omit"</span></span>
<span id="cb14-6">))</span>
<span id="cb14-7"></span>
<span id="cb14-8">broom<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(res2)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      &lt;dbl&gt;         &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0.642         0.640  483.      296. 2.34e-74     2 -2529. 5066. 5081.
# … with 3 more variables: deviance &lt;dbl&gt;, df.residual &lt;int&gt;, nobs &lt;int&gt;</code></pre>
</div>
</div>
<section id="combining-with-purrrpmap" class="level2">
<h2 class="anchored" data-anchor-id="combining-with-purrrpmap">Combining with purrr::pmap()</h2>
<p>Just based on the above, <code>do.call()</code> isn’t really doing anything useful for us. It’s just a slightly more verbose way to call a function. But where <code>do.call()</code> really shines is when you pair it with some iteration – which we’ll do now, via <code>purrr::pmap()</code> – and/or some conditional logic (which we’ll add later via a wrapper function). Basically it shines when you program with it, is what I’m trying to say.</p>
<p>For those who don’t know, <code>purrr::pmap()</code> extends <code>purrr::map()</code> to allow for an arbitrary number of arguments to map over in parallel. If you’re not familiar with <code>purrr::map()</code>, <a href="https://r4ds.had.co.nz/iteration.html">Hadley’s R for Data Science book</a> has a good chapter on it. But anyway, let’s illustrate <code>pmap()</code> by running a handful of correlations on some sample data:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#generate data</span></span>
<span id="cb16-2">a <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb16-3">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb16-4">d <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rnorm</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span>)</span>
<span id="cb16-5"></span>
<span id="cb16-6"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#put data into a list</span></span>
<span id="cb16-7">sample_args <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb16-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(a, b, d),</span>
<span id="cb16-9">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(b, d, a)</span>
<span id="cb16-10">)</span></code></pre></div>
</div>
<p>This gives us a list of x and y values, where the first element of <code>x</code> is <code>a</code>, the first element of <code>y</code> is <code>b</code>, etc etc. We can run a bunch of correlations – <code>x[[1]]</code> with <code>y[[1]]</code>, <code>x[[2]]</code> with <code>y[[2]]</code> etc – by using <code>pmap()</code> and <code>cor()</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmap</span>(sample_args, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">cor</span>(..<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, ..<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">use =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pairwise.complete.obs"</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[1]]
[1] 0.0467708

[[2]]
[1] 0.1479934

[[3]]
[1] -0.07458596</code></pre>
</div>
</div>
<p>Which can be a helpful pattern.</p>
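<p>Since the elements of <code>sample_args</code> are named <code>x</code> and <code>y</code>, which happen to match the first two argument names of <code>cor()</code>, another option is to hand <code>cor</code> to <code>pmap_dbl()</code> directly and let the names do the matching. Here’s a minimal sketch of that idea, using made-up toy vectors (<code>toy_args</code>) rather than the random data above so the result is predictable:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)

# toy inputs, invented for this example; the list names match cor()'s x and y arguments
toy_args &lt;- list(
    x = list(c(1, 2, 3, 4), c(1, 2, 3, 4)),
    y = list(c(2, 4, 6, 8), c(8, 6, 4, 2))
)

# pmap_dbl() matches list names to argument names and returns a numeric vector
toy_cors &lt;- pmap_dbl(toy_args, cor)
toy_cors</code></pre></div>
</div>
<p>The first pair is perfectly positively related and the second perfectly negatively related, so the two correlations should come out as 1 and -1 (up to floating-point error).</p>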
<p>What’s potentially more interesting, though, is that we can also use <code>pmap()</code> in conjunction with <code>do.call()</code> to not only iterate through arguments passed to a given function (like we do with <code>cor()</code> above), but to also iterate over various functions:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#create a vector of function names</span></span>
<span id="cb19-2">funs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)</span>
<span id="cb19-3"></span>
<span id="cb19-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#create a list of function arguments, where each element of the list is a list of args</span></span>
<span id="cb19-5">fun_args <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb19-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb19-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb19-8">        penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g, </span>
<span id="cb19-9">        penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bill_depth_mm, </span>
<span id="cb19-10">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">use =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pairwise.complete.obs"</span></span>
<span id="cb19-11">        ),</span>
<span id="cb19-12">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb19-13">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> body_mass_g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> bill_depth_mm <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sex,</span>
<span id="cb19-14">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> penguins,</span>
<span id="cb19-15">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.action =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"na.omit"</span></span>
<span id="cb19-16">    )</span>
<span id="cb19-17">)</span>
<span id="cb19-18"></span>
<span id="cb19-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#combine the function names and args into a tibble</span></span>
<span id="cb19-20">fun_iterator <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb19-21">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">f =</span> funs,</span>
<span id="cb19-22">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fa =</span> fun_args</span>
<span id="cb19-23">)</span>
<span id="cb19-24"></span>
<span id="cb19-25"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#take a look at the tibble</span></span>
<span id="cb19-26"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(fun_iterator)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 3
Columns: 2
$ f  &lt;chr&gt; "mean", "cor", "lm"
$ fa &lt;list&gt; [&lt;3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, 4250, 3300, 3…</code></pre>
</div>
</div>
<p>What we’re doing in the above code is:</p>
<ul>
<li>creating a character vector of function names;</li>
<li>creating a list of function arguments (where each element of the list is a list of args);</li>
<li>binding the two together in a tibble.</li>
</ul>
<p>Then, we can execute all of these functions with their corresponding arguments by combining <code>pmap()</code> and <code>do.call()</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb21" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb21-1">res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmap</span>(fun_iterator, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(..<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, ..<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span></code></pre></div>
</div>
<p>For each row of our <code>fun_iterator</code> tibble, we’re passing the value in the first column (the function name) to the first argument of <code>do.call()</code> (as denoted by <code>..1</code>), and the value in the second column (the list of arguments) to the second argument of <code>do.call()</code> (as denoted by <code>..2</code>). This will give us a list, <code>res</code>, where each element is the result of the function/argument combination in the corresponding row of <code>fun_iterator</code>.</p>
<p>To prove it worked, let’s look at the results:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#mean</span></span>
<span id="cb22-2">res[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 4201.754</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#cor</span></span>
<span id="cb24-2">res[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]]</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] -0.4719156</code></pre>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#lm</span></span>
<span id="cb26-2">broom<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(res[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]])</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      &lt;dbl&gt;         &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0.642         0.640  483.      296. 2.34e-74     2 -2529. 5066. 5081.
# … with 3 more variables: deviance &lt;dbl&gt;, df.residual &lt;int&gt;, nobs &lt;int&gt;</code></pre>
</div>
</div>
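<p>If the <code>..1</code>/<code>..2</code> shorthand feels a little opaque, the same idea can be sketched with a named function, since <code>pmap()</code> passes each column of the tibble by its column name. The example below uses a made-up <code>toy_iterator</code> tibble with simple base functions rather than the penguins analyses:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)
library(tibble)

# toy function/argument pairs, invented for this example
toy_iterator &lt;- tibble(
    f = c("sum", "max"),
    fa = list(list(1, 2, 3), list(4, 9, 2))
)

# the column names f and fa become the argument names of the function
toy_res &lt;- pmap(toy_iterator, function(f, fa) do.call(f, fa))
toy_res</code></pre></div>
</div>
<p>Here the first element of <code>toy_res</code> is the result of <code>sum(1, 2, 3)</code> and the second is the result of <code>max(4, 9, 2)</code>.</p>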
<p>In theory, you can specify an entire set of analyses ahead of time and then execute them using <code>pmap()</code> + <code>do.call()</code> if you wanted to. So let’s look at one way we might do that via a wrapper function.</p>
</section>
<section id="wrap-your-analyses" class="level2">
<h2 class="anchored" data-anchor-id="wrap-your-analyses">Wrap Your Analyses</h2>
<p>The real power of this pattern comes from writing a function that wraps all of these components and allows you to run just a subset of them. And this is how I actually use this pattern in my own work. But I’ll touch on some real-world applications after we go through the code below.</p>
<p>Let’s start by writing a wrapper function that has 1 argument, <code>include</code>, where <code>include</code> is a character vector of function names.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">analyze_penguins <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">include =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)) {</span>
<span id="cb28-2">  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#some code here</span></span>
<span id="cb28-3">}</span></code></pre></div>
</div>
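<p>One optional extra, which isn’t something the wrapper in this post does: if you want to guard against typos in <code>include</code>, base R’s <code>match.arg()</code> with <code>several.ok = TRUE</code> is one way to check the values against the defaults. The helper name <code>check_include</code> below is hypothetical and only here to illustrate the idea:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># hypothetical helper, not part of the wrapper function in this post
check_include &lt;- function(include = c("mean", "cor", "lm")) {
    # returns the matched choices, or errors if none of the values match
    match.arg(include, several.ok = TRUE)
}

# a valid subset is returned as-is
check_include("cor")

# the default keeps all three choices
check_include()

# an unknown name raises an error, captured here so the script keeps running
bad &lt;- try(check_include("median"), silent = TRUE)
inherits(bad, "try-error")</code></pre></div>
</div>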
<p>Then let’s drop all of the code that we just ran into the function:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1">analyze_penguins <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">include =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)) {</span>
<span id="cb29-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#we already ran all of this</span></span>
<span id="cb29-3">    funs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)</span>
<span id="cb29-4"></span>
<span id="cb29-5">    fun_args <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb29-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb29-7">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb29-8">            penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g,</span>
<span id="cb29-9">            penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bill_depth_mm,</span>
<span id="cb29-10">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">use =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pairwise.complete.obs"</span></span>
<span id="cb29-11">        ),</span>
<span id="cb29-12">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb29-13">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> body_mass_g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> bill_depth_mm <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sex,</span>
<span id="cb29-14">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> penguins,</span>
<span id="cb29-15">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.action =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"na.omit"</span></span>
<span id="cb29-16">        )</span>
<span id="cb29-17">    )</span>
<span id="cb29-18"></span>
<span id="cb29-19">    fun_iterator <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb29-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">f =</span> funs,</span>
<span id="cb29-21">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fa =</span> fun_args</span>
<span id="cb29-22">    )</span>
<span id="cb29-23">}</span></code></pre></div>
</div>
<p>And then we subset the <code>fun_iterator</code> tibble to keep only the functions named in the <code>include</code> argument of our wrapper function, and execute only those functions via <code>pmap()</code> + <code>do.call()</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">analyze_penguins <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">include =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)) {</span>
<span id="cb30-2">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#this is all the same as previously</span></span>
<span id="cb30-3">    funs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)</span>
<span id="cb30-4"></span>
<span id="cb30-5">    fun_args <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb30-6">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>),</span>
<span id="cb30-7">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb30-8">            penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>body_mass_g,</span>
<span id="cb30-9">            penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>bill_depth_mm,</span>
<span id="cb30-10">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">use =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pairwise.complete.obs"</span></span>
<span id="cb30-11">        ),</span>
<span id="cb30-12">        <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb30-13">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">formula =</span> body_mass_g <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> bill_depth_mm <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> sex,</span>
<span id="cb30-14">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> penguins,</span>
<span id="cb30-15">            <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.action =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"na.omit"</span></span>
<span id="cb30-16">        )</span>
<span id="cb30-17">    )</span>
<span id="cb30-18"></span>
<span id="cb30-19">    fun_iterator <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb30-20">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">f =</span> funs,</span>
<span id="cb30-21">        <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fa =</span> fun_args</span>
<span id="cb30-22">    )</span>
<span id="cb30-23"></span>
<span id="cb30-24">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># filter to only a subset of these functions that we've asked for in the wrapper args</span></span>
<span id="cb30-25">    fun_iterator <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> fun_iterator[fun_iterator<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>f <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> include, ]</span>
<span id="cb30-26">    </span>
<span id="cb30-27">    <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#execute these functions</span></span>
<span id="cb30-28">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pmap</span>(fun_iterator, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">do.call</span>(..<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, ..<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>))</span>
<span id="cb30-29">}</span></code></pre></div>
</div>
<p>So, say we just wanted the mean:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb31" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb31-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">analyze_penguins</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[1]]
[1] 4201.754</code></pre>
</div>
</div>
<p>Or just the mean and the correlation:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">analyze_penguins</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"mean"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"cor"</span>))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[1]]
[1] 4201.754

[[2]]
[1] -0.4719156</code></pre>
</div>
</div>
<p>Or just the linear model:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">broom<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glance</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">analyze_penguins</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"lm"</span>)[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]])</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic  p.value    df logLik   AIC   BIC
      &lt;dbl&gt;         &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;    &lt;dbl&gt; &lt;dbl&gt;  &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
1     0.642         0.640  483.      296. 2.34e-74     2 -2529. 5066. 5081.
# … with 3 more variables: deviance &lt;dbl&gt;, df.residual &lt;int&gt;, nobs &lt;int&gt;</code></pre>
</div>
</div>
<p>I really like this pattern for data cleaning. I have a handful of demographic variables that I regularly work with that need to be cleaned and/or recoded, and I have some helper functions I’ve written to clean/recode each of them individually. But I also have a “meta” <code>recode_demographics()</code> function that can execute any combination of my helper functions depending on what I need for a given project. You can obviously also write your wrapper function to give you more control over the arguments to each constituent function (like by allowing you to pass in a formula to <code>lm()</code>, for instance, rather than hardcoding your formula), which can make this whole approach very flexible! It can be a bit time-consuming to write a wrapper that gives you the right level of flexibility, but if you have a set of related tasks you do frequently, I think it’s worth the time to figure out.</p>
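<p>Here’s a minimal sketch of that more flexible version, where the <code>lm()</code> formula becomes an argument of the wrapper instead of being hardcoded. The names <code>analyze_penguins_flex()</code> and <code>lm_formula</code> are made up for illustration, naming the output list is my own small addition, and I’m assuming <code>penguins</code> comes from the <code>{palmerpenguins}</code> package:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)
library(palmerpenguins) # assumption: this is where `penguins` comes from

# hypothetical variant of analyze_penguins() with a user-supplied formula
analyze_penguins_flex &lt;- function(include = c("mean", "cor", "lm"),
                                  lm_formula = body_mass_g ~ bill_depth_mm + sex) {
    funs &lt;- c("mean", "cor", "lm")

    fun_args &lt;- list(
        list(penguins$body_mass_g, na.rm = TRUE),
        list(
            penguins$body_mass_g,
            penguins$bill_depth_mm,
            use = "pairwise.complete.obs"
        ),
        # the only change from before: the formula is no longer hardcoded
        list(formula = lm_formula, data = penguins, na.action = "na.omit")
    )

    fun_iterator &lt;- tibble::tibble(f = funs, fa = fun_args)

    # keep only the functions asked for in `include`
    fun_iterator &lt;- fun_iterator[fun_iterator$f %in% include, ]

    res &lt;- pmap(fun_iterator, ~do.call(..1, ..2))

    # name each result after the function that produced it
    set_names(res, fun_iterator$f)
}

flex_fit &lt;- analyze_penguins_flex("lm", lm_formula = body_mass_g ~ flipper_length_mm)$lm
coef(flex_fit)</code></pre></div>
</div>
<p>The same idea should extend to the other constituent functions (e.g.&nbsp;an argument for which columns to correlate), at the cost of a longer argument list on the wrapper.</p>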


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2022,
  author = {Ekholm, Eric},
  title = {Combining Pmap and Do.call},
  date = {2022-03-15},
  url = {https://www.ericekholm.com/posts/combining-pmap-and-docall},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2022" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2022. <span>“Combining Pmap and Do.call.”</span> March 15,
2022. <a href="https://www.ericekholm.com/posts/combining-pmap-and-docall">https://www.ericekholm.com/posts/combining-pmap-and-docall</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>programming</category>
  <category>purrr</category>
  <guid>https://www.ericekholm.com/posts/combining-pmap-and-docall/index.html</guid>
  <pubDate>Tue, 15 Mar 2022 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Fitting a Multiple Regression with Torch</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/fitting-a-multiple-regression-with-torch/index.html</link>
  <description><![CDATA[ 




<p>In this post, I want to play around with the <code>{torch}</code> package a little bit by fitting a multiple regression model “by hand” (sort of) using torch and the Adam optimizer.</p>
<p>A few warnings/disclaimers right up front:</p>
<ul>
<li>I’m using this post as a way to explore and learn how <code>{torch}</code> works. I’m by no means an expert. I’m sure there are more concise/more idiomatic ways to do these things. And if you know about them, I’d love for you to show me!</li>
<li>I don’t think fitting a multiple regression (and particularly <em>this</em> multiple regression) through <code>{torch}</code> is really worthwhile. But it felt like a way to dig into the package a little bit in a way that didn’t involve loading MNIST and following a canned tutorial.</li>
<li>There are lots of data cleaning/exploration steps I’m just flat out skipping here.</li>
</ul>
<p>All of that said, if you’re still with me, let’s dive in.</p>
<section id="loading-data" class="level1">
<h1>Loading Data</h1>
<p>For this project, I’m going to use some ultramarathon data from <a href="https://github.com/rfordatascience/tidytuesday">#TidyTuesday</a> a few weeks ago. So the first step is loading that in and setting some plot options and whatnot.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(eemisc) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for ggplot theme</span></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(harrypotter) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for colors</span></span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(janitor)</span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(torch)</span>
<span id="cb1-6"></span>
<span id="cb1-7"></span>
<span id="cb1-8">herm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-9"></span>
<span id="cb1-10">opts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(</span>
<span id="cb1-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.fill =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-12">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>),</span>
<span id="cb1-13">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>),</span>
<span id="cb1-14">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Always"</span>)</span>
<span id="cb1-15">  )</span>
<span id="cb1-16">)</span>
<span id="cb1-17"></span>
<span id="cb1-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ee</span>())</span>
<span id="cb1-19"></span>
<span id="cb1-20">ultra_rankings <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> readr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-26/ultra_rankings.csv'</span>)</span>
<span id="cb1-21">race <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> readr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-10-26/race.csv'</span>)</span></code></pre></div>
</div>
<p>Then, let’s take a little peeksie at the data. The first dataframe, <code>ultra_rankings</code>, provides data for each runner in each race.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(ultra_rankings)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 137,803
Columns: 8
$ race_year_id    &lt;dbl&gt; 68140, 68140, 68140, 68140, 68140, 68140, 68140, 68140…
$ rank            &lt;dbl&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, NA, NA, NA,…
$ runner          &lt;chr&gt; "VERHEUL Jasper", "MOULDING JON", "RICHARDSON Phill", …
$ time            &lt;chr&gt; "26H 35M 25S", "27H 0M 29S", "28H 49M 7S", "30H 53M 37…
$ age             &lt;dbl&gt; 30, 43, 38, 55, 48, 31, 55, 40, 47, 29, 48, 47, 52, 49…
$ gender          &lt;chr&gt; "M", "M", "M", "W", "W", "M", "W", "W", "M", "M", "M",…
$ nationality     &lt;chr&gt; "GBR", "GBR", "GBR", "GBR", "GBR", "GBR", "GBR", "GBR"…
$ time_in_seconds &lt;dbl&gt; 95725, 97229, 103747, 111217, 117981, 118000, 120601, …</code></pre>
</div>
</div>
<p>Let’s also peek at the <code>race</code> data, which provides data about races:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(race)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 1,207
Columns: 13
$ race_year_id   &lt;dbl&gt; 68140, 72496, 69855, 67856, 70469, 66887, 67851, 68241,…
$ event          &lt;chr&gt; "Peak District Ultras", "UTMB®", "Grand Raid des Pyréné…
$ race           &lt;chr&gt; "Millstone 100", "UTMB®", "Ultra Tour 160", "PERSENK UL…
$ city           &lt;chr&gt; "Castleton", "Chamonix", "vielle-Aure", "Asenovgrad", "…
$ country        &lt;chr&gt; "United Kingdom", "France", "France", "Bulgaria", "Turk…
$ date           &lt;date&gt; 2021-09-03, 2021-08-27, 2021-08-20, 2021-08-20, 2021-0…
$ start_time     &lt;time&gt; 19:00:00, 17:00:00, 05:00:00, 18:00:00, 18:00:00, 17:0…
$ participation  &lt;chr&gt; "solo", "Solo", "solo", "solo", "solo", "solo", "solo",…
$ distance       &lt;dbl&gt; 166.9, 170.7, 167.0, 164.0, 159.9, 159.9, 163.8, 163.9,…
$ elevation_gain &lt;dbl&gt; 4520, 9930, 9980, 7490, 100, 9850, 5460, 4630, 6410, 31…
$ elevation_loss &lt;dbl&gt; -4520, -9930, -9980, -7500, -100, -9850, -5460, -4660, …
$ aid_stations   &lt;dbl&gt; 10, 11, 13, 13, 12, 15, 5, 8, 13, 23, 13, 5, 12, 15, 0,…
$ participants   &lt;dbl&gt; 150, 2300, 600, 150, 0, 300, 0, 200, 120, 100, 300, 50,…</code></pre>
</div>
</div>
</section>
<section id="exploring-data" class="level1">
<h1>Exploring Data</h1>
<p>Again, I’m skipping this, but you should definitely do some exploration before building a model. :)</p>
</section>
<section id="modeling-with-torch" class="level1">
<h1>Modeling with Torch</h1>
<p>So, the idea here is to use <code>{torch}</code> to “manually” (sort of) estimate a multiple linear regression model. The first thing I’m going to do is refine a dataframe to use in the model. There are lots of possibilities here, but I’m going to estimate a model that predicts the winning time of a race from the race distance, the total elevation gain, and the total elevation loss. So let’s join and filter our data down to what we’ll actually use in our model.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">ultra_mod_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ultra_rankings <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">left_join</span>(race, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">by =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"race_year_id"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(rank <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(time_in_seconds, distance, elevation_gain, elevation_loss)</span></code></pre></div>
</div>
<p>Next, I’m going to drop any observations with missing values on any of these variables. I’m also going to normalize the variables, because my understanding is that this matters quite a bit for optimizing via gradient descent (and it’s also good practice for linear models in general).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">ultra_normed <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ultra_mod_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">drop_na</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>, <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x) {(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mean</span>(x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> x)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sd</span>(x)}))</span></code></pre></div>
</div>
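<p>One thing worth flagging: the scaling function above computes <code>(mean(x) - x)/sd(x)</code>, which is the usual z-score with its sign flipped. Because every variable (the outcome included) gets flipped the same way, the regression slopes shouldn’t be affected. Here’s a quick sanity check on simulated data; the helper names <code>flip()</code> and <code>z()</code> and the toy data are made up for illustration:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)

# toy data, just for this check
x &lt;- rnorm(50, mean = 10, sd = 3)
y &lt;- 2 * x + rnorm(50)

flip &lt;- function(v) (mean(v) - v) / sd(v) # the form used in this post
z &lt;- function(v) (v - mean(v)) / sd(v)    # the conventional z-score

# the two scalings differ only by sign
all.equal(flip(x), -z(x))

# flipping both sides of the regression leaves the slope unchanged
slope_flip &lt;- unname(coef(lm(flip(y) ~ flip(x)))[2])
slope_z &lt;- unname(coef(lm(z(y) ~ z(x)))[2])
all.equal(slope_flip, slope_z)</code></pre></div>
</div>
<p>So the flipped version is fine for fitting this model, as long as the sign is kept in mind when looking at individual scaled values.</p>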
<section id="creating-a-dataset" class="level2">
<h2 class="anchored" data-anchor-id="creating-a-dataset">Creating a Dataset</h2>
<p>Right, so, now we can get into the torch-y stuff. The first step is to use the <code>dataset()</code> constructor to build a dataset. According to the <a href="https://torch.mlverse.org/docs/articles/loading-data.html">torch documentation</a>, this requires following a few conventions. More specifically, we need to establish an <code>initialize()</code> function, a <code>.getitem()</code> function, and a <code>.length()</code> function.</p>
<p>Basically, these do the following:</p>
<ul>
<li><code>initialize()</code> creates x (predictor) and y (outcome) tensors from the data;</li>
<li><code>.getitem()</code> provides a way to return the x and y values for an item when provided an index (or multiple indices) by the user;</li>
<li><code>.length()</code> tells us how many observations we have in the data.</li>
</ul>
<p>We can also define helper functions within <code>dataset()</code> (e.g.&nbsp;preprocessors for our data). I’m not going to do that here (since we’ve already lightly preprocessed our data), but I could if I wanted.</p>
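<p>For reference, here’s a rough sketch of what a dataset with a helper method might look like. Everything in it is hypothetical (the <code>scaled_dataset</code> name, the <code>outcome</code> argument, the <code>standardize()</code> helper, and the use of <code>mtcars</code> as stand-in data); it relies on my understanding that extra named functions passed to <code>dataset()</code> become methods reachable through <code>self</code>:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(torch)

# hypothetical dataset that standardizes its predictors on construction
scaled_dataset &lt;- dataset(

  name = "scaled_dataset",

  initialize = function(df, outcome) {
    x &lt;- as.matrix(df[, setdiff(names(df), outcome), drop = FALSE])

    self$x &lt;- torch_tensor(self$standardize(x))
    self$y &lt;- torch_tensor(df[[outcome]])
  },

  .getitem = function(i) {
    list(self$x[i, ], self$y[i])
  },

  .length = function() {
    self$y$size()[[1]]
  },

  # helper method: center and scale each column of a numeric matrix
  standardize = function(m) {
    centered &lt;- sweep(m, 2, colMeans(m))
    sweep(centered, 2, apply(m, 2, sd), "/")
  }
)

# stand-in data: predict mpg from wt and hp
ds &lt;- scaled_dataset(mtcars[, c("mpg", "wt", "hp")], outcome = "mpg")
length(ds)</code></pre></div>
</div>
<p>The upside of this design is that the preprocessing travels with the dataset definition; the downside is that it’s a little harder to inspect the scaled data outside of torch.</p>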
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#initializing dataset</span></span>
<span id="cb8-2"></span>
<span id="cb8-3">ultra_dataset <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">dataset</span>(</span>
<span id="cb8-4">  </span>
<span id="cb8-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">name =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"ultra_dataset"</span>,</span>
<span id="cb8-6">  </span>
<span id="cb8-7">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">initialize =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(df) {</span>
<span id="cb8-8">    self<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-9">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>time_in_seconds) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-10">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.matrix</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-11">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_tensor</span>()</span>
<span id="cb8-12">    </span>
<span id="cb8-13">    self<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_tensor</span>(df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>time_in_seconds)</span>
<span id="cb8-14">    </span>
<span id="cb8-15">  },</span>
<span id="cb8-16">    </span>
<span id="cb8-17">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.getitem =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(i) {</span>
<span id="cb8-18">      x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> self<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x[i, ]</span>
<span id="cb8-19">      y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> self<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y[i]</span>
<span id="cb8-20">      </span>
<span id="cb8-21">      <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(x, y)</span>
<span id="cb8-22">    },</span>
<span id="cb8-23">    </span>
<span id="cb8-24">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">.length =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>() {</span>
<span id="cb8-25">      self<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">size</span>()[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]</span>
<span id="cb8-26">    }</span>
<span id="cb8-27">)</span></code></pre></div>
</div>
<p>Let’s see what this looks like. We’ll create a tensor dataset from the full ultra_normed data and then return its length:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">ultra_tensor_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ultra_dataset</span>(ultra_normed)</span>
<span id="cb9-2"></span>
<span id="cb9-3"></span>
<span id="cb9-4">ultra_len <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> ultra_tensor_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">.length</span>()</span>
<span id="cb9-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#note that this is the same as: length(ultra_tensor_df)</span></span>
<span id="cb9-6"></span>
<span id="cb9-7">ultra_len</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 1237</code></pre>
</div>
</div>
<p>We can also pull out a single observation if we want, and the result will give us the values in the X tensor and the y tensor:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">ultra_tensor_df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">.getitem</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[[1]]
torch_tensor
-0.3624
 0.2811
-0.2866
[ CPUFloatType{3} ]

[[2]]
torch_tensor
-0.809365
[ CPUFloatType{} ]</code></pre>
</div>
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#note that 1 here refers to the index of the item</span></span></code></pre></div>
</div>
<p>Next, let’s make train and validation datasets.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span>
<span id="cb14-2">train_ids <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>ultra_len, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">floor</span>(.<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">8</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>ultra_len))</span>
<span id="cb14-3">valid_ids <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">setdiff</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span>ultra_len, train_ids)</span>
<span id="cb14-4"></span>
<span id="cb14-5">trn <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ultra_dataset</span>(ultra_normed[train_ids, ])</span>
<span id="cb14-6">vld <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ultra_dataset</span>(ultra_normed[valid_ids, ])</span></code></pre></div>
</div>
<p>This would be the point where we could also define a dataloader to train on batches of the data, but I’m not going to do that here because we can just train on the entire dataset at once.</p>
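<p>For reference, here’s a rough sketch of what batching could look like. It isn’t part of the analysis in this post: it uses made-up data wrapped in <code>tensor_dataset()</code> rather than our <code>ultra_dataset</code>, the <code>toy_</code> names are placeholders, and the batch size of 32 is an arbitrary choice.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(torch)

# toy data standing in for ultra_normed: 100 rows, 3 predictors (hypothetical)
toy_x &lt;- torch_randn(100, 3)
toy_y &lt;- torch_randn(100)
toy_ds &lt;- tensor_dataset(toy_x, toy_y)

# a dataloader serves the dataset in shuffled batches of (up to) 32 rows
toy_dl &lt;- dataloader(toy_ds, batch_size = 32, shuffle = TRUE)

# number of batches in one pass through the data: ceiling(100 / 32)
n_batches &lt;- toy_dl$.length()

# pull a single batch to inspect it; batch[[1]] holds the predictors
batch &lt;- dataloader_next(dataloader_make_iter(toy_dl))
batch_dims &lt;- batch[[1]]$size()</code></pre></div>
</div>
<p>In a real training loop you’d iterate over every batch in each epoch and call the optimizer once per batch, which matters much more when the data doesn’t fit comfortably in memory.</p>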
</section>
<section id="defining-a-model" class="level2">
<h2 class="anchored" data-anchor-id="defining-a-model">Defining a Model</h2>
<p>Now, let’s define our model. Again, for our learning purposes today, this is just going to be a plain old multiple regression model. To implement this in <code>{torch}</code>, we can define the model as follows:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">lin_mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x, w, b) {</span>
<span id="cb15-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_mm</span>(w, x) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> b</span>
<span id="cb15-3">}</span></code></pre></div>
</div>
<p>In this model, we’re taking a vector of weights (or slopes), w, multiplying it by our input matrix, x, and adding our bias (or intercept). The <code>torch_mm()</code> function lets us perform this matrix multiplication.</p>
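<p>To make the arithmetic concrete, here’s a tiny worked example with made-up numbers (the <code>toy_</code> objects are placeholders and aren’t used anywhere else). With every weight set to 1 and the intercept set to 0, each prediction is just the sum of that observation’s features.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(torch)

# two observations (columns) with three features each (rows);
# matrix() fills column-wise, so the columns are (1, 2, 3) and (4, 5, 6)
toy_x &lt;- torch_tensor(matrix(c(1, 2, 3,
                               4, 5, 6), nrow = 3))
toy_w &lt;- torch_ones(c(1, 3))  # one weight per feature
toy_b &lt;- torch_zeros(1)       # intercept

# (1 x 3) times (3 x 2) gives a 1 x 2 result: one prediction per observation
toy_out &lt;- torch_mm(toy_w, toy_x) + toy_b
as.numeric(toy_out)
# 6 15</code></pre></div>
</div>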
<p>Now that we’ve defined this model, let’s create our w and b parameters. Since this is a linear regression, each predictor in our model will have a single weight associated with it, and we’ll have a single intercept for the model. We’ll just use 1 as the starting value for our w parameters and 0 as the starting value for our b parameter.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb16" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb16-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#defining parameters</span></span>
<span id="cb16-2">num_feats <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span></span>
<span id="cb16-3"></span>
<span id="cb16-4">w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_ones</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, num_feats))</span>
<span id="cb16-5">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_zeros</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span></code></pre></div>
</div>
<p>Now we can do a quick test to make sure everything fits together. We’re not actually training our model at this point, but I want to just run a small sample of our training data through the model (with the parameter starting values) to make sure we don’t get any errors.</p>
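<p>Note that I need to transpose the X matrix for the multiplication to work: w is 1 × 3 and the sample of X is 10 × 3, so X has to become 3 × 10 before <code>torch_mm()</code> can multiply them.</p>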
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">aa <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> trn<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">.getitem</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span>
<span id="cb17-2"></span>
<span id="cb17-3">aa_x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_transpose</span>(aa[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb17-4"></span>
<span id="cb17-5">t_out <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lin_mod</span>(aa_x, w, b)</span>
<span id="cb17-6"></span>
<span id="cb17-7">t_out</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>torch_tensor
-0.2315 -0.4256 -0.7237 -0.2182 -0.2279 -0.2015 -0.2869 -0.2124 -0.5832 -0.2300
[ CPUFloatType{1,10} ]</code></pre>
</div>
</div>
<p>Great! This gives us a single output for each of our input observations, which is what we want.</p>
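<p>As a side note, transposing X isn’t the only way to make the shapes line up. The sketch below uses random stand-in data (not our training set) to show that transposing w instead gives the same predictions, just laid out as a 10 × 1 column rather than a 1 × 10 row.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(torch)

toy_x &lt;- torch_randn(10, 3)    # made-up stand-in for 10 rows of training data
toy_w &lt;- torch_ones(c(1, 3))
toy_b &lt;- torch_zeros(1)

# the approach used in this post: transpose x, result is 1 x 10
out_a &lt;- torch_mm(toy_w, torch_transpose(toy_x, 1, 2)) + toy_b

# alternative: leave x alone and transpose w instead, result is 10 x 1
out_b &lt;- torch_mm(toy_x, torch_transpose(toy_w, 1, 2)) + toy_b

# after dropping the length-1 dimension, both are plain length-10 vectors
vec_a &lt;- as.numeric(torch_squeeze(out_a))
vec_b &lt;- as.numeric(torch_squeeze(out_b))
max(abs(vec_a - vec_b)) &lt; 1e-5</code></pre></div>
</div>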
</section>
<section id="training-the-model" class="level2">
<h2 class="anchored" data-anchor-id="training-the-model">Training the Model</h2>
<p>Now that we have a model and can feed data into the model, let’s train it.</p>
<p>Training the model involves using gradient descent, an optimizer, a loss function, and backpropagation to slowly tweak our parameters until they reach their optimal values (i.e.&nbsp;those that minimize loss). I’m not going to do a super deep dive into what all of that means, but basically in our training loop we’re going to:</p>
<ul>
<li>Reset the gradients left over from the previous pass (via <code>optimizer$zero_grad()</code>);</li>
<li>Run the data through the model and get predictions;</li>
<li>Measure how good our predictions are (via the loss function);</li>
<li>Compute the gradient of the loss with respect to the parameters (via the <code>backward()</code> method);</li>
<li>Tell our optimizer to update the parameters (via <code>optimizer$step()</code>);</li>
<li>Repeat a bunch of times.</li>
</ul>
<p>That’s basically what the code below does. A few little extra things to point out, though:</p>
<ul>
<li>In addition to training the model on the training data, I’m also getting predictions on the validation data during each iteration of the training process. This won’t influence the training at all, but it’ll give us a look at how the model does on a holdout set of data throughout the entire process.</li>
<li>The <code>torch_squeeze()</code> function just removes an unnecessary dimension from the predictions tensors.</li>
<li>I’ve also created lists to track training loss, validation loss, and parameter values throughout the fitting, and these get recorded on each pass through the training loop.</li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb19" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb19-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#recreate our parameters with requires_grad = TRUE (w now starts at 0 instead of 1)</span></span>
<span id="cb19-2">w <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_zeros</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, num_feats), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">requires_grad =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb19-3">b <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_zeros</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">requires_grad =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span>
<span id="cb19-4"></span>
<span id="cb19-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#put the parameters in a list</span></span>
<span id="cb19-6">params <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(w, b)</span>
<span id="cb19-7"></span>
<span id="cb19-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#define our optimizer</span></span>
<span id="cb19-9">optimizer <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">optim_adam</span>(params, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">lr =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>)</span>
<span id="cb19-10"></span>
<span id="cb19-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#create lists to track values during the training</span></span>
<span id="cb19-12">loss_tracking <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb19-13">params_tracking <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb19-14">vld_loss_tracking <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>()</span>
<span id="cb19-15"></span>
<span id="cb19-16"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#training loop</span></span>
<span id="cb19-17"><span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">for</span> (i <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">in</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>) {</span>
<span id="cb19-18">  </span>
<span id="cb19-19">  optimizer<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">zero_grad</span>()</span>
<span id="cb19-20">  </span>
<span id="cb19-21">  x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_transpose</span>(trn<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb19-22">  vld_x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_transpose</span>(vld<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>x, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>)</span>
<span id="cb19-23">  </span>
<span id="cb19-24">  preds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lin_mod</span>(x, w, b)</span>
<span id="cb19-25">  vld_preds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lin_mod</span>(vld_x, w, b)</span>
<span id="cb19-26">  </span>
<span id="cb19-27">  preds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_squeeze</span>(preds)</span>
<span id="cb19-28">  vld_preds <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">torch_squeeze</span>(vld_preds)</span>
<span id="cb19-29">  </span>
<span id="cb19-30">  current_loss <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nnf_mse_loss</span>(preds, trn<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y)</span>
<span id="cb19-31">  vld_loss <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">nnf_mse_loss</span>(vld_preds, vld<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>y)</span>
<span id="cb19-32">  </span>
<span id="cb19-33">  loss_tracking[i] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> current_loss<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">item</span>()</span>
<span id="cb19-34">  vld_loss_tracking[i] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> vld_loss<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">item</span>()</span>
<span id="cb19-35">  params_tracking[i] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(params[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]]), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.numeric</span>(params[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]])))</span>
<span id="cb19-36">  </span>
<span id="cb19-37">  current_loss<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">backward</span>()</span>
<span id="cb19-38">  </span>
<span id="cb19-39">  optimizer<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step</span>()</span>
<span id="cb19-40">  </span>
<span id="cb19-41">}</span></code></pre></div>
</div>
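<p>If you’re curious what <code>backward()</code> is actually computing, here’s a minimal check on a made-up one-weight problem (none of these <code>toy_</code> objects come from the model above). The gradient that autograd stores in <code>$grad</code> should match the derivative of the mean squared error worked out by hand.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(torch)

# tiny made-up problem: one weight, three observations
toy_x_r &lt;- c(1, 2, 3)
toy_y_r &lt;- c(2, 4, 6)
toy_x &lt;- torch_tensor(toy_x_r)
toy_y &lt;- torch_tensor(toy_y_r)
toy_w &lt;- torch_tensor(0.5, requires_grad = TRUE)

# forward pass and loss, then backpropagate
toy_loss &lt;- nnf_mse_loss(toy_w * toy_x, toy_y)
toy_loss$backward()

# gradient computed by autograd
auto_grad &lt;- as.numeric(toy_w$grad)

# the same gradient by hand: d/dw mean((w*x - y)^2) = mean(2 * x * (w*x - y))
manual_grad &lt;- mean(2 * toy_x_r * (0.5 * toy_x_r - toy_y_r))

c(auto_grad, manual_grad)
# -14 -14</code></pre></div>
</div>
<p>Gradients accumulate across calls to <code>backward()</code>, which is why the training loop calls <code>optimizer$zero_grad()</code> at the top of every pass.</p>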
</section>
<section id="investigating-our-results" class="level2">
<h2 class="anchored" data-anchor-id="investigating-our-results">Investigating our Results</h2>
<p>Cool stuff – our model has finished training now. Let’s take a look at our final parameter values. In a little while, we’ll also compare these to values we get from fitting a multiple regression using the <code>lm()</code> function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">betas <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb20-2">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">term =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(ultra_normed)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>], <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"intercept"</span>),</span>
<span id="cb20-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> params_tracking[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>]]</span>
<span id="cb20-4">)</span>
<span id="cb20-5"></span>
<span id="cb20-6">betas</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 4 × 2
  term              size
  &lt;chr&gt;            &lt;dbl&gt;
1 distance       -0.154 
2 elevation_gain  0.906 
3 elevation_loss  0.314 
4 intercept       0.0144</code></pre>
</div>
</div>
<p>Next, let’s take a look at how the parameter values (minus the intercept) change throughout the training loop/fitting process.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">descent_tibble <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(i, inp) {</span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb22-3">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter =</span> i,</span>
<span id="cb22-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">distance =</span> inp[[i]][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>],</span>
<span id="cb22-5">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">elevation_gain =</span> inp[[i]][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>],</span>
<span id="cb22-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">elevation_loss =</span> inp[[i]][<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]</span>
<span id="cb22-7">  )</span>
<span id="cb22-8">}</span>
<span id="cb22-9"></span>
<span id="cb22-10">params_fitting_tbl <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dfr</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">descent_tibble</span>(.x, params_tracking)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb22-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>iter)</span>
<span id="cb22-12"></span>
<span id="cb22-13">params_fitting_tbl <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb22-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> iter, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> name)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-15">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-16">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_hp_d</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/fitting-a-multiple-regression-with-torch/index_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>We probably could have trained for fewer iterations, but it’s a small dataset and a simple model, so whatever.</p>
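<p>If you did want to check how many iterations were really needed, one simple approach is to find the first iteration where the loss stops improving by more than some tolerance. Here’s a sketch on a made-up loss curve; the curve and the tolerance are both assumptions, and with the real run you’d apply the same idea to <code>unlist(loss_tracking)</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># made-up loss curve that decays toward a floor of 0.2 (not the real losses)
toy_iters &lt;- 1:1000
toy_loss &lt;- 0.2 + exp(-toy_iters / 20)

# first iteration where the loss changes by less than the tolerance
tol &lt;- 1e-6
converged_at &lt;- which(abs(diff(toy_loss)) &lt; tol)[1] + 1
converged_at</code></pre></div>
</div>
<p>Real loss curves can wobble from one iteration to the next, so in practice you’d probably want the change to stay under the tolerance for several iterations in a row before stopping.</p>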
<p>Now, let’s see what the coefficients of a “standard” multiple regression (fit using <code>lm()</code>) look like. This will serve as our “ground truth” and will tell us if our gradient descent fitting process arrived at the “right” coefficient values:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">mod_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lm</span>(time_in_seconds <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> distance <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> elevation_gain <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> elevation_loss, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> ultra_normed[train_ids, ])</span>
<span id="cb23-2"></span>
<span id="cb23-3">mod_res</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>
Call:
lm(formula = time_in_seconds ~ distance + elevation_gain + elevation_loss, 
    data = ultra_normed[train_ids, ])

Coefficients:
   (Intercept)        distance  elevation_gain  elevation_loss  
       0.01438        -0.15403         0.90636         0.31391  </code></pre>
</div>
</div>
<p>Good stuff! If we look back up at the coefficients from our torch model, we can see that they’re (nearly) identical to those from this <code>lm()</code> model – which is what we want.</p>
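<p>The reason <code>lm()</code> works as a “ground truth” here is that, for a linear model, the coefficients that minimize mean squared error can be solved for directly (as long as the predictors aren’t perfectly collinear), so there’s no iterative fitting involved. The sketch below illustrates this on simulated data; the data and the <code>toy_</code> names are made up for the example.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(1)

# made-up data: three predictors and a known linear relationship plus noise
n &lt;- 200
toy_X &lt;- matrix(rnorm(n * 3), ncol = 3)
toy_y &lt;- drop(0.5 + toy_X %*% c(-1, 2, 0.3) + rnorm(n, sd = 0.1))

# closed-form least squares: solve (X'X) beta = X'y,
# with a column of 1s added for the intercept
toy_X1 &lt;- cbind(1, toy_X)
beta_closed &lt;- drop(solve(crossprod(toy_X1), crossprod(toy_X1, toy_y)))

# lm() arrives at the same coefficients (it uses a QR decomposition internally)
beta_lm &lt;- unname(coef(lm(toy_y ~ toy_X)))

all.equal(beta_closed, beta_lm)</code></pre></div>
</div>
<p>Gradient descent is minimizing the same loss, so given enough iterations it should land on (nearly) the same values.</p>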
<p>As a final step, let’s look at the loss of the model throughout the training process on both the training set and the validation set. This will give us a sense of how our model “learns” throughout the process.</p>
<p>As sort of an aside – we’d typically look at these metrics as a way to examine overfitting, which is a big problem for neural networks and more complex models. However, we’re not running a complex model. A linear model with only three predictors and just under a thousand training observations has very little room to overfit, so this is a less useful diagnostic here. But let’s take a look anyway.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb25" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb25-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#checking out loss during training</span></span>
<span id="cb25-2">loss_metrics <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tibble</span>(</span>
<span id="cb25-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">iter =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span>,</span>
<span id="cb25-4">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">trn_loss =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(loss_tracking),</span>
<span id="cb25-5">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">vld_loss =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unlist</span>(vld_loss_tracking)</span>
<span id="cb25-6">) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb25-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb25-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>iter</span>
<span id="cb25-9">  )</span>
<span id="cb25-10"></span>
<span id="cb25-11">loss_metrics <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb25-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> iter, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> name)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb25-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb25-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_color_hp_d</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/fitting-a-multiple-regression-with-torch/index_files/figure-html/unnamed-chunk-16-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Right, so this is pretty much what we’d expect. Both losses drop in the first few iterations and then level off. The fact that <em>both</em> losses flatline (rather than the validation loss climbing back up while the training loss keeps dropping) suggests that we’re not overfitting, which again is what we expect with a linear model. We also expect our validation loss to be a bit higher than the training loss, because the model never sees the validation data during training.</p>
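<p>To make that train-versus-validation comparison concrete outside of the training loop, here’s a minimal base R sketch. It assumes simulated data and uses <code>lm()</code> rather than <code>{torch}</code>; the names (<code>sim_dat</code>, <code>mse()</code>, and so on) are made up for illustration and aren’t from the model above.</p>

```r
# simulated data (an assumption for illustration, not the post's dataset)
set.seed(42)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)
sim_dat <- data.frame(y = y, x1 = x1, x2 = x2)

# hold out 20% of the rows as a validation set
trn_idx <- sample(n, size = 400)
trn <- sim_dat[trn_idx, ]
vld <- sim_dat[-trn_idx, ]

# fit on the training rows only
fit <- lm(y ~ x1 + x2, data = trn)

# mean squared error, the same kind of loss tracked during training
mse <- function(actual, predicted) mean((actual - predicted)^2)

trn_loss <- mse(trn$y, predict(fit, newdata = trn))
vld_loss <- mse(vld$y, predict(fit, newdata = vld))

c(train = trn_loss, validation = vld_loss)
```

<p>With a simple linear model like this, the two numbers should land close to each other; a validation loss sitting far above the training loss would be the thing to worry about.</p>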
</section>
<section id="conclusion" class="level2">
<h2 class="anchored" data-anchor-id="conclusion">Conclusion</h2>
<p>That’s it for now. We’ve learned how to write a ton of code to accomplish something we can do in a single-line call to <code>lm()</code> :)</p>
<p>I’m planning on digging into <code>{torch}</code> more and potentially writing a few more blogs once I get into actual neural networks with image and/or text data, but that’s for another day.</p>


</section>
</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2021,
  author = {Ekholm, Eric},
  title = {Fitting a {Multiple} {Regression} with {Torch}},
  date = {2021-11-20},
  url = {https://www.ericekholm.com/posts/fitting-a-multiple-regression-with-torch},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2021" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2021. <span>“Fitting a Multiple Regression with
Torch.”</span> November 20, 2021. <a href="https://www.ericekholm.com/posts/fitting-a-multiple-regression-with-torch">https://www.ericekholm.com/posts/fitting-a-multiple-regression-with-torch</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>torch</category>
  <category>regression</category>
  <guid>https://www.ericekholm.com/posts/fitting-a-multiple-regression-with-torch/index.html</guid>
  <pubDate>Sat, 20 Nov 2021 05:00:00 GMT</pubDate>
</item>
<item>
  <title>2021 Virginia Datathon Recap</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/virginia-datathon-recap/index.html</link>
  <description><![CDATA[ 




<section id="overview" class="level1">
<h1>Overview</h1>
<p>This past Thursday and Friday, a couple of friends (<a href="https://www.mldebusklane.com/">Morgan DeBusk-Lane</a> and <a href="https://soe.vcu.edu/directory/full-directory/first--last-name-301679-en.html">Mike Broda</a>) and I had the opportunity to participate in the <a href="https://www.cdo.virginia.gov/datathon/">2021 Virtual Virginia Datathon</a>. This is an annual hackathon that I’ve participated in for the past few years in which Virginia’s state agencies curate a bunch of datasets relating to a particular theme and ask participating teams to develop some sort of solution. Which I imagine is how hackathons typically work, but I haven’t participated in any others.</p>
<p>Anyway, the theme for this year’s datathon was “Addressing Hunger with Bits and Bytes,” and most of the data had to do with food insecurity, SNAP participation, free and reduced school meals, and the like. We focused in on one dataset provided – <a href="https://data.virginia.gov/Education/VDOE-Afterschool-Meal-Sites/q9n6-eddu">sites participating in the CACFP afterschool meals program</a>. From this dataset, we created a Shiny app that allows users to enter their address and identify the closest site (in Virginia) participating in the afterschool meals program. Although we’ve un-deployed our app, you can find the GitHub repo with all of the code (and a lightly cleaned dataset) <a href="https://github.com/debusklaneml/datathon_2021_vcusoe">here</a>.</p>
</section>
<section id="lessons-learned" class="level1">
<h1>Lessons Learned</h1>
<p>One thing I appreciated about our approach to this year’s datathon is that it gave me the opportunity to practice with some skills/tools I’ve used before but certainly wouldn’t consider myself super proficient in. More specifically, I got to practice a bit with Shiny and with working with geographical data. Some things I learned/took away are:</p>
<ul>
<li><p><strong>The <code>{leaflet}</code> package is awesome, but I probably need to learn some JavaScript.</strong> I’ve dabbled with leaflet before, but using it in this instance just reaffirmed how amazing it is. Creating a great-looking, interactive map requires like three lines of R code and a dataframe with some geometry in it. That’s it. And the map we created suited our purposes just fine (or at least it worked as a prototype). That said, when I dug into some of the functions, I think I really need to learn some JS if I want to fully take advantage of the features <code>{leaflet}</code> offers. I’ve also been working with the <code>{reactable}</code> package quite a bit lately, so between these two tools, that might be enough of a push to pick up some JS.</p></li>
<li><p><strong>The <code>{nngeo}</code> package is also awesome.</strong> I’ve done a fair amount of geocoding and working with Census data as part of my job, so I’m reasonably familiar with tools like <code>{tidycensus}</code> and <code>{tidygeocoder}</code>. But I’ve only really had to do nearest neighbors with lat/long data once before, and although I figured it out, my code wasn’t super clean and I felt like I kind of stumbled my way through it. Fortunately, while we were working on this project, Mike found the <code>{nngeo}</code> package and its <code>st_nn()</code> function, which finds the nearest neighbor(s) to each row in X from a comparison dataset Y. So all I had to do was write a little wrapper around this function to tweak the inputs and outputs a little bit (you can see this in the <code>get_closest_ind()</code> function in the functions file in the GitHub repo).</p></li>
<li><p><strong>I ought to learn more about proxy functions in Shiny.</strong> I’ll begin this by saying that my understanding of proxy functions in Shiny is pretty minimal, but my general understanding is that they allow you to modify a specific aspect of a widget (a leaflet map, in this case) without recreating the entire output of the widget. So like you could change the colors of some markers or something. I think the filter functionality we included (allowing users to select all sites, school sites, or non-school sites) could be a candidate for using the <code>leafletProxy()</code> function, but I’m not sure. And given that we had a limited time to make a (very) rough prototype of an app, I didn’t feel like I had had enough time to play around with it on the fly. But it’s definitely something I want to dig into more when I have more time.</p></li>
</ul>
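<p>To give a rough idea of what a nearest-neighbor lookup on lat/long data involves, here’s a minimal base R sketch. To be clear, this is not <code>st_nn()</code> or the <code>get_closest_ind()</code> wrapper from the repo: it’s a brute-force haversine distance calculation, and the site names, coordinates, and function names are all hypothetical.</p>

```r
# great-circle distance in km between one point and a vector of points
# (haversine formula; assumes a spherical Earth with radius 6371 km)
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(pmin(1, a)))
}

# hypothetical meal sites (made-up names and approximate coordinates)
sites <- data.frame(
  site = c("Site A", "Site B", "Site C"),
  lat = c(37.54, 36.85, 38.03),
  lon = c(-77.44, -75.98, -78.48)
)

# return the closest site to a single query point, plus its distance
closest_site <- function(lat, lon, sites) {
  d <- haversine_km(lat, lon, sites$lat, sites$lon)
  i <- which.min(d)
  data.frame(site = sites$site[i], dist_km = d[i])
}

closest_site(37.5, -77.5, sites)
```

<p>A brute-force approach like this is fine for a few hundred sites and a single address, which is roughly the situation we were in, but a dedicated package is the better choice once proper geometries or many query points are involved.</p>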
<p>Overall, I really enjoyed participating in the VA datathon this year because I felt like I got to expand my toolkit a little bit and work with tools that I don’t always use as part of my day job.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2021,
  author = {Ekholm, Eric},
  title = {2021 {Virginia} {Datathon} {Recap}},
  date = {2021-10-13},
  url = {https://www.ericekholm.com/posts/virginia-datathon-recap},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2021" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2021. <span>“2021 Virginia Datathon Recap.”</span> October
13, 2021. <a href="https://www.ericekholm.com/posts/virginia-datathon-recap">https://www.ericekholm.com/posts/virginia-datathon-recap</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>shiny</category>
  <category>VA datathon</category>
  <category>geography</category>
  <guid>https://www.ericekholm.com/posts/virginia-datathon-recap/index.html</guid>
  <pubDate>Wed, 13 Oct 2021 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Scooby Doo EDA</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/scooby-doo-eda/index.html</link>
  <description><![CDATA[ 




<p>For this week’s (well, really last week’s) #TidyTuesday, I wanted to do a sort of stream-of-consciousness type EDA and modeling that I’ll put up as a blog post. One motivation for this is that I’m considering doing some data science streaming in the future, and so I want to get a feel for whether this is an approach I might be interested in taking with streaming. So, the narrative here might be a bit lacking.</p>
<p>I’m going to shoot for spending an hour-ish on this, but I might end up doing more or less.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(eemisc)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(harrypotter)</span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(lubridate)</span>
<span id="cb1-5"></span>
<span id="cb1-6">herm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-7"></span>
<span id="cb1-8">opts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(</span>
<span id="cb1-9">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.fill =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-10">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>),</span>
<span id="cb1-11">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>),</span>
<span id="cb1-12">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Always"</span>)</span>
<span id="cb1-13">  )</span>
<span id="cb1-14">)</span>
<span id="cb1-15"></span>
<span id="cb1-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ee</span>())</span>
<span id="cb1-17"></span>
<span id="cb1-18">scooby_raw <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> readr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-13/scoobydoo.csv'</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">""</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NA"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"NULL"</span>))</span></code></pre></div>
</div>
<p>What does the data look like?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(scooby_raw)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 603
Columns: 75
$ index                    &lt;dbl&gt; 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14…
$ series_name              &lt;chr&gt; "Scooby Doo, Where Are You!", "Scooby Doo, Wh…
$ network                  &lt;chr&gt; "CBS", "CBS", "CBS", "CBS", "CBS", "CBS", "CB…
$ season                   &lt;chr&gt; "1", "1", "1", "1", "1", "1", "1", "1", "1", …
$ title                    &lt;chr&gt; "What a Night for a Knight", "A Clue for Scoo…
$ imdb                     &lt;dbl&gt; 8.1, 8.1, 8.0, 7.8, 7.5, 8.4, 7.6, 8.2, 8.1, …
$ engagement               &lt;dbl&gt; 556, 479, 455, 426, 391, 384, 358, 358, 371, …
$ date_aired               &lt;date&gt; 1969-09-13, 1969-09-20, 1969-09-27, 1969-10-…
$ run_time                 &lt;dbl&gt; 21, 22, 21, 21, 21, 21, 21, 21, 21, 21, 21, 2…
$ format                   &lt;chr&gt; "TV Series", "TV Series", "TV Series", "TV Se…
$ monster_name             &lt;chr&gt; "Black Knight", "Ghost of Cptn. Cuttler", "Ph…
$ monster_gender           &lt;chr&gt; "Male", "Male", "Male", "Male", "Female", "Ma…
$ monster_type             &lt;chr&gt; "Possessed Object", "Ghost", "Ghost", "Ancien…
$ monster_subtype          &lt;chr&gt; "Suit", "Suit", "Phantom", "Miner", "Witch Do…
$ monster_species          &lt;chr&gt; "Object", "Human", "Human", "Human", "Human",…
$ monster_real             &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ monster_amount           &lt;dbl&gt; 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 2, 1, 1, …
$ caught_fred              &lt;lgl&gt; FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE,…
$ caught_daphnie           &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ caught_velma             &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ caught_shaggy            &lt;lgl&gt; TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE…
$ caught_scooby            &lt;lgl&gt; TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE,…
$ captured_fred            &lt;lgl&gt; FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALS…
$ captured_daphnie         &lt;lgl&gt; FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALS…
$ captured_velma           &lt;lgl&gt; FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALS…
$ captured_shaggy          &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ captured_scooby          &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALS…
$ unmask_fred              &lt;lgl&gt; FALSE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE, …
$ unmask_daphnie           &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ unmask_velma             &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ unmask_shaggy            &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRU…
$ unmask_scooby            &lt;lgl&gt; TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE…
$ snack_fred               &lt;lgl&gt; TRUE, FALSE, TRUE, FALSE, FALSE, TRUE, FALSE,…
$ snack_daphnie            &lt;lgl&gt; FALSE, FALSE, FALSE, TRUE, TRUE, FALSE, FALSE…
$ snack_velma              &lt;lgl&gt; FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE…
$ snack_shaggy             &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ snack_scooby             &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ unmask_other             &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ caught_other             &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ caught_not               &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ trap_work_first          &lt;lgl&gt; NA, FALSE, FALSE, TRUE, NA, TRUE, FALSE, FALS…
$ setting_terrain          &lt;chr&gt; "Urban", "Coast", "Island", "Cave", "Desert",…
$ setting_country_state    &lt;chr&gt; "United States", "United States", "United Sta…
$ suspects_amount          &lt;dbl&gt; 2, 2, 0, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, …
$ non_suspect              &lt;lgl&gt; FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE…
$ arrested                 &lt;lgl&gt; TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FAL…
$ culprit_name             &lt;chr&gt; "Mr. Wickles", "Cptn. Cuttler", "Bluestone th…
$ culprit_gender           &lt;chr&gt; "Male", "Male", "Male", "Male", "Male", "Male…
$ culprit_amount           &lt;dbl&gt; 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, …
$ motive                   &lt;chr&gt; "Theft", "Theft", "Treasure", "Natural Resour…
$ if_it_wasnt_for          &lt;chr&gt; NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "thes…
$ and_that                 &lt;chr&gt; NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "dog"…
$ door_gag                 &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ number_of_snacks         &lt;chr&gt; "2", "1", "3", "2", "2", "4", "4", "0", "1", …
$ split_up                 &lt;dbl&gt; 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, …
$ another_mystery          &lt;dbl&gt; 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ set_a_trap               &lt;dbl&gt; 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, …
$ jeepers                  &lt;dbl&gt; 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ jinkies                  &lt;dbl&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ my_glasses               &lt;dbl&gt; 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, …
$ just_about_wrapped_up    &lt;dbl&gt; 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ zoinks                   &lt;dbl&gt; 1, 3, 1, 2, 0, 2, 1, 0, 0, 0, 0, 6, 3, 5, 8, …
$ groovy                   &lt;dbl&gt; 0, 0, 2, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, …
$ scooby_doo_where_are_you &lt;dbl&gt; 0, 1, 0, 0, 1, 0, 0, 1, 0, 2, 0, 0, 0, 1, 0, …
$ rooby_rooby_roo          &lt;dbl&gt; 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 3, 0, 0, 0, …
$ batman                   &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ scooby_dum               &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ scrappy_doo              &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ hex_girls                &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ blue_falcon              &lt;lgl&gt; FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FAL…
$ fred_va                  &lt;chr&gt; "Frank Welker", "Frank Welker", "Frank Welker…
$ daphnie_va               &lt;chr&gt; "Stefanianna Christopherson", "Stefanianna Ch…
$ velma_va                 &lt;chr&gt; "Nicole Jaffe", "Nicole Jaffe", "Nicole Jaffe…
$ shaggy_va                &lt;chr&gt; "Casey Kasem", "Casey Kasem", "Casey Kasem", …
$ scooby_va                &lt;chr&gt; "Don Messick", "Don Messick", "Don Messick", …</code></pre>
</div>
</div>
<p>What’s the range of dates we’re looking at here?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">range</span>(scooby_raw<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>date_aired)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "1969-09-13" "2021-02-25"</code></pre>
</div>
</div>
<p>And how many episodes are we seeing each year?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(date_aired)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">year =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb6-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> year, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> herm)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>What about episodes by decade?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">year</span>(date_aired) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%/%</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">rename</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">decade =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb7-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> decade, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> n)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> herm)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid" width="672"></p>
</div>
</div>
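<p>The decade calculation above leans on integer division (<code>%/%</code>), which binds more tightly than <code>*</code> in R, so each year gets floored to its decade before being multiplied back by 10. Here’s a quick sketch with some made-up years (not pulled from the dataset):</p>

```r
# hypothetical years, just for illustration
years <- c(1969, 1970, 1985, 1999, 2021)

# %/% is evaluated before *, so this is 10 * (years %/% 10)
decade_implicit <- 10 * years %/% 10

# same thing with explicit parentheses, which is a bit easier to read
decade_explicit <- 10 * (years %/% 10)

data.frame(years, decade_implicit, decade_explicit)
```

<p>Both columns come out the same, so the parentheses are optional, but adding them saves the next reader a trip to the operator precedence table.</p>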
<p>Next, let’s look at what ratings look like over time:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> index, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> imdb)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb8-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_smooth</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-5-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>And what if we color the points by series – I’d imagine series might have different ratings:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> index, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> imdb)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> series_name)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"grey70"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Next, I’m interested in looking at some comparisons across characters for different actions they take, like unmasking baddies, getting caught, etc. There are a bunch of these logical columns (e.g.&nbsp;<code>unmask_fred</code>), and so I’ll write a little helper function to summarize them and then pivot them into a shape that’ll be easier to plot later.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">summarize_pivot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(df, str) {</span>
<span id="cb10-2">  </span>
<span id="cb10-3">  df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-4">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">starts_with</span>(str), <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(.x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">na.rm =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-5">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(</span>
<span id="cb10-6">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">everything</span>(),</span>
<span id="cb10-7">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"key"</span>,</span>
<span id="cb10-8">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"value"</span></span>
<span id="cb10-9">    ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">extract</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">col =</span> key, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">into =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"key"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"char"</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">regex =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"^(.*)_(.*)$"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-11">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(value))</span>
<span id="cb10-12">}</span></code></pre></div>
</div>
<p>An example of what this does:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb11-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize_pivot</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"unmask"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 3
  key    char    value
  &lt;chr&gt;  &lt;chr&gt;   &lt;int&gt;
1 unmask fred      102
2 unmask velma      94
3 unmask daphnie    37
4 unmask other      35
5 unmask scooby     23
6 unmask shaggy     13</code></pre>
</div>
</div>
<p>Aaaand another example:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb13-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize_pivot</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"caught"</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 7 × 3
  key    char    value
  &lt;chr&gt;  &lt;chr&gt;   &lt;int&gt;
1 caught scooby    160
2 caught fred      132
3 caught other      84
4 caught shaggy     77
5 caught velma      41
6 caught not        31
7 caught daphnie    29</code></pre>
</div>
</div>
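<p>One thing worth noting about the regex in <code>summarize_pivot()</code>: the first <code>(.*)</code> is greedy, so each column name gets split at its <em>last</em> underscore. That’s presumably why <code>not</code> shows up as a “character” in the output above (from a column like <code>caught_not</code>). Here’s a minimal sketch of that behavior. The <code>toy</code> tibble and its names are made up for illustration and aren’t part of the Scooby Doo data:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidyr)
library(tibble)

# made-up names that mimic the prefix_character pattern
toy &lt;- tibble(
  key = c("unmask_fred", "caught_not", "scooby_doo_where"),
  value = c(3L, 1L, 2L)
)

# same regex as in summarize_pivot(); the greedy first group
# means the split happens at the last underscore
toy_split &lt;- tidyr::extract(
  toy,
  col = key,
  into = c("key", "char"),
  regex = "^(.*)_(.*)$"
)

toy_split</code></pre></div>
</div>
<p>So a name with several underscores keeps everything before the final one in <code>key</code>, which is fine here since we only pass single-word prefixes to <code>starts_with()</code>.</p>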
<p>Next, let’s use <code>purrr::map_dfr()</code> to do this a few times, combine the results into a single data frame, and then make a plot.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1">iter_strs <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"caught"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"captured"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"unmask"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"snack"</span>)</span>
<span id="cb15-2"></span>
<span id="cb15-3">actions_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">map_dfr</span>(iter_strs, <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize_pivot</span>(scooby_raw, .x))</span>
<span id="cb15-4"></span>
<span id="cb15-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(actions_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 23
Columns: 3
$ key   &lt;chr&gt; "caught", "caught", "caught", "caught", "caught", "caught", "cau…
$ char  &lt;chr&gt; "scooby", "fred", "other", "shaggy", "velma", "not", "daphnie", …
$ value &lt;int&gt; 160, 132, 84, 77, 41, 31, 29, 91, 85, 83, 74, 71, 102, 94, 37, 3…</code></pre>
</div>
</div>
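<p>A side note on <code>map_dfr()</code>: as of purrr 1.0.0 it’s superseded in favor of calling <code>map()</code> and then <code>list_rbind()</code>. Here’s a rough sketch of that pattern, assuming purrr 1.0.0 or later is installed. The <code>toy_summary()</code> function is a hypothetical stand-in for <code>summarize_pivot()</code> so that the example runs without the Scooby Doo data:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)
library(tibble)

# hypothetical stand-in for summarize_pivot(): one small tibble per string
toy_summary &lt;- function(str) {
  tibble(key = str, n_chars = nchar(str))
}

iter_strs &lt;- c("caught", "captured", "unmask", "snack")

# map() returns a list of tibbles; list_rbind() stacks them row-wise
actions_toy &lt;- list_rbind(map(iter_strs, toy_summary))

actions_toy</code></pre></div>
</div>
<p>The result has the same shape you’d get from <code>map_dfr()</code>, so either version should work for the plot that follows.</p>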
<div class="cell">
<div class="sourceCode cell-code" id="cb17" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb17-1">actions_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb17-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> value, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> char, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> key)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(key), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb17-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme</span>(</span>
<span id="cb17-6">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">legend.position =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"none"</span></span>
<span id="cb17-7">  )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-11-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Right, so we see that all of the characters get captured more or less the same amount, Fred and Scooby tend to catch monsters the most, Daphne and Shaggy eat the most snacks, and Velma and Fred do the most unmasking.</p>
<p>Switching things up a bit, what if we want to look at the monsters’ motives? First, let’s take a look at all of the unique motives.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb18" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb18-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(scooby_raw<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>motive)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code> [1] "Theft"            "Treasure"         "Natural Resource" "Competition"     
 [5] "Extortion"        "Safety"           "Counterfeit"      "Inheritance"     
 [9] "Smuggling"        "Preservation"     NA                 "Experimentation" 
[13] "Food"             "Trespassing"      "Assistance"       "Abduction"       
[17] "Haunt"            "Anger"            "Imagination"      "Bully"           
[21] "Loneliness"       "Training"         "Conquer"          "Mistake"         
[25] "Automated"        "Production"       "Entertainment"    "Simulation"      </code></pre>
</div>
</div>
<p>And it’s probably useful to count these:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb20" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb20-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span> </span>
<span id="cb20-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(motive, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">sort =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 28 × 2
   motive               n
   &lt;chr&gt;            &lt;int&gt;
 1 Competition        168
 2 Theft              125
 3 &lt;NA&gt;                67
 4 Treasure            54
 5 Conquer             42
 6 Natural Resource    26
 7 Smuggling           22
 8 Trespassing         15
 9 Abduction           12
10 Food                11
# … with 18 more rows</code></pre>
</div>
</div>
<p>So, “Competition” is far and away the most common motive. I’m not sure I really understand what this means, but it’s also been a while since I’ve watched Scooby Doo.</p>
<p>I’m also interested in how often we see “zoinks” in episodes, because I feel like this is the defining line of the show (along with the meddling kids, which I’ll look at next).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb22" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb22-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb22-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> zoinks)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb22-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">20</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> herm)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-14-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>This feels weird to me. Most often, we get 0 or 1, but then there are episodes with more than 10? I’d imagine these are probably movies?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb23" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb23-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb23-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> zoinks)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_histogram</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">bins =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb23-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(format), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">scales =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"free_y"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-15-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Well, so, there are still some TV shows that have a ton of zoinks’s. But also our biggest outlier is a movie, which makes sense to me since there’s more time for zoinking.</p>
<p>And what about our “if it wasn’t for those meddling kids” data?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb24" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb24-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(scooby_raw<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>if_it_wasnt_for))</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] 108</code></pre>
</div>
</div>
<p>Ok, wow, so that’s a lot of different values for “if it wasn’t for…”</p>
<p>First, let’s just see how many episodes have the “if it wasn’t for…” catchphrase.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb26" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb26-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb26-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">has_catchphrase =</span>  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(if_it_wasnt_for), <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb26-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(has_catchphrase)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 2
  has_catchphrase     n
  &lt;lgl&gt;           &lt;int&gt;
1 FALSE             414
2 TRUE              189</code></pre>
</div>
</div>
<p>Cool, so, 189 of our 603 episodes have the “if it wasn’t for…” catchphrase.</p>
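<p>As an aside, the <code>if_else(..., TRUE, FALSE)</code> wrapper isn’t strictly needed, since <code>!is.na()</code> already returns a logical vector. Here’s a small sketch of the shorter version on a made-up <code>toy</code> tibble (the values are invented, not pulled from the real data):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# invented values standing in for the if_it_wasnt_for column
toy &lt;- tibble(if_it_wasnt_for = c("meddling kids", NA, "you kids", NA))

# !is.na() is already TRUE/FALSE, so no if_else() is required
toy_counts &lt;- toy %&gt;%
  mutate(has_catchphrase = !is.na(if_it_wasnt_for)) %&gt;%
  count(has_catchphrase)

toy_counts</code></pre></div>
</div>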
<p>And now, which of these also use the term “meddling”?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb28" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb28-1">scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb28-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(if_it_wasnt_for)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb28-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">meddling =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">if_else</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str_detect</span>(if_it_wasnt_for, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"meddling"</span>), <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">TRUE</span>, <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">FALSE</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb28-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">count</span>(meddling) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb28-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> meddling)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb28-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_text</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">label =</span> n, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> n <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">hjust =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"white"</span>)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-18-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Alright, so, of the 189 episodes that have the “if it wasn’t for…” catchphrase, most of those also include the word “meddling”!</p>
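<p>One caveat: <code>str_detect()</code> with a plain string pattern is case-sensitive, so a value starting with a capital “Meddling” wouldn’t be counted. I haven’t checked whether the data actually contains any of those, but if it does, wrapping the pattern in <code>regex(..., ignore_case = TRUE)</code> would catch them. A quick sketch with made-up phrases:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(stringr)

# made-up phrases, not taken from the dataset
phrases &lt;- c("meddling kids", "Meddling teenagers", "you kids")

# default matching is case-sensitive
strict &lt;- str_detect(phrases, "meddling")

# regex() with ignore_case = TRUE matches either capitalization
relaxed &lt;- str_detect(phrases, regex("meddling", ignore_case = TRUE))

strict
relaxed</code></pre></div>
</div>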
<p>The last little bit here, because I’m trying to keep my time to about an hour (again, to test whether this is a viable approach to streaming or making videos), is going to be to fit a quick linear model predicting the IMDb rating of an episode.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb29" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb29-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidymodels)</span></code></pre></div>
</div>
<p>Let’s just use numeric/logical columns in our model, mostly because preprocessing them is pretty straightforward (although note that this doesn’t mean what I’m doing below is anywhere near the best approach). Then let’s look at how much missing data we have for each of these columns.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb30" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb30-1">mod_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> scooby_raw <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb30-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.numeric) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.logical)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb30-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(imdb))</span>
<span id="cb30-4"></span>
<span id="cb30-5">miss_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mod_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb30-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">summarize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">everything</span>(), <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sum</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">is.na</span>(.x))<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">length</span>(.x))))</span>
<span id="cb30-7"></span>
<span id="cb30-8">miss_df</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 1 × 51
  index  imdb engagement run_time monster_amount suspects_amount culprit_amount
  &lt;dbl&gt; &lt;dbl&gt;      &lt;dbl&gt;    &lt;dbl&gt;          &lt;dbl&gt;           &lt;dbl&gt;          &lt;dbl&gt;
1     0     0          0        0              0               0              0
# … with 44 more variables: split_up &lt;dbl&gt;, another_mystery &lt;dbl&gt;,
#   set_a_trap &lt;dbl&gt;, jeepers &lt;dbl&gt;, jinkies &lt;dbl&gt;, my_glasses &lt;dbl&gt;,
#   just_about_wrapped_up &lt;dbl&gt;, zoinks &lt;dbl&gt;, groovy &lt;dbl&gt;,
#   scooby_doo_where_are_you &lt;dbl&gt;, rooby_rooby_roo &lt;dbl&gt;, monster_real &lt;dbl&gt;,
#   caught_fred &lt;dbl&gt;, caught_daphnie &lt;dbl&gt;, caught_velma &lt;dbl&gt;,
#   caught_shaggy &lt;dbl&gt;, caught_scooby &lt;dbl&gt;, captured_fred &lt;dbl&gt;,
#   captured_daphnie &lt;dbl&gt;, captured_velma &lt;dbl&gt;, captured_shaggy &lt;dbl&gt;, …</code></pre>
</div>
</div>
<p>So, some of these columns have a ton of missing data. Just to keep moving forward on this, I’m going to chuck any columns with 20% or more missing data, then median-impute the missing values in the remaining columns (which we’ll do in the recipe step below).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb32" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb32-1">keep_vars <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> miss_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb32-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">everything</span>(),</span>
<span id="cb32-3">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"nms"</span>,</span>
<span id="cb32-4">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"vals"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb32-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(vals <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb32-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pull</span>(nms)</span>
<span id="cb32-7"></span>
<span id="cb32-8">mod_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mod_df <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb32-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_of</span>(keep_vars)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb32-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">across</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">where</span>(is.logical), as.numeric))</span></code></pre></div>
</div>
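<p>For what it’s worth, the same “keep columns under a missingness threshold” idea can probably be done in one step by passing a formula to <code>where()</code>. This is just a sketch on a made-up <code>toy</code> tibble, not what I actually ran above:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# made-up data: b is 40% missing, a and c are complete
toy &lt;- tibble(
  a = c(1, 2, 3, 4, 5),
  b = c(NA, NA, 3, 4, 5),
  c = c(TRUE, FALSE, FALSE, TRUE, TRUE)
)

# mean(is.na(.x)) is the proportion of missing values in a column
toy_kept &lt;- toy %&gt;%
  select(where(~ mean(is.na(.x)) &lt; 0.2)) %&gt;%
  mutate(across(where(is.logical), as.numeric))

names(toy_kept)</code></pre></div>
</div>
<p>The longer version above has the advantage of leaving <code>miss_df</code> around to inspect, which is handy when you’re still exploring.</p>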
<p>Now we’ll set up some bootstrap resamples. I’m using bootstrap resamples here rather than k-fold because it’s a relatively small dataset.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb33" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb33-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span>
<span id="cb33-2">booties <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bootstraps</span>(mod_df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">times =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>)</span></code></pre></div>
</div>
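<p>If bootstrap resampling is new to you, here’s a rough base-R sketch of what a single resample amounts to: rows drawn with replacement for fitting, and the rows that never got drawn (the “out-of-bag” rows) held out for assessment. This uses the built-in <code>mtcars</code> data as a stand-in, and the object names are made up for illustration; it isn’t meant to mirror how <code>{rsample}</code> works internally.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">set.seed(408)

n &lt;- nrow(mtcars)

# draw n row indices with replacement: these rows form the analysis set
in_bag &lt;- sample(seq_len(n), size = n, replace = TRUE)
analysis_df &lt;- mtcars[in_bag, ]

# rows that were never drawn are held out for assessment
out_of_bag &lt;- setdiff(seq_len(n), in_bag)
assessment_df &lt;- mtcars[out_of_bag, ]

nrow(analysis_df)   # same number of rows as the original data
nrow(assessment_df) # the held-out rows (roughly a third, on average)</code></pre></div>
</div>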
<p>And then let’s define some very basic preprocessing using a recipe:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb34" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb34-1">rec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">recipe</span>(imdb <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> mod_df) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb34-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_impute_median</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb34-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">step_normalize</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all_numeric_predictors</span>()) </span></code></pre></div>
</div>
<p>And let’s do a lasso regression, just using a small and kind of arbitrary penalty value (we could tune this, but I’m not going to).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb35" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb35-1">lasso_spec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">linear_reg</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">mixture =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">penalty =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">001</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb35-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set_engine</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"glmnet"</span>)</span>
<span id="cb35-3"></span>
<span id="cb35-4"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#combining everything into a workflow</span></span>
<span id="cb35-5">lasso_wf <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">workflow</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb35-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_recipe</span>(rec) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb35-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">add_model</span>(lasso_spec)</span></code></pre></div>
</div>
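<p>For reference, tuning the penalty might look something like the sketch below. To keep it self-contained, it uses <code>mtcars</code> and its own small recipe and resamples (all of the <code>toy_</code>/<code>tune_</code> object names are hypothetical) rather than the Scooby Doo data, and it assumes <code>{tidymodels}</code> and <code>{glmnet}</code> are installed.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(tidymodels)

set.seed(408)

# stand-in data and resamples (mtcars, not the Scooby Doo data)
toy_boots &lt;- bootstraps(mtcars, times = 5)

toy_rec &lt;- recipe(mpg ~ ., data = mtcars) %&gt;%
  step_normalize(all_numeric_predictors())

# penalty = tune() marks the penalty as a value to be tuned
tune_spec &lt;- linear_reg(mixture = 1, penalty = tune()) %&gt;%
  set_engine("glmnet")

tune_wf &lt;- workflow() %&gt;%
  add_recipe(toy_rec) %&gt;%
  add_model(tune_spec)

# 10 candidate penalties, evenly spaced on the log10 scale
pen_grid &lt;- grid_regular(penalty(), levels = 10)

tune_res &lt;- tune_grid(tune_wf, resamples = toy_boots, grid = pen_grid)

# the candidate with the lowest average RMSE across resamples
best_pen &lt;- select_best(tune_res, metric = "rmse")
best_pen</code></pre></div>
</div>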
<p>And now let’s fit!</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb36" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb36-1">lasso_res <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit_resamples</span>(</span>
<span id="cb36-2">  lasso_wf,</span>
<span id="cb36-3">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">resamples =</span> booties</span>
<span id="cb36-4">)</span></code></pre></div>
</div>
<p>The main reason for fitting on these resamples is to check our model performance, so let’s do that.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb37" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb37-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">collect_metrics</span>(lasso_res)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 2 × 6
  .metric .estimator  mean     n std_err .config             
  &lt;chr&gt;   &lt;chr&gt;      &lt;dbl&gt; &lt;int&gt;   &lt;dbl&gt; &lt;chr&gt;               
1 rmse    standard   0.626    10  0.0104 Preprocessor1_Model1
2 rsq     standard   0.280    10  0.0165 Preprocessor1_Model1</code></pre>
</div>
</div>
<p>Our R-squared is .28, which isn’t great, but it’s also not terrible considering we really didn’t put much effort into our preprocessing here, and we discarded a bunch of data.</p>
<p>Let’s fit one final time on the full dataset to look at the importance of our predictor variables:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb39" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb39-1">prepped_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> rec <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb39-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">prep</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb39-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">bake</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">new_data =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span>)</span>
<span id="cb39-4"></span>
<span id="cb39-5">mod_fit <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> lasso_spec <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb39-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fit</span>(imdb <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> ., <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> prepped_df)</span></code></pre></div>
</div>
<p>And then finally we can look at our coefficients.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb40" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb40-1">mod_fit <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb40-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb40-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(term <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"(Intercept)"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb40-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">arrange</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">desc</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(estimate))) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb40-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> estimate, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">fct_reorder</span>(term, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">abs</span>(estimate)), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fill =</span> estimate <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&gt;=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb40-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_col</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb40-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb40-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="cn" style="color: #8f5902;
background-color: null;
font-style: inherit;">NULL</span></span>
<span id="cb40-9">  )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/scooby-doo-eda/index_files/figure-html/unnamed-chunk-28-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>And there we go. That was a bit more than an hour, but it was worth it to get to a reasonable stopping point!</p>



<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2021,
  author = {Ekholm, Eric},
  title = {Scooby {Doo} {EDA}},
  date = {2021-07-20},
  url = {https://www.ericekholm.com/posts/scooby-doo-eda},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2021" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2021. <span>“Scooby Doo EDA.”</span> July 20, 2021. <a href="https://www.ericekholm.com/posts/scooby-doo-eda">https://www.ericekholm.com/posts/scooby-doo-eda</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>EDA</category>
  <category>Scooby Doo</category>
  <category>regression</category>
  <guid>https://www.ericekholm.com/posts/scooby-doo-eda/index.html</guid>
  <pubDate>Tue, 20 Jul 2021 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Robustly Create Parameterized Reports</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index.html</link>
  <description><![CDATA[ 




<p>Recently, I was working on creating parameterized reports for all of the schools in the division where I work. The basic idea was to provide school leadership teams with individualized reports on several (common) key metrics that they could use to both 1) reflect on the previous year(s) and 2) set goals for the upcoming year(s).</p>
<p>The beauty of parameterized reporting via RMarkdown is that you can build a template report, define some parameters that will vary with each iteration of the report, and then render several reports all from a single template along with a file that will loop (or <code>purrr::walk()</code>) through the parameters. (If you want to learn more about parameterized reporting, the always-incredible Alison Hill has a recent-ish tutorial on it that you can find <a href="https://alison.rbind.io/talk/2021-rmd-params/">here</a>.) In my case, this meant creating one template for all 65+ schools and then looping through a function that rendered the report for each school. Sounds great, right?</p>
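<p>To make that a little more concrete, here’s a minimal sketch of the render-in-a-loop idea. The template contents, the <code>school</code> parameter, the school names, and the <code>render_school()</code> helper are all hypothetical stand-ins (not my actual reports), and the template is written to a temp directory just so the example is self-contained. It assumes <code>{rmarkdown}</code> and pandoc are available.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)

# hypothetical template, written to a temp dir so the sketch is self-contained
template &lt;- file.path(tempdir(), "report_template.Rmd")

writeLines(c(
  "---",
  "title: \"School Report\"",
  "output: html_document",
  "params:",
  "  school: \"placeholder\"",
  "---",
  "",
  "This report is for `r params$school`."
), template)

# made-up school names standing in for the real list
schools &lt;- c("school_a", "school_b")

# render one report, passing the school in as a parameter
render_school &lt;- function(school) {
  rmarkdown::render(
    input = template,
    output_file = paste0(school, "_report.html"),
    output_dir = tempdir(),
    params = list(school = school),
    quiet = TRUE
  )
}

# one rendered report per school
walk(schools, render_school)</code></pre></div>
</div>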
<section id="see-what-had-happened-was" class="level2">
<h2 class="anchored" data-anchor-id="see-what-had-happened-was">See, what had happened was…</h2>
<p><img src="https://media1.tenor.com/images/dfbd2101a59e39cfa42938026e9ef19b/tenor.gif?itemid=9281985" class="img-fluid"></p>
<p>This workflow is great…when it works. Except it doesn’t always. This isn’t to say that <code>{rmarkdown}</code> mysteriously breaks or anything, but rather that when you create these reports using real (read: usually messy) data, and when you’re trying to present a lot of data in a report, the probability that one of your iterations throws an error increases. This is especially true when you work in a school division and the integrity of your data has been absolutely ravaged by COVID during the past ~18 months. When this happens, instead of watching the text zip by on your console as all of your reports render like they’re supposed to, you end up hunting through the data for each individual school wondering why calculating a particular metric threw an error. Which is like, not nearly as much fun.</p>
</section>
<section id="so-what-can-we-do-about-this" class="level2">
<h2 class="anchored" data-anchor-id="so-what-can-we-do-about-this">So what can we do about this?</h2>
<p>Fortunately, we can get around this by making “safe” versions of our functions. What exactly that means will vary from function to function and from use case to use case, but generally it means wrapping a function in another function that can facilitate error handling (or prevent errors from occurring). In some cases, it might mean using <code>purrr::safely()</code> or <code>purrr::possibly()</code> to capture errors or return a default value when a function fails. In other cases, it might mean writing your own wrapper (which is what I’ll demonstrate below) to deal with errors that pop up. Regardless of the exact route you go, the goal here is to prevent errors that would otherwise stop your document(s) from rendering.</p>
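<p>As a quick illustration of the <code>{purrr}</code> route, here’s a small sketch using a made-up function, <code>risky_log()</code>, that errors on non-numeric input. The function and object names are hypothetical; only <code>safely()</code>, <code>possibly()</code>, and <code>map_dbl()</code> come from <code>{purrr}</code>.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)

# hypothetical function that throws an error on bad input
risky_log &lt;- function(x) {
  if (!is.numeric(x)) stop("x must be numeric")
  log(x)
}

# safely() returns a list with `result` and `error` instead of stopping
safe_log &lt;- safely(risky_log)
res &lt;- safe_log("a")
res$result # NULL, because the call failed
res$error  # the captured error condition

# possibly() swaps in a default value whenever the function errors
poss_log &lt;- possibly(risky_log, otherwise = NA_real_)

# the failing element becomes NA instead of halting the whole loop
logs &lt;- map_dbl(list(1, "a", 10), poss_log)
logs</code></pre></div>
</div>
<p>Which of these fits best depends on whether you want to inspect the error afterward (<code>safely()</code>) or just keep going with a fallback value (<code>possibly()</code>).</p>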
<p>Let’s see this in action.</p>
</section>
<section id="setup" class="level1">
<h1>Setup</h1>
<p>I’m not actually going to create “real” parameterized reports here, but I’ll illustrate the principle using data from <code>{palmerpenguins}</code> and some ggplots. First, I’ll load some packages, set some options and whatnot, and take a peek at the penguins data we’ll be using.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(palmerpenguins) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#data on penguins</span></span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#you all know what this is</span></span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(eemisc) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#personal ggplot themes</span></span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(harrypotter) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#colors</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(reactable)</span>
<span id="cb1-6"></span>
<span id="cb1-7">herm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-8"></span>
<span id="cb1-9">opts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(</span>
<span id="cb1-10">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.fill =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-11">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>),</span>
<span id="cb1-12">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Always"</span>)</span>
<span id="cb1-13">  )</span>
<span id="cb1-14">)</span>
<span id="cb1-15"></span>
<span id="cb1-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ee</span>())</span>
<span id="cb1-17"></span>
<span id="cb1-18"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">glimpse</span>(penguins)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>Rows: 344
Columns: 8
$ species           &lt;fct&gt; Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel…
$ island            &lt;fct&gt; Torgersen, Torgersen, Torgersen, Torgersen, Torgerse…
$ bill_length_mm    &lt;dbl&gt; 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, …
$ bill_depth_mm     &lt;dbl&gt; 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, …
$ flipper_length_mm &lt;int&gt; 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186…
$ body_mass_g       &lt;int&gt; 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, …
$ sex               &lt;fct&gt; male, female, female, NA, female, male, female, male…
$ year              &lt;int&gt; 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007…</code></pre>
</div>
</div>
</section>
<section id="split-data" class="level1">
<h1>Split Data</h1>
<p>If you’re not familiar with this dataset, it contains data on a few hundred penguins, and you can learn more <a href="https://allisonhorst.github.io/palmerpenguins/">here</a>. One feature of this dataset is that it has data on three different species of penguins: Adelie, Gentoo, and Chinstrap. So, let’s imagine we wanted to provide separate reports for each species of penguin. To do this, let’s first divide our data up into separate dataframes to emulate a potential workflow of creating parameterized reports.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">split_df <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">split</span>(penguins, penguins<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species)</span>
<span id="cb3-2"></span>
<span id="cb3-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">str</span>(split_df)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>List of 3
 $ Adelie   : tibble [152 × 8] (S3: tbl_df/tbl/data.frame)
  ..$ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ island           : Factor w/ 3 levels "Biscoe","Dream",..: 3 3 3 3 3 3 3 3 3 3 ...
  ..$ bill_length_mm   : num [1:152] 39.1 39.5 40.3 NA 36.7 39.3 38.9 39.2 34.1 42 ...
  ..$ bill_depth_mm    : num [1:152] 18.7 17.4 18 NA 19.3 20.6 17.8 19.6 18.1 20.2 ...
  ..$ flipper_length_mm: int [1:152] 181 186 195 NA 193 190 181 195 193 190 ...
  ..$ body_mass_g      : int [1:152] 3750 3800 3250 NA 3450 3650 3625 4675 3475 4250 ...
  ..$ sex              : Factor w/ 2 levels "female","male": 2 1 1 NA 1 2 1 2 NA NA ...
  ..$ year             : int [1:152] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
 $ Chinstrap: tibble [68 × 8] (S3: tbl_df/tbl/data.frame)
  ..$ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 2 2 2 2 2 2 2 2 2 2 ...
  ..$ island           : Factor w/ 3 levels "Biscoe","Dream",..: 2 2 2 2 2 2 2 2 2 2 ...
  ..$ bill_length_mm   : num [1:68] 46.5 50 51.3 45.4 52.7 45.2 46.1 51.3 46 51.3 ...
  ..$ bill_depth_mm    : num [1:68] 17.9 19.5 19.2 18.7 19.8 17.8 18.2 18.2 18.9 19.9 ...
  ..$ flipper_length_mm: int [1:68] 192 196 193 188 197 198 178 197 195 198 ...
  ..$ body_mass_g      : int [1:68] 3500 3900 3650 3525 3725 3950 3250 3750 4150 3700 ...
  ..$ sex              : Factor w/ 2 levels "female","male": 1 2 2 1 2 1 1 2 1 2 ...
  ..$ year             : int [1:68] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...
 $ Gentoo   : tibble [124 × 8] (S3: tbl_df/tbl/data.frame)
  ..$ species          : Factor w/ 3 levels "Adelie","Chinstrap",..: 3 3 3 3 3 3 3 3 3 3 ...
  ..$ island           : Factor w/ 3 levels "Biscoe","Dream",..: 1 1 1 1 1 1 1 1 1 1 ...
  ..$ bill_length_mm   : num [1:124] 46.1 50 48.7 50 47.6 46.5 45.4 46.7 43.3 46.8 ...
  ..$ bill_depth_mm    : num [1:124] 13.2 16.3 14.1 15.2 14.5 13.5 14.6 15.3 13.4 15.4 ...
  ..$ flipper_length_mm: int [1:124] 211 230 210 218 215 210 211 219 209 215 ...
  ..$ body_mass_g      : int [1:124] 4500 5700 4450 5700 5400 4550 4800 5200 4400 5150 ...
  ..$ sex              : Factor w/ 2 levels "female","male": 1 2 1 2 2 1 1 2 1 2 ...
  ..$ year             : int [1:124] 2007 2007 2007 2007 2007 2007 2007 2007 2007 2007 ...</code></pre>
</div>
</div>
<p>Great, so now we have a separate dataframe for each penguin species.</p>
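<p>The reason a named list like this is handy is that each element’s name can stand in for a report parameter as you iterate. Here’s a rough base-R sketch of that pattern using the built-in <code>iris</code> data instead of the penguins (the object and column names in the summary are just for illustration):</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># split a built-in dataset by group, the same way as above
pieces &lt;- split(iris, iris$Species)

# one iteration per group; the list name plays the role of the parameter
group_summaries &lt;- lapply(names(pieces), function(nm) {
  data.frame(
    group = nm,
    n_rows = nrow(pieces[[nm]]),
    mean_petal_length = mean(pieces[[nm]]$Petal.Length)
  )
})

# stack the per-group results into a single data frame
summary_df &lt;- do.call(rbind, group_summaries)
summary_df</code></pre></div>
</div>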
</section>
<section id="create-a-plot" class="level1">
<h1>Create a Plot</h1>
<p>Now, imagine we want to create a ggplot to include in each of our reports. To illustrate this, let’s just do a scatterplot of flipper length by bill length for Adelie penguins. This is a fairly basic scatterplot, but it works for our purposes.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> flipper_length_mm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> bill_length_mm)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb5-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Adelie"</span></span>
<span id="cb5-5">  )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>And we can do the same thing for Chinstrap penguins:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]], <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> flipper_length_mm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> bill_length_mm)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb6-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb6-4">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Chinstrap"</span></span>
<span id="cb6-5">  )</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>Good so far, right? Our next step might be to define a function to make this plot. Although this is a fairly basic plot, writing a function still saves us a little bit of typing, and it will prove useful later when we need to wrap it, so let’s go ahead and write our function.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1">make_penguin_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(df) {</span>
<span id="cb7-2">  </span>
<span id="cb7-3">  title <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species))</span>
<span id="cb7-4">  </span>
<span id="cb7-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(df, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> flipper_length_mm, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> bill_length_mm)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb7-7">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb7-8">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> title</span>
<span id="cb7-9">    )</span>
<span id="cb7-10">}</span>
<span id="cb7-11"></span>
<span id="cb7-12"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_penguin_plot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]])</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index_files/figure-html/unnamed-chunk-4-1.png" class="img-fluid" width="672"></p>
</div>
</div>
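<p>Once the plot lives in a function, making one plot per group is a one-liner. Here’s a self-contained sketch of that idea using <code>iris</code> rather than the penguins; <code>make_iris_plot()</code> is a hypothetical stand-in that follows the same shape as the function above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(ggplot2)
library(purrr)

# hypothetical stand-in with the same structure as make_penguin_plot()
make_iris_plot &lt;- function(df) {

  title &lt;- as.character(unique(df$Species))

  ggplot(df, aes(x = Sepal.Length, y = Petal.Length)) +
    geom_point() +
    labs(
      title = title
    )
}

iris_split &lt;- split(iris, iris$Species)

# one plot per group, returned as a named list
plots &lt;- map(iris_split, make_iris_plot)

names(plots)</code></pre></div>
</div>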
</section>
<section id="so-what-if-something-goes-wrong" class="level1">
<h1>So what if something goes wrong?</h1>
<p>Let’s imagine, now, that we don’t have measurements of Gentoo penguins’ bill lengths. Maybe the bill-length-measuring machine was broken on the one day we were going to take their measurements. Or maybe all of them were especially bite-y and wouldn’t let us measure their bills (I have no clue if penguins actually bite).</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb8" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb8-1"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#dropping the bill_length measurement from the Gentoo data</span></span>
<span id="cb8-2">split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]] <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]] <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb8-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>bill_length_mm)</span></code></pre></div>
</div>
<p>Now what happens when we try to make our penguin plot?</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_penguin_plot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]])</span>
<span id="cb9-2"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#produces message 'Error: Column `bill_length_mm` not found in `.data`'</span></span></code></pre></div>
</div>
<p>Oh no! An error! In this contrived example, I’m just showing you the error message, but if you’re actually rendering a report, this error will stop the report from rendering, which isn’t great.</p>
<p><img src="https://i.kym-cdn.com/entries/icons/original/000/024/027/blog_image_3822_4926_Webcomic_Name_April_Fools_Day_201703231756.jpg" class="img-fluid"></p>
<p>If this happens to you, you have a few options. You might create a separate report template just for Gentoo penguins, although this isn’t ideal, because needing a separate template every time an exception pops up defeats the point of having a template. You could drop this metric from your main report template if the data seems problematic (which is a good thing to investigate). You could also use <code>purrr::possibly()</code> or <code>purrr::safely()</code> if you have a default value you want to return when the function errors.</p>
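<p>As a rough illustration of the <code>purrr::possibly()</code> route, here’s a minimal sketch. The <code>plot_or_stop()</code> function is a hypothetical stand-in for <code>make_penguin_plot()</code> that errors as soon as a column is missing. One caveat: ggplot objects are generally only evaluated when they’re printed, so with a real plotting function you may need to print or build the plot inside the wrapped function for the error to be caught.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(purrr)

#hypothetical stand-in for make_penguin_plot() that errors when a column is missing
plot_or_stop &lt;- function(df) {
  needed &lt;- c("flipper_length_mm", "bill_length_mm")
  if (!all(needed %in% names(df))) stop("missing columns")
  "plot goes here"
}

#possibly() wraps the function so an error returns `otherwise` instead
possibly_plot &lt;- possibly(plot_or_stop, otherwise = "Whoops!")

#made-up one-row data frames, with and without the bill length column
full_df &lt;- data.frame(flipper_length_mm = 190, bill_length_mm = 40)
partial_df &lt;- data.frame(flipper_length_mm = 190)

possibly_plot(full_df)    #runs the wrapped function as usual
possibly_plot(partial_df) #returns the default rather than stopping</code></pre></div>
</div>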
<p>Another option is to write your own little wrapper to make your function “safe”, which I’ll show below.</p>
</section>
<section id="wrap-that-function" class="level1">
<h1>Wrap that function!</h1>
<p>The best part is that, in this very specific case, it’s fairly straightforward. I’m just going to check the names of the variables in the data I’m passing into the function to see if flipper length and bill length are present, then execute <code>make_penguin_plot()</code> if they are and print out “Whoops!” if they’re not.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">safe_penguin_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(df) {</span>
<span id="cb10-2">  </span>
<span id="cb10-3"> nms <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(df)</span>
<span id="cb10-4"> </span>
<span id="cb10-5"> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"flipper_length_mm"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"bill_length_mm"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> nms)) {</span>
<span id="cb10-6">   <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">make_penguin_plot</span>(df)</span>
<span id="cb10-7"> } <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Whoops! Looks like you don't have all of the data you need for this plot!"</span>)</span>
<span id="cb10-8">}</span>
<span id="cb10-9"></span>
<span id="cb10-10"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_penguin_plot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]])</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb11" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb11-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_penguin_plot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]])</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "Whoops! Looks like you don't have all of the data you need for this plot!"</code></pre>
</div>
</div>
</section>
<section id="generalize-your-functions-using-rlang" class="level1">
<h1>Generalize your functions using {rlang}</h1>
<p>You can also imagine generalizing this to accept other variables. This requires some quoting/unquoting and diving into <code>{rlang}</code>, which is something I’ve been trying to learn lately:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb13" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb13-1">gen_penguin_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(df, xvar, yvar) {</span>
<span id="cb13-2">  </span>
<span id="cb13-3">  title <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">as.character</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(df<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>species))</span>
<span id="cb13-4">  </span>
<span id="cb13-5">  x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enexpr</span>(xvar) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#captures xvar as an expression</span></span>
<span id="cb13-6">  y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enexpr</span>(yvar) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#captures yvar as an expression</span></span>
<span id="cb13-7">  </span>
<span id="cb13-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(df, <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!!</span>x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!!</span>y)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-9">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb13-10">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb13-11">      <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">title =</span> title</span>
<span id="cb13-12">    )</span>
<span id="cb13-13">}</span>
<span id="cb13-14"></span>
<span id="cb13-15"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gen_penguin_plot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], bill_length_mm, bill_depth_mm)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index_files/figure-html/unnamed-chunk-8-1.png" class="img-fluid" width="672"></p>
</div>
</div>
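<p>To make the quoting and unquoting a bit more concrete, here’s a small sketch of what <code>enexpr()</code> captures and how the result can be evaluated against a data frame. The <code>capture_var()</code> helper and the toy data are made up for illustration and aren’t part of the functions above.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rlang)

#hypothetical helper: returns the expression the caller typed, unevaluated
capture_var &lt;- function(var) {
  enexpr(var)
}

captured &lt;- capture_var(bill_length_mm)

is_symbol(captured) #the bare column name is stored as a symbol
deparse(captured)   #and can be turned back into a string

#evaluating the symbol with a data frame as context looks up the column
toy &lt;- data.frame(bill_length_mm = c(39.1, 39.5))
eval_tidy(captured, data = toy)</code></pre></div>
</div>
<p>Inside <code>aes()</code>, the <code>!!</code> operator plays the role that <code>eval_tidy()</code> plays here, in the sense that it hands the captured symbol back to a function that knows to look it up in the data.</p>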
<p>Making the generalized version “safe” takes a little more work, mostly in handling the quoted expressions and their environments, since we’re passing them into another function:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb14" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb14-1">safe_gen_plot <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(df, xvar, yvar) {</span>
<span id="cb14-2">  </span>
<span id="cb14-3">   nms <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">names</span>(df)</span>
<span id="cb14-4">   </span>
<span id="cb14-5">   vec <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">deparse</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enexpr</span>(xvar)), <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">deparse</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enexpr</span>(yvar)))</span>
<span id="cb14-6">   </span>
<span id="cb14-7">   x <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enquo</span>(xvar)</span>
<span id="cb14-8">   y <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">enquo</span>(yvar)</span>
<span id="cb14-9">   </span>
<span id="cb14-10">    <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">if</span> (<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">all</span>(vec <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> nms)) {</span>
<span id="cb14-11">   <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">gen_penguin_plot</span>(df, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">xvar =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!!</span>x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">yvar =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!!</span>y)</span>
<span id="cb14-12"> } <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">else</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Whoops! Looks like you don't have all of the data you need for this plot!"</span>)</span>
<span id="cb14-13">  </span>
<span id="cb14-14">}</span>
<span id="cb14-15"></span>
<span id="cb14-16"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_gen_plot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>]], bill_length_mm, bill_depth_mm)</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index_files/figure-html/unnamed-chunk-9-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<div class="cell">
<div class="sourceCode cell-code" id="cb15" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb15-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">safe_gen_plot</span>(split_df[[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>]], bill_length_mm, bill_depth_mm)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code>[1] "Whoops! Looks like you don't have all of the data you need for this plot!"</code></pre>
</div>
</div>
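<p>If it would help to know <em>which</em> variable is missing, one possible variation is sketched below. It’s not part of the functions above: <code>missing_vars()</code> is a hypothetical helper, and it swaps <code>deparse(enexpr())</code> for <code>rlang::as_name()</code>, which assumes each argument is a bare column name rather than a more complex expression.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rlang)

#hypothetical helper: which of the requested columns are absent from df?
missing_vars &lt;- function(df, xvar, yvar) {
  wanted &lt;- c(as_name(enquo(xvar)), as_name(enquo(yvar)))
  setdiff(wanted, names(df))
}

#made-up data with no bill length column
toy &lt;- data.frame(bill_depth_mm = 18.7, species = "Gentoo")

missing_vars(toy, bill_length_mm, bill_depth_mm)</code></pre></div>
</div>
<p>The returned character vector could then be pasted into the “Whoops!” message so the rendered report says what was missing.</p>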
</section>
<section id="conclusion" class="level1">
<h1>Conclusion</h1>
<p>By implementing a simple (or less simple, depending on how generalizable you want your function to be) wrapper function, we can replace errors with a message to be displayed when rendering a report. I can’t emphasize enough how much time this approach has saved me when creating parameterized reports, especially since our data has gotten so wonky due to COVID. It provides a flexible way to handle all of this craziness.</p>
<p>Hope this helps others who might be in similar positions!</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2021,
  author = {Ekholm, Eric},
  title = {Robustly {Create} {Parameterized} {Reports}},
  date = {2021-07-02},
  url = {https://www.ericekholm.com/posts/robustly-create-parameterized-reports},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2021" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2021. <span>“Robustly Create Parameterized
Reports.”</span> July 2, 2021. <a href="https://www.ericekholm.com/posts/robustly-create-parameterized-reports">https://www.ericekholm.com/posts/robustly-create-parameterized-reports</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>tutorial</category>
  <category>errors</category>
  <category>programming</category>
  <guid>https://www.ericekholm.com/posts/robustly-create-parameterized-reports/index.html</guid>
  <pubDate>Fri, 02 Jul 2021 04:00:00 GMT</pubDate>
</item>
<item>
  <title>It’s-a Me, Linear Regression</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/its-a-me-linear-regression/index.html</link>
  <description><![CDATA[ 




<p>I’ve been sort of out of the #TidyTuesday game for a while, but this week’s dataset on Mario Kart world records called to me. I have tons of fond memories from late elementary school and into middle school playing Mario Kart 64 with kids in my neighborhood. I certainly wasn’t world-record-caliber good (I was like 10ish), but I do remember learning little tricks like how to get the speed boost on the 3-2-1 go! countdown or how to power-slide through turns.</p>
<p>Anyway, when I initially looked at the dataset, I thought I’d approach it by trying to fit a model to predict whether or not a driver took advantage of a shortcut when they set a record, but alas, I waited until later in the week and got scooped by Julia Silge (check out her analysis <a href="https://juliasilge.com/blog/mario-kart/">here</a>). Which is probably for the best, because she did a better job than I would have.</p>
<p>That said, when I dug into the data, I did stumble across some interesting patterns in the progression of records over time, so I want to show how we can model these progressions using some simple feature engineering and a relatively straightforward mixed-effects model.</p>
<section id="setup" class="level1">
<h1>Setup</h1>
<p>First, we’ll load our packages, set some global options, and get our data.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(tidyverse)</span>
<span id="cb1-2"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(eemisc)</span>
<span id="cb1-3"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(harrypotter) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#colors</span></span>
<span id="cb1-4"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(nlme) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#for mixed-effects models</span></span>
<span id="cb1-5"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">library</span>(broom.mixed) <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">#functions for tidying mixed effects models</span></span>
<span id="cb1-6"></span>
<span id="cb1-7">herm <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-8">herm2 <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)[<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>]</span>
<span id="cb1-9"></span>
<span id="cb1-10">opts <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">options</span>(</span>
<span id="cb1-11">  <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">ggplot2.discrete.fill =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">list</span>(</span>
<span id="cb1-12">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>),</span>
<span id="cb1-13">    harrypotter<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">hp</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">n =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">7</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">option =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"HermioneGranger"</span>)</span>
<span id="cb1-14">  )</span>
<span id="cb1-15">)</span>
<span id="cb1-16"></span>
<span id="cb1-17"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_set</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_ee</span>())</span>
<span id="cb1-18"></span>
<span id="cb1-19">records <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> readr<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">read_csv</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-05-25/records.csv'</span>)</span></code></pre></div>
</div>
</section>
<section id="explore-data" class="level1">
<h1>Explore Data</h1>
<p>So, what I’m interested in here is how the world record for each track progresses over time. To make sure all of our comparisons are “apples to apples,” I’m going to limit this to single-lap, no-shortcut records.</p>
<p>Let’s randomly choose 4 tracks and look at these records over time.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">set.seed</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0408</span>)</span>
<span id="cb2-2">samp_tracks <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">sample</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">unique</span>(records<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">$</span>track), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">size =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>)</span>
<span id="cb2-3"></span>
<span id="cb2-4"></span>
<span id="cb2-5">records <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(track <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> samp_tracks,</span>
<span id="cb2-7">         type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Single Lap"</span>,</span>
<span id="cb2-8">         shortcut <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"No"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb2-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> date, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> time, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> track)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb2-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(track))</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/its-a-me-linear-regression/index_files/figure-html/unnamed-chunk-1-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>It’s a little bit hard to tell what’s going on here, especially since the tracks are different lengths and seem to have different record asymptotes. Another issue is that record-setting seems to be clustered by dates. A lot of records are set in a cluster, then there’s a drought for several years where the record isn’t broken. In some analyses this may be meaningful, but I care less about the actual <em>date</em> a record was set on and more about where it is in the sequence of records for that track. So, it might be more straightforward to just assign a running count of records for each track:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">records <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(track <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%in%</span> samp_tracks,</span>
<span id="cb3-3">         type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Single Lap"</span>,</span>
<span id="cb3-4">         shortcut <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"No"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(track) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">record_num =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb3-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> record_num, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> time, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> track)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>() <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb3-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(track))</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/its-a-me-linear-regression/index_files/figure-html/unnamed-chunk-2-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>This puts all of our X-axes on the same scale, and it also removes a lot of the white space where we had no records being broken (again, this information might be useful for a different analysis).</p>
<p>We might also want to consider how we’re representing the lap time here. Each track is a different length, and each track has its own unique obstacles. We can see here that Wario Stadium is a much longer track than, say, Sherbet Land. By extension, a 1 second decrease in time on Sherbet Land means a lot more than a 1 second decrease in time on Wario Stadium.</p>
<p>Standardizing our measure of time – and our measure of improvement over time – will help us out here. What I’m going to do is, for each record (and specific to each track), calculate how much better (as a percent) it was than the <em>first</em> world record on that track. This will give us a standard way to compare the progress of each world record across all of the tracks.</p>
<p>Let’s graph these to see what they look like.</p>
<div class="cell" data-preview="true">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1">records_scaled <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> records <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(type <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Single Lap"</span>,</span>
<span id="cb4-3">         shortcut <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"No"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">group_by</span>(track) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">mutate</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">init_wr =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">max</span>(time),</span>
<span id="cb4-6">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">pct_better =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> time<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>init_wr,</span>
<span id="cb4-7">         <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">record_num =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">row_number</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ungroup</span>()</span>
<span id="cb4-9"></span>
<span id="cb4-10">records_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb4-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> record_num, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> pct_better)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-13">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-14">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb4-15">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Record Number"</span>,</span>
<span id="cb4-16">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Pct Improvement over Initial"</span></span>
<span id="cb4-17">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-18">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">percent_format</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-19">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(track)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb4-20">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/its-a-me-linear-regression/index_files/figure-html/pct-better-plot-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>These are a lot easier to compare, and we can see a pretty definitive pattern across all of the tracks. There are some sharp improvements in the early record-setting runs, but then these improvements attenuate over time, and records are getting only a teeny bit faster each time they’re broken.</p>
<p>And this is sort of what we’d expect, particularly given a closed system like Mario Kart 64. The game isn’t changing, and people will hit a threshold in terms of how much better they can get, so it makes sense that these records are flattening.</p>
<p>Another interesting feature of the above graphs is that they (strongly) resemble logarithmic curves. We can plot these below to illustrate the similarity:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb5-1">records_scaled <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb5-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> record_num, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> pct_better)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-4">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_line</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(track)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-6">    <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">labs</span>(</span>
<span id="cb5-7">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Record Number"</span>,</span>
<span id="cb5-8">    <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Pct Improvement over Initial"</span></span>
<span id="cb5-9">  ) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-10">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">scale_y_continuous</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">labels =</span> scales<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">::</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">percent_format</span>()) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-11">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">stat_function</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fun =</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-style: inherit;">function</span>(x) .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log</span>(x, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">base =</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>), <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">geom =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"line"</span>, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm2) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb5-12">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/its-a-me-linear-regression/index_files/figure-html/unnamed-chunk-3-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>We can see that the general shape of the logarithmic curve matches the general shape of the track records here. I multiplied the curves by an arbitrary constant just to plot them, so of course we don’t expect them to match perfectly. That said, this does give us a clue that, given these feature engineering choices, we can model the data using a logarithmic curve.</p>
</section>
<section id="building-a-model" class="level1">
<h1>Building A Model</h1>
<p>There are a few paths we could take moving forward to model the data. Two in particular stand out to me:</p>
<ol type="1">
<li>We could fit a model separately for each track, where <code>pct_better</code> is regressed on the log of <code>record_num</code>.</li>
<li>We could fit a multilevel model where <code>pct_better</code> is regressed on the log of <code>record_num</code> and specify a random intercept and a random slope by track. This is the option I’m going to take.</li>
</ol>
<p>To give a very quick and insufficient nod to multilevel models (MLMs), they are useful for modeling clustered data (data where there are known dependencies between observations). The prototypical example of this is looking at people over time. Imagine you have a dataset of 1,000 observations where each of the 50 people in the dataset contributes 20 observations. When modeling this, you’d want to account for the fact that within-person observations are not independent. The one I encounter a lot in education is students clustered within classrooms. Different application, but same principle. For more on MLMs, <a href="https://www.amazon.com/Hierarchical-Linear-Models-Applications-Quantitative/dp/076191904X">Raudenbush &amp; Bryk (2002)</a> is a great resource, as is John Fox’s <a href="https://www.amazon.com/Applied-Regression-Analysis-Generalized-Linear-dp-1452205663/dp/1452205663/ref=dp_ob_title_bk">Applied Regression Analysis</a>, which has a chapter on MLMs. My friend and colleague Mike Broda has also made public some content from his <a href="https://rpubs.com/mdbroda">multilevel modeling (and multivariate statistics) course</a> in R.</p>
<p>Anyway, moving along! We basically have clustered data here: records are clustered within (read: dependent upon) each track. What an MLM allows us to do is fit a single model to the entire dataset while also allowing some of the parameters to vary by track. More specifically, we can allow the intercept to vary (which we don’t actually need here, since we’ve standardized our intercepts, but it’s just as easy to allow it), and we can allow the slope to vary. Varying the slope will let us estimate a different progression of world record runs for each track, which we can see that we need from the plots above.</p>
<p>I’m using the <code>{nlme}</code> package to fit this model. And the model I’m fitting can be read as follows:</p>
<ul>
<li>I want a “regular” fixed-effects model where <code>pct_better</code> is regressed on the log of <code>record_num</code>.</li>
<li>I also want to allow the intercept and the coefficient of the log of <code>record_num</code> to vary depending on which track the record was set on.</li>
</ul>
<div class="cell">
<div class="sourceCode cell-code" id="cb6" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb6-1">mod <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">lme</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">fixed =</span> pct_better <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log10</span>(record_num),</span>
<span id="cb6-2">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">random =</span> <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">~</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">log10</span>(record_num) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">|</span> track,</span>
<span id="cb6-3">           <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">data =</span> records_scaled)</span></code></pre></div>
</div>
<p>And let’s take a look at the results (thanks to <code>{broom.mixed}</code>):</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb7" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb7-1"><span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">tidy</span>(mod)</span></code></pre></div>
<div class="cell-output cell-output-stdout">
<pre><code># A tibble: 6 × 8
  effect   group    term            estimate std.error    df statistic   p.value
  &lt;chr&gt;    &lt;chr&gt;    &lt;chr&gt;              &lt;dbl&gt;     &lt;dbl&gt; &lt;dbl&gt;     &lt;dbl&gt;     &lt;dbl&gt;
1 fixed    &lt;NA&gt;     (Intercept)      0.0162    0.00505   608      3.20  1.44e- 3
2 fixed    &lt;NA&gt;     log10(record_n…  0.0518    0.00396   608     13.1   1.46e-34
3 ran_pars track    sd_(Intercept)   0.0199   NA          NA     NA    NA       
4 ran_pars track    cor_log10(reco…  0.768    NA          NA     NA    NA       
5 ran_pars track    sd_log10(recor…  0.0156   NA          NA     NA    NA       
6 ran_pars Residual sd_Observation   0.00651  NA          NA     NA    NA       </code></pre>
</div>
</div>
<p>For what it’s worth, we see that both of the fixed effects are statistically significant. For reasons that I don’t quite remember off the top of my head, there’s some debate about doing hypothesis tests on random effects, so these aren’t included here (I think other packages will run these tests if you really want them). The main thing I focus on here, though, is that there’s a seemingly non-negligible amount of variance in the coefficient for the log of <code>record_num</code> (see the term <code>sd_log10(record_num)</code>, which is truncated in the output above). The mean coefficient is .052, and the standard deviation of the coefficient values is .016, which seems meaningful to me.</p>
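<p>One way to look at that variation directly is to pull out the per-group coefficients. The sketch below is illustrative and not part of the original analysis: it uses the <code>Orthodont</code> data that ships with <code>{nlme}</code> as a stand-in so that it runs on its own, and the object name <code>fm</code> is made up. The same <code>fixef()</code>, <code>ranef()</code>, and <code>coef()</code> calls should apply to <code>mod</code>, where the groups would be tracks.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(nlme)

# Stand-in model (assumption: Orthodont instead of the Mario Kart records).
# Random intercept and random slope for age by Subject, analogous to the
# random intercept and slope by track in the model above.
fm &lt;- lme(fixed = distance ~ age,
          random = ~ age | Subject,
          data = Orthodont)

fixef(fm)        # the average intercept and slope across groups
head(ranef(fm))  # each group's deviation from those averages
head(coef(fm))   # per-group coefficients: fixed effect plus deviation</code></pre></div>
</div>
<p>Each row of <code>coef(fm)</code> is just the fixed effects plus that group’s random effects, so the spread of the slope column is what the <code>sd_</code> term in the <code>tidy()</code> output summarizes.</p>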
</section>
<section id="plotting-our-results" class="level1">
<h1>Plotting Our Results</h1>
<p>To get a better sense of what this model is doing, as well as to graphically examine how well it does, we can use the <code>augment()</code> function from <code>{broom.mixed}</code>. Let’s plot our fitted values against our actual <code>pct_better</code> values:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb9" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb9-1">aug <span class="ot" style="color: #003B4F;
background-color: null;
font-style: inherit;">&lt;-</span> mod <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb9-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">augment</span>()</span>
<span id="cb9-3"></span>
<span id="cb9-4">aug <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb9-5">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">filter</span>(pct_better <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">!=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb9-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> pct_better, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> .fitted)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> herm) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb9-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/its-a-me-linear-regression/index_files/figure-html/unnamed-chunk-6-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>We’re expecting the points to fall roughly along a straight diagonal (a-b) line here, so this is good.</p>
<p>Finally, let’s plot our actual values and our fitted values against <code>record_num</code>, by track, to see how well our model predictions compare to the real values:</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb10" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb10-1">aug <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-2">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">select</span>(track, .fitted, pct_better, record_num) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-3">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">pivot_longer</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">cols =</span> <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">c</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">".fitted"</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"pct_better"</span>),</span>
<span id="cb10-4">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">names_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"type"</span>,</span>
<span id="cb10-5">               <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">values_to =</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"val"</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">%&gt;%</span></span>
<span id="cb10-6">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">ggplot</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">aes</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">x =</span> record_num, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">y =</span> val, <span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">color =</span> type)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-7">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">geom_point</span>(<span class="at" style="color: #657422;
background-color: null;
font-style: inherit;">alpha =</span> .<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">4</span>) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-8">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">facet_wrap</span>(<span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">vars</span>(track)) <span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span></span>
<span id="cb10-9">  <span class="fu" style="color: #4758AB;
background-color: null;
font-style: inherit;">theme_minimal</span>()</span></code></pre></div>
<div class="cell-output-display">
<p><img src="https://www.ericekholm.com/posts/its-a-me-linear-regression/index_files/figure-html/unnamed-chunk-7-1.png" class="img-fluid" width="672"></p>
</div>
</div>
<p>The fitted values pretty much entirely overlap with the actual values (with the exception of Yoshi Valley and maybe sort of DK Jungle Parkway), which means we have a pretty solid model here!</p>
<p>If we wanted, it would be fairly straightforward to translate these predictions back into seconds, but I’m going to call it done right here. Hopefully this post illustrates that with a little bit of feature engineering, you can build a really good model without having to load <code>{xgboost}</code> or <code>{keras}</code>. And hopefully it encourages people to dig more into MLMs!</p>
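<p>For anyone who does want seconds, the conversion just inverts the scaling used earlier: since <code>pct_better = 1 - time/init_wr</code>, a predicted time is <code>init_wr * (1 - .fitted)</code>. Here’s a minimal sketch with made-up numbers (the track names, <code>init_wr</code> values, and <code>.fitted</code> values below are hypothetical, not model output), assuming <code>{dplyr}</code> is installed:</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(dplyr)

# Hypothetical values standing in for a few rows of model output
toy &lt;- tibble(
  track = c("Track A", "Track A", "Track B"),
  init_wr = c(40, 40, 90),        # first world record on the track, in seconds
  .fitted = c(0.00, 0.05, 0.10)   # predicted proportion better than init_wr
)

# Invert pct_better = 1 - time / init_wr to get back to seconds
toy_secs &lt;- toy %&gt;%
  mutate(pred_time = init_wr * (1 - .fitted))

toy_secs</code></pre></div>
</div>
<p>With these toy inputs, <code>pred_time</code> works out to 40, 38, and 81 seconds. To do the same with the real predictions, the <code>init_wr</code> column would need to be available alongside <code>.fitted</code>.</p>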


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2021,
  author = {Ekholm, Eric},
  title = {It’s-a {Me,} {Linear} {Regression}},
  date = {2021-05-30},
  url = {https://www.ericekholm.com/posts/its-a-me-linear-regression},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2021" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2021. <span>“It’s-a Me, Linear Regression.”</span> May 30,
2021. <a href="https://www.ericekholm.com/posts/its-a-me-linear-regression">https://www.ericekholm.com/posts/its-a-me-linear-regression</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>tutorial</category>
  <category>regression</category>
  <category>Mario Kart</category>
  <guid>https://www.ericekholm.com/posts/its-a-me-linear-regression/index.html</guid>
  <pubDate>Sun, 30 May 2021 04:00:00 GMT</pubDate>
</item>
<item>
  <title>Publishing Rmarkdown to Google Sites</title>
  <dc:creator>Eric Ekholm</dc:creator>
  <link>https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/index.html</link>
  <description><![CDATA[ 




<p>This post – and the digging behind it – was inspired by <a href="https://twitter.com/allison_horst">Allison Horst</a> and <a href="https://twitter.com/skyetetra">Jacqueline Nolis’s</a> recent-ish <a href="https://jnolis.com/blog/training_ds_for_teams/">blog post</a> about conflicts between data scientists and the teams they may be joining/supporting. It’s a great post, and I’d recommend it to anyone, whether you’re looking for a job, new to a job, or relatively senior in your organization. When I read it, the post struck a chord with me and led me to do some introspection about how much change I should be initiating/pushing for as a new employee in my organization vs how much I should be adapting to established workflows.</p>
<p>Basically, this was my response:</p>
<p><img src="https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/https:/i.kym-cdn.com/entries/icons/facebook/000/029/191/cover6.jpg" class="img-fluid"></p>
<section id="sharing-html-a-sticking-point" class="level1">
<h1>Sharing HTML: A Sticking Point</h1>
<p>One specific issue that came to mind was sharing html documents. In short, when I joined the organization I work for (Chesterfield County Public Schools; CCPS), I started producing reports/products as html documents – because I like the flexibility and richness of html – which was not common in the organization, and so we went around and around on how best to share these files. We threw around several different ideas, ranging from emailing them as attachments to saving them on the network to exploring RStudio Connect. And, for various reasons, nothing we tried or proposed felt like the right solution.</p>
</section>
<section id="google-sites-to-the-rescue-sort-of" class="level1">
<h1>Google Sites to the Rescue (Sort of)</h1>
<p>That said, CCPS widely uses Google Sites. The vast majority of the people who work in CCPS aren’t web developers or analysts; it is, after all, a school division and not a tech company. And Google Sites provides a platform for all sorts of people in the organization to create good-looking sites, so from that perspective, it makes sense that we use Google Sites widely so that teachers, principals, and other non-technical folks can create websites. Given that, I wanted to see if there was a way to take the html reports and products I was making and host them on Google Sites, which would make my work much more integrated into established practices.</p>
<p>And so after a lot of Googling and my hopes being dashed by a dramatic change sometime in the past 3-4 years from “classic Google Sites” to “new Google Sites,” I finally arrived at a solution that, although not ideal, seems to do generally what I want it to, which is what I’ll share below.</p>
</section>
<section id="rendering-rmd-content-to-google-sites" class="level1">
<h1>Rendering Rmd Content to Google Sites</h1>
<p>The rest of this post will walk through how to create an html report using Rmarkdown and then “publish” it to Google Sites. To illustrate this, I’m going to use the README for Allison Horst’s <a href="https://allisonhorst.github.io/palmerpenguins/"><code>{palmerpenguins}</code> package</a> as the example report to publish.</p>
<section id="step-1-create-your-report" class="level2">
<h2 class="anchored" data-anchor-id="step-1-create-your-report">Step 1: Create Your Report</h2>
<p>Maybe obviously, the first thing you want to do is write and knit your report as an html file. Again, this can be any content you want, but I’m using the <code>{palmerpenguins}</code> README here. Another note is that, since we’re using html, we can include features like <code>{reactable}</code> tables or <code>{leaflet}</code> maps if we want, although those aren’t included here. We could also knit this as a <code>{distill}</code> article or add whatever styling/css we want.</p>
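<p>If you’d rather knit from the console than click the Knit button, a call along these lines should work. This is a sketch: <code>penguins_report.Rmd</code> is a hypothetical file name, and the <code>if</code> guard is only there so the snippet does nothing when that file doesn’t exist. It sets <code>self_contained = TRUE</code> explicitly (already the default for <code>html_document()</code>) because the images and CSS need to be bundled into a single html file for the embedding steps that follow.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(rmarkdown)

rmd_file &lt;- "penguins_report.Rmd"  # hypothetical path to your report

# Only render if the (hypothetical) file is actually there
if (file.exists(rmd_file)) {
  render(
    input = rmd_file,
    output_format = html_document(self_contained = TRUE),
    output_file = "penguins_report.html"
  )
}</code></pre></div>
</div>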
</section>
<section id="step-2-create-a-google-site" class="level2">
<h2 class="anchored" data-anchor-id="step-2-create-a-google-site">Step 2: Create A Google Site</h2>
<p>Once you have your report created, you can create the Google Site in Google Drive, like so:</p>
<p><img src="https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/img/google_site_drop.png" class="img-fluid"></p>
<p>Or you can navigate to a site you already own/can edit.</p>
<p><em>n.b.&nbsp;that I’m not really going to get into the weeds of working in Google Sites because that’s not really the point of this post, plus I’m not an expert myself, although basic usage is fairly straightforward.</em></p>
</section>
<section id="step-3-create-a-page-in-google-sites-to-house-your-report" class="level2">
<h2 class="anchored" data-anchor-id="step-3-create-a-page-in-google-sites-to-house-your-report">Step 3: Create a Page in Google Sites to House Your Report</h2>
<p>The details of what this means will depend on the layout of your site, but essentially you want to create a page that can house the report you just knit. You might end up with something that looks like this – a basic banner and then a blank body:</p>
<p><img src="https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/img/google_page_blank.png" class="img-fluid"></p>
</section>
<section id="step-4-view-the-source-code-of-your-html-report" class="level2">
<h2 class="anchored" data-anchor-id="step-4-view-the-source-code-of-your-html-report">Step 4: View the Source Code of Your HTML Report</h2>
<p>Next, you want to get to the source code of the html file you created. You can do this either by opening the file in a browser and viewing the page source (the exact process will depend on which browser you use) or by opening the html file in RStudio (or another text editor).</p>
<p>Once you’re there, <em>select and copy all of the source code.</em></p>
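<p>If you’d rather stay in R, one possible way to grab the source is to read the file into a single string. This is only a sketch: <code>read_html_source()</code> and the file name are hypothetical, and the clipboard step assumes you have the <code>{clipr}</code> package installed.</p>
<div class="sourceCode"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical helper: return the full source of an html file as one string.
read_html_source &lt;- function(path) {
  paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
}

# Usage (assumes penguins_report.html is the knitted report):
# html_source &lt;- read_html_source("penguins_report.html")
# clipr::write_clip(html_source)  # copy to the clipboard, ready to paste</code></pre></div>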
</section>
<section id="step-5-embed-the-source-code-in-your-google-sites-page" class="level2">
<h2 class="anchored" data-anchor-id="step-5-embed-the-source-code-in-your-google-sites-page">Step 5: Embed the Source Code in Your Google Sites Page</h2>
<p>Returning to the page where we want this report to live on our Google Site, we want to select the “Embed” option from the “Insert” menu on the right-hand side. You can also double-click a blank part of the page to bring up a menu of options and select “Embed” from there.</p>
<p>Once we click the “Embed” button, a dialogue box pops up:</p>
<p><img src="https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/img/embed_box.png" class="img-fluid"></p>
<p>We want to select the “Embed code” option. Once we’re here, we want to <em>paste in all of the html source code from our report file.</em> Then click “Next” and “Insert.”</p>
<p><strong>Don’t stress if you see a box that says “trouble loading embed. Reload to try again.” (see below). The content should load once you publish your site.</strong></p>
<p><img src="https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/img/embed_trouble.jpg" class="img-fluid"></p>
<p>You also probably want to resize &amp; reposition the embedded content at this point. As far as I can tell, this is only possible via dragging, and there aren’t parameters you can set anywhere in the site to ensure consistent alignment (although there are grid lines to guide you).</p>
</section>
<section id="step-6-publish-your-site" class="level2">
<h2 class="anchored" data-anchor-id="step-6-publish-your-site">Step 6: Publish Your Site</h2>
<p>Now we’re ready to publish our site. Click the “Publish” button in the upper-right of the screen and select the options you want in the popup box. If you’re publishing from a Google account that belongs to an organization, your options may be pre-specified. For instance, when I publish from my work account, the default option is to allow only internal users to see the site (which is what I want in most cases).</p>
<p>Once you’ve published your site, you can navigate to it and see the following:</p>
<p><img src="https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/img/finished_site.png" class="img-fluid"></p>
<p>If you want to navigate to the example site I just created, you can find it <a href="https://sites.google.com/view/ekholme-test-penguins/home">here.</a></p>
</section>
</section>
<section id="closing-thoughts" class="level1">
<h1>Closing Thoughts</h1>
<p>I came to this process as a way to publish reports &amp; other data tools (e.g.&nbsp;<code>{crosstalk}</code> “dashboards”) I was creating through Google Sites, since that’s what my organization uses. The benefits of this approach are that I’m integrating my work into a tool that others in CCPS are familiar with, and I’m not being a nuisance and asking our technology department to stand up a new tool/platform that only 1 or 2 people would use to create content. And both of these feel like pretty big wins to me from an organizational perspective.</p>
<p>That said, there are obviously some drawbacks and situations when I wouldn’t do this. Publishing (and updating) anything is a fairly manual process since you’re copy/pasting html code, and as far as I can tell, there’s no way to automate this. So if you have reports/products that need to be updated daily, this might not be the best approach for you. Similarly, if you’re writing a bunch of parameterized reports and need a place to publish all of them, this process could get tedious very quickly. Overall, I think publishing to Google Sites works best for static reports that don’t need to be updated too often (e.g.&nbsp;annual/semiannual progress reports; one-off project reports/special requests).</p>
<p>Anyway, hopefully this walkthrough/discussion helps someone who’s in a similar position, and thanks again to Allison Horst &amp; Jacqueline Nolis for the blog that inspired me to delve into this more.</p>


</section>

<div id="quarto-appendix" class="default"><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Reuse</h2><div id="quarto-reuse" class="quarto-appendix-contents"><div><a rel="license" href="https://creativecommons.org/licenses/by-nc/4.0/">https://creativecommons.org/licenses/by-nc/4.0/</a></div></div></section><section class="quarto-appendix-contents"><h2 class="anchored quarto-appendix-heading">Citation</h2><div><div class="quarto-appendix-secondary-label">BibTeX citation:</div><pre class="sourceCode code-with-copy quarto-appendix-bibtex"><code class="sourceCode bibtex">@online{ekholm2021,
  author = {Ekholm, Eric},
  title = {Publishing {Rmarkdown} to {Google} {Sites}},
  date = {2021-05-03},
  url = {https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites},
  langid = {en}
}
</code></pre><div class="quarto-appendix-secondary-label">For attribution, please cite this work as:</div><div id="ref-ekholm2021" class="csl-entry quarto-appendix-citeas">
Ekholm, Eric. 2021. <span>“Publishing Rmarkdown to Google Sites.”</span>
May 3, 2021. <a href="https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites">https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites</a>.
</div></div></section></div> ]]></description>
  <category>R</category>
  <category>tutorial</category>
  <category>Rmarkdown</category>
  <category>Google Sites</category>
  <guid>https://www.ericekholm.com/posts/publishing-rmarkdown-to-google-sites/index.html</guid>
  <pubDate>Mon, 03 May 2021 04:00:00 GMT</pubDate>
</item>
</channel>
</rss>
