Beer Markets

Beverages
Author

Ryan Horn

Published

March 2, 2026

Here are some analytics on the beer markets data.

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.4     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.0
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 73115 Columns: 26
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): X_purchase_desc, brand, container, region, state, market, buyertyp...
dbl  (5): household, quantity, dollar_spent, beer_floz, price_floz
lgl  (6): promo, childrenUnder6, children6to17, microwave, dishwasher, singl...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# A tibble: 1 × 3
  avg_price avg_spent avg_volume
      <dbl>     <dbl>      <dbl>
1    0.0560      13.8       266.

These values summarize the average price per fluid ounce (about 0.056), the average amount spent (about 13.8), and the average purchase volume (about 266 fl oz) across all purchase records.
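The code behind this table is not shown, so the following is only a minimal sketch of one way such a summary could be computed. It assumes the averages are taken over purchase rows, and it uses a small made-up tibble, `toy_purchases`, in place of `beer_markets`; the column names match those in the column specification above.

```r
library(dplyr)

# Hypothetical toy data standing in for beer_markets (values are made up)
toy_purchases <- tibble(
  price_floz   = c(0.05, 0.06, 0.07),
  dollar_spent = c(10, 12, 14),
  beer_floz    = c(200, 200, 200)
)

# One row of overall averages, ignoring missing values
overall_summary <- toy_purchases %>%
  summarise(
    avg_price  = mean(price_floz, na.rm = TRUE),
    avg_spent  = mean(dollar_spent, na.rm = TRUE),
    avg_volume = mean(beer_floz, na.rm = TRUE)
  )

overall_summary
```

Swapping `toy_purchases` for `beer_markets` would give a table of the same shape as the one shown, assuming the original used plain means.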

# A tibble: 5 × 2
  brand             n
  <chr>         <int>
1 BUD_LIGHT     21592
2 MILLER_LITE   17159
3 COORS_LIGHT   13074
4 NATURAL_LIGHT 12616
5 BUSCH_LIGHT    8674

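Bud Light is the most frequently purchased brand (21,592 purchases), followed by Miller Lite (17,159). Busch Light is the least frequently purchased of the five (8,674).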
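A frequency table like this one can be produced with `dplyr::count()`. The sketch below is an assumption about the approach, not the original code, and it runs on a made-up `toy_purchases` tibble so that it is self-contained.

```r
library(dplyr)

# Hypothetical toy data: one row per purchase (values are made up)
toy_purchases <- tibble(
  brand = c("BUD_LIGHT", "MILLER_LITE", "BUD_LIGHT",
            "COORS_LIGHT", "BUD_LIGHT", "MILLER_LITE")
)

# Count rows per brand and sort from most to least frequent
brand_counts <- toy_purchases %>%
  count(brand, sort = TRUE)

brand_counts
```

Because each row is one purchase, `n` counts purchases, not households or ounces sold.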

# A tibble: 2 × 3
  promo avg_price pct_diff
  <lgl>     <dbl>    <dbl>
1 FALSE    0.0568    NA   
2 TRUE     0.0527    -7.19

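Promotional purchases average a 7.2% lower price per ounce than non-promotional purchases (0.0527 vs. 0.0568), which suggests that promotions provide measurable savings to consumers. This is a raw comparison of group means and does not control for brand or purchase size.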
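The `pct_diff` column is the percentage change in average price relative to the non-promotional group. The original code is not shown, so this is a hedged sketch of one way to compute it, using `lag()` and a made-up `toy_purchases` tibble.

```r
library(dplyr)

# Hypothetical toy data (values are made up)
toy_purchases <- tibble(
  promo      = c(FALSE, FALSE, TRUE, TRUE),
  price_floz = c(0.05, 0.07, 0.05, 0.04)
)

promo_summary <- toy_purchases %>%
  group_by(promo) %>%
  summarise(avg_price = mean(price_floz, na.rm = TRUE)) %>%
  # summarise() orders FALSE before TRUE, so lag() refers to the
  # non-promotional row; the first row has no predecessor and gets NA
  mutate(pct_diff = 100 * (avg_price - lag(avg_price)) / lag(avg_price))

promo_summary
```

Applying the same formula to the rounded averages in the table gives 100 * (0.0527 - 0.0568) / 0.0568, or about -7.2, consistent with the reported -7.19 (the small gap comes from rounding in the displayed averages).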

beer_markets %>%
  group_by(brand) %>%
  summarise(
    avg_price = mean(price_floz, na.rm = TRUE),
    avg_volume = mean(beer_floz, na.rm = TRUE)
  ) %>%
  arrange(avg_price)
# A tibble: 5 × 3
  brand         avg_price avg_volume
  <chr>             <dbl>      <dbl>
1 NATURAL_LIGHT    0.0439       277.
2 BUSCH_LIGHT      0.0449       311.
3 MILLER_LITE      0.0597       264.
4 COORS_LIGHT      0.0606       257.
5 BUD_LIGHT        0.0616       249.

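Natural Light and Busch Light have the lowest average prices per ounce (about 0.044 to 0.045) and the largest average purchase volumes, while Bud Light has the highest average price (0.0616) and the smallest average volume (249 fl oz).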

model_beer <- lm(price_floz ~ beer_floz + promo + income, 
                 data = beer_markets)

summary(model_beer)

Call:
lm(formula = price_floz ~ beer_floz + promo + income, data = beer_markets)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.055077 -0.008316  0.000170  0.007139  0.184644 

Coefficients:
                 Estimate Std. Error  t value Pr(>|t|)    
(Intercept)     6.548e-02  1.479e-04  442.685   <2e-16 ***
beer_floz      -2.697e-05  2.254e-07 -119.636   <2e-16 ***
promoTRUE      -2.974e-03  1.128e-04  -26.361   <2e-16 ***
income20-60k   -2.856e-03  1.478e-04  -19.324   <2e-16 ***
income200k+     7.556e-04  4.690e-04    1.611   0.1072    
income60-100k  -2.561e-04  1.547e-04   -1.655   0.0979 .  
incomeunder20k -4.196e-03  2.118e-04  -19.810   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.01209 on 73108 degrees of freedom
Multiple R-squared:  0.1877,    Adjusted R-squared:  0.1877 
F-statistic:  2816 on 6 and 73108 DF,  p-value: < 2.2e-16

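The regression estimates how purchase size, promotions, and income are associated with price per ounce. Holding the other variables fixed, each additional fluid ounce purchased is associated with a price about 0.000027 lower per ounce, and promotional purchases are about 0.003 lower per ounce. Households in the under20k and 20-60k income groups pay significantly less per ounce than the omitted baseline income group, while the 60-100k and 200k+ groups do not differ from it at the 5% level. The model explains about 19% of the variation in price per ounce (R-squared = 0.1877).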
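To make the coefficients concrete, the sketch below plugs them into the fitted equation by hand. The coefficient values are copied from the `summary(model_beer)` output above; the helper `predict_price()` and the 144 fl oz example (twelve 12-ounce containers) are illustrative assumptions, and the result applies to the omitted baseline income category.

```r
# Coefficients copied from the summary(model_beer) output above
coefs <- c(
  intercept = 6.548e-02,
  beer_floz = -2.697e-05,
  promoTRUE = -2.974e-03
)

# Hypothetical helper: fitted price per ounce for a household in the
# baseline (omitted) income category
predict_price <- function(floz, promo = FALSE) {
  unname(
    coefs["intercept"] +
      coefs["beer_floz"] * floz +
      coefs["promoTRUE"] * as.numeric(promo)
  )
}

price_144 <- predict_price(144)                     # no promotion
price_144_promo <- predict_price(144, promo = TRUE) # with promotion

c(no_promo = price_144, promo = price_144_promo)
```

Under these assumptions the fitted price is roughly 0.0616 per ounce without a promotion and roughly 0.0586 with one. With the fitted model object available, `predict(model_beer, newdata = ...)` would do the same calculation for any income group.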

ggplot(beer_markets, aes(x = beer_floz, y = price_floz)) +
  geom_point(alpha = 0.2) +
  geom_smooth(method = "lm") +
  scale_x_log10() +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Plotting volume on a log scale compresses the extreme bulk purchases, which makes the negative relationship between volume and price easier to see. It suggests that consumers pay less per ounce when they buy larger quantities.
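Because ggplot2 applies scale transformations before computing statistics, the line drawn by `geom_smooth(method = "lm")` together with `scale_x_log10()` is a regression of price on log10 of volume, not on raw volume as in `model_beer`. A minimal sketch of the matching model follows, fitted on a made-up `toy_purchases` data frame, not the real data.

```r
# Hypothetical toy data (values are made up), standing in for beer_markets
toy_purchases <- data.frame(
  beer_floz  = c(72, 144, 288, 576),
  price_floz = c(0.070, 0.062, 0.055, 0.047)
)

# Price per ounce regressed on log10(volume): the model behind the
# smoothed line when the x axis uses scale_x_log10()
log_fit <- lm(price_floz ~ log10(beer_floz), data = toy_purchases)

# The slope is the change in price per ounce for a tenfold increase in volume
log_slope <- unname(coef(log_fit)["log10(beer_floz)"])
log_slope
```

A negative slope here corresponds to the downward-sloping line in the plot. Fitting the same formula to `beer_markets` would give the slope for the real data.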