31  Publication tables

For a thorough treatment of effective display tables in gt (Iannone et al. 2026), see Rich Iannone’s guide: https://rich-iannone.github.io/gt-effective-display-tables/.

This chapter builds publication-ready tables directly with the gt package. The recipes here are deliberately self-contained; once the planned hvtiRtables package matures, these tables will migrate to it and each call site will collapse to a single helper call.

31.1 When to use it

Open almost any clinical manuscript and the first table you meet is a baseline-characteristics summary, the one journals number “Table 1.” It describes who was in the study before any outcome is discussed: how old the patients were, how their risk factors split, and whether the comparison groups looked alike at the start. Reach for it whenever you need to convince a reader that two arms are comparable, or simply to put a face on the cohort a figure later summarises. The job is description, not inference. A good Table 1 lets a reader decide for themselves whether a later result is believable, because they can see what the groups were made of.

The layout is always the same: rows of characteristics, columns split by the grouping variable (treatment, exposure, sex). Continuous variables get a centre and a spread (mean and standard deviation, or median and interquartile range); categorical variables get a count and a percentage. We build the table here from hvtiRutilities::generate_survival_data(), which returns a richly labelled patient-level cohort with a natural grouping variable (sex), so the recipe runs end to end before you point it at your own data.

dta <- hvtiRutilities::generate_survival_data(n = 200, seed = 42)
str(dta[, c("sex", "age", "bmi", "gfr_bs", "diabetes")])
'data.frame':   200 obs. of  5 variables:
 $ sex     : Factor w/ 2 levels "Female","Male": 2 2 1 2 1 2 1 2 2 2 ...
  ..- attr(*, "label")= chr "Sex"
 $ age     : num  65.6 36.5 50.4 54.5 51.1 43.4 67.7 43.6 75.3 44.1 ...
  ..- attr(*, "label")= chr "Age at surgery (years)"
 $ bmi     : num  27 30.8 27.2 30.7 26.3 26.7 29.4 32 20.8 26.8 ...
  ..- attr(*, "label")= chr "Body mass index (kg/m2)"
 $ gfr_bs  : num  62 54.9 64.3 72.8 87 83.3 72.9 57.9 97.5 93.3 ...
  ..- attr(*, "label")= chr "Baseline eGFR (mL/min/1.73m2)"
 $ diabetes: Factor w/ 2 levels "No","Yes": 1 1 1 2 1 1 1 2 1 1 ...
  ..- attr(*, "label")= chr "Diabetes mellitus"
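The str() output shows that every column carries a "label" attribute. A minimal sketch, in base R, of collecting those attributes into a named character vector so that table labels can come from the data rather than being retyped. The data frame toy and the helper get_labels() are hypothetical stand-ins; with the real cohort you would call get_labels(dta).

```r
# Hypothetical stand-in for a labelled cohort: two labelled columns, one unlabelled
toy <- data.frame(
  age = c(65.6, 36.5, 50.4),
  bmi = c(27.0, 30.8, 27.2),
  id  = 1:3
)
attr(toy$age, "label") <- "Age at surgery (years)"
attr(toy$bmi, "label") <- "Body mass index (kg/m2)"

# get_labels() (hypothetical helper): each column's "label" attribute,
# falling back to the column name when no label is set
get_labels <- function(df) {
  vapply(
    names(df),
    function(nm) {
      lab <- attr(df[[nm]], "label", exact = TRUE)
      if (is.null(lab)) nm else as.character(lab)
    },
    character(1)
  )
}

get_labels(toy)
```

The result is named by variable, so it can be indexed by name when writing cols_label() or tab_spanner() text, keeping the printed labels in step with the data dictionary.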

31.2 Build it

A Table 1 is assembled one variable type at a time, then joined and rendered. We summarise the continuous variables first, the categorical variable second, and hand the combined frame to gt() for formatting. Doing it in pieces keeps each summary readable and lets you check the numbers before they are dressed up.

31.2.1 Summarise numeric variables as mean ± SD

We summarise three continuous baseline variables (age, BMI, baseline eGFR) as mean and standard deviation within each sex group. Mean and SD are the conventional pair when a variable is roughly symmetric; for a skewed variable (cost, time on bypass) you would report median and interquartile range instead, since a mean dragged by a long tail misrepresents the typical patient.

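library(dplyr)

# One row per sex group: group size plus mean and SD of each continuous variable
num_summary <- dta |>
  group_by(sex) |>
  summarise(
    n        = dplyr::n(),
    age_mean = mean(age),     age_sd = sd(age),
    bmi_mean = mean(bmi),     bmi_sd = sd(bmi),
    gfr_mean = mean(gfr_bs),  gfr_sd = sd(gfr_bs),
    .groups  = "drop"   # return an ungrouped tibble
  )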
num_summary
# A tibble: 2 × 8
  sex        n age_mean age_sd bmi_mean bmi_sd gfr_mean gfr_sd
  <fct>  <int>    <dbl>  <dbl>    <dbl>  <dbl>    <dbl>  <dbl>
1 Female    77     45.1   14.8     27.1   4.89     76.6   20.5
2 Male     123     44.3   14.5     26.6   4.67     75.9   18.7
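For a skewed variable, the paragraph above recommends median and interquartile range instead of mean and SD. A minimal base-R sketch of that alternative; the bypass times are simulated from a log-normal distribution purely for illustration and are not part of the cohort, and med_iqr() is a hypothetical helper.

```r
# Hypothetical skewed variable: simulated bypass times (minutes) for two groups
set.seed(1)
grp    <- factor(rep(c("Female", "Male"), times = c(40, 60)))
bypass <- rlnorm(100, meanlog = log(90), sdlog = 0.5)

# Median and quartiles of one numeric vector. quantile() uses R's default
# type 7 rule; other software may compute quartiles slightly differently.
med_iqr <- function(x) {
  q <- quantile(x, probs = c(0.25, 0.50, 0.75), na.rm = TRUE, names = FALSE)
  c(median = q[2], q1 = q[1], q3 = q[3])
}

# One row per group, columns median, q1, q3
res <- t(sapply(split(bypass, grp), med_iqr))

# Table-ready text, "median [Q1, Q3]" to one decimal place
bypass_txt <- sprintf("%.1f [%.1f, %.1f]", res[, "median"], res[, "q1"], res[, "q3"])
names(bypass_txt) <- rownames(res)
bypass_txt

# Compare with the group means: a long right tail tends to pull the mean above the median
tapply(bypass, grp, mean)
```

The bracketed "median [Q1, Q3]" string is one common convention; whichever you choose, state it in a table footnote so readers know which spread they are looking at.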

31.2.2 Summarise a categorical variable as n (%)

Diabetes status is summarised as a count and within-group percentage. The percentage here is within each sex group, the column denominator, which is almost always what a reader expects: “what fraction of the women had diabetes,” not “what fraction of all diabetics were women.” Make the denominator explicit in your own head before you compute it, because the same count divided by a different base tells a different story.

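# Count and within-sex percentage of patients with diabetes.
# diabetes has no missing values in this cohort; with NAs present, sum() and
# mean() would return NA unless na.rm = TRUE were added.
cat_summary <- dta |>
  group_by(sex) |>
  summarise(
    diabetes_n   = sum(diabetes == "Yes"),
    diabetes_pct = 100 * mean(diabetes == "Yes"),   # column (within-group) denominator
    .groups      = "drop"
  )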
cat_summary
# A tibble: 2 × 3
  sex    diabetes_n diabetes_pct
  <fct>       <int>        <dbl>
1 Female         17         22.1
2 Male           38         30.9
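The same count can sit over several denominators, and missing values add one more choice. A minimal base-R sketch on hypothetical toy data (ten patients, two with diabetes unrecorded) that computes the candidates side by side. Note that the recipe's sum(diabetes == "Yes") and mean(diabetes == "Yes") would return NA for any group containing a missing value; adding na.rm = TRUE to mean() fixes that but silently switches to the non-missing denominator, as (b) below shows.

```r
# Hypothetical toy data: 4 women and 6 men, two diabetes values missing
sex      <- factor(c("F", "F", "F", "F", "M", "M", "M", "M", "M", "M"))
diabetes <- c("Yes", "No", NA, "No", "Yes", "Yes", "No", NA, "No", "No")
is_yes   <- diabetes == "Yes"   # NA wherever diabetes is missing

# (a) Column denominator, every patient in the group (missing counted as "not Yes")
pct_col_all <- tapply(is_yes, sex, function(x) 100 * sum(x, na.rm = TRUE) / length(x))

# (b) Column denominator, only patients with a recorded value
pct_col_obs <- tapply(is_yes, sex, function(x) 100 * mean(x, na.rm = TRUE))

# (c) Row denominator: share of all diabetics who fall in each group
pct_row <- 100 * tapply(is_yes, sex, sum, na.rm = TRUE) / sum(is_yes, na.rm = TRUE)

# The n that the (b) percentages rest on, worth reporting alongside them
n_obs <- tapply(!is.na(is_yes), sex, sum)

rbind(pct_col_all, pct_col_obs, pct_row, n_obs)
```

For the women in this toy example the three answers are 25%, 33.3%, and 33.3% of all diabetics: the same single diabetic patient, three different stories.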

31.2.3 Render with gt()

We join the two summaries and present the result with a header, relabelled columns, spanners that group each mean with its SD, and consistent numeric formatting: every mean, SD, and percentage is shown to one decimal place.

library(gt)

table1 <- num_summary |>
  left_join(cat_summary, by = "sex") |>   # one row per sex, all summaries side by side
  gt(rowname_col = "sex") |>              # sex becomes the row stub
  tab_header(
    title    = "Table 1. Baseline Characteristics by Sex",
    subtitle = "Simulated cohort, n = 200"
  ) |>
  cols_label(
    n            = "N",
    age_mean     = "Mean",
    age_sd       = "SD",
    bmi_mean     = "Mean",
    bmi_sd       = "SD",
    gfr_mean     = "Mean",
    gfr_sd       = "SD",
    diabetes_n   = "Diabetes (n)",
    diabetes_pct = "Diabetes (%)"
  ) |>
  fmt_number(                             # one decimal place throughout
    columns  = c(age_mean, age_sd, bmi_mean, bmi_sd,
                 gfr_mean, gfr_sd, diabetes_pct),
    decimals = 1
  ) |>
  # Each spanner names the variable and its unit once, above its Mean/SD pair
  tab_spanner(label = "Age (years)",           columns = c(age_mean, age_sd)) |>
  tab_spanner(label = "BMI (kg/m²)",           columns = c(bmi_mean, bmi_sd)) |>
  tab_spanner(label = "eGFR (mL/min/1.73 m²)", columns = c(gfr_mean, gfr_sd))

table1

Table 1. Baseline Characteristics by Sex
Simulated cohort, n = 200

               Age (years)   BMI (kg/m²)  eGFR (mL/min/1.73 m²)
            N   Mean     SD   Mean     SD       Mean         SD  Diabetes (n)  Diabetes (%)
Female     77   45.1   14.8   27.1    4.9       76.6       20.5            17          22.1
Male      123   44.3   14.5   26.6    4.7       75.9       18.7            38          30.9
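The table above keeps each mean and its SD in separate columns. Many journals prefer a single "mean ± SD" cell, and gt's cols_merge() can build one from two columns. A minimal sketch, assuming that layout is wanted; the values are copied from the num_summary printout so the example runs without the cohort, and the combined column label is illustrative.

```r
library(gt)

# Values copied from the num_summary printout, so the sketch runs on its own
summ <- data.frame(
  sex      = c("Female", "Male"),
  n        = c(77L, 123L),
  age_mean = c(45.1, 44.3),
  age_sd   = c(14.8, 14.5)
)

tbl <- gt(summ, rowname_col = "sex") |>
  fmt_number(columns = c(age_mean, age_sd), decimals = 1) |>
  # Put the formatted SD after the mean in one cell; age_sd is hidden by default
  cols_merge(columns = c(age_mean, age_sd), pattern = "{1} ± {2}") |>
  cols_label(n = "N", age_mean = "Age (years), mean ± SD")

tbl
```

gt applies the number formatting before it merges, so the one-decimal rule carries into the combined cell; the merged column keeps the name of the first column (age_mean), which is why the label targets it.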

The table renders as static HTML in the book and as a publication-quality PDF or Word object when the document is rendered to those formats.
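When a table has to leave the book, for a submission system or a co-author's draft, gt's gtsave() writes it to a file and chooses the format from the file extension. A minimal sketch writing HTML and RTF to a temporary directory; the tiny table and the file names are hypothetical. Recent gt versions also accept other extensions (for example .tex and .docx), and image or PDF output relies on the webshot2 package, so check the gtsave() documentation for your installed version.

```r
library(gt)

# Hypothetical minimal table standing in for table1
tbl <- gt(data.frame(sex = c("Female", "Male"), n = c(77L, 123L)),
          rowname_col = "sex")

out_dir   <- tempdir()
html_file <- file.path(out_dir, "table1.html")
rtf_file  <- file.path(out_dir, "table1.rtf")

# The extension selects the writer: HTML for .html, Rich Text for .rtf
gtsave(tbl, html_file)
gtsave(tbl, rtf_file)
```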

31.3 Pitfalls

  • Significance tests in Table 1. It is tempting to add a p-value column comparing the groups, and many journals still ask for one. In a randomised trial that column is close to meaningless: any baseline difference is by construction due to chance, so testing it asks whether randomisation worked, not whether the groups differ in a way that matters. In an observational cohort the test at least answers a real question, but with enough patients a tiny, clinically irrelevant difference (a one-year age gap among 10,000 patients) earns a small p-value that misleads more than it informs. Describe the groups; let the reader judge balance from the numbers and from the standardised differences (the group difference divided by a pooled standard deviation), which, unlike p-values, are not inflated by sample size.
  • Inconsistent rounding. Decide on the number of decimal places for each variable and hold to it down the whole column. Age to one decimal in one row and whole numbers in the next reads as carelessness and makes the eye work harder. fmt_number() with a fixed decimals argument keeps a column honest; the recipe above pins the continuous summaries and the percentage to one decimal each.
  • Ambiguous denominators for percentages. A percentage means nothing without its base. State whether it is computed within the column group, across the whole cohort, or among only the patients with a non-missing value, and be consistent. When missingness is common, report the n the percentage rests on so a reader does not assume a denominator you did not use.
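The first pitfall recommends standardised differences in place of p-values. A minimal sketch of one common definition, which divides the group difference by the square root of the average of the two group variances (for a proportion p, the variance is p(1 − p)). The helpers smd_cont() and smd_bin() are hypothetical, and the inputs are copied from the num_summary and cat_summary printouts (Female versus Male).

```r
# Standardised mean difference for a continuous variable:
# (mean1 - mean2) / sqrt((sd1^2 + sd2^2) / 2)
smd_cont <- function(m1, s1, m2, s2) (m1 - m2) / sqrt((s1^2 + s2^2) / 2)

# Standardised difference for a binary variable, from the two proportions:
# (p1 - p2) / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
smd_bin <- function(p1, p2) (p1 - p2) / sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)

# Inputs copied from the printouts: mean/SD of age, diabetes counts over group n
age_smd      <- smd_cont(45.1, 14.8, 44.3, 14.5)
diabetes_smd <- smd_bin(17 / 77, 38 / 123)

round(c(age = age_smd, diabetes = diabetes_smd), 3)
```

An absolute value below about 0.1 is a widely quoted rule of thumb for negligible imbalance, a convention rather than a test. By that yardstick the two sex groups are close on age (about 0.05) but differ on diabetes (about 0.20 in absolute value), and neither number changes simply because the cohort is larger.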