The irishbuoys package provides tools to download, process, and analyze data from the Irish Weather Buoy Network. It includes functions for accessing real-time and historical data via the Marine Institute's ERDDAP server, storing data in DuckDB for efficient querying, and building predictive models for wave height and weather conditions.

Installation

You can install the development version of irishbuoys from GitHub:

# install.packages("remotes")
remotes::install_github("johngavin/irishbuoys")

This project enforces Nix --pure mode to guarantee:

  • Reproducibility: Only Nix-provided tools are available
  • Security: No accidental use of system tools with different versions
  • Consistency: Same environment locally and in CI

Entering and verifying pure mode

# Recommended: Use default.sh (pure mode enforced automatically)
./default.sh
# Output shows: 🔒 SECURITY: Running in --pure mode

# Or manually with pure flag:
nix-shell --pure default.nix

# Verify pure mode
echo $IN_NIX_SHELL  # Expected: pure
which R             # Expected: /nix/store/...
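The same two checks can also be made from inside an R session. The following is a minimal sketch; `check_nix_pure()` is a hypothetical helper written for this illustration and is not part of irishbuoys. It assumes, as the shell commands above indicate, that `IN_NIX_SHELL` is set to `pure` and that R lives under `/nix/store/`.

```r
# Hypothetical helper (not part of irishbuoys): report whether the current
# R session appears to be running inside a pure Nix shell.
check_nix_pure <- function(in_nix_shell = Sys.getenv("IN_NIX_SHELL"),
                           r_home = R.home()) {
  list(
    # nix-shell --pure is expected to set IN_NIX_SHELL to "pure"
    pure_shell = identical(in_nix_shell, "pure"),
    # A Nix-provided R should be installed under /nix/store/
    r_from_nix_store = startsWith(r_home, "/nix/store/")
  )
}

str(check_nix_pure())
```

Both arguments default to the live environment, and passing them explicitly makes the helper easy to test outside Nix.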

Passing environment variables:

nix-shell --pure --keep GITHUB_TOKEN --keep MY_API_KEY default.nix

Using with rix

To integrate this package into your own Nix environment:

library(rix)

rix(
  r_ver = "4.5.2",  # Must match project's R version
  r_pkgs = c(
    # Core dependencies (from Imports)
    "arrow", "cli", "dbplyr", "DBI", "dplyr", "duckdb",
    "glue", "httr2", "jsonlite", "lubridate", "pointblank",
    "purrr", "rlang", "tibble",
    # Visualization (from Suggests)
    "plotly", "ggplot2", "dygraphs", "DT"
  ),
  git_pkgs = list(
    list(
      package_name = "irishbuoys",
      repo_url = "https://github.com/johngavin/irishbuoys",
      commit = "main"  # Use specific SHA for reproducibility
    )
  ),
  ide = "other",
  project_path = "."
)

Cachix Binary Cache (Faster Builds)

Use the pre-built R packages from rstats-on-nix Cachix cache for much faster builds:

# Enable the rstats-on-nix cache (one-time setup; runs cachix from a temporary shell)
nix-shell -p cachix --run "cachix use rstats-on-nix"

# Now nix-shell will download pre-built packages instead of compiling
cd irishbuoys
./default.sh  # Much faster with cache!

This project uses a two-tier Cachix strategy:

Priority  Cache          Contains
1st       rstats-on-nix  All standard R packages (public, pre-built)
2nd       johngavin      Project-specific custom packages only

Important: Standard R packages (dplyr, targets, etc.) are ALL available from rstats-on-nix. The johngavin cache is only for custom packages not in rstats-on-nix.

For irishbuoys:

  • All dependencies come from rstats-on-nix
  • The irishbuoys package itself is loaded via pkgload::load_all() (development mode)
  • Nothing needs to be pushed to the johngavin cache

CI workflows automatically use both caches:

# In .github/workflows/r-cmd-check.yaml
- uses: cachix/cachix-action@v15
  with:
    name: rstats-on-nix  # Public cache FIRST

- uses: cachix/cachix-action@v15
  with:
    name: johngavin      # Project cache SECOND
    authToken: '${{ secrets.CACHIX_AUTH_TOKEN }}'
    skipPush: true       # Don't push during checks

Documentation

Comprehensive vignettes on the package website:

Static Dashboard - Interactive visualization of Irish Weather Buoy Network data.

  • Real-time station data: Max Wave, Signif Wave, and Wind Speed by station
  • Time series analysis: Hourly measurements with interactive plotly charts
  • Extreme event detection: Rogue wave identification (Max Wave > 2 × Signif Wave)

Wave Analysis - Scientific analysis of wave patterns and extreme value statistics.

  • Key metrics dashboard: Station statistics, date coverage, and data quality summary
  • Extreme value modeling: GEV/GPD fits for return period estimation
  • Joint distribution analysis: Cross-correlations, copula tail dependence
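To make "return period estimation" concrete, here is a minimal base-R sketch of the standard GEV return-level formula. `gev_return_level()` is a hypothetical helper, not the package's `calculate_return_levels()`, and the parameter values are illustrative placeholders rather than fits to buoy data.

```r
# Hypothetical helper: T-year return level of a GEV(mu, sigma, xi)
# distribution fitted to annual maxima.
gev_return_level <- function(return_period, mu, sigma, xi) {
  stopifnot(all(return_period > 1), sigma > 0)
  # y = -log(1 - 1/T): the annual non-exceedance probability is 1 - 1/T
  y <- -log(1 - 1 / return_period)
  if (abs(xi) < 1e-8) {
    # Gumbel limit (shape parameter approximately zero)
    mu - sigma * log(y)
  } else {
    mu - (sigma / xi) * (1 - y^(-xi))
  }
}

# Illustrative parameters only (metres); not estimated from real data
gev_return_level(c(10, 50, 100), mu = 8, sigma = 1.5, xi = 0.1)
```

The returned value is the level exceeded on average once every T years, so it increases with the return period.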

Interactive validation reports via pointblank:

Pipeline Telemetry - Pipeline metrics and performance tracking.

  • Data coverage metrics: Completeness by station and time period
  • Pipeline performance: Target execution times and dependencies

Quick Start

Download Recent Data

library(irishbuoys)

# Download last 7 days of data
# Show structure of first 3 rows for efficiency
data <- download_buoy_data(
  start_date = Sys.Date() - 7,
  end_date = Sys.Date()
)
str(head(data, 3))

# Get data for specific station
m3_data <- download_buoy_data(
  stations = "M3",
  start_date = "2024-01-01"
)
dim(m3_data)

# Get earliest available data (buoy network started 2001-02-05)
# Show structure of first 3 rows for efficiency
waves <- download_buoy_data(
  start_date = "2001-02-05",
  end_date = "2001-02-06"
)
str(head(waves, 3))

Initialize and Query Database

# Initialize database with historical data (chunk_days=365 for faster downloads)
initialize_database(
  start_date = "2024-01-01",  # Default: recent data for quick start
  end_date = Sys.Date(),
  chunk_days = 365  # Download in 1-year chunks for efficiency
)

# Check database statistics immediately after initialization
stats <- get_database_stats()

# Connect to database
con <- connect_duckdb()

# Check QC flag distribution by station FIRST
# qc_flag: 0=unknown, 1=good, 9=missing
library(dplyr)  # provides tbl(), group_by(), summarise(), collect()
qc_tally <- tbl(con, "buoy_data") |>
  group_by(station_id, qc_flag) |>
  summarise(n = n(), .groups = "drop") |>
  collect() |>
  tidyr::pivot_wider(names_from = qc_flag, values_from = n, names_prefix = "qc_")
print(qc_tally)

# Query wave data (qc_filter=FALSE returns all data; TRUE filters for qc_flag==1)
wave_data <- query_buoy_data(
  con,
  stations = c("M3", "M4"),
  variables = c("time", "station_id", "wave_height", "wave_period"),
  start_date = "2024-01-01",
  qc_filter = FALSE  # Set TRUE to filter for qc_flag==1 only
)

# Custom SQL query: Find top 10 most extreme rogue waves
# Ordered by hmax (highest individual wave) because "extreme" = largest waves
extreme_waves <- query_buoy_data(
  con,
  sql_query = "
    SELECT station_id, time, wave_height, hmax
    FROM buoy_data
    WHERE hmax > 2 * wave_height
      AND wave_height > 0
      AND qc_flag = 1
    ORDER BY hmax DESC
    LIMIT 10
  "
)
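The chunk_days argument used in initialize_database() above suggests that a long date range is downloaded in consecutive windows. How the package splits the range internally is not shown here, so the following is only a sketch of one plausible scheme; `make_date_chunks()` is a hypothetical helper, not a package function.

```r
# Hypothetical helper: split [start_date, end_date] into consecutive,
# non-overlapping windows of at most chunk_days days each.
make_date_chunks <- function(start_date, end_date, chunk_days = 365) {
  start_date <- as.Date(start_date)
  end_date <- as.Date(end_date)
  stopifnot(start_date <= end_date, chunk_days >= 1)
  # Window start dates, chunk_days apart
  starts <- seq(start_date, end_date, by = chunk_days)
  # Each window ends the day before the next one starts; the last is clipped
  ends <- pmin(starts + chunk_days - 1, end_date)
  data.frame(chunk_start = starts, chunk_end = ends)
}

make_date_chunks("2024-01-01", "2025-06-30", chunk_days = 365)
```

Larger chunks mean fewer requests, while smaller chunks keep each response small. The right balance depends on the server and is not something this sketch decides.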

Tidyverse Alternative (duckplyr)

The same query using dplyr verbs with duckplyr backend:

# Tidyverse alternative using duckplyr
# Same query as SQL above, ordered by hmax (highest waves first)
library(dplyr)
library(duckplyr)

extreme_waves_tidy <- tbl(con, "buoy_data") |>
  filter(
    hmax > 2 * wave_height,
    wave_height > 0,
    qc_flag == 1
  ) |>
  select(station_id, time, wave_height, hmax) |>
  arrange(desc(hmax)) |>
  head(10) |>
  collect()

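Why duckplyr?

  • Familiar tidyverse syntax
  • Lazy evaluation: the query runs only on collect()
  • Automatic SQL translation for performance
  • Works with any DBI connection

Note: tbl(con, ...) on a DBI connection is translated to SQL by dbplyr, so the pipeline above runs inside DuckDB whether or not duckplyr is attached.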

# Don't forget to disconnect
DBI::dbDisconnect(con)

Incremental Updates

# Perform incremental update (for scheduled jobs)
result <- incremental_update()
print(result$summary)
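Conceptually, an incremental update only needs the time window between the newest stored observation and now. The sketch below illustrates that idea in base R. `next_update_window()` is a hypothetical helper; the real logic of incremental_update() is not shown in this README, so the one-second offset and the UTC handling are assumptions.

```r
# Hypothetical helper: given the newest timestamp already stored, return the
# window that still needs downloading, or NULL if the data are up to date.
next_update_window <- function(latest_timestamp, now = Sys.time()) {
  latest_timestamp <- as.POSIXct(latest_timestamp, tz = "UTC")
  now <- as.POSIXct(now, tz = "UTC")
  if (latest_timestamp >= now) {
    return(NULL)  # nothing new to fetch
  }
  # Start one second after the last stored record to avoid re-downloading it
  list(start = latest_timestamp + 1, end = now)
}

next_update_window("2024-06-01 12:00:00", now = "2024-06-02 12:00:00")
```

A scheduled job could call such a helper first and skip the download entirely when it returns NULL.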

Data Dictionary

# Get complete data dictionary (returns tibble)
dict <- get_data_dictionary()
print(dict)

# Get detailed documentation for specific variable
(wave_docs <- get_variable_docs("WaveHeight"))

# Merge dictionary with database column info
# Useful for creating documentation or understanding data
library(dplyr)
db_cols <- tibble(
  variable = c("wave_height", "hmax", "wind_speed", "gust"),
  db_column = c("wave_height", "hmax", "wind_speed", "gust")
)
dict |>
  filter(tolower(variable) %in% db_cols$variable |
         variable %in% c("WaveHeight", "Hmax", "WindSpeed", "Gust")) |>
  select(variable, units, description)

Data Source

Data is sourced from the Marine Institute's ERDDAP server, which provides real-time and historical measurements from the Irish Weather Buoy Network.
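ERDDAP serves tabular data through "tabledap" URLs of the form base/tabledap/datasetID.fileType?variables&constraints. The sketch below only builds such a URL and does not fetch it. The base URL, the dataset ID `IWBNetwork`, and the variable names are assumptions made for illustration; `build_erddap_url()` is a hypothetical helper and not the package's own client.

```r
# Hypothetical helper: build an ERDDAP tabledap CSV request URL.
# base_url and dataset_id are assumed values; check them against the server.
build_erddap_url <- function(variables, start_time, station,
                             base_url = "https://erddap.marine.ie/erddap",
                             dataset_id = "IWBNetwork") {
  constraints <- c(
    paste0("time>=", start_time),
    paste0('station_id="', station, '"')
  )
  # Comma-separated variable list, then constraints joined with "&"
  query <- paste(c(paste(variables, collapse = ","), constraints),
                 collapse = "&")
  # URLencode() with its defaults keeps "&", "," and "=" but percent-encodes
  # characters such as ">" and the double quote
  paste0(base_url, "/tabledap/", dataset_id, ".csv?", utils::URLencode(query))
}

build_erddap_url(
  variables = c("time", "station_id", "WaveHeight"),
  start_time = "2024-01-01T00:00:00Z",
  station = "M3"
)
```

In normal use, download_buoy_data() hides these details. The sketch is only meant to show what a request to the data source looks like.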

Available Stations

  • M2: East of Ireland (Irish Sea)
  • M3: Southwest of Ireland
  • M4: Northwest of Ireland
  • M5: Southeast of Ireland
  • M6: West of Ireland (deep Atlantic)
  • M1: Historical data (decommissioned)
  • FS1: Historical data

Measured Parameters

  • Meteorological: Air temperature, pressure, humidity, wind speed/direction
  • Oceanographic: Wave height/period/direction, sea temperature, salinity
  • Quality: QC flags for data validation

Project Structure

.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   ├── data_dictionary.R
│   ├── database.R
│   ├── database_parquet.R
│   ├── dev
│   │   ├── check_data_gaps.R
│   │   ├── generate_dashboard_data.R
│   │   └── issues
│   ├── email_summary.R
│   ├── erddap_client.R
│   ├── extreme_values.R
│   ├── irishbuoys-package.R
│   ├── joint_analysis.R
│   ├── plot_functions.R
│   ├── plotly_helpers.R
│   ├── rogue_waves.R
│   ├── tar_plans
│   │   ├── plan_dashboard.R
│   │   ├── plan_dashboard_captions.R
│   │   ├── plan_data_acquisition.R
│   │   ├── plan_doc_examples.R
│   │   ├── plan_evidence.R
│   │   ├── plan_joint_analysis.R
│   │   ├── plan_pkgctx.R
│   │   ├── plan_quality_control.R
│   │   ├── plan_telemetry.R
│   │   ├── plan_vignettes.R
│   │   └── plan_wave_analysis.R
│   ├── trend_analysis.R
│   ├── update.R
│   ├── validation.R
│   ├── wave_model.R
│   └── wave_science.R
├── README.md
├── README.qmd
├── README.rmarkdown
├── _extensions
│   └── quarto-ext
│       └── shinylive
├── _output
│   ├── shinylive-sw.js
│   └── vignettes
│       ├── dashboard_shinylive.html
│       ├── dashboard_shinylive_files
│       └── data
├── _pkgdown.yml
├── _quarto.yml
├── _targets.R
├── _targets.yaml
├── data-raw
├── default.R
├── default.nix
├── default.sh
├── inst
│   ├── docs
│   │   └── parquet_storage_guide.md
│   ├── extdata
│   │   ├── analysis_questions.md
│   │   ├── ctx
│   │   ├── dashboard_buoy_data.rds
│   │   ├── dashboard_stats.rds
│   │   ├── dashboard_timeseries.rds
│   │   ├── irish_buoys.duckdb
│   │   ├── return_levels.rds
│   │   ├── rogue_wave_events.rds
│   │   ├── seasonal_analysis.rds
│   │   └── wave_analysis_summary.rds
│   └── scripts
│       ├── example_usage.R
│       └── storage_comparison.R
├── man
│   ├── add_wave_metrics.Rd
│   ├── analyze_gust_factor.Rd
│   ├── analyze_joint_extremes.Rd
│   ├── analyze_parquet_storage.Rd
│   ├── analyze_rogue_statistics.Rd
│   ├── analyze_station_pairs.Rd
│   ├── buoy_tbl.Rd
│   ├── calculate_annual_trends.Rd
│   ├── calculate_hs_from_elevation.Rd
│   ├── calculate_return_levels.Rd
│   ├── calculate_rms_wave_height.Rd
│   ├── calculate_seasonal_means.Rd
│   ├── calculate_wave_steepness.Rd
│   ├── compare_rogue_wave_gust.Rd
│   ├── connect_duckdb.Rd
│   ├── convert_duckdb_to_parquet.Rd
│   ├── create_buoy_schema.Rd
│   ├── create_email_summary.Rd
│   ├── create_plot_annual_trends.Rd
│   ├── create_plot_gust_by_category.Rd
│   ├── create_plot_gusts_vs_waves.Rd
│   ├── create_plot_monthly_wave.Rd
│   ├── create_plot_monthly_wind.Rd
│   ├── create_plot_return_levels.Rd
│   ├── create_plot_rogue_all.Rd
│   ├── create_plot_rogue_by_station.Rd
│   ├── create_plot_rogue_gusts.Rd
│   ├── create_plot_rogue_gusts_all.Rd
│   ├── create_plot_rogue_gusts_by_station.Rd
│   ├── create_plot_stl.Rd
│   ├── create_plot_time_of_day.Rd
│   ├── create_plot_week_of_year.Rd
│   ├── create_plot_wind_beaufort.Rd
│   ├── create_return_level_plot_data.Rd
│   ├── create_validation_summary.Rd
│   ├── cross_correlation_stations.Rd
│   ├── decompose_stl.Rd
│   ├── detect_anomalies.Rd
│   ├── detect_rogue_waves.Rd
│   ├── download_buoy_data.Rd
│   ├── evaluate_wave_model.Rd
│   ├── explain_hourly_averaging.Rd
│   ├── explain_hs_formula.Rd
│   ├── explain_measurement_period.Rd
│   ├── explain_wave_height_measurement.Rd
│   ├── extreme_values.Rd
│   ├── fit_bivariate_copula.Rd
│   ├── fit_gev_annual_maxima.Rd
│   ├── fit_gpd_threshold.Rd
│   ├── generate_and_send_summary.Rd
│   ├── generate_validation_reports.Rd
│   ├── generate_weekly_summary.Rd
│   ├── get_data_dictionary.Rd
│   ├── get_database_stats.Rd
│   ├── get_latest_timestamp.Rd
│   ├── get_station_info.Rd
│   ├── get_stations.Rd
│   ├── get_variable_docs.Rd
│   ├── haversine_distance.Rd
│   ├── hs_from_rms.Rd
│   ├── incremental_update.Rd
│   ├── incremental_update_parquet.Rd
│   ├── init_parquet_storage.Rd
│   ├── initialize_database.Rd
│   ├── irishbuoys-package.Rd
│   ├── irishbuoys_ggplotly.Rd
│   ├── irishbuoys_layout.Rd
│   ├── joint_analysis.Rd
│   ├── joint_analysis_summary.Rd
│   ├── load_to_duckdb.Rd
│   ├── log_update.Rd
│   ├── plot_functions.Rd
│   ├── predict_station_lagged.Rd
│   ├── predict_wave_height.Rd
│   ├── prepare_wave_features.Rd
│   ├── query_buoy_data.Rd
│   ├── query_parquet.Rd
│   ├── rogue_wave_report.Rd
│   ├── save_to_parquet.Rd
│   ├── station_distance_matrix.Rd
│   ├── train_wave_model.Rd
│   ├── trend_analysis.Rd
│   ├── trend_summary_report.Rd
│   ├── update_station_metadata.Rd
│   ├── validate_buoy_data.Rd
│   ├── validate_rogue_events.Rd
│   ├── validation.Rd
│   ├── wave_glossary.Rd
│   ├── wave_model.Rd
│   ├── wave_model_report.Rd
│   ├── wave_science.Rd
│   └── wave_science_documentation.Rd
├── pkgdown
│   └── extra.css
├── plans
│   └── PLAN_telemetry_overhaul.md
├── push_to_cachix.sh
├── tests
│   ├── testthat
│   │   ├── _snaps
│   │   ├── test-data-consistency.R
│   │   ├── test-defensive-programming.R
│   │   └── test-doc-dependencies.R
│   └── testthat.R
└── vignettes
    ├── _targets.yaml
    ├── custom.scss
    ├── dashboard_shinylive.qmd
    ├── dashboard_shinylive_files
    │   └── mediabag
    ├── dashboard_static.qmd
    ├── data
    │   ├── buoy_data.json
    │   ├── buoy_data.parquet
    │   ├── buoy_data_raw.csv
    │   └── stations.json
    ├── debug.qmd
    ├── debug_fixed_files
    ├── shinylive-sw.js
    ├── telemetry.qmd
    └── wave_analysis.qmd

Note: _targets/ (pipeline cache) and docs/ (generated site) excluded for clarity.

Key Features

  1. Efficient Data Storage: Uses DuckDB for fast querying of large datasets
  2. Incremental Updates: Smart updating to only download new data
  3. Quality Control: Built-in filtering for data quality
  4. Rogue Wave Detection: Identify extreme wave events (Hmax > 2 × Hs)
  5. Comprehensive Documentation: Full data dictionary with scientific definitions
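
The rogue-wave criterion in item 4 can be sketched on an in-memory data frame. This is not the package's detect_rogue_waves() function, whose interface is not shown here. `flag_rogue_waves()` is a hypothetical helper, the column names follow the SQL example earlier in this README, and the observations are made up for illustration.

```r
# Hypothetical helper: flag rows where the maximum individual wave (hmax)
# exceeds `ratio` times the significant wave height (wave_height).
flag_rogue_waves <- function(df, ratio = 2) {
  # Only rows with both values present and a positive wave height can qualify
  ok <- !is.na(df$hmax) & !is.na(df$wave_height) & df$wave_height > 0
  df$rogue <- ok & df$hmax > ratio * df$wave_height
  df
}

# Made-up observations (metres), for illustration only
obs <- data.frame(
  station_id = c("M3", "M3", "M4", "M4"),
  wave_height = c(2.0, 3.0, 0.0, 4.0),
  hmax = c(4.5, 5.0, 1.0, NA)
)
flag_rogue_waves(obs)
```

Only the first row is flagged: 4.5 m exceeds 2 × 2.0 m, while the other rows either fall below the threshold, have zero wave height, or are missing hmax.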

Use Cases

  • Marine safety and operations planning
  • Climate and oceanographic research
  • Extreme event analysis
  • Wave energy resource assessment
  • Weather forecast validation

Contributing, License & Acknowledgments

Contributing: Contributions are welcome! Please feel free to submit a Pull Request.

License: MIT License. See LICENSE file for details.

Acknowledgments: Data provided by the Marine Institute Ireland in collaboration with Met Éireann and the UK Met Office.

Last updated: 2026-02-15 13:19 UTC