The irishbuoys package provides tools to download, process, and analyze data from the Irish Weather Buoy Network. It includes functions for accessing real-time and historical data via the Marine Institute's ERDDAP server, storing data in DuckDB for efficient querying, and building predictive models for wave height and weather conditions.
Installation
You can install the development version of irishbuoys from GitHub:
# install.packages("remotes")
remotes::install_github("johngavin/irishbuoys")
For a reproducible development environment using Nix:
# Clone the repository
git clone https://github.com/johngavin/irishbuoys.git
cd irishbuoys
# Enter Nix shell (first time may take a while, subsequent entries are fast)
./default.sh
The default.sh script automatically:
- Generates default.nix from DESCRIPTION if needed (via default.R)
- Creates a GC root to prevent garbage collection
- Enters pure mode for reproducibility
This project enforces Nix --pure mode to guarantee:
- Reproducibility: Only Nix-provided tools are available
- Security: No accidental use of system tools with different versions
- Consistency: Same environment locally and in CI
Entering and verifying pure mode
# Recommended: Use default.sh (pure mode enforced automatically)
./default.sh
# Output shows: SECURITY: Running in --pure mode
# Or manually with pure flag:
nix-shell --pure default.nix
# Verify pure mode
echo $IN_NIX_SHELL # Expected: pure
which R # Expected: /nix/store/...
Passing environment variables:
Using with rix
To integrate this package into your own Nix environment:
library(rix)
rix(
r_ver = "4.5.2", # Must match project's R version
r_pkgs = c(
# Core dependencies (from Imports)
"arrow", "cli", "dbplyr", "DBI", "dplyr", "duckdb",
"glue", "httr2", "jsonlite", "lubridate", "pointblank",
"purrr", "rlang", "tibble",
# Visualization (from Suggests)
"plotly", "ggplot2", "dygraphs", "DT"
),
git_pkgs = list(
list(
package_name = "irishbuoys",
repo_url = "https://github.com/johngavin/irishbuoys",
commit = "main" # Use specific SHA for reproducibility
)
),
ide = "other",
project_path = "."
)
Cachix Binary Cache (Faster Builds)
Use the pre-built R packages from the rstats-on-nix Cachix cache for much faster builds:
This project uses a two-tier Cachix strategy:
| Priority | Cache | Contains |
|---|---|---|
| 1st | rstats-on-nix | All standard R packages (public, pre-built) |
| 2nd | johngavin | Project-specific custom packages only |
Important: Standard R packages (dplyr, targets, etc.) are ALL available from rstats-on-nix. The johngavin cache is only for custom packages not in rstats-on-nix.
For irishbuoys:
- All dependencies come from rstats-on-nix
- The irishbuoys package itself is loaded via pkgload::load_all() (development mode)
- Nothing needs to be pushed to johngavin cache
CI workflows automatically use both caches:
Documentation
Comprehensive vignettes on the package website:
Static Dashboard - Interactive visualization of Irish Weather Buoy Network data.
- Real-time station data: Max Wave, Signif Wave, and Wind Speed by station
- Time series analysis: Hourly measurements with interactive plotly charts
- Extreme event detection: Rogue wave identification (Max Wave > 2× Signif Wave)
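The rogue-wave rule compares the largest individual wave (Hmax) with the significant wave height (Hs). A minimal base-R sketch of that rule on made-up observations, assuming columns named hmax and wave_height as in the SQL query under Quick Start; flag_rogue_waves() is a hypothetical helper, not the package's detect_rogue_waves().

```r
# Hypothetical helper illustrating the rogue-wave criterion Hmax > 2 * Hs.
# Not the package's detect_rogue_waves(); column names are assumptions.
flag_rogue_waves <- function(df, threshold = 2) {
  # Rows with missing values or non-positive Hs have no meaningful ratio
  valid <- !is.na(df$wave_height) & !is.na(df$hmax) & df$wave_height > 0
  df$ratio <- ifelse(valid, df$hmax / df$wave_height, NA_real_)
  df$is_rogue <- valid & df$ratio > threshold
  df
}

# Made-up example observations (metres)
obs <- data.frame(
  station_id  = c("M3", "M3", "M4", "M4"),
  wave_height = c(3.0, 2.5, 0.0, 4.0),
  hmax        = c(6.5, 4.9, 1.2, NA)
)
flagged <- flag_rogue_waves(obs)
flagged[flagged$is_rogue, c("station_id", "wave_height", "hmax", "ratio")]
```

Excluding rows with Hs of zero avoids dividing by zero, which the Quick Start SQL handles the same way with `wave_height > 0`.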
Wave Analysis - Scientific analysis of wave patterns and extreme value statistics.
- Key metrics dashboard: Station statistics, date coverage, and data quality summary
- Extreme value modeling: GEV/GPD fits for return period estimation
- Joint distribution analysis: Cross-correlations, copula tail dependence
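For a GEV fit to annual maxima with location μ, scale σ and shape ξ, the T-year return level solves G(z) = 1 − 1/T, giving z_T = μ − (σ/ξ)[1 − (−log(1 − 1/T))^(−ξ)], with the Gumbel limit z_T = μ − σ log(−log(1 − 1/T)) as ξ approaches 0. A minimal sketch of that inversion, assuming the parameters have already been estimated; the values below are made up and gev_return_level() is hypothetical, not the package's calculate_return_levels().

```r
# Hypothetical helper: T-year return level from GEV parameters fitted to
# annual maxima (made-up parameters below, for illustration only).
gev_return_level <- function(period, mu, sigma, xi) {
  stopifnot(all(period > 1), sigma > 0)
  y <- -log(1 - 1 / period)          # -log of the annual non-exceedance probability
  if (abs(xi) < 1e-8) {
    mu - sigma * log(y)              # Gumbel limit (xi close to 0)
  } else {
    mu - (sigma / xi) * (1 - y^(-xi))
  }
}

# Illustrative (not fitted) parameters, in metres
periods <- c(10, 50, 100)
rl <- gev_return_level(periods, mu = 8, sigma = 1.5, xi = 0.1)
round(rl, 2)
```

Plugging each return level back into the GEV distribution function should recover 1 − 1/T, which is a quick sanity check on any fitted model.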
Interactive validation reports via pointblank:
- Analysis Data Validation: Column checks, physical bounds (wave 0-30m, wind 0-100 m/s)
- Rogue Wave Validation: Ensures Hmax > 2 × Hs criterion
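The physical-bounds checks listed above amount to a range test per column. A minimal base-R sketch using the quoted bounds (wave 0-30 m, wind 0-100 m/s), assuming columns named wave_height and wind_speed as in the Quick Start examples; check_bounds() is hypothetical and does not reproduce the pointblank reports.

```r
# Documented physical bounds (metres for waves, m/s for wind)
bounds <- list(
  wave_height = c(0, 30),
  wind_speed  = c(0, 100)
)

# Hypothetical helper: count non-missing values outside each column's range
check_bounds <- function(df, bounds) {
  rows <- lapply(names(bounds), function(col) {
    x   <- df[[col]]
    rng <- bounds[[col]]
    out <- !is.na(x) & (x < rng[1] | x > rng[2])
    data.frame(column = col,
               n_checked = sum(!is.na(x)),
               n_out_of_bounds = sum(out))
  })
  do.call(rbind, rows)
}

# Made-up observations with a few deliberately bad values
obs <- data.frame(
  wave_height = c(2.1, 35.0, NA),
  wind_speed  = c(12, -1, 140)
)
res <- check_bounds(obs, bounds)
res
```

Missing values are counted separately from out-of-range values, since a gap in the record is a different problem from a physically impossible reading.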
Pipeline Telemetry - Pipeline metrics and performance tracking.
- Data coverage metrics: Completeness by station and time period
- Pipeline performance: Target execution times and dependencies
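Completeness by station and period is commonly measured as the share of expected hourly records that are actually present. A minimal sketch, assuming hourly reporting and a POSIXct time column in UTC; completeness() is hypothetical and is not the telemetry vignette's code.

```r
# Hypothetical completeness metric: distinct hourly records per station
# divided by the number of hours in the window [start, end).
completeness <- function(df, start, end) {
  start <- as.POSIXct(start, tz = "UTC")
  end   <- as.POSIXct(end, tz = "UTC")
  expected <- as.numeric(difftime(end, start, units = "hours"))
  in_window <- df[df$time >= start & df$time < end, ]
  # Truncate to the hour and count distinct hours so duplicates are not double-counted
  hours <- format(in_window$time, "%Y-%m-%d %H", tz = "UTC")
  observed <- tapply(hours, in_window$station_id, function(h) length(unique(h)))
  data.frame(
    station_id   = names(observed),
    observed     = as.integer(observed),
    expected     = expected,
    completeness = as.integer(observed) / expected
  )
}

t0 <- as.POSIXct("2024-01-01 00:00", tz = "UTC")
obs <- data.frame(
  station_id = c(rep("M3", 24), rep("M4", 18)),       # M4 misses 6 hours
  time       = c(t0 + 3600 * (0:23), t0 + 3600 * (0:17))
)
res <- completeness(obs, "2024-01-01", "2024-01-02")
res
```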
Quick Start
Download Recent Data
library(irishbuoys)
# Download last 7 days of data
# Show structure of first 3 rows for efficiency
data <- download_buoy_data(
start_date = Sys.Date() - 7,
end_date = Sys.Date()
)
str(head(data, 3))
# Get data for specific station
m3_data <- download_buoy_data(
stations = "M3",
start_date = "2024-01-01"
)
dim(m3_data)
# Get earliest available data (buoy network started 2001-02-05)
# Show structure of first 3 rows for efficiency
waves <- download_buoy_data(
start_date = "2001-02-05",
end_date = "2001-02-06"
)
str(head(waves, 3))
Initialize and Query Database
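The initialize_database() call below downloads its date range in pieces controlled by chunk_days. The package's own chunking code is not shown here; a minimal sketch of one way to split a range into consecutive, non-overlapping windows, using a hypothetical helper date_chunks():

```r
# Hypothetical helper: split [start, end] into consecutive windows of at most
# chunk_days days each (both ends inclusive), with no gaps or overlaps.
date_chunks <- function(start, end, chunk_days = 365) {
  start <- as.Date(start)
  end   <- as.Date(end)
  stopifnot(start <= end, chunk_days >= 1)
  chunk_starts <- seq(start, end, by = chunk_days)          # step in days
  chunk_ends   <- pmin(chunk_starts + (chunk_days - 1), end)
  data.frame(start = chunk_starts, end = chunk_ends)
}

chunks <- date_chunks("2024-01-01", "2025-06-30", chunk_days = 365)
chunks
# Each row could then become one request, e.g. start/end passed to download_buoy_data()
```

Larger chunks mean fewer round trips to the server; smaller chunks keep each response small and make a failed request cheaper to retry.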
# Initialize database with historical data (chunk_days=365 for faster downloads)
initialize_database(
start_date = "2024-01-01", # Default: recent data for quick start
end_date = Sys.Date(),
chunk_days = 365 # Download in 1-year chunks for efficiency
)
# Check database statistics immediately after initialization
stats <- get_database_stats()
# Connect to database (dplyr provides tbl(), group_by() and summarise() used below)
library(dplyr)
con <- connect_duckdb()
# Check QC flag distribution by station FIRST
# qc_flag: 0=unknown, 1=good, 9=missing
qc_tally <- tbl(con, "buoy_data") |>
group_by(station_id, qc_flag) |>
summarise(n = n(), .groups = "drop") |>
collect() |>
tidyr::pivot_wider(names_from = qc_flag, values_from = n, names_prefix = "qc_")
print(qc_tally)
# Query wave data (qc_filter=FALSE returns all data; TRUE filters for qc_flag==1)
wave_data <- query_buoy_data(
con,
stations = c("M3", "M4"),
variables = c("time", "station_id", "wave_height", "wave_period"),
start_date = "2024-01-01",
qc_filter = FALSE # Set TRUE to filter for qc_flag==1 only
)
# Custom SQL query: Find top 10 most extreme rogue waves
# Ordered by hmax (highest individual wave) because "extreme" = largest waves
extreme_waves <- query_buoy_data(
con,
sql_query = "
SELECT station_id, time, wave_height, hmax
FROM buoy_data
WHERE hmax > 2 * wave_height
AND wave_height > 0
AND qc_flag = 1
ORDER BY hmax DESC
LIMIT 10
"
)
Tidyverse Alternative (dbplyr)
The same query using dplyr verbs, which dbplyr translates to SQL and runs inside DuckDB:
# Tidyverse alternative using dplyr + dbplyr
# Same query as SQL above, ordered by hmax (highest waves first)
library(dplyr)
library(dbplyr)
extreme_waves_tidy <- tbl(con, "buoy_data") |>
  filter(
    hmax > 2 * wave_height,
    wave_height > 0,
    qc_flag == 1
  ) |>
  select(station_id, time, wave_height, hmax) |>
  arrange(desc(hmax)) |>
  head(10) |>
  collect()
Why dbplyr?
- Familiar tidyverse syntax
- Lazy evaluation: the query runs only on collect()
- Automatic SQL translation, so filtering and sorting happen in the database
- Works with any DBI connection
# Don't forget to disconnect
DBI::dbDisconnect(con)
Incremental Updates
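An incremental update only needs observations newer than what is already stored. A minimal sketch of that bookkeeping, assuming a POSIXct time column and that fetching resumes one second after the newest stored timestamp; next_fetch_window() is hypothetical and is not the internal logic of incremental_update(). The fallback start is the network start date (2001-02-05) quoted in the download examples above.

```r
# Hypothetical helper: compute the time window still missing from local storage
next_fetch_window <- function(stored_times, now = Sys.time(),
                              fallback_start = as.POSIXct("2001-02-05", tz = "UTC")) {
  if (length(stored_times) == 0 || all(is.na(stored_times))) {
    start <- fallback_start                        # empty store: start from network start
  } else {
    start <- max(stored_times, na.rm = TRUE) + 1   # one second after the newest record
  }
  list(start = start, end = now, needed = start <= now)
}

stored <- as.POSIXct(c("2024-06-01 10:00", "2024-06-01 11:00"), tz = "UTC")
w <- next_fetch_window(stored, now = as.POSIXct("2024-06-01 15:00", tz = "UTC"))
format(w$start, tz = "UTC")
```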
# Perform incremental update (for scheduled jobs)
result <- incremental_update()
print(result$summary)
Data Dictionary
# Get complete data dictionary (returns tibble)
dict <- get_data_dictionary()
print(dict)
# Get detailed documentation for specific variable
(wave_docs <- get_variable_docs("WaveHeight"))
# Merge dictionary with database column names
# Useful for creating documentation or understanding data
library(dplyr)
db_cols <- tibble(
  variable  = c("WaveHeight", "Hmax", "WindSpeed", "Gust"),
  db_column = c("wave_height", "hmax", "wind_speed", "gust")
)
dict |>
  inner_join(db_cols, by = "variable") |>
  select(variable, db_column, units, description)
Data Source
Data is sourced from the Marine Institute's ERDDAP server, which provides real-time and historical measurements from the Irish Weather Buoy Network.
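ERDDAP serves tabular datasets through tabledap URLs of the form base/tabledap/datasetID.fileType?variables&constraints. A minimal sketch of building such a URL in base R; the server address https://erddap.marine.ie/erddap and the dataset ID IWBNetwork are assumptions (check R/erddap_client.R for the values the package actually uses), and build_erddap_url() is hypothetical.

```r
# Hypothetical helper: build an ERDDAP tabledap request URL.
# The base URL and dataset ID are assumptions; check R/erddap_client.R.
build_erddap_url <- function(variables, constraints = character(0),
                             base = "https://erddap.marine.ie/erddap",
                             dataset_id = "IWBNetwork",
                             file_type = "csv") {
  # Percent-encode each constraint so operators such as >= and quotes survive
  encoded <- vapply(constraints, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  query <- paste(c(paste(variables, collapse = ","), encoded), collapse = "&")
  paste0(base, "/tabledap/", dataset_id, ".", file_type, "?", query)
}

url <- build_erddap_url(
  variables   = c("station_id", "time", "WaveHeight"),
  constraints = c("station_id=\"M3\"", "time>=2024-01-01T00:00:00Z")
)
url
```

String constraints are quoted (station_id="M3") following ERDDAP's query syntax; the variable names mirror those used by get_variable_docs() above.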
Project Structure
.
├── DESCRIPTION
├── LICENSE
├── LICENSE.md
├── NAMESPACE
├── R
│   ├── data_dictionary.R
│   ├── database.R
│   ├── database_parquet.R
│   ├── dev
│   │   ├── check_data_gaps.R
│   │   ├── generate_dashboard_data.R
│   │   └── issues
│   ├── email_summary.R
│   ├── erddap_client.R
│   ├── extreme_values.R
│   ├── irishbuoys-package.R
│   ├── joint_analysis.R
│   ├── plot_functions.R
│   ├── plotly_helpers.R
│   ├── rogue_waves.R
│   ├── tar_plans
│   │   ├── plan_dashboard.R
│   │   ├── plan_dashboard_captions.R
│   │   ├── plan_data_acquisition.R
│   │   ├── plan_doc_examples.R
│   │   ├── plan_evidence.R
│   │   ├── plan_joint_analysis.R
│   │   ├── plan_pkgctx.R
│   │   ├── plan_quality_control.R
│   │   ├── plan_telemetry.R
│   │   ├── plan_vignettes.R
│   │   └── plan_wave_analysis.R
│   ├── trend_analysis.R
│   ├── update.R
│   ├── validation.R
│   ├── wave_model.R
│   └── wave_science.R
├── README.md
├── README.qmd
├── README.rmarkdown
├── _extensions
│   └── quarto-ext
│       └── shinylive
├── _output
│   ├── shinylive-sw.js
│   └── vignettes
│       ├── dashboard_shinylive.html
│       ├── dashboard_shinylive_files
│       └── data
├── _pkgdown.yml
├── _quarto.yml
├── _targets.R
├── _targets.yaml
├── data-raw
├── default.R
├── default.nix
├── default.sh
├── inst
│   ├── docs
│   │   └── parquet_storage_guide.md
│   ├── extdata
│   │   ├── analysis_questions.md
│   │   ├── ctx
│   │   ├── dashboard_buoy_data.rds
│   │   ├── dashboard_stats.rds
│   │   ├── dashboard_timeseries.rds
│   │   ├── irish_buoys.duckdb
│   │   ├── return_levels.rds
│   │   ├── rogue_wave_events.rds
│   │   ├── seasonal_analysis.rds
│   │   └── wave_analysis_summary.rds
│   └── scripts
│       ├── example_usage.R
│       └── storage_comparison.R
├── man
│   ├── add_wave_metrics.Rd
│   ├── analyze_gust_factor.Rd
│   ├── analyze_joint_extremes.Rd
│   ├── analyze_parquet_storage.Rd
│   ├── analyze_rogue_statistics.Rd
│   ├── analyze_station_pairs.Rd
│   ├── buoy_tbl.Rd
│   ├── calculate_annual_trends.Rd
│   ├── calculate_hs_from_elevation.Rd
│   ├── calculate_return_levels.Rd
│   ├── calculate_rms_wave_height.Rd
│   ├── calculate_seasonal_means.Rd
│   ├── calculate_wave_steepness.Rd
│   ├── compare_rogue_wave_gust.Rd
│   ├── connect_duckdb.Rd
│   ├── convert_duckdb_to_parquet.Rd
│   ├── create_buoy_schema.Rd
│   ├── create_email_summary.Rd
│   ├── create_plot_annual_trends.Rd
│   ├── create_plot_gust_by_category.Rd
│   ├── create_plot_gusts_vs_waves.Rd
│   ├── create_plot_monthly_wave.Rd
│   ├── create_plot_monthly_wind.Rd
│   ├── create_plot_return_levels.Rd
│   ├── create_plot_rogue_all.Rd
│   ├── create_plot_rogue_by_station.Rd
│   ├── create_plot_rogue_gusts.Rd
│   ├── create_plot_rogue_gusts_all.Rd
│   ├── create_plot_rogue_gusts_by_station.Rd
│   ├── create_plot_stl.Rd
│   ├── create_plot_time_of_day.Rd
│   ├── create_plot_week_of_year.Rd
│   ├── create_plot_wind_beaufort.Rd
│   ├── create_return_level_plot_data.Rd
│   ├── create_validation_summary.Rd
│   ├── cross_correlation_stations.Rd
│   ├── decompose_stl.Rd
│   ├── detect_anomalies.Rd
│   ├── detect_rogue_waves.Rd
│   ├── download_buoy_data.Rd
│   ├── evaluate_wave_model.Rd
│   ├── explain_hourly_averaging.Rd
│   ├── explain_hs_formula.Rd
│   ├── explain_measurement_period.Rd
│   ├── explain_wave_height_measurement.Rd
│   ├── extreme_values.Rd
│   ├── fit_bivariate_copula.Rd
│   ├── fit_gev_annual_maxima.Rd
│   ├── fit_gpd_threshold.Rd
│   ├── generate_and_send_summary.Rd
│   ├── generate_validation_reports.Rd
│   ├── generate_weekly_summary.Rd
│   ├── get_data_dictionary.Rd
│   ├── get_database_stats.Rd
│   ├── get_latest_timestamp.Rd
│   ├── get_station_info.Rd
│   ├── get_stations.Rd
│   ├── get_variable_docs.Rd
│   ├── haversine_distance.Rd
│   ├── hs_from_rms.Rd
│   ├── incremental_update.Rd
│   ├── incremental_update_parquet.Rd
│   ├── init_parquet_storage.Rd
│   ├── initialize_database.Rd
│   ├── irishbuoys-package.Rd
│   ├── irishbuoys_ggplotly.Rd
│   ├── irishbuoys_layout.Rd
│   ├── joint_analysis.Rd
│   ├── joint_analysis_summary.Rd
│   ├── load_to_duckdb.Rd
│   ├── log_update.Rd
│   ├── plot_functions.Rd
│   ├── predict_station_lagged.Rd
│   ├── predict_wave_height.Rd
│   ├── prepare_wave_features.Rd
│   ├── query_buoy_data.Rd
│   ├── query_parquet.Rd
│   ├── rogue_wave_report.Rd
│   ├── save_to_parquet.Rd
│   ├── station_distance_matrix.Rd
│   ├── train_wave_model.Rd
│   ├── trend_analysis.Rd
│   ├── trend_summary_report.Rd
│   ├── update_station_metadata.Rd
│   ├── validate_buoy_data.Rd
│   ├── validate_rogue_events.Rd
│   ├── validation.Rd
│   ├── wave_glossary.Rd
│   ├── wave_model.Rd
│   ├── wave_model_report.Rd
│   ├── wave_science.Rd
│   └── wave_science_documentation.Rd
├── pkgdown
│   └── extra.css
├── plans
│   └── PLAN_telemetry_overhaul.md
├── push_to_cachix.sh
├── tests
│   ├── testthat
│   │   ├── _snaps
│   │   ├── test-data-consistency.R
│   │   ├── test-defensive-programming.R
│   │   └── test-doc-dependencies.R
│   └── testthat.R
└── vignettes
    ├── _targets.yaml
    ├── custom.scss
    ├── dashboard_shinylive.qmd
    ├── dashboard_shinylive_files
    │   └── mediabag
    ├── dashboard_static.qmd
    ├── data
    │   ├── buoy_data.json
    │   ├── buoy_data.parquet
    │   ├── buoy_data_raw.csv
    │   └── stations.json
    ├── debug.qmd
    ├── debug_fixed_files
    ├── shinylive-sw.js
    ├── telemetry.qmd
    └── wave_analysis.qmd
Note: _targets/ (pipeline cache) and docs/ (generated site) excluded for clarity.
Key Features
- Efficient Data Storage: Uses DuckDB for fast querying of large datasets
- Incremental Updates: Downloads only new data rather than the full history
- Quality Control: Built-in filtering for data quality
- Rogue Wave Detection: Identify extreme wave events (Hmax > 2 × Hs)
- Comprehensive Documentation: Full data dictionary with scientific definitions
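The Quick Start examples use qc_flag codes 0 (unknown), 1 (good) and 9 (missing), and qc_filter = TRUE keeps only qc_flag == 1. A minimal base-R sketch of that labelling and filtering on made-up rows; label_qc() is a hypothetical helper, and the "other" label for unlisted codes is an assumption of this sketch.

```r
# qc_flag codes as documented in the Quick Start comments
qc_labels <- c("0" = "unknown", "1" = "good", "9" = "missing")

# Hypothetical helper: attach readable labels; unlisted codes become "other"
label_qc <- function(flag) {
  lab <- unname(qc_labels[as.character(flag)])
  lab[is.na(lab)] <- "other"
  lab
}

obs <- data.frame(
  station_id  = c("M3", "M3", "M4", "M4", "M5"),
  wave_height = c(2.1, NA, 3.4, 1.8, 2.7),
  qc_flag     = c(1, 9, 1, 0, 1)
)
obs$qc_label <- label_qc(obs$qc_flag)

table(obs$station_id, obs$qc_label)   # per-station tally, similar in spirit to qc_tally
good <- obs[obs$qc_flag == 1, ]       # same rows qc_filter = TRUE would keep
```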
Use Cases
- Marine safety and operations planning
- Climate and oceanographic research
- Extreme event analysis
- Wave energy resource assessment
- Weather forecast validation
Contributing, License & Acknowledgments
Contributing: Contributions are welcome! Please feel free to submit a Pull Request.
License: MIT License. See LICENSE file for details.
Acknowledgments: Data provided by the Marine Institute Ireland in collaboration with Met Éireann and the UK Met Office.
Sources: