Using DuckDB in R: A Walkthrough Starting with MySQL
By Steve Ewing
November 1, 2024
Introduction
In this walkthrough, we will demonstrate how to use DuckDB in R, starting by connecting to a MySQL server. By integrating MySQL and DuckDB, you can leverage the strengths of both databases within your data science workflows. This guide will help you retrieve data from MySQL, import it into DuckDB, and perform efficient analytical queries.
Prerequisites
R and RStudio: Ensure you have R and RStudio installed.
MySQL Server: Access to a MySQL server with the necessary permissions.
Required R Packages: RMySQL
, DBI
, duckdb
, dplyr
, dbplyr
, ggplot2
Loading Data into DuckDB From MySQL
First, we need to connect to a MySQL database to retrieve data.
Installation
# install.packages("RMySQL")
# install.packages("DBI") # Database interface
Load the libraries:
library(RMySQL)
## Warning: package 'RMySQL' was built under R version 4.4.1
## Loading required package: DBI
## Warning: package 'DBI' was built under R version 4.4.1
library(DBI)
Establishing a Connection
Set up the connection to your MySQL server:
# Replace with your MySQL server details
con_mysql <- dbConnect(RMySQL::MySQL(),
dbname = Sys.getenv("DO_DB_NAME"),
host = Sys.getenv("DO_DB_HOST"),
port = as.integer(Sys.getenv("DO_DB_PORT")),
user = Sys.getenv("DO_DB_USER"),
password = Sys.getenv("DO_DB_PASSWORD"))
Connect to DuckDB
# DuckDB connection
con_duckdb <- dbConnect(duckdb::duckdb(), dbdir = "dmo_data.duckdb")
Retrieve the List of Tables from MySQL
Let’s retrieve data from a MySQL table:
# List available tables
mysql_tables <- dbListTables(con_mysql)
# View the list of tables
head(mysql_tables)
## [1] "a_Comp_Price_Data" "a_Comp_Store" "a_Comp_URL"
## [4] "a_Comp_URL_DNC" "adcampaigns" "address_book"
If you want to load only specific tables, you can define them:
# Specify the tables you want to load
tables_to_load <- c("orders", "products", "orders_products")
Transfer Tables from MySQL to DuckDB
We’ll iterate over each table, read it from MySQL, and write it into DuckDB.
## Loading table: orders
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 1 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 42 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 44 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 45 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 46
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 47 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 48 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 49 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 50 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 53 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 56 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 89 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 115
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 118
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 119 imported
## as numeric
## Loading table: products
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 29 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 30 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 31 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 32 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 36 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 37 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 46
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 61 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 62 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 63 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 64 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 65 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 66 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 67 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 68 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 87 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 88 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 91 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 92 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 93 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 97 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 98 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 100 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 105 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 136
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 139
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 150 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 152 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 153 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 155 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 157 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 158 imported
## as numeric
## Loading table: orders_products
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 7 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 8 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 9 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 10 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 13 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 15
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 18 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 19 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 23
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 24
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Unsigned INTEGER in col 25
## imported as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 34 imported
## as numeric
## Warning in dbSendQuery(conn, statement, ...): Decimal MySQL column 35 imported
## as numeric
Disconnect From the Server
# Disconnect from MySQL
dbDisconnect(con_mysql)
## [1] TRUE
Verify the Tables in DuckDB
List the tables in DuckDB to confirm they have been loaded:
duckdb_tables <- dbListTables(con_duckdb)
print(duckdb_tables)
## [1] "address_book" "autoship_orders"
## [3] "box_subscription_orders" "customers"
## [5] "customers_credit_accounts" "customers_groups"
## [7] "fees" "manufacturers"
## [9] "orders" "orders_products"
## [11] "orders_products_pick" "orders_status"
## [13] "orders_total" "payments_charges"
## [15] "payments_refunds" "payments_types"
## [17] "products" "products_bundles"
## [19] "products_extra_fields" "products_groups"
## [21] "products_inventory" "products_to_products_extra_fields"
## [23] "products_to_products_groups" "shipments"
## [25] "shipments_status" "suppliers_import"
## [27] "ups_import" "wm_boxes"
## [29] "wm_inventory_transactions" "wm_invoices"
## [31] "wm_po_history" "wm_po_items"
## [33] "wm_product_promos" "wm_product_promos_products"
## [35] "wm_product_promos_tiers" "wm_purchase_orders"
## [37] "wm_suppliers" "wm_suppliers_to_products"
Performing Queries in DuckDB
With the tables loaded into DuckDB, you can perform efficient queries using SQL or dplyr.
Using SQL Queries
result <- dbGetQuery(con_duckdb, "
SELECT op.products_id,
COUNT(op.qty_returned) AS returns
FROM orders_products op
LEFT JOIN products p USING(products_id)
LEFT JOIN orders o USING(orders_id)
WHERE op.qty_returned > 0
AND op.qty_returned IS NOT NULL
GROUP BY op.products_id, p.name
ORDER BY returns DESC
LIMIT 10;
")
print(result)
## products_id returns
## 1 304 1469
## 2 184 1348
## 3 1813 728
## 4 1877 519
## 5 100 502
## 6 1816 489
## 7 1817 481
## 8 3546 427
## 9 40 398
## 10 65 351
- Posted on:
- November 1, 2024
- Length:
- 7 minute read, 1352 words
- See Also: