05 Dec 2025


Advantages:
Pure data collection: no malformed HTML to parse, fewer legal concerns than scraping, clear data structures
Standardised data access procedures: transparency, replicability
Robustness: benefits from wisdom of the crowds
Disadvantages:
They’re not always available
Dependency on API providers
Lack of natural connection to R

RESTful APIs: queries for static information as of the current moment (e.g. user profiles, posts)
Streaming APIs: real time data (e.g. new tweets, weather alerts)
APIs often have extensive documentation:
written for developers; the key things to look for are endpoints and parameters (API Documentation)
most APIs are rate-limited: restrictions on the number of API calls per user/IP address within a given period of time
commercial APIs may impose a monthly fee
List of APIs in case you need inspiration
Rate-limit your requests (Sys.sleep() in a loop)
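A minimal sketch of rate-limiting requests in a loop. `fetch_one()` is a hypothetical stand-in for a real API call, and the pause length is an assumption; the provider's documentation states the actual limit.

```r
# Hypothetical stand-in for one API call; a real version would send a request
fetch_one <- function(id) {
  paste("result for", id)
}

# Call fetch_one() for each id, pausing between calls to respect a rate limit
fetch_all <- function(ids, pause = 0.2) {
  results <- vector("list", length(ids))
  for (i in seq_along(ids)) {
    results[[i]] <- fetch_one(ids[i])
    if (i < length(ids)) Sys.sleep(pause)  # wait before the next request
  }
  results
}

res <- fetch_all(c("a", "b", "c"), pause = 0.1)
```

Pausing only between calls (not after the last one) avoids a pointless wait at the end of the loop.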
JSON vs XML:
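As a rough illustration of the difference, the sketch below encodes the same made-up record in both formats and parses each in R. The record is hypothetical, and using the xml2 package for the XML side is an assumption.

```r
library(jsonlite)  # fromJSON()
library(xml2)      # read_xml(), xml_find_first(), xml_find_all(), xml_text()

# The same hypothetical record in JSON ...
json_txt <- '{"city": "London", "tmean": [8.1, 7.4]}'
# ... and in XML
xml_txt <- "<obs><city>London</city><tmean>8.1</tmean><tmean>7.4</tmean></obs>"

# JSON maps directly onto R lists and vectors
from_json <- fromJSON(json_txt)

# XML is a tree of nodes, so values are pulled out with XPath queries
doc <- read_xml(xml_txt)
from_xml <- list(
  city  = xml_text(xml_find_first(doc, "//city")),
  tmean = as.numeric(xml_text(xml_find_all(doc, "//tmean")))
)
```

Both parses end up with the same city and temperature values; the XML route simply needs more explicit extraction and type conversion.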

Most APIs require a key or other user credentials before you can query their database
Getting credentialised with an API requires that you register with the organisation
Most APIs are set up for developers, so you will likely be asked to register an application
Once you have successfully registered, you will be assigned one or more keys, tokens, or other credentials that must be supplied to the server as part of any API call you make
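One common way to handle such credentials is to keep them out of the script and read them from an environment variable. The sketch below assumes a hypothetical variable name `EXAMPLE_API_KEY` and a hypothetical query parameter `api_key`; real APIs differ in how the key is supplied.

```r
# Read an API key from an environment variable instead of hard-coding it.
# EXAMPLE_API_KEY is a hypothetical variable name.
get_api_key <- function(var = "EXAMPLE_API_KEY") {
  key <- Sys.getenv(var)
  if (!nzchar(key)) {
    stop("No API key found: set ", var, " (e.g. in your .Renviron file)")
  }
  key
}

# For demonstration only: in practice the key is set outside the script
Sys.setenv(EXAMPLE_API_KEY = "not-a-real-key")
key <- get_api_key()

# The key would then be supplied with each call; the parameter name varies by API
query <- list(q = "london", api_key = key)
```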

There are two ways to collect data through APIs in R:
Many common APIs are available through user-written R Packages. These packages offer functions that “wrap” API queries and format the response. These packages are usually much more convenient than writing our own query
If no wrapper function is available, we have to write our own API request and format the response ourselves using R. This is trickier, but definitely doable
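The second route can be sketched as a small helper, assuming the httr and jsonlite packages. `api_get_json()` is a hypothetical name, and `example.org` with its parameters is a placeholder rather than a real API.

```r
library(httr)      # GET(), stop_for_status(), content(), modify_url()
library(jsonlite)  # fromJSON()

# Send a GET request, stop on an HTTP error, and parse the JSON body
api_get_json <- function(url, query = list()) {
  resp <- GET(url, query = query)
  stop_for_status(resp)
  fromJSON(content(resp, as = "text", encoding = "UTF-8"))
}

# Inspect the URL a query would produce without sending anything.
# example.org and the parameter names are placeholders.
request_url <- modify_url("https://example.org/v1/search",
                          query = list(q = "london", limit = 10))
```

Building the URL first is a cheap way to check that the endpoint and parameters match the documentation before making a real call.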
Setup:
Daily or hourly weather variables
Values are numerical and time-stamped → ideal for time-series analysis
| Variable | Description |
|---|---|
| temperature_2m_mean | Daily mean air temperature at 2 metres above ground (°C) |
| temperature_2m_max | Daily maximum air temperature at 2 metres above ground (°C) |
| rain_sum | Total rainfall only (mm) |
| sunshine_duration | Duration of direct sunlight per day (seconds) |
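Several of these variables can be requested in a single call. The sketch below only builds the query list; it assumes the endpoint accepts a comma-separated `daily` value, and the dates are illustrative.

```r
# Variables from the table above, requested in one call
vars <- c("temperature_2m_mean", "temperature_2m_max",
          "rain_sum", "sunshine_duration")

q <- list(
  latitude = 51.5074, longitude = -0.1278,
  start_date = "2024-11-01", end_date = "2024-11-30",
  daily = paste(vars, collapse = ","),  # comma-separated list of variables
  timezone = "Europe/London"
)
```

Passing `q` as the `query` argument of an httr `GET()` call would then return one column per variable under `$daily`.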
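library(httr)      # GET(), stop_for_status(), content()
library(jsonlite)  # fromJSON()
library(tibble)    # tibble()

# Download London's daily mean temperature for November of a given year
get_nov <- function(year) {
  url <- "https://archive-api.open-meteo.com/v1/era5"
  q <- list(
    latitude = 51.5074, longitude = -0.1278,
    start_date = sprintf("%d-11-01", year),
    end_date = sprintf("%d-11-30", year),
    daily = "temperature_2m_mean",
    timezone = "Europe/London"
  )
  resp <- GET(url, query = q)
  stop_for_status(resp)  # fail early on an HTTP error
  dat <- fromJSON(content(resp, "text", encoding = "UTF-8"))$daily
  tibble(
    day = as.integer(substr(dat$time, 9, 10)),  # day of month from "YYYY-MM-DD"
    tmean = dat$temperature_2m_mean,
    year = factor(year)
  )
}

httr Package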
library(purrr)    # map_dfr()
library(ggplot2)  # plotting

# Pick the last 5 Novembers from complete calendar years
this_year <- as.integer(format(Sys.Date(), "%Y"))
years <- (this_year - 5):(this_year - 1)
nov <- map_dfr(years, get_nov)  # one row per day and year

# Plot one line per year, plus the across-year mean in black
ggplot(nov, aes(day, tmean, colour = year, linetype = year)) +
  geom_line(linewidth = 1) +
  stat_summary(
    fun = mean, geom = "line",
    aes(group = 1),
    colour = "black", linewidth = 1.3,
    show.legend = FALSE
  ) +
  scale_colour_brewer(palette = "Blues") +
  scale_linetype_manual(values = c("solid", "dashed", "dotdash", "twodash", "longdash")) +
  labs(
    title = "London – Daily Mean Temperature in November (Last 5 Years)",
    subtitle = "Black line is the average temperature",
    x = "Day of November", y = "Mean Temperature (°C)"
  ) +
  theme_minimal(base_size = 14)

ggmap uses Google Maps behind the scenes, so you’ll need an active Google Cloud Platform account (see here if you cannot figure out how to set one up)
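A minimal sketch of what using ggmap with a Google key might look like. `plot_london_basemap()` and the environment variable `GOOGLE_MAPS_API_KEY` are hypothetical names; the function is only defined here, since running it needs a valid key and network access.

```r
# Draw a Google base map centred on London (requires the ggmap package).
# GOOGLE_MAPS_API_KEY is a hypothetical environment variable name.
plot_london_basemap <- function(key = Sys.getenv("GOOGLE_MAPS_API_KEY")) {
  ggmap::register_google(key = key)  # store the key for this session
  london <- ggmap::get_googlemap(
    center = c(lon = -0.1278, lat = 51.5074),  # same coordinates as the weather example
    zoom = 10
  )
  ggmap::ggmap(london)  # returns a ggplot object that further layers can be added to
}

# plot_london_basemap()  # uncomment once a valid key is available
```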

ggmap with Google Maps API

OpenStreetMap (OSM) is a free and open map of the world, created largely through the voluntary contributions of millions of people around the world
OSM serves two APIs: the Main API for editing OSM, and the Overpass API for providing OSM data
OSM data is stored as a list of attributes tagged in key-value pairs of geospatial objects (points, lines or polygons)
For example, for charities, the key is “office”, and the value is “charity”
The first step is to define a bounding box of the geographical area we are interested in. It is defined by four geographic coordinates, representing the minimum and maximum latitude and longitude of the area
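A sketch of this step, assuming the osmdata package. The coordinates are rough, illustrative values for Greater London, and the object names are hypothetical. Nothing is downloaded until the query is passed to `osmdata_sf()`.

```r
library(osmdata)  # opq(), add_osm_feature()

# Approximate bounding box for Greater London (illustrative values):
# minimum longitude, minimum latitude, maximum longitude, maximum latitude
london_bb <- c(xmin = -0.51, ymin = 51.28, xmax = 0.34, ymax = 51.70)

# Build (but do not send) an Overpass query for the office = charity tag
base_query <- opq(bbox = london_bb)
charity_query <- add_osm_feature(base_query, key = "office", value = "charity")

# osmdata_sf(charity_query) would download the matching points and polygons
```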
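osmdata object

library(dplyr)  # select(), %>%
library(sf)     # sf methods for select() and rbind()

# london_rst is assumed to be the result of an earlier osmdata_sf() download
head(london_rst$osm_polygons)$name

# select name and geometry for charities
rst_osm_points <- london_rst$osm_points %>%  # point data from the downloaded OSM data
  select(name, geometry)  # for now just the name and geometry to plot
rst_osm_polygons <- london_rst$osm_polygons %>%  # polygon data
  select(name, geometry)
# combine points and polygons into one sf data frame
london_charities <- rbind(rst_osm_points, rst_osm_polygons)

Further texts: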
Bonus Tutorial: Spotify API:
SOCS0100 – Computational Social Science