In the Copernicus Data Space Ecosystem (CDSE), STAC stands for SpatioTemporal Asset Catalog. It is a standardized, open-source metadata specification used to structure, search, and discover Earth Observation (EO) data.

Instead of requiring users to download massive, raw satellite images, the CDSE STAC RESTful API allows developers to query precise metadata (e.g., location, time, cloud cover, and specific bands) to locate exact data assets.

Key Functions of STAC:

  • Data Access: It provides a direct path (such as S3 storage links) to cloud-hosted imagery, enabling tools to process a subset of data without full downloads.
  • System Interoperability: It replaces satellite-specific extensions with a unified data model, allowing the same code or software to seamlessly handle diverse datasets like Sentinel-1 and Sentinel-2.
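To make the idea of a metadata query concrete, the sketch below builds the kind of JSON body that a STAC API item search accepts. It only uses the core search parameters from the STAC API specification (collections, bbox, datetime and limit); the values are illustrative, the jsonlite package is an assumption of this sketch, and nothing is sent to a server.

```r
library(jsonlite)

# Core STAC API item search parameters; all values are illustrative only.
search_body <- list(
  collections = list("sentinel-2-l1c"),
  bbox        = c(5.261, 52.680, 5.319, 52.715), # xmin, ymin, xmax, ymax (WGS84)
  datetime    = "2026-01-01T00:00:00Z/2026-01-31T23:59:59Z",
  limit       = 10
)

# Serialise to JSON; auto_unbox keeps length-one values as scalars
body_json <- toJSON(search_body, auto_unbox = TRUE, pretty = TRUE)
cat(body_json)
```

In practice you do not need to write such a body by hand: the package functions described below construct and send the request for you.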

The package also offers features for the complementary primary catalogue via OData. For more details, see vignette("OData").

To find out which STAC client the server is offering, call dse_stac_client().

Data Exploration

A good starting point for exploring data with STAC is knowing which collections are available in the first place. You can list them as follows:

library(CopernicusDataspace)
dse_stac_collections()
#> # A tibble: 10 × 20
#>    id            type  links  title assets   extent   license keywords providers
#>    <chr>         <chr> <list> <chr> <list>   <list>   <chr>   <list>   <list>   
#>  1 ccm-optical   Coll… <list> Cope… <tibble> <tibble> other   <list>   <list>   
#>  2 ccm-sar       Coll… <list> Cope… <tibble> <tibble> other   <list>   <list>   
#>  3 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#>  4 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#>  5 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#>  6 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#>  7 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#>  8 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#>  9 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#> 10 clms_ba_glob… Coll… <list> CLMS… <tibble> <tibble> other   <NULL>   <list>   
#> # ℹ 11 more variables: summaries <list>, description <chr>, item_assets <list>,
#> #   `auth:schemes` <list>, `ceosard:type` <list>, stac_version <chr>,
#> #   stac_extensions <list>, `storage:schemes` <list>, bands <list>,
#> #   `sci:doi` <list>, contacts <list>

The returned data.frame contains descriptive information about each collection, which can help you focus your search. Once you have identified a collection, you can check which properties are available for filtering and narrowing your search:

dse_stac_queryables("sentinel-1-grd") |> summary()
#>                      Length Class  Mode     
#> $id                   1     -none- character
#> type                  1     -none- character
#> title                 1     -none- character
#> $schema               1     -none- character
#> properties           11     -none- list     
#> additionalProperties  1     -none- logical

The example above shows 11 properties that can be used to focus the search. You can start an actual search by creating a STAC search request with dse_stac_search_request(). It creates a special class of httr2 request object: in essence, a request to the API server that you can modify with tidyverse operators. This sounds more complicated than it is.

Once you have created the request, you can modify it with tidyverse operators such as filter(), arrange() and slice_head(), and chain those modifications with the pipe operator (|> or %>%). You can also query products that intersect with specific spatial features (sf) using st_intersects().

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.4.0; sf_use_s2() is TRUE

bbox <-
  sf::st_bbox(
    c(xmin = 5.261, ymin = 52.680, xmax = 5.319, ymax = 52.715),
    crs = 4326)

dse_stac_search_request("sentinel-1-grd") |>
  filter(`sat:orbit_state` == "ascending") |>
  arrange("id") |>
  st_intersects(bbox) |>
  collect()
#> # A tibble: 10 × 56
#>    id        bbox   type  links  assets   geometry collection properties.created
#>  * <chr>     <list> <chr> <list> <list>   <list>   <chr>      <chr>             
#>  1 S1A_EW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-06-26T05:37:…
#>  2 S1A_EW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-04-27T16:33:…
#>  3 S1A_EW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-04-27T18:49:…
#>  4 S1A_IW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-05-08T16:00:…
#>  5 S1A_IW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-05-08T16:50:…
#>  6 S1A_IW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-04-25T15:40:…
#>  7 S1A_IW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-04-27T11:18:…
#>  8 S1A_IW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-04-27T13:01:…
#>  9 S1A_IW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-04-27T15:24:…
#> 10 S1A_IW_G… <list> Feat… <list> <tibble> <tibble> sentinel-… 2023-05-04T14:05:…
#> # ℹ 48 more variables: properties.updated <chr>, properties.datetime <chr>,
#> #   properties.platform <chr>, properties.published <chr>,
#> #   properties.instruments <list>, `properties.auth:schemes.s3.type` <chr>,
#> #   `properties.auth:schemes.oidc.type` <chr>,
#> #   `properties.auth:schemes.oidc.openIdConnectUrl` <chr>,
#> #   properties.end_datetime <chr>, `properties.product:type` <chr>,
#> #   `properties.view:azimuth` <dbl>, properties.constellation <chr>, …
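The collected result is a wide tibble, so it is often convenient to keep only a few columns. The sketch below repeats the search (without the sorting step) and then uses dplyr::select() on column names taken from the printed output above; whether those columns are present for other collections is an assumption.

```r
library(CopernicusDataspace)
library(dplyr)
library(sf)

bbox <-
  sf::st_bbox(
    c(xmin = 5.261, ymin = 52.680, xmax = 5.319, ymax = 52.715),
    crs = 4326)

# Same search as above, stored for further inspection
results <-
  dse_stac_search_request("sentinel-1-grd") |>
  filter(`sat:orbit_state` == "ascending") |>
  st_intersects(bbox) |>
  collect()

# Keep the identifier plus two of the flattened property columns
results |>
  select(id, properties.datetime, properties.platform)
```

The id column is what you pass on as the asset identifier when retrieving or downloading files.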

Downloading Data

When downloading data, you can retrieve the Uniform Resource Identifier (URI) with dse_stac_get_uri(). To get a URI, you need at least the asset identifier and the specific asset. Both can be obtained with a search as shown above. The example below shows how:

dse_stac_get_uri(
  asset_id = "S2A_MSIL1C_20260109T132741_N0511_R024_T39XVL_20260109T142148",
  asset = "B01",
  collection = "sentinel-2-l1c"
)
#> [1] "s3://eodata/Sentinel-2/MSI/L1C/2026/01/09/S2A_MSIL1C_20260109T132741_N0511_R024_T39XVL_20260109T142148.SAFE/GRANULE/L1C_T39XVL_A055105_20260109T132737/IMG_DATA/T39XVL_20260109T132741_B01.jp2"
#> attr(,"local_path")
#> [1] "S2A_MSIL1C_20260109T132741_N0511_R024_T39XVL_20260109T142148.SAFE/GRANULE/L1C_T39XVL_A055105_20260109T132737/IMG_DATA/T39XVL_20260109T132741_B01.jp2"
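The returned value is a character string carrying a local_path attribute. As a minimal offline sketch, the code below rebuilds an object with the same shape from the output shown above (it does not call the server) and takes it apart with base R. Treating the first path element as the S3 bucket is ordinary S3 URI convention, not something specific to this package.

```r
# Values copied from the output above; no request is made here
product    <- "S2A_MSIL1C_20260109T132741_N0511_R024_T39XVL_20260109T142148"
local_path <- paste0(
  product,
  ".SAFE/GRANULE/L1C_T39XVL_A055105_20260109T132737/IMG_DATA/",
  "T39XVL_20260109T132741_B01.jp2")
uri <- structure(
  paste0("s3://eodata/Sentinel-2/MSI/L1C/2026/01/09/", local_path),
  local_path = local_path)

# The relative path stored alongside the URI
attr(uri, "local_path")

# Drop attributes before string manipulation
uri_chr <- as.character(uri)

# Split the S3 URI into bucket and object key
bucket <- sub("^s3://([^/]+)/.*$", "\\1", uri_chr)
key    <- sub("^s3://[^/]+/", "", uri_chr)

bucket
basename(key)
```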

This approach also needs the collection from which the asset is made available. If you don't provide it, it will be guessed with dse_stac_guess_collection(). This function is not 100% reliable, so it is best practice to provide the collection manually. Instead of working with the URI yourself, it is easier to call dse_stac_download(). It automatically takes care of the authentication required for downloading the file (if your credentials are properly provided). Check vignette("Authentication") for more information about the authentication process.

dse_stac_download(
  asset_id = "S2A_MSIL1C_20260109T132741_N0511_R024_T39XVL_20260109T142148",
  asset = "B01",
  collection = "sentinel-2-l1c",
  destination = tempdir()
)