Google Earth and Earth Engine

For geospatial professionals, scientists, developers and storytellers

AI-powered pixels: Introducing Google’s Satellite Embedding dataset


By Valerie Pasquarella, Research Scientist, and Emily Schechter, Product Manager, Google

We’re introducing a new way to analyze the planet. Google’s Satellite Embedding dataset uses the power of AI to pack a year’s worth of multi-source satellite data into every single 10-meter pixel, enabling faster and more powerful geospatial analysis. Welcome to the future of deep learning in Earth Engine.

Fifteen years ago, we launched Earth Engine with a mission to provide widespread access to Earth observation imagery and geospatial data. As we’ve added petabytes of publicly available data to the Earth Engine Data Catalog, this ambitious goal has brought a new challenge: how can users effectively leverage ever-growing image archives and a multitude of inputs and algorithms to address the world’s most pressing environmental issues? Answer: The power of AI!

3D visualization of Earth Engine’s new Satellite Embedding dataset generated by Google DeepMind’s new AlphaEarth Foundations model.

Today, we are excited to introduce our new Satellite Embedding dataset produced in partnership with Google DeepMind. This first-of-its-kind dataset was generated using AlphaEarth Foundations, Google DeepMind’s new geospatial AI model that assimilates observations across diverse sources of geospatial information, including optical and thermal imagery from Sentinel-2 and Landsat satellites, radar data that can see through clouds, 3D measurements of surface properties, global elevation models, climate information, gravity fields, and descriptive text. Unlike traditional deep learning models that require users to fine-tune weights and run their own inference on clusters of high-end computers, AlphaEarth Foundations was designed to produce information-rich, 64-dimensional geospatial “embeddings” that are suitable for use with Earth Engine’s built-in machine learning classifiers and other pixel-based analysis.

We’ve run AlphaEarth Foundations at scale to produce a global dataset of precomputed, analysis-ready embeddings at a 10-meter resolution for each year back to 2017. While this may look like any standard Earth Engine Image Collection, we’ve effectively packed AI-powered feature extraction into every pixel, and you can use these embedding “images” in place of more conventional image composites and engineered features like spectral indices and harmonic fits. The best part is that embedding layers are analysis-ready; no need for atmospheric correction, cloud masking, spectral transformations, speckle filtering, or other featurization techniques — just superior results at reduced effort and complexity.

Satellite Embedding dataset in the Earth Engine Data Catalog.
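Getting started is as simple as loading any other collection. Here's a minimal sketch using the Earth Engine Python API (it assumes you've already authenticated to an Earth Engine-enabled project; the Code Editor JavaScript equivalent is nearly identical):

```python
import ee

ee.Initialize()  # assumes you have already authenticated

# Load the annual Satellite Embedding collection and mosaic one year of
# UTM-tiled embedding images into a single global 64-band image.
embeddings = ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')
embedding_2024 = embeddings.filterDate('2024-01-01', '2025-01-01').mosaic()

print(embedding_2024.bandNames().getInfo())  # 64 embedding bands
```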

What’s embedded in an embedding?

Measurements. The geospatial embeddings generated by AlphaEarth Foundations are learned across diverse data sources from the Earth Engine Data Catalog and geo-temporally located text labels. The model uses a self-supervised approach that enables learning from many types of data at once without hand-annotated training data. By assimilating information across multiple sources and modes of description, AlphaEarth Foundations is able to learn a more compact representation of pixel properties and semantics. Its inputs include:

  • Sentinel-1 C-Band SAR
  • Multi-spectral Sentinel-2
  • Multi-spectral, panchromatic, and thermal observations from Landsat 8 and Landsat 9
  • GEDI Raster Canopy Height metrics
  • GLO-30 DEM
  • ERA5-Land Reanalysis Monthly Aggregates
  • ALOS PALSAR-2 ScanSAR
  • GRACE monthly mass grids
  • Several text sources

From images to embeddings: translating video sequences into information-rich feature vectors for every 10 m x 10 m pixel.

Spatial and temporal context. AlphaEarth Foundations was trained on over 3 billion individual image frames sampled from over 5 million locations globally. By treating satellite images of a given location over time as if they were frames in a video, the model is able to learn across space, time, and mode of measurement to produce embeddings that capture spatial context and preserve temporal trajectories. This means every embedding vector in the Satellite Embedding dataset provides a highly compact yet semantically rich representation of surface conditions for every 10-meter pixel (a 100-square-meter area) of Earth's terrestrial land surface. Each pixel's embedding also captures information about the area around that pixel, so areas that appear very similar when considered in isolation, e.g., the asphalt surfaces of a parking lot and a freeway, will have quite distinct embeddings. And in the case of our GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL collection, embeddings summarize a full year of image acquisitions, which means they include seasonal signals, like vegetation phenology or seasonal snow cover, and other within-year change events.

📄 Read more about the model and evaluations in the AlphaEarth Foundations paper.

Earth in 64 dimensions: Coordinates versus bands

Images in the Satellite Embedding dataset have 64 bands — but they are not like classic optical reflectance or radar returns. Rather, the 64 “bands” of a single pixel in our AlphaEarth Foundations embedding represent a 64-dimensional coordinate on the surface of a 64-dimensional “sphere”. Similar to how we need latitude, longitude, and elevation to most accurately describe our position on the surface of the Earth, we need all 64 axes of the AlphaEarth Foundations representation to precisely define a Satellite Embedding coordinate. And while it is tempting to try to explain these axes individually, it's important to remember that embeddings are learned by a deep learning model: while mathematically highly explanatory, they are representations of a much higher-dimensional measurement space, not measurements in their own right.

A Satellite Embedding is essentially a coordinate on the surface of a 64-dimensional “sphere”
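One practical consequence of this geometry: each pixel's 64 band values form a unit-length vector, so the dot product of two embeddings is their cosine similarity. Here's a quick sketch to check the unit-norm property at a point (Python API; the test location is an arbitrary choice for illustration):

```python
import ee

ee.Initialize()

embedding = (ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')
             .filterDate('2024-01-01', '2025-01-01')
             .mosaic())

# The L2 norm of the 64 band values should be ~1.0 for every pixel,
# since each embedding is a point on the surface of a unit sphere.
norm = embedding.pow(2).reduce(ee.Reducer.sum()).sqrt()

point = ee.Geometry.Point([-73.9812, 40.7628])  # arbitrary test location
print(norm.sample(region=point, scale=10).first().get('sum').getInfo())
```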

So what can you do with the Satellite Embedding dataset?

To get you inspired, here are a few things we’re excited about:

  • Similarity Search: You can pick a point anywhere on Earth — say, in a specific type of farmland or forest — and instantly find and map all other locations with similar surface and environmental conditions anywhere in the world.
  • Change Detection: By comparing the embedding vectors for the same pixel from different years, you can easily spot changes and track processes like urban sprawl, wildfire impacts and recovery, and fluctuating reservoir water levels.
  • Automatic Clustering: Without any pre-existing labels, you can use clustering algorithms to automatically group pixels into distinct categories. This spatio-temporal segmentation can reveal hidden patterns in the landscape, differentiating various types of forests, soils, or urban development.
  • Smarter Classification: You can create accurate maps with far less training data. For example, instead of needing tens of thousands of labeled points to map crop types with more conventional inputs, you might only need a few hundred per class, saving time and compute.

Read on to learn more!

Find other places like this: Similarity search

Similarity searches are an easy way to compare embedding vectors for different locations and quickly identify pixels that have similar environmental and surface conditions to a location of interest. For example, the embedding vector for a 100-square-meter (10 m x 10 m) pixel in the dense urban landscape of New York City reveals strong similarity to other highly developed urban centers around the world.

Example of similarity search for -73.9812, 40.7628 (Midtown Manhattan, New York City, United States). (view in interactive demo app)

Want to try your own examples? Explore interactively using our Satellite Embedding similarity search demo, and learn more about how to use a simple dot product calculation to implement your own similarity search in our new Similarity Search with Satellite Embeddings tutorial.
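As a rough sketch of what the tutorial covers, here's how the dot-product similarity might look with the Python API (the reference point matches the figure above; the 0.9 threshold is an arbitrary choice for illustration):

```python
import ee

ee.Initialize()

embedding = (ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')
             .filterDate('2024-01-01', '2025-01-01')
             .mosaic())

# Extract the 64-band embedding vector at the reference location
# (Midtown Manhattan, as in the figure above).
point = ee.Geometry.Point([-73.9812, 40.7628])
reference = embedding.sample(region=point, scale=10).first()
ref_image = ee.Image.constant(
    reference.toDictionary().values(embedding.bandNames())
).rename(embedding.bandNames())

# Because embeddings are unit vectors, the dot product is the cosine
# similarity: 1 means identical, lower values mean less similar.
similarity = embedding.multiply(ref_image).reduce(ee.Reducer.sum())

# Highlight pixels above an (arbitrary) similarity threshold.
similar_areas = similarity.gt(0.9)
```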

Detecting change: Tracking shifts in embedding space

Similarity-based comparisons also work through time, and can be used for embedding-powered change detection and stability monitoring. The AlphaEarth Foundations embedding space was designed to be temporally consistent, so relatively stable locations should have similar embedding vectors across years in the dataset, while year-to-year changes in embedding vectors for a given location are indicative of changes in surface properties, environmental conditions, and/or their temporal dynamics. By computing the angle between annual embedding vectors for different years, you can monitor for long-term stability and catastrophic change, and begin to explore and understand the drivers of these changes.
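In code, this comparison is just another dot product. A minimal sketch of the angle computation with the Python API (the clamp guards against floating-point values just outside [-1, 1] before taking the arccosine):

```python
import ee

ee.Initialize()

collection = ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')
emb_2020 = collection.filterDate('2020-01-01', '2021-01-01').mosaic()
emb_2024 = collection.filterDate('2024-01-01', '2025-01-01').mosaic()

# Dot product of two unit vectors is the cosine of the angle between them.
cos_angle = emb_2020.multiply(emb_2024).reduce(ee.Reducer.sum())

# Angle in radians; larger angles indicate greater year-to-year change.
change_magnitude = cos_angle.clamp(-1, 1).acos().rename('change')
```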

The figures below show some examples of changes between 2020 and 2024 as seen in embedding space, with the final image in each row showing how similar each pixel is to itself across the two years (brighter values indicate greater dissimilarity), for the following types of change:

  • Suburban expansion
  • A wildfire scar interspersed with clearcuts where vegetation loss pre-dates the fire event
  • Changes in a man-made reservoir from a period of drought to less strained water conditions
  • Differences in fields between years, showing how embeddings capture within-year dynamics like crop cycles and fallowing.
Examples of year-to-year comparisons using the Satellite Embedding dataset for Central California, USA, comparing embedding layers for the years 2020 and 2024. (Source: Earth Engine Code Editor script)

Discover hidden patterns: Automatic clustering

If you want to explore more complex groupings and other hidden patterns in embedding space, ee.Clusterer algorithms are a good starting point, especially if you don’t have existing label or measurement data. And unlike three-channel RGB visualizations, clustering allows you to visualize patterns using all 64 dimensions of the embedding space at once.

To cluster Satellite Embeddings using the Earth Engine API, select a region of interest and generate some number of random samples from the embeddings for any year in the Satellite Embedding dataset. Use this random sample to train a k-means clustering algorithm, varying the number of clusters, then apply the trained clusterer back to the larger region of interest. Looking at the resulting maps of cluster IDs, we can see interesting patterns emerge from the embedding space, including differentiation among land surface types and phenologies, as well as topography and hydrology. Learn more about unsupervised classification using the Satellite Embedding dataset, including basic cluster visualization and how to assign labels to clusters, in our Introduction to the Satellite Embedding dataset and Unsupervised Classification with Satellite Embedding — Crop Type Mapping tutorials.
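That workflow might look something like the following with the Python API (a sketch; the region, sample size, and cluster count are arbitrary illustration values):

```python
import ee

ee.Initialize()

roi = ee.Geometry.Rectangle([-121.5, 37.0, -120.5, 38.0])  # example region

embedding = (ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')
             .filterDate('2024-01-01', '2025-01-01')
             .mosaic())

# Draw random samples of the 64-band embeddings within the region.
samples = embedding.sample(region=roi, scale=10, numPixels=1000, seed=42)

# Train a k-means clusterer; vary nClusters for coarser or finer grouping.
clusterer = ee.Clusterer.wekaKMeans(nClusters=10).train(samples)

# Apply the trained clusterer back to the full region of interest.
clusters = embedding.clip(roi).cluster(clusterer)
```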

Animation showing unsupervised clustering of the Satellite Embedding dataset from coarse to precise segmentation (Source: Earth Engine Code Editor script)

Create detailed maps with less manual labeling

Exploring underlying patterns and manually labeling clusters is one way into the Satellite Embedding dataset, but embeddings were really designed for effective interpolation of existing labels and measurements, i.e., supervised classification and regression problems.

If you have a label dataset, it’s easy to use Earth Engine to sample Satellite Embedding vectors for labeled locations, train a built-in ee.Classifier, and apply the trained classifier at scale to generate map tiles. The low-noise embedding space means that you should need fewer labels to get high-quality results. For example, Satellite Embeddings can be used to proxy 87 crop type and land cover classes from the 2024 USDA NASS Cropland Data Layer with just 150 samples per class.
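A skeleton of that workflow with the Python API might look like this (a sketch; the labels asset path and the 'class' property name are hypothetical placeholders for your own data):

```python
import ee

ee.Initialize()

embedding = (ee.ImageCollection('GOOGLE/SATELLITE_EMBEDDING/V1/ANNUAL')
             .filterDate('2024-01-01', '2025-01-01')
             .mosaic())

# Hypothetical FeatureCollection of labeled points with a numeric
# 'class' property; substitute your own asset.
labels = ee.FeatureCollection('projects/your-project/assets/labels')

# Attach the 64 embedding bands to each labeled point.
training = embedding.sampleRegions(collection=labels, scale=10)

# Train a random forest on the embedding bands and classify at scale.
classifier = ee.Classifier.smileRandomForest(numberOfTrees=100).train(
    features=training,
    classProperty='class',
    inputProperties=embedding.bandNames())
classified = embedding.classify(classifier)
```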

Example classification using the Satellite Embedding dataset (left) to proxy a subset (87 classes) from the 2024 USDA Cropland Data Layer (right) using just 150 points per class for training. (Source: Earth Engine Code Editor script)

Satellite Embeddings were designed to work well for clustering and for classifiers like k-nearest neighbors (kNN) and random forests, but you can substitute embeddings for raw image inputs or other engineered features like composites or aggregate statistics in any existing classification workflow. And Satellite Embedding images are conveniently analysis-ready, with gap-free, wall-to-wall coverage, hosted as tiles projected in their local UTM zone, with no additional preprocessing required. So rather than spending time and compute wrangling data, you can focus on the important part: good training data and a high-quality mapped result.

Learn more about supervised classification and regression using the Satellite Embedding dataset in our Supervised Classification with Satellite Embedding — Mapping Mangroves and Regression with Satellite Embedding — Predicting Above Ground Biomass (AGB) tutorials.

Bringing AI to Earth Engine

The launch of the Satellite Embedding dataset in the Earth Engine Data Catalog marks the next step in delivering on Earth Engine’s mission to make geospatial data more accessible and useful for understanding our changing planet. By bringing AlphaEarth Foundations’ Satellite Embeddings to Earth Engine as an Image Collection, we’re delivering the power of AI as an analysis-ready dataset that integrates directly with the Earth Engine API and broader ecosystem.

We can’t wait to see how you use this new AI capability within Earth Engine! Check out the Satellite Embedding dataset in the Earth Engine Data Catalog to start using the dataset today. If you are not already an Earth Engine user, get started using the Code Editor Quickstart.

Want to learn more about how you can use the Satellite Embedding dataset, have a question, or want to share a case study? Get in touch using our feedback form.

Finally, we are excited to announce that we’re offering a series of small grants (up to $5,000 USD) to eligible researchers to help accelerate scientific inquiry and publication of Satellite Embedding use cases. We are accepting submissions over the next few months.

Three axes of the Satellite Embedding dataset for 2024 visualized as an RGB image in Earth Engine
