Harmonizing Landsat for Sentinel-2: Unlocking Data Back to 1984
Earth observation holds untapped potential in the vast archives of decades-old satellite data. Yet for Sentinel-2 users, this wealth of historical information has remained largely out of reach. ClearSKY bridges this gap by transforming historical datasets into Sentinel-2 standards, making it possible to seamlessly access and analyze 40 years of Landsat data.
The Value of Historical Earth Observation Data
The Landsat program, with its archive spanning over 50 years, is a cornerstone of Earth observation. Its continuous, high-resolution imagery has provided invaluable insights into global environmental changes, such as deforestation, glacier retreat, and urban expansion. This wealth of historical data enables researchers to study long-term trends and better understand the evolving dynamics of Earth’s systems.
By comparison, Sentinel-2, while offering finer spatial resolution and additional spectral bands, has only about a decade of data. This is significant in human years but represents just a snapshot in “planet years,” limiting its ability to reveal slower, long-term environmental changes. Landsat’s multi-decade archive provides the necessary historical baseline to contextualize and complement Sentinel-2’s more recent observations.
The economic and societal impact of Landsat data further underscores its value. By 2023, its annual economic benefits were estimated at $25.6 billion, reflecting its role in supporting industries, informing policy, and advancing scientific research. Historical Earth observation data not only serves as a reliable baseline for understanding the past but also provides essential context for analyzing present-day phenomena and forecasting future trends. Landsat’s archive is a vital resource for addressing global challenges, offering decades of insights to support sustainable development and environmental stewardship.
Challenges of Using Legacy Satellite Data
While invaluable, legacy datasets like Landsat 4/5/7 come with challenges that can hinder their usability for teams accustomed to Sentinel-2:
- Resolution Differences: Sentinel-2 provides 10 m (visible, NIR) and 20 m (SWIR, red edge) resolution, while Landsat offers 30 m resolution for most bands. This difference can pose challenges for precision applications that rely on Sentinel-2’s finer spatial details.
- Wavelength Differences: Landsat lacks the dedicated red edge bands found in Sentinel-2, which are crucial for vegetation monitoring and crop analysis. Additionally, while both sensors capture multispectral data, their bands do not perfectly align, with slight differences in central wavelengths that can affect direct comparisons.
- Atmospheric Correction: Landsat and Sentinel-2 use different Bottom-of-Atmosphere (BOA) correction algorithms, making direct reflectance comparisons difficult. The retrieval methods behind Landsat’s BOA products (e.g., LEDAPS, LaSRC) differ significantly from Sentinel-2’s algorithms (e.g., Sen2Cor).
- Data Artifacts & Quality: Older Landsat datasets often contain sensor-related artifacts, striping, and radiometric inconsistencies, particularly in early Landsat missions and in Landsat 7 scenes acquired after its 2003 Scan Line Corrector (SLC) failure. Geolocation inaccuracies in older datasets can also introduce alignment issues when combining Landsat with Sentinel-2 for time-series analysis.
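To make the wavelength mismatch concrete, here is a minimal sketch that pairs Landsat 5 TM bands with their closest Sentinel-2 counterparts and lists the Sentinel-2 bands left without a partner. The wavelengths are rounded nominal values, and the pairing itself (for example, TM near-infrared with B8 rather than B8A) is an illustrative assumption, not ClearSKY's actual band mapping.

```python
# Nominal Landsat 5 TM band ranges in nanometres (reflective bands only).
LANDSAT5_TM_RANGE_NM = {
    "B1": (450, 520),    # blue
    "B2": (520, 600),    # green
    "B3": (630, 690),    # red
    "B4": (760, 900),    # near-infrared
    "B5": (1550, 1750),  # shortwave infrared 1
    "B7": (2080, 2350),  # shortwave infrared 2
}

# Approximate Sentinel-2 MSI central wavelengths in nanometres (rounded).
SENTINEL2_CENTER_NM = {
    "B2": 490, "B3": 560, "B4": 665,   # blue, green, red
    "B5": 705, "B6": 740, "B7": 783,   # red edge
    "B8": 842, "B8A": 865,             # near-infrared (broad, narrow)
    "B11": 1610, "B12": 2190,          # shortwave infrared
}

# Illustrative pairing only (an assumption made for this example).
TM_TO_S2 = {"B1": "B2", "B2": "B3", "B3": "B4",
            "B4": "B8", "B5": "B11", "B7": "B12"}


def band_offsets():
    """Return {tm_band: (s2_band, s2_center - tm_range_midpoint)} in nm."""
    offsets = {}
    for tm_band, s2_band in TM_TO_S2.items():
        low, high = LANDSAT5_TM_RANGE_NM[tm_band]
        offsets[tm_band] = (s2_band, SENTINEL2_CENTER_NM[s2_band] - (low + high) / 2)
    return offsets


def unmatched_s2_bands():
    """Sentinel-2 bands with no Landsat 5 TM counterpart in the pairing."""
    matched = set(TM_TO_S2.values())
    return [band for band in SENTINEL2_CENTER_NM if band not in matched]


if __name__ == "__main__":
    for tm_band, (s2_band, delta) in band_offsets().items():
        print(f"TM {tm_band} -> S2 {s2_band}: centre offset {delta:+.0f} nm")
    print("No TM counterpart:", unmatched_s2_bands())
```

The unmatched list contains the three red edge bands, which is the gap the bullet on wavelength differences describes; the offsets for the paired bands show why even "matching" bands are not directly interchangeable.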
From Past to Present, Seamlessly
ClearSKY bridges the gap between decades of historical Earth observation data and modern analysis needs. By harmonizing Landsat’s extensive archive with Sentinel-2 standards, we extend analysis-ready workflows back to 1984, unlocking unparalleled opportunities for long-term environmental insights.
Landsat’s 30-meter resolution, introduced with the Thematic Mapper sensor in the early 1980s, was groundbreaking for its time and remains highly valuable today. It strikes a useful balance, offering enough spatial precision to capture meaningful environmental changes over large areas. Its proximity to Sentinel-2’s 10-meter resolution also makes it a strong candidate for harmonization, allowing smooth transitions in data quality without compromising usability.
If a feature cannot be seen in the original Landsat image, it should not appear in its Sentinel-2 equivalent. We do not use AI to hallucinate details that were never captured. The only exception is when multiple images allow us to map subpixel movements over time, extracting additional insights through temporal analysis. ClearSKY prioritizes data integrity, ensuring that every pixel in the harmonized dataset reflects real, measurable information rather than artificial enhancements.
ClearSKY’s transformation process resolves key differences — rescaling resolution, aligning spectral bands, and unifying formats — to create a seamless, analysis-ready dataset. This ensures users can analyze decades of historical trends with the same precision and ease as Sentinel-2 data, without relying on AI-driven guesswork. By making historical Earth observation data a natural extension of today’s workflows, ClearSKY ensures that Landsat’s rich legacy remains accessible, actionable, and relevant for tackling modern challenges.
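As a rough illustration of what rescaling without inventing detail can mean, the sketch below places a 30 m band on a 10 m grid by simple pixel replication and then checks that averaging back to 30 m returns the original values. This is not ClearSKY's transformation; the function names are hypothetical, and a real workflow would also need to reproject between the two sensors' grids.

```python
import numpy as np


def upsample_nearest(band_30m, factor=3):
    """Replicate each 30 m pixel into a factor x factor block of 10 m pixels."""
    return np.repeat(np.repeat(band_30m, factor, axis=0), factor, axis=1)


def block_mean(band_10m, factor=3):
    """Average factor x factor blocks, mapping the 10 m grid back to 30 m."""
    height, width = band_10m.shape
    blocks = band_10m.reshape(height // factor, factor, width // factor, factor)
    return blocks.mean(axis=(1, 3))


# Tiny synthetic reflectance band standing in for a Landsat scene.
landsat_band = np.array([[0.10, 0.20],
                         [0.30, 0.40]])

on_s2_grid = upsample_nearest(landsat_band)   # 6 x 6 array on the finer grid
round_trip = block_mean(on_s2_grid)           # back to 2 x 2

# The finer grid carries no information beyond the original pixels.
print(on_s2_grid.shape, np.allclose(round_trip, landsat_band))
```

A round-trip check like this is one simple way to confirm that a resampled product still agrees with the source pixels it was derived from.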
Refining the View: Atmospheric Correction
Historical satellite imagery often comes with atmospheric distortions that can obscure critical details. Standard Bottom-of-Atmosphere (BOA) corrections improve surface reflectance consistency, but they are not always sufficient — especially when dealing with haze, thin cloud cover, and variable atmospheric conditions across decades of data. One challenge with traditional BOA correction is that it is often applied on a per-tile basis, meaning that adjacent Sentinel-2 or Landsat tiles can sometimes exhibit noticeable differences in correction levels. In many cases, this isn’t a major issue, but in areas with changing conditions — such as snow-covered landscapes, wildfire smoke, or shifting aerosols — these inconsistencies can disrupt large-scale analysis.
While tile-based corrections provide localized accuracy, they can also create abrupt transitions that interfere with broader regional analyses. ClearSKY applies deep learning-based corrections to refine historical imagery, removing atmospheric noise, cirrus clouds, and other inconsistencies while preserving real environmental changes. By filtering where needed and maintaining natural variability where it matters, ClearSKY enhances the clarity, consistency, and usability of historical satellite data. With these refinements in atmospheric correction, the next challenge is ensuring that our harmonization methods hold up in real-world scenarios — especially when working with legacy datasets that lack direct Sentinel-2 comparisons.
Prototype Phase: Real-World Harmonization Testing
The current prototype looks promising, but as with any deep learning process, edge cases take time to refine. Our focus has been on harmonizing cloud-free L1TP Landsat products, the precision- and terrain-corrected datasets that provide a geometrically accurate foundation. However, because Sentinel-2 data only begins in 2015, the harmonized output for Landsat 5, the workhorse of the Landsat program, cannot be validated directly against real Sentinel-2 imagery. Operating between 1984 and 2011, Landsat 5 captured decades of critical Earth observation data, but without any temporal overlap with Sentinel-2, harmonization must rely on indirect comparisons and alternative validation approaches. To address this, we use cross-sensor calibration methods, leveraging insights from Landsat 8 as an intermediary reference and incorporating historical consistency checks.
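The idea of an intermediary reference can be sketched with simple per-band linear adjustments: if one adjustment maps Landsat 5 reflectance to Landsat 8 and another maps Landsat 8 to Sentinel-2, composing them gives an indirect Landsat 5 to Sentinel-2 mapping. Everything below is a hypothetical illustration with synthetic numbers; how the paired samples are obtained is outside the sketch, and ClearSKY's deep learning approach is not a linear model.

```python
from typing import NamedTuple, Sequence


class Linear(NamedTuple):
    """A per-band adjustment of the form y = gain * x + offset."""
    gain: float
    offset: float

    def apply(self, x: float) -> float:
        return self.gain * x + self.offset


def fit(xs: Sequence[float], ys: Sequence[float]) -> Linear:
    """Ordinary least-squares fit of ys ~ gain * xs + offset."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    gain = sxy / sxx
    return Linear(gain, mean_y - gain * mean_x)


def compose(first: Linear, second: Linear) -> Linear:
    """Return the single transform equivalent to `first` followed by `second`."""
    return Linear(second.gain * first.gain,
                  second.gain * first.offset + second.offset)


# Synthetic paired reflectances (made-up values, for illustration only).
l5_samples = [0.10, 0.20, 0.30, 0.40]
l8_samples = [0.98 * x + 0.010 for x in l5_samples]
s2_samples = [1.02 * y - 0.005 for y in l8_samples]

l5_to_l8 = fit(l5_samples, l8_samples)
l8_to_s2 = fit(l8_samples, s2_samples)
l5_to_s2 = compose(l5_to_l8, l8_to_s2)

print(l5_to_s2)
```

Errors in each link of such a chain accumulate, which is one reason indirect validation is paired with additional consistency checks.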
Most satellite images are far from perfect: haze, cloud cover, and varying atmospheric conditions affect a significant portion of the archive. While 100% cloud-free products are ideal for early-stage harmonization testing, they do not reflect real-world data availability. In practice, a large share of historical Landsat scenes contain some level of cloud obstruction, making them the most difficult cases for both production and validation. A key challenge is that standard cloud masks are binary: a pixel is either labeled as cloud or it is not. In reality, clouds exist on a spectrum, with partial transparency, subpixel contamination, and thin cirrus layers that are difficult to classify accurately. This makes harmonization more complex, as overly aggressive cloud masking can remove valid data, while insufficient masking can introduce unwanted atmospheric noise into the harmonized dataset.
Addressing this requires robust atmospheric correction and cloud-masking techniques to ensure that harmonization methods remain reliable even when working with suboptimal or partially clouded imagery. While deep learning models can aid in detecting and filtering these challenges, the real test is whether harmonized data remains accurate across varying conditions, not just in ideal cases, and whether genuine surface signal can be reliably told apart from atmospheric contamination.
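The trade-off between aggressive and lenient masking can be shown with a small sketch. It assumes a per-pixel cloud probability is available (from whatever detector is in use), and the thresholds are arbitrary illustrative values rather than recommended settings.

```python
def classify(prob, clear_below=0.2, cloud_above=0.6):
    """Three-way label that keeps an explicit 'uncertain' class."""
    if prob < clear_below:
        return "clear"
    if prob >= cloud_above:
        return "cloud"
    return "uncertain"


def retained_fraction(probs, threshold):
    """Share of pixels a binary mask keeps when prob >= threshold is masked."""
    return sum(p < threshold for p in probs) / len(probs)


# Hypothetical cloud probabilities for five pixels, from clear sky to thick cloud.
cloud_probs = [0.02, 0.10, 0.35, 0.55, 0.90]

labels = [classify(p) for p in cloud_probs]
strict = retained_fraction(cloud_probs, threshold=0.3)   # aggressive masking
lenient = retained_fraction(cloud_probs, threshold=0.6)  # permissive masking

print(labels)
print(f"kept with strict mask: {strict:.0%}, with lenient mask: {lenient:.0%}")
```

The two pixels in the "uncertain" class are exactly the ones a binary mask must force into one side or the other, either discarding possibly valid data or letting thin cloud through.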
Filling the Gaps in Historical Data
At ClearSKY, we specialize in creating cloud-free products by leveraging both optical and SAR imagery. While no openly accessible high-resolution SAR data exists from 1984, we can still fill gaps in historical archives using multi-image compositing techniques.
This is particularly relevant for Landsat 7, which has suffered from data gaps due to the SLC failure since 2003. The missing scan lines in post-2003 Landsat 7 images introduce significant challenges for time-series analysis and long-term environmental monitoring. By combining multiple images over time, we can reconstruct more complete datasets, ensuring that harmonized products remain as consistent and reliable as possible — even when working with imperfect historical records.
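A common, generic form of multi-image compositing is a per-pixel median over time that ignores missing observations. The sketch below uses a tiny synthetic stack in which NaN marks an SLC-off gap or masked pixel; it illustrates the general technique, not ClearSKY's production method.

```python
import warnings

import numpy as np


def median_composite(stack):
    """Per-pixel median over the time axis, ignoring NaN (missing) values."""
    with warnings.catch_warnings():
        # Pixels missing in every image trigger an "All-NaN slice" warning;
        # they are expected here and simply stay NaN.
        warnings.simplefilter("ignore", RuntimeWarning)
        return np.nanmedian(stack, axis=0)


nan = np.nan
# Three co-registered 2 x 2 acquisitions (time, row, column), synthetic values.
stack = np.array([
    [[0.10, nan], [0.30, nan]],
    [[nan, 0.20], [0.32, nan]],
    [[0.12, 0.22], [nan, nan]],
])

composite = median_composite(stack)
print(composite)
```

Pixels observed in at least one image are filled, while the pixel that was never observed stays empty rather than being invented.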
Depending on user needs, ClearSKY can provide either single-image harmonization, which preserves the original dataset structure, or multi-image harmonization, which leverages additional observations to improve continuity and completeness. If gap-filling is of interest, we can explore tailored solutions that balance historical consistency with enhanced data usability.
This article makes extensive use of original Landsat 5 and Landsat 8/9 imagery, all heavily modified unless stated otherwise, courtesy of the USGS. Whether you’re interested in single-image harmonization or multi-image gap filling, we’re here to make Earth observation data more accessible, reliable, and actionable.
Learn more at clearsky.vision/harmonization or reach out to contact@clearsky.vision to discuss pilot testing opportunities.