# Standard AIKP development flow for Sentinel-2 Data Ingestion and Analysis

**Developing a Standard AIKP for Sentinel-2 Data Ingestion (via internal Finder) and NDVI Analysis: "Mondrian Example"**

---

This tutorial walks through the implementation of a standard AI Knowledge Pack (AIKP) named Mondrian. Mondrian is an example AIKP in the OCLI platform designed to:

- fetch Sentinel-2 satellite imagery from the Copernicus API using the internal Finder tool, which automates data retrieval based on user-defined filters;
- create a stack using the internal S2 Processor and compute spectral indices (such as NDVI);
- visualize the result as an RGBA COG (Cloud Optimized GeoTIFF) in a "Mondrian" style;
- publish the result so that it is available from the Web UI.

## Files structure

```
ocli.aikp.mondrian/
├ __init__.py            # Marks the AIKP directory as a Python package.
├ cli.py                 # Defines all custom AIKP-specific CLI commands,
│                        # for example `task template visualize`.
├ config.py              # AIKP-specific configuration defaults for tasks and recipes,
│                        # including default paths and parameters used across the AIKP.
├ recipe_schema.json     # JSON Schema for recipe validation.
├ template.py            # Task Template class (TemplateMondrian), a.k.a. the task controller,
│                        # which implements common task operations: validate task,
│                        # create task, upgrade task, update recipe, AI path resolution, etc.
│
└ s2/                    # Sub-package for satellite-specific logic (Sentinel-2)
  ├ __init__.py          # Defines the `Template` class, inherited from TemplateMondrian.
  ├ config.py            # Sentinel-2-specific configuration that overrides or extends the base defaults.
  └ assemble_recipe.py   # Assembly logic for Sentinel-2 data (the heart of a raster-processing AIKP).
                         # Defines how the input Sentinel-2 imagery (the stack) is transformed
                         # into output tensors (e.g. computing NDVI). Provides the `assemble_kernel`
                         # function and lists the input bands required by the Assembler.
```

With this structure, Mondrian cleanly separates general template logic from Sentinel-2-specific details.
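The split between the base template and the `s2` sub-package can be pictured with the minimal sketch below. It is not OCLI code: `BaseTemplate` is a hypothetical stand-in for OCLI's template base class, and merging defaults as a plain dictionary overlay (where the `s2` values win) is an assumption used only to illustrate how `s2/config.py` overrides or extends the base `config.py`.

```python
"""Hypothetical sketch of the Mondrian layout: a generic template plus a
satellite-specific subclass that overlays its own configuration defaults."""

# Base defaults (in the spirit of ocli.aikp.mondrian/config.py)
BASE_TASK_DEFAULTS = {
    'ai_results': '/optoss/out',
    'stack_results': '/optoss/stack',
    'single': True,
    'index_kind': 'NDVI',
}

# Sentinel-2 overrides/extensions (in the spirit of s2/config.py)
S2_TASK_DEFAULTS = {
    'sat_family_name': 's2',
    'friendly_name': "{project}/{kind}/{name}",
}


class BaseTemplate:
    """Hypothetical stand-in for OCLI's template base class."""
    task_defaults: dict = {}

    def default_config(self) -> dict:
        # Return a fresh copy so tasks never share (and mutate) the class-level dict
        return dict(self.task_defaults)


class TemplateMondrian(BaseTemplate):
    """Generic, satellite-agnostic logic lives here (template.py)."""
    task_defaults = BASE_TASK_DEFAULTS


class Template(TemplateMondrian):
    """Sentinel-2 specialisation (s2/__init__.py): only the config overlay differs."""
    task_defaults = {**TemplateMondrian.task_defaults, **S2_TASK_DEFAULTS}
```

Under this sketch, supporting another provider would mean adding a sibling sub-package with its own `Template` subclass and overlay, leaving `TemplateMondrian` untouched.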
Thanks to this separation, implementing the same use case for a different satellite image provider only requires a new satellite-specific sub-package, while the base logic stays the same. If the use case will never be extended to other satellite image providers, all the files can be kept in a single directory level (merging files that share the same name into one).

## Configuring Task Defaults (config.py)

The base `config.py` in Mondrian sets general defaults used by all tasks of this template, such as directory paths and default analysis settings. For example, it defines paths for EO data and results, and a default index to compute:

```python
TASK_DEFAULTS = {
    'eodata': None,                    # Path to raw Earth observation data (to be set by user or env)
    'ai_results': '/optoss/out',       # Path for analysis output files
    'stack_results': '/optoss/stack',  # Path for intermediate stack files
    'master': None,                    # Title of the selected scene (filled in when the master is set)
    'master_path': None,               # Local path of the selected scene's data
    'single': True,                    # Single imagery mode (True for one image input)
    'index_kind': 'NDVI',              # Default index product to generate
    'kind': 'index',
    'metadata_preset': "Remote Sensing Indices",
}
```

User-defined filters (such as date range or cloud cover) are not hard-coded in these configs; instead, they are handled dynamically via the Finder. When a new task is created from Mondrian’s template, the template injects additional Finder-related keys into the task configuration. In the `TemplateMondrian` class (defined in `template.py`), the `create_task` method is overridden to call an OCLI helper that adds default filter parameters:

`mondrian/template.py` (inside `TemplateMondrian.create_task`)

```python
super().create_task(task)
inject_finder_keys(task)  # populate task.config with default Finder filter keys (e.g. date range)
```

OCLI’s `inject_finder_keys` adds entries to the task config (usually prefixed with `finder.`) that the Sentinel-2 Finder expects, for example a default time range and cloud coverage limits.
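To make the idea concrete, here is a minimal, self-contained sketch of what a helper like `inject_finder_keys` could do. It is an assumption, not OCLI's implementation: the `Task` class, the function name `inject_finder_keys_sketch`, and the `FINDER_DEFAULTS` key names and values are all hypothetical. The property worth copying is that defaults are only added for missing keys, so calling the helper again on load or upgrade never overwrites values the user has already set.

```python
from dataclasses import dataclass, field

# Hypothetical default Finder filters; real key names depend on the Finder.
FINDER_DEFAULTS = {
    'finder.start_date': '2024-01-01',
    'finder.end_date': '2024-01-31',
    'finder.cloud_cover_max': 30,
}


@dataclass
class Task:
    """Hypothetical minimal task: just a flat config dictionary."""
    config: dict = field(default_factory=dict)


def inject_finder_keys_sketch(task: Task, defaults: dict = FINDER_DEFAULTS) -> list:
    """Add missing Finder keys to task.config and return the keys that were added.

    Existing values are left untouched, so it is safe to call this again
    whenever the task is loaded or upgraded.
    """
    added = []
    for key, value in defaults.items():
        if key not in task.config:
            task.config[key] = value
            added.append(key)
    return added


# Usage: a user-set start date survives; only the missing keys are filled in.
task = Task(config={'finder.start_date': '2023-06-01'})
added = inject_finder_keys_sketch(task)  # ['finder.end_date', 'finder.cloud_cover_max']
```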
As a result, immediately after creation the task already has search filters in place (with sensible defaults). The template also calls `inject_finder_keys` whenever a task is loaded or upgraded, keeping the Finder parameters in sync with the code version. Users can view or override these values by getting or setting the task’s configuration. For instance, if the default time window or cloud cover is not suitable, you could run a command like `task set finder.start_date=2024-01-01` (exact key names depend on the Finder’s implementation) to adjust it.

In summary, the config files and template setup ensure that:

- The Sentinel-2 Finder is used (via `sat_family_name='s2'`).
- Task defaults (paths, mode flags, default index, etc.) are set.
- Finder filter keys (date range, etc.) are automatically added to each task for Sentinel-2 searches, and users can customize them via the CLI if needed.

## Satellite-specific Finder Integration (s2/config.py)

The configuration files define default parameters for tasks and recipes, and they also connect the AIKP to the appropriate data Finder. Mondrian’s `s2/config.py` contains the key entry linking it to the Sentinel-2 Finder:

```python
TASK_DEFAULTS = {
    'sat_family_name': 's2',  # Use the Sentinel-2 data finder for this AIKP
    'friendly_name': "{project}/{kind}/{name}",
    'cos_key': "{project}/{kind}/{name}",
}
```

Here `'sat_family_name': 's2'` tells OCLI that this AIKP’s data source is Sentinel-2, and OCLI’s core uses it to instantiate the appropriate Finder for searching images. The `friendly_name` and `cos_key` patterns define how output products are named when published (using placeholders such as the project name, kind, etc.).

## Searching for Sentinel-2 Imagery with the Finder

With the task configured, the next step is to fetch Sentinel-2 imagery using the internal Finder tool.
OCLI’s Finder system abstracts the Copernicus Open Access Hub API, allowing you to search for Sentinel-2 scenes by area, date, and other filters without writing API calls yourself. In the Mondrian workflow, once a task is created and configured, you trigger the search with the `product load` CLI command (provided by OCLI itself, not a custom command), which invokes the Finder for the current task.

Under the hood, OCLI uses the task’s `sat_family_name` to get the correct Finder instance. Using the ROI and any filter parameters in the task, the S2 Finder queries the Copernicus API (Sentinel-2 catalog) for matching images. For example, it uses the task’s ROI geometry (set when the task was created with `--roi`) and date range (`finder.start_date`, `finder.end_date` in the task config) to find all available Sentinel-2 scenes that cover the area and time window, possibly also filtering by cloud cover percentage if that is one of the Finder keys.

Mondrian is designed to work with one Sentinel-2 image (since `single=True` in the config), so one scene from the search results is selected as the task’s master. When the master is set, the template’s `task_set` hook kicks in. Mondrian’s Template overrides `task_set` to intercept changes to the `master` field: it looks up the chosen scene in the Finder’s records and populates the task with the scene’s details:

`mondrian/template.py` (inside `TemplateMondrian.task_set`)

```python
finder = finder_by_task(task)
record = finder.get_meta_record(value)  # value is the user-provided master ID
task.config['master'] = record['title']  # Human-readable scene title
task.config['master_path'] = record['product_eodata_path']  # Local path for the scene data
```

This code uses `finder.get_meta_record` to retrieve the full metadata for the selected product (where `value` is the ID provided by the user). It then sets `task.config['master']` to the product’s official title and, importantly, sets `task.config['master_path']` to the expected path where the raw data will be stored.
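The hook pattern itself can be exercised without OCLI. The sketch below is hypothetical: `InMemoryFinder` replaces the real S2 Finder (and `finder_by_task`), and the `task_set(config, finder, key, value)` signature is assumed rather than taken from OCLI. It shows the dispatch idea: only `master` is intercepted and enriched from the Finder's metadata record, while every other key is stored as given.

```python
class InMemoryFinder:
    """Hypothetical stand-in for the S2 Finder: maps product IDs to metadata records."""

    def __init__(self, records: dict):
        self._records = records

    def get_meta_record(self, product_id: str) -> dict:
        try:
            return self._records[product_id]
        except KeyError:
            raise ValueError(f"product {product_id!r} not found in Finder results") from None


def task_set(config: dict, finder: InMemoryFinder, key: str, value) -> None:
    """Set a config key; 'master' is resolved through the Finder (hypothetical signature)."""
    if key == 'master':
        record = finder.get_meta_record(value)
        config['master'] = record['title']                     # human-readable scene title
        config['master_path'] = record['product_eodata_path']  # where the raw data will live
    else:
        config[key] = value  # any other key is stored unchanged
```

Failing loudly on an unknown ID, as this sketch does, keeps a typo from leaving the task with a half-updated master.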
The `product_eodata_path` is typically a file path (or intended file path) on the local system, for example a path under an `eodata` directory, where the Sentinel-2 SAFE archive will reside. At this point, the task knows which Sentinel-2 scene to use and where it should be on disk.

## Downloading the data

Selecting the master by itself doesn’t fetch the satellite imagery; it only updates the metadata. To actually retrieve the Sentinel-2 data from Copernicus, OCLI provides the `--data` flag on the `task get` command, used together with `-m` (which stands for “master”). This causes OCLI to download the Sentinel-2 product (if it is not already present) to the `master_path` defined earlier. Under the hood, the S2 Finder knows the download URL or method for the product and fetches the imagery (e.g., the `.SAFE` archive or imagery files) into the local filesystem (e.g., into the directory specified by `eodata`). Once this completes, the task’s `master_path` points to an actual file location containing the Sentinel-2 scene data.

At this stage, you have searched for a Sentinel-2 scene using user-defined filters (ROI, date, etc.), chosen one scene, and retrieved its data automatically. The Finder handled all interactions with the Copernicus API. As a developer, you only had to define `TASK_DEFAULTS` in `config.py` and inject the Finder keys in the `create_task` method in `template.py`; OCLI’s built-in Finder did the rest.

## Assembling Retrieved Scenes into a Tensor

With the Sentinel-2 scene downloaded, the next step is to convert the raw data into the format needed for analysis: in our case, computing an index such as NDVI. OCLI’s architecture uses the concept of a Stack to represent the prepared input data (e.g., the relevant bands of the imagery), and an Assembler to turn the stack into the final tensor (the numeric array representing the index).
Mondrian takes advantage of a satellite-specific stack creation tool called the Processor, and provides a custom assembly recipe for the NDVI calculation to be run by the Assembler.

### Stack creation

In the Mondrian workflow, after selecting and downloading the master image, the `task make stack s2` command is used. It invokes OCLI’s Sentinel-2 stack Processor (available as part of the platform’s tools) to process the downloaded Sentinel-2 scene. The S2 Processor will:

- Read the raw Sentinel-2 data (e.g., the SAFE archive or imagery files in `master_path`).
- Extract the necessary spectral bands. In Mondrian’s case, the indices of interest (NDVI, NDWI, NDMI) require Red (B4), Green (B3), NIR (B8), and SWIR (B11).
- Resample bands to a common resolution and align them if needed. For example, Sentinel-2’s B11 has a 20 m resolution while B4, B3, and B8 are 10 m; the Processor resamples B11 to 10 m so that it can be combined with B8.
- Save each band as a separate file, in a structured way, in the `stack_results` directory.

### Assembling tensors

After `task make stack s2`, the stack directory (e.g., under `/optoss/stack`) contains one file per band. Mondrian’s `assemble_recipe.py` anticipates these files by name: it defines an `input_list` that enumerates the expected input files for assembly:

`mondrian/s2/assemble_recipe.py` (partial)

```python
input_list = [
    {"fname": "master_res_10_B3", "type": "intensity", "resolution": 10, "band": "B3", "index": 3},
    {"fname": "master_res_10_B4", "type": "intensity", "resolution": 10, "band": "B4", "index": 4},
    {"fname": "master_res_10_B8", "type": "intensity", "resolution": 10, "band": "B8", "index": 8},
    {"fname": "master_res_20_B11", "type": "intensity", "resolution": 20, "band": "B11", "index": 11},
]
```

This list tells the Assembler which stack files to use as inputs. Each entry corresponds to one image layer: for example, it expects a file named `master_res_10_B4` (the Red band at 10 m resolution).
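Before assembly, it can be useful to check that every layer named in `input_list` is actually present in the stack. The sketch below is an illustration under stated assumptions, not part of OCLI: it assumes each `fname` maps to a single file with a hypothetical `.img` extension directly inside the stack directory, and it builds a band-index lookup similar in spirit to the `B(n)` helper used by the assembly kernel.

```python
from pathlib import Path


def resolve_inputs(stack_dir, input_list, ext=".img"):
    """Map each band index in input_list to its expected stack file.

    Returns (paths_by_index, missing), where `missing` lists the fnames whose
    files are not on disk. The file extension is a hypothetical choice.
    """
    stack = Path(stack_dir)
    paths_by_index = {}
    missing = []
    for entry in input_list:
        path = stack / f"{entry['fname']}{ext}"
        paths_by_index[entry["index"]] = path  # e.g. 4 -> .../master_res_10_B4.img
        if not path.is_file():
            missing.append(entry["fname"])
    return paths_by_index, missing
```

Reporting all missing layers at once, rather than failing on the first one, makes it easier to spot a stack that was built for the wrong bands.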
The stack creation step produces files with these names, so the Assembler knows where to find each band image.

Once the stack is ready, you can perform the actual index calculation. In the CLI, this is initiated by invoking the Assembler with the `ai basic assemble zone` command, which reads the stack and produces the output tensor. Mondrian’s `assemble_recipe.py` provides the custom logic for combining the Sentinel-2 bands into the desired index output. The core of this logic is the `assemble_kernel` function:

`mondrian/s2/assemble_recipe.py` (inside `assemble_kernel`)

```python
if product_name == 'NDVI':
    index = (B(8) - B(4)) / (B(8) + B(4))    # NDVI from NIR (B8) and Red (B4)
elif product_name == 'NDWI':
    index = (B(3) - B(8)) / (B(3) + B(8))    # NDWI from Green (B3) and NIR (B8)
elif product_name == 'NDMI':
    index = (B(8) - B(11)) / (B(8) + B(11))  # NDMI from NIR (B8) and SWIR (B11)
outputs.append(index, name=product_name)
```

For NDVI, it uses the classic formula (NIR - Red) / (NIR + Red); NDWI and NDMI follow the same normalized-difference pattern with the bands shown above. The helper `B(band_index)` fetches the image array for that band from the input layers, applying any necessary preprocessing (code earlier in the function handles things like scaling and masking invalid pixels). Finally, the computed `index` array is appended to the outputs under the product name.

Mondrian’s assembly code also includes a `before()` function that runs once and checks that the requested product (NDVI, NDWI, or NDMI) is one of the supported ones. This ensures that the task’s `index_kind` (which the user can set) is valid.

After assembly, the output is saved to the `ai_results` directory. By default, OCLI’s Assembler produces a NumPy `.npy` file for the tensor, an `.hdr` file (ENVI header format), a `.npy` bad-pixel mask file (marking pixels with no data), and a JSON metadata file.
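The output bundle described above can be sketched with NumPy alone. This is a hypothetical illustration, not OCLI's Assembler: the function name `save_outputs`, the `.hdr` and `_meta.json` file names, the JSON fields, and the rule that a pixel is "bad" when its value is non-finite are assumptions (the `full_tnsr` / `full_bd` prefixes mirror the zone file names used by Mondrian's `visualize` command). The ENVI header sets `header offset` to the size of the `.npy` preamble, so the header describes the raw float32 samples stored after it.

```python
import json
from pathlib import Path

import numpy as np


def save_outputs(out_dir, product: str, index: np.ndarray, prefix: str = "full") -> dict:
    """Write tensor (.npy), ENVI header (.hdr), bad-pixel mask (.npy) and JSON metadata.

    `index` must be a 2-D (lines, samples) array. All names are hypothetical.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    data = np.ascontiguousarray(index, dtype="<f4")  # little-endian float32
    bad = ~np.isfinite(data)                          # assumed "no data" rule

    tnsr_path = out / f"{prefix}_tnsr.npy"
    np.save(tnsr_path, data)
    # np.save writes a header followed by the raw array bytes,
    # so the samples start at (file size - payload size).
    offset = tnsr_path.stat().st_size - data.nbytes

    lines, samples = data.shape
    (out / f"{prefix}_tnsr.hdr").write_text(
        "ENVI\n"
        f"samples = {samples}\n"
        f"lines = {lines}\n"
        "bands = 1\n"
        f"header offset = {offset}\n"
        "file type = ENVI Standard\n"
        "data type = 4\n"     # 4 = 32-bit float
        "interleave = bsq\n"
        "byte order = 0\n"    # 0 = little-endian
    )
    np.save(out / f"{prefix}_bd.npy", bad)
    meta = {"product": product, "shape": [lines, samples], "bad_pixels": int(bad.sum())}
    (out / f"{prefix}_meta.json").write_text(json.dumps(meta))
    return meta
```

In use, an NDVI array computed from NIR and Red (with division warnings silenced via `np.errstate`, so that 0/0 pixels become NaN) is passed as `index`; those NaN pixels then show up in the bad-pixel mask.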
Mondrian’s `assemble_recipe.py` defines the filenames for these outputs in the `tnsr_fnames` dictionary (for a full-region output vs. a zonal sub-area). At this point, the Sentinel-2 data has been transformed into a ready-to-use result, for example an NDVI image covering the ROI.

## Implementing and Exposing Custom CLI Commands (cli.py)

One of the advantages of AIKPs is that you can add AIKP-specific commands to the OCLI CLI for convenience. In Mondrian’s `cli.py`, a custom `visualize` command is defined to help developers quickly inspect the result. This command isn’t strictly required for the data pipeline; it is a utility to convert the assembled tensor into a viewable format or perform a quick check (such as ensuring it contains one-band NDVI data).

A few things to note about this command:

- It is created using `click.command('visualize')` and some OCLI-specific decorators (`@argument_zone`, etc.). OCLI collects these commands and attaches them under the `task template` CLI group, which is why it can be run as `task template visualize`.
- The `@pass_task` and `@ensure_task_resolved` decorators provide the current task object to the function and ensure that the task is properly loaded.
- The function uses OCLI’s `Recipe` and `Filenames` utilities to locate the output files for the given zone (`zone/full_tnsr.npy` and the bad-pixel mask `zone/full_bd.npy`).

The `visualize` command loads the NDVI tensor into a NumPy array (`index`) together with its bad-pixel mask (`empty_pixels`), and confirms that the tensor has one band (as expected for NDVI). Essentially, `visualize` helps convert the NumPy output into a standard raster format for visualization.

For your own AIKP, you can define similar custom commands in `cli.py` to extend functionality. Once defined, these commands become available in the OCLI shell after you mount them in the `cli_task_mount` class method in `template.py`. The key is to follow OCLI’s CLI patterns (the `click` library and the provided decorators) to integrate them.
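The core of such a command, stripped of the `click` wiring and OCLI decorators, might look like the hypothetical sketch below. It is not Mondrian's `visualize`: the function name `ndvi_to_rgba`, the red-to-green color ramp, the tolerance for a leading band axis of length 1, and making bad pixels fully transparent are illustrative assumptions. Writing the resulting RGBA array to a COG is left out.

```python
import numpy as np


def ndvi_to_rgba(index: np.ndarray, empty_pixels: np.ndarray) -> np.ndarray:
    """Convert a one-band NDVI tensor and its bad-pixel mask into an RGBA uint8 image."""
    if index.ndim == 3:
        if index.shape[0] != 1:  # NDVI is expected to have exactly one band
            raise ValueError(f"expected a single band, got {index.shape[0]}")
        index = index[0]
    if index.ndim != 2:
        raise ValueError(f"expected a 2-D raster, got shape {index.shape}")
    if empty_pixels.shape != index.shape:
        raise ValueError("mask shape does not match tensor shape")

    # Pixels flagged in the mask or holding NaN/inf are treated as invalid
    valid = ~empty_pixels.astype(bool) & np.isfinite(index)
    # Map NDVI from [-1, 1] to [0, 1] for the color ramp
    t = np.clip((np.nan_to_num(index) + 1.0) / 2.0, 0.0, 1.0)

    rgba = np.zeros(index.shape + (4,), dtype=np.uint8)
    rgba[..., 0] = np.round(255 * (1.0 - t))  # red fades out as NDVI rises
    rgba[..., 1] = np.round(255 * t)          # green grows with NDVI
    rgba[..., 3] = np.where(valid, 255, 0)    # invalid pixels become fully transparent
    return rgba
```

Keeping this logic in a plain function, with the CLI command only loading files and calling it, makes the conversion easy to test without the OCLI shell.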
## Running the Mondrian AIKP Workflow (Step-by-Step)

[Please see ocli.aikp.mondrian for details](../../ocli/aikp/mondrian/README.md)

As a developer, your focus was on the custom parts (setting defaults, specifying how to find data, and how to compute the index), while OCLI handled the boilerplate of finding data, creating stacks, and formatting output.

## Adapting this Pattern for Your Own AIKP

- Decide on the satellite/data source and make sure OCLI has a Finder for it. Set your `sat_family_name` accordingly (e.g., `"s1"` for Sentinel-1, `"landsat8"`, etc.). If a Finder doesn’t exist, you might need to implement one or use an external search, but OCLI provides Finders for many common sources (Sentinel-1, Sentinel-2, Landsat-8, ASTER, WorldView, etc.).
- Define `TASK_DEFAULTS` and `RECIPE_DEFAULTS` in your AIKP’s config. Include any task parameters unique to your analysis, with sensible default values. Most can mirror Mondrian’s (paths, etc.).
- If you are using an internal or newly built Finder, call `inject_finder_keys` in your template to automatically add search filter parameters. This way, users can easily tweak filters via the CLI without you hard-coding them.
- Use OCLI’s stack creators if available and applicable. If your data source is supported, the heavy lifting of processing raw imagery into analysis-ready rasters is already implemented; just specify the needed bands or data layers in the `input_list` of your `assemble_recipe.py`. If not, you can write your own stack Processor code.
- Implement the assembly logic in `assemble_recipe.py`. Focus on the core analytics: combine bands, apply formulas, or run models on the data. Mondrian’s `assemble_kernel` is a simple example (computing an index); yours could be more complex (e.g., machine learning model inference on imagery).
- Add any custom commands in `cli.py` to enhance usability. This could be for visualization, exporting results, or any custom task operation.
  Once mounted via the `cli_task_mount` class method in `template.py`, the commands appear under `task template`, as long as you define them with `@click.command` and use the appropriate decorators to get the task context.

By following this approach, you let OCLI handle the common workflow steps (searching/downloading data, validation, and output handling) while you concentrate on the unique logic of your AIKP.
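When adapting the pattern, one way to keep the `before()` check and the kernel consistent is to drive both from a single table of supported products. The sketch below is hypothetical and deliberately independent of OCLI: `PRODUCTS`, `required_bands`, and `assemble` are illustrative names, and bands are passed in as a plain `{band_index: array}` dictionary rather than through the `B(n)` helper. Adding a new normalized-difference index then means adding one table entry.

```python
import numpy as np

# Hypothetical product table: name -> (band_a, band_b) for (a - b) / (a + b).
PRODUCTS = {
    'NDVI': (8, 4),   # NIR, Red
    'NDWI': (3, 8),   # Green, NIR
    'NDMI': (8, 11),  # NIR, SWIR
}


def before(product_name: str) -> None:
    """Validate the requested product once, before any pixels are processed."""
    if product_name not in PRODUCTS:
        supported = ", ".join(sorted(PRODUCTS))
        raise ValueError(f"unsupported product {product_name!r}; expected one of: {supported}")


def required_bands(product_name: str) -> set:
    """Bands the stack must provide for this product (useful for building input_list)."""
    before(product_name)
    return set(PRODUCTS[product_name])


def assemble(product_name: str, bands: dict) -> np.ndarray:
    """Compute the normalized difference for product_name from {band_index: array}."""
    before(product_name)
    a_idx, b_idx = PRODUCTS[product_name]
    a = bands[a_idx].astype(np.float32)
    b = bands[b_idx].astype(np.float32)
    with np.errstate(divide="ignore", invalid="ignore"):  # 0/0 -> NaN instead of a warning
        return (a - b) / (a + b)
```

Indices that are not simple normalized differences would need a richer table entry (for example, a callable per product), but the single source of truth for validation and computation stays the same.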