All About Scripts

What is a Script?

Scripts are specific elements that are part of a LOST annotation pipeline. A script element is implemented as a python3 module. The listing below shows an example of such a script. This script will request image annotations for all images of a dataset.

Listing 1: An example LOST script.
from lost.pyapi import script
import os

ENVS = ['lost']

class LostScript(script.Script):
    '''This Script requests image annotations for each image of an imageset.

    An imageset is basically a folder with images.
    '''
    def main(self):
        self.logger.info("Request image annotations for:")
        for ds in self.inp.datasources:
            media_path = ds.path
            fs = ds.get_fs()
            for img_file in os.listdir(media_path):
                img_path = os.path.join(media_path, img_file)
                self.outp.request_image_anno(img_path=img_path, fs=fs)
                self.logger.debug(img_path)

if __name__ == "__main__":
    my_script = LostScript()

In order to implement a script you need to create a Python class that inherits from lost.pyapi.script.Script. Your class needs to implement a main method and needs to be instantiated within your Python script. The listing below shows a minimal example of a script.

Listing 2: A minimal example of a script in LOST
from lost.pyapi import script

class MyScript(script.Script):

    def main(self):
        self.logger.info('Hello World!')

if __name__ == "__main__":
    MyScript()

Example Scripts

More script examples can be found here: lost/backend/lost/pyapi/examples/pipes

The LOST PyAPI Script Model

Like all pipeline elements, a script has an input and an output object. Through these objects it is connected to other elements in a pipeline (see also Pipeline Definition Files).

Inside a script you can exchange information with the connected elements by using the self.inp object and the self.outp object.

Reading Imagesets

It is a common pattern to read a path to an imageset from a Datasource element in your annotation pipeline. See Listing 3 for a code example. Since multiple Datasources could be connected to our script, we iterate over all connected Datasources of the input with self.inp.datasources. For each Datasource element we can read the path attribute to get the filesystem path to a folder with images.

Listing 3: Getting the path to all images of a Datasource.
from lost.pyapi import script
import os

class MyScript(script.Script):

    def main(self):
        for ds in self.inp.datasources:
            for img_file in os.listdir(ds.path):
                img_path = os.path.join(ds.path, img_file)

if __name__ == "__main__":
    MyScript()
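Listing 3 picks up every entry of the folder, including subfolders and files that are not images. A minimal sketch of a filtering helper, assuming you only want regular files with common image extensions; the helper name list_image_paths and the extension set are illustrative and not part of the LOST PyAPI. Inside main you would loop over list_image_paths(ds.path) instead of os.listdir(ds.path).

```python
import os

# Hypothetical default set of extensions; adjust it to your data.
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.tif', '.tiff'}


def list_image_paths(folder, extensions=IMAGE_EXTENSIONS):
    """Return the sorted paths of regular files in `folder` with an image extension."""
    paths = []
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Skip subfolders and files whose extension (compared case-insensitively) is not listed.
        if os.path.isfile(path) and os.path.splitext(name)[1].lower() in extensions:
            paths.append(path)
    return sorted(paths)
```

Sorting the result is a design choice that makes the order of annotation requests reproducible, since os.listdir returns entries in arbitrary order.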

Requesting Annotations

The most important feature of the LOST PyAPI is the ability to request annotations for a connected AnnotationTask element. Inside a Script you can access the output element and call the self.outp.request_annos method (see Listing 4).

Listing 4: Requesting an annotation for an image.
self.outp.request_annos(img_path)

Sometimes you also want to send annotation proposals to an AnnotationTask in order to support your annotator. In most cases these proposals will be generated by an AI, like an object detector. The listing below shows a simple example that sends a dummy box and a dummy point to an annotation tool.

Listing 5: Requesting an annotation with proposals for an image.
self.outp.request_annos(img_path,
    annos=[[0.1, 0.1, 0.2, 0.2], [0.1, 0.2]],
    anno_types=['bbox', 'point'])
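The annos and anno_types arguments in Listing 5 are parallel lists: the i-th entry of anno_types names the type of the i-th entry of annos. When proposals come from a detector, building both lists together keeps them in sync. A minimal sketch, assuming your detector yields (anno_type, coordinates) pairs whose coordinates are already in the format your AnnoTask expects; the helper build_proposals is hypothetical and only covers the two types used in Listing 5.

```python
# Number of values per annotation type, as in Listing 5:
# a bbox has four values, a point has two.
EXPECTED_LENGTHS = {'bbox': 4, 'point': 2}


def build_proposals(detections):
    """Split (anno_type, coordinates) pairs into parallel annos / anno_types lists."""
    annos = []
    anno_types = []
    for anno_type, coords in detections:
        coords = list(coords)
        # Reject unknown types and wrong coordinate counts early.
        if EXPECTED_LENGTHS.get(anno_type) != len(coords):
            raise ValueError('unexpected {} with {} values'.format(anno_type, len(coords)))
        annos.append(coords)
        anno_types.append(anno_type)
    return annos, anno_types
```

Inside a script the result would be passed on as self.outp.request_annos(img_path, annos=annos, anno_types=anno_types).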

Annotation Broadcasting

If multiple AnnoTask elements are connected to your ScriptOutput and you call self.outp.request_annos, the annotation request will be broadcast to all connected AnnoTasks, so each AnnoTask will get its own copy of your annotation request. Technically, for each annotation request an empty ImageAnno is created for each AnnoTask. During the annotation process this ImageAnno will be filled with information.
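The broadcasting behaviour can be pictured with a small toy model. This is not LOST code: AnnoTaskStub, ImageAnnoStub and broadcast_request are hypothetical stand-ins that only illustrate that every connected AnnoTask receives its own independent, empty ImageAnno per request.

```python
from dataclasses import dataclass, field


@dataclass
class ImageAnnoStub:
    """Hypothetical stand-in for an empty ImageAnno waiting to be filled."""
    img_path: str
    twod_annos: list = field(default_factory=list)


@dataclass
class AnnoTaskStub:
    """Hypothetical stand-in for a connected AnnoTask element."""
    name: str
    img_annos: list = field(default_factory=list)


def broadcast_request(anno_tasks, img_path):
    """Create one separate empty ImageAnnoStub per connected AnnoTask."""
    for task in anno_tasks:
        task.img_annos.append(ImageAnnoStub(img_path))
```

Because each task gets a separate object, filling the annotation in one AnnoTask leaves the copies in the other AnnoTasks untouched.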

Reading Annotations

Another important task is to read annotations from previous pipeline elements. In most cases these will be AnnoTask elements.

If you want to read all annotations at the script input in a vectorized way, you can use self.inp.to_df() to get a pandas DataFrame or self.inp.to_vec() to get a list of lists.

If you prefer to iterate over all ImageAnnos you can use the respective iterator self.inp.img_annos. See the listing below for an example.

Iterating over all annotations at the script input.
for img_anno in self.inp.img_annos:
    for twod_anno in img_anno.twod_annos:
        self.logger.info('image path: {}, 2d_anno_data: {}'.format(img_anno.img_path, twod_anno.data))
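A common use of this iterator is to aggregate annotations per image. The sketch below counts the 2D annotations of each image. To keep it runnable outside of LOST it uses SimpleNamespace objects as stand-ins that expose only the attributes shown above (img_path, twod_annos, data); inside a script you would pass self.inp.img_annos instead. The helper name count_twod_annos is hypothetical.

```python
from types import SimpleNamespace


def count_twod_annos(img_annos):
    """Return a dict mapping each image path to its number of 2D annotations."""
    counts = {}
    for img_anno in img_annos:
        # Iterate instead of calling len() so any iterable of annotations works.
        n = sum(1 for _ in img_anno.twod_annos)
        counts[img_anno.img_path] = counts.get(img_anno.img_path, 0) + n
    return counts


# Stand-ins for ImageAnno objects; inside a script use self.inp.img_annos.
example_annos = [
    SimpleNamespace(img_path='a.jpg',
                    twod_annos=[SimpleNamespace(data={}), SimpleNamespace(data={})]),
    SimpleNamespace(img_path='b.jpg', twod_annos=[]),
]
print(count_twod_annos(example_annos))  # {'a.jpg': 2, 'b.jpg': 0}
```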

Contexts to Store Files

There are three different contexts that can be used to store files that should be handled by your script. Each context is modeled as a specific folder in the LOST filesystem. In order to get the path to a context, call self.get_path. Listing 6 shows how self.get_path is used to get the path to the instance context.

Listing 6: Create a parquet and a csv file and store both files in the instance context.
from lost.pyapi import script
import os

ENVS = ['lost']
ARGUMENTS = {'file_name_parquet' : { 'value':'annos.parquet',
                            'help': 'Name of the file with exported bbox annotations in parquet format.'},
            'file_name_csv' : { 'value':'annos.csv',
                            'help': 'Name of the file with exported bbox annotations in csv format.'},
            }

class LostScript(script.Script):
    '''This Script creates a parquet and a csv file from image annotations and adds
    a data_export for each file to the output of this script in the pipeline.
    '''
    def main(self):
        df = self.inp.to_df()
        fs = self.get_filesystem()
        
        file_path_parquet = self.get_path(self.get_arg('file_name_parquet'), context='instance')
        file_path_csv = self.get_path(self.get_arg('file_name_csv'), context='instance')
        self.logger.info('File path parquet: {}'.format(file_path_parquet))
        self.logger.info('File path csv: {}'.format(file_path_csv))
        
        with fs.open(file_path_parquet, 'wb') as f:
            df.to_parquet(f)
        
        with fs.open(file_path_csv, 'wb') as f:
            df.to_csv(f,sep=',',
                      header=True,
                      index=False)
        
        self.logger.info('Wrote file: {}'.format(fs.ls(os.path.split(file_path_parquet)[0])))
        self.outp.add_data_export(file_path=file_path_parquet, fs=fs)
    
        self.logger.info('Wrote file: {}'.format(fs.ls(os.path.split(file_path_csv)[0])))
        self.outp.add_data_export(file_path=file_path_csv, fs=fs)

if __name__ == "__main__":
    my_script = LostScript()

There are three types of contexts that can be accessed: instance, pipe and static.

The instance context is only accessible by the current instance of your script. Each time a pipeline is started each script will get its own instance folder in the LOST filesystem. No other script in the same pipeline will access this folder.

If you want to exchange files among the script instances of a started pipeline, you can choose the pipe context. When calling self.get_path with context = ‘pipe’ you will get a path to a folder that is available to all script instances of a pipeline instance.

The static context is a path to the pipeline project folder, to which all script instances have access. In this way you can access files that you have provided inside the Pipeline Project. For example, if you want to load a pretrained machine learning model inside your script, you can put it into the pipeline project folder and access it via the static context:

Listing 7: Getting the path to the static context.
path_to_model = self.get_path('pretrained_model.md5', context='static')
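The scoping rules of the three contexts can be summarised in a toy resolver. The folder layout below (instance/<element_id>, pipe/<pipe_id>, project/<project>) and the function resolve_context_path are invented for illustration and are not what LOST uses; only the scoping matches the description above: instance paths differ per script instance, pipe paths are shared within one started pipeline, and static paths are shared by every instance of the pipeline project. In a real script always call self.get_path.

```python
import os


def resolve_context_path(root, file_name, context, pipe_id, element_id, project):
    """Toy model of context scoping; the directory layout is hypothetical."""
    if context == 'instance':
        # One folder per script instance.
        base = os.path.join(root, 'instance', str(element_id))
    elif context == 'pipe':
        # One folder per started pipeline, shared by its script instances.
        base = os.path.join(root, 'pipe', str(pipe_id))
    elif context == 'static':
        # The pipeline project folder, shared by all pipeline instances.
        base = os.path.join(root, 'project', project)
    else:
        raise ValueError('unknown context: {}'.format(context))
    return os.path.join(base, file_name)
```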

Logging

Each script has its own logger. This logger is an instance of the standard Python logger. The example below shows how to log an info message, a warning and an error. All logs are redirected to a pipeline log file that can be downloaded via the pipeline view inside the web gui.

Listing 8: Logging examples.
self.logger.info('I am an info message')
self.logger.warning('I am a warning')
self.logger.error('An error occurred!')
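Inside LOST the redirection of self.logger to the pipeline log file happens automatically, so a script does not need to configure logging. If you want the same behaviour when testing helper code outside of LOST, you can attach a logging.FileHandler to a standard logger. A minimal sketch using only the Python standard library; the function name make_file_logger, the log file path and the format string are arbitrary choices, not what LOST uses.

```python
import logging


def make_file_logger(name, log_path):
    """Create a logger that writes INFO and above to `log_path`."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(logging.Formatter('%(asctime)s %(levelname)s %(message)s'))
    # Note: calling this twice with the same name adds a second handler.
    logger.addHandler(handler)
    return logger
```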

Script Errors and Exceptions

If an error occurs in your script, the traceback of the exception will be visible in the web gui, when clicking on the respective script in your pipeline. The error will also be automatically logged to the pipeline log file.
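If the traceback alone is not enough to understand a failure, you can log additional context before the exception propagates. The standard logger.exception method logs a message at ERROR level together with the current traceback, and re-raising lets the error reach LOST as described above. A minimal sketch with the standard logging module; load_annotation_file is a hypothetical helper, and inside a script you would use self.logger instead of a module-level logger.

```python
import json
import logging

logger = logging.getLogger(__name__)


def load_annotation_file(path):
    """Load a JSON file, logging the offending path before re-raising any error."""
    try:
        with open(path) as f:
            return json.load(f)
    except (OSError, json.JSONDecodeError):
        # Log the path and the traceback, then let the error propagate.
        logger.exception('Could not load annotation file %s', path)
        raise
```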

Script ARGUMENTS

The ARGUMENTS variable is used to provide script arguments that can be set within the web gui when a pipeline is started. ARGUMENTS are defined as a dictionary of dictionaries. Each argument dictionary has the keys value and help. As you can see in the listing below, the first argument is called my_arg, its value is true and its help text is A boolean argument.

Listing 9: Defining arguments.
ARGUMENTS = {'my_arg' : { 'value':'true',
                'help': 'A boolean argument.'}
            }

Within your script you can access the value of an argument with the get_arg(...) method as shown below.

Listing 10: Accessing argument values.
if self.get_arg('my_arg').lower() == 'true':
    self.logger.info('my_arg was true')
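In Listing 9 the value is given as the string 'true', which is why Listing 10 compares against a string. If a script reads several boolean arguments, a small conversion helper keeps main tidy. A minimal sketch; arg_to_bool and the accepted spellings are hypothetical choices, not part of the LOST PyAPI. Inside a script you would write if arg_to_bool(self.get_arg('my_arg')): ...

```python
def arg_to_bool(value):
    """Convert a textual argument value such as 'true' or ' False ' to a bool."""
    text = str(value).strip().lower()
    if text in ('true', '1', 'yes'):
        return True
    if text in ('false', '0', 'no'):
        return False
    # Fail loudly instead of silently treating a typo as False.
    raise ValueError('not a boolean argument value: {!r}'.format(value))
```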

Script ENVS

The ENVS variable provides meta information for the pipeline engine by defining a list of environments (similar to conda environments) in which this script may be executed. In this way you can ensure that a script will only be executed in environments where all of its dependencies are installed. Environments are installed on the workers that may execute your script. If multiple environments are defined within the ENVS list of a script, the pipeline engine will try to assign the script to a worker in the same order as defined within the ENVS list. So if a worker that has the first environment in the list installed is online, the pipeline engine will assign the script to this worker. If no worker with the first environment is online, it will try to assign the script to a worker with the second environment in the list, and so on. Listing 11 shows an example of the ENVS definition in a script that may be executed in two different environments.

Listing 11: ENVS definition inside a script.
ENVS = ['lost', 'lost-cv']
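The assignment order described above can be written down as a small selection function. This is a toy model of the documented behaviour, not the pipeline engine's code; the function pick_worker and the representation of online workers as a mapping from worker name to the set of installed environments are hypothetical.

```python
def pick_worker(envs, online_workers):
    """Return (worker, env) for the first env in `envs` that an online worker provides.

    `online_workers` maps a worker name to the set of environments installed on it.
    Returns None if no online worker provides any of the requested environments.
    """
    for env in envs:
        # Workers are checked in sorted order only to make this sketch deterministic.
        for worker, installed in sorted(online_workers.items()):
            if env in installed:
                return worker, env
    return None
```

With ENVS = ['lost', 'lost-cv'], a worker offering only lost-cv is chosen only when no online worker offers lost.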

Script RESOURCES

Sometimes a script will require all resources of a worker, and therefore no other script should be executed in parallel by the worker that executes your script. This is often the case if you train an AI model and need all of the GPU memory to do so. In those cases, you can define a RESOURCES variable inside your Python script and assign a list containing the string lock_all to it. See the listing below for an example:

Listing 12: RESOURCES definition inside a script.
RESOURCES = ['lock_all']
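The effect of lock_all can be expressed as a simple admission check. This is a toy model of the rule described above, not the engine's implementation: a worker that runs a lock_all script accepts nothing else, and a lock_all script is only started on a worker with nothing running. The function name can_start and the representation of running scripts by their RESOURCES lists are hypothetical.

```python
def can_start(running_resources, new_resources):
    """Decide whether a worker may start a script, given RESOURCES lists.

    running_resources: RESOURCES lists of the scripts already running on the worker.
    new_resources: RESOURCES list of the script that should be started.
    """
    if any('lock_all' in res for res in running_resources):
        return False  # a running script has locked the whole worker
    if 'lock_all' in new_resources:
        return len(running_resources) == 0  # a lock_all script needs an idle worker
    return True
```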

Debugging a Script

Most likely, when you import your pipeline and run it for the first time, some scripts will not work, since you placed some tiny bug into your code :-)

Inside the web GUI all exceptions and errors of your script will be visualized when clicking on the respective script element in the pipeline visualization. In this way you get a first hint of what’s wrong.

In order to debug your code you need to log in to the docker container and find the instance folder that is created for each script instance. You will find your script by its unique pipeline element id: the path to the script instance folder will be /home/lost/app/debug/i-<pipe_element_id>. Inside this folder there is a bash script called debug.sh that needs to be executed in order to start the pudb debugger.

# Log in to docker
docker exec -it lost bash
# Change directory to the instance path of your script
cd /home/lost/app/debug/i-<pipe_element_id>
# Start debugging
bash debug.sh

Note

If your script requires a special ENV to be executed, you need to log in to a container that has this environment installed for debugging.