All About Scripts¶
What is a Script?¶
Scripts are specific elements that are part of a LOST annotation pipeline. A script element is implemented as a Python 3 module. The listing below shows an example of such a script. This script will request image annotations for all images of a dataset.
from lost.pyapi import script
import os

ENVS = ['lost']

class AnnoAllImgs(script.Script):
    '''This Script requests image annotations for each image of an imageset.

    An imageset is basically a folder with images.
    '''
    def main(self):
        self.logger.info("Request image annotations for:")
        for ds in self.inp.datasources:
            media_path = ds.path
            for img_file in os.listdir(media_path):
                img_path = os.path.join(media_path, img_file)
                self.outp.request_image_anno(img_path=img_path)
                self.logger.debug(img_path)

if __name__ == "__main__":
    my_script = AnnoAllImgs()
In order to implement a script you need to create a Python class that inherits from lost.pyapi.script.Script. Your class needs to implement a main method and needs to be instantiated within your Python script. The listing below shows a minimal example of a script.
from lost.pyapi import script

class MyScript(script.Script):
    def main(self):
        self.logger.info('Hello World!')

if __name__ == "__main__":
    MyScript()
Example Scripts¶
More script examples can be found here: lost/backend/lost/pyapi/examples/pipes
The LOST PyAPI Script Model¶
Like all pipeline elements, a script has an input and an output object. Via these objects it is connected to other elements in a pipeline (see also Pipeline Definition Files).
Inside a script you can exchange information with the connected elements by using the self.inp object and the self.outp object.
Reading Imagesets¶
It is a common pattern to read a path to an imageset from a Datasource element in your annotation pipeline. See the listing below for a code example. Since multiple Datasources could be connected to our script, we iterate over all connected Datasources of the input with self.inp.datasources. For each Datasource element we can read the path attribute to get the filesystem path to a folder with images.
from lost.pyapi import script
import os

class MyScript(script.Script):
    def main(self):
        for ds in self.inp.datasources:
            for img_file in os.listdir(ds.path):
                # Full path to one file inside the Datasource folder
                img_path = os.path.join(ds.path, img_file)

if __name__ == "__main__":
    MyScript()
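Note that os.listdir returns every entry of a folder, including subfolders and files that are not images. The following sketch filters a Datasource folder down to image files before further processing. The helper name list_image_files and the set of extensions are assumptions for illustration and not part of the LOST PyAPI.

```python
import os

# Assumed set of image extensions; adjust it to your data.
IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')

def list_image_files(folder, extensions=IMG_EXTENSIONS):
    '''Return sorted full paths of all image files directly inside folder.

    extensions must be a tuple of lowercase file endings.
    '''
    img_paths = []
    for file_name in sorted(os.listdir(folder)):
        full_path = os.path.join(folder, file_name)
        # Skip subfolders and other entries that are not regular files.
        if not os.path.isfile(full_path):
            continue
        # Compare case-insensitively so that 'IMG.PNG' is also accepted.
        if file_name.lower().endswith(extensions):
            img_paths.append(full_path)
    return img_paths
```

Inside main you could then call list_image_files(ds.path) for each ds in self.inp.datasources.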
Requesting Annotations¶
The most important feature of the LOST PyAPI is the ability to request annotations for a connected AnnotationTask element. Inside a Script you can access the output element and call the self.outp.request_annos method, as shown in the listing below.
self.outp.request_annos(img_path)
Sometimes you also want to send annotation proposals to an AnnotationTask in order to support your annotator. In most cases these proposals will be generated by an AI, like an object detector. The listing below shows a simple example to send a dummy box and a dummy point to an annotation tool.
self.outp.request_annos(img_path,
                        annos=[[0.1, 0.1, 0.2, 0.2], [0.1, 0.2]],
                        anno_types=['bbox', 'point'])
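Object detectors often return boxes in pixel corner coordinates, while the dummy values above look like relative coordinates. The sketch below converts a pixel box into relative values. It assumes that a bbox is given as [xc, yc, w, h] (box center, width and height, each relative to the image size); this is not stated on this page, so check it against the PyAPI reference of your LOST version. The helper name to_relative_bbox is hypothetical.

```python
def to_relative_bbox(x_min, y_min, x_max, y_max, img_w, img_h):
    '''Convert pixel corner coordinates to a relative [xc, yc, w, h] box.

    Assumption: the annotation tool expects the box center and size,
    each divided by the image width or height.
    '''
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    xc = (x_min + x_max) / (2.0 * img_w)
    yc = (y_min + y_max) / (2.0 * img_h)
    return [xc, yc, w, h]

# A 100x50 pixel box inside a 400x200 pixel image.
box = to_relative_bbox(100, 50, 200, 100, img_w=400, img_h=200)
```

Under the stated assumption, the resulting list could be passed as one entry of annos together with 'bbox' in anno_types.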
Annotation Broadcasting¶
If multiple AnnoTask elements are connected to your ScriptOutput and you call self.outp.request_annos, the annotation request will be broadcast to all connected AnnoTasks. So each AnnoTask will get its own copy of your annotation request. Technically, for each annotation request an empty ImageAnno will be created for each AnnoTask. During the annotation process this ImageAnno will be filled with information.
Reading Annotations¶
Another important task is to read annotations from previous pipeline elements. In most cases these will be AnnoTask elements. If you want to read all annotations at the script input in a vectorized way, you can use self.inp.to_df() to get a pandas DataFrame or self.inp.to_vec() to get a list of lists. If you prefer to iterate over all ImageAnnos, you can use the respective iterator self.inp.img_annos. See the listing below for an example.
for img_anno in self.inp.img_annos:
    for twod_anno in img_anno.twod_annos:
        self.logger.info('image path: {}, 2d_anno_data: {}'.format(img_anno.img_path, twod_anno.data))
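The same iteration pattern can be used to collect annotation data per image. The sketch below uses only the attributes shown above (img_path, twod_annos, data). The function name collect_anno_data is hypothetical, and the SimpleNamespace objects are stand-ins so that the example runs outside of LOST.

```python
from collections import defaultdict
from types import SimpleNamespace

def collect_anno_data(img_annos):
    '''Map each image path to the list of its 2D annotation data.

    Images without any 2D annotation do not appear in the result.
    '''
    annos_per_img = defaultdict(list)
    for img_anno in img_annos:
        for twod_anno in img_anno.twod_annos:
            annos_per_img[img_anno.img_path].append(twod_anno.data)
    return dict(annos_per_img)

# Stand-in objects that mimic the attributes used above.
# In a script you would pass self.inp.img_annos instead.
fake_img_annos = [
    SimpleNamespace(img_path='a.jpg',
                    twod_annos=[SimpleNamespace(data=[0.1, 0.2])]),
    SimpleNamespace(img_path='b.jpg', twod_annos=[]),
]
result = collect_anno_data(fake_img_annos)
```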
Contexts to Store Files¶
There are three different contexts that can be used to store files that should be handled by your script. Each context is modeled as a specific folder in the LOST filesystem. In order to get the path to a context, call self.get_path. The listing below shows how to use self.get_path to get the path to the instance context.
from lost.pyapi import script
import os
import pandas as pd

ENVS = ['lost']
ARGUMENTS = {'file_name': {'value': 'annos.csv',
                           'help': 'Name of the file with exported bbox annotations.'}
            }

class ExportCsv(script.Script):
    '''This Script creates a csv file from image annotations and adds a data_export
    to the output of this script in pipeline.
    '''
    def main(self):
        df = self.inp.to_df()
        csv_path = self.get_path(self.get_arg('file_name'), context='instance')
        df.to_csv(path_or_buf=csv_path,
                  sep=',',
                  header=True,
                  index=False)
        self.outp.add_data_export(file_path=csv_path)

if __name__ == "__main__":
    my_script = ExportCsv()
There are three types of contexts that can be accessed: instance, pipe, static.
The instance context is only accessible by the current instance of your script. Each time a pipeline is started each script will get its own instance folder in the LOST filesystem. No other script in the same pipeline will access this folder.
If you want to exchange files among the script instances of a started pipeline, you can choose the pipe context. When calling self.get_path with context='pipe' you will get a path to a folder that is available to all script instances of a pipeline instance.
The static context is a path to the pipeline project folder, to which all script instances have access. In this way you can access files that you have provided inside the Pipeline Project. For example, if you want to load a pretrained machine learning model inside your script, you can put it into the pipeline project folder and access it via the static context:
path_to_model = self.get_path('pretrained_model.md5', context='static')
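As an illustration of the pipe context, the sketch below exchanges a small JSON file between two scripts of the same pipeline instance. It relies only on the get_path(file_name, context=...) call shown above. The helper names save_state and load_state and the file name state.json are hypothetical.

```python
import json

def save_state(get_path, state, file_name='state.json'):
    '''Write state as JSON into the pipe context and return the file path.

    get_path is expected to be the get_path method of a script (self.get_path).
    '''
    path = get_path(file_name, context='pipe')
    with open(path, 'w') as f:
        json.dump(state, f)
    return path

def load_state(get_path, file_name='state.json'):
    '''Read the JSON file written by save_state from the pipe context.'''
    path = get_path(file_name, context='pipe')
    with open(path) as f:
        return json.load(f)
```

One script could call save_state(self.get_path, {'iteration': 1}) and a later script of the same pipeline could call load_state(self.get_path) to read the value back.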
Logging¶
Each Script will have its own logger. This logger is an instance of the standard Python logger. The example below shows how to log an info message, a warning and an error. All logs are redirected to a pipeline log file that can be downloaded via the pipeline view inside the web gui.
self.logger.info('I am an info message')
self.logger.warning('I am a warning')
self.logger.error('An error occurred!')
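Since the logger is a standard Python logger, the usual features of the logging module should be available, such as deferred message formatting and logging a traceback with exception. The sketch below demonstrates both on a stand-in logger that writes to an in-memory stream, so that it runs outside of LOST. The logger name and the output format are assumptions for illustration.

```python
import io
import logging

# Stand-in for self.logger so the example runs outside of LOST.
stream = io.StringIO()
logger = logging.getLogger('my_script_demo')
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s: %(message)s'))
logger.addHandler(handler)

# The arguments are only formatted if the message is actually emitted.
logger.info('Processed %d images', 3)

try:
    1 / 0
except ZeroDivisionError:
    # logger.exception logs at ERROR level and appends the traceback.
    logger.exception('Division failed')

log_text = stream.getvalue()
```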
Script Errors and Exceptions¶
If an error occurs in your script, the traceback of the exception will be visible in the web gui, when clicking on the respective script in your pipeline. The error will also be automatically logged to the pipeline log file.
Script ARGUMENTS¶
The ARGUMENTS variable is used to provide script arguments that can be set when a pipeline is started within the web gui. ARGUMENTS are defined as a dictionary of dictionaries. Each argument dictionary has the keys value and help. In the listing below, the first argument is called my_arg, its value is true, and its help text is A boolean argument.
ARGUMENTS = {'my_arg': {'value': 'true',
                        'help': 'A boolean argument.'}
            }
Within your script you can access the value of an argument with the get_arg(...) method as shown below.
if self.get_arg('my_arg').lower() == 'true':
    self.logger.info('my_arg was true')
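The check above compares the argument value as a string. Assuming that an argument defined as a string in ARGUMENTS also arrives as a string, a small helper can make such checks more tolerant of different spellings. The helper arg_to_bool and the accepted spellings are hypothetical and not part of the LOST PyAPI.

```python
def arg_to_bool(value):
    '''Interpret an argument value such as 'true', 'True' or '1' as a bool.'''
    if isinstance(value, bool):
        return value
    # Normalize whitespace and case before comparing.
    text = str(value).strip().lower()
    if text in ('true', '1', 'yes'):
        return True
    if text in ('false', '0', 'no', ''):
        return False
    raise ValueError('Cannot interpret {!r} as a boolean'.format(value))
```

Inside a script this could be used as arg_to_bool(self.get_arg('my_arg')).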
Script ENVS¶
The ENVS variable provides meta information for the pipeline engine by defining a list of environments (similar to conda environments) in which this script may be executed. In this way you can ensure that a script will only be executed in environments where all your dependencies are installed. All environments are installed in workers that may execute your script. If several environments are defined within the ENVS list of a script, the pipeline engine will try to assign the script to a worker in the order defined within the ENVS list. So if a worker is online that has the first environment in the list installed, the pipeline engine will assign the script to this worker. If no worker with the first environment is online, it will try to assign the script to a worker with the second environment in the list, and so on. The listing below shows an example of the ENVS definition in a script that may be executed in two different environments.
ENVS = ['lost', 'lost-cv']
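The assignment order described above can be pictured with a toy model. The sketch below is not the code of the LOST pipeline engine; the function pick_env, the worker names and the dictionary layout are assumptions made only to illustrate that the position in the ENVS list decides which environment is preferred.

```python
def pick_env(script_envs, online_worker_envs):
    '''Return the first environment in script_envs that an online worker offers.

    script_envs: ordered list, as in the ENVS variable of a script.
    online_worker_envs: dict mapping a worker name to its installed environments.
    Returns a tuple (env, worker_name), or None if no worker matches.
    '''
    for env in script_envs:
        for worker, envs in online_worker_envs.items():
            if env in envs:
                return env, worker
    return None

# Both workers offer 'lost-cv', but only worker-b offers the preferred 'lost'.
workers = {'worker-a': ['lost-cv'], 'worker-b': ['lost-cv', 'lost']}
choice = pick_env(['lost', 'lost-cv'], workers)
```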
Script RESOURCES¶
Sometimes a script will require all resources of a worker, so no other script should be executed in parallel by the worker that executes your script. This is often the case if you train an AI model and need all GPU memory to do so. In those cases, you can define a RESOURCES variable inside your Python script and assign a list containing the string lock_all to it. See the listing below for an example:
RESOURCES = ['lock_all']
Debugging a Script¶
Most likely, when you import your pipeline and run it for the first time, some scripts will not work because a tiny bug has slipped into your code :-)
Inside the web GUI all exceptions and errors of your script will be visualized when clicking on the respective script element in the pipeline visualization. In this way you get a first hint what’s wrong.
In order to debug your code you need to log in to the docker container and find the instance folder that is created for each script instance. Inside this folder there is a bash script called debug.sh that needs to be executed in order to start the pudb debugger. You will find your script by its unique pipeline element id. The path to the script instance folder will be /home/lost/data/instance/i-<pipe_element_id>.
# Log in to docker
docker exec -it lost bash
# Change directory to the instance path of your script
cd /home/lost/data/instance/i-<pipe_element_id>
# Start debugging
bash debug.sh
Note
If your script requires a special ENV to be executed, you need to log in to a container that has this environment installed for debugging.