Getting started with `create-lilypad-module`
Use create-lilypad-module to create Lilypad modules
create-lilypad-module is an officially supported package that provides a simple scaffolding for building Lilypad modules. It offers a modern Docker setup with minimal configuration and useful pre-written scripts.
A Lilypad module is a Git repository that allows you to perform various tasks using predefined templates and inputs.
Getting Started: install and run create-lilypad-module
Folder Structure: output and explanation of create-lilypad-module files
Configuration: requirements and explanations of Lilypad module configuration
Creating Your Module: a step-by-step guide on how to create a simple Lilypad module using create-lilypad-module
Configure your Lilypad module
After bootstrapping your module, additional configuration is required to run it.
.env
WEB3_PRIVATE_KEY
🚨 DO NOT SHARE THIS KEY 🚨
The private key for the wallet that will be used to run the job.
This is required to run the module on Lilypad Network.
Using a new wallet dedicated to development is highly recommended. The wallet must have enough LP tokens and Arbitrum Sepolia ETH to fund the job.
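As an illustration of how a script might read this key without exposing it, the sketch below parses a .env file using only the standard library. The helper names load_env_file and require_private_key are hypothetical and are not part of create-lilypad-module; the generated scripts may load the key differently.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=VALUE lines from a .env file into a dict (hypothetical helper)."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text().splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_private_key(env_values):
    """Return the key from the parsed file or the process environment, or fail."""
    key = env_values.get("WEB3_PRIVATE_KEY") or os.environ.get("WEB3_PRIVATE_KEY")
    if not key:
        raise RuntimeError("WEB3_PRIVATE_KEY is not set; add it to your .env file.")
    return key  # Never print or log this value.
```

Keeping the lookup in one function makes it easier to confirm that the key is never echoed to the console or written to a log.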
config/constants.py
DOCKER_REPO
The Docker Hub repository storing the container image of the module code.
This is required to push the image to Docker Hub and run the module on Lilypad Network.
e.g. "<dockerhub_username>/<dockerhub_image>"
DOCKER_TAG
The specific tag of the DOCKER_REPO containing the module code.
Default: "latest"
MODULE_REPO
The URL for the GitHub repository storing the lilypad_module.json.tmpl file. The visibility of the repository must be public.
The lilypad_module.json.tmpl file points to a DOCKER_REPO and Lilypad runs the module from the image.
e.g. "github.com/<github_username>/<github_repo>"
TARGET_COMMIT
The git branch or commit hash that contains the lilypad_module.json.tmpl file you want to run.
Use git log to easily find commit hashes.
Default: "main"
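Taken together, the four settings described above could look like the following sketch of config/constants.py. The placeholder values come from the examples and defaults listed above; the image_reference helper is hypothetical and only shows how the repository and tag combine into a full image name.

```python
# config/constants.py (illustrative sketch; replace the placeholders with your own values)

# Docker Hub repository that stores the module's container image.
DOCKER_REPO = "<dockerhub_username>/<dockerhub_image>"
# Tag of DOCKER_REPO that contains the module code.
DOCKER_TAG = "latest"
# Public GitHub repository that stores lilypad_module.json.tmpl.
MODULE_REPO = "github.com/<github_username>/<github_repo>"
# Branch or commit hash that contains the lilypad_module.json.tmpl to run.
TARGET_COMMIT = "main"


def image_reference(repo=DOCKER_REPO, tag=DOCKER_TAG):
    """Hypothetical helper (not part of the generated project): repo and tag as one string."""
    return f"{repo}:{tag}"
```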
Your module will be bootstrapped with some handy scripts to help you download the model(s) for your module, build and push Docker images, and run your module locally or on Lilypad Network. Some additional configuration may be required.
In the project directory, you can run:
python -m scripts.download_models
A basic outline for downloading a model from Hugging Face is provided, but the structure of the script and the methods for downloading a model can differ between models and libraries. It’s important to tailor the process to the specific requirements of the model you're working with.
Most (but not all) models that utilize machine learning use the 🤗 Transformers library, which provides APIs and tools to easily download and train pretrained models.
No matter which model you are using, be sure to thoroughly read the documentation to learn how to properly download and use the model locally.
python -m scripts.docker_build
Builds and optionally publishes a Docker image for the module to use.
For most use cases, this script should be sufficient and won't require any configuration or modification (aside from setting your DOCKER_REPO and DOCKER_TAG).
In the module's Dockerfile, you'll find 3 COPY instructions.
These instructions copy the requirements.txt file, the src directory, and the models directory from your local machine into the Docker image. Any change to these files or directories requires a rebuild of the module's Docker image before the change is reflected in the container.
--push Flag
Running the script with --push pushes the Docker image to Docker Hub.
--no-cache Flag
Running the script with --no-cache builds the Docker image without using the cache. This flag is useful if you need a fresh build to debug caching issues, force system or dependency updates, pull the latest base image, or ensure clean builds in CI/CD pipelines.
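To make the effect of the two flags concrete, here is a minimal sketch of how a build script could translate them into Docker commands. It is an assumption about the general shape of such a script, not the contents of scripts/docker_build.py; the function names parse_args and build_commands are hypothetical, and the real script may invoke Docker differently.

```python
import argparse


def parse_args(argv=None):
    """Parse the two optional flags described above."""
    parser = argparse.ArgumentParser(description="Build the module's Docker image.")
    parser.add_argument("--push", action="store_true", help="Push the image to Docker Hub.")
    parser.add_argument("--no-cache", action="store_true", help="Build without the cache.")
    return parser.parse_args(argv)


def build_commands(repo, tag, push=False, no_cache=False):
    """Return the docker commands to run, each as an argument list."""
    image = f"{repo}:{tag}"
    build = ["docker", "build", "-t", image]
    if no_cache:
        build.append("--no-cache")
    build.append(".")  # Build context: the project directory.
    commands = [build]
    if push:
        commands.append(["docker", "push", image])
    return commands
```

Returning argument lists instead of running the commands directly keeps the logic easy to inspect; each list could then be passed to subprocess.run.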
python -m scripts.run_module
This script is provided for convenience to speed up development. It is equivalent to running the Lilypad module with the provided input and private key (no private key is required when running the module locally). Depending on how your module works, you may need to change the default behavior of this script.
--local Flag
Running the script with --local runs the Lilypad module Docker image locally instead of on Lilypad Network.
--demonet Flag
Running the script with --demonet runs the Lilypad module Docker image on Lilypad's Demonet.
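The choice between the run targets can be sketched as below. This is an illustration only: treating the two flags as mutually exclusive, the function names, and the "default-network" label for a run without flags are assumptions, not behavior confirmed for scripts/run_module.py.

```python
import argparse


def parse_args(argv=None):
    """Parse the run target flags described above."""
    parser = argparse.ArgumentParser(description="Run the Lilypad module.")
    # Assumption: a run targets one environment at a time.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--local", action="store_true", help="Run the Docker image locally.")
    group.add_argument("--demonet", action="store_true", help="Run on Lilypad's Demonet.")
    return parser.parse_args(argv)


def select_target(args):
    """Map the parsed flags to a target name (labels are hypothetical)."""
    if args.local:
        return "local"
    if args.demonet:
        return "demonet"
    return "default-network"
```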
lilypad_module.json.tmpl
The default lilypad_module.json.tmpl file is below. Make sure to update the Docker Image to point to your Docker Hub image with the correct tag.
The default lilypad_module.json.tmpl should work for low complexity modules. If your module requires additional resources (such as a GPU) make sure to configure the applicable fields.
Machine: Specifies the system resources.
Job: Specifies the job details.
APIVersion: Specifies the API version for the job.
Metadata: Specifies the metadata for the job.
Spec: Contains the detailed job specifications.
Deal: Sets the concurrency to 1, ensuring only one job instance runs at a time.
Docker: Configures the Docker container for the job.
WorkingDirectory: Defines the working directory of the Docker image.
Entrypoint: Defines the command(s) to be executed in the container as part of its initial startup runtime.
EnvironmentVariables: Sets environment variables for the container's runtime. The default template uses Go templating to set the INPUT variable dynamically from the CLI.
Image: Specifies the image to be used (DOCKERHUB_USERNAME/IMAGE:TAG).
Engine: Sets the container runtime (Default: "Docker").
Network: Specifies that the container does not require networking (Default: "Type": "None").
Outputs: Specifies name and path of the directory that will store module outputs.
Resources: Specify additional resources.
Timeout: Sets the maximum duration for the job (Default: 600 [10 minutes]).
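The way these fields nest can be sketched as a Python dictionary that mirrors the JSON layout. Only the values stated in the list above (concurrency of 1, the "Docker" engine, the "None" network type, and the 600 timeout) come from this document. Everything in angle brackets is a placeholder, and the exact key casing, the "Concurrency" key name, the empty "machine" and "Resources" objects, and the "Outputs" entry are assumptions to check against your generated lilypad_module.json.tmpl.

```python
import json

# Illustrative structure only; compare with your generated template before relying on it.
template = {
    "machine": {},  # System resources; fill in according to your module's needs.
    "job": {
        "APIVersion": "<api_version>",
        "Metadata": {},
        "Spec": {
            "Deal": {"Concurrency": 1},  # One job instance at a time.
            "Docker": {
                "WorkingDirectory": "<working_directory>",
                "Entrypoint": ["<command>", "<path_to>/src/run_inference.py"],
                # The real template sets INPUT with a Go template expression.
                "EnvironmentVariables": ["INPUT=<go_template_expression>"],
                "Image": "<dockerhub_username>/<dockerhub_image>:<tag>",
            },
            "Engine": "Docker",
            "Network": {"Type": "None"},
            "Outputs": [{"Name": "<output_name>", "Path": "<output_path>"}],
            "Resources": {},
            "Timeout": 600,  # 10 minutes.
        },
    },
}

rendered = json.dumps(template, indent=2)
```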
The folder structure output from using `create-lilypad-module`
After creation, your project should look like this:
For the module to run, these files must exist with exact filenames:
src/run_inference.py
The Dockerfile ENTRYPOINT.
If you change this file's name or location, you must also update the ENTRYPOINT in your Dockerfile and lilypad_module.json.tmpl file to match.
config/constants.py
The configuration file that stores the DOCKER_REPO, DOCKER_TAG, MODULE_REPO, and TARGET_COMMIT.
If you change this file's name or location, you must also update the import statements in scripts/docker_build.py and scripts/run_module.py.
Dockerfile
Required to build your module into a Docker image, and push the image to Docker Hub where it can be accessed by Lilypad Network.
requirements.txt
Used by the Dockerfile to install dependencies required by your module.
Technically, this file can be deleted or renamed, but this naming convention is highly recommended as an industry standard best practice.
lilypad_module.json.tmpl
The Lilypad configuration file.
You can delete or rename the other files.
You may create subdirectories inside src. For faster builds and smaller Docker images, only files inside src are copied by Docker. You need to put any files required to run your module inside src, otherwise Docker won’t copy them.
You can create more top-level directories. They will not be included in the final Docker image so you can use them for things like documentation.
If you have Git installed and your project is not part of a larger repository, then a new repository will be initialized resulting in an additional top-level .git directory.
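A quick way to confirm that the required files listed above are still in place after reorganizing a project is a small check like the one below. This is a hypothetical helper, not something create-lilypad-module generates; the file list is taken from this section.

```python
from pathlib import Path

# Files that this guide says must exist with these exact names.
REQUIRED_FILES = [
    "src/run_inference.py",
    "config/constants.py",
    "Dockerfile",
    "requirements.txt",
    "lilypad_module.json.tmpl",
]


def missing_files(project_root="."):
    """Return the required files that are absent under project_root."""
    root = Path(project_root)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling missing_files() from the project directory returns an empty list when everything is in place.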
Create your Lilypad module
This guide will walk you through creating a basic sentiment analysis module using create-lilypad-module and distilbert/distilbert-base-uncased-finetuned-sst-2-english (which will be referred to as Distilbert from now on). We will be referring back to the Hugging Face page throughout this guide, so it's best to keep it open and accessible.
Input:
Output:
If you prefer to follow along with a video guide, you can view our live workshop below! 👇
To build and run a module on Lilypad Network, you'll need to have the Lilypad CLI, Python and Docker on your machine, as well as GitHub and Docker Hub accounts.
For this guide, we'll be using create-lilypad-module, which requires pip and uses Python.
The first thing you'll need for your module is a local model to use.
A basic outline for downloading a model from Hugging Face is provided in scripts/download_models.py. The structure of the script and the methods for downloading a model can differ between models and libraries. It’s important to tailor the process to the specific requirements of the model you're working with.
You can get started by attempting to run the download_models.py script.
Since the script hasn't been properly configured yet, it will return an error and point you to the file.
Open scripts/download_models.py and you will see some TODO comments with instructions. Let's go through each of them in order. You can remove each TODO comment after completing the task.
First we have a reminder to update our requirements.txt file, which is used by the Dockerfile to install the module's dependencies. On the next line is a commented-out import statement.
To find the dependencies that our model requires, we can refer back to Distilbert's Hugging Face page and click on the "Use this model" dropdown, where you will see the 🤗 Transformers library as an option. Click it.
Most (but not all) models that utilize machine learning use the 🤗 Transformers library, which provides APIs and tools to easily download and train pretrained models.
You should see a handy modal explaining how to use the model with the Transformers library. For most models, you'd want to use this. However, Distilbert has a specific tokenizer and model class. Close the modal and scroll to the "How to Get Started With the Model" section of the model card. We're going to use this instead.
For now, let's look at the top 2 lines of the provided code block:
Notice that torch is also being used. Copy the transformers import statement and paste it over the existing import statement in our download_models.py file.
Now open requirements.txt:
These are 2 of the most common libraries when working with models. Similar to the import statement in the download_models.py file, they are provided by default for convenience, but they are commented out because, although they are common, not every model will use them.
Since this model happens to use both of these libraries, we can uncomment both lines and close the file after saving.
torch is the PyTorch package, which provides the tensor computation and deep learning building blocks that the model runs on.
Return to the download_models.py file, and look for the next TODO comment.
If we take a look at the Distilbert Hugging Face page, we can use the copy button next to the name of the model to get the MODULE_IDENTIFIER. Paste that in as the value.
For our use case, it should look like this:
You're almost ready to download the model. All you need to do now is replace the following 2 lines after the TODO comment:
Instead of AutoTokenizer and AutoModelForSequenceClassification, use the DistilBertTokenizer and DistilBertForSequenceClassification we imported.
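Put together, the configured download step might resemble the sketch below. It is not the generated scripts/download_models.py: the function names and the layout of the models directory are assumptions, while from_pretrained and save_pretrained are the standard Transformers methods for fetching a model and writing it to disk.

```python
from pathlib import Path

MODULE_IDENTIFIER = "distilbert/distilbert-base-uncased-finetuned-sst-2-english"


def model_directory(base_dir="models", identifier=MODULE_IDENTIFIER):
    """Local folder for the model files (this layout is an assumption)."""
    return Path(base_dir) / identifier


def download_model(identifier=MODULE_IDENTIFIER, base_dir="models"):
    """Download the tokenizer and model, then save them under base_dir."""
    # Imported lazily so model_directory works without transformers installed.
    from transformers import DistilBertForSequenceClassification, DistilBertTokenizer

    target = model_directory(base_dir, identifier)
    target.mkdir(parents=True, exist_ok=True)
    # from_pretrained fetches the files; save_pretrained writes them to disk.
    tokenizer = DistilBertTokenizer.from_pretrained(identifier)
    model = DistilBertForSequenceClassification.from_pretrained(identifier)
    tokenizer.save_pretrained(target)
    model.save_pretrained(target)
    return target
```

Calling download_model() requires network access and the transformers and torch packages from requirements.txt.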
The script is now configured! Try running the command again.
The models directory should now appear in your project. 🎉
No matter which model you are using, be sure to thoroughly read the model's documentation to learn how to properly download and use the model locally.
Now for the fun part, it's time to start using the model!
This time we'll get started by running the run_module script.
You should see an error with some instructions.
Let's tackle the run_inference.py script first. This is where your module's primary logic and functionality should live. There is a TODO comment near the top of the file.
We've already updated the requirements.txt file, so we can skip that step. Go ahead and uncomment the import statements and replace the transformers line with one that imports DistilBertTokenizer and DistilBertForSequenceClassification.
We should refer back to the "How to Get Started With the Model" section of Distilbert's model card to figure out how to use the model.
Let's implement this in our run_inference script. Scroll down to the main() function and you'll see another TODO comment.
Same as before, uncomment and replace AutoTokenizer with DistilBertTokenizer and AutoModelForSeq2SeqLM with DistilBertForSequenceClassification. This is now functionally identical to the first 2 lines of code from Distilbert's example.
Below that, the tokenizer and model are passed into the run_job() function. Let's scroll back up and take a look at the function. This is where we'll want to implement the rest of the code from Distilbert's example. The inputs are already functionally identical, so let's adjust the output.
From the Distilbert model card, copy all of the code below the inputs variable declaration, and paste it over the output variable declaration in your module's code.
All we need to do from here is set the output to the last line we pasted.
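After these changes, the inference step could look roughly like the sketch below. The run_job signature and the label_from_logits helper are assumptions made for illustration; the generated function may take different arguments. The core idea matches the model card's example: run the model without gradients, take the index of the largest logit, and look that index up in the model's id2label mapping.

```python
def label_from_logits(logits, id2label):
    """Return the label whose logit is largest (plain-Python argmax)."""
    predicted_class_id = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[predicted_class_id]


def run_job(input_text, tokenizer, model):
    """Classify input_text and return the predicted sentiment label."""
    # Imported lazily so label_from_logits can be tried without torch installed.
    import torch

    inputs = tokenizer(input_text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    # logits has shape (1, num_labels); take the single row as a Python list.
    output = label_from_logits(logits[0].tolist(), model.config.id2label)
    return output
```

The helper is separated out only so that the label lookup can be checked in isolation; inlining the argmax as the model card does is equally valid.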
That's everything we'll need for the module's source code!
We still need to finish step 2 that the error in the console gave us earlier. Open the run_module.py script.
Find the TODO comment and delete the code block underneath.
Before you are able to run your module, we need to build the Docker image. You can run the following command:
You should see the following response in the console:
Open the constants.py file. It should look like this:
For now, we'll be testing the module locally, so all we need to worry about is the DOCKER_REPO variable. We'll use MODULE_REPO when it's time to run the module on Lilypad Network. For help or more information, view the configuration documentation.
You should be able to successfully build the Docker image now.
In the module's Dockerfile, you'll find 3 COPY instructions.
These instructions bring the requirements.txt file, the src directory, and the models directory into the Docker image. Any change to these files or directories requires a rebuild of the module's Docker image before the change is reflected in the container.
It's finally time to see your module in action.
Let's start by running it locally.
The CLI should ask you for an input. Enter whatever you like and hit enter. The module will analyze the sentiment of your input and write the results to outputs/result.json.
You just used a local LLM! 🎉
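For reference, writing a result to outputs/result.json can be sketched as follows. The write_result name and the shape of the result dictionary are hypothetical; the generated run_inference.py may structure its output differently.

```python
import json
from pathlib import Path


def write_result(result, output_dir="outputs"):
    """Write result as JSON to <output_dir>/result.json and return the path."""
    out_path = Path(output_dir) / "result.json"
    # Create the outputs directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(result, indent=2))
    return out_path
```

For example, write_result({"input": "I love this!", "sentiment": "POSITIVE"}) would create outputs/result.json in the current directory.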
Before you can run the module on Lilypad Network, you'll need to push the Docker image to Docker Hub.
While the Docker image is being built and pushed, you should configure the rest of the variables in constants.py. Make sure that you push your code to a public GitHub repository.
Since these variables are only used in scripts and not in any src code that gets used in the Docker image, we won't need to rebuild after making these changes.
The last thing we'll need to do is edit the Lilypad module configuration file, lilypad_module.json.tmpl. For the purposes of this module, the default configuration is mostly correct. However, the "Image" field needs to be configured.
Replace the default value with your Docker Hub username, module image, and tag.
Once your Docker image is pushed to Docker Hub and your most recent code is pushed to a public GitHub repository, you can test your module on Lilypad's DemoNet by replacing the --local flag with --demonet.
You can also remove the --demonet flag and supply your WEB3_PRIVATE_KEY to run the module on Lilypad's IncentiveNet.
You just used an LLM on Lilypad's decentralized network! 🎉
Now anyone who has the Lilypad CLI installed can also run your module: