Inference API
Anura, Lilypad's official AI inference API
Getting Started
Use Anura to start running AI inference job modules on Lilypad's decentralized compute network:
Get an API key from the Anura website.
NEW! See All Models Available (GraphQL)
You can use the following endpoint to view all Anura-supported models on the Lilypad network. These queries are available as GraphQL queries here. You can also browse all available queries by opening Apollo Server and entering this URL:
https://lilypad-model-api.vercel.app/api/graphql
Curl Requests
Get all Models:
curl -X POST https://lilypad-model-api.vercel.app/api/graphql \
-H "Content-Type: application/json" \
-d '{"query": "{ allModels { id name category } }"}'
Response example:
{
"data": {
"allModels": [
{
"id": "llama3.1:8b",
"name": "Llama 3.1 8B",
"category": "text-generation"
},
{
"id": "sdxl-turbo",
"name": "SDXL Turbo",
"category": "image-generation"
}
]
}
}
You can also fetch a selection of multiple model types using the following endpoint:
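The multi-type endpoint itself is not reproduced here, but as a sketch you can reuse the documented `allModels` query and select several model types client-side by filtering on `category` (the response shape below mirrors the documented example):

```python
import json

# Build the same GraphQL payload as the curl example above.
payload = json.dumps({"query": "{ allModels { id name category } }"})

# A response shaped like the documented example; in practice this comes
# from POSTing `payload` to https://lilypad-model-api.vercel.app/api/graphql
response = {
    "data": {
        "allModels": [
            {"id": "llama3.1:8b", "name": "Llama 3.1 8B", "category": "text-generation"},
            {"id": "sdxl-turbo", "name": "SDXL Turbo", "category": "image-generation"},
        ]
    }
}

# Keep only the model types you are interested in.
wanted = {"text-generation", "image-generation"}
models = [m for m in response["data"]["allModels"] if m["category"] in wanted]
print([m["id"] for m in models])
```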
Get Started with Text Generation
Find which models we support:
Choose a model, customize your request and fire away:
Get Started with Image Generation
Find which models we support:
Choose a model and generate your first image
API Endpoints
API Clients
Rate limits
Currently, the rate limit for the API is set to 20 calls per second.
Get Available Models
To see which models are available:
Chat Completions API
Chat Completions
POST /api/v1/chat/completions
Note: Due to the decentralized nature of the Lilypad Network, we recommend using the streaming variant where possible at this time.
This endpoint provides both a streaming interface using Server-Sent Events (SSE) and a non-streaming interface for chat completions, and it is compliant with the OpenAI specification. This means you can plug and play Anura using the OpenAI SDK by simply passing the Anura URL and API key into your client like so:
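As a sketch, the SDK wiring looks like the comment below (the actual SDK call hits the network, so only the equivalent configuration is executed here; the base URL is a placeholder, use the one from your Anura account):

```python
# OpenAI SDK usage (shown as a comment; it performs a network call):
#
#   from openai import OpenAI
#   client = OpenAI(base_url=f"{ANURA_BASE_URL}/api/v1", api_key=ANURA_API_KEY)
#   resp = client.chat.completions.create(
#       model="deepseek-r1:7b",
#       messages=[{"role": "user", "content": "Hello!"}],
#       stream=True,  # streaming is recommended on the Lilypad Network
#   )
#
ANURA_BASE_URL = "https://example-anura-host"  # placeholder, not a real endpoint
ANURA_API_KEY = "YOUR_API_KEY"

# The equivalent raw HTTP configuration:
headers = {
    "Content-Type": "application/json",
    "Accept": "text/event-stream",  # recommended for streaming
    "Authorization": f"Bearer {ANURA_API_KEY}",
}
endpoint = f"{ANURA_BASE_URL}/api/v1/chat/completions"
print(endpoint)
```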
Request Headers
Content-Type: application/json (required)
Accept: text/event-stream (recommended for streaming)
Authorization: Bearer YOUR_API_KEY (required)
Request Parameters
model*
Model ID used to generate the response (e.g. deepseek-r1:7b). Required.
string
messages*
A list of messages comprising the conversation so far. Required.
array
Request Body (non-streaming)
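A minimal non-streaming request body, using the field names from the parameters above (the message contents are illustrative):

```python
import json

# Minimal chat completions body: a model id plus the conversation so far.
body = {
    "model": "deepseek-r1:7b",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Lilypad?"},
    ],
}
print(json.dumps(body, indent=2))
```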
Response Format (non-streaming)
The response is an OpenAI ChatCompletion Object with the following format:
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Response Object Fields
The response data contains the following fields:
id
A unique identifier for the chat completion
object
The object type
created
Timestamp when the response was created
model
The model used for generation
choices
The array containing the assistant's response
choices[0].message.role
Always "assistant" for responses
choices[0].message.content
The generated text content
choices[0].message.tool_calls
The array containing the corresponding tool response objects (this is only applicable if you make a tool request)
choices[0].finish_reason
Reason for completion (e.g., "stop", "length")
usage.prompt_tokens
The number of tokens used in the prompt
usage.completion_tokens
The number of tokens in the generated completion
usage.total_tokens
The sum of the prompt_tokens and the completion_tokens
Request Body (streaming)
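The streaming body is the same as the non-streaming one with `"stream": true` added to request SSE chunks (the prompt is illustrative):

```python
import json

# Same shape as the non-streaming body, plus "stream": true for SSE output.
body = {
    "model": "deepseek-r1:7b",
    "messages": [{"role": "user", "content": "Write a haiku about Lilypad."}],
    "stream": True,
}
print(json.dumps(body))
```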
Response Format (streaming)
The response is a stream of Server-Sent Events (SSE) with chunked OpenAI ChatCompletion objects with the following format:
Initial response:
Processing updates:
Content delivery:
Completion marker:
Response Codes
200 OK: Request successful, stream begins
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Response Object Fields
The delta event data contains the following fields:
id
A unique identifier for the chat completion
object
The object type
created
Timestamp when the response was created
model
The model used for generation
choices
The array containing the assistant's response
choices[0].delta.role
Always "assistant" for responses
choices[0].delta.content
The generated text content
choices[0].delta.tool_calls
The array containing the corresponding tool response objects (this is only applicable if you make a tool request)
choices[0].finish_reason
Reason for completion (e.g., "stop", "length")
usage.prompt_tokens
The number of tokens used in the prompt
usage.completion_tokens
The number of tokens in the generated completion
usage.total_tokens
The sum of the prompt_tokens and the completion_tokens
Conversation Context
The API supports multi-turn conversations by including previous messages in the request:
This allows for contextual follow-up questions and maintaining conversation history.
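For example, including a prior assistant turn lets the model resolve a follow-up question ("its population") from context:

```python
# Multi-turn conversation: prior assistant replies are included in `messages`.
messages = [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"},  # resolved via context
]
body = {"model": "deepseek-r1:7b", "messages": messages}
```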
Tooling calls
The Anura chat completions endpoint supports requests with tooling allowing for function calling through many popular AI frameworks and sdks.
At the moment, only a select number of models support tooling, including:
llama3.1:8b
qwen2.5:7b
qwen2.5-coder:7b
phi4-mini:3.8b
mistral:7b
Below is a sample request and response
Request:
Response:
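As a sketch, a tooling request follows the standard OpenAI tools schema (which this endpoint is compliant with); the `get_weather` function here is hypothetical:

```python
# A hypothetical `get_weather` tool described with the OpenAI tools schema.
body = {
    "model": "llama3.1:8b",  # one of the tool-capable models listed above
    "messages": [{"role": "user", "content": "What's the weather in Paris?"}],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Get the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}
# A tool invocation arrives in choices[0].message.tool_calls, shaped like:
# [{"id": "...", "type": "function",
#   "function": {"name": "get_weather", "arguments": "{\"city\": \"Paris\"}"}}]
```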
Vision Support
The chat completions API also supports vision requests, allowing image-to-text queries against a base64-encoded image. This lets you ask an LLM what an image contains or about particular details within it. Currently, vision is only supported by the following models (more coming soon):
llava:7b
gemma3:4b
Additionally, the vision capability is limited by the following constraints:
Images must be base64 encoded (you cannot pass a link to an image at this time)
Maximum image size is 512px x 512px
Images must be in JPEG or PNG format
Request:
Response:
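As a sketch, a vision request can be built with OpenAI-style content parts carrying a base64 data URL (this content-part shape is an assumption based on the endpoint's OpenAI compliance; the placeholder bytes stand in for a real JPEG/PNG of at most 512x512):

```python
import base64

# Placeholder bytes stand in for a real <=512x512 JPEG/PNG file.
image_b64 = base64.b64encode(b"<raw image bytes>").decode()

body = {
    "model": "llava:7b",  # one of the vision-capable models listed above
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }
    ],
}
```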
Embeddings
Use the embeddings endpoint to compute embeddings for user queries, supported by the nomic-embed-text model. This endpoint is OpenAI compliant, which means you can use it with the OpenAI SDK (see the end of the Embeddings section for a code example).
Endpoint
POST /api/v1/embeddings
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Parameters
model*
Model ID used to generate the response (e.g. nomic-embed-text). Required.
string
input*
The input to create embeddings from. This can be either a single string or an array of strings. Required.
string or array of strings
Request Sample (single input)
Response Sample (single input)
Request Sample (multiple input)
Response Sample (multiple input)
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Example Code using the OpenAI SDK
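As a sketch (the SDK call hits the network, so it is shown as a comment; the input strings are illustrative), the SDK usage and the equivalent raw request body look like:

```python
import json

# OpenAI SDK usage (comment only; performs a network call):
#
#   from openai import OpenAI
#   client = OpenAI(base_url=f"{ANURA_BASE_URL}/api/v1", api_key=ANURA_API_KEY)
#   out = client.embeddings.create(model="nomic-embed-text",
#                                  input=["first query", "second query"])
#   vectors = [d.embedding for d in out.data]

# The equivalent raw body for POST /api/v1/embeddings:
body = {"model": "nomic-embed-text", "input": ["first query", "second query"]}
print(json.dumps(body))
```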
Image Generation
The Anura API enables you to run Stable Diffusion jobs to generate images, executed through our decentralized compute network. It's easy to get started creating your own generative AI art using Anura through the endpoints we provide.
Retrieve the list of supported image generation models
GET /api/v1/image/models
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Parameters
model*
Model ID used to generate the response (e.g. sdxl-turbo). Required.
string
prompt*
The prompt input to generate your image from (max limit of 1000 characters)
string
Request Sample
Response
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Currently we support sdxl-turbo; however, we are always adding new models, so stay tuned!
Generate an AI Image
POST /api/v1/image/generate
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Parameters
model*
Model ID used to generate the response (e.g. sdxl-turbo). Required.
string
prompt*
The prompt input to generate your image from (max limit of 1000 characters)
string
Request Sample
Alternatively, you can make the same request with a curl command and have the image output to a file on your machine.
The result of running this command will be the creation of the spaceship.png file in the directory you ran the command from.
Response
This endpoint will return the raw bytes value of the image that was generated which you can output to a file (like shown in the curl command above) or place it in a buffer to write to a file in your app, e.g.
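A minimal sketch of writing the returned bytes from a buffer to a file (the `raw` bytes here are a placeholder for the actual response body of POST /api/v1/image/generate):

```python
# `raw` stands in for the raw bytes returned by the image generation endpoint.
raw = b"<raw image bytes>"  # placeholder, not a real PNG

# Write the response body straight to a file, as the curl example does.
with open("spaceship.png", "wb") as f:
    f.write(raw)
```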
Note: Should you ever need the corresponding Job Offer ID for an image generation request, it is provided in the response header as Job-Offer-Id.
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Video Generation
The Anura API enables you to run long-running jobs to generate videos, executed through our decentralized compute network. It's easy to get started generating your own videos using Anura through the endpoints we provide.
Note: Video generation can take anywhere between 4-8 minutes to produce a video.
Retrieve the list of supported video generation models
GET /api/v1/video/models
Currently we support wan2.1; however, we are always adding new models, so stay tuned!
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Sample
Response
Response Codes
200 OK: Request successful
401 Unauthorized: Invalid or missing API key
500 Internal Server Error: Server error processing request
Send out a request to create an AI generated video
POST /api/v1/video/create-job
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Parameters
model*
Model used to generate the response (e.g. wan2.1). Required.
string
prompt*
The prompt input to generate your video from (max limit of 1000 characters). Required.
string
negative_prompt
An optional field to specify to the model what to exclude from the generated scene
string
Request Sample
Response
This endpoint will return a job_offer_id, a unique identifier corresponding to the job that is creating your video. Pass this id into our /video/results endpoint (see below), which will either return the output as a webp file or report that the job is still running. In the latter case, you can call the endpoint again later to retrieve your video. As mentioned at the beginning of this section, video generation can take anywhere between 4-8 minutes to complete.
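The create-then-poll flow above can be sketched as a network-free polling loop; the `fetch` callable (which would perform the actual authenticated GET) and the job id are assumptions for illustration:

```python
import time

def wait_for_video(job_offer_id, fetch, poll_seconds=30, timeout=600):
    """Poll /api/v1/video/results/:job_offer_id until the webp bytes arrive.

    `fetch` performs the GET and returns (status_code, body_bytes); it is
    injected here so the sketch stays network-free.
    """
    waited = 0
    while waited < timeout:
        status, body = fetch(f"/api/v1/video/results/{job_offer_id}")
        if status == 200:
            return body          # raw webp bytes, ready to write to a file
        if status != 102:        # 102 Processing: still generating
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("video not ready within timeout")
```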
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Retrieve your video
GET /api/v1/video/results/:job_offer_id
job_offer_id*
The id returned to you in the video creation request, i.e. /api/v1/video/create-job. Required.
string
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Response
If the video is still in the process of being generated you will see a response that looks like the following:
Response Codes
102 Processing: Request is still processing the creation of the video
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
500 Internal Server Error: Server error processing request
However, once the video has been generated, you'll be returned the video in webp format as raw bytes, which you can save to a file in the following manner:
The result of the above command will be the video.webp file being saved in the directory from which you ran it.

Audio Generation
The Anura API enables you to generate audio from text, executed through our decentralized compute network. It's easy to get started generating your own audio using Anura through the endpoints we provide.
Note: Audio generation can take anywhere between 40 seconds and 3 minutes to complete, depending on the input length.
Retrieve the list of supported audio generation models
GET /api/v1/audio/models
Currently we support kokoro; however, we are always adding new models, so stay tuned!
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Sample
Response
Response Codes
200 OK: Request successful
401 Unauthorized: Invalid or missing API key
500 Internal Server Error: Server error processing request
Send out a request to create AI-generated audio
POST /api/v1/audio/create-job
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Parameters
model*
Model used to generate the response (e.g. kokoro). Required.
string
input*
The prompt input to generate your audio from (max limit of 420 characters). Required.
string
voice*
The voice to use when generating the audio sample. Possible values are heart, puck, fenrir, and bella. Required.
string
Voice samples
Heart
Puck
Fenrir
Bella
Request Sample
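A minimal sketch of the create-job body, using the parameters above (the input text is illustrative):

```python
import json

# Body for POST /api/v1/audio/create-job.
body = {
    "model": "kokoro",
    "input": "Welcome to Lilypad's decentralized compute network.",
    "voice": "heart",  # one of: heart, puck, fenrir, bella
}
assert len(body["input"]) <= 420  # input is capped at 420 characters
print(json.dumps(body))
```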
Response
This endpoint will return a job_offer_id, a unique identifier corresponding to the job that is creating your audio. Pass this id into our /audio/results endpoint (see below), which will either return the output as a wav file or report that the job is still running. In the latter case, you can call the endpoint again later to retrieve your audio. As mentioned at the beginning of this section, audio generation can take anywhere between 40 seconds and 3 minutes to complete.
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Retrieve your audio
GET /api/v1/audio/results/:job_offer_id
job_offer_id*
The id returned to you in the audio creation request, i.e. /api/v1/audio/create-job. Required.
string
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Response
If the audio is still in the process of being generated you will see a response that looks like the following:
Response Codes
102 Processing: Request is still processing the creation of the audio
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
500 Internal Server Error: Server error processing request
However, once the audio has been generated, you'll be returned the audio in wav format as raw bytes, which you can save to a file in the following manner:
Web Search
The Anura API provides developers with a web search capability, adding a powerful tool to your AI Agent building arsenal. LLMs are only as good as their training data and are taken to the next level when given additional context from the web. With web search, you can power your AI Agent workflow with live web search data, giving your LLM the most up-to-date information on the latest goings-on in the world.
It's easy to get started searching the web through the Anura API using our endpoint:
POST /api/v1/websearch
Request Headers
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY
Request Parameters
query*
The web search query you wish to execute
string
number_of_results*
The number of search results you want returned (limited to 1 to 10 inclusive)
number
Request Sample
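A minimal sketch of the request body, using the parameters above (the query string is illustrative):

```python
import json

# Body for POST /api/v1/websearch.
body = {"query": "latest Lilypad network news", "number_of_results": 5}
assert 1 <= body["number_of_results"] <= 10  # result count is limited to 1-10
print(json.dumps(body))
```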
Response Sample
The response will include the following fields:
results
The array of search results where each result object is made up of the strings: title, url and description
related_queries
An array of strings containing similar queries based on the one you supplied
count
The number of search results returned
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Jobs
GET /api/v1/jobs/:id - Get status and details of a specific job
Get Status/Details of a Job
You can use another terminal to check job status while the job is running.
Get Outputs from a Job
Once your job has run, you should get output like this:
Cowsay
POST /api/v1/cowsay - Create a new cowsay job
Request body:
{"message": "text to display"}
GET /api/v1/cowsay/:id/results - Get results of a cowsay job
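The two-step pattern can be sketched as follows (the job id value is a placeholder; use the id returned by the create call):

```python
import json

# Step 1: body for POST /api/v1/cowsay.
create_body = json.dumps({"message": "moo"})

# Step 2: fetch results using the id from the create response.
job_id = "JOB_ID"  # placeholder for the id returned by the create call
results_path = f"/api/v1/cowsay/{job_id}/results"
print(results_path)
```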