Inference API
Anura, Lilypad's official AI inference API
Use Anura to start running AI inference job modules on Lilypad's decentralized compute network:
Get an API key from the .
Find which models we support:
curl -X GET "https://anura-testnet.lilypad.tech/api/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY"
Choose a model, customize your request and fire away:
curl -X POST "https://anura-testnet.lilypad.tech/api/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Accept: text/event-stream" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{
"model": "MODEL_NAME:MODEL_VERSION",
"messages": [{
"role": "system",
"content": "you are a helpful AI assistant"
},
{
"role": "user",
"content": "what order do frogs belong to?"
}],
"stream": true,
"temperature": 0.6
}'
Find which image models we support:
curl -X GET "https://anura-testnet.lilypad.tech/api/v1/image/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY"
Choose a model and generate your first image:
curl -X POST https://anura-testnet.lilypad.tech/api/v1/image/generate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY" \
-d '{"prompt": "A spaceship parked on a lilypad", "model": "sdxl-turbo"}' \
--output spaceship.png
If you are using an API client such as Bruno or Postman, you can use our provided collections below.
Currently, the API rate limit is 20 calls per second.
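To stay under that limit, a client can throttle itself before sending requests. The following is a minimal sliding-window sketch. It assumes the limit is counted over any rolling one-second window, because the server-side accounting is not documented here. `createRateLimiter` is a hypothetical helper name.

```javascript
// Hypothetical client-side sliding-window limiter.
// Assumption: the 20 calls/second limit applies over any rolling 1000 ms window.
function createRateLimiter(maxCalls = 20, windowMs = 1000, now = () => Date.now()) {
  const timestamps = []; // admission times of calls inside the current window
  // Discard timestamps that have aged out of the rolling window.
  const prune = (t) => {
    while (timestamps.length > 0 && t - timestamps[0] >= windowMs) timestamps.shift();
  };
  return {
    // Records the call and returns true if it fits in the window; otherwise false.
    tryAcquire() {
      const t = now();
      prune(t);
      if (timestamps.length >= maxCalls) return false;
      timestamps.push(t);
      return true;
    },
    // Milliseconds until the next call could be admitted (0 if one is allowed now).
    msUntilNextSlot() {
      const t = now();
      prune(t);
      return timestamps.length < maxCalls ? 0 : windowMs - (t - timestamps[0]);
    },
  };
}
```

Before each request, call `tryAcquire()`. If it returns false, wait `msUntilNextSlot()` milliseconds and try again.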
To see which models are available:
curl -X GET "https://anura-testnet.lilypad.tech/api/v1/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer YOUR_API_KEY"
* = Required
POST /api/v1/chat/completions
Note: Due to the decentralized nature of the Lilypad Network, we recommend using the streaming variant where possible at this time.
This endpoint provides two interfaces for chat completions: a streaming interface using Server-Sent Events (SSE) and a non-streaming interface. Both comply with the OpenAI specification. This means you can plug and play Anura with the OpenAI SDK by passing the Anura URL and your API key into the client like so:
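// Run as an ES module (e.g. a .mjs file) so that top-level await is allowed.
import OpenAI from 'openai';
// Point the OpenAI client at Anura instead of the default OpenAI endpoint.
const client = new OpenAI({
  baseURL: 'https://anura-testnet.lilypad.tech/api/v1',
  apiKey: process.env.ANURA_API_KEY || '',
});
const completion = await client.chat.completions.create({
  model: 'llama3.1:8b',
  messages: [
    { role: 'system', content: 'You are a helpful AI assistant.' },
    { role: 'user', content: 'Are semicolons optional in JavaScript?' },
  ],
});
// Print the assistant's reply (a top-level `return` is not valid in a module).
console.log(completion.choices[0].message.content);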
Request Headers
Content-Type: application/json *
Accept: text/event-stream (recommended for streaming)
Authorization: Bearer YOUR_API_KEY *
Request Parameters
model * (string): Model ID used to generate the response (e.g. deepseek-r1:7b). Required.
messages * (array): A list of messages comprising the conversation so far. Required.
frequency_penalty (default: 0): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
max_tokens: The maximum number of tokens that can be generated in the chat completion.
presence_penalty (default: 0): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
response_format: An object specifying the format that the model must output.
seed (default: null): If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.
stop: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
stream (default: false): If set to true, the model response data will be streamed to the client as it is generated using Server-Sent Events (SSE).
stream_options (default: null): Options for the streaming response. Only set this when you set stream: true.
temperature (default: 1): The sampling temperature to use, between 0 and 2. Higher values like 0.8 make the output more random, while lower values like 0.2 make it more focused and deterministic. We generally recommend altering this or top_p, but not both.
tools: A list of tools the model may call. Currently, only functions are supported as tools. Use this to provide a list of functions the model may generate JSON inputs for. A maximum of 128 functions is supported. At the moment, only a select number of models support tools, including:
llama3.1:8b
qwen2.5:7b
qwen2.5-coder:7b
phi4-mini:3.8b
mistral:7b
top_p (default: 1): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature, but not both.
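Some of these ranges are easy to get wrong, so you may want to check a request body on the client before sending it. The sketch below encodes only the constraints stated above: temperature from 0 to 2, penalties from -2.0 to 2.0, at most 4 stop sequences, at most 128 tools, and stream_options only with stream: true. `validateChatRequest` is a hypothetical helper, and the server remains the authority on what it accepts.

```javascript
// Hypothetical client-side check for the documented chat completion constraints.
// Returns an array of human-readable problems (empty if none were found).
function validateChatRequest(body) {
  const problems = [];
  const inRange = (v, lo, hi) => typeof v === "number" && v >= lo && v <= hi;

  if (typeof body.model !== "string" || body.model.length === 0) {
    problems.push("model is required and must be a string");
  }
  if (!Array.isArray(body.messages)) {
    problems.push("messages is required and must be an array");
  }
  if (body.temperature !== undefined && !inRange(body.temperature, 0, 2)) {
    problems.push("temperature must be between 0 and 2");
  }
  for (const key of ["frequency_penalty", "presence_penalty"]) {
    if (body[key] !== undefined && !inRange(body[key], -2, 2)) {
      problems.push(`${key} must be between -2.0 and 2.0`);
    }
  }
  if (Array.isArray(body.stop) && body.stop.length > 4) {
    problems.push("stop accepts at most 4 sequences");
  }
  if (Array.isArray(body.tools) && body.tools.length > 128) {
    problems.push("tools accepts at most 128 functions");
  }
  // stream_options is only meaningful for streaming requests.
  if (body.stream_options != null && body.stream !== true) {
    problems.push("stream_options may only be set when stream is true");
  }
  return problems;
}
```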
Request Body (non-streaming)
{
"model": "llama3.1:8b",
"messages": [
{
"role": "system",
"content": "you are a helpful AI assistant"
},
{
"role": "user",
"content": "write a haiku about lilypads"
}
],
"temperature": 0.6
}
Response Format (non-streaming)
The response is an OpenAI ChatCompletion Object with the following format:
{
"id": "jobId-Qmds4fif8RLVKrSKfWVGHe7fDkBwizzV5omd3kPSnc8Xdf-jobState-ResultsSubmitted",
"object": "chat.completion",
"created": 1742509938,
"model": "llama3.1:8b",
"system_fingerprint": "",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Lily pads dance\nOn the water's gentle lap\nSerene beauty"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 2048,
"completion_tokens": 184,
"total_tokens": 2232
}
}
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Response Object Fields
The response data contains the following fields:
id: A unique identifier for the chat completion
object: The object type
created: Unix timestamp (in seconds) of when the response was created
model: The model used for generation
choices: The array containing the assistant's response
choices[0].message.role: Always "assistant" for responses
choices[0].message.content: The generated text content
choices[0].message.tool_calls: The array containing the corresponding tool response objects (only applicable if you make a tool request)
choices[0].finish_reason: Reason for completion (e.g., "stop", "length")
usage.prompt_tokens: The number of tokens used in the prompt
usage.completion_tokens: The number of tokens in the generated completion
usage.total_tokens: The sum of prompt_tokens and completion_tokens
Request Body (streaming)
{
"model": "llama3.1:8b",
"messages": [
{
"role": "system",
"content": "you are a helpful AI assistant"
},
{
"role": "user",
"content": "write a haiku about lilypads"
}
],
"stream": true,
"temperature": 0.6
}
Response Format (streaming)
The response is a stream of Server-Sent Events (SSE) with chunked OpenAI ChatCompletion objects with the following format:
Initial response:
data: {
"id": "jobId-QmZXDGS7m8VuJrURqsKvByGKHCM749NMVFmEA2hH2DtDWs-jobState-DealNegotiating",
"object": "chat.completion.chunk",
"created": 1742425132,
"model": "llama3.1:8b",
"system_fingerprint": "",
"choices": [
{
"index": 0,
"delta": {
"role": "assistant",
"content": null
},
"finish_reason": null
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0
}
}
Processing updates:
data: {
"id": "jobId-QmZXDGS7m8VuJrURqsKvByGKHCM749NMVFmEA2hH2DtDWs-jobState-DealAgreed",
"object": "chat.completion.chunk",
"created": 1742425135,
"model": "llama3.1:8b",
"system_fingerprint": "",
"choices": [
{
"index": 0,
"delta": {
"role": "assistant",
"content": null
},
"finish_reason": null
}
],
"usage": {
"prompt_tokens": 0,
"completion_tokens": 0,
"total_tokens": 0
}
}
Content delivery:
data: {
"id": "jobId-QmZXDGS7m8VuJrURqsKvByGKHCM749NMVFmEA2hH2DtDWs-jobState-ResultsSubmitted",
"object": "chat.completion.chunk",
"created": 1742425147,
"model": "llama3.1:8b",
"system_fingerprint": "",
"choices": [
{
"index": 0,
"delta": {
"role": "assistant",
"content": "Lily pads dance\nOn the water's gentle lap\nSerene beauty"
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 2048,
"completion_tokens": 456,
"total_tokens": 2504
}
}
Completion marker:
data: [DONE]
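If you call the streaming endpoint without an SDK, you need to do three things: split the SSE stream into `data:` payloads, stop at `data: [DONE]`, and concatenate each chunk's `choices[0].delta.content`. The sketch below works on text that has already been received. It assumes events are separated by a blank line, as in the SSE format, and joins multi-line `data:` fields with a newline. The samples above are pretty-printed for readability. `collectStreamedContent` is a hypothetical name.

```javascript
// Hypothetical helper: extract the assistant text from a complete SSE body
// returned by POST /api/v1/chat/completions with "stream": true.
function collectStreamedContent(sseText) {
  let content = "";
  let finishReason = null;
  // Events are separated by a blank line; normalize CRLF line endings first.
  const events = sseText.replace(/\r\n/g, "\n").split("\n\n");
  for (const event of events) {
    // Multiple "data:" lines within one event are joined with "\n".
    const data = event
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data === "") continue;    // comments, keep-alives, trailing separators
    if (data === "[DONE]") break; // completion marker
    const chunk = JSON.parse(data);
    const choice = chunk.choices && chunk.choices[0];
    if (!choice) continue;
    // Status updates (e.g. DealNegotiating) carry null content and are skipped.
    if (choice.delta && typeof choice.delta.content === "string") {
      content += choice.delta.content;
    }
    if (choice.finish_reason) finishReason = choice.finish_reason;
  }
  return { content, finishReason };
}
```

In a live client, you would apply the same logic incrementally, buffering partial text until a blank line arrives.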
Response Codes
200 OK: Request successful, stream begins
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Response Object Fields
The delta event data contains the following fields:
id: A unique identifier for the chat completion
object: The object type
created: Unix timestamp (in seconds) of when the response was created
model: The model used for generation
choices: The array containing the assistant's response
choices[0].delta.role: Always "assistant" for responses
choices[0].delta.content: The generated text content
choices[0].delta.tool_calls: The array containing the corresponding tool response objects (only applicable if you make a tool request)
choices[0].finish_reason: Reason for completion (e.g., "stop", "length")
usage.prompt_tokens: The number of tokens used in the prompt
usage.completion_tokens: The number of tokens in the generated completion
usage.total_tokens: The sum of prompt_tokens and completion_tokens
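In the samples above, the `id` field appears to embed a job ID and the current job state, such as `DealNegotiating`, `DealAgreed`, or `ResultsSubmitted`. This format is inferred from the examples only and is not a documented contract. Treat the following parser (`parseAnuraCompletionId`, a hypothetical name) as a convenience for logging progress, not something to rely on.

```javascript
// Hypothetical parser for ids shaped like "jobId-<job id>-jobState-<state>",
// as seen in the sample responses. Returns null for ids that do not match
// this (undocumented) pattern.
function parseAnuraCompletionId(id) {
  const match = /^jobId-(.+)-jobState-([A-Za-z]+)$/.exec(id);
  if (!match) return null;
  return { jobId: match[1], jobState: match[2] };
}
```

For example, printing `jobState` for each streamed chunk shows whether the network is still negotiating a deal or has submitted results.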
Conversation Context
The API supports multi-turn conversations by including previous messages in the request:
{
"model": "llama2:7b",
"messages": [
{
"role": "user",
"content": "write a haiku about lilypads"
},
{
"role": "assistant",
"content": "Lily pads dance\nOn the water's gentle lap\nSerene beauty"
},
{
"role": "user",
"content": "Now write one about frogs"
}
],
"temperature": 0.6
}
This allows for contextual follow-up questions and maintaining conversation history.
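To continue a conversation, append each assistant reply and the next user message to `messages` before the following request. The sketch below assumes a non-streaming response shaped like the ChatCompletion object shown earlier. `appendTurn` is a hypothetical helper that returns a new array rather than mutating the old one.

```javascript
// Hypothetical helper: build the messages array for the next turn from the
// previous messages, the previous ChatCompletion response, and a new user prompt.
function appendTurn(messages, completion, nextUserPrompt) {
  const reply = completion.choices[0].message;
  return [
    ...messages,
    { role: "assistant", content: reply.content }, // keep the model's answer as context
    { role: "user", content: nextUserPrompt },
  ];
}
```

Because the whole history is resent on every turn, usage.prompt_tokens grows with each exchange. You may want to trim older turns in long conversations.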
The Anura chat completions endpoint supports tool use, enabling function calling through many popular AI frameworks and SDKs.
At the moment, only a select number of models support tools, including:
llama3.1:8b
qwen2.5:7b
qwen2.5-coder:7b
phi4-mini:3.8b
mistral:7b
Below is a sample request and response:
Request:
{
"model": "mistral:7b",
"messages": [
{
"role": "system",
"content": "you are a helpful AI assistant"
},
{
"role": "user",
"content": "What's the weather in Tokyo?"
}
],
"tools": [
{
"type": "function",
"function": {
"name": "get_current_weather",
"description": "Get the current weather for a city",
"parameters": {
"type": "object",
"properties": {
"city": {
"type": "string",
"description": "The name of the city"
}
},
"required": [
"city"
]
}
}
}
],
"temperature": 0.6,
"stream": false
}
Response:
{
"id": "jobId-QmTm3E4oEu4TYp1FLykHdnrkPyX6cLz2UUYS45YrmrzqdN-jobState-ResultsSubmitted",
"object": "chat.completion",
"created": 1742790608,
"model": "mistral:7b",
"system_fingerprint": "",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "",
"tool_calls": [
{
"id": "call_syyia0kt",
"index": 0,
"type": "function",
"function": {
"name": "get_current_weather",
"arguments": "{\"city\":\"Tokyo\"}"
}
}
]
},
"finish_reason": "tool_calls"
}
],
"usage": {
"prompt_tokens": 88,
"completion_tokens": 22,
"total_tokens": 110
}
}
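After a response with `finish_reason: "tool_calls"`, your application runs the requested function itself. The sketch below parses each tool call's JSON-encoded `arguments` string and invokes a matching local handler. It then builds follow-up messages with `role: "tool"` following the OpenAI convention. Whether Anura accepts that message role is an assumption not confirmed here. `runToolCalls` and the `get_current_weather` handler are hypothetical.

```javascript
// Hypothetical dispatcher: run local handlers for each tool call in a
// non-streaming ChatCompletion response and build OpenAI-style "tool" messages.
function runToolCalls(completion, handlers) {
  const message = completion.choices[0].message;
  const calls = message.tool_calls || [];
  return calls.map((call) => {
    const handler = handlers[call.function.name];
    if (!handler) {
      throw new Error(`No handler registered for tool "${call.function.name}"`);
    }
    // "arguments" is a JSON-encoded string, e.g. "{\"city\":\"Tokyo\"}".
    const args = JSON.parse(call.function.arguments);
    const result = handler(args);
    return {
      role: "tool", // OpenAI convention; assumed, not confirmed for Anura
      tool_call_id: call.id,
      content: JSON.stringify(result),
    };
  });
}
```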
The chat completions API also supports vision requests, allowing image-to-text queries against a base64-encoded image. You can ask an LLM what an image shows or about particular details in it. Currently, vision is only supported by the following models (more coming soon):
llava:7b
gemma3:4b
Additionally, the vision capability is limited by the following constraints:
Images must be base64 encoded (you cannot pass a link to an image at this time)
Maximum image size is 512 x 512 px
Only JPEG and PNG formats are supported
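Given these constraints, it can help to check an image locally before encoding it. This sketch handles PNG only. It reads the width and height from the PNG IHDR header (big-endian, at byte offsets 16 and 20), rejects images larger than 512 x 512, and returns a base64 string. This section does not specify whether the `image_url` field expects bare base64 or a `data:image/png;base64,` URI, so the prefix is an option you should confirm. `preparePngForVision` is a hypothetical helper.

```javascript
// Hypothetical helper: validate a PNG against the documented vision limits
// (max 512 x 512) and base64-encode it for the "image_url" field.
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

function preparePngForVision(bytes, { asDataUri = false, maxSide = 512 } = {}) {
  if (bytes.length < 24 || !bytes.subarray(0, 8).equals(PNG_SIGNATURE)) {
    throw new Error("Not a PNG file");
  }
  // The IHDR chunk always comes first: width at offset 16, height at offset 20.
  const width = bytes.readUInt32BE(16);
  const height = bytes.readUInt32BE(20);
  if (width > maxSide || height > maxSide) {
    throw new Error(`Image is ${width}x${height}; maximum is ${maxSide}x${maxSide}`);
  }
  const base64 = bytes.toString("base64");
  // Assumption: whether Anura wants the data URI prefix is unconfirmed.
  return asDataUri ? `data:image/png;base64,${base64}` : base64;
}
```

JPEG stores its dimensions in a SOF marker rather than at a fixed offset, so a JPEG check would need a small marker scan or an image library.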
Request:
{
"model": "llava:7b",
"messages": [
{
"role": "system",
"content": "you are a helpful AI assistant"
},
{
"role": "user",
"content": [
{
"type": "text",
"text": "What's in this image?"
},
{
"type": "image_url",
"image_url": ""
}
]
}
],
"temperature": 0.6,
"stream": false
}
Response:
{
"id": "jobId-QmcJohc71DrHnVbXbLRt7drDyPSU1dyNTskU3a1yRR7zBu-jobState-ResultsSubmitted",
"object": "chat.completion",
"created": 1744411025,
"model": "llava:7b",
"system_fingerprint": "",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": " The image shows a creature that appears to be a blend of a frog and some form of robotic or mechanical structure. It has the body shape of a frog, with prominent eyes and limbs. The creature is depicted with a sleek, technologically advanced design, featuring metallic parts and futuristic elements. The color scheme includes blues, blacks, and greens, giving it a high-tech aesthetic. This creature could be an example of concept art for a video game or science fiction story. "
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 600,
"completion_tokens": 106,
"total_tokens": 706
}
}
The Anura API enables you to run Stable Diffusion jobs that generate images on our decentralized compute network. You can start generating your own AI art with the endpoints we provide.
Retrieve the list of supported image generation models
GET /api/v1/image/models
Request Headers
Content-Type: application/json *
Authorization: Bearer YOUR_API_KEY *
Request Sample
curl -X GET "https://anura-testnet.lilypad.tech/api/v1/image/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here"
Response
{
"data": {
"models": [
"sdxl-turbo"
]
},
"message": "Retrieved models successfully",
"status": 200
}
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Currently we support sdxl-turbo; however, we are always adding new models, so stay tuned!
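The image and video model endpoints wrap their payload in a `{ data, message, status }` envelope. The following sketch unwraps it, assuming the envelope shape shown in these samples. `fetchModelList` is a hypothetical helper that takes the `fetch` function as a parameter so it can be tested without network access. Node.js 18+ provides a global `fetch` for the default.

```javascript
// Hypothetical helper: list the models from /api/v1/image/models or
// /api/v1/video/models by unwrapping the { data: { models }, message, status } envelope.
async function fetchModelList(path, apiKey, fetchFn = fetch) {
  const response = await fetchFn(`https://anura-testnet.lilypad.tech${path}`, {
    method: "GET",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}: ${await response.text()}`);
  }
  const body = await response.json();
  return body.data.models; // e.g. ["sdxl-turbo"]
}
```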
Generate an AI Image
POST /api/v1/image/generate
Request Headers
Content-Type: application/json *
Authorization: Bearer YOUR_API_KEY *
Request Parameters
model * (string): Model ID used to generate the response (e.g. sdxl-turbo). Required.
prompt * (string): The prompt to generate your image from (maximum of 1,000 characters). Required.
Request Sample
{
"prompt": "A spaceship parked on a lilypad",
"model": "sdxl-turbo"
}
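Because prompts are capped at 1,000 characters, you may prefer to reject over-long prompts before making a request. The video endpoint documents the same limit. The following sketch counts characters by JavaScript string length (UTF-16 code units). This is an assumption, since the docs do not say how characters are counted. `buildImageRequest` is a hypothetical helper.

```javascript
// Hypothetical helper: build the JSON body for POST /api/v1/image/generate,
// enforcing the documented 1000-character prompt limit on the client side.
const MAX_PROMPT_CHARS = 1000;

function buildImageRequest(prompt, model = "sdxl-turbo") {
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  // Assumption: the server counts characters the same way as String.length.
  if (prompt.length > MAX_PROMPT_CHARS) {
    throw new Error(`prompt is ${prompt.length} characters; the limit is ${MAX_PROMPT_CHARS}`);
  }
  return JSON.stringify({ prompt, model });
}
```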
Alternatively, you can make the same request with curl and write the image to a file on your machine:
curl -X POST https://anura-testnet.lilypad.tech/api/v1/image/generate \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
-d '{"prompt": "A spaceship parked on a lilypad", "model": "sdxl-turbo"}' \
--output spaceship.png
Running this command creates spaceship.png in the directory you ran it from.
Response
This endpoint returns the raw bytes of the generated image. You can write them to a file, as in the curl command above, or hold them in a buffer and write them to a file in your app, e.g.:
const fs = require("fs");
// Node.js 18+ provides a global fetch, so node-fetch is not required.
async function generateImage() {
  const response = await fetch("https://anura-testnet.lilypad.tech/api/v1/image/generate", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer your_api_key_here"
    },
    body: JSON.stringify({
      prompt: "A spaceship parked on a lilypad",
      model: "sdxl-turbo"
    }),
  });
  if (!response.ok) {
    // fetch responses have no .message property; read the body for error details.
    const errorText = await response.text();
    console.error(`Error generating image: StatusCode: ${response.status} Error: ${errorText}`);
    return;
  }
  // The body is the raw image bytes; copy them into a Buffer and write to disk.
  const buffer = Buffer.from(await response.arrayBuffer());
  fs.writeFileSync("spaceship.png", buffer);
}
generateImage().catch(console.error);
Note: If you need the Job Offer ID for an image generation request, it is provided in the Job-Offer-Id response header.
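The following sketch returns both the image bytes and the `Job-Offer-Id` header from a successful response. It assumes Node.js 18+, which provides global `fetch`, `Response`, and `Headers`. `readImageResponse` is a hypothetical name.

```javascript
// Hypothetical helper: turn a successful /api/v1/image/generate response into
// the raw image bytes plus the job offer ID from the Job-Offer-Id header.
async function readImageResponse(response) {
  if (!response.ok) {
    throw new Error(`Image generation failed with HTTP ${response.status}: ${await response.text()}`);
  }
  return {
    jobOfferId: response.headers.get("Job-Offer-Id"), // null if the header is absent
    bytes: Buffer.from(await response.arrayBuffer()),
  };
}
```

You could then write `bytes` to disk with `fs.writeFileSync("spaceship.png", bytes)` and keep `jobOfferId` for reference.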
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
The Anura API enables you to run long-running jobs that generate videos on our decentralized compute network. You can start generating your own videos with the endpoints we provide.
Note: Generating a video can take 4 to 8 minutes.
Retrieve the list of supported video generation models
GET /api/v1/video/models
Currently we support wan2.1; however, we are always adding new models, so stay tuned!
Request Headers
Content-Type: application/json *
Authorization: Bearer YOUR_API_KEY *
Request Sample
curl -X GET "https://anura-testnet.lilypad.tech/api/v1/video/models" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here"
Response
{
"data": {
"models": [
"wan2.1"
]
},
"message": "Retrieved models successfully",
"status": 200
}
Response Codes
200 OK: Request successful
401 Unauthorized: Invalid or missing API key
500 Internal Server Error: Server error processing request
Send a request to create an AI-generated video
POST /api/v1/video/create-job
Request Headers
Content-Type: application/json *
Authorization: Bearer YOUR_API_KEY *
Request Parameters
model * (string): Model used to generate the response (e.g. wan2.1). Required.
prompt * (string): The prompt to generate your video from (maximum of 1,000 characters). Required.
negative_prompt (string): Optional. Tells the model what to exclude from the generated scene.
Request Sample
{
"prompt": "Two frogs sit on a lilypad, animatedly discussing the wonders and quirks of AI agents. As they ponder whether these digital beings can truly understand their froggy lives, the serene pond serves as a backdrop to their lively conversation.",
"negative_prompt": "Dull colors, grainy texture, washed-out details, static frames, incorrect lighting, unnatural shadows, distorted faces, artifacts, low-resolution elements, flickering, blurry motion, repetitive patterns, unrealistic reflections, overly simplistic backgrounds, three legged people, walking backwards.",
"model": "wan2.1"
}
Response
This endpoint returns a job_offer_id, a unique identifier for the job that is creating your video. Pass this ID to the /api/v1/video/results/:job_offer_id endpoint (see below). That endpoint either returns the output as a WebP file or reports that the job is still running. If the job is still running, call the endpoint again later to retrieve your video. As mentioned at the beginning of this section, video generation can take 4 to 8 minutes to complete.
{
"status": 200,
"message": "Video job created successfully",
"data": {
"job_offer_id": "<your-job-offer-id-here>"
}
}
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
Retrieve your video
GET /api/v1/video/results/:job_offer_id
job_offer_id * (string, path parameter): The ID returned by the video creation request (POST /api/v1/video/create-job). Required.
Request Headers
Content-Type: application/json *
Authorization: Bearer YOUR_API_KEY *
Response
If the video is still in the process of being generated you will see a response that looks like the following:
{
"status": 102,
"message": "Request is still processing",
"data": {
"job_offer_id": "<job-offer-id>",
"job_state": "DealAgreed"
}
}
Response Codes
102 Processing: Request is still processing the creation of the video
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
500 Internal Server Error: Server error processing request
Once the video has been generated, the response contains the raw bytes of the video in WebP format, which you can save to a file as follows:
curl -X GET "https://anura-testnet.lilypad.tech/api/v1/video/results/<your-job-offer-id>" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here" \
--output video.webp
Running the above command saves video.webp in the directory you ran it from.
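Because a video can take 4 to 8 minutes, a client typically polls the results endpoint until the job finishes. Based on the samples above, this sketch assumes two behaviors:
- An unfinished job returns a JSON body with `"status": 102`.
- A finished job returns the WebP bytes.

Any other JSON body is treated as an error. The polling interval, the timeout, and the `pollVideoResult` name are illustrative choices. `fetchFn` and `sleep` are parameters so the logic can be tested offline.

```javascript
// Hypothetical poller for GET /api/v1/video/results/:job_offer_id.
// Assumes a JSON body with status 102 means "still processing" and a
// non-JSON body is the finished video (WebP bytes).
async function pollVideoResult(jobOfferId, apiKey, {
  fetchFn = fetch,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  intervalMs = 30_000,     // illustrative: check every 30 seconds
  timeoutMs = 15 * 60_000, // illustrative: give up after 15 minutes
} = {}) {
  const url = `https://anura-testnet.lilypad.tech/api/v1/video/results/${encodeURIComponent(jobOfferId)}`;
  let waitedMs = 0;
  for (;;) {
    const response = await fetchFn(url, {
      method: "GET",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    });
    const contentType = response.headers.get("content-type") || "";
    if (contentType.includes("application/json")) {
      const body = await response.json();
      if (body.status !== 102) {
        throw new Error(`Video job did not complete: ${body.status} ${body.message}`);
      }
      // Still processing (e.g. job_state "DealAgreed"); wait unless out of time.
      if (waitedMs >= timeoutMs) {
        throw new Error(`Timed out waiting for video job ${jobOfferId}`);
      }
      await sleep(intervalMs);
      waitedMs += intervalMs;
      continue;
    }
    if (!response.ok) {
      throw new Error(`Unexpected HTTP ${response.status} from video results endpoint`);
    }
    return Buffer.from(await response.arrayBuffer()); // WebP bytes
  }
}
```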
The Anura API provides web search, a powerful addition to your AI agent toolkit. LLMs are only as good as their training data, and they become far more useful when given additional context from the web. With web search, you can feed live results into your AI agent workflow so your LLM has the most up-to-date information on current events.
To search the web through the Anura API, use the following endpoint:
POST /api/v1/websearch
Request Headers
Content-Type: application/json *
Authorization: Bearer YOUR_API_KEY *
Request Parameters
query * (string): The web search query you wish to execute
number_of_results * (number): The number of search results you want returned (from 1 to 10 inclusive)
Request Sample
{
"query": "What's the Lilypad Network?",
"number_of_results" : 3
}
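The following sketch calls this endpoint and enforces the documented 1 to 10 range for `number_of_results` on the client side. It assumes Node.js 18+ for the global `fetch`. It also assumes the value must be an integer, which the docs imply but do not state. `webSearch` is a hypothetical helper, and `fetchFn` is injectable for testing.

```javascript
// Hypothetical helper for POST /api/v1/websearch.
async function webSearch(query, numberOfResults, apiKey, fetchFn = fetch) {
  if (!Number.isInteger(numberOfResults) || numberOfResults < 1 || numberOfResults > 10) {
    throw new RangeError("number_of_results must be an integer from 1 to 10");
  }
  const response = await fetchFn("https://anura-testnet.lilypad.tech/api/v1/websearch", {
    method: "POST",
    headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
    body: JSON.stringify({ query, number_of_results: numberOfResults }),
  });
  if (!response.ok) {
    throw new Error(`Web search failed with HTTP ${response.status}: ${await response.text()}`);
  }
  return response.json(); // { results, related_queries, count }
}
```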
Response Sample
The response includes the following fields:
results: An array of search results, where each result object consists of the strings title, url and description
related_queries: An array of strings containing similar queries based on the one you supplied
count: The number of search results returned
{
"results": [
{
"title": "Lilypad Network",
"url": "https://lilypad.tech",
"description": "Lilypad Network Lilypad offers a seamless and efficient way to access the computing power you need for AI and other demanding tasks—no need to invest in expensive hardware or navigate complex cloud setups. Simply submit your job; our decentralized network connects you with the best available resources. Benefit from competitive pricing, secure ..."
},
{
"title": "Lilypad Network - internet-scale off-chain distributed compute solution",
"url": "https://blog.lilypadnetwork.org",
"description": "Verifiable, truly internet-scale distributed compute network Efficient off-chain computation for AI & ML DataDAO computing The next frontier of web3. Follow. ... Check out the docs https://docs.lilypad.tech/lilypad! Lilypad Builder-verse! Devlin Rocha. 4 min read. Fuel the Future by Building on Lilypad and Accelerate Open Source AI. Alex Mirran."
},
{
"title": "What is the Lilypad Decentralized Compute Network?",
"url": "https://blog.lilypadnetwork.org/what-is-the-lilypad-decentralized-compute-network",
"description": "Lilypad democratizes AI high-performance computing, offering affordable, scalable solutions for researchers and startups. Follow. Follow. What is the Lilypad Decentralized Compute Network? A Crowdsourced Network for HPC Tasks. Lindsay Walker"
}
],
"related_queries": [
"Lilypad Tech",
"LilyPad github",
"Lilypad website",
"Lilypad AI",
"Lily pad Minecraft server",
"Lilypad crypto",
"LilyPad Arduino"
],
"count": 3
}
Response Codes
200 OK: Request successful
400 Bad Request: Invalid request parameters
401 Unauthorized: Invalid or missing API key
404 Not Found: Requested model not found
500 Internal Server Error: Server error processing request
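To give an LLM this live context, one option is to fold the search results into a system message for a subsequent chat completions request. The prompt wording and the `buildSearchContextMessage` name below are illustrative assumptions, not an Anura feature.

```javascript
// Hypothetical helper: turn a /api/v1/websearch response into a system message
// that can be prepended to the "messages" array of a chat completions request.
function buildSearchContextMessage(searchResponse) {
  const lines = searchResponse.results.map(
    (r, i) => `[${i + 1}] ${r.title} (${r.url})\n${r.description}`
  );
  return {
    role: "system",
    content:
      "Use the following web search results to answer the user's question. " +
      "Cite sources by their number.\n\n" +
      lines.join("\n\n"),
  };
}
```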
GET /api/v1/jobs/:id - Get the status and details of a specific job
You can use another terminal to check the job status while the job is running.
curl -X GET "https://anura-testnet.lilypad.tech/api/v1/jobs/{job_id}" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your_api_key_here"
Once your job has run, you should get output like this:
data: {
"id": "cmpl-e654be2df70700d27c155d4d",
"object": "text_completion",
"created": 1738614839,
"model": "llama2",
"choices": [{
"text": "<output text here>",
"finish_reason": "stop"
}]
}
POST /api/v1/cowsay - Create a new cowsay job
Request body: {"message": "text to display"}
GET /api/v1/cowsay/:id/results - Get the results of a cowsay job