
Inference API

Anura, Lilypad's official AI inference API

Getting Started

Use Anura to start running AI inference job modules on Lilypad's decentralized compute network:

  1. Get an API key from the Anura website.

NEW! See All Models Available (GraphQL)

You can run the following endpoints to view all Anura-supported models on the Lilypad network. These queries are available as GraphQL queries here. You can also view all available queries by opening Apollo Server and entering this URL:

https://lilypad-model-api.vercel.app/api/graphql

Curl Requests

Get all Models:

curl -X POST https://lilypad-model-api.vercel.app/api/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ allModels { id name category } }"}'

Response example:

{
  "data": {
    "allModels": [
      {
        "id": "llama3.1:8b",
        "name": "Llama 3.1 8B",
        "category": "text-generation"
      },
      {
        "id": "sdxl-turbo",
        "name": "SDXL Turbo", 
        "category": "image-generation"
      }
    ]
  }
}

You can also fetch a selection of multiple model types using the following endpoint:
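
The dedicated query for this is not reproduced here. As a rough client-side alternative (a sketch that assumes jq is installed), you can fetch allModels and filter by category:

curl -s -X POST https://lilypad-model-api.vercel.app/api/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ allModels { id name category } }"}' \
  | jq '.data.allModels | map(select(.category == "text-generation" or .category == "image-generation"))'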

Get Started with Text Generation

  1. Find which models we support (see the model listings above and the Get Available Models section below).

  2. Choose a model, customize your request, and fire away, as sketched below:
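
A minimal sketch of a non-streaming chat completion request. ANURA_BASE_URL and YOUR_API_KEY are placeholders; set them to the API host and key from your Anura account:

curl -X POST "$ANURA_BASE_URL/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is Lilypad?"}
    ]
  }'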

Get Started with Image Generation

  1. Find which models we support (see GET /api/v1/image/models below).

  2. Choose a model and generate your first image, as sketched below:
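
A minimal sketch of an image generation request (ANURA_BASE_URL and YOUR_API_KEY are placeholders; the prompt and output filename are illustrative):

curl -X POST "$ANURA_BASE_URL/api/v1/image/generate" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "sdxl-turbo", "prompt": "A frog sitting on a lilypad at sunset"}' \
  --output my-image.png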

API Endpoints

API Clients

If you are using an API client such as Bruno or Postman, you can use our provided collections below.

Bruno collection

Rate limits

Currently, the rate limit for the API is set to 20 calls per second.

Get Available Models

To see which models are available:
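
The original example is not reproduced here. As a sketch, assuming a models listing endpoint at GET /api/v1/models (verify the exact path against the Bruno collection above; ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X GET "$ANURA_BASE_URL/api/v1/models" \
  -H "Authorization: Bearer YOUR_API_KEY"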

Chat Completions API

* = Required

Chat Completions

POST /api/v1/chat/completions

Note: Due to the decentralized nature of the Lilypad Network, we recommend using the streaming variant where possible at this time.

This endpoint provides both a streaming interface using Server-Sent Events (SSE) and a non-streaming interface for chat completions, and it is compliant with the OpenAI specification. This means you can plug and play Anura with the OpenAI SDK by simply passing the Anura URL and your API key into your client.

Request Headers

  • Content-Type: application/json*

  • Accept: text/event-stream (recommended for streaming)

  • Authorization: Bearer YOUR_API_KEY*

Request Parameters

  • model* (string): Model ID used to generate the response (e.g. deepseek-r1:7b). Required.

  • messages* (array): A list of messages comprising the conversation so far. Required.

Optional Parameters and Default Values
  • frequency_penalty (default: 0): Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

  • max_tokens: The maximum number of tokens that can be generated in the chat completion.

  • presence_penalty (default: 0): Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

  • response_format: An object specifying the format that the model must output. Learn more.

  • seed (default: null): If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result. Determinism is not guaranteed, and you should refer to the system_fingerprint response parameter to monitor changes in the backend.

  • stop: Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.

  • stream (default: false): If set to true, the model response data will be streamed to the client as it is generated using server-sent events.

  • stream_options (default: null): Options for the streaming response. Only set this when you set stream: true.

  • temperature (default: 1): What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic. We generally recommend altering this or top_p but not both.

  • tools: A list of tools the model may call. Currently, only functions are supported as a tool. Use this to provide a list of functions the model may generate JSON inputs for. A maximum of 128 functions is supported. At the moment only a select number of models support tooling, including: llama3.1:8b, qwen2.5:7b, qwen2.5-coder:7b, phi4-mini:3.8b, and mistral:7b.

  • top_p (default: 1): An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered. We generally recommend altering this or temperature but not both.

Request Body (non-streaming)
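
The original sample is not reproduced here; a minimal sketch of the OpenAI-compatible body (the model and messages are illustrative):

{
  "model": "llama3.1:8b",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a haiku about lilypads."}
  ],
  "stream": false
}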

Response Format (non-streaming)

The response is an OpenAI ChatCompletion Object with the following format:
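
A representative sketch based on the fields documented below (values are illustrative):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "llama3.1:8b",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "Lilypads float quietly..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 12,
    "total_tokens": 37
  }
}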

Response Codes

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Response Object Fields

The response data contains the following fields:

  • id: A unique identifier for the chat completion

  • object: The object type

  • created: Timestamp when the response was created

  • model: The model used for generation

  • choices: The array containing the assistant's response

  • choices[0].message.role: Always "assistant" for responses

  • choices[0].message.content: The generated text content

  • choices[0].message.tool_calls: The array containing the corresponding tool response objects (only applicable if you make a tool request)

  • choices[0].finish_reason: Reason for completion (e.g., "stop", "length")

  • usage.prompt_tokens: The number of tokens used in the prompt

  • usage.completion_tokens: The number of tokens in the generated completion

  • usage.total_tokens: The sum of the prompt_tokens and the completion_tokens

Request Body (streaming)
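
The original sample is not reproduced here; a minimal sketch, identical to the non-streaming body except that stream is set to true:

{
  "model": "llama3.1:8b",
  "messages": [
    {"role": "user", "content": "Write a haiku about lilypads."}
  ],
  "stream": true
}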

Response Format (streaming)

The response is a stream of Server-Sent Events (SSE) containing chunked OpenAI ChatCompletion objects in the following format:

Initial response:

Processing updates:

Content delivery:

Completion marker:
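
The original per-stage samples are not reproduced here. As a hedged sketch based on the delta fields documented below, content-delivery chunks and the completion marker look roughly like this (values are illustrative):

data: {"id":"chatcmpl-...","object":"chat.completion.chunk","created":1740000000,"model":"llama3.1:8b","choices":[{"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: [DONE]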

Response Codes

  • 200 OK: Request successful, stream begins

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Response Object Fields

The delta event data contains the following fields:

  • id: A unique identifier for the chat completion

  • object: The object type

  • created: Timestamp when the response was created

  • model: The model used for generation

  • choices: The array containing the assistant's response

  • choices[0].delta.role: Always "assistant" for responses

  • choices[0].delta.content: The generated text content

  • choices[0].delta.tool_calls: The array containing the corresponding tool response objects (only applicable if you make a tool request)

  • choices[0].finish_reason: Reason for completion (e.g., "stop", "length")

  • usage.prompt_tokens: The number of tokens used in the prompt

  • usage.completion_tokens: The number of tokens in the generated completion

  • usage.total_tokens: The sum of the prompt_tokens and the completion_tokens

Conversation Context

The API supports multi-turn conversations by including previous messages in the request:
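
A minimal sketch of a multi-turn request body (the conversation content is illustrative):

{
  "model": "llama3.1:8b",
  "messages": [
    {"role": "user", "content": "What is the capital of France?"},
    {"role": "assistant", "content": "The capital of France is Paris."},
    {"role": "user", "content": "What is its population?"}
  ]
}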

This allows for contextual follow-up questions and maintaining conversation history.

Tooling calls

The Anura chat completions endpoint supports requests with tooling, allowing for function calling through many popular AI frameworks and SDKs.

At the moment only a select number of models support tooling, including:

  • llama3.1:8b

  • qwen2.5:7b

  • qwen2.5-coder:7b

  • phi4-mini:3.8b

  • mistral:7b

Below is a sample request and response

Request:
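
The original sample is not reproduced here; a sketch assuming the OpenAI-style tools format (ANURA_BASE_URL, YOUR_API_KEY, and the get_weather function are placeholders):

curl -X POST "$ANURA_BASE_URL/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llama3.1:8b",
    "messages": [{"role": "user", "content": "What is the weather in Tokyo?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'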

Response:
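
A hedged sketch of the corresponding response, based on the choices[0].message.tool_calls field documented above (values are illustrative):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "llama3.1:8b",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
          {
            "id": "call_...",
            "type": "function",
            "function": {"name": "get_weather", "arguments": "{\"city\": \"Tokyo\"}"}
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}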

Vision Support

The chat completions API also supports vision requests, allowing image-to-text queries against a base64 encoded image. This lets you ask an LLM what an image shows or about particular details within it. Currently vision is only supported via the following models (more coming soon):

  • llava:7b

  • gemma3:4b

Additionally, the vision capability is limited by the following constraints:

  • Images must only be base64 encoded (you cannot pass a link to an image at this time)

  • Maximum image size is 512px x 512px

  • Support for JPEG or PNG format

Request:
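
The original sample is not reproduced here; a rough sketch assuming the OpenAI-style content-parts format with a base64 data URI (BASE64_IMAGE_DATA, ANURA_BASE_URL, and YOUR_API_KEY are placeholders; check the API reference for the exact shape):

curl -X POST "$ANURA_BASE_URL/api/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "llava:7b",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,BASE64_IMAGE_DATA"}}
      ]
    }]
  }'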

Response:
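
A hedged sketch of the response (a standard chat completion object; the content is illustrative):

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "llava:7b",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "The image shows a frog resting on a lilypad in a pond."
      },
      "finish_reason": "stop"
    }
  ]
}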

Embeddings

Use the embeddings endpoint to compute embeddings for user queries with the nomic-embed-text model. This endpoint is OpenAI compliant, which means you can use it with the OpenAI SDK (see the end of the Embeddings section for a code example).

Endpoint

POST /api/v1/embeddings

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Parameters

  • model* (string): Model ID used to generate the response (e.g. nomic-embed-text). Required.

  • input* (string or array of strings): The input to create embeddings from. This can be either a single string or an array of strings. Required.

Request Sample (single input)
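
A minimal sketch (ANURA_BASE_URL and YOUR_API_KEY are placeholders; the input text is illustrative):

curl -X POST "$ANURA_BASE_URL/api/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "nomic-embed-text", "input": "What is the Lilypad Network?"}'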

Response Sample (single input)
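
A hedged sketch of the OpenAI-style response shape (the embedding vector is truncated and values are illustrative):

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0123, -0.0456, 0.0789]
    }
  ],
  "model": "nomic-embed-text",
  "usage": {"prompt_tokens": 6, "total_tokens": 6}
}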

Request Sample (multiple input)
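
The same request with an array of inputs (a sketch; the inputs are illustrative):

curl -X POST "$ANURA_BASE_URL/api/v1/embeddings" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "nomic-embed-text", "input": ["What is the Lilypad Network?", "How does decentralized compute work?"]}'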

Response Sample (multiple input)

Response Codes

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Example Code using the OpenAI SDK

Image Generation

The Anura API enables you to run stable diffusion jobs that generate images on our decentralized compute network. It's easy to get started creating your own generative AI art using the endpoints we provide.

Retrieve the list of supported image generation models

GET /api/v1/image/models

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Parameters

Parameter
Description
Type

model*

Model ID used to generate the response (e.g. sdxl-turbo). Required.

string

prompt*

The prompt input to generate your image from (max limit of 1000 characters)

string

Request Sample
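
A minimal sketch (ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X GET "$ANURA_BASE_URL/api/v1/image/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY"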

Response

Response Codes

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Currently we support sdxl-turbo; however, we are always adding new models, so stay tuned!

Generate an AI Image

POST /api/v1/image/generate

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Parameters

Parameter
Description
Type

model*

Model ID used to generate the response (e.g. sdxl-turbo). Required.

string

prompt*

The prompt input to generate your image from (max limit of 1000 characters)

string

Request Sample

Alternatively, you can make the same request with a curl command and have the image written to a file on your machine, as sketched below.
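
A hedged sketch (the prompt text is illustrative; ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X POST "$ANURA_BASE_URL/api/v1/image/generate" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "sdxl-turbo", "prompt": "A spaceship parked on a giant lilypad"}' \
  --output spaceship.png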

The result of running this command will be the creation of the spaceship.png file in the directory you ran the command from.

Response

This endpoint returns the raw bytes of the generated image, which you can write to a file (as shown in the curl command above) or load into a buffer to write to a file in your app.

Note: Should you ever need the corresponding Job Offer ID for an image generation request, it is provided in the response header as Job-Offer-Id.

Response Codes

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Video Generation

The Anura API enables you to run long-running jobs that generate videos on our decentralized compute network. It's easy to get started generating your own videos using the endpoints we provide.

Note: Video generation can take anywhere from 4 to 8 minutes to produce a video.

Retrieve the list of supported video generation models

GET /api/v1/video/models

Currently we support wan2.1; however, we are always adding new models, so stay tuned!

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Sample
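
A minimal sketch (ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X GET "$ANURA_BASE_URL/api/v1/video/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY"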

Response

Response Codes

  • 200 OK: Request successful

  • 401 Unauthorized: Invalid or missing API key

  • 500 Internal Server Error: Server error processing request

Send out a request to create an AI generated video

POST /api/v1/video/create-job

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Parameters

  • model* (string): Model used to generate the response (e.g. wan2.1). Required.

  • prompt* (string): The prompt input to generate your video from (max limit of 1000 characters). Required.

  • negative_prompt (string): An optional field specifying what the model should exclude from the generated scene.

Request Sample
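
A minimal sketch (the prompt and negative_prompt are illustrative; ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X POST "$ANURA_BASE_URL/api/v1/video/create-job" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "wan2.1",
    "prompt": "Two frogs sit on a lilypad, animatedly discussing AI agents",
    "negative_prompt": "blurry, low quality"
  }'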

Response
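
A sketch of the response shape, based on the job_offer_id described below (the id value is a placeholder):

{
  "job_offer_id": "YOUR_JOB_OFFER_ID"
}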

This endpoint returns a job_offer_id, a unique identifier for the job that is creating your video. Pass this id into our /video/results endpoint (see below), which will either provide the output as a webp file or report that the job is still running. In the latter case, you can call the endpoint again later to retrieve your video. As mentioned at the beginning of this section, video generation can take anywhere from 4 to 8 minutes to complete.

Response Codes

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Retrieve your video

GET /api/v1/video/results/:job_offer_id

  • job_offer_id* (string): The id returned to you by the video creation request (/api/v1/video/create-job). Required.

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Response

If the video is still in the process of being generated you will see a response that looks like the following:

Response Codes

  • 102 Processing: Request is still processing the creation of the video

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 500 Internal Server Error: Server error processing request

However, once the video has been generated, the response contains the raw bytes of the video in webp format, which you can save to a file in the following manner:
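
A minimal sketch (YOUR_JOB_OFFER_ID is the id returned by the create-job request; ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X GET "$ANURA_BASE_URL/api/v1/video/results/YOUR_JOB_OFFER_ID" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --output video.webp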

The result of the above command will be a video.webp file saved in the directory you ran it from:

Two frogs sit on a lilypad, animatedly discussing the wonders and quirks of AI agents. As they ponder whether these digital beings can truly understand their froggy lives, the serene pond serves as a backdrop to their lively conversation.

Audio Generation

The Anura API enables you to generate audio from text on our decentralized compute network. It's easy to get started generating your own audio using the endpoints we provide.

Note: Audio generation can take anywhere from 40 seconds to 3 minutes to complete, depending on the input length.

Retrieve the list of supported audio generation models

GET /api/v1/audio/models

Currently we support kokoro; however, we are always adding new models, so stay tuned!

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Sample
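
A minimal sketch (ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X GET "$ANURA_BASE_URL/api/v1/audio/models" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY"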

Response

Response Codes

  • 200 OK: Request successful

  • 401 Unauthorized: Invalid or missing API key

  • 500 Internal Server Error: Server error processing request

Send out a request to create AI-generated audio

POST /api/v1/audio/create-job

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Parameters

  • model* (string): Model used to generate the response (e.g. kokoro). Required.

  • input* (string): The prompt input to generate your audio from (max limit of 420 characters). Required.

  • voice* (string): The voice to use when generating the audio sample. Possible values are heart, puck, fenrir, and bella. Required.

Voice samples

Sample audio files are provided for each voice: Heart, Puck, Fenrir, and Bella.

Request Sample
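
A minimal sketch (the input text is illustrative; ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X POST "$ANURA_BASE_URL/api/v1/audio/create-job" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"model": "kokoro", "input": "Hello from the Lilypad Network!", "voice": "heart"}'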

Response

This endpoint returns a job_offer_id, a unique identifier for the job that is creating your audio. Pass this id into our /audio/results endpoint (see below), which will either provide the output as a wav file or report that the job is still running. In the latter case, you can call the endpoint again later to retrieve your audio. As mentioned at the beginning of this section, audio generation can take anywhere from 40 seconds to 3 minutes to complete.

Response Codes

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Retrieve your audio

GET /api/v1/audio/results/:job_offer_id

  • job_offer_id* (string): The id returned to you by the audio creation request (/api/v1/audio/create-job). Required.

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Response

If the audio is still in the process of being generated you will see a response that looks like the following:

Response Codes

  • 102 Processing: Request is still processing the creation of the audio

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 500 Internal Server Error: Server error processing request

However, once the audio has been generated, the response contains the raw bytes of the audio in wav format, which you can save to a file in the following manner:
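
A minimal sketch (YOUR_JOB_OFFER_ID is the id returned by the create-job request; the output filename is illustrative):

curl -X GET "$ANURA_BASE_URL/api/v1/audio/results/YOUR_JOB_OFFER_ID" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  --output output.wav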

Web Search

The Anura API provides developers with a web search capability, adding a powerful tool to your AI agent building arsenal. LLMs are only as good as their training data and are taken to the next level when provided with additional context from the web. With web search you can power your AI agent workflow with live search data, giving your LLM the most up-to-date information on the latest goings-on in the world.

It's easy to get started searching the web through the Anura API using our endpoint:

POST /api/v1/websearch

Request Headers

  • Content-Type: application/json*

  • Authorization: Bearer YOUR_API_KEY*

Request Parameters

  • query* (string): The web search query you wish to execute.

  • number_of_results* (number): The number of search results you want returned (limited to 1 to 10 inclusive).

Request Sample
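
A minimal sketch (the query is illustrative; ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X POST "$ANURA_BASE_URL/api/v1/websearch" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"query": "What is the Lilypad Network?", "number_of_results": 3}'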

Response Sample
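
A hedged sketch based on the fields documented below (values are illustrative and truncated):

{
  "results": [
    {
      "title": "Lilypad Network",
      "url": "https://lilypad.tech",
      "description": "Lilypad is a decentralized compute network..."
    }
  ],
  "related_queries": ["lilypad decentralized compute"],
  "count": 1
}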

The response will include the following fields:

  • results: The array of search results, where each result object is made up of the strings title, url and description.

  • related_queries: An array of strings containing similar queries based on the one you supplied.

  • count: The number of search results returned.

Response Codes

  • 200 OK: Request successful

  • 400 Bad Request: Invalid request parameters

  • 401 Unauthorized: Invalid or missing API key

  • 404 Not Found: Requested model not found

  • 500 Internal Server Error: Server error processing request

Jobs

  • GET /api/v1/jobs/:id - Get status and details of a specific job

Get Status/Details of a Job

You can use another terminal to check job status while the job is running.
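
A minimal sketch (YOUR_JOB_ID, ANURA_BASE_URL, and YOUR_API_KEY are placeholders):

curl -X GET "$ANURA_BASE_URL/api/v1/jobs/YOUR_JOB_ID" \
  -H "Authorization: Bearer YOUR_API_KEY"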

Get Outputs from a Job

Once your job has run, you should get output like this:

Cowsay

  • POST /api/v1/cowsay - Create a new cowsay job

    • Request body: {"message": "text to display"}

  • GET /api/v1/cowsay/:id/results - Get results of a cowsay job
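
A hedged sketch of calling these endpoints (assuming the POST response returns a job id to use in the results call; ANURA_BASE_URL and YOUR_API_KEY are placeholders):

curl -X POST "$ANURA_BASE_URL/api/v1/cowsay" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{"message": "moo"}'

curl -X GET "$ANURA_BASE_URL/api/v1/cowsay/YOUR_JOB_ID/results" \
  -H "Authorization: Bearer YOUR_API_KEY"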
