Gen AI

Description

Use the GenAI step to integrate Large Language Models (LLMs), such as OpenAI and Google AI, directly into your workflow. You can send custom prompts and file attachments to the LLM directly or route them securely through the AutomationEdge Gateway. The step processes your requests and returns the model's response as standard text or structured JSON data for use in subsequent workflow steps.

Worked Example – Invoice Field Extraction:

Goal: Extract the invoice number, date, total amount, and line items from an invoice, and obtain the bounding box coordinates for each extracted field.

Input:

  1. On the LLM Settings tab, select OpenAI. Enter your endpoint (https://api.openai.com/v1/chat/completions) and your key (${OPENAI_KEY}).
  2. Set the Model to gpt-4o, Temperature to 0.0, Max Tokens to 4000, Top P to 0.9, and Request Timeout to 60.
  3. On the Input tab, enter the following System Prompt: "You are an expert invoice parser. Return only JSON that matches the provided schema."
  4. Enter the following User Prompt: "Extract the fields invoice_number, invoice_date, total_amount, and line_items (with description, quantity, and unit_price for each item) from the attached document."
  5. Set the Input File Or Directory to ${invoicePath} and the Max File Count Limit to 1.
    Note: OpenAI accepts JPG, JPEG, and PNG files only. If you are processing a PDF and are not using Gemini, you must render the PDF to an image first.
  6. Enable Structured Output Definition and add the following rows to the schema table:

Invoice Schema

Field Name | Type | Parent | Description
invoice_number | string | | Invoice identifier as printed on the document.
invoice_date | string | | Invoice date in ISO 8601 format (YYYY-MM-DD).
total_amount | number | | Grand total amount with no currency symbol.
line_items | json array | | All invoice line items.
description | string | line_items | Item description text.
quantity | number | line_items | Quantity as a number.
unit_price | number | line_items | Unit price with no currency symbol.
  7. On the Output Fields tab, set the Response Field to LLMResponse.
  8. Select Include Metadata Fields In Output to receive the bounding box data.
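Outside AutomationEdge, the worked-example settings roughly correspond to one Chat Completions request. The sketch below is an illustration only, following the public OpenAI Chat Completions shape with an inline base64 image; `build_invoice_request` and the placeholder image bytes are hypothetical, and the step's actual request template may differ in detail.

```python
import base64

# Hedged sketch of the payload implied by the worked-example settings.
# Follows the public OpenAI Chat Completions shape with a base64 image.
def build_invoice_request(image_bytes: bytes) -> dict:
    data_url = "data:image/png;base64," + base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": "gpt-4o",
        "temperature": 0.0,
        "max_tokens": 4000,
        "top_p": 0.9,
        "messages": [
            {
                "role": "system",
                "content": "You are an expert invoice parser. "
                           "Return only JSON that matches the provided schema.",
            },
            {
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": "Extract the invoice fields from the attached document."},
                    {"type": "image_url",
                     "image_url": {"url": data_url, "detail": "auto"}},
                ],
            },
        ],
    }

payload = build_invoice_request(b"placeholder-image-bytes")  # not a real PNG
print(payload["model"], payload["temperature"])
```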

What You Can See:

After a successful execution, the step generates the following outgoing row columns:

  • LLMResponse (String): The complete JSON response from the model.
  • invoice_number, invoice_date, total_amount (String): The promoted top-level extracted fields.
  • invoice_number_metadata, invoice_date_metadata, total_amount_metadata (String): JSON objects containing the spatial data for each field, including page_number, coordinates (an 8-point polygon), page_width, page_height, confidence, and value.
  • line_items (String): The JSON array containing the line items. Each item contains the description, quantity, and unit price, along with a row_metadata object detailing the per-field bounding boxes.
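A downstream workflow step typically has to decode these columns before using them, because line_items arrives as a JSON string. A minimal sketch with made-up invoice values:

```python
import json

# Hypothetical outgoing row from the step (all values are illustrative).
row = {
    "invoice_number": "INV-1042",
    "total_amount": "51903.10",
    "line_items": json.dumps([
        {"description": "Consulting", "quantity": 2, "unit_price": 1500.0},
        {"description": "Support plan", "quantity": 1, "unit_price": 300.0},
    ]),
}

# line_items is a JSON string column, so decode it before iterating.
items = json.loads(row["line_items"])
subtotal = sum(i["quantity"] * i["unit_price"] for i in items)
print(subtotal)  # 3300.0
```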

Configuration

Step Name: Specify the unique step name for the workflow.
LLM Settings:
Connection Details:
LLM Provider: Select the provider from the list:
· OpenAI
· Azure OpenAI
· Google AI
· Google Vertex AI.
Note: Changing the provider automatically updates the underlying API request template. It also disables the Model field for Azure and Google.
Default value: OpenAI
The field is mandatory.
API EndPoint: Specify the full API URL for the direct provider connection. Use workflow variables (for example, ${LLM_URL}) to manage endpoints dynamically and prevent security warnings for static URLs.
The field is mandatory.
API/LLM Key: Specify the authentication secret. Use workflow variables to store keys securely and avoid static credential warnings.
If the Accept Value as variable/static checkbox is selected, the password field appears as a text box and accepts static or variable values. If it is not selected, the password field appears as a dropdown from which you can select a field from a previous step.
The field is mandatory.
Request timeout (seconds): Specify a positive integer to set the maximum time, in seconds, the system waits for each outbound API call to respond.
Default value: 60
The field is mandatory.
(Button) Test Connection: Click to send a test prompt to the configured endpoint. This action verifies your credentials, URL accuracy, and network connectivity before saving the step.
AE Gateway Connection:
Use AE Gateway: Select to route requests through the AE Gateway. Enabling this option ignores the API/LLM Key and activates the Gateway fields.
Notes:
· If selected, the Token Key field is disabled.
· Supports Google Vertex AI and Azure OpenAI LLM providers.
Gateway Endpoint: Specify the AE Gateway URL provided by your administrator. The field accepts variable/static values.
Gateway Token: Specify the token issued by the AE Gateway. The system automatically injects this token as the Authorization header in the API request.
(Button) Test Connection: Click to verify that the connection can be established using the provided credentials and connection details.
Model Configuration: Define the parameters that control the model's behavior, response length, and creativity.
Model: Specify the model ID recognized by the provider (for example, gpt-4o-mini).
Note: The field is available only when the LLM provider is OpenAI; it is disabled for the Azure and Google providers.
Default value: gpt-4o-mini
The field is mandatory.
Temperature: Specify a numeric value between 0 and 2 to control output randomness. Use lower values (0-0.3) for deterministic, repeatable output during data extraction. Use higher values to increase variance.
Default value: 0.4
The field is mandatory.
Max Tokens: Specify a positive integer to set the maximum number of tokens returned by the LLM. Ensure this value is high enough to accommodate your requested output fields; low limits truncate the JSON response and cause parsing errors.
Default value: 2000
The field is mandatory.
Top P: Specify a value between 0.1 and 1 to control vocabulary diversity during text generation.
Default value: 0.9
The field is mandatory.
Thinking Budget: Specify a positive integer to allocate tokens for the model's internal reasoning process before it generates the final response.
Note: The field is available only if the LLM provider is set to Google Vertex AI or Google AI.
Default value: 300
Request Headers: Use this section to add custom HTTP headers to every API call.
Header Key: Specify a unique HTTP header key (for example, api-version, api-key, or x-tenant-id).
Important: Do not add an Authorization key if you use AE Gateway mode. The system automatically sets the Authorization: Bearer <gatewayToken> header. Adding it manually causes a duplicate-header error.
Note: The header key must not be null or empty, and duplicate header keys are rejected.
Header Value: Specify the corresponding value for the header key.
Note: The header value must not be null or empty.
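The header rules above (non-empty key and value, no duplicates, Authorization reserved for Gateway mode) can be sketched as a small validator. `add_header` is a hypothetical helper for illustration, not part of the product:

```python
# Hypothetical validator mirroring the documented header rules.
def add_header(headers: dict, key: str, value: str) -> None:
    if not key or not value:
        raise ValueError("Header key and value must not be null/empty")
    if key in headers:
        raise ValueError(f"Duplicate header key: {key}")
    headers[key] = value

headers = {}
add_header(headers, "x-tenant-id", "acme")  # example key/value

try:
    add_header(headers, "x-tenant-id", "other")  # duplicate key is rejected
except ValueError as e:
    print(e)  # Duplicate header key: x-tenant-id
```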
Input tab: Use the Input tab to define the AI's role, provide the primary task instructions, and supply training examples to guide the model's output. The tab contains three sections: Prompts, Few-Shot Examples, and File Attachments.
Variable Support: All text fields in this tab support dynamic data. Use ${variableName} to insert workflow variables, or ?{fieldName} to reference data from a previous step.
Input File Or Directory: Specify the path to a single file or a directory. Use workflow variables (for example, ${invoicePath}) to provide dynamic paths and prevent static file warnings.
Note: If you provide a directory, the system sorts the files by filename and attaches them up to the defined Max File Count Limit. If the path does not exist, the step fails.
Max File Count Limit: Specify a positive integer to set the maximum number of files the system processes from a directory.
Default value: 1
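The documented file-selection behavior (a file path is used as-is; a directory is sorted by filename and truncated to the limit; a missing path fails the step) can be sketched as follows. `select_attachments` is an illustrative helper, not the step's internal API:

```python
import tempfile
from pathlib import Path

def select_attachments(path: str, max_files: int = 1) -> list[Path]:
    """Mimic the documented behavior: a file is used as-is; a directory
    is sorted by filename and truncated to max_files."""
    p = Path(path)
    if not p.exists():
        raise FileNotFoundError(path)  # the step fails on a missing path
    if p.is_file():
        return [p]
    return sorted(p.iterdir(), key=lambda f: f.name)[:max_files]

# Demo with a throwaway directory of empty files.
with tempfile.TemporaryDirectory() as d:
    for name in ("b.png", "a.png", "c.png"):
        (Path(d) / name).touch()
    chosen = [f.name for f in select_attachments(d, max_files=2)]
print(chosen)  # ['a.png', 'b.png']
```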
Image Details: Select the resolution for image analysis. High resolution provides fine-grained analysis but consumes more tokens; low resolution reduces token cost.
· Low
· High
· Auto
Default value: Auto
Note: The field is available only for the OpenAI and Azure OpenAI LLM providers.
Prompts: Configure the instructions and contextual examples sent to the LLM.
System Prompt: Specify instructions that define the model's role, tone, and boundaries (for example, "You are an expert invoice parser. Return only JSON.").
Note: The field is required if your API Request Body template includes the #{system_prompt} placeholder.
The field is mandatory.
User Prompt: Specify the main task or question for the model to execute (for example, "Extract the fields from the attached invoice.").
Note: If you type a single word without brackets, the system automatically wraps it as a previous-step field reference (for example, ?{word}).
The field is mandatory.
Few-Shot Examples (Optional): Add pairs of user inputs and expected model responses to train the LLM on your preferred output format. The system sends these pairs as a conversation history to guide the model's behavior.
The system always appends your primary User Prompt and any file attachments after these examples.
Note: If you add a row, you must provide a value in the User Example column. Expected Response is optional.
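The ordering described above (system prompt first, each few-shot pair as user/assistant turns, the primary user prompt appended last) can be sketched as follows. `build_messages` is an illustrative helper, not the step's internal API:

```python
# Illustrative assembly of the conversation the documentation describes.
def build_messages(system_prompt: str, user_prompt: str, few_shot: list) -> list:
    messages = [{"role": "system", "content": system_prompt}]
    for user_example, expected in few_shot:
        messages.append({"role": "user", "content": user_example})
        if expected:  # Expected Response is optional
            messages.append({"role": "assistant", "content": expected})
    # The primary user prompt always comes after the examples.
    messages.append({"role": "user", "content": user_prompt})
    return messages

msgs = build_messages(
    "Return only JSON.",
    "Extract the fields from the attached invoice.",
    [("Invoice 7, total $10", '{"invoice_number": "7", "total_amount": 10}')],
)
print(len(msgs))  # 4
```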
Structured Output tab: Use the Structured Output tab to force the LLM to return data using a strict JSON schema, allowing you to parse the response directly into individual workflow fields instead of receiving a single block of plain text.
Structured Output Definition: Select the checkbox to enable structured output extraction. The system generates a schema from your grid and injects it into the API request. Clear this checkbox to receive free-form text.
Note: If you select this option, you must configure at least one row in the grid.
Field Name: Specify the JSON key name. The system uses this name to generate the workflow column. You must use unique names without spaces. Array fields cannot share a name with any other field.
Type: Select the data type for the field:
· String
· Integer
· Number
· Boolean
· json array
To create a list of nested objects, select json array and add child rows that reference this field in the Parent column.
Parent: Enter the name of the parent row to define nested fields. Leave this blank for top-level fields. The parent field must already exist in the grid and must use the json array type.
Description: Specify plain-English instructions to guide the LLM's extraction. Include the expected format, units, and examples to improve accuracy (for example, "Total amount in USD, numeric, no currency symbol, two decimal places").
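One plausible reading of how the grid rows become a schema is sketched below. The exact schema text the step injects is not specified here, so `grid_to_schema` is an assumption-level illustration using standard JSON Schema conventions:

```python
# Assumption-level sketch: grid rows -> a JSON Schema fragment.
# Rows are (field, type, parent, description); "json array" rows become
# arrays of objects built from their child rows.
def grid_to_schema(rows: list) -> dict:
    type_map = {"string": "string", "integer": "integer",
                "number": "number", "boolean": "boolean"}
    props = {}
    for field, ftype, parent, desc in rows:
        if parent:
            continue  # child rows are folded into their parent below
        if ftype == "json array":
            children = {f: {"type": type_map[t], "description": d}
                        for f, t, p, d in rows if p == field}
            props[field] = {"type": "array", "description": desc,
                            "items": {"type": "object", "properties": children}}
        else:
            props[field] = {"type": type_map[ftype], "description": desc}
    return {"type": "object", "properties": props}

rows = [
    ("invoice_number", "string", "", "Invoice identifier."),
    ("line_items", "json array", "", "All invoice line items."),
    ("quantity", "number", "line_items", "Quantity as a number."),
]
schema = grid_to_schema(rows)
print(schema["properties"]["line_items"]["type"])  # array
```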
API Request Body: Shows the raw JSON payload sent to the LLM. Default placeholders are auto-populated from the LLM Settings and Input tabs. You may add parameters or modify values.
For the default template for each LLM provider, see the section Default Provider Templates.
Supported placeholders:
· #{model_name} – value from Model field.
· #{temperature} – value from Temperature field.
· #{max_tokens} – value from Max Tokens field.
· #{top_p} – value from Top P field.
· #{system_prompt} – value from System Prompt.
· #{user_and_assistant_prompt} – generated from User Prompt + Few-Shot rows.
· #{output_fields} – generated from Structured Output Definition.
· #{thinking_budget} – value from the Thinking Budget field.
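Placeholder substitution of this kind can be sketched as a simple text replacement. `fill_template` is illustrative, not the step's actual implementation:

```python
import re

# Illustrative #{name} substitution; unknown placeholders are left
# intact so they are easy to spot while testing the template.
def fill_template(template: str, values: dict) -> str:
    return re.sub(r"#\{(\w+)\}",
                  lambda m: str(values.get(m.group(1), m.group(0))),
                  template)

template = '{"model": "#{model_name}", "temperature": #{temperature}}'
body = fill_template(template, {"model_name": "gpt-4o", "temperature": 0.0})
print(body)  # {"model": "gpt-4o", "temperature": 0.0}
```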
(Button) Reset Payload: Click to restore the default request body template for the selected provider.
Output Fields tab: Use the Output Fields tab to define how the system maps the LLM's response to your workflow's outgoing data stream.
A read-only grid at the bottom of the tab displays the exact list of columns the step appends to the outgoing workflow row. The system derives these columns directly from your configurations. The grid refreshes automatically to preview your changes whenever you edit the Structured Output Definition or toggle the Include Metadata Fields In Output option.
Response Field: Specify the name of the string column that stores the raw LLM response.
Note: If you enable Structured Output, this column retains the complete, unparsed JSON payload. This allows downstream workflow steps to access any data you did not explicitly map to a specific column.
Default value: OutputText
The field is mandatory.
Include Metadata Fields In Output: Select this checkbox to capture metadata (such as coordinates and confidence level) for each extracted value. The system creates an additional field named <field_name>_metadata to hold the information.
For example:
{
  "page_number": 4,
  "confidence": "High",
  "coordinates": "539,346,602,346,602,356,539,356",
  "value": "$51,903.10",
  "page_width": 841.68,
  "page_height": 595.2
}
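The coordinates string holds eight numbers: the four (x, y) corners of the value's polygon on the page. A downstream consumer might collapse it to an axis-aligned box for highlighting, as in this sketch using the example metadata above:

```python
import json

# The example metadata payload from the documentation.
metadata = json.loads("""{
  "page_number": 4,
  "confidence": "High",
  "coordinates": "539,346,602,346,602,356,539,356",
  "value": "$51,903.10",
  "page_width": 841.68,
  "page_height": 595.2
}""")

# Split the 8-number polygon into x and y lists, then take the
# min/max of each to get an axis-aligned bounding box.
nums = [float(n) for n in metadata["coordinates"].split(",")]
xs, ys = nums[0::2], nums[1::2]
bbox = (min(xs), min(ys), max(xs), max(ys))
print(bbox)  # (539.0, 346.0, 602.0, 356.0)
```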