VertexAI

The VertexAI connector integrates Google Cloud’s powerful large language models into your Aparavi workflow. This documentation helps you understand how to use and configure the VertexAI node effectively. This is typically used for tasks such as reasoning, summarization, content generation, and conversational response.

Key capabilities:

Natural language understanding and generation
Multimodal processing (text, images, audio, video)
Reasoning and problem-solving
Content summarization and transformation
Conversational AI responses

Configuration:

When setting up the VertexAI node, you’ll need to configure several parameters:

Model Selection: Choose the appropriate model variant based on your needs
Project ID: Enter your Google Cloud project ID
Service Account JSON: Provide service account credentials
Region: Specify Google Cloud region

Inputs and Outputs

Input Channels

Prompt: Primary text input for the model
Questions: Structured query inputs
Documents: Content for context (PDFs, images)
System: System-level instructions to guide model behavior

Output Channels

Text: Generated text responses
Answers: Structured response outputs

Supported Model Variants:

Model Family & ID	Input Tokens (Context Window)	Output Token Limit	Input Modalities	Output Modalities	Notes / Status
Gemini 2.5 Pro(`gemini-2.5-pro`)	1,048,576	~65,536	Text, Image, Audio, Video, PDF	Text	Deep think mode, function‑calling (GA) (Google Cloud,Google Cloud)
Gemini 2.5 Flash(`gemini-2.5-flash`)	1,048,576	~65,536	Multimodal	Text	High throughput & cost‑effective (GA) (Google Cloud)
Gemini 2.5 Flash‑Lite(`gemini-2.5-flash-lite`)	1,048,576	~65,536	Multimodal	Text	Preview; fastest variant in 2.5 family (Google Cloud)
Gemini 2.0 Flash(`gemini-2.0-flash`)	1,048,576	~8,192	Multimodal	Text	GA; legacy regularly supported (Google Cloud)
Gemini 2.0 Flash‑Lite(`gemini-2.0-flash-lite`)	1,048,576	~8,192	Multimodal	Text	Cost‑efficient variant (GA) (Google Cloud)
Claude 3.5 Sonnet(`claude‑3‑5‑sonnet`)	~200,000	~8,000	Text, image (PDF via upload)	Text	Partner model via Model Garden; requires enabled access (Google Cloud)
Gemma 3(open; 1B/4B/12B/27B)	~131,072 (128K)	~8K–16K+	Text, Image	Text	Multilingual & long‑context open LLM (Google Cloud)
Gemma 2 / variants(PaliGemma, CodeGemma, TxGemma, ShieldGemma 2)	~128K	~8,000	Text (variant-specific multimodal)	Text	Specialized open models in Model Garden (Google Cloud)
DeepSeek R1(`deepseek‑ai/DeepSeek‑R1‑Distill‑Qwen‑32B`)	~128,000	~8,000	Text	Text	Available via Model Garden partnership (Google Cloud)
Imagen 3(for Generation / Editing / Fast)	— (image‑prompt-to-image)	— (image generation)	Text prompts (and masks/images)	Images	GA via Model Garden (Google Cloud)
Veo 3&Veo 3 Fast(video generation)	— (text/image prompt)	— (video output)	Text + optional images	Video (1080p)	Veo 3 FAST & full version in GA/Preview (July 2025) (Google Cloud)

Key Use Cases:

Content Generation: Create drafts, summaries, and reports from structured data
Data Analysis: Extract insights and identify patterns in unstructured text
Conversational AI: Build chatbots and virtual assistants
Document Processing: Analyze, summarize and extract information from documents

Frequently Asked Questions:

Authentication: Verify credentials and IAM permissions; ensure region availability
Performance: Implement back-off for rate limits; adjust timeouts for large contexts
Response Quality: Refine prompts; adjust temperature and token settings as needed
Cost Optimization: Select appropriate model variants based on complexity requirements