This is a research assistant that uses Mastra to answer questions about academic papers and technical documents.
Install dependencies:
pnpm i @mastra/core@latest @mastra/rag@latest @mastra/pg@latest ai@latest zod@latest
Set environment variable:
export POSTGRES_CONNECTION_STRING=postgresql://postgres:postgres@localhost:5432/postgres
Create files:
touch src/mastra/agents/researchAgent.ts
touch src/mastra/index.ts src/store.ts src/index.ts
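The Mastra entry point needs to register the agent and the Postgres vector store. The snippet below is a minimal sketch of that wiring; the export name `ragAgent` and the `postgres` store key are assumptions taken from the snippets later in this guide.

```typescript
// src/mastra/index.ts
import { Mastra } from "@mastra/core";
import { PgVector } from "@mastra/pg";
import { ragAgent } from "./agents/researchAgent";

const pgVector = new PgVector({
  connectionString: process.env.POSTGRES_CONNECTION_STRING!,
});

export const mastra = new Mastra({
  agents: { ragAgent },
  // The key used here must match the vectorStoreName passed to createVectorQueryTool.
  vectors: { postgres: pgVector },
});
```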
The `doc.chunk({...})` method prepares documents for Retrieval-Augmented Generation (RAG) in Mastra by breaking them into chunks that can then be converted to embeddings.
Parameter breakdown:
- `strategy: "recursive"`: Splits content based on structure (e.g., paragraphs, sentences) to preserve meaning.
- `size: 512`: Target maximum chunk length.
- `overlap: 50`: Overlap between chunks for context continuity.
- `separator: "\n"`: Splits at newlines.
The `text-embedding-004` model outputs 768-dimensional vectors for similarity searches.
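Taken together, the chunking and embedding steps look roughly like the sketch below. It assumes `paperText` holds the raw document text and that the `@ai-sdk/google` provider is installed (it is not in the install command above); the exact embedding factory name can vary slightly between `@ai-sdk/google` versions.

```typescript
import { MDocument } from "@mastra/rag";
import { google } from "@ai-sdk/google";
import { embedMany } from "ai";

// Wrap the raw text and chunk it with the parameters described above.
const doc = MDocument.fromText(paperText);
const chunks = await doc.chunk({
  strategy: "recursive", // split on structure (paragraphs, sentences)
  size: 512,             // target maximum chunk length
  overlap: 50,           // shared characters between neighboring chunks
  separator: "\n",       // split at newlines
});

// Embed every chunk; text-embedding-004 returns 768-dimensional vectors.
const { embeddings } = await embedMany({
  model: google.textEmbeddingModel("text-embedding-004"),
  values: chunks.map((chunk) => chunk.text),
});
```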
- Expect this step to max out the CPU on shared micro instances.
- Do not run `ask run cmd`.
- Run `npx bun src/store.ts`.
// src/store.ts
import { PgVector } from "@mastra/pg";

// Connect using the connection string set earlier
const vectorStore = new PgVector({ connectionString: process.env.POSTGRES_CONNECTION_STRING! });

// Create an index for our paper chunks
await vectorStore.createIndex({
  indexName: "papers",
  dimension: 768, // text-embedding-004 outputs 768 dimensions
});
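After the index exists, the chunk embeddings and their metadata can be written to it. A minimal sketch, assuming the `chunks` and `embeddings` arrays from the chunking and embedding steps above; the `source` value is a hypothetical label for this document:

```typescript
// Store each embedding alongside its chunk text and a source label.
await vectorStore.upsert({
  indexName: "papers",
  vectors: embeddings,
  metadata: chunks.map((chunk) => ({
    text: chunk.text,
    source: "transformer-paper", // hypothetical label for the indexed document
  })),
});
```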
To switch to an embedding model with a different output dimension:
- Run SQL: `DROP TABLE IF EXISTS papers;`
- Switch to a model with different dimensions (e.g., 1536 for some OpenAI models).
- Recreate the index with matching dimensionality (see the sketch below).
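For example, moving to a 1536-dimensional model such as OpenAI's text-embedding-3-small would mean recreating the index like this (a sketch; run the `DROP TABLE` first):

```typescript
await vectorStore.createIndex({
  indexName: "papers",
  dimension: 1536, // must match the new embedding model's output size
});
```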
Basic Response: The primary challenge is its sequential nature, preventing parallelization and limiting batching due to memory constraints.
After Metadata Filtering: The sequential nature of RNNs prevents parallelization, critical for longer sequences where memory limits batching.
With Chain-of-Thought: Sequence modeling with RNNs faces challenges due to sequential processing, preventing parallelization and causing memory constraints.
User Query: What problems does sequence modeling face with neural networks?
vectorQueryTool Output: [Content about sequential nature and parallelization issues]
User Query: What is reducing sequential computation?
vectorQueryTool Output: [Content about goals and models like Extended Neural GPU]
After Metadata Filtering: Reducing sequential computation is a goal achieved by certain models, but the constraint persists.
User Query: What are the main components of the Transformer architecture?
vectorQueryTool Output: Encoder and decoder with self-attention and fully connected layers.
- Can you explain self-attention?
- How does multi-head attention work?
- What problem does positional encoding solve?
User Query: Key advantages of Transformer over RNNs?
After Metadata Filtering: Relies on attention, superior performance, outperforms RNNs.
With Chain-of-Thought: THOUGHT PROCESS:
- Step 1: Chunks highlight dispensing with recurrence.
- Step 2: Connects to superior performance.
- Step 3: Replaces RNNs with attention for better results.
FINAL ANSWER: Dispenses with recurrence, relying on attention for superior performance.
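The answers above come from querying the agent. A minimal sketch of issuing one of these queries from `src/index.ts`, assuming the agent is registered on the Mastra instance under the key `ragAgent`:

```typescript
// src/index.ts
import { mastra } from "./mastra";

const agent = mastra.getAgent("ragAgent");
const response = await agent.generate(
  "What problems does sequence modeling face with neural networks?",
);
console.log(response.text);
```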
const vectorQueryTool = createVectorQueryTool({
  // ...
  enableFilter: true, // allow the agent to apply metadata filters at query time
  // ...
});
const chunks = await doc.chunk({
  // ...
  extract: {
    keywords: true, // Extracts keywords from each chunk
  },
});
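For the agent to filter on those keywords, they have to end up in the stored metadata. A sketch, assuming the extracted keywords are exposed on each chunk as `metadata.excerptKeywords` (matching the structure the agent is told about below) and reusing the hypothetical `source` label from the store step:

```typescript
await vectorStore.upsert({
  indexName: "papers",
  vectors: embeddings,
  metadata: chunks.map((chunk) => ({
    text: chunk.text,
    source: "transformer-paper",                      // hypothetical label
    excerptKeywords: chunk.metadata?.excerptKeywords, // keywords extracted above
  })),
});
```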
// src/mastra/agents/researchAgent.ts
import { Agent } from "@mastra/core/agent";
import { PGVECTOR_PROMPT } from "@mastra/rag";

export const ragAgent = new Agent({
  // ...
  instructions: `
    You are a helpful assistant that answers questions based on the provided context. Keep your answers concise and relevant.
    Filter the context by searching the metadata.
    The metadata is structured as follows:
    {
      text: string,
      source: string,
      excerptKeywords: string, // New field for extracted keywords
    }
    ${PGVECTOR_PROMPT}
    Important: When asked to answer a question, please base your answer only on the context provided in the tool.
    If the context doesn't contain enough information to fully answer the question, please state that explicitly.
  `,
  // ...
});
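With `enableFilter`, the keyword metadata, and `PGVECTOR_PROMPT` in place, a query can steer the agent toward filtered retrieval. An illustrative sketch (the agent decides the actual filter itself, and the import path is an assumption):

```typescript
import { ragAgent } from "./mastra/agents/researchAgent"; // path is an assumption

const response = await ragAgent.generate(
  'Only use chunks whose excerptKeywords mention "attention": how does the Transformer replace recurrence?',
);
console.log(response.text);
```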
Chain-of-Thought (CoT) prompting enables LLMs to reason by breaking tasks into explicit steps; a sketch of how this folds into the agent's instructions follows the lists below.
Implementation:
- Explicit Guidance: Outlines analysis, breakdown, connections, conclusions, and handling missing info.
- Structured Format:
  THOUGHT PROCESS:
  - Step 1: [Analysis]
  - Step 2: [Connections]
  - Step 3: [Reasoning]
  FINAL ANSWER: [Answer]
- Justification: Emphasizes explaining connections and usage of information.
Purpose:
- Improves reasoning.
- Enhances transparency and debugging.
- Reduces hallucinations.
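A sketch of what this looks like in the agent definition, extending the `ragAgent` excerpt shown earlier; the wording is illustrative rather than the exact prompt:

```typescript
export const ragAgent = new Agent({
  // ...
  instructions: `
    Answer questions using only the provided context.
    Follow these steps for every question:
    1. Analyze the retrieved context and identify the relevant chunks.
    2. Connect those chunks to the question being asked.
    3. Reason from those connections to a conclusion.

    Format your response as:
    THOUGHT PROCESS:
    - Step 1: [Analysis]
    - Step 2: [Connections]
    - Step 3: [Reasoning]
    FINAL ANSWER: [Answer]

    If the context doesn't contain enough information, state that explicitly.
  `,
  // ...
});
```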
import { openai } from "@ai-sdk/openai";
import { createVectorQueryTool } from "@mastra/rag";

// Balanced configuration
const balancedTool = createVectorQueryTool({
  vectorStoreName: "postgres",
  indexName: "embeddings",
  model: openai.embedding("text-embedding-3-small"),
  databaseConfig: {
    pgvector: {
      ef: 150,       // Moderate accuracy
      probes: 8,     // Moderate recall
      minScore: 0.7  // Moderate quality threshold
    }
  }
});
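For contrast, a speed-focused variant might lower `ef` and `probes` and relax `minScore`, trading recall and result quality for latency; the values below are illustrative, not tuned recommendations:

```typescript
// Speed-focused configuration (illustrative values)
const fastTool = createVectorQueryTool({
  vectorStoreName: "postgres",
  indexName: "embeddings",
  model: openai.embedding("text-embedding-3-small"),
  databaseConfig: {
    pgvector: {
      ef: 75,        // fewer candidates examined per query (HNSW)
      probes: 3,     // fewer lists probed (IVFFlat)
      minScore: 0.6  // accept lower-similarity matches
    }
  }
});
```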
import { RuntimeContext } from "@mastra/core/runtime-context";

// Adjust parameters based on load
// (`mastra` is the instance exported from src/mastra/index.ts)
const adaptiveSearch = async (query: string, isHighLoad: boolean) => {
  const runtimeContext = new RuntimeContext();

  if (isHighLoad) {
    // Reduce quality for speed
    runtimeContext.set('databaseConfig', {
      pgvector: {
        ef: 75,
        probes: 5,
        minScore: 0.65
      }
    });
  }

  return await balancedTool.execute({
    context: { queryText: query },
    mastra,
    runtimeContext
  });
};
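A usage sketch; `isHighLoad` would normally come from your own load signal (queue depth, latency budget, etc.), shown hard-coded here:

```typescript
const results = await adaptiveSearch("What is self-attention?", true);
console.log(results);
```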