Pitch Deck Analyzer - Admin Guide
Overview
This guide covers the technical setup, configuration, and maintenance of the Pitch Deck Analyzer module. The analyzer uses a multi-agent AI evaluation pipeline with PostgreSQL storage, supporting PDF, DOCX, and PPTX pitch deck analysis against customizable rubrics.
System Architecture
Core Components
Pitch Analyzer Module:
- Document Parser (`pitch/src/document-parser.js`) - Text extraction from PDF/DOCX/PPTX files
- Evaluation Pipeline (`pitch/src/evaluation-pipeline.js`) - Multi-agent AI analysis system
- Report Generator (`pitch/src/report-generator.js`) - HTML/PDF report creation
- Routes Handler (`pitch/src/pitch-routes.js`) - Express.js endpoints and progress tracking
- Database (`src/utils/database.js`) - PostgreSQL storage for analysis results
- Web Server (`src/web-server.js`) - Main application with pitch analyzer integration
- Authentication - Admin session management and access control
- All requirements from main platform (Node.js 18+, PostgreSQL, OpenAI API)
- Additional document processing libraries (included in package.json)
- Categories: High-level evaluation areas (e.g., "Market Opportunity")
- Criteria: Specific questions within each category
- Scoring: Point values and evaluation guidelines
- Focus Areas: Specific aspects to evaluate
- Total `max_points` must equal the sum of category `max_points`
- Category `max_points` must equal the sum of criteria `max_points`
- All IDs must be unique and use snake_case
- `evaluation_focus` arrays guide AI evaluation
- 70 total points across 7 categories
- Focus: Drug development, regulatory pathway, clinical evidence
- Categories: Unmet Medical Need, Class Potential, Clinical Evidence, Regulatory Strategy, IP Position, Team Expertise, Commercial Viability
- 65 total points across 6 categories
- Focus: Technical innovation, scalability, market readiness
- Categories: Technical Innovation, Market Opportunity, Product Development, Team Capabilities, Business Model, Risk Assessment
- 75 total points across 7 categories
- Focus: Data strategy, model performance, ethics, scalability
- Categories: Technical Foundation, Data Strategy, Product-Market Fit, AI Ethics, Team Expertise, Business Model, Market Opportunity
- Backup existing rubric: `cp rubrics/therapeutics.json rubrics/therapeutics-backup.json`
- Test changes locally: Use preview mode to validate
- Version control: Update the `version` field when making changes
- Document changes: Add changelog comments in JSON
- Uses the `pdf-parse` library for text extraction
- Handles multi-page documents with page structure detection
- Supports password-free PDFs with extractable text
- Fails gracefully on image-only or corrupted PDFs
- Uses the `mammoth` library for reliable text extraction
- Preserves document structure and formatting
- Handles complex layouts and embedded content
- Custom ZIP-based extraction from PowerPoint XML
- Extracts text from slides and speaker notes
- Processes slide titles, content, and annotations
- Evaluates each criterion individually against rubric
- Uses GPT-4o-mini for cost efficiency and speed
- Generates structured JSON responses with scores and justifications
- Includes evidence extraction and improvement suggestions
- Identifies critical weaknesses and missing information
- Prioritizes gaps by severity and impact
- Provides comprehensive assessment of pitch strengths/weaknesses
- Generates actionable improvement recommendations
- Prioritizes suggestions by impact and effort required
- Provides specific next steps for entrepreneurs
- Keep prompts focused and specific
- Include clear scoring guidelines
- Reference rubric context for consistency
- Request structured JSON output for parsing
- Limit document text to avoid token limits
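The last guideline — limiting document text — can be a one-line helper. A sketch (the 6000-character cap matches the `substring(0, 6000)` used in the criteria prompt elsewhere in this guide; treat it as a tunable assumption):

```javascript
// Sketch: cap document text before embedding it in a prompt.
// maxChars is an assumed, tunable limit.
function truncateForPrompt(text, maxChars = 6000) {
  if (text.length <= maxChars) return text;
  return text.slice(0, maxChars) + '\n[... truncated for token limit ...]';
}
```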
- Upload Processing: <30 seconds for 50MB files
- Text Extraction: <60 seconds for complex documents
- AI Evaluation: 2-3 minutes for standard rubrics
- Total Analysis Time: <5 minutes end-to-end
- Upload and analyze pitch deck
- Requires: multipart/form-data with a `pitchDeck` file and `rubric` selection
- Returns: Analysis ID for progress tracking
- Authentication: Admin session required
- Poll analysis progress
- Returns: Stage, progress percentage, status, and details
- Real-time updates during analysis
- View analysis results (HTML page)
- Includes: Complete evaluation, scores, recommendations
- Authentication: Admin session required
- Get analysis results (JSON API)
- Returns: Structured analysis data
- For programmatic access
- Download HTML report
- Formatted for viewing/printing
- Download PDF report (currently returns print-optimized HTML)
- Future: True PDF generation with Puppeteer
- List recent completed analyses
- Returns: Analysis metadata with quick access links
- Pagination: Latest 20 analyses
- Delete analysis from database
- Admin only operation
- Permanent deletion
- Pitch decks may contain confidential business information
- Analysis data stored securely in PostgreSQL with access controls
- Temporary files automatically cleaned up after 24 hours
- API responses filtered to exclude sensitive file paths
- Check logs for specific error: `tail -f logs/error.log`
- Verify file integrity: Re-upload if corrupted
- Check API limits: Wait for quota reset if needed
- Clear stuck analysis: Remove from database
- Retry with different file or rubric
Integration Points:
Database Schema
Primary Table:
`sql
CREATE TABLE pitch_analyses (
id SERIAL PRIMARY KEY,
analysis_id VARCHAR(255) UNIQUE NOT NULL,
original_filename TEXT NOT NULL,
file_type VARCHAR(10) NOT NULL,
analysis_data JSONB NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Indexes for performance
CREATE INDEX idx_pitch_analyses_created_at ON pitch_analyses(created_at);
CREATE INDEX idx_pitch_analyses_analysis_id ON pitch_analyses(analysis_id);
CREATE INDEX idx_pitch_analyses_file_type ON pitch_analyses(file_type);
`
Data Structure:
`json
{
"analysisId": "1703123456789",
"filename": "startup-pitch.pdf",
"rubricUsed": "therapeutics",
"documentResult": {
"extractedText": "...",
"metadata": { "wordCount": 1500, "pageCount": 12 }
},
"evaluationResult": {
"overallScore": { "percentage": 78, "rating": "Strong" },
"criteriaResults": [...],
"gapAnalysis": {...},
"recommendations": {...}
},
"timestamp": "2024-01-01T10:00:00.000Z"
}
`
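When reading rows back, the overall score can be pulled from `analysis_data` with a small accessor. A sketch based on the JSON shape above (returns `null` when the analysis has not completed):

```javascript
// Sketch: read the overall percentage out of a stored analysis_data document
// (shape taken from the example above)
function overallPercentage(analysisData) {
  return analysisData?.evaluationResult?.overallScore?.percentage ?? null;
}
```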
File Structure
`
pitch/
├── src/
│ ├── document-parser.js # Text extraction logic
│ ├── evaluation-pipeline.js # Multi-agent AI evaluation
│ ├── pitch-routes.js # Express routes and API
│ └── report-generator.js # Report formatting
├── public/
│ ├── css/ # Analyzer-specific styles
│ └── js/ # Client-side JavaScript
└── views/
└── analyzer.html # Upload interface template
rubrics/
├── therapeutics.json # Biotech evaluation criteria
├── deep-tech.json # Technology evaluation criteria
└── ai.json # AI/ML evaluation criteria
uploads/ # Temporary file storage
data/pitch-analyses/ # Analysis result archives
`
Installation & Setup
Prerequisites
Core Requirements: all requirements from the main platform (Node.js 18+, PostgreSQL, OpenAI API).
Additional Dependencies:
`json
{
"pdf-parse": "^1.1.1", // PDF text extraction
"mammoth": "^1.10.0", // DOCX processing
"jszip": "^3.10.1", // PPTX processing
"multer": "^1.4.5-lts.1", // File upload handling
"yauzl": "^3.2.0" // ZIP file processing
}
`
Environment Configuration
Required Variables (add to existing .env):
`bash
# File Upload Settings
MAX_FILE_SIZE=52428800            # 50MB limit
UPLOAD_DIR=./uploads
TEMP_FILE_RETENTION=24            # Hours to keep uploaded files

# Analysis Settings
MAX_CONCURRENT_ANALYSES=3         # Limit simultaneous analyses
ANALYSIS_TIMEOUT=600000           # 10 minutes timeout
DEFAULT_RUBRIC=therapeutics       # Fallback rubric

# Database Settings (inherited from main platform)
DATABASE_URL=postgresql://...     # Shared database

# OpenAI Settings (inherited from main platform)
OPENAI_API_KEY=sk-proj-...        # Shared API key
OPENAI_MODEL=gpt-4o-mini          # Model for analysis
`
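These variables might be read into the application like so — a sketch, with fallbacks mirroring the documented defaults (the `analyzerConfig` name is illustrative, not from the codebase):

```javascript
// Sketch: load analyzer settings from the .env variables listed above,
// falling back to the documented defaults
const analyzerConfig = {
  maxFileSize: parseInt(process.env.MAX_FILE_SIZE || '52428800', 10),
  uploadDir: process.env.UPLOAD_DIR || './uploads',
  tempFileRetentionHours: parseInt(process.env.TEMP_FILE_RETENTION || '24', 10),
  maxConcurrentAnalyses: parseInt(process.env.MAX_CONCURRENT_ANALYSES || '3', 10),
  analysisTimeoutMs: parseInt(process.env.ANALYSIS_TIMEOUT || '600000', 10),
  defaultRubric: process.env.DEFAULT_RUBRIC || 'therapeutics'
};
```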
Database Initialization
The pitch analyzer uses the same PostgreSQL instance as the main platform:
`bash
# Initialize database (includes pitch analyzer tables)
npm run setup-db

# Verify pitch analyzer tables
psql $DATABASE_URL -c "\dt pitch_*"
`
Manual Table Creation (if needed):
`sql
-- Connect to your database
\c yale_ventures_platform
-- Create pitch analysis table
CREATE TABLE pitch_analyses (
id SERIAL PRIMARY KEY,
analysis_id VARCHAR(255) UNIQUE NOT NULL,
original_filename TEXT NOT NULL,
file_type VARCHAR(10) NOT NULL,
analysis_data JSONB NOT NULL,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW()
);
-- Add indexes
CREATE INDEX idx_pitch_analyses_created_at ON pitch_analyses(created_at);
CREATE INDEX idx_pitch_analyses_analysis_id ON pitch_analyses(analysis_id);
CREATE INDEX idx_pitch_analyses_file_type ON pitch_analyses(file_type);
-- Grant permissions
GRANT ALL PRIVILEGES ON pitch_analyses TO platform_user;
GRANT USAGE, SELECT ON SEQUENCE pitch_analyses_id_seq TO platform_user;
`
Rubric Configuration
Understanding Rubrics
Rubrics define evaluation criteria for different types of startups. Each rubric contains categories (high-level evaluation areas), criteria (specific questions within each category), scoring (point values and guidelines), and focus areas that guide the AI evaluation.
Creating Custom Rubrics
1. Create Rubric File:
`bash
# Create new rubric
cp rubrics/therapeutics.json rubrics/my-new-rubric.json
`
2. Rubric Structure:
`json
{
"id": "my-new-rubric",
"name": "My New Startup Evaluation Rubric",
"version": "1.0",
"description": "Evaluation criteria for [specific type] startups",
"max_points": 70,
"categories": [
{
"name": "Category Name",
"id": "category_id",
"max_points": 15,
"criteria": [
{
"id": "criterion_id",
"question": "What specific aspect should be evaluated?",
"max_points": 8,
"evaluation_focus": [
"Specific focus area 1",
"Specific focus area 2",
"Specific focus area 3"
]
}
]
}
]
}
`
3. Validation Rules:
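These rules — reconciling point totals and enforcing unique IDs — can be checked with a short script before deploying a rubric. A sketch (run against a parsed rubric JSON object; `validateRubric` is not part of the shipped codebase):

```javascript
// Sketch: enforce the validation rules above. Returns a list of
// human-readable errors; an empty list means the rubric is consistent.
function validateRubric(rubric) {
  const errors = [];
  const seenIds = new Set();
  let categorySum = 0;
  for (const category of rubric.categories) {
    categorySum += category.max_points;
    // Criteria points must add up to the category total
    const criteriaSum = category.criteria.reduce((sum, c) => sum + c.max_points, 0);
    if (criteriaSum !== category.max_points) {
      errors.push(`Category ${category.id}: criteria total ${criteriaSum} != ${category.max_points}`);
    }
    // All ids must be unique
    for (const id of [category.id, ...category.criteria.map(c => c.id)]) {
      if (seenIds.has(id)) errors.push(`Duplicate id: ${id}`);
      seenIds.add(id);
    }
  }
  // Category points must add up to the rubric total
  if (categorySum !== rubric.max_points) {
    errors.push(`Category total ${categorySum} != rubric max_points ${rubric.max_points}`);
  }
  return errors;
}
```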
Built-in Rubrics
Therapeutics (therapeutics.json):
Deep Tech (deep-tech.json):
AI (ai.json):
Modifying Existing Rubrics
Safe Modification Process:
Common Modifications:
`json
// Add new criterion to existing category
{
"id": "new_criterion",
"question": "Does the pitch address [specific aspect]?",
"max_points": 3,
"evaluation_focus": [
"Focus area 1",
"Focus area 2"
]
}
// Adjust point allocations
// (ensure totals still add up correctly)
"max_points": 10 // Changed from 8
// Add new category
{
"name": "New Category",
"id": "new_category",
"max_points": 10,
"criteria": [...]
}
`
Document Processing
Supported File Types
PDF Processing:
DOCX Processing:
PPTX Processing:
Processing Pipeline
Stage 1: File Validation (0-10%)
`javascript
// File type validation
const allowedTypes = ['.pdf', '.pptx', '.docx'];
const ext = path.extname(filename).toLowerCase();
// Size validation
const MAX_SIZE = 50 * 1024 * 1024; // 50MB
if (file.size > MAX_SIZE) throw new Error('File too large');
`
Stage 2: Text Extraction (10-30%)
`javascript
// Extract text based on file type
switch (ext) {
case '.pdf': text = await parsePDF(filePath); break;
case '.docx': text = await parseDOCX(filePath); break;
case '.pptx': text = await parsePPTX(filePath); break;
}
// Calculate metadata
metadata = {
wordCount: countWords(text),
characterCount: text.length,
pageCount: estimatePages(text, ext)
};
`
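The `countWords` and `estimatePages` helpers referenced above are not shown in this guide; a minimal sketch (the words-per-page figures are illustrative assumptions, not values from the codebase):

```javascript
// Minimal versions of the helpers assumed by the extraction stage above
function countWords(text) {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

function estimatePages(text, ext) {
  // Rough heuristic: slides carry far less text than document pages
  const wordsPerPage = ext === '.pptx' ? 80 : 400;
  return Math.max(1, Math.ceil(countWords(text) / wordsPerPage));
}
```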
Stage 3: Content Validation (30-35%)
`javascript
// Ensure sufficient content for analysis
if (text.length < 500) {
throw new Error('Insufficient text content for analysis');
}
// Basic quality checks
if (text.match(/\[.*parsing failed.*\]/i)) {
throw new Error('Document parsing failed');
}
`
Troubleshooting Document Issues
PDF Issues:
`bash
# Test PDF parsing
node -e "
const { parseDocument } = require('./pitch/src/document-parser.js');
parseDocument('./test.pdf').then(console.log);
"

# Common PDF problems:
# - Password protected: Remove password
# - Image-only: Use OCR or get text-based version
# - Corrupted: Re-save from original application
# - Complex layout: May need manual review
`
DOCX Issues:
`bash
# Test DOCX parsing
node -e "
const mammoth = require('mammoth');
mammoth.extractRawText({path: './test.docx'}).then(console.log);
"

# Common DOCX problems:
# - Corrupted file: Re-save from Word
# - Complex formatting: May lose some structure
# - Embedded objects: Text content only extracted
`
PPTX Issues:
`bash
# Test PPTX parsing manually
unzip -l test.pptx | grep slide
unzip test.pptx ppt/slides/slide1.xml

# Common PPTX problems:
# - Image-only slides: No extractable text
# - Complex animations: Text content only
# - Embedded videos: Ignored during extraction
`
AI Evaluation Pipeline
Multi-Agent Architecture
The evaluation uses a three-agent pipeline:
Agent 1: Criteria Evaluation (0-60% progress)
Agent 2: Gap Analysis (60-80% progress)
Agent 3: Recommendations (80-100% progress)
Configuration Parameters
Model Settings:
`javascript
const modelConfig = {
model: 'gpt-4o-mini', // Cost-effective for high volume
temperature: 0.3, // Low for consistent evaluation
max_tokens: 1500, // Sufficient for detailed responses
timeout: 30000 // 30 second timeout per request
};
`
Rate Limiting:
`javascript
// Prevent API rate limit issues
const DELAY_BETWEEN_REQUESTS = 1000; // 1 second
const MAX_CONCURRENT_REQUESTS = 3; // Parallel limit
const MAX_RETRIES = 3; // Retry failed requests
`
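The constants above describe the policy but not the mechanism. One way to apply both limits — a sketch, not the shipped implementation — is a small worker pool that caps concurrency and pauses after each request:

```javascript
// Sketch: run evaluation tasks with a concurrency cap and a delay after
// each request, matching the rate-limit constants above
async function runLimited(tasks, maxConcurrent = 3, delayMs = 1000) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const i = next++;               // claim the next task index
      results[i] = await tasks[i]();
      await new Promise(resolve => setTimeout(resolve, delayMs));
    }
  }
  // Start maxConcurrent workers draining the shared task queue
  await Promise.all(Array.from({ length: maxConcurrent }, worker));
  return results;
}
```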
Progress Tracking:
`javascript
// Progress calculation for UI updates
const totalCriteria = rubric.categories.reduce(
(sum, cat) => sum + cat.criteria.length, 0
);
const progress = {
criteria: (completedCriteria / totalCriteria) * 60, // 0-60%
gap_analysis: 60 + (gapProgress * 20), // 60-80%
recommendations: 80 + (recProgress * 20) // 80-100%
};
`
Customizing AI Prompts
Criteria Evaluation Prompt:
`javascript
const criteriaPrompt = `
You are evaluating a startup pitch deck against specific criteria for ${rubric.name}.
CATEGORY: ${category.name}
CRITERION: ${criterion.question}
MAX POINTS: ${criterion.max_points}
EVALUATION FOCUS: ${criterion.evaluation_focus?.join(', ')}
PITCH DECK CONTENT:
${documentText.substring(0, 6000)}
Based on the pitch deck content, evaluate this criterion and respond in JSON:
{
"score": [0 to ${criterion.max_points}],
"justification": "Detailed explanation with specific references",
"evidence": ["Specific evidence from pitch deck"],
"concerns": ["Missing elements or concerns"],
"suggestions": ["Brief improvement suggestions"]
}
`;
`
Prompt Customization Guidelines:
Error Handling
API Error Recovery:
`javascript
// Retry logic for OpenAI API failures
async function evaluateWithRetry(prompt, maxRetries = 3) {
for (let attempt = 1; attempt <= maxRetries; attempt++) {
try {
return await openai.chat.completions.create({ ...modelConfig, messages: [{ role: 'user', content: prompt }] });
} catch (error) {
if (attempt === maxRetries) throw error;
// Exponential backoff
await new Promise(resolve =>
setTimeout(resolve, 1000 * Math.pow(2, attempt))
);
}
}
}
`
JSON Parsing Fallbacks:
`javascript
// Handle malformed JSON responses
try {
const cleanedContent = content.replace(/```json\s*|\s*```/g, '');
evaluation = JSON.parse(cleanedContent);
} catch (parseError) {
// Fallback structure
evaluation = {
score: 0,
justification: content || 'Unable to parse evaluation',
evidence: [],
concerns: ['JSON parsing failed'],
suggestions: []
};
}
`
Performance & Monitoring
Performance Metrics
Target Performance:
Monitoring Commands:
`bash
# Check analysis performance
psql $DATABASE_URL -c "
SELECT
file_type,
AVG(EXTRACT(epoch FROM (updated_at - created_at))) as avg_duration_seconds,
COUNT(*) as analysis_count
FROM pitch_analyses
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY file_type;
"

# Monitor API usage
grep "OpenAI" logs/*.log | tail -20

# Check memory usage during analysis
ps aux | grep node | grep pitch
`
Database Optimization
Regular Maintenance:
`sql
-- Analyze table statistics
ANALYZE pitch_analyses;
-- Check index usage
SELECT schemaname, tablename, indexname, idx_tup_read, idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public' AND tablename = 'pitch_analyses';
-- Archive old analyses (optional)
DELETE FROM pitch_analyses
WHERE created_at < NOW() - INTERVAL '1 year';
`
Performance Tuning:
`sql
-- Add additional indexes for common queries
CREATE INDEX idx_pitch_analyses_rubric ON pitch_analyses
USING btree ((analysis_data->>'rubricUsed'));
CREATE INDEX idx_pitch_analyses_score ON pitch_analyses
USING btree (((analysis_data->'evaluationResult'->'overallScore'->>'percentage')::int));
`
File Storage Management
Cleanup Uploaded Files:
`bash
# Clean files older than 24 hours
find uploads/ -name "*.pdf" -mtime +1 -delete
find uploads/ -name "*.docx" -mtime +1 -delete
find uploads/ -name "*.pptx" -mtime +1 -delete

# Archive completed analyses (optional)
mkdir -p archives/$(date +%Y%m)
cp data/pitch-analyses/*.json archives/$(date +%Y%m)/
`
Storage Monitoring:
`bash
# Check storage usage
du -sh uploads/ data/ logs/
df -h

# Monitor database size
psql $DATABASE_URL -c "
SELECT
pg_size_pretty(pg_total_relation_size('pitch_analyses')) as table_size,
pg_size_pretty(pg_database_size(current_database())) as db_size;
"
`
API Documentation
REST Endpoints
POST /pitch/analyze
GET /pitch/progress/:analysisId
GET /pitch/results/:analysisId
GET /pitch/api/results/:analysisId
GET /pitch/report/:analysisId
GET /pitch/report/:analysisId/pdf
GET /pitch/recent-analyses
DELETE /pitch/analysis/:analysisId
Integration Examples
Programmatic Analysis Trigger:
`javascript
// Upload and analyze pitch deck
const formData = new FormData();
formData.append('pitchDeck', fileBlob, 'pitch.pdf');
formData.append('rubric', 'therapeutics');
const response = await fetch('/pitch/analyze', {
method: 'POST',
body: formData,
credentials: 'include' // Include session cookie
});
const { analysisId } = await response.json();
// Poll for completion
const pollProgress = async () => {
  const progress = await fetch(`/pitch/progress/${analysisId}`);
  const data = await progress.json();
  if (data.status === 'completed') {
    // Analysis complete, get results
    const results = await fetch(`/pitch/api/results/${analysisId}`);
    return await results.json();
  } else if (data.status === 'error') {
    throw new Error(data.error);
  }
  // Continue polling after a short delay
  await new Promise(resolve => setTimeout(resolve, 2000));
  return pollProgress();
};
`
Webhook Integration:
`javascript
// Add webhook notification to completed analysis
// In pitch-routes.js, after analysis completion:
const notifyWebhook = async (analysisId, results) => {
if (process.env.WEBHOOK_URL) {
try {
await fetch(process.env.WEBHOOK_URL, {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
event: 'analysis_completed',
analysisId,
overallScore: results.overallScore.percentage,
timestamp: new Date().toISOString()
})
});
} catch (error) {
console.warn('Webhook notification failed:', error);
}
}
};
`
Security & Access Control
Authentication Requirements
All pitch analyzer endpoints require admin authentication:
`javascript
// Authentication middleware check
if (!req.session?.isAdmin) {
return res.status(401).json({ error: 'Authentication required' });
}
`
File Upload Security
File Type Validation:
`javascript
const allowedTypes = ['.pdf', '.pptx', '.docx'];
const allowedMimeTypes = [
'application/pdf',
'application/vnd.openxmlformats-officedocument.presentationml.presentation',
'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
];
`
File Size Limits:
`javascript
const upload = multer({
storage: multerStorage,
limits: {
fileSize: 50 * 1024 * 1024, // 50MB limit
files: 1 // Single file only
},
fileFilter: validateFileType
});
`
Path Traversal Prevention:
`javascript
// Ensure uploaded files stay in designated directory
const safeFilename = path.basename(originalname);
const uploadPath = path.join(UPLOAD_DIR, safeFilename);
`
Data Privacy
Sensitive Information Handling:
Audit Trail:
`sql
-- Track analysis access
CREATE TABLE pitch_analysis_access_log (
id SERIAL PRIMARY KEY,
analysis_id VARCHAR(255),
user_session VARCHAR(255),
action VARCHAR(50),
timestamp TIMESTAMP DEFAULT NOW()
);
`
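A matching write from the route layer might look like the following sketch. Only the table above is real; `pool` is an assumed `pg` connection pool and `logAnalysisAccess` is a hypothetical helper name:

```javascript
// Sketch: record an access event in pitch_analysis_access_log.
// `pool` is an assumed pg.Pool instance injected by the caller.
async function logAnalysisAccess(pool, analysisId, sessionId, action) {
  await pool.query(
    'INSERT INTO pitch_analysis_access_log (analysis_id, user_session, action) VALUES ($1, $2, $3)',
    [analysisId, sessionId, action]
  );
}
```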
Troubleshooting Guide
Common Issues
1. Analysis Stuck in Progress:
`bash
# Check for abandoned analyses
psql $DATABASE_URL -c "
SELECT analysis_id, created_at,
analysis_data->'progress'->>'stage' as stage
FROM pitch_analyses
WHERE analysis_data->'evaluationResult' IS NULL
AND created_at < NOW() - INTERVAL '30 minutes';
"

# Manual cleanup if needed
psql $DATABASE_URL -c "
DELETE FROM pitch_analyses
WHERE analysis_id = 'stuck_analysis_id';
"
`
2. OpenAI API Errors:
`bash
# Test API connectivity
node -e "
const OpenAI = require('openai');
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
client.chat.completions.create({
model: 'gpt-4o-mini',
messages: [{role: 'user', content: 'Test'}],
max_tokens: 10
}).then(console.log).catch(console.error);
"
# Check API usage and limits
curl -H "Authorization: Bearer $OPENAI_API_KEY" \
"https://api.openai.com/v1/usage"
`
3. File Processing Failures:
`bash
# Test document parser directly
node -e "
const { parseDocument } = require('./pitch/src/document-parser.js');
parseDocument('./test-file.pdf').then(result => {
console.log('Success:', result.extractedText.length, 'characters');
}).catch(console.error);
"
# Check file permissions
ls -la uploads/
chmod 755 uploads/
`
4. Database Connection Issues:
`bash
# Test database connectivity
psql $DATABASE_URL -c "SELECT COUNT(*) FROM pitch_analyses;"

# Check table structure
psql $DATABASE_URL -c "\d pitch_analyses"
`
5. Memory Issues During Analysis:
`bash
# Monitor memory usage
watch "ps aux | grep node | grep -v grep"

# Check for memory leaks
node --expose-gc pitch-analysis-test.js

# Restart if needed
pm2 restart yale-ventures-platform
`
Error Recovery Procedures
Recovery from Failed Analysis:
Database Recovery:
`sql
-- Backup before recovery (run from the shell, not psql):
--   pg_dump yale_ventures_platform > backup_before_recovery.sql
-- Reset progress tracking for stuck analyses
UPDATE pitch_analyses
SET analysis_data = analysis_data - 'progress'
WHERE analysis_data->'evaluationResult' IS NULL
AND created_at < NOW() - INTERVAL '1 hour';
`
---
This admin guide provides comprehensive technical documentation for maintaining and extending the Pitch Deck Analyzer. For user-facing features, refer to the separate user guide.