Content Aggregation Platform - Admin Guide
Overview
This guide covers the technical setup, configuration, and maintenance of the Content Aggregation Platform. The platform uses a Node.js-based architecture with PostgreSQL database, integrating multiple APIs for content collection, AI classification, and email delivery.
System Architecture
Core Components
Backend Services:
- Web Server (`src/web-server.js`) - Express.js dashboard and API
- Scheduler (`src/scheduler.js`) - Automated pipeline execution using node-cron
- Pipeline Orchestrator (`src/orchestrator.js`) - Coordinates collection, classification, and reporting
- Database Layer (`src/utils/database.js`) - PostgreSQL integration for persistent storage

Key Modules:
- RSS Collection (`src/collect-and-classify.js`) - Fetches and processes RSS feeds
- AI Classification - OpenAI API integration for content relevance scoring
- Email Generation (`src/email-sender.js`) - Creates and sends formatted reports
- Report Generation (`src/weekly-report.js`) - Comprehensive digest compilation
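Conceptually, the scheduler triggers the orchestrator, which chains the modules above. A minimal sketch, assuming illustrative export names (the actual src/scheduler.js and src/orchestrator.js are more involved):

```javascript
// Minimal sketch of how the scheduler drives the orchestrator.
// The export names below are assumptions, not the platform's actual API.
const cron = require('node-cron');
const { collectAndClassify } = require('./collect-and-classify'); // hypothetical export
const { generateReport, sendReport } = require('./email-sender'); // hypothetical exports

// Orchestrator flow: collection -> classification -> reporting -> delivery
async function runPipeline(useCase) {
  const articles = await collectAndClassify(useCase);     // RSS fetch + OpenAI scoring
  const report = await generateReport(useCase, articles); // digest compilation
  await sendReport(useCase, report);                      // formatted email out
}

// Scheduler: five-field cron expression, e.g. 7:00 AM on weekdays, in the configured TZ
cron.schedule('0 7 * * 1-5', () => runPipeline('blavatnik'), { timezone: process.env.TZ });
```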
Database Schema
Core Tables:
- `pitch_analyses` - Stores pitch deck analysis results

Other persistent storage:
- Configuration stored in JSON files under `use-cases/`
- Run results stored in timestamped folders under `data/`
File Structure
```
content-aggregation-platform/
├── src/ # Core application code
│ ├── web-server.js # Main web dashboard
│ ├── scheduler.js # Automated scheduling
│ ├── orchestrator.js # Pipeline coordination
│ ├── collect-and-classify.js # RSS collection and AI classification
│ ├── daily-collection.js # Daily pipeline runner
│ ├── weekly-report.js # Weekly report generator
│ ├── email-sender.js # Email delivery
│ └── utils/ # Utility modules
│ ├── database.js # PostgreSQL operations
│ ├── logging.js # Winston logging
│ ├── circuit-breaker.js # Reliability patterns
│ └── retry.js # Error recovery
├── use-cases/ # Configuration for different teams
│ ├── blavatnik/ # Blavatnik Fund configuration
│ ├── roberts/ # Roberts Innovation Fund
│ └── marketing/ # Marketing team
├── data/ # Pipeline run results
├── logs/ # Application logs
├── rubrics/ # Pitch analyzer evaluation criteria
├── pitch/ # Pitch analyzer module
└── views/ # Web dashboard templates
```
Installation & Setup
Prerequisites
Required Software:
- Node.js 18+ and npm 8+
- PostgreSQL 12+ database
- Git for version control

Required API Keys:
- OpenAI API key for AI classification
- Email service API key (Resend or SMTP credentials)
- Admin notification email addresses
Environment Configuration
Create a `.env` file with the required variables:

```bash
# Core Configuration
NODE_ENV=production
PORT=3000
TZ=America/New_York

# Database
DATABASE_URL=postgresql://username:password@host:port/database

# OpenAI API
OPENAI_API_KEY=sk-proj-your_key_here

# Email Service (Resend - Recommended)
RESEND_API_KEY=re_your_key_here
FROM_EMAIL=news@yourdomain.com

# Alternative: SMTP Configuration
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your.email@gmail.com
SMTP_PASS=your_app_password

# Admin Settings
ADMIN_EMAIL=admin@yourdomain.com
ADMIN_PASSWORD=secure_admin_password

# Scheduler Settings
ENABLE_SCHEDULER=true
```
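A small startup guard can fail fast when required variables are missing; a minimal sketch (the variable list mirrors the file above and is not exhaustive):

```javascript
// Minimal startup validation for required environment variables.
// Uses only Node's built-in process.env; run before starting the server.
const REQUIRED_VARS = ['DATABASE_URL', 'OPENAI_API_KEY', 'FROM_EMAIL'];

const missing = REQUIRED_VARS.filter((name) => !process.env[name]);
if (missing.length > 0) {
  console.error(`Missing required environment variables: ${missing.join(', ')}`);
  process.exit(1);
}
```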
Database Setup
Initialize Database:
```bash
npm run setup-db
```

This creates required tables and indexes for:
- Pitch analysis storage
- Session management
- Configuration caching

PostgreSQL Setup (if needed):

```sql
CREATE DATABASE yale_ventures_platform;
CREATE USER platform_user WITH ENCRYPTED PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE yale_ventures_platform TO platform_user;
```
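Connectivity can then be verified from Node.js with the `pg` driver; a short sketch along the lines of what `scripts/test-database.js` does (the actual script may differ):

```javascript
// Quick PostgreSQL connectivity check using the `pg` driver.
// Reads the connection string from DATABASE_URL, runs a trivial query, and exits.
const { Client } = require('pg');

async function main() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query('SELECT NOW() AS now');
  console.log(`Database reachable, server time: ${rows[0].now}`);
  await client.end();
}

main().catch((err) => {
  console.error('Database connection failed:', err.message);
  process.exit(1);
});
```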
Application Installation
```bash
# Clone and install dependencies
git clone <repository-url>
cd content-aggregation-platform
npm install

# Initialize database
npm run setup-db

# Test configuration
node scripts/test-api-key.js
node scripts/test-rss-collection.js

# Start web dashboard
npm run web
```

Use Case Configuration

Creating New Use Cases

Each use case requires a configuration directory under `use-cases/`:

1. Create Directory Structure:

```bash
mkdir use-cases/new-team
cd use-cases/new-team
```
2. Configuration File (`config.json`):

```json
{
  "name": "new-team",
  "display_name": "New Team Name",
  "description": "Brief description of the use case",
  "content_sources": [
    {
      "name": "Yale News Health",
      "url": "https://news.yale.edu/topics/health-medicine/feed",
      "type": "rss"
    }
  ],
  "classification": {
    "model": "gpt-4",
    "threshold": 75,
    "max_tokens": 1000
  },
  "delivery": {
    "daily": {
      "enabled": true,
      "schedule": "0 7 * * 1-5",
      "recipients": ["team@yourdomain.com"]
    },
    "weekly": {
      "enabled": true,
      "schedule": "0 8 * * 1",
      "recipients": ["team@yourdomain.com", "manager@yourdomain.com"]
    }
  },
  "email": {
    "from_name": "Yale Ventures Content",
    "subject_prefix": "Team Name",
    "template": "default"
  }
}
```

The `schedule` values use standard five-field cron syntax (minute, hour, day of month, month, day of week): `0 7 * * 1-5` runs at 7:00 AM on weekdays, and `0 8 * * 1` at 8:00 AM on Mondays, in the timezone set by `TZ`.
3. Prompts Configuration (`prompts.md`):

```markdown
# Content Classification Prompt

You are an expert analyst helping [Team Name] identify relevant content for [specific focus area].

## Classification Criteria
- [Criterion 1]: [Description]
- [Criterion 2]: [Description]
- [Criterion 3]: [Description]

## Scoring Guidelines
- 90-100%: [Description of highest relevance]
- 75-89%: [Description of high relevance]
- 60-74%: [Description of moderate relevance]
- Below 60%: Not relevant, filter out

## Output Format
Provide JSON with:
- relevance_score (0-100)
- reasoning (brief explanation)
- key_factors (list of relevant aspects)
```
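Downstream, the collector sends each article together with this prompt to the OpenAI API and parses the JSON reply. A minimal sketch, assuming the `openai` npm package and illustrative names (the real `src/collect-and-classify.js` may be structured differently):

```javascript
// Illustrative relevance-classification call using the official `openai` package.
// promptText is the use case's prompts.md content; the article fields are assumptions.
const OpenAI = require('openai');
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function classifyArticle(promptText, article) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    max_tokens: 1000,
    messages: [
      { role: 'system', content: promptText },
      { role: 'user', content: `${article.title}\n\n${article.summary}` },
    ],
  });
  // Expected shape per the Output Format above:
  // { relevance_score, reasoning, key_factors }
  return JSON.parse(completion.choices[0].message.content);
}
```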
Modifying Existing Use Cases
Update RSS Sources:
- Edit `use-cases/{name}/config.json`
- Add/remove URLs in the `content_sources` array
- Test with a manual run: `npm run run-{name}`

Adjust Classification Criteria:
- Edit `use-cases/{name}/prompts.md`
- Update classification thresholds in `config.json`
- Test with a preview: `node src/preview-report.js {name}`

Change Email Recipients:
- Update `delivery.daily.recipients` and `delivery.weekly.recipients`
- Modify email templates in `views/` if needed
- Test email delivery: `node src/send-report.js {name} test`
Deployment & Infrastructure
Local Development
```bash
# Start development server
npm run dev

# Access dashboard
open http://localhost:3000

# Run tests
npm test
npm run test:coverage
```
Production Deployment (Heroku)
Prerequisites:
- Heroku CLI installed
- PostgreSQL addon provisioned
- Environment variables configured
Deployment Steps:
```bash
# Login and create app
heroku login
heroku create yale-ventures-platform

# Add PostgreSQL addon
heroku addons:create heroku-postgresql:mini

# Configure environment variables
heroku config:set OPENAI_API_KEY=sk-proj-...
heroku config:set RESEND_API_KEY=re_...
heroku config:set FROM_EMAIL=news@yourdomain.com
heroku config:set ADMIN_EMAIL=admin@yourdomain.com
heroku config:set NODE_ENV=production
heroku config:set TZ=America/New_York

# Deploy application
git push heroku main

# Scale scheduler process
heroku ps:scale scheduler=1

# Initialize database
heroku run npm run setup-db
```
Production Monitoring:
```bash
# View logs
heroku logs --tail

# Check process status
heroku ps

# Monitor scheduler
heroku logs --tail --dyno scheduler

# Check database status
heroku pg:info
```
Alternative Deployment (Self-Hosted)
Server Requirements:
- Ubuntu 20.04+ or similar Linux distribution
- 2+ GB RAM, 20+ GB storage
- Node.js 18+, PostgreSQL 12+, Nginx (optional)
Setup Process:
```bash
# Install dependencies
sudo apt update
sudo apt install nodejs npm postgresql nginx

# Clone and configure application
git clone <repository-url>
cd content-aggregation-platform
npm install --production

# Setup PostgreSQL database
sudo -u postgres createdb yale_ventures_platform
sudo -u postgres createuser platform_user

# Configure systemd service
sudo cp deployment/yale-ventures.service /etc/systemd/system/
sudo systemctl enable yale-ventures
sudo systemctl start yale-ventures

# Setup Nginx proxy (optional)
sudo cp deployment/nginx.conf /etc/nginx/sites-available/yale-ventures
sudo ln -s /etc/nginx/sites-available/yale-ventures /etc/nginx/sites-enabled/
sudo systemctl reload nginx
```

Monitoring & Maintenance
Logging & Debugging
Log Locations:
- Application logs: `logs/` directory
- Web server access: Console output or `web-server.log`
- Pipeline execution: `logs/{use-case}-{timestamp}.log`
- Error logs: `logs/error.log`

Debug Commands:
```bash
# Test API connections
node scripts/test-api-key.js

# Test RSS collection
node scripts/test-rss-collection.js

# Preview report without sending
node src/preview-report.js blavatnik

# Check database connectivity
node scripts/test-database.js

# Manual pipeline execution
NODE_ENV=development node src/run-pipeline.js blavatnik 24
```
Performance Monitoring
Key Metrics:
- Pipeline execution time (target: <5 minutes for daily runs)
- Classification accuracy (monitor false positive/negative feedback)
- Email delivery success rate
- Database query performance
- Memory usage and process stability
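Pipeline duration is straightforward to track at the orchestrator level; a sketch of a timing wrapper against the 5-minute target (`runPipeline` as in the earlier sketch):

```javascript
// Illustrative timing wrapper for tracking pipeline execution time.
// Logs a warning when a run exceeds the 5-minute target for daily runs.
const TARGET_MS = 5 * 60 * 1000;

async function timedRun(useCase, runPipeline) {
  const start = Date.now();
  try {
    return await runPipeline(useCase);
  } finally {
    const elapsed = Date.now() - start;
    console.log(`[metrics] ${useCase} pipeline finished in ${(elapsed / 1000).toFixed(1)}s`);
    if (elapsed > TARGET_MS) {
      console.warn(`[metrics] ${useCase} exceeded the 5-minute target`);
    }
  }
}
```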
Monitoring Commands:
```bash
# Check process status
npm run scheduler-status

# View database performance
heroku pg:outliers   # Heroku only
heroku pg:blocking   # Heroku only

# Monitor memory usage
ps aux | grep node

# Check disk usage
df -h
du -sh data/ logs/
```
Backup & Recovery
Database Backups:
```bash
# Manual backup
pg_dump yale_ventures_platform > backup-$(date +%Y%m%d).sql

# Heroku automated backups
heroku pg:backups:schedule DATABASE_URL --at '02:00 America/New_York'

# Restore from backup
heroku pg:backups:restore
```

Data Backups:
- Pipeline results in the `data/` directory are automatically preserved
- Configuration files in `use-cases/` should be version controlled
- Log files can be archived and compressed weekly
Troubleshooting
Common Issues:
1. Pipeline Not Running:
```bash
# Check scheduler status
heroku ps
npm run scheduler-status

# Restart scheduler
heroku ps:restart scheduler
npm run restart-scheduler
```
2. OpenAI API Errors:
```bash
# Check API key validity
node scripts/test-api-key.js

# Monitor rate limits in logs
grep "rate limit" logs/*.log

# Verify billing and usage:
# Visit https://platform.openai.com/usage
```
3. Email Delivery Issues:
```bash
# Test email configuration
node src/send-report.js blavatnik test

# Check the Resend dashboard for delivery status
# Verify SMTP credentials if using SMTP
```
4. Database Connection Issues:
```bash
# Test database connectivity
node scripts/test-database.js

# Check connection string
echo $DATABASE_URL

# Verify PostgreSQL service status
sudo systemctl status postgresql
```
5. Memory/Performance Issues:
```bash
# Check memory usage
free -h
ps aux --sort=-%mem | head

# Monitor long-running processes
ps aux | grep node

# Check disk space
df -h

# Clear old logs if needed
find logs/ -name "*.log" -mtime +30 -delete
```
Security Considerations
API Key Management:
- Store API keys only in environment variables
- Rotate OpenAI API keys quarterly
- Monitor API key usage for unusual patterns

Access Control:
- Use least-privilege access for database users
- Implement admin authentication for the web dashboard
- Use HTTPS in production (SSL/TLS certificates)
- Restrict database access to the application only
- Monitor login attempts and suspicious activity

Data Privacy:
- Classification emails may contain sensitive research information
- Ensure proper access controls on email recipients
- Consider data retention policies for old analyses
- Follow Yale's data governance requirements
Advanced Configuration
Custom Email Templates
Templates are located in the `views/` directory and use Handlebars:
- `pages/dashboard.hbs` - Main dashboard
- `partials/email-daily.hbs` - Daily report template
- `partials/email-weekly.hbs` - Weekly report template
Customization Example:
```handlebars
{{!-- Custom daily email template --}}
{{useCase.display_name}} Daily Report - {{formatDate timestamp}}

{{#if highPriorityArticles}}
🔴 High Priority (90%+ relevance)
{{#each highPriorityArticles}}
{{/each}}
{{/if}}
```
API Integration
REST API Endpoints:
- `GET /api/use-cases` - List configured use cases
- `GET /api/runs/:useCase` - Recent pipeline runs
- `POST /api/trigger/:useCase` - Manual pipeline trigger
- `GET /api/results/:runId` - Detailed run results
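These endpoints can be exercised from any HTTP client; a short Node 18+ sketch using built-in fetch (the base URL and response shapes are assumptions):

```javascript
// Illustrative client for the dashboard API using Node 18's built-in fetch.
const BASE_URL = process.env.PLATFORM_URL || 'http://localhost:3000';

async function triggerAndList(useCase) {
  // Trigger a manual pipeline run
  const trigger = await fetch(`${BASE_URL}/api/trigger/${useCase}`, { method: 'POST' });
  console.log('Trigger accepted:', trigger.status);

  // List recent runs for the use case
  const runs = await (await fetch(`${BASE_URL}/api/runs/${useCase}`)).json();
  console.log(`Recent runs for ${useCase}:`, runs);
}

triggerAndList('blavatnik').catch(console.error);
```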
Webhook Integration:
```javascript
// Example webhook for external systems
app.post('/webhook/pipeline-complete', (req, res) => {
  const { useCase, runId, success, summary } = req.body;
  // Send notification to external system
  // Update dashboard, trigger downstream processes, etc.
  res.json({ received: true });
});
```
Performance Optimization
Database Optimization:
```sql
-- Add indexes for common queries
CREATE INDEX idx_pitch_analyses_created_at ON pitch_analyses(created_at);
CREATE INDEX idx_pitch_analyses_file_type ON pitch_analyses(file_type);

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM pitch_analyses ORDER BY created_at DESC LIMIT 20;
```
Caching Strategies:
- Cache RSS feed responses for 15 minutes
- Store classification results to avoid re-processing
- Use Redis for session storage in production
- Implement a CDN for static assets
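The first two points can be covered by a small TTL cache keyed by feed URL; a minimal in-memory sketch (not the platform's actual caching layer, and production deployments would more likely use Redis, per the list above):

```javascript
// Minimal in-memory TTL cache for RSS feed responses, 15-minute TTL per URL.
const TTL_MS = 15 * 60 * 1000;
const cache = new Map(); // url -> { body, expiresAt }

async function fetchFeedCached(url) {
  const hit = cache.get(url);
  if (hit && hit.expiresAt > Date.now()) return hit.body;

  const res = await fetch(url); // Node 18+ built-in fetch
  const body = await res.text();
  cache.set(url, { body, expiresAt: Date.now() + TTL_MS });
  return body;
}
```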
Resource Management:
```javascript
// Example resource limits
const CONCURRENT_CLASSIFICATIONS = 5;
const MAX_MEMORY_USAGE = '512MB';
const REQUEST_TIMEOUT = 30000; // 30 seconds
```
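To actually enforce the classification concurrency limit, calls can be processed in bounded batches; a minimal sketch, assuming a `classifyArticle` function like the earlier one:

```javascript
// Simple concurrency limiter: classify articles in batches so at most
// CONCURRENT_CLASSIFICATIONS (5) OpenAI requests run at once.
async function classifyAll(articles, classifyArticle, limit = 5) {
  const results = [];
  for (let i = 0; i < articles.length; i += limit) {
    const batch = articles.slice(i, i + limit);
    results.push(...await Promise.all(batch.map(classifyArticle)));
  }
  return results;
}
```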
---
This admin guide provides comprehensive technical documentation for maintaining and extending the Content Aggregation Platform. For user-facing features, refer to the separate user guides.