Content Aggregation Platform - Admin Guide
Overview
This guide covers the technical setup, configuration, and maintenance of the Content Aggregation Platform. The platform uses a Node.js-based architecture with PostgreSQL database, integrating multiple APIs for content collection, AI classification, and email delivery.
System Architecture
Core Components
Backend Services:
- Web Server (`src/web-server.js`) - Express.js dashboard and API
- Scheduler (`src/scheduler.js`) - Automated pipeline execution using node-cron
- Pipeline Orchestrator (`src/orchestrator.js`) - Coordinates collection, classification, and reporting
- Database Layer (`src/utils/database.js`) - PostgreSQL integration for persistent storage

Key Modules:
- RSS Collection (`src/collect-and-classify.js`) - Fetches and processes RSS feeds
- AI Classification - OpenAI API integration for content relevance scoring
- Email Generation (`src/email-sender.js`) - Creates and sends formatted reports
- Report Generation (`src/weekly-report.js`) - Comprehensive digest compilation
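Conceptually, the scheduler triggers the orchestrator, which chains the modules above. A minimal sketch, assuming illustrative export names (the actual src/scheduler.js and src/orchestrator.js are more involved):

```javascript
// Minimal sketch of how the scheduler drives the orchestrator.
// The export names below are assumptions, not the platform's actual API.
const cron = require('node-cron');
const { collectAndClassify } = require('./collect-and-classify'); // hypothetical export
const { generateReport, sendReport } = require('./email-sender'); // hypothetical exports

// Orchestrator flow: collection -> classification -> reporting -> delivery
async function runPipeline(useCase) {
  const articles = await collectAndClassify(useCase);     // RSS fetch + OpenAI scoring
  const report = await generateReport(useCase, articles); // digest compilation
  await sendReport(useCase, report);                      // formatted email out
}

// Scheduler: five-field cron expression, e.g. 7:00 AM on weekdays, in the configured TZ
cron.schedule('0 7 * * 1-5', () => runPipeline('blavatnik'), { timezone: process.env.TZ });
```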
Database Schema
Core Tables:
- `pitch_analyses` - Stores pitch deck analysis results

Other persistent storage:
- Configuration stored in JSON files under `use-cases/`
- Run results stored in timestamped folders under `data/`
File Structure
```
content-aggregation-platform/
├── src/ # Core application code
│ ├── web-server.js # Main web dashboard
│ ├── scheduler.js # Automated scheduling
│ ├── orchestrator.js # Pipeline coordination
│ ├── collect-and-classify.js # RSS collection and AI classification
│ ├── daily-collection.js # Daily pipeline runner
│ ├── weekly-report.js # Weekly report generator
│ ├── email-sender.js # Email delivery
│ └── utils/ # Utility modules
│ ├── database.js # PostgreSQL operations
│ ├── logging.js # Winston logging
│ ├── circuit-breaker.js # Reliability patterns
│ └── retry.js # Error recovery
├── use-cases/ # Configuration for different teams
│ ├── blavatnik/ # Blavatnik Fund configuration
│ ├── roberts/ # Roberts Innovation Fund
│ └── marketing/ # Marketing team
├── data/ # Pipeline run results
├── logs/ # Application logs
├── rubrics/ # Pitch analyzer evaluation criteria
├── pitch/ # Pitch analyzer module
└── views/ # Web dashboard templates
```
Installation & Setup
Prerequisites
Required Software:
- Node.js 18+ and npm 8+
- PostgreSQL 12+ database
- Git for version control

Required API Keys:
- OpenAI API key for AI classification
- Email service API key (Resend or SMTP credentials)
- Admin notification email addresses
Environment Configuration
Create a `.env` file with the required variables:

```bash
# Core Configuration
NODE_ENV=production
PORT=3000
TZ=America/New_York

# Database
DATABASE_URL=postgresql://username:password@host:port/database

# OpenAI API
OPENAI_API_KEY=sk-proj-your_key_here

# Email Service (Resend - Recommended)
RESEND_API_KEY=re_your_key_here
FROM_EMAIL=news@yourdomain.com

# Alternative: SMTP Configuration
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your.email@gmail.com
SMTP_PASS=your_app_password

# Admin Settings
ADMIN_EMAIL=admin@yourdomain.com
ADMIN_PASSWORD=secure_admin_password

# Scheduler Settings
ENABLE_SCHEDULER=true
```
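A small startup guard can fail fast when required variables are missing; a minimal sketch (the variable list mirrors the file above and is not exhaustive):

```javascript
// Minimal startup validation for required environment variables.
// Uses only Node's built-in process.env; run before starting the server.
const REQUIRED_VARS = ['DATABASE_URL', 'OPENAI_API_KEY', 'FROM_EMAIL'];

const missing = REQUIRED_VARS.filter((name) => !process.env[name]);
if (missing.length > 0) {
  console.error(`Missing required environment variables: ${missing.join(', ')}`);
  process.exit(1);
}
```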
Database Setup
Initialize Database:
```bash
npm run setup-db
```

This creates required tables and indexes for:
- Pitch analysis storage
- Session management
- Configuration caching

PostgreSQL Setup (if needed):

```sql
CREATE DATABASE yale_ventures_platform;
CREATE USER platform_user WITH ENCRYPTED PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE yale_ventures_platform TO platform_user;
```
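Connectivity can then be verified from Node.js with the `pg` driver; a short sketch along the lines of what `scripts/test-database.js` does (the actual script may differ):

```javascript
// Quick PostgreSQL connectivity check using the `pg` driver.
// Reads the connection string from DATABASE_URL, runs a trivial query, and exits.
const { Client } = require('pg');

async function main() {
  const client = new Client({ connectionString: process.env.DATABASE_URL });
  await client.connect();
  const { rows } = await client.query('SELECT NOW() AS now');
  console.log(`Database reachable, server time: ${rows[0].now}`);
  await client.end();
}

main().catch((err) => {
  console.error('Database connection failed:', err.message);
  process.exit(1);
});
```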
Application Installation
```bash
# Clone and install dependencies
git clone <repository-url>
cd content-aggregation-platform
npm install

# Initialize database
npm run setup-db

# Test configuration
node scripts/test-api-key.js
node scripts/test-rss-collection.js

# Start web dashboard
npm run web
```

Use Case Configuration

Creating New Use Cases

Each use case requires a configuration directory under `use-cases/`:

1. Create Directory Structure:

```bash
mkdir use-cases/new-team
cd use-cases/new-team
```
2. Configuration File (`config.json`):

```json
{
  "name": "new-team",
  "display_name": "New Team Name",
  "description": "Brief description of the use case",
  "content_sources": [
    {
      "name": "Yale News Health",
      "url": "https://news.yale.edu/topics/health-medicine/feed",
      "type": "rss"
    }
  ],
  "classification": {
    "model": "gpt-4",
    "threshold": 75,
    "max_tokens": 1000
  },
  "delivery": {
    "daily": {
      "enabled": true,
      "schedule": "0 7 * * 1-5",
      "recipients": ["team@yourdomain.com"]
    },
    "weekly": {
      "enabled": true,
      "schedule": "0 8 * * 1",
      "recipients": ["team@yourdomain.com", "manager@yourdomain.com"]
    }
  },
  "email": {
    "from_name": "Yale Ventures Content",
    "subject_prefix": "Team Name",
    "template": "default"
  }
}
```

The `schedule` values use standard five-field cron syntax (minute, hour, day of month, month, day of week): `0 7 * * 1-5` runs at 7:00 AM on weekdays, and `0 8 * * 1` at 8:00 AM on Mondays, in the timezone set by `TZ`.
3. Prompts Configuration (`prompts.md`):

```markdown
# Content Classification Prompt

You are an expert analyst helping [Team Name] identify relevant content for [specific focus area].

## Classification Criteria
- [Criterion 1]: [Description]
- [Criterion 2]: [Description]
- [Criterion 3]: [Description]

## Scoring Guidelines
- 90-100%: [Description of highest relevance]
- 75-89%: [Description of high relevance]
- 60-74%: [Description of moderate relevance]
- Below 60%: Not relevant, filter out

## Output Format
Provide JSON with:
- relevance_score (0-100)
- reasoning (brief explanation)
- key_factors (list of relevant aspects)
```
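Downstream, the collector sends each article together with this prompt to the OpenAI API and parses the JSON reply. A minimal sketch, assuming the `openai` npm package and illustrative names (the real `src/collect-and-classify.js` may be structured differently):

```javascript
// Illustrative relevance-classification call using the official `openai` package.
// promptText is the use case's prompts.md content; the article fields are assumptions.
const OpenAI = require('openai');
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

async function classifyArticle(promptText, article) {
  const completion = await openai.chat.completions.create({
    model: 'gpt-4',
    max_tokens: 1000,
    messages: [
      { role: 'system', content: promptText },
      { role: 'user', content: `${article.title}\n\n${article.summary}` },
    ],
  });
  // Expected shape per the Output Format above:
  // { relevance_score, reasoning, key_factors }
  return JSON.parse(completion.choices[0].message.content);
}
```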
Modifying Existing Use Cases
Update RSS Sources:
- Edit `use-cases/{name}/config.json`
- Add/remove URLs in the `content_sources` array
- Test with a manual run: `npm run run-{name}`

Adjust Classification Criteria:
- Edit `use-cases/{name}/prompts.md`
- Update classification thresholds in `config.json`
- Test with a preview: `node src/preview-report.js {name}`

Change Email Recipients:
- Update `delivery.daily.recipients` and `delivery.weekly.recipients`
- Modify email templates in `views/` if needed
- Test email delivery: `node src/send-report.js {name} test`
Deployment & Infrastructure
Local Development
```bash
# Start development server
npm run dev

# Access dashboard
open http://localhost:3000

# Run tests
npm test
npm run test:coverage
```
Production Deployment (Heroku)
Prerequisites:
- Heroku CLI installed
- PostgreSQL addon provisioned
- Environment variables configured
Deployment Steps:
```bash
# Login and create app
heroku login
heroku create yale-ventures-platform

# Add PostgreSQL addon
heroku addons:create heroku-postgresql:mini

# Configure environment variables
heroku config:set OPENAI_API_KEY=sk-proj-...
heroku config:set RESEND_API_KEY=re_...
heroku config:set FROM_EMAIL=news@yourdomain.com
heroku config:set ADMIN_EMAIL=admin@yourdomain.com
heroku config:set NODE_ENV=production
heroku config:set TZ=America/New_York

# Deploy application
git push heroku main

# Scale scheduler process
heroku ps:scale scheduler=1

# Initialize database
heroku run npm run setup-db
```
Production Monitoring:
```bash
# View logs
heroku logs --tail

# Check process status
heroku ps

# Monitor scheduler
heroku logs --tail --dyno scheduler

# Check database status
heroku pg:info
```
Alternative Deployment (Self-Hosted)
Server Requirements:
- Ubuntu 20.04+ or similar Linux distribution
- 2+ GB RAM, 20+ GB storage
- Node.js 18+, PostgreSQL 12+, Nginx (optional)
Setup Process:
```bash
# Install dependencies
sudo apt update
sudo apt install nodejs npm postgresql nginx

# Clone and configure application
git clone <repository-url>
cd content-aggregation-platform
npm install --production

# Setup PostgreSQL database
sudo -u postgres createdb yale_ventures_platform
sudo -u postgres createuser platform_user

# Configure systemd service
sudo cp deployment/yale-ventures.service /etc/systemd/system/
sudo systemctl enable yale-ventures
sudo systemctl start yale-ventures

# Setup Nginx proxy (optional)
sudo cp deployment/nginx.conf /etc/nginx/sites-available/yale-ventures
sudo ln -s /etc/nginx/sites-available/yale-ventures /etc/nginx/sites-enabled/
sudo systemctl reload nginx
```

Monitoring & Maintenance
Logging & Debugging
Log Locations:
- Application logs: `logs/` directory
- Web server access: Console output or `web-server.log`
- Pipeline execution: `logs/{use-case}-{timestamp}.log`
- Error logs: `logs/error.log`

Debug Commands:
```bash
# Test API connections
node scripts/test-api-key.js

# Test RSS collection
node scripts/test-rss-collection.js

# Preview report without sending
node src/preview-report.js blavatnik

# Check database connectivity
node scripts/test-database.js

# Manual pipeline execution
NODE_ENV=development node src/run-pipeline.js blavatnik 24
```
Performance Monitoring
Key Metrics:
- Pipeline execution time (target: <5 minutes for daily runs)
- Classification accuracy (monitor false positive/negative feedback)
- Email delivery success rate
- Database query performance
- Memory usage and process stability
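Pipeline duration is straightforward to track at the orchestrator level; a sketch of a timing wrapper against the 5-minute target (`runPipeline` as in the earlier sketch):

```javascript
// Illustrative timing wrapper for tracking pipeline execution time.
// Logs a warning when a run exceeds the 5-minute target for daily runs.
const TARGET_MS = 5 * 60 * 1000;

async function timedRun(useCase, runPipeline) {
  const start = Date.now();
  try {
    return await runPipeline(useCase);
  } finally {
    const elapsed = Date.now() - start;
    console.log(`[metrics] ${useCase} pipeline finished in ${(elapsed / 1000).toFixed(1)}s`);
    if (elapsed > TARGET_MS) {
      console.warn(`[metrics] ${useCase} exceeded the 5-minute target`);
    }
  }
}
```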
Monitoring Commands:
```bash
# Check process status
npm run scheduler-status

# View database performance
heroku pg:outliers   # Heroku only
heroku pg:blocking   # Heroku only

# Monitor memory usage
ps aux | grep node

# Check disk usage
df -h
du -sh data/ logs/
```
Backup & Recovery
Database Backups:
```bash
# Manual backup
pg_dump yale_ventures_platform > backup-$(date +%Y%m%d).sql

# Heroku automated backups
heroku pg:backups:schedule DATABASE_URL --at '02:00 America/New_York'

# Restore from backup
heroku pg:backups:restore
```

Data Backups:
- Pipeline results in the `data/` directory are automatically preserved
- Configuration files in `use-cases/` should be version controlled
- Log files can be archived and compressed weekly
Troubleshooting
Common Issues:
1. Pipeline Not Running:
```bash
# Check scheduler status
heroku ps
npm run scheduler-status

# Restart scheduler
heroku ps:restart scheduler
npm run restart-scheduler
```
2. OpenAI API Errors:
```bash
# Check API key validity
node scripts/test-api-key.js

# Monitor rate limits in logs
grep "rate limit" logs/*.log

# Verify billing and usage:
# Visit https://platform.openai.com/usage
```
3. Email Delivery Issues:
```bash
# Test email configuration
node src/send-report.js blavatnik test

# Check the Resend dashboard for delivery status
# Verify SMTP credentials if using SMTP
```
4. Database Connection Issues:
```bash
# Test database connectivity
node scripts/test-database.js

# Check connection string
echo $DATABASE_URL

# Verify PostgreSQL service status
sudo systemctl status postgresql
```
5. Memory/Performance Issues:
```bash
# Check memory usage
free -h
ps aux --sort=-%mem | head

# Monitor long-running processes
ps aux | grep node

# Check disk space
df -h

# Clear old logs if needed
find logs/ -name "*.log" -mtime +30 -delete
```
Security Considerations
API Key Management:
- Store API keys only in environment variables
- Rotate OpenAI API keys quarterly
- Monitor API key usage for unusual patterns

Access Control:
- Use least-privilege access for database users
- Implement admin authentication for the web dashboard
- Use HTTPS in production (SSL/TLS certificates)
- Restrict database access to the application only
- Monitor login attempts and suspicious activity

Data Privacy:
- Classification emails may contain sensitive research information
- Ensure proper access controls on email recipients
- Consider data retention policies for old analyses
- Follow Yale's data governance requirements
Advanced Configuration
Custom Email Templates
Templates are located in the `views/` directory and use Handlebars:
- `pages/dashboard.hbs` - Main dashboard
- `partials/email-daily.hbs` - Daily report template
- `partials/email-weekly.hbs` - Weekly report template
Customization Example:
```handlebars
{{!-- Custom daily email template --}}
{{useCase.display_name}} Daily Report - {{formatDate timestamp}}

{{#if highPriorityArticles}}
🔴 High Priority (90%+ relevance)
{{#each highPriorityArticles}}
{{/each}}
{{/if}}
```
API Integration
REST API Endpoints:
- `GET /api/use-cases` - List configured use cases
- `GET /api/runs/:useCase` - Recent pipeline runs
- `POST /api/trigger/:useCase` - Manual pipeline trigger
- `GET /api/results/:runId` - Detailed run results
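These endpoints can be exercised from any HTTP client; a short Node 18+ sketch using built-in fetch (the base URL and response shapes are assumptions):

```javascript
// Illustrative client for the dashboard API using Node 18's built-in fetch.
const BASE_URL = process.env.PLATFORM_URL || 'http://localhost:3000';

async function triggerAndList(useCase) {
  // Trigger a manual pipeline run
  const trigger = await fetch(`${BASE_URL}/api/trigger/${useCase}`, { method: 'POST' });
  console.log('Trigger accepted:', trigger.status);

  // List recent runs for the use case
  const runs = await (await fetch(`${BASE_URL}/api/runs/${useCase}`)).json();
  console.log(`Recent runs for ${useCase}:`, runs);
}

triggerAndList('blavatnik').catch(console.error);
```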
Webhook Integration:
```javascript
// Example webhook for external systems
app.post('/webhook/pipeline-complete', (req, res) => {
  const { useCase, runId, success, summary } = req.body;
  // Send notification to external system
  // Update dashboard, trigger downstream processes, etc.
  res.json({ received: true });
});
```
Performance Optimization
Database Optimization:
```sql
-- Add indexes for common queries
CREATE INDEX idx_pitch_analyses_created_at ON pitch_analyses(created_at);
CREATE INDEX idx_pitch_analyses_file_type ON pitch_analyses(file_type);

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM pitch_analyses ORDER BY created_at DESC LIMIT 20;
```
Caching Strategies:
- Cache RSS feed responses for 15 minutes
- Store classification results to avoid re-processing
- Use Redis for session storage in production
- Implement a CDN for static assets
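The first two points can be covered by a small TTL cache keyed by feed URL; a minimal in-memory sketch (not the platform's actual caching layer, and production deployments would more likely use Redis, per the list above):

```javascript
// Minimal in-memory TTL cache for RSS feed responses, 15-minute TTL per URL.
const TTL_MS = 15 * 60 * 1000;
const cache = new Map(); // url -> { body, expiresAt }

async function fetchFeedCached(url) {
  const hit = cache.get(url);
  if (hit && hit.expiresAt > Date.now()) return hit.body;

  const res = await fetch(url); // Node 18+ built-in fetch
  const body = await res.text();
  cache.set(url, { body, expiresAt: Date.now() + TTL_MS });
  return body;
}
```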
Resource Management:
```javascript
// Example resource limits
const CONCURRENT_CLASSIFICATIONS = 5;
const MAX_MEMORY_USAGE = '512MB';
const REQUEST_TIMEOUT = 30000; // 30 seconds
```
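To actually enforce the classification concurrency limit, calls can be processed in bounded batches; a minimal sketch, assuming a `classifyArticle` function like the earlier one:

```javascript
// Simple concurrency limiter: classify articles in batches so at most
// CONCURRENT_CLASSIFICATIONS (5) OpenAI requests run at once.
async function classifyAll(articles, classifyArticle, limit = 5) {
  const results = [];
  for (let i = 0; i < articles.length; i += limit) {
    const batch = articles.slice(i, i + limit);
    results.push(...await Promise.all(batch.map(classifyArticle)));
  }
  return results;
}
```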
---
This admin guide provides comprehensive technical documentation for maintaining and extending the Content Aggregation Platform. For user-facing features, refer to the separate user guides.