# Content Aggregation Platform - Admin Guide

Administrative setup and configuration

## Overview

This guide covers the technical setup, configuration, and maintenance of the Content Aggregation Platform. The platform uses a Node.js-based architecture with PostgreSQL database, integrating multiple APIs for content collection, AI classification, and email delivery.

## System Architecture

### Core Components

**Backend Services:**

- Web Server (`src/web-server.js`) - Express.js dashboard and API
- Scheduler (`src/scheduler.js`) - Automated pipeline execution using node-cron
- Pipeline Orchestrator (`src/orchestrator.js`) - Coordinates collection, classification, and reporting
- Database Layer (`src/utils/database.js`) - PostgreSQL integration for persistent storage

**Key Modules:**

- RSS Collection (`src/collect-and-classify.js`) - Fetches and processes RSS feeds
- AI Classification - OpenAI API integration for content relevance scoring
- Email Generation (`src/email-sender.js`) - Creates and sends formatted reports
- Report Generation (`src/weekly-report.js`) - Comprehensive digest compilation
### Database Schema

**Core Tables:**

- `pitch_analyses` - Stores pitch deck analysis results (see the query sketch after this list)

**Other Persistent Data:**

- Configuration is stored in JSON files under `use-cases/`
- Run results are stored in timestamped folders under `data/`
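
The database layer in `src/utils/database.js` wraps access to these tables. As a rough illustration, a query against `pitch_analyses` through node-postgres might look like the sketch below; `recentAnalyses` is a hypothetical helper, and the module's actual API may differ.

```javascript
// Sketch: read recent pitch analyses via node-postgres.
// `recentAnalyses` is illustrative; the real database.js API may differ.
import pg from 'pg';

const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });

async function recentAnalyses(limit = 20) {
  const { rows } = await pool.query(
    'SELECT * FROM pitch_analyses ORDER BY created_at DESC LIMIT $1',
    [limit]
  );
  return rows;
}
```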
### File Structure

```
content-aggregation-platform/
├── src/                          # Core application code
│   ├── web-server.js             # Main web dashboard
│   ├── scheduler.js              # Automated scheduling
│   ├── orchestrator.js           # Pipeline coordination
│   ├── collect-and-classify.js   # RSS collection and AI classification
│   ├── daily-collection.js       # Daily pipeline runner
│   ├── weekly-report.js          # Weekly report generator
│   ├── email-sender.js           # Email delivery
│   └── utils/                    # Utility modules
│       ├── database.js           # PostgreSQL operations
│       ├── logging.js            # Winston logging
│       ├── circuit-breaker.js    # Reliability patterns
│       └── retry.js              # Error recovery
├── use-cases/                    # Configuration for different teams
│   ├── blavatnik/                # Blavatnik Fund configuration
│   ├── roberts/                  # Roberts Innovation Fund
│   └── marketing/                # Marketing team
├── data/                         # Pipeline run results
├── logs/                         # Application logs
├── rubrics/                      # Pitch analyzer evaluation criteria
├── pitch/                        # Pitch analyzer module
└── views/                        # Web dashboard templates
```
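
The `utils/retry.js` and `utils/circuit-breaker.js` modules provide the reliability layer for flaky feeds and API calls. A minimal sketch of the retry pattern follows; the module's actual interface may differ.

```javascript
// Minimal exponential-backoff retry, in the spirit of src/utils/retry.js.
// The real module's signature and options may differ.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === attempts) throw err; // out of attempts, rethrow
      // Exponential backoff: 500ms, 1000ms, 2000ms, ...
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage: await withRetry(() => fetch(feedUrl), { attempts: 5 });
```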

## Installation & Setup

### Prerequisites

**Required Software:**

- Node.js 18+ and npm 8+
- PostgreSQL 12+ database
- Git for version control

**Required API Keys:**

- OpenAI API key for AI classification
- Email service API key (Resend or SMTP credentials)
- Admin notification email addresses

### Environment Configuration

Create a `.env` file with the required variables:

```bash
# Core Configuration
NODE_ENV=production
PORT=3000
TZ=America/New_York

# Database
DATABASE_URL=postgresql://username:password@host:port/database

# OpenAI API
OPENAI_API_KEY=sk-proj-your_key_here

# Email Service (Resend - Recommended)
RESEND_API_KEY=re_your_key_here
FROM_EMAIL=news@yourdomain.com

# Alternative: SMTP Configuration
SMTP_HOST=smtp.gmail.com
SMTP_PORT=587
SMTP_USER=your.email@gmail.com
SMTP_PASS=your_app_password

# Admin Settings
ADMIN_EMAIL=admin@yourdomain.com
ADMIN_PASSWORD=secure_admin_password

# Scheduler Settings
ENABLE_SCHEDULER=true
```
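
At startup the application reads these values from the environment. A minimal sketch of loading and validating them with `dotenv` (the variable list is illustrative, and `validateEnv` is not part of the platform):

```javascript
// Load .env and fail fast if required variables are missing.
import 'dotenv/config';

const REQUIRED = ['DATABASE_URL', 'OPENAI_API_KEY', 'FROM_EMAIL'];

function validateEnv() {
  const missing = REQUIRED.filter((name) => !process.env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
}

validateEnv();
```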

### Database Setup

**Initialize Database:**

```bash
npm run setup-db
```

This creates the required tables and indexes for:

- Pitch analysis storage
- Session management
- Configuration caching

**PostgreSQL Setup (if needed):**

```sql
CREATE DATABASE yale_ventures_platform;
CREATE USER platform_user WITH ENCRYPTED PASSWORD 'secure_password';
GRANT ALL PRIVILEGES ON DATABASE yale_ventures_platform TO platform_user;
```

### Application Installation

```bash
# Clone and install dependencies
git clone
cd content-aggregation-platform
npm install

# Initialize database
npm run setup-db

# Test configuration
node scripts/test-api-key.js
node scripts/test-rss-collection.js

# Start web dashboard
npm run web
```

## Use Case Configuration

### Creating New Use Cases

Each use case requires a configuration directory under `use-cases/`:

**1. Create Directory Structure:**

```bash
mkdir use-cases/new-team
cd use-cases/new-team
```

**2. Configuration File (`config.json`):**

```json
{
  "name": "new-team",
  "display_name": "New Team Name",
  "description": "Brief description of the use case",
  "content_sources": [
    {
      "name": "Yale News Health",
      "url": "https://news.yale.edu/topics/health-medicine/feed",
      "type": "rss"
    }
  ],
  "classification": {
    "model": "gpt-4",
    "threshold": 75,
    "max_tokens": 1000
  },
  "delivery": {
    "daily": {
      "enabled": true,
      "schedule": "0 7 * * 1-5",
      "recipients": ["team@yourdomain.com"]
    },
    "weekly": {
      "enabled": true,
      "schedule": "0 8 * * 1",
      "recipients": ["team@yourdomain.com", "manager@yourdomain.com"]
    }
  },
  "email": {
    "from_name": "Yale Ventures Content",
    "subject_prefix": "Team Name",
    "template": "default"
  }
}
```
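
The `schedule` values are five-field node-cron expressions; `0 7 * * 1-5` fires at 7:00 AM Monday through Friday, and `0 8 * * 1` at 8:00 AM on Mondays. A sketch of how the scheduler might register one (`runPipeline` is a hypothetical stand-in for the real pipeline entry point):

```javascript
// Register a use case's daily schedule with node-cron.
import cron from 'node-cron';

// Hypothetical stand-in for the orchestrator's entry point.
async function runPipeline(useCase, cadence) {
  console.log(`Running ${cadence} pipeline for ${useCase}`);
}

cron.schedule('0 7 * * 1-5', () => runPipeline('new-team', 'daily'), {
  timezone: process.env.TZ || 'America/New_York',
});
```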

**3. Prompts Configuration (`prompts.md`):**

```markdown
# Content Classification Prompt

You are an expert analyst helping [Team Name] identify relevant content for [specific focus area].

## Classification Criteria

- [Criterion 1]: [Description]
- [Criterion 2]: [Description]
- [Criterion 3]: [Description]

## Scoring Guidelines

- 90-100%: [Description of highest relevance]
- 75-89%: [Description of high relevance]
- 60-74%: [Description of moderate relevance]
- Below 60%: Not relevant, filter out

## Output Format

Provide JSON with:

- relevance_score (0-100)
- reasoning (brief explanation)
- key_factors (list of relevant aspects)
```
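
For reference, a classification call against the OpenAI API might look like the sketch below. The platform's actual prompt assembly and parsing live in `src/collect-and-classify.js` and may differ; `classifyArticle` is illustrative only.

```javascript
// Sketch: score one article against a use case's classification prompt.
import OpenAI from 'openai';

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function classifyArticle(promptText, article) {
  const response = await client.chat.completions.create({
    model: 'gpt-4',
    max_tokens: 1000,
    messages: [
      { role: 'system', content: promptText },
      { role: 'user', content: `${article.title}\n\n${article.summary}` },
    ],
  });
  // The prompt requests JSON: { relevance_score, reasoning, key_factors }
  return JSON.parse(response.choices[0].message.content);
}
```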

### Modifying Existing Use Cases

**Update RSS Sources:**

- Edit `use-cases/{name}/config.json`
- Add or remove URLs in the `content_sources` array
- Test with a manual run: `npm run run-{name}`

**Adjust Classification Criteria:**

- Edit `use-cases/{name}/prompts.md`
- Update classification thresholds in `config.json`
- Test with a preview: `node src/preview-report.js {name}`

**Change Email Recipients:**

- Update `delivery.daily.recipients` and `delivery.weekly.recipients`
- Modify email templates in `views/` if needed
- Test email delivery: `node src/send-report.js {name} test`

## Deployment & Infrastructure

### Local Development

```bash
# Start development server
npm run dev

# Access dashboard
open http://localhost:3000

# Run tests
npm test
npm run test:coverage
```

### Production Deployment (Heroku)

**Prerequisites:**

- Heroku CLI installed
- PostgreSQL addon provisioned
- Environment variables configured

**Deployment Steps:**

```bash
# Login and create app
heroku login
heroku create yale-ventures-platform

# Add PostgreSQL addon
heroku addons:create heroku-postgresql:mini

# Configure environment variables
heroku config:set OPENAI_API_KEY=sk-proj-...
heroku config:set RESEND_API_KEY=re_...
heroku config:set FROM_EMAIL=news@yourdomain.com
heroku config:set ADMIN_EMAIL=admin@yourdomain.com
heroku config:set NODE_ENV=production
heroku config:set TZ=America/New_York

# Deploy application
git push heroku main

# Scale scheduler process
heroku ps:scale scheduler=1

# Initialize database
heroku run npm run setup-db
```

**Production Monitoring:**

```bash
# View logs
heroku logs --tail

# Check process status
heroku ps

# Monitor scheduler
heroku logs --tail --dyno scheduler

# Check database status
heroku pg:info
```

### Alternative Deployment (Self-Hosted)

**Server Requirements:**

- Ubuntu 20.04+ or similar Linux distribution
- 2+ GB RAM, 20+ GB storage
- Node.js 18+, PostgreSQL 12+, Nginx (optional)

**Setup Process:**

```bash
# Install dependencies
sudo apt update
sudo apt install nodejs npm postgresql nginx

# Clone and configure application
git clone
cd content-aggregation-platform
npm install --production

# Setup PostgreSQL database
sudo -u postgres createdb yale_ventures_platform
sudo -u postgres createuser platform_user

# Configure systemd service
sudo cp deployment/yale-ventures.service /etc/systemd/system/
sudo systemctl enable yale-ventures
sudo systemctl start yale-ventures

# Setup Nginx proxy (optional)
sudo cp deployment/nginx.conf /etc/nginx/sites-available/yale-ventures
sudo ln -s /etc/nginx/sites-available/yale-ventures /etc/nginx/sites-enabled/
sudo systemctl reload nginx
```

## Monitoring & Maintenance

### Logging & Debugging

**Log Locations:**

- Application logs: `logs/` directory
- Web server access: console output or `web-server.log`
- Pipeline execution: `logs/{use-case}-{timestamp}.log`
- Error logs: `logs/error.log`

**Debug Commands:**

```bash
# Test API connections
node scripts/test-api-key.js

# Test RSS collection
node scripts/test-rss-collection.js

# Preview report without sending
node src/preview-report.js blavatnik

# Check database connectivity
node scripts/test-database.js

# Manual pipeline execution
NODE_ENV=development node src/run-pipeline.js blavatnik 24
```

### Performance Monitoring

**Key Metrics:**

- Pipeline execution time (target: under 5 minutes for daily runs)
- Classification accuracy (monitor false positive/negative feedback)
- Email delivery success rate
- Database query performance
- Memory usage and process stability

**Monitoring Commands:**

```bash
# Check process status
npm run scheduler-status

# View database performance
heroku pg:outliers  # Heroku only
heroku pg:blocking  # Heroku only

# Monitor memory usage
ps aux | grep node

# Check disk usage
df -h
du -sh data/ logs/
```

### Backup & Recovery

**Database Backups:**

```bash
# Manual backup
pg_dump yale_ventures_platform > backup-$(date +%Y%m%d).sql

# Heroku automated backups
heroku pg:backups:schedule DATABASE_URL --at '02:00 America/New_York'

# Restore from backup
heroku pg:backups:restore DATABASE_URL
```

**Data Backups:**

- Pipeline results in the `data/` directory are automatically preserved
- Configuration files in `use-cases/` should be version controlled
- Log files can be archived and compressed weekly

## Troubleshooting

**Common Issues:**

**1. Pipeline Not Running:**

```bash
# Check scheduler status
heroku ps
npm run scheduler-status

# Restart scheduler
heroku ps:restart scheduler
npm run restart-scheduler
```

**2. OpenAI API Errors:**

```bash
# Check API key validity
node scripts/test-api-key.js

# Monitor rate limits in logs
grep "rate limit" logs/*.log

# Verify billing and usage at https://platform.openai.com/usage
```

**3. Email Delivery Issues:**

```bash
# Test email configuration
node src/send-report.js blavatnik test

# Check the Resend dashboard for delivery status
# Verify SMTP credentials if using SMTP
```

**4. Database Connection Issues:**

```bash
# Test database connectivity
node scripts/test-database.js

# Check connection string
echo $DATABASE_URL

# Verify PostgreSQL service status
sudo systemctl status postgresql
```

**5. Memory/Performance Issues:**

```bash
# Check memory usage
free -h
ps aux --sort=-%mem | head

# Monitor long-running processes
ps aux | grep node

# Check disk space
df -h

# Clear old logs if needed
find logs/ -name "*.log" -mtime +30 -delete
```

## Security Considerations

**API Key Management:**

- Store API keys only in environment variables
- Rotate OpenAI API keys quarterly
- Monitor API key usage for unusual patterns
- Use least-privilege access for database users

**Access Control:**

- Implement admin authentication for the web dashboard
- Use HTTPS in production (SSL/TLS certificates)
- Restrict database access to the application only
- Monitor login attempts and suspicious activity

**Data Privacy:**

- Report emails may contain sensitive research information
- Ensure proper access controls on email recipients
- Consider data retention policies for old analyses
- Follow Yale's data governance requirements

## Advanced Configuration

### Custom Email Templates

Templates live in the `views/` directory and use Handlebars:

- `pages/dashboard.hbs` - Main dashboard
- `partials/email-daily.hbs` - Daily report template
- `partials/email-weekly.hbs` - Weekly report template

**Customization Example:**

```handlebars
{{!-- Custom daily email template --}}
{{useCase.display_name}} Daily Report - {{formatDate timestamp}}

{{#if highPriorityArticles}}
🔴 High Priority (90%+ relevance)

{{#each highPriorityArticles}}
{{title}}
Confidence: {{confidence}}%
{{reasoning}}
{{/each}}
{{/if}}
```
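
To preview a template change without sending email, you can render it directly with Handlebars. A sketch, assuming the data shape shown above (the platform's own render path may differ):

```javascript
// Sketch: render the daily email template with sample data.
import fs from 'node:fs';
import Handlebars from 'handlebars';

// The template uses a formatDate helper, so register one for the preview.
Handlebars.registerHelper('formatDate', (ts) =>
  new Date(ts).toLocaleDateString('en-US')
);

const source = fs.readFileSync('views/partials/email-daily.hbs', 'utf8');
const template = Handlebars.compile(source);

const html = template({
  useCase: { display_name: 'Blavatnik Fund' },
  timestamp: Date.now(),
  highPriorityArticles: [
    { title: 'Example article', confidence: 94, reasoning: 'Strong match' },
  ],
});
console.log(html);
```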

### API Integration

**REST API Endpoints:**

- `GET /api/use-cases` - List configured use cases
- `GET /api/runs/:useCase` - Recent pipeline runs
- `POST /api/trigger/:useCase` - Manual pipeline trigger (example below)
- `GET /api/results/:runId` - Detailed run results
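
For example, an external script could trigger a manual run like this (a sketch; whether the endpoint requires admin authentication depends on your configuration):

```javascript
// Trigger a manual pipeline run for the blavatnik use case (Node 18+).
const response = await fetch('http://localhost:3000/api/trigger/blavatnik', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
});
console.log(await response.json());
```
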
**Webhook Integration:**

```javascript
// Example webhook for external systems
app.post('/webhook/pipeline-complete', (req, res) => {
  const { useCase, runId, success, summary } = req.body;

  // Send notification to external system
  // Update dashboard, trigger downstream processes, etc.

  res.json({ received: true });
});
```

### Performance Optimization

**Database Optimization:**

```sql
-- Add indexes for common queries
CREATE INDEX idx_pitch_analyses_created_at ON pitch_analyses(created_at);
CREATE INDEX idx_pitch_analyses_file_type ON pitch_analyses(file_type);

-- Analyze query performance
EXPLAIN ANALYZE SELECT * FROM pitch_analyses ORDER BY created_at DESC LIMIT 20;
```

**Caching Strategies:**

- Cache RSS feed responses for 15 minutes (see the sketch below)
- Store classification results to avoid re-processing
- Use Redis for session storage in production
- Implement CDN for static assets
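
A minimal in-memory sketch of the 15-minute RSS cache (the platform's actual caching may differ):

```javascript
// Cache raw RSS responses for 15 minutes, keyed by feed URL.
const cache = new Map();
const TTL_MS = 15 * 60 * 1000;

async function fetchFeedCached(url) {
  const hit = cache.get(url);
  if (hit && Date.now() - hit.fetchedAt < TTL_MS) return hit.body;

  const response = await fetch(url);
  const body = await response.text();
  cache.set(url, { body, fetchedAt: Date.now() });
  return body;
}
```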

**Resource Management:**

```javascript
// Example resource limits
const CONCURRENT_CLASSIFICATIONS = 5; // parallel OpenAI classification calls
const MAX_MEMORY_USAGE = '512MB';     // per-process memory ceiling
const REQUEST_TIMEOUT = 30000;        // 30 seconds
```
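
One way to enforce the classification concurrency limit is the `p-limit` package (an assumption; the platform may throttle differently):

```javascript
// Run classifications at most five at a time.
import pLimit from 'p-limit';

const limit = pLimit(5); // CONCURRENT_CLASSIFICATIONS

async function classifyAll(articles, classify) {
  return Promise.all(articles.map((article) => limit(() => classify(article))));
}
```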

---

This admin guide provides comprehensive technical documentation for maintaining and extending the Content Aggregation Platform. For user-facing features, refer to the separate user guides.