# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

HubWallet Sentiment Analysis is a multi-module FastAPI application combining:
- **Sentiment Analysis**: LLM-powered review analysis with emotion detection and topic extraction
- **Smart Inventory Management**: AI-driven inventory planning with demand forecasting (daily/monthly), service level monitoring, and slow mover detection
- **Marketing Automation**: Social media management with Twitter/Facebook analytics, persona generation, and content scheduling
- **Menu Design**: Restaurant menu management with AI-powered design suggestions

## Development Commands

### Running the Application

```bash
# Start FastAPI server (development)
uvicorn src.main:app --reload

# Start FastAPI server (production)
uvicorn src.main:app --host 0.0.0.0 --port 8000

# Start Celery worker
celery -A src.utils.celery_worker worker --loglevel=info

# Note: The application automatically starts a unified APScheduler on startup
# (see src/scheduler/cron.py for scheduled jobs)
```

### Database Management

```bash
# Apply pending migrations
alembic upgrade head

# Create new migration
alembic revision --autogenerate -m "description of changes"

# Rollback one migration
alembic downgrade -1

# View migration history
alembic history
```

### Smart Inventory Data Flow

The Smart Inventory module requires careful sequencing. Always follow this order:

```bash
# 1. Generate and load base data (companies, products, locations, vendors, sales)
python src/smart_inventory/dummy_data/generate_dummy_data.py
python src/smart_inventory/dummy_data/load_data_to_db.py

# 2. Compute aggregate tables (sequential dependencies):
python src/smart_inventory/dummy_data/daily_sales_trigger.py          # Must run first
python src/smart_inventory/dummy_data/inventory_snapshot_trigger.py   # Depends on daily_sales
python src/smart_inventory/dummy_data/service_level_trigger.py        # Depends on daily_sales
python src/smart_inventory/dummy_data/slow_movers_trigger.py          # Depends on daily_sales

# 3. Train forecasting models
python src/smart_inventory/dummy_data/train_forecast_trigger.py       # Daily model
python src/smart_inventory/dummy_data/train_monthly_forecast_trigger.py  # Monthly model

# 4. Generate forecasts
python src/smart_inventory/dummy_data/predict_trigger.py              # Daily forecasts
python src/smart_inventory/dummy_data/backfill_forecasts.py           # Backfill historical

# 5. Compute inventory planning (requires forecasts)
python src/smart_inventory/dummy_data/inventory_planning_trigger.py

# Cleanup
python src/smart_inventory/dummy_data/delete_data_from_db.py
python src/smart_inventory/dummy_data/delete_dummy_data.py
```

See [src/smart_inventory/smart_inventory_flow.md](src/smart_inventory/smart_inventory_flow.md) for detailed documentation.
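
The ordering above can also be captured in a small runner. This is a minimal sketch, not something that exists in the repository: `PIPELINE`, `build_commands`, and `run_pipeline` are hypothetical names, and it assumes the scripts are invoked from the project root exactly as in the commands above. It stops at the first failing step so later stages never run against missing inputs.

```python
import subprocess
import sys

SCRIPT_DIR = "src/smart_inventory/dummy_data"

# Script names in the order listed above; the cleanup scripts are intentionally left out.
PIPELINE = [
    "generate_dummy_data.py",
    "load_data_to_db.py",
    "daily_sales_trigger.py",
    "inventory_snapshot_trigger.py",
    "service_level_trigger.py",
    "slow_movers_trigger.py",
    "train_forecast_trigger.py",
    "train_monthly_forecast_trigger.py",
    "predict_trigger.py",
    "backfill_forecasts.py",
    "inventory_planning_trigger.py",
]


def build_commands(python=sys.executable):
    """Return one argv list per pipeline step, in execution order."""
    return [[python, f"{SCRIPT_DIR}/{name}"] for name in PIPELINE]


def run_pipeline(runner=subprocess.run):
    """Run each step in order; check=True raises on the first non-zero exit."""
    for argv in build_commands():
        print("running:", " ".join(argv))
        runner(argv, check=True)
```

Calling `run_pipeline()` from the project root runs all eleven steps. The `runner` parameter exists only so the ordering can be checked without executing the scripts.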

## Architecture Overview

### Module Structure Pattern

Each module follows this consistent structure:
- `models.py` - SQLAlchemy ORM models
- `schemas.py` - Pydantic request/response schemas
- `router.py` - FastAPI route definitions
- `controller.py` - Business logic and orchestration
- `services.py` - Database operations and external service calls

### Key Architectural Components

**Main Application** ([src/main.py](src/main.py))
- FastAPI app with lifespan context manager
- Unified scheduler starts on application startup
- All module routers registered with specific prefixes and tags
- Custom exception handler for consistent error responses
- CORS middleware with allowed origins

**Database Layer** ([src/utils/db.py](src/utils/db.py))
- Platform-specific connection pooling:
  - **Linux**: Uses NullPool to prevent SSL connection issues after Celery worker fork
  - **Windows**: Uses default pooling (solo worker pool doesn't fork)
- Connection timeouts: 2min connect, 10min query execution
- Two session providers: `get_db()` for FastAPI dependencies, `get_db_session()` for cron jobs

**Celery Workers** ([src/utils/celery_worker.py](src/utils/celery_worker.py))
- Platform-specific configuration:
  - **Windows**: Uses solo pool (single-threaded)
  - **Linux**: Uses prefork pool with 4 workers
- Task autodiscovery from: `src.marketing.tasks`, `src.menu_design.tasks`, `src.smart_inventory.tasks`, `src.utils`
- Pusher integration for real-time progress updates
- Task time limit: 2 hours hard limit, 1h55m soft limit (Unix only)
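
The platform split above could be expressed roughly as follows. This is a sketch under assumptions, not the contents of `src/utils/celery_worker.py`: `celery_settings` is a hypothetical helper, and it treats both time limits as Unix only. The dictionary keys are standard Celery setting names.

```python
import platform

HARD_LIMIT_SECONDS = 2 * 60 * 60   # 2 hours
SOFT_LIMIT_SECONDS = 115 * 60      # 1h55m, leaves 5 minutes for cleanup


def celery_settings(system=None):
    """Build Celery settings for an OS name (defaults to the current platform)."""
    system = system or platform.system()
    if system == "Windows":
        # The solo pool runs tasks in the worker process itself, so nothing forks.
        return {"worker_pool": "solo"}
    return {
        "worker_pool": "prefork",
        "worker_concurrency": 4,
        # Assumption: time limits are applied only on Unix, as noted above.
        "task_time_limit": HARD_LIMIT_SECONDS,
        "task_soft_time_limit": SOFT_LIMIT_SECONDS,
    }
```

A Celery app could apply the result with `app.conf.update(celery_settings())`.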

**Unified Scheduler** ([src/scheduler/cron.py](src/scheduler/cron.py))
- Single BackgroundScheduler manages all cron jobs
- Smart Inventory: Stale task cleanup daily at 6:00 AM (nightly chain disabled)
- Marketing: Monthly performance (1st at 2:00 AM), Twitter metrics (1:00 AM), Facebook metrics (1:30 AM), post publishing (every 1 min)

**LLM Integration** ([src/core/](src/core/))
- Sentiment analysis uses batch processing for efficiency
- Atomic agents pattern for structured LLM outputs
- Instructor library for validated schema-based responses
- Topic extraction and emotion detection from reviews
- SQL chatbot agent for natural language inventory queries

### Smart Inventory Data Dependencies

Critical: These tables have hard dependencies. Never populate out of order:

```
Base Tables (no dependencies):
└── companies, products, categories, locations, vendors, sales_orders,
    sales_order_lines, sales_return_orders, sales_return_order_lines,
    inventory_movement, inventory_batch, reorder_policy

Aggregation Layer 1:
├── daily_sales (depends on: sales tables)
└── inventory_snapshot_daily (depends on: inventory_movement, itself)

Aggregation Layer 2:
├── service_level_daily (depends on: daily_sales)
├── slow_mover_snapshot (depends on: daily_sales, inventory_batch)
└── demand_forecast (depends on: daily_sales, trained model)

Final Layer:
└── inventory_planning_snapshot (depends on: inventory_snapshot_daily,
    daily_sales, demand_forecast, reorder_policy)
```
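
The tree above is a dependency graph, so a valid population order can be derived instead of memorized. The sketch below uses the standard library's `graphlib`. `DEPENDS_ON`, `population_order`, and `violations` are hypothetical names, `sales_tables` and `trained_model` are placeholder nodes for the grouped inputs, and the self-dependency of `inventory_snapshot_daily` is omitted because it would form a cycle.

```python
from graphlib import TopologicalSorter

# Each table maps to the inputs that must be populated before it.
DEPENDS_ON = {
    "daily_sales": {"sales_tables"},
    "inventory_snapshot_daily": {"inventory_movement"},
    "service_level_daily": {"daily_sales"},
    "slow_mover_snapshot": {"daily_sales", "inventory_batch"},
    "demand_forecast": {"daily_sales", "trained_model"},
    "inventory_planning_snapshot": {
        "inventory_snapshot_daily",
        "daily_sales",
        "demand_forecast",
        "reorder_policy",
    },
}


def population_order():
    """Return one order in which every table comes after its dependencies."""
    return list(TopologicalSorter(DEPENDS_ON).static_order())


def violations(order):
    """Return (table, missing_dependency) pairs for a proposed population order."""
    seen, problems = set(), []
    for table in order:
        for dep in sorted(DEPENDS_ON.get(table, ())):
            if dep not in seen:
                problems.append((table, dep))
        seen.add(table)
    return problems
```

`violations(population_order())` returns an empty list, while an order that places `service_level_daily` before `daily_sales` is reported.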

All aggregate tasks are idempotent: each one deletes the existing rows for the target date before recomputing them, so rerunning a task for the same date is safe.
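
The delete-then-recompute pattern can be illustrated as below. This sketch uses an in-memory SQLite database in place of the project's database, and the column names (`sales_date`, `order_date`, `product_id`, `quantity`) and both function names are assumptions made for the example, not the real schema or tasks.

```python
import sqlite3


def make_demo_db():
    """Create a throwaway database with two simplified, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sales_order_lines (order_date TEXT, product_id INTEGER, quantity INTEGER);
        CREATE TABLE daily_sales (sales_date TEXT, product_id INTEGER, quantity INTEGER);
        INSERT INTO sales_order_lines VALUES
            ('2024-01-01', 1, 2), ('2024-01-01', 1, 3), ('2024-01-01', 2, 1);
        """
    )
    return conn


def recompute_daily_sales(conn, sales_date):
    """Rebuild the daily_sales rows for one date inside a single transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute("DELETE FROM daily_sales WHERE sales_date = ?", (sales_date,))
        conn.execute(
            """
            INSERT INTO daily_sales (sales_date, product_id, quantity)
            SELECT order_date, product_id, SUM(quantity)
            FROM sales_order_lines
            WHERE order_date = ?
            GROUP BY order_date, product_id
            """,
            (sales_date,),
        )
```

Running `recompute_daily_sales` twice for the same date leaves the same rows as running it once, which is the property the aggregate tasks rely on.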

### Multi-Module Application Layout

```
src/
├── apps/                    # Sentiment Analysis module
│   ├── auth/                # JWT-based authentication
│   ├── feedback/            # Review ingestion and storage
│   ├── sentiment/           # Sentiment analysis endpoints
│   ├── recommendations/     # AI-generated recommendations
│   ├── chatbot/             # SQL agent chatbot
│   ├── stores/              # Store management
│   └── users/               # User management
├── smart_inventory/         # Inventory Management module
│   ├── apps/
│   │   ├── inventory/       # Core inventory logic (101KB controller)
│   │   ├── products/        # Product catalog
│   │   ├── data_import/     # CSV upload handling
│   │   └── chat_bot/        # Inventory chatbot
│   ├── core/                # ML models and forecasting
│   ├── tasks/               # Celery tasks for computations
│   └── dummy_data/          # Data generation scripts
├── marketing/               # Marketing Automation module
│   ├── apps/
│   │   ├── Account/         # Social media account linking
│   │   ├── post/            # Post creation and scheduling
│   │   ├── Calendar/        # Content calendar
│   │   ├── Analytics/       # Twitter/Facebook metrics + Celery tasks
│   │   ├── persona/         # Brand persona generation
│   │   └── hwGpt/           # Marketing chatbot with threads
│   └── vector_db_collection/  # Qdrant vector database
├── menu_design/             # Menu Design module (uses DuckDB)
│   ├── apps/
│   │   ├── AI_chat/         # Design suggestions
│   │   ├── editor/          # Menu editor API
│   │   └── projects/        # Project management
│   └── designer/static/     # Generated images and templates
├── core/                    # Shared sentiment/LLM logic
├── utils/                   # Shared utilities (db, celery, settings)
└── scheduler/               # Unified cron job management
```

## Important Patterns and Conventions

### Authentication
- JWT-based authentication in `src/apps/auth/`
- Middleware validates tokens on protected routes
- SSO login support with JWT generation (see `temporary/sso-login.md`)

### Async Task Processing
- Celery tasks for long-running operations (sentiment analysis, forecasting)
- Always check platform-specific behavior (Windows vs Linux worker pools)
- Pusher WebSocket integration for real-time progress updates to frontend

### Database Migrations
- Alembic migrations stored in `alembic/versions/` (ignored in git)
- When adding new models, import them in the appropriate `models.py` before running autogenerate
- Migration files reference `src.utils.db:Base.metadata`

### Sentiment Analysis Flow
1. Reviews ingested via CSV upload or webhook (Bright Data API)
2. `process_csv_review_data` Celery task batches reviews
3. `analyze_review_batch` calls LLM for sentiment + topics + emotions
4. Results stored in database with Pusher progress updates
5. `generate_recommendations_task` creates actionable insights
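
The batching in step 2 can be pictured with a small generator. This is an illustrative sketch only: `batch_reviews` is a hypothetical helper, the default batch size of 20 is an assumption, and the real `process_csv_review_data` task may split reviews differently.

```python
from itertools import islice


def batch_reviews(reviews, batch_size=20):
    """Yield successive lists of at most batch_size reviews."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(reviews)
    # islice pulls the next chunk; an empty chunk means the input is exhausted.
    while batch := list(islice(iterator, batch_size)):
        yield batch
```

Each yielded list would then be handed to one `analyze_review_batch` call.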

### Smart Inventory Forecasting
- Two ML models: Daily (XGBoost/LightGBM) and Monthly (Random Forest)
- Models trained per company using historical daily_sales data
- Hyperparameter tuning with Optuna (configurable trials)
- Forecasts stored in `demand_forecast` and `monthly_forecast` tables
- Planning algorithm uses forecasts + reorder policies to compute order quantities

### Marketing Analytics (Twitter + Facebook)

**API Version Management**
- Facebook Graph API version is set in 3 files: `src/marketing/core/Analytics/providers/facebook.py` (provider), `src/marketing/apps/Account/router.py` (account linking), `src/marketing/apps/post/utils.py` (publishing). Currently **v22.0**.
- `facebook_analytics.py` also has a token refresh URL whose version must match the provider version, so update it together with the 3 files above.
- Facebook expires API versions ~2 years after release; check [Meta's API changelog](https://developers.facebook.com/docs/graph-api/changelog) before updating.
- Twitter API v2 (`api.twitter.com/2`) is versionless, so no version management is needed.

**Facebook Metric Deprecations (permanent)**
- Per-post `impressions`, `reach`, and `clicks` were deprecated by Meta (Nov 2025). These fields always return 0.
- Valid page-level metrics: `page_views_total`, `page_post_engagements`
- Valid post-level metrics: `post_reactions_by_type_total`, `likes.summary(true)`, `comments.summary(true)`, `shares`
- **Engagement rate calculation**: Facebook uses `engagement / followers` instead of `engagement / impressions` (since impressions are deprecated). Twitter still uses `engagement / impressions`.
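
The per-platform denominators can be sketched as follows. `engagement_rate` is a hypothetical helper, and returning a fraction (rather than a percentage) and 0.0 for an empty denominator are assumptions of this example.

```python
def engagement_rate(platform, engagement, followers=0, impressions=0):
    """Return engagement divided by the platform-appropriate denominator."""
    platform = platform.lower()
    if platform == "facebook":
        denominator = followers      # impressions are deprecated on Facebook
    elif platform == "twitter":
        denominator = impressions
    else:
        raise ValueError(f"unsupported platform: {platform}")
    # Avoid division by zero for new accounts or posts without data.
    return engagement / denominator if denominator else 0.0
```

For example, a Facebook page with 1000 followers and 50 engagements gets `50 / 1000`, regardless of any impressions value passed in.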

**Platform-Specific Response Shapes**
- `GET /analytics/accounts/{id}/analytics` routes to different functions based on platform:
  - Twitter → `get_account_specific_analytics()` returns `top_tweets`, `post_retweets`, impressions data
  - Facebook → `get_facebook_account_analytics()` returns `top_posts`, `post_shares`, `followers_count` (no impressions/reach per post)
- `POST /analytics/dashboard-analytics` returns generic field names (`top_posts`, `post_shares`) for both platforms combined
- `POST /analytics/accounts/metrics-report` returns `top_posts` with platform-aware permalinks
- See `temporary/facebook_analytics_api_response_changes.md` for full response shape documentation and migration guide

**Cumulative Metrics Pattern**
- `PostMetricsDaily` stores **cumulative** values (total likes/comments/shares as of that date)
- All reporting functions use **baseline subtraction**: `period_value = end_cumulative - baseline_cumulative`
- Baseline = latest metric on or before `start_date - 1 day`
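
A minimal sketch of baseline subtraction over cumulative snapshots follows. `period_value` here is a hypothetical function over a plain `{date: cumulative_value}` mapping rather than `PostMetricsDaily` rows. It assumes a missing baseline counts as 0 and that the end value is the latest snapshot on or before `end_date`.

```python
from datetime import date, timedelta


def period_value(cumulative_by_date, start_date, end_date):
    """Return the metric gained within [start_date, end_date]."""

    def latest_on_or_before(day):
        candidates = [d for d in cumulative_by_date if d <= day]
        # Assumption: no snapshot yet means nothing had accumulated.
        return cumulative_by_date[max(candidates)] if candidates else 0

    baseline = latest_on_or_before(start_date - timedelta(days=1))
    end_cumulative = latest_on_or_before(end_date)
    return end_cumulative - baseline


likes = {date(2024, 1, 1): 10, date(2024, 1, 5): 25, date(2024, 1, 10): 40}
print(period_value(likes, date(2024, 1, 6), date(2024, 1, 10)))  # 40 - 25 = 15
```

Summing the stored values directly would double count, because each row already includes everything accumulated before it.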

**Error Classification**
- Facebook wraps ALL API errors as `"type": "OAuthException"`, including non-auth errors like error code 100 (invalid parameter). The code uses `is_metric_error` guards to prevent false token refreshes.
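
The guard can be pictured as a check on the error code rather than the error type. This is a sketch, not the project's implementation: the shape of the real `is_metric_error` guard is not shown here, `should_refresh_token` is a hypothetical name, and refreshing only on code 190 (the Graph API code for an invalid or expired access token) is an assumption of this example.

```python
METRIC_ERROR_CODE = 100  # invalid parameter, e.g. a deprecated metric name
TOKEN_ERROR_CODE = 190   # invalid or expired access token


def is_metric_error(payload):
    """True when a Graph API error body reports an invalid-parameter error."""
    return payload.get("error", {}).get("code") == METRIC_ERROR_CODE


def should_refresh_token(payload):
    """Decide on a token refresh from the error code, never from the type."""
    if is_metric_error(payload):
        return False  # type says OAuthException, but the token is fine
    return payload.get("error", {}).get("code") == TOKEN_ERROR_CODE
```

Both error bodies in this scenario carry `"type": "OAuthException"`, so only the `code` field distinguishes a bad metric name from a real auth failure.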

**Thread Safety**
- Background threads (in `initialize_*_metrics`) must use `get_db_session()`, never the caller's session. The session is not thread-safe.
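
The rule can be sketched as below. `FakeSession` stands in for a real database session so the example runs without a database, and this `get_db_session()` is only an assumed shape (a context manager yielding a fresh session), not the implementation in `src/utils/db.py`. `initialize_metrics_in_background` is a hypothetical name.

```python
import threading
from contextlib import contextmanager


class FakeSession:
    """Stand-in for a database session; tracks only whether it was closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


@contextmanager
def get_db_session():
    """Assumed provider shape: one fresh session per call, always closed."""
    session = FakeSession()
    try:
        yield session
    finally:
        session.close()


def initialize_metrics_in_background(account_id, results):
    """Start a worker that opens its own session instead of borrowing one."""

    def worker():
        with get_db_session() as session:
            # Real code would query and write metrics with this session.
            results[account_id] = session

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The caller passes identifiers such as `account_id` into the thread, never its own session or ORM objects bound to it.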

### Marketing Post Publishing (Instant + Scheduled)

**Post Creation Flow**
- Posts are created via `create_calendar_post_type()` in `src/marketing/apps/post/service.py`
- Images are stored in `post_images` table with `post_type_id` foreign key
- Instant posts (`is_instant_post=True`) are published immediately via `post_to_twitter()` or `post_to_facebook()`
- Scheduled posts are published by cron job (`publish_scheduled_posts()`) running every 1 minute

**Facebook Image Posting**
- `post_to_facebook()` in `src/marketing/apps/post/utils.py` explicitly queries `PostImage` table
- Images are uploaded to Facebook as **unpublished photos** first via `_collect_fb_media_ids()`
- Images are then attached to the post using `attached_media[{idx}]` parameter
- **Cannot combine `attached_media` with `link` parameter** in the same post
- See `temporary/facebook_image_posting_fix.md` for detailed flow documentation
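
The parameter layout for a multi-image post can be sketched as follows. `build_feed_params` is a hypothetical helper, not the code in `post_to_facebook()`. It assumes the media IDs were already returned by the unpublished photo uploads and uses the Graph API's `media_fbid` key for each attachment.

```python
import json


def build_feed_params(message, media_ids, link=None):
    """Build form parameters for a page feed post with optional photos."""
    if media_ids and link:
        raise ValueError("attached_media cannot be combined with link")
    params = {"message": message}
    if link:
        params["link"] = link
    for idx, media_id in enumerate(media_ids):
        # One JSON object per previously uploaded, unpublished photo.
        params[f"attached_media[{idx}]"] = json.dumps({"media_fbid": media_id})
    return params
```

Raising early on the `attached_media` plus `link` combination surfaces the restriction before any request is sent.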

**Twitter vs Facebook Posting Differences**
- Twitter: Uses Twitter API v2 `/tweets` endpoint, requires bearer token refresh on 401/403
- Facebook: Uses Graph API v22.0 `/{page_id}/feed` endpoint, supports multiple images via `attached_media`
- Both platforms support token refresh on authentication failure

### Environment Variables
- `.env` file in project root (not in git)
- Key variables: DATABASE_URL, REDIS_HOST, REDIS_PORT, OpenAI API keys, Pusher credentials, FB_APP_ID, FB_APP_SECRET
- Settings loaded via `src/utils/settings.py` (Pydantic settings)

## Critical Gotchas

1. **Never run Smart Inventory aggregate tasks out of order** - they have hard dependencies (see Smart Inventory Data Flow and Smart Inventory Data Dependencies above)
2. **Database connection pool differs by OS** - Linux uses NullPool to avoid fork issues, Windows uses default
3. **Celery worker pool differs by OS** - Windows must use solo pool, Linux can use prefork with concurrency
4. **Alembic migrations directory is gitignored** - migrations won't be in version control
5. **DuckDB file for menu design** (`menu_state.duckdb`) is gitignored
6. **Smart Inventory nightly chain is disabled** - must trigger tasks manually or via API
7. **LLM rate limits** - batch processing includes dynamic delays based on comment length
8. **Pusher channels** - use consistent channel naming for frontend WebSocket connections
9. **Facebook Graph API versions expire** - Meta retires API versions ~2 years after release. Currently using v22.0 across all files. If Facebook API calls start returning errors, check version expiration first.
10. **Facebook per-post impressions/reach/clicks are permanently deprecated** - These always return 0. Do not attempt to re-add deprecated metric names (`post_impressions`, `post_impressions_unique`, `post_engaged_users`, `post_clicks`). Use follower-based engagement rates instead.
11. **Facebook OAuthException is misleading** - Facebook wraps ALL errors (including invalid parameter errors) as `"type": "OAuthException"`. Always check for error code 100 (invalid parameter) first, and skip the token refresh when it is present.
12. **`update_twitter_metrics_daily` uses hardcoded `MasterAccount.id == 3`** - This is dev-environment specific for filtering Twitter accounts. Update for production or multi-tenant setups.
13. **Facebook image posting requires explicit PostImage query** - Do not rely on the lazy-loaded `post_type.images` relationship. Always query the `PostImage` table explicitly with a `post_type_id` filter to ensure images are collected (see `post_to_facebook()` in `src/marketing/apps/post/utils.py`).
14. **Scheduled vs instant posts use different DB sessions** - Instant posts use the request's DB session, scheduled posts (cron) use `get_db_session()`. Be aware of session lifecycle differences.
