SQLite Database Reference
The project stores its operational state in a SQLite database that lives under sqlite_db/. All schemas are created and maintained by SQLiteConnector; no manual migrations are required. The connector will automatically create the database directory, initialize the tables, and keep library metadata in sync as Emby changes over time.
Library metadata tables
library_items
Central catalogue of every Movie and Episode imported from Emby. Each row contains:
- Identity:
item_id(primary key),item_type,item_name. - Series context:
series_id,series_name,season_number,episode_numberfor episodic content. - Runtime and chronology:
runtime_ticks(with generated seconds/minutes columns),premiere_date, anddate_created. - Descriptive metadata:
overview,community_rating,production_year. - File attributes:
file_path,container,video_codec,resolution_width,resolution_height. - Timestamps:
last_updatedauto-populates whenever the row is refreshed.
Indexes (idx_library_series, idx_library_type, idx_series_season_ep, idx_library_name, idx_library_year) keep lookups responsive for common filtering patterns.
item_genres
A many-to-many bridge that stores the genre tags attached to each library item. Genres sourced directly from Emby retain their numeric IDs; otherwise a deterministic CRC32-derived ID is generated so joins remain stable across runs.
item_tags
Stores free-form tag labels (e.g., “Kids”, “4K”) associated with a library item. Tags can originate from Emby’s TagItems collection or the fallback Tags array. A composite primary key (item_id, tag_id) prevents duplicates.
item_provider_ids
Normalises the ProviderIds payload emitted by Emby. Each row captures a provider namespace (TMDB, IMDB, TVDB, etc.) alongside the provider-specific identifier so you can map items back to external datasets. Records are refreshed on every ingest, and rows belonging to media removed from Emby are pruned automatically.
TMDB reference tables
SQLiteConnector._INIT_create_tmdb_schemas() provisions two lookup tables:
tmdb_movie_genrestmdb_tv_genres
Populate them with SQLiteConnector.ingest_tmdb_movie_tv_genres(...) so item_genres can map Emby names to canonical TMDB IDs. When a genre label is missing from the TMDB response, the connector falls back to the deterministic hash strategy described above.
Watch history tables
The watch-history pipeline converts raw Emby playback events into progressively richer aggregates.
watch_hist_raw_events
Snapshot of Emby’s playback reporting API. Each row stores the recorded date and time (normalised to Australia/Melbourne time for pre-August 2025 entries), the user_id, user_name, item_id, item_name, item_type, playback duration, and optional remote_address metadata.
watch_hist_agg_sessions
Aggregates raw events into contiguous sessions. Key fields include:
session_start_timestamp/session_end_timestampsession_span_minutesandtotal_seconds_watchedsession_count(number of underlying raw events)completion_ratio(capped at 1.0 once runtime metadata is available)outcomeclassification (sampled,abandoned,partial,completed)
Uniqueness is enforced per (user_id, item_id, session_start_timestamp) and each row links back to library_items via item_id.
watch_hist_user_item_stats
Derived statistics summarising a user’s relationship with a specific item. Generated columns (e.g., total_minutes_watched, days_between_first_last) and counters (completed_sessions, rewatch_count, adherence_score) are recalculated every time _INIT_POPULATE_watch_hist_user_item_stats() runs. The table is keyed by user_id + item_id and also links to library_items for runtime-aware metrics.
Watch history processing pipeline
Run the following steps—typically in this order—to refresh watch history:
- Library ingest (
ingest_all_library_items): Populatelibrary_items,item_genres,item_tags, anditem_provider_ids. ProvideEmbyConnector.iter_all_items()and optionallyEmbyConnector.get_item_metadata()so the connector can fetch series-level genres for episodes. - Raw event sync (
_INIT_POPULATE_watch_hist_raw_events): Downloads the full playback history viaEmbyConnector.get_all_watch_hist(), normalises timestamps, and bulk-loads thewatch_hist_raw_eventstable. - Session aggregation (
_INIT_POPULATE_watch_hist_agg_sessions): Groups raw events into sessions. Tunable parameters includesession_segment_minutes, completion thresholds, and minimum seconds for thesampledoutcome. - User-item statistics (
_INIT_POPULATE_watch_hist_user_item_stats): Summarises aggregate engagement for each(user, item)pair, calculating adherence scores and outcome counts. - Runtime-aware completion update (
update_completion_ratios): Rewritescompletion_ratiofor sessions when actual runtime data is available inlibrary_items.
Running the steps nightly (or after large library changes) keeps the downstream analytics model aligned with the latest viewing behaviour.
Index summary
Besides the library indexes mentioned earlier, the connector maintains:
idx_watch_hist_raw_user_timeonwatch_hist_raw_events(user_id, date, time DESC)idx_watch_hist_agg_sessionsonwatch_hist_agg_sessions(user_id, session_end_timestamp DESC)idx_watch_hist_agg_itemonwatch_hist_agg_sessions(item_id)idx_watch_hist_user_item_statsonwatch_hist_user_item_stats(user_id, adherence_score DESC)idx_genres_itemonitem_genres(item_id)idx_tags_itemonitem_tags(item_id)idx_provider_provideronitem_provider_ids(provider)
These indexes are recreated automatically with CREATE INDEX IF NOT EXISTS, so repeated pipeline runs are safe.