Download data

Reusable dataset outputs for citation, audit, and downstream digital humanities work.

Scholars
270
Presentation records
1388
Events
40
Dataset DOI
Cite as: Gasūns, M. (2026). IndologyScholars: Archive of Talks in Russian Indology [Data set]. Zenodo. https://doi.org/10.5281/zenodo.XXXXXXX
Replace XXXXXXX with the actual Zenodo ID once article/snapshots/2026-06-03/ has been uploaded.

Files

SQLite database
Normalized relational source of events, sessions, presentations, people, venues, and affiliation strings.
1.4 MB
Dashboard payload
Generated browser data used by the interactive dashboard and static scholar pages.
4.4 MB
RDF/Turtle knowledge graph
Linked Open Data graph (schema.org + FOAF + OWL) with scholar profiles, presentations, affiliations, and Wikidata/ORCID/VIAF sameAs links.
606.5 KB
Data dictionary
Human-readable field guide for reusable CSV, JSON, SQLite, and generated publication outputs.
26.7 KB
Example analysis notebook
Python script demonstrating data loading, descriptive statistics, gender trends, theme distribution, and a null-model overlap test.
6.8 KB
Authority provenance
Field-level provenance for external identifiers and organization authority records.
11.3 KB
Theme provenance
Field-level provenance for generated presentation theme labels.
535.1 KB
Argument scale
Argument-scale level assigned to each presentation (canonical column argument_level; gumilyov_level is kept as a legacy alias).
415.1 KB
Human review index
Unified curator-facing inbox for open manual review items across authority IDs, identity, classification, spacetime, affiliation, lineage, and data-quality queues.
1.1 MB
Coverage bias audit
Per-source authority/index coverage audit for responsible use of ORCID, Wikidata, VIAF, OpenAlex, Wikipedia, RINC, and related signals.
2.0 KB
FAIR reuse audit
Findability, accessibility, interoperability, and reusability checklist for release review.
1.4 KB
Coauthorship review
Source-backed review queue for multi-person presentation lines before treating them as coauthorship.
18.4 KB
Senior absence audit
Review queue for frequent senior-generation participants absent after 2022 or from the 2026 programme.
5.2 KB
Known relationships
Curated review table for relationships not always visible from conference-network sources.
9.9 KB
Verified affiliation spans
Dated, source-backed institutional trajectories; tentative open-ended continuations into later gaps are marked with (?).
736 B
YouTube video list
Source inventory of collected recordings; public discovery is attached to presentation records.
52.0 KB
YouTube mapping
Video-to-presentation matching status used to display recording availability on presentations.
64.2 KB
Network nodes
Typed person, event, organization, and theme nodes for downstream network analysis.
27.5 KB
Network edges
Weighted edges with explicit relation types for participation, affiliation, theme, and co-presence analysis.
683.9 KB
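
Before relying on the data dictionary, it can help to check what a downloaded copy of the SQLite database actually contains. The following is a minimal sketch that assumes nothing about the schema: it lists every user table with its row count. The function name list_tables and the file name in the usage note are hypothetical, not part of the published dataset.

```python
import sqlite3


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in the database."""
    con = sqlite3.connect(db_path)
    try:
        # sqlite_master is SQLite's built-in catalogue of schema objects.
        names = [
            row[0]
            for row in con.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        # Table names come from the catalogue itself, so quoting them is enough.
        return {
            name: con.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in names
        }
    finally:
        con.close()
```

For example, list_tables("indology_scholars.sqlite") (substitute the real file name of the download) returns a dictionary that can be compared against the counts shown at the top of this page.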

Reproducibility

The entire dataset is generated deterministically from primary source HTML caches. To reproduce the current build locally:

  1. python build_and_populate_db.py (Creates the SQLite database and generates deterministic presentation_id hashes based on event, year, title, speaker, and session order)
  2. python generate_analytics.py (Generates all CSV exports and networks)
  3. python generate_site_data.py (Compiles the browser payload)
  4. python generate_scholars_pages.py (Builds individual static profiles)
  5. python generate_publication_pages.py (Builds all other pages and the search index, and updates index.html)
  6. python validate_publication.py (Runs integrity checks)
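
The six steps above can also be run in sequence from one small driver. This is a sketch, assuming the scripts sit in the current working directory and need no command-line arguments; the helper run_pipeline is hypothetical and is not shipped with the repository.

```python
import subprocess
import sys

# Script names and order are taken from the numbered steps above.
PIPELINE = [
    "build_and_populate_db.py",
    "generate_analytics.py",
    "generate_site_data.py",
    "generate_scholars_pages.py",
    "generate_publication_pages.py",
    "validate_publication.py",
]


def run_pipeline(scripts=PIPELINE, runner=subprocess.run):
    """Run each script with the current interpreter, stopping at the first failure."""
    for script in scripts:
        print(f"Running {script} ...")
        # check=True raises CalledProcessError on a non-zero exit code,
        # so later steps never run against a half-built database.
        runner([sys.executable, script], check=True)
```

Calling run_pipeline() from the repository root executes the full build. The runner parameter exists only so the order can be checked without actually launching the scripts.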

Pipeline Version: 2026-05-25

Build Date: 2026-06-14
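
Step 1 of the build derives each presentation_id deterministically from event, year, title, speaker, and session order. The sketch below shows one way such an identifier could be computed. The normalization, the separator, the SHA-256 hash, and the 16-character truncation are all assumptions made for illustration; the actual scheme in build_and_populate_db.py may differ, so these values should not be expected to match the published IDs.

```python
import hashlib
import unicodedata


def presentation_id(event, year, title, speaker, session_order, length=16):
    """Illustrative stable ID built from the five fields named in step 1."""
    parts = [event, str(year), title, speaker, str(session_order)]
    # Assumption: normalize Unicode form, whitespace, and case so that
    # cosmetic differences in the source HTML do not change the ID.
    normalized = [
        unicodedata.normalize("NFC", part).strip().casefold() for part in parts
    ]
    # Assumption: join with the ASCII unit separator so that field
    # boundaries cannot be confused with characters inside a field.
    payload = "\x1f".join(normalized).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:length]
```

Because the function depends only on its inputs, rebuilding from the same HTML caches yields the same identifiers, which is the property that keeps citations and cross-file joins stable between builds.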