Illinois Data Bank

Data for "Modeling citations and cartels"

This dataset contains a sample of the data generated for the manuscript "Modeling citations and cartels" by Park et al. (2026), which describes the use of the SASCA-ReSA agent-based model to simulate the growth of citation networks and to mimic citation cartels. The manuscript is currently under review. SASCA-ReSA is the latest in a series of progressively more complex models of citation dynamics (Chacko et al. 2026, Applied Network Science; Park et al. 2025, Proceedings of the XIV International Conference on Complex Networks and their Applications; Park et al. 2025, MetaRoR).

The model is implemented for high-performance computing environments, and all results were generated on the Illinois Campus Cluster. The standard simulation reported in the manuscript produces roughly 1.2M nodes. The input to a simulation consists of a seed network, a configuration file, and real-world distributions of the number of references per article and the number of authors per article. The output of a simulation is a larger citation network that includes the input network. Details of the model are described in the manuscript, and instructions for using the software are available on the SASCA-ReSA GitHub site. We have included annotated nodelists from three different simulations.

a) bsl1 (bsl1.csv.tar.xz): 1,193,102 rows; the output of a standard simulation.

b) p5_1 (p5_1.csv.tar.xz): 1,193,102 rows; the output of a standard simulation with five agents "planted" in year 1 of the simulation.

c) ps5_1 (ps5_1.csv.tar.xz): 1,193,102 rows; the output of a standard simulation with one agent planted in each of the first five years of the simulation.

d) sample_config.ini: a sample file containing the configuration parameters for a simulation.

e) louvain.parquet.gz: 160,714,032 rows in two named columns, node_id and cluster_id, representing a Louvain clustering of the ABM161 network (https://doi.org/10.13012/B2IDB-9265079_V1). The clustering was computed with the Louvain module of Kuzu, and the result was written with the pandas to_parquet function using gzip compression internal to the Parquet file. The largest cluster (cluster_id 5) has 81,675,241 nodes. The ABM161 network was generated under the SASCA-ReS model.
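A minimal Python sketch for loading the files listed above follows. It assumes Python 3 and, for the Louvain file only, pandas with a Parquet engine such as pyarrow. The function names are hypothetical helpers, not part of SASCA-ReSA. Two further points are assumptions because this page does not document them: that each .csv.tar.xz archive holds a single CSV nodelist with a header row, and what the nodelist columns are named.

```python
import configparser
import csv
import io
import tarfile


def iter_nodelist_rows(archive_path):
    """Yield each row of the first CSV member of a .csv.tar.xz archive as a dict.

    Assumption (hypothetical helper): the archive holds one CSV nodelist with a
    header row; member and column names are not documented on the landing page.
    """
    with tarfile.open(archive_path, mode="r:xz") as tar:
        for member in tar:
            if member.isfile() and member.name.endswith(".csv"):
                # Stream the member as text without extracting it to disk.
                raw = tar.extractfile(member)
                with io.TextIOWrapper(raw, encoding="utf-8", newline="") as text:
                    yield from csv.DictReader(text)
                return
    raise ValueError(f"no .csv member found in {archive_path}")


def count_rows(archive_path):
    """Count data rows (header line excluded) in a nodelist archive."""
    return sum(1 for _ in iter_nodelist_rows(archive_path))


def read_config(path):
    """Return {section: {key: value}} from an INI file such as sample_config.ini.

    Interpolation is disabled so literal '%' characters survive; configparser
    lowercases keys by default.
    """
    parser = configparser.ConfigParser(interpolation=None)
    with open(path, encoding="utf-8") as fh:
        parser.read_file(fh)
    return {section: dict(parser[section]) for section in parser.sections()}


def largest_louvain_cluster(path="louvain.parquet.gz"):
    """Return (cluster_id, size) of the largest cluster in the Louvain file.

    Requires pandas plus a Parquet engine (e.g. pyarrow). The gzip codec is
    internal to the Parquet file, so no outer decompression step is applied.
    Only the cluster_id column is loaded to limit memory on ~160M rows.
    """
    import pandas as pd  # imported lazily so the helpers above need only the stdlib

    sizes = pd.read_parquet(path, columns=["cluster_id"])["cluster_id"].value_counts()
    # value_counts sorts by count in descending order, so the first entry is largest.
    return sizes.index[0], int(sizes.iloc[0])
```

For example, count_rows("bsl1.csv.tar.xz") can be checked against the 1,193,102 rows stated above (the page does not say whether that figure includes a header line), and largest_louvain_cluster() should report cluster 5 with 81,675,241 nodes if the file matches its description. The nodelist reader streams rows instead of loading whole files, which keeps memory flat for the roughly 1.2M-row outputs.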

Subject: Technology and Engineering
Keywords: citation dynamics; agent-based models
License: CC BY
Funding: U.S. National Science Foundation (NSF) Grant OAC-2402559; Illinois:Insper Partnership
George Chacko
Version 1: DOI 10.13012/B2IDB-9350811_V1, published 2026-05-06

Files: 25.2 MB, 295 MB, 25.1 MB, 25.1 MB, 2 KB

Change log: 2026-05-06T16:00:10Z, all_medusa set to true; 2026-05-07T12:03:35Z, all_globus set to true.