Illinois Data Bank Dataset Search Results
Results
published:
2020-03-14
Rhoads, Bruce; Lindroth, Evan
(2020)
Data on bank elevations determined from lidar data for the Upper Sangamon River, Illinois, the Mission River, Texas, and the White River in Indiana
keywords:
bank elevations, rivers, meandering, lowland
published:
2020-09-25
This repository contains the datasets and corresponding results for the paper "MAGUS: Multiple Sequence Alignment using Graph Clustering".
The Datasets.zip archive contains the ROSE, BAliBASE, Gutell, and RNASim datasets used in our experiments.
The Results.zip archive contains the outputs of running our methods against these datasets.
Datasets used:
ROSE: 10 simulated nucleotide model conditions from the SATe paper, each with 20 replicates, and with 1000 sequences per replicate.
The ROSE datasets were originally taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/sate-i
RNASim: This is a collection of simulated nucleotide datasets that were generated under a model of evolution that reflects selection due to RNA structural constraints. We sampled 20 subsets of 1000 sequences each, as well as 10 subsets of 10000 each, by randomly sampling from the original million-sequence RNASim dataset.
Gutell: 16S.M, 16S.3, 16S.T, 16S.B.ALL: Four biological nucleotide datasets from the Comparative Ribosomal Website (CRW) with cleaned reference alignments from SATe. Since PASTA is restricted to datasets without sequence length heterogeneity, these were modified to remove sequences that deviate by more than 20% from the median length. The scrubbed datasets range from 740 to 24,246 sequences. The pre-screened 16S datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/16s23s
BAliBASE: We use eight BAliBASE amino acid datasets used in the PASTA paper. As above, we remove outlier sequences, which leaves us with sizes ranging from 195 to 732 sequences. The pre-screened BAliBASE datasets were taken from https://sites.google.com/eng.ucsd.edu/datasets/alignment/pastaupp
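The length screening described for the Gutell and BAliBASE datasets can be illustrated with a minimal Python sketch. It assumes sequences are held in a dictionary of name to unaligned sequence string; the function name `filter_by_median_length` and the example data are hypothetical and are not taken from the MAGUS or PASTA code.

```python
from statistics import median


def filter_by_median_length(seqs, tolerance=0.20):
    """Keep sequences whose length is within `tolerance` of the median length.

    `seqs` maps a sequence name to its unaligned sequence string.
    The 20% default mirrors the threshold stated in the description;
    everything else here is an illustrative assumption.
    """
    med = median(len(s) for s in seqs.values())
    return {
        name: s
        for name, s in seqs.items()
        if abs(len(s) - med) <= tolerance * med
    }


# Hypothetical example: three sequences near 100 nt and one short outlier.
example = {
    "a": "ACGT" * 25,  # 100 nt
    "b": "ACGT" * 26,  # 104 nt
    "c": "ACGT" * 24,  # 96 nt
    "d": "ACGT" * 10,  # 40 nt, more than 20% below the median of 98
}
kept = filter_by_median_length(example)
print(sorted(kept))
```

In this example the median length is 98, so only the 40-nucleotide sequence falls outside the 20% window and is dropped.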
published:
2024-04-05
Sinaiko, Guy; Cao, Yanghui; Dietrich, Christopher H.
(2024)
The following files include specimen information, DNA sequence data, and additional information on the analyses used to reconstruct the phylogeny of the leafhopper genus Neoaliturus as described in the Methods section of the original paper:
1. Taxon_sampling.csv: contains data on the individual specimens from which DNA was extracted, including sample code, taxon name, collection data (locality, date and name of collector) and museum unique identifier.
2. Alignments.zip: a ZIP archive containing 432 separate FASTA files representing the aligned nucleotide sequences of individual gene loci used in the analysis.
3. Concatenated_Matrix.fa: a FASTA file containing the concatenated individual gene alignments used for the maximum likelihood analysis in IQ-TREE.
4. Genes_and_Loci.rtf: identifies the individual genes and loci used in the analysis. The partition name is the same as the name of the individual alignment file in the zipped Alignments folder.
5. Partitions_best_scheme.nex: a text file in the standard NEXUS format that indicates the names of the individual data partitions and their locations in the concatenated matrix, and also indicates the substitution model for each partition.
6. (New in version 2) Scripts & Description.zip: includes 8 custom shell or Perl scripts used to assemble the DNA sequence data by performing reciprocal BLAST searches between the reference sequences and assemblies for each sample, extracting the best sequences based on the BLAST searches, screening the hits for each locus and keeping only the best result, and generating the nucleotide sequence dataset for the predicted orthologues (see the file description.txt for details).
7. (New in version 2) Full_genetic_distances_matrix.csv: shows the genetic distances between pairs of samples in the dataset (proportion of nucleotides that differ between samples).
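The distance measure named in item 7 (the proportion of nucleotides that differ between two samples) can be sketched in Python as follows. This is an illustration only: the function name `p_distance` is hypothetical, and skipping sites with gaps or ambiguous characters is an assumption, since the description does not say how such sites were handled.

```python
def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Proportion of compared sites at which two aligned sequences differ.

    Sites where either sequence has a character in `skip_chars` are ignored
    (an assumption made for this sketch, not a documented detail).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        return float("nan")  # no comparable sites
    return differing / compared


# Two of the eight aligned sites differ.
print(p_distance("ACGTACGT", "ACGTTCGA"))  # 0.25
```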
keywords:
leafhopper; phylogeny; anchored-hybrid-enrichment; DNA sequence; insect
published:
2025-03-14
Mishra, Apratim; Diesner, Jana; Torvik, Vetle I.
(2025)
Hype - PubMed dataset
Prepared by Apratim Mishra
This dataset captures ‘Hype’ within biomedical abstracts sourced from PubMed. The selection chosen is ‘journal articles’ written in English, published between 1975 and 2019, totaling ~5.2 million. The classification relies on the presence of specific candidate ‘hype words’ and their abstract location. Therefore, each article (PMID) might have multiple instances in the dataset due to the presence of multiple hype words in different abstract sentences.
There are 35 candidate hype words: 'major', 'novel', 'central', 'critical', 'essential', 'strongly', 'unique', 'promising', 'markedly', 'excellent', 'crucial', 'robust', 'importantly', 'prominent', 'dramatically', 'favorable', 'vital', 'surprisingly', 'remarkably', 'remarkable', 'definitive', 'pivotal', 'innovative', 'supportive', 'encouraging', 'unprecedented', 'enormous', 'exceptional', 'outstanding', 'noteworthy', 'creative', 'assuring', 'reassuring', 'spectacular', and 'hopeful'.
This is version 3 of the dataset, which adds a new file: WSD_hype.tsv
File 1: hype_dataset_final.tsv
Primary dataset. It has the following columns:
1. PMID: represents unique article ID in PubMed
2. Year: Year of publication
3. Hype_word: Candidate hype word, such as ‘novel.’
4. Sentence: Sentence in abstract containing the hype word.
5. Hype_percentile: Relative position of the hype word within the abstract.
6. Hype_value: Propensity of hype based on the hype word, the sentence, and the abstract location.
7. Introduction: The ‘I’ component of the hype word based on IMRaD
8. Methods: The ‘M’ component of the hype word based on IMRaD
9. Results: The ‘R’ component of the hype word based on IMRaD
10. Discussion: The ‘D’ component of the hype word based on IMRaD
File 2: hype_removed_phrases_final.tsv
Secondary dataset with same columns as File 1.
Hype in the primary dataset is based on excluding certain phrases that are rarely hype. The phrases that were removed are included in File 2 and modeled separately. Removed phrases:
1. Major: histocompatibility, component, protein, metabolite, complex, surgery
2. Novel: assay, mutation, antagonist, inhibitor, algorithm, technique, series, method, hybrid
3. Central: catheters, system, design, composite, catheter, pressure, thickness, compartment
4. Critical: compartment, micelle, temperature, incident, solution, ischemia, concentration, thinking, nurses, skills, analysis, review, appraisal, evaluation, values
5. Essential: medium, features, properties, opportunities, oil
6. Unique: model, amino
7. Robust: regression
8. Vital: capacity, signs, organs, status, structures, staining, rates, cells, information
9. Outstanding: questions, issues, question, challenge, problems, problem, remains
10. Remarkable: properties
11. Definitive: radiotherapy, surgery
File 3: WSD_hype.tsv
Includes hype-based disambiguation for candidate words targeted for WSD (Word sense disambiguation)
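The split between File 1 and File 2 can be pictured with a short Python sketch. It assumes that a "removed phrase" means a candidate hype word immediately followed by one of the listed terms (for example "major histocompatibility"); the description does not state the exact matching rule, so this rule, the function name `flag_removed_phrases`, and the tokenization are all assumptions. Only a subset of the removed-phrase lists above is included.

```python
import re

# Subset of the removed-phrase lists given above (hype word -> following terms).
REMOVED_PHRASES = {
    "major": {"histocompatibility", "component", "protein",
              "metabolite", "complex", "surgery"},
    "unique": {"model", "amino"},
    "robust": {"regression"},
    "remarkable": {"properties"},
}


def flag_removed_phrases(sentence):
    """Return (hype_word, is_removed_phrase) for each candidate word found.

    A word is flagged True when the next token is one of its listed terms,
    which is an assumed matching rule used only for illustration.
    """
    tokens = re.findall(r"[a-z]+", sentence.lower())
    flags = []
    for i, token in enumerate(tokens):
        if token in REMOVED_PHRASES:
            next_token = tokens[i + 1] if i + 1 < len(tokens) else ""
            flags.append((token, next_token in REMOVED_PHRASES[token]))
    return flags


result = flag_removed_phrases(
    "The major histocompatibility complex is a major finding."
)
print(result)  # [('major', True), ('major', False)]
```

Under this assumed rule, the first "major" would belong with the removed phrases in File 2, while the second would stay in the primary dataset.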
keywords:
Hype; PubMed; Abstracts; Biomedicine
published:
2025-12-14
Fraterrigo, Jennifer; Chen, Weile
(2025)
This dataset contains information about absorptive roots from 170 plots along a latitudinal and temperature gradient in northern Alaska, including tussock sedges and deciduous alder, birch, and willow shrubs. This dataset accompanies the paper "Impacts of Arctic Shrubs on Root Traits and Belowground Nutrient Cycles Across a Northern Alaskan Climate Gradient," which was published in Frontiers in Plant Science.
Note: in the "patch coordinates" tab, the same coordinates and elevation ("Long", "Lat", and "Elev (m)") apply to all patches that share a number. For example, "Patch" W1, B1, and G1 share the same "Long", "Lat", and "Elev (m)" values as "Patch" A1.
keywords:
absorptive root traits; shrub expansion; Arctic; Alaskan tundra
published:
2020-04-20
Supplemental data sets for the manuscript entitled "Contribution of fungal and invertebrate communities to mass loss and wood depolymerization in tropical terrestrial and aquatic habitats".
keywords:
Coiba Island; wood decomposition; cellulose; hemicellulose; lignin breakdown; aquatic fungi
published:
2020-01-31
Bradshaw, Therin M.; Blake-Bradshaw, Abigail G.; Fournier, Auriel M.V.; Lancaster, Joseph D.; O'Connell, John; Jacques, Christopher N.; Eicholtz, Michael W.; Hagy, Heath M.
(2020)
Data inputs and scripts for the analysis detailed in Bradshaw et al., published in PLOS ONE in 2020.
keywords:
Marsh birds; wetlands
published:
2020-06-19
This dataset includes data pulled from the World Bank (2009), the World Values Survey wave 6, and Transparency International (2009). The data were used to measure perceptions of expertise from individuals in nations that are recipients of development aid as measured by the World Bank.
keywords:
World Values Survey; World Bank; expertise; development
published:
2025-02-07
Huang, Annie H.; Matthews, Jeffrey W.
(2025)
These data represent the raw data from the paper “Influence of light availability and water depth on competition between Phalaris arundinacea and herbaceous vines” published in Wetlands by Annie H. Huang and Jeffrey W. Matthews. The data are archived in one file: Huang&Matthews_mesocosm_data_archive. This file includes raw data collected during a greenhouse experiment described in the paper.
published:
2020-01-27
Morphologic data of dunes in the world's big rivers. Morphologic descriptors for large dunes include: dune height, dune mean leeside angle, dune maximum leeside angle, dune wavelength, dune flow depth (at the crest), and the fractional height of the maximum slope on the leeside for each dune. Morphologic descriptors for small dunes include: dune height, dune mean leeside angle, dune maximum leeside angle, dune wavelength, and dune flow depth (at the crest).
keywords:
dune; bedform; rivers; morphology
published:
2023-04-12
Towns, John; Hart, David
(2023)
The XSEDE program manages the database of allocation awards for the portfolio of advanced research computing resources funded by the National Science Foundation (NSF). The database holds data for allocation awards dating to the start of the TeraGrid program in 2004 through the XSEDE operational period, which ended August 31, 2022. The project data include lead researcher and affiliation, title and abstract, field of science, and the start and end dates. Along with the project information, the data set includes resource allocation and usage data for each award associated with the project. The data show the transition of resources over a fifteen-year span along with the evolution of researchers, fields of science, and institutional representation.
Because the XSEDE program has ended, the allocation_award_history file includes all allocation activity initiated via XSEDE processes through August 31, 2022. The Resource Providers and successor program to XSEDE agreed to honor all project allocations made during XSEDE. Thus, allocation awards that extend beyond the end of XSEDE may not reflect all activity that may ultimately be part of the project award. Similarly, allocation usage data only reflect usage reported through August 31, 2022, and may not reflect all activity that may ultimately be conducted by projects that were active beyond XSEDE.
keywords:
allocations; cyberinfrastructure; XSEDE
published:
2025-09-17
Kamara, Shasta; Glomb, Jackson; Suski, Cory
(2025)
Data were generated from juvenile paddlefish acclimated to one of three temperatures (13.0°C, 17.5°C, or 22.0°C) for two weeks. Fish were then subjected to one of two experiments: simulated angling, in which physiological parameters (stress hormones, lactate, glucose, ions, and oxygen transport parameters) were evaluated in plasma or whole blood, or critical thermal maxima tests. The data set includes physiological parameters, water quality temperatures, and morphometric data generated from these experiments and fish.
keywords:
Sport fish, critical thermal maximum, exercise, recovery, conservation, fisheries, management
published:
2025-07-31
Gibson, Jared; Jiang, Zhanzhi; Kou, Angela
(2025)
This repository includes data files and analysis and plotting code for reproducing the figures in the paper "A scanning resonator for probing quantum coherent devices" (arXiv:2506.22620).
published:
2025-08-01
Beach, Cheyenne R.; Koop, Jennifer A.H.; Fournier, Auriel M.V.
(2025)
Data from the 2025 publication in the Wilson Journal of Ornithology with the same name.
keywords:
Lesser Scaup; Waterfowl; Transmitter Effects
published:
2018-07-28
Hoang, Linh; Schneider, Jodi
(2018)
This dataset presents a citation analysis and citation context analysis used in Linh Hoang, Frank Scannapieco, Linh Cao, Yingjun Guan, Yi-Yun Cheng, and Jodi Schneider. Evaluating an automatic data extraction tool based on the theory of diffusion of innovation. Under submission. We identified the papers that directly describe or evaluate RobotReviewer from the list of publications on the RobotReviewer website <http://www.robotreviewer.net/publications>, resulting in 6 papers grouped into 5 studies (we collapsed a conference and journal paper with the same title and authors into one study). We found 59 citing papers, combining results from Google Scholar on June 05, 2018 and from Scopus on June 23, 2018. We extracted the citation context around each citation to the RobotReviewer papers and categorized these quotes into emergent themes.
keywords:
RobotReviewer; citation analysis; citation context analysis
published:
2025-08-04
Hartman, Theodore; Studt, Jacob; VanLoocke, Andy; McDaniel, Marshall; Howe, Adina; Masters, Michael D.; Mitchell, Corey; DeLucia, Evan H.; Heaton, Emily
(2025)
This dataset contains the data used for the publication “Aboveground rather than belowground productivity drives variability in Miscanthus x giganteus net primary productivity”. This dataset contains Miscanthus x giganteus biomass, carbon, and nitrogen tissue data for aboveground and belowground plant parts collected in 2021 for three different sites in Iowa with three different nitrogen application rates. Data at the Iowa sites were collected via biometric hand harvesting, belowground excavations, and soil coring both in-clump and beside-clump. Data were collected at two collection timepoints to calculate the contributions of belowground parts to Miscanthus x giganteus net primary productivity. This dataset also includes Miscanthus x giganteus and switchgrass soil coring and excavation data collected in 2012 at the University of Illinois Urbana-Champaign Energy Farm.
keywords:
Miscanthus; Net Primary Productivity; Excavation; Nitrogen fertilization; Translocation; Belowground Biomass; Carbon
published:
2025-09-06
4D-STEM datasets for solution-treated (CrCoNi)93Al4Ti2Nb MEA in the [111], [112], and [114] zone axes. Data used for the Ultramicroscopy article "Differentiating electron diffuse scattering via 4D-STEM spatial fluctuation and correlation analysis in complex FCC alloys". Experiment details can be found in the paper. Data-specific details are listed in the Readme file.
keywords:
4D-STEM; MEA; Electron Diffuse-Scattering; FluCor
published:
2025-09-24
Lee, Jaewon; Kwak, Suryang; Liu, Jing-Jing; Yu, Sora; Yun, Eun Ju; Kim, Dong Hyun; Liu, Cassie; Kim, Kyoung Heon; Jin, Yong-Su
(2025)
2′-Fucosyllactose (2′-FL), a human milk oligosaccharide with confirmed benefits for infant health, is a promising infant formula ingredient. Although Escherichia coli, Saccharomyces cerevisiae, Corynebacterium glutamicum, and Bacillus subtilis have been engineered to produce 2′-FL, their titers and productivities need to be improved for economic production. Glucose along with lactose has been used as a substrate for producing 2′-FL, but accumulation of by-products due to overflow metabolism of glucose hampered efficient production of 2′-FL regardless of the host strain. To circumvent this problem, we used xylose, which is the second most abundant sugar in plant cell wall hydrolysates and is metabolized through oxidative metabolism, for the production of 2′-FL by engineered yeast. Specifically, we modified an engineered S. cerevisiae strain capable of assimilating xylose to produce 2′-FL from a mixture of xylose and lactose. First, a lactose transporter (Lac12) from Kluyveromyces lactis was introduced. Second, a heterologous 2′-FL biosynthetic pathway consisting of enzymes Gmd, WcaG, and WbgL from E. coli was introduced. Third, we adjusted expression levels of the heterologous genes to maximize 2′-FL production. The resulting engineered yeast produced 25.5 g/L of 2′-FL with a volumetric productivity of 0.35 g/L∙h in a fed-batch fermentation with lactose and xylose feeding to mitigate the glucose repression. Interestingly, the predominant location of the 2′-FL produced by the engineered yeast can be changed using different culture media. While 72% of the produced 2′-FL was secreted when a complex medium was used, 82% of the produced 2′-FL remained inside the cells when a minimal medium was used. As yeast extract is already used as a food and animal feed ingredient, 2′-FL enriched yeast extract can be produced cost-effectively using the 2′-FL-accumulating yeast cells.
keywords:
Conversion; Genome Engineering
published:
2022-05-20
Haselhorst, Derek; Moreno, J. Enrique; Tcheng, David K.; Punyasena, Surangi W.
(2022)
This dataset includes images and annotated counts for 150 airborne pollen samples from the Center for Tropical Forest Science 50 ha forest dynamics plot on Barro Colorado Island, Panama. Samples were collected once a year from April 1994 to June 2010.
keywords:
aerial pollen traps; automated pollen identification; Barro Colorado Island; convolutional neural networks; Neotropics; palynology; phenology
published:
2022-08-20
Jones, Todd; Ward, Michael
(2022)
Dataset associated with Jones and Ward BEAS-D-21-00106R2 submission: Parasitic cowbird development up to fledging and subsequent post-fledging survival reflect life history variation found across host species. Excel CSV files and .inp file with data used in nest survival and Brown-headed Cowbird post-fledging analyses and file with descriptions of each column. The CSV file is set up for logistic exposure models in SAS or R and the .inp file is set up to be uploaded into program MARK for multi-state recaptures only analysis. Species included in the analyses: American Robin, Blue Grosbeak, Brown Thrasher, Blue-winged Warbler, Carolina Chickadee, Chipping Sparrow, Common Yellowthroat, Dickcissel, Eastern Bluebird, Eastern Phoebe, Eastern Towhee, Field Sparrow, Gray Catbird, House Wren, Indigo Bunting, Northern Cardinal, Red-winged Blackbird, Tree Swallow, Yellow-breasted Chat, and Yellow Warbler.
keywords:
brood parasitism; cowbird; carryover effects; phenotypic plasticity; post-fledging; songbirds
published:
2024-07-11
Pelech, Elena; Long, Steve
(2024)
This dataset includes the gas exchange and TDL (tunable diode laser) files for 4 accessions of Glycine soja and 1 elite accession of Glycine max (soybean) during light induction.
In this V2, code files for Matlab and R are also included to calculate mesophyll conductance and calculate the limitation on photosynthesis, respectively.
keywords:
photosynthesis; mesophyll conductance; soybean; light induction
published:
2020-08-01
Horna Munoz, Daniel; Constantinescu, George; Rhoads, Bruce; Lewis, Quinn; Sukhodolov, Alexander
(2020)
This data set shows how density effects have an important influence on mixing at a small river confluence. The data consist of results of simulations using a detached eddy simulation model.
keywords:
confluence; flow dynamics; density effects
published:
2023-06-01
Pan, Chao; Peng, Jianhao; Chien, Eli; Milenkovic, Olgica
(2023)
This dataset contains four real-world sub-datasets with data embedded into Poincare ball models, including Olsson's single-cell RNA expression data, CIFAR10, Fashion-MNIST and mini-ImageNet. Each sub-dataset has two corresponding files: one is the data file, and the other contains the pre-computed reference points for each class in the sub-dataset. Please refer to our paper (https://arxiv.org/pdf/2109.03781.pdf) and code (https://github.com/thupchnsky/PoincareLinearClassification) for more details.
keywords:
Hyperbolic space; Machine learning; Poincare ball models; Perceptron algorithm; Support vector machine
published:
2025-01-27
Shen, Chengze; Wedell, Eleanor; Pop, Mihai; Warnow, Tandy
(2025)
The zip file contains the benchmark data used for the TIPP3 simulation study. See the README file for more information.
keywords:
TIPP3; abundance profile; reference database; taxonomic identification; simulation
published:
2025-08-05
Zhu, Minjiang; Sanders, Derrick M.; Kim, Yun Seong; Shah, Rohan; Hossain, Mohammad Tanver; Ewoldt, Randy H.; Tawfick, Sameh H.; Geubelle, Philippe H.
(2025)