Illinois Data Bank Dataset Search Results
published:
2025-10-29
Chen, Chu-Chun; Dominguez, Francina; Matus, Sean
(2025)
This dataset contains variables from the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis v5 (ERA5; Hersbach et al., 2020). These data were used for the analysis in “The impact of large-scale land surface conditions on the South American low-level jet” published in Geophysical Research Letters.
Acknowledgments:
This work was supported by NSF Award AGS-1852709. We thank Dr. Zhuo Wang and Dr. Divyansh Chug for their valuable feedback and insightful discussions.
References:
Hersbach H, Bell B, Berrisford P, et al. The ERA5 global reanalysis. Q J R Meteorol Soc. 2020; 146: 1999–2049. https://doi.org/10.1002/qj.3803
keywords:
atmospheric sciences; South American low-level jet; land-atmosphere interactions; soil moisture; regional atmospheric circulation; southeastern South America
published:
2025-01-30
Peyton, Buddy; Bajjalieh, Joseph; Martin, Michael; Alahi, Sam; Fadell, Norah; Jeralds, Maddie
(2025)
Coups d'État are important events in the life of a country. They constitute an important subset of irregular transfers of political power that can have significant and enduring consequences for national well-being. There are only a limited number of datasets available to study these events (Powell and Thyne 2011, Marshall and Marshall 2019). Seeking to facilitate research on post-WWII coups by compiling a more comprehensive list and categorization of these events, the Cline Center for Advanced Social Research (previously the Cline Center for Democracy) initiated the Coup d’État Project as part of its Societal Infrastructures and Development (SID) project. More specifically, this dataset identifies the outcomes of coup events (i.e., realized, unrealized, or conspiracy), the type of actor(s) who initiated the coup (i.e., military, rebels, etc.), as well as the fate of the deposed leader.
Version 2.2.0 adds 94 coup events: 66 came from examining Powell and Thyne’s “discarded” events, and 28 were added to the data set in the normal annual review of potential new coup events. This version also updates the coding of events in Brazil in 1945 and the Congo in 1968.
Version 2.1.3 adds 19 additional coup events to the data set, corrects the date of a coup in Tunisia, and reclassifies an attempted coup in Brazil in December 2022 as a conspiracy.
Version 2.1.2 added 6 additional coup events that occurred in 2022 and updated the coding of an attempted coup event in Kazakhstan in January 2022.
Version 2.1.1 corrected a mistake in version 2.1.0, where the designation of “dissident coup” had been dropped in error for coup_id: 00201062021. Version 2.1.1 fixed this omission by marking the case as both a dissident coup and an auto-coup.
Version 2.1.0 added 36 cases to the data set and removed two cases from the v2.0.0 data. This update also added actor coding for 46 coup events and added executive outcomes to 18 events from version 2.0.0. A few other changes were made to correct inconsistencies in the coup ID variable and the date of the event.
Version 2.0.0 improved several aspects of the previous version (v1.0.0) and incorporated additional source material to include:
• Reconciling missing event data
• Removing events with irreconcilable event dates
• Removing events with insufficient sourcing (each event needs at least two sources)
• Removing events that were inaccurately coded as coup events
• Removing variables that fell below the threshold of inter-coder reliability required by the project
• Removing the spreadsheet ‘CoupInventory.xls’ because of inadequate attribution and citations in the event summaries
• Extending the period covered from 1945-2005 to 1945-2019
• Adding events from Powell and Thyne’s Coup Data (Powell and Thyne, 2011)
Version 1.0.0 was released in 2013. This version consolidated coup data taken from the following sources:
• The Center for Systemic Peace (Marshall and Marshall, 2007)
• The World Handbook of Political and Social Indicators (Taylor and Jodice, 1983)
• Coup d’État: A Practical Handbook (Luttwak, 1979)
• The Cline Center’s Social, Political and Economic Event Database (SPEED) Project (Nardulli, Althaus and Hayes, 2015)
• Government Change in Authoritarian Regimes – 2010 Update (Svolik and Akcinaroglu, 2006)
<br>
<b>Items in this Dataset</b>
1. <i>Cline Center Coup d'État Codebook v.2.2.0 Codebook.pdf</i> - This 17-page document describes the Cline Center Coup d’État Project dataset. The first section of this codebook provides a summary of the different versions of the data. The second section provides a succinct definition of a coup d’état used by the Coup d'État Project and an overview of the categories used to differentiate the wide array of events that meet the project's definition. It also defines coup outcomes. The third section describes the methodology used to produce the data. <i>Revised January 2025</i>
2. <i>Coup Data v2.2.0.csv</i> - This CSV (Comma Separated Values) file contains all of the coup event data from the Cline Center Coup d’État Project. It contains 29 variables and 1094 observations. <i>Revised January 2025</i>
3. <i>Source Document v2.2.0.pdf</i> - This 347-page document provides the sources used for each of the coup events identified in this dataset. Please use the value in the coup_id variable to identify the sources used to identify that particular event. <i>Revised January 2025</i>
4. <i>README.md</i> - This file contains useful information for the user about the dataset. It is a text file written in markdown language. <i>Revised January 2025</i>
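When the CSV is loaded programmatically, it is probably safest to read coup_id as text: the identifier quoted in the version notes (00201062021) begins with zeros that a numeric parse would drop. The following is a minimal Python sketch that uses an in-memory sample rather than the real file. Only the coup_id column name and the first identifier come from the description above; the second identifier, the `placeholder_field` column, and its values are invented for illustration.

```python
import csv
import io

# Hypothetical two-row sample standing in for "Coup Data v2.2.0.csv".
# "placeholder_field" and the second coup_id are made up; the real file
# has 29 variables whose names are documented in the codebook.
SAMPLE = """coup_id,placeholder_field
00201062021,example A
12345678901,example B
"""

def read_coup_rows(handle):
    """Return rows as dicts; csv.DictReader keeps every field as a string."""
    return list(csv.DictReader(handle))

rows = read_coup_rows(io.StringIO(SAMPLE))
by_id = {row["coup_id"]: row for row in rows}

# The leading zeros survive because nothing was converted to a number.
print("00201062021" in by_id)

# A numeric round trip would silently change the identifier.
print(str(int("00201062021")))
```

Keeping the identifier as a string also makes it straightforward to look up the matching entry in the source document, which is keyed by the same coup_id value.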
<br>
<b> Citation Guidelines</b>
1. To cite the codebook (or any other documentation associated with the Cline Center Coup d’État Project Dataset) please use the following citation:
Peyton, Buddy, Joseph Bajjalieh, Dan Shalmon, Michael Martin, Jonathan Bonaguro, and Scott Althaus. 2025. “Cline Center Coup d’État Project Dataset Codebook”. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. January 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8
2. To cite data from the Cline Center Coup d’État Project Dataset please use the following citation (filling in the correct date of access):
Peyton, Buddy, Joseph Bajjalieh, Michael Martin, Sam Alahi, Norah Fadell, and Maddie Jeralds. 2025. Cline Center Coup d’État Project Dataset. Cline Center for Advanced Social Research. V.2.2.0. January 30. University of Illinois Urbana-Champaign. doi: 10.13012/B2IDB-9651987_V8
published:
2026-01-14
Bansal, Prateek; Shukla, Diwakar
(2026)
This dataset contains the .npy and .pkl files required to reproduce the plots in the study.
keywords:
GPCR; activation; STE2; Class D; molecular dynamics
published:
2026-02-01
Xu, Xiaotian; Yao, Yu; Liu, Yicen; Curtis, Jeffrey; West, West; Riemer, Nicole
(2026)
This dataset contains simulation results from PartMC-MOSAIC and WRF-PartMC that were used in the journal article "Quantifying the Impact of Surfactants on Cloud Condensation Nuclei Activity Using a Particle-Resolved Model". Two compressed folders are uploaded here: one holds the data used in the article, and the other holds the Python scripts used to process the data. For more details on the uploaded files, please check the README file.
keywords:
Surfactants; CCN; Effective surface tension
published:
2026-01-28
Nahid, Shahriar Muhammad; Dong, Haiyue; Nolan, Gillian; Nam, Sungwoo; Mason, Nadya; Huang, Pinshane; van der Zande, Arend
(2026)
Room-temperature transfer curves; Benchmarking conductance; STEM images of charged domain walls; Temperature-dependent transfer curves; Scaling of conductance, hopping length, threshold voltage, trap density, and field-effect mobility with temperature; Magnetotransport data; Optical, AFM, and PFM image of different field-effect transistors; STEM images of contacts; Output and transfer curves of FETs; Additional STEM images of charged domain walls; Temperature scaling of subthreshold swing and threshold voltage difference; Comparison of maximum field-effect mobility for different structures
published:
2026-01-27
Trivellone, Valeria; Canuto, Francesca; Lucetti, Giulia; Dietrich, Christopher H.; Galetto, Luciana; Marzachì, Cristina
(2026)
Trivellone_etal_Full_PaperList_SystRev.xlsx: This dataset contains the list of peer-reviewed studies selected and critically appraised for a systematic review of quantitative PCR (qPCR) investigations tracking phytoplasma load dynamics in insect vectors. The dataset includes bibliographic information and selection status for each study, reflecting the inclusion and exclusion criteria applied during the review process. The literature search was completed on December 15, 2025. The inclusion and exclusion criteria are listed in the second spreadsheet.
Further methodological details, including search strategy, screening workflow, and appraisal criteria, are described in the associated paper, “Tracking the early spatio-temporal dynamics of phytoplasma multiplication within its leafhopper vector”, as well as in the Supplementary Materials (see below), by Valeria Trivellone, Francesca Canuto, Giulia Lucetti, Christopher H. Dietrich, Luciana Galetto, Cristina Marzachì.
keywords:
qPCR; systematic review; phytoplasma; multiplication; vector
published:
2025-05-07
Reves, Olivia; Larson, Eric
(2025)
Data collected at 71 study sites from 2023 to 2024 for Reves, Olivia P. (2025): Using Environmental DNA Metabarcoding to Inform Biodiversity Conservation in Agricultural Landscapes. Master's thesis, University of Illinois Urbana-Champaign. Files include study site information, taxa by site matrices for vertebrates from environmental DNA metabarcoding using multiple mitochondrial DNA primers (COI, 12S), and bird species audibly detected by a phone app at study sites.
keywords:
agricultural conservation; biodiversity; eDNA; environmental DNA; Illinois; metabarcoding; riparian buffers; stream flow; vertebrates
published:
2016-05-19
Donovan, Brian; Work, Dan
(2016)
This dataset contains records of four years of taxi operations in New York City and includes 697,622,444 trips. Each trip records the pickup and drop-off dates, times, and coordinates, as well as the metered distance reported by the taximeter. The trip data also includes fields such as the taxi medallion number, fare amount, and tip amount. The dataset was obtained through a Freedom of Information Law request from the New York City Taxi and Limousine Commission.
The files in this dataset are optimized for use with the ‘decompress.py’ script included in this dataset. This file has additional documentation and contact information that may be of help if you run into trouble accessing the content of the zip files.
keywords:
taxi;transportation;New York City;GPS
published:
2025-02-07
Wang, Binghui; Kudeki, Erhan
(2025)
Incoherent scatter radar datasets collected during the September 2016 campaign at Arecibo have been deposited in this databank. The lag products of the ISR data are stored as lag profile matrices with 5 minutes of integration time. The data is organized in a Python dictionary format, with each file containing 12 lag profile matrices representing one hour of observation. A sample Python script is provided to illustrate its usage.
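The layout described above (12 matrices per file, 5 minutes of integration each, one hour in total) can be sketched in Python as follows. This is only an illustration: the key names, the matrix shape, and the values are assumptions, and the actual dictionary keys are defined by the sample script shipped with the dataset.

```python
# Hypothetical stand-in for one hourly file: a dict of 12 lag profile
# matrices. Key names ("lpm_00", ...) and the 2x3 shape are assumptions.
INTEGRATION_MINUTES = 5
MATRICES_PER_FILE = 12

def make_fake_file():
    """Build a dict of 12 small matrices filled with placeholder numbers."""
    return {
        f"lpm_{i:02d}": [[float(i + r + c) for c in range(3)] for r in range(2)]
        for i in range(MATRICES_PER_FILE)
    }

def hourly_mean(file_dict):
    """Average the matrices element by element to get a one-hour profile."""
    matrices = list(file_dict.values())
    n_rows, n_cols = len(matrices[0]), len(matrices[0][0])
    return [
        [sum(m[r][c] for m in matrices) / len(matrices) for c in range(n_cols)]
        for r in range(n_rows)
    ]

data = make_fake_file()
covered_minutes = len(data) * INTEGRATION_MINUTES  # 12 matrices x 5 minutes
mean_matrix = hourly_mean(data)
print(covered_minutes, mean_matrix[0][0])
```

Averaging the twelve 5-minute matrices is one simple way to trade time resolution for lower noise; whether that is appropriate depends on the analysis.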
published:
2025-12-18
Marshalla, Dan; Fraterrigo, Jennifer
(2025)
This dataset includes data from a study conducted in southern Illinois, USA, which was published in the Journal of Applied Ecology. The study investigated the interactive effects of fire history and invasion by the non-native grass Microstegium vimineum on fire intensity and oak regeneration in central hardwood forests. The dataset includes data on environmental conditions, historical fire occurrence, experimental fire intensity and fuel load, seedling and juvenile oak characteristics, Microstegium cover, and plot descriptions.
keywords:
Fire-grass-tree interactions; Historical fire regime; Invasive grasses; Microstegium vimineum; Post-fire oak survival; Prescribed fire
published:
2025-05-14
1,228 hyperspectral images of eggs, covering wavelengths from 400 nm to 900 nm.
published:
2026-01-22
Edmonds, Devin; Du, Jane; Stickley, Samuel; Sucre, Samuel
(2026)
This dataset contains data and R scripts used to analyze the trade of non-native pet amphibians in the United States by integrating online classified advertisements with U.S. Fish and Wildlife Service import records. The data include records of amphibian advertisements, U.S. imports, taxonomic reference lists, and conservation status information. The dataset supports analyses identifying domestically produced species, species entering U.S. markets through unrecorded or unofficial trade pathways, and price differences associated with documented and undocumented trade. The dataset supports the analyses presented in an associated peer-reviewed publication in Biological Conservation.
keywords:
amphibian; biocommerce; biosecurity; conservation; LEMIS; pet trade; species laundering; wildlife trade
published:
2026-01-23
Kaman, Bobby; Lim, Jinho; Liu, Yingkai; Hoffmann, Axel
(2026)
Data supporting the forthcoming publication "Emulating 2D Materials with magnons", which is also available as a preprint on arXiv: https://arxiv.org/abs/2601.03210.
The dataset contains scripts for the simulation program Mumax3 and Python scripts for conversion and analysis.
keywords:
micromagnetics; mumax; tight-binding; spin waves; magnons
published:
2026-01-20
Willson, James; Warnow, Tandy
(2026)
Dataset from "CAMUS: Scalable Phylogenetic Network Estimation." This dataset contains simulated phylogenetic networks, gene trees, and sequence data.
- camus-dataset.tar.xz is the main archive containing all the simulated data. More details about the files and directories it contains can be found in README.md
- scripts.zip contains various scripts used in the simulation study.
keywords:
evolution; computational biology; bioinformatics; phylogenetics
published:
2026-01-22
Cao, Yanghui; Dietrich, Christopher H.; Dmitriev, Dmitry A.; Zou, Hongfen; Xue, Qingquan; Zhang, Yalin
(2026)
The following 5 files were used to reconstruct the phylogeny of the Membracoidea.
1. Taxon_sampling.csv: contains the sample IDs (1st column, used in the alignments) and the taxonomic information (2nd to 6th columns) for 269 samples.
2. concatenated_aa_.phy: a concatenated amino acid dataset with 52,987 amino acid positions. This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
3. concatenated_nt.phy: a concatenated nucleotide dataset with all codon positions included (158,961 nucleotide positions). This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
4. concatenated_12nt.phy: a concatenated nucleotide dataset with the third codon positions excluded (105,974 nucleotide positions). This dataset was used for the maximum likelihood analysis by IQ-TREE v1.6.12. Hyphens are used to represent gaps.
5. Individual_gene_alignment.zip: contains 427 FASTA files, each one represents the nucleotide alignment for a gene. Hyphens are used to represent gaps. These files were used to construct gene trees using IQ-TREE v1.6.12, followed by multispecies coalescent analysis using ASTRAL v 4.10.5.
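The three concatenated alignment lengths listed above are mutually consistent: 52,987 amino acid positions correspond to 52,987 × 3 = 158,961 nucleotide positions, and excluding every third codon position leaves 52,987 × 2 = 105,974. The short Python sketch below checks that arithmetic and shows a third-position filter on a toy sequence. The function name and the toy sequence are illustrative and are not taken from the authors' pipeline.

```python
AA_POSITIONS = 52_987  # length reported for the amino acid alignment

def drop_third_codon_positions(seq):
    """Keep codon positions 1 and 2; remove every third character."""
    return "".join(base for i, base in enumerate(seq) if i % 3 != 2)

nt_all = AA_POSITIONS * 3  # all codon positions included
nt_12 = AA_POSITIONS * 2   # third codon positions excluded

toy = "ATGGC-TTA"          # three codons, with a hyphen marking a gap
filtered = drop_third_codon_positions(toy)
print(nt_all, nt_12, filtered)
```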
keywords:
Auchenorrhyncha; evolution; phylogeny; timetree
published:
2026-01-21
Suthers, Patrick; Maranas, Costas
(2026)
Growth-coupling product formation can facilitate strain stability by aligning industrial objectives with biological fitness. Organic acids make up many building block chemicals that can be produced from sugars obtainable from renewable biomass. Issatchenkia orientalis is a yeast strain tolerant to acidic conditions and is thus a promising host for industrial production of organic acids. Here, we use constraint-based methods to assess the potential of computationally designing growth-coupled production strains for I. orientalis that produce 22 different organic acids under aerobic or microaerobic conditions. We explore native and engineered pathways using glucose or xylose as the carbon substrates as proxy constituents of hydrolyzed biomass. We identified growth-coupled production strategies for 37 of the substrate-product pairs, with 15 pairs achieving production for any growth rate. We systematically assess the strain design solutions and categorize the underlying principles involved.
keywords:
Bioproducts; Modeling
published:
2026-01-19
Note: The GTAP dataset includes a total of 140 regions, some of which are aggregated regions. For all map-related supplementary files (S11, S12, S13), we assign values to each individual country to enhance visualization. Countries within the same aggregated region are assigned the same regional value to maintain consistency across the map.
<b>Data S1 (separate file): S1.csv</b>- CSV file detailing production-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of production activities.
Columns: Each column represents a country-sector pair on the production side.
Values: The values indicate the number of deaths caused by production activities in the country-sector listed in each column and occurring in the country listed in each row.
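To make the row and column orientation concrete, here is a small Python sketch using an invented table in the same layout. The country codes, sector labels, header format, and numbers are all placeholders, not values from S1.csv; the real header naming should be checked in the file itself.

```python
import csv
import io

# Invented table in the S1 layout: rows = country where deaths occur,
# columns = producing country-sector pair. Labels and values are placeholders.
SAMPLE = """affected,AAA_sector1,AAA_sector2,BBB_sector1
AAA,10.0,2.0,1.5
BBB,0.5,0.0,4.0
"""

reader = csv.DictReader(io.StringIO(SAMPLE))
producers = [name for name in reader.fieldnames if name != "affected"]

deaths_in = {}                                  # row sums: deaths occurring in a country
deaths_caused_by = {p: 0.0 for p in producers}  # column sums: deaths caused by a producer
for row in reader:
    values = {p: float(row[p]) for p in producers}
    deaths_in[row["affected"]] = sum(values.values())
    for p, v in values.items():
        deaths_caused_by[p] += v

print(deaths_in)
print(deaths_caused_by)
```

Summing along a row gives the deaths that occur in one country regardless of who produced the emissions, while summing down a column gives the deaths attributable to one country-sector pair regardless of where they occur.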
<b>Data S2 (separate file): S2.csv</b>- CSV file detailing production-related deaths for the EORA dataset.
Structure: The file has the same structure as S1.csv.
<b>Data S3 (separate file): S3.csv</b>- CSV file detailing consumption-related deaths for the GTAP dataset.
Rows: Each row represents a country where deaths occur as a result of consumption activities.
Columns: Each column represents a consumption country.
Values: The values indicate the number of deaths caused by consumption activities in the country listed in the column and occurring in the country listed in the row.
<b>Data S4 (separate file): S4.csv</b>- CSV file detailing consumption-related deaths for the EORA dataset.
Structure: The file has the same structure as S3.csv.
<b>Data S5 (folder of files): S5.zip</b>- a folder containing 141 CSV files, each named after a country's 3-letter code (e.g., USA.csv, CHN.csv), representing production-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row corresponds to a grid cell.
Columns: Each column represents an industrial sector. The final column, "geometry," contains the spatial coordinates (latitude and longitude) for each grid cell.
Values: Each value indicates the PM₂.₅ concentration level (in µg/m³) attributable to emissions from the specified sector in the given country, as they occur in each grid cell.
<b>Data S6 (folder of files): S6.zip</b>- a folder containing 188 CSV files, each named after a country's 3-letter code, representing production-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: Each file follows the same format as those in S5.zip, with rows representing grid cells and columns representing industrial sectors, plus a "geometry" column containing spatial coordinates.
<b>Data S7 (separate file): S7.csv</b>- CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all GTAP countries.
Rows: Each row represents a grid cell.
Columns: Apart from the last column ("geometry"), which contains spatial information for each grid cell in latitude-longitude coordinates, each column represents a consumption country.
Values: Each value indicates the PM₂.₅ concentration level caused by each country’s consumption process and occurring in each grid cell, measured in µg/m³.
<b>Data S8 (separate file): S8.csv</b>- CSV file containing consumption-related spatial PM₂.₅ concentration patterns for all EORA countries.
Structure: The file has the same structure as S7.csv.
<b>Data S9 (separate file): S9.csv</b>- CSV file listing the total net bidirectional export of deaths for all countries in GTAP, displaying only positive values.
Columns:
"from": The country that exports more consumption-related deaths.
"to": The country that imports more consumption-related deaths.
"values": The net export of deaths between these two countries, calculated as the difference between the deaths flowing from "from" to "to" and those from "to" to "from."
<b>Data S10 (separate file): S10.csv</b>- CSV file listing the total net bidirectional export of deaths for all countries in EORA, displaying only positive values.
Structure: The file has the same structure as S9.csv.
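The net-export definition used for S9.csv and S10.csv can be sketched in Python as follows. The gross flows, the country codes, and the function name are invented for illustration; only the "from", "to", and "values" column names and the rule of keeping positive values come from the description. The sketch assumes that a gross flow from A to B means deaths exported by A that occur in B.

```python
def net_exports(gross):
    """gross[(a, b)] = deaths flowing from a to b. Returns positive net rows."""
    rows = []
    seen = set()
    for (a, b) in gross:
        pair = frozenset((a, b))
        if a == b or pair in seen:
            continue  # skip self-flows and pairs already handled
        seen.add(pair)
        net = gross.get((a, b), 0.0) - gross.get((b, a), 0.0)
        if net > 0:
            rows.append({"from": a, "to": b, "values": net})
        elif net < 0:
            # Flip the direction so the stored value is always positive.
            rows.append({"from": b, "to": a, "values": -net})
    return rows

# Placeholder gross flows between three made-up countries.
gross_flows = {
    ("AAA", "BBB"): 120.0, ("BBB", "AAA"): 45.0,
    ("AAA", "CCC"): 10.0, ("CCC", "AAA"): 30.0,
}
result = net_exports(gross_flows)
print(result)
```

Each unordered country pair appears at most once in the output, which matches a file that lists only positive net values.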
<b>Data S11 (separate file): S11.csv</b>- CSV file listing the Value of Statistical Lives (VSLs), and consumption-related externalities under three scenarios—Business as Usual (BAU), Global Community (GC), and Fair Trade in Deaths (FTD)—along with externalities per GDP and their differences for GTAP countries.
Columns:
VSL, BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
<b>Data S12 (separate file): S12.csv</b>- Same as S11.csv, but for EORA countries.
Structure: Identical to S11.csv.
<b>Data S13 (separate file): S13.csv</b>- CSV file containing the data used to generate Figures 1, 2, 3, and 5 in the main text.
Columns:
country_code: 3-letter country code
GTAP_region, continent, population, GDP, GDP_capita, VSL
export_of_death, import_of_death, net_export, net_export_capita
allforeign_world, G50foreign_world, G100foreign_world
cause_allforeign_world, cause_L30foreign_world, cause_L50foreign_world
BAU_Externality, GC_Externality, FTD_Externality
BAU_Ext_perGDP, GC_Ext_perGDP, FTD_Ext_perGDP
Diff_GC_BAU, Diff_FTD_BAU, Diff_FTD_GC
geometry (used for visualization)
<b>Data S14 (separate file): S14.xlsx</b>- this Excel file contains six sheets summarizing cross-model Pearson correlation coefficients between sectoral economic activity fractions and transboundary mortality impact metrics, based on both GTAP and EORA datasets.
Sheets:
Output_fraction_GTAP
Direct_demand_fraction_GTAP
Final_demand_fraction_GTAP
Output_fraction_EORA
Direct_demand_fraction_EORA
Final_demand_fraction_EORA
Rows: Each row represents an economic sector.
Columns:
G50foreign_world: Fraction of deaths attributable to final demand from regions where demand per capita is more than 50% higher than in the current country.
cause_L50foreign_world: Fraction of deaths caused by consumption within the current country but occurring in countries with more than 50% lower demand per capita.
Values: Each value represents the Pearson correlation between the sectoral fraction and the corresponding transboundary mortality metric.
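For readers who want to reproduce a single cell of these sheets, the Pearson coefficient can be computed as in the Python sketch below. The input numbers are invented, and the sketch assumes the correlation is taken across countries for one sector, which the description above does not state explicitly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences with nonzero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Placeholder data: one sector's output fraction in four made-up countries,
# paired with a made-up transboundary mortality metric for the same countries.
sector_fraction = [0.10, 0.20, 0.30, 0.40]
metric_same_trend = [0.2, 0.4, 0.6, 0.8]
metric_opposite_trend = [0.8, 0.6, 0.4, 0.2]

r_pos = pearson(sector_fraction, metric_same_trend)
r_neg = pearson(sector_fraction, metric_opposite_trend)
print(round(r_pos, 6), round(r_neg, 6))
```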
<b>Data S15 (separate file): S15.csv</b>- CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of production-based premature deaths.
Column Producer: The producing country–sector pair responsible for the emissions leading to health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each producer–affected country pair.
<b>Data S16 (separate file): S16.csv</b>- CSV file derived from the GTAP dataset, containing Monte Carlo simulation results (500 draws) for the uncertainty analysis of consumption-based premature deaths.
Column Consumer: The consuming country whose final demand drives the global production and associated health impacts.
Column Affected Country: The country where the resulting premature deaths occur.
Column Deaths: The estimated number of deaths corresponding to the one used in the main analysis.
Columns Deaths_median, Deaths_low95, Deaths_high95: The median, 2.5th percentile, and 97.5th percentile values across 500 Monte Carlo draws of the GEMM θ parameter, representing the 95% confidence interval for each consumer–affected country combination.
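The summary columns in S15.csv and S16.csv (median, 2.5th percentile, 97.5th percentile over 500 draws) can be illustrated with the Python sketch below. It is a simplification under stated assumptions: the draws here are placeholder Gaussian values for the death count itself, whereas the analysis described above draws the GEMM θ parameter and propagates it to deaths. The percentile helper uses linear interpolation, which may differ from the convention the authors used.

```python
import random
import statistics

def percentile(sorted_vals, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac

N_DRAWS = 500
rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Placeholder draws around an invented central estimate of 100 deaths.
draws = sorted(rng.gauss(100.0, 15.0) for _ in range(N_DRAWS))

summary = {
    "Deaths_median": statistics.median(draws),
    "Deaths_low95": percentile(draws, 2.5),
    "Deaths_high95": percentile(draws, 97.5),
}
print(summary)
```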
published:
2025-09-18
Chen, Maosi; Parton, William J.; Hartman, Melannie D.; Del Grosso, Stephen J.; Smith, William K.; Knapp, Alan; Lutz, Susan; Derner, Justin; Tucker, Compton; Ojima, Dennis; Volesky, Jerry; Stephenson, Mitchell B.; Schacht, Walter H.; Gao, Wei
(2025)
Productivity throughout the North American Great Plains grasslands is generally considered to be water limited, with the strength of this limitation increasing as precipitation decreases. We hypothesize that cumulative actual evapotranspiration water loss (AET) from April to July is the precipitation‐related variable most correlated to aboveground net primary production (ANPP) in the U.S. Great Plains (GP). We tested this by evaluating the relationship of ANPP to AET, precipitation, and plant transpiration (Tr). We used multi‐year ANPP data from five sites ranging from semiarid grasslands in Colorado and Wyoming to mesic grasslands in Nebraska and Kansas, mean annual NRCS ANPP, and satellite‐derived normalized difference vegetation index (NDVI) data. Results from the five sites showed that cumulative April‐to‐July AET, precipitation, and Tr were well correlated (R²: 0.54–0.70) to annual changes in ANPP for all but the wettest site. AET and Tr were better correlated to annual changes in ANPP compared to precipitation for the drier sites, and precipitation in August and September had little impact on productivity in drier sites. April‐to‐July cumulative precipitation was best correlated (R² = 0.63) with interannual variability in ANPP in the most mesic site, while AET and Tr were poorly correlated with ANPP at this site. Cumulative growing season (May‐to‐September) NDVI (iNDVI) was strongly correlated with annual ANPP at the five sites (R² = 0.90). Using iNDVI as a surrogate for ANPP, we found that county‐level cumulative April–July AET was more strongly correlated to ANPP than precipitation for more than 80% of the GP counties, with precipitation tending to perform better in the eastern more mesic portion of the GP. Including the ratio of AET to potential evapotranspiration (PET) improved the correlation of AET to both iNDVI and mean county‐level NRCS ANPP.
Accounting for how different precipitation‐related variables control ANPP (AET in drier portion, precipitation in wetter portion) provides opportunity to develop spatially explicit forecasting of ANPP across the GP for enhancing decision‐making by land managers and use of grassland ANPP for biofuels.
keywords:
Sustainability;Field Data;Modeling
published:
2026-01-19
Fourkas, Austen; Looney, Leslie
(2026)
This dataset includes the FITS files for all ALMA images used in the ApJ publication "Multiband ALMA Polarization Observations of BHB 07-11 Reveal Aligned Dust Grains in Complex Spiral Arm Structures". Additionally, this dataset includes details regarding the data reduction process so that interested users can perform the reduction and imaging themselves.
keywords:
FITS files; ALMA data; reduction instructions
published:
2026-01-12
Yan, Qiang; Cordell, William; Jindra, Michael; Pfleger, Brian
(2026)
Microbial lipid metabolism is an attractive route for producing oleochemicals. The predominant strategy centers on heterologous thioesterases to synthesize desired chain-length fatty acids. To convert acids to oleochemicals (e.g., fatty alcohols, ketones), the narrowed fatty acid pool needs to be reactivated as coenzyme A thioesters at the cost of one ATP per reactivation – an expense that could be saved if the acyl-chain was directly transferred from ACP- to CoA-thioester. Here, we demonstrate such an alternative acyl-transferase strategy by heterologous expression of PhaG, an enzyme first identified in Pseudomonads, that transfers 3-hydroxy acyl-chains between acyl-carrier protein and coenzyme A thioester forms for creating polyhydroxyalkanoate monomers. We use it to create a pool of acyl-CoAs that can be redirected to oleochemical products. Through bioprospecting, mutagenesis, and metabolic engineering, we develop three strains of Escherichia coli capable of producing over 1 g/L of medium-chain free fatty acids, fatty alcohols, and methyl ketones.
keywords:
Bioproducts; Metabolomics
published:
2025-10-22
Yan, Qiang; Jacobson, Tyler B.; Ye, Zhou; Cortes-Peña, Yoel R.; Bhagwat, Sarang; Hubbard, Susan; Cordell, William T.; Oleniczak, Rebecca E.; Gambacorta, Francesca V.; Rivera-Vasquez, Julio; Shusta, Eric V.; Amador-Noguez, Daniel; Guest, Jeremy; Pfleger, Brian
(2025)
Plants produce many high-value oleochemical molecules. While oil-crop agriculture is performed at industrial scales, suitable land is not available to meet global oleochemical demand. Worse, establishing new oil-crop farms often comes with the environmental cost of tropical deforestation. The field of metabolic engineering offers tools to transplant oleochemical metabolism into tractable hosts while simultaneously providing access to molecules produced by non-agricultural plants. Here, we evaluate strategies for rewiring metabolism in the oleaginous yeast Yarrowia lipolytica to synthesize a foreign lipid, 3-acetyl-1,2-diacyl-sn-glycerol (acTAG). Oils made up of acTAG have a reduced viscosity and melting point relative to traditional triacylglycerol oils making them attractive as low-grade diesels, lubricants, and emulsifiers. This manuscript describes a metabolic engineering study that established acTAG production at g/L scale, exploration of the impact of lipid bodies on acTAG titer, and a techno-economic analysis that establishes the performance benchmarks required for microbial acTAG production to be economically feasible.
keywords:
Conversion;Sustainability;Biomass Analytics;Lipidomics;Metabolomics
published:
2025-11-20
Yan, Qiang; Cordell, William; Breckner, Christian; Chen, Xuanqi; Jindra, Michael; Pfleger, Brian
(2025)
Medium-chain length methyl ketones are potential blending fuels due to their cetane numbers and low melting temperatures. Biomanufacturing offers the potential to produce these molecules from renewable resources such as lignocellulosic biomass. In this work, we designed and tested metabolic pathways in Escherichia coli to specifically produce 2-heptanone, 2-nonanone and 2-undecanone. We achieved substantial production of each ketone by introducing chain-length specific acyl-ACP thioesterases, blocking the β-oxidation cycle at an advantageous reaction, and introducing active β-ketoacyl-CoA thioesterases. Using a bioprospecting approach, we identified 15 homologs of E. coli β-ketoacyl-CoA thioesterase (FadM) and evaluated the in vivo activity of each against various chain length substrates. The FadM variant from Providencia sneebia produced the most 2-heptanone, 2-nonanone, and 2-undecanone, suggesting it has the highest activity on the corresponding β-ketoacyl-CoA substrates. We tested enzyme variants, including acyl-CoA oxidases, thiolases, and bi-functional 3-hydroxyacyl-CoA dehydratases to maximize conversion of fatty acids to β-keto acyl-CoAs for 2-heptanone, 2-nonanone, and 2-undecanone production. To address product loss during fermentation, we applied a 20% (v/v) dodecane layer in the bioreactor and built an external water-cooled condenser coupled to the bioreactor's heat-transfer condenser. Using these modifications, we were able to generate up to 4.4 g/L total medium-chain length methyl ketones.
keywords:
Metabolomics; Metabolic Engineering
published:
2025-11-03
Woodruff, William; Deshavath, Narendra Naik; Susanto, Vionna; Rao, Christopher V.; Singh, Vijay
(2025)
Oleaginous yeasts are a promising candidate for the sustainable conversion of lignocellulosic feedstocks into fuels and chemicals, but their growth on these substrates can be inhibited as a result of upstream pretreatment and enzymatic hydrolysis conditions. Previous studies indicate a high citrate buffer concentration during hydrolysis inhibits downstream cell growth and ethanol fermentation in Saccharomyces cerevisiae. In this study, an engineered Rhodosporidium toruloides strain with enhanced lipid accumulation was grown on sorghum hydrolysate with high and low citrate buffer concentrations. Both hydrolysis conditions resulted in similar sugar recovery rates and concentrations. No significant differences in cell growth, sugar utilization rates, or lipid production rates were observed between the two citrate buffer conditions during batch fermentation of R. toruloides. Under fed-batch growth on low-citrate hydrolysate a lipid titer of 16.7 g/L was obtained. Citrate buffer was not found to inhibit growth or lipid production in this engineered R. toruloides strain, nor did reducing the citrate buffer concentration negatively affect sugar yields in the hydrolysate. As this process is scaled-up, $131 per ton of hydrothermally pretreated biomass can be saved by use of the lower citrate buffer concentration during enzymatic hydrolysis.
keywords:
Conversion;Hydrolysate;Lipidomics
published:
2025-10-15
York, Julia M.; Bhat, Shriram; Kim, Jinmu; Cardenas, Leyla; Cheng, Chi-Hing Christina
(2025)
This repository contains supplementary information, alternate genome assemblies, annotation, and predicted protein datasets for Notothenia coriiceps and Paranotothenia angustata genome assemblies. Primary assemblies, mitochondrial assemblies, RNA-Seq data, and raw read data can be found under NCBI Bioproject PRJNA1310647.
keywords:
notothenioid; Antarctic; fish; genome; DNA
published:
2025-10-16
Maitra, Shraddha; Long, Stephen P.; Singh, Vijay
(2025)
Transgenic bioenergy crops have shown the potential to produce vegetative oil by accumulating energy-rich triacylglyceride molecules that can be converted into biofuels (biodiesel and biojet). These transgenic crops cater to improved biofuel yield by providing lipids along with cellulosic sugars. Efficient bioprocessing technologies are needed to utilize these transgenic plants to their maximum potential. To this end, this study investigates a low- and high-severity chemical-free hydrothermal pretreatment of transgenic oilcane 1566 bagasse with in situ lipids to maximize the recovery of lipids for biodiesel and fermentable sugars for ethanol with minimal inhibitor generation. Hydrothermal pretreatment at 170°C recovered ∼25% of total lipids in the pretreatment liquor, leaving the remainder in bagasse residue for hexane recovery post fermentation. The recovery of lipids in pretreatment liquor remained constant beyond 170°C. Along with lipids, ∼35% w/w and ∼50% w/w fermentable sugars were recovered post saccharification from bagasse pretreated at 170°C and 210°C for 20 min, respectively. Hydrothermal pretreatment at 170°C for 20 min provided the optimum conditions for maximum recovery of lipids and cellulosic sugars that resulted in enhanced biofuel yield per unit biomass. High severity pretreatment increased the generation of inhibitors beyond the tolerance of fermentation microorganisms. In addition, the application of time-domain proton NMR spectroscopy was extended to bioprocessing. NMR technology facilitated the analysis of total lipids, the composition of fatty acids, and the characterization of free and bound lipids in untreated and pretreated oilcane 1566 bagasse subsequent to each step of biomass to biofuel conversion.
keywords:
Conversion;Feedstock Bioprocessing