For general radiocarbon data use, we recommend choosing data appropriate to the research question. As mentioned previously, some of the older data may require verification to ensure the link between humans and the dated object is secure. We have included region-specific usage and quality notes below to aid data usage.
Regional usage and quality
For the following regions, where one of the co-authors is considered a specialist, we have included a brief description of the regional dataset. We do not feature specialists for Oceania, South Asia, Southeast Asia, or North Asia. Northern South America is also underrepresented among our co-authors' specialities, though many continent-wide datasets include this area. Rather than write as though we were experts on these regions, we encourage readers to consult region-specific studies for recommendations on these areas.
Africa
P3k14c contains 11,129 radiocarbon ages for all of Africa. Before p3k14c, as elsewhere in the world, radiocarbon data for Africa were dispersed across several regional databases compiled by distinct research teams with differing research goals. The majority of the African data presented here are aggregated from eight sources (available online or on request), with a small number of dates drawn from other sources.
While there are frequent overlaps between databases, large parts of the continent are entirely unrepresented in these sources or feature only selective, project-specific data compilations. Many of the contributing datasets focus on the Holocene of a particular subregion of the continent. Kay et al.38 present c. 3000 Holocene archaeological dates to investigate land-use changes associated with food production across West and Central Africa, while the complementary “aDRAC” online repository archives c. 1500 Holocene Central African dates39. For the Sahara and North Africa, Manning and Timpson40 collated >3000 Holocene dates to investigate demographic responses to climate shifts. Further Holocene dates for the region are recorded in the online MedAfriCarbon database of Lucarini et al.41. The CalPal database, maintained and distributed by Weninger42, archives c. 1100 additional Holocene dates for north-east and north-west Africa.
Among the underrepresented regions, Eastern Africa is currently one of the least well represented in the database, with only several hundred dates spanning thousands of years43. For parts of southern Africa, an online database integrated with OxCal presents c. 2500 dates spanning the entire radiocarbon age range44. As a map of site distributions shows, there are few data for much of southern Africa outside South Africa. Meanwhile, thirteen African countries (Madagascar, Zambia, Cape Verde, Comoros, Ivory Coast, Djibouti, Eritrea, Guinea-Bissau, Malawi, Mauritius, Sao Tome and Principe, Seychelles, South Sudan) are entirely unrepresented in the p3k14c compilation, and a further six countries record fewer than ten radiocarbon dates each (Ethiopia, Somalia, Sierra Leone, Liberia, Guinea, and the Gambia). These patterns likely reflect both the remit of the published databases and the true incidence of radiocarbon dates in these countries. Given the differing age cut-offs used in the contributing data compilations, inter-regional comparisons beyond the Holocene are likely to be invalid.
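Because the contributing compilations use different age cut-offs, one way to keep inter-regional comparisons on an equal footing is to restrict all regions to the age window that every contributing compilation covers. The sketch below is a hypothetical illustration, not part of p3k14c: it assumes each compilation's coverage is known as a (youngest, oldest) pair in uncalibrated 14C years BP and that each record carries an `age` field; the function names, field name, and example coverage values are made up.

```python
from typing import Dict, List, Tuple

# (youngest, oldest) uncalibrated 14C age covered by a compilation, in years BP.
AgeRange = Tuple[int, int]


def common_window(coverage: Dict[str, AgeRange]) -> AgeRange:
    """Return the age window covered by every compilation (their intersection)."""
    youngest = max(lo for lo, _ in coverage.values())
    oldest = min(hi for _, hi in coverage.values())
    if youngest > oldest:
        raise ValueError("the compilations share no common age range")
    return youngest, oldest


def within_window(records: List[dict], window: AgeRange) -> List[dict]:
    """Keep records whose 14C age (hypothetical 'age' field) falls inside the window."""
    lo, hi = window
    return [r for r in records if lo <= r["age"] <= hi]


# Hypothetical coverage of two compilations, for illustration only.
coverage = {"compilation_a": (0, 12000), "compilation_b": (200, 45000)}
window = common_window(coverage)  # (200, 12000)
records = [{"age": 150}, {"age": 3500}, {"age": 20000}]
print(within_window(records, window))  # [{'age': 3500}]
```

Taking the intersection is deliberately conservative: it discards data that only some compilations cover, which is exactly the portion where cut-off differences would otherwise masquerade as regional differences.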
Although more comprehensive regional data collations are in progress for Africa, we currently recommend against conducting continent-wide analyses of radiocarbon data, given the dramatically different sampling strategies across the continent.
Southwest Asia
P3k14c contains a total of 6,222 radiocarbon dates for Southwest Asia, collected from many regional databases42,45,46,47,48,49,50,51,52,53. Unlike areas with a longer tradition of commercial archaeology, where archaeological work is routinely integrated into the planning and construction industries (e.g., North America and central and northern Europe), most archaeological investigations in Southwest Asia are carried out by academic projects. As a consequence, the regional record is prone to spatial and chronological biases that reflect investigators' research interests.
Regarding chronological biases, for later historical periods archaeologists rely more on pottery types with short production spans and on historical media (e.g., clay tablets) than on radiometric dating to date archaeological layers. A certain reluctance to use radiocarbon dating among archaeologists excavating Iron Age sites is also understandable given the Hallstatt plateau in the radiocarbon calibration curve (ca. 2750–2350 cal BP), which makes it difficult to obtain refined radiocarbon-based chronologies. On a pan-regional scale, the present dataset provides good chronological coverage prior to ~3000 cal BP, although some regions, such as Mesopotamia, Anatolia and Iran, show a research-driven drop in the number of available radiocarbon dates from ~4000–3500 cal BP onwards, as recently shown54.
Regarding spatial bias, 40% of the radiocarbon dates come from the southern Levant (modern Israel and the West Bank). This is due to the larger research budgets of Israeli archaeological teams interested in producing or refining an absolute chronology. Additionally, certain chronological periods are more likely to be sampled than others. In the southern Levant, a clear research bias stems from the interest of many archaeologists in refining the chronology of the Early Bronze Age sub-periods (ca. 3800–2500 BC) and the Late Bronze Age/Iron Age transition (ca. 1200–950 BC)55,56,57,58,59,60.
East Asia
P3k14c has 5,818 radiocarbon ages for East Asia (China, Japan, and Korea). There are no comprehensive radiocarbon databases for the area. Most of the data come from mainland China (4,285), many come from Japan (1,433), and few come from Taiwan or South Korea (3). There are only three contributing data sources for this region17,61,62.
For China, the Chinese Academy of Social Sciences publishes compilations of all radiocarbon dates at roughly 30-year intervals63. The dataset we present here relies on dates compiled for a meta-analysis of settlement site density in the People’s Republic of China61, which incorporated datasets from the Chinese Academy of Social Sciences. Regarding scope, the earliest dates in this database reach as far back as 40,000 BP. We also drew on a second database focused on radiocarbon dates obtained directly on crop remains or associated with the introduction of crops17. The earliest dates in this database are roughly 9000 BP. Because of its focus on published crop remains, this database does not present a comprehensive view of radiocarbon dates across East Asia. Taken together, the dataset we present here should represent a relatively comprehensive set of dates from the People’s Republic of China, which is our area of expertise and which has been the focus of previous data compilations. Dates missing for the People’s Republic of China should be limited to those produced over the past 8 years.
For Japan, a previous study64 gathered data from eastern Japan to study Jomon demography between 7,000 and 3,000 BP; the uncalibrated ages therefore range from 7,430 to 2,500 14C BP. The authors only included ages measured by accelerator mass spectrometry (AMS), allowed only a narrow δ13C range, and removed all marine samples. The data are therefore highly concentrated in three provinces of Japan over a fairly narrow temporal range.
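The selection criteria described for the Jomon dataset amount to a simple record filter. The following is a minimal sketch, not the original authors' code: the field names (`method`, `d13c`, `marine`) are hypothetical, and because the exact δ13C bounds are not stated here, they are left as required parameters rather than guessed.

```python
from typing import Iterable, List


def select_dates(records: Iterable[dict], d13c_min: float, d13c_max: float) -> List[dict]:
    """Keep AMS dates whose d13C lies in [d13c_min, d13c_max], excluding marine samples.

    Field names are hypothetical placeholders for illustration.
    """
    kept = []
    for rec in records:
        if rec.get("method") != "AMS":
            continue  # drop conventional (radiometric) measurements
        d13c = rec.get("d13c")
        if d13c is None or not (d13c_min <= d13c <= d13c_max):
            continue  # drop records with a missing or out-of-range d13C value
        if rec.get("marine", False):
            continue  # drop marine samples, which are affected by the marine reservoir effect
        kept.append(rec)
    return kept


# Made-up records; the bounds below are illustrative, not the published criteria.
records = [
    {"method": "AMS", "d13c": -25.1, "marine": False},
    {"method": "conventional", "d13c": -24.0},
    {"method": "AMS", "d13c": -15.0},
    {"method": "AMS", "d13c": -24.0, "marine": True},
]
print(select_dates(records, d13c_min=-30.0, d13c_max=-20.0))
```

Only the first record survives: the second is not AMS, the third falls outside the δ13C bounds, and the fourth is marine. Such stacked filters explain why the resulting data are so spatially and temporally concentrated.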
The dataset we present here lacks comprehensive data from Taiwan, the rest of Japan, and Korea. Southeast Asia, Central Asia and South Asia are also poorly represented, as these regions fall outside our expertise and there are currently no preexisting datasets for them.
Europe
P3k14c has 77,393 radiocarbon dates from Europe. Although Europe has been the focus of numerous regional and pan-regional surveys, there is currently no single comprehensive database of 14C measurements. The p3k14c dataset collates all major existing resources, supplemented by regional assessments of the published and unpublished grey literature provided by several of the co-authors41,42,45,47,48,65,66,67,68,69,70,71,72,73.
Temporally speaking, the vast majority of dates are 2000 calBP or younger, primarily from three distinct resources covering Norway, Finland, and the UK and Ireland. Though this does not entirely reflect the variety of local research traditions, it does emphasise that, in many respects, the collation of 14C dates has been largely driven by studies focusing on prehistoric, and especially later prehistoric (Mesolithic, Neolithic and, to a lesser extent, Bronze Age), periods. From a spatial point of view, the European dataset is characterised by marked discrepancies in the number of dates and sites per country, but also in the average number of dates per site (see the KDE analysis above). The latter trend results from the recent multiplication of studies combining extensive sampling and Bayesian statistics. While such studies arguably provide exceptional site-level records, they also create major regional imbalances, in which a minority of intensively dated sites coexists with a majority of sparsely investigated sites, hampering regional contextualisation (e.g.74).
Given the long history of research and early use of 14C in Europe, pre-existing compilations often overlap extensively in content, although the recording of site names, cultural attributions, and locational data sometimes differs significantly across resources. It is frequently difficult to identify the quality-control criteria used in database compilation and management (e.g.67), though recent efforts to improve and share auditing methods are noticeable75,76. As a result, the available raw data contain redundant and/or conflicting information, which we addressed through a combination of automated and qualitative assessment of the evidence (see above).
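The automated part of resolving redundant records can be pictured as grouping entries by a normalized laboratory code and separating exact duplicates from conflicting ones. This is a hypothetical sketch, not the logic of the p3k14c scrubbing scripts: the field names (`lab_id`, `age`, `error`) and the normalization rule are assumptions. Conflicting groups are returned separately rather than resolved, since they are the cases that still call for qualitative assessment.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize_lab_code(code: str) -> str:
    """Uppercase and drop whitespace/hyphens so 'Beta-1234' and 'BETA 1234' match."""
    return re.sub(r"[\s-]", "", code).upper()


def merge_duplicates(records: List[dict]) -> Tuple[List[dict], Dict[str, List[dict]]]:
    """Split records into merged unique dates and conflicting groups for manual review."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        groups[normalize_lab_code(rec["lab_id"])].append(rec)

    merged: List[dict] = []
    conflicts: Dict[str, List[dict]] = {}
    for code, recs in groups.items():
        measurements = {(r["age"], r["error"]) for r in recs}
        if len(measurements) > 1:
            # Same lab code but a different age or error: flag it instead of guessing.
            conflicts[code] = recs
        else:
            # Every copy reports the same measurement: keep the first copy.
            merged.append(recs[0])
    return merged, conflicts


# Made-up records for illustration.
records = [
    {"lab_id": "Beta-1234", "age": 3000, "error": 30},
    {"lab_id": "BETA 1234", "age": 3000, "error": 30},
    {"lab_id": "OxA-10", "age": 5000, "error": 40},
    {"lab_id": "oxa 10", "age": 5100, "error": 40},
]
merged, conflicts = merge_duplicates(records)
print(len(merged), sorted(conflicts))
```

Keying on the laboratory code rather than on site names is a deliberate choice here, because, as noted above, site names and locations are the fields that vary most across resources.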
North America
P3k14c contains 64,933 radiocarbon ages from North America, divided between the USA (56,612) and Canada (8,322). We have included all archaeological data from the Canadian Archaeological Radiocarbon Database (CARD) and supplemented them with data collected through the NSF-funded project Populating a Radiocarbon Database of North America (PI: Robert L. Kelly), which compiled data from the lower 48 United States. The UWyo2021 dataset benefitted from several existing, albeit smaller, collection efforts whose results were generously shared, and from a number of state radiocarbon databases77. The project team also searched for dates through open Google searches, manual and digital searches of journals (including all the state journals and bulletins they could obtain), and searches of State Historic Preservation Office (SHPO) records where possible. Data from CARD are submitted voluntarily by researchers and undergo only minimal review. There is no age cutoff for this region, and researchers working on earlier time periods should personally verify the earlier dates.
In the USA, the number of archaeological dates differs widely by region and state, from >12,500 for California to <225 for New Hampshire, reflecting not just state size but also the intensity of research and cultural resource management (CRM). Relative to the number of sites recorded in SHPO files, some states (California, Texas, Wyoming, Pennsylvania, Washington, Florida, Oregon, Ohio, Illinois) are better sampled than others, while some (Arkansas, Idaho, South Carolina, North Carolina) are relatively under-sampled, although not poorly sampled. These differences are due to uneven access to unpublished data and grey literature. The dates may also reflect investigator bias (e.g., a focus on research surrounding the adoption and spread of agriculture).
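One simple way to express the comparison between dates and recorded sites is each state's dates-per-site ratio, scaled by the ratio for all states pooled. The sketch below is an assumption about how such a comparison could be computed, not the procedure used for this dataset; the function name is hypothetical and the example counts are made up.

```python
from typing import Dict


def relative_sampling(dates: Dict[str, int], sites: Dict[str, int]) -> Dict[str, float]:
    """Dates per recorded site in each state, divided by the pooled dates-per-site ratio.

    Values above 1 suggest a state is better sampled than the pooled average;
    values below 1 suggest relative under-sampling.
    """
    pooled = sum(dates.values()) / sum(sites[state] for state in dates)
    return {state: (dates[state] / sites[state]) / pooled for state in dates}


# Made-up counts for two hypothetical states, for illustration only.
print(relative_sampling({"State A": 300, "State B": 100}, {"State A": 1000, "State B": 1000}))
```

A ratio like this only controls for the number of recorded sites; it cannot separate differences in access to grey literature from genuine differences in how often sites were dated.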
The data from the United States and Canada have obfuscated locational information (see the section on the Obfuscation of precise coordinates for certain dates) and are freely available through tDAR, GitHub, and Zenodo. Data with more precise locational information are also available through tDAR under restricted access; however, the UWyo2021 dataset only provided county-centroid locations. Researchers who require more precise locations than county/division centroids may still obtain them by contacting the relevant State Historic Preservation Office (SHPO).
Central America
P3k14c contains 1,218 radiocarbon ages from Central America. Until recently, radiocarbon dates published for sites across Mesoamerica had not been compiled and organized in any comprehensive manner. Two recent dates-as-data studies78,79 compiled published dates from the Maya lowlands to identify social and political developments associated with climatic change. These studies, and subsequent compilations of published 14C dates, have led to the creation of the Mesoamerican Radiocarbon Database (MesoRAD)80, which is the only dataset contributing to p3k14c’s Central American sample. MesoRAD represents the largest compilation of published data from the Archaic to the Colonial periods in Mesoamerica.
Chronologically, all dates identified in the literature were included in the database, with the earliest secure dates as early as 9785–9290 cal BCE and the most recent associated with modern landscape disturbance (e.g., plowing and burning). Despite taphonomic loss and time depth, the data show good coverage for the Preclassic/Formative period (1200 cal BCE to 300 CE), owing to research agendas focused on the timing of the origins of village life across the region (e.g.79,81,82,83). The number of dates increases from the Early Classic period (300 to 600 CE) to the Late Classic period (600 to 750 CE), in step with the large-scale population increase recorded across the region. Conversely, the frequency of 14C dates drops at the transition from the Late Classic to the Terminal Classic period (~750 CE), concomitant with identified population declines associated with what is commonly described as the ‘Classic Maya collapse’. Several well-dated Postclassic sites, such as Mayapan84, represent the largest concentrations of Postclassic dates in the sample, while other regions, such as northern Belize and the Peten Lakes, include fewer Postclassic dates. Finally, there are fewer dates associated with contexts after the European conquest.
Spatially, large gaps continue to exist where no 14C dates have yet been compiled, including central Mexico, Oaxaca, and the Gulf Coast region, as well as other parts of Mesoamerica and Central America.
South America
P3k14c contains 7,668 radiocarbon ages from South America. No comprehensive radiocarbon database currently exists for the entire continent, but a few databases have been compiled with very specific temporal and spatial constraints and goals. The South American data are derived from existing databases (e.g.85,86,87,88,89,90) and from papers that publish large numbers of dates (e.g.91,92,93). We are aware that many of these datasets are incomplete and largely outdated, particularly because South America has witnessed a surge in radiocarbon dating over the last two decades, but also because initial compilations of radiocarbon dates were regionally and temporally biased.
Regarding temporal biases, this dataset does not include dates older than 15,000 BP, and we remain skeptical of human occupations dating earlier than this. A recent continent-wide review of late Pleistocene and early Holocene dates for the timing of the occupation of South America93 includes older dates as well as cautionary notes on their use. Regarding recent dates, we do not apply a cut-off point, but many of the databases we rely on do not include dates younger than 2000 BP. Furthermore, many researchers working in South America do not radiocarbon date materials that post-date the European conquest, approximately 500 years ago86. Therefore, coverage of the late Holocene record is limited.
We recognize that the coverage of this dataset is imperfect. We are better able to identify omissions in regions within our general expertise (including central Argentina, Bolivia, Chile, southern Peru, and southeastern Brazil), and recognize that northern South America may be poorly covered and could include significant oversights. For Bolivia, because many of the existing compilations contain largely the same dates as well as various errors, we have relied on a countrywide review of radiocarbon dates94 for updated and corrected information about site locations, lab codes, dates, etc.
Finally, we are developing a new synthetic continental-scale database based on the primary literature. This work is still in progress; in the meantime, the present dataset may be suitable for researchers interested in a broad sweep of data.
Usage notes for R users
We provide an R package called p3k14c to facilitate access to the scrubbed/fuzzed dataset for R users and to make the quality-analysis, table-generation and figure-generation code available (see the Code Availability section below). The R package is available as a GitHub repository (https://github.com/people3k/p3k14c), and the version of the package used to create the analyses reported here is archived on Zenodo95. Users interested in accessing the p3k14c data should refer to the package documentation in R and to the README.md file in the GitHub repository.
Python code for scrubbing radiocarbon data
We provide in full the suite of Python 3.7 scripts used to process the dataset as of the time of submission. The code is hosted under version control on GitHub (https://github.com/people3k/p3k14c-data-scrubbing) and archived on Zenodo96. The suite's README.md contains complete replication steps, usage instructions, and an explanation of the suite's structure. In addition, every block of code is accompanied by comments explaining its function, for users who wish to modify the programs or understand the suite in finer detail.
The suite consists of three main scripts. The primary script, scrub.py, accepts unprocessed radiocarbon records and performs the scrubbing procedures specified in prior sections. It relies on removeDuplicates.py, which contains the duplicate-handling routines and can also be run independently if the user only wants to handle a dataset’s duplicates without scrubbing it first. Further, fuzz/fuzz.py converts all US, Canadian, and GuedesBocinsky2018 spatial coordinates to county centroids, province centroids, and truncated coordinates, respectively. Lastly, the scripts used for the Unicode character-correction procedure are located in the charfix directory. An Anaconda environment.yml file is provided so that a consistent environment with the correct package versions can be created easily.
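The three coordinate-fuzzing rules attributed to fuzz/fuzz.py can be sketched as follows. This is not the code in that script: the record fields (`country`, `source`, `county`, `province`, `lat`, `lon`), the centroid lookup tables, and the number of decimal places kept when truncating are all hypothetical. Truncation goes through `decimal` with `ROUND_DOWN` so that coordinates are cut toward zero without binary floating-point artifacts.

```python
from decimal import ROUND_DOWN, Decimal
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def truncate(value: float, places: int) -> float:
    """Cut a coordinate toward zero to `places` decimal places (never rounds up)."""
    quantum = Decimal(1).scaleb(-places)  # e.g. 0.1 for places=1
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_DOWN))


def fuzz_record(record: dict,
                county_centroids: Dict[str, Coord],
                province_centroids: Dict[str, Coord],
                places: int) -> dict:
    """Return a copy of `record` with its coordinates coarsened according to its origin."""
    out = dict(record)  # leave the input record untouched
    if record["country"] == "USA":
        out["lat"], out["lon"] = county_centroids[record["county"]]
    elif record["country"] == "Canada":
        out["lat"], out["lon"] = province_centroids[record["province"]]
    elif record.get("source") == "GuedesBocinsky2018":
        out["lat"] = truncate(record["lat"], places)
        out["lon"] = truncate(record["lon"], places)
    return out  # records from other sources keep their original coordinates


# Made-up record and centroid, for illustration only.
rec = {"country": "USA", "county": "County X", "lat": 41.3, "lon": -105.6}
print(fuzz_record(rec, {"County X": (40.0, -100.0)}, {}, places=1))
```

Returning a copy rather than editing records in place keeps the precise coordinates available for restricted-access releases while the fuzzed copy is published.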
R code for quality analyses
A research compendium, complete with R code, to run the quality analyses and produce the figures and tables presented here is available as part of the p3k14c R package on GitHub (https://github.com/people3k/p3k14c) and archived on Zenodo95. Code was run with R version 4.0.5 in RStudio; other details of the runtime environment are available in the colophon of the research compendium. This package also includes the scrubbed/fuzzed data, site count data, and an executable paper that recreates the figures and tables in this publication.

