Data
Climate and groundwater data
Rainfall data at the daily time scale from 1901 to 2009 (109 years), and at a spatial resolution of 1° by 1°, are available from the Indian Meteorological Department (IMD)19. Temperature data at the daily time scale from 1969 to 2005, and at the same spatial resolution of 1° by 1°, are also available from IMD20. To augment the temperature data and match the time period of the rainfall data, we used data simulated from the daily climatology (mean and standard deviation of daily temperature) as a proxy for the 1901–1968 and 2006–2009 spans. From the daily minimum, mean, and maximum temperature, the latitude, and the extra-terrestrial solar radiation, we computed the daily reference crop evapotranspiration (ET0) for each 1° by 1° grid cell using the Hargreaves method26. The Hargreaves method for estimating ET0 is typically employed in regions where data availability is limited to air temperature21. The 1° by 1° climate grids are spatially interpolated to 586 districts in India based on a 2001 district boundary layer to create a district-level, 109-year daily time series of rainfall and ET0. We then use these district-level rainfall and ET0 data to estimate crop-specific water deficit.
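The ET0 step can be illustrated with a minimal sketch. It assumes the commonly used Hargreaves form, ET0 = 0.0023 (Tmean + 17.8) (Tmax − Tmin)^0.5 Ra, with the extra-terrestrial radiation Ra computed from latitude and day of year using the FAO-56 expressions. The function names and the example temperatures are illustrative assumptions, not values from this study, and the exact variant used here may differ.

```python
import math

GSC = 0.0820  # solar constant in MJ m^-2 min^-1 (FAO-56 value)

def extraterrestrial_radiation(lat_deg, day_of_year):
    """Daily extra-terrestrial radiation Ra in MJ m^-2 day^-1 (FAO-56 form)."""
    phi = math.radians(lat_deg)
    # Inverse relative Earth-Sun distance and solar declination
    dr = 1.0 + 0.033 * math.cos(2.0 * math.pi * day_of_year / 365.0)
    delta = 0.409 * math.sin(2.0 * math.pi * day_of_year / 365.0 - 1.39)
    # Sunset hour angle (valid away from polar latitudes)
    omega_s = math.acos(-math.tan(phi) * math.tan(delta))
    return (24.0 * 60.0 / math.pi) * GSC * dr * (
        omega_s * math.sin(phi) * math.sin(delta)
        + math.cos(phi) * math.cos(delta) * math.sin(omega_s)
    )

def hargreaves_et0(t_min, t_mean, t_max, lat_deg, day_of_year):
    """Reference evapotranspiration ET0 in mm/day; temperatures in deg C."""
    # 0.408 converts MJ m^-2 day^-1 to equivalent evaporation in mm/day
    ra_mm = 0.408 * extraterrestrial_radiation(lat_deg, day_of_year)
    t_range = max(t_max - t_min, 0.0)
    return 0.0023 * (t_mean + 17.8) * math.sqrt(t_range) * ra_mm

# Hypothetical mid-July day for a grid cell at 20 deg N
et0 = hargreaves_et0(t_min=25.0, t_mean=30.0, t_max=35.0,
                     lat_deg=20.0, day_of_year=196)
```

Only temperature and latitude enter the calculation, which is why the method suits a record where no other daily climate variables are available.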
India’s Central Ground Water Board (CGWB) estimates groundwater extraction from the number of wells and a uniform assumed extraction for each type of well27,28. The average depth to groundwater level for each district is computed from these data. State-wide percentage coverage of irrigated area for major crops is available from the Directorate of Economics and Statistics, Ministry of Agriculture, India (DACNET)5.
The district-level average annual rainfall, its inter-annual coefficient of variation, the average depth to groundwater level estimated based on the CGWB data, and the state-wide percentage total irrigation area coverage under all the crops are shown in Fig. S1 of the supplemental material.
Agricultural data
The Directorate of Economics and Statistics, Ministry of Agriculture, India5 hosts the Indian Harvest Database. We selected twelve MSP-supported primary Kharif season crops and obtained their respective cultivated areas and yields at the district level. The Kharif season spans from June to September, and is the predominant rainfall season that accounts for 90% of the annual rainfall. We grouped the crops into cereals (rice, bajra, maize, jowar, and ragi), pulses (tur and other pulses), and oilseeds (groundnuts, sesamum, soybean, nigerseed, and sunflower). The cereals and pulses selected here together comprise about 98% of the total food grains produced in the Kharif season5. The five chosen oilseeds account for about 93% of the total oilseeds produced in the Kharif season.
For each crop variety, we determine the potential yield under experimental conditions. For the potential yield for cereals, we used national average yields based on on-farm research demonstrations over 13 years29. For pulses, we used the potential yields reported by the Expert Committee Report on Pulses (TMOP)/MOA30. Since we were unable to obtain such estimates for oilseeds, we used the maximum actual yield across all districts over the past 15 years as the potential yield for oilseeds under full irrigation.
We also accessed the current seasonal agricultural production, the minimum support prices, and the cost of production for each of these crops as of the 2018 Kharif season. The cost of production covers all the tangible expenses incurred by the owner, i.e., the interest on the value of owned lands and fixed capital assets; the rental value of owned land, and credited value of fixed capital assets in addition to the direct costs (seeds, fertilizers, irrigation, labor, etc.)5. The cost of production data is available at the state level. In this study, we used the average of the previous three cropping years (2013–2014, 2014–2015, and 2015–2016) as the estimate for the cost of production. For the states where this data is not available, we assume a national average per crop. Further, we assume the same cost for all the districts in a state. The Ministry of Agriculture of the Government of India announces the minimum support price (MSP) at the beginning of each season based on the Commission for Agricultural Costs and Prices recommendations. All these details are provided in Tables S2 and S3 of the supplemental material.
The 2001 estimates of the population for each district are obtained from the Census of India31. The Indian Census data on population is available every ten years beginning 1872.
The nutrient composition of each crop variety is obtained from the United States Department of Agriculture (USDA) National Nutrient Database32. The USDA Nutrient Database is a major source of food composition data in the United States and has information for 7636 food items. The recommended daily intakes of nutrients for various groups of people, particularly in developing countries, are obtained from the Food and Agriculture Organization of the United Nations (FAO) database. It provides safe levels of intake for a variety of nutrients for different gender and age groups. Safe levels of consumption are the levels that maintain health and nutrient stores in healthy individuals within a group. Further, these recommended intakes provide sufficient amounts of nutrients for prevention of deficiency disease, for growth and healthy maintenance of the body, and for optimum levels of activity33. These details are provided in Table S4 of the supplemental information.
Models
Estimating crop water deficit and yields
For each crop, we first calculate the Kharif season crop water deficits using the daily 109-year time series of rainfall and ET0. The daily deficit, estimated as the difference between the daily potential crop water requirement and the renewable water supply, is accumulated over the entire season. The maximum accumulated deficit over the season is taken as the seasonal crop water deficit, which may lead to a reduction in crop yield if irrigation is not provided.
A fraction of daily rainfall is assumed to be available as water supply for each day.
$${S}_{j,t,d}=\alpha * {P}_{j,t,d}$$
(1)
\({P}_{j,t,d}\) is the rainfall over a district j, for a year t, and a day d. \(\alpha\) is the parameter that determines the fraction of rainfall that can be utilized by crops. For this analysis, we set \(\alpha\) to 0.76.
For each crop, we estimate the daily water use based on the expected growth stage and evapotranspiration.
$${D}_{i,j,t,d}={\left[{k}_{c}\right]}_{i,d}* {\left[{{ET}}_{0}\right]}_{j,t,d}$$
(2)
\({\left[{k}_{c}\right]}_{i}\) is the crop coefficient for crop i. It is the ratio of the actual evapotranspiration (\({{ET}}_{a}\)) under non-stressed conditions to the reference crop evapotranspiration (\({{ET}}_{0}\)). It represents crop-specific water use at various growth stages of the crop, and is typically derived empirically based on local climatic conditions34. The total crop water requirement over the entire season of \({n}_{s}\) days (approximately 120 days) is:
$${{CWR}}_{i,j,t}=\mathop{\sum }\limits_{d=1}^{{n}_{s}}{D}_{i,j,t,d}$$
(3)
The accumulated deficit over a season is given as:
$${{deficit}}_{i,j,t,d}=\max \left(0,{{deficit}}_{i,j,t,d-1}+{D}_{i,j,t,d}-{S}_{j,t,d}\right),\ {\rm{where}}\ {{deficit}}_{i,j,t,d=0}=0$$
(4)
and the seasonal crop water deficit is:
$${{SD}}_{i,j,t}=\mathop{\max }\limits_{d}\left({{deficit}}_{i,j,t,d}\right)$$
(5)
The seasonal crop water deficit (\({{SD}}_{i,j,t}\)) focuses on rainfall distribution within the season relative to the crop water demand. It, therefore, accounts for the timing of planting, different stages of crop growth, and the timing and distribution of rainfall in the season, and hence, can discriminate between two years that have the same total rainfall but differ in their daily distribution. For instance, one year may have rainfall distributed uniformly over the season through modest rainfall events, while the other may have a few intense rain events separated by extended dry periods. The latter has an immediate and adverse effect on the crop growth and hence the yield.
Using \({{SD}}_{i,j,t}\), the seasonal crop water deficit, we estimate the crop yield.
$${Y}_{i,j,t}=\left(1-\frac{(1-{\eta }_{i,j})* {{SD}}_{i,j,t}}{{{CWR}}_{i,j,t}}\right)* {{PY}}_{i}$$
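The contrast between the two kinds of year can be made concrete with a minimal sketch of Eqs. (1), (4), and (5). The constant daily demand of 5 mm and the two synthetic rainfall series are illustrative assumptions; both series total 720 mm over a 120-day season.

```python
def seasonal_deficit(rain_mm, demand_mm, alpha=0.76):
    """Maximum accumulated daily deficit over a season (Eqs. 1, 4, 5)."""
    deficit = 0.0
    seasonal_max = 0.0
    for p, d in zip(rain_mm, demand_mm):
        supply = alpha * p                        # Eq. (1): usable rainfall
        deficit = max(0.0, deficit + d - supply)  # Eq. (4): running deficit
        seasonal_max = max(seasonal_max, deficit) # Eq. (5): seasonal maximum
    return seasonal_max

n_days = 120
demand = [5.0] * n_days  # hypothetical k_c * ET0, mm/day

# Same seasonal total (720 mm), different daily distribution
uniform_rain = [6.0] * n_days
clustered_rain = [180.0 if d % 30 == 0 else 0.0 for d in range(n_days)]

cwr = sum(demand)  # Eq. (3): seasonal crop water requirement
sd_uniform = seasonal_deficit(uniform_rain, demand)
sd_clustered = seasonal_deficit(clustered_rain, demand)
```

Under these assumptions the uniform year gives a seasonal deficit of about 53 mm and the clustered year about 185 mm, although the rainfall totals are identical. The gap arises because the recursion resets at zero: rainfall in excess of the accumulated demand on a wet day is not carried forward to offset later dry spells.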
(6)
\({{SD}}_{i,j,t}\) and \({{CWR}}_{i,j,t}\) are the seasonal crop water deficit and the total crop water requirement estimated for each crop i, in a district j, for a year t. \({\eta }_{i,j}\) is the irrigation potential for crop i and district j. \({{PY}}_{i}\) is the potential yield for crop i, defined as the maximum achievable yield when the crop is cultivated under non-stress conditions with full irrigation and nutrient supply. For \({\eta }_{i,j}\), we used the state-wide percentage coverage of irrigated area for these crops that is available from DACNET5. Details for all the states are provided in Table S5 of the supplemental information. We use the maximum fraction across crops in a district as the irrigation potential for all the crops in that district. As an example, for the districts in Punjab, the percent area irrigated under rice and maize are 97% and 64%, respectively. In the optimization model, we assume that all the crops in Punjab can be irrigated up to 97%. This fraction of the seasonal crop water deficit can be supplied through irrigation, and hence, if the irrigation potential is close to 1, the estimated yield \({Y}_{i,j,t}\) approaches the potential yield \({{PY}}_{i}\). The expected value of the estimated yield is used in the crop allocation model.
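A minimal sketch of Eq. (6) shows how the irrigation potential moves the estimated yield between the rainfed case and the potential yield. The deficit, water requirement, and potential yield used below are illustrative assumptions, not values from this study.

```python
def estimated_yield(sd_mm, cwr_mm, eta, potential_yield):
    """Eq. (6): potential yield reduced by the unirrigated share of the deficit."""
    unmet_fraction = (1.0 - eta) * sd_mm / cwr_mm
    return (1.0 - unmet_fraction) * potential_yield

# Hypothetical season: 184.6 mm deficit, 600 mm requirement, 4000 kg/ha potential
sd, cwr, py = 184.6, 600.0, 4000.0

y_punjab_like = estimated_yield(sd, cwr, eta=0.97, potential_yield=py)
y_rainfed = estimated_yield(sd, cwr, eta=0.0, potential_yield=py)
```

With an irrigation potential of 0.97 the estimated yield stays within about 1% of the potential yield, while with no irrigation potential the same deficit removes about 31% of it.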
Crop allocation model
Our crop allocation model is developed using linear programming. With an objective to maximize the aggregate national agricultural revenue, the model determines feasible regions and crop choices across India for the Kharif season while trying to satisfy a set of linear constraints.
We define aggregate national agricultural revenue as the difference between the expected value of the total income from all the crops cultivated in the season in all the districts and the cost of cultivation of these crops, including the cost of irrigation. We impose the following constraints on the model.
1. A district-level upper bound on the total cropped area.
2. A district-level upper bound on the total irrigation water.
3. A national food security constraint in terms of target production of the crops.
4. Target nutritional requirements recommended for the entire population.
In addition to the district-level irrigation constraint, Eq. (6) also serves as an implicit water sustainability constraint in the model. As explained in the previous section, we estimate crop yield as the reduction from potential yield due to the crop water deficit that cannot be supplied through irrigation. Hence, in districts where irrigation potential is close to zero, the yield loss resulting from crop water deficit is higher for crops that require more water through the season (e.g., rice) than for crops that require less water through the season (e.g., pulses). Consequently, the annual revenue generated from a crop with high water requirements in such a district is lower than the revenue generated from a crop with low water requirements. Further, the yield loss that results from crop water deficit is higher for districts in arid regions that cannot provide irrigation than for districts in humid regions. Hence, the model would identify suitable crops for districts according to their climatic patterns.
The model is formally presented below.
Objective function
The goal is to maximize the expected net national agricultural revenue
$$O=\mathop{{\rm{E}}}\limits_{t}\left[\mathop{\sum }\limits_{j=1}^{{n}_{d}}\left(\begin{array}{c}\left(\mathop{\sum }\limits_{i=1}^{{n}_{c}}{\delta }_{i,j}* \left({{MSP}}_{i,j}-{{CP}}_{i,j}\right)* {Y}_{i,j,t}* {a}_{i,j}\right)\\ -{{CI}}_{j}* \left(\mathop{\sum }\limits_{i=1}^{{n}_{c}}\left(\frac{1}{{\beta }_{i}}* {\eta }_{i,j}* {{SD}}_{i,j,t}* {\delta }_{i,j}* {a}_{i,j}\right)* {\psi }_{1}* g* {h}_{j}* \frac{1}{{\mu }_{p}}\right)* {\psi }_{2}\end{array}\right)\right]$$
(7)
\({\delta }_{i,j}\) is the indicator function that determines the suitability of crop i in district j. While this is typically determined using soil characteristics and temperature profile, we estimate it from the historical crop cultivation data in this study. If crop i was cultivated in district j at least five times in the past, we assume that the district is suitable for this crop, i.e., \({\delta }_{i,j}=1\). \({{MSP}}_{i,j}-{{CP}}_{i,j}\) is the net profit (INR/kg) resulting from crop i in a district j. \({{MSP}}_{i,j}\) and \({{CP}}_{i,j}\) are the minimum support price and the cost of cultivation, respectively. These returns can be based either on the government-announced minimum support prices, which are constant across the whole country, or on the market prices, which can vary by district. The cost of production typically varies by crop across the country. \({Y}_{i,j,t}\) represents the yield (kg/ha) estimated from crop water deficit for crop i in district j for a year t (see Eq. (6)). \({a}_{i,j}\) is the decision variable, i.e., the area (ha) allocated for each crop i in district j. \({{CI}}_{j}\) is the electricity cost charged for irrigation. We assumed a nominal national flat charge of INR 3/kWh. The average agricultural power tariff in 2011 was around INR 1.5/kWh18,35. The term \(\frac{1}{{\beta }_{i}}* {\eta }_{i,j}* {{SD}}_{i,j,t}* {\delta }_{i,j}* {a}_{i,j}\) is the total irrigation water pumped for crop i in district j. It includes an irrigation efficiency factor \({\beta }_{i}\) to adjust for additional losses due to application efficiency. For rice, we assumed a 30% efficiency (due to its flood irrigation practice). For the other 11 crops, we assumed a 75% irrigation efficiency36. \({\psi }_{1}\) is the conversion factor from volume to mass. Since \({{SD}}_{i,j,t}\) is in units of millimeters, and \({a}_{i,j}\) is in units of hectares, \({\psi }_{1}=\frac{1}{1000}\,({\rm{m}})* 10{,}000\,({{\rm{m}}}^{2})* 1000\,\left(\frac{{\rm{kg}}}{{{\rm{m}}}^{3}}\right)\), i.e., 10,000 kg per millimeter-hectare. \(g\) is the acceleration due to gravity on earth, 9.81 m/s². \({h}_{j}\) is the average depth (in meters) to groundwater level in district j from where water is extracted for irrigation. District-level data for average depth to groundwater levels are available from the Central Ground Water Board (CGWB). \({\mu }_{p}\) is the coefficient to account for pump efficiency. We assumed that pump efficiency in all the districts is 30% based on the efficiencies reported in various Indian states37,38. Finally, \({\psi }_{2}\) is the conversion factor from joules to kWh, \(1/3{,}600{,}000\).
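The irrigation cost term of Eq. (7) can be traced through its unit conversions with a minimal sketch. The function name and the example inputs (a 100 mm deficit on one hectare with water lifted from 10 m) are illustrative assumptions, and the crop is taken to be suitable for the district (suitability indicator equal to 1).

```python
G = 9.81                                    # gravity, m/s^2
PSI_1 = (1.0 / 1000.0) * 10_000.0 * 1000.0  # kg of water per mm·ha
PSI_2 = 1.0 / 3_600_000.0                   # kWh per joule

def pumping_cost_inr(sd_mm, area_ha, eta, beta, depth_m,
                     tariff_inr_per_kwh=3.0, pump_eff=0.30):
    """Electricity cost (INR) of lifting the irrigation water for one crop."""
    # Water pumped, in mm·ha, inflated by the application efficiency beta
    water_mm_ha = (1.0 / beta) * eta * sd_mm * area_ha
    # Potential energy m * g * h, divided by pump efficiency, in joules
    energy_j = water_mm_ha * PSI_1 * G * depth_m / pump_eff
    return tariff_inr_per_kwh * energy_j * PSI_2

# Hypothetical non-rice crop (beta = 0.75) with full irrigation potential
cost = pumping_cost_inr(sd_mm=100.0, area_ha=1.0, eta=1.0,
                        beta=0.75, depth_m=10.0)
```

For these inputs the energy use is about 121 kWh and the cost about INR 363 per hectare. Because the cost scales linearly with depth and inversely with both efficiencies, deep water tables and flood-irrigated rice are penalized most heavily in the objective.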
The operator \(\mathop{{\rm{E}}}\limits_{t}[\cdot ]\) denotes the expectation of the objective function over the years t, and \({n}_{c}\) and \({n}_{d}\) are the number of crops for the season (12) and the number of districts (586) in the country, respectively.
Constraints
We group the constraints into three categories: (a) area and location constraints, (b) irrigation constraints, and (c) food security and nutritional constraints.
The area and location constraints prescribe the maximum area allocated for agriculture in a given district and the suitability of the type of crop in that district.
$$0\le \mathop{\sum }\limits_{i=1}^{{n}_{c}}{\delta }_{i,j}* {a}_{i,j}\le {{TCA}}_{j},\ \forall \,j$$
(8)
\({{TCA}}_{j}\) is the total Kharif season cropped area for the selected crops in each district j. The area and location constraint ensures that the allocated crop acreage is within the maximum possible cropped area in a given district.
The irrigation constraint imposes a sustainable limit: the irrigation water applied in a district is restricted to be no more than 15% of the average annual rainfall. We assume that 15% is the percentage of average annual precipitation that recharges groundwater, a reasonable assumption for subhumid to humid regions39,40. This quantity is available as renewable groundwater.
$$0\le \mathop{{\rm{E}}}\limits_{t}\left[\mathop{\sum }\limits_{i=1}^{{n}_{c}}\frac{1}{{\beta }_{i}}* {\delta }_{i,j}* {\eta }_{i,j}* {{SD}}_{i,j,t}* {a}_{i,j}\right]\le \lambda * {\bar{P}}_{j}* {A}_{j},\ \forall \,j$$
(9)
\({A}_{j}\) is the net cropped area, and \({\bar{P}}_{j}\) is the average annual rainfall for district j. We set \(\lambda =0.15\).
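A minimal sketch of the check implied by Eq. (9) for a single district follows. The rainfall, net cropped area, seasonal deficits, and candidate allocations are illustrative assumptions; both sides of the inequality are in millimeter-hectares.

```python
def renewable_water_limit(mean_annual_rain_mm, net_cropped_area_ha, lam=0.15):
    """Right-hand side of Eq. (9), in mm·ha."""
    return lam * mean_annual_rain_mm * net_cropped_area_ha

def irrigation_demand(areas_ha, sd_mm, eta, beta):
    """Left-hand side of Eq. (9) for suitable crops, in mm·ha."""
    return sum(eta * sd * a / b for a, sd, b in zip(areas_ha, sd_mm, beta))

# Hypothetical district: 800 mm mean annual rainfall, 1000 ha net cropped area
limit = renewable_water_limit(800.0, 1000.0)

# Crops ordered as [rice, pulses]; expected seasonal deficits in mm
sd = [184.6, 52.8]
beta = [0.30, 0.75]  # application efficiencies used in the model
eta = 0.97           # district irrigation potential

modest_rice = irrigation_demand([150.0, 400.0], sd, eta, beta)
heavy_rice = irrigation_demand([250.0, 400.0], sd, eta, beta)

modest_ok = modest_rice <= limit
heavy_ok = heavy_rice <= limit
```

In this example each hectare of rice draws roughly 597 mm of pumped water against about 68 mm for pulses, so the allocation with 150 ha of rice fits within the 120,000 mm·ha limit while the one with 250 ha does not.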
The food security constraint ensures that the aggregate produce from different crops is at least as much as the current aggregate produce.
$$\mathop{{\rm{E}}}\limits_{t}\left[\mathop{\sum }\limits_{j=1}^{{n}_{d}}{\delta }_{i,j}* {Y}_{i,j,t}* {a}_{i,j}\right]\ge {Q}_{i},\ \forall \,i$$
(10)
\({Q}_{i}\) is the current national aggregate production of crop i. The number of food security constraints equals the number of crops chosen. This constraint ensures that the agricultural produce resulting from the new allocation is at least equal to the current production of each of these crops.
Lastly, we introduce nutrition targets, since self-sufficiency in terms of aggregate food grains produced does not ensure that nutritional goals are met. Our nutrition constraints ensure that, for a selected set of nutrients, the total nutrient content of the produce is at least as much as the recommended requirement for the population.
$$\mathop{{\rm{E}}}\limits_{t}\left[\mathop{\sum }\limits_{i=1}^{{n}_{c}}\mathop{\sum }\limits_{j=1}^{{n}_{d}}{\delta }_{i,j}* {c}_{ni}* {Y}_{i,j,t}* {a}_{i,j}\right]\ge {N}_{n},\ \forall \,n$$
(11)
\({N}_{n}\) are the nutritional needs of the country’s population corresponding to a suite of nutritional goals such as calories, proteins, and fats. \({c}_{ni}\) is the content of nutrient n (calories, proteins, etc.) in crop i.
This model has \(\left(2{n}_{d}+{n}_{c}+n\right)\) constraints, where n here denotes the number of nutrients considered, and can be solved using any of the traditional linear programming algorithms such as the simplex algorithm41. We used the simplex algorithm available through the lpSolve solver package in R42.
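The structure of the allocation problem can be illustrated with a toy linear program for two districts and two crops. This is a sketch only: it uses SciPy's `linprog` with the HiGHS solver rather than the lpSolve package in R used in this study, every coefficient is a made-up illustrative value, the expectation over years is assumed to be already folded into the coefficients, and the nutrition constraints are omitted for brevity.

```python
from scipy.optimize import linprog

# Decision variables (areas in ha): [rice_d1, pulses_d1, rice_d2, pulses_d2]
# Hypothetical expected net revenue per hectare, irrigation cost included
net_revenue = [10.0, 6.0, 4.0, 8.0]
c = [-r for r in net_revenue]  # linprog minimizes, so negate to maximize

A_ub = [
    [1, 1, 0, 0],      # Eq. (8): cropped area, district 1
    [0, 0, 1, 1],      # Eq. (8): cropped area, district 2
    [300, 50, 0, 0],   # Eq. (9): irrigation water per ha, district 1
    [0, 0, 400, 60],   # Eq. (9): irrigation water per ha, district 2
    [-4, 0, -3, 0],    # Eq. (10): rice production >= 200, sign flipped
    [0, -1, 0, -1.2],  # Eq. (10): pulses production >= 100, sign flipped
]
b_ub = [100, 100, 20000, 10000, -200, -100]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")

allocation = list(res.x)
total_revenue = -res.fun
```

The greater-than-or-equal production constraints are multiplied by −1 so that every row fits the less-than-or-equal form the solver expects. In this toy case the water limit in district 1 caps rice at 60 ha with pulses on the remaining 40 ha, and district 2 is allocated entirely to pulses.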
Scenarios
We considered two scenarios, “Irrigation Capped” and “Irrigation Zero”. The “Irrigation Capped” scenario considers irrigation and has the following constraints: (a) area and location constraints, (b) irrigation constraints, and (c) food security and nutritional constraints. Here we assumed \({\eta }_{i,j}=\mathop{\max }\limits_{i}({\eta }_{i,j})\), i.e., for each district, the irrigation potential for any crop is the maximum irrigation potential in that district. For example, for districts in Punjab, the percent area irrigated under rice is 97%, the largest for any crop in Punjab. We assume that any crop in Punjab can be irrigated to this level. The “Irrigation Zero” scenario considers no irrigation. Here, the model has only area, location, food security, and nutritional constraints. We assumed no irrigation potential for the country, i.e., \({\eta }_{i,j}=0\) for all the crops and districts. The “Irrigation Capped” scenario results in 1190 constraints: 586 district area constraints, 586 district irrigation constraints, 12 production constraints, and six nutritional constraints (energy, proteins, fat, iron, niacin, and folate). The “Irrigation Zero” scenario results in 604 constraints: 586 district area constraints, 12 production constraints, and the same six nutritional constraints.
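The constraint counts for the two scenarios follow directly from the model dimensions, as the short sketch below shows. The function name is an illustrative assumption.

```python
def constraint_count(n_districts, n_crops, n_nutrients, with_irrigation=True):
    """Number of linear constraints in the allocation model."""
    area = n_districts                                  # Eq. (8)
    irrigation = n_districts if with_irrigation else 0  # Eq. (9)
    production = n_crops                                # Eq. (10)
    nutrition = n_nutrients                             # Eq. (11)
    return area + irrigation + production + nutrition

capped = constraint_count(586, 12, 6, with_irrigation=True)
zero = constraint_count(586, 12, 6, with_irrigation=False)
```

The difference between the two scenarios is exactly the 586 district irrigation constraints that drop out when no irrigation is allowed.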

