Beyond Yield Prediction 

The importance of climate and context in tomato production

The yield prediction challenge

Tomatoes are one of the most widely grown fresh produce crops in the world. While the majority are grown in open fields, recent decades have seen an increase in protected or controlled environment production. By utilizing protected environments such as plastic houses, greenhouses, or glasshouses, tomato growers can produce crops for most of the year and better protect against unfavorable weather conditions, pests, and diseases.

A further benefit is that controlled environments enable growers to manipulate environmental factors such as climate, water, and light to improve produce quality and achieve around a 5-fold increase in production, compared to traditional outdoor cultivation.

For tomato growers, predicting yield is a crucial practice that drives both short-term response and long-term strategy within their business. Yield prediction helps growers steer their crop to a desired production outcome, provides a level of certainty on the available produce for sale, and serves as an indicator for supply contract fulfilment.

Yield prediction is difficult and complex, but necessary: the financial impact of over- or under-producing is one of the most significant challenges for today’s growing teams.

While many growers aim for a high level of yield prediction accuracy, they may not achieve it consistently. For example, a greenhouse grower may achieve an average of 85% accuracy overall but experience a week-on-week variation between 65% and 95% accuracy. Such variance is problematic for business operations and forecasting. A common cause of this variance is the occurrence of sudden spikes or dips in the actual yield, called yield swings. Yield swing weeks are the most difficult weeks to predict in any crop cycle. The unexpected low or high yield swings can be the result of biological, environmental, or other external factors that traditional yield prediction modeling may not account for.

Therefore, we need to consider whether chasing the "holy grail" of high accuracy yield prediction is the right approach. If not, then can we identify and describe biological and environmental events leading up to yield swings? And to what degree would taking a more holistic approach to interpreting yield prediction improve confidence in the prediction? These are the critical questions that we need to ask ourselves to maximize yield.

The case for a holistic approach

Yield prediction in tomato cultivation requires a vast amount of historical data, which is used to train artificial intelligence (AI) models to estimate average future yields. However, such a volume of data is not always available. Even when it is, past events may no longer be a good indicator of the future because of changing climate and weather patterns and the emergence, or increasing prevalence and intensity, of pests and diseases.

Growers understand that tomato production is vulnerable to adverse environmental events and other non-environmental factors. These factors include changes in crop biological measures, modifications to growing practices, and external factors like COVID-19 lockdowns or public holidays. Such events can cause swings in yield, pest and disease infestation, crop/financial losses, and unintended excesses.


To achieve oversight of these factors, growers measure numerous parameters including climate, irrigation, and crop development. Crop development is measured in a practice known as crop registration, which includes measures such as plant stem diameter, plant length, number of leaves/flowers/fruits, and the speed at which flowers are produced and pollinated.

Growers often manage crop outcomes using yield prediction in conjunction with plant balance. 

Plant balance is the optimum state that enables plant development in line with growth milestones and target yield. By understanding the current plant state (either vegetative growth, e.g., producing more leaves and stems, or generative growth, e.g., producing more flowers and fruits), growers can ‘steer’ the crop in a particular direction to meet their production targets.

With the breadth of data at their disposal, growers should be able to surface some clues on yield swings and the factors at play in the lead-up to them. Furthermore, if growers could predict when a yield swing is likely to occur and identify the cause, they could take pre-emptive action. Like a weather forecast, if you know there is a 30% chance of rain, you can be prepared and have an umbrella handy.


The path to predicting yield with confidence

The team of crop and data scientists at WayBeyond help tomato growers improve their yield prediction. Using proprietary digital data collection, data analysis algorithms and AI models, their goal is to provide a wider understanding of yield prediction by studying data on plant genetics, environment, and crop management practices.   

To demonstrate the value of a more holistic approach to predicting yield, the team collated and analyzed anonymized data from 20 tomato growing cycles from growers in varied protected growing environments.

Yield prediction and yield swings 

Yield swings are defined as sudden and significant variance in yield compared to the yield in surrounding weeks. We set out to find factors associated with week-to-week variability in yield prediction accuracy. 

To predict yield, we utilized a proprietary AI model that allowed us to forecast one week and four weeks in advance. We did this for each week in the crop cycle. During analysis, we used proprietary algorithms to identify the yield swings within crop cycles. Yield swings are represented visually as sudden and significant jumps or drops in yield compared to the yield in the surrounding weeks in the graph below.
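The swing-detection algorithm used in the study is proprietary and not described here. As a rough illustration of the definition only, the sketch below flags a week as a swing when its yield differs from the median of its neighbouring weeks by more than a chosen fraction. The function name, the two-week window, the 30% threshold, and the sample yields are all hypothetical.

```python
from statistics import median

def find_yield_swings(yields, window=2, threshold=0.30):
    """Flag weeks whose yield deviates from the median of the surrounding
    weeks by more than `threshold` (a fraction). Illustrative only; this is
    not the proprietary algorithm referred to in the text."""
    swings = []
    for i, y in enumerate(yields):
        # Up to `window` weeks on each side, excluding the week itself.
        neighbours = yields[max(0, i - window):i] + yields[i + 1:i + 1 + window]
        if not neighbours:
            continue
        baseline = median(neighbours)
        if baseline <= 0:
            continue
        change = (y - baseline) / baseline
        if change > threshold:
            swings.append((i, "high"))
        elif change < -threshold:
            swings.append((i, "low"))
    return swings

# Hypothetical weekly yields for one crop cycle (list index = week).
weekly_yield = [10.0, 10.5, 10.2, 6.0, 10.4, 10.8, 15.5, 11.0, 10.9]
swings = find_yield_swings(weekly_yield)
print(swings)
```

Using the median of the neighbours rather than their mean keeps a swing week from distorting the baseline of the weeks next to it.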


The accuracy of yield prediction for one week ahead using the AI model ranged from 81-95%.
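The accuracy formula is not stated in this paper. One common convention, assumed here purely for illustration, is 100 × (1 − |predicted − actual| / actual), floored at zero. The sketch below applies that assumed definition to hypothetical weekly figures and summarizes the average and the week-to-week range.

```python
from statistics import mean

def weekly_accuracy(predicted, actual):
    """Percent accuracy for one week under the ASSUMED definition
    100 * (1 - |predicted - actual| / actual), floored at 0."""
    if actual <= 0:
        raise ValueError("actual yield must be positive")
    return max(0.0, 100.0 * (1.0 - abs(predicted - actual) / actual))

# Hypothetical predicted and actual weekly yields.
predicted = [10.0, 11.0, 12.0, 9.0]
actual = [10.0, 10.0, 16.0, 10.0]

accuracies = [weekly_accuracy(p, a) for p, a in zip(predicted, actual)]
summary = {
    "mean": mean(accuracies),
    "min": min(accuracies),
    "max": max(accuracies),
}
print(summary)
```

Reporting the minimum and maximum alongside the mean makes weeks with large misses visible instead of letting them be averaged away.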

We classified the crop cycles into three groups, based on the accuracy of yield prediction one week ahead:

  • Group 1 - highest average accuracy
  • Group 2 - second highest average accuracy
  • Group 3 - lowest average accuracy

The average accuracy one week and four weeks ahead was:

  • Group 1 - 91.1% one week ahead, 90.3% four weeks ahead
  • Group 2 - 86.4% one week ahead, 87.1% four weeks ahead
  • Group 3 - 82.9% one week ahead, 81.2% four weeks ahead

In each group, the yield swings were identified and counted using proprietary algorithms. Group 1 had the lowest average number of swings (0.83) whereas Group 3 had the highest average number of swings (3.25).
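Per-group averages like those reported above can be computed from per-cycle records. The sketch below is a minimal illustration: the cut-offs used to assign cycles to groups (89% and 85%) and the sample records are hypothetical, since the grouping criteria and the underlying data are not given in this paper.

```python
from collections import defaultdict
from statistics import mean

def assign_group(accuracy_1wk, cutoffs=(89.0, 85.0)):
    """Assign a crop cycle to group 1, 2 or 3 by one-week-ahead accuracy.
    The cut-offs are hypothetical, not taken from the study."""
    if accuracy_1wk >= cutoffs[0]:
        return 1
    if accuracy_1wk >= cutoffs[1]:
        return 2
    return 3

# Hypothetical per-cycle records: average accuracy and number of swings.
cycles = [
    {"accuracy_1wk": 92.0, "swings": 0},
    {"accuracy_1wk": 90.0, "swings": 1},
    {"accuracy_1wk": 87.0, "swings": 2},
    {"accuracy_1wk": 86.0, "swings": 1},
    {"accuracy_1wk": 83.0, "swings": 3},
    {"accuracy_1wk": 82.0, "swings": 4},
]

grouped = defaultdict(list)
for cycle in cycles:
    grouped[assign_group(cycle["accuracy_1wk"])].append(cycle)

group_summary = {
    group: {
        "mean_accuracy": mean(c["accuracy_1wk"] for c in members),
        "mean_swings": mean(c["swings"] for c in members),
    }
    for group, members in sorted(grouped.items())
}
print(group_summary)
```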


Yield swings and environmental and biological events

To better understand events leading up to yield swings, we collected environmental and biological data from the eight weeks prior to each swing. Our hypothesis was that the data would reveal patterns or events occurring before swings, which could be used as indicators to predict swings in the future. 

Environmental data consisted of climate data including the outside and inside protected environment temperature and humidity. Biological data included crop registration measurements of weekly growth, truss height, leaf length, stem width, and the number of leaves. We also used a proprietary plant balance model to measure plant balance.

We standardized the data collected using proprietary algorithms, and then derived scores which indicated the deviation of each weekly measurement from what would normally be expected for the given week (deviation scores). 
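The standardization itself is proprietary, but the idea of a deviation score can be illustrated with an ordinary z-score. The sketch below assumes that "what would normally be expected for the given week" is the mean of the same crop week across a set of reference cycles, and that deviation is measured in standard deviations. Both assumptions, and all sample values, are hypothetical.

```python
from statistics import mean, stdev

def deviation_scores(observed, reference_by_week):
    """Z-score of each weekly measurement against reference values for the
    same crop week. A stand-in for the proprietary scoring in the text."""
    scores = []
    for week, value in enumerate(observed):
        reference = reference_by_week[week]
        spread = stdev(reference)
        # A zero spread means every reference cycle agrees; treat as no deviation.
        scores.append(0.0 if spread == 0 else (value - mean(reference)) / spread)
    return scores

# Hypothetical inside temperature (°C): three reference cycles per crop week.
reference = [
    [18.0, 20.0, 22.0],  # week 0
    [19.0, 21.0, 23.0],  # week 1
    [20.0, 22.0, 24.0],  # week 2
]
observed = [20.0, 17.0, 26.0]

scores = deviation_scores(observed, reference)
print(scores)
```

A score near zero means the week looked typical; large negative or positive scores mark weeks that were unusually low or high for that point in the crop cycle.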

We then identified and tallied common patterns in deviation scores in the environmental and biological data leading up to yield swings. Next, we identified patterns that had the strongest correlations with low and high yield swings. Finally, we described the identified patterns in a contextual manner, such as 'low light in the last eight weeks,' and the risk of swing they posed to yield, for example, 'risk of low yield in the next three weeks' for a low-light pattern associated with a low yield swing.
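The tallying step can be sketched as follows. For each swing, the eight weeks of deviation scores per variable are reduced to a label such as 'low total light', and the labels are counted across swings. The labelling rule (mean score beyond ±1), the share calculation (each pattern's share of all pattern occurrences), the variable names, and the data are hypothetical choices made for this illustration.

```python
from collections import Counter
from statistics import mean

def patterns_before_swing(window_scores, threshold=1.0):
    """Label each variable 'low X' or 'high X' when its mean deviation score
    over the preceding weeks is beyond the (hypothetical) threshold."""
    found = []
    for variable, scores in window_scores.items():
        average = mean(scores)
        if average <= -threshold:
            found.append(f"low {variable}")
        elif average >= threshold:
            found.append(f"high {variable}")
    return found

# Hypothetical deviation scores for the eight weeks before three low yield swings.
low_swing_windows = [
    {"total light": [-1.5] * 8, "outside night temperature": [-1.2] * 8},
    {"total light": [-1.1] * 8, "outside night temperature": [0.2] * 8},
    {"total light": [0.1] * 8, "outside night temperature": [-1.4] * 8},
]

tally = Counter(
    pattern
    for window in low_swing_windows
    for pattern in patterns_before_swing(window)
)
total = sum(tally.values())
shares = {pattern: round(100 * count / total) for pattern, count in tally.items()}
print(shares)
```

A pattern that accounts for a large share of occurrences before low yield swings could then be attached to a contextual message such as 'risk of low yield in the next three weeks'.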

Note: We did not interpret irrigation data or pest and disease data in this study because irrigation and pest management practices are highly variable among growers. Both have a direct impact on yield, however, and should be incorporated when interpreting yield prediction data and making yield decisions.

Environmental events leading to low yield

The most common patterns observed in the eight weeks before the occurrence of low yield swings were as follows:

  • low outside night temperature (20%),
  • low total light (19%),
  • low difference between internal day and night temperature (18%), and
  • low outside day temperature (15%).

These events can serve as a warning to tomato growers of a possible low yield swing associated with the identified patterns. The warning can be framed as 'risk of low yield in the next three weeks'.


Environmental events leading to high yield

The most common patterns observed in the eight weeks before the occurrence of high yield swings were: 

  • high difference between internal day and night temperature (24%), 
  • high outside night temperature (22%), and 
  • high total light (20%)

These events can also serve as a warning to tomato growers of a possible high yield swing associated with the identified patterns. The warning can be framed as 'risk of high yield in the next three weeks'.

Biological events leading to low and high yield

We analyzed and compared plant growth measurements from weeks leading to low and high yield swings using tabular analysis of the deviation scores as described earlier. Weeks leading to low yield swings showed evidence of greater vegetative growth as shown by plant measurements compared to weeks leading to high yield swings. For example, a higher deviation score for leaf numbers in weeks leading to low yield swings is indicative of more leaves and therefore greater vegetative growth than in weeks leading to high yield swings.


For each week with a swing, we also counted the number of weeks in the preceding eight weeks where plant balance indicated highly vegetative or highly generative growth. This was conducted using plant balance scores from our proprietary model. 

Each 8-week period was then categorized: if the majority of its weeks were highly vegetative, the period was denoted a vegetative event, and if the majority were highly generative, a generative event. We then compared the proportion of vegetative and generative events in the weeks leading to low and high yield swings. The comparison indicated more generative growth events in the weeks leading to high yield swings than in the weeks leading to low yield swings.
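The majority rule described above can be sketched in a few lines. The example assumes each of the eight weeks has already been labelled 'vegetative', 'generative', or 'balanced' from its plant balance score. That labelling comes from the proprietary model and is not reproduced here, so the labels, function names, and sample windows are hypothetical.

```python
def classify_window(balance_labels):
    """Return 'vegetative event' or 'generative event' when more than half
    of the weeks carry that label, otherwise None."""
    n = len(balance_labels)
    for state in ("vegetative", "generative"):
        if balance_labels.count(state) > n / 2:
            return f"{state} event"
    return None

def event_proportion(windows, event):
    """Share of 8-week windows classified as the given event."""
    if not windows:
        return 0.0
    return sum(classify_window(w) == event for w in windows) / len(windows)

# Hypothetical weekly labels for the eight weeks before each swing.
before_high = [
    ["generative"] * 6 + ["balanced"] * 2,
    ["generative"] * 5 + ["vegetative"] * 3,
    ["balanced"] * 8,
]
before_low = [
    ["vegetative"] * 5 + ["balanced"] * 3,
    ["generative"] * 4 + ["vegetative"] * 4,
    ["generative"] * 7 + ["balanced"],
]

high_share = event_proportion(before_high, "generative event")
low_share = event_proportion(before_low, "generative event")
print(high_share, low_share)
```

Requiring a strict majority means a window split evenly between vegetative and generative weeks is counted as neither event.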


Should tomato growers look beyond yield prediction?

Yield swings impact the accuracy of yield prediction models. To help tomato growers manage these swings, we were able to identify key environmental and biological patterns in the weeks leading up to swings that are commonly associated with low and high yield swings. These patterns represent the contextual layer which can be utilized in conjunction with yield prediction to expand a grower’s decision-making capabilities and thus to help them anticipate and manage yield swings in protected environments.

If growers can identify the patterns that occur before swings, they can anticipate them and interpret yield prediction with this in mind. Different growers will be prone to different risks depending on their level of protection and crop management, and identifying patterns relevant to them will help them interpret their yield prediction with relevant context.

With this understanding, they can anticipate swings and interpret yield prediction with greater accuracy. Over time, growers can also learn to mitigate the risks of swings happening in the first place, leading to more consistent crop management and higher quality data.

Based on our analysis, we suggest that yield prediction becomes a more effective decision-making tool when it is supported by relevant insights from environmental and biological data. By collecting and analyzing environmental, plant and context data, tomato growers can gain a more comprehensive understanding of their growing environment, crop and management practices. This helps them anticipate yield swings, make informed decisions on yield prediction and crop management, and avoid financial or produce losses.


Watch the presentation

The Beyond Yield Prediction whitepaper was published and presented at the 2023 Global Tomato Conference by Lee Kirsopp, Product Manager at WayBeyond, in May 2023.

About the Authors


Dr. Mpatisi Moyo

Head of Artificial Intelligence

Dr. Mpatisi Moyo obtained a BSc and Master’s in Medical Laboratory Sciences from the University of Zimbabwe and Massey University, respectively, followed by a Post-Graduate Diploma in Statistics. He then completed his PhD in Health Sciences at the University of Auckland. Mpatisi has over 15 years’ experience in data science, analytics and AI across healthcare, government, telecommunications, energy, finance and agritech sectors. He heads Data Science and Artificial Intelligence at WayBeyond. His team focuses on combining biology, data and AI to build smart insights, prediction and recommendation tools to help growers improve efficiency in their production.


Dr. Tharindu Weeraratne

Director of Crop Science and Agronomy

Dr. Tharindu Weeraratne is an academic, AgTech, and farm consultant specializing in plant physiology, plant pathology, molecular genetics, crop science, and agronomy. After obtaining his BSc in Botany in Sri Lanka, he did research in postharvest biology and technology of cut flowers and foliage. He did his PhD in Plant Biology at the University of Texas at Austin. Tharindu leads Crop Science, Agronomy Research and Consulting at WayBeyond, and works with AI research and engineering teams to provide cutting-edge solutions for growers. He also supports growers with agronomic and crop consulting using WayBeyond’s solutions.

