French Trot Horse Racing: Forecasting Competition

Calling all machine/statistical learning enthusiasts!

Join our latest forecasting competition and win exciting prizes!

Are you passionate about machine/statistical learning and eager to test your forecasting skills?

CANSSI Ontario is thrilled to announce a competition for Ontario university students to build the best models for predicting French Trot Horse Racing (a background document about the sport will be provided to registered teams).

Grab your teammates and take up the challenge to win a share of $10,000 in prize money!

Compete for glory (and money!)

💰 Performance Award: $5,000 
Be crowned the champion by using historical data (that will be provided to registered teams) to develop a model that delivers the most accurate winner forecasts for French Trot over a three-month validation dataset.

💡 Innovation Award: $3,500 
Think outside the box! Impress the judges by exploring innovative concepts, adding innovative features, or telling interesting stories.

🎓 Best Entry by Undergraduates: $1,500 
Undergraduates, showcase your talent and skills. Be judged in either category (performance or innovation) and win the $1,500 prize reserved specifically for the best undergraduate entry.

All amounts are in Canadian Dollars.

Key Dates

  1. September 22, 2023: Team Registration Deadline 
    Register a team consisting of one to three members, each of whom must be currently enrolled in a CANSSI Ontario partner university, by September 22, 2023.
  2. September 25, 2023: Data Distribution
    Once registered, the team leader will be sent a link to the dataset. Start exploring the data and dive into the French Trot world!
  3. November 7, 2023: Submission Deadline 
    Fine-tune your model and finalize your predictions. To qualify for the grand prizes, submit your final forecasts by November 7, 2023, using the Submission Form, which is available from September 25, 2023.

How to participate?

  1. Form a team of one to three members all of whom are from a CANSSI Ontario partner university.
  2. Register your team, using your university emails, by visiting the Registration Page and completing the registration process by September 22, 2023.
  3. Build your reproducible model. Use your expertise in R, Python, or Julia (or a combination) to conduct analysis and build a powerful forecasting model. You are welcome to augment the training dataset with additional data, but it’s not mandatory.
  4. Develop notebooks and scripts. Organize your work using notebooks and scripts. Create a GitHub repository to store your code. Please don’t add the provided dataset to GitHub.
  5. Submit your entry. As the competition deadline approaches, polish your model and finalize your forecasts. Submit your entry by sharing the GitHub repository link with the competition organizers, using the Submission Form that’s available from September 25, 2023. An entry consists of three parts: an end-to-end reproducible model that the judges can run themselves (after they add the data), an explanatory notebook that discusses your submission, and forecasts for the three-month holdout test period.

Showcase your skills!

This is a chance to prove your skills in the world of machine learning and statistics. The skills that you develop in this competition are valuable in industry and would make a great case study to talk about in interviews. Whether you excel in performance or innovation, or are an exceptional undergraduate, the French Trot forecasting competition has something for everyone.

Register today, harness the power of data, and trot your way to victory!

Frequently Asked Questions (FAQ)

Yes, you are welcome to have a team of one.

No, teams are limited to a maximum of three students. We want to provide you with the chance to create a portfolio of work, and any more than three could dilute this.

Undergraduates, graduate students, and postdocs are all welcome to compete.

It’s fine to mix levels within a team, but for the sake of fairness we can only consider teams made up entirely of undergraduates for the undergraduate prize.

Yes, you can register until October 6, 2023, although that would not leave much time to put together an entry!

In the interest of fairness, we cannot accept entries submitted after November 7, 2023.

The use of free, open-source software is an important part of reproducibility, so Stata and other proprietary tools are not appropriate. If you want to use a language other than R/Python/Julia, please get in touch early in the process.

No. We welcome entries from students in any discipline at a CANSSI Ontario university.

Brock University; Carleton University; Wilfrid Laurier University; McMaster University; Queen's University; University of Guelph; University of Ottawa; University of Toronto; University of Waterloo; University of Windsor; Western University; York University.

No. You can be from different universities, so long as you are all from eligible universities.

A panel of academic and industry participants.

Please do not add the dataset to a public GitHub repo. While we understand that you may prefer to initially keep the repo private, your GitHub repo will eventually need to be public. You should feel free to use a .gitignore file to specify that Git not track the data folder.

Please direct questions to Esther Berzunza, Program Manager, CANSSI Ontario: esther.berzunza@utoronto.ca.

The submitted model will be judged against a set of races not in the provided dataset.

The Performance Award will be based on model performance (although this is typically associated with being strong in other aspects also).

Undergraduates are eligible for both categories outright. The undergraduate award itself was envisaged as a Performance Award.

The quality of the winprobability output variable will be judged using the log loss evaluation metric. Please note that winprobability must sum to 1.0 (100%) within each RaceID.
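As a rough illustration of the requirement above, the sketch below rescales model scores so that winprobability sums to 1.0 within each RaceID and then computes a log loss. Only RaceID and winprobability come from the competition text. The `raw_score` and `winner` columns, the toy values, and the choice of a binary log loss averaged over every horse-row are assumptions made for illustration, and the organizers' scoring script may differ.

```python
import numpy as np
import pandas as pd

# Toy data: `raw_score` and `winner` are hypothetical columns, and all
# values are made up. `winner` is 1 for the winning horse, 0 otherwise.
df = pd.DataFrame({
    "RaceID": [1, 1, 1, 2, 2],
    "raw_score": [2.0, 1.0, 1.0, 3.0, 1.0],
    "winner": [1, 0, 0, 0, 1],
})

# Rescale the scores so that winprobability sums to 1.0 within each race.
race_totals = df.groupby("RaceID")["raw_score"].transform("sum")
df["winprobability"] = df["raw_score"] / race_totals


def binary_log_loss(y_true, p, eps=1e-15):
    """Mean binary log loss over all rows, clipping p to avoid log(0)."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)
    y = np.asarray(y_true, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Check the per-race sums and compute the (assumed) evaluation metric.
race_sums = df.groupby("RaceID")["winprobability"].sum()
score = binary_log_loss(df["winner"], df["winprobability"])
print(race_sums.to_dict(), round(score, 4))
```

Checking the per-race sums before submitting is a cheap way to catch probabilities that were normalized across the whole file rather than within each race.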

We will be comparing the probabilities to what actually happened.

In general, we are not planning to provide more information about the dataset beyond what has already been provided. However, we recognize that ClassRestriction may be unfamiliar. The field contains a mix of the race type along with any restrictions. Students could try parsing the values for common strings and then one-hot encoding them, for example.
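One possible way to act on that suggestion is sketched below: split each ClassRestriction value into tokens, keep the tokens that recur, and create one indicator column per token. The sample strings, the whitespace delimiter, the `min_count` threshold, and the `cr_` column prefix are all assumptions made for illustration. The real field may need a different parsing rule.

```python
import pandas as pd

# Made-up ClassRestriction strings; the real field's format may differ.
df = pd.DataFrame({
    "ClassRestriction": ["NW$25000 FM", "NW$25000", "CLM FM", None],
})

# Split each string on whitespace; missing values become empty token lists.
tokens = df["ClassRestriction"].fillna("").str.split()

# Keep only tokens seen at least `min_count` times to avoid very sparse columns.
min_count = 2
counts = tokens.explode().value_counts()
common = sorted(counts[counts >= min_count].index)

# One indicator column per common token (1 if the token appears in the row).
for tok in common:
    df[f"cr_{tok}"] = tokens.apply(lambda ts, t=tok: int(t in ts))

print(df)
```

Filtering by frequency is a design choice: rare tokens tend to add noise, although a lower threshold may be worth trying if a rare restriction turns out to be informative.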

We will provide a submission form for you closer to the deadline.

The forecasts are evaluated based on the winprobability variable, which will be compared with the outcome.

No, you should consider all.

In general, we are not planning to provide more information about the dataset beyond what has already been provided. However, we recognize that some of these variables may be unfamiliar. Students could try parsing them for common strings and then one-hot encoding them, or consulting the guide that was provided and recordings of past races to back out some of the variables.