French Trot Horse Racing: Forecasting Competition

Calling all machine/statistical learning enthusiasts!

Join our latest forecasting competition and win exciting prizes!

Are you passionate about machine/statistical learning and eager to test your forecasting skills?

CANSSI Ontario is thrilled to announce a competition for Ontario university students to build the best models for predicting French Trot Horse Racing (a background document about the sport will be provided to registered teams).

Grab your teammates and take up the challenge to win a share of $10,000 in prize money!

Compete for glory (and money!)

💰 Performance Award: $5,000
Be crowned the champion by using historical data (that will be provided to registered teams) to develop the model that delivers the most accurate winner forecasts for French Trot races over a three-month validation dataset.

💡 Innovation Award: $3,500
Think outside the box! Impress the judges by exploring new concepts, engineering novel features, or telling interesting stories with the data.

🎓 Best Entry by Undergraduates: $1,500
Undergraduates, showcase your talent and skills. Be judged in either category (performance or innovation) and compete for the $1,500 prize reserved specifically for the best undergraduate entry.

All amounts are in Canadian Dollars.

Key Dates

September 22, 2023: Team Registration Deadline
Register a team consisting of one to three members, each of whom must be currently enrolled in a CANSSI Ontario partner university, by September 22, 2023.
September 25, 2023: Data Distribution
Once registered, the team leader will be sent a link to the dataset. Start exploring the data and dive into the French Trot world!
November 7, 2023: Submission Deadline
Fine-tune your model and finalize your predictions. Submit your final forecasts by November 7, 2023, using the Submission Form (available from September 25, 2023), to qualify for the grand prizes.

How to participate?

Form a team of one to three members all of whom are from a CANSSI Ontario partner university.
Register your team, using your university emails, by visiting the Registration Page and completing the registration process before September 22, 2023.
Build your reproducible model. Use your expertise in R, Python or Julia (or a combination) to conduct analysis and build a powerful forecasting model. You are welcome to augment the training dataset with additional data, but it’s not mandatory.
Develop notebooks and scripts. Organize your work using notebooks and scripts. Create a GitHub repository to store your code. Please don’t add the provided dataset to GitHub.
Submit your entry. As the competition deadline approaches, polish your model and finalize your forecasts. Submit your entry by sharing your GitHub repository link with the competition organizers through the Submission Form, available from September 25, 2023. An entry consists of an end-to-end reproducible model that the judges can run themselves (after they add the data), an explanatory notebook describing your submission, and forecasts for the three-month holdout test period.

Showcase your skills!

This is a chance to prove your skills in machine learning and statistics. The skills you develop in this competition are valuable in industry and would make a great case study to discuss in interviews. Whether you excel in performance or innovation, or are an exceptional undergraduate, the French Trot forecasting competition has something for everyone.

Frequently Asked Questions (FAQ)

Can I compete as an individual?

Yes, you are welcome to have a team of one.

Can I be part of a team of four or more?

No, teams are limited to a maximum of three students. We want to provide you with the chance to create a portfolio of work, and any more than three could dilute this.

What levels are eligible?

Undergraduates, graduate students, and postdocs are all welcome to compete.

Can teams be mixed in terms of level?

It’s fine to mix levels in the teams, but for the sake of fairness we can only consider teams that are entirely undergrads for the undergrad prize.

Are late registrations accepted?

Yes, you can register until October 6, 2023, although that would not leave much time to put together an entry!

Are late submissions accepted?

In the interest of fairness, we cannot accept submissions received after November 7, 2023.

Can I use Stata/SAS/etc.?

The use of open-source, free software is an important part of reproducibility, so Stata, SAS, and other proprietary tools are not appropriate. If you want to use a language other than R, Python, or Julia, please get in touch early in the process.

Do I have to be in a statistics major/degree?

No. We welcome entries from students in any discipline at a CANSSI Ontario university.

Which universities are eligible?

Brock University; Carleton University; Wilfrid Laurier University; McMaster University; Queen's University; University of Guelph; University of Ottawa; University of Toronto; University of Waterloo; University of Windsor; Western University; York University.

Do all students need to be from the same university?

No. You can be from different universities, so long as you are all from eligible universities.

Who judges the Innovation Award?

A panel of academic and industry participants.

When making the GitHub repo, should teams include the dataset in it?

Please do not add the dataset to a public GitHub repo. While we understand that you may prefer to initially keep the repo private, your GitHub repo will eventually need to be public. You should feel free to use a .gitignore file to specify that Git not track the data folder.

Who can I ask if I have more questions?

Please direct questions to Esther Berzunza, Program Manager, CANSSI Ontario: esther.berzunza@utoronto.ca.

How is the submitted model judged? Is there a particular French Trot race (or series of races) that its predictions are judged against?

The submitted model will be judged against a set of races not in the provided dataset.

Is the Performance Award purely objective? That is, does the team with the most accurate forecasts win, regardless of the quality of the explanatory notebook, etc.?

The Performance Award will be based on model performance (although this is typically associated with being strong in other aspects also).

For the undergraduate award, who decides whether the award goes to a team who has been judged in the performance category or in the innovation category?

Noting that undergraduates are eligible for both categories outright, the undergraduate award itself was envisaged as a Performance Award.

How is the accuracy of the winprobability variable judged relative to the outcome of the event?

The quality of the winprobability output variable will be judged using the log loss evaluation metric. Please note that winprobability must sum to 1.0 (100%) within each RaceID.
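Because winprobability must sum to 1.0 within each RaceID, raw model scores need to be rescaled race by race before submission. The following is a minimal Python sketch, assuming each row is a dict with a RaceID key and a hypothetical non-negative `score` key produced by your model; it is an illustration, not an official tool.

```python
from collections import defaultdict


def normalize_by_race(rows):
    """Rescale non-negative model scores so winprobability sums to 1 per RaceID.

    `rows` is a list of dicts with keys "RaceID" and "score"; "score" is a
    hypothetical column name standing in for any non-negative model output.
    """
    totals = defaultdict(float)  # sum of scores per race
    counts = defaultdict(int)    # number of runners per race
    for row in rows:
        totals[row["RaceID"]] += row["score"]
        counts[row["RaceID"]] += 1

    result = []
    for row in rows:
        race = row["RaceID"]
        if totals[race] > 0:
            prob = row["score"] / totals[race]
        else:
            # All scores in this race are zero: fall back to a uniform distribution.
            prob = 1.0 / counts[race]
        result.append({**row, "winprobability": prob})
    return result


rows = [
    {"RaceID": 1, "score": 2.0},
    {"RaceID": 1, "score": 1.0},
    {"RaceID": 1, "score": 1.0},
    {"RaceID": 2, "score": 0.0},
    {"RaceID": 2, "score": 0.0},
]
for row in normalize_by_race(rows):
    print(row["RaceID"], round(row["winprobability"], 3))
```

Dividing by the per-race total keeps the relative ordering of your model's scores while guaranteeing the sum-to-one constraint.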

We will be comparing the probabilities to what actually happened.
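The exact log loss formula is not spelled out here. One common reading is binary log loss averaged over every runner, where each horse's winprobability is compared with whether it actually won. The Python sketch below assumes that reading and a hypothetical 0/1 `won` field; a per-race variant that scores only the winner's probability is also plausible, so treat this as a local sanity check rather than the official scorer.

```python
import math


def log_loss(rows, eps=1e-15):
    """Binary log loss averaged over all runners (lower is better).

    Each row needs "winprobability" (a float in [0, 1]) and a hypothetical
    "won" field equal to 1 for the race winner and 0 otherwise.
    """
    total = 0.0
    for row in rows:
        # Clip probabilities away from 0 and 1 to avoid log(0).
        p = min(max(row["winprobability"], eps), 1 - eps)
        y = row["won"]
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(rows)


predictions = [
    {"RaceID": 1, "winprobability": 0.5, "won": 1},
    {"RaceID": 1, "winprobability": 0.5, "won": 0},
]
print(log_loss(predictions))
```

Confident, correct probabilities lower the score, while confident, wrong ones are penalized heavily, which is why the clipping step matters.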

Can you please provide more information about the dataset, especially ClassRestriction?

In general, we are not planning to provide more information about the dataset beyond what has already been provided. However, we recognize that ClassRestriction may be unfamiliar. The field combines the race type with any restrictions. Students could try parsing it for common strings and then one-hot encoding the results.
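A minimal Python sketch of that suggestion: split each ClassRestriction string into tokens, keep the most frequent ones, and turn them into 0/1 indicator columns. Tokenizing on non-alphanumeric characters and the `top_k` cutoff are assumptions, since the field's exact format is not documented here; the function names are illustrative and the example strings are placeholders, not real values from the dataset.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case a string and split it on runs of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]


def one_hot_tokens(values, top_k=20):
    """Return (vocabulary, indicator rows) for the top_k most common tokens.

    Each token is counted at most once per string; ties are broken
    alphabetically so the vocabulary order is deterministic.
    """
    counts = Counter(tok for value in values for tok in set(tokenize(value)))
    vocab = sorted(counts, key=lambda tok: (-counts[tok], tok))[:top_k]
    rows = []
    for value in values:
        present = set(tokenize(value))
        rows.append([1 if tok in present else 0 for tok in vocab])
    return vocab, rows


# Placeholder strings only; real ClassRestriction values will differ.
samples = ["alpha beta", "beta gamma", "beta"]
vocab, matrix = one_hot_tokens(samples, top_k=2)
print(vocab, matrix)
```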

Is the Submission Form accessed via the link provided in the previous email?

We will provide a submission form for you closer to the deadline.

Could you clarify the criteria for determining a win in this French Trot Horse Racing project? Does a horse finishing anywhere from 1st to 7th (positions that earn a share of the prize money) count as a win, or must the horse finish 1st?

The forecasts are evaluated based on the winprobability variable, which will be compared with the outcome.

Should we consider only the horses placed between 1st and 7th when calculating the winprobability variable?

No, you should consider all horses.

For the variables “FrontShoes” and “HindShoes”, what do the values (0, 1, 2, 3) stand for? Similarly, could you provide more information about what the values of the “CourseIndicator”, “HandicapType”, “WideOffRail” and “NoFrontCover” variables stand for?

In general, we are not planning to provide more information about the dataset beyond what has already been provided. However, we recognize that some of these variables may be unfamiliar. Students could try parsing them for common strings and one-hot encoding the results, or use the guide that was provided and recordings of past races to back out the meaning of some of the variables.
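Integer-coded fields such as FrontShoes and HindShoes (values 0 to 3) can be one-hot encoded as categories rather than fed to a model as numbers, since nothing here says the codes are ordered. A minimal Python sketch of that choice, assuming the codes are exactly 0, 1, 2 and 3; the function name and the example values are illustrative placeholders.

```python
def one_hot_codes(values, categories=(0, 1, 2, 3)):
    """Map each integer code to a 0/1 indicator list, one slot per category.

    Codes outside `categories` (for example, missing values) become all zeros.
    """
    return [[1 if value == cat else 0 for cat in categories] for value in values]


# Placeholder values, including a missing entry.
front_shoes = [0, 3, 1, None]
print(one_hot_codes(front_shoes))
```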
