CANSSI Ontario Statistical Software Conference
Statistical software is an increasingly important aspect of the work of statisticians and data scientists.
CANSSI Ontario and the Faculty of Information, at the University of Toronto (U of T), are excited to host the CANSSI Ontario Statistical Software Conference. This is a one-day conference bringing together academic and industry participants to share best practices on developing statistical software, exchange ideas for what is needed, and show off their latest software advances.
November 10, 2022 | 9am – 6pm ET
Recordings of the November 10, 2022, sessions.
- 8:50 - 9:00 am
- Opening Remarks.
- Rohan Alexander, Assistant Professor, University of Toronto.
- 9:00 - 9:15 am
- A new BART prior for flexibly modeling with categorical inputs.
- Sameer Deshpande, Assistant Professor, University of Wisconsin, Madison.
- 9:15 - 9:30 am
- An interactive operations research tool for field work site selection in forestry.
- Clara Risk, Ph.D. Student, University of Toronto.
- 9:30 - 9:45 am
- Lessons in knowing your audience: statistical software in quantitative social science.
- Monica Alexander, Assistant Professor, University of Toronto.
- 9:45 - 10:00 am
- Parameterized Reporting with RMarkdown.
- Lisa Lendway, Principal Healthcare Data Scientist, Blue Cross Blue Shield Minnesota.
- 10:00 - 10:30 am
- How to interpret and report estimates from (almost) any `R` model? A post-estimation workflow with `marginaleffects` and `modelsummary`.
- Vincent Arel-Bundock, Associate Professor, Université de Montréal.
- 10:30 - 11:00 am
- Towards Implementing Approximate Inference via Adaptive Quadrature.
- Alex Stringer, Assistant Professor, University of Waterloo.
Abstract: Adaptive Quadrature (AQ; including the Laplace approximation) is used to approximate intractable likelihoods and posterior distributions. Implementing new methods based on AQ is substantially aided by the vast landscape of modern open-source statistical software for scientific and numerical computation. I review this landscape in the context of developing software for fitting mixed models using AQ-approximate marginal likelihood.
- 11:00 - 11:15 am
- r2u: CRAN as Ubuntu Binaries.
- Dirk Eddelbuettel, Clinical Professor, University of Illinois Urbana-Champaign; Principal Software Engineer, TileDB.
- 11:15 - 11:30 am
- The Data (error) Generating Process.
- Emily Riederer, Senior Analytics Manager, Capital One.
- 11:30 am - 12:00 pm
- Censored: A tidymodels package for survival models.
- Hannah Frick, Software Engineer and Statistician, RStudio.
- 12:00 - 12:15 pm
- A real data driven simulation strategy for selecting an imputation method for mixed-type trait data.
- Zeny Feng, Professor, University of Guelph.
Abstract: Missing observations in trait datasets pose an obstacle for analyses in myriad biological disciplines. Phylogenetic imputation methods have demonstrated improved accuracy over standard techniques, though results depend on both the proportion of missing data and the missingness mechanism. Previous studies of phylogenetic imputation tools are also largely limited to simulations of numerical trait data, and performance on mixed-type data has not been evaluated. Given the mixed results of imputation and the varied structure of real trait datasets, a framework for selecting a suitable imputation method for a given target dataset is advantageous. To this end, we used a real data-driven simulation strategy. Candidate methods included mean/mode imputation, k-nearest neighbour, random forests, and multivariate imputation by chained equations (MICE), each with and without the inclusion of phylogenetic information. Taking a mixed-type trait dataset of squamates (lizards and amphisbaenians; order: Squamata) as the target dataset, we formed a complete-case dataset consisting of species with nearly complete information, and used it as the basis of the real data-driven simulation strategy for selecting the imputation method. In this talk, we will present a bioinformatic pipeline that implements our strategy for selecting the best-suited imputation method for the missing data present in a target squamates dataset.
- 12:15 - 12:30 pm
- Reproducible papers in the life sciences using R.
- Ariel Mundo Ortiz, Postdoctoral Fellow, Université de Montréal.
- 1:00 - 2:00 pm
- The Happiest Notebooks on Earth.
- Alison Presmanes Hill, Director of Knowledge, Voltron Data.
- 2:00 - 2:30 pm
- Regression Graphics: Added-Variable and Component+Residual Plots.
- John Fox, Professor Emeritus, McMaster University.
- 2:30 - 3:00 pm
- An ecosystem of R packages to access and process Canadian data.
- Jens Von Bergmann, Founder, MountainMath.
- 3:00 - 3:30 pm
- Evidence-based practices for better research software.
- Ana Trisovic, Research Associate, Harvard University.
- 3:30 - 3:45 pm
- A Software for Clustering Three-way Count Data Using Mixtures of Matrix Variate Distributions.
- Anjali Silva, Data Analyst and Lecturer, University of Toronto.
Abstract: Three-way data structures are characterized by three entities: the units, the variables, and the occasions. Matrix variate distributions offer a natural way to model three-way data, and mixtures of matrix variate distributions can be used to cluster such data. A mixture of matrix variate Poisson-log normal distributions was proposed for clustering three-way count data. This work was implemented as an R package hosted on GitHub, which includes functions for data simulation, initialization, three approaches for parameter estimation, model selection, and visualization. Documentation, including detailed examples, function help pages, and vignettes, is provided. Testing, including checks of user inputs, was performed with simulated and real data. Advantages and challenges associated with this ongoing work will be discussed.
- 3:45 - 4:00 pm
- Converting R code into C++, Is it worth it?
- Osvaldo Espin-Garcia, Assistant Professor, Western University.
- 4:00 - 4:15 pm
- Thinking Big with Maps in R: Tips on Wrangling Large Vector Data into Interactive Maps.
- Silvia Canelón, Data Analyst, University of Pennsylvania.
- 4:15 - 4:30 pm
- cytosel: Interactive cytometry panel design using single-cell RNA-seq.
- Matthew Watson, Developer/Programmer, Lunenfeld-Tanenbaum Research Institute.
- 4:30 - 4:45 pm
- mverse: How the R package is designed to help students explore the multiverse.
- Michael Jongho Moon, Ph.D. Student, University of Toronto.
- 4:45 - 5:00 pm
- Switching between space and time: Spatio-temporal analysis with cubble.
- Sherry Zhang, Ph.D. Student, Monash University.
- 5:00 - 5:30 pm
- PyPhi: A toolbox for integrated information theory.
- William Marshall, Assistant Professor, Brock University.
- 5:30 - 6:00 pm
- No Designer Needed: How to Create Beautiful Reports Using Only R.
- David Keyes, Founder, R for the Rest of Us.
- 6:00 - 6:05 pm
- Closing Remarks.