IISE Data Analytics and Information Systems (DAIS) Division Student Data Analytics Competition
The Data Analytics Competition is an annual student competition organized by the Data Analytics and Information Systems (DAIS) Division of the Institute of Industrial and Systems Engineers (IISE). This year's competition is the Nursing Home Time-Series Data Prediction Triathlon. The main objective of the competition is to give students the opportunity to learn, showcase, and enhance their data analysis and visualization skills by working on real-world problems with realistically sized data sets.
Problem Description - Time Series Prediction of CNA Staffing Hours
Certified nursing assistants (CNAs) constitute the largest segment of the direct-care workforce in U.S. nursing homes and account for a substantial proportion of daily labor expenditures. Because CNAs provide the majority of hands-on nursing care to residents with chronic illness or functional impairment, maintaining adequate CNA staffing is essential for ensuring continuous, high-quality care and stable facility operations. However, CNA staffing levels cannot be modified in real time, and adjustments to hiring, scheduling, or agency contracting require advance planning. Consequently, nursing home administrators must be able to generate accurate forecasts of daily CNA staffing hours. Accurate time-series predictions of CNA staffing hours are integral to effective workforce planning, cost management, and overall operational decision-making within nursing home settings.
The Centers for Medicare & Medicaid Services (CMS) provides public data sets, known as Payroll Based Journal Public Use Files (PBJ PUFs), containing daily nursing home staffing levels (often measured by staffing hours) and resident census data. Nursing homes must submit accurate staffing information, including agency and contract staff, through the PBJ system, based on verifiable payroll data, in a format specified by CMS. Facilities report the number of hours each staff member is paid to work each day. The quarterly PBJ data files are available beginning with data from the first calendar quarter of 2017. New data files will be uploaded to data.cms.gov every quarter. The public use files report information on staffing hours for each day in the quarter. The staffing data in the PBJ PUFs is aggregated to the facility-day. This means that all included facilities have one record (or row of data) for each day in a quarterly file. For more information, please visit
https://data.cms.gov/sites/default/files/2023-06/PBJ_PUF_Documentation_July_2023.pdf
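The facility-day layout described above (one record per facility per day in a quarter) can be checked before any modeling. The following is a minimal sketch, not part of the competition materials: the record layout `(facility_id, work_date, cna_hours)`, the function name, and the sample values are all hypothetical and do not reflect the actual PBJ PUF column names.

```python
from collections import Counter
from datetime import date, timedelta


def facility_day_problems(records, quarter_start, quarter_end):
    """Find violations of the one-row-per-facility-per-day layout.

    records: iterable of (facility_id, work_date, cna_hours) tuples
             (a hypothetical layout, not the real PBJ PUF schema).
    Returns (duplicates, missing), each a sorted list of
    (facility_id, work_date) pairs.
    """
    # Count how many rows exist for each facility-day.
    counts = Counter((fid, day) for fid, day, _ in records)
    facilities = {fid for fid, _ in counts}

    # Every calendar day in the quarter, inclusive of both ends.
    n_days = (quarter_end - quarter_start).days + 1
    all_days = [quarter_start + timedelta(days=i) for i in range(n_days)]

    duplicates = sorted(key for key, c in counts.items() if c > 1)
    missing = sorted(
        (fid, day)
        for fid in facilities
        for day in all_days
        if (fid, day) not in counts
    )
    return duplicates, missing


# Made-up sample covering a three-day window for two facilities.
sample = [
    ("X001", date(2025, 1, 1), 120.5),
    ("X001", date(2025, 1, 2), 118.0),
    ("X001", date(2025, 1, 2), 118.0),  # duplicated facility-day
    ("X002", date(2025, 1, 1), 80.0),
]
dups, gaps = facility_day_problems(sample, date(2025, 1, 1), date(2025, 1, 3))
print("duplicates:", dups)
print("missing:", gaps)
```

A gap or duplicate found this way would need to be resolved (or at least noted) before fitting a daily time-series model, since most forecasting methods assume an evenly spaced series.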
Two-Phase Competition. The competition will contain two phases of judging based on the prediction results submitted. From phase 1, we will select four finalists. From phase 2, we will determine among the four finalists a first-prize winner, a second-prize winner, a third-prize winner, and a fourth-prize winner.
In addition to the award certificate, a cash award will be provided.
Competition Rules
- Training data set. CNA staffing data files of multiple nursing homes from different capacity-size groups in Indiana, Florida, and Connecticut (where the three competition committee chairs live) will be released to participating teams by January 24, 2026. These files will include data collected for the full calendar year of 2024. The names and identities of the selected nursing homes will not be disclosed.
Prediction events.
- Phase 1: Using the provided training data, each team must independently develop time-series prediction models to forecast daily CNA staffing hours for another nursing home in each of the capacity-size groups. These other nursing homes will be randomly picked by the competition committee chairs. The prediction task will focus on forecasting CNA hours in the first quarter (Q1) of 2025. The types of machine learning methods allowed in this competition will be limited primarily to tree-based methods and neural network–based methods.
Prediction performance will be evaluated comprehensively based on the metrics specified here.
- Phase 2: Q1 2025 data for both the training nursing homes and the Phase 1 test nursing homes will be released to the four finalists. Each finalist team may use these additional data, along with any publicly available external data sources if needed, to refine the existing prediction model developed and forecast CNA staffing hours of Q2 2025 for a single nursing home in each capacity-size group.
Prediction performance will be evaluated comprehensively based on the metrics specified here.
- Phase-1 Judging. After teams submit their Phase 1 results, the competition committee chairs will evaluate model performance using a randomly selected additional nursing home from each capacity-size group as the test data set. These additional facilities will not be disclosed to the teams in advance. Phase 1 therefore assesses the temporal predictability of the developed prediction tools.
- Phase-1 Scoring. All teams will be ranked by prediction performance separately in each category. In each category, the team with the worst prediction will be given 1 point, and each team ranked one spot higher will be given 1 more point than the team below it, except that each of the top three teams will be given 2 more points than the team ranked one spot lower. For example, with six teams the points from worst to best would be 1, 2, 3, 5, 7, 9. Each team's points will be summed across categories to determine the ranking.
- Phase-2 Judging. The committee chairs will schedule meetings with each finalist team during the annual conference to run their finalized models on newly selected nursing homes from each capacity-size group for Q2 2025. As in Phase 1, the nursing homes used for testing will be randomly picked by the competition committee chairs. Phase 2 therefore evaluates the temporal generalizability and robustness of the developed prediction tools.
- Phase-2 Scoring. The four finalist teams will be ranked on the performance metrics in the same way. Specifically, the four teams will be given 10, 7, 4, and 1 points according to their ranking in each category of prediction performance. The total score of each team will be the sum of its points over all categories. The ranking based on the total score will carry 60% of the weight. The other two items will be the final report (20%) and the final presentation (20%).
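The Phase-1 and Phase-2 point rules above can be sketched in a few lines. This is one possible reading of the rules, not the committee's official scoring procedure: the function names, team labels, and category names are hypothetical, ties are not handled, and the sketch stops at the points total because the rules do not state how the 60% ranking weight is combined with the report and presentation scores.

```python
PHASE2_POINTS = (10, 7, 4, 1)  # best to worst, as stated in the rules


def phase1_points(ranked_worst_to_best):
    """Points for one category in Phase 1.

    The worst team gets 1 point. Each step up the ranking adds 1 point,
    except that each of the top three teams gets 2 more points than the
    team ranked immediately below it.
    """
    n = len(ranked_worst_to_best)
    points = {}
    score = 0
    for position, team in enumerate(ranked_worst_to_best):
        if position == 0:
            score = 1
        else:
            in_top_three = position >= n - 3
            score += 2 if in_top_three else 1
        points[team] = score
    return points


def phase2_points(ranked_best_to_worst):
    """Points for one category in Phase 2 (exactly four finalists)."""
    if len(ranked_best_to_worst) != len(PHASE2_POINTS):
        raise ValueError("Phase 2 expects exactly four finalist teams")
    return dict(zip(ranked_best_to_worst, PHASE2_POINTS))


def total_points(per_category_points):
    """Sum each team's points over all categories."""
    totals = {}
    for category in per_category_points:
        for team, pts in category.items():
            totals[team] = totals.get(team, 0) + pts
    return totals


# Hypothetical Phase 1 category with six teams, listed worst first.
phase1_example = phase1_points(["F", "E", "D", "C", "B", "A"])
print(phase1_example)

# Hypothetical Phase 2 with two categories, teams listed best first.
category_1 = phase2_points(["A", "B", "C", "D"])
category_2 = phase2_points(["B", "A", "C", "D"])
phase2_totals = total_points([category_1, category_2])
print(phase2_totals)
```

Under this reading, a team's Phase-1 standing depends on consistent placement across categories, while the wider gaps in the Phase-2 points (10, 7, 4, 1) reward finishing first in a category more heavily.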
Eligibility
- Individuals or teams of a maximum of four members (no post-doc/faculty allowed).
- Student members must be either undergraduate or graduate students from higher education institutes in the field of Industrial & Systems Engineering or related fields.
- A team must submit a notice of intent (NOI) to be eligible to participate in the competition. All student members should be enrolled at the time the NOI is submitted.
- At least one of the team members must be an active member of the Data Analytics and Information Systems (DAIS) Division of IISE in 2026.
Competition Process
Notice of Intent (NOI)
A team must submit a notice of intent to participate in the competition via email to the chairs of the competition committee by Saturday, Jan. 17, 2026. The notice of intent needs to include:
- The list of names of team members, their affiliations, and contact information (email and phone).
- The name of the one team member designated as the main contact.
The competition committee chairs will share the initial data sets of 100 nursing homes through an email to the main contact provided by each team on Saturday, Jan. 24, 2026.
Submission and Judging of the Results for Phase 1
For Phase 1, participating teams or individuals are required to submit their predictions via email to the chairs of the competition committee by Saturday, Feb. 28, 2026. The competition committee will evaluate the submissions, and a maximum of the top four teams will be selected as finalists. More details about the submission guidelines and review criteria will be released along with the initial data set. The committee chairs will judge each team in the first week of March. The four finalist teams will be notified by Saturday, Mar. 7, 2026.
Submission and Judging of the Results for Phase 2
For Phase 2, the finalist teams will be given Q1 2025 data for both the training nursing homes and the Phase 1 test nursing homes on Tuesday, Mar. 10, 2026. This will give them more real data to refine their time-series prediction models for Q2 2025 staffing hours forecasting. The finalists are required to submit 1) source code and scripts and 2) a final report via email to the committee chairs by Tuesday, May 5, 2026.
Final Presentation
The selected finalist teams and individuals will present the model motivation, model approach, and results at the 2026 IISE Annual Conference & Expo, May 16 – 19, 2026, Arlington, TX. More information on the final report and presentation contents will be provided to the finalist teams.
Evaluation Process
Approval of the Notice of Intent
The competition steering committee will review and approve or reject the submitted notices of intent to participate based on the eligibility criteria. Approval emails will be sent together with the dataset by the chair of the committee by Saturday, Jan. 24, 2026.
Notification of Top Finalist Teams
The competition committee will select a maximum of four finalist teams. Finalist teams will be notified by email by the chair of the committee by Saturday, March 7, 2026.
Announcement of Winners
The selected finalist teams will run their code, produce their results, and present their work at the 2026 IISE Annual Conference & Expo, May 16 – 19, 2026, in Arlington, TX. A blind vote cast by invited judges will decide the winner after the presentations.
Important Dates (Deadlines)
- Notice of Intent: January 17, 2026
- Competition challenge datasets are made available: January 24, 2026
- Deadline for submission for Phase-I judging: February 28, 2026
- Notification to the finalist teams: March 7, 2026
- Competition challenge dataset for phase 2 made available to the finalist teams: March 10, 2026
- Deadline for submission for Phase-II judging: May 5, 2026
- Final code checking and presentation at the IISE Annual Conference & Expo: May 16 – 19, 2026
Recognition
- Recognition at the DAIS Town Hall Meeting
- Certificate provided by IISE (either mailed or given at the town hall meeting) for the 1st prize winners.
- 2nd and 3rd prize winners will receive a digital certificate.
- Recognition in ISE magazine
- Recognition on the DAIS webpage and in the newsletter
Competition Chairs
Nan Kong, Purdue University
Mingyang Li, University of South Florida
Narjes Sadeghiamirshahid, University of New Haven
Conflict of Interest
- Societies and divisions must follow standard conflict of interest guidelines. Those guidelines include, but are not limited to:
- Officers and Board members of the S/D should be ineligible for awards during the period of their service, without approval by the Senior VP for Technical Operations (SVP). Exceptions may only be made by the SVP when awards are time sensitive (i.e., a Student Best Paper Award for a Student Board Member, when the student is graduating), and the impacted board member(s) or officer(s) must recuse themselves from the award process. For awards that are not time-sensitive, the nominee should wait until their Officer or Board service is complete to be nominated.
- The awards committee (or judging committee) should not include members who have either a personal or professional relationship with the nominees. For example, a faculty member should not be judging a paper competition where a student from the same university is a nominee for best student paper award.
- The awards committee (or judging committee) should actively change its membership on a rotating basis from year to year to ensure fairness, equity, and diversity. That is, some members of the judging committee should roll off the committee and new members should roll on.
For questions, contact Amy Straub at IISE.
2025 Winner
1st Place
Edgar Castillo, Bennett Frohock, Mahla Hosseini, and Hamed Hemmati
Oklahoma State University
2024 Winner
1st Place
Hairong Wang, Lingchao Mao, Zihan Zhang
Georgia Institute of Technology