2014 Projects

During the program, our fellows work on data science projects in healthcare, education, energy, transportation, and more.

Our program is intensive and hands-on. Fellows work directly with nonprofits, local governments, and federal agencies, learning about what they do and tackling data problems they face.

We're strong believers in open knowledge and open source. We blog extensively about our projects, and make much of our research and code open to everyone.

You can find our 2013 projects here.

World Bank Group - Prediction & Identification of Collusion in International Development Projects

The World Bank Group lends billions of dollars every year to fund large infrastructure projects around the globe. Project-related contracts are awarded to companies and entities via open and competitive bidding processes. Such processes can sometimes be subject to collusion and corruption risks.

Working with data on major contract awards and projects, we will help develop a model that predicts potential collusion cases. The model will look for anomalous patterns of bidding and spending and subsequently alert the organization’s Integrity Unit to take a closer look at potentially suspicious behavior.

Mentor: Eric Rozier

Fellows: Jeff Alstott, Dylan Fitzpatrick, Carlos Petricioli, Misha Teplitskiy

Blog post: overview of problem and solution

Chicago Public Schools - Student Enrollment Prediction for Budget Allocation

Each spring, Chicago Public Schools allocates $1.8 billion to the hundreds of public schools in its system. To determine where to distribute that money, CPS must predict next year’s enrollment for each school months ahead of time, then adjust budgets two to three weeks into the school year when the actual enrollment numbers are set. Large discrepancies between projected enrollment and the real numbers lead to large adjustments in funding, which can disrupt teachers and students.

We’re working with CPS to develop a model that more accurately predicts next year’s enrollment for each school in the system. The project team will work with CPS data on student, school, and staff attributes, as well as other sources (including publicly available crime, housing, and economic development data), to develop a frequently updated model that reduces both the amount of money shuffled each school year and the number of schools that face major reallocations of funding.

Mentor: Joe Walsh

Fellows: Vanessa Ko, Andrew Landgraf, Tracy Schifeling, Zhou Ye

Blog post: overview of problem and solution

Harris School of Public Policy, Sunlight Foundation - Text Analysis of Government Spending Bills to Understand Pork Spending

Government legislation is not designed for readability, and its volumes of text are not easily analyzed. Advocacy and research groups would like a way to digest bills quickly, filtering out the bureaucratic jargon and leaving the important details. The Sunlight Foundation is a nonpartisan nonprofit that uses technology to make governments more accountable. Its API for federal bills provides a valuable stream of legislative text that can be used for analysis given the right tools.

With Christopher Berry from the Harris School of Public Policy, we will develop these tools, transforming text into usable data for fast, in-depth analysis. The first use case will be spending bills; we will create a database of federal and state spending that identifies and organizes the what, where, how much, and who from legislation. Ideally, these tools will be universal, enabling organizations to search legislation for other topics as well.

Mentor: Joe Walsh

Fellows: Matthew Heston, Madian Khabsa, Vrushank Vora, Ellery Wulczyn

Blog post: overview of problem and solution

Pecan Street, WikiEnergy - Building Open Source Tools to Analyze Smart Meter Data

Millions of homes around the world are now equipped with “smart meters.” Data generated by these meters present the previously untapped potential to create economic opportunities and support the operation of a more distributed, resilient, and cleaner electric grid. Researchers and companies are only beginning to tap smart meter data to manage a smarter, more efficient electric grid and create energy management platforms and products that appeal to consumers.

Texas-based non-profit Pecan Street Inc. operates the world’s largest database of consumer energy information: WikiEnergy. This database is highly granular, including use measurements collected at 1-minute intervals from up to 24 circuits within the home. With access to this rich source of information and an algorithmic approach, DSSG will develop new residential energy management tools. For example, a model may infer appliance-level usage patterns to identify wasteful appliances, provide savings advice based on future consumption forecasts, or use weather conditions to help homeowners optimize thermostat settings. These tools may be used to connect consumers with the powerful data generated by smart meters, help move the needle on energy savings, and create products that improve consumers’ lives.

Mentor: Varun Chandola

Fellows: Philip Ngo, Miguel Perez, Stephen Suffian, Sabina Tomkins

Blog post: overview of problem and solution

Enroll America, Get Covered Illinois - Targeting the Uninsured for Health Insurance Enrollment

During the first open enrollment period under the new Affordable Care Act (ACA), over 8 million people obtained health insurance. But an estimated 13.4% of Americans remain uninsured, including over 1 million residents of Illinois. Enroll America is a nonprofit organization focused on maximizing the number of Americans who are enrolled in and retain health coverage. Get Covered Illinois is the official health marketplace for Illinois and is the federal partner responsible for leading all ACA education, outreach and enrollment efforts statewide. In its first year, Get Covered Illinois and its more than 200 statewide grant-funded partners enrolled over half a million Illinoisans in the Health Marketplace and ACA Expanded Medicaid program.

The two organizations are working together to help identify and engage Illinois’ uninsured population in preparation for the next open enrollment period, from November 15, 2014 to February 15, 2015. Using the databases of both of these groups, we will construct models of the best channels of communication and key messaging to reach the different subpopulations of the uninsured in Illinois and other states. These models will help inform their collective outreach strategies and support their continued efforts under the ACA to bring health coverage to all Americans.

Mentor: Tom Plagge

Fellows: Peter Landwehr, Diana Palsetia, James Savage, Sam Zhang

Blog post: overview of problem and solution

Nurse-Family Partnership - Predicting Success in Mother-Child Interventions

Young, low-income, first-time mothers and their babies often face dramatically increased risks to their health, education, and economic self-sufficiency. Nurse-Family Partnership (NFP), a national nonprofit organization, intervenes by pairing these mothers with specially trained registered nurses. Expectant mothers receive regular home visits from pregnancy until the baby is two years old. The result: healthier pregnancies, more stable families, and better developmental outcomes for children.

NFP’s approach is based on decades of research, and last year DSSG fellows helped NFP quantify its impact by combining its data with national demographic data to assess how nurse visits affect measures of early childhood development such as immunization and breastfeeding rates. Some local NFP agencies currently face greater demand than they can meet. With our models, we seek to identify mothers who will benefit the most from NFP’s programs and the most impactful timing of nurse visits. We hope to help NFP better understand their target population and personalize their services based on each mother’s needs.

Mentor: Young-Jin Kim

Fellows: Sarah Abraham, Jeff Lockhart, Sarah Tan, Rafael Turner

Blog post: overview of problem and solution

City of Memphis - Targeted Urban Investments to Improve Future Economic Outcomes

Cities must work within a limited budget, balancing revenues (taxes and fees) with the ability to deliver services and maintain infrastructure. Since 1970, the population of Memphis, Tennessee, has increased by four percent while the geographical area has increased by 55 percent. The city would like to use data to make better policy and investment decisions around economic development and more productive delivery of services.
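To make the scale of that mismatch concrete, the two growth figures above can be combined into a single change in average population density. The snippet below is only back-of-the-envelope arithmetic on the numbers quoted in this description; the variable names are illustrative and it is not part of the project's analysis.

```python
# Relative growth since 1970, as quoted in the project description.
population_growth = 0.04   # +4 percent
area_growth = 0.55         # +55 percent

# Density = population / area, so the ratio of current to 1970 density is
# (1 + population_growth) / (1 + area_growth).
density_ratio = (1 + population_growth) / (1 + area_growth)
density_change_pct = (density_ratio - 1) * 100

print(f"Density ratio vs. 1970: {density_ratio:.2f}")
print(f"Change in average density: {density_change_pct:.0f}%")
```

Under these figures, average density falls by roughly a third, which is one way to see why the cost of delivering services per resident is a concern.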

With data on city revenue and spending, we will help the City of Memphis develop a new system for informing policy and investment decisions. Our goal is to help the city map the productivity of its tax base and the related costs to deliver services. This will assist the city in understanding the return on past and future investments and in identifying more productive approaches to delivering services. If successful, the system could be implemented in cities across the United States and around the world to help governments get the most value out of their budget and find the best approaches for revitalizing blighted areas.

Mentor: Tom Plagge

Fellows: Alejandra Caro, Matt Conway, Ben Green, Robert Manduca

Health Leads - Improving Social Services Interactions

Medicine by itself cannot always provide us with health. If a sick child lives in a mold-infested house, asthma medication will only do so much. Medical issues are caused, prolonged, or exacerbated by lack of access to basic human needs such as food, transportation, or social services.

Health Leads envisions a healthcare system in which all patients’ basic resource needs are addressed as a standard part of quality care. In the clinics where they operate, physicians and other providers can prescribe food, heat, and other basic resources their patients need to be healthy, alongside prescriptions for medication. Health Leads advocates then help families navigate the complex social services landscape and provide ongoing support with frequent follow-up calls, text messages, emails, and research.

However, maintaining relationships with patients who live in an unstable environment presents a unique challenge for Health Leads. This summer, we will help them improve sustained engagement, determining why some clients are actively engaged, why others disengage, and what effective strategies Health Leads can pursue. Using the organization’s client engagement data, service provider data, and patient demographics, the DSSG team will find effective ways to reduce the dropout rate and increase responsiveness, exploring different methods of communication. By discovering strategies to improve patient responsiveness, the project will help Health Leads advocates strengthen their relationship with clients and ensure that the most needed resources are effectively delivered.

Mentor: Young-Jin Kim

Fellows: Chris Bopp, Cindy Chen, Isaac McCreery, Carl Shan

Chicago Alliance to End Homelessness - Effectiveness of Intervention Strategies on the Homeless

As many as 750,000 adults and children are homeless in the United States, and thousands spend any given night without a safe place to sleep in Chicago alone. To combat these grim statistics, Chicago created Plan 2.0 – A Home for Everyone, a progressive seven-year action plan that serves as the blueprint toward the vision of a city in which everyone has a home. The Chicago Alliance to End Homelessness manages the implementation of this plan in partnership with Chicago’s Department of Family and Support Services, and serves as the backbone organization for Chicago’s homeless services system, bringing together the essential components for creating housing solutions in Chicago.

DSSG Fellows will investigate the housing interventions prescribed by Plan 2.0, with special attention given to emergency shelter, transitional housing, and permanent housing interventions. The team will examine the effectiveness of these strategies for different demographic and geographic targets, supporting the goal of ensuring that all individuals and families in Chicago have access to safe, quality affordable housing, as well as the resources and support needed to ensure housing stability.

Mentor: Young-Jin Kim

Fellows: Chris Bopp, Cindy Chen, Isaac McCreery, Carl Shan

Chicago Department of Public Health - Targeting Proactive Public Health Inspections

Lead paint and leaded gasoline were banned in the United States in the 1970s because of the enormous public health dangers lead poses. In the decades since, it has become clear that even small amounts of exposure to lead during childhood can cause behavior or attention problems, learning difficulties, speech and language problems, reduced IQ and failure at school.

Over the past several decades, Chicago has made great strides in preventing exposure to lead. Even with this progress, there is more work to be done. In 2013, almost 9,000 children in Chicago were estimated to have been exposed to levels of lead that the CDC classifies as dangerous, and most of this exposure happened in the home.

DSSG has partnered with the Chicago Department of Public Health to help find the homes that are most likely to still contain lead-based paint hazards. By building statistical models that predict exposure based on evidence such as the age of a house, the history of children’s exposure at that address, and economic conditions of the neighborhood, CDPH and their partners can link high-risk children and pregnant women to inspection and lead-based paint mitigation funding before any harm is done. This integrated and innovative system will ensure resources are used most efficiently, and ultimately will mean healthier Chicago children.

Mentor: Eric Rozier

Fellows: Joe Brew, Alex Loewi, Subhabrata Majumdar, Andrew Reece

Montgomery County Public Schools, Rockville, MD - Increasing Graduation Rates and Improving College Readiness for High School Students

To ensure that all students are on track for success beginning in the primary years, Montgomery County Public Schools in Rockville, Maryland built their own “early warning” model to identify students who are not making sufficient academic progress using data on grades, attendance, and other measures.

We will help the MCPS administrators validate and improve their model, using new sources of student-level data. In addition, we will look to broaden the objective to look at other measures of academic success.

The DSSG project will also build upon last year’s work with Mesa Public Schools, which sought to identify students not achieving their college potential. Each year across America, students consistently under-enroll in colleges – leading to lost college potential. With help from DSSG, MCPS hopes to improve the college-going rate of its students and the caliber of institutions they attend.

Mentor: Ben Yuhas

Fellows: Everaldo Aguiar, Nasir Bhanpuri, Himabindu Lakkaraju, David Miller

Skills for Chicagoland's Future, CareerBuilder - Identifying Skills Gaps to Reduce Unemployment

Skills for Chicagoland’s Future is trying to make it easier for job seekers and employers to connect by providing resources for training workers who are qualified, but require one or two additional skills to meet job requirements. In cooperation with UST Global’s Step IT Up America initiative – which plans to train 1,000 minority women for IT jobs – Skills for Chicagoland’s Future will develop a training program to bring potential candidates back to the job market. However, it is unclear exactly what skills are relevant to include in their programs. Our goal is to use data provided by CareerBuilder to build a system that will allow us to search resumes and CVs for job applicants with skill sets that are similar or related to those required in a job description or posting, but might not use the same terms. This will also allow Skills for Chicagoland’s Future to understand what kinds of training programs will need to be implemented and which applicants should be admitted to them.
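One simple way to picture matching on related rather than identical terms is to map variant skill names to a canonical form and then compare sets. The sketch below is a minimal illustration under that assumption, not the system the team will build; the synonym table, the function names, and the sample resume and job posting are all hypothetical.

```python
# Hypothetical synonym table mapping variant terms to one canonical skill.
CANONICAL = {
    "js": "javascript",
    "javascript": "javascript",
    "ms excel": "excel",
    "excel": "excel",
    "spreadsheets": "excel",
    "sql": "sql",
    "relational databases": "sql",
}


def normalize(skills):
    """Lowercase each term and map it to its canonical form when known."""
    cleaned = (s.strip().lower() for s in skills)
    return {CANONICAL.get(s, s) for s in cleaned}


def skill_overlap(resume_skills, job_skills):
    """Jaccard similarity between the normalized skill sets (0.0 to 1.0)."""
    resume, job = normalize(resume_skills), normalize(job_skills)
    if not resume and not job:
        return 0.0
    return len(resume & job) / len(resume | job)


def missing_skills(resume_skills, job_skills):
    """Skills the posting asks for that the resume does not cover."""
    return normalize(job_skills) - normalize(resume_skills)


# Made-up example: the resume and posting share no literal terms.
resume = ["JS", "Spreadsheets", "Customer service"]
job = ["JavaScript", "Excel", "SQL"]
score = skill_overlap(resume, job)
gaps = missing_skills(resume, job)
print(score, gaps)
```

A candidate with a high overlap and a short list of missing skills is the kind of applicant a one- or two-skill training program is meant for. A real system would need a far richer notion of related skills than a hand-written table.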

Mentor: Varun Chandola

Fellows: Nadya Calderon, Scott Cambo, Christopher Lazarus, Raphael Stern

Mexico - Presidencia de la Republica - Reducing Maternal Mortality Rates in Mexico

Maternal deaths in Mexico from pregnancy, childbirth, or postpartum complications have decreased from 89 per 100,000 live births in 1990 to 43 in 2011. Despite this improvement, the rate of decline has slowed significantly, and Mexico is not on track to achieve its Millennium Development Goal of reducing maternal mortality 75% by 2015.
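The gap can be worked out directly from the figures above. The calculation below assumes the 75% reduction is measured against the 1990 rate quoted here; the variable names are illustrative.

```python
# Figures quoted in the project description (deaths per 100,000 live births).
baseline_1990 = 89
observed_2011 = 43
target_reduction = 0.75  # assumed to be relative to the 1990 baseline

# Rate that would satisfy the goal, and the reduction achieved by 2011.
target_2015 = baseline_1990 * (1 - target_reduction)
achieved_reduction = 1 - observed_2011 / baseline_1990
remaining_gap = observed_2011 - target_2015

print(f"Target rate for 2015: {target_2015:.2f}")
print(f"Reduction achieved by 2011: {achieved_reduction:.1%}")
print(f"Remaining gap: {remaining_gap:.2f} deaths per 100,000 live births")
```

On these numbers the target is about 22 deaths per 100,000 live births, so a little over half of the required reduction had been achieved by 2011.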

Our goal is to identify those factors contributing to maternal mortality and determine what can be done to reduce it. In contrast to previous work, we will be analyzing trends at a more granular level. While our initial work will focus on municipalities and localities, our hope is to develop individual-level models of risk using all available data.

Mentor: Ben Yuhas

Fellows: Julius Adebayo, Nick Eng, Eric Potash, Layla Pournajaf

Blog post: overview of problem and solution

Conservation International - Using Sensor Data to Inform and Evaluate Environmental Initiatives

Conservation International (CI) is a non-profit organization that works to protect nature through scientific research and partnerships with communities, industry, and governments. A key part of evaluating the impact of conservation projects is accounting for natural capital – ecosystem goods and services, such as fresh water, flood control, agriculture, and forest products.

The Tropical Ecology, Assessment and Monitoring (TEAM) Network, originally created by CI, is now a partnership among CI, the Smithsonian Institution, and the Wildlife Conservation Society. TEAM’s global network of scientists is collecting and distributing near-real-time data on trends in biodiversity, climate, land cover change and ecosystem services. In many sites, TEAM collects sensor data to understand the status of tropical ecosystems in terms of meteorological variables, vegetation, and wildlife in the tropical forest. When triggered by motion, camera traps deployed throughout these sites take a photograph and record the temperature.

To help TEAM get the most information from this sensor network, we will develop an algorithm for interpolating and extrapolating camera trap data to generate micro-climate information for protected sites. TEAM will be able to use the algorithm to better understand how micro-climate changes are related to the patterns of movement of vertebrate species that have a very narrow temperature window of comfort. The project will help CI and other environmental organizations understand how this high-resolution, spatiotemporal data can assist current and future monitoring initiatives.
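As an illustration of what spatial interpolation of camera trap temperatures can look like, the sketch below uses inverse distance weighting, a common baseline technique. It is not the algorithm the project will develop: it ignores time entirely, and the coordinates, temperatures, and function name are made up for the example.

```python
import math


def idw_temperature(readings, x, y, power=2.0):
    """Estimate temperature at (x, y) from (x_i, y_i, temp_i) readings
    using inverse distance weighting: nearer traps count for more."""
    weighted_sum = 0.0
    weight_total = 0.0
    for rx, ry, temp in readings:
        distance = math.hypot(x - rx, y - ry)
        if distance == 0.0:
            return temp  # query point coincides with a camera trap
        weight = 1.0 / distance ** power
        weighted_sum += weight * temp
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("at least one reading is required")
    return weighted_sum / weight_total


# Made-up readings: (x_km, y_km, temperature_celsius)
readings = [(0.0, 0.0, 24.0), (2.0, 0.0, 26.0), (0.0, 2.0, 22.0)]
estimate = idw_temperature(readings, 1.0, 0.0)
print(f"Estimated temperature at (1.0, 0.0): {estimate:.2f} C")
```

The query point sits midway between the first two traps, so the estimate lands near their average and is pulled slightly down by the cooler, more distant third trap.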

Mentor: Varun Chandola

Fellows: Nadya Calderon, Scott Cambo, Christopher Lazarus, Raphael Stern