2013 Projects

During the program, our fellows work on data science projects in health, education, energy, transportation, and more.

Projects for the 2014 program will be announced soon. We are busy working with our project partners to define and prepare for the projects.

Our program is intensive and hands-on. Fellows work directly with nonprofits, local governments, and federal agencies, learning about what they do and tackling data problems they face.

We're strong believers in open knowledge and open source. We blog extensively about our projects, and make much of our research and code open to everyone.

City of Chicago, Chapin Hall at the University of Chicago - Predictive Analytics for Smarter City Services

From transportation to potholes to graffiti to abandoned buildings, city governments are tasked with innumerable responsibilities to keep their residents moving and their neighborhoods safe and thriving. But cities are vast places - Chicago alone has around 28,000 city blocks - where multifaceted relationships are the norm.

To operate effectively on this scale, municipalities have always relied on city residents to point out where the problems are. Historically, problem reporting was informal and decentralized - a face-to-face meeting or a call to an official.

But in 1999, Chicago changed all that by adopting a comprehensive 311 system that collects information centrally and electronically: residents call a single number to report broken streetlights, unsanitary restaurants, and dozens of other non-emergency problems. As of last year, residents can now submit reports online.

This system streamlined how government receives and responds to issues. And it created a wealth of data about the location and type of problems all across the city. That’s some Big Data.

Today, the City of Chicago is harnessing this data to make city services even smarter. By using predictive analytics, they’re beginning to react to problems faster - and to anticipate them before they happen.

The University of Chicago and Chapin Hall are excited to work with the City to build on this analytics work. Our fellows will analyze 311 data, predicting when and where graffiti, potholes, and other problems are likely to occur. Working with Chapin Hall at the University of Chicago, fellows will build analytics models that could eventually be used to prevent problems and deliver city services more proactively.

Report: in-depth, technical documentation Code: open source code on Github

City of Chicago Dept. of Transportation - Predicting when Divvy bike share stations will be empty or full

The City of Chicago just launched Divvy, a new bike share system designed to connect people to transit, and to make short one-way trips across town easy. Bike share is citywide bike rental - you can take a bike out at a station on one street corner and drop it off at another.

Popular in Europe and Asia, bike share has landed in the United States: Boston, DC, and New York launched systems in the past few years, and San Francisco and Seattle are next.

These systems share a central flaw, however: because of commuting patterns, bikes tend to pile up downtown in the morning and on the outskirts in the afternoon. This imbalance can make using bike share difficult, because people can’t take out bikes from empty stations or finish their rides at full stations.

To prevent this problem, bike share operators drive trucks around to reallocate bikes from full stations to empty ones. But they can only see the current number of bikes at each station - not how many will be there in an hour or two.

We’re working with the City of Chicago’s Department of Transportation to change this: by analyzing weather and bike share station trends, we’ll predict how many bikes are likely to be at each Divvy station in the future.

By using predictive analytics, Divvy staff will be able to rebalance bikes proactively across the system, ensuring there’s always a bike there when you need it - and making bike share a first-class mode of transportation in Chicago and beyond.
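As a rough illustration of the forecasting idea described above, the sketch below averages historical bike counts by station, hour of day, and a simple wet/dry weather flag, then uses those averages as a naive prediction. The record layout, the function names, and the sample numbers are all assumptions invented for illustration; this is not the model the project built.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical historical snapshots: (station, hour_of_day, is_raining, bikes_available).
# These values are invented for illustration only.
HISTORY = [
    ("State & Lake", 8, False, 14),
    ("State & Lake", 8, False, 12),
    ("State & Lake", 8, True, 9),
    ("State & Lake", 17, False, 2),
    ("State & Lake", 17, False, 4),
]


def build_baseline(history):
    """Average the observed bike counts for each (station, hour, weather) bucket."""
    buckets = defaultdict(list)
    for station, hour, is_raining, bikes in history:
        buckets[(station, hour, is_raining)].append(bikes)
    return {key: mean(values) for key, values in buckets.items()}


def predict_bikes(baseline, station, hour, is_raining):
    """Return the historical average for the bucket, or None if it was never observed."""
    return baseline.get((station, hour, is_raining))


baseline = build_baseline(HISTORY)
```

A real forecast would likely need richer weather features and a proper statistical model, but a bucketed average like this is a common baseline to compare against.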

Blog post: overview of problem and solution Report: in-depth, technical documentation Code: open source code on Github

Chicago Transit Authority - Simulating better bus service

The Chicago Transit Authority (CTA) runs the nation’s second largest transit system. Its buses and trains move a ton of people - around 1.6 million trips are taken each day. CTA also gathers lots of data: it knows where buses and trains are in real time and how many people get on and off at each bus stop.

We’re building transit planning tools that help CTA better predict the impact of a service change on a route - and all connecting routes - before deploying a single vehicle.

We’ll use CTA’s bus GPS and passenger count data to simulate future demand at every stop in Chicago, and predict how well transit service is likely to perform under a particular schedule change. We’ll also map how different schedules affect Chicagoans’ ability to get around. The goal is to use cutting-edge simulation to enhance the CTA’s ability to plan bus and rail service.
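To give a flavor of what simulating a schedule change can mean, here is a minimal sketch for a single stop. It assumes passengers arrive at random with a fixed average rate, that buses come exactly on a fixed headway, and that every bus has room for everyone waiting. The function name, parameters, and numbers are hypothetical, and the project's actual simulation is far more detailed.

```python
import random


def simulate_mean_wait(arrivals_per_minute, headway_minutes, service_minutes=600, seed=0):
    """Simulate random passenger arrivals at one stop and return their mean wait in minutes.

    Buses are assumed to arrive exactly every `headway_minutes` and to have room
    for everyone waiting, which is a strong simplification.
    """
    rng = random.Random(seed)
    waits = []
    t = rng.expovariate(arrivals_per_minute)  # time of the first passenger arrival
    while t < service_minutes:
        # The next bus comes at the first multiple of the headway at or after time t.
        next_bus = -(-t // headway_minutes) * headway_minutes
        waits.append(next_bus - t)
        t += rng.expovariate(arrivals_per_minute)
    return sum(waits) / len(waits) if waits else 0.0


# Compare two hypothetical schedules for the same stop and the same arrivals.
wait_10 = simulate_mean_wait(arrivals_per_minute=2.0, headway_minutes=10)
wait_15 = simulate_mean_wait(arrivals_per_minute=2.0, headway_minutes=15)
```

Using the same seed for both runs means both schedules face identical passenger arrivals, so the difference in mean wait reflects only the headway change.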

Blog post: overview of problem and solution Report: in-depth, technical documentation Code: open source code on Github
Public Safety

Chicago Police Department, University of Chicago Crime Lab - Predictive analytics of crime

Chicago’s Police Department helped pioneer predictive analytics in policing during the early 2000s. Now they are looking to take data-driven police work to a new level.

Fellows will work with the CPD and the University of Chicago’s Crime Lab to analyze the agency’s extensive crime data and develop new algorithms that can detect emerging crime problems. At the end of the program, the CPD should have improved daily crime forecasts they can use to deploy police officers more effectively across the city.

Crime Lab works with local governments to experiment with new crime-fighting programs and measure whether they work.

Report: in-depth, technical documentation Code: open source code on Github
Economic Development

Cook County Land Bank, Institute for Housing Studies - Abandoned property analytics tool

The Great Recession flooded Chicago with foreclosures, deepening the blight in many of the city’s economically distressed neighborhoods. Nearly 10% of Cook County’s housing units are vacant, according to the last Census.

To turn the tide in these communities, County Commissioner Bridget Gainer and President Toni Preckwinkle championed the creation of the Cook County Land Bank, which was passed by the Cook County Commission in January 2013. The land bank will return vacant and foreclosed properties to productive use - either as rental units, owner-occupied housing, or open space.

In partnership with the Institute for Housing Studies at DePaul University, we’re building a cutting-edge, open source analytics tool to help the land bank make informed policy decisions about which properties to acquire and redevelop. To do that, we’ll analyze Cook County’s real estate market and assemble data about property sales, foreclosures, building inspections, zoning, and much more.

Blog post: overview of problem and solution Report: in-depth, technical documentation Code: open source code on Github

Mesa Public Schools - Getting kids into college

With over 63,000 students, Mesa Public Schools is the largest school district in Arizona. It serves the City of Mesa, a suburb of Phoenix. Around 75% of Mesa students who start ninth grade end up graduating high school, right in line with the national graduation rate.

Although 95% of these graduates intend to go to college or other training, only 68% attend within two years of graduation. Six years later, only 26-28% of graduates have a two- or four-year degree. That’s a lot of dreams deferred or denied.

Many of these students are ready for college but simply don’t apply. Or, they get into college but don’t enroll, or enroll but do not complete a degree. Others are not prepared for college. We know, however, that all of them need post-secondary training for their future success.

We’re going to analyze Mesa’s education data - including students’ classes, grades, test scores, college attendance, and more - in order to identify these college-ready candidates.

We’ll also identify students who are aiming too low - those who enroll in a two-year degree program when they could be attending a four-year university - and graduates who show up to college on the first day but never collect their degree.

The goal is to empower Mesa Public Schools to identify students likely to get off-track and to target these individual students with the support they need to enroll in college, graduate and embark on a fulfilling career.

Report: in-depth, technical documentation Code: open source code on Github

Lawrence Berkeley National Laboratory, Agentis Energy - Predicting building energy savings

Energy efficiency is supposed to be the low-hanging fruit of clean energy. But few people are investing in building energy retrofits, because the potential energy savings vary wildly by building, so the return on investment of fixing up a property is highly uncertain.

The Lawrence Berkeley National Laboratory - a scientific research facility funded by the Department of Energy - wants to use data to help businesses and homeowners understand how much less energy their buildings could be using with the right modifications.

Fellows will analyze energy data from Agentis Energy on thousands of buildings across the United States, using Berkeley Lab’s building fingerprint tool to predict future energy savings for different kinds of buildings. The goal is to make it possible for private investors to fund energy efficiency projects at scale.

Blog post: overview of problem and solution Report: in-depth, technical documentation Code: open source code on Github

Environmental Defense Fund - Predicting building energy loan performance

Besides the difficulty of predicting the energy savings that are likely to result from a building retrofit, there’s another key obstacle to large-scale investment in building energy efficiency: investors lack clear data on past retrofit loans. As a result, they have a hard time predicting whether a building owner is likely to pay back a loan, and are less likely to invest in viable projects.

We’re partnering with the Environmental Defense Fund’s Investor Confidence Project and the Clean Energy Finance Center to help the financial industry make better energy efficiency loans. Since energy efficiency investing is in the early stages, there is limited historical energy loan data, and what does exist has never been brought together.

Fellows will collect loan and energy performance data from leading energy efficiency finance programs across the country. They’ll analyze this data to understand how a loan’s underwriting criteria and a building’s energy savings affect the loan’s performance.

Report: in-depth, technical documentation Code: open source code on Github

NorthShore University HealthSystem - Using electronic medical record data to predict better health

Electronic medical records (EMR) promise to transform our understanding of patients’ ailments and improve their care. NorthShore University HealthSystem in suburban Chicago has been a national leader in the implementation of EMR systems for the past decade. It is the first healthcare provider to be awarded the highest level of EMR deployment for both inpatient and outpatient care. This remarkable effort has generated a large body of anonymized data that is available for innovative analytics research.

We’ll work with NorthShore scientists to tackle up to three distinct medical problems:

  1. Childhood obesity: Growth charts are percentile curves that illustrate how kids’ height and weight change during childhood. Surprisingly, these growth curves are one-size-fits-all: there’s just one version for each gender. We will build personalized growth curves, allowing doctors to detect childhood obesity earlier and intervene sooner.

  2. Code blue: When a patient goes into cardiac arrest, medical staff issue a code blue alert. Staff stop what they’re doing and attend to the patient, and yet fewer than 80% of victims survive. We are working on predicting these medical crises before they happen, allowing doctors and nurses to intervene before patients go into cardiac arrest.

Report: in-depth, technical documentation Code: open source code on Github

Nurse-Family Partnership - Tracking the impact of early childhood health programs

When a young woman becomes pregnant before she’s ready, the risk factors for her and her baby escalate. Too often, the result is poverty, instability, and despair.

Nurse-Family Partnership intervenes by pairing specially trained nurses with low-income families. Expectant mothers receive nurse home visits from pregnancy until the baby is two years old. The result: more successful pregnancies, more stable families, and healthier kids who do better in school and life.

NFP’s approach is based on successful randomized controlled trials. They’ve been around for many years and have programs throughout the country. But as federal health care reform invests millions of dollars into home nurse visits, they’re being asked to quantify their recent impact on child health and family stability.

We will measure their effectiveness by analyzing three years of data on NFP’s interventions and family outcomes.

Blog post: overview of problem and solution Report: in-depth, technical documentation Code: open source code on Github

Case Foundation - The Giving Graph - grassroots philanthropy meets social networks

Every time you log on to Facebook, Spotify, or Amazon, these tech companies are building a network of who you know and what you like. They use this “social graph” to connect you to new people and products. Retailers and marketers are beginning to use it to target ads. What if the social sector could use this graph to connect people to the causes they are passionate about?

The Case Foundation’s Giving Graph would be a new layer on the websites you already use that would make giving, volunteering, and advocating for causes a seamless part of your online social life. The idea is to give organizations working for good the same edge as companies trying to sell you stuff.

To test this idea, we’re building a proof of concept system for the Case Foundation, a philanthropy that promotes everyday giving and civic engagement.

Report: in-depth, technical documentation Code: open source code on Github
Disaster Response

Ushahidi - Smarter crowdsourcing for crisis maps

During natural disasters, social upheavals, and contested elections, it can be hard to know what’s going on. Information becomes scarce due to damaged infrastructure, popular unrest, or a silenced official media.

Tech nonprofit Ushahidi is tackling this problem by harnessing the power of the crowd. They amass field reports from SMS and social media, and map them to give governments, aid agencies, election monitors, and journalists a real-time picture of what’s happening on the ground.

Before a report can be mapped, volunteers must assign a category (e.g., “need food”) and location (“at State St and Lake St”) to it. As the number of reports grows, volunteers are spending too much time doing this manual processing, distracting them from more critical tasks like responding to the messages or vetting their accuracy.

We’re using machine learning to build a smarter review system that learns from volunteers as they categorize reports. Our tool will speed up data processing during emergency situations, reduce volunteer burnout, and empower governments, election monitors, and other responders to spot and address emerging problems more quickly and efficiently.
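The idea of a review system that learns from volunteers can be sketched with a tiny naive Bayes text classifier that updates every time a volunteer labels a report. This is only an illustrative assumption about how such a tool might work; the class name, the example reports, and the choice of naive Bayes are ours, not a description of the project's actual system.

```python
import math
from collections import Counter, defaultdict


class ReportCategorizer:
    """Tiny multinomial naive Bayes that updates each time a volunteer labels a report."""

    def __init__(self):
        self.category_counts = Counter()         # labeled reports seen per category
        self.word_counts = defaultdict(Counter)  # word frequencies per category
        self.vocabulary = set()

    def learn(self, text, category):
        """Record one volunteer-labeled report."""
        words = text.lower().split()
        self.category_counts[category] += 1
        self.word_counts[category].update(words)
        self.vocabulary.update(words)

    def suggest(self, text):
        """Return the most likely category for a new report, or None if nothing was learned."""
        if not self.category_counts:
            return None
        words = text.lower().split()
        total_reports = sum(self.category_counts.values())
        best_category, best_score = None, -math.inf
        for category, count in self.category_counts.items():
            total_words = sum(self.word_counts[category].values())
            score = math.log(count / total_reports)
            for word in words:
                # Add-one smoothing so unseen words do not rule out a category.
                numerator = self.word_counts[category][word] + 1
                score += math.log(numerator / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_category, best_score = category, score
        return best_category


# Invented example reports, labeled the way a volunteer might label them.
categorizer = ReportCategorizer()
categorizer.learn("we need food and clean water", "need food")
categorizer.learn("no food left in the shelter", "need food")
categorizer.learn("bridge collapsed road blocked", "infrastructure damage")
categorizer.learn("power lines down road flooded", "infrastructure damage")
```

In a workflow like the one described, `suggest` would pre-fill a category for each incoming report, and each volunteer confirmation or correction would be passed back through `learn`.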

Blog post: overview of problem and solution Report: in-depth, technical documentation Code: open source code on Github

Qatar Computing Research Institute - Measuring disaster damage with tweets

In our connected world, information about natural disasters and other emergencies often appears on social media faster than on traditional channels. But gleaning useful knowledge from this flood of tweets and status updates is tough. We need better methods to find the needles in this haystack of unstructured text.

This experimental project aims to answer a difficult question: how well can social media be used to detect the damage wrought by natural disasters?

We’ll attempt to correlate tweets about infrastructure damage or human casualties during Hurricane Sandy (and other recent disasters) with hard data about where and when destruction actually took place. The goal is to explore to what extent social media might allow earlier and cheaper damage assessment, and faster and better targeted disaster response.
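One simple way to start on a question like this is to compute a correlation coefficient between tweet volume and recorded damage across areas. The sketch below does that with made-up numbers; the variable names and values are hypothetical, and a real analysis would need to account for population, timing, and who is tweeting.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return covariance / (spread_x * spread_y)


# Invented numbers for five hypothetical areas; not real Hurricane Sandy data.
damage_tweets_per_area = [120, 45, 300, 10, 80]
damage_reports_per_area = [30, 12, 70, 5, 25]

r = pearson(damage_tweets_per_area, damage_reports_per_area)
```

A value of `r` near 1 would suggest that areas with more damage-related tweets also tend to have more recorded damage, while a value near 0 would suggest little linear relationship.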

This work is part of the Qatar Computing Research Institute’s Artificial Intelligence for Disaster Response project, which will help the United Nations, the Red Cross, and other emergency responders glean actionable information from social media. We’re collaborating with QCRI’s Social Innovation initiative, whose mission is to tackle large-scale computing challenges that have positive social impact in the Gulf region and beyond.

Report: in-depth, technical documentation Code: open source code on Github