Will Cukierski and his fellow data scientists at Kaggle take the concept of crowdsourcing very seriously. In fact, they’ll hold a contest for your company, complete with cash prizes, to prove its effectiveness.
And many global organizations are signing on. In the two years since Kaggle began offering its service, at least 40 companies, governments and researchers have ‘crowdsourced’ their toughest problems to Kaggle’s network of experts. The solution-seekers get answers, and the problem-solvers get prize money for the winning model. Both sides seem to like the arrangement.
Cukierski, himself a data scientist with a degree in physics from Cornell University and a Ph.D. in biomedical engineering from Rutgers, recently told a group of MIT CDB faculty and guests about Kaggle and “how organizations are using predictive modeling competitions to surpass their best efforts.” His presentation was titled “Crowdsourcing Predictive Analytics: Using 50,000 Heads Without Losing Yours.”
To be sure, Kaggle is not American Idol, nor is it aimed at just anyone looking for a part-time job. It does, however, adhere to tech executive and venture capitalist Bill Joy’s adage that the best minds may not be working at your company. Kaggle’s community is made up of thousands of PhDs from quantitative fields such as computer science, statistics, econometrics, math and physics. “They come from over 100 countries and 200 universities. In addition to the prize money and data, they use Kaggle to meet, network and collaborate with experts from related fields.”
According to its Web site, “most organizations don’t have access to the advanced machine learning and statistical techniques that would allow them to extract maximum value from their data. Meanwhile, data scientists crave real-world data to develop and refine their techniques.” Kaggle hopes to bridge that gap with its own “flavor” of crowdsourcing, as noted below:
Cukierski said that the contests are not intended to replace domain expertise, but to tap into resources not otherwise available. To me, it’s interesting to compare this discussion to another recent seminar presented by Jonathan Zittrain at MIT CDB. In his presentation, “Minds for Sale,” Zittrain noted that while it may be quite acceptable, even “hyper-efficient marketing,” to pay people to solve everyday product problems, intellectual property (IP) and security concerns have to be addressed as well.
At Kaggle, IP ownership and contest rules vary with the sponsor, Cukierski said. Some, like banks, want to keep the results proprietary, while others are more open and transparent about the solutions they get. Kaggle also hosts private competitions where contributors must be invited to participate and are placed under nondisclosure agreements. I can see those options becoming even more popular given regulatory constraints and other restrictions that many businesses and public-sector agencies must address.
Here are some of the results Kaggle has found with predictive models so far:
What are your thoughts about this crowdsourcing model? Is it one you would consider as a research organization? As a data scientist? Share your ideas with this community.