Building Consensus in Large Collaborations, Organizations, and Cooperatives through Deliberation

As groups grow in size, participatory decision-making becomes difficult. Can online platforms make it easier for large groups to deliberate and reach consensus?

My PhD research at the University of Michigan School of Information, supervised by Daniel Romero, focuses on large-scale decision-making in collective action. Groups like Wikipedia, free software projects, worker cooperatives, and social movements all rely on contributions of time and energy from a large number of people. Success often depends on how well contributors can reach decisions and work together. While reviewing the literature, I identified some common themes that set successful collaborations apart from the rest. Now I'm working on translating those themes into theory-driven tools and best practices, and I'm looking for groups to partner with as we co-design tools and run large-scale experiments. If you're part of a project or organization with an interest in participatory decision-making, send me an email at

For the interested, here's a little more background on my dissertation research.

Why Deliberation Matters

In a group setting, the back-and-forth of deliberation is not just a process of persuasion, but also one of information diffusion, idea generation, and trust-building. But deliberation is much easier with 3 people than with 300. When an organization grows too large, it might switch to voting and lose many of the benefits of deliberation in the process. At its best, voting can find an existing consensus, but it can't generate new ideas or build the trust needed for compromise. To make matters worse, there are mathematical results showing that no voting method can perfectly capture a group's preferences: the Condorcet Paradox shows that majority preferences can form a cycle, and Arrow's Impossibility Theorem shows that when there are three or more options, no ranked voting method can satisfy a small set of reasonable fairness criteria all at once.

My work focuses on deliberation because it offers the potential to generate new solutions and actively build consensus before votes are cast and before a decision is made. And when large groups can deliberate at scale, they can make full use of every contributor's resources to accomplish colossal tasks.
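The Condorcet Paradox is easy to check directly. In this minimal Python sketch, three hypothetical voters rank three hypothetical options, A, B, and C. Every pairwise contest has a clear majority winner, yet those majorities form a cycle, so no option beats all of the others.

```python
from itertools import permutations

# Three hypothetical voters, each ranking options from most to least preferred.
voters = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]


def majority_prefers(ballots, x, y):
    """True if more than half of the ballots rank x above y."""
    votes_for_x = sum(b.index(x) < b.index(y) for b in ballots)
    return votes_for_x > len(ballots) / 2


# Check every ordered pair of options and report the majority winners.
for x, y in permutations("ABC", 2):
    if majority_prefers(voters, x, y):
        print(f"{x} beats {y}")
# A beats B
# B beats C
# C beats A
```

Each pairwise result is a 2-to-1 majority, but the group has no consistent first choice: whichever option is declared the winner, a majority preferred some other option to it.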

Interlocking Small Groups

Across the examples of large-scale collaboration I've studied, one theme stands out above the others: most effective work and discussion happens in small groups. While Wikipedia has millions of editors, each article has only a handful of active contributors at any one time. Likewise, free software projects like the Linux kernel and the Drupal content management system are built from modules, with a small (or at least smaller) community of collaborators focused on each module. In social movement organizations, complex tasks are often delegated to groups with a range of names: committees, working groups, teams, circles, zones, or nodes.

While small groups can make coordination easier, they can also create silos and echo chambers, trapping good ideas and preventing them from being adopted. But in all of the examples above, contributors participate not just in one small group, but in many. These interlocking groups appear to be key to allowing information, ideas, and resources to flow from group to group across the entire organization.

This interlocking group structure seems important, but open questions remain. In particular, does it matter how contributors are assigned to groups? Groups can be designed so that ideas spread quickly across an entire organization, or so that they spend time developing slowly within a cluster of groups. The experiments we're designing are intended to evaluate these designs in a real-world setting. With a better understanding of the implications of interlocking group structure, we hope to make it easier for large organizations to use participatory decision-making.

Work So Far

In December, we invited 18 participants into our lab to deliberate on a policy for electric scooters on campus. They were presented with four options:

  1. Electric scooters should be banned from roads and sidewalks.
  2. Electric scooters should be permitted only on roads.
  3. Electric scooters should be permitted only on sidewalks.
  4. Electric scooters should be permitted on both roads and sidewalks.

Participants were randomly divided into groups of 4 or 5 and deliberated on the policy in an online chat room for 10 minutes. They were then reassigned to new groups for another round of deliberation, for three rounds in all. We asked participants to rank the options before and after each round.
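The post doesn't specify exactly how participants were reassigned between rounds. As a minimal sketch, assuming an independent random shuffle each round, a schedule like this one could be generated in Python. Dealing 18 participants round-robin into 4 groups yields two groups of 5 and two of 4, consistent with the four groups per round in the results table. The helper names (`split_into_groups`, `assign_rounds`, `contacts`) are hypothetical. `contacts` counts how many distinct people each participant has shared a group with, which is one rough way to see how interlocking groups connect people across rounds.

```python
import random


def split_into_groups(participants, num_groups, rng):
    """Shuffle participants and deal them round-robin into num_groups groups.

    Dealing round-robin keeps group sizes within one of each other,
    so 18 people in 4 groups gives sizes 5, 5, 4, 4.
    """
    shuffled = list(participants)
    rng.shuffle(shuffled)
    return [shuffled[i::num_groups] for i in range(num_groups)]


def assign_rounds(participants, num_groups, num_rounds, seed=None):
    """Hypothetical schedule: an independent random regrouping for each round."""
    rng = random.Random(seed)
    return [split_into_groups(participants, num_groups, rng) for _ in range(num_rounds)]


def contacts(schedule):
    """Map each participant to the set of people they shared a group with in any round."""
    met = {}
    for groups in schedule:
        for group in groups:
            for person in group:
                met.setdefault(person, set()).update(p for p in group if p != person)
    return met


schedule = assign_rounds(range(18), num_groups=4, num_rounds=3, seed=42)
for round_number, groups in enumerate(schedule, start=1):
    print(f"Round {round_number}:", [sorted(g) for g in groups])

met = contacts(schedule)
print("Fewest distinct contacts:", min(len(s) for s in met.values()))
```

Because each group has at least 4 members, every participant meets at least 3 others per round and at most 12 distinct people over three rounds. A deliberately structured schedule, rather than a random one, could push that number up or down, which is the kind of design question the experiments are meant to test.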

For each group in each round, we calculated which option would win a vote, using both majority vote and two ranked-choice voting methods (Borda count and Tideman ranked pairs). The results are shown below.

 Group   Round   Majority   Borda   Tideman  
 ------- ------- ---------- ------- --------- 
    1       1          3       3         3  
    2       1      1,3,4   2,3,4         4  
    3       1          3       3         3  
    4       1          3       3         3  
 ------- ------- ---------- ------- ---------
    5       2          4       3         4  
    6       2          3       3         3  
    7       2          3       3         3  
    8       2          3       3         3  
 ------- ------- ---------- ------- ---------
    9       3        1,3       3         1  
    10      3          3       3         3  
    11      3          3       3         3  
    12      3          3       3         3

While option #3 was the clear winner in most cases, other winners or ties appear for some rounds and methods. Option #4 appeared as a winner or tie in one group in each of the first two rounds, but in none in the third. Option #1 appeared in one group in the first round and one in the third. Even in this simple example, we can see #4 losing popularity over the course of the deliberation.
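To make the three counting rules concrete, here is a minimal Python sketch. It assumes every ballot ranks all four options, and it assumes "majority vote" in the table means counting each participant's first choice (a plurality count), which would explain ties like "1,3,4". The ranked pairs function orders pairwise victories by margin and keeps equal margins in discovery order; that is a simplification, since Tideman's method specifies its own tie-breaking procedure. The function names and ballots are invented for illustration and are not data from the study.

```python
# A ballot lists options from most to least preferred, e.g. [3, 2, 4, 1].
# Assumption: every ballot ranks every option.


def plurality_winners(ballots):
    """Options with the most first-choice votes; ties are all returned."""
    counts = {}
    for ballot in ballots:
        counts[ballot[0]] = counts.get(ballot[0], 0) + 1
    best = max(counts.values())
    return sorted(o for o, c in counts.items() if c == best)


def borda_winners(ballots):
    """With n options, a ballot gives n-1 points to its first choice, down to 0 for its last."""
    scores = {}
    for ballot in ballots:
        n = len(ballot)
        for position, option in enumerate(ballot):
            scores[option] = scores.get(option, 0) + (n - 1 - position)
    best = max(scores.values())
    return sorted(o for o, s in scores.items() if s == best)


def ranked_pairs_winners(ballots):
    """Lock in pairwise victories from largest margin down, skipping any that would form a cycle."""
    options = sorted(ballots[0])
    # wins[(a, b)] = number of voters who rank a above b
    wins = {(a, b): 0 for a in options for b in options if a != b}
    for ballot in ballots:
        rank = {option: i for i, option in enumerate(ballot)}
        for a, b in wins:
            if rank[a] < rank[b]:
                wins[(a, b)] += 1

    victories = [(a, b) for (a, b) in wins if wins[(a, b)] > wins[(b, a)]]
    # Simplification: sort by margin; equal margins keep discovery order.
    victories.sort(key=lambda ab: wins[ab] - wins[(ab[1], ab[0])], reverse=True)

    locked = {o: set() for o in options}  # locked[a] holds every b that a is locked above

    def reaches(start, target):
        """Depth-first search along locked edges."""
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(locked[node])
        return False

    for a, b in victories:
        if not reaches(b, a):  # locking a -> b would close a cycle if b already reaches a
            locked[a].add(b)

    # Winners are options that no locked victory points to.
    beaten = {b for a in options for b in locked[a]}
    return [o for o in options if o not in beaten]


# Invented ballots for one five-person group (not data from the study).
example_ballots = [
    [4, 3, 2, 1],
    [4, 3, 1, 2],
    [3, 2, 1, 4],
    [2, 3, 1, 4],
    [1, 3, 2, 4],
]

print(plurality_winners(example_ballots))     # [4]
print(borda_winners(example_ballots))         # [3]
print(ranked_pairs_winners(example_ballots))  # [3]
```

With these invented ballots, option 4 wins the first-choice count, while option 3 wins under both Borda and ranked pairs, because almost everyone ranks 3 highly even when it isn't their top choice. That is the same kind of disagreement between methods that the table shows for a few groups.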
Looking at the chat logs, we saw evidence of some notable behaviors, including novel ideas:

In the last group, we prioritized bike lanes as the ideal place for scooters, and then sidewalks if no bike lane existed

… requests for more information:

I think there's a lot of missing information like safety and environmental impact that we don't have

… diffusion of ideas:

the most popular opinion in my previous chats have been the usage of electric scooters ultimately helps the environment

Oh interesting, never thought about the environment

… and opinions changing (or not):

I changed from both to only sidewalks at the first round.

The discussion didn't change my opinion much. The groups did bring up interesting points that I didn't think of before.

This study shows that deliberation in small interlocking groups has the potential to spread information and change opinions. We've also seen that we can use studies like this to measure changes in consensus quantitatively and corroborate those results with qualitative analysis of group discussions. Now, we want to apply these same methods to study real large-scale collaborations and conduct controlled experiments.

Next Steps

Currently, I'm looking for organizations (formal or informal) to partner with for the next stage of this research. Partners will use our tools to deliberate on real issues in their organizations and/or participate in co-design workshops to evaluate and improve the tools we're building. Interested? Say hi by sending an email to

Speaking of tools, we're currently extending the free/open-source decision-making suite Loomio to support networked deliberation. If you'd like to follow along, the repository is here.

Crossposted to Medium. Thanks to Daniel Romero and Sarah DeFlon for feedback on a draft of this post.

Liveblog: Edward Sanders, From Prison to Paralegal

This is a liveblog taken at the University of Michigan School of Information on September 19, 2018. Any mistakes are my own.

Edward Sanders
with Kentaro Toyama
Information Alliance for Community Development

Sanders entered prison in 1975, at age 17. Convicted of first degree murder as an accessory to murder and sentenced to life.

Sanders had mentioned to his grandfather that he was going to try to make everything he could out of his time in prison. Had a third grade education going in. Had to give up the luxury of television and radio, but saved to go to school. Started in remedial classes, then moved to GED programs. Immediately signed up for college through Macomb Community College and worked as a teacher's aide. He intentionally delayed graduation because he would have to stop taking classes after he graduated. Some prisons were more progressive than others; school was a way to get out of that environment for a while.

School wasn't intended for inmates sentenced to life. He would have to wait for someone with a lesser sentence to cancel in order to get into classes. Many inmates, including Sanders, wanted to learn more about how to defend themselves in courts.

Sanders joined the lifer law program or "lifers." It was run mostly by war veterans. Some used their GI payments to pay for attorneys to give lectures for inmates. Talked to peers who were writing their own legal documents. Many of them had been in prison multiple times due to addiction and had learned about the legal system.

Sanders looked forward to graduating with his Bachelor's degree, but had no hope of leaving to find a job. Instead he focused on helping his fellow inmates.

Kentaro asks how Sanders felt when he got out.

The first week was about getting back into society. UMich and WSU social work students worked with him. Three members of his "reentry" team brought his sisters to pick him up. The next couple days were spent getting ID, bridge card (SNAP), and registering to vote. Sanders stresses the importance of voting: "choose your own fate." The first day, his team took him out to lunch. Now he takes others out to lunch as a way to say thank you.

Sanders knew a former inmate who had gotten a law degree and passed the bar, but is not allowed to practice in Michigan. He looked at that situation to judge his own path forward.

Kentaro asks about digital technology.

In recent years, cell phones had started coming in. He says the best technology you could get in prison was a pen. They had typewriters and word processors, but they were taken away because they allowed the inmates to litigate too well. Inmates were given access to a law library after a difficult legal fight. Had computers, but no internet access. The law library was a safe haven for him.

In his first week out, his sister made him get an iPhone (now he uses an Android). He hasn't needed a dictionary since he got out of prison.

There were many challenges getting used to digital technology. Working with Kentaro Toyama and Finda Ogburu helped him overcome these challenges.

Sanders explains that the three worst institutions in society are slavery, war, and prison. They are traumatizing. There's an institutional bias built into prison. Guards don't like inmates to be better educated than they are. Inmates had to "dummy down." Draws parallel to war. In war, some remain behind to get education. Education helps soldiers re-integrate into society. Sanders argues that it can do the same with prison. It's not easy for inmates to get jobs after being released. Now, you need to use the internet to apply to jobs. Inmates aren't allowed to get that experience.

When applying at McDonald's, he made a mistake and didn't submit the online application. He went in for the interview and they did it anyway. If he had applied online, his application would have been discarded without a human seeing it. By the time the interviewer got to asking about his felony, he'd made such an impression that he got the job.

There is a battle for computer literacy. Inmates need it to be successful and reenter society. Anyone who has the ability to help can help reduce crime. He praises volunteers from universities who have helped him and other inmates. He explains that the real world experience also helps students gain a deeper understanding of information and improve their retention.

Audience Questions

Q: High school students are often preparing for a life among gangs. What would you tell them?

A: All of us can make decisions, but not all decisions are quality decisions. We need to learn to make quality decisions if we have the faculties to do so. He references the Koran: the worst thing is to have an ability and not use it.

Sanders ends by telling the audience to make a difference, even if we've made mistakes.

Liveblog: FCC Commissioner Jessica Rosenworcel @UMich

This is a liveblog taken on 17 September 2018. Apologies for any inaccuracies.

Jessica Rosenworcel, FCC Commissioner
Jack Bernard, Associate General Counsel, University of Michigan

Bernard: What is the FCC?

Rosenworcel: FCC oversees 1/6 of US economy.

Bernard: How does the FCC interact with the internet?

Rosenworcel: FCC authority lies with transmission: where there's a wire in the ground or a transmission in the air.

Bernard: Talked to campus community about net neutrality (NN). There is a wide range of views on what NN is. What is it?

Rosenworcel: Broadband providers have to treat traffic on their networks equally, so they do not discriminate based on source, destination, or content. You can go where you want and do what you want online, and your internet provider does not decide for you.

Bernard: What does equal mean?

Rosenworcel: Analogy to basic telephone network: you can call whoever you want. The telephone company can't tell you who to call or edit your conversation. Really talking about nondiscrimination.

Bernard: Different broadband providers provide different services and options. What are you really talking about?

Rosenworcel: Totally OK under net neutrality to choose how fast of a connection you pay for.

Bernard: How could the absence of NN allow broadband providers to undermine experience?

Rosenworcel: Since rollback of NN, broadband providers can block websites, throttle services, or censor content. Can approach entrepreneurs and charge them to access customers. Do they have technical capability? Yes: network management. Business incentive? Yes. When rights, capabilities, and incentives are aligned, behavior will emerge in the market.

Bernard: Broadband argues that internet was working fine. What is the need to pass regulation?

Rosenworcel: Competitive marketplaces are the best moderators of oversight. Broadband is not a competitive marketplace. You can't take your business elsewhere: there's nowhere else to go.

Bernard: Felt constrained by choices even with NN.

Rosenworcel: NN helps manage in absence of competition.

Bernard: What changed after rollback of NN?

Rosenworcel: NN says can't block, throttle, or censor. Rolled back over Rosenworcel's dissent.

Bernard: What was your experience as a dissenter?

Rosenworcel: People don't remember what you said, they remember how you said it. Have to make arguments in a principled way and repeat them again and again.

Bernard: How do you build collegiality?

Rosenworcel: Whatever disagreement we have is like a book on a shelf and we move onto the next volume. Always find something in common.

Bernard: Can you steel-man instead of straw man anti-NN position?

Rosenworcel: Want to give customers most options. Want to make sure there are financial incentives to support providers.

Bernard: We hear that NN is a barrier to investment and providing service.

Rosenworcel: We do have broadband challenges in rural US. Instead of having theoretical arguments about NN, would rather identify where the gaps in coverage are and plan how to fill them.

Bernard: Is it possible to advocate for industry incentives and NN?

Rosenworcel: Can do both at the same time. False choice to say it's one or the other.

Bernard: Can less than equitable service be better overall? Example: carpool lanes?

Rosenworcel: That assumes you have multiple lanes. We don't.

Bernard: If there was greater competition?

Rosenworcel: We could revisit.

Bernard: NN advocates argue that without NN there will be cartels causing content to be only available to certain providers. Is there evidence?

Rosenworcel: Yes, there is discussion that it's happening already. Even if content is only slowed down, people switch. That's an obstacle for entrepreneurship.

Bernard: Those in the room have internet access through academic network. Difficult to understand other perspectives.


Bernard: Americans now know the name of the FCC chair. What's that like?

Rosenworcel: Likes some anonymity, but it's good for the public. Collect public input. Have to figure out how to make issues accessible to allow a broader swath of Americans to participate.

Bernard: Did going against FCC cause a problem?

Rosenworcel: No one likes disagreement, but you have to stand up and do what's right.

Bernard: Pai said that California's proposal is flouting federal law.

Rosenworcel: The FCC is in a strange legal position regarding preemption. FCC argued they didn't have authority to regulate (Rosenworcel disagrees). But if FCC doesn't have authority to regulate, they don't have authority to preempt.

Bernard: What was Pai getting at? First amendment issue?

Rosenworcel: More about commerce clause.

Bernard: How would you summarize your stance?

Rosenworcel: You should be able to go where you want and do what you want without your provider choosing for you.

Bernard: Explain concerns over media consolidation?

Rosenworcel: Media has changed. There was a time when you got news from the morning newspaper and the evening news, and that was it. Modern news cycle is exhausting and stresses media company resources. Media companies have responded with consolidation. Rosenworcel understands, but favors competition. We have less local news now.

Bernard: Concern is that if too many outlets are owned by the same company there isn't enough diversity?

Rosenworcel: There used to be laws preventing, for example, one company from owning a radio station and a newspaper. Not any more. How do you maintain diverse viewpoints?

Bernard: What are the next steps? How should government step in?

Rosenworcel: National laws prevent companies from owning over a certain percentage of broadcast outlets. Most Americans still get their local news through TV and radio.

Bernard: Over the last year, the President has suggested revoking NBC's license. Purportedly this is a result of criticism of the administration. How realistic is that?

Rosenworcel: Trying to be diplomatic. A year ago, saw President's tweet, decided that so much was wrong with it, and tweeted a reply. The reply linked to the FCC manual on broadcast licensing. It's a story about what's to come: antagonism towards the news. Rosenworcel finds it troubling. Politicians criticising news is not new: Alien and Sedition Acts, Kennedy described the media as his enemy. What worries Rosenworcel is when the government uses its power to stop the media from reporting on abuses of that power.

Bernard: In the 2016 election, about 17% of people under 30 voted. People feel disenfranchised, not part of "we the people."

Rosenworcel: Doesn't have time for cynicism. Public servants have to be impatient optimists. It has in many ways never been easier to build a movement. Thinks as citizens, we need to use it. Nothing stopping everyone here from having a clear voice in Washington.

Audience Questions

Q: Last mile internet issue. 30 million Americans without reliable high-speed internet. What needs to be done to connect these people? Would gap exist if internet was treated more like a utility?

A: FCC estimates 24m without broadband, mostly rural. One thing we should do nationally is map where broadband is and is not. We need to make it a citizen science project and crowdsource how many bars we have.

Q: How does NN relate to privacy?

A: FCC's ability to regulate privacy online was taken away. Hopes to align privacy policy across sectors of the economy (website, broadband, etc.).

Q: Do connected devices change discussion around connected vehicles? When will FCC decide between 5G and dedicated short range communication service (DSRC)?

A: Speed of change has been unimaginable and exciting. 5G was from 1999, which is now "old." NTSB expects it will take years to be able to deploy DSRC. Need to figure out what we can do in the meantime with what we have.

Bernard: Should we be narrowing the spectrum of any industry?

A: Spectrum is zoning in the sky. Certain frequencies were for different purposes. Now auctioned off for flexible use. Now, everyone wants some. Experimenting with millimeter band technology. Really need to get more creative with sharing. We can't expand the physics, but we can be more efficient with existing spectrum. Public safety uses have to be primary.

Bernard: Would that mean throttling?

A: Different services require different expectations.

Q: What's the future of municipal broadband?

A: In about half the states in the US, the state has prohibited it. Rosenworcel finds this regrettable. People are being left behind and need as many solutions as they can get.

Q: Based on the infrastructure analogy, what makes a toll to pay for cybersecurity different from a road toll? Shouldn't providers be able to take a cut of income from pornographic websites?

A: Control needs to be in the hands of the customers.

Bernard: We already don't have a lot of control. Will anything really change without NN?

A: Universities are deploying apps to try to detect altered internet traffic. Important to measure.

Bernard: Claims there's little evidence companies will tamper with traffic.

A: Some enforcement shifted to FTC. They address harm after it occurs in court. Not accessible to small entrepreneurs.

Q: Does FCC regulate fake news? Should internet service providers be able to?

A: FCC? No. Town square is largely digital. A lot of authority offered to online platforms. Granting similar authority to service providers would compound the problem.

Q (twitter): How does NN relate to corporate mergers?

A: Incentive for service providers to privilege content owned by the same company.

Q: Could net neutrality be offered for a fee?

A: Possible after rollback of NN regulations.

Q: What are some ways to encourage broadband competition?

A: We need to identify every possible way. Some are mundane but consequential. "Dig once" policies allow multiple companies to lay fiber when a road is torn up, for only 1% in additional construction costs. Changes in policy and reducing bureaucracy to access utility poles. Biggest ways occur with technology change.

Q: How closely does the FCC work with network engineers?

A: Have an office. It's not big enough. Rosenworcel advocated for an engineering honors program to bring in young engineers from top universities. Need more onramps. Need more digital natives serving in government, who see opportunities in new technology.

Q: Is the internet a human right?

A: You do not have a fair shot at prosperity in the 21st century if you do not have access to the internet. Figuring out how to get more people connected at a high speed is crucial to civic future of the country.

What HOPE can do better

This year at HOPE, the biennial hacker conference in NYC, several incidents led to a statement of no-confidence in HOPE's code of conduct. You can read more in the statement itself, but the gist is that there were a small number of far-right attendees, including a self-described "nationalist", who were disrupting and intimidating both talks and attendees. The staff didn't handle it well. This is unfortunately consistent with my own experience. I know that the top-level organizers of HOPE genuinely care about creating an inclusive environment, but the current staff and procedures aren't working. Based on both my past experiences at HOPE and on my experiences creating inclusive spaces, here are some observations about where the process is failing and what can be done.

Trained Staff. CoCs need to be applied consistently to create an environment of trust and safety. Enforcement of the CoC at HOPE has been highly variable between staff members, with some seeming to think that it's optional. HOPE requires volunteer work from hundreds of people, but not all of them need to be trained to handle CoC complaints, only the points of contact, which leads to the next point...

Points of Contact. The CoC listed a phone number and an email address for making CoC complaints. If you are being physically threatened, the need is immediate and a phone call or email is not sufficient. Having a single person on call isn't sufficient either. The venue spans 18 floors of the Hotel Pennsylvania, where the line for the elevators takes about the same time as your flight to NYC. Anywhere there are attendees, there need to be identifiable, well-trained points of contact for immediate response to CoC complaints.

Intimidation. Over the past several events, the staff has shown a pattern of refusing to enforce the CoC until a physical assault has occurred, meaning that threats and intimidation have been treated as totally allowable. People who have been going to HOPE for a long time often describe it as feeling like "home." There's no way to have that feeling if you're being threatened or harassed. I understand the importance of free speech in the hacker ethos, but speech can be separated from intimidation. In the recent incident, one attendee bragged (on mic to an entire room) about attending an alt-right rally. That was intimidation, and should have been a sufficient CoC violation to get him removed from the event.

Validation. There has been much less attention to this point, but it's an important one: the effectiveness of a response to a complaint depends just as much on emotional connection as it does on the concrete outcome. Typically, when HOPE staff aren't able to act, they have minimized and invalidated the experience of complaining attendees. Unicorn Riot reported one staffer responding to a complaint by saying he wouldn't care even if an attendee came in with a swastika flag. Instead, just taking the time to understand and discuss a complaint can go a long way towards creating an environment of trust, even if no concrete action is possible.

I'm already looking forward to the next HOPE. A lot of attendees are questioning whether they want to attend again, but I believe in the organizers and in the majority of the community. There's a lot of work to do, but HOPE can and will fix this.

Social Capitalism

CW: refers to sexual assault

When high-profile individuals are outed for patterns of sexual assault, their friends and collaborators always seem to be completely surprised, unlike the other people who inevitably come forward as having been assaulted. Right now, the national dialogue is about Harvey Weinstein, but the same story seems to be repeating itself every few months. Closer to home, several high profile information security activists and researchers have recently been called out for sexual assault. Activists? Committing sexual assault?! The people who are supposed to spend their time and energy making the world better?!?! Yes, them. In fact, it happens so often, there's a book about it. So how is it that high-profile people can hide their abusive behavior from their friends, and convince the wider world that they're virtuous moral authorities? And how can it happen so very often? What if it’s not a coincidence? What if people make it to positions of power and visibility specifically because they are willing to take advantage of others? A certain type of high-visibility success is built on social capital, and the perverse incentives of financial capitalism apply just as well to social capitalism.

First, let me be clear about what I mean by “capitalism.” Capitalism is many things, but at its core, it’s a system where people who have resources (land, equipment, money) lend them to those who need them, and then take a cut of what they make. Capitalism is arranged so that the only way to be successful is to make someone wealthy even wealthier. The same is true for social capital. Social capital is, as the saying goes, “who you know.” Social capital is introductions to influential people. It’s access to insider information. It’s going out for drinks with the people who make decisions. And just like in financial capitalism, the way to be successful is to help those who already are, even if it’s at the expense of those who aren’t. This capitalist social dynamic holds even in the most progressive or anti-capitalist communities.

If you look at an industry like entertainment, you have powerful and influential people like Harvey Weinstein at the top, and a lot of folks competing for their favor. Let’s say you’re one of the folks in the middle, and you see someone powerful abusing that power. What do you do? Well, you could take a stand and call them out. In the best case scenario, they’re held accountable, but then you’ve lost any social capital you’ve built up with them. But even worse, so have your peers who were banking on that social capital, and a lot of them will blame you. And other powerful people are more likely to see you as a liability. Of course the powerless folks who were being taken advantage of will be appreciative, but they have nothing to offer you. Under social capitalism, those who speak up for the powerless fade into irrelevance, while those who turn a blind eye climb to the top. So is it really that surprising that the top is rotten with selfish amorality?

So what's to be done? It's tempting to "play the game" to try to change things once you're on top. Maybe it’s possible, but from where I stand, it looks like a long, hard journey that changes most people, and not for the better. So there's the catch-22. You can't change anything without power, and you can't get power without enabling the very behavior you're trying to change. As a sentient AI in an '80s hacker movie once said, "the only winning move is not to play." In my experience, there are people out there making a big difference by building alternative social systems rather than supporting corrupt ones; they just don't get a lot of press. It can be hard to tell the difference between someone who genuinely wants to do good, and someone who wants to be known as a person who does good. But if you want to spot a social capitalist, just look at how they treat people who don't have anything they want.

Re-Decentralizing the Web (Penguicon 2017)

"The internet has become increasingly centralized, with large corporations like Facebook, Google, Apple, and Amazon controlling the most-used services and placing all of us under surveillance. We’ll discuss FOSS tools that can help replace centralized corporate services with community-based alternatives." (slides)

Decentralized Organizing (Penguicon 2017)

"Groups like Occupy in the US and Podemos in Spain have created large-scale movements without a traditional, top-down leadership structure. But are they successful? (Spoiler: yes, just in different ways.) These “leaderless” -- or more accurately “leaderful” -- movements have been able to grow quickly and create large-scale demonstrations without needing grants or full time staff. How’d they do that? Come learn techniques you can use, and how they can be applied within the free/open software community." (slides, sticky notes)

How to start a fire

Somehow, you've found yourself deep in the woods, and a storm is coming. It's dusk, the rain has started, and you know it's going to get a lot darker and a lot colder. So you and your companions decide to make a fire. You've seen other people do it. How hard could it be? You pile some sticks and logs together, throw a match on them, and watch a tiny flame catch, smolder, and disappear. Watching the tiny wisps of smoke fade away after a few more false starts, you come to understand the popularity of lighter fluid. But you don't have any, and you're out in the cold, so you resolve to figure it out with what you have.

You're surrounded by trees, but you realize not just any wood will work. The sap-filled green branches all around you thrive on the rain. They will not burn; they have no reason to. But the soggy, trampled deadfall at your feet won't do either. It has no fire left to give. So you seek out standing dead trees and begin snapping off the dry twigs and bark. For the most part, you collect the smallest you can find, although you don't shy away from larger branches if they look well-seasoned and ready to burn. You collect as much as you think you need, and then a few times more than that. And then you collect more still. Your friends look on skeptically. How will such small twigs be of any use? How much time are you wasting when it will take large logs to keep you warm?

You place a patch of bark on the wet ground, and begin arranging your fuel. First the smallest twigs and scraps of bark. You even take your pocket knife to a twig and meticulously shave paper-thin strips into a curly mass and place it in the pile. Then, a few bigger pieces. Still twigs, but bigger. The twigs are woven together closely, with just enough space to breathe comfortably. You add some bark and larger twigs around the outside to keep the wind out and to keep the heat in. By this time, your friends are upset that you've left them standing in the rain while you play with twigs.

Finally, you ask for a match and slide it under one side of the pile. Then a few more on different sides. Some small shavings catch, burn away, and smolder while the flames move along the matches towards the center. For a minute, it looks as if nothing is happening... then wisps of smoke start to rise. A tiny flame appears on one small twig shaving. It spreads, flickers, then dies, leaving a thin stream of smoke. Part of you wants to start over. Part of you wants to pile more fuel on. But instead, you continue methodically, shielding the small pile from the wind, blowing gently. A few tiny embers glow with each breath. And then, after a few long minutes, a flame erupts in the middle of the twigs, quickly consuming all those small pieces you took so long to collect. But in the process, a couple of larger twigs catch solidly on fire. You add more shavings and a few larger sticks. In a moment, flames surround the whole pile and you continue adding larger sticks and logs. Soon, you have a roaring fire, unfazed by the thickening rain.

The fire keeps you and your friends warm while you wait for the storm to pass. You keep feeding it sticks and logs. You place some damp deadfall around the outside and watch the water sizzle and boil out of it, before it too begins to burn. You place a ring of stones around the fire to keep it from spreading too far, because you realize that as bad as the storm is, the fire could be even worse without the proper care. Your friends' doubts have faded, and you are grateful to be with them, warm and dry.

What can I do now?

Donald Trump won the 2016 US Presidential election a few hours ago. Since then, the most common thing I've heard from my friends is: "Now what? What can I do now?" They are, like me, disgusted by the bigotry, misogyny, and xenophobia that Trump represents. My answer: organize. And I don't necessarily mean start a new nonprofit. There are plenty of those doing great work already, which is good, because that means you don't have to start from scratch. I'm talking about something bigger. There are millions of people in this country who want it to be safe for women, safe for queer folks, and safe for people with darker skin. There is more love than hate. The hard part has always been channeling it to create change. And we have a new pattern for that, modeled after groups like Occupy and BlackLivesMatter.

Let's say you're organizing a demonstration, or trying to get some legislation passed. Traditionally, you'd notify your mailing list, talk to strangers on the street, and maybe even make cold calls. But these methods are incredibly wasteful. First, they waste time and resources, because most of the people you talk to won't be able or willing to help with your particular issue, and the ones who would help may never notice it while they're wading through all the other emails, petitions, and calls they get. More importantly, these one-time, impersonal interactions don't build relationships. Much of the success of Occupy and BlackLivesMatter has been due to their focus on building relationships, both between people and between organizations.

So what's the big deal about relationships? If you think about your friends, you know who might be interested in helping with a particular issue. And if you want to work on an issue, it's a lot easier to get involved if one of your friends already is. You can spend less time dealing with spam email blasts and more time making change. And more generally, our personal relationships have a huge influence on our world-view and our culture. By engaging in activism through meaningful personal connections, we learn to understand different perspectives and bake our principles into our daily lives and habits. And as people participate in different groups, those values and skills become part of a larger activist culture. And that kind of culture-building can be a powerful force to counteract the dangerous political polarization facing the US. And of course, working on things you care about with friends is fun, and fun is an excellent motivator.

So what does it look like in practice? On top of traditional activism, it looks like meeting regularly with small groups of people who have some common ground (i.e., affinity groups). It's even better if the groups are also diverse in some ways: for example, people who all live in the same city but work on different issues, or people who work on the same issue but in different formal organizations. On top of the meetings, you can add a mailing list, an ongoing Google Hangout, or a group message in Signal. When each person is part of multiple groups, skills and ideas quickly spread across the entire activist ecosystem, and it's easy to signal-boost a call (or offer) for help to a very large group of people who will actually act on it. If you contact a couple of people, and each of them contacts a couple more, and so on, the number of people you reach grows (literally) exponentially. So activism doesn't always have to be about taking steps towards a specific plan; sometimes it's more useful to build a network that lets you mobilize resources when and where they're needed.
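
To put numbers on that "(literally) exponentially," here's a back-of-the-envelope sketch of an idealized phone tree. It assumes, unrealistically, that every person who hears the message passes it on to exactly k people nobody else has reached yet; the function name and parameters are made up for illustration. Under that assumption, round r reaches k^r new people, so after r rounds the total is 1 + k + k^2 + ... + k^r.

```python
def phone_tree_reach(k, rounds):
    """Cumulative number of people reached (including you) after each round.

    Idealized model: every newly reached person contacts exactly k people
    who haven't heard yet, so round r reaches k**r new people.
    """
    total, newly_reached = 1, 1
    reach = []
    for _ in range(rounds):
        newly_reached *= k        # each new person contacts k more
        total += newly_reached
        reach.append(total)
    return reach


if __name__ == "__main__":
    # Each person contacts just two others, for ten rounds.
    print(phone_tree_reach(2, 10))
    # [3, 7, 15, 31, 63, 127, 255, 511, 1023, 2047]
```

Real networks overlap and not everyone passes the message on, so actual reach for a given k will be smaller than this idealized count, but the shape of the curve is the point.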

Once more, with feeling! Toward reproducible computational science

My scientific education was committed at the hands of physicists. And though I've moved on from academic physics, I've taken bits of it with me. In MIT's infamous Junior Lab, all students were assigned lab notebooks, which we used to document our progress reproducing significant experiments in modern physics. It was hard. I made a lot of mistakes. But my professors told me that instead of erasing the mistakes, I should strike them out with a single line, leaving them forever legible. Because mistakes are part of science (or any other human endeavor). So when mistakes happen, the important thing is to understand and document them. I still have my notebooks, and I can look back and see exactly what I did in those experiments, mistakes and all. And that's the point. You can go back and recreate exactly what I did, avoid the mistakes I caught, and identify any I might have missed. Ideally, science is supposed to be reproducible. In practice, though, most research is never replicated, and when replication is attempted, the original results very often don't hold up. I'm particularly concerned with reproducibility in the emerging field of computational social science, which relies so heavily on software. Because as everyone knows, software kind of sucks. So here are a few of the tricks I've been using as a researcher to try to make my work a little more reproducible.

Relational databases

When I'm doing anything complicated with large amounts of data, I often like to use a database. Databases are great at searching and sorting through large amounts of data quickly and making the best use of the available hardware, far better than anything I could write myself in most cases. They've also been thoroughly tested. For a long time, relational databases were the default choice. A relational database stores data in tables that can be linked to each other, and a query language (usually SQL) lets you combine those tables in complicated queries. A query might translate to something like "show me every movie in which Kevin Bacon does not appear, but an actor who has appeared in another movie with Kevin Bacon does." The database does most of the work of answering it. What's more, most relational databases guarantee a set of properties called ACID (atomicity, consistency, isolation, and durability). Generally speaking, ACID means that even if you're accessing a database from several threads or processes, it won't (for example) change the data halfway through a query, and a change that fails partway through is rolled back rather than left half-applied.
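
As a sketch of what that Kevin Bacon query might look like in SQL, here it is run against an in-memory SQLite database through Python's built-in sqlite3 module. The schema (actors, movies, and an appearances table linking them), the actor names, and the movie titles are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: one row per actor, one per movie, and a link table
# recording which actors appear in which movies.
SCHEMA = """
CREATE TABLE actors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE movies (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE appearances (
    movie_id INTEGER NOT NULL REFERENCES movies(id),
    actor_id INTEGER NOT NULL REFERENCES actors(id),
    PRIMARY KEY (movie_id, actor_id)
);
"""

# Movies the named actor is NOT in, but that feature at least one co-star.
QUERY = """
SELECT DISTINCT m.title
FROM movies AS m
JOIN appearances AS a ON a.movie_id = m.id
WHERE a.actor_id IN (
    -- co-stars: everyone who shares a movie with the named actor
    SELECT other.actor_id
    FROM appearances AS star
    JOIN actors AS s ON s.id = star.actor_id
    JOIN appearances AS other ON other.movie_id = star.movie_id
    WHERE s.name = :name AND other.actor_id != star.actor_id
)
AND m.id NOT IN (
    -- movies the named actor appears in
    SELECT ap.movie_id
    FROM appearances AS ap
    JOIN actors AS s2 ON s2.id = ap.actor_id
    WHERE s2.name = :name
)
ORDER BY m.title
"""


def build_demo_db():
    """Create an in-memory database with a few made-up movies."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO actors VALUES (?, ?)",
                     [(1, "Kevin Bacon"), (2, "Alice"), (3, "Bob"), (4, "Carol")])
    conn.executemany("INSERT INTO movies VALUES (?, ?)",
                     [(1, "Movie A"), (2, "Movie B"), (3, "Movie C"),
                      (4, "Movie D"), (5, "Movie E")])
    conn.executemany("INSERT INTO appearances VALUES (?, ?)",
                     [(1, 1), (1, 2),   # Movie A: Kevin Bacon, Alice
                      (2, 2), (2, 3),   # Movie B: Alice, Bob
                      (3, 4),           # Movie C: Carol
                      (4, 1), (4, 3),   # Movie D: Kevin Bacon, Bob
                      (5, 3), (5, 4)])  # Movie E: Bob, Carol
    conn.commit()
    return conn


def one_step_from(conn, name):
    """Titles of movies without `name` that feature one of their co-stars."""
    return [title for (title,) in conn.execute(QUERY, {"name": name})]


if __name__ == "__main__":
    print(one_step_from(build_demo_db(), "Kevin Bacon"))  # ['Movie B', 'Movie E']
```

The Python side only passes in a name and collects titles; all of the joining, filtering, and de-duplicating happens inside the database.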

In recent years, NoSQL databases (key-value stores, document stores, etc.) have become a popular alternative to relational databases. They're simple and fast, so it's easy to see why they're popular. But their simplicity means your code needs to do more of the work, and that means you have more to test, debug, and document. And the performance is usually achieved by dropping some of the ACID requirements, meaning that in some cases data might change in the middle of a calculation or just plain disappear. That's fine if you're writing a search engine for cat gifs, but not if you're trying to do verifiably correct and reproducible calculations. And the more data you're working with, the more likely one of these problems is to pop up. So when I use a database for scientific work, I currently prefer to stick with relational databases.
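
To make one of those ACID guarantees concrete, here is a minimal sketch of atomicity using SQLite through Python's sqlite3 module. The samples table and the helper functions are hypothetical. Used as a context manager, a sqlite3 connection commits when the block finishes normally and rolls back if an exception escapes it, so a batch of inserts is stored either completely or not at all.

```python
import sqlite3


def make_db():
    """In-memory database with one hypothetical table of measurements."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (id INTEGER PRIMARY KEY, value REAL NOT NULL)")
    return conn


def insert_batch(conn, values):
    """Insert a batch of measurements as a single transaction."""
    with conn:  # commit on success, roll back if an exception is raised
        for v in values:
            if v is None:
                raise ValueError("missing measurement")
            conn.execute("INSERT INTO samples (value) VALUES (?)", (v,))


def count_rows(conn):
    return conn.execute("SELECT COUNT(*) FROM samples").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    insert_batch(conn, [1.0, 2.0])
    try:
        insert_batch(conn, [3.0, None])  # fails after inserting 3.0
    except ValueError:
        pass
    print(count_rows(conn))  # 2: the half-finished batch was rolled back
```

Without a transaction, the failed batch would have left a stray 3.0 in the table, exactly the kind of silent partial write that makes a later analysis hard to reproduce.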

Unit tests

Some bugs are harder to find than others. If you have a syntax error, your compiler or interpreter will tell you right away. But if you have a logic error, like a plus where you meant to put a minus, your program will run fine, it'll just give you the wrong output. This is particularly dangerous in research, where by definition you don't know what the output should be. So how do you check for these kinds of errors? One option is to look at the output and see if it makes sense. But this approach opens the door to confirmation bias. You'll only catch the bugs that make the output look less like what you expect. Any bugs that make the output look more like what you expect to see will go unnoticed.

So what's a researcher to do? This is where unit tests come in. Unit tests are common in the world of software engineering, but they haven't caught on in scientific computing yet. The idea is to divide your code into small chunks and test each one individually. The tests are programs themselves, so they're easy to re-run whenever you change your code. To do this in a research context, I like to compare the output of each chunk to a hand calculation. I start by creating a data set small enough to analyze by hand but large enough to capture the important features, and I write it down in my lab notebook. Then, for each stage of my processing pipeline, I work out the input and output for that data set by hand and write those down in my lab notebook too. Then I write unit tests to verify that my code gives the right output for the test data. The code almost never passes the first time I run it, but more often than not the problem turns out to be in my hand calculation. You can check out an example here. A nice side effect of writing unit tests is that they give you confidence in your code, which means you can devote all of your attention to interpreting results rather than second-guessing your code.
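
Here's a minimal sketch of that workflow with Python's built-in unittest module. The two pipeline stages (count_edits and edit_shares) and the four-event test data set are hypothetical stand-ins; the expected values in each test are the kind of numbers you'd work out by hand and copy from your lab notebook. Note that the second stage is tested with the hand-calculated output of the first, so each test checks one stage in isolation.

```python
import unittest
from collections import Counter


# Hypothetical pipeline stages (in a real project these would be imported).

def count_edits(events):
    """Stage 1: number of edits per user from (user, page) events."""
    return dict(Counter(user for user, _page in events))


def edit_shares(counts):
    """Stage 2: each user's fraction of all edits."""
    total = sum(counts.values())
    return {user: n / total for user, n in counts.items()}


# Test data small enough to work through by hand.
TEST_EVENTS = [
    ("alice", "Page1"),
    ("bob", "Page1"),
    ("alice", "Page2"),
    ("carol", "Page3"),
]


class TestPipeline(unittest.TestCase):
    def test_count_edits(self):
        # By hand: alice edited twice, bob and carol once each.
        self.assertEqual(count_edits(TEST_EVENTS),
                         {"alice": 2, "bob": 1, "carol": 1})

    def test_edit_shares(self):
        # By hand: 2/4, 1/4, and 1/4 of the 4 edits.
        shares = edit_shares({"alice": 2, "bob": 1, "carol": 1})
        self.assertAlmostEqual(shares["alice"], 0.5)
        self.assertAlmostEqual(shares["bob"], 0.25)
        self.assertAlmostEqual(shares["carol"], 0.25)
```

Saved as, say, test_pipeline.py, the tests run with python -m unittest test_pipeline, and re-running them after every change is cheap.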

Version control

Version control tools like git are becoming more common in scientific computing. On top of making it easy to roll back changes that break your code, they also make it possible to keep track of exactly what the code looked like when an analysis was run. That means you can check out an old version of the code and re-run an analysis exactly. Now that's how you do reproducible! One caveat: in order to keep an accurate record of the code that was run, you have to make sure all changes have been committed and that the revision ID (the commit hash) is recorded somewhere.
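
One way to enforce that caveat is to have each analysis script check for uncommitted changes before it starts and grab the commit hash to record. Below is a minimal sketch that shells out to the git command line (git status --porcelain and git rev-parse HEAD) from Python; the function names are hypothetical, and it assumes git is installed and the script runs inside the repository.

```python
import subprocess


def _git(args, repo_dir):
    """Run a git command in repo_dir and return its standard output."""
    result = subprocess.run(["git", *args], cwd=repo_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout


def committed_revision(repo_dir="."):
    """Return the full hash of HEAD, or raise if tracked files have uncommitted changes.

    Untracked files are ignored (--untracked-files=no) so that output files
    sitting in the repository don't block a run; the trade-off is that a
    brand-new script that was never added to git would slip through.
    """
    if _git(["status", "--porcelain", "--untracked-files=no"], repo_dir).strip():
        raise RuntimeError("Uncommitted changes: commit them before running the analysis.")
    return _git(["rev-parse", "HEAD"], repo_dir).strip()
```

Calling committed_revision() at the top of a script and writing the returned hash into the script's log or output files gives you the record the caveat asks for, and the script refuses to run at all if the working tree doesn't match a commit.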

Logging

Finally, logging the progress of your analysis scripts makes it a lot easier to know exactly what your code did, especially years after the fact. To help match my results with the version of code that produced them, I wrote a small logging helper script that automatically opens a standard Python log in a directory named after the experiment, the timestamp, and the current git hash. It also makes it easy to create output files in that directory. Using a helper like this makes it easy to know when a script was run, exactly which version of the code it used, and what happened while it ran.
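
A helper along those lines might look like the sketch below. It is not the script described above, just one way to do the same thing with Python's standard logging module; the function names, the results/ directory layout, and the run.log file name are all assumptions. It falls back to "unknown" for the hash if git isn't available or the code isn't in a repository.

```python
import logging
import subprocess
from datetime import datetime
from pathlib import Path


def git_hash(short=True):
    """Return the current commit hash, or 'unknown' outside a git repository."""
    cmd = ["git", "rev-parse", "--short", "HEAD"] if short else ["git", "rev-parse", "HEAD"]
    try:
        return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def start_experiment(name, base_dir="results"):
    """Create <base_dir>/<name>/<timestamp>_<hash>/ and log to run.log inside it.

    Returns (logger, run_dir) so output files can be written next to the log.
    Calling this twice with the same name in one process would attach a
    second file handler to the same logger.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = Path(base_dir) / name / f"{stamp}_{git_hash()}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never overwrite an earlier run

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(run_dir / "run.log")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)

    logger.info("experiment=%s commit=%s dir=%s", name, git_hash(short=False), run_dir)
    return logger, run_dir
```

A script would then start with something like logger, run_dir = start_experiment("degree-distribution") and write its outputs to paths such as run_dir / "results.csv", so every result file sits next to the log that says which commit produced it.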

As for the specific logging messages, there are a few things I always try to include. First, I always have a message for when the script starts and completes successfully. I also wrap all of my scripts in a try/except block and log any unhandled exceptions. I also like to log progress, so that if the script crashes, I can know where to look for the error and where to restart it. Logging progress also makes it easier to estimate how long a script will take to finish.
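
Here's a minimal sketch of those habits using Python's standard logging module: a start message, a completion message, a try/except that records any unhandled exception with its traceback before re-raising it, and periodic progress messages that note the last item processed and a rough time estimate. The process function and the report_every parameter are hypothetical placeholders for real analysis code.

```python
import logging
import time

log = logging.getLogger("analysis")


def process(item):
    """Hypothetical per-item work; stands in for a real analysis step."""
    return item * item


def run(items, report_every=1000):
    log.info("Script started: %d items to process", len(items))
    start = time.monotonic()
    results = []
    try:
        for i, item in enumerate(items, start=1):
            results.append(process(item))
            if i % report_every == 0:
                elapsed = time.monotonic() - start
                remaining = elapsed / i * (len(items) - i)  # naive linear estimate
                log.info("Processed %d/%d items (last: %r); ~%.0f s remaining",
                         i, len(items), item, remaining)
    except Exception:
        # logger.exception logs at ERROR level and appends the full traceback.
        log.exception("Unhandled exception; items completed before failure: %d",
                      len(results))
        raise
    log.info("Script completed successfully in %.1f s", time.monotonic() - start)
    return results


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO,
                        format="%(asctime)s %(levelname)s %(message)s")
    run(list(range(5000)))
```

Re-raising after logging keeps the script's exit status honest: the failure is recorded in the log, but the script still crashes instead of quietly reporting success.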

Using all of these techniques has definitely made my code easier to follow and retrace, but there's still so much more that we can do to make research truly open and reproducible. Has anyone reading this tried anything similar? Or are there things I've left out? I'd love to hear what other people are up to along these lines.