Past Events

List of Events


Moore-Sloan Data Science Environment Summit: October 23-26, 2016; New Paltz, New York

From October 23-26, 2016, the New York University Moore-Sloan Data Science Environment Program welcomes MSDSE partners, the University of California-Berkeley and the University of Washington, to build community, explore ideas, and encourage collaboration between our three programs. The 2016 Summit will take place at the Mohonk Mountain House in New Paltz, NY from Oct. 23-26. All participants will then join the Summit with industry leaders in data science at New York University in New York City on Wednesday, Oct. 26.

This year’s Summit focuses on "building community around data science for research."



May 3, 2016 | The Center for Urban Science + Progress, 1 MetroTech Center, 19th Floor, Brooklyn, NY 11201

Reproducibility in scientific research is a laudable goal, but it has been difficult to achieve. While data and data analysis play a central role in many scientific domains, most papers specify their methods and data only informally and omit important supplemental material. High-quality journals have responded to this issue by making reproducibility a requirement for publication. Understanding the challenges to reproducibility and combating them with tools and best practices is therefore of cross-disciplinary relevance.

The Moore-Sloan Data Science Environment at NYU is pleased to announce a symposium on reproducibility that will be held on May 3, 2016.

At the NYU Reproducibility Symposium, we will showcase tools that make reproducibility easier, along with case studies showing how creating reproducible experiments has helped other research groups. The symposium will consist of keynotes and tutorial sessions in the morning, followed by discussions of research topics in reproducibility and hands-on sessions in the afternoon.



April 28, 2016 | Bobst Library, New York University, New York, New York

Modern research involving data analysis increasingly uses programming to improve efficiency and make more effective use of data. As code becomes an ever more essential part of research activities, we need to treat it with the same care as other research products. The first step towards more maintainable software development and data analysis is using version control on all research and analysis code. Git is a popular tool for tracking individual and collaborative development of code.

This workshop looks at advanced usage and collaboration with Git and GitHub, including branches and how to manipulate them with merge and rebase, forks and pull requests, rewriting history with rebase, and possible workflows.

To get the full benefits of this session, you should have attended the introductory Git and GitHub workshop, or already have familiarity with basic version control using Git (the commands "add", "commit", "log", "diff", "status"), working with remotes ("git pull" and "git push") and handling merge conflicts. Please bring your own laptop with Git installed.



March 29, 2016 | Bobst Library, New York University, New York, New York

Modern research involving data analysis increasingly uses programming to improve efficiency and make more effective use of data. As code becomes an ever more essential part of research activities, we need to treat it with the same care as other research products. The first step towards more maintainable software development and data analysis is using version control on all research and analysis code. Git is a popular tool for tracking individual and collaborative development of code.

This workshop looks at advanced usage and collaboration with Git and GitHub, including branches and how to manipulate them with merge and rebase, forks and pull requests, rewriting history with rebase, and possible workflows.

To get the full benefits of this session, you should have attended the introductory Git and GitHub workshop, or already have familiarity with basic version control using Git (the commands "add", "commit", "log", "diff", "status"), working with remotes ("git pull" and "git push") and handling merge conflicts. Please bring your own laptop with Git installed.



Bobst Library, New York University, New York, New York

Have you heard about the reproducibility crisis in science (e.g., in Nature and The Economist)? Do you worry about false-positive results? Have you ever wondered how you could increase the reproducibility of your own work? Please join us for a workshop, hosted by the Center for Open Science, to learn easy, practical steps to increase the reproducibility of your work. The workshop will be hands-on. Using example studies, attendees will actively participate in creating a reproducible project from start to finish. Attendees will need to bring their own laptops in order to fully participate.

  • Topics we'll cover include:
    • project documentation
    • version control
    • pre-analysis plans
    • open source tools like the COS's Open Science Framework to easily implement these concepts in a scientific workflow


Bobst Library, New York University, New York, New York

Modern research involving data analysis increasingly uses programming to improve efficiency and make more effective use of data. As code becomes an ever more essential part of research activities, we need to treat it with the same care as other research products. The first step towards more maintainable software development and data analysis is using version control on all research and analysis code. Git is a popular tool for tracking individual and collaborative development of code.

This workshop introduces the basic concepts of Git version control. Whether you're new to version control or just need an explanation of Git and GitHub, this two-hour tutorial will help you understand the concepts of distributed version control. Get to know basic Git concepts and GitHub workflows through step-by-step lessons. We'll even rewrite a bit of history, and touch on how to undo (almost) anything with Git. This is a class for users who are comfortable with a command-line interface. Version control usually facilitates:

  • easy backup
  • easy retrieval of specific revisions
  • documentation (who did what when and why?)
  • documentation (how do I do ...?)
  • easy synchronization (staying up-to-date, integrating changes between multiple developers/machines)
  • finding the changes introducing regressions (AKA bugs) efficiently
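
The last point in the list above can be illustrated with a short sketch. Git provides the `git bisect` command for this purpose; the Python below is not Git's implementation, only a minimal illustration of the underlying idea, a binary search over an ordered history. The function name `first_bad_revision`, the revision labels, and the `is_bad` check are hypothetical and chosen for illustration.

```python
def first_bad_revision(revisions, is_bad):
    """Return (revision, checks): the earliest bad revision and the number of tests run.

    Assumes revisions are ordered oldest to newest, the first one is good,
    the last one is bad, and the bug stays in once it has been introduced.
    """
    low, high = 0, len(revisions) - 1  # low is known good, high is known bad
    checks = 0
    while high - low > 1:
        middle = (low + high) // 2
        checks += 1
        if is_bad(revisions[middle]):
            high = middle  # the bug was introduced at or before middle
        else:
            low = middle   # the bug was introduced after middle
    return revisions[high], checks


# Hypothetical history of 100 revisions in which "r57" introduced the bug.
history = [f"r{n}" for n in range(100)]
culprit, checks = first_bad_revision(history, lambda rev: int(rev[1:]) >= 57)
print(culprit, checks)
```

Because each check halves the remaining range, the number of revisions that must be tested grows only logarithmically with the length of the history, which is what makes this approach efficient compared with testing every revision in turn.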


Bobst Library, New York University, New York, New York

Modern research involving data analysis increasingly uses programming to improve efficiency and make more effective use of data. As code becomes an ever more essential part of research activities, we need to treat it with the same care as other research products. The first step towards more maintainable software development and data analysis is using version control on all research and analysis code. Git is a popular tool for tracking individual and collaborative development of code.

This workshop introduces the basic concepts of Git version control. Whether you're new to version control or just need an explanation of Git and GitHub, this two-hour tutorial will help you understand the concepts of distributed version control. Get to know basic Git concepts and GitHub workflows through step-by-step lessons. We'll even rewrite a bit of history, and touch on how to undo (almost) anything with Git. This is a class for users who are comfortable with a command-line interface. Version control usually facilitates:

  • easy backup
  • easy retrieval of specific revisions
  • documentation (who did what when and why?)
  • documentation (how do I do ...?)
  • easy synchronization (staying up-to-date, integrating changes between multiple developers/machines)
  • finding the changes introducing regressions (AKA bugs) efficiently

Moore-Sloan Data Science Environment Summit: October 4-7, 2015; Cle Elum, Washington

As a part of the Moore-Sloan Data Science Environment program, we will be gathering for the second annual Summit on October 4-7, 2015 to build community, explore ideas, and encourage collaboration between our three programs at UCB, NYU, and UW. This year’s summit will be hosted by the UW team at the Suncadia Resort in Cle Elum, WA.

This Year’s Theme: “building.” After building our community at last year’s summit, our goal this year is to come away with tangible work products. We will schedule hacking and working sessions and also plenty of unstructured time. We also plan to showcase interdisciplinary scientific accomplishments from each university.



Workshop on Reproducibility in Science: May 1, 2013

While data and data analysis have become central in many scientific domains, most computational experiments and analyses are specified only informally in papers, where results are briefly described in figure captions; the code and scripts that produced the results are seldom available. Because important scientific discoveries are often the result of sequences of smaller, less significant steps, the ability to publish results that are fully documented and reproducible is necessary for advancing science. While concern about repeatability and generalizability cuts across virtually all natural, computational, and social science fields, no single field has identified this concern as a target of a research effort.

With a view towards a broader adoption of reproducibility and as a step towards better understanding the reproducibility requirements and challenges, this workshop will bring together scientists from different disciplines that work on data-intensive research. Each scientist will give a presentation outlining:

  • The kinds of experiments carried out in her/his field
  • What is reproducible in them?
  • Which tools are used?
  • Why (or why not) reproducibility would be desirable?
  • Existing reproducibility efforts in her/his field, if any
  • What are the existing barriers (if any) to reproducibility?

We will then have open discussions on topics including the similarities and differences across domains and how to broaden the adoption of reproducibility.

Workshop Day

Plan to arrive at the workshop location by 9am. We will meet until 5pm and then head out for dinner. Please let us know if you cannot make it to dinner so that we can plan accordingly.

Schedule & Supporting Materials

  • 09:30 – 10:00 Welcome and opening remarks
  • 10:00 – 11:20 Short talks (Bio and Climate)
  • 11:20 – 12:00 Discussion
  • 12:00 – 1:00 Lunch
  • 1:00 – 2:40 Short talks (Social Science, Urban Data, Computational Science)
  • 2:40 – 3:00 Discussion
  • 3:00 – 3:30 Coffee break
  • 3:30 – 5:00 Open discussion
  • 6:00 Dinner at Colonie

Speaker List

  • Evan Baugh | Biology, NYU
  • Richard Bonneau | Biology & Courant, NYU
  • Rebecca Capone | Elsevier
  • Fernando Chirigati | NYU Poly
  • Juliana Freire | NYU Poly
  • Ann Gabriel | Elsevier
  • Josh Greenberg | Sloan Foundation
  • Xichen Li | Atmospheric and Oceanic Sciences, Courant, NYU
  • Alessandro Lizzeri | Economics, NYU
  • Andreas Kloeckner | Courant, NYU
  • Steve Koonin | Center for Urban Science and Progress, NYU
  • Susan McGregor | Journalism, Columbia
  • Carlos Scheidegger | AT&T Research
  • Claudio Silva | NYU Poly and CUSP
  • Dennis Shasha | NYU Courant
  • Dean Williams | Climate Data Analysis, LLNL

Local Information

Sponsor

We thank the Alfred P. Sloan Foundation for supporting this workshop.

Organizers

  • Juliana Freire, NYU Poly
  • Dennis Shasha, NYU Courant
  • Claudio Silva, NYU Poly and CUSP

Contact us

If you have any questions, please send email to [email protected]



Workshop on Reproducibility in Science: May 29, 2013

Description

While data and data analysis have become central in many scientific domains, most computational experiments and analyses are specified only informally in papers, where results are briefly described in figure captions; the code and scripts that produced the results are seldom available. Because important scientific discoveries are often the result of sequences of smaller, less significant steps, the ability to publish results that are fully documented and reproducible is necessary for advancing science. While concern about repeatability and generalizability cuts across virtually all natural, computational, and social science fields, no single field has identified this concern as a target of a research effort.

With a view towards a broader adoption of reproducibility and as a step towards better understanding the reproducibility requirements and challenges, this workshop will bring together scientists from different disciplines that work on data-intensive research. Each scientist will give a presentation outlining:

  • The kinds of experiments carried out in her/his field
  • What is reproducible in them?
  • Which tools are used?
  • Why (or why not) reproducibility would be desirable?
  • Existing reproducibility efforts in her/his field, if any
  • What are the existing barriers (if any) to reproducibility?

We will then have open discussions on topics including the similarities and differences across domains and how to broaden the adoption of reproducibility.

Speaker & Participant List

  • Juliana Freire | NYU Poly
  • Fernando Chirigati | NYU Poly
  • Alexander Statnikov | NYU Medical Center
  • David Koop | NYU Poly
  • Panos Ipeirotis | NYU Stern School of Business
  • Dennis Shasha | NYU Courant Institute
  • Jeff Morisette | USGS
  • Nicolas Limare | IPOL - ENS Cachan
  • Jerome Simeon | IBM Research
  • Zachary Ives | University of Pennsylvania
  • Jonathan Markow | DuraSpace
  • Ingrid Ellen | NYU Wagner
  • Kyle S. Cranmer | NYU Physics
  • Brian Litt | Neuroengineering, University of Pennsylvania
  • Rita Wright | NYU Archaeology
  • Josh Greenberg | Sloan Foundation
  • Claudio Silva | NYU Poly and CUSP

Schedule & Supporting Material

  • 09:00 – 09:20 Welcome, opening remarks and introduction
  • 09:20 – 11:00 Session 1 (Moderator: Juliana Freire)
  • 11:00 – 12:00 Discussion
  • 12:00 – 1:00 Lunch
  • 1:00 – 2:15 Session 2 (Moderator: Claudio Silva)
  • 2:15 – 2:20 Break
  • 2:20 – 4:00 Session 3 (Moderator: Dennis Shasha)
  • 4:00 – 5:30 Discussion
  • 6:00 Dinner at Saul

Workshop Day

Our meeting will be held at the new CUSP headquarters, located at 1 MetroTech Center, 19th floor, Brooklyn, NY. See Local Information for details on how to get there.

Plan to arrive at the workshop location on Thursday, May 30th by 8:30am. This will give you enough time to register with building security (you will need a photo ID). After you register, take the elevator to the 19th floor, where you will be greeted in the lobby.

Sponsor

We thank the Alfred P. Sloan Foundation for supporting this workshop.

Organizers

  • Juliana Freire, NYU Poly
  • Dennis Shasha, NYU Courant
  • Claudio Silva, NYU Poly and CUSP

Contact us

If you have any questions, please send email to [email protected]



Workshop on Software Infrastructure for Reproducibility in Science: May 30th-31st, 2013

Description

While there are a number of tools that support different aspects of reproducibility, because these tools are often targeted to specific domains and there is no coordination in their development, it is hard and sometimes impossible to combine them. As a result, we lack general components that can be mixed and matched to construct end-to-end solutions that are applicable across different domains.

With a view towards a broader adoption of reproducibility in computational science and as a step towards the design and implementation of a unified platform to support the publication of reproducible results, this workshop will bring together developers of reproducibility tools. The participants will present the state-of-the-art in reproducibility tools and we will have discussions on important (and required) features, limitations of existing tools, and how they can be integrated. One of the planned outcomes for the workshop is a blueprint for the unified reproducibility platform.

We also hope this workshop will help catalyze the creation of a cohesive community of developers and scientists who will work closely together to build a platform that supports reproducibility across different domains.

The workshop will be held in New York City, and thanks to a grant from the Sloan Foundation, we will be able to cover travel costs for the invited participants.

Speaker and Participant List

  • Juliana Freire | NYU Poly
  • Fernando Chirigati | NYU Poly
  • Victoria Stodden | Columbia University
  • David Koop | NYU Poly
  • James Taylor | Emory University
  • Mahadev Satyanarayanan | Carnegie Mellon University
  • Geoffrey Brown | Indiana University/NSF
  • Huy Vo | NYU CUSP
  • Josh Greenberg | Sloan Foundation
  • Sibo Lu | Sloan Foundation
  • Carly Strasser | California Digital Library
  • Nicolas Limare | IPOL - ENS Cachan
  • Sergey Fomel | University of Texas at Austin
  • Tommy Ellqvist | NYU Poly
  • Carole Goble | University of Manchester
  • Brian Nosek | University of Virginia and Center for Open Science
  • James Frew | UC Santa Barbara
  • Andrew Davison | UNIC, CNRS, Gif sur Yvette, France
  • Kaitlin Thaney | Mozilla
  • Brian Granger | California Polytechnic State University
  • Ann Gabriel | Elsevier
  • Jonathan Markow | DuraSpace
  • Merce Crosas | Dataverse / Harvard University
  • Ana Nelson | Dexy
  • Kyle S. Cranmer | NYU Physics Dept. & Center for Data Science
  • Matthias Troyer | ETH Zurich
  • Sheila Miguez | Columbia University
  • Manda Wilson | Simons Foundation & MSKCC
  • Tanu Malik | University of Chicago
  • Jeffrey Spies | University of Virginia
  • Duncan Penfold-Brown | NYU Biology
  • Mono Pirun | MSKCC
  • Carlos Scheidegger | AT&T Research
  • Dennis Shasha | NYU Courant
  • Claudio Silva | NYU Poly and CUSP

May 30th Schedule

  • 09:00 – 09:30 Welcome and introductions
  • 09:30 – 10:15 Tools that capture the specification of experiments (Moderator: Juliana Freire)
  • 10:15 – 10:30 Discussion and Break
  • 10:30 – 11:30 Tools that capture the specification of experiments (Moderator: Claudio Silva)
  • 11:30 – 12:00 Discussion
  • 12:00 – 01:00 Lunch
  • 01:00 – 02:00 Environment capture, virtualization and publishing (Moderator: Dennis Shasha)
  • 02:00 – 02:15 Discussion and Break
  • 02:15 – 03:15 Talks on publishing code and data (Moderator: Victoria Stodden)
  • 03:15 – 03:45 Discussion and Break
  • 03:45 – 04:30 Talks on test/validation, requirements and practices (Moderator: David Koop)
    • Brian Nosek - Open Science Framework
    • Kyle Cranmer - Reproducibility in High-Energy Physics
    • Matthias Troyer - Reproducibility in Quantum Physics
  • 04:30 – 05:00 Discussion
  • 05:00 – 06:00 Demos
  • 06:30 – 09:15 Dinner at Spice Market

May 31st Schedule

We will focus on the design of a blueprint for a unified reproducibility framework and discuss models for creating and sustaining a community of developers for reproducibility tools.

  • 9:00 – 10:30 Discussion: What global architecture of tools can we use to enable working researchers to capture computations easily and to reproduce and vary them on different systems?
  • 10:30 – 11:00 Coffee break
  • 11:00 – 12:30 Breakout groups to define the ideal architecture of a reproducibility framework
    • Three groups: capture/representation, archival, test/validation
  • 12:30 – 1:30 Lunch
  • 1:30 – 3:00 Report from breakouts and discussion on proposed architecture
  • 3:00 – 3:30 Coffee break
  • 3:30 – 5:00 Discussion: How to create and sustain a community of developers of reproducibility tools? Work on white paper.

Contributed Links

Workshop Day

Our meeting will be held at the new CUSP headquarters, located at 1 MetroTech Center, 19th floor, Brooklyn, NY. See Local Information for details on how to get there.

Plan to arrive at the workshop location on Thursday, May 30th by 8:30am. This will give you enough time to register with building security (you will need a photo ID). After you register, take the elevator to the 19th floor, where you will be greeted in the lobby.

Sponsor

We thank the Alfred P. Sloan Foundation for supporting this workshop.

Organizers

  • Juliana Freire, NYU Poly
  • Dennis Shasha, NYU Courant
  • Claudio Silva, NYU Poly and CUSP

Contact us

If you have any questions, please send email to [email protected]



Reproducibility Challenge

The reproducibility benchmark is an experiment on simulating the critical temperature for the Ising model on a square lattice. Please refer to this document for a more detailed description of the benchmark.
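
As a rough illustration of the kind of computation involved, the sketch below simulates the Ising model on a square lattice with single-spin-flip Metropolis updates. It is not the benchmark's reference code: the function name `mean_abs_magnetization`, the lattice size, the number of sweeps, and the seed are all assumptions made for this example, with units chosen so that J = k_B = 1. The only externally grounded number is Onsager's exact critical temperature for the infinite lattice, 2 / ln(1 + √2), approximately 2.269.

```python
import math
import random

# Exact critical temperature of the infinite square-lattice Ising model
# (Onsager's result, in units where J = k_B = 1); approximately 2.269.
T_CRITICAL_EXACT = 2.0 / math.log(1.0 + math.sqrt(2.0))


def mean_abs_magnetization(size, temperature, sweeps, burn_in, seed):
    """Estimate the mean absolute magnetization per spin with Metropolis updates."""
    rng = random.Random(seed)  # fixed seed so a rerun gives the same numbers
    spins = [[1] * size for _ in range(size)]  # ordered (all-up) start
    total = 0.0
    for sweep in range(burn_in + sweeps):
        for _ in range(size * size):
            i = rng.randrange(size)
            j = rng.randrange(size)
            # Sum of the four nearest neighbours with periodic boundaries.
            neighbours = (
                spins[(i + 1) % size][j] + spins[(i - 1) % size][j]
                + spins[i][(j + 1) % size] + spins[i][(j - 1) % size]
            )
            delta_e = 2.0 * spins[i][j] * neighbours  # energy change if flipped
            if delta_e <= 0.0 or rng.random() < math.exp(-delta_e / temperature):
                spins[i][j] = -spins[i][j]
        if sweep >= burn_in:  # measure only after the burn-in sweeps
            total += abs(sum(map(sum, spins))) / (size * size)
    return total / sweeps


if __name__ == "__main__":
    # Illustrative temperature scan; all parameter values here are assumptions.
    for temperature in (1.5, 2.0, T_CRITICAL_EXACT, 2.5, 3.5):
        m = mean_abs_magnetization(16, temperature, sweeps=400, burn_in=200, seed=0)
        print(f"T = {temperature:.3f}  <|m|> = {m:.3f}")
```

On a finite lattice the magnetization falls off gradually rather than vanishing sharply, so a critical temperature can only be estimated, for example by repeating the scan at several lattice sizes. Recording the seed and every parameter, as done here, is what allows someone else to rerun the experiment and obtain the same numbers.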

Local Information

The workshop will take place at 1 MetroTech Center, 19th floor. 1 MetroTech is at the southwest corner of the MetroTech Commons. The entrance to the building is on the side facing Jay Street.

Getting Here by Subway

    • A C F train to MetroTech
    • 2 3 4 5 train to Borough Hall (walk one block East to Willoughby Street and make a left onto Jay Street)
    • M R train to Lawrence Street-MetroTech (walk one block North on Lawrence Street)
    • Q B train to Dekalb Avenue (walk two blocks North toward Manhattan Bridge and make a left onto Myrtle Avenue into MetroTech)

Colleagues from Out of Town

Hotel and Travel

Please make your own travel and hotel reservations, but feel free to contact us if you need help. NYU and NYU Poly have negotiated rates with several hotels. We suggest you stay close to MetroTech, where there are a number of hotels nearby. Hotel lists: Hotels in Brooklyn and Manhattan, and additional hotels in Manhattan.

Reimbursement Information

For out-of-town participants, we will be able to cover your travel and lodging. Please check the rules and limits. In order to process your reimbursement, please complete the travel reimbursement form. If you have a social security number, fill out Form W9, and if you don't, fill out Form W8. Note: this will be treated as a reimbursement and the amount you receive is not taxable.

Mail the two forms, together with the original receipts, to:

Judy Brown
Department of Computer Science and Engineering
2 Metrotech Center, 10th floor
Brooklyn, NY 11201-3840
USA

If you have any questions about reimbursement, please contact Judy Brown with the subject line "Reproducibility Workshop reimbursement."

Requests for reimbursement will not be accepted more than 6 months after your expected departure date.

