#LYD16 – Data Dispatch: NYU Data Services News and Updates

Love Your Data Week: Reusing Data and Making Your Data Reusable (February 11, 2016)
https://data-services.hosting.nyu.edu/lyd16-friday/

Steps to making data reusable range from the organizational (documenting the contents of your files) to the complex (ensuring that the computing environment in which your data was produced can be re-created). The first is sometimes more a matter of convenience: a data file may be usable, but a fellow researcher cannot spare the many hours it might take to sort out how to use poorly documented materials. In other cases, badly documented data is more than an inconvenience: it can make an entire dataset unusable, by you or by your colleagues, because the context for how the data was collected, or how it models its real-world referent, can no longer be recalled.

For the second case, researchers and developers are working hard to build tools for capturing the computing environment in which data is built or analyzed. To find those tools, take a look at resources like ReproMatch, a searchable list of reproducibility tools, or give a tool like ReproZip a try. ReproZip, developed by the ViDA (Visualization and Data Analysis) group at NYU to make reproducibility easy, tracks operating system calls and creates a package containing all the binaries, files, and dependencies required to run a given command in the original researcher’s computational environment. A reviewer can then extract the experiment in their own environment to reproduce the results, even if that environment runs a different operating system from the original one.
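In practice the workflow has two sides: the original researcher traces and packs the experiment, and the reviewer unpacks and runs it. A hypothetical session might look like the following (the script and package names are placeholders; the command names follow ReproZip's documented `trace`/`pack` and `reprounzip` interface):

```
$ reprozip trace python my_analysis.py    # run the experiment while recording system calls
$ reprozip pack my_experiment.rpz         # bundle the binaries, data files, and dependencies

# ...then, on the reviewer's machine (any OS, here via Docker):
$ reprounzip docker setup my_experiment.rpz ./unpacked
$ reprounzip docker run ./unpacked
```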


Another good way to follow developments in using and reusing data is Reproducible Science, a consortium arising out of the Moore/Sloan Data Science Environment and its partners: New York University, the University of California, Berkeley, and the University of Washington.

Love Your Data Week: Getting Credit for Your Data (February 11, 2016)
https://data-services.hosting.nyu.edu/lyd16-thursday/

If the idea of sharing your finished dataset seems a little uncomfortable, perhaps it would be useful to consider an alternative perspective: gathering credit for your data. After all, as a 2013 report on data citation affirms, “Data citations should be accorded the same importance in the scholarly record as the citation of other objects.”1 In some cases, your dataset may contribute more to answering scholarly questions than any published results, because your work will be reusable by you and by other researchers down the road as they ask new and more pointed questions of a difficult subject.

Being able to cite data, and to track how and where that data is getting cited, is also becoming a visible aspect of publishing. When seeking a permanent home for your dataset in conjunction with results publication, take great care to seek a repository that will make your data available in a way that is advantageous to citation and discovery.


Make sure you understand how your data will be licensed for use, and verify that the repository provides the proper elements: DOI assignment, download tracking (and, ideally, citation tracking), and a good balance between visibility within your discipline and discoverability outside your scholarly circle. In other words, weigh all of the issues you would consider when seeking a place to publish your written reports, articles, and analyses!

——
1 Yvonne M. Socha, ed., “Out of Cite, Out of Mind: The Current State of Practice, Policy, and Technology for the Citation of Data,” Data Science Journal 12 (13 Sept. 2013): 3.2.1, http://dx.doi.org/10.2481/dsj.OSOM13-043; and M. Martone, ed., Data Citation Synthesis Group: Joint Declaration of Data Citation Principles (San Diego, CA: FORCE11, 2014), https://www.force11.org/group/joint-declaration-data-citation-principles-final.

Love Your Data Week: Documenting Your Data (February 10, 2016)
https://data-services.hosting.nyu.edu/lyd16-wednesday/

We know as researchers that we want to be able to understand the contents of our scholarly materials, and we understand on some level that this requires documenting files and indexing their locations. But why, despite best intentions, does documenting turn out to be so difficult to accomplish? Why do the best plans for keeping a project organized so often end in a scramble to sort everything into the right bin?

Undoubtedly, the answer is largely a matter of time, a scarce commodity in today's pressured research environment. And there is no avoiding the fact that it takes a certain amount of time and effort to get a documentation system off the ground. The good news is that while documentation systems demand close attention and resources early in the process, a workable system, once in place, can be replicated across projects, making the effort less burdensome down the road.

Good research management is also about increasing efficiency. Use automated computing processes whenever possible to reduce the time spent manipulating research files. Deploy central dashboards that bring together the various threads of your work. And if you work on multiple devices in multiple locations (as we often do), get your cross-device working environment in order so that you don’t have to constantly reformat work done on one device to conform with the documentation scheme you have deployed on another.

Some common practices deployed in documenting data include:

  • Use of readme files to explain the contents of folders and directories. Take a look at this helpful guide from Cornell University’s Research Data Management Service Group for ideas.
  • Reliance on a project management dashboard, e-Lab Notebook, or cloud-based file manager to gather everything together. We list several on our Documenting Data guide.
  • Matching of long-term documentation systems to ongoing research analysis needs as far as possible. In other words, use the same strategies for documenting your research materials in the long run that you will also use to push your analysis forward and enable publication.
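A readme can even be generated, and refreshed, automatically. Below is a minimal sketch in Python (the function name, the fields, and the README.txt layout are illustrative assumptions, not a standard) that writes a readme listing a data folder's contents:

```python
from datetime import date
from pathlib import Path

def write_readme(folder, title, description, contact):
    """Write a minimal README.txt describing the contents of a data folder."""
    folder = Path(folder)
    lines = [
        f"Title: {title}",
        f"Date: {date.today().isoformat()}",
        f"Contact: {contact}",
        f"Description: {description}",
        "Files:",
    ]
    # List every file in the folder so the readme stays in sync with its contents
    for f in sorted(folder.iterdir()):
        if f.name != "README.txt":
            lines.append(f"  {f.name}")
    readme = folder / "README.txt"
    readme.write_text("\n".join(lines) + "\n")
    return readme
```

Re-running such a function whenever files are added keeps the documentation from drifting out of date, which is exactly where manual readme maintenance tends to fail.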

Happy documenting, and keep up with Wednesday’s Love Your Data week focus on documenting data on Twitter, #LYD16!

Love Your Data Week: Organizing Your Data (February 9, 2016)
https://data-services.hosting.nyu.edu/lyd16-tuesday/

Love Your Data week (#LYD16) continues today with tips on organizing and naming research files. Have you ever been sent a package of research files, opened them, and stared blankly at the file names and folder structure, with no idea where to start? How often were those research files in fact your own, saved to a hard drive three or four years ago and not touched since?

Documentation, systematic file naming, and good computer file management are all essential to avoid this problem. Follow along at #LYD16 on Twitter for reminders about best practices in selecting file names and recommendations for software to document files. You can also check out the Research Data Management library guide section on file names at guides.nyu.edu/data_management/file_org.


Love Your Data Week: Keeping Your Data Safe (February 8, 2016)
https://data-services.hosting.nyu.edu/lyd16-monday/

Whether you call it the 3-2-1 Rule or the Rule of Three, good data storage means planning for the most common loss scenarios: the destruction of two storage instances kept in the same physical location, and the failure of any single storage instance, whether a hard drive or a cloud service.

To guard against both, keep your data in three places, spread across two different media and two different physical locations: for example, on two hard drives and in a cloud service, or on one hard drive and in two cloud instances.
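Three copies only help if they actually match. One simple check, sketched here in Python (the function names and chunk size are illustrative choices, not a prescribed method), is to compare a checksum of each copy:

```python
import hashlib
from pathlib import Path

def checksum(path, algo="sha256"):
    """Return the hex digest of a file, read in chunks to handle large files."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copies(paths):
    """True if every copy of the file has an identical checksum."""
    digests = {checksum(Path(p)) for p in paths}
    return len(digests) == 1
```

Running such a check periodically catches silent corruption in one copy before it propagates into your backups.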


But remember that a good storage environment needs updating. Your data is probably being changed and versioned constantly, so a good storage setup also includes robust provisions for tracking versions and storing older instances of files. Consider setting up an automated backup schedule, a common feature on many external hard drives, and look for cloud services, such as NYU Box, that track file changes. Most importantly, work out a file version naming system and document the contents of data files as you go.
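One common version naming scheme embeds a zero-padded version number and a date in each file name, e.g. survey_v003_2016-02-08.csv. A small Python sketch (the naming pattern and function name are illustrative assumptions, not a standard) that picks the next name in such a scheme:

```python
import re
from datetime import date
from pathlib import Path

def next_version_name(folder, stem, ext):
    """Return the next versioned file name, e.g. survey_v003_2016-02-08.csv."""
    # Match names like <stem>_v<number>_<YYYY-MM-DD>.<ext>
    pattern = re.compile(
        rf"{re.escape(stem)}_v(\d+)_\d{{4}}-\d{{2}}-\d{{2}}\.{re.escape(ext)}$"
    )
    versions = [int(m.group(1)) for p in Path(folder).iterdir()
                if (m := pattern.match(p.name))]
    n = max(versions, default=0) + 1
    return f"{stem}_v{n:03d}_{date.today().isoformat()}.{ext}"
```

Because the version number and date sort lexicographically, a plain directory listing doubles as a version history.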

For more information about file storage options and practices at NYU, consult the Research Data Management library guide.

Love Your Data Week #LYD16 Schedule (February 8, 2016)
https://data-services.hosting.nyu.edu/lyd16-schedule/

With Valentine’s Day just around the corner, this week is dedicated to tips and tools for demonstrating your love for your research data, whether that means making sure it is documented nicely or ensuring its storage for the long term. Check back here throughout the week (or follow Love Your Data Week on Twitter, #LYD16) for daily themes dedicated to each aspect of data protection and care.

Monday: We will be looking at ways to keep your data safe through tips on storage, formats, and navigating a cloud environment.

Tuesday: Keeping your data organized and accessible!

Wednesday: What are the best approaches to documenting your data? Check in with #LYD16 on Wednesday for more options.

Thursday: With research verification and reproducibility increasingly featured in today’s scholarly publishing, ensure that you can give and get credit for your data.

Friday: Data reusability. Learn how your data can be transformed so that it is usable by you and by other researchers in the future.
