Data – Data Dispatch https://data-services.hosting.nyu.edu NYU Data Services News and Updates Mon, 12 Oct 2020 17:19:28 +0000

Five Things I learned at … ICPSR 2017 Biennial Meeting https://data-services.hosting.nyu.edu/five-things-i-learned-at-icpsr-2017-biennial-meeting/ Thu, 19 Oct 2017 21:14:45 +0000

Last week I visited Ann Arbor, Michigan for the 2017 Biennial Meeting for ICPSR’s Official Representatives (ORs). The meeting theme, “Building for a Data-Driven Future,” was a chance for ICPSR’s consortial members to learn more about the collections and get to know the ICPSR staff, educational programming, and data archiving efforts. Here are five things I learned at the conference.

1) ICPSR offers a robust range of resources and guidance for teaching with data. I already knew about the Data-Driven Learning Guides that ICPSR produces, but I got the chance to see that these really are effective, packaged datasets that can be used to teach concepts or statistical methods in a variety of contexts. I hope that anyone teaching with data at NYU makes use of these guides, even if only as a point of departure.

2) When teaching data literacy we should focus the discussion on the methods inherent in the data rather than cycling through lists of places where people can find data. This was the focus of Lynette Hoelter’s excellent workshop on the methods, vocabulary, and practices of information confidentiality within social sciences research (slides are available here). The mode of collection matters, as does understanding the inherent biases and limitations of standard data studies. ICPSR metadata allows users to locate studies according to methodology, coverage, universe, etc. I’m beginning to realize that students need to begin with these considerations before fully forming a research question that involves data.

3) It’s difficult for libraries to provide services and support on campus for patrons who need to access secure data. Many of my colleagues are in the same boat I am; we do not have the space or physical infrastructure to facilitate access to data with confidentiality requirements. A big part of our responsibility is to partner with other institutions that may have this capacity and glean ideas from them; however, we should also work toward helping researchers find alternative datasets or surrogates (public use files) that may still allow them to ask meaningful questions (when possible). ICPSR itself is leading an effort in this area to develop a protocol for a virtual data enclave and a process for researcher credentialing, which would facilitate access to data for those who would not otherwise have the institutional support to work with it.

4) It rains a lot in Ann Arbor.

5) Legendary data librarian Bobray Bordelon’s staple karaoke song is allegedly Lady Gaga’s “Bad Romance.” We didn’t get to see a performance in person, but we did get to celebrate his acceptance of the William H. Flanigan award for distinguished service as an ICPSR Official Representative. I, with many other members of the ICPSR and data librarianship community, am in awe of Bobray’s knowledge and generosity to the field. We also learned that Princeton University LibGuides and webpages are the number 2 source of traffic to all of the data housed by IPUMS, which is surely a testament to Bobray’s work. Congratulations, Bobray, on this prestigious award.

Bobray Bordelon, Data Services Librarian at Princeton University, accepts the William H. Flanigan Award.
Data Services Adds Aerial Laser and Photogrammetry Data for Dublin, Ireland https://data-services.hosting.nyu.edu/data-services-adds-aerial-laser-and-photogrammetry-data-for-dublin-city-ireland/ Wed, 12 Jul 2017 18:30:44 +0000

NYU Data Services is excited to publish a collection of 2015 Aerial Laser and Photogrammetry Survey Data for Dublin City, Ireland in the Spatial Data Repository. This high-density dataset was collected in March 2015 by Debra F. Laefer and a team of researchers at NYU’s Center for Urban Science and Progress (CUSP). The dataset includes aerial laser scanning (ALS) from 41 flight paths in the form of a 3D point cloud (LAZ), 3D full waveform ALS (LAS and PulseWaves), and other imagery data. For more information on this collection, please read the official press release. TechCrunch refers to this set as the “largest LiDAR dataset ever to help urban development.”

Also, refer to the video below for a 3D preview of what the data looks like when visualized.

About the Collection and Data Release

The 2015 LiDAR dataset is a landmark acquisition for geospatial data collections at NYU Libraries. It is the first time since the launch of our new Spatial Data Repository in 2016 that the GIS team has worked with researchers at NYU to bring a complex, multi-format original dataset into our collection. Many thanks to Stephen Balogh, Brittney ONeill, Ahn-Vu Vo, and others who put in incredible amounts of work on organizing the data for release and developing capacity for it.

Because of the size and complexity of the data, we had to take several new steps in order to present the data with enough spatial context to be useful to a range of geospatial researchers. One of the most frequent questions we anticipate about this data is, “what is it, and what can you do with it?” To help, the team has provided a 3D rendering of what the point cloud data looks like when visualized (see below).

This is just one section of point cloud data, which anyone can download and visualize with a library like Potree. Even this visualization presents a compressed and down-sampled version of the full waveform LiDAR, which is made available in LAS and PulseWaves formats. Professor Laefer’s team has provided very robust documentation about the use of this data in research, and its application for urban informatics scholarship. To date, this type of data has been used to explore the detection of road curbs and obstacles, tree growth, and more.

The size and complexity of the data associated with the 2015 aerial laser scan has also required us to revise some of the ways that we have been presenting spatial data. In total, the data associated with just a two square kilometer area in Dublin is well over one terabyte and comes in at least four different formats, including point cloud, full waveform, and infrared GeoTIFF. We needed ways for users to explore smaller subsets of the data and download files efficiently, so we expanded the interface of GeoBlacklight to allow discovery according to individual flight paths or area of coverage.

A screenshot of the navigation interface for the collection. Users can click on individual tiles or lines (which represent discrete flight paths) in order to download the datasets associated with that area or flight.

Through our spatial discovery application, GeoBlacklight, users can find sections or subsets of the data that are important to them and download accordingly. We hope that this release of LiDAR data benefits the larger geospatial community, and we encourage you to explore the complete collection within NYU’s Spatial Data Repository.

Five Things We Learned At . . . Geo4LibCamp 2017 https://data-services.hosting.nyu.edu/ftwla-geo4libcamp-2017/ Tue, 07 Feb 2017 15:15:39 +0000
Geo4Lib2017 attendees gather in front of the Branner Earth Sciences Library & Map Collections at Stanford University

Last week, Stephen Balogh and I attended the second annual Geo4LibCamp, hosted by Stanford University. The event marked a great year of progress in the GeoBlacklight community. It was a time to reflect on why our current political situation should influence how libraries collaborate to preserve geospatial data. Here are five things I learned.

  1. Given our exigent situation, we may need to re-think the scale and process of metadata creation. In his excellent plenary talk, Stace Maples modeled ways in which librarians might want to leverage Google’s Cloud Vision API, for instance, to extract workable metadata for scanned maps. The API has the potential to generate searchable terms to help with discovery.
  2. Index map layers are an interesting organizing principle to help contextualize the discovery of physical maps. Stanford is already implementing systems for presenting the holdings of Japanese military topographic maps. This includes reference layers in EarthWorks, Stanford’s discovery portal, but also a series of maps hosted on ESRI online. These tools allow users to discover specific maps more quickly and to see where there are gaps in holdings.
  3. Jack Reed at Stanford has released a gem called GovScooper that harvests all of the metadata in Data.gov and makes a rudimentary transformation into the GeoBlacklight schema. All of the metadata is now available in OpenGeoMetadata, so anyone can bring records into GeoBlacklight and begin to sort through them. This project, in my opinion, has major potential for rescuing geospatial data and enhancing the discoverability of it.
  4. The David Rumsey Map Center, which opened this past year, is amazing! We got to see some incredibly high resolution maps and hear about the process of digitizing and stitching together images.
  5. The GeoBlacklight community is very much concerned with user experience. In the un-conference planning session, the intersection of GeoBlacklight and user experience was the most popular proposed session, and when we met, we had some great discussions about the intersection of metadata and application design.

Thanks to Darren Hardy and everyone at Stanford University for hosting such a great and informative conference. I’m already looking forward to next year.

Saving Data: Preservation during Political Turmoil https://data-services.hosting.nyu.edu/saving-data-preservation-during-political-turmoil/ Thu, 26 Jan 2017 20:10:57 +0000

The first week of the Trump administration has been a disastrous assault on many fundamental human and academic rights. So far, a media blackout has been ordered for employees of the EPA, and moving forward, the administration says that “political staffers” will be required to review all published work and data produced by EPA scientists before release to the public or in academic venues.

Attempts to control access to data are spreading beyond the sciences as well. Last week, two members of Congress (Senator Mike Lee of Utah and Representative Paul Gosar of Arizona) introduced a bill that would undermine the Fair Housing Act, which prohibits racial discrimination in access to housing. In the text of the bill:

No Federal funds may be used to build, maintain, utilize, or provide access to a Federal database of geospatial information on community racial disparities or disparities in access to affordable housing.

Of course, the larger fear is that the massive amounts of data available from governmental sources, such as the EPA, U.S. Census, Bureau of Labor Statistics, and so much more will be taken down. And as the text of the aforementioned bill suggests, the motivation for hiding data is clearly ideological; our leaders intend to enable discrimination against people and erase human rights.

To guard against this, there have been many coordinated efforts to rescue and preserve our data. Here in New York, NYU’s ITP and Tisch School of the Arts are hosting a guerrilla data rescue event. Developers, coders, librarians, archivists, and activists will gather to work on scraping data, archiving web sources, and coming up with ways to preserve important data.

Other events have been springing up elsewhere. The New York Academy of Medicine organized a drive to save data related to climate change. In NYU’s sphere, members of the OpenGeoPortal geospatial metadata consortium have launched efforts to complete a data crawl. Thus far, they have archived 20 terabytes of data from these sources:

  • EPA Data Download Site
  • EPA Data Commons FTP Site
  • EPA eGrid
  • EPA FTP Portal
  • EPA Toxics Release Inventory (TRI)
  • EIA Open Data Portal
  • EIA Layer Information for Interactive State Maps
  • EIA Natural Gas Annual Respondent Query System
  • USGS National Land Cover Database (2011, 2006)
  • USGS National Hydrography Dataset
  • NREL GIS Data Portal
  • US Fish & Wildlife National Wetlands Inventory
  • US Census Bureau Entire 1980, 1990, 2000, 2010 Population and Housing Census; ACS 2002-2013; EEO Disability 2002-2008; Econ 1997-2015.
  • HUD Data Portal
  • BTS National Transportation Atlas Database 2011-2015 including all tabular statistical data
  • BJS Bureau of Justice Statistics Raw Data Sources
  • HRSA Data Warehouse
  • NOAA Northern Hemisphere Snow & Ice Archive 1997- Monthly
  • NOAA GSOM Global Temperature (Stations)
  • NOAA Nighttime Lights Time Series 1992 – 2013
  • NOAA Global Self-consistent, Hierarchical, High-resolution Shoreline Database (GSHHG)
  • NOAA Continually Updated Shoreline Product
  • NOAA Historical Shoreline Survey
  • NOAA USGS National Assessment of Shoreline Change Vector Shorelines
  • NASA GISTEMP Global Temperature (Global Mean)
  • National Atlas – Entire Atlas

Also, kudos to Stanford University’s Jack Reed, who developed GovScooper, a tool for scraping data from the Data.gov portal so it can be preserved.

The preservation of this data is only one part of the equation. Creating metadata for it so it can be discovered in new contexts is an important next step. At NYU, we’ve already been engaged in the process of preserving federal and local data. Our Spatial Data Repository contains a range of U.S. Census data, files from NYC’s Bytes of the Big Apple, and more. Our goal is to join in with these coordinated efforts and continue to make data accessible to as many people as possible.

Data Services Mediates Access to Gallup World Poll and U.S. Daily Tracking Respondent-Level Data https://data-services.hosting.nyu.edu/data-services-mediates-access-to-gallup-world-poll-and-u-s-daily-tracking-respondent-level-data/ Thu, 20 Oct 2016 18:45:27 +0000

NYU Libraries provides access to a range of Gallup public opinion polling data through the Gallup Analytics tool, the Roper Data Archive, and the Gallup Poll interface. Through these tools (with the exception of some datasets in Roper), users can access public opinion data and export data aggregated at the country level.

Now, individuals in the NYU community can access Gallup World Poll and U.S. Daily Tracking Poll data at the respondent level, which provides a much-enhanced ability to trace connections between concepts and people. The years of coverage are roughly 2008 – present. Beginning in 2020, NYU Libraries has also added access to respondent-level data for the Gallup Poll Social Series (GPSS) and the Gallup Panel COVID-19 Survey. Each of these separate datasets and studies is described below.

How to Request Access to Gallup World Poll

In order to access respondent-level data for Gallup World Poll, you first need to be a current NYU faculty, student, or staff member. We recommend that you read both the Gallup World Poll Methodology and the World Poll Codebook. The codebook gives you an overview of all variables present, and you can discover what you need by doing some Ctrl+F searching in the PDF document. Your data extract request should be informed by a research question that has some logical boundaries.

To help the Data Services team access the subset of data that you need, all you have to do from here is get back to us with a list of variables from the data you would like. This list should (ideally) encompass theme, place, and timespan. For example, if your research question is about perception of gender equity in European Union countries, your request list might look something like this:

  • D7 – What is your gender?
  • D6 – What is your race?
  • UCLA3A – Do you describe yourself as male, female, or transgender?
  • WP1223 – What is your marital status?
  • WP1146 – “…women should hold leadership…”
  • WP646 – “…women should have same rights as men…”
  • Countries – (e.g., France, Germany, Italy, Spain)
  • Dates (e.g., 2008-2012)

This list is meant to be a sample; your request will likely contain many more variables. Once we have the list of variables you need, we will compile a data extract and deliver it to you.
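One way to keep a request organized before sending it is to write it down as structured data covering theme, place, and timespan. The sketch below is only an illustration: the helper name `format_request` and the layout it prints are hypothetical, not a format that Data Services requires. The QTAGs and question text are taken from the sample list above.

```python
# Hypothetical helper for drafting an extract request; the layout is an
# assumption for illustration, not a format that Data Services requires.

def format_request(variables, countries, start_year, end_year):
    """Return a plain-text request covering theme, place, and timespan."""
    lines = ["Requested variables:"]
    for qtag, question in variables.items():
        lines.append(f"  {qtag}: {question}")
    lines.append("Countries: " + ", ".join(countries))
    lines.append(f"Dates: {start_year}-{end_year}")
    return "\n".join(lines)


# QTAGs and question text copied from the sample list above.
sample_variables = {
    "D7": "What is your gender?",
    "D6": "What is your race?",
    "WP1223": "What is your marital status?",
}

request_text = format_request(
    sample_variables, ["France", "Germany", "Italy", "Spain"], 2008, 2012
)
print(request_text)
```

Keeping the variables in one place like this makes it easier to add the demographic and methodological variables mentioned in the notes that follow.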

Additional Notes: Indicators like gender or educational attainment, which may be useful for your analysis but not directly related to your question, exist as their own question QTAGs (variables), so you may want to request the inclusion of these for more robust analysis. The data also includes methodological variables that may be useful for analysis (e.g., D9, “How many adults are in your household?”), and you should request these as well. Finally, the phrasing of the questions changes slightly from year to year, so if you are interested in creating time series analyses, you may need to request variables that are related conceptually but are distinct in the layout of this data.

Note also that the PDF we have created is meant to be an introduction to the data. There are many more questions in the World Poll survey, and these can also be discovered by creating an account with Gallup’s World Poll Reference Tool. The Reference Tool is not the source of the data itself but instead a search mechanism to discover which questions were asked in specific countries during specific years. You can use this tool to locate variables you’d like to extract. Refer to the World Poll Reference Tool Guidebook for instructions on creating an account and performing searches.

There are several ways you can discover which questions or data are available. You can search by place (country), year, and theme or topic. Say you are interested in researching the concept of happiness in the world. You can enter a keyword search for “happiness.” It helps to enter keywords that are likely to be included in the text of a question itself. Once you discover relevant question(s), you can expand the interface and see the specific QTAG, study completion dates, and countries covered. For the sake of ease, you can even export information on question coverage into a .CSV.

How to Request Gallup U.S. Daily Tracking

Getting access to respondent-level data from the Gallup U.S. Daily Tracking is a similar process. There is no comparable discovery tool for U.S. Daily Tracking poll questions; instead, users are encouraged to search in the codebooks listed below. We have also created a matrix of available variables across years for your reference. Follow the same process of assembling question variables as described in the World Poll section above. The only significant difference is that you will want to request the variables that specify the geographic location of the respondent. And because of the mixed-methodology approach of gathering the data (see the U.S. Daily Tracking Poll Methodology), we highly recommend that you request weight variables as well.
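To see why weight variables matter, it can help to compare a weighted and an unweighted estimate. The following is a minimal sketch, assuming a toy extract in which each respondent has a yes/no answer and a weight. The field names `answer` and `weight` and all of the numbers are invented for illustration; they are not Gallup's variable names or data.

```python
# Illustrative only: toy respondents with made-up answers (1 = yes, 0 = no)
# and made-up weights. Real extracts use the weight variables named in the
# codebooks.
respondents = [
    {"answer": 1, "weight": 0.5},
    {"answer": 1, "weight": 0.5},
    {"answer": 0, "weight": 1.5},
    {"answer": 0, "weight": 1.5},
]


def unweighted_share(rows):
    """Share answering yes, treating every respondent equally."""
    return sum(r["answer"] for r in rows) / len(rows)


def weighted_share(rows):
    """Share answering yes, with each respondent counted by their weight."""
    total_weight = sum(r["weight"] for r in rows)
    return sum(r["answer"] * r["weight"] for r in rows) / total_weight


print(unweighted_share(respondents))  # 0.5
print(weighted_share(respondents))    # 0.25
```

In this toy case the "yes" respondents are over-represented in the sample, so ignoring the weights doubles the estimate.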

How to Request Gallup GPSS Datasets

The Gallup Poll Social Series (GPSS) dataset is a set of public opinion surveys designed to monitor U.S. adults’ views on numerous social, economic, and political topics. The topics are arranged thematically across 12 surveys and change each month. Gallup administers these surveys during the same month every year and includes the survey’s core trend questions in the same order in each administration (see GPSS methodology). Using this consistent standard allows for unprecedented analysis of changes in trend data that are not susceptible to question order bias and seasonal effects.

NYU’s respondent-level data coverage of this series goes back to January of 2019. Each month has its own codebook (see below), and while the themes of each month’s data are focused, many core questions tend to get asked in multiple surveys. And while the structure, methodology, and sample size are roughly equivalent across months, there are some particularities to each unique survey, so as with the U.S. Daily Tracking data, we recommend requesting weighting variables as well. To make a data request, indicate which survey you want and which variables you would like in your extract. Note that access to this data must be mediated by NYU Data Services.

How to Request Access to the COVID-19 Panel Data

The COVID-19 web survey began on March 13, 2020 with daily random samples of U.S. adults, aged 18 and older, who are members of the Gallup Panel. Approximately 1,200 daily responses were collected from March 13 through April 26, 2020. From April 27 to August 16, 2020, approximately 500 daily responses were collected. Starting August 17, 2020, the survey moved from daily surveying to a survey conducted one time per month over a two-week field period (typically the last two weeks of the month).

The Gallup Panel is a probability-based, nationally representative panel of U.S. adults. Members are randomly selected using random-digit-dial phone interviews that cover landlines and cellphones and address-based sampling methods. The Gallup Panel is not an opt-in panel. Gallup weights the obtained samples each day to adjust for the probability of selection and to correct for nonresponse bias. Nonresponse adjustments are made by adjusting the sample to match the national demographics of gender, age, race, Hispanic ethnicity, education and region. Demographic weighting targets are based on the most recent Current Population Survey figures for the aged-18-and-older U.S. population.
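The idea of adjusting a sample to match national demographics can be illustrated with a simple post-stratification sketch. This is a simplified illustration on a single demographic variable, not Gallup's actual procedure, which adjusts on several variables at once. The age groups, sample counts, and population shares below are invented.

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight each group so its weighted share matches the population share."""
    n = sum(sample_counts.values())
    weights = {}
    for group, count in sample_counts.items():
        sample_share = count / n
        # Under-represented groups get weights above 1, over-represented below 1.
        weights[group] = population_shares[group] / sample_share
    return weights


# Invented numbers: younger adults are under-represented in this toy sample.
sample_counts = {"18-34": 100, "35-54": 200, "55+": 200}
population_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

weights = poststratification_weights(sample_counts, population_shares)
for group, weight in weights.items():
    print(group, round(weight, 3))

# The weighted sample size equals the original sample size (up to rounding).
weighted_total = sum(sample_counts[g] * weights[g] for g in sample_counts)
print(round(weighted_total, 6))
```

Each respondent in a group would carry that group's weight, so analyses that ignore the weights would over-count the groups that responded at higher rates.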

In order to request an extract of this data, please get in touch with Data Services and indicate which variables you would like.

Full List of Codebooks

To help make working with the data a little easier once you’ve gotten access to it, we’ve generated some codebooks to interpret the variables and range of answers within each question. Note that these change over time, so refer to the codebooks below for each year. Unfortunately, there is no common search tool to search across all years, as there is with the Gallup World Poll.

World Poll:

U.S. Daily:

Gallup GPSS:

COVID 19 Panel Survey:

Data Updates

Note that we receive batch updates of data once per quarter for the U.S. Daily Tracking, World Poll, GPSS, and COVID-19 Panel data. The World Poll data is aggregated into a single file at the end of each year.

What is Gallup World Poll and U.S. Daily Tracking?

Gallup World Poll or U.S. Daily Tracking respondent-level data are developed by collectors who call people in countries that have high access to telephones. Or, in the case of countries that have low access to phones, survey staff actually go on the ground and ask questions relating to economic confidence, quality of life, food access, freedom of media, practice of religion, and other core indicators. Note that the exact text of each question and the exact place it’s asked can vary; not every question is asked in the same place each year, and the phrasing of each question is meant to approximate a common meaning across language and culture.


The U.S. Daily Tracking poll works similarly. On 350 days per year, 1,000 randomly selected U.S. adults are asked questions pertaining to economic confidence and quality of life. See the full description of the survey here. Part of what makes Gallup data so reliable is its built-in weighting variables to account for variance in the sample. In all, Gallup is one of the most comprehensive and well-documented public opinion surveys that exist, and having full access to the respondent-level data is a great step for the NYU community.

Questions

If you have any questions about accessing Gallup’s respondent-level data for World Poll, U.S. Daily Tracking, GPSS, or the COVID-19 Panel at NYU, don’t hesitate to reach out to Data Services (data.services@nyu.edu).

NYU Libraries Adds India National Sample Survey Organization Data https://data-services.hosting.nyu.edu/nyu-libraries-adds-india-national-sample-survey-organization-data/ Mon, 26 Sep 2016 12:56:04 +0000

Recently, NYU Libraries and Data Services acquired a series of data from India’s National Sample Survey Office, which is a program of the Ministry of Statistics. The data we have added span the years 1987-2014 and cover core indicators like consumer expenditure, literacy and culture, participation in education, manufacturing, and more. The data is updated to adhere to the DDI standard. If you are a member of the NYU community, you can access the data by searching in the Data Services Collection in the Faculty Digital Archive.

In addition to this recently acquired data, we have digitized older editions of India census data from 2001 as well. This data is available in the Faculty Digital Archive Data Services collection. Many thanks to Stephen Balogh for help processing this acquisition and to Aruna Magier, NYU’s South Asia Librarian, for her initiative and efforts in corresponding with the NSS and pursuing the acquisition of this data. Any questions should be directed toward Aruna.
