QUESTION
What data quality standards and procedures do agencies follow for public data release?
0:34:57
·
129 sec
The NYC Office of Technology and Innovation (OTI) implements a data quality checklist for agencies, ensuring data integrity and privacy for public release.
- Agencies are provided with a data quality checklist to guide the creation of datasets intended for public release.
- The OTI team reviews datasets against this checklist post-submission, ensuring agencies have complied.
- Key checklist criteria include data completeness, good documentation, and a mandatory privacy review.
- The privacy review aims to prevent the release of personally identifying information, unless there's a compelling public interest reason.
- Feedback is provided to agencies for any necessary data adjustments before final submission.
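As a rough illustration only, the three criteria named above (completeness, documentation, privacy) could be sketched as automated pre-checks. This is a minimal sketch under stated assumptions, not OTI's actual checklist or tooling: `review_dataset`, `ChecklistReport`, `PII_HINTS`, and the sample rows are all hypothetical, and the privacy review described in the hearing is a human judgment that a column-name heuristic cannot replace.

```python
from dataclasses import dataclass, field

# Hypothetical column-name hints (an assumption for illustration);
# a real privacy review is done by people, not by matching names.
PII_HINTS = ("name", "ssn", "email", "phone", "address")


@dataclass
class ChecklistReport:
    missing_by_column: dict = field(default_factory=dict)
    undocumented_columns: list = field(default_factory=list)
    possible_pii_columns: list = field(default_factory=list)

    @property
    def needs_feedback(self) -> bool:
        # True if anything should go back to the agency before resubmission.
        return bool(
            self.missing_by_column
            or self.undocumented_columns
            or self.possible_pii_columns
        )


def review_dataset(rows, data_dictionary):
    """Run three simple checks on a list of row dicts.

    rows: list of dicts mapping column name -> value
    data_dictionary: dict mapping column name -> description text
    """
    columns = sorted({key for row in rows for key in row})
    report = ChecklistReport()
    for col in columns:
        # Completeness: count empty or absent values.
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        if missing:
            report.missing_by_column[col] = missing
        # Documentation: every column needs a non-empty description.
        if not data_dictionary.get(col, "").strip():
            report.undocumented_columns.append(col)
        # Privacy: flag columns for human review, never auto-decide.
        if any(hint in col.lower() for hint in PII_HINTS):
            report.possible_pii_columns.append(col)
    return report


# Made-up sample data, not drawn from any real dataset.
rows = [
    {"permit_id": "A1", "engineer_name": "J. Doe", "borough": "Brooklyn"},
    {"permit_id": "A2", "engineer_name": "", "borough": None},
]
data_dictionary = {
    "permit_id": "Unique permit identifier",
    "borough": "Borough of the work site",
}
report = review_dataset(rows, data_dictionary)
```

The sketch only reports findings and never modifies the rows, mirroring the testimony that the reviewing team does not clean the data itself but sends feedback to the agency for resubmission.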
Jennifer Gutiérrez
0:34:57
Are there any data quality standards that agencies are required to follow?
Martha Norrick
0:35:02
We have a data quality checklist that is also available on the open data site in the section of resources for open data coordinators that I referenced earlier.
0:35:12
Mhmm.
0:35:13
That is the checklist that we give to agencies to say, you know, work through this when you're working on creating the dataset for public release.
0:35:20
And then we go over that same checklist once we receive the data.
0:35:24
So, hopefully, agencies have already worked through it.
0:35:27
But if there's anything that slips through the cracks, our team also reviews against that same checklist, and anybody can take a look at that checklist and see, you know, the things we're looking for.
0:35:36
We're looking for completeness.
0:35:37
Mhmm.
0:35:37
You know, for there not to be sort of randomly missing data. We're looking for good documentation.
Zachary Feder
0:35:44
Privacy review.
Martha Norrick
0:35:45
Oh, we also do a privacy... yeah.
0:35:47
Privacy review.
0:35:49
You know, data that is released on the open data portal.
0:35:53
Generally, we're not releasing personally identifying information unless there's a compelling public interest reason. For example, licensing, or, you know, DOB applications sometimes contain the name of the engineer that is on the application.
0:36:11
But we've also reviewed datasets to make sure that there's not the possibility for identification or re-identification of
Jennifer Gutiérrez
0:36:17
So you all, the same folks, are cleaning that data as well?
Martha Norrick
0:36:23
We don't do additional cleaning after we receive the data from the agency, but we will look at the dataset and go back to the agency and say, hey.
0:36:30
Here's what we noticed.
0:36:31
Please address this and sort of resubmit the dataset.
Zachary Feder
0:36:34
And sometimes, like, the data that's on open data reflects the data that agencies have internally, Jen. So it's not as if we're taking what agencies have and are using and coming up with this pristine version, yeah,
0:36:47
otherwise. But the really important thing, as Martha was describing, is we want to make sure that if there is something that's not clear about the data, if there's some, like, error in it or some, like, process that's not intuitive, that is clearly explained so that anyone could understand what that is.
0:37:03
And use that in their analysis or kind