OMB Updating Race and Ethnicity Statistical Standards (OMB-2023-0001) Proposal

Written by MHDC (DGC) | Mar 16, 2023 11:32:00 PM

This document is submitted by the Massachusetts Health Data Consortium (MHDC) and its Data Governance Collaborative (DGC) in response to OMB’s Initial Proposals for Updating Race and Ethnicity Statistical Standards (OMB-2023-0001) posted on Regulations.gov on January 27, 2023 and found here: https://www.regulations.gov/document/OMB-2023-0001-0001

About MHDC

Founded in 1978, MHDC, a not-for-profit corporation, convenes Massachusetts’s health information community in advancing multi-stakeholder health data collaborations. MHDC’s members include payers, providers, industry associations, state and federal agencies, technology and services companies, and consumers. The Consortium is the oldest organization of its kind in the country.

MHDC provides a variety of services to its members including educational and networking opportunities, analytics services on both the administrative and clinical side (Spotlight), and data governance and standardization efforts for both clinical and administrative data (the Data Governance Collaborative and the New England Healthcare Exchange Network, respectively).

About DGC

The DGC is a collaboration between payer and provider organizations convened to discuss, design, and implement data sharing and interoperability among payers, providers, patients/members, and other interested parties who need health data. It is a one-stop interoperability resource. The DGC primarily focuses on three areas:

Collaboration: Development of common understanding of and specifications for data standards, exchange mechanisms, and what it means to participate in the modern health IT ecosystem
Education: Helping members understand their regulatory obligations, the data and exchange standards they're expected to use, and modern technology and related processes
Innovation: Identification and development of projects and services needed to make modern health data practices and exchange a reality

General Comments

This section comments on the general approach taken by OMB in their posted proposal.

Collection/Interface vs Storage/Analysis/Reporting

The current proposed rule is very focused on the data collection/user interface side of the process. It makes no explicit comment on the data storage or reporting requirements, implying in places that data reporting should use the new format but never laying out specific requirements. These requirements should be explicit.

Paper vs Digital Collection

The data collection interface outlined in the proposal is focused on paper formats and does not consider electronic/digital collection. In particular, it does not address options that are only available in electronic collection modes, such as conditionally displaying more detailed options or subsequent questions when they apply only to a subset of respondents. As more and more data collection happens electronically, directly addressing this mechanism becomes essential. At a minimum, direct guidance on how to adapt the paper presentation to digital presentation should be included. Ideally this would address any variances between websites, desktop applications, mobile apps, and other potential mechanisms for collection. Guidance on how best to present data in electronic forms should also be provided. Should form developers use dropdown menus to present options? Should there be standards around the number of options or the order in which available options are presented? Are there standardized behaviors expected for keyboard entry into a dropdown (i.e., should typing multiple letters jump to the answers that match the full string entered, or is each keystroke considered an independent, new command)? As one data point, the members of our DGC Working Group unanimously prefer that items in a dropdown menu be presented strictly alphabetically rather than in order of popularity or in some mix of the two (a few very popular options followed by the rest in alphabetical order, which was the least favored of the three).

Accessibility

Given that equity is a major priority of the current administration and is often the primary driver of collecting race and ethnicity information in the first place, collection of this data should be possible from (and by) the broadest group of people possible including those with disabilities. It should be explicitly outlined that collection mechanisms need to be accessible, particularly to the visually impaired who may have difficulty with paper forms or using an office-based tablet at an agency or organization’s physical location. It would also be helpful to provide guidance on some basic mechanisms to ensure this such as availability of large print paper forms when paper forms are used, using original printouts rather than photocopies when paper forms are used, ensuring that application designs work with non-default browser or device settings, and other specific adjustments or considerations that may not be front of mind when designing a collection program in general.

Standardization and Consistency

While the new minimum requirements are explicitly outlined, all of the more detailed information one level down is presented as sample/example options, leaving it up to each collector to decide what to support at that level. This will lead to extremely inconsistent data that is not easily collated, compared, or useful for research or other purposes.

We firmly believe in the utility of the more detailed options for race and ethnicity and understand that an other option supporting freeform text is the best way to handle it when using paper forms. However, freeform text is not the best storage and reporting mechanism for options that have a structured, standardized data element. We propose that electronic formats require using specific, standardized values for all values that support them and that an other value be reserved only for those values that do not have a defined OMB value of any sort. This is a backend requirement; we are not commenting on how much of this should be visible to users. Rather, we suggest a requirement to map other values to a standardized value during backend processing whenever possible. The same mapping requirement should apply during the data entry phase when paper collection is used.
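
As a minimal sketch of the backend mapping suggested above (not an OMB-defined procedure), the Python below normalizes a freeform other response and looks it up in a table of standardized detailed options. The table contents, the function names, and the matching rule (a case- and whitespace-insensitive exact match) are all assumptions made for illustration; a real implementation would use whatever value set OMB ultimately defines.

```python
from typing import Optional

# Hypothetical lookup table: normalized free text -> standardized detailed option.
# The entries are illustrative only; they are not an official OMB value set.
STANDARD_OPTIONS = {
    "haitian": "Haitian",
    "hatian": "Haitian",  # known misspelling, added to the table by hand
    "kenyan": "Kenyan",
    "maori": "Maori",
}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations still match."""
    return " ".join(text.lower().split())


def map_other_response(free_text: str) -> Optional[str]:
    """Return the standardized option for a free-text answer, or None.

    None means no standardized value was found, so the record would keep
    its other designation together with the original text.
    """
    return STANDARD_OPTIONS.get(normalize(free_text))
```

With this table, `map_other_response("  HAITIAN ")` returns `"Haitian"`, while an answer with no table entry returns `None` and stays as other.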

To ensure this happens, we also support some type of oversight or reporting related to how often other is used and the values supplied in freeform text when it is used. If possible, some analysis of whether other is used when a more specific value could have been used instead and perhaps some penalties for doing so to some defined excessive extent when reporting to the federal government might help enforce the practice of using specific, standardized data whenever possible (we recognize this last suggestion may be difficult to implement).

Use by Other Federal Agencies

In keeping with the idea that these standards be used by Federal agencies and that those agencies mandate their use for data in the industries that interact with them, we note that some of these proposals might require additional specificity to meet those needs. In particular, HHS (specifically CMS and ONC) uses OMB race and ethnicity data within the healthcare data and exchange standards it mandates or oversees. The primary mechanism for this is the baseline clinical dataset called the United States Core Data for Interoperability, or USCDI. Race and ethnicity have been part of USCDI since v1 (the current released version is v3, with a draft v4 currently out for public comment).

USCDI currently mandates separate race and ethnicity data elements using extant OMB codes. When used within FHIR (an API framework that the industry is quickly adopting and that is required for many CMS regulations and ONC certifications), the US Core Implementation Guide (IG) is considered equivalent to USCDI by both CMS and ONC. However, it represents race and ethnicity slightly differently than USCDI. US Core supports both high level categories and more detailed codes that roll up, but both are optional. The required content is a single text string that encompasses either race or ethnicity (presumably these would be combined in the new model). Having a standardized way to express a complex set of race and ethnicity data in a single string for this purpose would be highly useful. We understand that this may be outside of the purview of OMB, but feel it is a consideration in the ease of adoption and use of race and ethnicity data moving forward. If each user designs their own way to collate together the category and detailed race/ethnicity data, other organizations receiving that data via a regulated or optional FHIR data exchange will have a significantly more difficult time interpreting the data, collating data from multiple sources together, using that data for quality measures or equity programs, and doing everything else they'd like to do with it.
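
To illustrate what a shared convention for the single text string might look like, here is a minimal Python sketch. The format itself (alphabetical order, "; " separators, detailed options in parentheses) and the function name `display_text` are purely hypothetical; nothing here is defined by OMB, USCDI, or US Core. The point is only that a deterministic rule yields the same string no matter which organization produces it.

```python
from typing import Iterable


def display_text(categories: Iterable[str], details: Iterable[str]) -> str:
    """Join categories and detailed options into one deterministic string.

    Hypothetical convention: each group is de-duplicated and sorted
    alphabetically, items are separated by "; ", and the detailed
    options follow the categories in parentheses.
    """
    cats = sorted(set(categories))
    dets = sorted(set(details))
    text = "; ".join(cats)
    if dets:
        detail_text = "(" + "; ".join(dets) + ")"
        # Avoid a leading space when no category was selected.
        text = f"{text} {detail_text}" if text else detail_text
    return text
```

Because the inputs are sorted, two senders that record the same selections in a different order would still emit an identical string.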

Cultural vs Biological/Genetic vs Where Someone Lived

One still unaddressed area is the context of race and ethnicity. Depending on the use of the data, it may be important to capture self-identity, how a person is perceived by others, genetic or biological background, cultural background, and perhaps other aspects. The door is opened for significantly more variation by using country of origin as a major component of the race and ethnicity designation. Someone who is Black and was born in Northern Africa, was raised by White Swedish parents who lived in New Zealand for most of the individual’s childhood, and has now lived in the US for years may legitimately identify with different combinations of races and ethnicities in different contexts. If the goal of race and ethnicity collection is to ensure equity and reduce disparities, how someone is perceived by others might be the most prominent/appropriate data point and identity as Black is likely paramount. For a medical practitioner trying to determine whether someone has a genetic propensity to specific diseases, in those cases where race and ethnicity matter, the biological/genetic background of the individual is paramount so knowing they’re North African is likely most important. For a nutritionist, the cultural background of Swedish may be the most essential with a secondary concern about the country where the patient was raised as food consumption is closely tied to these factors. Having some ability to annotate why an individual associates themselves with a particular race or ethnicity would be an extremely helpful addition to the dataset. Barring that, clear instructions on which of these criteria respondents should use when selecting options are important so that users of the data understand what it represents.

Translation standards

It is unclear whether this is a current consideration, but in order to optimize standardization, the standardized options for race and ethnicity should all have standard translations into a selection of common languages so that the questions and their potential answers are consistent across all available formats. In addition, any presentation quirks or considerations for languages that are not always written from left to right (Hebrew and Arabic, which run right to left, and Chinese and Japanese, which may be set vertically, etc.) should come with guidance if not exact presentation standards to ensure optimal utility in those languages.

Introducing the form and why it’s being presented

When we presented the proposal to our DGC Working Group participants, the majority of those present agreed that specific guidance is essential on how best to introduce the form (including indicating that it can be skipped if desired) and on how it should be used and/or presented to people. In addition, they felt industry-specific guidance for major industries such as healthcare would be helpful.

Original or converted data

Our DGC Working Group participants also thought it was important to know when data was converted or translated in any way. If an attempt was made to convert existing data into the new format, note that so later users can determine how much to trust that data. If a free text other option was mapped to a standardized detailed race and ethnicity option, note that too (and perhaps maintain the original text in an optional field that accompanies the reported data for validation).
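
One way such provenance could travel with each reported value is sketched below in Python. The field names and the helper `from_free_text` are hypothetical, chosen only to illustrate the idea; they do not come from the OMB proposal or any existing standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RaceEthnicityEntry:
    """One reported value plus hypothetical provenance fields."""

    value: str                               # standardized option, or "Other"
    converted_from_old_format: bool = False  # migrated from the prior two-question data
    mapped_from_free_text: bool = False      # a free-text answer was mapped to `value`
    original_text: Optional[str] = None      # retained so the mapping can be validated


def from_free_text(original: str, mapped_value: Optional[str]) -> RaceEthnicityEntry:
    """Build an entry for a free-text answer, flagging any mapping applied."""
    if mapped_value is None:
        # Nothing to map to: keep "Other" and preserve the respondent's text.
        return RaceEthnicityEntry(value="Other", original_text=original)
    return RaceEthnicityEntry(
        value=mapped_value,
        mapped_from_free_text=True,
        original_text=original,
    )
```

A downstream user could then filter on `mapped_from_free_text` or `converted_from_old_format` to decide how much weight to give a value.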

Data Usage

Everyone at the DGC Working Group agreed that there should be a strict requirement to disclose exactly how the collected race and ethnicity data will be used.

Frequency of Request to Provide Data

The DGC Working Group participants feel that it’s important to have some guidance on how often to request race and ethnicity data from individuals, and on whether those repeated requests should only be made to people who have not provided data in the past or whether individuals should also have the option to change their race and ethnicity after having provided it. While it is unlikely individuals will frequently change race and ethnicity, it does happen (and is more frequent than it used to be thanks to the increased availability of commercial DNA tests and other factors that might cause someone to adjust their understanding of their own race and ethnicity).

Chose Not to Answer and Unknown

There should be an explicit chose not to answer option representing people who were presented with the form and made a conscious choice not to supply the requested data. This should be distinct from unknown, which does not imply any knowledge of whether there was a deliberate choice not to provide data, just that no data is available.
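
The distinction can be made concrete with a small Python sketch. The enum values and the decision rule are assumptions for illustration; in particular, treating a form that was presented but left blank, with no explicit refusal, as unknown is our own choice for this example.

```python
from enum import Enum
from typing import Optional, Sequence


class ResponseStatus(Enum):
    ANSWERED = "answered"
    CHOSE_NOT_TO_ANSWER = "chose-not-to-answer"  # form shown, respondent declined
    UNKNOWN = "unknown"                          # no data, reason not recorded


def response_status(
    form_presented: bool,
    selections: Optional[Sequence[str]],
    declined: bool = False,
) -> ResponseStatus:
    """Classify a record without guessing at the respondent's intent."""
    if selections:
        return ResponseStatus.ANSWERED
    if form_presented and declined:
        return ResponseStatus.CHOSE_NOT_TO_ANSWER
    # Never asked, or asked but left blank with no explicit refusal.
    return ResponseStatus.UNKNOWN
```

Only an explicit refusal on a form that was actually presented yields `CHOSE_NOT_TO_ANSWER`; every other case with no data is reported as `UNKNOWN`.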

Reporting, Metrics, and Validation

If a free text other option is supported, we recommend requiring some form of metrics/data collection around how frequently it’s used and also how frequently it can be mapped to one of the existing detailed options (and, if applicable, whether or not that option was directly available to the user completing the form).

We also recommend some type of reporting around how frequently someone with existing data in the current format chooses different options in the new format now that they are available, how often someone who refused an initial offer to complete a race and ethnicity form chose to complete it on subsequent opportunities, and how long it took to get a critical mass of people with existing data converted to the new format within a specific organization doing collection (we are not certain what a good percentage of conversion for reaching critical mass should be; perhaps 80%?).

We also recommend some level of random validation of conversion from free form other options to standardized detailed options if our suggestion to include a copy of the freeform text along with the standardized version is adopted. This would help determine how well/accurately/frequently this conversion is happening and thus how successful the availability of the other option is for reporting purposes.
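
The first two metrics recommended in this section could be computed along the following lines. This is a sketch under assumed inputs: the record keys `used_other` and `mapped_value` and the function name are hypothetical, not part of any reporting specification.

```python
from typing import Iterable, Mapping


def other_usage_metrics(records: Iterable[Mapping[str, object]]) -> dict:
    """Summarize how often other was used and how often it could be mapped.

    Each record is a mapping with two hypothetical keys:
      "used_other":   True if the respondent used the free-text other option
      "mapped_value": the standardized option it was mapped to, or None
    """
    total = other = mapped = 0
    for record in records:
        total += 1
        if record.get("used_other"):
            other += 1
            if record.get("mapped_value") is not None:
                mapped += 1
    return {
        "total": total,
        "other_count": other,
        "other_rate": other / total if total else 0.0,
        "mappable_rate": mapped / other if other else 0.0,
    }
```

Here `other_rate` is the share of all responses that used other, and `mappable_rate` is the share of those other responses that could be mapped to a standardized option.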

Use of a Scribe, Translator, or other Assistance

While we strongly believe all race and ethnicity data should be self-reported (see below), the mechanism for reporting this data may vary for individuals with a disability, language barrier, or other impediment to recording their own race and ethnicity data in the format supplied to them. In these cases, another individual should be able to record the self-reported data as long as the selection recorded comes from the individual in question. When that happens, the fact that assistance was used, the form of assistance, and the name of the person who assisted should be recorded. This should also apply if someone answered a verbal questionnaire asked during a customer service call or some other telephone interaction.

Response to Specific Questions

This section lists specific questions asked in the proposal and our responses to them.

1b. To what extent would a combined race and ethnicity question that allows for the selection of one or more categories impact people's ability to self-report all aspects of their identity?

In general, we feel that combining race and ethnicity offers the ability to provide a more complete picture of individual identity, particularly with the current ethnicity restrictions (only considering Hispanic ethnicity). However, there are challenges to this approach beyond migration issues and resistance to change. There is not necessarily a clear rollup path between the primary categories of race and ethnicity and more detailed options, so requiring a rollup or that the detailed options be assigned to one of the categories is problematic; someone can be White and Haitian or Black and French or Asian and Chilean. Furthermore, there are several groups that are not covered at all within these categories, particularly indigenous people from other nations (Maori, Australian Aborigines, etc.). We recommend that the American Indian or Alaska Native category be changed to Indigenous Peoples with American Indian, Alaska Native, Australian Aboriginal, Maori, etc. as the detailed options to better capture the breadth of possible race and ethnicity values. There are likely other gaps of this type that do not specifically involve indigenous peoples and that might need to be considered, but we did not explicitly identify others.

1c. If a combined race and ethnicity question is implemented, what suggestions do you have for addressing challenges for data collection, processing, analysis, and reporting of data?

The biggest challenge we see with the new model is migration of existing data. There is no clear pathway to take existing race and ethnicity data in the old formats and determine how to transform it to meet the new model. For example, some percentage of people who reported as White in the past are expected to change to Middle Eastern or North African. There is no accurate way to infer which such individuals should be changed to the new value. Similarly, with the inclusion (or increased availability) of more accurate detailed options, someone who only identified as Black before cannot reasonably be placed into the Haitian, Kenyan, South African, etc. bucket without additional direct information.

Our suggestion is to have a 1-2 year transition period during which both formats are supported but new data can only be collected in the new format. During this period, attempts should be made to collect new data from people already in the system. At the end of the transition period, any old data that has not been updated should be marked invalid, archived, or deleted and should not be included in any reporting, data exchange, or analysis, or used for any other purpose.

1d. What other challenges should we be aware of that respondents or agencies might face in converting their surveys and forms to a one question format from the current two-question format?

Not a challenge, per se, but organizations that present forms or questionnaires in multiple languages often have significant lead time requirements for translation services; these should be considered in the time allotments. This could also be (at least partially) addressed by providing standardized translations as part of the data standard.

2b. Do these proposed nationality and ethnic group examples adequately represent the MENA category? If not, what characteristics or group examples would make the definition more representative?

There is some ambiguity around whether Jews with no recent ties to Israel would/should qualify as Middle Eastern/North African. There are plenty of Jews who do not consider themselves White (but who may present as white to the larger world) who would likely welcome an option to identify as Middle Eastern/North African but it is unclear whether they qualify when using strictly the options presented as they also do not consider themselves Israeli. Further, retaining the identity of Russian or German or Polish or whatever their more recent ethnic identity entails is important for medical reasons; some guidance on how best to handle this case is warranted. Should they identify as Israeli and Russian, German, or Polish? Should they identify as Middle Eastern/North African and Russian, German, or Polish? etc.

3a. Is the example design seen in Figure 2 inclusive such that all individuals are represented?

Figure 2 makes some pretty significant assumptions about how different detailed options roll up into the higher level category options. While these assumptions may be valid for a significant portion of individuals, they are by no means always valid. This is particularly true given the choice of country-of-origin type choices for the detailed options. As noted above, it is very possible for someone to be Black and Swedish, Asian and Haitian, White and Chinese, etc. If limited to a paper form, this is likely close to the best we can do, but significant use of Other is likely. For electronic forms, allowing cross-listing of detailed options so they do not automatically roll up only into a single category option would be greatly preferred and would offer the most flexibility for people to accurately represent themselves without significant use of an other option. The goal should be to collect specific, accurate data that can be mapped to standardized options whenever possible.

3b. The example design seen in Figure 2 collects additional detail primarily by country of origin. What other potential types of detail would create useful data or help respondents to identify themselves?

As noted in the response to 2b above, there are other criteria that can be used for race and ethnicity. One important subset of options is ethnoreligious groups. People who are Amish, Mennonite, Jewish, Sikh, Copt, etc. are not just practitioners of that religion, but have a shared culture that transcends the religion. They are often also tightly genetically linked, traced to a small number of origin families, and therefore belonging to one of these groups carries with it medical, cultural, and physical traits that affect many aspects of life. Thus, recording these groups as distinct race and ethnicity groupings is important in many different contexts that use race and ethnicity data (equity, tracking and addressing disparities, propensity for certain medical issues requiring special testing or preventative treatments, etc.).

3c. Some Federal information collections are able to use open-ended write-in fields to collect detailed racial and ethnic responses, while some collections must use a residual closed-ended category (e.g., “Another Asian Group”). What are the impacts of using a closed-ended category without collecting further detail through open-ended written responses?

The use of other without being able to attribute it to a more specific option when that option is presented to others in other contexts is problematic, as is the idea that the same individual will need to give different answers to the same questions when presented to them in different contexts. At the same time, using an open ended other option is also problematic unless careful data mapping is performed on those answers to map them to specific standard options in the backend data before the data is transmitted/exchanged/analyzed/reported elsewhere. We believe the detailed collection with a strict requirement for mapping the other responses to more specific standardized detailed options for storage and future processing when those options exist is the right choice.

3d. What should agencies consider when weighing the benefits and burdens of collecting or providing more granular data than the minimum categories?

In general, more data is better data so long as that data is accurate. It is better to have no data than inaccurate data you assume is correct. We believe the collection of detailed data is long overdue and will go a long way toward allowing a wider swath of individuals to accurately represent their race and ethnicity when asked. As noted several times in this response, there is no clear roll up path from the detailed options to the categories. When someone collects just the categories, so much information about the individual is lost – and often assumptions that may be incorrect are made. Further, people who struggle with fitting themselves into one of the categories without any clarification or ability to refine further may choose to opt out entirely (and perhaps should if the resulting data collected when they don’t is inaccurate). If the goal is to collect accurate and useful data for as large a percentage of the population or user base of a particular agency as possible then having options that reflect the complex race and ethnicity of as many people as possible is essential.

5a. For data providers who collect race and ethnicity data that is then sent to a Federal agency, are there additional guidance needs that have not been addressed in the initial proposals?

In general, most of the recommendations revolve around data presentation and the collection process, particularly on paper forms, and not how data should be stored and then used after initial collection. This should be remedied to ensure consistent reporting. We have pointed out specific areas for additional guidance throughout this document, but to highlight a few key items: we believe there should be an effort to make the resulting backend data as standardized as possible even if the forms collecting it opted for a freeform other option, that provenance of the data is important, and that any data that gets altered during backend processing or at some other point of the process should be noted as such. Data should be collected and reported in as much detail as possible and with as much specific meaning as possible. If data changes hands several times before being reported to a Federal agency, the entire flow should be captured and made part of the associated metadata.

5d. How should race and ethnicity be collected when some method other than respondent self-identification is necessary (e.g., by proxy or observation)?

We believe race and ethnicity should always be self-reported. In keeping with the idea that no data is better than incorrect data, we would recommend not allowing data collected by other means to be part of any race and ethnicity collection effort and resulting analysis/reporting/use. However, if this is not deemed acceptable, then at an absolute minimum all data reported by other means should be clearly noted as not being self-reported so users can filter it out as unreliable if they wish.

5e. What guidance should be provided for the collection and reporting of race and ethnicity data in situations where self-identification is unavailable?

See previous answer.

6c. How can Federal surveys or forms collect data related to descent from enslaved peoples originally from the African continent? For example, when collecting and coding responses, what term best describes this population group (e.g., is the preferred term “American Descendants of Slavery,” “American Freedmen,” or something else)? How should this group be defined? Should it be collected as a detailed group within the “Black or African American” minimum category, or through a separate question or other approach?

We applaud the collection of data around people descended from formerly enslaved peoples, but we note there are many such people who are not the descendants of Black American slaves. We also note that some who are may not themselves identify as Black or African American. This should definitely be its own question and should capture descendants of enslaved people who do not fit the narrow definition presented in the question asked.
