4 questions about data integration on campus
Creating better and more complete student records requires both good data and coherent data integration across systems. The complex challenges presented by data integration have emerged as institutions have made progress on creating comprehensive learner records (CLRs) -- such as those created as part of the ongoing Lumina-funded AACRAO CLR initiative begun in 2015.
“As part of that project, we learned that integrating data to create records from various systems or sources was one of the greatest challenges in the development of a CLR,” said Tom Green, AACRAO Associate Executive Director, Consulting and SEM, and the CLR project lead. “Therefore, this issue is receiving attention as part of the second phase of this project.”
Green addressed the topic of “Challenges and Solutions for Data Integration: A Report from and Discussion with the Comprehensive Learner Records Project Data Integration Work Group” during a Monday session at the 2018 AACRAO Technology and Transfer Conference in Minneapolis earlier this month, alongside co-presenters Mark McConahay (Indiana University), Matt Gee (Brighthive), Shelby Stanfield (University of Texas at Austin), and Tom Black (Johns Hopkins University).
What is data integration?
“Data integration is a set of technological and business processes aimed at combining data from multiple sources to create new, meaningful, and valuable information,” Gee explained. “In other words, it’s the software you have to buy and the people you have to convince,” he joked, eliciting a laugh from the full room.
Building data models that can support decisions involves many steps that can be technically daunting, including:
- Data profiling (how good is your data)
- Cleaning up bad data to make it usable
- Figuring out how to go from unstructured to structured data (PDFs, etc.)
- Migrating from old to new systems
- ETLT (extracting, transforming, loading, and transforming again)
- Working with legacy and third-party interfaces
- Data warehousing
- Attending to legal considerations
“All these different technological processes are necessary to complete to make data work -- to turn it into structured, meaningful information,” Gee said. “The good news is your challenges with data integration look a lot like everyone else’s: A community college’s problems look like payroll processing look like financial institutions and so on, so the market for data integration is big and there are a lot of tools to choose from.”
Gee named a number of examples, such as Dxtera, NiFi, Pentaho, Informatica, MuleSoft, Denodo, Talend, Snaplogic, and more.
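As a small-scale illustration of the profiling, cleaning, and transform-and-load steps listed earlier, the sketch below combines records from two campus systems using only the Python standard library. It was not part of the session: the sample rows, field names, and function names (`profile`, `clean`, `integrate`) are all hypothetical, and a real project would more likely rely on one of the integration tools named above.

```python
from collections import defaultdict

# Hypothetical extracts from two campus systems; field names are illustrative.
SIS_ROWS = [
    {"student_id": " 001 ", "name": "Ada Park", "program": "Biology"},
    {"student_id": "002", "name": "Ben Ruiz", "program": ""},
]
LMS_ROWS = [
    {"student_id": "001", "badge": "Lab Safety"},
    {"student_id": "002", "badge": "Peer Tutoring"},
    {"student_id": "003", "badge": "Orphan Badge"},
]


def profile(rows):
    """Data profiling: count blank values per field."""
    blanks = defaultdict(int)
    for row in rows:
        for field, value in row.items():
            if not value.strip():
                blanks[field] += 1
    return dict(blanks)


def clean(rows):
    """Cleaning: trim whitespace and turn blank strings into None."""
    return [{k: (v.strip() or None) for k, v in row.items()} for row in rows]


def integrate(sis_rows, lms_rows):
    """Transform and load: build one record per student known to the SIS."""
    records = {
        row["student_id"]: {**row, "badges": []} for row in clean(sis_rows)
    }
    unmatched = []
    for row in clean(lms_rows):
        record = records.get(row["student_id"])
        if record is None:
            unmatched.append(row)  # no SIS match; set aside for human review
        else:
            record["badges"].append(row["badge"])
    return records, unmatched
```

Calling `integrate(SIS_ROWS, LMS_ROWS)` returns the merged records along with any rows that could not be matched. Setting unmatched rows aside rather than dropping them is a design choice in this sketch: deciding what to do with them usually falls to the people who own each source system.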
“The large market for data integration makes the tech side easier,” Gee said. “If I can just choose the right tech for my institution depending on our capacity, what’s so hard about it?”
What’s so hard about data integration?
“People,” Gee quipped. “For every major data integration project we’ve helped with, benefited from, or advised on, we’re ultimately bringing data from places, and each place has a person or people with incentives around that information.”
Those data owners may be anxious about losing control of the data as it goes into other systems -- especially sensitive data such as social security numbers. They may be nervous about how messy their data is. Whatever the reason, there are ways to overcome institutional and individual resistance to sharing data.
Gee offered the following strategies to ease data sharing.
- Identify shared-use cases. “Identify the one or two things at the intersection of what they care about and what they have information on,” Gee said.
- Get cross-institutional buy-in. Communicate with key leaders and stakeholders, and get them to see the importance of data integration.
- Improve staff capacity and bandwidth. Again, communication is important, as are adequate training and an awareness of the fears of downsizing or reorganization that come with technological change.
Who’s the record for?
To get cross-institutional buy-in, the goals of the CLR must be clearly defined.
“It’s like the parable of the blind men and the elephant,” Black said. “We don’t necessarily agree what we’re trying to accomplish with the CLR, and the faculty don’t necessarily understand what we’re trying to create. When we’ve done prototyping with faculty, some are moved by the concept, and some are not.”
Black asked rhetorically, “Because learning isn’t recorded or housed uniformly across campus, how many faculty need to sign on for the CLR to be a thing?”
And the shift in focus from a student (transcript) to a learner (CLR) can confuse the matter further.
“Who are we describing here?” Black asked. “Learners can be adults -- several years after a degree -- and we’re not sure how to capture that. We may have different SIS for adult students, undergraduates, and graduate students -- and we may not agree what to record on any of those populations.”
Knowing what the CLR is meant to accomplish helps you to “know what dragon you’re trying to slay,” Black said. “Then you present it, get feedback, redesign, and present again.”
The primary imperative of the CLR must be to serve the student first, Black said.
“We talk about third parties -- employers come up a lot -- but one of the things my experimentation has taught me is that the people most often left out of the equation are the learners. If we think about them first, we’ll get a better result,” Black said. “Learners often don’t contextualize their learning very well; they don’t know what they’re acquiring. They do the practice but don’t know what the practice is for. If they don’t find the meaning of it right away, what happens? The record can be very useful to help them understand what to get out of their learning experiences, to connect the dots, and get something useful out of the experience.”
How (and when) will data be represented?
In a brainstorming session, the work group identified a dozen potential data sources, such as the SIS, LMS, student life system, and human resources system. To be integrated, these systems must be interoperable -- though they all have different degrees of authenticity and rigor.
Stanfield identified a number of questions around this issue, such as:
- Who owns the record -- the institution or the student?
- When is the data updated in the Institutional Data Store (IDS): when the source system is updated, on a routine basis, or in real time when the end user accesses the CLR?
- What is the intended purpose of the record? How can we empower students to highlight different elements to shape their CLR (academic, portfolio, career services, etc.) for different purposes (internships, employment, graduate school, etc.)? “It’s unlikely all that info will be useful en masse to dump out,” Stanfield said.
Work group & white paper
In the face of challenges presented by data integration (and partially articulated above), a 12-member work group of AACRAO members worked to identify issues and barriers when integrating data across multiple information sources in the context of a CLR, and to provide guidance to institutions on how to move forward.
The work group first met in January 2018 to define the scope and scale of its undertaking. The presenters were all members of that work group.
“Each member of the work group was charged with gathering information at their own institutions -- data sources, roles, platforms, interoperability, and barriers to data integration (technical, cultural, political),” McConahay explained. “Then Shelby and Tom gathered and drafted a green paper, which was shared with the committee in May.”
At the end of May, the work group and selected corporate partners gathered to review the green paper and discuss next steps. The draft white paper, published in June, can be viewed here.