First of all, it’s great to have the opportunity to contribute to the Center’s blog. I wanted to tell the story of how being one of four student workers on the North Carolina Data Dashboard has led to amazing opportunities in data science and the CIS curriculum.
Sean Duffy, one of my student co-workers on the Dashboard, and I recently became the first and only students ever to give a full presentation at the World-Wide Data Vault Consortium (WWDVC). The WWDVC is an international data science conference aimed at improving data processes and other business intelligence functions. Our goal at the conference was to explain how we used the Data Vault 2.0 methodology to build and operate the North Carolina Data Dashboard Project (NCDD). We could not have anticipated the enormous response we received from conference attendees.
After our presentation, we became the biggest hit of the conference. A crowd of people came up afterward to ask us questions and to encourage our innovative application of the Data Vault 2.0 method. We ended up talking at length with the founders and pioneers of the field. Everyone wanted to talk to us and was curious about the great things we are doing at our university. It was an incredibly valuable experience that was only possible thanks to the support of Professor Dills and the Center.
Data Vault and the NCDD:
In case you didn’t know, Data Vault 2.0 is a methodology for structuring, building, maintaining, and working with data and data warehouses, and I believe it is the most efficient way to do so. The problem it addresses is scale: as more and more organizations reach the point of having to manage big data, just as we did with the NC Data Dashboard, it becomes harder for them to scale effectively.
One common but temporary and expensive solution is to restructure the warehouse and buy more storage. However, this only works for so long before it becomes infeasible. Data Vault 2.0 allows for natural scalability, regardless of data size, without having to restructure each time data needs to be added. It also incorporates Agile and Six Sigma methodologies.
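To make “without having to restructure” a little more concrete, here is a minimal sketch in Python using the built-in sqlite3 module. It assumes the standard Data Vault building blocks: a hub that holds one row per business key, and satellites that hold time-stamped descriptive data from each source. The table names, the county business key, the use of MD5 hash keys, and the loaded values are all hypothetical placeholders for illustration, not the NCDD’s actual schema or data. When a second data series arrives, it gets its own new satellite, and the existing tables are never altered.

```python
import hashlib
import sqlite3

def hash_key(business_key: str) -> str:
    # Assumption: hub keys are an MD5 hash of the normalized business key.
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()

def columns(conn: sqlite3.Connection, table: str) -> list[str]:
    # Read a table's column names from SQLite's schema metadata.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

conn = sqlite3.connect(":memory:")

# Hub: one row per business key (a hypothetical county name).
conn.execute("""CREATE TABLE hub_county (
    county_hk TEXT PRIMARY KEY,
    county_name TEXT NOT NULL,
    load_date TEXT NOT NULL,
    record_source TEXT NOT NULL)""")

# Satellite for a first, hypothetical data series.
conn.execute("""CREATE TABLE sat_county_population (
    county_hk TEXT NOT NULL REFERENCES hub_county(county_hk),
    load_date TEXT NOT NULL,
    population INTEGER,
    record_source TEXT NOT NULL,
    PRIMARY KEY (county_hk, load_date))""")

# Snapshot the existing schema so we can check it is untouched later.
hub_before = columns(conn, "hub_county")
sat_before = columns(conn, "sat_county_population")

def load_hub(county: str, load_date: str, source: str) -> str:
    hk = hash_key(county)
    # INSERT OR IGNORE: loading the same county again adds no duplicate hub row.
    conn.execute("INSERT OR IGNORE INTO hub_county VALUES (?, ?, ?, ?)",
                 (hk, county, load_date, source))
    return hk

hk = load_hub("Example County", "2019-05-01", "source_a")
conn.execute("INSERT INTO sat_county_population VALUES (?, ?, ?, ?)",
             (hk, "2019-05-01", 1000, "source_a"))  # placeholder value

# A new data series arrives: add a new satellite instead of altering old tables.
conn.execute("""CREATE TABLE sat_county_unemployment (
    county_hk TEXT NOT NULL REFERENCES hub_county(county_hk),
    load_date TEXT NOT NULL,
    unemployment_rate REAL,
    record_source TEXT NOT NULL,
    PRIMARY KEY (county_hk, load_date))""")
hk = load_hub("Example County", "2019-06-01", "source_b")
conn.execute("INSERT INTO sat_county_unemployment VALUES (?, ?, ?, ?)",
             (hk, "2019-06-01", 5.0, "source_b"))  # placeholder value

print(columns(conn, "hub_county") == hub_before)             # True
print(columns(conn, "sat_county_population") == sat_before)  # True
print(conn.execute("SELECT COUNT(*) FROM hub_county").fetchone()[0])  # 1
```

The design choice worth noticing is that new data shows up as new tables linked to an existing hub, rather than as changes to tables that are already loaded, which is what lets the model keep growing without a restructure.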
These factors, along with many others, are why I see it as the future of the data warehousing industry. The WWDVC is an annual gathering of some of the most brilliant minds in the industry, who come to reflect on and showcase new innovations with Data Vault 2.0 and to discuss where the industry is going.
Bringing Home Ideas:
This conference was beneficial for several reasons. The first was the number of vendors in attendance, among them Snowflake and WhereScape, which demonstrated some of the most cutting-edge software available. Before attending, I didn’t know technology like this even existed. The conference setting made it easy to speak with each company’s representatives on a personal level.
Each of the groups was eager to learn about our role as students on the project and to discuss the possibility of using their tools to enhance the NCDD Project. That opportunity would be incredible. WhereScape is a metadata-driven automation and code generation tool used to model and build the data warehouse and all of its ingest processes. Using it as a data-modeling tool could shorten our development process immensely, increasing our efficiency by a factor of roughly 3 to 5. This means the NCDD could add between 3 and 5 new data series in the time it currently takes to add one.

By the same token, Snowflake could decrease the time it takes to structure our data thanks to its efficient, SQL-based platform (which can be queried through tools such as SnowSQL, its command-line client). Furthermore, Snowflake is a cloud-based service, so the NCDD team would no longer need to worry about storage or about upgrading it. That would cut costs, freeing up funds for further expansion of the project. Even more importantly, WCU could use this technology for all of the same reasons, just on a much larger scale.
Not having to continuously purchase expensive storage space, and moving important systems to the cloud, could be revolutionary for the university. As far as expenses are concerned, Snowflake charges for compute by the second of use, with the rate depending on how many nodes (computers) are running, which gives it a competitive edge over many other technologies. Combining this with Data Vault 2.0 could put WCU in a stronger position than most other colleges in the country and greatly increase efficiency overall.
Connecting with Founders and Pioneers:
Another key piece of the conference was the relationships Sean and I built with other attendees. The first was with Dan Linstedt, the creator of Data Vault. He gave Sean and me the chance to be the first and only student presenters at the conference and to share our story. He created Data Vault while working for Lockheed Martin in the 1990s. Lockheed Martin was in a unique position: it was dealing with some of the largest datasets in the world at that time. A rough estimate put the size of the data at around 15 terabytes, which was unheard of then. Fifteen terabytes is small potatoes in today’s world, but at the time it was expensive to buy more storage and nearly impossible to restructure whenever new, large amounts of data were introduced. Dan was tasked with finding a way to fix this problem, and Data Vault was the answer. More and more organizations are now reaching this point and turning to Data Vault. Dan shared this history with us and helped us understand why Data Vault is the future. Twenty years ago, he found a solution to the problem so many companies are facing today. Even more amazing, he vowed to help us share the importance of Data Vault going forward.
Another individual we met was Bill Inmon, who is known as the “Father of Data Warehousing” and was a keynote speaker at the conference. Where Dan created the superior way to house data, Bill created the entire industry. Realizing this was daunting, yet he wanted to spend some time getting to know us. In the hour before our presentation, he talked to us about everything other than data. He just wanted to sit and get to know us. He offered us advice for our paths in data and encouraged us to go and enjoy our presentation rather than be nervous. Bill also explained that in early August of last year he had been in a serious accident that damaged his lungs, leaving him with sporadic coughing and chronic pain. Because of the pain, Bill typically leaves after his presentations, greeting a few folks but heading home the same day to rest. His keynote was presented the morning of our presentation, yet after meeting us he not only waited to see our presentation but extended his stay into the next afternoon in order to bid us farewell. This was a true honor and incredibly meaningful. His actions, along with the slew of praise from the other attendees, let Sean and me know we were truly in a unique position.
In addition to meeting Dan, we had the privilege of having dinner with a number of other brilliant people, including Cindi Meyersohn, President of Data Rebels; Kent Graziano, Chief Technical Evangelist at Snowflake; Data Vault Master Sanjay Pande; and others. Talking with these folks at length truly inspired me about the future of Data Vault and about the future WCU could have by using it and by getting more involved in Data Analytics directly through the CIS program.
Lessons for the Curriculum:
I believe this has incredible potential for WCU. Currently, our CIS program doesn’t specialize in any one area; instead, it includes crash courses in many subjects, such as networking, SQL, and web development. I’d like to propose that we specialize in Data Analytics as well as Business Modeling. The benefits would be huge. The first is better preparing our students for the real world of business. Part of being a successful CIS major is acquiring and tempering the ability to serve as a communicative bridge between the information technology and administrative sides of a business. This is, of course, why we take several business core classes, but we are missing more technical skills. Currently, we learn a bit about what data is and how to query it, but the meaning behind our queries and manipulation of the data is lost. To analyze data correctly, we have to understand several things about the business or client requesting it: how information flows through the business, what it cares about most, what its goals are, and what questions it needs to answer.

This is where Business Modeling would come into play very naturally, given that our program sits in the College of Business. We could incorporate some of the latest tools, like Snowflake and WhereScape, for experience in both modeling and warehousing. We could also create more Experiential Learning opportunities by using our upper-level classes to model a business and its data and then apply analytics to help it grow. This would pair well with MGT 404 and widen the scope of the CIS capstone. Additionally, this focus could produce more projects like the NCDD, giving students more business experience as well. This would help WCU continue to fulfill its mission of student Experiential Learning.
If we teach the core principles of Data Management/Analytics as the key enablers of business strategy execution, our students will be capable of making immediate, significant business contributions upon graduation.
To this end, Dan Linstedt and Sanjay Pande have offered to advise the College of Business (CoB) in establishing a Data Vault 2.0 focused Experiential Learning curriculum in a Data Management/Analytics context. Chris Stewart, WhereScape GM and VP for U.S. Operations, has offered similar support, along with licensing for WhereScape, as part of a Data Vault 2.0 focused Data Management/Analytics curriculum. Kent Graziano has offered advice, as well as access to Snowflake for students and faculty, also in conjunction with a Data Vault 2.0 focused Data Management/Analytics curriculum. Brad Bergh, a 30+ year veteran of Data Warehousing and Decision Analytics, has offered to work with CoB faculty to define a Data Management/Analytics Experiential Learning curriculum incorporating Data Vault 2.0, WhereScape, and Snowflake.
Just Wow:
This conference was completely eye-opening for the future of data. Sean and I received a degree’s worth of education from our week there, and I’m eager to help bring it back for the benefit of WCU and its students. If we specialize our CIS program in Data Management/Analytics and give students many opportunities for Experiential Learning, we will prepare them to be shining stars in the industry upon graduation. So many people are ready and willing to help us succeed and propel our college forward. All we have to do is ask.