The steps of developing an App - Part 3 - Data, Schema and Storage

Welcome to part 3 of the series where we're exploring the various steps of developing a software application. If you happened to miss any of the previous parts, you can check them out at the links below:

Part 1 - The spec
Part 2 - Making mocks

In this post we'll be looking at the data that our apps use and how that data is saved. As an example we're going to use the same Web-Resume project from the rest of the series.

Why is this important?

I'd wager that the purpose of about 99% of all applications in existence is to move, manipulate and respond to data of some kind. And probably about 60% to 70% of the functionality of those apps is simple CRUD (Create, Read, Update, Delete) operations. Sometimes an entire app simply consists of a way to create, edit and display some specific type of data - CRM systems, for example.

This holds even for apps that you wouldn't normally think of as having anything to do with data in the classical sense. That Arduino project of yours takes in data from sensors and transforms it into some electrical output - or transmits the result to be used by another app as its input data.

PC and console games use tons of data. A game takes in your keystrokes and mouse movements as input data, reads data from image files and 3D models, has to save the game state, and ultimately transforms all of that data into the pixel data on your screen.

Data is literally everywhere and takes on many forms. The demand for professional and competent data scientists just keeps growing every single day - just type in "Data Science Demand" in any search engine, or you can read this article.

Having at least a basic understanding of how data is stored, manipulated and moved is critical for any software developer. If you can't grasp the basics of handling data, you're going to have a very hard time developing anything more complicated than "Hello World!"

Hopefully most of what I just said will be obvious to you.

Where to begin

The logical first step is to figure out where to start, so once again we're going to consult the spec we defined in part 1. For the app we're developing, we're going to store three different types of information:

User
Section
Entry

I derived this list from the first couple of lines of the specification:

Content

The resume should be split up into different sections specified by the user

A section consists of a title, default sort-order and entries

A section could have either a single or multiple entries

An entry could consist of text (including dates and numbers), an image or a data structure

A user should be able to manually show/hide a section or entry

The content should be formatted so that it could easily be printed

For now I'm only going to refer to the different types of data by the generic term stores, since we haven't determined what the best storage method will be yet.

It's all about relationships

What we should do next is work out how each piece of information relates to the other pieces. There are a few ways that pieces of information can be related to each other:

No relationship - The two records have nothing meaningful in common. For example, the temperature outside and the color of my shirt.
One-to-one - This usually means we're looking at the same thing from different perspectives. For example, we could store employee information and manager information in different places, but a specific manager will likely have a record in both the manager store and the employee store.
One-to-many - This type of relationship is generally used to describe ownership of one entity by another. For example, if we had a store of temperature sensors and a store of temperature measurements, each sensor record would be the owner of several measurement records.
Many-to-many - This is usually the most difficult type of relationship to grasp. I find it useful to think of a many-to-many relationship as a way of grouping elements together, where each element can be part of several groups. Let's assume we have a store for the student organisations on a campus and another store for the individual students. Each student can belong to multiple organisations, and each organisation (hopefully) has multiple students as members.
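
To make the relationship types a bit more tangible, here is a minimal sketch in Python using plain in-memory dictionaries and lists. All of the names, identifiers and values in it (employees, managers, sensor_id, memberships and so on) are made up for illustration, and it doesn't assume any particular storage technology.

```python
# One-to-one: each manager record points at exactly one employee record.
employees = {1: "Ada", 2: "Grace"}
managers = {1: {"employee_id": 1, "department": "Research"}}

# One-to-many: every measurement names the single sensor that owns it,
# while one sensor can own any number of measurements.
sensors = {"s1": "Greenhouse", "s2": "Cellar"}
measurements = [
    {"sensor_id": "s1", "celsius": 21.5},
    {"sensor_id": "s1", "celsius": 22.0},
    {"sensor_id": "s2", "celsius": 12.0},
]

# Many-to-many: a separate list of (student, organisation) pairs records
# who belongs to what, so neither side "owns" the other.
students = {"alice", "bob"}
organisations = {"chess", "drama"}
memberships = [("alice", "chess"), ("alice", "drama"), ("bob", "chess")]


def measurements_for(sensor_id):
    """Return all readings owned by one sensor."""
    return [m["celsius"] for m in measurements if m["sensor_id"] == sensor_id]


def organisations_of(student):
    """Return the organisations a student belongs to, sorted by name."""
    return sorted(org for s, org in memberships if s == student)


def members_of(organisation):
    """Return the students in an organisation, sorted by name."""
    return sorted(s for s, org in memberships if org == organisation)
```

Notice that the many-to-many case is the only one that needs a third collection. The one-to-many case gets away with a single reference on the "many" side.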

Now that we have that background, let's try to figure out how the pieces of data in our app relate to one another. For now I don't think it's necessary to give users the ability to have multiple resumes, so any data associated with a resume can simply be stored with the user information. Each resume will potentially have (own) several sections, so we have a one-to-many relationship between the user store and the section store. Each section will in turn potentially have multiple entries associated with it, which gives us another one-to-many relationship. There's no real need that I can think of for an entry to be shown in multiple sections, so no many-to-many relationships will be required in this app.

In summary we're going to have a structure like this:
[Diagram: yUML schema of the User, Section and Entry stores]
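
The same structure can also be sketched in code. This is only an in-memory illustration using Python dataclasses, not a decision about how the data will actually be stored. The sample username, section titles and entry values are invented, and each class carries just enough fields to show the ownership chain.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    data: str


@dataclass
class Section:
    title: str
    # One-to-many: a section owns its entries.
    entries: List[Entry] = field(default_factory=list)


@dataclass
class User:
    username: str
    # One-to-many: a user (and therefore their single resume) owns sections.
    sections: List[Section] = field(default_factory=list)


def count_entries(user: User) -> int:
    """Count every entry across all of a user's sections."""
    return sum(len(section.entries) for section in user.sections)


# Hypothetical sample data.
user = User("jdoe")
education = Section("Education")
education.entries.append(Entry("BSc Computer Science"))
user.sections.append(education)
user.sections.append(Section("Skills", [Entry("Python"), Entry("SQL")]))
```

Because an entry only ever lives inside one section's list, the "no many-to-many" decision is built into the shape of the data.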

What to store?

The final thing we have to think about is what information we want to put in each of our stores. This is usually the most flexible part of the process, and it will likely be changed and iterated on a few times as we develop the app and add features along the way. For a first iteration it should be OK to just wing it and make updates as we go along. What I've come up with for each data store is this:

User

Identifier / username
First name
Last name
Possibly contact information like an email address
Password or other means of "Logging in in order to make edits to one's resume"
Password recovery related fields

Section

Title
Default Sort Order

Entry

Type
Data

Notice that I wasn't very specific with the field names or types. Also notice that I didn't add the fields specifically related to the relationships between different stores. The reason for both of these is that we haven't concretely decided yet what technology we're going to use to store the data. We'll go through technology selection in the next article of this series...
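
In the meantime, here is one way the lists above could look as plain Python types, without committing to a storage technology. Treat every field name and type as a placeholder assumption. The three entry types come from the spec ("text, an image or a data structure"). The login and password recovery fields are left out because their shape depends on the login mechanism we pick, and the relationship fields are left out for the reason given above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, List, Optional


class EntryType(Enum):
    # The spec allows text, an image or a data structure.
    TEXT = "text"
    IMAGE = "image"
    STRUCTURE = "structure"


@dataclass
class User:
    username: str
    first_name: str
    last_name: str
    email: Optional[str] = None  # "possibly contact information"


@dataclass
class Section:
    title: str
    default_sort_order: int


@dataclass
class Entry:
    type: EntryType
    data: Any  # the concrete shape depends on the entry type


def in_default_order(sections: List[Section]) -> List[Section]:
    """Return the sections sorted by their default sort order."""
    return sorted(sections, key=lambda s: s.default_sort_order)


# Hypothetical sample data.
sections = [Section("Skills", 2), Section("Education", 1)]
titles = [s.title for s in in_default_order(sections)]
```

Storing the sort order as a plain number is just one option. It keeps reordering simple for now and can be revisited once the storage technology is chosen.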