iOS: Reimplementing the Set data structure in Swift

--

Aiming efficiency, the Set type in Swift comes as ideal when we just want a data structure where we need to know if a specific element of a given type belongs to the set or not and list all the elements that exist in there. Differently from arrays and lists, we don't desire a sequence of any elements of a type where a value may be duplicated. Instead, or only goal is:

  • To know if an element belongs or not to the set
  • Get all existing elements
  • Add a non-existing element
  • Remove an existing element

We don't want to display the elements in any sorted order since which matters is the content itself. But how to implement this data structure in an efficient way so operations are as simple as possible? That's what we are about to discuss in this article. There are two basic data structures we are about to rely on for creating our business rules.

Not a Collection

We are not about to create a Collection type. Just you know, Collection protocol establishes, well, a collection of values which you can iterate through, get its count, get first and last elements, between other properties. Aiming simplicity, everything we are about to do is creating a type where we can add not repeated elements and get in a constant time complexity information about an element belonging or not to the set.

Implementation

Aiming our structural rules, let's create a new Xcode Playground and implement a type holding a collection of values(generic ones).

Ok, we are assigning the elements property, but we are forgetting something very important: what about this elements array contains duplicated elements? We need to provide a turnaround for this:

Now we guarantee that elements are unique due to the verification of the element's existence in the elems array for each input, but we still have a problem: Checking if the given element exists is a very expensive operation since we need to perform O(n) for each element we want to verify duplication. So we need a helping data structure where we may check if an element exists in constant time. HashTables are the best choice for that since we can access the value of a single key with O(1) time. Let's create a dictionary in our CustomSet in order to map each element to a Bool :

Now we can retrieve information about the an element existing or not in the Set in O(1) since we are actually retrieving from a HashTable. Since we now have this benefit, why not implementing a new method for the user to check that?

Great, now let's implement two of the most trivial operations in a data Set.

Insertion and Removal

In order to provide integrity to our Set, when adding or removing new elements, we should not only update the elems array but also the exists dictionary. This way we are always getting reliable results when checking our collection:

Every time we may add a new element, as we desire unique elements in our set, we should check in O(1) time if that element exists or not. Just for your information, we placed the mutating keyword before the method declaration because we are dealing with a struct and the function changes its internal state.

The same should happen when we are removing an element: We first check with O(1) if that exists or not and only proceed with the O(n) removal if the element is surely in there.

Union and Intersection

Two fundamental operations of a set as we have already seen at school are Intersection and Union. We define these two results respectively as:

Intersection of two sets consists of a new set with all the elements that exist in both inputs.

Union of two sets consists of a new set with all the elements that exist in at least one of the inputs

Said that, all we need to do is implement each of the operations in a way we maintain the integrity of the data structure and optimize to the best time. As we have two sets with respectively n and m sizes and we need to iterate through all the elements in order to merge in a new one, we aim a O(n+m)(linear) time.

For the Union operation, as we verify duplicity for each element in the input when instantiating a new set, the creation of a new CustomSet by passing the elements from both inputs will be enough:

Union

For the Intersection operation, that will be a bit more complex since we need to verify which elements belong to both structures. In order to optimize, we will iterate only through the smallest set:

Intersection

Conclusion

In this article we took a deep overlook over the Set data structure and which differentiates it from other Collection types. We reimplemented it aiming its core by providing constant time operations to retrieve an element, removed duplicity and implemented its main operations like Union and Intersection. Of course there is more that could improve our implementation like conforming to the Collection protocol, which would allow us to iterate through each element with a simple syntax, just like arrays and strings, but we focused on the concepts, algorithms and time complexity. I hope you have enjoyed this article and learned something new ;).

--

--