Data organisation can be done in different ways. For example, the data can be organised using a logical model or a mathematical model. This data organisation is known as the data structure. There exist two criteria to select a particular data model to store and retrieve data. The two criteria are – first, the structure should be such that it can represent the relationship between data. Second, the structure should be such that data processing can be done effectively.
Data is the term given to a value or group of values or set of values. A single unit of value is called a value. Data items can be grouped based on specified relationships. The collection of data are organised into a hierarchy of fields, records, and files.
An entity has properties or attributes that can be assigned values. The values that are assigned can be numeric or non-numeric. Entities having the same properties build entity sets. The property of each entity in the entity set has a specific value.
The data is organised into fields, records, and files. The organisation must be such that it should represent the relationship between attributes, entities, and entity sets. For example, a field is a single unit that can represent an attribute of an entity.
A record contains values of attributes that define an entity. A file contains a set of records used to define an entity that belongs to an entity set.
A record is described using different fields. To select a particular record, a unique value within the record is selected. The attribute that can be used to identify the record uniquely is known as the primary key.
A record can be identified using its length. A file can contain a record with a fixed length or a record with a variable length. A record is considered a fixed length with all the fields of the same length that can contain values of having the same number of characters. A record is considered variable length if the fields that describe the record are of variable length. That is, the fields describing the record contains a variable number of characters. Each record of variable-length posses a minimum length and maximum length.
This arrangement of data into fields, records, and files helps in the efficient processing of data. To record more complex data, different types of data structures are used.
A data structure can be described using a logical or mathematical model, how these data structures can be implemented on the computer memory, the amount of memory used by the data structure, and the time required to process that data structure.
The efficiency of the data structure is also influenced by its storage on primary memory or secondary memory. Therefore, the study of data structure is done when it is stored on primary memory. When the data structure is stored on the primary memory, the study is done to find the time required to access the data structure and the amount of memory consumed.
If the data structure is stored on the secondary memory, the study of the data structure is done with respect to the file management that is study is done to find the efficient data structure that store and process the file effectively. The scientific study of files on the secondary storage device is done under a database management system.
Arrays are considered to be the simplest type of data structure. This data structure can be linear or one-dimensional. The most important property of arrays is that it contains similar data items. These similar data items can be accessed by using reference elements. The reference elements range from 1,2,3,…,n.
If the array is given the name “AA”, its values can be accessed by index or subscripts having integer values 1,2,3,…,n.
If the name of the array is “AA” then its first value can be accessed by using the following notations:
AA or AA(1)
Here, 1 is the index number, and “AA” is the name of the index.
The generalised notation to access the array value at the index k will be,
AA[k] or AA(1)
Here, “AA” is the name of the array and “k” is the index.
To denote an array, parenthesis notation and bracket notation is used. In addition, the name of the array is denoted by an uppercase letter and subscripts are denoted by integer values. These are linear arrays, also called one-dimensional arrays. In linear arrays, each subscript refers to one element in the array.
There also exist two-dimensional arrays. A two-dimensional array contains similar data items, and each element can be accessed using two subscripts.
Arrays, Records, and Pointers
A Data structure can be linear or non-linear—a linear data structure stores elements in a sequence. A linear data structure can be stored in memory in two ways – first, to store the elements of linear data structure in memory sequentially. That is, each element in the array bears a linear relationship with the other. Such a linear data structure are arrays. Second, to store the elements in the memory using pointers, all the elements in the data structure will have a linear relationship. These linear structures are called linked lists.
Other non-linear structure includes trees and graphs.
The operations that are done linear data structure such as an array or linked list includes – Traversal, Search, Insertion, Deletion, Sorting, and Merging.
To access each element in the array or linked list is known as the Traversal of an array or linked list.
Finding a particular element in the array or linked list is known as Searching an array or linked list.
When a new element is added to the array or linked list, it is known as an Insertion in the array or linked list.
When an element is removed from the linked list, it is known as Deletion from the array or linked list.
When the elements in the array are arranged in a particular order, it is known as Sorting an array or linked list.
When two lists are combined into a single list, it is known as Merging arrays or linked lists.
The selection of a particular data structure depends on its efficiency and effectiveness in storing and retrieving a particular data item from the array or linked. The efficiency and effectiveness are measured in terms of the time taken to access the element and the amount of memory taken to store the element.
A linear array contains a finite number of elements. These elements are of similar data types. These elements can be accessed using an index. All elements of linear arrays are stored in sequential memory locations.
The size of the array is user-defined and is commonly specified by an integer. The total number of elements in the array is found by subtracting the lowest index from the largest index and then adding 1 to it.
If Upper_Bound is the largest index of the array and Lower_Bound is the lowest index in the array, then the formula to calculate the total number of elements denoted by the variable length in the array will be:
Length = Upper_Bound - Lower_Bound + 1
If the array is denoted by “AA” and its elements are accessed by integer values, then each element can be denoted by the following notation:
AA or AA(1)
The generalised form of this notation is as follows:
Here, “AA” is the name of the index, and k is the index of the array.
Representation of Linear Arrays in Memory
Suppose LA denotes the linear array. This linear array is stored in memory. A computer memory contains sequential and non-sequential addresses. Since an array is a contiguous memory allocation, it will be stored sequentially. Thus, a sequential block of memory will be used to store an array.
Thus, to find the particular memory address of the element of an array following formula can be used:
LOC(LA[K]) = address of the element LA[k] of the array LA
Here, LOC is the memory location of an element having index K of an array LA.
All the elements of the array will be stored in the memory sequentially. The computer keeps track of the first element of the array. In addition, the computer keeps the record of the address of the first element of the array. The address of the first element of the array is known as the base address of the array and is denoted as follows,
Here, LA is the array Base is the function used to find the base address of the array.
Using Base(LA), the computer can calculate the address of any element of the array.
The formula to calculate the address of any element of the array is as follows:
LOC(LA[K]) = Base(LA) + w(K - lower bound)
Here, w reflects the number of words that can be accommodated per memory cell. Since the array elements are stored sequentially and accessed using index value, the time taken to calculate the address of any element of the array is the same. Thus the address of any element of the array can be calculated directly using the above formula.
Traversing Linear Arrays
Let A be the collection of sequential elements of the array. To access all the elements of the array, each element has to be visited. Visiting each element of the array sequentially is known as Traversing the array.
The algorithm to traverse array elements is as follows:
// Algorithm to traverse array elements In this algorithm, L_A is a linear array, L_B is the lower bound of the array, U_B is the upper bound of the array. Set K_1 := L_B Repeat Steps 3 and 4 While k_1 <= UB Apply P_ROCESS to L_A[K_1] Set K_1 = K_ + 1 Exit
Another algorithm to traverse the array is as follows:
// Algorithm to traverse array elements In this algorithm, L_A is a linear array, L_B is the lower bound of the array, U_B is the upper bound of the array. Repeat for K_1 = L_B to U_B: Apply P_ROCESS to L_A[K_1] Exit
Inserting and Deleting an Element from the Array
Suppose LA is the array that is stored in computer memory. Suppose an element is added to the array. In that case, this process is known as Inserting an element in the array, and if an element is removed from the array, then this process is known as Deleting an element from the array.
An element in an array can be inserted at the end of the array. An element can be added at the end of the array if the array has memory space for this addition.
Another case of inserting an element in the array is at the middle of the array. In this case, when the element is deleted from the middle, then the array is reshuffled. All the elements coming after the elements to be deleted are shifted one position ahead, but keeping the order of the order of the array maintained.
To delete an element from the array, the array element is removed from that position. For example, if the element to be deleted is at the end of the array, it is removed from the end position, and no reshuffling of array elements occurs.
If the element to be deleted is at the middle or at the beginning of the array, then the array element is deleted, and reshuffling occurs.
// Inserting into a Linear Array L_A denotes a linear array with N elements, K_1 denotes a positive integer and K_1 <= N. Set J_1 := N Repeat the Steps 3 and 4 while J_1 >= K Set L_A[J_1 + 1] := L_A[J_1] Set J_1 := J_1 -1 Set L_A[K_1] := ITEM Set N := N + 1 Exit
An algorithm that deletes the Kth element from the array L_A is as follows:
L_A is a linear array. This linear array contains N elements. K_1 denotes positive integer and K_1 <= N holds. Set ITEM := LA[K_1] Repeat for J_1:= K_1 to N-1 Set LA[J] := L_A[J+1] Set N := N - 1 Exit
Searching an Element in the Linear Array
Searching refers to the process of finding an element in the array. The search process is successful if the element is present in the array; otherwise, it is unsuccessful. The item in the array can be searched successfully using a linear search algorithm or using a binary search algorithm.
Both linear search algorithms and binary search algorithms are different. Both these algorithm takes different time to search the array. The linear search algorithm takes linear time, and the binary search algorithm takes log2n.
Linear Search Algorithm
L_A is the array to be searched. To search an item in the array, each element is compared with the Item to be searched. If the item to be searched is named ITEM, then to compare an element of the array to the ITEM following statement is executed:
L_A = ITEM
A variable LOCATION is taken to traverse each element of the array. Each time the array is traversed, then variable LOCATION is incremented. For example, to increment the variable location following statement is executed:
LOCATION = LOCATION + 1
Following is the algorithm to search an item in array L_A.
// L_A is the array to be searched; variable ITEM contains the value to be searched, variable LOCATION is used as increment variable. Finally, variable N contains the uppermost bound of the array. Set L_A[N+1] := ITEM Set LOC := 1 Repeat while L_A[LOCATION] ≠ ITEM Set LOCATION := LOCATION +1 If LOCATION = N + 1, then: Set LOCATION := 0 Exit
If the ITEM to be searched in the array is at the last index of the array, then the whole array is to be searched. In this case, the algorithm takes the maximum time to search the element. The time taken to search the array is directly related to the length of the array. The higher the number of elements in the array, the more comparison needs to be done.