A bit of C++

15 Sep 2020


Some random Cpp-based knowledge follows

Dynamic allocation of arrays?

It’s well known that an array in C++ has a fixed size, whether it is a traditional C-style array or a std::array (from C++11 onwards), but I’ve noticed that some programmers (mostly beginners) are not aware that the size must be a compile-time constant. Some still make the mistake of sizing an array with a value that is only known at runtime:

int size; std::cin >> size; int arr[size];
// Note that stack-allocated arrays like these are limited to the scope that they are declared in as well (deleted as soon as the scope ends).

Determining the size at runtime from the value read into size is not valid standard C++, even if it appears to work: compilers such as GCC and Clang accept it as a variable-length array extension. When the size is only known at runtime, a std::vector is usually the proper replacement for an array.
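As a minimal sketch of that replacement, the snippet below builds a sequence whose length is only known at runtime. The helper name makeSquares is hypothetical and only serves the illustration:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: the element count is a runtime value,
// which a std::vector handles without any compiler extension.
std::vector<int> makeSquares(std::size_t count)
{
    std::vector<int> squares;
    squares.reserve(count);                          // allocate once, up front
    for (std::size_t i = 0; i < count; ++i)
        squares.push_back(static_cast<int>(i * i));  // 0, 1, 4, 9, ...
    return squares;                                  // the vector frees its storage on its own
}
```

A caller would read size from std::cin as before and then write auto arr = makeSquares(size); instead of declaring int arr[size];.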

Virtual functions?

In a nutshell - methods of a class that can be (but do not have to be) overridden by subclasses. Using them incurs runtime costs: 1) the additional memory for a vtable per class, plus a hidden pointer to it in every object of that class, so that the call can be dynamically dispatched to the correct function, and 2) the lookup through the vtable on each call to pick that function. The penalty is rarely big enough to make a noticeable difference in runtime or memory use, though, and the flexibility often outweighs the cost.

class X 
{
    public: virtual std::string func() { return "xyz"; }
};
class Y : public X
{
    private: std::string m_name;
    public:  Y(const std::string& name) : m_name(name) {}
    std::string func() override { return m_name; } // defining base class virtual function 'func'.
};
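To see the dynamic dispatch itself, here is a small sketch in which a function only knows about the base class, yet the derived implementation runs. The names Base, Derived and callThroughBase are made up for this illustration and are not part of the example above:

```cpp
#include <string>

class Base
{
public:
    virtual ~Base() = default;
    virtual std::string func() const { return "base"; }
};

class Derived : public Base
{
public:
    std::string func() const override { return "derived"; }
};

// Only sees a Base reference; which func() runs is decided at runtime
// from the actual type of the object that was passed in.
std::string callThroughBase(const Base& object)
{
    return object.func();
}
```

Passing a Derived object to callThroughBase yields "derived", while passing a plain Base yields "base".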

A pure virtual function, however, usually has no implementation in the base class and forces inheriting subclasses to define it before they can be instantiated. In C++, classes with such functions act as an interface or an abstract class (the terms are interchangeable here): we cannot instantiate objects of that class, and rely on subclasses to fill in the template-like format it lays out.

class interfaceExample 
{ 
    public: virtual std::string func() = 0; 
};
class subclassExample : public interfaceExample 
{ 
    public: std::string func() override { return "xyz"; } 
};
int main() 
{
    subclassExample a; // interfaceExample cannot have objects.
    std::cout << a.func();
}

Function pointers?

As the name suggests, it is a pointer to a function. Functions are, after all, just CPU instructions stored somewhere in your binary executable, so a function pointer holds the address of the function in memory (the value you get when you write function instead of function()). Thus, when using one, you are retrieving the starting address of the executable code for the function, for which the address-of operator (&) is used. However, a function name converts to a pointer implicitly. For instance, in the code below, writing Hi suffices (instead of &Hi):

#include <iostream>
void Hi(int n) { std::cout << "Hi" << n << std::endl; }
typedef void(*func)(int);
int main()
{
    func Print = Hi;
    Print(5); // Hi5
}

The pointer used above has the type void(*)(int), i.e. a pointer to a function that returns void and accepts an integer as an argument. In a declaration such as void(*Print)(int), the first () with the asterisk contains the name of the pointer, which is Print in this example, and the second () lists the type(s) of the argument(s) passed to the function, which in this case is an integer. Here’s an example of a function pointer passed as a parameter to a function that uses a range-based loop to call it on each integer element of a vector taken as the other input (passed by const reference):

void ForEach(const std::vector<int>& values, void(*functionPointer)(int))
{
   for(int value : values)
     functionPointer(value);
}

Ordinarily, callback functions are where function pointers come into play, given the flexibility to switch between different functions of the same signature (i.e., return type and parameters):

int addNums(int a, int b)      { return a + b; }
int multiplyNums(int a, int b) { return a * b; }
int performOperation(int (*functionPointerName)(int, int), int x, int y) { return functionPointerName(x, y); }
int main() 
{
    int (*fPtrName)(int, int) = addNums;
    int result = performOperation(addNums, 3, 3);
    std::cout << result << fPtrName(result, 3) << "\n"; // 69
    std::cout << performOperation(multiplyNums, 3, 23) << "\n"; // 69
}

Quite a few standard library facilities accept function pointers (or callables in general) as input (one example being the constructor of std::thread).

Lambda expressions?

A fancy name for anonymous functions that can be defined right where they are needed (and discarded afterwards, like a throw-away function). A lambda can also be passed as an argument to a function if the need arises. Considering the ForEach function that I wrote above, here is a way to use it inside main() by passing a lambda in place of the function pointer; the lambda takes an int argument and prints it each time ForEach invokes it (note that a lambda with no captures converts implicitly to a function pointer of the matching signature):

int main()
{
  std::vector<int> values = {1, 2, 3, 4, 5};
  ForEach(values, [](int value)
  { std::cout << value << " "; });
}

Note that the [] in lambdas is for capturing variables, and in this case, the list of captures is empty. The capture clause specifies how variables from the enclosing scope are made available inside the lambda. For instance, [=] gives the lambda access to the local variables it uses by making a copy of each, while [&] gives it access to them by reference. A specific variable (or several) can be listed as well. For an integer variable v declared outside the lambda, say, one can capture just that variable by reference with [&v]. Variables captured by copy are read-only inside the lambda by default; to modify the lambda’s own copy, add the mutable keyword (after the parameter list, before the body).
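The three capture styles can be compared side by side. This is only a sketch, and the function names (captureByCopy, captureByReference, captureMutable) are invented for the illustration:

```cpp
#include <utility>

// [v]: the lambda keeps its own snapshot of v.
int captureByCopy()
{
    int v = 10;
    auto read = [v]() { return v; };
    v = 20;                 // does not affect the snapshot
    return read();          // 10
}

// [&v]: the lambda works on the original variable.
int captureByReference()
{
    int v = 10;
    auto bump = [&v]() { v += 5; };
    bump();
    return v;               // 15
}

// [v] plus mutable: the lambda may modify its own copy, not the original.
std::pair<int, int> captureMutable()
{
    int v = 10;
    auto next = [v]() mutable { return ++v; };
    next();                 // the lambda's copy becomes 11
    int second = next();    // ... and then 12
    return {second, v};     // {12, 10}: the outer v is untouched
}
```

Note that the mutable lambda keeps its state between calls, since the copy lives inside the lambda object itself.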

Another quick example of where lambdas are a perfect fit:

std::vector<int> values = {31, 42, 53, 64, 75};
std::sort(values.begin(), values.end(), [](int x, int y) { return x < y; });

Using the standard namespace?

Essentially, a using namespace std; at global scope may save some time by not prefixing std:: to every single item from the standard library, but it invites ambiguity whenever two entities of the same name exist. A good example is defining your own function or variable named count (a common name indeed): everything goes well until you hit compile and see g++ or clang complain about the ambiguity between your count and the one from the standard library (i.e., std::count()). Another great example is max()/min(). To avoid this, and to be precise about where a definition comes from, avoiding using namespace std; tends to be the preferred option.

On a side note, keep in mind that namespaces can be nested as well:

namespace n1 { namespace n2 { } }
// Equivalent to (since C++17):
namespace n1::n2 { }
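Coming back to the count name clash, here is a sketch of how explicit qualification keeps both definitions usable in the same file. The namespace app and the function elementsPlusThrees are hypothetical names chosen for this example:

```cpp
#include <algorithm>
#include <vector>

namespace app
{
    // Hypothetical helper that shares its name with std::count.
    int count(const std::vector<int>& values)
    {
        return static_cast<int>(values.size());
    }
}

// Each call states where its count comes from, so nothing is ambiguous.
int elementsPlusThrees(const std::vector<int>& values)
{
    int total = app::count(values);                               // our own helper
    auto threes = std::count(values.begin(), values.end(), 3);    // the standard algorithm
    return total + static_cast<int>(threes);
}
```

For the vector {3, 1, 3} this returns 5: three elements plus two occurrences of 3.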

Using <bits/stdc++.h>?

Advice to avoid it has been given over a thousand times on places like SO (and ironically, even some CP forums), but its recurring usage is not a surprise, since competitive programmers (newcomers like me, for the most part) are misguided or inclined to use it frequently, the reason being to save time writing out the individual standard library headers. But then, one can always use a pre-written template when necessary. Additionally, most problems don’t involve more than four to five headers and none of them need the entire standard library, so it’s not much of a hassle to write them within the contest duration either.

As to why its use is discouraged: primarily because it includes the entire family of standard library headers, most of which the program doesn’t need, which in turn slows down compilation. Moreover, although it’s supported by a number of online judges, it is a GCC (libstdc++) implementation header rather than a standard one, so it is not available with every compiler (e.g. MSVC, which Visual Studio uses). Its usage is highly discouraged in production code as well, so take it as a point to avoid using it. Considering you’ll be a software engineer in the long run, it’s pretty much guaranteed that no one would want you to write this header in their codebase :)

std::getline() and std::endl?

std::getline accepts an optional third argument: a delimiter character at which it stops reading from the input stream. Think of it as a terminating trigger: nothing after that character is read by the call. The default delimiter is a newline (\n), in the same way that std::sort accepts an optional third comparator argument that defaults to <. This matters when a getline follows a formatted read with std::cin >>, because the extraction leaves the newline in the stream, so getline() sees its delimiter immediately and returns an empty string. To avoid that, call cin.ignore() or, more robustly, cin.ignore(std::numeric_limits<std::streamsize>::max(), '\n'); to discard the leftover newline first. It is not usually an issue, but std::endl is also worth avoiding when you don’t need to flush the output buffer, given that endl writes a newline and then calls std::flush; a plain '\n' skips the flush.

Smart pointers?

Essentially a wrapper around a raw pointer that automates the process of allocating and deallocating memory. Under the hood, new is called on creation and delete is called automatically at a certain point that depends on the smart pointer used.

Unique pointers - A scoped owner with little to no overhead: the managed object is destroyed when the pointer goes out of scope. A unique pointer can’t be copied, because two copies would own the same memory and one would be left dangling as soon as the other went out of scope and freed it; ownership can only be moved. The constructor taking a raw pointer is explicit, so no implicit conversion is possible (and the copy constructor and copy assignment operator are deleted). Prefer std::make_unique, which is exception-safe (nothing leaks if an exception is thrown between the allocation and the construction of the smart pointer).

Shared pointers - Use reference counting: a shared control block tracks how many shared pointers refer to the object, and the object is destroyed as soon as that count reaches 0.

Weak pointers - Refer to an object managed by shared pointers without owning it and without increasing the reference count, which helps to break circular references and to detect, rather than dereference, an object that has already been destroyed.
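The three kinds can be sketched in a few lines. The struct Node and the function names are hypothetical, chosen only to make the behavior observable:

```cpp
#include <memory>
#include <utility>

struct Node { int value = 0; };

// unique_ptr: one owner at a time; ownership moves, it is never copied.
int uniqueDemo()
{
    std::unique_ptr<Node> first = std::make_unique<Node>();
    first->value = 7;
    std::unique_ptr<Node> second = std::move(first);   // first is now empty
    return (first == nullptr) ? second->value : -1;    // 7
}

// shared_ptr: every copy bumps the reference count in the control block.
long sharedDemo()
{
    std::shared_ptr<Node> a = std::make_shared<Node>();
    std::shared_ptr<Node> b = a;      // second owner of the same Node
    return a.use_count();             // 2
}

// weak_ptr: observes without owning, so it can tell when the object is gone.
bool weakDemo()
{
    std::weak_ptr<Node> observer;
    {
        std::shared_ptr<Node> owner = std::make_shared<Node>();
        observer = owner;             // reference count stays at 1
    }                                 // owner goes out of scope, Node is destroyed
    return observer.expired();        // true
}
```

In real code a weak pointer is usually turned back into a shared pointer with lock() before use, which yields an empty pointer if the object no longer exists.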

lvalues and rvalues?

Probably the next most talked-about thing after pointers (not one, not two, but maybe three-star cases?) that confuses newcomers. The “l stands for left, r stands for right” shortcut doesn’t work for all scenarios (downvotes to the one who thought about it, trying to find a shortcut - I mean I never would..probably). A better working definition would be “l stands for location: an lvalue occupies an identifiable location in memory and, unless it is const, can be assigned to; an rvalue, in contrast, is typically a temporary with no persistent location, such as a literal or the result of an expression, and typically cannot appear on the left of an assignment”.

Example?

Here’s a program which uses an overloaded function foobar with the first definition using an rvalue reference (with &&) and the one that follows using a lvalue reference: (with &)

#include <iostream>
#include <string>
void foobar(std::string&& name) // rvalue reference std::string&&
{ std::cout << name; }
void foobar(const std::string& name) // lvalue reference std::string&; const makes it work for both lvalues and rvalues
{ std::cout << name; }
int main()
{ 
  std::string a = "Sea", b = "Plus Plus";
  std::string c = a + b;
  foobar(c);     // chooses the foobar() overload with the lvalue reference (&)
  foobar(a + b); // chooses the foobar() overload with the rvalue reference (&&) - the one above would also be viable since it's const, but && is preferred
  std::cin.get();
  return 0;
}

Note that declaring the parameter as a const lvalue reference makes it work with both lvalues and rvalues, but the compiler will still prefer the rvalue reference overload (&&) over the const & one when the argument is an rvalue.

Quick facts:

const after an asterisk means the pointer is constant (or cannot be reassigned to point to something else), whereas a const before it means the value of the variable it is pointing to is constant. const after a method means the class member variables cannot be modified inside the method (making it a read-only one).
A static variable or function outside a class/struct is only visible in the translation unit that it is compiled in (the symbol has internal linkage). It is hidden from the linker as far as other translation units are concerned, which avoids multiple-definition errors when different source files define a variable or function of the same name. Conclusion: Make your global variables static unless you want them to be linked across translation units.
A static variable inside a class means only one instance of that variable exists for all objects of that class (no point of referring to that variable from a class instance). static methods work in a similar way, (use className::staticVariable/FunctionName) but they cannot access non-static variables (given they do not have a class instance, making it the exact same as a method written outside the class). Every non-static method always gets an instance of the current class as a parameter.
Vectors are like arraylists, following the reallocating scheme for dynamic resizing of the array to fit the contents.
public means everything so marked is accessible from outside the class, whereas protected limits visibility to the class itself and its subclasses (for instance, main() doesn’t have access). private is further constrained: members are visible only to the class itself and to its friends (the only exception).
new calls a constructor (default/custom, depending upon the parameters supplied) of the object being instantiated (apart from just doing a malloc internally).
this not only helps to use arguments of the same name as the member variables inside constructors (basic usage), but it also helps to refer and pass that particular instantiated object inside the constructor when required (say inside a function call within the constructor).
Enumerators of a plain (unscoped) enum are visible in the scope enclosing the enum, and thus can be referenced without the name of the enum (i.e., just enumMember instead of enumName::enumMember). An enum class (scoped enum) requires the qualified name.
*(int*)((char*)ptr + 4) steps in bytes, whereas *(ptr + 1) steps in elements. For an int pointer with a 4-byte int both reach the same object, meaning that the pointer arithmetic we ordinarily use (the latter) scales the increment by the size of the pointed-to type automatically.
The copy and move operations (constructors and assignment operators) define how an object is duplicated or transferred; with proper move semantics, a move essentially steals the resources of an object that is about to be destroyed instead of copying them.
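The pointer arithmetic fact above can be checked with a short sketch: both functions below read the second element of an int array, one stepping by elements and one stepping by bytes. The function names are invented for this example:

```cpp
#include <cstddef>

// Steps by one element: the compiler scales the offset by sizeof(int).
int secondViaElementStep(const int* ptr)
{
    return *(ptr + 1);
}

// Steps by raw bytes: the scaling has to be written out by hand.
int secondViaByteStep(const int* ptr)
{
    const char* bytes = reinterpret_cast<const char*>(ptr);
    return *reinterpret_cast<const int*>(bytes + sizeof(int));
}
```

Using sizeof(int) rather than a hard-coded 4 keeps the byte version correct on platforms where int has a different size.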

Good practices:

Use copy constructors to copy pointer elements that wouldn’t be taken care of by a direct copy of an object (pointers would be the same, leading to one pointing to invalid memory when the other gets deleted/freed)
Use emplace_back to construct the element in place in the memory that the container (vector, e.g.) owns (avoiding the temporary that push_back would copy or move in), and use reserve to set the capacity of the container if the element count is known beforehand.
Mind operator precedence when combining dereference and increment: *ptr++ increments the pointer (and yields the old pointee), whereas (*ptr)++ increments the value pointed to. Picking the wrong one gives unwanted results.
For standard enums, specifying a character type as the underlying type (8-bit values, e.g. enum Level : unsigned char) when possible saves a bit of memory (avoiding 32-bit values, as the default underlying type is typically int).
Declare the base class destructor virtual if subclasses are meant to be used polymorphically: deleting a derived object through a base class pointer without one is undefined behavior, and in practice the derived class destructor usually does not run.
When possible/applicable, use a reference instead of passing a pointer (for instance, dataType& value = xyz; functionCall(xyz); over dataType* value; functionCall(&xyz);)
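As an illustration of the virtual destructor advice, the sketch below destroys a derived object through a base class pointer and reports whether the derived destructor ran. Resource, FileResource and derivedDestructorRuns are hypothetical names:

```cpp
#include <memory>

struct Resource
{
    virtual ~Resource() = default;   // virtual: destruction dispatches to the derived class
};

struct FileResource : Resource
{
    explicit FileResource(bool& closedFlag) : closed(closedFlag) {}
    ~FileResource() override { closed = true; }   // records that cleanup happened
    bool& closed;
};

// Owns a FileResource through a pointer to the base class only.
bool derivedDestructorRuns()
{
    bool closed = false;
    {
        std::unique_ptr<Resource> handle = std::make_unique<FileResource>(closed);
    }   // handle is destroyed here via a Resource pointer
    return closed;   // true, because ~Resource() is virtual
}
```

If the destructor in Resource were not virtual, the same deletion would be undefined behavior and the flag would typically stay false.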

General: (not restricted or necessarily related to C++)

Both stack and heap memory live in RAM, but the former lays data out contiguously (high spatial locality across cache lines) and allocation is typically just an adjustment of the stack pointer, making it faster than the latter, which does not necessarily store data in contiguous locations and takes a fair number of additional instructions per allocation (new typically calls malloc, which searches its free list for a block at least as large as the request). Another difference is that memory allocated on the heap with new needs to be freed manually by using delete, whereas memory allocated on the stack is automatically freed (popped) once execution leaves the current scope.

Anirban
