Importance of “R” in Statistical Analysis and Data Science

Statistical analysis and data analytics are becoming more popular every day. The R programming language has gained a lot of popularity over the years because of its simple, easy-to-use approach, making it, after Python, one of the most popular choices for this kind of work. R was created and developed by Ross Ihaka and Robert Gentleman. The name “R” was partly derived from the first letters of the authors’ names and is also a play on the name of the S programming language. Unlike Python, which is a general-purpose language, R is domain specific: it is designed mainly for statistics and data analysis.

Features of R:
  1. It has a coherent, integrated set of tools that can be used for a wide range of tasks.
  2. Vector notation is a very powerful feature of R: an operation written once is applied to every element.
  3. R supports various types of calculations on arrays, lists and vectors.
  4. R is open source, so anyone can use it without limitations. It is also an interpreted language.
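
As a brief illustration of the vector features listed above, here is a minimal sketch. The variable names and values (prices, quantities) are made up for this example.

```r
# Hypothetical data: unit prices and quantities for three items
prices <- c(2.5, 4.0, 1.25)
quantities <- c(4, 2, 8)

# Arithmetic is applied element by element, with no explicit loop
totals <- prices * quantities
print(totals)

# Functions such as sum() work on the whole vector at once
grand_total <- sum(totals)
print(grand_total)

# Logical indexing keeps only the elements that satisfy a condition
expensive <- prices[prices > 2]
print(expensive)
```
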
Why is R so important?

R is taught in universities all over the world and is used in many companies for important business operations. Data science and statistics applications deal with many kinds of data, and many tasks have to be performed on them, such as data cleaning, feature selection and feature engineering. R also connects easily to big data frameworks such as Spark and Hadoop, as well as to databases. R provides excellent features for data exploration, and complex processes such as correlation analysis, clustering and data reduction can all be done with R.
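
To make these tasks a little more concrete, the sketch below runs a correlation, a clustering and a data reduction step using only functions from base R. It assumes the built-in mtcars data set; the choice of columns and of three clusters is arbitrary and only for illustration.

```r
# mtcars is a small data set that ships with R (datasets package)
data(mtcars)

# Data exploration: summary statistics for two columns
print(summary(mtcars[, c("mpg", "wt")]))

# Correlation between fuel efficiency (mpg) and weight (wt)
mpg_wt_cor <- cor(mtcars$mpg, mtcars$wt)
print(mpg_wt_cor)

# Clustering: k-means with 3 clusters on standardized columns
set.seed(42)  # k-means starts from random centers
scaled <- scale(mtcars[, c("mpg", "wt", "hp")])
clusters <- kmeans(scaled, centers = 3)
print(table(clusters$cluster))

# Data reduction: principal component analysis on the same columns
pca <- prcomp(scaled)
print(summary(pca))
```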

Differences between R and Python:
  1. For data analysis and visualization, R has built-in functionality in its base installation, whereas in Python libraries such as Seaborn and Matplotlib have to be installed and imported first.
  2. R focuses on data analysis and visualization. Python, on the other hand, is a general-purpose language with many other applications.
  3. Developers and analysts usually start working with R in RStudio. For Python, the Anaconda distribution is a common starting point.
  4. R has a very large selection of packages built specifically for statistics, although Python has more packages overall.
  5. In the end, whether to use R or Python depends on the needs of your project and the problem you are trying to solve.
  6. Getting started with R is simple. Basic knowledge of math, statistics and programming is enough.
Some Advantages of R:
  1. R is platform independent: it runs on Windows, macOS and Linux.
  2. R has a wide variety of packages. There are well over 10,000 packages available for R, the number is constantly increasing, and many of them target data science and machine learning.
  3. R has powerful tools for statistics and is considered a good language for implementing statistical methods.
  4. R is open source, so we don’t need to pay or buy a license to use it. This is the benefit of open source software: it is open for everyone to use.
  5. R provides machine learning operations such as regression and classification. Artificial neural networks can also be coded in R.
  6. Visually appealing plots and graphs can be created in R using libraries like ggplot2 and plotly.
  7. The R developer community is constantly growing. Many new packages are being created, and support and community help are also increasing.
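
The regression and classification mentioned in the list above can be tried without installing anything. The following is a minimal sketch using lm() and glm() from base R on the built-in mtcars data set; the choice of variables and the 3,000 lb example car are assumptions made for illustration.

```r
# Regression: model fuel efficiency (mpg) as a linear function of weight (wt)
model <- lm(mpg ~ wt, data = mtcars)
print(coef(model))

# Predict mpg for a hypothetical car; wt is measured in units of 1000 lbs
new_car <- data.frame(wt = 3)
predicted <- predict(model, newdata = new_car)
print(predicted)

# Classification: logistic regression for transmission type
# (am is 0 for automatic and 1 for manual)
clf <- glm(am ~ wt, data = mtcars, family = binomial)
prob_manual <- predict(clf, newdata = new_car, type = "response")
print(prob_manual)
```
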
A basic understanding of the R language

Before proceeding with this section, you should have a basic understanding of coding. A basic understanding of any of the programming languages will help you in understanding the R programming concepts.


1. Data Types:

In all programming languages, we store data in variables, and every variable has a data type. Space is reserved in memory to hold the data. Let us have a look at the various data types in R.

Logical Data Type

A logical value is either TRUE or FALSE. Let us implement it using code.

# Store a logical value, then print it together with its class
var_1 <- FALSE
cat(var_1, "\n")
cat("The data type is: ", class(var_1), "\n\n")

Numeric Data Type

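A decimal (floating-point) value in R has the numeric data type. Numeric is the default computational type: a number typed without any suffix is stored as numeric.

# Store a decimal value, then print it together with its class
var_2 <- 234.56
cat(var_2, "\n")
cat("The data type is: ", class(var_2), "\n\n")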

Integer Data Type

Whole numbers can be stored as integers. The only difference from the numeric example is the “L” suffix, which tells R to store the value as an integer; without it, 45 would be stored as numeric.

# The L suffix makes this an integer rather than a numeric value
var_3 <- 45L
cat(var_3, "\n")
cat("The data type is: ", class(var_3), "\n\n")

Complex Data Type

R also has a built-in complex data type for numbers with an imaginary part, written with the suffix “i”. Let us have a look.

# A complex number with real part 34 and imaginary part 3
var_4 <- 34 + 3i
cat(var_4, "\n")
cat("The data type is: ", class(var_4), "\n\n")

Character Data Type

The character data type is used to store strings and single characters in R.

# Text in quotes is stored as a character value
var_5 <- "R Programming"
cat(var_5, "\n")
cat("The data type is: ", class(var_5), "\n\n")
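
The data types above can also be tested and converted. The sketch below is an illustration only, using the standard is.*() and as.*() families of base R functions; the variable names are arbitrary.

```r
# One value of each data type discussed above
values <- list(TRUE, 234.56, 45L, 34 + 3i, "R Programming")

# class() reports the data type of each stored value
for (v in values) {
  cat(format(v), "->", class(v), "\n")
}

# is.*() functions test for a type; as.*() functions convert between types
x <- 45L
print(is.integer(x))
print(is.numeric(x))   # integers also count as numeric
y <- as.numeric(x)
print(class(y))
z <- as.integer("12")  # character to integer
print(class(z))
```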

2. Variables:

A variable is a name that refers to a value stored in memory. Variables in R can store numbers (real and complex), words, matrices, and even tables. Values can be assigned with the equal operator (=), the leftward operator (<-) or the rightward operator (->).

# Variable example using the equal operator.
variable.1 = 6

# Variable example using the leftward operator.
variable.2 <- "Capable Machine"

# Variable example using the rightward operator.
13L -> variable.3

print(variable.1)
cat("variable.1 is ", variable.1, "\n")
cat("variable.2 is ", variable.2, "\n")
cat("variable.3 is ", variable.3, "\n")
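
Since variables can also hold matrices and tables, here is a short sketch of both. The names score_matrix and blog_stats, and the values in them, are invented for illustration.

```r
# A matrix with 2 rows and 3 columns, filled column by column
score_matrix <- matrix(1:6, nrow = 2, ncol = 3)
print(score_matrix)
print(dim(score_matrix))

# A data frame is R's table type: named columns of equal length
blog_stats <- data.frame(
  post  = c("Intro to R", "R vs Python"),
  views = c(12000L, 4500L)
)
print(blog_stats)
print(blog_stats$views)

# ls() lists the variables defined so far; rm() removes one
print(ls())
rm(score_matrix)
print(exists("score_matrix"))
```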
  • Decision making:

Decision making is one of the most familiar concepts in coding: the if-else statement.
It executes one block of code if a specified condition is true. If the condition is false, another block of code is executed.

# Create the variable quantity
quantity <- 10000

# Set the if-else statement; else must follow the closing brace on the same line
if (quantity > 7500) {
    print('Popular Blog on CapableMachine')
} else {
    print('Not Popular')
}
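
When there are more than two outcomes, or a whole vector of values to test, the same idea extends with else if and with the vectorized ifelse() function. This is a sketch only; the thresholds and labels are arbitrary.

```r
quantity <- 5000

# else if adds further conditions; each else sits on the line of the closing brace
if (quantity > 7500) {
  label <- "Popular"
} else if (quantity > 2500) {
  label <- "Moderately popular"
} else {
  label <- "Not popular"
}
print(label)

# ifelse() applies the same test to every element of a vector
quantities <- c(10000, 5000, 100)
labels <- ifelse(quantities > 7500, "Popular", "Not popular")
print(labels)
```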

3. Loops:

A loop statement allows us to execute a statement or group of statements multiple times. There are three kinds of loops in R: for, repeat and while.

For Loop


This loop repeats a specific section of code a known number of times, once for each element of a vector or list.

for (value in sequence) {
    # statements inside the body of the loop
}
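
A concrete for loop might look like the following sketch. In R the loop variable takes each element of a vector in turn; the vector contents here are only examples.

```r
# Sum the numbers 1 to 5 with a for loop
total <- 0
for (i in 1:5) {
  total <- total + i
}
print(total)

# Loop directly over the elements of a character vector
languages <- c("R", "Python", "S")
for (name in languages) {
  cat("Language:", name, "\n")
}

# seq_along() gives the index positions and is safe for empty vectors too
for (idx in seq_along(languages)) {
  cat(idx, languages[idx], "\n")
}
```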

Repeat Loop

Like the other loops, this one repeats a section of code, but it is a special kind of loop with no built-in condition for exiting. To exit, we include a break statement guarded by a user-defined condition.

repeat {  
 
   commands   
   if(condition) {  
      break  
   }  

}  
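
For example, the following sketch counts to three and then leaves the loop. The counter name and the limit of 3 are arbitrary.

```r
count <- 0
repeat {
  count <- count + 1
  cat("Iteration", count, "\n")
  # Without this break the loop would never end
  if (count >= 3) {
    break
  }
}
```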

While Loop


This loop repeats a specific section of code an unknown number of times, for as long as a condition remains true.

while (test_expression) {
   statement
}
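
As a small worked sketch, the loop below keeps halving a value for as long as it is at least 1. The starting value of 100 is arbitrary.

```r
# Halve a value while it is still at least 1, counting the steps taken
value <- 100
steps <- 0
while (value >= 1) {
  value <- value / 2
  steps <- steps + 1
}
cat("Final value:", value, "after", steps, "steps\n")
```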

4. Functions:

A function is a section of code that performs a specific task. A function can be called and reused multiple times in the code. You can pass information to a function, and it can send a result back. R has built-in functions that you can use, and you can create your own functions too.

An R function is created by using the keyword function. The syntax of an R function is as follows:

func_name <- function(arg_1, arg_2, ...) {
   # function body
}

Function Components

The different parts of a function are −

  • Function Name − The name under which the function is stored as an object in the R environment.
  • Arguments − The values that are passed to the function when it is called.
  • Function Body − The code that defines what the function does.
  • Return Value − The value of the last expression evaluated in the function body, unless return() is called explicitly.
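
The components above can be seen together in a short sketch. The function names rectangle_area and safe_divide are hypothetical and exist only for this example.

```r
# Function name: rectangle_area; arguments: width and height
# (height defaults to width, so one argument gives a square)
rectangle_area <- function(width, height = width) {
  # Function body: the last evaluated expression is the return value
  width * height
}

print(rectangle_area(3, 4))
print(rectangle_area(5))

# return() can be used to leave a function early
safe_divide <- function(a, b) {
  if (b == 0) {
    return(NA)
  }
  a / b
}
print(safe_divide(10, 2))
print(safe_divide(1, 0))
```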

This was a brief overview of the R programming language. If you want to learn R in detail, I would suggest following tutorials on YouTube or other online platforms.

Conclusion –

So, we had a look at why R can be a powerful tool for various Data Science related tasks. There are many benefits of using R, and the best thing about R is that it is free and Open Source. For more information on R, please visit: https://www.r-project.org/

Reference –
For more articles, visit my profile on Analytics Vidhya:
Prateek Majumder

Learner | Engineering Student


Capable Machine