Tokens in C language

Every language has some basic elements and grammatical rules. For example, the English language has an alphabet from which everything in the language is constructed. It has rules for forming words, rules for forming sentences, and so on.

In the same manner, before you start programming, you must know the basic elements of the C language. These elements include keywords, identifiers, variables, constants, data types, declarations, operators, expressions, etc.

The tokens of a language are the basic building blocks that can be put together to construct the program.

What are Tokens in C?

A token is the smallest element of a program that is meaningful to the compiler.
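As a rough illustration, the sketch below marks the tokens in one simple statement. The function name add_ten is made up for this example and is not part of C.

```c
/* add_ten is a hypothetical helper used only to illustrate tokens. */
int add_ten(int a)
{
    /* The statement below is made of seven tokens:
         int   sum   =   a   +   10   ;
       a keyword, an identifier, an operator, an identifier,
       an operator, a constant and a special symbol. */
    int sum = a + 10;
    return sum;
}
```

White space between tokens is ignored by the compiler, so int sum=a+10; produces exactly the same tokens.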

C tokens can be classified as shown below:

[Figure: classification of tokens in C]

1. Character Set

The character set consists of all uppercase characters, the lowercase characters, digits, certain special characters and white spaces.

Uppercase Letters: A, B, C……Z.

Lowercase Letters: a, b, c…….z.

Digits: 0, 1, 2,….9.

Special Characters:


White Space Characters or Escape Sequence:

backspace(\b), vertical tab(\v), newline(\n), form feed(\f), horizontal tab(\t), carriage return(\r).

These character combinations are known as escape sequences.

2. Keywords

Keywords are reserved words that have a standard, predefined meaning in the C programming language.

Each keyword has an intended purpose in a program. Note that all keywords are written in lowercase letters.

Keywords are reserved names for the compiler, so they cannot be used as variable names or user-defined function names. Also, we cannot redefine keywords.
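The following minimal sketch uses a few keywords (int, if, else, return) for their intended purpose. The function name is_even and the variable name result are chosen only for illustration.

```c
/* is_even is a hypothetical function; int, if, else and return are keywords. */
int is_even(int number)
{
    /* int return = 0;   would not compile, because return is a keyword */
    int result;          /* result is an ordinary identifier */

    if (number % 2 == 0)
        result = 1;
    else
        result = 0;

    return result;
}
```

Because C is case sensitive, a name such as Return is technically not a keyword, but using it would only confuse the reader.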

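ANSI C supports 32 keywords, which are given below:

auto, break, case, char, const, continue, default, do, double, else, enum, extern, float, for, goto, if, int, long, register, return, short, signed, sizeof, static, struct, switch, typedef, union, unsigned, void, volatile, while.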

3. Identifiers

Identifiers are user-defined names that are given to various program elements, such as variables, functions and arrays.

The rules for naming identifiers are as follows:

Identifiers consist of letters (both uppercase and lowercase), digits and underscore ( _ ).

An identifier must begin with a letter or an underscore ( _ ).

C is case sensitive, so uppercase and lowercase letters are considered different. For example, sum, Sum and SUM are three different identifiers.

An identifier must not be a keyword or reserved word.

No special characters or white spaces are allowed except the underscore ( _ ).

An identifier may be arbitrarily long, but ANSI C only guarantees that the first 31 characters are significant.

Some valid identifiers are as follows:

Area, NUM1, _sum, prime_no, max100 etc.

The following are invalid identifiers:

5percent: (invalid) identifier must begin with letter or underscore ( _ ).

Area Circle: (invalid) white spaces are not allowed.

case: (invalid) keywords cannot be used.

Max-no: (invalid) No special characters are allowed except underscore ( _ ).
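The naming rules can be tried out with a small sketch like the one below. The function name identifier_demo is hypothetical; the invalid names are left inside a comment because they would stop the program from compiling.

```c
/* identifier_demo is a made-up function name used for illustration. */
int identifier_demo(void)
{
    int sum = 1;        /* valid */
    int Sum = 2;        /* valid, and different from sum */
    int SUM = 3;        /* valid, and different from both */
    int prime_no = 4;   /* valid: underscore is allowed */
    int max100 = 5;     /* valid: digits are allowed after the first character */

    /* Invalid identifiers (each line would be a compile error):
         int 5percent;      begins with a digit
         int Area Circle;   contains a white space
         int case;          case is a keyword
         int Max-no;        contains the special character '-'
    */

    return sum + Sum + SUM + prime_no + max100;   /* 1 + 2 + 3 + 4 + 5 */
}
```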

4. Variables

A program performs operations on data. This data has to be stored so that the operations can be performed. A variable is a name assigned to a location in the computer’s memory where data is stored.

A variable is an identifier that is used to store some specific type of information within a memory location. In other words, it is the data name that refers to the stored value.

A variable can hold only one value at any given time during program execution. Its value may change during the execution of the program.

All variables must be declared before they can be used. Since a variable is an identifier, all the rules for naming identifiers apply to naming variables.

Rules For Naming Variables:

Variables must begin with a letter or underscore ( _ ).

They must consist of only letters, digits, or underscore. No other special character is allowed.

It should not be a keyword.

A variable name must not contain white spaces.

A variable name should be at most 31 characters long, as only the first 31 characters are guaranteed to be significant.

C is case sensitive: it treats uppercase and lowercase letters differently.

A variable should be given a meaningful name that indicates what value it stores.
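A minimal sketch of these ideas is given here: a variable is declared, given a value, and then its value is changed. The names total_marks and total_marks_demo are assumptions made for the example.

```c
/* total_marks_demo is a hypothetical function name. */
int total_marks_demo(void)
{
    int total_marks;                  /* declared before it is used */

    total_marks = 40;                 /* the variable now holds 40 */
    total_marks = total_marks + 35;   /* the old value is replaced by 75 */

    return total_marks;
}
```

A name such as total_marks tells the reader what the value means, which a name such as t or x1 would not.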

Difference Between Identifier and Variable:

Not all identifiers are variables, but all variables are identifiers.
An identifier may not have any memory associated with it unless it is a variable, whereas all variables have memory.
The type of an identifier need not be mentioned unless it is a variable, whereas the type of a variable must be defined.

5. Constants as Tokens in C

Constants or literals refer to fixed values that do not change during the execution of a program.

Constants are treated just like regular variables except that their values cannot be modified after their definition.

C supports several types of constants:

Numeric Constants:

Numeric constants consist of numeric digits and may or may not have a decimal point. The rules for defining numeric constants are as follows:

A numeric constant should have at least one digit.

No comma or space is allowed within the numeric constant.

Numeric constants can either be positive or negative but the default sign is always positive.

The value of a constant cannot exceed specified minimum and maximum bounds. For each type of constant, these bounds will vary from one C compiler to another. 

There are two types of numeric constants namely, Integer Constant and Floating-Point Constant.

1. Integer Constant:

Integer constants are whole numbers without any fractional part or decimal point. An integer constant must have at least one digit and may be preceded by a + or - sign. A number with no sign is assumed to be positive.

A size or sign suffix can be added at the end of the constant: U or u (unsigned) and L or l (long). C has no suffix for short.

There are the following three types of integer constants:

  1. Decimal Integer Constant (Base 10).
  2. Octal Integer Constant (Base 8).
  3. Hexadecimal Integer Constant (Base 16).

Decimal integer constant consists of a set of digits 0 to 9 preceded by an optional + or – sign. If the constant contains two or more digits, the first digit must be something other than 0.

An octal integer constant always begins with 0 and consists of the digits 0 to 7.

A hexadecimal integer constant always begins with 0x or 0X. It consists of the digits 0 to 9 and the letters A-F or a-f. Note that the letters a through f (or A through F) represent the (decimal) quantities 10 through 15, respectively.
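The three number bases can be compared with a short sketch such as this one. All function names here are hypothetical and exist only for the example.

```c
/* Each function returns the value ten, written in a different base. */
int ten_decimal(void) { return 10; }    /* base 10 */
int ten_octal(void)   { return 012; }   /* base 8:  1*8 + 2 = 10 */
int ten_hex(void)     { return 0xA; }   /* base 16: A stands for 10 */

/* The suffix UL marks the constant as unsigned long. */
unsigned long big_value(void) { return 4000000000UL; }
```

Note that a leading 0 changes the meaning of a constant: 012 is ten, not twelve.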

2. Floating Point Constant or Real Constant:

A floating-point constant is a base 10 number that contains either a decimal point or an exponent or both. It may also have a + or - sign preceding it.

Spaces, commas and other symbols are not permitted between digits. It is possible to omit digits before or after the decimal point.

Examples of valid real constants in fractional form (decimal notation) are 0.05, -0.905, 562.05, .015, +5. and .50.

For expressing very large or very small real constants exponential (scientific) form is used. Here, the number is written as follows:

mantissa e exponent OR mantissa E exponent

The mantissa must be either an integer or a real number expressed in decimal notation. The mantissa can be positive or negative. The exponent must be an integer and can be positive or negative.

The interpretation of a floating-point constant with an exponent is essentially the same as scientific notation, except that the base 10 is replaced by the letter E (or e).

Thus, the number 1.2×10^-3 would be written as 1.2E-3 or 1.2e-3. This is equivalent to 0.12e-2, 12e-4, etc.
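To check that different exponent forms describe the same number, a sketch like the following can be used. The helper nearly_equal and the form_* functions are made up for this example; a small tolerance is used because floating-point values are stored only approximately.

```c
/* Returns 1 if x and y differ by less than a small tolerance, else 0. */
int nearly_equal(double x, double y)
{
    double diff = x - y;
    if (diff < 0)
        diff = -diff;
    return diff < 1e-12;
}

/* Four ways of writing 1.2 x 10^-3. */
double form_a(void) { return 1.2E-3; }
double form_b(void) { return 0.12e-2; }
double form_c(void) { return 12e-4; }
double form_d(void) { return 0.0012; }
```

Moving the decimal point one place to the right must be balanced by lowering the exponent by one, which is why 12e-4 matches 1.2e-3 but 12e-3 does not.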

Character Constants:

A character constant contains a single character enclosed in single quotes (' '). Examples of valid character constants are 'a', 'Z', '5', etc.

Character constants have integer values. On most systems these are ASCII (American Standard Code for Information Interchange) values, in which each character is numerically encoded with its own unique 7-bit combination. For example, the ASCII value of 'A' is 65.

In C, a character constant has type int, so sizeof('A') equals sizeof(int). When the value is stored in a char variable, it occupies 1 byte.
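The integer nature of character constants can be seen in a small sketch like this one. It assumes an ASCII system and a C (not C++) compiler, and the function names are hypothetical.

```c
/* On an ASCII system the character constant 'A' has the value 65. */
int code_of_A(void) { return 'A'; }

/* Adding 1 to a letter gives the code of the next letter. */
char next_letter(char c) { return (char)(c + 1); }

/* In C a character constant has type int; a char object is 1 byte. */
int constant_has_int_size(void) { return sizeof('A') == sizeof(int); }
int size_of_char_object(void)   { return (int)sizeof(char); }
```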

Escape Sequence:

C language allows us to have certain non-graphic or non-printing characters as character constants. Non-graphic characters are those characters that cannot be typed directly from the keyboard, for example, tabs, carriage return, etc.

These non-graphic characters can be represented by using escape sequences represented by a backslash (\) followed by one or more characters.

An escape sequence always begins with a backslash and is followed by one or more characters. Although it is written with several characters, an escape sequence represents a single character and so occupies only one byte when stored.

Some valid character constants are 'X', '3', 'd', '\n', '\\', '\0'.
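The sketch below shows that each escape sequence counts as one character. It assumes an ASCII system for the numeric codes, and the function names are made up for illustration.

```c
#include <string.h>

/* The string holds a, tab, b, newline: four characters in total. */
size_t escaped_length(void) { return strlen("a\tb\n"); }

/* On ASCII systems the newline character has code 10. */
int newline_code(void) { return '\n'; }

/* '\\' is one backslash character, code 92 on ASCII systems. */
int backslash_code(void) { return '\\'; }
```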

String Constants:

A string constant is a sequence of characters enclosed within double quotes (" ").

For example, "hello", "abc", "12345".

Every string constant is automatically terminated with a special character '\0', called the null character, which marks the end of the string.

For example, "hello" is stored as hello\0 in memory. The null character is not visible when the string is displayed. Thus, the size of the string is the total number of characters plus one for the null character.

Double quotes act as delimiters. When a double quote must itself be part of a string, write it as the escape sequence \". For example, "Welcome to \"fybsc(comp)\" 2020".

A character constant 'A' and a string constant "A" are not equivalent. A character constant has an equivalent integer ASCII value, whereas a string constant does not have an equivalent integer value and, in fact, consists of two characters (the specified character followed by the null character \0).
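The effect of the terminating null character on size can be checked with a sketch along these lines. The function names are hypothetical; sizeof counts the null character while strlen does not.

```c
#include <string.h>

size_t bytes_in_hello(void)    { return sizeof("hello"); }  /* 5 characters + '\0' */
size_t chars_in_hello(void)    { return strlen("hello"); }  /* 5 characters */
size_t bytes_in_string_A(void) { return sizeof("A"); }      /* 'A' + '\0' */

/* Escaped double quotes are part of the string: say "hi" has 8 characters. */
size_t quoted_length(void) { return strlen("say \"hi\""); }
```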

6. Data Types as Tokens in C

Every program requires the processing of data. We have to declare each piece of data together with its type in the program.

Data types indicate what type of value you can store in a variable. The data type may be Numeric or Non-numeric in nature.

C supports different types of data. Storage representation of these data types is different in memory.

1. Primary or Built in Data Types

Primary or built-in data types in C language are char, int, float, double, void.

1. char:

The char keyword is used to refer to the character data type. The character data type allows a variable to store only one character. The storage size of the character data type is 1 byte.

For example, 'A' can be stored using the char data type. Variables of type char are generally used to hold values defined by the ASCII character set.

2. int:

The int keyword is used to refer to the integer data type. The integer data type allows a variable to store whole numbers. The storage size of the int data type varies depending upon the processor and compiler.

It is typically 2 bytes on 16-bit systems and 4 bytes on most 32-bit and 64-bit systems.

3. float:

The float keyword is used to refer to real (floating-point) data. It allows a variable to store values with a fractional part. The storage size of the float data type is typically 4 bytes, with a precision of about 6 significant decimal digits.

4. double:

The double type is the same as float but with higher precision. It typically occupies 8 bytes, which lets it store more digits after the decimal point (about 15 significant decimal digits).

5. void:

void is an empty data type that has no value. It can be used with functions and pointers.

In functions, it specifies that there is no return value or that there are no arguments.

Data Type    Size in Bytes (typical)
char         1
int          2 or 4
float        4
double       8
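Since the sizes depend on the compiler and processor, it is safer to ask the compiler than to rely on memory. The sketch below does this with the sizeof operator; the function names are made up for the example, and the %zu format assumes a C99 or later compiler.

```c
#include <stdio.h>
#include <stddef.h>

size_t size_of_char(void)   { return sizeof(char); }    /* always 1 */
size_t size_of_int(void)    { return sizeof(int); }     /* often 4, sometimes 2 */
size_t size_of_float(void)  { return sizeof(float); }   /* usually 4 */
size_t size_of_double(void) { return sizeof(double); }  /* usually 8 */

/* Prints the sizes found on the current system. */
void print_sizes(void)
{
    printf("char   : %zu\n", size_of_char());
    printf("int    : %zu\n", size_of_int());
    printf("float  : %zu\n", size_of_float());
    printf("double : %zu\n", size_of_double());
}
```

Calling print_sizes() from main shows the actual values for the compiler in use.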

2. User-Defined Data Types

C supports a feature known as “type definition” that allows the programmer to define a new name or identifier that represents an existing data type. This is possible using the enum and typedef keywords.


C enumeration is a user-defined data type which consists of a list of names and each name corresponds to an integral constant.

The enumerated data type can be defined in C using “enum” keyword.

Syntax: enum identifier_name { name1, name2, ..., namen };

Example: enum day {Mon, Tue, Wed, Thu, Fri, Sat, Sun};
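The enum day example can be used as in the sketch below. By default the first name gets the value 0 and each following name gets the next integer. The function is_weekend is a hypothetical helper added for illustration.

```c
enum day { Mon, Tue, Wed, Thu, Fri, Sat, Sun };   /* Mon = 0, ..., Sun = 6 */

/* Returns 1 for Sat or Sun, and 0 for any other day. */
int is_weekend(enum day d)
{
    return d == Sat || d == Sun;
}
```

Using names such as Sat instead of the bare number 5 makes the program easier to read.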


typedef is used to give a new name or alias to existing basic data types.

Note that typedef cannot create a new data type.

Syntax: typedef existing_type new_name;

Example: typedef int INTEGER;

We can then use the user-defined type name to define variables:

INTEGER sum = 0;
area circle, square;

Here area is assumed to be another name that was defined earlier with typedef.
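A minimal sketch of typedef in use is given here. INTEGER comes from the example above, while the function name add_integers is made up for illustration.

```c
typedef int INTEGER;   /* INTEGER is now another name for int */

/* Adds two values using the alias in place of int. */
INTEGER add_integers(INTEGER a, INTEGER b)
{
    INTEGER sum = 0;
    sum = a + b;
    return sum;
}
```

Since INTEGER is only an alias, a variable of type INTEGER behaves exactly like an int and has the same size.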


So in this lesson, we have learned about the different types of tokens in C.
