Character and String (1)

A string is an array of characters. Characters are "a - z", "A - Z", "0 - 9", plus some other characters (special symbols, control characters, etc.).

Library function "getchar(), putchar()"


int getchar(void)
int putchar(int c)

The above two are functions in "stdio.h" (Standard Input Output Header File, which contains all the declarations of input/output subroutines).

getchar() needs no argument ("void" means no argument needed). It reads one character from the standard input (normally the keyboard) and returns its ASCII code as an integer (ASCII = American Standard Code for Information Interchange), e.g. the ASCII code for the letter "a" is 97. When there is no more input, it returns EOF.

putchar(int c) prints one character on the screen. The ASCII code of the character is required as its argument.

Notice that getchar() returns an "int" (integer), and putchar() requires an integer.

You may ask: why does getchar() not return a "char" (character), and why does putchar() not require a character?

There are two reasons. First, getchar() must be able to return the special value EOF (a negative integer defined in "stdio.h" that signals the end of input) in addition to every possible character value. A "char" has no spare value left over to represent EOF, so a wider type is needed. Second, although a "byte" occupies the smallest space, it is not the most convenient size for the computer hardware. An "int" (usually 16 bits on a 16-bit microprocessor and 32 bits on a 32-bit microprocessor) is more suitable, as its bit length usually coincides with the bit length of the hardware registers.

But we may assign a "char" to an "int" variable, or vice versa, and a "char" works like a small integer (0 to 255 if it is unsigned, or typically -128 to 127 if it is signed).


Exercise : Write a C program that displays the ASCII codes of the characters you input through the keyboard.

Use

printf("The character inputted is %c and its ASCII value is %d \n", ....)

and getchar().

Ans :


    #include <stdio.h>

    int main()
    {    int c;

    L20 : c = getchar();
          if (c == EOF)          /* no more input (e.g. Ctrl-D on Unix), so stop */
              return 0;
          printf("The character inputted is %c and its ASCII value is %d \n", c, c);
          goto L20; 
    }

%c tells the computer to print the "character", %d tells the computer to print a decimal (or integer).

You will notice that nothing is printed until you press the "Enter" key. This is because the Operating System does not send the characters to the program until we press "Enter". (Question : what is the ASCII code for newline, i.e. the code generated when you press the "Enter" key ?)

You will find that the ASCII code for A to Z is 65 to 90, a to z is 97 to 122, 0 to 9 is 48 to 57. (CAUTION : ASCII code for 0 to 9 is not simply 0 to 9 !)

How to assign characters in program

ASCII code for A is 65, the following two statements do the same thing,

        char c;
       
        c = 'A';          /*  Note the use of single quotation mark for character */
        c = 65;           /*  Same, but uses the ASCII decimal value of A */

Exercise :Write a C program, that prints the letter B and its ASCII value.

    #include <stdio.h>

    int main()
    {    char c;

         c = 'B';
         printf("The character  is %c and its ASCII value is %d \n", c, c);
         return 0; 
    }

(Note : single quotation mark must be used in char assignment. If you use double quotation mark, the effect is totally different, and will be explained later.)

Exercise :There are special characters in the ASCII code, e.g. newline, carriage-return, ...

        LF (line feed)             \n         (Notice that C uses backslash to
        HT (horizontal tab)        \t          mean "special character")
        CR (carriage return)       \r
        FF (form feed)             \f
        Single quote               \'
        Double quote               \"
        Question mark              \?
        Octal number               \ooo       (Note : backslash then one to three octal digits, e.g. \0 or \101)
        Hex number                 \x...      (Note : backslash then "x", then hex digits, e.g. \x41)  
               ....
The following program prints the decimal value of "\n",

    #include <stdio.h>

    int main()
    {    char c;
         c = '\n';
         printf(" decimal value of newline character is %d\n",c);
         return 0;
    }

Exercise : Try to find out the decimal values of various special characters.


Pointer Arithmetic

Exercise : How to declare variable "ptra" to be a pointer for "double", "ptrb" , a pointer for "unsigned long" ?

Ans :

double *ptra;
unsigned long *ptrb;

Exercise : How to assign the address of a variable to a pointer (address register) ? e.g. we have

double a;
unsigned long b;

How to assign their addresses to "ptra, ptrb" ?

Ans :

double a, *ptra;
unsigned long b, *ptrb;
ptra = &a;
ptrb = &b;

Exercise : How to print the actual address stored in "ptra, ptrb" in hexadecimal ?

Ans :

        printf("Address stored in ptra is %p\n"
               "Address stored in ptrb is %p\n", (void *) ptra, (void *) ptrb);

Notice that, if the "format string" is too long, we may break it up into several strings, and put them on separate lines. But there MUST NOT be a comma between the component strings. Also notice that "%p" is used to output a pointer (address), usually in hexadecimal, and that it expects an argument of type "void *", hence the casts.

Exercise : What would you expect the output to be (ptra is pointer for double, ptrb is pointer for unsigned long)

        printf("%p   %p \n", (void *) ptra, (void *) (ptra + 1));
        printf("%p   %p \n", (void *) ptrb, (void *) (ptrb - 1));

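Ans : Since "ptra" is a pointer for double, when it is incremented by 1, the address will be 8 bytes more (the size of a double). "ptrb" is a pointer for unsigned long, so when it is decremented by 1, the address is sizeof(unsigned long) bytes less. That is 4 bytes on systems where unsigned long is 4 bytes (as assumed in the rest of this chapter), but 8 bytes on most 64-bit Unix systems.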

Exercise : Suppose "ptrc" is a pointer for a derived data type, a structure, whose size is 40 bytes.

(you may find out the size of a data type using
sizeof(data_type)
sizeof variable_name
)
e.g.
    typedef struct { double a[3];
                     unsigned long b[4];
                   } Newtype;

    Newtype  record1, *ptrc;     /* What does this declaration do ? */
    printf("Size of variable 'record1' is %lu\n", (unsigned long) sizeof record1);

Question : when ptrc is incremented by 1, what will be the new address ?

Ans : "typedef" and "struct" are two C keywords, they are used to define new data types. Note how they are used. We will come back to "typedef" and "struct" later. At the moment, just guess its syntax.

Now we have defined a new data type "Newtype", and its status is similar to "char, short, long, float, double, ....". And we may use it in a declaration statement, e.g. "record1" is a variable of the type "Newtype", and "ptrc" is a pointer for "Newtype". When "ptrc" is incremented by 1, its address is 40 bytes more ( 3*8 + 4*4 = 40, taking unsigned long to be 4 bytes ).

Notice that we may define still other new data types using "Newtype", e.g.

    typedef struct { Newtype a[3];
                     char b[20];
                     double wages;
                   } Newnewtype;

    Newnewtype  rec, *ptrd;
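    printf("Size of variable 'rec' is %lu\n", (unsigned long) sizeof rec);

i.e. we may define new data types using data types defined already. This is the power of C - we may define new data types based on old data types. In Qbasic, we cannot define data types recursively.

Question : what is the sizeof(Newnewtype) ?
Ans : The members add up to 3*40+20+8 = 148 bytes. The compiler may, however, insert unused padding bytes so that "wages" starts at an address suitable for a double, hence sizeof may report a slightly larger value (e.g. 152 when a double must be aligned on an 8-byte boundary).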

Exercise : Arrays are declared with [...] and may be initialized, (note the use of curly brackets in initialization) e.g.

        double   *ptra, dum1, dum2, a[6] = {12.3, 45.6, 0.9, -5.6, 18.9, -200.};
        unsigned long   *ptrb, dum3, dum4, b[8] = {1, 20, 6, 8, 30, 100, 3, 27};
        ptra = a;            /* Notice that this is the same as "ptra = &a[0];"   */
        ptrb = b;            /* Same as "ptrb = &b[0]; "  Whenever an array name is
                                used alone, it always means the address of its first
                                element */
        dum1 = *ptra;
        dum2 = *(ptra + 3);

        ptrb += 2*3;
        dum3 = *ptrb;
        dum4 = *(ptrb + 1);
What are the values of dum1, dum2, dum3, dum4 ?

Ans : dum1, 12.3; dum2, -5.6; dum3, 3; dum4, 27.

Exercise : Will the following code work ? And what are the values of dum1, dum2 ?

        double   dum1, dum2, a[6] = {12.3, 45.6, 0.9, -5.6, 18.9, -200.};
        unsigned long  dum3, dum4, b[8] = {1, 20, 6, 8, 30, 100, 3, 27};

        dum1 = *a;
        dum2 = *(a + 3);

Ans : Yes. dum1 will have 12.3, dum2, -5.6

Exercise : Will the following code work ? And what are the values of dum3, dum4 ?

        double   dum1, dum2, a[6] = {12.3, 45.6, 0.9, -5.6, 18.9, -200.};
        unsigned long  dum3, dum4, b[8] = {1, 20, 6, 8, 30, 100, 3, 27};

        b += 2*3;
        dum3 = *b;
        dum4 = *(b + 1);

Ans : No. Though "b", which is an array name, may be used as an address register, we cannot change its value, i.e. " b += 2*3 ; " is illegal. However, the following still works,

        dum3 = *(b + 2*3);
        dum4 = *(b + 2*3 + 1);

Exercise : What will be the values of dum1, dum2, dum3, dum4 in the following code,

        double   dum1, dum2, a[6] = {12.3, 45.6, 0.9, -5.6, 18.9, -200.};
        unsigned long   dum3, dum4, b[8] = {1, 20, 6, 8, 30, 100, 3, 27};

        dum1 = a[0];
        dum2 = a[3];

        dum3 = b[6];
        dum4 = b[7];

Ans : The same values as before.

Exercise : Guess how arrays are implemented in C, or in other computer languages, e.g. FORTRAN.

Ans : They are all implemented using "pointers" (address registers).

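In C, the name of an array may be used as a pointer to its first element, and a[i] means exactly *(a + i). The two are not identical in every respect, though: an array name cannot be assigned to (as we saw above), and sizeof applied to an array name gives the size of the whole array, not the size of a pointer.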


How values are passed to subroutines

When the calling program calls the subroutine, e.g.

suba(3, 24.54, &tt, 4000);
values of the arguments are pushed into the argument stack.

The subroutine "suba" uses these memory locations in the stack as if they were its own variables, hence it may change their values if it wants to, e.g.

i = i*3 ;
x = -100.e2 ;
y = &varx ;
k = 99 ;

The effect is as if we have declared them,

double a, b, c;
int m, n;
int i;
double x, *y;
long k;

Also, the local variables defined in the subroutine,

double a, b, c;
int m,n ;
are given places in the "dynamic memory stack".

When the subroutine finishes its task, it pushes its return value (an "int", as declared by "int suba") into "the return value stack", and releases the memory it used in the "argument stack" and the "dynamic memory stack".

When the calling program has gotten the value, the memory in "return value stack" is released too.

Notice that these need not be three separate stacks; it all depends on the implementation.


How Strings are stored in C

Strings in C are "character arrays". By convention, the last character of a string MUST HAVE decimal value 0.

e.g. The string "abc" is stored in memory as the four bytes 97, 98, 99, 0.

In the program, we may use, e.g.

       int main()
       {   char a[]={97, 98, 99, 0};

   or      char a[]={ 'a', 'b', 'c', '\0' };      /* Note the use of '\0' for decimal 0 */

   or      char a[]="abc";
          ....... 
          .......
       }

In char a[]="abc"; the compiler automatically puts a 0 at the end, hence the size of the array is 4 and not 3. It is the same as char a[4]="abc";

At the very beginning of this chapter, we have discussed two library functions in "stdio.h"

int getchar(void)
int putchar(int c)

which read in one character from the keyboard, and print one character on the screen, respectively.

Exercise : Write a program that prints the string "abc" defined above on the screen. You should replace the 0 with a newline character '\n'.

Ans :

     #include <stdio.h>

     int main()
     {  char a[]="abc";
        char *ptr;
        int c;

        ptr=a;
     L20:
        c = *ptr;
        if (c != '\0')         /* Note : it is the same as if (c != 0 ) ...  */
            {putchar(c);
             ptr++;
             goto L20;
            }
        else
            {putchar('\n');
            }

        return 0;
      }

Exercise : The two programs below claim to print a string too. Are they correct?

 
(program 1)

     #include <stdio.h>

     int main()
     {  char a[]="abc";
        int c;

     L20:
        c = *a ;
        if (c != '\0')
            {putchar(c);
             a++;
             goto L20;
            }
        else
            {putchar('\n');
            }

        return 0;
      }

(program 2)

     #include <stdio.h>

     int main()
     {  char a[]="abc";
        int i, c;

        for (i=0; ; i++)
            { c = a[i];
              if (c!= '\0')
                  {putchar(c); }
              else
                  {putchar('\n');
                   break;        /* Note: keyword 'break' is used to exit loop immediately */
                  }
             }

        return 0;
      }

Ans : Program 1 is wrong, as we cannot change the value of "a". Though "a" functions like a pointer, its value is fixed: the memory location of a[] is fixed by the compiler, so the statement "a++;" will not compile.

Program 2 works.

Exercise : Write a subroutine that prints a string, and rewrite the above program using the subroutine.

Ans :

     #include <stdio.h>
     #include <assert.h>

     int st_puts(char *a)
     {   int i, c;
         for (i=0;;i++)
             {c=*a;
              if (c!='\0')
                  {putchar(c);
                   a++;
                   assert(i<=65535);
                  }
              else
                  {putchar('\n');
                   return 1;
                  }
             }
      }

     int main()
     {  char a[]="abc";
        st_puts(a);
        return 0;
     }

Note :

  1. assert(..) is provided by <assert.h> (strictly speaking it is a macro rather than a subroutine), viz.
    assert(condition);
    "assert" will do nothing if the condition is true (Note : condition is true <=> expression is non-zero), but will immediately stop the program, reporting the file name and line number, should the condition be false.
    We should use "assert()" frequently, as it may catch some difficult bugs.

  2. The subroutine name is
    int st_puts(char *a)
    It may also be written as
    int st_puts(char a[])
    We have seen before that arrays and pointers work similarly, and in fact, the compiler implements arrays using pointers. Hence, in a parameter list, the two declarations mean the same.
    (Question : what does the declaration "char **a; " mean ?
    Ans : "a" is a pointer to a character pointer. In a parameter list it is the same as "char *a[]", and it is typically used to point at the first element of an array of character pointers.)

    In C, whenever we pass an array as argument to a function, the address of the first element of the array is passed. And because of this, we may change the contents of the array in the subroutine, using the address provided. Thus, we may say that, "arrays are passed by reference whereas single variables are passed by value."

    In the following program, the value of an array is changed in a subroutine,

      #include <stdio.h>
    
      void changetest(double *x)
      {    int i;
           for (i=0;i<5;i++, x++)
                {*x = (double) i;
                }
           return;
      }
    
      int main()
      {    double a[]={65, 70, 55.4, 80, 90};
           int i;
           for (i=0;i<5;i++)
                {printf("element %d is %10.2f\n",i,a[i]);
                }
           changetest(a);
           printf("After calling the subroutine\n");
           for (i=0;i<5;i++)
                {printf("element %d is %10.2f\n",i,a[i]);
                }
           return 0;
      }
    
  3. If we do not check the length of string, i.e. we do not want
    "assert(i<=65535)", we may use
      int st_puts(char *a)
      {   int c;
          while ((c=*a) != '\0')
              {putchar(c);
               a++;
              }
          putchar('\n');
          return 1;
      }
    
    OR
    
      int st_puts(char *a)
      {   int c;
       L20:
           c=*a;
           if (c != '\0')
              {putchar(c);
               a++;
               goto L20;
              }
           else
               {putchar('\n');
                 return 1;
               }
      }
            
    (Note : Personally, I would use "If , goto", because our logical thinking works better that way. "while (...) {....}" and "do {...} while (..) " may be used. But to use them, often we have to think a little, hence they are not as intuitive to use as "if, goto".)

  4. It is very inconvenient to put "#include ...." every time we write a program, hence, it is advisable to "make" a directory, e.g.
    /home/tom/include/
    and put a file "c.h" in it, /home/tom/include/c.h
    content of the file "c.h" :
    
      #include <stdio.h>
      #include <assert.h>
      #include <ctype.h>
      #include <float.h>
      #include <limits.h>
      #include <math.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>
             

    Then in the main program, we put

      #include <c.h>
             

    Because this header file "c.h" is stored under /home/tom/include/, when we compile, we should inform the compiler about it, e.g.

    gcc -Wall psin.c -I/home/tom/include/

Dewey's Classification Method

The main design philosophy of C is that we should write small function modules, and write the main program using many such small modules.

Hence C has only a few dozen keywords (please see Chapter 3, list of all keywords in C). C can be said to be a minimal language, a tool to help assembler programmers, and it itself may be said to be a "standardized assembler".

Because of this, we will find ourselves faced with lots and lots of subroutines. We must devise a way to manage all of these, or else confusion would result.

Dewey's Classification Method (the Dewey Decimal Classification) is used widely by librarians all over the world, e.g.

510.3845 jack
is a "mathematics" book written by "jack".

In essence, the classification is

top-level-classification _ 2nd-level _ 3rd-level _ .....
and numbers as well as alphabets may be used.

We should do the same, and be librarians ourselves, or else we cannot manage all our subroutines.

We need not have too many levels of classification, or else writing the names (a name with 30 characters !) would be a burden.

You should devise your own classification scheme.

I advocate testing each subroutine independently before using them in the main program (or before putting them into a library).

I will use the following for the rest of this book,

Suppose the subroutine filename is "st_puts.c", then the test filename for this subroutine is "st_puts_t.c", with "_t" added at the end. Also, we usually need a "makefile", and this makefile, after it has served its purpose, will be archived using the name "mf_st_puts_t.c", i.e. with "mf_" added at the beginning.


Example of how to make a library, and how to use it

Please re-read "Chapter 2 Introduction to Unix Operating System" if you have difficulties in following the material below.

  1. First we "make" a directory, e.g. "/home/tom/lib/" and we will put all our subroutines in library, say, "libtom.a" or, "/home/tom/lib/libtom.a" .

  2. Suppose we are to put the subroutine "st_puts(char *a)" into the library /home/tom/lib/libtom.a , our first step is to "select -> copy -> paste" the following into a file, "st_puts.c"
         #include <stdio.h>
         #include <assert.h>
    
         int st_puts(char *a)
         {   int i, c;
             for (i=0;;i++)
                 {c=*a;
                  if (c!='\0')
                      {putchar(c);
                       a++;
                       assert(i<=65535);
                      }
                  else
                      {putchar('\n');
                       return 1;
                      }
                 }
          }
             


  3. Question : May we omit the 2 "include" statements at the very top, i.e. may we omit

    #include <stdio.h>
    #include <assert.h>

    Ans : No, because then the compiler would not know what type of subroutines "putchar()" and "assert()" are : what return values they provide, and what types of arguments they need. Do they need an integer argument, a double argument, or a pointer argument?

  4. Next we compile it using,

    gcc -Wall -c st_puts.c

    The "-c" option tells the compiler to "compile" only, and not to "link".

    The compiler will produce an object file, "st_puts.o", with file extension ".o".

    Next, we put this object file into our library, "libtom.a",

    ar -rvs /home/tom/lib/libtom.a st_puts.o

  5. Question : To view the table of content of the library file "/home/tom/lib/libtom.a", what command will you use ?

    Ans :

    ar -t /home/tom/lib/libtom.a

    Question : can we use "relative pathname", say,

    ar -t lib/libtom.a
    instead of "absolute pathname" when we are in directory /home/tom/ ?

    Ans : Yes.

  6. The main program, "st_puts_t.c" should be
         #include <stdio.h>
         #include <assert.h>
    
         extern int st_puts(char *);
     
         int main()
         {  char a[]="abc";
            st_puts(a);
            return 0;
         }
    
           

    Notice that there is a statement " extern int st_puts(char *); ". This tells the compiler that we will be using a subroutine called st_puts(). It needs 1 argument of the type character pointer, and it returns an integer.

    "extern" is a keyword in C, meaning that the subroutine is defined not in this file, but outside.

    This process is called "supplying a prototype" to the compiler. Notice that we may omit variable names in prototyping, e.g.

    extern double sub1(int, int, double, char *, long);
    instead of
    extern double sub1(int i, int j, double x, char *a, long m);

  7. Now the "gcc" command should be, e.g.
    gcc -Wall st_puts_t.c -I/home/tom/include/ \
    -L/home/tom/lib/ -ltom -lm
    Notice that when the command is too long for a line, we may write
          xxxxxx  \
          xxxxxx  \
          xxxxxx  \
          xxxxxx
    
    i.e. put a backslash before we press "Enter", then it will be continued onto the next line and so on.

    Apart from "libtom.a", it is advisable to include "libm.a", the mathematics library, because on many occasions we need the Maths library.

  8. Question : What do the options
    "-I/home/tom/include/" and "-L/home/tom/lib/" mean ? (Please review Chapter two)

  9. Question : Suppose we have not put "st_puts.o" in the library, what should the gcc command be ?

    Ans :

    gcc -Wall st_puts_t.c st_puts.c -I/home/tom/include/ -L/home/tom/lib/ -ltom -lm


  10. Question : Suppose we have modified the file "/home/tom/include/c.h" to be

      #include <stdio.h>
      #include <assert.h>
      #include <ctype.h>
      #include <float.h>
      #include <limits.h>
      #include <math.h>
      #include <stdlib.h>
      #include <string.h>
      #include <time.h>
      extern int st_puts(char *);
    
    i.e. we have added "extern int st_puts(char *); " at the end, and suppose our main program is now
         #include <c.h>
         int main()        /* Note that the extern statement is missing */
         {  char a[]="abc";
            st_puts(a);
            return 0;
         }
    
    and we compile it with
    gcc -Wall st_puts_t.c -I/home/tom/include/ \
    -L/home/tom/lib/ -ltom -lm
    Will it work?

    Ans : Yes.


