Assembly Language Programming Assignments

Princeton University
COS 217: Introduction to Programming Systems

Assignment 4: Assembly Language Programming

Purpose

The purpose of this assignment is to help you learn about computer architecture, assembly language programming, and testing strategies. It also will give you the opportunity to learn more about the GNU/Unix programming tools, especially , , , and for assembly language programs.

The assignment consists of two parts, each of which has subparts. We strongly encourage you to complete Part 1 by the end of the first week of the assignment period.


Rules

Part 2e, as defined below, is the "on your own" part of this assignment. That part is worth 10% of this assignment.


Part 1: A Word Counting Program in Assembly Language

Part 1a: Translate to Assembly Language

The Unix operating system has a command named (word count). In its simplest form, reads characters from the standard input stream until end-of-file, and prints to the standard output stream a count of how many lines, words, and characters it has read. A word is a sequence of characters that is delimited by one or more white space characters.

Consider some examples. In the following, a space is shown as and a newline character as .

If the file named contains these characters:

Learningsissan treasureswhichn accompaniessitsn ownerseverywhere.n --sChinesesproverbn

then the command:

$ wc < proverb

prints this line to the standard output stream:

5 12 82

If the file contains these characters:

Learningsissan treasureswhichn accompaniessitsn ownerseverywhere.n --sssChinesesproverb

(note that the last "line" does not end with a newline character) then the command:

$ wc < proverb2

prints this line to standard output:

4 12 83

The file in the directory contains a C program that implements the subset of the command described above. Translate that program into assembly language, thus creating a file named . It is acceptable to use global (i.e. bss section or data section) variables in , but we encourage you to use local (i.e. stack) variables instead. Your assembly language program should have exactly the same behavior (i.e. should write exactly the same characters to the standard output stream) as the given C program.

Part 1b: Test

Design a test plan for your program. Your test plan should include tests in three categories: (1) boundary testing, (2) statement testing, and (3) stress testing.

Create text files to test your programs. Name each such file such that its prefix is and its suffix is . The command should display the names of all test files, and only those files.

Describe your test plan in your readme file. Your description should have this structure:

boundary tests:

: Description of the characteristics of that file, and how it tests boundary conditions of your program.
: Description of the characteristics of that file, and how it tests boundary conditions of your program.
...

statement tests:

: Description of the characteristics of that file, and which statements of your program it tests. Refer to the statements using the line numbers of the given mywc.c program.
: Description of the characteristics of that file, and which lines of your program it tests. Refer to the statements using the line numbers of the given mywc.c program.
...

Your descriptions of the test files should be of the form "This file contains such-and-such characteristics, and so tests lines such-and-such of the program." Please identify the lines of code tested by line numbers. The line numbers should refer to the given C code.

stress tests:

: Description of the characteristics of that file, and how it stress tests your program.
: Description of the characteristics of that file, and how it stress tests your program.
...

Submit test files that contain no more than approximately 50000 characters. Submitting very large files could exhaust the course's allotted disk space on hats, and so could prohibit other students from submitting any files at all.

Submit test files that contain only printable ASCII characters. Specifically, make sure your computer-generated test files contain only characters having ASCII codes (in hexadecimal) 09, 0A, and 20 through 7E. Submitting test files that contain other characters would make it difficult for your grader to examine those files.

Finally, create a Bash shell script named to automate your test plan. A Bash shell script is simply a text file that contains commands, and that has been made executable via the command, for example, .

The script should build a program from the given C code, build a program from your assembly language code, execute both programs, and compare the output.

It is acceptable for your script to call other scripts that you create. Each such script should have a name that is prefixed with . The command should display the names of all test scripts, and only those files.

Feel free to use the and Bash shell scripts from Assignment 1 as models.


Part 2: Beat the Compiler

Background

Many programming environments contain modules to handle high-precision integer arithmetic. For example, the Java Development Kit (JDK) contains the and classes.

The Fibonacci numbers are used often in computer science. See http://en.wikipedia.org/wiki/Fibonacci_numbers for some background information on Fibonacci numbers. Note in particular that Fibonacci numbers can be very large integers.

This part of the assignment asks you to write a minimal high-precision integer arithmetic module, and use it to compute large Fibonacci numbers.

Part 2a: Add Objects Using C Code

Suppose you must compute Fibonacci number 500000, that is, fib(500000)...

The directory contains a C program that computes Fibonacci numbers. It consists of two modules: a client module and a ADT.

The client consists of the file . The client accepts an integer n as a command-line argument, validates it, and computes and prints fib(n) to stdout as a hexadecimal number. It prints to the standard error stream the amount of CPU time consumed while performing the computation. The client module delegates most of the work to objects.

The ADT performs high precision integer arithmetic. It is a minimal ADT; essentially it implements only an "add" operation. The ADT consists of four files:

  • is the interface. Note that the ADT makes four functions available to clients: , , , and .
  • contains implementations of the , , and functions.
  • contains an implementation of the function.
  • is a "private header file" -- private in the sense that clients never use it. It allows code sharing between the two implementation files, and .

Study the given code. Then build the program with no optimization. Run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Part 2b: Add Objects Using C Code Built with Compiler Optimization

Suppose you decide that the amount of CPU time consumed is unacceptably large. You decide to command the compiler to optimize the code that it produces...

Build the program using optimization. Specifically, specify the option so the preprocessor disables the macro, and the option so the compiler generates optimized code. Run the resulting program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Part 2c: Add Objects Using Assembly Language Code

Suppose you decide that the amount of CPU time consumed still is too large. In an attempt to gain speed, you decide manually to code the function in assembly language...

Manually translate the C code in the file into assembly language, thus creating the file . You need not translate the code in other files into assembly language.

Your assembly language code should store all variables in memory. It should contain definitions of the and functions; the former should call the latter, just as the C code does.

Note that is a parameterized macro, not a function. (See Section 14.3 of the King book for a description of parameterized macros.) So you cannot call from assembly language code. When translating to assembly language, simply pretend that the calls of are not in the C code.

Build the program consisting of the files , , and using the and options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Part 2d: Add Objects Using Optimized Assembly Language Code

Suppose, to your horror, you discover that you have taken a step backward: the CPU time consumed by your assembly language code is approximately the same as that of the non-optimized compiler-generated code. So you decide to optimize your assembly language code...

Manually optimize your assembly language code in , thus creating the file . Specifically, perform these optimizations:

  1. Store local variables (but not parameters) in registers instead of in memory. Remember that assembly language functions must handle the EBX, ESI, and EDI registers with special care.
  2. "Inline" the call of the function. That is, eliminate the function, placing its code within the function.

Build the program consisting of the files , , and using the and options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Can you write assembly language code that is approximately as fast as the optimized code that the compiler generates? That is, can you approximately tie the compiler?

Part 2e: Add Objects Using Highly Optimized Assembly Language Code

Finally, suppose you decide to optimize your assembly language code even further, moving away from a statement-by-statement translation of C code into assembly language...

Further optimize your assembly language code in , thus creating the file . Specifically, perform these optimizations:

  1. Factor as much code as possible out of the loop.
  2. Use the assembly language loop patterns described in Section 3.6.5 of the Bryant & O'Hallaron book instead of the simpler but less efficient patterns described in precepts.
  3. Instead of using a variable (stored in memory or in a register) to keep track of carries during addition, use the "carry" bit in the EFLAGS register. The carry bit is examined and set by the ("add with carry long") instruction.

Feel free to implement any additional optimizations you want. However, your function must be a general-purpose function for adding two given objects; the function cannot be specific to the task of adding two Fibonacci numbers to generate a third Fibonacci number.

This part is difficult; we will not think unkindly of you if you decide not to do it. To do it properly you will need to learn about the instruction, and about which instructions affect and do not affect the C ("carry") flag in the EFLAGS register. Hint: When writing , the problem is this: How can you preserve the value of the C flag between executions of the instruction? One approach to solving that problem is to determine how to save and then restore the value of the EFLAGS register. Another approach is to determine how to express the logic such that only instructions that do not affect the C flag are executed between each execution of the instruction.

Build the program consisting of the files , , and using the and options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Can you beat the compiler?


Logistics

Develop on hats. Use to create source code. Use to debug.

Do not use a C compiler to produce any of your assembly language code. Doing so would be considered an instance of academic dishonesty. Instead produce your assembly language code manually.

We encourage you to develop "flattened" C code (as described in precepts) to bridge the gap between the given "normal" C code and your assembly language code. Using flattened C code as a bridge can eliminate logic errors from your assembly language code, leaving only the possibility of translation errors.

We also encourage you to use your flattened C code as comments in your assembly language code. Such comments can clarify your assembly language code substantially.

You should submit:

  • Your file.
  • Your script, and any other scripts that it calls.
  • The data files that test your program.
  • Your , and files.
  • A file.

Your file should contain:

  • Your name.
  • A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course "Policies" Web page.
  • Your test plan in the format specified above.
  • The times consumed by the programs, as specified above.
  • (Optionally) An indication of how much time you spent doing the assignment.
  • (Optionally) Your assessment of the assignment.
  • (Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.

Submit your work electronically using the commands:

submit 4 mywc.s submit 4 testmywc submit 4 anyBashScriptsCalledByTestmywc submit 4 mywc*.txt submit 4 bigintadd.s submit 4 bigintaddopt.s submit 4 bigintaddoptopt.s submit 4 readme

Please do not submit backup files, that is, files that end with '~'.


Grading

As always, we will grade your work on quality from the user's and programmer's points of view. To encourage good coding practices, we will deduct points if generates warning messages.

Comments in your assembly language programs are especially important. Each assembly language function -- especially the function -- should have a comment that describes what the function does. Local comments within your assembly language functions are equally important. Comments copied from corresponding "flattened" C code are particularly helpful.

Your assembly language code should use directives to avoid "magic numbers." In particular, you should use directives to give meaningful names to:

  • Enumerated constants, for example: .
  • Parameter stack offsets, for example: .
  • Local variable stack offsets, for example: .
  • Structure field offsets, for example: .
  • Stack offsets at which callee-save registers are stored, for example: .
  • (In and ) Registers which are devoted to local variables, for example: .

Testing is a substantial aspect of the assignment. Approximately 10% of the grade will be based upon your test plan as described in your readme file, and as implemented by your test scripts and data files.

Your grade for the "on your own" part will be based upon raw performance (How quickly does your code compute fib(500000)?) and style (Does your code contain optimizations that logically should improve performance?).

Purpose

The purpose of this assignment is to help you learn about computer architecture, assembly language programming, and testing strategies. It also will give you the opportunity to learn more about the GNU/UNIX programming tools, especially bash, xemacs, gcc, and gdb for assembly language programs.

The assignment consists of two parts, each of which has subparts.


Part 1: A Word Counting Program in Assembly Language

Part 1a: Translate to Assembly Language

The UNIX operating system has a command named wc (word count). In its simplest form, wc reads characters from stdin until end-of-file, and prints to stdout a count of how many lines, words, and characters it has read. A word is a sequence of characters that is delimited by one or more whitespace characters.

Consider some examples. In the following, a space is shown as "s" and a newline character as "n".

If the file named proverb contains these characters:

Learningsissan treasureswhichn accompaniessitsn ownerseverywhere.n --sChinesesproverbn

then the command:

$ wc < proverb

prints this line to standard output:

51282

If the file proverb2 contains these characters:

Learningsissan treasureswhichn accompaniessitsn ownerseverywhere.n --sssChinesesproverb

(note that the last "line" does not end with a newline character) then the command:

$ wc < proverb2

prints this line to standard output:

41283

The file mywc.c in the /u/cos217/Assignment5 directory contains a C program that implements the subset of the wc command described above.  Translate that program into assembly language, thus creating a file named mywc.s. It is acceptable to use global (i.e. bss section or data section) variables in mywc.s. But we encourage you to use local (i.e. stack) variables instead. Your assembly language program should have exactly the same behavior (i.e. should write exactly the same characters to stdout) as the given C program.

Part 1b: Test

Design a test plan for your mywc program. Your test plan should include tests in three categories: (1) boundary condition testing, (2) logical path testing, and (3) stress testing.

Create text files to test your programs. Name each such file such that its prefix is "mywc" and its suffix is ".txt". The command "ls mywc*.txt" should display the names of all mywc test files, and only those files.

Describe your mywc test plan in your readme file. Your description should have this structure:

mywc boundary condition tests:

mywcXXX.txt: Description of the characteristics of that file, and how it tests boundary conditions of your mywc program.
mywcYYY.txt: Description of the characteristics of that file, and how it tests boundary conditions of your mywc program.
...

mywc logical path tests:

mywcXXX.txt: Description of the characteristics of that file, and which logical path(s) of your mywc program it tests.
mywcYYY.txt:  Description of the characteristics of that file, and which logical path(s) of your mywc program it tests.
...

Collectively your logical path test files should cause the computer to execute every instruction of your mywc program. Your descriptions of the test files should be of the form "This file contains such-and-such characteristics, and so tests lines such-and-such of the program." You may identify the lines of code tested either by description (e.g. "the second 'if' statement") or by line number (e.g. lines 20 though 25). Line numbers may refer to either your assembly language code or the corresponding C code.

mywc stress tests:

mywcXXX.txt: Description of the characteristics of that file, and how it stress tests your mywc program.
mywcYYY.txt: Description of the characteristics of that file, and how it stress tests your mywc program.
...

Do not submit test files that contain more than approximately 1000 reasonably sized lines of text. Submitting very large files might exhaust the course's allotted disk space on hats.

Do not submit test files that contain non-printable characters.

Finally, create a bash shell script named testmywc to automate your mywc test plan. A bash shell script is simply a text file that contains commands, and that has been made executable via the chmod command, for example, "chmod 700 testmywc".

The testmywc script should build a mywc program from the given C code, build a mywc program from your assembly language code, execute both programs, and compare the output.

It is acceptable for your testmywc script to call other scripts that you create. Each such script should have a name that is prefixed with "testmywc". The command "ls testmywc*" should display the names of all mywc test scripts, and only those scripts. You may find the testdecomment and testdecommentdiff scripts from Assignment 1 to be useful models.


Part 2: Beat the Compiler

The Fibonacci numbers are used often in computer science. See http://en.wikipedia.org/wiki/Fibonacci_numbers for some background information on Fibonacci numbers.

Fibonacci numbers can be very large. This part of the assignment asks you to write assembly language code to compute large Fibonacci numbers.

Part 2a: Compute Fibonacci Numbers Using C Code

Suppose you must compute Fibonacci number 500000, that is, fib(500000)...

The /u/cos217/Assignment5 directory contains a C program that computes Fibonacci numbers. It consists of two modules: a client module and a BigInt ADT.

The client consists of the file fib.c. The client accepts an integer n as a command-line argument, validates it, and computes and prints fib(n) to stdout as a hexadecimal number. It prints to stderr the amount of CPU time consumed while performing the computation. The client module delegates most of the work to BigInt objects.

The BigInt ADT performs high precision arithmetic. It is a minimal ADT; essentially it implements only an "add" operation. (See the Java "BigInteger" and "BigNumber" classes for more realistic high precision arithmetic classes.) The BigInt ADT consists of four files:

  • bigint.h is the interface. Note that the ADT makes four functions available to clients: BigInt_new(), BigInt_free(), BigInt_add(), and BigInt_writeHex().
  • bigint.c contains implementations of the BigInt_new(), BigInt_free(), and BigInt_writeHex() functions.
  • bigintadd.c contains an implementation of the BigInt_add() function.
  • bigintprivate.h is a "private header file" -- private in the sense that clients never use it. It allows code sharing between the two implementation files, bigint.c and bigintadd.c.

Study the given code. Then build the program with no optimization. Run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Part 2b: Compute Fibonacci Numbers Using C Code Built with Compiler Optimization

Suppose you decide that the amount of CPU time consumed is unacceptably large. You decide to command the compiler to optimize the code that it produces...

Build the program using optimization. Specifically, specify the -DNDEBUG option so the preprocessor disables the assert() macro, and the -O3 option so the compiler generates optimized code. Run the resulting program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Part 2c: Compute Fibonacci Numbers Using Assembly Language Code

Suppose you decide that the amount of CPU time consumed still is too large. In an attempt to gain speed, you decide to manually code the BigInt_add() function in assembly language...

Manually translate the C code in the bigintadd.c file into assembly language, thus creating the file bigintadd.s. You need not translate the code in other files into assembly language.

Your assembly language code should store all variables in memory. It should contain definitions of the BigInt_add() and BigInt_larger() functions; the former should call the latter, just as the C code does.

Note that assert() is a parameterized macro, not a function. (See Section 14.3 of the King book for a description of parameterized macros.) So you cannot call assert() from assembly language code. When translating bigintadd.c to assembly language, simply pretend that the calls of assert() are not in the C code.

Build the program consisting of the files fib.c, bigint.c, and bigintadd.s using the -DNDEBUG and -O3 options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Part 2d: Compute Fibonacci Numbers Using Optimized Assembly Language Code

Suppose, to your horror, you discover that you have taken a step backwards: the CPU time consumed by your assembly language code is about the same as that of the non-optimized compiler-generated code. So you decide to optimize your assembly language code...

Manually optimize your assembly language code in bigintadd.s, thus creating the file bigintaddopt.s. Specifically, perform these optimizations:

  1. Store variables in registers instead of in memory. Remember that assembly language functions must handle the EBX, ESI, and EDI registers with special care.
  2. "Inline" the call of the BigInt_larger() function. That is, eliminate the BigInt_larger() function, placing its code within the BigInt_add() function.

Build the program consisting of the files fib.c, bigint.c, and bigintaddopt.s using the -DNDEBUG and -O3 options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Can you write assembly language code that is as fast as the code that the compiler generates? That is, can you tie the compiler?

Part 2e: Extra Credit (up to 10 points)

Finally, suppose you decide to optimize your assembly language code even further, moving away from a statement-by-statement translation of C code into assembly language...

Further optimize your assembly language code in bigintaddopt.s, thus creating the file bigintaddoptopt.s. Specifically, perform these optimizations:

  1. Factor as much code as possible out of the loop.
  2. Instead of using a "uiCarry" variable (stored in memory or in a register) to keep track of carries during addition, use the "carry" bit in the EFLAGS register. The carry bit is set by the "adcl" ("add with carry long") instruction.

The extra credit optimizations are difficult. To do them you will need to learn about the "adcl" instruction, and about which instructions affect (and do not affect) the "carry" bit in the EFLAGS register. We encourage you to do the extra credit optimizations, but will think no less of you if you decide not to.

Build the program consisting of the files fib.c, bigint.c, and bigintaddoptopt.s using the -DNDEBUG and -O3 options. Run the program to compute fib(x) for various values of x, and make sure it writes the same output as the program built from C code does. Finally, run the program to compute fib(500000). In your readme file note the amount of CPU time consumed.

Can you beat the compiler?


Logistics

Develop on hats. Use xemacs to create source code. Use gdb to debug.

Do not use a C compiler to produce any of your assembly language code. Doing so would be considered an instance of academic dishonesty. Instead produce your assembly language code manually.

We encourage you to develop "flattened" C code (as described in precepts) to bridge the gap between the given "normal" C code and your assembly language code. Using flattened C code as a bridge can eliminate logic errors from your assembly language code, leaving only the possibility of translation errors.

We also encourage you to use your flattened C code as comments in your assembly language code. Such comments can clarify your assembly language code substantially.

You should submit:

  • Your mywc.s file.
  • Your testmywc script, and any other scripts that it calls.
  • The data files that test your mywc program.
  • Your bigintadd.s and bigintaddopt.s files.
  • If you did the extra credit option, your bigintaddoptopt.s file.
  • A readme file.

Your readme file should contain:

  • Your name.
  • A description of whatever help (if any) you received from others while doing the assignment, and the names of any individuals with whom you collaborated, as prescribed by the course Policies web page.
  • Your mywc test plan in the format specified above.
  • The times consumed by the fib programs, as specified above.
  • (Optionally) An indication of how much time you spent doing the assignment.
  • (Optionally) Your assessment of the assignment.
  • (Optionally) Any information that will help us to grade your work in the most favorable light. In particular you should describe all known bugs.

Submit your work electronically using the commands:

submit 5 mywc.s submit 5 testmywc submit 5 anyBashScriptsCalledByTestmywc submit 5 mywc*.txt submit 5 bigintadd.s submit 5 bigintaddopt.s submit 5 bigintaddoptopt.s (if you did the extra credit) submit 5 readme

Please do not submit xemacs backup files, that is, files that end with '~'.


Grading

As always, we will grade your work on quality from the user's and programmer's points of view. To encourage good coding practices, we will take off points based on warning messages during compilation and assembly.

Comments in your assembly language programs are especially important. Each assembly language function -- especially the main() function -- should have a comment that describes what the function does. Local comments within your assembly language functions are equally important. Comments copied from corresponding "flattened" C code are particularly helpful.

Testing is a substantial aspect of the assignment. Approximately 10% of the grade will be based upon your mywc test plan as described in your readme file, and as implemented by your test scripts and data files.

Categories: 1

0 Replies to “Assembly Language Programming Assignments”

Leave a comment

L'indirizzo email non verrà pubblicato. I campi obbligatori sono contrassegnati *