Cool Gate-Level Logic Simulation Tool

November 14th, 2009 admin

Dr. Mark Hill of University of Wisconsin Madison pointed out the existence of this really cool learning aid to his class Logic Design class in October 2009. He also provides some guidelines about how to use the tool. Give it a shot if you like to play around with AND, OR, NOT gates and build logic. You can build “memory” structures too, something that can remember state. The following text is from Dr. Hill’s email. Enjoy.

Here is a cool tool for playing with and visualizing simple logic gates:

http://joshblog.net/projects/logic-gate-simulator/Logicly.html

Build your own circuit:

(a) Drag over a few gates in the middle.
(b) Drag few switches to the left side as inputs
(c) Drag one or more light bulbs to right as output(s)
(d) Connect items together with wires by clicking on needed endpoints.

Operate it:
(e) Click on switches to toggle inputs between one (blue) and zero (white).
(f) Watch the wires change state — one (blue) and zero (white) and light bulbs go on (blue) or off (off).

Posted in Tidbits, Tutorials | No Comments »

The Mathematics of Mortgage, Overpayment and Refinancing Decisions

October 7th, 2009 admin

With mortgage interest rates at historically low values refinancing home loans is an option currently being investigated by many here in the US. I, too, considered the same issue recently and discovered that this is not an easy decision to make. I developed a spreadsheet to figure out if this was a good idea. You can download this spreadsheet by clicking on this link (Microsoft Excel 2003). In general, the spreadsheet was also intended to show how loan repayment terms are set, how banks make money on loans, when overpaying monthly payments makes sense etc. Feel free to use the spreadsheet and improve upon it or tailor it to your situation. The rest of this article is a tutorial on how to make decisions about mortgages, how mortgages work in general, whether overpayment of the monthly payment makes sense, and what to consider when refinancing. The focus will be on the mathematical aspects of the decision making.

Taking an Interest in Loan Mathematics

Let us first try to understand, in simple terms, the philosophy of any loan process. In particular, I’ll focus on the home loan process. You intend to purchase a house. Why purchase, and not continue to rent? Well, that is an interesting question in its own right. But, to not get distracted, let us say you got tired of the rent going up each year, or moving every few years, or actually figured that it was economically better to buy rather than rent. So, you decide to buy a house. You need money. You go to the bank (a mortgage lender). Say you want $200K ($200 thousand). The bank gives you the money at a certain interest rate, say, 5%. What does that really mean? Here in the US, the norm is to calculate the remaining balance every month. The 5% is actually 12*0.4166%, where 0.4166%, or 0.004166, is the monthly interest rate. That is, after the first month, the outstanding balance is $200K + $200K*0.004166. In other words, because the bank did you a favor by giving you $200K which you did not have, it wants $200K*0.004166, which is $833.33. As a quick aside, notice that because the interest is calculated monthly, the annual interest rate is, in reality, greater than the 5% we started with. The real annual interest rate would be 1.004166^12=0.0511, or in other words 5.11%. Nevertheless, the common practice is to quote this as 5% base interest rate, and that is fine, as long as we know what it means.

Now, continuing with our example, in the first month, the bank wants you to pay $833.333 as interest accrued over that month. Say you paid exactly $833.33. The outstanding balance at the beginning of the second month would then be exactly 200K again. And at the end of that second month, the interest would be $833.33 again. Say you pay the bank $833.33 again. The outstanding balance at the beginning of the third month will again be $200K. This pattern could go on endlessly. You may argue that this looks like renting. Every month you pay the rent. You don’t see any of that money. But with buying there is a fundamental difference. After 10 years of doing the above, that is, paying $833.33 each month, you decide to sell the house. The house itself, typically, appreciates in value. Say the value of the house is now $300K. You sell and make a $100K profit. You paid 10*12*$833.33 over the 10 years, which, coincidentally, comes out to exactly 100K. What that means is you basically lived in a house for 10 years for free (of course you did pay property taxes, painted the house a couple of times, bought a lawn mower, replaced light bulbs, and took care of the house in general). But overall, it sounds like a pretty sweet deal.

One word we all glossed over in this discussion is “typically”. Home prices “typically” appreciate. The bank does not gloss over that word. If the value of the house drops, say 10 years later, the value of the house drops to 150K. You have paid your “rent” for 10 years, and are ready to sell. The bank wants its 200K back, since you never paid any principal all these years. The sale, however, would only fetch you 150K. The bank has the title (ownership document) to the house. It will not let you sell. It says give us 50K first, then sell for 150K, and gives us that 150K as well. You do not have 50K to give to the bank. The loan is foreclosed - the bank keeps the title to the house, but the bank does not like this situation. The bank now owns the house. But the house is worth only 150K. The bank does not want to be in the business of selling a house, especially one that won’t bring them their original 200K back.

To prevent this scenario, the bank employs two interesting tactics. Firstly, it does not let you pay only the interest of $833.33 each month. It requires you to pay off some of that principal on top of the interest. Secondly, the amount of principal you pay atop the interest is calculated such that the loan is guaranteed to be paid off in a certain “term”. Further, to keep the payment terms simple for the customer, the total payment each month remains unchanged. It is important to recognize that calculation of interest depends only on the interest rate. The calculation of the monthly payment which includes both the interest and some piece of the principal requires the notion of the “term”. The payment has to at least be the interest due that month. It is a bit more than that each month because of the principal paid. (In reality it ends up being even more because you pay a part of the annual property taxes, hazard insurance etc. each month – but we can ignore that for this discussion). Paying a bit of principal each month causes the outstanding balance reduce each month; this causes the interest payment to reduce a bit each month. This allows you to pay off even more principal each month, and that cascading effect finally ends exactly when the term runs out. Throughout this period, as mentioned earlier, the actual monthly payment does not change. The reduction in interest is compensated for by the increase in principal payment, which in turn reduces the outstanding balance and causes the next month’s interest payment to reduce even further. This constant monthly payment (interest + principal) is carefully calculated to achieve this effect. “Term” is the number of months the loan is supposed to be fully paid off by. The shorter the term, the better the rates, in general, because to pay off a loan faster (shorter term), you have to pay greater amounts each month. So there needs to be an incentive for you to pay more money each month to the bank. And that incentive is the lower rate. Otherwise, wouldn’t you rather go with the longer term, pay less to the bank each month and invest that leftover in the stock market?

By forcing you to pay a bit of principal each month, the bank is earning less interest each month. But the good news is that, if after 10 years, you decide to sell the house and the value of the house is only 150K instead of the 200K you bought it for, the bank risks less. You have already paid off about 40K. So the loss for the bank is only 10K instead of 50K, if it had allowed you to only pay the interest each month. So in other words, the bank wants you to pay principal each month not to help you reduce your interest payments, but rather to help it stave off any chance of losing money on the house if prices fall.

Black Magic - Calculating the Monthly Payment

When you talk to a mortgage banker on the phone, you will notice that they like to quickly tell you that your monthly payment would be some x amount. They use the phrase, “run the numbers”, with some pride. It is interesting and empowering to understand how the monthly payment is actually calculated.

We have all the information we need. At the beginning, we have a 200K loan. Let us use L to indicate this “Loan Amount”. Say, the monthly rate, which is 0.004166 in our example, is represented by c. Say, n represents the term, n months. Say, P represents the monthly payment. We want to determine P ourselves, instead of depending on our loan officer to tell us that information.

After the first month, the outstanding balance is:
L + L*c - P
= L(1+c) - P
This is because the loan amount L increases by the monthly interest amount, L*c, but then we make the payment of P. This is the outstanding balance for the second month.
At the end of the second month, the outstanding balance is:
{L(1+c) - P}(1+c) - P
= L(1+c)^2 - P(1+c) - P
= L(1+c)^2 - P{(1+c)+1}
At the end of the third month, the outstanding balance is
= {L(1+c)^2 - {P(1+c)+P}} (1+c) - P
= L(1+c)^3 - P {(1+c)^2 + 1+c) + 1}
If you are still with me, you may start seeing a pattern emerge. After n months, the outstanding balance will be:
= L(1+c)^n - P {(1+c)^(n-1) + (1+c)^(n-2) + … + (1-c)^2 + (1+c) + 1}
Which, can be rewritten as
= L(1+c)^n - P {1 + (1+c)^2 + (1+c)^3 + … (1+c)^(n-1)}
Now. The punch line. After n months, we *know* that the outstanding balance should be 0. So
0 = L(1+c)^n - P {1 + (1+c)^2 + (1+c)^3 + … (1+c)^(n-1)}
P {1 + (1+c)^2 + (1+c)^3 + … (1+c)^(n-1)} = L(1+c)^n
P = L(1+c)^n/{1 + (1+c)^2 + (1+c)^3 + … (1+c)^(n-1)}
There you go. That is P, your payment each month. Phew! Done? Well, almost. The denominator in the above calculation is not Excel-friendly. Remember, you want this to go into a spreadsheet that can help you with decision making. The number of terms depends on n. Not good. Let’s try to find a closed form solution for the denominator. Thankfully, it is not hard. Notice that the denominator is of the form:
1 + a + a^2 + a^3 + … + a^(n-1)
where I replaced 1+c with a. Let us call the above sum X.
X = 1 + a + a^2 + a^3 + … + a^(n-1)
Adding a^n to both sides (as an aside, this kind of intuition is the reason Kavita hates math)
X + a^n = 1 + a + a^2 + a^3 + … + a^(n-1) + a^n
Shamelessly using some more of that darned intuition, we extract out a common factor, a, from the last n terms to get to:
X + a^n = 1 + a {1 + a + a^2 + … + a^(n-1)}
But notice that the stuff inside the {} is precisely what we defined X to be. So:
X + a^n = 1 + a*X
X + a*X = 1 + a^n
X * (1 + a) = 1 + a^n
Therefore,
X = (1-a^n)/(1-a)
Replacing a with (1+c),
X = (1-(1+c)^n)/(1-(1+c))
X = (1-(1+c)^n)/(-c)
X = ((1+c)^n - 1)/c
Finally, substituting this into the equation for P:
P = L.c.(1+c)^n/{(1+c)^n - 1}
Now we are seriously done with this calculation.

Let us try to use L=200K, c=0.004166 and n=360 (a 30-year term, which is quite common in the US), and calculate P, your monthly payment. P comes out to $1073.64. The interest component is $833.33, and the principal is $1073.64 - $833.33 = $240.31. Because you pay off a tiny bit of the 200K principal, the outstanding balance at the beginning of the second month is $200000 - $240.31 = $199759.69. The interest for the second month is therefore going to be lesser than $833.33. In fact, it is $832.33. This $1 we pay less in interest goes towards the principal, which increases from $240.31 in the first month to $241.31 in the second. Looking a few months into this process the interest payments are $833.33, $832.33, $831.33, $830.32, $829.30, etc., and principal payments are $240.31, $241.31, $242.32 etc.

Fig1Fig2

Click on the thumbnails above to see the monthly and cumulative payment schedules. The first figure shows how much interest, principal and total payment needs to be made each month. The second figure translates that to a cumulative amount, that is, at any given point in time it tells us how much interest, principal and total payment you would have made. It is interesting to note from the first figure that in the first few years the bank makes most of the money it expects to make on the house (the interest tails off during the later years). The second figure shows that by the end of the loan term, you’d pay about 200K in interest!

Does Overpayment Make Sense?

At this point it is important to understand that by paying off the $240.31, $241.31, $242.32 etc. principal each month the benefit you are getting is in terms of reducing the interest you pay each month. By actually being vested in the house, that is, by owning that piece of the house, you do not get any direct benefit; when the house sells, its value will not depend on how much of the house you actually own. Think of it like this - the principal payments you make are investments where the rate of return is determined by the reduction in the interest payments.  Let us take an example. Say, somehow, you convince the bank to allow you to pay only the interest, $833.33, each month. You take the difference between your bank-determined payment of $1073.64 and your negotiated payment of $833.33 and invest it ($1073.64 - $833.33 = 240.31) in the stock market at 10% annual rate of return. Either way, after 10 years, we’d have invested $240.31*12*10 = $28,837.2. Since the investment is accruing a rate of return each month, we need to carefully calculate how much profit we make (I use an Excel spreadsheet to do this, however, we could use a closed form expression similar to the one we developed above). At a 10% rate of return, we make about $21,000. If we put this same $241.31 into the principal payment each month, after 10 years, our profit (the interest savings compared to the case where we do not pay any principal payment each month) is $8,478. Of course, the actual profit by investing is reduced by the tax you need to pay on that profit. Regardless, it is still a sweet deal to invest the money in the stock market, provided you can guarantee the 10% return on investment. Even if we assume a safe 6% rate of return (after taxes and everything), we stand to make $10,900 in the stock market vs. the $8,478 we “make” by putting it into the house. In any case, this is a moot point, since the bank will not allow you to make interest payments only. What this discussion is intended to drive home is that it may not make sense to overpay above the monthly payment of $1073.64, unless you intend to stay at the house for a shorter term. If you stay for a shorter term in the house, then the stock market rate of return may be too risky, whereas the paying into the house guarantees a certain rate of return.

Fig3Fig4

The figures above show monthly and cumulative payments for a 15 year loan term - that is a loan for which the monthly payment has been calculated such that is supposed to be paid off in full in 15 years. Typically, 15 year loans have a slightly better rate than a 30 year loan, to give you the incentive to give up more of your cash each month in payment. However, since I am continuing to use a 5% interest rate to plot these curves, these really indicate how your overall payment time line changes if you overpay each month. The overpayment amount is basically the difference between the monthly payment shown in this figure and the minimum monthly payment shown in the previous section. As you can see here, even in the first month you pay as much towards principal as interest, and secondly, by the end of the loan term, you pay only about $80K in interest. The advantage of this scheme is that you are required to only pay in accordance with the 30-year term, but you may choose to overpay if you wish to reduce your interest payments. That way, if you occasionally miss your overpayment target, that is fine as long as you pay the minimum payment for that month. That said, like we discussed above, it may still make sense to not overpay if you can invest that money instead.

Does Refinancing Make Sense?

Now that we have understood some of the nuances of the loan process, let us consider how to make a refinancing decision. Refinancing is the process of getting a new loan in order to pay off an existing loan. If this were a free process, that is, there were no cost of refinancing, the decision would have been very simple. If the new interest rate is better than the old interest rate refinancing would make sense. However, there is, typically, a cost involved. The question then changes to how long do you need to stay in the same house after refinancing to recoup the cost of refinancing. Let us take an example. Say you currently have a 5% loan with $200K outstanding, and a different lender offers a 4% loan, with a $2000 closing cost. The current monthly payment is $1073.64. The new monthly payment is $954.83. Since the interest rate is lower, you’ll likely be paying less interest each month with the new loan. So over time the cumulative interest you pay the bank may be lesser with the new loan. For this example, after 1 year the total interest paid with the current loan is about $9900, whereas with the new loan it is $7900. This is about $2000 in savings in 1 year just from the interest rate reduction. Since the interest you pay is like the fees the bank charges for its services, you have found a low-fee option. So in 1 year you have overcome the $2000 cost of refinancing. From year 2 onward you stand to gain by doing this refinance. (Note: I am ignoring that interest is tax-free money, that is, you get the taxes you paid on the interest in your next year’s tax returns. The savings from interest reduction are therefore about 20 to 25% lesser than the savings I am quoting here and in the spreadsheet. It is easy to fix that though, if you choose to. Instead of saving $2000, you’d have actually only saved $1500 if you fall in the 25% tax rate bracket.)

Now, let us see what happens to the rest of the money you are paying each month, the principal. With the current loan the cumulative principal payment after 1 year is
$2950. With the new loan the cumulative principal paid in 1 year is $3522. That is, you own more of the house. But this is not important by itself. Yes, you own more of the house, but the net is you converted some cash into a bit of house. If you had not owned any of the house you’d have been left with cash which you could have invested and actually grown it. The house grows or falls equally in value regardless of whether you are invested in it or not.

But there is one more component to this equation other than the interest and principal. The overall monthly payment has reduced from $1073.64 to $954.83. That is a freeing up of $118.81 each month to be invested as you choose. Even if this was invested conservatively in a 6% rate of return investment, you end up with $1472 at the end of the first year. This is money that would not have been available at all with the current loan. So in fact, at the end of year 1, you have save $2000 + $1472, the former coming from the interest savings and the latter from investing the cash freed up. This means, the $2000 cost of refinancing will actually be made up even sooner than 1 year. Given the above 6% assumption it is more like 7 months. If you plan to live in this house for 7 months or more, go for the refinancing

Fig5

The figure above shows the time to recoup the cost of refinancing, considering only the interest savings and also considering the case where the overall reduction in monthly payment can be invested at 6%.

Acknowledgements

My understanding of the issues involved in refinancing, in particular, and mortgages, in general, is based upon my going through this decision-making process recently. Much of this understanding was developed during discussions with my friends Gordie and Srini. If you find flaws in my understanding please let me know. Some online resources that helped me were http://www.mtgprofessor.com/formulas.htm , http://en.wikipedia.org/wiki/Refinancing and http://en.wikipedia.org/wiki/Mortgage.

Posted in Experiences, Information, Tutorials | 2 Comments »

Notes on Memory Consistency and Cache Coherence

June 27th, 2008 admin

Here are some of my notes on the topic of memory consistency and cache coherence, and how uniprocessor and multiprocessor cores have to be built to support the consistency models. Most of this was written up when I was preparing for my Qualifying Exam at NC State University last semester. This is a relatively complicated topic to understand well, and there might still be several mistakes in how I understood the ideas. Also, this might only make sense, and be interesting, to people familiar with these areas of computer architecture. Here’s the pdf file: Notes on Memory Consistency and Cache Coherence

Posted in Information, Tutorials | No Comments »

Setup and Hold Violations in Digital Systems

May 19th, 2008 admin

I wrote this up when trying to prepare for my PhD Qualifying Examination this past semester (Spring 2008). It is a pdf file. You can read it here.

Posted in Tutorials | No Comments »

Finally, one Contact List to rule them all

July 29th, 2007 admin

I have a Yahoo email account. I have a Google email account. I have a few more that I do not use much. Each email application or service provider typically has an Address Book or Contacts List, which we can use to list the names, email addresses and other information about people. For a while, I have been thinking about getting the contact lists organized. There were several layers to the word “organized” and I was apprehensive about starting to peel those layers. The first layer of the problem was figuring out which mail server I wanted to stick to. The next thought was to create a superset of all the contact lists, currently scattered across applications and mail servers, at one place. The next issue was to find a way to update the contact list quickly, rather than clicking around a web-based Address Book or Contact List applications such as the one provided by Yahoo and Google Mail. Then there was the hope that I could keep a copy of the contact list locally on my personal computer, in case, at some point, I did not have internet access to get to the Yahoo Address Book.

This list of requirements seemed formidable in itself, yet, what made me skeptical of a final solution, was one last requirement I had. I had maintained a list of birthdays and anniversaries in a text file separate from the contact lists in the mail servers I mentioned. It was a simple text file and a simple Perl script I wrote could go through this text file everyday and send me an email if it found any upcoming event. I wanted to retain the ability to do such scripting and not have to maintain a separate text file version of the contact list, just for the purposes of being able to run such a reminder script.

After collecting and formulating these thoughts over a long time, I finally spent a few minutes last week looking for a solution to the multi-layered problem. Searching on the internet revealed that there WAS a relatively easy solution that fixes ALL the above problems, including giving me the ability to run a simple script to extract birthday and anniversary information! Here is the solution. Yahoo and Google Address Books allow the existing contacts-list to be exported as a CSV (Comma Separated Variable) file, or a CSV file to be imported to populate the Contact List or Address Book application. A CSV file, as the name suggests, is just a regular text file, with many fields belonging to a record typed across a single line, with the comma symbol (”,”) separating the fields. A new record starts in a new line. The file can be opened with a regular text-editor such as Notepad, Wordpad or Textpad in Windows and vi, pico and emacs in Unix. The file may also be opened using Microsoft Excel spread sheet and the fields show up in separate column and the lines show up in separate rows. This solves the problem of easily modifying the contact list in bulk and storing the contact list as a local file on your personal computer. The CSV file is compatible across Yahoo and Google, and probably across many other applications like Microsoft Outlook and Orkut (web-based networking application). The CSV file can then be imported into Yahoo Mail, Google Mail or other such applications. Problem solved. Single contact list. Storable and updateable locally. Uploadable to multiple web-based servers.

The CSV file based common contact list also allowed me to enter the anniversary and birthday in appropriate columns. I wrote a script called contact.py in the Python scripting language to read the contact list file as a simple text file (in the CSV format) and search for upcoming events. This allowed me to get rid of the earlier text file I had my Perl script read. The CSV file, I called it contactlist.csv, was truly the one file I needed to retain for all my address-book related needs. Whenever I want to add a new contact or update information about an existing contact, I update the local copy of the contact list, contactlist.csv, and then import it into Yahoo Mail and Google Mail to keep them up to date. I have noticed that before I import the latest contactlist.csv file into Yahoo or Google, I need to delete all the existing contacts from Yahoo and Google, respectively. Once, we have an empty contact list on the mail server, the importing of contactlist.csv recreates the complete list. Not starting with an empty contact list on the mail servers, creates duplicates, probably because the “import” function is not smart enough to recognize duplicates.

Here is an example of what a few rows from the CSV file contactlist.csv looks like. It gives us idea of what the fields are. The example also shows that all the fields in a CSV file need not be filled. A field can be left empty if we do not know the information relating to that field for a given contact. Also, I use xxxx for the year field of a date (such as a birthday or an anniversary date), in case I do not know the year. This is OK because the script that parses this CSV file, called contact.py, and which is shown later, does not use the year field to determine if an anniversary is approaching. It only uses the day and month parts of the field.

First,Middle,Last,Nickname,Email,Messenger ID,Home,Work,Pager,Fax,Mobile,Other,Yahoo! Phone,Alternate Email 1,Alternate Email 2,Personal Website,Business Website,Title,Company,Work Address,Work City,Work State,Work ZIP,Work Country,Home Address,Home City,Home State,Home ZIP,Home Country,Birthday,Anniversary,Custom 1,Custom 2,Custom 3,Custom 4,Comments,Messenger ID1,Messenger ID2,Messenger ID3,Messenger ID4,Messenger ID5,Messenger ID6,Messenger ID7,Messenger ID8,Messenger ID9,Skype ID,IRC ID,ICQ ID,Google ID,MSN ID,AIM ID,QQ ID
Shahrukh,Mayur,Khan,srk,srk@bollywood.com,,,,,,,,,,,,,,,,,,,,,,,,,1/2/xxxx,,,,,,,,,,,,,,,,,,,,,,

Here is the contact.py Python script which then works on the CSV file called contactlist.csv with contents as shown above, and sends email to your email account. You might have to appropriately fix some of the fields in the script to get it to work. I present it here just as a hint.

import csv, datetime, re
from string import split
filename = “contactlist.csv”
warnZone = 8 #number of days before which email reminder should be sent

daysInMonth = [’31′,’28′,’31′,’30′,’31′,’30′,’31′,’31′,’30′,’31′,’30′,’31′];
def dayOfYear(month, day):
#print “%s %s” %(month, day)
doy = 0
for n in range(0,int(month)):
if(n == int(month)-1):
doy = doy+int(day)
return doy
else:
doy = doy+int(daysInMonth[n])

now = datetime.datetime.now()
today_month = now.strftime(”%m”)
today_day = now.strftime(”%d”)
today_doy = dayOfYear(today_month, today_day)
#print “%s %s %s” %(today_month, today_day, today_doy)

reader = csv.reader(open(filename))
content = “”
for row in reader:
firstname = (row[0])
middlename = (row[1])
lastname = (row[2])
anniversary = (row[30])
birthday = (row[29])
anni_split = anniversary.split(’/')
bday_split = birthday.split(’/')
#print “len anni_split %s” %(len(anni_split))
#print “len bday_split %s” %(len(bday_split))
if(len(anni_split)>1): #keeps “a/b/c, gets rid of “A”, as in 1st row
anni_month = anni_split[0]
anni_day = anni_split[1]
anni_doy = dayOfYear(anni_month, anni_day)
diff = anni_doy - today_doy
if((anni_doy >= today_doy and anni_doy <= today_doy + warnZone) or (anni_doy <= today_doy + warnZone - 365)):
# print “%s %s %s’s anniversary is on %s/%s” %(firstname, middlename, lastname, anni_month, anni_day)
content += firstname+” “+middlename+” “+lastname+”\’s anniversary is on “+anni_month+” “+anni_day+”\n”
if(len(bday_split)>1): #keeps “a/b/c, gets rid of “A”, as in 1st row
bday_month = bday_split[0]
bday_day = bday_split[1]
bday_doy = dayOfYear(bday_month, bday_day)
diff = bday_doy - today_doy
if((bday_doy >= today_doy and bday_doy <= today_doy + warnZone) or (bday_doy <= today_doy + warnZone - 365)):
# print “%s %s %s’s birthday is on %s/%s” %(firstname, middlename, lastname, anni_month, anni_day)
content += firstname+” “+middlename+” “+lastname+”\’s birthday is on “+bday_month+” “+bday_day+”\n”

#print “%s” %(content)

import smtplib
smtpserver = ‘mailserver.department.company.com’
AUTHREQUIRED = 0
RECIPIENTS = [’gol345die@gmail.com’]
SENDER = [’con789vey@po.doc.com’]
session = smtplib.SMTP(smtpserver)
smtpresult = session.sendmail(SENDER, RECIPIENTS, content)
if smtpresult:
errstr = “”
for recip in smtpresult.keys():
errstr = “”"Could not deliver mail to : %s Server said: %s %s %s”"” % (recip, smtpresult[recip][0], smtpresult[recip][1], errstr)
raise smtplib.SMTPException, errstr

Posted in Information, Tutorials | 2 Comments »

Cycles per Instruction and its components

April 30th, 2006 admin

Cycles per Instruction is one way to evaluate the microarchitecture’s effect on a processor’s performance. Performance also depends on what constitutes a cycle (frequency of the machine) and what constitutes an instruction (the architecture of the machine). Interestingly, the ratio of these two, i.e. Cycles per Instruction, tells us about something other than those two factors. CPI tells us how many cycles, at the given frequency, the machine takes to chomp down instructions of the given architecture. In other words CPI tells us about the microarchitectural robustness of a design. It tells us, given a certain cycle period (frequency) and a certain group of instructions (architecture), how many cycles does it take to execute an “average” instruction. The “average” instruction depends on the workload (the program) being run through the processor. And therefore, when quoting CPI’s the workload to which that CPI applies is also quoted. The smaller the CPI the better performing a computer system is, all other factors such as the frequency and architecture remaining the same. Of course, other factors such as area, complexity and power must be considered, in addition to performance and the definition of “better” changes accordingly.

CPI is affected by several factors. Architects and designers often need to know where to start when trying to improve performance by making microarchtectural changes. The overall CPI has to be broken into components and the component that is contributing the largest is usually attacked first, since improvement in that design component could potentially reduce the CPI by the largest amount. I will come back to this point again with an example, shortly. For this write-up I will break the CPI into two main components. The “core” component and the “memory-nest” component. The core component deals with all the microarchitectural decision within the processing unit, which in modern microprocessors includes several pipelines (superscalar machines), each with several stages and associated logic - instruction fetch, decode, issue, register access, execute, write-back etc. The memory-nest component deals with getting appropriate instructions and data from the memory. Several levels of caching is provided to take advantage of certain properties seen in programs. These properties are spatial-locality (if a certain memory location is used, something nearby will be used soon) and temporal locality (if a certain memory location is used, it will likely be used again). To keep the data close at hand, and to avoid the potentially hundreds of processor cycles worth of time to go to the memory every time a new instruction or data is needed, instructions and data are stored in caches. There are some great resources on the web and in books which talk about caches in more detail. For this write-up it suffices to say that caches usually are arranged in a hierarchy. Level 1 is the smallest cache but is the fastest to access and is called the L1Cache. Level 2 cache is bigger than the Level 1 cache but because it is bigger and is usually only accessed after it it known that the instruction or data the core was looking for was not available in the L1Cache, it is slower. It is called the L2Cache. Usually the hierarchy has 2-3 levels of caching.

Therefore, if the CPI component is mostly affected not by the design of the core itself, but because of the sizes or geometries of the caches, the designers would be better served by the information that the memory-nest, or more accurately, a certain level cache should be the first place to look for potential improvements, rather than the core. This brings us to the two main topics of this article. First, what quantities in the cache must be measured to help understand its effect on the CPI. Secondly, from these statistics collected on the caches, how do we build the overall CPI number.

Cache Statistics - What they are, what they mean

When modeling caches there are 4 statistics that are often talked about. These are all not completely independent, and are, therefore, easy to get confused about. The 4 main performance-realted statistics collected for a cache are:

MissesPerReference
HitsPerReference
MissesPerInstruction
HitsPerInstruction

MissesPerReference indicates the ratio of how many times, of all the times the cache was accessed, was the cache unable to provide the data requested. HitsPerReference is the complement. It is the ratio of how many times, out of all the times the cache was accessed, was the cache able to provide the data requested. The complementary nature of these quantities makes it necessary to only quote one number.

MissesPerReference + HitsPerReference = 1

MissesPerInstruction is not that straightforward. It is the ratio of how many times the accesses to the cache missed, out of all the “instructions in the workload”! This is a strange measure at first, and takes some getting used to. It’s the ratio if a design-dependent variable quantity - cache misses, to an unchanging quantity, the number of instructions in the workload. In the earlier measure of missesPerReference, the denominator, the number of references to the cache, was variable depending on what happened before this cache was reached by the instruction flow. The number of reference to this cache depends on how well the previous levels of cache did. HitsPerInstruction, similarly is the ratio of the number of hits in the cache to the total number of instructions in the workload. MissesPerInstructions and HitsPerInstruction do NOT add up to 1! They add up to the number of references made to the cache per instruction.

These 4 quantities are depicted by an example in Figure1.

Figure1

Why collect PerInstruction statistics

So the question is, “what is the reason we need these two ways of looking at a cache’s performance?”. Why doesn’t the simpler PerReference style of collecting stats suffice? The answer is that the PerReference numbers are great for designing the cache under study provided the number of references does not change, which implies, the design before the cache in study remains unchanged. The PerInstruction numbers are useful to get an idea of how much the cache under study is contributing to the overall CPI. We will shortly see how that overall CPI is calculated from the PerInstruction numbers. Before the PerReference numbers can be used to improve cache designs, the PerInstruction numbers must point out that it is worthwhile to go after a cache design because the cache is adding a big component to the CPI of the system. And we want to keep the CPI low as far as possible. That is why we want to keep track of the PerInstruction statistics.

How to breakdown overall CPI into components

There are two ways to break the overall CPI, or another way to look at it is, two way to build the overall CPI.
One way is to look at what fraction of instructions “pass through” each level (whether they hit or miss at that level is immaterial) in the hierarchy and add up the cycles they spend at each level. The idea is depicted in Figure 2.
The other way is to look at what fraction of instructions “finish at” each level (i.e. they hit) and add up the cycles those fractions take to finish their journey through the memory hierarchy. The idea is depicted in Figure 3.

Method 1: Breaking CPI into components for each level of the memory hierarchy an instruction passes through

Figure2

This method is often used to represent a CPI stack, which assigns a portion of the CPI to each level. This is very useful in figuring out which level of the cache to go after for improvements.

CPI = CoreComponent + L1Component + L2Component + L3Component + MemComponent

Here the overall CPI is reached by adding CPI contributions at each level of the memory hierarchy. Simulation models for the core often generate “infinite” CPIs. Infinite CPIs are used to keep the effect of the memory hierarchy separate from the effect of the core design, so separate teams can work on both models in parallel. The effect of the memory hierarchy on the CPI has to be added in to the “infinite” CPI to get the overall CPI.

infL1CPI = CoreComponent + L1Component
infL2CPI = CoreComponent + L2Component + L2Component
and so on.

The calculation of infL1CPI assumes the L1 always hits, but has the normal hit latency. So it is an infinitely large L1, but miraculously the directory lookup and array access takes time equal to a realistic sized L1.
The calculation of infL2CPI assumes the L2 always hits, but has the normal hit latency. The L1 is realistically modeled, the L2 is the infinite L2 with correct hit latency modeled.

Going back to
CPI = CoreComponent + L1Component + L2Component + L3Component + MemComponent
Say we have cache models for the L1,L2, L2 caches and the memory. So how do we get the L1Component, L2Component, L3Component and MemComponent, such that we can get the overall CPI for the system. This is where the hitsPerInstruction and missesPerInstruction numbers come in very handy.
L1Component = L1ReferencePerInstruction * L1Latency –(A1)
L2Component = L2ReferencesPerInstruction * (L2Latency - L1Latency) –(A2)
L2Component = L3ReferencesPerInstruction * (L3Latency - L2Latency) –(A3)
MemComponent = MemReferencePerInstruction * (MemLatency - L3Latency) –(A4)

Notice that each component is a product of the fraction of instructions that reference that level and the amount of time those instructions which reference that level are guaranteed to spend at that lavel. Two observations here:
1. the ReferencesPerInstruction at each level is the sum of hitsPerInstruction and missesPerInstruction for that level
2. an easy way to determine the common amount of extra time introduced by a level into the overall execution of an instruction is to find the difference between the time when an instruction hits at that level and one level before.

Method 2: Breaking CPI into components for each level of the memory hierarcy an instruction finishes at

Figure3

This method is often used when the aim is to simply calculate the overall CPI and we are not dealing with breaking the CPI into its CPI stack components.

CPI = CoreComponent + (L1HitsPerInstruction*L1Latency) + (L2HitsPerInstr*L2Latency) + (L3HitsPerInstr*L3Latency) + (MemHitsPerInstr*MemLatency) –(B)

Another way to say this is
CPI = Time spent by all instructions which do not have to even go to the L1Cache(s)
+ time spent by the fraction which hit L1 and face the L1Latency on top of the CoreComponent
+ time spent by the fraction which hit L2 and face the L2Latency on top of the CoreComponent
+ time spent by the fraction which hit L3 and face the L3Latency on top of the CoreComponent
+ time spent by the fraction which hit memory and face memory latency on top of the CoreComponent

Notice that each component looked at this way is the product of the fraction of instructions which hit in a particular level of the hierarchy and the time it takes to service those instructions.

CAVEAT: There is a distinct difference between the CoreComponent component in equations A and equation B. In equations A, CoreComponent applies to all the instructions, since all the instructions pass through the core. In equation B, the coreComponent applies to only those instructions which do not have to leave the core to access L1Cache (or caches, since Level 1 is usually broken into separate instruction and data caches). To simplify this, simulation models of the core often are tightly coupled with at least the L1cache or L1caches. There is a significant amount of overlap in accessing the caches, especially the level 1 cache and therefore the time spent in accessing one instruction or data granule might be able to hide the time to access another. This could adversely affect the CPI calculations. That is why this style of breaking up the CPI into its components and adding them up, only works as an estimate. A realistic CPI can only be realized by building in a more detailed understanding of the paralellism in a workload into the equations above, or by just creating a more accurate simulation model with the core model working hand-in-hand with the memory nest model. One way to make the above equations a little bit closer to reality is by only considering loads (i.e. both instructions and data loads) for the missesPerReference, hitsPerInstruction etc. Stores usually do not directly affect CPI since they are handled by mechanisms that are non-critical to CPI calculations.

Posted By: Anil Krishna at 2:30PM on Sunday, April 30th, 2006

Posted in Tutorials | No Comments »

PseudoLRU Replacement Scheme

October 10th, 2005 admin

I am taking ECE521, a Computer Architecture course at NCSU and am thoroughly enjoying it. One of the topics that was touched upon in the class, and with which I later struggled for a few hours was how pseudoLRU replacement policy works . I could not find useful tutorials on the net and decided to write one for myself. Feel free to use it as is if you find it helpful, or let me know and I can send you the ppt version.

Posted in Tutorials | No Comments »

Branch instructions in the PowerPC Architecture

May 31st, 2005 admin

1 Must-Have Branches and Good-To-Have Branches

Branch instructions show up in a program when sequential execution of instructions in memory is not feasible or possible.

1.1 Not Possible category (Must-have-branches):

When the program’s execution direction can go multiple directions depending on a condition, it becomes imperative to have some kind of branching. This type of branch is called a conditional branch. A condition must be checked to decide which direction the branch would take.

1.2 Not Feasible category (Good-to-have branches):

When a certain sequence of instructions repeats multiple times, it might be put into a loop. That way the instruction sequence is just stored once in memory and executed by jumping back to the start of the loop upon hitting the end of the loop. It is sometimes possible to avoid looping (and compilers often use a technique called loop unrolling) by just laying the instruction sequence out repeatedly in memory and running through it without any branches. It is not possible to unroll the loop effectively if the number of times the loop is executed is unknown up until the point the execution reaches the loop. In that case the example falls under the not possible category, since it requires a condition evaluation to determine how many times to execute the sequence of instructions. Often a sequence of instructions that is used several times and/or at several places in execution is put into a “subroutine” and “called” when required. This kind of subroutine/function calls and returns from those functions to mainline execution, require branch instructions, since these instructions are stored only one place in memory and not necessarily after the calling instruction. These branches could also be avoided by not having function calls and instead inlining code, which is another technique compilers sometimes use to reduce branches. Unconditional branches are branches where the program does not have to check for any condition and can branch to the target address unequivocally. Unconditional branches are branches that are taken no matter what. The instructions executed upon branching could have been simply moved into the main line execution to avoid the branch. That is not feasible because the memory holding the instructions could have a large number of copies of the same sequence in a lot of places, thereby utilizing a lot more memory than necessary.

2 Architectural resources required to handle branches

Hardware that supports PowerPC architecture provides three registers that are used by the various flavors of branch instructions

2.1 Condition Register

This 32-bit register has 8 fields (or 4 bits each) and basically stores bits that describe the state the system is in. There are instructions provided by the PowerPC architecture that can set/clear/perform operations on bits in this register. The bits then are used by conditional branch instructions to determine whether to branch or not (non-branching path is known as the flow through path). Such usage of the condition register is required only by the conditional branches.

2.2 Link Register

When branching to a subroutine it is important to know how to join back with the main line execution when the subroutine is executed. The Link Register is the 64-bit register provided by the hardware to store the address of the instruction that follows the branch instruction. This is the address where execution joins the main line upon completing the subroutine. Since not all branches intend to join back the main line (or, in other words, they are not subroutine calls) this storing of the next instruction’s address into the Link Register has to be requested by the branch instruction (or, in other words, the software).

2.3 Count Register

This 64-bit register is used to hold a loop count, which can be checked as a condition for deciding the branch direction, and also be decremented by a branch instruction. This is clearly useful when a branch is used to determine if the requisite number of loops of a sequence of instructions have been executed.

The Count Register can also be used for an entirely different purpose - to store a target address, although such usage might be a redundancy of the PowerPC architecture.


3 The Branch Instructions – basic (orthogonal) and extended (redundant)

Sections 3.1, 3.2, 3.3 and 3.4 describe the 4 sets of basic branch instructions. These are all the instructions that are required for branching in PowerPC software. Section 3.5 discusses branch instructions that are special cases of the basic instructions, and were used often enough to be rightfully given their own opcodes. These are extended mnemonics, which are good to have but not imperative, since one of the basic instructions could be used for an identical operation.

3.1 b, ba, bl, bla : Unconditional branches to relatively nearby addresses

The target address has to be relatively nearby, to permit use of this instruction because the address is specified in an immediate field in the instruction opcode itself.

The instructions ending with the ‘a’ indicate that the target address is absolute as specified in the instruction opcode itself.

The instructions which do not end in ‘a’ indicate that the target address is relative to the address of the current instruction. So the target address is the sum of the current instructions address and the immediate value in the opcode.

How near is “nearby”?

24 bits in the opcode specify the address. This is extended by a 0b00 (two 0 bits) to make it an address aligned at 4 bytes, which is how every instruction in the machine must be placed in memory. The 26 bits specify 64MB of space. Therefore “nearby” refers to 32MB on either side of the current instruction (relative addressing) or 0-64MB of the programs address range (absolute addressing).

The instructions ending in ‘l’ indicate that the next sequential instruction’s address must be saved into the link register. This is useful when a branch is made to a subroutine and at the end of the subroutine the program need only indicate its wish to return to the main flow of execution. It does not have to know the return address for it can use the address stored in the link register as the target.

Target address is known for these instructions, and can be calculated way in advance in the pipeline, since it is part of the instruction opcode, and in the extreme case might need to be added to the current instruction address.


3.2 bc, bca, bcl, bcla : Conditional branches to relatively nearby addresses

Just as is the case with unconditional branches, for conditional branches too the instructions ending in a and l, mean absolute addressing and branch with link, respectively.

The difference, compared to the unconditional branches, is that two more fields are required BO and BI. Each is 5 bits. BI indicates which bit in the CR (Condition Register) must be looked at to make the branching decision. BO indicates a couple of things.

Firstly it specifies what rule should be used when looking at the BI bit in CR to make the branching decision. So say the BI indicates bit 2. BO might indicate a rule that says branch if the CR bit 2 is 0. BO bits can be used to indicate a slightly more complicated rule such as decrement the Count Register by 1 and then branch if the Count Register holds a non-zero value and the CR bit 2 is 0. There are a few other combinations of conditions that can be checked using the bits in the BO field. These conditions are various combinations of

- Decrementing or not decrementing the CTR (Count Register)

- Branching if the CTR is zero or non-zero and a CR bit is 0 or 1

- Branching if the CTR is zero or non-zero

- Branching if a CR bit is 0 or 1

Secondly, the BO field also provides bits that can be used by the compiler to place hints about the direction the branch is likely going to take. This can help speculatively fetching instructions if the branch is predicted taken by the BO field and the target address is known.

Target address is known for these instructions since it is part of the instruction opcode, and in the extreme case might need to be added to the current instruction address.

How near is “nearby” in this case?

14 bits in the opcode specify the address. This is extended by a 0b00 (two 0 bits) to make it an address aligned at 4 bytes, which is how every instruction in the machine must be placed in memory. The 16 bits specify 64KB of space. Therefore “nearby” refers to 32KB on either side of the current instruction (relative addressing) or 0-64KB of the programs address range (absolute addressing).

If the BO field has 1’s in certain positions (1z1zz – where bit denoted by z is ignored), it indicates that the branch is always taken. So that means it is unconditional! So the four instructions described in section 3.1 on unconditional branch instructions, are actually redundant to a limited extent. The reason they are not completely redundant is because those instructions have 10 more bits for the target address field than the conditional branch instructions described in this section.

3.3 bclr, bclrl

These instructions represent conditional branches to the address specified in the link register. The bclrl, as indicated by the l at the end of the opcode, puts in the address of the next sequential instruction to the current instruction in the link register or on the link stack, after figuring out the target address form the link register.

The reason unconditional branches to the link register do not require additional PowerPC instructions (like blr or blrl) is because there is a way to specify a condition in the BO field of the instruction, that makes it “always taken”, or in other words, unconditional.

The target address for this instruction depends on the value residing in the Link Register (or Link Stack) at the time this instruction is being executed (and all previous instructions, that write the Link Register have done so). Therefore, due to the dependency on the Link Register, the target address may not be calculated very early in the pipeline, unless there are no older instructions in the pipeline that might write to the Link Register. Even in that case, an access to the Link Register is required to read its value. The instruction does not contain the target address as opposed to the cases in sections 3.1 and 3.2.

In addition to the BO and BI fields described in section 3.2, there is a 2-bit field in the opcode, BH. The BH field provides additional hints the hardware to assist it in fetching the correct target address ahead of time. The hints are of the type “this is a subroutine return”, “this is not a subroutine return so likely the same target address would be used as was used the last time this branch was taken” or “the target address is not predictable”. This is especially useful, when the compiler uses bclr but it is not for a subroutine call. In that case it might be useful to let the hardware know that there is a good chance that the target used the last time this branch was executed would likely be the target used this time too. The hardware can fetch the instructions from that path without having to wait for the Link Register to be ready and read to find the target address, provided the hardware maintains some kind of a cache to remember the previous targets.

3.4 bcctr, bcctrl

These instructions represent conditional branches to the address specified in the count register. The count register is used here to hold a target address, rather than a count. The name “count register”, is inaccurate in this case. The bcctrl, as indicated by the l at the end of the opcode, puts in the address of the next sequential instruction to the current instruction in the link register or on the link stack.

The reason unconditional branches to the count register do not require additional PowerPC instructions (like bctr or bctrl) is because there is a way to specify a condition in the BO field of the instruction, that makes it “always taken”, or in other words, unconditional.

The target address for this instruction depends on the value residing in the Count Register at the time this instruction is being executed (and all previous instructions, that write the Count Register have done so). Therefore, due to this dependency on the Count Register, the target address may not be calculated very early in the pipeline, unless there are no older instructions in the pipeline that might write to the Count Register. Even in that case, an access to the Count Register is required to read its value. The instruction does not contain the target address as opposed to the cases in sections 3.1 and 3.2.

In addition to the BO and BI fields described in section 3.2, there is a 2-bit field in the opcode, BH. The BH field provides additional hints the hardware to assist it in fetching the correct target address ahead of time. The hints are of the type “this is not a subroutine return so likely the same target address would be used as was used the last time this branch was taken” or “the target address is not predictable”. As seen in section 3.3, the BH field has more use as a hint for the branch to link register instructions.

3.5 Extended Mnemonics

Special cases of conditional instructions (sections 3.2, 3.3 and 3.4) have been given their own mnemonics for ease of readability of assembly listings.

3.5.1 blt

blt target is the same as bc 12, 0, target

0 is the BI field, which indicates the 0th bit in the 32-bit CR.

12 is the BO field, which, looking at the detailed meaning of the bits indicates “branch if CRBI is 1”, which in this case means “branch if CR0 is 1”.

Bit 0 of the CR is set to 1 when the result of a fixed-point computation is negative.

So effectively “bc 12,0,target” means “branch to the target if a fixed point computation yielded a negative result”. This happens often enough that “branch to target if less than zero” was given a new mnemonic “blt target”.

3.5.2 bne

bne cr2, target is the same as bc 4, 10, target

10 is the BI field, which indicates the 10th bit in the 32-bit CR. Notice that the 10th bit of the CR is in CR field 2.

4 is the BO field, which, looking at the detailed meaning of the bits indicates “branch if CRBI is 0″, which in this case means “branch if CR10 is 0″.

Bit 2 of a specific field in the CR is set to 1 when the result of a compare is equal to 0. The “specific field” has to be specified by the compare instruction. Result of a compare is 0 when the values being compared are equal. So if the bit in the CR is set to 0, that means the values being compared are not equal.

So effectively “bc 4,10,target” means “branch to target register when bit 10 of the CR is 0”, which shows up under the special circumstance meaning “branch to the target if a compare operation that writes to CR field 2 evaluated the operands to be not equal”. This happens often enough that “branch to target if less than zero” was given a new mnemonic “blt cri, target”. i is whatever CR field the compare used.

3.5.3 bdnz

bdnz target is the same as bc 16, 0, target

BI is 0.

BO is 16, which indicates, decrement CTR and branch is CTR is non-zero. There is no dependence on the CR!

“bc 16,0,target” therefore is “branch after decrement if non zero”, which has been given the special mnemonic bdnz.


3.5.4 And a few more extended mnemonics …

There are a few other extended mnemonics that extend from the base instruction in the same way as the instructions in sections 3.5.1, 3.5.2 and 3.5.3

Just like bc extends to blt, bne and bdnz, bclr extends to bltlr, bnelr and bdnzlr; and bcctr extends to bltctr and bnectr.

Note - PowerPC Architecture, POWER Architecture and PowerPC are registered trademarks of International Business Machines Corporation. Other trademarks used are owned by the respective owners.

Posted in Tutorials | No Comments »

Microsoft Word and embedding fonts into the file

December 1st, 2004 admin

I have often had problem where I create a document in a certain font and the recepient complains that the fonts are hideous. That happens because sometimes I use fonts that the recepient’s computer does not have and so it decides to use a font it thinks matches my font, but the choice is often a sad one. Recently I was working with my father on my wedding invitation (True! Kavita and I are getting married on February 13th 2005 and the news deserves a separate blog entry) and faced the problem of not being sure if he is seeing the same font that I intended for him to see the invitation in (he was going to get the cards printed back in Visakhapatnam, India, so he better know exactly what I had in mind). And I came to learn after looking for advice on the internet, that you can embed fonts right into the document just before you save it. This works in PowerPoint too. The way to do this is do a File->Save As, before a keen eye catches at the top right of the Save As window an option called Tools. Upon clicking that you will find the “Embed True Type Fonts” choice(PowerPoint) either right there or under General Options(Word).

Posted in Family, Tutorials | No Comments »

Microprocessors and microarchitectural innovations

November 2nd, 2004 admin

Added to the articles section a presentation I put together after reading an interesting paper by Andreas Moshovos and Gurindar Sohi. Check it out if you are interested in how microprocessors try to be smart about running programs fast here.

Posted in Tutorials | No Comments »