Shuttle HW and SW info

  • Thread starter James P Garrett

James P Garrett

All,
In an article in
http://computerworld.com/newsletter/0,4902,78135,0.html?nlid=PM, a reference is made to the shuttle's main control computers as being
"Commercial off-the-shelf computer technologies." Is anyone out there familiar enough with this equipment to comment on its specs, vintage, OS, vendors, and
certification processes? Maybe point to some web sources for more information?

Just curious as to how 'cutting edge' the technology really is.
Thanks!

-Jim
 
I remember hearing years ago, when the first shuttle missions were flown, that they were controlled by DEC PDP11A's. That would be consistent with something designed in the mid-'70s. I am sure there have been upgrades since then, but then maybe not. After all, we still have three CNCs that use PDP8A's.
 
I'm not an expert, just an interested onlooker, and (based on comments in recent TV news reports) I'm guessing they still use '70s-era avionics for the main shuttle computers. I don't know how the "glass cockpit" fits into the system, and so far I haven't found any detailed information, although it would have made some sense for it to be built from "off-the-shelf" SW and HW components.

The best resource I've found so far with details concerning shuttle hardware and software is the "NSTS 1988 News Reference Manual" http://science.ksc.nasa.gov/shuttle/technology/sts-newsref/

This is the most up-to-date reference on their site, which causes me some concern. It seems incredible that there is no mention of changes and updates made in the intervening 15 years (at least none linked to this page), especially in light of statements made in recent interviews by many former and current astronauts and others that the way to keep the public interested in manned space travel is to keep them informed.

I'm hoping there'll be someone somewhere in the investigative process with the late R.P. Feynman's solid background, insatiable curiosity, and ability to discriminate between that-which-is and that-which-isn't.

Challenger Disaster - Rogers Commission Report http://science.ksc.nasa.gov/shuttle/missions/51-l/docs/rogers-commission/table-of-contents.html

This is excerpted from Feynman's Appendix F - "Personal observations on the reliability of the Shuttle" << To summarize then, the computer software checking system and attitude is of the highest quality. There appears to be no process of gradually fooling oneself while degrading standards so characteristic of the Solid Rocket Booster or Space Shuttle Main Engine safety systems. To be sure, there have been recent suggestions by management to curtail such elaborate and expensive tests as being unnecessary at this late date in Shuttle history. This must be resisted for it does not appreciate the mutual subtle influences, and sources of error generated by even small changes of one part of a program on another. There are perpetual requests for changes as new payloads and new demands and modifications are suggested by the users. Changes are expensive because they require extensive testing. The proper way to save money is to curtail the number of requested changes, not the quality of testing for each.
>>


It would be interesting to learn whether the software assurance program Professor Feynman found so impressive has contracted a case of "rot" since then. My gut feeling is that (although it may not be directly connected to Columbia's loss) we'll learn that budgetary constraints cut into software validation and other safety items.

There is another reason I'd like to see independent, capable, and discerning observers in the process. In Challenger's case, the O-ring problem in the SRB assembly had been somewhat discounted internally as other flights (although there had been evidence of seal degradation) had not ended in disaster, and "engineering judgement" had been used to certify the vehicle spaceworthy. I've heard the same "engineering judgement" phrase used this time around the wheel, and it sent a shiver up my spine.

Bob
 
I remember seeing somewhere that they used OS-9 to run the computers. Also, that the shuttle technology WAS cutting edge circa 1970.
 
Greg Goodman

An article in today's New York Times discusses the shuttle's onboard computer systems:

http://www.nytimes.com/2003/02/07/national/nationalspecial/07COMP.html

Excerpted from the article:

Columbia's onboard computer hardware and software -- not the crew -- were driving when the craft made its fatal re-entry last Saturday.

Those computer systems, state-of-the-art designs when the shuttle program began in the 1970's, detected a drag under Columbia's left wing and ordered flight-control jets to compensate by steering the craft to the right. Their performance will be investigated as part of the inquiry into why Columbia was destroyed, NASA officials say.

The computers act as the electronic brain of the flight control system. Computer avionics experts say the shuttle program's hardware and software systems, despite their age, have a record of extraordinary reliability. The technology, they say, is a triumph of custom machines and programming code that has been designed and endlessly tested to perform flawlessly in the harsh conditions of space travel.

For this specialized task, they say, mature computers and code are robust and trustworthy instead of an antiquated safety hazard.

[snip]

As for the shuttle's systems, I.B.M. began development work on them in 1972, nine years before the first spacecraft was launched. The company chose the best and most appropriate pieces of technology from its various products and its research laboratories, and came up with a hybrid machine, the I.B.M. AP-101.

Over the years, shuttle scientists have installed improvements to the AP-101, like solid-state electronics for its memory instead of magnetic disks. Yet the basic design of the five onboard AP-101 computers -- black cubes about 18 inches on a side -- remains the same.

The programming language used for these unusual machines is similarly tailored for its task. It is called HAL/S (high-order assembly language/shuttle), and was specially developed for space-flight applications like instantaneous handling of streams of data from shuttle sensors.

The AP-101 computers process data at a tiny fraction of the rate of today's personal computers. Yet today's computers need a lot of processing firepower because they routinely handle big graphics, as well as audio and video files. All of that is important for people playing computer games or downloading music over the Internet but not relevant to the shuttle's performance.

[snip]

The flight-control system on a shuttle craft is designed mainly to process sequence after sequence of numeric data. The data come from sensors on the guidance system, accelerometers measuring acceleration and gyros measuring the rotation of the craft.

The onboard computers, experts say, are designed to process those chunks of numeric data at the rate of perhaps 1,000 times a second.

"That data coming out of the gyros and accelerometers is not going to come out faster," said Col. John Keesee, an Air Force aeronautical engineer and a senior lecturer at the Massachusetts Institute of Technology. "The guidance functions are not pushing you to faster processors."

The shuttle's software team is famed in the industry for the flawless quality of its programming code. It is one of a handful of projects in the world to receive a Level 5 rating from Carnegie Mellon University's Software Engineering Institute for the reliability of its code and the rigor of its testing processes. The guidance system program has more than 400,000 lines of code; recent versions have had less than one error -- and none that degraded the performance of the program, let alone raised safety concerns.

The working environment of the coders who build the shuttle programs is orderly and regimented -- a world apart from that of young hackers, staying up all night to ship new products every few months.

"They have a system of process improvements, design reviews and testing procedures that almost no one else does," said Mr. Schneidewind, a software engineering expert, of the shuttle coders.

[snip] [end]
 
Stephen Shelley

If memory serves me, the Orbiter has five IBM AP-101 computers onboard. These are 32-bit computers, derivatives of the old System 360 architecture, and used in other aircraft such as the B-52 (I believe). They were originally programmed in a language called HAL/S (this was prior to the design of Ada), and there are actually two "operating systems" in use at the same time. During launch and reentry, four of the computers are running the same avionics software, with a "voter" component detecting and rejecting any computer which has wandered in its solution. The remaining computer is running a backup flight system which was designed and developed by a different company, and would take over if a common failure was detected with the primary flight computers. While in orbit, I believe individual computers were utilized for mission application purposes.
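The four-way vote Steve describes can be sketched in a few lines. This is a minimal illustration, assuming each computer's output for a cycle can be reduced to a quantized integer command and compared for exact equality; the function name `vote` and its handling of a 2-2 split are hypothetical, not the actual shuttle redundancy-management logic.

```python
from collections import Counter
from typing import Optional, Sequence


def vote(outputs: Sequence[int]) -> tuple[Optional[int], list[int]]:
    """Majority vote across redundant computers (hypothetical sketch).

    Returns (agreed_value, dissenters), where dissenters are the indices of
    computers whose output differs from the majority. With no strict
    majority (for example a 2-2 split among four), returns (None, []) so
    the caller can escalate rather than guess.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        return None, []  # no strict majority: cannot tell who is wrong
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


print(vote([120, 120, 120, 97]))  # computer 3 has "wandered"
print(vote([120, 120, 97, 97]))   # tie: no majority
```

Exact comparison only makes sense when the voting computers run identical software on identical inputs; voting the outputs of dissimilar programs would need a tolerance instead.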

The original flight instrumentation was pretty standard for its time, though it did use some multi-function displays. Nothing esoteric; the first shuttles did not even have a HUD (head-up display), though I believe HUDs were added around the time Atlantis was launched. I am not familiar with the current "glass cockpit" design used on Columbia, though I would expect it to be at the same technological level as current commercial flight decks. I do remember that the original Orbiters had a hexadecimal keypad for entering "patches" to mission software as part of standard flight operational procedures.

I must confess that the previous is based on my recollections of work I was involved with many, many years ago. Much more definitive information can probably be found by roaming around NASA websites, such as http://www.hq.nasa.gov.

Best regards, Steve Shelley
 
Fred Townsend

I have been following this thread for some time. I am amazed at the poor quality of the information in some of the referenced articles. It seems some of the reporters haven't bothered to read their own papers' archives from the seventies, perhaps because the computer archives don't go back that far and they are too lazy to go down to the morgue in the basement and look up the original articles. Their new articles seem hell-bent on finding fault rather than reporting the facts, and they distort the facts they do report. For instance, the only major upgrade to the computers has been the replacement of the magnetic core memory (not magnetic disk) with solid-state memory.

Let me dispel any notion that off-the-shelf hardware was used. Nothing could be further from the truth. The computer system is completely unique, from the architecture down to the code. The architecture is double-paired with a referee, fault tolerant, meaning there are two sets of paired computers. A fifth, referee computer makes sure all the computers agree and determines which one is wrong if disagreement occurs. What makes the design so unique is that each pair of computers uses different code and a different kernel. The development contract was split between different manufacturers. Each pair of vendors (hardware and software) was forbidden from talking to the other pair's vendors, so the same "mistake" would not be repeated.
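The paired-computers-with-referee arrangement, as Fred describes it here, can be illustrated with self-checking pairs: each pair is trusted only while its two members agree, and the referee picks a pair that agrees with itself. This is a hypothetical sketch of the described idea, with assumed names (`pair_ok`, `referee`) and an assumed tolerance `tol`; it is not flight code, and it also shows the one case a referee of this kind cannot settle.

```python
def pair_ok(pair: tuple[float, float], tol: float) -> bool:
    """A self-checking pair is healthy while its two members agree within tol."""
    a, b = pair
    return abs(a - b) <= tol


def referee(pair_a: tuple[float, float], pair_b: tuple[float, float],
            tol: float = 1e-3) -> str:
    """Choose which pair's output to use (hypothetical referee logic).

    Returns "A" or "B"; "MISMATCH" when both pairs are internally
    consistent but disagree with each other (a design-level discrepancy
    the referee cannot attribute); or "FAULT" when neither pair is healthy.
    """
    a_ok = pair_ok(pair_a, tol)
    b_ok = pair_ok(pair_b, tol)
    if a_ok and b_ok:
        # Both pairs self-consistent: compare one member of each pair.
        return "A" if abs(pair_a[0] - pair_b[0]) <= tol else "MISMATCH"
    if a_ok:
        return "A"
    if b_ok:
        return "B"
    return "FAULT"


print(referee((1.0, 1.0), (1.0, 1.0)))  # normal operation
print(referee((1.0, 2.5), (1.0, 1.0)))  # pair A disagrees internally
print(referee((1.0, 1.0), (3.0, 3.0)))  # both healthy, different answers
```

The MISMATCH branch is the interesting one: with dissimilar code in each pair, two healthy pairs can still disagree, and some further rule (or a human) has to decide.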

Yes, there have been problems with the shuttle computers in the past. However, the problems were detected by the system's own referee computer. So in a sense there have been computer faults but never a computer failure. No one can guarantee there will never, ever be a computer failure, but given the probability of computer failure versus the probability of hardware failure, I'd bet on a hardware failure.

Fred Townsend
 
Ramer-1, Carl

There's a Web page for anyone interested in Shuttle information. Rather than perusing secondhand reports, get it from the source.

http://www.ksc.nasa.gov/ >Reference Manual >Space Shuttle Orbiter Systems >Avionics Systems

Carl Ramer, Engineer
Controls & Protective Systems Design
Space Gateway Support, Inc.
Kennedy Space Center, Florida
(V) 321-867-1812
(F) 321-867-1495
 
Referring to the referee computer, or rather the fifth computer: how do we know that the referee computer is working properly? In some triple modular redundant (TMR) systems I am aware of, each computer gets all three input values, each processor has its own voting or referee logic, and the 2 of 3 are selected. I have also observed, in some rare cases, errors or loose ends attributable to human error in programming these processors. One such case was in a gas turbine (GT) whose logic used an individual fuel flow signal rather than the signal obtained after voting ("refereeing") the three available flow signals, resulting in a rare trip.
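The two-out-of-three (2oo3) input selection Anand mentions, and the kind of slip in his GT example, can be sketched as follows. Median select is one common way to vote three analog signals and is an assumption here; the function names and readings are hypothetical.

```python
import statistics


def select_2oo3(a: float, b: float, c: float) -> float:
    """Median select: a single failed transmitter cannot drag the result."""
    return statistics.median([a, b, c])


def channel_input(own_raw: float, voted: float, use_voted: bool) -> float:
    """What one processor actually feeds into its control logic.

    use_voted=False reproduces the latent error: the channel reads its own
    transmitter directly, bypassing the vote.
    """
    return voted if use_voted else own_raw


flows = (10.0, 10.1, 55.0)  # hypothetical readings; transmitter 3 failed high
voted = select_2oo3(*flows)
buggy = [channel_input(f, voted, use_voted=False) for f in flows]
fixed = [channel_input(f, voted, use_voted=True) for f in flows]
print(voted, buggy, fixed)
```

Both versions behave identically whenever all three transmitters agree, which is one reason this kind of error can survive testing until the rare day a transmitter fails.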

The architecture of referee computers needs some new thought, and perhaps a standard.

Another of the rare failures I have seen was a 1:1 system that failed when the I/O transfer mechanism from one CPU to the other failed; more precisely, a faulty relay in the I/O transfer mechanism caused a trip.

Redundancy does not always mean that the system is fault tolerant and some design errors are always possible.

A bit off topic perhaps, but... is there any standard on referee computers? If not, then we do need one!

Regards,
Anand Iyer
http://www.smbd.org
 
Garrett, James P

Thanks to all who responded in this thread. The information and links put forward more than answered my questions.

-Jim
 
Fred Townsend

Failure analysis deals with single point, double point, and multipoint failures. Most systems are single point fault tolerant and double point fail graceful, meaning that any single failure will not affect performance, and any double point failure will not crash the aircraft, though the pilot may be called upon to exercise a greater degree of skill. The space shuttle flight computer is double point fault tolerant, multipoint fail graceful. This does not say that a shuttle flight computer could never cause a crash, but it does make the odds pretty low.
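One way to see how four-way redundancy can be "double point fault tolerant, multipoint fail graceful" is a toy model of majority voting with reconfiguration. It assumes faults arrive one at a time, each faulty unit is outvoted and switched out before the next fault, and faulty units never agree with each other on a wrong answer; real systems can only approximate these assumptions, and the function name is hypothetical.

```python
def sequential_faults(n: int) -> tuple[int, bool]:
    """Toy model of majority voting with reconfiguration.

    Returns (masked, detectable):
      masked: faults outvoted and switched out while still operating
      detectable: whether the next fault can still be detected (two units
        disagree) even though the voter cannot say which one is wrong
    """
    masked = 0
    while n >= 3:      # with 3 or more units, a lone dissenter is outvoted
        n -= 1         # switch the dissenter out
        masked += 1
    return masked, n == 2  # two units can detect, but not locate, a fault


for units in (2, 3, 4, 5):
    print(units, sequential_faults(units))
```

With four units the model masks two faults and can still detect a third, which lines up with the wording above; it says nothing about common-mode faults such as a shared software error.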

Anand, I think you confuse aspects of failure proof, fault tolerance, and fail graceful. There is no such thing as a fault-proof computer; therefore we have fault tolerance. All flight computers are fail graceful as well. We have a rich history of failure of fail graceful flight computers. Many airplanes, such as the Airbus and the B-2, fly totally by wire. We shoot military flight computers full of holes! I have never seen an NTSB or US military review board report that said an aircraft crashed because of a flight-qualified computer failure. I say "flight qualified" because there have been some developmental aircraft (the Cheyenne helicopter comes to mind) that crashed because of an analog computer failure. Those systems never made it to market; they were never flight qualified.

So far as I know there is no specific standard for referee computers nor do I see any need for one. The standard for flight computers more than handles referee computers.

The absurdity of discussing the computer as the source of the shuttle crash is that there are many single point failure modes on the shuttle that are not fault tolerant. If a wing falls off, it's time to stick your head between your legs and kiss your a-- goodbye. The same is true of a Piper Cub, and it doesn't have a single computer.

Fred Townsend
 
Stephen Shelley

If memory serves me...

The 5th computer, running the Backup Flight System (BFS), does not automatically take over control, but must be manually engaged by the flight crew. It actively monitors the I/O traffic of the primary systems during ascent and reentry and annunciates a warning to the crew if sync is lost. At that point the crew gets to decide whom to believe, and engages or resets the BFS as (hopefully) appropriate. The other 4 computers are majority-wins, if I remember correctly. If there's a tie... I would have to look that up. :)
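The listen-only behavior Steve describes (monitor, annunciate, wait for the crew) can be sketched as a small state machine. The sketch models "loss of sync" as a persistent disagreement between primary and backup commands, which is a simplification; the tolerance, the persistence count, and the `BackupMonitor` class are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupMonitor:
    """Hypothetical listen-only backup: it compares and warns, never self-engages."""
    tol: float = 0.5   # assumed allowable primary/backup difference
    persist: int = 3   # assumed consecutive mismatches before annunciating
    mismatches: int = 0
    warning: bool = False
    engaged: bool = False  # changed only by crew action

    def cycle(self, primary_cmd: float, backup_cmd: float) -> bool:
        """Process one cycle of observed traffic; return the warning state."""
        if abs(primary_cmd - backup_cmd) > self.tol:
            self.mismatches += 1
        else:
            self.mismatches = 0
        if self.mismatches >= self.persist:
            self.warning = True  # latched until the crew resets it
        return self.warning

    def crew_engage(self) -> None:
        self.engaged = True

    def crew_reset(self) -> None:
        self.mismatches = 0
        self.warning = False


mon = BackupMonitor()
for primary, backup in [(1.0, 1.1), (1.0, 2.0), (1.0, 2.0), (1.0, 2.0)]:
    mon.cycle(primary, backup)
print(mon.warning, mon.engaged)
```

Requiring several consecutive mismatches is one way to avoid nuisance warnings from a single noisy cycle; the actual BFS criteria are not given in this thread.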

Steve Shelley formerly of Intermetrics, Inc.
 
Hi,

We may be jumping threads here,

Computer systems do fail, and that is why we have fault tolerant, fail graceful, and fail safe concepts. But what if, in a given system, this fail safe, fail graceful, or fault tolerant mechanism is not properly implemented?

In the case of the Columbia shuttle mishap, I am not saying that these mechanisms were faulty and that caused this mishap, but this is certainly one possibility that cannot be ruled out.

From the little technical news I have come across (news here is dominated by war talk, and I am sunk up to my head in work), some pointers in the shuttle explosion related news do indicate that:
1. There was a failure of instrumentation systems, meaning that whatever fail safe or fail graceful mechanisms were there, they failed. One possibility is that structural or mechanical damage caused the instrumentation/indication system to fail. The question is: was all the instrumentation located at vulnerable spots?
2. Whether the subsequent explosion was due to an instrumentation/control system malfunction or to a mechanical failure (a structural failure leading to an explosion that instrumentation/control could do nothing about) is not clear to me.

And most importantly, if we do assume that there was an instrumentation/control failure which resulted in the subsequent explosion, then we need to analyze the fail safe/fault tolerant/fail graceful mechanisms!

Of course, the concerned agencies will study this; part of the findings may become public knowledge now or later, and part may be shrouded in secrecy forever, known only to a few who may not come to the Alist...

But the point is that we cannot rule out referee computer failures or failures in refereeing mechanisms entirely.

Looking at refereeing or voting mechanisms: refereeing or voting can be done in various ways.

When we say referee, it could be a separate computer acting as the referee, referee logic built into the redundant processors, or non-intelligent hardware performing the referee function.

Taking the case of referee computers for TMR implementations, we have no way of knowing or analyzing whether the referee will actually work in all cases.

In the case I quoted, the TMR system had 3 inputs for fuel flow and 3 outputs going to a 3-coil servo valve. But the fuel flow was not refereed, and hence one coil got a very high output signal. Secondly, the output could also have been refereed, but this was not done either, hence a perhaps once-in-a-lifetime trip.

Thus, when we say TMR, what do we mean?
1. We have 3 inputs, and each is processed independently by its own processor?
2. We have 3 inputs, which are refereed first, and then the signal is processed?
3. Do we do output voting or refereeing?
4. Do we keep refereeing/voting mechanisms at the final control element?
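Questions 3 and 4 can be made concrete with the three-coil servo valve from the example above. The sketch assumes the valve responds to the sum of its three coil currents (a simplification of real multi-coil valves) and uses made-up currents; the function names are hypothetical.

```python
import statistics


def coil_force(currents: list[float]) -> float:
    """Assumed valve model: spool force proportional to total coil current."""
    return sum(currents)


def drive_unvoted(channel_outputs: list[float]) -> list[float]:
    # Each channel drives its own coil directly (no output voting).
    return list(channel_outputs)


def drive_output_voted(channel_outputs: list[float]) -> list[float]:
    # Every coil receives the median of the three channel outputs.
    m = statistics.median(channel_outputs)
    return [m] * len(channel_outputs)


outputs = [4.0, 4.0, 20.0]  # hypothetical mA; channel 3 works from a bad input
print(coil_force(drive_unvoted(outputs)))       # 28.0
print(coil_force(drive_output_voted(outputs)))  # 12.0
```

Without output voting, the bad channel's coil shifts the total from 12 to 28 units; median output voting removes that error but makes the output voter itself a component that must not fail, which is the "who referees the referee" question again.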

So there is sufficient ambiguity in the term TMR to call for some standard on TMR itself. We can generalize and call for a standard for refereeing and have one standard that could be applicable from 1:1 systems to a pentagonal system or higher.

In many 1:1 systems there is an I/O transfer mechanism. This mechanism acts in place of the referee, with referee logic built into the redundant processors (in most cases). But this mechanism itself may not be 1:1 redundant; it could be a single relay, and a failure at that point will definitely take down the 1:1 system.
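The point about a single relay in an otherwise 1:1 system can be quantified with basic series/parallel reliability arithmetic. The figures below are made up, the model assumes independent failures and perfect switchover, and the helper names are illustrative.

```python
def parallel(*r: float) -> float:
    """Redundant units: the group survives if any one unit survives."""
    fail_all = 1.0
    for ri in r:
        fail_all *= 1.0 - ri
    return 1.0 - fail_all


def series(*r: float) -> float:
    """Chained units: the chain survives only if every unit survives."""
    total = 1.0
    for ri in r:
        total *= ri
    return total


cpu = 0.99     # hypothetical reliability of each CPU over a mission
relay = 0.995  # hypothetical reliability of the single transfer relay
pair = parallel(cpu, cpu)     # 1:1 CPU pair
system = series(pair, relay)  # pair in series with the non-redundant relay
print(f"pair={pair:.4f} system={system:.4f}")
```

Adding a third CPU barely moves `system`, because the non-redundant relay caps it at 0.995.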

I have seen several failures in referee computer systems and hence believe a standard is needed. This standard could be applied to all industries. The standard should clearly define the types and methods of refereeing/voting mechanisms. There could also be some guidelines for selecting a proper system for the application concerned.

And in one classic case of catch phrases misleading the user: users sometimes ask for, and people quote, 1:1 redundant systems, while the quoted system could be warm standby and the user may have hot standby in mind. These two concepts are fairly well understood today, but new ones crop up all the time...
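The practical difference between hot and warm standby is how current the standby's copy of the process state is when it takes over. A minimal sketch, assuming state is copied to the standby every `sync_every` cycles; the function name and the numbers are hypothetical.

```python
def standby_state_at_failover(history: list[float], fail_cycle: int,
                              sync_every: int) -> float:
    """State a standby channel holds when the primary fails at fail_cycle.

    sync_every=1 models hot standby (state mirrored every cycle);
    sync_every>1 models warm standby (state copied only periodically).
    """
    last_sync = (fail_cycle // sync_every) * sync_every
    return history[last_sync]


history = [10.0 * k for k in range(10)]  # hypothetical state, e.g. a position estimate
hot = standby_state_at_failover(history, fail_cycle=7, sync_every=1)
warm = standby_state_at_failover(history, fail_cycle=7, sync_every=5)
print(hot, warm)
```

Anything that happened between the last sync and the failure must be reacquired or reconstructed by a warm standby before it can take over cleanly.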

I think that we are jumping threads here, ....

Regards, Anand
****************************************
Visit http://www.smbd.org
The website that changes regularly
Cruise Control project ongoing on www.smbd.org
 
Fred Townsend

> Computer systems do fail and that is why we have fault tolerant or fail
> graceful or fail safe concepts. But what if in a system this failsafe, fail
> graceful or fault tolerant mechanism is not properly implemented!

I believe that's called a fault.

> In the case of the Columbia shuttle mishap,
> I am not saying that these mechanisms were faulty and that caused this
> mishap, but this is certainly one possibility that cannot be ruled out.

I don't think anybody is ruling out computer failure. We are just taking the known facts and applying the logic of when you hear hoof beats, you look for horses before you look for zebras.

> From the little technical news that I have come across, News here being
> dominated by war talk and I am sunk up to my head in work,
> Some pointers in the shuttle explosion related news, do indicate that:
> 1. There was a failure of instrumentation systems, Meaning that whatever
> failsafe or fail graceful mechanisms were there , they failed.

Fact or conjecture? The little bit of telemetry I have read of suggests conditions that would kill the crew. In other words the computer was running until after the crew died.

> One
> possibility is that there was structural damage or mechanical damage which
> caused the instrumentation/indication system to fail. Question is were all
> instrumentation located at vulnerable spots!!!

Instrumentation is frequently in vulnerable spots. Fact of life. How about the sensors on the rocket motor gimbals? Dante's Infernal!

Anand, are you saying that sensor failure = computer failure?

> So there is sufficient ambiguity in the term TMR to call for some standard
> on TMR itself.

You can end your ambiguity argument easily. The shuttle isn't TMR. I'm not sure what you would call it; something like bi-modular, double fault tolerant. There is no ambiguity in fault detection: there is a fault or there isn't a fault. This is very simple code. No voter required. The problem is writing the error handler. This is also why years are spent testing the error handling routes.

> And In one of the classical cases of catch phrases, which sometimes misleads
> the user,
>
> in some cases, users ask for and people quote 1:1 redundant systems, while
> the quoted system could be warm
> standby, and the user may have hot standby in mind. These two concepts are
> somewhat
> well understood in the world today. But there are many new ones that crop up
> all the time....

Again, I think you argue the implausible. To the best of my knowledge, hot standby systems, much less warm standby systems, are not allowed in flight systems. It would take too long to "change over," and there would be no way to reacquire data in some cases. Therefore only parallel systems are used.

Fred Townsend

 
Comments below.

> I don't think anybody is ruling out computer failure. We are just taking the known facts and applying the logic of when you hear hoof beats, you look for horses before you look for zebras.

> Fact or conjecture? The little bit of telemetry I have read of suggests conditions that would kill the crew. In other words the computer was running until after the crew died.

Either the computer failed first or it failed last. If it failed after the crew had already died, then we cannot attribute everything to computer failure. In that case, a structural or other failure leading to a subsequent computer failure is the conclusion we on the Alist could draw from the limited information available to us.

> Instrumentation is frequently in vulnerable spots. Fact of life. How
> about the sensors on the rocket motor gimbals. Dante's Infernal!
>
> Anand, are you saying that sensor failure = computer failure?

It sounds idiotic at times, but I have commissioned a couple of systems that had I/O on PCs and found the PC hung due to noise problems in the field, and twice in my career I have seen a PLC hang on a power disturbance. It depends on the design. I think in a shuttle such elementary things would have been well taken care of, but it is certainly an area to double-check. Did the cards have proper isolation? Could failure of this isolation lead to problems in the computers?

> You can end your ambiguity argument easily. The shuttle isn't TMR. I'm not sure what you would call it. Something like bi-modular, double fault tolerant. There is no ambiguity in fault detection. There is a fault or there isn't a fault. This is very simple code. No voter required. The problem is writing the error handler. This is also why years are spent testing the error handling routes.

We are talking about something similar to QMR (quadruple modular redundancy) here, as used nowadays in PLC systems in the process industries. The same arguments apply as with TMR.

The 1:1 redundant and TMR arguments, plus some additional logic, can be extended to higher-order systems.

> Again I think your argue the implausible. To the best of my knowledge hot
> standby, much less warm standby, systems are not allowed in flight systems.
> It would take too long to "change over" and no way to reacquire data in some
> cases. Therefore only parallel systems are used.

No argument here; this was only an example anyway.


Regards,
Anand

 
Fred Townsend

Sorry, it is NOT similar to QMR. Just because 2+2=2*2 doesn't mean addition and multiplication are the same math processes. Everything is paired: two paired modules, paired with two OSs, two kernels, etc. It is exactly because of the ambiguity of TMR (or QMR) that this approach was rejected.

Fred Townsend
 
Hi Fred,

Yes, the system you describe is certainly different from QMR.

On a lighter note, it looks like different software systems working together based on some protocols. Reminds me of the old English saying, "Too many cooks spoil the broth." The people on this wild goose chase are going to have one He.. of a time finding 'who went wrong first!'

Regards, Anand
 