# Fermat's Library | It Takes Two Neurons To Ride a Bicycle annotated/explained version.
Source: [https://fermatslibrary.com/s/it-takes-two-neurons-to-ride-a-bicycle](https://fermatslibrary.com/s/it-takes-two-neurons-to-ride-a-bicycle)

It Takes Two Neurons To Ride a Bicycle
Matthew Cook∗
Abstract
Past attempts to get computers to ride bicycles have required an inordinate amount of learning time (1700 practice rides for a reinforcement learning approach [1], while still failing to be able to ride in a straight line), or have required an algebraic analysis of the exact equations of motion for the specific bicycle to be controlled [2, 3]. Mysteriously, humans do not need to do either of these when learning to ride a bicycle. Here we present a two-neuron network¹ that can ride a bicycle in a desired direction (for example, towards a desired goal or along a desired path), which may be chosen or changed at run time.

Just as when a person rides a bicycle, the network is very accurate for long range goals, but in the short run stability issues dominate the behavior. This happens not by explicit design, but arises as a natural consequence of how the network controls the bicycle.
1 Introduction
The task of riding a bicycle presents an interesting challenge, whether for human or for computer. We do not have great insight as to how we ride a bicycle, and we do not have much useful advice for someone who is learning.

In fact, in the course of this project, I had the chance to ride a “virtual bicycle” on the computer, and I was surprised to find how counterintuitive it is. I had thought that, knowing perfectly well how to ride a bicycle in real life, it would be no problem in simulation. However, in real life there must be additional inertial cues that I sense or leaning actions that I make which are missing from the simulation, since I had to learn, as if from scratch, what cues to attend to and how to react to them. I even thought at first that there must be a bug in the simulator, since to turn right I found I had to push the handlebars to the left. Of course, if you stop to think about it, that is exactly correct. To turn right, the bicycle has to lean to the right, and the only way to make that happen² is to shift the point of contact with the ground to the left, which requires an initial push to the left. But then, once the bicycle is leaning to the right, it will itself push the handlebars to the right due to how it is constructed for stability,³ with a force even greater than your initial push to the left, so maintaining a constant gentle leftward push does indeed cause the bicycle to turn to the right. Similarly, to come out of the rightward turn (or even to maintain it), you need to push the handlebars gently to the right.

In this paper we outline the various portions of our project, which has led us to find a surprisingly competent two-neuron network. Different portions of this work are likely to be of interest to different people. The reader should feel free to skip over sections that are not of interest; the paper has been organized so that skipping ahead should not result in a loss of understanding when reading later sections.

∗ California Institute of Technology, Mail Stop 136-93, Pasadena, CA 91125, cook@paradise.caltech.edu
¹ Actually, the title of this paper is unproven. We have not ruled out the possibility that a single neuron could ride a bicycle.
² This is ignoring the torque effect due to the spinning of the front wheel, but if you take that into account, it too has exactly the same effect as the effect described above (pushing to the left makes you lean to the right).
³ See footnote 7.
2 Methodology: Overview of the Simulator System
2.1 The Physics
In order to allow us to experiment with different bicycle controllers, we first have to set up a virtual bicycle for them to control. The equations of motion for a bicycle are somewhat complex [2], so it seems no more complicated and much more useful to just write a general robot simulator, which can read a description of an arbitrary robot (rigid bodies linked by hinge-like connections), and simulate how that robot will move given the forces being applied to it. This entails calculating the moments of inertia for each rigid body, simulating the motion of a single rigid body given forces acting on it [5], and solving a system of equations at each step for how the hinge-like connections can apply forces to the parts of the robot so that the alignment and co-location requirements of the hinges are met.⁴

2.2 The Bicycle Robot
Once we have such a general purpose physics simulator, then we can turn to setting up a robot, in this case a bicycle. A bicycle is composed of four rigid bodies: the two wheels, the frame, and the front fork (the steering column). Each adjacent pair of parts is connected with a joint that allows rotation along a defined axis, and the wheels are connected to the ground by requiring that their lowest point must have zero height and no horizontal motion (no sliding).

Figure 1: The virtual bicycle.

Beyond specifying the construction and connections that form the bicycle, we need to decide what sensory input should be available to the controller, and how the controller’s outputs should be converted into forces on the bicycle (in robotics terms, what the sensors and actuators should be for this non-holonomic under-actuated system). For our bicycle, we allow all the easily perceivable quantities to be available to an interested controller: position, heading, speed, angle of the handlebars (and its rate of change), and the amount the bicycle is leaning (and its rate of change). For actuators, we allow a torque on the back wheel and a torque on the handlebars. Humans also make good use of leaning to one side or the other when they ride, but we will not have such a control on the riderless bicycle. Also, we do not allow the controller to know the specifics of the bicycle, such as its exact proportions or the masses of its parts.

⁴ Even the support of the wheel by the ground counts as a hinge-like connection for this purpose.
2.3 The Controller
Once we have set up the robot bicycle, we can turn to the task of interest: designing a controller for the bicycle. We want the controller to solve the same problem that a human solves when riding the bicycle. The human knows where they are, which way they are going, how fast they are going, how the bicycle is leaning, and so on, but as we know from experience the human does not need to know the specifics of the construction of the bicycle.

Here we are finally faced with a problem that we do not, a priori, know how to solve. So we stare at the ceiling for a while, and whenever we are struck with some inspiration, we quickly write a controller based on it.

There are three main styles of controller (prescient, human, and two-neuron) that have led us to interesting results or observations, and we will discuss them in the next three sections. None of them made significant use of the speed; they all managed to control the bicycle using just the handlebars. We will not discuss here those controllers which did nothing but crash the bicycle at every opportunity.
3 The Prescient Controller: A Look at Reinforcement Learning
One interesting idea for a controller, given that the entire system is being simulated, is to let the controller cheat by giving it access to the simulator. This could not be done with a controller for a bicycle in the real world, so it is not of interest for applications, but we can certainly try it in the simulated world to see what happens.

In particular, we can try the following algorithm for the controller: At each step, first simulate and compare three actions. The actions only differ in how the handlebars are pushed at the first instant: pushed left, pushed right, or not touched. The remainder of each of the three actions is to do nothing until the bicycle crashes. These three actions can then be compared on the basis of which one causes the bicycle to remain upright for the longest time, which one results in the most progress to the right, or whatever other criterion one decides to optimize. After simulating the results of the three actions, the controller decides what to do at this instant based on those results. (Each different criterion is thus the basis for a different controller.)
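The three-action lookahead described above can be sketched in a few lines. This is only an illustration under loud assumptions: the paper's rigid-body simulator is not reproduced here, so the sketch swaps in a hypothetical one-dimensional toy model (a lean angle that grows on its own and is nudged by a handlebar push). All names and constants (`State`, `step`, `time_upright`, `prescient_action`, `ride`, `A`, `B`, `DT`, `FALL`) are invented for this example, and the criterion used is time spent upright.

```python
from dataclasses import dataclass

# Toy constants (assumptions, not taken from the paper's simulator).
A = 9.0      # how quickly an existing lean grows on its own
B = 5.0      # how strongly a push changes the lean rate
DT = 0.02    # time step in seconds
FALL = 0.5   # lean angle (radians) beyond which the toy bicycle has crashed


@dataclass(frozen=True)
class State:
    lean: float
    lean_rate: float


def step(s: State, push: float) -> State:
    """Advance the toy model by one time step under a push in {-1, 0, +1}."""
    rate = s.lean_rate + (A * s.lean + B * push) * DT
    return State(s.lean + rate * DT, rate)


def fallen(s: State) -> bool:
    return abs(s.lean) > FALL


def time_upright(s: State, first_push: float, horizon: int = 500) -> int:
    """Apply first_push once, then do nothing; count steps until the crash."""
    s = step(s, first_push)
    for t in range(horizon):
        if fallen(s):
            return t
        s = step(s, 0.0)
    return horizon


def prescient_action(s: State) -> float:
    """Simulate the three candidate actions; keep the one that stays up longest."""
    return max((0.0, -1.0, 1.0), key=lambda push: time_upright(s, push))


def ride(s: State, policy, steps: int):
    """Run a policy for up to `steps` steps; return (steps survived, final state)."""
    for t in range(steps):
        if fallen(s):
            return t, s
        s = step(s, policy(s))
    return steps, s
```

Calling `ride(State(0.05, 0.0), prescient_action, 300)` runs the lookahead controller, while passing `lambda s: 0.0` as the policy gives the "never touch the handlebars" baseline. Swapping `time_upright` for another scoring function yields a different controller, in the spirit of the parenthetical remark above.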
These simulations were tried with and without random mild forces (“wind”) being applied to the bicycle. The original motivation for this was so that the controller would not be able to rely on an absolutely perfect prediction of the future. It might also help the controller to have a more “continuous” behavior, since over the course of several consecutive instants, it would be getting a rough estimation of the probability distribution for success of each of its actions, leading to the controller taking a similarly distributed action. However, such wind turned out in fact to have no significant effect on the results.
In the language of reinforcement learning, such a controller is exactly what you would get after one step of policy iteration, if you start with the null policy of never touching the handlebars, and allow yourself three actions at each step (push left, push right, or no push). Then if the controller learns the value function for this policy (which in practice would require lots of experience with not touching the handlebars, but which we simulate by giving the controller access to the simulator), it can then act greedily with respect to that value function. This amounts to one step of policy iteration, and at least for the goal of not falling over, an optimal policy is indeed obtained after a single iteration (i.e., it successfully doesn’t fall down). However, it does not do this in a conventional way, say by riding in a straight line, but rather manages to maintain stability at near-zero speed by doing stunts with the front wheel, for example by spinning the handlebars in circles (the handlebars and front wheel do not bump into the frame for our bicycle, and there are no cables to get twisted, so why not?). A movie of this bizarre behavior can be seen at:
http://www.paradise.caltech.edu/~cook/Warehouse/RecursiveBike.avi

Figure 2: Instability of an unsteered bicycle. This shows 800 runs of a bicycle being pushed to the right. For each run, the path of the front wheel on the ground is shown until the bicycle has fallen over. The unstable oscillatory nature is due to the subcritical speed of the bicycle, which loses further speed with each oscillation.
Despite many attempts at formulating a sensible value function, we found it difficult to get sensible behavior out of the bicycle. By rewarding uprightness, the bicycle would stop riding normally and start doing stunts as described above. If we tried to discourage this by rewarding speed, the bicycle would swoop from side to side, where each swoop results in a temporary increase in speed. If we tried to discourage this by rewarding going in a straight line, the bicycle would do this very nicely, but of course it would fall over right away, as avoiding the fall would have required deviating from the straight line. Of course, one could try weighted combinations of these or other ideas, but then the question starts to be not how long it will take the controller to learn to ride the bicycle, but how long it will take us to learn how to program the controller to get it to ride normally. As has been pointed out by people who have worked with reinforcement learning, it can be a very tricky business trying to pick a good value function.
Even if we had had marvelous success with this method, it would not be immediately applicable to a more realistic situation, due to its reliance on getting answers from the simulator to “what if?” questions, effectively having access to an oracle. However, it would have provided a hint that reinforcement learning could be used effectively. As it was, the hint we got was that reinforcement learning would have a hard time of it. This is also the hint we were getting from Randløv and Alstrøm’s nice paper [1], where they tried various reinforcement learning methods to get their controller to learn to ride a bicycle, but even with the best methods it took thousands of practice rides to learn not to fall down, and thousands more to learn to ride towards a goal, and even after having “learned,” the controller appeared to have a rather drunken behavior at best.
4 The Human Controller
One venerable method of learning is to be taught by an expert, perhaps by example. The obvious expert in this case would be a skilled human. To enable this, we had the computer present a real-time graphical display of the bicycle, allowing a human to use the keyboard to control the pedaling and the pushing of the handlebars. As for myself, this led to the experience described in the introduction, where I discovered that controlling the bicycle is quite counter-intuitive.⁵

The human controllers who tried to learn to drive the virtual bicycle found subjectively that it was important to pay attention to the angle at which the bicycle was leaning, and to concentrate on manipulating this angle. This observation became the basis for the two-neuron controller described below. None of the humans became proficient at getting the bicycle to travel in a precise direction, a skill demonstrated nicely by the two-neuron controller. One cannot rule out the possibility that the humans might have gained this skill with further practice.

There are two ways in which we tried to use the human expertise. One way was by recording what was going on during one of their rides. Collecting this data and then analyzing it for pertinent correlations yielded surprisingly little in the way of useful information.
[Figure 3: a 16 × 16 covariance plot. Both axes are indexed 1 to 16 as follows. 1: $t$; 2: $x$; 3: $y$; 4: $\theta$ (heading); 5: $s$ (speed); 6: $\gamma$ (angle with vertical); 7: $\alpha$ (handlebar angle); 8: $s_i$ (intended speed); 9: $\tau_h$ (torque on handlebars); 10: $\dot{t}$; 11: $\dot{x}$; 12: $\dot{y}$; 13: $\dot{\theta}$; 14: $\dot{s}$; 15: $\dot{\gamma}$; 16: $\dot{\alpha}$.]

Figure 3: The covariance between many of the quantities available to the controller. The main diagonal has been set to zero so as not to be so visually distracting. There is good correlation between the controller’s output $\tau_h$ (9) and the amount of turning $\dot{\theta}$ (13), which might seem to solve the problem immediately, but unfortunately the causality in this relationship goes the wrong way: the controller needs to adjust $\tau_h$ in response to $\gamma$ to maintain stability, where $\gamma$ (6) and $\dot{\theta}$ (13) happen to be strongly correlated. (This correlation between $\gamma$ (6) and $\dot{\theta}$ (13) can be immediately understood based on the simplest physics of centrifugal force, but we do not want our controller to need knowledge of any physical laws.) The quite useful causal relation between $\tau_h$ (9) and $\dot{\gamma}$ (15) is visible, but there is little here to clue one in to its importance for successful control.
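For readers who want to try this kind of analysis on their own ride logs, here is a minimal sketch of how such a covariance table could be computed. The paper does not say how the covariances were calculated or normalized, so the sample-covariance formula, the forward-difference derivative, and all function names below are assumptions made for illustration.

```python
def finite_difference(series, dt):
    """Approximate the time derivative of a logged signal by forward differences."""
    return [(b - a) / dt for a, b in zip(series, series[1:])]


def covariance(xs, ys):
    """Sample covariance of two equally long series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def covariance_matrix(columns):
    """Covariance between every pair of named columns, with the diagonal zeroed."""
    names = list(columns)
    return {
        a: {b: 0.0 if a == b else covariance(columns[a], columns[b]) for b in names}
        for a in names
    }
```

Given a dictionary such as `{"gamma": [...], "theta_dot": [...], "tau_h": [...]}` built from a recorded ride, `covariance_matrix` returns a nested dictionary that can be rendered as a grid like the one in the figure. As the caption warns, a large entry says nothing about which quantity drives which.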
The other way in which we tried to use human expertise was by having the humans subjectively describe how they were trying to control the bicycle. Since they had just gone through the experience of having to learn how to do it, they were reasonably well able to describe what their algorithm was, and as it turned out they had developed similar techniques, based on carefully adjusting the angle at which the bicycle is leaning.

Based on the reports of the humans, the two-neuron network controller was implemented, and it worked almost immediately, having few parameters, and not being overly sensitive to any of them. This method is the subject of the next section.

⁵ This at least explains in part why it takes a certain amount of practice for a human to become a proficient bicycle rider in the real world.
5 The Two-Neuron Network
Here we present a two-neuron network which can operate the bicycle competently over a range of speeds. The output of the first neuron is fed into the second neuron, whose output is connected to an actuator which applies the specified amount of torque to the handlebars. As inputs to the network, we provide the desired heading $\theta_d$, as well as the current heading $\theta$ and the degree to which the bicycle is currently leaning $\gamma$, along with their derivatives $\dot{\theta}$ and $\dot{\gamma}$.⁶

Due to the nature of the problem, we will use a network that is continuous both in time and in values. A unit’s output is determined simply by a thresholding function of a weighted sum of the unit’s inputs. If necessary, this can be interpreted as a mean-firing-rate model, but we will not explore issues of network realism here. Given that such a small network of this type suffices, there seems little doubt that more realistic networks could solve the problem as well.

The task for the network will be to make the bicycle travel in the desired direction. This can then be used by higher-level planning systems to make the bicycle head towards a goal, or to follow a path by heading towards a sequence of waypoints.
In order to set the bicycle’s heading $\theta$ as desired, we need to be able to control $\dot{\theta}$. We know from figure 3 that $\dot{\theta}$ is strongly related to $\gamma$, the amount the bicycle is leaning, so we can try to control $\dot{\theta}$ indirectly by simply controlling $\gamma$.⁷
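The link between lean and turn rate can be made concrete with the centrifugal-force argument alluded to in the caption of figure 3. The relation below is a textbook idealization, not something the controller uses or the paper derives: it assumes a steady turn of radius $r$ at speed $s$, treats the bicycle as a point mass $m$, and introduces $g$ for gravitational acceleration (the symbols $r$, $m$, and $g$ are not in the original).

```latex
\[
  \tan\gamma \;=\; \frac{m s^{2}/r}{m g} \;=\; \frac{s^{2}}{g\,r} \;=\; \frac{s\,\dot{\theta}}{g},
  \qquad \text{since } \dot{\theta} = \frac{s}{r}.
\]
```

For small lean angles this gives $\dot{\theta} \approx (g/s)\,\gamma$, so at a fixed speed the turn rate is roughly proportional to the lean, which is consistent with steering the heading by choosing a lean angle.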
To control $\gamma$, we need control over $\dot{\gamma}$. And indeed, our actuator, which can exert a desired torque on the handlebars, happens to have reasonable control of $\dot{\gamma}$. There is not a direct or fixed correspondence, but as a general rule, during stable riding, a higher clockwise torque on the handlebars will cause the bicycle to start leaning more to the left. Thus, by setting the torque according to how we would like $\gamma$ to change, we should be able to have $\gamma$ converge towards its desired value. (Note that it doesn’t make a big difference if the actual convergence is just towards some approximation of $\gamma$’s desired value; the exact desired value is not critical for this method of control to succeed.)
The first neuron in our circuit will output the desired $\gamma$, with the nonlinearity being applied so that the bicycle doesn’t try to lean too far over.⁸ The second neuron in our circuit will output the desired torque to be applied to the handlebars.

The first neuron will take as inputs $\theta$ and $\theta_d$ (which one can assume to be within $\pm\pi$ of $\theta$, although this is not essential), and simply calculate the desired change in heading $\theta_d - \theta$, multiply by a constant, apply a threshold, and then output the result, which we will denote by $\gamma_d$.

⁶ Actually, as we will see, the network does not even need to use $\dot{\theta}$.
⁷ One of the reasons that controlling $\gamma$ works is due to the realistic bicycle geometry. Real bicycles are designed to be stable, which allows a rider to ride without holding the handlebars, simply by controlling the amount the bicycle is leaning. We note that one typical important factor in stability is that the axis of rotation of the front fork should pass below the hub of the front wheel but above its point of contact with the ground, a feature we have duplicated on the virtual bicycle.
⁸ Although most of us do not have direct experience with this, bicycles can become quite unstable if they are in a state of extreme leaning. In real life, usually the wheels skid out from under us before this point is reached. In this simulator, skidding does not occur, but since this controller specifically avoids states of extreme leaning, it thereby avoids the problem entirely, regardless of whether the problem is that of slipping or that of becoming unstable.
The second neuron will take as inputs $\gamma_d$, $\gamma$, $\dot{\gamma}$, and its own output, which we will denote by $\phi$. It will calculate the amount by which $\gamma$ should be changed, $\gamma_d - \gamma$, and compare that to a constant times the current rate $\dot{\gamma}$ at which $\gamma$ is changing. The difference between these, how much $\dot{\gamma}$ should be adjusted by, is then scaled and sent as the controller’s output $\tau_h$ of how much torque to apply to the handlebars.
$$\gamma_d = \sigma(c_1 \theta_d - c_1 \theta)$$
$$\tau_h = c_2 \gamma_d - c_2 \gamma - c_3 \dot{\gamma}$$

Figure 4: The equations for the two-neuron network. $\sigma$ denotes a thresholding function. The three constants $c_i$ need to be set by the implementor in light of the bicycle’s stability characteristics, but the network’s behavior is not too sensitive to their precise values, so it is actually quite easy to get a working network.
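The two equations of figure 4 translate almost directly into code. The sketch below is one possible reading, not the author's implementation: the paper does not specify the form of $\sigma$, so a simple clip to a maximum lean angle is assumed, the heading error is wrapped into $[-\pi, \pi)$ as the text suggests, and the default values of `c1`, `c2`, `c3`, and `max_lean` are placeholders that would have to be tuned for a particular bicycle. Sign conventions for lean and torque are likewise left to the implementor.

```python
import math


def wrap_angle(angle: float) -> float:
    """Map an angle to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def saturate(x: float, limit: float) -> float:
    """One possible thresholding function sigma: clip x to [-limit, limit]."""
    return max(-limit, min(limit, x))


def two_neuron_controller(theta_d, theta, gamma, gamma_dot,
                          c1=1.0, c2=10.0, c3=2.0, max_lean=0.3):
    """Return (gamma_d, tau_h) following the two equations of figure 4."""
    # Neuron 1: turn the heading error into a bounded desired lean angle.
    gamma_d = saturate(c1 * wrap_angle(theta_d - theta), max_lean)
    # Neuron 2: torque proportional to the lean error, damped by the lean rate.
    tau_h = c2 * (gamma_d - gamma) - c3 * gamma_dot
    return gamma_d, tau_h
```

In a simulation loop one would call `two_neuron_controller` once per time step with the current sensor readings and apply the returned `tau_h` to the handlebars. Note that $\dot{\theta}$ never appears among the arguments.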
6 Results
The two-neuron network controller does remarkably well at controlling the bicycle, as we can see in figure 5.

Figure 5: The path taken by the bicycle when told to aim at the successive waypoints shown. Each time the bicycle got within a certain distance of its current target waypoint, the target would change to become the next waypoint in the sequence. This distance can be seen as the distance between the last waypoint and the end of the bicycle’s path shown. The irregularity in the writing is due to the author’s messy handwriting when trying to write with a mouse, and is not the fault of the bicycle.
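The waypoint-switching rule in the caption is simple enough to sketch. The helper names (`heading_to`, `next_target_index`) and the convention that headings are measured counterclockwise from the +x axis are assumptions made for this illustration; the paper does not give its coordinate conventions or the switching distance used.

```python
import math


def heading_to(position, waypoint):
    """Heading in radians (counterclockwise from the +x axis) toward a waypoint."""
    return math.atan2(waypoint[1] - position[1], waypoint[0] - position[0])


def next_target_index(position, waypoints, index, switch_distance):
    """Advance to the next waypoint once the current one is within switch_distance."""
    if index < len(waypoints) and math.dist(position, waypoints[index]) < switch_distance:
        return index + 1
    return index
```

At each time step a higher-level loop would update the index with `next_target_index` and, while the index is still within the list, feed `heading_to(position, waypoints[index])` to the network as the desired heading $\theta_d$. An index equal to `len(waypoints)` signals that the last waypoint has been reached.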
Although the two-neuron network controller works well for a range of speeds, one thing the controller does not do is to try to dampen the instabilities that can arise when riding too slowly or in too sharp of a turn. (This would probably require a third neuron that is dedicated to this task.)
7 Future Directions: More Automated Learning
The future directions of interest to us have to do with using this system to understand more about learning. There are other interesting future directions, such as building a real bicycle robot, which we probably will not pursue.

One obvious thing to do with this system is to have it learn and tune its parameters with experience. Ideally, if the system is placed on a slightly different bicycle, it should quickly⁹ learn how to successfully operate the new bicycle.

This network was designed in an ad-hoc, if traditional, way. First, a human tried to control the bicycle with the simulator. After many attempts, the human finally became a somewhat skilled operator of the bicycle, able to avoid falling down but still not able to head reliably towards a desired goal. Nonetheless, the human at this point was able to describe the key parameters which were being attended to, and based on this, the two-neuron network was designed. The skill of this network immediately exceeded the human’s skill.

However, we would like to take the human out of this loop. We would like the computer to be able to figure out on its own what simple network might work, using a minimum of experience (i.e., a minimum number of crashes before mastery of the bicycle), and using no detailed knowledge of the physical system. Our attempts so far have been statistically based, and have not been very successful. We feel that we need to have a causal model for what is observed, where the direction of causality is part of what is observed. Causal networks (belief propagation networks) might be a good representation for such knowledge, but the real issue is how to have such a network be automatically designed for us. It is our opinion that a general solution to this problem would have many applications.
Acknowledgments
I would like to thank Shuki Bruck and Erik Winfree for many helpful discussions. This work was supported in part by the “Alpha Project” that is funded by a grant from the National Human Genome Research Institute (Grant No. P50 HG02370).
References
[1] Learning to Drive a Bicycle using Reinforcement Learning and Shaping, Jette Randløv and Preben Alstrøm, Proceedings of the Fifteenth International Conference on Machine Learning (ISBN: 1-55860-556-8), 1998, pp. 463–471.
[2] Control for an Autonomous Bicycle, Neil H. Getz and Jerrold E. Marsden, IEEE International Conference on Robotics and Automation, 1995.
[3] Descriptor Predictive Control: Tracking Controllers for a Riderless Bicycle, D. von Wissel, R. Nikoukhah, F. Delebecque, and S. L. Campbell, Proc. Computational Engineering in Systems Applications, Lille, France, 1996, pp. 292–297.
[4] Steering Control System Design and Implementation of a Riderless Bicycle, Chi-Da Chen and C. C. Tsai, Journal of Technology, vol. 16, no. 2, pp. 243–251 (July 2001). NSC89-2213-E-005-052 [Note: I have been unable to locate a copy of this paper, but I am including the information I have on it here anyway.]
[5] Accurate and Efficient Simulation of Rigid Body Rotations, Samuel R. Buss, Journal of Computational Physics, vol. 164, no. 2, pp. 377–406 (November 2000).
⁹ Before falling over even once, one would hope.