InWorldz Tech Blog

Where your Dreams are our Vision! What goes on behind the scenes

Sep
14

Internet scale architecture and InWorldz

Whether it runs on dedicated servers, virtual machines, or a cloud computing host, a business that grows on the internet will need to devote more and more resources to supporting the core of its operations. These core services need to stay up and running at all costs, and need to be able to scale out with demand. Any failure in these core services will cause total downtime.

Without proper support for its core business processes, a business on the internet is doomed to fail under high growth or temporary high load. To application engineers this doesn’t become apparent until they scale out past the single server level, or hit a write scaling issue that a traditional database cluster can’t help with.

With that said, I present a rough diagram of the InWorldz core architecture for Q4 2011 and into the future.

Important services to our customers include:

  • Asset Cluster: (4 servers) Access to the assets that represent all the textures, sounds, scripts and other low level data in world.
  • Inventory Cluster: (4 servers) Access to the structure of your inventory: the way your folders and items are laid out, plus the names of and permissions on the items in your inventory.
  • Region Data Cluster: (3 servers) Where all the “stuff” comes from to bring your region back when it restarts, and where all the changes are stored.
  • Simulation: (lots of servers with Virtual Machine partitions) The pool of servers that powers the actual simulation on your region. These are the machines you actually connect with when you log in.

You’ll note there are 13 servers that power just the InWorldz, LLC core. The first cluster in the diagram holds in world assets. Our asset systems run custom written software that helps to load balance and make sure assets are available at high speeds to our customers. These assets are stored in a format that is fast to write and easy to back up.

The inventory cluster will be running a NoSQL solution to support heavy reads and writes as well as a three server replication level. This means we can lose up to two servers out of the cluster simultaneously and not have data loss that would result in having to restore a backup. As inventory requirements grow, so does this cluster, simply by adding servers and maintaining a data balance. This is the same type of technology used by larger internet sites such as Facebook and Twitter, and it will be employed at InWorldz to provide you with fast and redundant inventory access.
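To make the “lose up to two servers” claim concrete, here is a minimal sketch in Python. It is not the placement algorithm of any particular NoSQL product: the server names, the toy hash, and the ring-style placement are all assumptions made for illustration. It only shows that when every item is written to three distinct servers, no pair of simultaneous failures can remove every copy.

```python
from itertools import combinations

SERVERS = ["inv-1", "inv-2", "inv-3", "inv-4"]  # hypothetical server names
REPLICATION = 3                                  # three server replication level


def replicas_for(key, servers=SERVERS, copies=REPLICATION):
    """Pick `copies` consecutive servers on a ring, starting at a toy hash of the key."""
    start = sum(key.encode("utf-8")) % len(servers)
    return [servers[(start + i) % len(servers)] for i in range(copies)]


def survives(key, failed):
    """True if at least one replica of `key` lives on a server that has not failed."""
    return any(server not in failed for server in replicas_for(key))


keys = [f"folder-{i}" for i in range(100)]       # hypothetical inventory keys

# Every key survives every possible pair of simultaneous server failures.
two_failures_ok = all(
    survives(key, set(pair)) for key in keys for pair in combinations(SERVERS, 2)
)

# Losing all three servers that hold a key does lose that key.
three_failures_ok = survives("folder-0", set(replicas_for("folder-0")))

print(two_failures_ok, three_failures_ok)
```

Growing the cluster in this sketch just means adding a name to the server list; a real system would also have to move existing data to keep the balance.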

The region data cluster will utilize a high performance SAN and a sharded MySQL installation on a high availability virtual machine software platform.

All this is meaningless without good hardware and a good host.

All of these servers run on our own gigabit network with the core of the datacenter running at 10 Gb/s. All of our servers have dual power supplies connected to fully redundant batteries and backup generators, and we have a scalable connection to the internet.

Recently San Diego had a major power outage that affected 1.4 million of SDG&E’s customers. Our host Cari.Net is one of those customers. But we didn’t notice at all, and didn’t have any hiccups in service. This is because the power infrastructure at Cari.Net is second to none with redundancy all the way to the supply, two huge generators that keep everything running, and a fuel delivery service to keep those generators pumping until the power comes back on. (for more details click here)

We are proud to have a partner working with us that feels the same way we do about quality.

Why does InWorldz need all this “stuff”? Because we’re thinking big, not small. We want to power your dreams, not your catnaps. We’re looking into the future where we see a need for supporting vast concurrency on an interconnected grid with thousands of concurrent visitors. We’re looking into a future where virtual world technology is so important to some that it cannot bear frequent or long-lived downtime and outages.

Could we run the grid much cheaper? Yes we could. Would it sacrifice quality? Absolutely, and unacceptably so. As a company that is in this for the long haul we have no intention of trying to be the cheapest flavor of the month. We’re putting serious thought and investment into our network, our partners, and our people to ensure we can grow. We believe we run at a price point that is sustainable for growth and allows us to heavily invest in R&D and quality of service for the future.

There will be bumps in the road while we work towards this ideal, and it is no doubt a long road. There will be continuous learning and changing but in the end a persistent vision of a reliable, dependable world will be realized. At InWorldz, we don’t treat virtual worlds as games. We know they represent far more than that to the people that use them.

Jul
26

Non-Physical Land Racer on Phlox

Enjoy!

Jun
06

The Phlox Virtual Machine

My intention for this blog is not only to display some of the work that InWorldz has done to our grid, but also to share some of what we’ve learned and hopefully get aspiring geeks, junior geeks, and certified geeks excited about virtual worlds and all the possibilities they bring. These articles will be long but I hope they can be of some value to you.

Along with the technical challenges we have to meet comes a lot of hard work, but also a lot of fun, and a lot of learning. This article is the beginning of a multi-part series on the construction of the Phlox script engine. I’m hoping that by presenting this material it can spark an interest in software technology and how it can be applied to anything from something fun like a virtual world to something serious like a domain specific language for bank transactions.

Normally with a series on script engines I would start from the bottom at the parser and lexer.  However, I think that starting at the end in this case will give some perspective on what the lower components are trying to accomplish. Therefore I’d like to introduce you to the component called the virtual machine. Let’s break down those two words:

Virtual: Existing in essence or effect though not in actual fact
Machine: A device having parts that perform or assist in performing any type of work

So by definition a virtual machine in the context of computing is a machine that performs the work of a physical computer but doesn’t actually exist. Much like a virtual world, a virtual machine has a lot in common with the real thing. It is expected that if you feed a virtual machine instructions it will perform operations in a similar manner to a real, silicon based, honest to goodness computer.

You may not know it, but you probably use a virtual machine somewhere in your everyday life. If you utilize InWorldz as your virtual world host, you have actually been using two different virtual machines to work and play, even before Phlox was ever released.

At the first level, our software and OpenSim proper run on top of one of the flavors of the CLR or Common Language Runtime. Originally developed by Microsoft and then also implemented as an open source product called Mono, this platform powers a great deal of software available today. The virtual machine behind both of these runtimes understands a language called CIL or Common Intermediate Language (more information here). The CIL bytecodes are the instructions that tell the machine how to run the OpenSim software.

Running below the InWorldz server software is actually another virtual machine that pretends to be real hardware to the operating system. This allows InWorldz to easily change server configurations, add and remove RAM and even processor cores. These virtual machines are out of the scope of this article but are related in many ways to the types of VMs we’ll talk about.

Outside the realm of InWorldz, virtual machines are also used in portable devices. If you happen to have an Android powered phone in your pocket, you are also using a virtual machine called Dalvik (really geeky stuff here) which efficiently runs all the apps you’ve decided to download this week.

Dalvik, Phlox, and the CLR all mirror their physical counterparts not only in function, but also in design. In your desktop computer at the lowest levels your CPU has support for storing data in registers and on something called the stack. Likewise, there are two types of virtual machine designs. The first is a stack machine, and the second is a register machine. Each of the two designs comes with trade-offs in runtime performance and implementation complexity. Registers... Stacks... What are these things and what do we need them for?

In a virtual machine (as well as a real machine), registers and the stack are used to store temporary results that are needed while we complete a problem. Though both stack and register machines can do the same types of work, we’re going to concentrate on stack based machine examples, since Phlox is a stack machine.

Each problem given to a machine is reduced to its simplest form and then the machine solves and combines the results of many small problems into a solution for the whole thing. A very simple example follows:

You need to design a machine that can add an unlimited amount of numbers together. Work with the sample 1 + 2 + 3 + 4 to come up with a solution.

The first thing we’ll notice is that there is no limit to the number of terms we might need to deal with. We can’t just design a machine that can only add exactly four numbers. We need to deal with any amount. If we were to design this adding machine, we would have to add two of the numbers together at a time, and then put the result somewhere. Then we would need to add that result to the next term and store that result somewhere, and so on until the addition problem was completely solved and we had a final answer.

[1 + 2] + 3 + 4
1 + 2 = (3)

[(3) + 3] + 4
3 + 3 = (6)

[(6) + 4]
6 + 4 = (10)

[(10)]

Using this method and a stack to complete the example problem above would look something like this:

Push the number 1 onto the stack (stack: [1])
Push the number 2 onto the stack (stack: [1, 2])
Pop the top two numbers off of the stack and add them [1 + 2]. Push the result of the addition (3) onto the stack (stack: [3])
Push the next number 3 onto the stack (stack: [3, 3])
Pop the top two numbers off of the stack and add them [(3) + 3]. Push the result of the addition (6) onto the stack (stack: [6])
Push the next number 4 onto the stack (stack: [6, 4])
Pop the top two numbers off of the stack and add them [(6) + 4]. Push the result of the addition (10) onto the stack (stack: [10])
The result of the entire operation remains at the top of the stack. (10)
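The same walk-through can be written as a short Python sketch. This is only an illustration of the push and pop steps above, not Phlox code, and the function name add_all is made up for the example. It records the stack after every step so the trace can be compared with the list above.

```python
def add_all(terms):
    """Add any number of terms using only a stack, two operands at a time."""
    stack = []
    trace = []                      # snapshot of the stack after every step
    for term in terms:
        stack.append(term)          # push the next number
        trace.append(list(stack))
        if len(stack) == 2:
            right = stack.pop()     # pop the top two numbers...
            left = stack.pop()
            stack.append(left + right)  # ...and push their sum
            trace.append(list(stack))
    return stack[-1], trace


result, trace = add_all([1, 2, 3, 4])
print(result)
for snapshot in trace:
    print(snapshot)
```

Because the loop never looks at how many terms there are, the same machine handles four numbers or four thousand.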

As we can see, a simple adding stack machine would need instructions for pushing variables and constants onto the stack, and for adding stack members together. Each instruction we add to the machine would be given a unique number. This number would allow the machine to read through the stream of instructions and perform the simple operations. When defining these numbers, which we call bytecodes, we may also give them a name that makes sense to us as people. Instead of referring to the add instruction as instruction #1, we might simply call it “ADD”. Likewise for pushing an operand onto the stack, we wouldn’t want to say “use instruction #2”; we would simply want to say “use PUSH”. This leads us to a representation of our source code where instructions can be listed out in the lowest level human readable form before actually being assembled into the raw numbers the virtual machine will use.

The bytecode for our simple adding machine example above could be:

push 1
push 2
add
push 3
add
push 4
add
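To show how the named instructions relate to the raw numbers, here is a minimal assembler and interpreter sketch in Python. The numbering (1 for add, 2 for push) follows the example in the text; the function names and the flat list encoding are assumptions for illustration and say nothing about how Phlox actually encodes its bytecode.

```python
OPCODES = {"add": 1, "push": 2}   # numbering taken from the example in the text


def assemble(source):
    """Turn human readable lines like 'push 1' into a flat list of numbers."""
    code = []
    for line in source.strip().splitlines():
        parts = line.split()
        if not parts:
            continue
        code.append(OPCODES[parts[0]])
        if parts[0] == "push":
            code.append(int(parts[1]))   # the operand follows the opcode
    return code


def run(code):
    """Read through the stream of numbers and perform each operation."""
    stack = []
    pc = 0                               # index of the next number to read
    while pc < len(code):
        op = code[pc]
        pc += 1
        if op == OPCODES["push"]:
            stack.append(code[pc])
            pc += 1
        elif op == OPCODES["add"]:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op}")
    return stack


program = """
push 1
push 2
add
push 3
add
push 4
add
"""

bytecode = assemble(program)
final_stack = run(bytecode)
print(bytecode)
print(final_stack)
```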

The Phlox virtual machine has an instruction set very similar to this, but with a lot more operations to be able to deal with all the features of LSL and general purpose programming languages. When you input LSL source code into the text editor on InWorldz and press save, a lot of really neat things happen. First the lexer and parser make sure that the source code is well formed and grammatically correct. Then Phlox performs semantic checks to make sure you’ve not mixed up types, that you haven’t passed the wrong number of arguments to a function, and that you haven’t made other similar mistakes. Once these tests pass, the compiler will compile your script into Phlox bytecode, some of which we can see below:

Everything above the ^Z in the image is the LSL source code. Below it is the byte compiler output for the simple program that adds 4, 5 and 6 together. The form of the Phlox bytecode is very similar to our simple adding machine in this case, but we can also see that Phlox has additional responsibilities over and above the adding machine.

  • .globals Tells the Phlox VM how much space we need for global variables. In this case there is one global variable, i, hence the line .globals 1
  • .statedef Defines all the states that are in this LSL script. If there were more states, there would be more .statedef lines
  • iconst Pushes an integer constant onto the top of the stack
  • iadd Adds the top two integers on the stack together
  • gstore Removes the top operand on the stack and stores it in globals memory (in this case, the variable i)
  • halt Tells the machine that this script is suspended until it receives another event
  • .evt Tells the Phlox VM about an event present in this script. In this case we only have one, which is state_entry and it is part of the default state. It takes no arguments (args=0) and there are no local variables defined inside of the event handler (locals=0)
  • ret Returns to the caller. When called from an event, the script is suspended until another event is triggered.

Every time you press save on a script, this intermediate form is generated before finally compiling down to the machine level bytecode.

InWorldz chose to design a virtual machine for many reasons, but the biggest one comes down to control. The first problem is that when you compile a script to native code, as was being done before Phlox, you lose all scheduling control to the operating system. You cannot choose how many instructions will be executed for each script, and scripts can run forever in a loop, consuming an entire operating system thread. This is bad because each of those threads also comes with a memory penalty equivalent to its stack size (usually around 1 MB), and too many threads trying to run at the same time will cause the processor to make tons of context switches, eating up valuable CPU time that you would really rather be using for the rest of the simulation than for your avatar’s animation override.
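What “choosing how many instructions will be executed for each script” can look like is sketched below in Python. This is an assumed, simplified scheduler, not the Phlox one: the Script class, the tuple instruction format, and the budget of four instructions per slice are all made up for the example. The point is that a script stuck in an infinite loop only ever gets its slice, on a single thread, and the well behaved script next to it still finishes.

```python
class Script:
    """Hypothetical script record: code plus its own program counter and stack."""

    def __init__(self, name, code):
        self.name = name
        self.code = code
        self.pc = 0
        self.stack = []
        self.executed = 0
        self.done = False


def step(script):
    """Execute exactly one instruction of one script."""
    op, arg = script.code[script.pc]
    script.pc += 1
    script.executed += 1
    if op == "push":
        script.stack.append(arg)
    elif op == "add":
        right = script.stack.pop()
        left = script.stack.pop()
        script.stack.append(left + right)
    elif op == "jump":
        script.pc = arg
    elif op == "halt":
        script.done = True
    else:
        raise ValueError(f"unknown instruction {op}")


def run_passes(scripts, budget, passes):
    """Give every script at most `budget` instructions per pass, in turn."""
    for _pass in range(passes):
        for script in scripts:
            for _tick in range(budget):
                if script.done:
                    break
                step(script)


spinner = Script("spinner", [("jump", 0)])            # loops forever
adder = Script("adder", [("push", 4), ("push", 5), ("add", None),
                         ("push", 6), ("add", None), ("halt", None)])

run_passes([spinner, adder], budget=4, passes=3)
print(spinner.executed, adder.executed, adder.stack, adder.done)
```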

If compiling to C# (as was done on InWorldz pre-Phlox), your best bet for ensuring a script does not run too long is to generate a yield return trampoline that emulates coroutines (http://www.replicator.org/node/80). Essentially you’re adding in the C# yield return keyword at specified points in the program, say after every few lines of code. This solution would mostly work for allowing you to better control the scheduling of your scripts and would prevent a script from eating up an operating system thread, but it makes the design of the script engine very cumbersome. Every function now needs to return an enumerator type instead of its normal return type, all the compiled code needs to be able to deal with these and other changes, and you still haven’t solved all the requirements of an LSL environment.
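The shape of that approach can be illustrated with Python generators, which play a role similar to C# iterators. This is an analogy only, not the code the old engine generated, and every name in it is made up. Notice how even a small helper function has to become a generator, and how its caller has to use a special form to get the return value back; that is the cumbersome part described above.

```python
def helper(name, log):
    """A 'compiled' user function: it must be a generator so it can yield."""
    log.append(f"{name}:helper")
    yield                      # inserted yield point
    return 42                  # the real return value travels via the generator


def script(name, log):
    """A 'compiled' event handler stuck in an endless loop."""
    while True:
        value = yield from helper(name, log)   # every call site changes shape
        log.append(f"{name}:{value}")
        yield                  # inserted yield point


def trampoline(scripts, rounds):
    """Resume each script until its next yield point, one after another."""
    for _round in range(rounds):
        for running in scripts:
            next(running)


log = []
trampoline([script("a", log), script("b", log)], rounds=2)
print(log)
```

Both endless loops make progress on a single thread, but nothing here gives access to the local variables or call stack of a suspended script.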

A big part of LSL is that your scripts need to be able to have their state retrieved and saved no matter what code they’re currently running. Even if your AO is in a tight loop selecting various animations and recursively calling functions, this must not interfere with the ability to retrieve the script’s state for a save. At any time we need to be able to pause a script, grab the values of all the global as well as local variables in the script, and preserve the current call stack no matter how deep into a recursive call you are. When you compile down to C# and run the code through the CLR, your access to this information has been revoked. Your script could be JIT compiled and be running native code. Even if you could retrieve and fully save it, the native callstack from one machine’s scripts will not translate to another machine. You’ve lost control of very important information and an incomplete state save here will break the script. The only options here are to wait for the script to exit any events before taking a state save which could take an arbitrary amount of time, or insert some voodoo magic into the runtime you’re using, and hope that it’s Mono where you have the source and can add these changes. This also puts you in a position to have to continuously update your voodoo whenever Mono makes changes leading to extra burden on development.

The design of Phlox separates the runtime state and the compiled script code into separate independent pieces. Using a custom built virtual machine, we always have everything we need to do a complete state save available to us. Even if your script is in an infinite loop calling functions we can suspend it at any time and know that the current state of the script will be entirely preserved. That makes region crossings with active scripts smooth and straightforward. This design also enables bytecode sharing, whereby loading the same script 40 times in 40 separate objects only results in one copy of the script in memory.
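A minimal sketch of that separation, in Python, might look like the following. It is an illustration under assumptions, not the Phlox implementation: the instruction format, the ScriptState class and the JSON save format are invented for the example, and a real state save would also carry globals, locals and the call stack. The code is an immutable value shared by every instance, while the program counter and stack live in a small per-script state object that can be saved in the middle of an infinite loop and resumed elsewhere.

```python
import json

# One immutable copy of the compiled code, shared by every instance.
COUNT_FOREVER = (("push", 1), ("add", None), ("jump", 0))


class ScriptState:
    """Everything that changes while a script runs (hypothetical layout)."""

    def __init__(self, pc=0, stack=None):
        self.pc = pc
        self.stack = [0] if stack is None else stack   # a counter starting at 0


def run(code, state, steps):
    """Execute `steps` instructions; the code itself is never modified."""
    for _ in range(steps):
        op, arg = code[state.pc]
        state.pc += 1
        if op == "push":
            state.stack.append(arg)
        elif op == "add":
            right = state.stack.pop()
            left = state.stack.pop()
            state.stack.append(left + right)
        elif op == "jump":
            state.pc = arg


def save(state):
    return json.dumps({"pc": state.pc, "stack": state.stack})


def restore(blob):
    data = json.loads(blob)
    return ScriptState(data["pc"], data["stack"])


# Suspend mid-loop, carry the saved state elsewhere, and carry on.
original = ScriptState()
run(COUNT_FOREVER, original, 5)
moved = restore(save(original))
run(COUNT_FOREVER, moved, 5)

# The same script run without interruption ends up in the same place.
uninterrupted = ScriptState()
run(COUNT_FOREVER, uninterrupted, 10)

# Forty objects running the script: forty states, one copy of the code.
objects = [(COUNT_FOREVER, ScriptState()) for _ in range(40)]
code_copies = len({id(code) for code, _state in objects})
print(moved.pc, moved.stack, code_copies)
```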

The virtual machine also allows us to easily track the amount of memory being used by a script, and kill it if it goes over its quota (currently 32 KB). Memory usage for each script is minimized by small bytecode since we don’t add features we don’t need. State saves are compact and quick.

Phlox and virtual machines have allowed us the freedom we needed to create an LSL environment that is fully compliant with your expectations. We can easily add features in the future and quickly fix bugs due to the architecture, and can expand the runtime environment to make sure our residents can make their dreams come to life. All of this thanks to a machine that thinks it is, but isn’t actually, a machine.
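Because every allocation a script makes goes through the virtual machine, the bookkeeping can be as simple as the Python sketch below. Only the 32 KB figure comes from the text; the Memory class, the exception name, and the rule that a string costs its UTF-8 length are assumptions for illustration and are not how Phlox actually measures memory.

```python
QUOTA_BYTES = 32 * 1024          # quota figure from the text


class QuotaExceeded(Exception):
    """Raised when a script asks for more memory than it is allowed."""


class Memory:
    """Per-script memory counter (hypothetical accounting rules)."""

    def __init__(self, quota=QUOTA_BYTES):
        self.quota = quota
        self.used = 0

    def allocate(self, nbytes):
        if self.used + nbytes > self.quota:
            raise QuotaExceeded(f"{self.used + nbytes} bytes requested, quota is {self.quota}")
        self.used += nbytes

    def release(self, nbytes):
        self.used -= nbytes


def append_strings(memory, count, text):
    """Simulate a script growing a list, charging each element to its quota."""
    items = []
    for _ in range(count):
        memory.allocate(len(text.encode("utf-8")))
        items.append(text)
    return items


memory = Memory()
append_strings(memory, 1024, "x" * 32)   # exactly 32 KB: still allowed
try:
    memory.allocate(1)                   # one byte more: the script is killed
    killed = False
except QuotaExceeded:
    killed = True
print(memory.used, killed)
```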

Apr
29

Phlox Compatibility

When designing Phlox grammar and semantics we wanted to offer the highest compatibility with the majority of scripts that are already in existence. Most of these scripts have their origin outside of InWorldz and follow strict typing rules.

Somewhere along the line in OpenSim’s design, the LSL types were coded in such a way that just about any type was automatically convertible to any other type. We call this implicit type casting, or implicit conversion. Some types should be able to be implicitly promoted to other types. A good example is an integer to a float.

float f = 1;

We expect that since no loss of information will occur and both types are numeric, this code will compile and run as expected, which it does under the Phlox engine with no warning.

During testing we found a small minority of scripts that were relying on other implicit casts that were not valid for LSL. These include “anything to string” and “anything to list”. The following will compile on our old script engine, but not in SL or on Phlox:

llSay(0, "hello!" + [integer expression]);

This leads us to why strictly typed languages can help in certain situations. For example, what would a beginner scripter expect from the following:

llSay(0, "1" + 2);

To a beginner this would look very ambiguous. Is the answer “3”, 3, or “12”? That depends on the rules of the language, but it should never have compiled in the first place without a cast:

llSay(0, "1" + (string)2);

That makes it clear we’re performing a string concatenation and not addition. Some languages, like PHP, solve this problem by having different operators for string concatenation vs arithmetic. LSL does not and so relies on semantic rules to try and prevent situations like this.
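A semantic rule of this kind can be sketched in a few lines of Python. This is a deliberately reduced illustration covering only integer, float and string; the function name is made up, and the real LSL rules for the + operator include more types than are shown here.

```python
NUMERIC = {"integer", "float"}


def add_result_type(left, right):
    """Return the type of `left + right`, or reject the expression."""
    if left in NUMERIC and right in NUMERIC:
        # integer is implicitly promoted to float when the two are mixed
        return "float" if "float" in (left, right) else "integer"
    if left == "string" and right == "string":
        return "string"                      # concatenation
    raise TypeError(f"type mismatch: {left} + {right}")


print(add_result_type("integer", "float"))   # mixing numeric types is fine
print(add_result_type("string", "string"))   # "1" + (string)2
try:
    add_result_type("string", "integer")     # "1" + 2
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```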

Good static typing also prevents passing the wrong parameters to functions:

f (string s, list l) ..

string ms;
string s;
list msl;

f(s, ms);

Under the current script engine, ms will be implicitly converted to a list and passed to the function. In this case the author probably meant to use msl. This is an actual case we found when testing scripts. Variable names have been changed to protect the innocent.
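The check that catches this mistake can be sketched as follows in Python. The function names are hypothetical and only the integer to float promotion mentioned earlier is allowed; everything else must match exactly.

```python
def can_pass(actual, expected):
    """An argument fits if the types match or integer is promoted to float."""
    return actual == expected or (actual, expected) == ("integer", "float")


def check_call(param_types, arg_types):
    """Return a list of error messages for one function call (empty if valid)."""
    if len(param_types) != len(arg_types):
        return [f"expected {len(param_types)} arguments, got {len(arg_types)}"]
    errors = []
    for position, (actual, expected) in enumerate(zip(arg_types, param_types), start=1):
        if not can_pass(actual, expected):
            errors.append(f"argument {position}: cannot pass {actual} as {expected}")
    return errors


# f(string s, list l) called as f(s, ms), where ms is a string
wrong_call = check_call(["string", "list"], ["string", "string"])
# the call the author probably meant: f(s, msl)
right_call = check_call(["string", "list"], ["string", "list"])
print(wrong_call)
print(right_call)
```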

Thus far the vast majority of scripts have not been affected by this issue. If your scripts were created in SL and ported over, or run both here and in SL, then the chances of being affected by this incompatibility are very slim. If your scripts were written completely in InWorldz, I recommend that you test them on the InWorldz sandboxes (which are now running Phlox) or the beta grid. The majority of scripts are up and running with one or two casts put in.

We’re choosing to move forward here because I don’t see the point in having a statically typed language that has very weak typing rules. That is just making people write out typenames with no purpose.

InWorldz has gathered some of our talented scripters together to help in this transition and to find any other problems with the script engine during the test rollout. If you have any questions or issues with scripts, the Phlox Task Force is here to help.

Apr
22

InWorldz Phlox – Welcome center load test

Many times when you see benchmarks on OpenSim technology you get load tests with bots and no primitives on a region. At InWorldz that’s not really representative of our everyday world, so when we want to load test a new branch of code we call upon the help of our residents, who never fail to push our software to the limit.

Tonight we tested Phlox on our 11k+ prim welcome center with everyone wearing AOs, their favorite tiny avatars, even a unicycle. The results were awesome. We were all able to move around, chat, dance and have an awful lot of fun for over 40 minutes while other avatars were coming and going. We got to 67 avatars having fun on our welcome center with much overhead available to have even more people stop by. It should also be noted that this is an unoptimized build for debugging.

Eventually we hit a network timeout calling into a grid service that took down the region. We’re taking a look at the stack trace and determining how to mitigate the problem.

 

Thousands of scripts running and the only lag we had on the region was when we had a couple avatars teleporting into the region, and on the client. This demonstrates the reason we took on this project and why it was so important. Being together is so important and we want to make sure everyone can gather with their friends and enjoy their world.  No clouds required.

Join us for our upcoming europhlox party. Check our forums for more information.

Apr
13

Phlox – Vehicle Crossing

Smooth sim crossings and vehicle crossings now being tested on the InWorldz Beta Grid

Apr
05

Get top scripts.. Yes really

Sampling each script for 30 ms and calculating the results. This is actual execution time from running scripts tracked during each script’s time slice.  Here we see three scripts in loops taking all of the sim’s available script time.
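The bookkeeping behind a list like this can be sketched in Python. The script names and timings are made up, and this is not the actual statistics code; it only shows the idea of adding up the execution time recorded for each script during its time slices and ranking the totals.

```python
from collections import defaultdict


def top_scripts(samples, limit=3):
    """Rank scripts by total execution time; samples are (script name, ms) pairs."""
    totals = defaultdict(float)
    for name, milliseconds in samples:
        totals[name] += milliseconds
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [
        (name, used, 100.0 * used / grand_total if grand_total else 0.0)
        for name, used in ranked[:limit]
    ]


# Hypothetical time slice measurements collected during one sampling window.
samples = [("fish", 10.0), ("door", 1.0), ("fish", 9.0), ("ao", 5.0)]
report = top_scripts(samples)
for name, used, share in report:
    print(f"{name}: {used:.1f} ms ({share:.0f}% of script time)")
```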

These statistics are coming to beta grid tonight. Let’s see whose scripts on beta are hogs :)

Jim Tarber.. Why am I not surprised that he of all people would have the killer fish

Mar
31

Phlox Beta Day 1, Conclusion: Awesome

Sunset on the beta grid taken by Jim Tarber on Twitpic

InWorldz Phlox Rox!

Today we ran the first beta test for the Phlox script engine. It started off a bit rocky with the server not being ready on time and finding the first few bugs right away. After we got that sorted out though, wow, what a difference.

We had many different scripts running, loading, and unloading and never had the sim stats drop at all. Stats remained excellent throughout the run and we are definitely seeing a huge improvement in both runtime performance and memory usage for a large number of scripts. No matter how many timers or loops were thrown at the engine we experienced zero lag and great interactivity. This meets our design goals and InWorldz is about to feel like a completely different place!

We need you!

InWorldz needs more of our scripters to be present during the week to test their scripts. Our next official get together will be April 2nd at 12 pm InWorldz time but you are free to login to the beta grid at any point and test your scripts.

If you find any bugs, use our mantis at http://inworldz.com/mantis. Be sure to choose the inworldz phlox beta project, all the way on the top right, before submitting a bug.

Many thanks to all that took the time and waited for us to get beta up today. We appreciate your excitement and help testing this very important project.

Currently there is one issue about backwards compatibility I’ll be bringing up and asking the opinions of our residents shortly. Until then, see you at our birthday party and the next beta!

Edited: April 2nd, not March 2nd

 

Mar
29

Phlox – 40m maze generation

So I decided I wanted to see how quickly Phlox could manage rezzing a 40 meter maze if I pulled the llRezObject delays. This is the result. The awesome thing is that with bytecode sharing, each new script that is started for each maze wall is just another reference to the same bytecode, and the only work the engine needs to do is create new runtime state and add the script to the run queue.

Mar
25

Imported scripts, no modification

The lauk’s Larrow running on Phlox

This morning I worked out a few of the last bugs in the script engine that I’m finishing up before we run the beta test. I imported the sparrow I created in another place and left the scripts and objects exactly how they were. No script modifications were required to get these 800 lines of script running with timings that work exactly how they do on the other grid.

Very exciting and happy to see the Larrow will have a home he will work just perfectly on.
Added another video
