Technically, a VXML application is just plain text (XML markup), so any web server is capable of serving the pages. For example, I've built VXML applications using ASP.NET and IIS as well as Python and Apache.
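To make the "just plain text" point concrete, here is a minimal sketch in Python that serves a single VoiceXML page using only the standard library. It is not one of the applications mentioned above. The names `build_vxml` and `app`, the greeting text, and port 8000 are all made up for illustration. The `application/voicexml+xml` content type is the registered media type for VoiceXML, though some platforms may also accept `text/xml`, so check what your voice browser expects.

```python
from wsgiref.simple_server import make_server
from xml.sax.saxutils import escape


def build_vxml(greeting):
    """Return a minimal VoiceXML 2.1 document that speaks `greeting`.

    Hypothetical helper: the page is ordinary text, built like any other string.
    """
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">\n'
        '  <form>\n'
        '    <block>\n'
        # escape() keeps characters such as & and < from breaking the XML
        f'      <prompt>{escape(greeting)}</prompt>\n'
        '    </block>\n'
        '  </form>\n'
        '</vxml>\n'
    )


def app(environ, start_response):
    """WSGI application that returns the same VXML page for every request."""
    body = build_vxml("Hello from a plain web server.").encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/voicexml+xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for local testing.
    with make_server("", 8000, app) as httpd:
        httpd.serve_forever()
```

Because `app` is a standard WSGI callable, the same code could in principle sit behind Apache (for example via mod_wsgi) instead of the built-in test server. You would then point the speech platform at the page's URL.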
What speech platforms do (be they OCS or some other) is use a voice browser to “view” the page. Voice browsers work on the same principle as Firefox or IE; they simply use the phone to render the UI rather than a monitor. There is a lot more to it, but that is the gist of it.
OCS Speech Server isn't required to build a VXML application, but it does add some valuable features to Visual Studio 2005. If you’re going to be using OCS to deliver the application, then I would recommend installing Speech Server.