Tutorial03

From Osgrid Wiki

Troubleshooting your installation setup.

Before we start, note that there is a difference between the things you need to know to run your own regions and the things this grid has no control over. If your ISP does not allow servers and blocks the traffic, none of us can help you. If you cannot log in to your router, we cannot help you set up a simulator either.

This class is advanced level. You need to have read, understood and carried out the steps in the previous course.

Goals for this class:
 * learn to troubleshoot your local environment when setting up a simulator
 * collect the data to pinpoint what's failing
 * learn to identify and correct installation failures, network addressing problems, etc.

If all went well, you're happy and standing on your region. Unfortunately, not all environments are alike, so yours might need fine-tuning, or it might have crashed with a red error message and closed before you could even read what it said. Welcome to being a server host.

We distinguish two types of problems: crashes on startup, and inaccessibility. Since we prepared well and you completed the previous tutorials, a bunch of potential issues are already ruled out.

Still, a quick checklist:

  • Windows is up to date.
  • You downloaded and installed the latest version of the .NET Framework (or Mono) and the latest OSG version (not OpenSimulator, but the OSG download from the OSgrid web page!).
  • You have access to your router and have properly set up port forwarding for TCP and UDP.
  • You run IPv4. Mind that there is NO support whatsoever for IPv6, and enabling it can cause problems teleporting to your simulator.
  • You know how to operate the security suite on your PC, any other third-party "tools" and the Windows firewall, and you have allowed the services and opened the appropriate ports.
  • You have a static IP address set on your PC, which equals the internal IP parameter (InternalAddress) in Regions.ini and the IP to which the router forwards the traffic.
  • Your external IP matches the one set as ExternalHostName in Regions.ini.
  • If you have a dynamic public IP from your ISP, consider a free No-IP account, or verify your WAN IP before starting the simulator.
  • Crashes at first run are a good indicator of misconfiguration. The most common causes are typos, coordinates already in use, ports in use, database configuration failures, or name / UUID problems.
  • After a reset/restart of your router (power outage etc.), check whether the IP addressing still matches your Regions.ini. (You'd be amazed how many networks stop working after someone vacuums: a cable gets unplugged, the router restarts, DHCP hands out a fresh IP address and the port forwarding now points to the wrong place. Also check for loose cables.)
  • Errors resolving hostnames mean DNS is not resolving your URL / DNS name to your simulator. Try switching to your public IP address. (You can ping the DNS name and see whether it resolves to your IP.)
  • If you use dynamic DNS, see if your router supports it and put your credentials in there. This generally works much more reliably than a software client.
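If you want to script the DNS / ExternalHostName checks from the list above, here is a minimal Python sketch. It assumes you already know the public IP you expect (for example from your router's status page); the function names are made up for this example and only standard-library calls are used. It asks for IPv4 results only, since OSG does not support IPv6.

```python
from __future__ import annotations

import socket


def ipv4_addresses(hostname: str) -> set[str]:
    """Return every IPv4 address that `hostname` currently resolves to."""
    # AF_INET only: OSG has no IPv6 support, so IPv6 answers are irrelevant here.
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def hostname_matches(hostname: str, expected_ip: str) -> bool:
    """True if `hostname` resolves to `expected_ip` (e.g. your current WAN IP)."""
    try:
        return expected_ip in ipv4_addresses(hostname)
    except socket.gaierror:
        # The name does not resolve at all: the "could not resolve external hostname" case.
        return False
```

For example, hostname_matches("myregion.example.org", "203.0.113.10") (both values hypothetical) tells you whether your ExternalHostName currently points at your WAN IP. A False result right after a router restart is the classic dynamic-IP symptom.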

If you now think "huh, a software client for DynDNS? I don't have that…", consider one, especially if name resolution fails every few restarts or the IPs mismatch.

Google for DUC (Dynamic Update Client); it should make things more stable.

Mind that OpenSim.log in your bin folder logs all console output, so you can read back what happened. Just open it in Notepad++ and read. Not all messages are "cryptic". Use this new tool to test your setup: simulatortoolhelper
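A quick way to skim a long OpenSim.log is to keep only warnings, errors and exception lines. The sketch below is illustrative only: the line pattern is inferred from the log excerpts quoted on this page and assumes that timestamp/level layout, so adjust LINE_RE if your log looks different.

```python
import re
from pathlib import Path

# Assumed line layout, e.g. "2016-05-25 15:09:58,125 WARN - OpenSim.Region...".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:,\d{3})?)\s+(\w+)\s*-\s*(.*)$")

INTERESTING = {"WARN", "ERROR", "FATAL"}


def interesting_lines(lines):
    """Yield (timestamp, level, message) for warnings/errors, plus any exception line."""
    for line in lines:
        line = line.rstrip("\n")
        match = LINE_RE.match(line)
        if match and match.group(2) in INTERESTING:
            yield match.group(1), match.group(2), match.group(3)
        elif "Exception" in line:
            # Crash output and stack traces often don't follow the level layout.
            yield None, "EXCEPTION", line.strip()


def scan_log(path):
    """Read a log file (tolerating odd bytes) and return the interesting lines."""
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return list(interesting_lines(handle))
```

Usage: for ts, level, msg in scan_log("bin/OpenSim.log"): print(level, msg)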

Caveats

OSG has some procedures that can make you start pulling your hair out if you're not aware of them, especially on a first install that seems to complete fine at first.

If your region has fully started and you kill the console with the red X instead of the "shutdown" command, your region doesn't de-register from the map. It keeps its name, UUID and map spot occupied until you purge it manually. If you're troubleshooting and you reinstall without using the same region UUID, but do use the same region name / location as something that still exists on the map, you will keep crashing. Killing a sim with the X while it is writing data can also corrupt your SQL database, so always use the backup command in the console and shut down properly.

To solve or get around this, log in to your account on http://www.osgrid.org, click Account / My Profile and purge your region from the map before you start OpenSim.exe again. If you don't have an account there, you're reading the wrong guide!

Also, once the server is started you can minimize the console, but leave it ON. It physically IS the server; if you shut it down after booting, there is nothing to connect to!

Congratulations, by now you must be looking at that great black server console, ready for your commands: yourregionname>_ and if you look for it on the map, you can see it!

Yay, connect and you're good to go. Connecting, connecting.... pfff. Teleport failed.

No worries. Open a browser, go to http://www.canyouseeme.org, fill in port 9000 (9000 is the default TCP port) and click Check Port. If it fails to see your server, the port is blocked. (Mind that this page only tests TCP connections, so it can only tell whether your server is up and whether the TCP port to it is open and properly forwarded. Your UDP connection to the region can still fail.)
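canyouseeme.org tests from the outside. From inside your own network you can at least confirm that the simulator is listening on its TCP port with a sketch like this (hypothetical helper, standard library only). Keep two limits in mind: probing your own public IP from inside the LAN depends on NAT loopback, so use the PC's local address here; and UDP cannot be tested this way, because UDP has no connection handshake.

```python
import socket


def tcp_port_open(host: str, port: int = 9000, timeout: float = 3.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        # create_connection performs the full TCP handshake, then we close immediately.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out or unreachable: nothing reachable is listening there.
        return False
```

For example, tcp_port_open("10.0.0.4") run from another LAN machine (address hypothetical) should be True while your simulator is up.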

This failure to see your service can have several reasons:

  • You have a firewall that does not allow any inbound traffic to your PC.
  • You have not logged in to your router yet and opened TCP and UDP port 9000 towards the correct internal IP address (the one of your PC).
  • You did open the ports, but missed that there is another router in front of yours that also needs port forwarding (double NAT). A good indicator is when your router's "external" IP address is actually a private one (starting with 10., 172.16. through 172.31., or 192.168.).
  • Some security suite / software is interfering.
  • You don't have permission to run a server and it is being blocked upstream (ISP).
  • You run on an old or overloaded Wi-Fi network / machine that just isn't capable of providing enough bidirectional bandwidth / RAM to keep the connections going.
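To spot double NAT, check the "external" address your router reports on its status page. This sketch tests it against the private ranges listed above. It also flags 100.64.0.0/10, the carrier-grade NAT range some ISPs use; that check is an addition not mentioned above, but it has the same effect. The function name and messages are illustrative.

```python
import ipaddress

# RFC 1918 private ranges; 172.16.0.0/12 covers 172.16.x.x through 172.31.x.x.
PRIVATE_NETS = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
# Carrier-grade NAT (RFC 6598): the ISP itself translates addresses upstream.
CGNAT_NET = ipaddress.ip_network("100.64.0.0/10")


def wan_ip_diagnosis(wan_ip: str) -> str:
    """Classify the WAN address shown on your router's status page."""
    ip = ipaddress.ip_address(wan_ip)
    if any(ip in net for net in PRIVATE_NETS):
        return "double NAT: forward the ports on the upstream router too"
    if ip in CGNAT_NET:
        return "carrier-grade NAT: ask your ISP, port forwarding alone will not help"
    return "public address: port forwarding on this router should be enough"
```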

Now, a gazillion routers exist, each with a variety of firmware. It is simply impossible to describe the inner workings of every router / firmware, as each manufacturer uses its own. Network protocols are the same globally, but the names used for them may vary. What you DON'T want to set up is port filtering or port triggering; if you have these features, make sure they're off, or at least not filtering the ports you configured. You need port forwarding.

Also, do not change settings when you have no clue what they do. Changing UDP timeouts or MTU values can help people in certain environments, but normally this should not be required, so don't try to fix stuff that isn't broken! Such changes are typically workarounds for issues that shouldn't be present. Another caveat is that many routers use tick boxes to enable rules, which you then have to save with a button. If you configure everything properly but don't save or enable the rules you just wrote, nothing changes; it can be less intuitive than it looks every now and then…

OK, now http://www.canyouseeme.org can see your service, but I still can't connect? Each server uses a TCP port, and each region on it uses a UDP port. The TCP port is defined in bin/OpenSim.ini (http_listener_port under [Network]), the UDP ports in bin/Regions/Regions.ini (InternalPort).

So check that you didn't forget to forward both TCP and UDP, and the same for the firewalls etc. Some routers fail if a single rule uses one port for BOTH; in that case put them on separate ports, TCP 9000 and UDP 9001 (remember to change it in Regions.ini if you do). Also mind that canyouseeme only tests TCP ports, so it will not respond on 9001 from your region, as that uses UDP!
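The list of ports to forward can be read straight from your ini files. This is a sketch using Python's configparser, assuming the usual keys: http_listener_port under [Network] in OpenSim.ini, and one section per region with an unquoted InternalPort in Regions.ini. OpenSim.ini uses ${...} references, so interpolation is switched off; the fallback of 9000 assumes the default TCP port mentioned above. The function name is hypothetical.

```python
from __future__ import annotations

import configparser


def forwarding_rules(opensim_ini: str, regions_ini: str) -> list[tuple[str, int]]:
    """Return the (protocol, port) pairs to forward to the simulator PC."""
    # OpenSim.ini uses ${Section|key} references, so configparser interpolation is off.
    opensim = configparser.ConfigParser(
        strict=False, interpolation=None, inline_comment_prefixes=(";",)
    )
    opensim.read_string(opensim_ini)
    # TCP: http_listener_port under [Network]; assume the default 9000 if it is not set.
    rules = [("TCP", opensim.getint("Network", "http_listener_port", fallback=9000))]

    regions = configparser.ConfigParser(interpolation=None, inline_comment_prefixes=(";",))
    regions.read_string(regions_ini)
    for name in regions.sections():
        # UDP: every region section has its own InternalPort.
        rules.append(("UDP", regions.getint(name, "InternalPort")))
    return rules
```

Pass it the text of bin/OpenSim.ini and bin/Regions/Regions.ini, then compare the result with the rules in your router and firewall.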

OK, all my ports are open and everybody can connect, but I can't.

Bummer. It means your router either has NAT loopback disabled or doesn't support it. In the first case, just enable it or look for a different firmware that supports it. In the second case, call your ISP or buy a new router. Many VoIP gateways support it but have it disabled by default. Always try to rule out other options (static routes, multiple gateways) and points of failure before concluding that your router just doesn't support NAT loopback. It's a nice scapegoat, but in 95% of the cases something is simply misconfigured. If you tested thoroughly and you know it's a loopback issue, read on.

What is NAT loopback (also referred to as hairpin NAT / NAT reflection), and why is it needed to host a public OpenSimulator region?

Currently, a region hosted on a home connection with a broadband router needs what is known as NAT loopback functionality. Many DSL routers/modems prevent loopback connections as a "security" feature. This means that a machine on your local network (e.g. behind your DSL/cable modem/router/firewall) cannot connect to the forward-facing IP address (such as 99.99.99.99) of a machine that also exists on your local network.

Connecting directly to the local IP address (such as 10.0.0.4) of that same machine works fine.

This is an issue, since each region has to specify an IP address for the client to connect to. This is the ExternalHostName parameter in a region ini file (e.g. bin/Regions/Regions.ini). In the absence of NAT loopback, if a forward-facing IP address is specified (such as 99.99.99.99), external clients will be able to connect to the region but clients on your local network will not. If the internal address were put in ExternalHostName instead (e.g. 10.0.0.4), viewers on the local network would be able to connect but viewers from an external network would not.

Now that's a nice text, but let's see what really happens at the packet level (example):

[Router: 10.0.0.1]
[Computer1: 10.0.0.3]
[Computer2: 10.0.0.4]

Outside IP: 99.99.99.99, with port 9000 forwarded to Computer2
-->Computer1 sends a packet to the router, addressed [10.0.0.3 -> 99.99.99.99:9000]
The router uses NAT to change the destination on the packet to 10.0.0.4 and pushes it back onto the local network.

-->The router forwards it to Computer2 [10.0.0.3 -> 10.0.0.4:9000]
Computer2 attempts to respond to the packet by sending the reply directly to the source IP.

-->Computer2 replies to Computer1 from port 9000 [10.0.0.4:9000 -> 10.0.0.3]
(See, the reply comes from 10.0.0.4, not from 99.99.99.99. Why? It's how it works, and it's designed to be efficient and logical: the destination is "next" to it, so there is no sense in routing this traffic via an internet gateway. To visit the toilet, you don't first climb out a window and then walk in through the front door to go to the bathroom.)

-->Computer1 is waiting for a reply from 99.99.99.99:9000: WTF?
Computer1 expected a reply from 99.99.99.99, but got one from 10.0.0.4 instead.
Since the addresses don't match, Computer1 does not accept the reply as part of its connection (a TCP stack answers it with a RST packet), and the connection fails.

The "security" problem with this is that traffic could theoretically escape monitoring, as this takes place at "switch points", typically when traffic switches from one interface to another. Most routers, however, allow this as a "feature", or offer "virtual servers" to get around it. Routers that do support NAT loopback also rewrite the source address of such packets to their own LAN address, so Computer2 replies to the router, which translates the reply back so that it comes from 99.99.99.99. All addresses "match" and it works.
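To make the packet walkthrough concrete, here is a toy Python model of the same exchange, using the example addresses. It is not networking code: it only tracks source and destination addresses (ports left out) and compares a plain port-forwarding router with one that does NAT loopback. It assumes the loopback router also rewrites the source address to its own LAN address, which is a common way hairpin NAT is implemented.

```python
from dataclasses import dataclass, replace

PUBLIC_IP = "99.99.99.99"
ROUTER_LAN_IP = "10.0.0.1"
COMPUTER1 = "10.0.0.3"  # viewer
COMPUTER2 = "10.0.0.4"  # simulator, port 9000 forwarded here


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str


def connection_succeeds(hairpin: bool) -> bool:
    """Follow one request/reply between two LAN machines via the public address."""
    request = Packet(src=COMPUTER1, dst=PUBLIC_IP)
    # Port forwarding (destination NAT): the router rewrites the destination.
    request = replace(request, dst=COMPUTER2)
    if hairpin:
        # Loopback router: also rewrite the source so the reply comes back through it.
        request = replace(request, src=ROUTER_LAN_IP)
    # Computer2 replies straight to whatever source address it saw.
    reply = Packet(src=COMPUTER2, dst=request.src)
    if reply.dst == ROUTER_LAN_IP:
        # The router reverses both translations before delivering the reply.
        reply = Packet(src=PUBLIC_IP, dst=COMPUTER1)
    # Computer1 only accepts a reply that comes from the address it contacted.
    return reply.dst == COMPUTER1 and reply.src == PUBLIC_IP
```

connection_succeeds(False) reproduces the failure described above (the reply arrives from 10.0.0.4), while connection_succeeds(True) shows why a loopback-capable router works.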

Then there are some "types" of ISPs:

1. Allows you to run servers, does not block or filter at all, and gives you a decent device that supports everything, but that's all; the rest you need to figure out yourself. (Configure properly, and have fun.)

2. Allows everything, but provides cheap, bottom-of-the-barrel hardware with limited features that does not support anything. (Ditch the hardware, buy something decent, and have fun.)

3. Does not allow servers, or only in a limited way (only opening ports 80, 443 and 25, for instance), and tells you that you need a static IP and an expensive subscription to go with it for a server. (Switch to another ISP, or find a host / VPS.)

4. Is an advanced ISP on fiber etc. that also provides VoIP, TV etc., with a modem/router that doesn't support loopback. (Disable NAT so that it effectively becomes a modem, and put a loopback-supporting device behind it if possible.)


Workarounds (and that means what it says: not a permanent, happy, simple fix, and they have limitations):

These are all hacks that require more extensive networking knowledge and the ability to install and configure adapters and edit LMHOSTS files. Ask yourself whether it's worth running a server behind hardware that does not support it.

Install the Microsoft KM-TEST Loopback Adapter (Device Manager / Action / Add legacy hardware / choose from a list / Network adapters / Microsoft / KM-TEST).
A full tutorial on how to configure this can be found here: https://wiki.osgrid.org/index.php/Nat_Loopback

Now, if the viewer and server are on the same machine, you can log in, and the outside world can too. The rest of your local network cannot access the region, as it has no route to your machine other than through the hardware that does not support loopback. In some cases people can log in to their own sim, but not teleport there. Since it works fine for all external people, that's something you'll need to live with. (Logging in to "home" is a viable option, and better than "no region at all".)

If all the ranting above is way too technical and you really don't feel like getting into this yourself, you're not alone. Tech stuff is not everybody's idea of fun. Our advice would be to either buy hardware that allows you to operate a server, or rent a hosted region: good performance, no hassle, and generally not expensive. The host takes care of the technical stuff, and you can build / relax.

But I already installed the loopback adapter!

Ahh. Then try uninstalling it (and don't forget to remove any changes to LMHOSTS if you made any). Once done, restart your PC, boot your regions and try to teleport. "Fixing" stuff that isn't broken can lead to unexpected results.

I also can't connect to my local server over / from wireless.

That's why a cabled connection is preferred. Wireless isn't great for a region server (loads of bidirectional traffic, possible interference, packet loss). It can be done, however; performance depends entirely on your router firmware / signal / load.

Just keep in mind that many routers consider wireless traffic "unsecure" by default (if so, it will generally be on a different subnet), so you should set a static IP and set up the port forwarding to the wireless network instead of the local network.

If the router has an internal firewall, it might need additional firewall rules to allow traffic from your wireless network into your local network. Some routers do not allow servers in their wireless zone at all (no options to configure). In some cases you can override this by putting the server's IP address in the DMZ.
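A quick way to see whether your wireless PC and the wired machines share a subnet. The /24 default (netmask 255.255.255.0) is an assumption typical of home routers; check the real mask with ipconfig and pass the matching prefix length. The helper name is hypothetical.

```python
import ipaddress


def same_subnet(ip_a: str, ip_b: str, prefix_len: int = 24) -> bool:
    """True if both IPv4 addresses fall in the same subnet of the given size."""
    # strict=False lets us build the network from a host address such as 192.168.1.10/24.
    net = ipaddress.ip_network(f"{ip_a}/{prefix_len}", strict=False)
    return ipaddress.ip_address(ip_b) in net
```

If same_subnet(wireless_ip, wired_ip) is False, the wireless clients sit in a separate zone and your forwarding / firewall rules need to target that network.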


I have other issues that I think are problems with my viewer; I have them everywhere in OSG?

Have a 64-bit system? Use a 64-bit viewer.
TP problems to all regions? Remove all attachments. Try a different viewer.
Feeling lonely? Log in to osgrid.org and see where your friends are under "Users".
Being a cloud and rebakes don't help? Remove everything: skin, outfit, everything. Some corrupt texture is likely preventing your avatar from rendering properly.
Clear the cache of your viewer.
In your C:\Users\Username\AppData\Roaming folder there is a cache storage for every viewer and account ever used. Delete those "viewer" folders and everything will be reset to default, with a clean cache.
Second Life viewers might not be compatible, as they use Havok physics.
I have other errors not listed here. Try this page


OK, somewhere in this document you told me to look in that OpenSim.log.

I'm looking at it, but I don't understand a word of it. True, a lot won't make sense, as much of it is databases communicating with each other. But a lot does. Read the errors below. The more you read them, the more you will learn. Many of those messages make sense, and in 99% of cases Google will have seen them before. (You can easily be the first, though, as OSgrid is "the" test bed for OpenSimulator, meaning you can run into a bug, and OSG would very much like to hear about it if so.)
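If you find yourself searching for the same messages over and over, a small lookup table helps. This sketch pairs fragments of the log messages discussed on this page with the short advice given for each; the table and function are hypothetical and only match the wording quoted here, so other OpenSim versions may phrase things differently.

```python
from __future__ import annotations

import re

# (pattern, hint) pairs based on the log messages discussed on this page.
KNOWN_MESSAGES = [
    (r"No packets received from child agent", "UDP port closed, or the agent's traffic timed out"),
    (r"Bad send on GetMapItems", "a neighbouring region is unreachable; normally harmless"),
    (r"OutOfMemoryException", "out of memory: too little RAM, or a heavy script/object"),
    (r"Failed to load plugin .*MySQL", "MySQL not running, or GridCommon.ini edited by mistake"),
    (r"XMLRPC-GROUPS-CONNECTOR", "known groups bug; can be ignored"),
    (r"Region overlaps another region", "map spot taken: change Location or purge the old region"),
    (r"Could not resolve external hostname", "DNS problem: try the public IP as ExternalHostName"),
    (r"Incorrect string value", "non-Latin characters without Unicode support in the database"),
    (r"Address already in use", "port already used by another process"),
    (r"Could not load an ISimulationDataService", "ini typo: check SQL settings / architecture"),
    (r"max_allowed_packet", "raise max_allowed_packet in MySQL's my.ini"),
]
COMPILED = [(re.compile(pattern), hint) for pattern, hint in KNOWN_MESSAGES]


def explain(line: str) -> str | None:
    """Return the first matching hint for a log line, or None if it is not a known message."""
    for pattern, hint in COMPILED:
        if pattern.search(line):
            return hint
    return None
```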

Console outputs:

   2016-05-25 15:09:58,125 WARN - OpenSim.Region.ClientStack.LindenUDP.LLUDPServer [LLUDPSERVER]: No packets received from child agent of Foxx    Bode  for 60000ms in LBSA Plaza. Disconnecting.

Foxx connected to the region over TCP, but his teleport expired in transit. It looks like the UDP port is closed, or his traffic timed out for some reason.

   2016-05-25 15:09:59,780 WARN -openSim.Region.CoreModules.World.WorldMap.WorldMapModule [WORLD MAP]: Bad send on GetMapItems Error: ConnectFailure (Connection timed out) 

Your sim cannot connect to some neighbor. No worries, this is normal logging; no headaches for you.

   2016-05-25 15:10:59  Exception: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.

Oops, your console just crashed. Do you have enough resources to run a viewer and a region at the same time? Did you recently add something scripted? The last line you will see after this is "Application is terminating: True". OpenSim.log may point out what's wrong.

   2015-05-25 15:11:59 WARN - Failed to load plugin openSim.Region.Framework.Interfaces.ISimulationDataStore from OpenSim.Data.MySQL.dll with args datasource=localhost;Database=MYDATABASENAME;UserID=NoROOThere;Password= sillypassword;Old Guids=true....

Either your MySQL database service is not working, or you didn't plan to run MySQL but did muck around in GridCommon.ini. Revert those changes. By default OSG installs SQLite as its database. If you were planning on running MySQL, you've misconfigured something.

   2016-05-25 15:12:19,980 WARN  -XMLRPC-GROUPS-CONNECTOR : ERROR:
   2016-05-25 15:12:29,980 WARN  -XMLRPC-GROUPS-CONNECTOR : Key error
   2016-05-25 15:12:39,980 WARN  -XMLRPC-GROUPS-CONNECTOR : must specifiy Group ID or name

It's a bug. Don't worry, it will be fixed. http://opensimulator.org/mantis/view.php?id=6928

   2016-05-25 15:12:55 - [grid connector] - Registration Failed. Region overlaps another region when contacting http://grid.osgrid.org/grid

Change the location on the map in your Regions.ini; something is already at the spot you're trying to install to. If this is a previous incarnation of your own region, purge it from your account as described earlier.

    2016-05-25 15:12:56 - [ENTITY TRANSFER MODULE]: Could not resolve external hostname reddragon.madfoxx.nl  for region 1234567890 (66666, 66666). System.Exception: Unable to resolve local hostname reddragon.madfoxx.nl innerException of type 'System.Net.Sockets.SocketException: The requested name is valid, but no data of the requested type was found. 

The DNS name of the region cannot be found / is not pointing to a valid host. Try changing ExternalHostName in your Regions.ini to the public IP instead of the hostname.

   2016-05-25 15:12:57 - [REGION DB]: MySQL error in ExecuteNonQuery: Incorrect string value:'\xx1\x11\x11 AA...' for column 'Text' at row 1
   2016-05-25 15:12:57 - [SCENE]: Storing of item, 00000000-1234-abcd-1234-0000000000 in xxxxxxxx failed with exception Incorrect string value: '\xx1\x11\x11 AA...' for column 'Text' at row 1 at MySql.Data.MySqlClient.MySqlStream.ReadPacket()

No worries. Unicode is not enabled in the database / grid, causing this error when it reads an "illegal" character (Chinese or Cyrillic, for instance). It's off on purpose, as this prevents lag on TPs. It is supported, though: if you reset the offending script after the region has started, it will display the requested symbol.

   2016-05-25 15:12:57 - [APPLICATION]: APPLICATION EXCEPTION DETECTED: System.UnhandledExceptionEventArgs Exception: System.Net.Sockets.SocketException: Address already in use at System.Net.Sockets.Socket.Bind (System.Net.EndPoint local_end) [0x00000] in <filename unknown>:0

You have specified a TCP or UDP port that is already being used by some other process / program / simulator. Your IP address has roughly 65,000 TCP ports and as many UDP ports available, but a port can only be used by one process at a time. So either kill that other process, or change http_listener_port in OpenSim.ini and/or your region port in Regions.ini, and set up the matching port forwarding in your router. Make sure to choose a port that is not already forwarded to some other machine on your network. Never use ports below 1024; these are reserved for other services on your machine!
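Before picking a new port, you can check whether anything on this PC already holds it. This standard-library sketch simply tries to bind a TCP and a UDP socket on the port; if either bind fails, something else (or, on Linux/Mono, the privileged-port rule below 1024) is in the way. It only checks the local machine, not whether the port is already forwarded to another machine in your router. The helper name is hypothetical.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """True if both a TCP and a UDP socket can bind to the port right now."""
    for kind in (socket.SOCK_STREAM, socket.SOCK_DGRAM):
        with socket.socket(socket.AF_INET, kind) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                # Typically "Address already in use", the same error the simulator logs.
                return False
    return True
```

Stop the simulator before checking its own ports, otherwise they will of course report as taken.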

   APPLICATION EXCEPTION DETECTED: System.UnhandledExceptionEventArgsException: System.Exception: Could not load an ISimulationDataService implementation from OpenSim.Services.Connectors.dll:SimulationDataService, as configured in the LocalServiceModule parameter of the [SimulationDataStore] config section.

This is usually due to a typo or other configuration error in one of your ini files. Most likely your GridCommon.ini has an error in its SQL settings, or your OpenSim.ini does not have an architecture defined.

   17:37:58 - [REGION DB]: MySQL error in ExecuteNonQuery: Packets larger than max_allowed_packet are not allowed.
   17:37:58 - MySql.Data.MySqlClient.MySqlException (0x80004005): Packets larger than max_allowed_packet are not allowed.
     at MySql.Data.MySqlClient.MySqlStream.SendPacket(MySqlPacket packet)
     at MySql.Data.MySqlClient.NativeDriver.ExecutePacket(MySqlPacket packetToExecute)
     at MySql.Data.MySqlClient.NativeDriver.SendQuery(MySqlPacket queryPacket)
     at MySql.Data.MySqlClient.Driver.SendQuery(MySqlPacket p)
     at MySql.Data.MySqlClient.Statement.ExecuteNext()
     at MySql.Data.MySqlClient.PreparableStatement.Execute()
     at MySql.Data.MySqlClient.MySqlCommand.ExecuteReader(CommandBehavior behavior)
     at MySql.Data.MySqlClient.MySqlCommand.ExecuteNonQuery()
     at OpenSim.Data.MySQL.MySQLSimulationData.ExecuteNonQuery(MySqlCommand c)

Your maximum packet size in MySQL is too small. Go to your MySQL folder in ProgramData (NOT Program Files) and search for the my.ini file. In there you can find the max_allowed_packet setting. It is typically set to 4MB by default; you can set it to 32MB. Afterwards, restart the MySQL service, or simply reboot your machine (the whole PC, not just the simulator).
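If you prefer not to edit my.ini by hand, this sketch sets max_allowed_packet in the [mysqld] section and leaves the rest of the file (including other sections such as [mysqldump], which often has its own max_allowed_packet) untouched. The function name is hypothetical; back up my.ini before writing the result to it.

```python
import re

# MySQL accepts both max_allowed_packet and max-allowed-packet as the option name.
OPTION_RE = re.compile(r"max[_-]allowed[_-]packet\s*=")


def set_max_allowed_packet(ini_text: str, value: str = "32M") -> str:
    """Return my.ini text with max_allowed_packet set to `value` in [mysqld]."""
    lines = ini_text.splitlines()
    new_line = f"max_allowed_packet={value}"
    in_mysqld = False
    insert_at = None  # line index just below the [mysqld] header, if found
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            in_mysqld = stripped.lower() == "[mysqld]"
            if in_mysqld and insert_at is None:
                insert_at = i + 1
            continue
        # Commented-out lines start with '#' or ';' and therefore do not match.
        if in_mysqld and OPTION_RE.match(stripped):
            lines[i] = new_line
            return "\n".join(lines) + "\n"
    if insert_at is None:
        # No [mysqld] section at all: append one.
        lines += ["[mysqld]", new_line]
    else:
        lines.insert(insert_at, new_line)
    return "\n".join(lines) + "\n"
```

Read my.ini (the exact folder under ProgramData depends on your MySQL version), pass its text through the function, write it back, then restart MySQL.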


As you can see, loads of things can fail, and while the software works for the majority, it sometimes fails on specific configurations. OSG doesn't hate you, and you're not having bad luck; it's just a matter of finding and tweaking your settings so that it can operate. When you're up and running, you will eventually take a close look at OpenSim.ini and configure a lot more. It's a bit of a learning curve, and there are some pitfalls. Even though it takes a lot of searching and experimenting, OpenSim is pretty well documented, so there's almost always a solution out there somewhere. Don't give up; the reward is very satisfying.

Your world, Your imagination. But for free this time.


Below is a checklist you can quickly browse through to troubleshoot the most common issues. No elaborate how and why, just steps to try. Checklist:

   Windows is up to date
   You downloaded and installed the latest version of the .NET Framework and the latest OSG version (not OpenSimulator, but the OSG download from the OSgrid web page!).
   You have access to your router and know how to set up port forwarding
   You know how to operate your (Windows) firewall and allow services
   You have a static IP address set on your PC
   If you have a dynamic public IP from your ISP, consider a free No-IP account, or verify your WAN IP at boot.
   After a reset/restart of your router (power outages etc.), check the IP addressing. You'd be amazed how many networks "stop" from vacuuming....
   Crashes at first run are a good indicator of misconfiguration
   Errors resolving hostnames mean DNS is not resolving your URL / name to your region. Try switching to your public IP address
   If using dynamic DNS, see if your router supports it, and put your credentials in there. This generally works much more reliably than a software client.
   If you now think "huh, a software client for DynDNS? I don't have that…", consider one, especially if name resolution fails every few restarts or the IPs mismatch. Google for DUC (Dynamic Update Client); it should make things more stable.

If the simulator is running but you're unable to access it:

   Can you access other regions? Can other users access yours?
   Can http://www.canyouseeme.org see your service? (If so, TCP works.)
   Check & validate the port forwarding in your router. Both IPs and ports need to be correct (TCP and UDP).
   Check & validate your port configuration and IP addresses in Regions.ini
   Check & validate the port in OpenSim.ini
   Check & validate that firewall software is properly configured / off (same for other security suites)
   Routers without NAT (with a public IP only): set 0.0.0.0 as the internal address in Regions.ini. Only if that fails, use the external one.
   Routers with a built-in firewall can require additional configuration apart from the port forwarding.

If the console gives errors:

   About not being able to reach stuff from the grid: try adding 8.8.8.8 or 4.2.2.2 or any other public DNS server (it's in the DNS tab of your network interface card's settings). Mind to add them as alternate DNS, as most ISPs require you to use theirs.
   Double-check OpenSim.ini, Regions.ini and GridCommon.ini for misconfigurations.
   Depending on the error, move your region away from others. Test in open sea, with no neighboring regions.
   Do NOT put your region within 2 spaces of a megaregion, and especially not against it! It can crash both of you and/or cause errors and failures.
   Try picking a spot in open sea, with no neighbors. You can always move it later.
   Read the error. If a map spot is occupied, it will tell you; if a script crashes or your connection times out, it will show in both the console and the logs.
   Everybody has problems? Check whether something is wrong with the grid services: http://www.osgrid.org/index.php/sitestats
   Do NOT use INI files from a previous install. Things might have changed, and this can cause you near-undefinable headaches. Manually reconfigure them to your needs (GridCommon.ini / OpenSim.ini / Regions.ini).

Specific UDP problems:

   Use static addresses and static forwarding.
   In rare cases, where all else fails, split the TCP and UDP ports and make them all unique. For example, use TCP 9000 and UDP 9001-9005, then TCP 9006 and UDP 9007-9011, etc. It should not be required, but in some router firmware this works where everything else fails. The advantage of doing this grows when you run multiple regions on multiple servers: if one fails, you can easily troubleshoot its IP and port.
   With hardware firewalls, check the logs for spoofs and UDP floods. If the firewall is blocking those, it could very well be killing your traffic.
   Some ISPs are very proactive and do this for you. Mind that they have never heard of some obscure service called OSgrid or OpenSim on port 9000, and could very well block it. You can just ask them not to. Trying different ports can also work magic. Some ISPs block incoming traffic on certain ports / ranges to stop people running open proxies etc., and since 8080 and 9000 are widely used for open proxies, some ISPs don't allow those ports. Just change to something else; 12345 will do fine. Just don't use any ports below 1024, as those are "reserved" for certain services and may cause your region to fail. Ports 49152-65535 are "unreserved" in most environments. If you change ports, don't forget to change them in the router, in OpenSim.ini (http_listener_port is the TCP port) and in Regions.ini (one UDP port per region).
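The "one TCP port, then one UDP port per region" scheme above is easy to generate, so you don't make typos when you copy it into the router and the ini files. A minimal sketch, assuming consecutive numbering from a start port you choose; it refuses ranges that dip below 1024 or go past 65535. The function name is hypothetical.

```python
from __future__ import annotations


def port_plan(regions_per_server: list[int], first_port: int = 9000) -> list[tuple[int, list[int]]]:
    """Give each server one TCP port followed by one UDP port per region, all unique."""
    last_port = first_port + sum(1 + n for n in regions_per_server) - 1
    # Stay clear of the reserved ports below 1024 and the 16-bit port limit.
    if first_port < 1024 or last_port > 65535:
        raise ValueError("ports must stay between 1024 and 65535")
    plan = []
    port = first_port
    for count in regions_per_server:
        tcp_port = port
        udp_ports = list(range(port + 1, port + 1 + count))
        plan.append((tcp_port, udp_ports))
        port += 1 + count
    return plan
```

port_plan([5, 5]) gives TCP 9000 with UDP 9001-9005, and TCP 9006 with UDP 9007-9011; port_plan([5, 5], first_port=49152) keeps the same layout inside the unreserved range.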