re:Edit

Zakamutt · June 23, 2016

Holy colors, batman. Okay, this is most likely actually readable to you since your monitor is probably using a high color temperature, but I personally use f.lux at the lowest color temperature I can get. The text is basically unreadable without highlighting it.

Anyway, I actually thought that "?!" was the standard rather than opinions being split. I definitely agree with you in your usage recommendations, though; there are so many !? I'd have preferred to be ?! in various translations...

I must also add that "interro" comes first in "interrobang", and thus it clearly makes more sense for the question mark to be first as well.

Darbury · June 23, 2016

6 minutes ago, Zakamutt said:

Holy colors, batman. Okay, this is most likely actually readable to you since your monitor is probably using a high color temperature, but I personally use f.lux at the lowest color temperature I can get. The text is basically unreadable without highlighting it.

Ouch! I sometimes forget that some people on Fuwa are using white-on-black color schemes. Thanks for letting me know.

I've updated the text so it shouldn't be an issue anymore.

tymmur · June 23, 2016

I love that you blog about issues I haven't really considered but that are actually quite important. On top of being informative, it forces me to stop and think, which is a good thing.

Like with your last entry (about quote marks), I have to point out that most VNs use the character encoding cp932. ‽ isn't included in that one, meaning that while we might see it in the wild, it will not appear inside VNs, at least not those using Japanese engines. It might appear in engines using UTF-8 (those claiming international support). So it doesn't look like the !? versus ?! question can be solved by simply merging both into ‽.
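As a quick way to check that claim, here is a minimal sketch using Python's built-in `cp932` codec. The helper name `fits_cp932` is made up for illustration, and Python's codec is assumed to be close enough to what a given engine accepts.

```python
def fits_cp932(text: str) -> bool:
    """Return True if every character in text can be encoded as CP932."""
    try:
        text.encode("cp932")
        return True
    except UnicodeEncodeError:
        return False


# "!?" and "?!" are plain ASCII, so they survive; the interrobang does not.
for sample in ("!?", "?!", "\u203d"):
    print(repr(sample), fits_cp932(sample))
```

Running a translated script through a check like this before inserting it would flag any character the engine's encoding cannot hold.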

Darbury · June 23, 2016

7 minutes ago, tymmur said:

Like with your last entry (about quote marks), I have to point out that most VNs use the character encoding cp932. ‽ isn't included in that one, meaning that while we might see it in the wild, it will not appear inside VNs, at least not those using Japanese engines. It might appear in engines using UTF-8 (those claiming international support). So it doesn't look like the !? versus ?! question can be solved by simply merging both into ‽.

Thanks! And you'll be happy to hear I wasn't suggesting using ‽ in this situation, just bringing it up as a fun bit of trivia. No one uses the actual interrobang — unless you live in Portland, of course. In which case, ‽ is probably your name.

Nosebleed · June 23, 2016

It really is up to the person writing them, but for me personally I go by this logic: If it's a question (which is the case 99% of the time), the ? should have priority and appear first, and the ! comes after to give it an emphasis, kind of like an accent.

Darklord Rooke · June 23, 2016

I really don't like the interrobang, it's just terrible

Darbury · June 23, 2016

2 minutes ago, Rooke said:

I really don't like the interrobang, it's just terrible

But what about the ampercent? Or the dollarpound? Or hyphentheses?

Darklord Rooke · June 23, 2016

5 minutes ago, Darbury said:

But what about the ampercent? Or the dollarpound? Or hyphentheses?

In the future, if any of those become real I know who to blame

Darbury · June 23, 2016

20 minutes ago, Rooke said:

In the future, if any of those become real I know who to blame

You're welcome. (Still working on the dollarpound.)

Darklord Rooke · June 23, 2016

OH MY GOD!

tymmur · June 23, 2016

ᵯ ⋛ ␛ ☃ ☠ ☭ ♋ ꈚ ꊪ

Unicode is fun. There is more or less everything there, even Fuwa users. Hello @Rooke ♜♖

ௌ <-- that's one single character

I didn't find the dollarpound sign, but I did find this ₠. It's the original Euro sign. They then changed it to € before they actually printed the money, but nobody came up with the idea not to include the old one in Unicode.

Or more on topic, here are 4 characters:

⁇ ⁈ ⁉ ‼

If you claim there are 8 characters on that line, then you better stop drinking now. They are only 4 according to unicode.
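For anyone who wants to count for themselves, a small sketch with Python's standard `unicodedata` module follows. The code points are written as escapes, on the assumption that the four marks above are U+2047, U+2048, U+2049 and U+203C.

```python
import unicodedata

# The four marks, written as escapes so the source stays plain ASCII.
line = "\u2047 \u2048 \u2049 \u203c"
marks = line.split()

print(len(marks), "characters")
for ch in marks:
    # Each entry is a single code point with its own official name.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```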

Darbury · June 24, 2016

16 minutes ago, tymmur said:

⁇ ⁈ ⁉ ‼

And do you know what those are used for? Chess commentary. I kid you not.

Fred the Barber · June 24, 2016

Hmm. This is one where I have to disagree with you, Darbury. To me, consistency is paramount, even over the subtle additional shade of meaning you might get by carefully considering and ordering the ? and ! based on that.

My preference is to pick one and go with it. I don't actually care which one, but my fingers do type ?! more naturally than !?, so in the absence of any better reason, I'd probably go with that one.

But the best option might be the direction Rooke is going: avoid the issue entirely. Most VNs are filled with more punctuation than a balloon is air - they can benefit from a little puncture and subsequent depunctuation. Drop a bunch of bangs and replace them with periods (or with nothing, if they were previously part of an interrobang), and you can usually end up with something that's probably better overall anyway. Especially if, in doing so, you get to reduce all the ridiculous double-bangs down to single bangs.

Darbury · June 24, 2016

1 hour ago, Fred the Barber said:

My preference is to pick one and go with it. I don't actually care which one.

I'm totally behind that. My own approach is overkill, of course, but it makes my editing OCD happy.

1 hour ago, Fred the Barber said:

Especially if, in doing so, you get to reduce all the ridiculous double-bangs down to single bangs.

So, um, we're still talking about punctuation here, right?

Zakamutt · June 24, 2016

For the love of love, stop typing stuff in Word or other themes or whatever you are doing

Darbury · June 24, 2016

1 minute ago, Zakamutt said:

For the love of love, stop typing stuff in Word or other themes or whatever you are doing

That would be the result of trying to use the awful text window that pops up in the mobile version of the site. Yay for responsive design?

Zakamutt · June 24, 2016

8 minutes ago, Darbury said:

P.S. - This is for you, Zakamutt. ENJOY!

I was about to report this to the mods, but then you... all of a sudden... At this point, all that's left to say are...... ellipses............

Darbury · June 24, 2016

1 minute ago, Zakamutt said:

I was about to report this to the mods, but then you... all of a sudden... At this point, all that's left are...... ellipses............

That's just... cruel...

Zakamutt · June 24, 2016

What can I say... I learnt from the best

Fred the Barber · June 24, 2016

1 hour ago, Darbury said:

So, um, we're still talking about punctuation here, right?

Darbury, when are we not talking about punctuation?

tymmur · June 24, 2016

9 hours ago, tymmur said:

⁇ ⁈ ⁉ ‼

I looked more into those, and the more I look at them, the stranger they get.

?? <-- two characters

⁇ <-- one character

They look sort of the same, so I figured: if display isn't the main motivation, is text size? Assuming we use UTF-8 (the most common Unicode encoding), ? uses one byte (it's standard ASCII). However, ⁇ is not standard ASCII and is a multi-byte character. To be more precise, it uses 3 bytes. This means that even though the number of characters is cut in half, the number of bytes in the text increases by 50% (3 bytes instead of 2). That goes for all 4 of them.

If we switch to utf-16, which is used once in a while, no character can use less than two bytes, meaning the two ? characters will use 4 bytes combined. However due to lower overhead in utf-16, ⁇ has a two byte encoding, meaning in this case the number of bytes is reduced by 50%. In this case it could make sense for size, but the size is increased for every standard character, which will likely eat up way more than is saved this way.
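The byte counts above can be reproduced with a short sketch. The helper `size` is hypothetical, and `utf-16-le` is used so that Python does not prepend a 2-byte byte order mark to each string.

```python
def size(text: str, encoding: str) -> int:
    """Number of bytes text occupies in the given encoding."""
    return len(text.encode(encoding))


two_marks = "??"      # two ASCII question marks
one_mark = "\u2047"   # the single DOUBLE QUESTION MARK character

for encoding in ("utf-8", "utf-16-le"):
    print(encoding, size(two_marks, encoding), size(one_mark, encoding))
```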

This leaves the question: when is it beneficial to use those combo characters? It looks like they can be used for chess, but that isn't the same as it being beneficial from a technical point of view. I find parts of Unicode to be silly, and these certainly seem to be characters we could do without.

On the topic of character byte size: Japan tends to stick to Shift-JIS/cp932 because it uses two bytes to write kanji/kana. UTF-8 uses 3 bytes, meaning a 50% increase in text size. That is most likely the major reason why VNs tend to require Japanese locale. UTF-16, however, can write Japanese characters using just two bytes, and I suspect that as Windows becomes more and more aimed at Unicode, VNs will start using UTF-16. That would really be a gift to translation efforts because it would remove the issue of "character not present in cp932".
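A rough illustration of those per-character costs, using an arbitrary sample string of seven kana/kanji (the sample text is my own choice, not taken from any VN):

```python
sample = "こんにちは世界"  # 5 hiragana + 2 kanji

# utf-16-le avoids the byte order mark that plain "utf-16" would add.
sizes = {enc: len(sample.encode(enc)) for enc in ("cp932", "utf-8", "utf-16-le")}

for enc, n in sizes.items():
    print(f"{enc}: {n} bytes for {len(sample)} characters")
```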

Fred the Barber · June 24, 2016

3 hours ago, tymmur said:

On the topic of character byte size: Japan tends to stick to Shift-JIS/cp932 because it uses two bytes to write kanji/kana. UTF-8 uses 3 bytes, meaning a 50% increase in text size. That is most likely the major reason why VNs tend to require Japanese locale. UTF-16, however, can write Japanese characters using just two bytes, and I suspect that as Windows becomes more and more aimed at Unicode, VNs will start using UTF-16. That would really be a gift to translation efforts because it would remove the issue of "character not present in cp932".

UCS-2 support, which I'm pretty certain has every character a VN would need, has been pretty well baked-in to Windows for an eternity, in software terms; I'd expect that Windows 2000 had everything you'd need to do a VN engine in UCS-2. Most likely, the reason they're sticking to the old multi-byte pages is that nobody wants to update old VN engines, because it's a waste of money and time to try to fix things that aren't broken. And even if/when someone builds a "new engine", they almost certainly pull over a ton of code from a previous one anyway, in all likelihood bringing that multi-byte dependency along with them.

tymmur · June 24, 2016

The problem with UCS-2 is that it was pulled from the Unicode standard in 2011, and as such Microsoft no longer officially supports it. While it might work, they no longer even mention it in their documentation. If you want to tell Windows to handle text as UCS-2, you need to get info elsewhere on how to do so, and it would be a really poor business decision to rely on it, because it might break in the next Windows update and Microsoft will likely say "not supported => won't fix". UCS-2 is most likely good enough for all VNs in more or less all languages. However, it should be pointed out that for the most part, UCS-2 and UTF-16 should be more or less interchangeable for two-byte characters (can't verify this, documentation is gone).

CP932 has the benefit of using either one or two bytes, which means smaller scripts. It uses a single byte for Latin letters and two bytes for... well, everything not in standard ASCII (except that some katakana use one byte). It's the encoding that provides the smallest files for mixed English/Japanese writing. While it may not look like it at first glance, English writing does matter in an all-Japanese release. It's used for pathnames, including names of voice files. There can be so many of those that it adds up: they are half the size they would be in UCS-2/UTF-16. This means that purely in terms of size, CP932 is the winner.
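To make the mixed-content case concrete, here is a small sketch. The script line and the voice file name are invented for illustration and do not come from any real engine.

```python
# Hypothetical script line: an ASCII voice-file path followed by Japanese text.
line = "voice/heroine_0001.ogg こんにちは"

mixed = {enc: len(line.encode(enc)) for enc in ("cp932", "utf-8", "utf-16-le")}
for enc, n in mixed.items():
    print(f"{enc}: {n} bytes")

# Half-width katakana (here U+FF71) is the single-byte exception in CP932.
halfwidth_bytes = len("\uff71".encode("cp932"))
print("half-width katakana:", halfwidth_bytes, "byte")
```

For this sample, CP932 comes out smallest and UTF-16 largest, because every ASCII character in the path doubles in size under UTF-16.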

Reusing old code is certainly an aspect of it, but there is more to it as well. The encoding is dead simple to handle if you write custom code for it. Musumaker is hardcoded to CP932 encoding. When reading a line, it checks the hex value of the first character, or to be more precise, it checks the most significant bit of the first byte. If it's set, it's a line to print in the text box. If it isn't set, it will be a command to execute (like stop bgm, add sprite or whatever). Very easy to code and works well with Japanese text. Works horribly with English text and could very easily be a nightmare to write for a new encoding. Even if they were to rewrite the engine from scratch, they would likely stick to using this approach because "it's fast to write and execute and it works".
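The check described above can be sketched in a few lines. This is not Musumaker's actual code, only an illustration of the idea; `classify_line` and the `stop_bgm` command name are hypothetical.

```python
def classify_line(raw: bytes) -> str:
    """Classify a raw CP932 script line by the top bit of its first byte."""
    if not raw:
        return "empty"
    # CP932 lead bytes for kana/kanji all have the most significant bit set.
    return "text" if raw[0] & 0x80 else "command"


print(classify_line("こんにちは".encode("cp932")))  # Japanese dialogue
print(classify_line(b"stop_bgm"))                   # ASCII command
print(classify_line("Hello!".encode("cp932")))      # English dialogue
```

The third call shows why this approach works horribly with English text: a translated line that starts with an ASCII letter looks exactly like a command.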

Fred the Barber · June 25, 2016

UCS-2 is a strict subset of UTF-16, so anything that works for UCS-2 will just work with a UTF-16 interface, as long as there aren't any 4-byte characters or wonky assumptions about the surrogate pair characters, which, really, I wouldn't put past anybody... programmers love taking shortcuts, as your MSB horror story so clearly indicates. I'm guilty of writing some C++ code that stashed a flag in the low bits of a pointer, so that I didn't have to allocate a separate context object just to hold one boolean flag and a pointer. It'll probably work forever, and if something surprising enough ever happens that makes it not work, well, it won't be my problem - I left that company a few months ago :sachi:
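A small sketch of where the 4-byte case comes from, in Python; the helper names `utf16_units` and `is_ucs2_safe` are made up for this example.

```python
def utf16_units(ch: str) -> int:
    """Number of 16-bit code units a single character needs in UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


def is_ucs2_safe(text: str) -> bool:
    """True if no character in text needs a surrogate pair."""
    return all(ord(ch) <= 0xFFFF for ch in text)


print(utf16_units("あ"))           # inside the Basic Multilingual Plane
print(utf16_units("\U0001F600"))   # outside it, so a surrogate pair is needed
print(is_ucs2_safe("こんにちは"), is_ucs2_safe("hi \U0001F600"))
```

Code that assumes one 16-bit unit per character stays correct only as long as `is_ucs2_safe` would return True for its input.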
