
Hello, I am looking for help



Hello! Since I had plenty of free time, I decided to try some new things, and one of them was translating a visual novel.
I'm using this application I found:
https://github.com/xmoezzz/SiglusExtract/
My question: extracting the text produces .ss and .ss.txt files. The .ss.txt files are readable, but the .ss files seem to be encrypted or something, so I can't tell what they say. Yet when I press the Repack button in SiglusExtract, it apparently works with the .ss files, and I don't know what to do with them.
Can someone tell me what to do?


15 hours ago, Entai2965 said:

What visual novel are you trying to translate?

I have plenty of free time and wanted to try something new, so I decided to try translating something.
I'm not really translating it myself; I just use DeepL.
The novel I'm trying is Naisho no Naisho.
It's just a way to pass my free time.
Yesterday I kept researching and found an app that opens and edits the .ss files. With it I made a little progress, but you have to copy and paste line by line. Just thinking about how long these novels are, I can't see myself pressing Ctrl+C, Ctrl+V for every single line.
Isn't there a faster way than going line by line?


On 3/20/2024 at 6:15 AM, zoeebe said:

Yesterday I continued researching and found an app that opens and modifies the .ss, with that I advanced a little but... You have to copy and paste line by line, just thinking about how long the novels are I don't see myself doing CTRL+C CTRL+V for each line.
Isn't there a faster way other than line by line?

What tool is that exactly?

You were right that the game uses the SiglusEngine, so you are on the right track and already did most of the work. Are you able to get the game engine to load the modified text? What about unmodified text?

Here is the VN: https://vndb.org/r28024

The game takes 10-30 hours to read. If you are just trying to learn, I would recommend a smaller title (<2 hrs) to make the project much more manageable for your first attempt at translating something. There are a couple of ways to speed things up, including Translator++ or other translation software like fileTranslate, but those require getting extraction and reinsertion working properly first. Does the game load the modified text?

Have you tried following these instructions? https://github.com/arcusmaximus/VNTranslationTools/issues/55#issuecomment-1157210970

Then translate the spreadsheet using DeepL or other translation software, and reinsert the text. Reinserting is just a matter of putting the translated text back into the files, so write a small script that reads from the spreadsheets and puts the text back before repackaging it with the tools.
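A minimal sketch of such a reinsertion script in Python, assuming a hypothetical two-column spreadsheet saved as CSV (original line, translated line) and assuming each translatable script line matches its original text exactly. Real .ss lines may carry extra syntax around the text, so treat this as a starting point, not a finished tool:

```python
import csv
import io


def load_translations(csv_text):
    """Build an {original: translation} map from CSV text.

    Assumes a hypothetical two-column layout: original line, translated line.
    Rows with an empty translation are skipped, so those lines stay untouched.
    """
    mapping = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) >= 2 and row[1].strip():
            mapping[row[0]] = row[1]
    return mapping


def reinsert(script_text, mapping):
    """Swap in the translation for every script line that exactly matches an
    original string, keeping each line's original line ending."""
    out = []
    for line in script_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        out.append(mapping.get(body, body) + ending)
    return "".join(out)
```

In practice you would read the CSV export and the .ss file from disk, and any translated half-width text would still need the quoting rules from the manual.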

What part are you stuck on exactly?

Edit:

Haha! It works!

[Screenshot: full-Width-Charas-Only.png]

It only works for full-width characters, though. I'm not sure how to get normal half-width ASCII working. Maybe a custom font or something?

Edit2:
According to the manual...
Half-width strings are enclosed in "".

To display \ inside "", write \\; to display a double quotation mark, write ¥".

In other words, wrap normal half-width characters, such as ASCII, in double quotes.

"Hazuki best girl."R


also works and so does

"\"Hazuki best girl.\""R


The manual also gave a few more examples

"「Hello! Siglus」"R
"\"medical herb\"は\\1,000になります。"R


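In the second example, \\ shows up as the yen symbol, since the backslash and ¥ share the same code (0x5C) in Shift-JIS. The line reads roughly: "medical herb" costs ¥1,000.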
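Applying that rule by hand to thousands of lines is where a script helps. A minimal Python sketch, based only on the two escapes quoted from the manual above; the trailing R in the examples comes after the closing quote, so this sketch leaves it for the caller to append:

```python
def to_half_width_string(text):
    """Wrap text as a half-width string literal for a Siglus script.

    Per the manual: a backslash is doubled, and a double quote gets a
    backslash in front of it. Backslashes are escaped first so the ones
    added for quotes are not doubled again.
    """
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return '"' + escaped + '"'


print(to_half_width_string('"Hazuki best girl."') + "R")
# prints: "\"Hazuki best girl.\""R
```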

Are you planning on translating all the images and the config screen? If so, I found this interesting related thread full of all sorts of neat information. Apparently Siglus is a refinement of the RealLive game engine, so many RealLive formats are used in Siglus. The manual makes a reference to this as well.

Edited by Entai2965

19 hours ago, Entai2965 said:

What tool is that exactly?


The application I used came from the description link of this video, which I combined with SiglusExtractor.

As I mentioned, SiglusExtractor makes it easier to extract the .ss files and lets you read them as .ss.txt; in any case, the file the game actually uses is the .ss.
With the app from the video I can edit the .ss files, but only line by line. As you saw, the novel lasts hours. The editing works; the problem is the sheer number of lines, and doing them one by one would take me years.
I'll download the applications from the links you sent me and see if I can do something with them.
For now I only plan to translate the text; I'll decide later whether to do anything else.


Humm. SiglusSceneManager just crashes for me on both Win 10 22H2 and Win 8.1 whenever I try to load a file. Maybe it works on Win 7, with DEP disabled, or with an older .NET Framework version?

I used SiglusSceneDecoder to unpack the *.pck to *.ss and other files, and SiglusCompiler to put everything back together. Instructions here. The manual .chm also works, so you can read the original instructions, but it requires changing the system locale to Japan (Japanese). The manual.txt is actually an HTML file: rename it to .html and open it in a web browser.

The *.ss files are regular Shift-JIS-encoded text files that are meant to be parsed by the SiglusEngine compiler program. You can open them in Notepad++ and edit them with the syntax described in the manual and the examples I provided. That would be about as fast as, or faster than, going line by line in SiglusSceneManager, but it does require formatting the strings yourself. However, you can automate the formatting in Python very easily.

It would take you less time to learn how to automate string parsing in Python than to copy-paste stuff line by line in SiglusSceneManager or Notepad++ for the next few years. Learning Python takes a weekend. But if you can't be bothered, well, I suppose I could do that for you.
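A minimal sketch of reading and writing the .ss scripts from Python, assuming they are Shift-JIS as described above. It uses cp932 (Microsoft's Shift-JIS variant, which Windows Japanese software generally uses) and newline="" so the original line endings survive the round trip. As far as I know, accented letters such as é and ñ are not in cp932, which matters for a Spanish translation, so unencodable_chars lists such characters before you try to write a file:

```python
# cp932 is Microsoft's variant of Shift-JIS, the one Windows Japanese software uses.
ENCODING = "cp932"


def read_ss(path):
    """Read a .ss script as text; newline="" keeps the original line endings."""
    with open(path, encoding=ENCODING, newline="") as f:
        return f.read()


def write_ss(path, text):
    """Write a .ss script back in the same encoding, without translating newlines."""
    with open(path, "w", encoding=ENCODING, newline="") as f:
        f.write(text)


def unencodable_chars(text):
    """List the characters in text that cp932 cannot store, in order of first use."""
    bad = []
    for ch in text:
        try:
            ch.encode(ENCODING)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad
```

Calling unencodable_chars on the new text before write_ss turns a crash halfway through a batch into a list of characters to replace first.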

Want some csv files?

Edit: I have been looking at this software recently: fileTranslate.

Instead of parsing the scripts line by line as usual, it uses regular expressions, which may be more or less intimidating than learning Python, idk. Use https://regex101.com/ to test the expression. Example.

Once you have a regex that is correct and bug-free, fileTranslate can use it to extract all the strings into .csv files. Then you can batch translate them using either Sugoi or Translator++. Translator++'s DeepL translator does not work as far as I can tell, but it might if you give the developer enough money, idk. I have not tested it much since I mostly use Sugoi through the repackage I created for batch translations. Translator++ 5.3B public also supports Sugoi, including the repackage.
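For comparison, a minimal sketch of the same regex-to-.csv idea in plain Python rather than fileTranslate. The pattern is an assumption: it simply treats any line containing hiragana, katakana, or common kanji as translatable, which will also catch script commands that happen to contain Japanese, and the four-column layout is hypothetical:

```python
import csv
import io
import re

# Heuristic: hiragana/katakana (U+3040-U+30FF) or common CJK ideographs (U+4E00-U+9FFF).
JAPANESE = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def extract_lines(script_text):
    """Yield (line_number, line) for every line containing Japanese text."""
    for number, line in enumerate(script_text.splitlines(), start=1):
        if JAPANESE.search(line):
            yield number, line


def lines_to_csv(file_name, script_text):
    """Return CSV text with hypothetical columns: file, line, original, translation."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for number, line in extract_lines(script_text):
        writer.writerow([file_name, number, line, ""])
    return buffer.getvalue()
```

Testing the pattern on regex101 first, as suggested above, is still the quickest way to see what it would miss or over-match.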

Edited by Entai2965

21 hours ago, Entai2965 said:

…out the right regex, and it is also bug fr…

 

22 hours ago, Entai2965 said:

Humm. SiglusSceneManager just crashes for me on both Win 10 22H2 and Win 8.1 whenever I try to load a file. Maybe it works on Win 7 or with DEP disabled or an older .Net Framework version?


OK, thank you very much. I downloaded everything from the links you gave me; I'll see what I can do with them.
One of the tools I got earlier extracts the data to an Excel file, but I don't know how to turn the Excel file back into an .ss.
I'm going to keep investigating with the apps from before and the new ones.
I'll see what I can do. Thank you very much.


14 hours ago, zoeebe said:

Ok, thank you very much, download all the links you provided me, I'll see what I do with them.
One of the ones I got before extracts the data to an Excel file, but I don't know how to turn the Excel into an .ss
I'm going to continue investigating with the apps from before and now
I'll see what I do, thank you very much

Sometimes I question my sanity, but then I realize I never had such a thing in the first place, and the question stops bothering me. Anyway, instead of eating today, I wrote a tool that automates that process for this specific game.

Naisho no Naisho Translation Instructions

0. Back up the original Gameexe.dat and Scene.pck files.
1. Use SiglusSceneDecoder to extract the .pck:
https://codeberg.org/VisualArts/SiglusEngineOfficialKit/src/branch/main/SiglusTools
https://archive.org/download/siglusdevkit
2. Use NaishiNoNaisho_ExtractInsertTool.exe to extract the translatable strings from the *.ss files to .csv.
- For syntax help, type NaishiNoNaisho_ExtractInsertTool.exe --help
3. Translate and edit the text using OpenOffice, LibreOffice, or ONLYOFFICE. Put the translated entries in the 4th column.
4. Use NaishiNoNaisho_ExtractInsertTool.exe to insert the translated text from the .csv files into the .ss files.
- For syntax help, type NaishiNoNaisho_ExtractInsertTool.exe --help
5. Use SiglusCompiler, also included in SiglusTools, to repackage the .ss files back into Scene.pck.
6. Move the Scene.pck and Gameexe.dat files to the root directory of the game.
7. Repeat from step 3 until every .csv is fully translated and edited.
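Step 0 matters more than it looks, because step 7 loops back and step 6 overwrites the game's Scene.pck and Gameexe.dat each time. A minimal Python sketch of a backup that is safe to rerun, assuming both files sit directly in the game folder: it never overwrites an existing backup, so a later run cannot replace the pristine originals with patched files.

```python
import shutil
from pathlib import Path

FILES_TO_BACK_UP = ("Gameexe.dat", "Scene.pck")


def back_up_game_files(game_dir, backup_dir):
    """Copy the original game files into backup_dir, skipping any file that
    is already backed up. Returns the names that were copied this time."""
    game_dir, backup_dir = Path(game_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for name in FILES_TO_BACK_UP:
        target = backup_dir / name
        if not target.exists():
            shutil.copy2(game_dir / name, target)
            copied.append(name)
    return copied
```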

NaishiNoNaisho_ExtractInsertTool.exe can be downloaded from here: https://www.mediafire.com/file/8f9vh8ivjox6dbt

The tool is in the tools subfolder. When exporting the .csv files from your office suite, make sure the text fields are not quoted. In OpenOffice, I had to select Text CSV -> Edit filter settings and use:

Character set - UTF-8
Field delimiter - , (comma)
Text delimiter - blank (nothing, empty cell)
Quote all text cells - unchecked
Save cell content as shown - checked
Fixed column width - unchecked
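For anyone filling the 4th column with a script instead of an office suite, a minimal Python sketch that writes a .csv with those same settings (UTF-8, comma-delimited, no quoting). How the tool treats a comma inside a field is not something I know, so this sketch refuses such fields rather than guessing; Spanish text will often contain commas, so check the tool's behavior first. The "\n" line ending is also an assumption.

```python
def write_unquoted_csv(path, rows):
    """Write rows as UTF-8, comma-delimited text with no quoting at all.

    An unquoted field cannot hold a comma or a line break without shifting
    the columns, so every field is checked before anything is written.
    """
    rows = [[str(field) for field in row] for row in rows]
    for row_number, row in enumerate(rows, start=1):
        for field in row:
            if "," in field or "\n" in field or "\r" in field:
                raise ValueError(f"row {row_number}: {field!r} would need quoting")
    # "\n" line endings are an assumption; the tool may expect "\r\n".
    with open(path, "w", encoding="utf-8", newline="") as f:
        for row in rows:
            f.write(",".join(row) + "\n")
```

Checking every row before opening the file means a bad field never leaves a half-written .csv behind.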

Since I was doing this anyway and needed to debug it, the link above also contains a partial English patch covering just the first day of Naisho no Naisho, to prove that I am definitely not insane.

Edit: The disclaimer below applies only when using the .bat. It does not apply when running the tool .exe directly, since the real file name is always used then.

The scripts.bat will only work if at least one of the following is true:
1. You are using a relatively modern version of Windows 10 or later (~1809+) whose console properly supports native UTF-8.
2. The system locale is set to Japanese (Japan), which makes the console use code page 932 (cp932) so Japanese characters can be displayed. You can check the current code page by typing chcp in a command prompt window.

There are some console encoding restrictions to keep in mind. The *.ss files have Unicode file names, and the .bat's for loop enumerates them, so the console must be able to display those names for them to be processed correctly. When the console code page cannot represent a character, the enumeration replaces it with ?, and executables outside the command prompt receive that ? literally instead of the real character. Example: dir * /b >> output.txt

With a code page like cp437, this is not an issue when running the tool .exe directly, since there is no enumeration, just a straight handoff of the file name to the executable. More information. However, enumerating the names with something like dir *.ss /b forces them into the current code page, so undisplayable characters are actually changed to ? (not just displayed as ?), and the result is no longer the real file name. The command prompt's own commands can convert them back, but handing a string like 7?28?.ss to an executable means the program treats '7?28?.ss' as the literal name, which is wrong.
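For anyone scripting this themselves instead of using the .bat, a minimal Python sketch that sidesteps the ? problem: Python asks Windows for file names through its Unicode API, and subprocess hands a list of arguments to CreateProcessW, so the real names never pass through the console code page. The tool's actual arguments are not reproduced here; command_prefix is a hypothetical placeholder to fill in from its --help output.

```python
from pathlib import Path


def list_ss_files(folder):
    """Return the .ss files in folder, sorted, with their real Unicode names."""
    return sorted(p for p in Path(folder).glob("*.ss") if p.is_file())


def build_commands(command_prefix, files):
    """Build one argument list per file.

    command_prefix is a hypothetical placeholder for the tool path plus
    whatever arguments its --help describes; the file name is appended as-is.
    """
    return [list(command_prefix) + [str(path)] for path in files]
```

Each resulting list can then be passed to subprocess.run(command, check=True); passing a list instead of one command string avoids any extra quoting or re-encoding by a shell.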

Edited by Entai2965

12 hours ago, Entai2965 said:

Sometimes I question my sanity, but then I realize I never had such a thing in the first place and the question ceases to bother me. Anyway, instead of eating food today, here is a tool I wrote to automate that process unique to this specific game.


Wow, incredible.
A dedicated application and everything, and you made it yourself, right? Thank you so much.
Well then, I'll see what I do with it. The truth is that yesterday, while investigating SiglusTools_v0.61, I found a way to convert Excel to .ss. The problem was that the translator I used on the whole document even translated the sheet names, so SiglusTools_v0.61 could not find the .ss file corresponding to each translated sheet, since it was looking for a file with the translated name.
By the way, I already got it all done. I'm writing to you through Google Translate, since my language is Spanish, and I now have the entire game in Spanish.
It's poorly translated, as expected from a translation app, but at least it's more understandable than Japanese, which is just symbols I don't understand.
Some small issues:
The text overflows the choice boxes (I don't mind as long as I can read it).
Some texts caused problems because they were too long, so I used a Python script to shorten them: every run of 6 repeated letters becomes 5, and once no run of 6 is left, runs of 5 become 4 followed by "...".
Example:
aaaaaa -> aaaaa -> aaaa...
Also, the character Ren's name was translated as "Koi" or as the word "love"; I fixed that with another script too.
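The name-fixing script itself isn't shown in the thread, so here is only a minimal sketch of that kind of fix, assuming a hypothetical glossary. It replaces whole words only, so "Koi" becomes "Ren" while a longer word such as "Koichi" is left alone. A real word like "love" cannot safely go in such a glossary, since it also appears with its normal meaning.

```python
import re

# Hypothetical glossary: wrong rendering -> intended character name.
NAME_FIXES = {"Koi": "Ren"}


def fix_names(text, fixes=NAME_FIXES):
    """Replace whole-word, case-sensitive occurrences of each wrong name."""
    for wrong, right in fixes.items():
        pattern = r"\b" + re.escape(wrong) + r"\b"
        # A function replacement inserts `right` literally, even if it has backslashes.
        text = re.sub(pattern, lambda match: right, text)
    return text


print(fix_names("Koi said hi to Koichi."))  # prints: Ren said hi to Koichi.
```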
Anyway, I have everything now. Even poorly translated, it works for me; at least I understand more than I would in Japanese.
Now I'll start playing the novel.
Thank you very much for the help.

Don't worry about sometimes questioning your sanity, brother; we all do, especially people like me who play loliges.
Thank you very much. I've now learned how to make a translated version, even a poor one, of SiglusEngine novels. It's a fairly widely used engine, so this lets me do the same with other novels I find that use it.


Are you planning on editing the text and publishing your patch here or translating the UI?

Normally, if you just want to play novels with machine translation (MTL) without patching them, there are tools like Textractor, and a few others of questionable quality or origin, that can machine translate them as-is, along with repackages like the Sugoi Toolkit that make them more user-friendly. There are also hook codes in the vndb discussions for hooking this game properly, and getting DeepL working should dramatically improve the translation quality.

If you are not planning to release a Spanish patch, then I would recommend Textractor so you can use it for future titles.

Edited by Entai2965

53 minutes ago, Entai2965 said:

Are you planning on editing the text and publishing your patch here or translating the UI?

Normally, if you just want to play novels as MTL without translating them, there are tools like textractor, and a few others of questionable quality or origin, that can MTL them as-is along with repackages like the Sugoi Toolkit to make them more user friendly. There are some codes on vndb's discussions for hooking to this game properly as well and figuring out how to get DeepL working should dramatically increase the translation quality.

If you are not planning to release a Spanish patch, then I would recommend Textractor so you can use it for future titles.

My plan is to play the novel. I have no reason not to share the translation, but it was done with a machine translator and is therefore poor. If I upload it, I suppose people will complain; I'm not complaining because I made it myself, for myself, knowing it would turn out badly.
I know how to use Textractor, but that option is not feasible for me. I'm from Cuba, where the internet is very expensive and DeepL is blocked, so I need a VPN to open it, which increases my data usage.
That's why I prefer to translate the whole novel up front, even badly, and play it offline with that patch.


Textractor has a couple of plugins that help with offline translation, available here. There are a few different options, including some paid software that does offline-only translation.

For Spanish in particular, the default Sugoi model is only for JPN->ENG. I am not sure whether JPN->ENG->SPN or JPN->SPN would give better quality. Translation software in general is very well developed for English, so it could go either way, even if pivot translations like that are not ideal.

DeepL being blocked is uh... evil. I doubt Cuba is the one doing it, though; maybe DeepL has to comply with sanctions and trade restrictions in its legal jurisdiction or something?

Edited by Entai2965

On 3/23/2024 at 4:54 PM, Entai2965 said:

Textractor has a couple plugins that help with offline translation available here. There are a couple different options there including some paid software that does offline only translation.

For Spanish in particular, the default Sugoi model is only for JPN->ENG. I am not sure if it would be better or worse quality to do JPN->ENG->SPN or JPN->SPN. Translation software in general is very well developed for English, so it could go either way even if proxy translations like that are not ideal.

DeepL being blocked is uh... evil. I doubt Cuba is doing it though, but maybe DeepL has to comply with sanctions and trade restrictions in its legal jurisdiction or something?

Yes, I suppose it is due to legal issues that block Cuba from DeepL.
English is used worldwide, and I understand it moderately well and can deduce the rest of a sentence. Sometimes... While playing Summer Pockets I never knew what "Fried rice" was and had to google it to learn it was "Arroz frito". I can deduce some words, but when they talk about food it could be anything you eat.
The problem with the offline translator is... I don't know how to download it.
The link goes to Patreon, which is also blocked, but nothing a VPN can't solve...
So where exactly do I download the offline translator from?
I see a lot of posts ordered by date. Where exactly is the offline translator? I also see that some of them require payment to open, and paying is not an option for someone from Cuba; our currency has no value abroad.


4 hours ago, zoeebe said:

The problem with the offline translator is... I don't know how to download it.
The link goes to Patreon, which is also blocked, but nothing that a VPN can solve...
The problem is where exactly do I download the Offline Translator from?
I see a lot of publications, and they are ordered by date, where exactly is the offline translator? and I see that some you have to pay to open them and pay... it is not an option for someone from Cuba, our currency has no value abroad.

The page I pointed to lists 3 options for offline translation: Sugoi, LEC, and Atlas. Once the Textractor plugin is downloaded, the software is available.

1. Sugoi: I made a repackage available here, hosted on Mega, with additional links to the latest model on other hosting providers. The licenses for everything are at the bottom. The software itself is open source, and the model is redistributable, but only under a non-commercial license. The software needed to run it that went into the repackage is available from GitHub; see the thread for links. Mega and GitHub are not blocked, right?

2. LEC might refer to LEC Power Translator. Description - Power Translator is advanced machine translation software capable of translating text, chat, emails, web pages, documents, pdfs, jpgs and scans in English to/from Brazilian, French, German, Italian, Portuguese, Spanish, and Russian. Commercial software. Not redistributable.

3. Atlas might refer to Fujitsu's ATLAS which has been discontinued. Commercial software. Not redistributable.

Edit: In order to produce the patch, I used the UI in my repackage.

But if you just want the Sugoi Toolkit to work offline, you can download it from the links I provided in the "Sugoi Toolkit" section. If Patreon is blocked, perhaps the Internet Archive link works? There is also this pixeldrain link for Sugoi Toolkit 8.0 that was posted on F95zone.

Edit2: Oh! I forgot to mention I solved that repeating character issue. By 'I solved' I mean that I asked an expert and had them do it for me and then I copied their regex. Here is a demo at regex101.

In regex terms:

1. Find the letters [a-z] (case-insensitive) in the string.
2. For any letter that repeats more than 3 times in a row,
3. reduce the run to a maximum of 3 repetitions.

__code__

([a-z])\1\1\1+/gi, $1$1$1

__code__

So in JavaScript, it would be...

__code__

const regex = /([a-z])\1\1\1+/gi;

// Alternative syntax using RegExp constructor
// const regex = new RegExp('([a-z])\\1\\1\\1+', 'gi')

const str = "fahuo aaaa dislgjsd oooodfuabfaaaaaagdggs iiiii \n" +
  "AAAAAAAA BBBBBBBB hhfasfuahaaaaaHHHHHHH OOOHHHHHHHH";
const subst = `$1$1$1`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);
__code__

In Python it would be...

__code__

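try:
    # the third-party regex module, if installed
    import regex
except ImportError:
    # otherwise fall back to the standard library: https://docs.python.org/3/library/re.html
    import re as regex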

my_string='fahuo aaaa dislgjsd oooodfuabfaaaaaagAAAAAAdggs iiiii'

#(?i) means case insensitive and global is enabled by default for re.sub
new_string = regex.sub( r'(?i)([a-z])\1\1\1+', r'\1\1\1', my_string)

print(new_string)
#prints: fahuo aaa dislgjsd ooodfuabfaaagAAAdggs iii

__code__

Edited by Entai2965

On 3/25/2024 at 1:11 PM, Entai2965 said:

The page I pointed to lists 3 options for offline translation. Sugoi, LEC, and Atlas. Once the Textractor plugin is downloaded, the software is available

1. Sugoi has a repackage I made available here which is available on Mega and has additional links for the latest model on other hosting providers. The licenses for everything are at the bottom. The software itself is open source, but the model itself is redistributable but only under a non-commercial license. The software required to run it that went into the repackage is available from github. See the thread for links. Mega and github are not blocked right?


Ohh, thanks for the Mega and Internet Archive link
Yes, those two sites are not blocked, Mega does have the little detail that it has a free download limit of only 4GB per IP (to download VN I jump between public PCs)
I'll download them later, today I got sleepy walking around with screens (a lot of vice) (I'm writing in Spanish and translating with Google, if you don't understand what it says, then I don't either, keep alternating between ¨very addictive¨ and ¨a lot of vice¨ that thing, let's see what it turns out to be...)

Anyway, I'll download it then, and I'll save that regex for the repetitions, because you can never have too much code.
But I also wrote this Python script that shortens the dialogue for me.

My script works as follows:
1- It must be run in a folder containing the Excel (.xlsx) files, and none of them can be open in Excel, or it will fail.
2- It goes through every cell of every sheet in each workbook.
3- It looks for any letter from A to Z repeated 6 times in a row and shortens it to 5, repeating this until no run of 6 is left.
4- It then replaces letters repeated 5 times with 4 followed by an ellipsis (...).


__code__

import os
import openpyxl
import re

# Replace runs of six identical letters with five
def replace_six_by_five(cell_value):
    pattern = re.compile(r'([a-z])\1{5}', re.IGNORECASE)
    while pattern.search(cell_value):
        cell_value = pattern.sub(r'\1\1\1\1\1', cell_value)
    return cell_value

# Replace runs of five identical letters with four followed by '...'
def replace_five_by_four(cell_value):
    return re.sub(r'([a-z])\1{4}', r'\1\1\1\1...', cell_value, flags=re.IGNORECASE)

# Process every .xlsx file in the current folder
def process_excel_files():
    excel_files = [f for f in os.listdir('.') if f.endswith('.xlsx')]
    for file in excel_files:
        workbook = openpyxl.load_workbook(file)
        for sheet in workbook.worksheets:
            for row in sheet.iter_rows():
                for cell in row:
                    if isinstance(cell.value, str):
                        cell.value = replace_six_by_five(cell.value)
                        cell.value = replace_five_by_four(cell.value)
        workbook.save(file)
        print(f'File {file} processed.')

# Run the script
process_excel_files()

__code__
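Step 3's loop (shrink six to five, repeat until nothing matches) ends up with the same result as a single substitution using an open-ended quantifier, `{5,}`, which grabs a run of six or more in one go. A minimal sketch comparing the two on plain strings, no Excel involved; `reduce_loop` mirrors the logic of the two functions above, and `reduce_single_pass` is an illustrative rewrite, not part of the original script.

```python
import re

SIX = re.compile(r'([a-z])\1{5}', re.IGNORECASE)           # exactly six in a row
SIX_OR_MORE = re.compile(r'([a-z])\1{5,}', re.IGNORECASE)  # six or more in a row
FIVE = re.compile(r'([a-z])\1{4}', re.IGNORECASE)          # five in a row


def reduce_loop(text):
    """Same logic as the script: 6 -> 5 until no run of 6 is left, then 5 -> 4 + '...'."""
    while SIX.search(text):
        text = SIX.sub(r'\1\1\1\1\1', text)
    return FIVE.sub(r'\1\1\1\1...', text)


def reduce_single_pass(text):
    """Same result without the loop: any run of 6+ collapses to 5 in one substitution."""
    text = SIX_OR_MORE.sub(r'\1\1\1\1\1', text)
    return FIVE.sub(r'\1\1\1\1...', text)


sample = 'Nooooooo! Ahhhhh yes'
print(reduce_loop(sample))
print(reduce_single_pass(sample))
```

The loop version works fine; the single pass just avoids rescanning the text once per extra letter in very long runs.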


Here is the updated script that merges the code I wrote and yours. I have not tested it because running files that do anything without user input terrifies me.


__code__

import os
import openpyxl
try:
    import regex
except ImportError:
    # https://docs.python.org/3/library/re.html
    import re as regex

# Function to replace repeating letters to a maximum of 3
def fix_repeats(cell_value):
    # regex.IGNORECASE exists in both the third-party 'regex' module and 're'
    return regex.sub(r'([a-z])\1\1\1+', r'\1\1\1', cell_value, flags=regex.IGNORECASE)  # for ellipses, replace with r'\1\1\1...'

# Function to process Excel files
def process_excel_files():
    excel_files = [f for f in os.listdir('.') if f.endswith('.xlsx')]
    for file in excel_files:
        workbook = openpyxl.load_workbook(file)
        for sheet in workbook.worksheets:
            for row in sheet.iter_rows():
                for cell in row:
                    if isinstance(cell.value, str):
                        cell.value = fix_repeats(cell.value)
        workbook.save(file)
        print(f'File {file} processed.')

# Run the script
process_excel_files()

__code__


The *.xlsx file enumeration code is very interesting. It only uses the os standard library instead of glob, which most people use. It is written as a list comprehension (functional-programming style) though, so no clue how it works!
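The enumeration line is a list comprehension: it loops over `os.listdir('.')` and keeps only the names that pass the `if` test. A minimal sketch showing the same thing as a plain loop and with `glob`, plus one hedged extra: while a workbook is open, Excel usually creates a hidden lock file named `~$<name>.xlsx` next to it, which also ends in `.xlsx` but is not a real workbook, so skipping names starting with `~$` may avoid one of the "fails if Excel is open" errors. That lock-file filter is an assumption added here, not part of either script, and the function names are hypothetical.

```python
import glob
import os


def xlsx_files_comprehension(folder='.'):
    # Same as the script: keep every name in the folder that ends with '.xlsx'.
    return [f for f in os.listdir(folder) if f.endswith('.xlsx')]


def xlsx_files_loop(folder='.'):
    # The comprehension above, written out as an ordinary for-loop.
    result = []
    for f in os.listdir(folder):
        if f.endswith('.xlsx'):
            result.append(f)
    return result


def xlsx_files_glob(folder='.'):
    # glob returns paths that include the folder, so keep only the file names.
    pattern = os.path.join(glob.escape(folder), '*.xlsx')
    return [os.path.basename(p) for p in glob.glob(pattern)]


def xlsx_files_skip_locks(folder='.'):
    # Assumption: skip Excel's '~$' lock files, which exist while a workbook is open.
    return [f for f in xlsx_files_loop(folder) if not f.startswith('~$')]
```

One small difference: `glob` skips names starting with a dot and, on Windows, matches `.XLSX` too, while `endswith('.xlsx')` is always case-sensitive.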

Edited by Entai2965

On 3/26/2024 at 10:15 PM, Entai2965 said:

Here is the updated script that merges the code I wrote and yours. I have not tested it because running files that do anything without user input terrifies me.

 

The *.xlsx file enumeration code is very interesting. It only uses the os standard library instead of glob, which most people use. It is written as a list comprehension (functional-programming style) though, so no clue how it works!

Honestly, I don't understand the code you gave me very well either.
I know re is used to replace text. I've used it before in a script that downloads the pages of a hentai manga into a CBZ file: the script captures each page's name in a variable, and then re removes the characters that are not allowed in file names, because otherwise the script fails when it tries to save the file.
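For reference, that kind of filename clean-up usually comes down to one `re.sub` with a character class. A minimal sketch, assuming the files are saved on Windows, where `< > : " / \ | ? *` and control characters are not allowed in file names and a name cannot end with a space or a dot; it does not handle reserved names like `CON` or `NUL`. `safe_filename` is a hypothetical name, not the function from zoeebe's script.

```python
import re

# Characters Windows does not allow in file names, plus ASCII control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def safe_filename(name, replacement=''):
    """Remove (or replace) characters that would make saving the file fail on Windows."""
    cleaned = INVALID_CHARS.sub(replacement, name)
    # Windows also rejects names that end with a dot or a space.
    cleaned = cleaned.rstrip(' .')
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or 'untitled'


print(safe_filename('Page 1: "Intro"?'))     # Page 1 Intro
print(safe_filename('ch01/p02.jpg', '_'))    # ch01_p02.jpg
```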
The code you gave me is in both JavaScript and Python.
I know a little Python, so this is the part I looked at:

__code__

my_string='fahuo aaaa dislgjsd oooodfuabfaaaaaagAAAAAAdggs iiiii'
#(?i) means case insensitive and global is enabled by default for re.sub
new_string = regex.sub( r'(?i)([a-z])\1\1\1+', r'\1\1\1', my_string)

__code__

I think this takes the my_string variable and creates new_string with the same value but with the repetitions reduced. I'm not sure, though, since it uses regex.sub instead of re.sub and I don't know if that makes a difference. Also, I see that it acts on a variable written directly in the code, not on text loaded from somewhere else.
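On that last point: `my_string` is only sample input, and the same `sub` call works on text read from anywhere; `re.sub` and the third-party `regex.sub` behave the same for this pattern. A minimal sketch that applies it line by line to a text file, for example one of the .ss.txt files SiglusExtract produces. The UTF-8 default is an assumption, so pass whatever encoding your files actually use; the result goes to a separate file so the original stays untouched. The function names are hypothetical.

```python
import re

REPEATS = re.compile(r'([a-z])\1\1\1+', re.IGNORECASE)


def fix_repeats(text):
    # Same pattern as above: cap runs of one letter at 3.
    return REPEATS.sub(r'\1\1\1', text)


def fix_text_file(input_path, output_path, encoding='utf-8'):
    """Fix every line of input_path and write the result to output_path.

    newline='' keeps the original line endings (for example '\\r\\n') unchanged.
    Returns the number of lines processed.
    """
    with open(input_path, encoding=encoding, newline='') as src:
        lines = src.readlines()
    with open(output_path, 'w', encoding=encoding, newline='') as dst:
        dst.writelines(fix_repeats(line) for line in lines)
    return len(lines)
```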

My script, on the other hand, runs automatically.
Just create a file.py in the folder with the Excel files, open it with a text editor, and paste my script into it.
Then just double-click it and it will reduce the repetitions in every Excel file in the folder (if any of those files is open in Excel, it will fail).
Keep in mind that you need both Python and the libraries the script uses installed on your computer.
I think you only need openpyxl, because re is part of Python's standard library.
Install it with Windows+R, type cmd, then run: pip install openpyxl
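When a script is started by double-clicking, its console window usually closes as soon as it exits, so a missing library can look like nothing happened. A minimal sketch that checks for the modules up front with `importlib.util.find_spec` and prints the matching `pip install` command; the module lists (`openpyxl` required, `regex` optional) are an assumption based on the scripts in this thread.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found in this Python installation."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_requirements(required=('openpyxl',), optional=('regex',)):
    """Return True if every required module is installed; otherwise explain and pause."""
    for name in missing_modules(optional):
        print(f'Optional module {name!r} not found; the merged script falls back to re.')
    missing = missing_modules(required)
    if missing:
        print('Missing required modules. Install them with:')
        print('    pip install ' + ' '.join(missing))
        input('Press Enter to close this window...')  # keeps a double-clicked window open
        return False
    return True
```

Calling `check_requirements()` at the top of the script, and stopping if it returns `False`, keeps the window open long enough to read the message.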

Edited by zoeebe

The merged script does both. It uses your code to act on all files in a directory, which is more than just mildly terrifying, and it uses the regex/re import fallback to pick the best library available, with the simplified pattern that reduces runs of any ASCII letter in those .xlsx files to a maximum of 3.
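If acting on every file without asking is the scary part, one option is a preview-first version: collect every change, print it, and only save when explicitly asked to. A minimal sketch of that idea; `planned_changes`, `process_folder`, and the `apply` switch are hypothetical names, and openpyxl is imported inside the function that needs it so the string-only parts can be tried without it. Besides the openpyxl calls the scripts above already use (`load_workbook`, `worksheets`, `iter_rows`, `save`), it only uses `sheet.title` and `cell.coordinate`. Skipping `~$` lock files is an added assumption.

```python
import os
import re

REPEATS = re.compile(r'([a-z])\1{3,}', re.IGNORECASE)


def fix_repeats(text):
    # Cap runs of one letter (ignoring case) at 3, like the merged script.
    return REPEATS.sub(r'\1\1\1', text)


def planned_changes(cells):
    """Given (label, value) pairs, return (label, old, new) for values that would change."""
    changes = []
    for label, value in cells:
        if isinstance(value, str):
            new = fix_repeats(value)
            if new != value:
                changes.append((label, value, new))
    return changes


def process_folder(folder='.', apply=False):
    """Print every planned change; only save the workbooks when apply=True."""
    import openpyxl  # third-party; only needed when real workbooks are processed

    for name in sorted(os.listdir(folder)):
        if not name.endswith('.xlsx') or name.startswith('~$'):
            continue
        path = os.path.join(folder, name)
        workbook = openpyxl.load_workbook(path)
        cells = {}
        for sheet in workbook.worksheets:
            for row in sheet.iter_rows():
                for cell in row:
                    cells[f'{sheet.title}!{cell.coordinate}'] = cell
        changes = planned_changes((label, cell.value) for label, cell in cells.items())
        for label, old, new in changes:
            print(f'{name} {label}: {old!r} -> {new!r}')
        if apply and changes:
            for label, _old, new in changes:
                cells[label].value = new
            workbook.save(path)
            print(f'{name}: saved {len(changes)} change(s).')
```

Running `process_folder('.')` first only prints what would change; `process_folder('.', apply=True)` saves once the preview looks right.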

According to the Python documentation, 'regex' is a third-party library with an API compatible with the standard library regular expression module 're'. If the third-party 'regex' library is not available, 're' is imported under the name 'regex' instead, so the rest of the code does not have to care which one it got. You can install regex with 'pip install regex'.
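To check which of the two libraries the fallback actually picked, print the module's `__name__`: it stays `'re'` even when the standard module is imported under the name `regex`. A small sketch; catching `ImportError` rather than using a bare `except:` keeps unrelated errors from being silently swallowed.

```python
try:
    import regex  # third-party module, installed with: pip install regex
except ImportError:
    import re as regex  # standard library fallback, bound to the same name

# Prints 'regex' if the third-party module is installed, otherwise 're'.
print('Using module:', regex.__name__)

# Either way, this call works the same for this pattern.
print(regex.sub(r'([a-z])\1\1\1+', r'\1\1\1', 'Heyyyyy', flags=regex.IGNORECASE))
```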

If you copy/paste the merged code, it should work. It is just the simplified version of the code you wrote earlier for dealing with repetitions in Sugoi. Also, did you ever get Textractor to work with Sugoi or another translation extension for local translation?
