Find Out a File's Encoding

Version: 20180502

Universal Cyrillic decoder


About the program

Welcome! You may find this site useful if you have received texts that you believe are written in the Cyrillic alphabet but are displayed as a strange combination of bizarre characters. The program tries to guess the encoding. If it cannot, it shows samples of all encoding combinations so that you can select the right one.
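
The core idea is to guess which pair of encodings produced the garbled text. The Python sketch below illustrates it under stated assumptions and is not the decoder's actual implementation. It assumes a single level of damage: the text was encoded as `src` and then wrongly decoded as `dst`. It tests only five codecs and ranks the candidates with a hypothetical `cyrillic_score` heuristic, which is the share of non-space, non-punctuation characters that are Russian letters.

```python
import string
from itertools import permutations

# Assumption: a small, hypothetical candidate list; the real decoder tests far more.
CANDIDATES = ["utf-8", "cp1251", "koi8_r", "cp866", "iso8859_5"]

RUSSIAN = set("абвгдеёжзийклмнопрстуфхцчшщъыьэюя")
RUSSIAN |= {ch.upper() for ch in RUSSIAN}


def cyrillic_score(text):
    """Share of non-space, non-ASCII-punctuation characters that are Russian letters."""
    significant = [ch for ch in text if not ch.isspace() and ch not in string.punctuation]
    if not significant:
        return 0.0
    return sum(ch in RUSSIAN for ch in significant) / len(significant)


def candidates(garbled):
    """Yield (score, src, dst, fixed) for every pair that can undo one level of damage."""
    for src, dst in permutations(CANDIDATES, 2):
        try:
            # Re-create the original bytes with the wrong codec, then read them correctly.
            fixed = garbled.encode(dst).decode(src)
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this pair cannot explain the garbling
        yield cyrillic_score(fixed), src, dst, fixed


def best_guess(garbled):
    """Return the highest-scoring candidate, or None if no pair applies."""
    return max(candidates(garbled), default=None)


garbled = "Привет, мир".encode("utf-8").decode("cp1251")  # typical mojibake
print(best_guess(garbled))  # (1.0, 'utf-8', 'cp1251', 'Привет, мир')
```

A real decoder would also try chains of two or three such steps and show the lower-ranked candidates, as the page describes.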

How to

  • Paste the text to decipher into the big text area. Only the first few words are analysed, so they should be (scrambled) text that is supposed to be Cyrillic.
  • The program will try to decipher the text and print the result below.
  • If the conversion is successful, you will see the text in Cyrillic characters and can copy and save it if it is important.
  • If the conversion is not successful (the text is still not in Cyrillic but in the same or other unintelligible characters), choose the Cyrillic variant from the newly created drop-down list (if there is more than one, select the longest). Press OK to convert the text.
  • If the text is not fully converted, try the other Cyrillic variants from the drop-down list.

Limits

  • If your text contains question marks ("???? ?? ??????"), the problem is on the sender's side and no recovery is possible. Ask the sender to resend the text, possibly as a plain text file or in LibreOffice/OpenOffice/MS Office format.
  • There is no guarantee that every text can be deciphered, even if you are certain that it is in Cyrillic.
  • The analysed and converted text is limited to 100 KiB.
  • 100% accuracy is not always achieved. When converting from one code page to another, some characters may be lost, such as Bulgarian quotation marks or, rarely, individual letters. Some of this depends on how your Windows clipboard handles characters.
  • The program tries at most 6321 variants, in two or three levels. If the text has gone through a longer chain of encodings, that chain will not be detected or tested. Usually between 32 and 255 possibly correct variants are displayed.
  • If one part of the text is encoded with one code page and another part with a different code page, the program can recognize only one of the parts at a time.
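
A short Python sketch shows why "???? ?? ??????" cannot be recovered. The sketch assumes that the sender's software converted the text to a code page without Cyrillic letters (Latin-1 here) and replaced every unmappable character with "?". This collapses every letter into the same byte.

```python
original = "Привет, мир"

# Assumption: the sender's program wrote the text in Latin-1, which has no
# Cyrillic letters, and substituted "?" for each character it could not map.
sent = original.encode("latin-1", errors="replace")
print(sent)  # b'??????, ???'

# Every letter became byte 0x3F, so no choice of decoding can bring them back.
for codec in ("cp1251", "koi8_r", "utf-8"):
    print(codec, sent.decode(codec))  # always '??????, ???'
```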

Terms of use

Please note that this freeware program is provided in the hope that it will be useful, but without any warranty, not even an implied warranty of fitness for a particular purpose. Use it at your own risk.

If you have very long texts to convert, please make sure you keep a backup copy.

What’s new

  • October 2017 : Added a "Select all / Copy" button.
  • July 2016 : SSL certificate installed. You can now access the Decoder over a secure connection.
  • October 2013 : I am trying various optimizations that should make the decoder run faster and handle more text. If you notice any problem, please notify me ASAP.
  • March 2013 : My hosting provider warned me that the Decoder was using too much server CPU and that its processes had been killed more than 100 times. I am making changes so that the program uses less CPU, especially when reposting a previously sampled text. However, the decoded form may load somewhat more slowly. Please contact me if you have any difficulties using the program.
  • 2012-08-09 : Added French translation, thanks to Arnaud D.
  • 2011-03-06 : Added Belarusian translation, thanks to Зыль and Aliaksandr Hliakau.
  • 31.07.10 : Added Serbian translation, thanks to Miodrag Danilovic (Boston / Beograd).
  • 07.05.09 : Raised the maximum text size to 50 KiB.
  • May 2009 : Added Ukrainian interface, thanks to Barmalini.
  • 2008-2009 : A number of small fixes and tweaks to the detection algorithm. The interface now defaults to automatic decoding.
  • 12.08.07 : Fixed the Russian translation, thanks to Petr Vasilyev. This page will be significantly restructured in the near future.
  • 10.11.06 : Three new postfilters added: "base64", "unix-to-unix" and "bin-to-hex". Theoretically, 4725 combinations are now tested. Changes to the frequency-analysis function (testing).
  • 11.10.06 : The main site is on a new hardware server and should run faster.
  • 11.09.06 : The program now uses PHP5 and should run times faster.
  • 19.08.06 : Because of a broken DNS entry, this site was inaccessible from 06:00 on 15 August until 15:00 on 18 August. That is why I set up two mirror sites (5ko.free.fr/decode and www.accent.bg/decode) running the same program. If the original has a problem, you can find the copies through Google and recover your texts.
  • 17.06.06 : Added two more antique Cyrillic encodings, MIK and KOI-7 (hopefully you will never need them).
  • 03.03.06 : Added Slovak translation, thanks to Martin from KPR Slovakia.
  • 15.02.06 : More encodings added and tested.
  • 20.10.05 : Small improvement to the frequency-analysis function for texts written in all capital letters.
  • 14.10.05 : Two more gmail-cyrillic encodings added. Theoretically, 2112 combinations are now tested.
  • 15.06.05 : Russian interface added. Big thanks to chAlx!
  • 16.02.05 : One more postfilter added, for strings like "%u043A%u0438%u0440%u0438%u043B%u0438%u0446%u0430".
  • 05.02.05 : More encoding tests added. The number of tested encodings has doubled, so the program may run slightly slower.
  • 03.02.05 : The frequency-analysis function that detects the original encoding works much better now. The program currently recognizes most encodings if the first few words are not too unusual. It still needs some improvement, though.
  • 15.01.05 : The input text limit is raised from 10 to 20 kB.
  • 01.12.04 : First public release.
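
The "postfilters" in the changelog undo a transport encoding before the code-page search runs. Below is a minimal Python sketch of three of them. The function names are hypothetical. "bin-to-hex" is assumed to mean a plain hexadecimal dump, and UTF-8 is assumed as the default byte encoding, although in practice any code page from the search could be tried.

```python
import base64
import re


def decode_percent_u(text):
    """Replace JavaScript-style %uXXXX escapes with the characters they denote."""
    return re.sub(r"%u([0-9A-Fa-f]{4})", lambda m: chr(int(m.group(1), 16)), text)


def decode_base64(text, encoding="utf-8"):
    """Decode a base64 string to bytes, then to text with the assumed encoding."""
    return base64.b64decode(text).decode(encoding)


def decode_hex(text, encoding="utf-8"):
    """Decode a hex dump such as 'd0bad0b8' (whitespace allowed) to text."""
    return bytes.fromhex(text).decode(encoding)


print(decode_percent_u("%u043A%u0438%u0440%u0438%u043B%u0438%u0446%u0430"))  # кирилица
```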


File encodings

How can you tell which encoding a file was saved in? We will analyze the contents of files that contain meaningful text, not just an arbitrary set of characters. Naturally, we are only interested in the encodings used for the Russian language. There are seven of them. All are in use and understood by browsers, and below we show which characters each of them produces. We take windows-1251 as the base encoding. The purpose of this page is to ask whether, by examining which characters a file contains, we can say with certainty which encoding it was saved in. So, here is our alphabet in the different encodings:

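Here is how the lowercase Russian letters (the Win row) look when they are saved in each encoding and the resulting bytes are then displayed as Windows-1251: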

Win     а б в г д е ё ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю я
Koi     Б В Ч З Д Е Ј Ц Ъ Й К Л М Н О П Р Т У Ф Х Ж И Г Ю Ы Э Я Щ Ш Ь А С
Iso     Р С Т У Ф Х с Ц Ч Ш Щ Ъ Ы Ь Э Ю Я а б в г д е ж з и й к л м н о п
Mac     а б в г д е Ю ж з и й к л м н о п р с т у ф х ц ч ш щ ъ ы ь э ю Я
Ibm866  NBSP Ў ў Ј ¤ Ґ с ¦ § Ё © Є « ¬ SHY ® Ї а б в г д е ж з и й к л м н о п
Ibm855  NBSP ў л ¬ ¦ Ё „ й у · Ѕ Ж Р Т Ф Ц Ш б г е з Є µ ¤ ы х щ ћ с н ч њ Ю
Utf-8   Р° Р± РІ Рі Рґ Рµ С‘ Р¶ Р· Рё Р№ Рє Р» Рј РЅ Рѕ Рї СЂ СЃ С‚ Сѓ С„ С… С† С‡ С€ С‰ СЉ С‹ СЊ СЌ СЋ СЏ

(NBSP = non-breaking space, SHY = soft hyphen; both are invisible when displayed.)
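
The rows above can be reproduced with Python's standard codecs, which is also a convenient way to check the table. Each Russian letter is encoded with a given code page, and the resulting bytes are then misread as Windows-1251. The Python codec names (`koi8_r`, `iso8859_5`, `mac_cyrillic`, `cp866`, `cp855`) stand in for the row labels. Invisible characters print as blanks.

```python
LETTERS = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"

# Row label from the table -> Python codec name.
ENCODINGS = {
    "Koi": "koi8_r",
    "Iso": "iso8859_5",
    "Mac": "mac_cyrillic",
    "Ibm866": "cp866",
    "Ibm855": "cp855",
    "Utf-8": "utf-8",
}


def as_seen_in_cp1251(letter, codec):
    """Encode one letter with `codec`, then misread the bytes as Windows-1251."""
    return letter.encode(codec).decode("cp1251", errors="replace")


print(f"{'Win':<7}", " ".join(LETTERS))
for label, codec in ENCODINGS.items():
    print(f"{label:<7}", " ".join(as_seen_in_cp1251(ch, codec) for ch in LETTERS))
```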

A table with all possible encodings can be found here… It is too wide for this page, so it has been moved to a separate section. We will search the text for these characters and determine the encoding from the number of characters found.

Only the lowercase Russian letters are shown here. Analyzing these characters leads to the following conclusions:

  • KOI8-R (iso-ir-111) contains the same letters as Windows-1251, except that lowercase letters turn into uppercase and, conversely, uppercase letters turn into lowercase. The letters do not correspond one to one (for example, 'а' becomes 'Б'), but that is enough for searching. Apart from the letter 'ё', we cannot say for certain whether a file was saved in KOI8-R or in Windows-1251, because the same set of letters occurs in both. We can try counting the lowercase Russian letters in the file. If there are more of them than uppercase ones, we can assume the file was saved in Windows-1251.

  • ISO-8859-5 produces абвгдежзийклмноп, i.e. 16 of the Windows-1251 letters.

  • X-mac-cyrillic (X-mac-ukrainian) produces all the lowercase Windows-1251 letters except 'ё' and 'я', and 'я' turns into uppercase 'Я'. Because of this, we cannot say with 100% certainty whether a file was saved in Windows-1251 or in X-mac-cyrillic.

  • IBM866 also produces абвгдежзийклмноп, the same 16 Windows-1251 letters as ISO-8859-5. That is bad for us, because to tell the two apart we also need to search for uppercase characters. On the other hand, IBM866 differs considerably from Windows-1251.

  • IBM855 produces бгезйлнсущы, i.e. 11 of the Windows-1251 letters.

  • UTF-8 produces none of the Windows-1251 letters, so we can say with confidence when a file was saved in UTF-8.
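
Two of these observations translate directly into simple checks. The Python sketch below is illustrative only, and its function names are hypothetical.

- `koi8r_or_cp1251` compares lowercase and uppercase letters in the Windows-1251 view, as suggested for KOI8-R. Ties go to Windows-1251.
- `iso_or_cp866` counts uppercase 'Р' to 'Я' in that view. This is the same as counting bytes 0xD0 to 0xDF, where ISO-8859-5 stores 'а' to 'п'. It compares that count against bytes 0xA0 to 0xAF, where IBM866 stores the same letters.

```python
def koi8r_or_cp1251(data):
    """More lowercase than uppercase in the Windows-1251 view suggests Windows-1251."""
    view = data.decode("cp1251", errors="replace")
    lower = sum("а" <= ch <= "я" or ch == "ё" for ch in view)
    upper = sum("А" <= ch <= "Я" or ch == "Ё" for ch in view)
    return "cp1251" if lower >= upper else "koi8_r"


def iso_or_cp866(data):
    """Compare where 'а'..'п' would sit: 0xD0-0xDF in ISO-8859-5, 0xA0-0xAF in IBM866."""
    iso_hits = sum(0xD0 <= b <= 0xDF for b in data)  # shown as 'Р'..'Я' in Windows-1251
    ibm_hits = sum(0xA0 <= b <= 0xAF for b in data)
    return "iso8859_5" if iso_hits >= ibm_hits else "cp866"


text = "Привет, мир"
print(koi8r_or_cp1251(text.encode("koi8_r")))  # koi8_r
print(iso_or_cp866(text.encode("cp866")))      # cp866
```

Both checks only separate one pair of encodings. They assume you have already narrowed the choice down to that pair.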

Conclusion: checking a file for the presence of particular characters does not tell you for certain which encoding it was saved in. Nevertheless, we will still present a small function for detecting a file's encoding.
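
The function itself is not included in this copy of the page. As a stand-in, here is a minimal Python sketch of the same character-counting idea. It is not the author's code. It works in two steps:

- If the bytes decode strictly as UTF-8, it reports UTF-8. Pure ASCII also passes this check.
- Otherwise, it decodes the bytes with each single-byte code page and counts the ten most frequent lowercase Russian letters (о, е, а, и, н, т, с, р, в, л).

As the analysis above predicts, Windows-1251 and X-mac-cyrillic give the same count for these letters. The sketch then picks Windows-1251 only because it comes first in the list.

```python
# The ten most frequent lowercase Russian letters (an assumption of this sketch).
FREQUENT = set("оеаинтсрвл")

# Order matters: on a tie, the earlier code page wins.
SINGLE_BYTE = ["cp1251", "koi8_r", "iso8859_5", "mac_cyrillic", "cp866", "cp855"]


def guess_encoding(data):
    """Guess the encoding of Russian text given as bytes."""
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass  # not UTF-8, fall through to the single-byte code pages
    scores = {}
    for codec in SINGLE_BYTE:
        text = data.decode(codec, errors="replace")
        scores[codec] = sum(ch in FREQUENT for ch in text)
    return max(scores, key=scores.get)  # first maximum wins ties


def guess_file_encoding(path):
    """Read a whole file as bytes and guess its encoding."""
    with open(path, "rb") as f:
        return guess_encoding(f.read())


sample = "Это тестовый текст на русском языке"
for codec in ("utf-8", "cp1251", "koi8_r", "iso8859_5", "cp866"):
    print(codec, guess_encoding(sample.encode(codec)))  # each codec is recognized
```

Very short or unusual texts can defeat any frequency count like this one. The same limitation applies to the decoder described earlier on this page.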

© Copyright 2008-2018 by KDG
