Thu, 23 Aug 2018 19:45:36 +0200
adds simple tiny test suite and updates license headers
20
43725438ac50
Changed author comments + added signatures for upcoming bfile heuristics
Mike Becker <universe@uap-core.de>
parents:
34
fa9bda32de17
moved src files to src subdirectory and added licence text
Mike Becker <universe@uap-core.de>
parents:
23
57
68018eac46c3
adds simple tiny test suite and updates license headers
Mike Becker <universe@uap-core.de>
parents:
48

/*
 * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS HEADER.
 * Copyright 2018 Mike Becker. All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *   1. Redistributions of source code must retain the above copyright
 *      notice, this list of conditions and the following disclaimer.
 *
 *   2. Redistributions in binary form must reproduce the above copyright
 *      notice, this list of conditions and the following disclaimer in the
 *      documentation and/or other materials provided with the distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
 * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
 * CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

#include "bfile_heuristics.h"
22
4508da679ffb
completed binary file heuristics
Mike Becker <universe@uap-core.de>
parents:
21
#include <ctype.h>
#include <stdio.h>  /* EOF */
#include <stdlib.h> /* malloc, free */

21
91e0890464b0
implemented bfile heuristics option + TODO: implement algorithm
Mike Becker <universe@uap-core.de>
parents:
20
/* Allocates a heuristics state with the default (medium) accuracy level.
 * Returns NULL if the allocation fails. */
bfile_heuristics_t *new_bfile_heuristics_t() {
    bfile_heuristics_t *ret = malloc(sizeof(bfile_heuristics_t));
    if (ret != NULL) {
        ret->level = BFILE_MEDIUM_ACCURACY;
        bfile_reset(ret);
    }
    return ret;
}

/* Releases a state obtained from new_bfile_heuristics_t(); NULL is allowed. */
void destroy_bfile_heuristics_t(bfile_heuristics_t *def) {
    free(def);
}

/* Clears both counters so the state can be reused for the next file. */
void bfile_reset(bfile_heuristics_t *def) {
    def->bcount = 0;
    def->tcount = 0;
}

23
778388400f7b
encapsulated scanner arguments + enabled optimizer + empty file is no bfile
Mike Becker <universe@uap-core.de>
parents:
22

/*
 * Feeds one character into the heuristics and reports whether the file looks
 * binary so far. next_char must be an unsigned char value or EOF, as
 * returned by fgetc(); EOF itself is counted like any other non-printable
 * character.
 */
bool bfile_check(bfile_heuristics_t *def, int next_char) {
    bool ret = false;
    if (def->level != BFILE_IGNORE) {
        def->tcount++;
        if (!isprint(next_char) && !isspace(next_char)) {
            def->bcount++;
        }

        if (def->tcount > 1) { /* empty files are text files */
            switch (def->level) {
            case BFILE_LOW_ACCURACY:
                if (def->tcount > 15 || next_char == EOF) {
                    ret = (1.0*def->bcount)/def->tcount > 0.32;
                }
                break;
            case BFILE_HIGH_ACCURACY:
                if (def->tcount > 500 || next_char == EOF) {
                    ret = (1.0*def->bcount)/def->tcount > 0.1;
                }
                break;
            default: /* BFILE_MEDIUM_ACCURACY */
                if (def->tcount > 100 || next_char == EOF) {
                    ret = (1.0*def->bcount)/def->tcount > 0.1;
                }
                break;
            }
        }
    }

    return ret;
}
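The listing does not show how bfile_check is driven, so the following is only a minimal sketch of one plausible calling pattern: feed every byte, stop as soon as the check fires, and finish with EOF. Because bfile_heuristics.h is not shown here, the sk_ types and enum values are hypothetical stand-ins, and sk_check is a condensed restatement of the decision rule above rather than the project's own code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins: the real declarations live in bfile_heuristics.h,
 * which is not shown, so names, enum order and field types are assumptions. */
typedef enum {
    SK_IGNORE, SK_LOW_ACCURACY, SK_MEDIUM_ACCURACY, SK_HIGH_ACCURACY
} sk_level_t;

typedef struct {
    sk_level_t level;
    unsigned int bcount; /* characters that are neither printable nor whitespace */
    unsigned int tcount; /* all characters seen so far, including EOF */
} sk_heuristics_t;

/* Same thresholds as the switch in bfile_check, written as a lookup. */
static bool sk_check(sk_heuristics_t *h, int next_char) {
    if (h->level == SK_IGNORE) return false;

    h->tcount++;
    if (!isprint(next_char) && !isspace(next_char)) h->bcount++;
    if (h->tcount <= 1) return false; /* empty input counts as text */

    unsigned int window = 100; /* medium accuracy */
    double limit = 0.1;
    if (h->level == SK_LOW_ACCURACY) { window = 15; limit = 0.32; }
    else if (h->level == SK_HIGH_ACCURACY) { window = 500; limit = 0.1; }

    if (h->tcount > window || next_char == EOF) {
        return (double) h->bcount / h->tcount > limit;
    }
    return false;
}

/* Feeds a buffer byte by byte, then EOF; stops at the first positive check. */
static bool sk_looks_binary(const unsigned char *buf, size_t len,
                            sk_level_t level) {
    assert(buf != NULL || len == 0);
    sk_heuristics_t h = { level, 0, 0 };
    for (size_t i = 0; i < len; i++) {
        if (sk_check(&h, buf[i])) return true;
    }
    return sk_check(&h, EOF);
}
```

One consequence worth knowing, if the caller passes EOF through as the next_char == EOF tests suggest: the EOF marker is counted as a non-printable character, so very short inputs can cross the threshold. Under the assumptions above, sk_looks_binary((const unsigned char *) "ab", 2, SK_MEDIUM_ACCURACY) returns true, because one of the three counted characters is EOF.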