In this post I build a program that searches a text file for a word or phrase chosen by the user and prints every sentence that contains it.
Plan
- Find a file to use for testing
- Input
- Save file
- Format text
- Split text
- Check each sentence
- Print results
Find a file to use for testing
I used this random text generator and saved the result as a text file called Text.txt, in the same folder as the program. You can use any text, as long as it contains full stops.
Input
phrase=input("What phrase do you want to find in your text? ")
phrase=phrase.lower()
This lets the user input a phrase and makes the phrase lowercase.
Save file
#Open file
with open('Text.txt','r') as f:
    text=f.read()
This opens the file, reads its contents into the variable text and closes the file again when the with block ends.
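If Text.txt is missing, open raises FileNotFoundError and the program stops with a traceback. Below is a minimal sketch of a friendlier version. The helper name read_text and the UTF-8 encoding are my assumptions, not part of the original program, and the demo writes a throwaway file so it can run anywhere.

```python
import os
import tempfile

def read_text(path):
    """Return the file's contents, or None if the file does not exist."""
    try:
        # encoding='utf-8' is an assumption; use whatever your file was saved with
        with open(path, 'r', encoding='utf-8') as f:
            return f.read()
    except FileNotFoundError:
        print("Could not find " + path)
        return None

# Demo with a temporary file (hypothetical content)
demo_path = os.path.join(tempfile.mkdtemp(), 'Text.txt')
with open(demo_path, 'w', encoding='utf-8') as f:
    f.write("The man ran. The dog slept.")

text = read_text(demo_path)              # the file's contents
missing = read_text(demo_path + '.nope')  # None, because no such file exists
```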
Format text
#Remove useless characters
text=' '.join(text.split('\n'))
text=''.join(text.split(','))
text=''.join(text.split(':'))
text=''.join(text.split(';'))
#Collapse repeated spaces into one
text=' '.join(text.split())
text=text.lower()
#Change Punctuation
text='.'.join(text.split('!'))
text='.'.join(text.split('?'))
This code removes common punctuation, so a search for ‘man’ also finds a sentence containing ‘man:’. It also collapses repeated spaces into a single space.
Then the text is made lowercase, which stops the search being case-sensitive.
Finally, “?” and “!” are changed to “.” so that all three count as the end of a sentence.
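The same clean-up can be written more compactly with str.translate. This is only a sketch of an alternative, not the method used in this post, and the function name normalise is hypothetical.

```python
def normalise(text):
    """Lowercase the text, drop , : ; and turn ! and ? into full stops."""
    # str.maketrans(from, to, delete): map '!' and '?' to '.', delete ',', ':' and ';'
    table = str.maketrans('!?', '..', ',:;')
    text = text.lower().translate(table)
    # split() with no arguments splits on any run of whitespace, including newlines
    return ' '.join(text.split())

cleaned = normalise("Stop!  Who goes there?\nA man: tall, quiet.")
print(cleaned)
```

One translation table replaces the chain of split and join calls, so adding another character to remove only means extending the third argument.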
Split text
#Split text
sentences=text.split('.')
valid=[]
This splits the text into sentences at every full stop and creates an empty list, valid, to hold the sentences that match.
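Note that split('.') keeps the space at the start of each sentence and returns an empty string at the end when the text finishes with a full stop. If that bothers you, a small sketch like this tidies the list (split_sentences is a hypothetical helper name):

```python
def split_sentences(text):
    """Split on full stops, trimming spaces and dropping empty pieces."""
    return [s.strip() for s in text.split('.') if s.strip()]

sentences = split_sentences("stop. who goes there. a man tall quiet.")
print(sentences)
```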
Check each sentence
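#Check each sentence
target=phrase.split()
for x in sentences:
    words=x.split()
    #Compare the phrase with each run of words of the same length
    for i in range(len(words)-len(target)+1):
        if target and words[i:i+len(target)]==target:
            if x not in valid:
                valid.append(x)
This splits the phrase and each sentence into words, then compares the phrase with every run of consecutive words in the sentence. That way a single word and a multi-word phrase are both matched as whole words (searching ‘man’ does not match ‘woman’). Each matching sentence is added to valid only once.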
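Another way to test for whole-word matches is a regular expression with word boundaries. The sketch below is an alternative I am suggesting, not the code from this post. The helper contains_phrase and the sample sentences are made up for illustration.

```python
import re

def contains_phrase(sentence, phrase):
    """True if phrase appears in sentence as whole words."""
    if not phrase:
        return False
    # re.escape stops characters in the phrase from being read as regex syntax
    pattern = r'\b' + re.escape(phrase) + r'\b'
    return re.search(pattern, sentence) is not None

sentences = ['the man ran', 'a woman spoke', 'the old man sat', 'mankind waited']
valid = [s for s in sentences if contains_phrase(s, 'man')]
two_words = [s for s in sentences if contains_phrase(s, 'old man')]
print(valid)
print(two_words)
```

The \b markers keep ‘man’ from matching inside ‘woman’ or ‘mankind’, and the same function handles phrases of more than one word.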
Print results
#print results
for z in valid:
    print(z)
This prints every matching sentence on its own line. You could just do:
print(valid)
but that prints the whole list on one line, brackets and quotes included, which is harder to read.
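To see the difference, here is a small sketch with a made-up list of results. It also shows '\n'.join as a one-call way to get one sentence per line.

```python
# Hypothetical results, standing in for the real valid list
valid = ['the man ran', 'the old man sat']

# Printing the list itself shows brackets and quotes on a single line
as_list = str(valid)
print(as_list)

# Joining with newlines puts each sentence on its own line
as_lines = '\n'.join(valid)
print(as_lines)
```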
Full code
#Input
phrase=input("What phrase do you want to find in your text? ")
phrase=phrase.lower()
#Open file
with open('Text.txt','r') as f:
    text=f.read()
#Remove useless characters
text=' '.join(text.split('\n'))
text=''.join(text.split(','))
text=''.join(text.split(':'))
text=''.join(text.split(';'))
#Collapse repeated spaces into one
text=' '.join(text.split())
text=text.lower()
#Change Punctuation
text='.'.join(text.split('!'))
text='.'.join(text.split('?'))
#Split text
sentences=text.split('.')
valid=[]
#Check each sentence
target=phrase.split()
for x in sentences:
    words=x.split()
    #Compare the phrase with each run of words of the same length
    for i in range(len(words)-len(target)+1):
        if target and words[i:i+len(target)]==target:
            if x not in valid:
                valid.append(x)
#print results
for z in valid:
    print(z)